Identified - We've confirmed several reports of slow performance in the /staging and /projects directories. Users may encounter slow file transfers to/from /staging and /projects, and commands that query files in those directories may be slow or hang entirely.

This issue is related to heavy disk usage in these spaces, a side effect of the ongoing data recovery process. Unfortunately, there is no good workaround at this time, but users are encouraged to move or remove any recovered data (located in /recovery).

We ask that users be patient while we work to resolve this issue.

Jan 23, 2025 - 16:59 CST
Update - Please plan to move or remove your data from the recovery paths to your new /staging or /projects directory by Monday, February 17.

More details here: https://chtc.cs.wisc.edu/uw-research-computing/data-recovery-fall2024

Jan 21, 2025 - 16:41 CST
Monitoring - The data recovery process for /projects is complete. We believe we have recovered close to 100% of the data that was originally present in these directories. Some of the metadata for files (like file creation date) may be incorrect; we strongly recommend validating any data that you copy from the recovered file system.

See our website for more information on recovering files: https://chtc.cs.wisc.edu/uw-research-computing/data-recovery-fall2024
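
One possible way to validate copied data, sketched here rather than prescribed: if you hold an independent reference copy of the same files (for example, on a lab server), compare content checksums instead of relying on file metadata such as creation dates. The function names below are hypothetical helpers for illustration, not CHTC tools, and the approach assumes both directory trees are readable from the machine running the script.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_manifests(reference, recovered):
    """Return (missing, mismatched) relative paths, treating reference as truth."""
    missing = sorted(set(reference) - set(recovered))
    mismatched = sorted(
        name
        for name in reference
        if name in recovered and reference[name] != recovered[name]
    )
    return missing, mismatched
```

To use it, call `build_manifest` once on the reference copy and once on the copied recovered data, then pass both results to `compare_manifests`. Any path in either returned list deserves a closer look before the data is used in jobs.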

Jan 13, 2025 - 16:32 CST
Update - The data recovery process for /staging and HTC /software is complete. We believe we have recovered about 50% of the data that was originally present in these directories. Some of the metadata for files (like file creation date) may be incorrect; we strongly recommend validating any data that you copy from the recovered file system.

See our website for more information on recovering files: https://chtc.cs.wisc.edu/uw-research-computing/data-recovery-fall2024

Dec 12, 2024 - 12:13 CST
Update - We are nearly finished recovering data from the /staging directory. We will provide more information in the next day or so as we confirm the recovery process.
We are still working on recovering data from the /projects directory and anticipate it will be several weeks before it is ready for users to access.

Dec 10, 2024 - 15:48 CST
Update - We have created new /staging, /projects, and /software data spaces. Please email us if you need your group /staging, /projects, or /software directories re-created. If any aspects of your jobs relied on these directories and you are currently having issues running jobs, contact us at chtc@cs.wisc.edu.

Dec 06, 2024 - 11:15 CST
Update - All HTC users should now have access to a new staging directory with a default quota of 100GB / 1000 items. This space can be used exactly like the previous /staging directories to run jobs.

Group directories will need to be created manually -- contact the facilitators at chtc@cs.wisc.edu or fill out our quota request form https://chtc.cs.wisc.edu/uw-research-computing/quota-request to have a group directory created.
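
As a rough illustration of checking a directory against the default quota above, the sketch below walks a directory tree and totals its size and item count. It rests on two assumptions that this page does not confirm: that "100GB" means decimal gigabytes, and that both files and subdirectories count as items. The file system's own accounting may differ, so treat the result as an estimate.

```python
import os

QUOTA_BYTES = 100 * 10**9  # assumption: "100GB" read as decimal gigabytes
QUOTA_ITEMS = 1000         # assumption: files and subdirectories both count


def usage(root):
    """Return (total_bytes, total_items) for everything beneath root."""
    total_bytes = 0
    total_items = 0
    for dirpath, dirnames, filenames in os.walk(root):
        total_items += len(dirnames) + len(filenames)
        for name in filenames:
            # lstat so that symbolic links are not followed
            total_bytes += os.lstat(os.path.join(dirpath, name)).st_size
    return total_bytes, total_items


def within_quota(root, max_bytes=QUOTA_BYTES, max_items=QUOTA_ITEMS):
    """Return True if the estimated usage of root fits both limits."""
    used_bytes, used_items = usage(root)
    return used_bytes <= max_bytes and used_items <= max_items
```

Running `usage` on your staging directory before moving recovered data into it gives an early warning if the item limit, which is easy to exceed with many small files, is likely to be reached.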

Nov 26, 2024 - 16:11 CST
Identified - We have identified the issue that was causing file system problems on Thursday. We are able to prevent it from recurring; however, it resulted in significant data loss in /staging, /projects, HTC /software and /squid before CHTC personnel were able to react.

All data in /squid is unrecoverable. Any remaining data in /projects and /staging is currently inaccessible as we work to recover whatever additional data we can. We hope to recover at least 50% of /staging and 60% of /projects.

This week (Nov 25-27), we will create a new data store to serve the “/staging” and “/projects” directories. Initially, there will be no data inside these directories. This new data backend for the /staging and /projects directories will be used for CHTC data storage moving forward and will be usable in jobs once available. We will post on this status page when these directories are available.

Our full plan to bring systems and data back online is described on this page on the CHTC website: https://chtc.cs.wisc.edu/uw-research-computing/data-recovery-fall2024

This status page incident will continue to be updated as we go through steps of the recovery process.

Nov 25, 2024 - 18:03 CST

About This Site

This page provides information about unplanned downtimes and scheduled maintenance for services offered by the Center for High Throughput Computing.

High Throughput Computing (HTC) System: Operational (98.44% uptime over the past 90 days)
Access Points: Operational (99.71% uptime over the past 90 days)
CHTC Pool: Operational (99.29% uptime over the past 90 days)
External Pools (OSPool, Campus HTCondor Pools): Operational (100.0% uptime over the past 90 days)
Staging and Projects Space: Operational (94.76% uptime over the past 90 days)
File Transfers: Operational (98.44% uptime over the past 90 days)
High Performance Computing (HPC) System: Operational (99.8% uptime over the past 90 days)
Login Nodes: Operational (99.99% uptime over the past 90 days)
Cluster Nodes and Jobs: Operational (99.94% uptime over the past 90 days)
Central Software Installations: Operational (99.29% uptime over the past 90 days)
Home and Scratch File Systems: Operational (100.0% uptime over the past 90 days)
Data Transfer Tools: Operational (99.99% uptime over the past 90 days)
Globus Endpoint: Operational (99.99% uptime over the past 90 days)
CHTC Internal Infrastructure: Operational (100.0% uptime over the past 90 days)
Tiger Cluster: Operational (100.0% uptime over the past 90 days)

Scheduled Maintenance

[HTC] Downtime for voyles2000 Apr 8, 2025 07:00 - Apr 9, 2025 07:00 CDT

A downtime is scheduled for voyles2000 to address hardware issues.
Posted on Mar 25, 2025 - 11:44 CDT

Mar 28, 2025

No incidents reported today.

Mar 27, 2025

No incidents reported.

Mar 26, 2025

No incidents reported.

Mar 25, 2025
Completed - The scheduled maintenance has been completed.
Mar 25, 13:27 CDT
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.
Mar 25, 12:00 CDT
Scheduled - Downtime is scheduled for voyles2000.chtc.wisc.edu to update its BIOS.
Mar 25, 11:37 CDT
Mar 24, 2025

No incidents reported.

Mar 23, 2025

No incidents reported.

Mar 22, 2025

No incidents reported.

Mar 21, 2025

No incidents reported.

Mar 20, 2025

No incidents reported.

Mar 19, 2025

No incidents reported.

Mar 18, 2025

No incidents reported.

Mar 17, 2025

No incidents reported.

Mar 16, 2025

No incidents reported.

Mar 15, 2025

No incidents reported.

Mar 14, 2025
Resolved - This incident has been resolved.
Mar 14, 15:49 CDT
Monitoring - A fix has been implemented and we are monitoring the results.
Mar 13, 12:54 CDT
Identified - We have received error reports from users who run jobs with licensed software that relies on license servers hosted at CHTC. These errors include messages such as "Failed to connect to license server" and other connection-related errors. We have identified the underlying issue and are working on a fix.
Mar 13, 10:51 CDT