Update - We've mostly recovered from the cooling outage, but a few lingering issues remain that we won't be able to address before the weekend. These issues should have minimal impact on users.
Jun 19, 2026 - 15:19 CDT
Identified - Multiple CHTC services are being impacted by the campus chilled water outage. We are working to restore services.
Jun 18, 2026 - 12:26 CDT
Monitoring - The HPC system is mostly back online. We are still working to power on a couple of nodes.
Jun 19, 2026 - 13:09 CDT
Identified - Many of our HPC worker nodes are down after a cooling outage in one of our server rooms last night. We will work to bring these nodes back up as soon as we know that cooling has stabilized.
Jun 18, 2026 - 08:41 CDT
Identified - Some jobs running on gpulab2001 or gpulab2003 may fail with the error "CUDA error: failed call to cuInit: CUDA_ERROR_UNKNOWN". We are working to resolve the issue.
Jun 02, 2026 - 16:53 CDT
Resolved -
We restarted the service and the Globus interface appears to be operating again.
However, we don't yet know what is causing the issue, so it may recur. Please let us know at chtc@cs.wisc.edu if you encounter the issue again.
Jun 16, 15:00 CDT
Resolved -
OSDF transfers should be operational. If you encounter errors, please let us know at chtc@cs.wisc.edu.
Jun 16, 13:46 CDT
Investigating -
The OSDF system has been having trouble over the weekend. This is causing OSDF transfers to fail with a message like "error while querying the director at https://osdf-director.osg-htc.org: Transfer.DirectorTimeout Error".
We are investigating the problem.
Jun 8, 09:01 CDT
Resolved -
This incident has been resolved.
Jun 15, 12:17 CDT
Investigating -
We have confirmed user reports of being unable to launch a BadgerCompute instance. The loading screen hangs on "Your server is starting up" and eventually times out with "Spawn failed".