Update - We've mostly recovered from the cooling outage, but there are still a few lingering issues that we won't be able to address before the weekend.
This should have minimal impact on users, however.

Jun 19, 2026 - 15:19 CDT
Identified - Multiple CHTC services are being impacted by the campus chilled water outage.
We are working to restore services.

Jun 18, 2026 - 12:26 CDT
Monitoring - The HPC system is mostly back online. We are still working to power on a couple of nodes.
Jun 19, 2026 - 13:09 CDT
Identified - Many of our HPC worker nodes are down after a cooling outage in one of our server rooms last night.
We will work to bring these nodes back up as soon as we know that cooling has stabilized.

Jun 18, 2026 - 08:41 CDT
Identified - Some jobs running on gpulab2001 or gpulab2003 may fail with the error "CUDA error: failed call to cuInit: CUDA_ERROR_UNKNOWN". We are working to resolve the issue.
Jun 02, 2026 - 16:53 CDT

About This Site

This page provides information about unplanned downtimes and scheduled maintenance for services offered by the Center for High Throughput Computing (CHTC).

Component status, with uptime over the past 90 days:

High Throughput Computing (HTC) System: Degraded Performance, 97.76% uptime
Access Points: Operational, 96.79% uptime
CHTC Pool: Degraded Performance, 97.06% uptime
External Pools (OSPool, Campus HTCondor Pools): Operational, 98.62% uptime
Staging and Projects Space: Operational, 99.91% uptime
File Transfers: Operational, 96.43% uptime
High Performance Computing (HPC) System: Degraded Performance, 99.12% uptime
Login Nodes: Operational, 98.26% uptime
Cluster Nodes and Jobs: Degraded Performance, 98.24% uptime
Central Software Installations: Operational, 100.0% uptime
Home and Scratch File Systems: Operational, 100.0% uptime
Data Transfer Tools: Operational, 95.32% uptime
Globus Endpoint: Operational, 95.32% uptime
BadgerCompute: Partial Outage, 95.41% uptime
CHTC Internal Infrastructure: Degraded Performance, 99.6% uptime
Tiger Cluster: Degraded Performance, 99.62% uptime
RT Email/Ticket Support System: Operational, 99.71% uptime
User App: Operational, 95.2% uptime

Jun 23, 2026

No incidents reported today.

Jun 22, 2026

No incidents reported.

Jun 21, 2026

No incidents reported.

Jun 20, 2026

No incidents reported.

Jun 19, 2026

Unresolved incidents: Campus cooling outage impacting CHTC services, Cooling outage impacting HPC cluster.

Jun 18, 2026
Jun 17, 2026

No incidents reported.

Jun 16, 2026
Resolved - We restarted the service and the Globus interface appears to be operating again.

However, we do not yet know what is causing the issue, so it may recur. Please let us know at chtc@cs.wisc.edu if you encounter it again.

Jun 16, 15:00 CDT
Investigating - Confirmed user reports of being unable to connect to /staging or /projects via the Globus interface (as described in our guide here: https://chtc.cs.wisc.edu/uw-research-computing/globus).

We are investigating the issue. In the meantime, you can access files via the transfer server (transfer.chtc.wisc.edu).

Jun 16, 13:49 CDT
Resolved - OSDF transfers should be operational. If you encounter errors, please let us know at chtc@cs.wisc.edu.
Jun 16, 13:46 CDT
Investigating - The OSDF system has been having trouble over the weekend.
This is causing OSDF transfers to fail with a message like "error while querying the director at https://osdf-director.osg-htc.org: Transfer.DirectorTimeout Error".

We are investigating the problem.

Jun 8, 09:01 CDT
Jun 15, 2026
Resolved - This incident has been resolved.
Jun 15, 12:17 CDT
Investigating - Confirmed user reports of being unable to launch a BadgerCompute instance. The loading screen hangs on "Your server is starting up" and eventually times out with "Spawn failed".

We are investigating the issue.

Jun 15, 10:47 CDT
Jun 14, 2026

No incidents reported.

Jun 13, 2026

No incidents reported.

Jun 12, 2026

No incidents reported.

Jun 11, 2026

No incidents reported.

Jun 10, 2026

No incidents reported.

Jun 9, 2026

No incidents reported.