All Systems Operational

About This Site

This page provides information about unplanned downtime and scheduled maintenance for services offered by the Center for High Throughput Computing.

Component status (uptime over the past 90 days):

High Throughput Computing (HTC) System: Operational, 99.92 % uptime
Access Points: Operational, 99.86 % uptime
CHTC Pool: Operational, 100.0 % uptime
External Pools (OSPool, Campus HTCondor Pools): Operational, 100.0 % uptime
Staging and Projects Space: Operational, 100.0 % uptime
File Transfers: Operational, 99.77 % uptime
High Performance Computing (HPC) System: Operational, 99.79 % uptime
Login Nodes: Operational, 99.57 % uptime
Cluster Nodes and Jobs: Operational, 99.79 % uptime
Central Software Installations: Operational, 100.0 % uptime
Home and Scratch File Systems: Operational, 99.81 % uptime
Data Transfer Tools: Operational, 100.0 % uptime
Globus Endpoint: Operational, 100.0 % uptime
CHTC Internal Infrastructure: Operational, 100.0 % uptime
Tiger Cluster: Operational, 100.0 % uptime
RT Email/Ticket Support System: Operational, 100.0 % uptime

Scheduled Maintenance

[HTC] Downtime for mrudolphgpu4001 Mar 26, 2026 06:00 - Apr 2, 2026 06:00 CDT

We will be powering down mrudolphgpu4001 to perform maintenance. Jobs will not match to this machine during this time.
Posted on Mar 20, 2026 - 15:30 CDT
Mar 24, 2026

No incidents reported today.

Mar 23, 2026

No incidents reported.

Mar 22, 2026

No incidents reported.

Mar 21, 2026

No incidents reported.

Mar 20, 2026
Resolved - Everything looks back to normal.
Mar 20, 13:09 CDT
Monitoring - We believe we've fixed the problem. We'll keep an eye on things to make sure it sticks.
Mar 20, 09:04 CDT
Identified - A system-wide update to our GPU machines has resulted in a mixup in the configuration of GPU machines prioritized for researchers. We are working to address the problem. In the meantime, you may see reduced or unusual matchmaking behavior for GPU jobs that target prioritized machines, including backfill jobs.
Mar 19, 12:34 CDT
Mar 19, 2026
Mar 18, 2026

No incidents reported.

Mar 17, 2026

No incidents reported.

Mar 16, 2026
Resolved - Most modern worker nodes should be back up; the cluster is at normal operation.
Mar 16, 13:33 CDT
Identified - The HPC cluster login node (spark-login) is back up. We are bringing the worker nodes of the cluster up now.
Mar 16, 12:00 CDT
Investigating - The HPC cluster (accessed via spark-login.chtc.wisc.edu) went down over the weekend due to a power outage. We will update this incident as the cluster comes back online.
Mar 16, 07:57 CDT
Mar 15, 2026

No incidents reported.

Mar 14, 2026

No incidents reported.

Mar 13, 2026
Resolved - This incident has been resolved.
Mar 13, 09:09 CDT
Monitoring - A fix has been implemented and we are monitoring the results.
Mar 12, 16:53 CDT
Investigating - Users may be experiencing login issues to spark-login, including hanging after entering the ssh command or a repeating message upon successful login (kernel: watchdog: BUG: soft lockup). We are currently investigating.
Mar 12, 16:16 CDT
Mar 12, 2026
Mar 11, 2026

No incidents reported.

Mar 10, 2026
Completed - The scheduled maintenance has been completed.
Mar 10, 13:12 CDT
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.
Mar 10, 12:55 CDT
Scheduled - ap2001 and ap2002 will be unavailable for a short period of time (~1 minute) at 1 PM today to perform maintenance.
Mar 10, 12:29 CDT