We will be powering down mrudolphgpu4001 to perform maintenance. Jobs will not match to this machine during this time.
Posted on Mar 20, 2026 - 15:30 CDT
Resolved -
Looks back to normal.
Mar 20, 13:09 CDT
Monitoring -
We believe we've fixed the problem. We'll keep an eye on things to make sure it sticks.
Mar 20, 09:04 CDT
Identified -
A system-wide update to our GPU machines has resulted in a mixup in the configuration of GPU machines prioritized for researchers. We are working to address the problem. In the meantime, you may see reduced or unusual matchmaking behavior for GPU jobs that target prioritized machines, including backfill jobs.
Mar 19, 12:34 CDT
Resolved -
Most modern worker nodes should be back up - cluster is at normal operation.
Mar 16, 13:33 CDT
Identified -
The HPC cluster login node (spark-login) is back up. We are bringing the worker nodes of the cluster up now.
Mar 16, 12:00 CDT
Investigating -
The HPC cluster (accessed via spark-login.chtc.wisc.edu) went down over the weekend due to a power outage. We will update this incident as the cluster comes back online.
Mar 16, 07:57 CDT
Resolved -
This incident has been resolved.
Mar 13, 09:09 CDT
Monitoring -
A fix has been implemented and we are monitoring the results.
Mar 12, 16:53 CDT
Investigating -
Users may be experiencing issues logging in to spark-login, including hanging after entering the ssh command or a repeating message upon successful login (kernel: watchdog: BUG: soft lockup). We are currently investigating.
Mar 12, 16:16 CDT