Monitoring - A fix has been implemented and we are monitoring the results.
Oct 27, 2025 - 17:41 CDT
Investigating - Users are unable to log in to wright-ap.chtc.wisc.edu. We are investigating the issue.
Oct 27, 2025 - 15:49 CDT
Identified - Jobs that use multiple GPUs and PyTorch may run into an error where GPUs are not detected. This is occurring on multiple GPU machines after driver updates were applied.
We have identified the issue and are actively working to roll out fixes to our GPU machines between 10/27 and 10/31.
If you encounter this issue, here are some options:
* Wait until next week to submit multi-GPU jobs using PyTorch.
* Use alternative resources or workflows, such as requesting a single GPU for your jobs, running CPU-only workflows, or running non-PyTorch workflows.
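One way to notice this failure early is to compare the number of GPUs a job requested with the number PyTorch can actually see, and stop before the real workload starts. The following is a minimal sketch and not an official CHTC tool: the helper names `gpu_shortfall`, `detected_gpu_count`, and `check_gpus` are hypothetical, and the requested count is assumed to be supplied by the user to match the job's GPU request.

```python
import sys


def gpu_shortfall(requested: int, detected: int) -> int:
    """Return how many requested GPUs are missing (0 if all are visible)."""
    return max(requested - detected, 0)


def detected_gpu_count() -> int:
    """Return the number of CUDA devices PyTorch reports.

    Returns 0 if PyTorch is not installed in the job environment.
    """
    try:
        import torch
    except ImportError:
        return 0
    return torch.cuda.device_count()


def check_gpus(requested: int) -> int:
    """Print a short report and return an exit code (0 = OK, 1 = GPUs missing).

    `requested` is an assumption: it should match the GPU count in the
    job's submit file.
    """
    detected = detected_gpu_count()
    missing = gpu_shortfall(requested, detected)
    if missing:
        print(
            f"Requested {requested} GPU(s) but PyTorch sees {detected}; "
            f"{missing} missing.",
            file=sys.stderr,
        )
        return 1
    print(f"PyTorch sees {detected} GPU(s); {requested} requested.")
    return 0
```

Calling `sys.exit(check_gpus(2))` at the top of a job script would make a two-GPU job exit with a non-zero code right away when fewer than two GPUs are visible, instead of failing partway through the run.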
We understand this incident is disruptive to researchers' workflows - please reach out to us at chtc@cs.wisc.edu with any concerns.
Oct 24, 2025 - 10:58 CDT
Maintenance of the datacenter requires that the HPC system is powered off. We may take the opportunity to install some system updates after it is powered back on.
No jobs will run or be accepted during this time. Queued jobs should continue once the maintenance downtime has completed. Jobs submitted with a runtime that overlaps the maintenance window will not start and will show the reason "ReqNodeNotAvail, Reserved for maintenance".
Posted on Oct 10, 2025 - 13:42 CDT
Past Incidents
Oct 27, 2025
Unresolved incident: [HTC] Unable to access wright-ap.
Resolved -
This incident has been resolved.
Oct 24, 10:59 CDT
Update -
We are continuing to monitor for any further issues.
Oct 16, 16:22 CDT
Monitoring -
Users should be able to log in again. However, we have not yet identified the cause, so the issue may recur.
We will continue to investigate and monitor the situation.
Oct 16, 16:21 CDT
Investigating -
We have confirmed reports that users are not able to log in to spark-login.chtc.wisc.edu at this time. We are investigating and will provide updates as they become available.
Oct 16, 16:00 CDT