All Systems Operational

About This Site

This page reports unplanned downtime and scheduled maintenance for services offered by the Center for High Throughput Computing (CHTC).

Component status and uptime over the past 90 days:

High Throughput Computing (HTC) System: Operational, 99.92% uptime
Access Points: Operational, 99.92% uptime
CHTC Pool: Operational, 100.0% uptime
External Pools (OSPool, Campus HTCondor Pools): Operational, 100.0% uptime
Staging and Projects Space: Operational, 100.0% uptime
File Transfers: Operational, 99.71% uptime
High Performance Computing (HPC) System: Operational, 99.99% uptime
Login Nodes: Operational, 99.98% uptime
Cluster Nodes and Jobs: Operational, 100.0% uptime
Central Software Installations: Operational, 100.0% uptime
Home and Scratch File Systems: Operational, 100.0% uptime
Data Transfer Tools: Operational, 100.0% uptime
Globus Endpoint: Operational, 100.0% uptime
CHTC Internal Infrastructure: Operational, 100.0% uptime
Tiger Cluster: Operational, 100.0% uptime
RT Email/Ticket Support System: Operational, 100.0% uptime

Nov 10, 2025

No incidents reported today.

Nov 9, 2025

No incidents reported.

Nov 8, 2025

No incidents reported.

Nov 7, 2025

No incidents reported.

Nov 6, 2025
Resolved - We believe this issue has been resolved.
Nov 6, 15:03 CST
Monitoring - The caching service is back online and users should no longer see this error. If you do, please let us know at chtc@cs.wisc.edu.
Nov 6, 10:53 CST
Identified - Confirmed user reports of an error along the lines of "GET https://dockercache-cs-2360.chtc.wisc.edu/v2/: unexpected status code 503 Service Unavailable".
We've identified that our local Docker caching service is offline and are working to bring it back.

Nov 6, 10:33 CST
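
For users who want to check whether the caching service answers before resubmitting, a minimal sketch follows. It requests the `/v2/` URL quoted in the error above and classifies the HTTP status. The function names are hypothetical, and the assumption that a healthy registry answers 200 or 401 on `/v2/` comes from the general Docker Registry HTTP API, not from CHTC documentation. The endpoint may also be unreachable from outside CHTC networks.

```python
import urllib.error
import urllib.request

# URL taken from the error message quoted in the incident report.
REGISTRY_URL = "https://dockercache-cs-2360.chtc.wisc.edu/v2/"


def classify_status(code: int) -> str:
    """Map an HTTP status code from the /v2/ endpoint to a coarse label."""
    if code in (200, 401):
        # Assumption: a registry base endpoint answers 200, or 401 when it wants auth.
        return "reachable"
    if code == 503:
        # The status code reported during this incident.
        return "unavailable"
    return "unexpected"


def probe(url: str = REGISTRY_URL, timeout: float = 10.0) -> str:
    """Request the endpoint once and return a coarse label for the result."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify_status(resp.status)
    except urllib.error.HTTPError as err:
        # Non-2xx responses (including 401 and 503) arrive as HTTPError.
        return classify_status(err.code)
    except OSError:
        # DNS failures, refused connections, and timeouts all land here.
        return "unreachable"
```

Calling `probe()` on an Access Point returns one of the four labels. Any result other than "reachable" is worth reporting to chtc@cs.wisc.edu along with the job's error text.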
Nov 5, 2025

No incidents reported.

Nov 4, 2025

No incidents reported.

Nov 3, 2025
Resolved - This incident has been resolved.
Nov 3, 16:16 CST
Monitoring - A fix has been implemented and we are monitoring the results.
Nov 3, 14:59 CST
Investigating - Users of learn.chtc.wisc.edu are unable to log into or access the Access Point. The incident started on Friday, Oct 31. We are currently investigating the issue.
Nov 3, 09:55 CST
Nov 2, 2025

No incidents reported.

Nov 1, 2025

No incidents reported.

Oct 31, 2025
Resolved - A fix has been deployed to the system to address the problem.
If you encounter this or a similar issue again, please let us know at chtc@cs.wisc.edu.

Oct 31, 14:50 CDT
Identified - Jobs that use multiple GPUs and PyTorch may run into an error where GPUs are not detected. This is occurring on multiple GPU machines after driver updates were applied.

We have identified the issue and are actively working to roll out fixes to our GPU machines between 10/27 and 10/31.

If you encounter this issue, here are some options:
* Wait until next week to submit multi-GPU jobs that use PyTorch.
* Request alternative resources, such as a single GPU for your jobs, or use CPU-only or non-PyTorch workflows.

We understand this incident is disruptive to researchers' workflows - please reach out to us at chtc@cs.wisc.edu with any concerns.

Oct 24, 10:58 CDT
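
One way to make a job fail fast when it hits this problem is to compare the number of GPUs PyTorch detects with the number the job requested before any real work starts. The sketch below is an illustration, not a CHTC-provided diagnostic. The helper names are hypothetical, and the only PyTorch call used is `torch.cuda.device_count()`.

```python
from typing import Optional


def detected_gpu_count() -> int:
    """Number of CUDA devices PyTorch can see; 0 if PyTorch is not installed."""
    try:
        import torch
    except ImportError:
        return 0
    return torch.cuda.device_count()


def require_gpus(requested: int, detected: Optional[int] = None) -> int:
    """Raise RuntimeError if fewer GPUs are visible than the job requested.

    `detected` can be passed explicitly (for example in tests); otherwise
    it is queried from PyTorch.
    """
    if detected is None:
        detected = detected_gpu_count()
    if detected < requested:
        raise RuntimeError(
            f"requested {requested} GPU(s) but PyTorch detected {detected}"
        )
    return detected
```

Calling `require_gpus(2)` at the top of a training script stops the job immediately with a clear message instead of failing later inside the training code.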
Resolved - This incident has been resolved.
Oct 31, 11:48 CDT
Monitoring - A fix has been implemented and we are monitoring the results.
Oct 27, 17:41 CDT
Investigating - Users are unable to log into wright-ap.chtc.wisc.edu. We are investigating the issue.
Oct 27, 15:49 CDT
Oct 30, 2025
Completed - The scheduled maintenance has been completed.
Oct 30, 15:13 CDT
Verifying - Maintenance was completed, and most of the cluster came up last night, including the ability to log into the login node, access files, and submit jobs. We will be addressing any cluster nodes that are still down later today.
Oct 30, 07:45 CDT
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.
Oct 29, 15:12 CDT
Scheduled - Maintenance of the datacenter requires that the HPC system be powered off.
We may take the opportunity to install some system updates after it is powered back on.

No jobs will run or be accepted during this time. Queued jobs should continue once the maintenance downtime has completed. Jobs submitted with a runtime that intersects the maintenance window will not start and will show the reason "ReqNodeNotAvail, Reserved for maintenance".

Oct 29, 15:12 CDT
Oct 29, 2025
Completed - The scheduled maintenance has been completed.
Oct 29, 15:00 CDT
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.
Oct 29, 07:00 CDT
Scheduled - Maintenance of the datacenter requires that the HPC system be powered off.
We may take the opportunity to install some system updates after it is powered back on.

No jobs will run or be accepted during this time. Queued jobs should continue once the maintenance downtime has completed. Jobs submitted with a runtime that intersects the maintenance window will not start and will show the reason "ReqNodeNotAvail, Reserved for maintenance".

Oct 10, 13:42 CDT
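
The rule that a job will not start if its requested runtime intersects the maintenance window comes down to an interval-overlap check. The sketch below illustrates the arithmetic only. The function name is hypothetical, and the window bounds are taken from the "In progress" and "Completed" timestamps of this entry as an assumption; the scheduler's actual reservation may have used different bounds.

```python
from datetime import datetime, timedelta

# Assumed window, from the "In progress" and "Completed" timestamps (CDT).
WINDOW_START = datetime(2025, 10, 29, 7, 0)
WINDOW_END = datetime(2025, 10, 29, 15, 0)


def overlaps_maintenance(
    start: datetime,
    time_limit: timedelta,
    window_start: datetime = WINDOW_START,
    window_end: datetime = WINDOW_END,
) -> bool:
    """True if a job starting at `start` with the given time limit
    could still be running at any point inside the window."""
    end = start + time_limit
    # Two intervals overlap when each one starts before the other ends.
    return start < window_end and end > window_start


# A 12-hour job starting the evening before would run into the window.
print(overlaps_maintenance(datetime(2025, 10, 28, 20, 0), timedelta(hours=12)))
# A 4-hour job starting at the same time finishes before the window opens.
print(overlaps_maintenance(datetime(2025, 10, 28, 20, 0), timedelta(hours=4)))
```

Under these assumptions the first call reports an overlap and the second does not, so requesting a shorter time limit is one way to let a job start ahead of a maintenance window.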
Oct 28, 2025

No incidents reported.

Oct 27, 2025