All Systems Operational

About This Site

This page provides information about unplanned downtime and scheduled maintenance for services offered by the Center for High Throughput Computing.

High Throughput Computing (HTC) System: Operational, 99.91% uptime (past 90 days)
Access Points: Operational, 99.89% uptime (past 90 days)
CHTC Pool: Operational, 100.0% uptime (past 90 days)
External Pools (OSPool, Campus HTCondor Pools): Operational, 100.0% uptime (past 90 days)
Staging and Projects Space: Operational, 99.99% uptime (past 90 days)
File Transfers: Operational, 99.67% uptime (past 90 days)
High Performance Computing (HPC) System: Operational, 99.99% uptime (past 90 days)
Login Nodes: Operational, 99.98% uptime (past 90 days)
Cluster Nodes and Jobs: Operational, 100.0% uptime (past 90 days)
Central Software Installations: Operational, 100.0% uptime (past 90 days)
Home and Scratch File Systems: Operational, 100.0% uptime (past 90 days)
Data Transfer Tools: Operational, 100.0% uptime (past 90 days)
Globus Endpoint: Operational, 100.0% uptime (past 90 days)
CHTC Internal Infrastructure: Operational, 100.0% uptime (past 90 days)
Tiger Cluster: Operational, 100.0% uptime (past 90 days)
RT Email/Ticket Support System: Operational, 100.0% uptime (past 90 days)
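
For readers who want to translate the uptime percentages into approximate downtime, the arithmetic can be sketched as below. This is a minimal illustration, not part of the status page itself: the function name `downtime_minutes` is hypothetical, the window is assumed to be exactly 90 days, and the reported percentages are rounded, so the results are estimates only.

```python
# Rough conversion from a reported uptime percentage to implied downtime.
# Assumption: the uptime figure covers a window of exactly 90 days.
WINDOW_DAYS = 90


def downtime_minutes(uptime_percent: float, window_days: int = WINDOW_DAYS) -> float:
    """Return the approximate downtime, in minutes, implied by an uptime percentage."""
    window_minutes = window_days * 24 * 60
    return (100.0 - uptime_percent) / 100.0 * window_minutes


# A few of the figures reported on this page.
reported = {
    "Access Points": 99.89,
    "File Transfers": 99.67,
    "Login Nodes": 99.98,
}

for name, pct in reported.items():
    minutes = downtime_minutes(pct)
    print(f"{name}: about {minutes:.0f} minutes ({minutes / 60:.1f} hours) of downtime")
```

Under these assumptions, 99.67% uptime for File Transfers corresponds to roughly seven hours of downtime across the 90-day window.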
Dec 9, 2025

No incidents reported today.

Dec 8, 2025

No incidents reported.

Dec 7, 2025

No incidents reported.

Dec 6, 2025

No incidents reported.

Dec 5, 2025
Resolved - Updates have completed across the pool, so Docker jobs should be operating normally again.
Dec 5, 09:03 CST
Update - We've pushed a fix for the Docker issue. It will take a couple of hours for the change to propagate across the system, but behavior should be back to normal later this evening.
Dec 4, 14:56 CST
Identified - Docker image pulls are failing, and the fix requires updating Docker on our machines. These updates require restarting Docker and will therefore interrupt running Docker jobs. Once the updates are complete, users should no longer encounter the "Error ... Cannot pull image ..." error in their Docker jobs.
Dec 4, 10:25 CST
Dec 4, 2025
Resolved - This incident has been resolved.
Dec 4, 17:29 CST
Investigating - Users are unable to log into or access learn.chtc.wisc.edu. Users may be prompted for their password three times before getting a "Permission denied" error. We are investigating.
Dec 4, 15:18 CST
Resolved - The underlying issue with the OSDF should now be resolved.
Dec 4, 16:53 CST
Monitoring - OSDF transfers should be working again, but the underlying issue has not yet been resolved, so the symptoms may reappear.
Dec 4, 14:57 CST
Investigating - An issue with the OSDF may cause file transfers to fail with the error "Contact.Director Error: Error code 3001: 404".
Dec 4, 11:49 CST
Dec 3, 2025
Resolved - This incident has been resolved.
Dec 3, 16:14 CST
Monitoring - A fix has been implemented and we are monitoring the results.
Dec 3, 13:09 CST
Investigating - Users of learn.chtc.wisc.edu are unable to access the /staging filesystem and may receive the message, "Transport endpoint is not connected". We are currently investigating.
Dec 3, 13:08 CST
Resolved - This incident has been resolved.
Dec 3, 13:01 CST
Investigating - HPC users may see the following message when running SLURM commands: "error: NodeNames=spark-a[237-262] CPUs=128 match no Sockets, Sockets*CoresPerSocket or Sockets*CoresPerSocket*ThreadsPerCore. Resetting CPUs." This message does not affect jobs. We are investigating.
Nov 26, 16:49 CST
Dec 2, 2025

No incidents reported.

Dec 1, 2025

No incidents reported.

Nov 30, 2025

No incidents reported.

Nov 29, 2025

No incidents reported.

Nov 28, 2025

No incidents reported.

Nov 27, 2025

No incidents reported.

Nov 26, 2025
Nov 25, 2025
Resolved - This incident has been resolved.
Nov 25, 15:16 CST
Monitoring - We've implemented a fix and are monitoring the issue.
Nov 25, 14:37 CST
Investigating - Users are unable to log into or access learn.chtc.wisc.edu. When attempting to log in, users are prompted for their password three times before getting a "Permission denied" message. We are investigating.
Nov 25, 14:27 CST