All Systems Operational

About This Site

This page provides information about unplanned downtimes and scheduled maintenance for services offered by the Center for High Throughput Computing.

Component status and uptime over the past 90 days:
High Throughput Computing (HTC) System: Operational, 99.92% uptime
Access Points: Operational, 99.84% uptime
CHTC Pool: Operational, 100.0% uptime
External Pools (OSPool, Campus HTCondor Pools): Operational, 100.0% uptime
Staging and Projects Space: Operational, 100.0% uptime
File Transfers: Operational, 99.75% uptime
High Performance Computing (HPC) System: Operational, 99.78% uptime
Login Nodes: Operational, 99.52% uptime
Cluster Nodes and Jobs: Operational, 99.79% uptime
Central Software Installations: Operational, 100.0% uptime
Home and Scratch File Systems: Operational, 99.81% uptime
Data Transfer Tools: Operational, 100.0% uptime
Globus Endpoint: Operational, 100.0% uptime
CHTC Internal Infrastructure: Operational, 99.85% uptime
Tiger Cluster: Operational, 100.0% uptime
RT Email/Ticket Support System: Operational, 99.71% uptime

Scheduled Maintenance

[HTC] Downtime for xhuanggpu4001
Apr 20, 2026 00:00 - Apr 21, 2026 00:00 CDT

We are reinstalling a GPU in xhuanggpu4001. The machine will be unavailable during this time.
Posted on Apr 13, 2026 - 08:32 CDT
Apr 16, 2026

No incidents reported today.

Apr 15, 2026
Resolved - A fix has been implemented and confirmed to work. Users with idle GPU jobs should remove their jobs (`condor_rm`) and resubmit the jobs, due to an incorrect expression in the jobs' attributes. Newly submitted jobs should match normally.
Apr 15, 10:48 CDT
Monitoring - A fix has been implemented. Users with idle GPU jobs should remove their jobs (`condor_rm`) and resubmit the jobs, due to an incorrect expression in the jobs' attributes. We are monitoring the situation.
Apr 14, 15:24 CDT
Investigating - We've received reports of GPU jobs failing to match and start, remaining stuck in the IDLE state. We're investigating the cause and will update this status page as more information becomes available or a solution is implemented.
Apr 13, 16:50 CDT
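
As a rough sketch of the remove-and-resubmit step described in the GPU updates above, the commands below list idle GPU jobs, remove them, and resubmit. The constraint assumes the affected jobs were submitted with `request_gpus` (so they carry a `RequestGpus` attribute), and `my_gpu_job.sub` is a hypothetical submit file name; neither comes from the notice itself. The script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch only: remove idle GPU jobs and resubmit them.
# Assumption: affected jobs requested GPUs, so RequestGpus is defined.
# JobStatus == 1 is HTCondor's code for the Idle state.
CONSTRAINT='JobStatus == 1 && RequestGpus > 0'

# Hypothetical submit file name; replace with your own.
SUBMIT_FILE='my_gpu_job.sub'

# By default, print each command instead of running it.
# (The printout does not preserve shell quoting.)
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

run condor_q -constraint "$CONSTRAINT"     # review what would be removed
run condor_rm -constraint "$CONSTRAINT"    # remove the idle GPU jobs
run condor_submit "$SUBMIT_FILE"           # resubmit
```

Reviewing the `condor_q` output before running with `DRY_RUN=0` helps confirm that only the intended jobs match the constraint.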
Resolved - We identified the cause of the issue. When all licenses are checked out, any new jobs requesting licenses will fail with the "Failed to connect to token server" message. We have contacted the users who are using a majority of the licenses.

All Gurobi users must use `concurrency_limits = GUROBI:1` in their Gurobi jobs' submit files. This ensures that when all licenses are checked out, jobs will remain idle instead of failing.

Apr 15, 10:47 CDT
Investigating - Some users are reporting that their Gurobi jobs are failing with the message "Failed to connect to token server". We are currently investigating. We encourage Gurobi users to double-check that they are using `concurrency_limits = GUROBI:1` in their submit files.
Apr 13, 14:29 CDT
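
To illustrate where the `concurrency_limits = GUROBI:1` line from the Gurobi updates above fits, here is a minimal sketch that writes a submit file and checks that the line is present. Only the `concurrency_limits` line comes from the notice; the file name `gurobi_job.sub`, the executable `run_gurobi.sh`, and the resource requests are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch only: write a hypothetical Gurobi submit file.
cat > gurobi_job.sub <<'EOF'
# Hypothetical executable and resource requests; adjust for your job.
executable = run_gurobi.sh

# From the notice: each job claims one Gurobi license slot, so jobs
# wait in the idle state when all licenses are checked out.
concurrency_limits = GUROBI:1

request_cpus = 1
request_memory = 2GB
request_disk = 2GB

log = gurobi_job.log
output = gurobi_job.out
error = gurobi_job.err

queue
EOF

# Confirm the limit is present before submitting with condor_submit.
grep -n '^concurrency_limits' gurobi_job.sub
```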
Apr 14, 2026
Apr 13, 2026
Apr 12, 2026

No incidents reported.

Apr 11, 2026

No incidents reported.

Apr 10, 2026
Resolved - We've fixed the issue and are working through the backlog of requests.
Apr 10, 13:30 CDT
Identified - An issue with our accounting system is preventing us from creating new CHTC accounts or modifying existing CHTC accounts.
This does not affect anyone with an existing account or their ability to log in, but it will delay us in creating accounts for new users and in giving existing users access to new resources.

Apr 9, 16:56 CDT
Apr 9, 2026
Apr 8, 2026
Resolved - The Gurobi license has been renewed. Users may now submit Gurobi jobs.
Apr 8, 13:45 CDT
Identified - The Gurobi license for CHTC has expired. We are working with campus IT to renew the license.
In the meantime, user jobs attempting to use the Gurobi license will likely fail with a "license expired" error.

Apr 2, 15:13 CDT
Completed - The scheduled maintenance has been completed.
Apr 8, 11:19 CDT
Scheduled - We will be migrating townsend-ap to new hardware. The Access Point will be unavailable during this time.
Apr 7, 15:08 CDT
Apr 7, 2026

No incidents reported.

Apr 6, 2026
Resolved - This incident has been resolved.
Apr 6, 13:19 CDT
Investigating - Jobs using the osdf:// file transfer plugin may go on hold with the message "Details: failed to get namespace information for remote URL ... error while querying the director... Error code 3001: 404: No sources found for the requested path: no origins found for the requested namespace". We are currently investigating.
Apr 6, 12:05 CDT
Apr 5, 2026

No incidents reported.

Apr 4, 2026

No incidents reported.

Apr 3, 2026

No incidents reported.

Apr 2, 2026
Resolved - This incident has been resolved.
Apr 2, 10:37 CDT
Monitoring - We identified the cause of the problem and have applied a fix. Initial tests appear successful. Let us know at chtc@cs.wisc.edu if you continue to encounter problems.
Apr 2, 09:28 CDT
Investigating - We have confirmed user reports that job submission on ap2002.chtc.wisc.edu is failing.
We are investigating the issue and will provide updates as they become available.

Apr 2, 09:09 CDT