Identified - The MATLAB license for CHTC has expired and needs to be renewed. We are working with campus IT to do so.
In the meantime, users submitting MATLAB jobs will encounter the error "License checkout failed".

Feb 05, 2026 - 09:16 CST
Update - We are continuing to investigate this issue.
Feb 04, 2026 - 16:38 CST
Investigating - We have received reports of issues with the CUDA_VISIBLE_DEVICES environment variable being set incorrectly on certain GPU jobs. We are investigating the issue and will update this page once more information is known.
Feb 02, 2026 - 13:57 CST
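When reporting an affected GPU job, it may help to include the value of CUDA_VISIBLE_DEVICES that the job actually saw. The following is a minimal sketch, not a CHTC-provided tool: the function name `visible_devices` is hypothetical, and it only assumes the standard CUDA convention that the variable holds a comma-separated list of device indices or UUIDs. The notice does not say what an incorrect value looks like, so the sketch records the value without judging it.

```python
import os


def visible_devices(environ=None):
    """Return the entries of CUDA_VISIBLE_DEVICES as a list.

    Returns None when the variable is not set at all, and an empty
    list when it is set but empty (which hides every GPU from CUDA).
    """
    env = os.environ if environ is None else environ
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None
    # Entries are comma-separated device indices or UUIDs.
    return [item.strip() for item in raw.split(",") if item.strip()]


# Print the value near the start of the job so it lands in the job's stdout.
print("CUDA_VISIBLE_DEVICES entries:", visible_devices())
```

Printing the value at the top of the job keeps it in the job's standard output, where it can be copied into a support ticket.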
Update - We are continuing to investigate this issue.
Jan 29, 2026 - 10:20 CST
Investigating - Users of the licensed software CST may encounter this error: "modeler_AMD64: line 154: Aborted (core dumped) "${CST_REGSVR32}"". This occurs on most Execution Points, with the exception of build machines.

We are currently investigating.

Jan 29, 2026 - 10:19 CST

About This Site

This page provides information about unplanned downtimes and scheduled maintenance for services offered by the Center for High Throughput Computing.

High Throughput Computing (HTC) System: Degraded Performance, 99.98% uptime over the past 90 days
Access Points: Operational, 99.95% uptime over the past 90 days
CHTC Pool: Degraded Performance, 100.0% uptime over the past 90 days
External Pools (OSPool, Campus HTCondor Pools): Operational, 100.0% uptime over the past 90 days
Staging and Projects Space: Operational, 99.99% uptime over the past 90 days
File Transfers: Operational, 99.95% uptime over the past 90 days
High Performance Computing (HPC) System: Operational, 100.0% uptime over the past 90 days
Login Nodes: Operational, 100.0% uptime over the past 90 days
Cluster Nodes and Jobs: Operational, 100.0% uptime over the past 90 days
Central Software Installations: Operational, 100.0% uptime over the past 90 days
Home and Scratch File Systems: Operational, 100.0% uptime over the past 90 days
Data Transfer Tools: Operational, 100.0% uptime over the past 90 days
Globus Endpoint: Operational, 100.0% uptime over the past 90 days
CHTC Internal Infrastructure: Operational, 100.0% uptime over the past 90 days
Tiger Cluster: Operational, 100.0% uptime over the past 90 days
RT Email/Ticket Support System: Operational, 100.0% uptime over the past 90 days
Feb 5, 2026
Completed - The scheduled maintenance has been postponed. Jobs should be executing as usual in the HTC system.
Feb 5, 11:07 CST
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.
Feb 5, 09:30 CST
Scheduled - OSDF central services will be taken down for maintenance on February 5, 2026. During this time, all jobs using the OSDF (for example, jobs that use osdf:/// or pelican:// URLs in "transfer_input_files", or jobs that run pelican commands) will fail with timeouts when connecting to the OSDF director.

Any jobs with file transfer failures due to this maintenance should remain in the queue and can be released or will rerun after the maintenance.

In advance of the maintenance, we plan to prevent jobs from starting if they require data via the OSDF. If your jobs use the OSDF and they are not starting within a few hours of the maintenance, this is expected. They should run once the maintenance is complete.

Feb 4, 16:02 CST
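To see whether a job would be affected by the OSDF maintenance described above, one option is to scan its submit file for OSDF URLs. This is a minimal sketch under stated assumptions: the function name `osdf_inputs`, the example submit text, and the path in it are all hypothetical. It only inspects single-line "transfer_input_files" entries, so it will not catch pelican commands run inside a job's own scripts.

```python
import re

# URL schemes that the maintenance notice names as going through the OSDF.
OSDF_SCHEMES = ("osdf://", "pelican://")


def osdf_inputs(submit_text):
    """Return transfer_input_files entries that start with an OSDF scheme."""
    found = []
    for line in submit_text.splitlines():
        match = re.match(r"\s*transfer_input_files\s*=\s*(.*)", line, re.IGNORECASE)
        if not match:
            continue
        # Entries in transfer_input_files are comma-separated.
        for entry in match.group(1).split(","):
            entry = entry.strip()
            if entry.lower().startswith(OSDF_SCHEMES):
                found.append(entry)
    return found


# Hypothetical submit file contents, for illustration only.
example = """
executable = run.sh
transfer_input_files = osdf:///chtc/staging/user/data.tar.gz, params.txt
queue
"""
print(osdf_inputs(example))
```

An empty result means no OSDF URLs were found in "transfer_input_files"; the job could still depend on the OSDF if its executable calls pelican directly.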
Feb 4, 2026
Resolved - This incident has been resolved.
Feb 4, 16:38 CST
Update - The affected GPU is still offline. We are working with the vendor to address hardware issues.
Jan 22, 09:43 CST
Update - We are continuing to investigate this issue.
Jan 14, 11:57 CST
Investigating - Some jobs landing on mrudolphgpu4001 may fail with the message, "uncorrectable ECC error encountered". We have narrowed the issue to a specific GPU. We plan to bring the affected GPU offline for further investigation and fixes, which may require a brief downtime. The machine will be brought back up with the unaffected GPUs available for use.
Jan 14, 11:56 CST
Resolved - This incident has been resolved.
Feb 4, 16:31 CST
Update - The machine is offline for running diagnostics.
Jan 27, 16:45 CST
Update - The affected GPU is still offline. We are working with the vendor to address hardware issues.
Jan 22, 09:43 CST
Update - The affected GPU has been removed from the available pool for testing. Unaffected GPUs on the machine are available for use. We are still investigating.
Jan 9, 10:35 CST
Investigating - Some jobs landing on xhuanggpu4001 may fail with the message, "uncorrectable ECC error encountered". We have narrowed the issue to a specific GPU. We plan to bring the affected GPU offline for further investigation and fixes, which may require a brief downtime. The machine will be brought back up with the unaffected GPUs available for use.
Jan 7, 13:23 CST
Feb 3, 2026
Resolved - This incident has been resolved.
Feb 3, 12:55 CST
Monitoring - We've implemented a fix and are monitoring the issue.
Feb 3, 09:43 CST
Investigating - Users are unable to log into CHTC Access Points, receiving three "permission denied" errors. HTC Access Points are affected, including ap2001, ap2002, townsend-ap, and oconnor-ap. We are investigating.
Feb 3, 09:23 CST
Feb 2, 2026
Feb 1, 2026

No incidents reported.

Jan 31, 2026

No incidents reported.

Jan 30, 2026

No incidents reported.

Jan 29, 2026

Unresolved incident: [HTC] CST error.

Jan 28, 2026

No incidents reported.

Jan 27, 2026
Resolved - A power issue temporarily took down some of the storage servers, and our team brought them back online shortly afterward. We do not anticipate further issues at this time.
Jan 27, 10:12 CST
Monitoring - We believe we've fixed the underlying cause of the issue, and are monitoring the effect.
Jan 26, 16:29 CST
Investigating - Our service monitoring has alerted us to degraded performance of the data system backing /staging. File transfers and other interactions with /staging may be slow and may result in job failures.
This may also affect the /projects and /software spaces, as well as OSDF (osdf:///) and UWDF (pelican://chtc.wisc.edu/) file transfers.

Jan 26, 16:15 CST
Jan 26, 2026
Jan 25, 2026

No incidents reported.

Jan 24, 2026

No incidents reported.

Jan 23, 2026

No incidents reported.

Jan 22, 2026