Monitoring -
The data recovery process for /projects is complete. We believe we have recovered close to 100% of the data that was originally present in these directories. Some of the metadata for files (like file creation date) may be incorrect; we strongly recommend validating any data that you copy from the recovered file system.
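If it helps with validation, here is a minimal sketch using standard tools; the group name, paths, and the existence of a trusted second copy of your data are assumptions, so adjust them to your own setup:

cd /projects/yourgroup && find . -type f -exec sha256sum {} + > ~/recovered.sha256   # checksum the recovered files
cd /path/to/trusted/copy && sha256sum -c ~/recovered.sha256                          # compare against a copy you trust

If no second copy exists, spot-checking file sizes and the contents of a few key files is a reasonable sanity check.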
Update -
The data recovery process for /staging and HTC /software is complete. We believe we have recovered about 50% of the data that was originally present in these directories. Some of the metadata for files (like file creation date) may be incorrect; we strongly recommend validating any data that you copy from the recovered file system.
Update -
We are nearly finished recovering data from the /staging directory and will provide more information in the next day or so as we confirm the results of the recovery. We are still working on recovering data from the /projects directory and anticipate it will be several weeks before it is ready for users to access.
Dec 10, 15:48 CST
Update -
We have created new /staging, /projects, and /software data spaces. Please email us if you need your group's /staging, /projects, or /software directories re-created. If any aspect of your jobs relied on these directories and you are currently having issues running jobs, contact us at chtc@cs.wisc.edu.
Dec 6, 11:15 CST
Update -
All HTC users should now have access to a new staging directory with a default quota of 100GB / 1000 items. This space can be used exactly like the previous /staging directories to run jobs.
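To check your usage against the default quota, one option is the standard du and find tools; the directory path below is an assumption based on the usual per-user layout, so substitute your actual /staging path:

du -sh /staging/yournetid        # total space used (default quota: 100GB)
find /staging/yournetid | wc -l  # number of items, files and directories (default quota: 1000)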
Identified -
We have identified the issue that caused the file system problems on Thursday, and we are able to prevent it from recurring. However, it resulted in significant data loss in /staging, /projects, HTC /software, and /squid before CHTC personnel were able to react.
All data in /squid is unrecoverable. Any remaining data in /projects and /staging is currently inaccessible as we work to recover whatever additional data we can. We hope to recover at least 50% of /staging and 60% of /projects.
This week (Nov 25-27), we will create a new data store to serve the “/staging” and “/projects” directories. Initially, there will be no data inside these directories. This new data backend for the /staging and /projects directories will be used for CHTC data storage moving forward and will be usable in jobs once available. We will post on this status page when these directories are available.
Resolved -
No further issues have been reported. Marking incident as resolved.
If you encounter the issue again, please let us know at chtc@cs.wisc.edu.
Apr 23, 13:13 CDT
Monitoring -
OSDF transfers on the system have been stable most of this week, so we are hopeful this issue is resolved. We'll continue to monitor in case the issue returns.
Apr 18, 13:48 CDT
Update -
The fix we deployed has addressed the cause of the "permission denied" type of OSDF hold messages. However, we are still seeing slow transfers and jobs going on hold with related messages.
We are investigating the cause of the slow transfers now.
Apr 8, 11:47 CDT
Update -
We believe we've identified the root cause of this issue and are working to deploy a fix. Transfers declared using the osdf:///chtc/staging/ syntax involving unique-per-job data are the most likely to encounter this error until the fix is deployed.
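For reference, an affected declaration looks roughly like the submit-file excerpt below; the file name and path under /chtc/staging are hypothetical:

# HTCondor submit file excerpt
transfer_input_files = osdf:///chtc/staging/yournetid/unique_per_job_input.tar.gz
should_transfer_files = YES
when_to_transfer_output = ON_EXIT

Transfers of shared (non-unique-per-job) files declared the same way appear less likely to hit the error.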
Apr 8, 09:35 CDT
Identified -
We have confirmed user reports of jobs going on hold with a message along the lines of "Transfer input files failure at ... using protocol osdf ... Unable to read (...Path...); permission denied ...". This appears to be an intermittent issue, and we are working to identify the root cause, which we believe is related to a certain system being overwhelmed.
If you encounter this hold message, wait a few minutes, then release or resubmit the affected jobs to try again. Let us know at chtc@cs.wisc.edu if this is significantly impacting your work.
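A minimal sketch of the release workflow, using standard HTCondor commands (the job ID below is illustrative):

condor_q -hold        # list your held jobs and the hold reason for each
condor_release 12345  # release a held job or cluster by its ID to retry the transfer

If a job goes on hold again with the same message, resubmitting or waiting a bit longer before releasing may help.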
Apr 3, 11:57 CDT
Resolved -
We recently identified several misconfigured machines in the storage system backing /staging and /projects. After addressing the issues, performance of the system has been much better.
If you encounter issues with slow transfers involving /staging or /projects, let us know at chtc@cs.wisc.edu.
Apr 23, 13:11 CDT
Identified -
We've confirmed several reports of slow performance in the /staging and /projects directories. Users may encounter slow file transfers to and from /staging and /projects, and commands that query files in those directories may be slow or hang entirely.
This issue is related to heavy disk usage in these spaces as a side effect of the ongoing data recovery process. Unfortunately, there is no good workaround at this time, but users are encouraged to move or remove any recovered data (located in /recovery).
We ask that users be patient while we work to resolve this issue.
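As a sketch, assuming your recovered data was placed under a per-group path in /recovery (the exact layout and names below are hypothetical):

ls /recovery/yourgroup                                 # see what was recovered
mv /recovery/yourgroup/needed_data /staging/yourgroup  # keep what you still need
rm -r /recovery/yourgroup/unneeded_data                # remove what you no longer need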
Jan 23, 16:59 CST
Resolved -
This incident has been resolved. Some jobs may have been interrupted.
Apr 22, 15:30 CDT
Investigating -
Users on the HPC system may see "slurm_load_jobs error: Unexpected message received" when running Slurm-related commands. We are currently investigating.
Apr 22, 15:15 CDT
Completed -
The scheduled maintenance has been completed.
Apr 15, 14:02 CDT
In progress -
Scheduled maintenance is currently in progress. We will provide updates as necessary.
Apr 15, 11:40 CDT
Scheduled -
We are draining voyles2000.chtc.wisc.edu to perform maintenance on the machine. Users' jobs will not be able to match to the machine at this time.
Apr 15, 11:38 CDT
Resolved -
This incident has been resolved.
Apr 14, 16:14 CDT
Identified -
The connection is broken between the /scratch storage system and the spark-login.chtc.wisc.edu server. The /scratch storage system is still intact and accessible from worker nodes, and data should be unaffected.
We are working to repair the connection between /scratch and spark-login.
Apr 14, 10:37 CDT
Investigating -
Users of the HPC system trying to access files or directories in /scratch will see a "Permission denied" message. We are actively investigating.
Apr 14, 09:14 CDT