All Systems Operational

About This Site

This page provides information about unplanned downtimes and scheduled maintenance for services offered by the Center for High Throughput Computing (CHTC).

Component status (uptime over the past 90 days):

High Throughput Computing (HTC) System: Operational (99.82% uptime)
Access Points: Operational (99.91% uptime)
CHTC Pool: Operational (99.21% uptime)
External Pools (OSPool, Campus HTCondor Pools): Operational (100.0% uptime)
Staging and Projects Space: Operational (100.0% uptime)
File Transfers: Operational (100.0% uptime)
High Performance Computing (HPC) System: Operational (99.68% uptime)
Login Nodes: Operational (99.4% uptime)
Cluster Nodes and Jobs: Operational (99.44% uptime)
Central Software Installations: Operational (100.0% uptime)
Home and Scratch File Systems: Operational (99.87% uptime)
Data Transfer Tools: Operational (100.0% uptime)
Globus Endpoint: Operational (100.0% uptime)
CHTC Internal Infrastructure: Operational (100.0% uptime)
Tiger Cluster: Operational (100.0% uptime)
Past Incidents
Jul 27, 2024

No incidents reported today.

Jul 26, 2024

No incidents reported.

Jul 25, 2024

No incidents reported.

Jul 24, 2024

No incidents reported.

Jul 23, 2024

No incidents reported.

Jul 22, 2024

No incidents reported.

Jul 21, 2024

No incidents reported.

Jul 20, 2024

No incidents reported.

Jul 19, 2024
Resolved - The old HPC cluster (hpclogin3.chtc.wisc.edu and available partitions) is back up.
Jul 19, 08:34 CDT
Investigating - Due to hardware issues following the July 16-17 maintenance, the old HPC cluster login node (hpclogin3.chtc.wisc.edu) remains down, and partitions accessed through hpclogin3 are unavailable for running jobs.

Updates will be made to this page and major announcements may also be sent to the chtc-users mailing list.

The new HPC cluster login node (spark-login.chtc.wisc.edu) and related partitions are fully operational.

Jul 17, 16:23 CDT
Jul 18, 2024
Resolved - This incident has been resolved.
Jul 18, 14:56 CDT
Identified - /scratch is now available on the HPC Cluster.

The /projects directory is currently unavailable from the HPC Cluster but should be back up soon.

Jul 18, 11:28 CDT
Investigating - The /scratch directory is currently unavailable on the HPC cluster.

If you try to access it, you may see the following error:

cd /scratch/user: cannot access '/scratch/user': Permission denied

We are investigating the issue.

Jul 18, 09:42 CDT
Jul 17, 2024
Completed - Maintenance is complete. Some components of our HPC Cluster infrastructure remain down. We will post an ongoing outage notification momentarily.
Jul 17, 16:13 CDT
Verifying - ap2002.chtc.wisc.edu is now back up.
hpclogin3.chtc.wisc.edu is still down.

Jul 17, 14:01 CDT
Update - Scheduled maintenance is still in progress. We will provide updates as necessary.
Jul 16, 16:53 CDT
Update - The maintenance planned for today (July 16) is partially complete.

* HTC users of ap2001.chtc.wisc.edu should be able to log in and submit jobs.
* HPC users can use the new login node (spark-login.chtc.wisc.edu) and submit jobs to the partitions available from that node.
* Access to ap2002.chtc.wisc.edu and hpclogin3.chtc.wisc.edu remains unavailable.

Jul 16, 16:53 CDT
Update - Maintenance is wrapping up for ap2001.chtc.wisc.edu and we are verifying that it is working as expected.

ap2002.chtc.wisc.edu, hpclogin3.chtc.wisc.edu and spark-login.chtc.wisc.edu remain under maintenance.

Jul 16, 14:33 CDT
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.
Jul 15, 13:00 CDT
Scheduled - Impacts:
* Login to servers ap2001.chtc.wisc.edu, ap2002.chtc.wisc.edu, hpclogin3.chtc.wisc.edu and spark-login.chtc.wisc.edu will be unavailable starting on the afternoon of Monday, July 15.
* Jobs submitted from ap2001 and ap2002, as well as jobs on all HPC partitions, will also be interrupted. They should remain in the queue and re-run when the servers are back up.

This extended maintenance will address a bug that has likely been causing, among other things, the periodic unexpected shutdowns of hpclogin3, ap2001 and ap2002. We are also using this maintenance period for outstanding network updates.

Jul 11, 08:49 CDT
Jul 16, 2024
Jul 15, 2024
Jul 14, 2024

No incidents reported.

Jul 13, 2024

No incidents reported.