Investigating - The OSDF system has been having trouble over the weekend. This is causing OSDF transfers to fail with a message like "error while querying the director at https://osdf-director.osg-htc.org: Transfer.DirectorTimeout Error".
We are investigating the problem.
Jun 08, 2026 - 09:01 CDT
Identified - Some jobs running on gpulab2001 or gpulab2003 may fail with the error "CUDA error: failed call to cuInit: CUDA_ERROR_UNKNOWN". We are working to resolve the issue.
Jun 02, 2026 - 16:53 CDT
Resolved -
This incident has been resolved.
Jun 15, 12:17 CDT
Investigating -
We have confirmed user reports of being unable to launch a BadgerCompute instance. The loading screen hangs on "Your server is starting up" and eventually times out with "Spawn failed".
We are investigating the issue.
Jun 15, 10:47 CDT
Jun 14, 2026
No incidents reported.
Jun 13, 2026
No incidents reported.
Jun 12, 2026
No incidents reported.
Jun 11, 2026
No incidents reported.
Jun 10, 2026
No incidents reported.
Jun 9, 2026
No incidents reported.
Jun 8, 2026
Unresolved incident: [HTC] Problems with OSDF transfers.
Resolved -
We identified the specific cause and are addressing it. HTCondor commands on ap2002 should be working again, though the issue may recur.
Jun 4, 10:08 CDT
Investigating -
We're seeing reports of HTCondor commands, such as `condor_submit` and `condor_q`, hanging or failing. We are investigating the cause and will update this status page as more information becomes available.
Jun 3, 16:32 CDT
Jun 3, 2026
Jun 2, 2026
Unresolved incident: [HTC] GPU issues on gpulab2001, gpulab2003.
The issue was caused by a highly specific and rare bug. We are working to address the underlying cause.
Jun 1, 11:27 CDT
Monitoring -
The issue has been fixed, at least temporarily. We have not yet determined the cause, so it may recur over the weekend.
May 29, 14:13 CDT
Update -
We are still investigating this issue. Our attempted fix did not work, so we need to dig deeper into the cause.
May 29, 11:32 CDT
Investigating -
The HTCondor queue on ap2001 is currently down. If you run a command like `condor_q` or `condor_submit`, you'll see a message like:
> Error: Can't find address for schedd ap2001.chtc.wisc.edu
> ERROR: Can't find address of local schedd
We are investigating why the queue is down.
May 29, 08:04 CDT