Investigating - The OSDF system has been having trouble over the weekend. This is causing OSDF transfers to fail with a message like "error while querying the director at https://osdf-director.osg-htc.org: Transfer.DirectorTimeout Error".
We are investigating the problem.
Jun 08, 2026 - 09:01 CDT
Identified - Some jobs running on gpulab2001 or gpulab2003 may fail with an error "CUDA error: failed call to cuInit: CUDA_ERROR_UNKNOWN". We are working to resolve the issue.
Jun 02, 2026 - 16:53 CDT
Resolved -
We identified the specific cause and are addressing it. HTCondor commands on ap2002 should be working again, though the issue may recur.
Jun 4, 10:08 CDT
Investigating -
We're seeing reports of HTCondor commands, such as `condor_submit` and `condor_q`, hanging or failing. We are investigating the cause and will update this Status Page as more information becomes available.
Jun 3, 16:32 CDT
Jun 3, 2026
Jun 2, 2026
Unresolved incident: [HTC] GPU issues on gpulab2001, gpulab2003.
The issue was caused by a highly specific and rare bug. We are working to address the underlying cause.
Jun 1, 11:27 CDT
Monitoring -
The issue has been fixed, at least temporarily. We have not yet determined the cause, so it may recur over the weekend.
May 29, 14:13 CDT
Update -
We are still investigating this issue. Our attempted fix did not work, so we need to dig deeper into the cause.
May 29, 11:32 CDT
Investigating -
The HTCondor queue on ap2001 is currently down. If you run a command like `condor_q` or `condor_submit`, you'll see a message like:
> Error: Can't find address for schedd ap2001.chtc.wisc.edu
> ERROR: Can't find address of local schedd
We are investigating why the queue is down.
May 29, 08:04 CDT