Resolved -
A fix has been implemented and confirmed to work. Because idle GPU jobs carry an incorrect expression in their job attributes, users with idle GPU jobs should remove them (`condor_rm`) and resubmit them. Newly submitted jobs should match normally.
Apr 15, 10:48 CDT
Monitoring -
A fix has been implemented. Because idle GPU jobs carry an incorrect expression in their job attributes, users with idle GPU jobs should remove them (`condor_rm`) and resubmit them. We are monitoring the situation.
Apr 14, 15:24 CDT
Investigating -
We've received reports of GPU jobs failing to match and start, remaining stuck in the IDLE state. We're investigating the cause and will update this status page as more information becomes available or a solution is implemented.
Apr 13, 16:50 CDT
Resolved -
We identified the cause of the issue. When all licenses are checked out, any new jobs requesting a license will fail with the "Failed to connect to token server" message. We have contacted the users who are using a majority of the licenses.
All Gurobi users must include `concurrency_limits = GUROBI:1` in their Gurobi jobs' submit files. This ensures that when all licenses are checked out, jobs will remain idle instead of failing.
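For illustration, a minimal Gurobi submit file might look like the sketch below. Only the `concurrency_limits = GUROBI:1` line comes from this notice; the executable name, log and output file names, and resource requests are hypothetical placeholders to adapt to your own job.

```
# Sketch of a Gurobi job submit file (file names and resource requests are placeholders)
executable = run_gurobi.sh
log        = gurobi_$(Cluster).log
output     = gurobi_$(Cluster)_$(Process).out
error      = gurobi_$(Cluster)_$(Process).err

request_cpus   = 1
request_memory = 4GB
request_disk   = 2GB

# Claim one Gurobi license per job; when all licenses are in use,
# the job waits in the idle state instead of starting and failing
concurrency_limits = GUROBI:1

queue 1
```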
Apr 15, 10:47 CDT
Investigating -
Some users are reporting that their Gurobi jobs are failing with the message "Failed to connect to token server". We are currently investigating. We encourage Gurobi users to double-check that their submit files include `concurrency_limits = GUROBI:1`.
Apr 13, 14:29 CDT
Resolved -
We've fixed the issue and are working through the backlog of requests.
Apr 10, 13:30 CDT
Identified -
An issue with our accounting system is preventing us from creating new CHTC accounts or modifying existing ones. This does not affect existing accounts or their users' ability to log in, but it will delay account creation for new users and access to new resources for existing users.
Apr 9, 16:56 CDT
Resolved -
The Gurobi license has been renewed. Users may now submit Gurobi jobs.
Apr 8, 13:45 CDT
Identified -
The Gurobi license for CHTC has expired. We are working with campus IT to renew the license. In the meantime, user jobs that attempt to use the Gurobi license will likely fail with a "license expired" error.
Apr 2, 15:13 CDT
Resolved -
This incident has been resolved.
Apr 6, 13:19 CDT
Investigating -
Jobs using the `osdf://` file transfer plugin may go on hold with the message "Details: failed to get namespace information for remote URL ... error while querying the director... Error code 3001: 404: No sources found for the requested path: no origins found for the requested namespace". We are currently investigating.
Apr 6, 12:05 CDT
Resolved -
This incident has been resolved.
Apr 2, 10:37 CDT
Monitoring -
We identified the cause of the problem and have applied a fix. Initial tests appear successful. Let us know at chtc@cs.wisc.edu if you continue to encounter problems.
Apr 2, 09:28 CDT
Investigating -
We have confirmed user reports that job submission on ap2002.chtc.wisc.edu is failing. We are investigating the issue and will provide updates as they become available.
Apr 2, 09:09 CDT