Resolved -
Updates have completed across the pool, so Docker jobs should be operating normally again.
Dec 5, 09:03 CST
Update -
We've pushed a fix for the Docker issue. It will take a couple of hours for the change to propagate across the system, but behavior should be back to normal later this evening.
Dec 4, 14:56 CST
Identified -
A problem pulling Docker images requires that we update Docker on our machines. These updates require restarting Docker, which will interrupt running Docker jobs.
Once the updates are complete, however, users should no longer see the "Error ... Cannot pull image ..." message in their Docker jobs.
Dec 4, 10:25 CST
Resolved -
This incident has been resolved.
Dec 4, 17:29 CST
Investigating -
Users are unable to log into or access learn.chtc.wisc.edu. Users may be prompted for their password three times before getting a "Permission denied" error. We are investigating.
Dec 4, 15:18 CST
Resolved -
The underlying issue with the OSDF should now be resolved.
Dec 4, 16:53 CST
Monitoring -
OSDF transfers should be working again, but the underlying issue has not yet been resolved, so the symptoms may reappear.
Dec 4, 14:57 CST
Investigating -
An issue with the OSDF may cause file transfers to fail with the error "Contact.Director Error: Error code 3001: 404".
Dec 4, 11:49 CST
Resolved -
This incident has been resolved.
Dec 3, 16:14 CST
Monitoring -
A fix has been implemented and we are monitoring the results.
Dec 3, 13:09 CST
Investigating -
Users of learn.chtc.wisc.edu are unable to access the /staging filesystem and may receive the message, "Transport endpoint is not connected". We are currently investigating.
Dec 3, 13:08 CST
Resolved -
This incident has been resolved.
Dec 3, 13:01 CST
Investigating -
HPC users may see the following message when running SLURM commands: "error: NodeNames=spark-a[237-262] CPUs=128 match no Sockets, Sockets*CoresPerSocket or Sockets*CoresPerSocket*ThreadsPerCore. Resetting CPUs." This message does not affect jobs. We are investigating.
Nov 26, 16:49 CST
Resolved -
This incident has been resolved.
Nov 25, 15:16 CST
Monitoring -
We've implemented a fix and are monitoring the issue.
Nov 25, 14:37 CST
Investigating -
Users are unable to log into or access learn.chtc.wisc.edu. When attempting to log in, users are prompted for their password three times before getting a "Permission denied" message. We are investigating.
Nov 25, 14:27 CST