At around 0845 GMT on 6 February 2025, our engineering team identified an increased error rate when the platform attempted to launch new workers to provide Live Authoring sessions and run executions requested by customers. Shortly afterwards, we confirmed that there was an issue pulling images from the Docker Hub registry, which meant we could not launch additional capacity. This degraded the customer experience until the upstream issue was corrected at 0913, after which capacity returned to normal as the backlog of work was cleared.
We have conducted an investigation into how we can insulate ourselves from similar incidents in the future. After reviewing our infrastructure configuration, we identified that some monitoring “sidecar” containers that we use to collect logging information were fetching their images from the public Docker Hub registry. As a result, although our application images remained available during the incident, new workers could not launch correctly because the sidecar images could not be pulled. To prevent this from recurring, we will shortly deploy a change that switches these monitoring containers over to the same registry the core Virtuoso services use.
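
For illustration only, the kind of configuration review described above could be sketched as a small check that flags any image reference which would be pulled from Docker Hub. This is not our actual tooling. The image names and the `registry.example.com` host below are hypothetical, and the sketch assumes Docker's usual convention that a reference with no explicit registry host resolves to `docker.io`.

```python
# Hosts that all refer to the public Docker Hub registry.
DOCKER_HUB_HOSTS = {"docker.io", "index.docker.io", "registry-1.docker.io"}


def registry_of(image: str) -> str:
    """Return the registry host an image reference would be pulled from."""
    first, sep, _rest = image.partition("/")
    # The first path component is treated as a registry host only if it
    # looks like one: it contains "." or ":", or it is "localhost".
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    # No explicit host, so the reference defaults to Docker Hub.
    return "docker.io"


def images_on_docker_hub(images: list[str]) -> list[str]:
    """Return the image references that depend on Docker Hub."""
    return [img for img in images if registry_of(img) in DOCKER_HUB_HOSTS]


# Hypothetical image list: one application image and one logging sidecar.
worker_images = [
    "registry.example.com/virtuoso/worker:1.0",
    "example/log-collector:1.0",
]

flagged = images_on_docker_hub(worker_images)
print(flagged)
```

A check like this could run against every container definition before deployment, so that a sidecar with an implicit Docker Hub dependency is caught at review time instead of during an upstream outage.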