On Wednesday 27th, between 07:15 and 09:15 UTC, we had an outage of the job submission and processing systems in the EU production environment. Alarms at 07:28 notified the team of an increased error rate connected to this component, and we immediately started investigating. At 08:05, after finishing the initial triage, we took actions to contain the problem. Most notably, at 08:15 we temporarily disabled job creation to allow the system to process the pending requests, and we re-enabled it near the end of the outage window once the system had stabilized.
Between 07:15 and 08:15, some submitted jobs were not correctly populated with the journeys to execute, and some API requests connected to execution details took significantly longer to run. Jobs affected by this will automatically time out and will not execute. We advise customers to retry any jobs that are in this state.
Live authoring executions remained available during the outage, although they were slightly impacted by the API slowdown.