Bridge service unavailability

Incident Report for Virtuoso

Postmortem

Our team was alerted that the bridge service was unavailable for a large number of our customers. We immediately began investigating and traced the problem to a failure in discovering some of our bridge servers.

Prior to this, our team had made a number of improvements and refactorings that changed part of the discovery logic. This change inadvertently caused a number of older bridge servers not to be discovered by our bot, which led to execution failures.

Although we have extensive tests for both our bridge service and the discovery logic (including full end-to-end validation of creating and using a bridge), this issue escaped our testing because it affected only a subset of the services to be discovered.
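
One way to catch a regression that affects only a subset of servers is to compare the discovery output against the full inventory, rather than against a freshly created bridge alone. The sketch below is illustrative only: `BridgeServer`, its `version` field, and both discovery functions are hypothetical names and do not represent our actual discovery logic.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class BridgeServer:
    name: str
    version: int  # hypothetical: older servers carry a lower version number


Discover = Callable[[list[BridgeServer]], set[str]]


def discover_with_regression(registry: list[BridgeServer]) -> set[str]:
    """Illustrative regression: a refactored filter silently skips older servers."""
    return {server.name for server in registry if server.version >= 2}


def discover_all(registry: list[BridgeServer]) -> set[str]:
    """Illustrative fix: every registered server is discoverable."""
    return {server.name for server in registry}


def missing_from_discovery(discover: Discover, registry: list[BridgeServer]) -> set[str]:
    """Return the names of registered servers that discovery did not find."""
    return {server.name for server in registry} - discover(registry)
```

A check of this kind would run against an inventory that deliberately mixes old and new servers, and fail whenever `missing_from_discovery` returns a non-empty set.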

Once we identified the root cause, we immediately shipped a fix to the discovery logic. We apologize for any trouble this caused for the subset of customers using our bridge service, and we are putting mitigations in place to ensure this issue does not occur again.

Posted Jan 25, 2023 - 12:25 UTC

Resolved

This incident has been resolved.

Posted Jan 25, 2023 - 11:00 UTC

Monitoring

A fix has been implemented and we are monitoring the results.

Posted Jan 25, 2023 - 10:40 UTC

Update

We are continuing to work on a fix for this issue.

Posted Jan 25, 2023 - 09:54 UTC

Identified

The issue has been identified and a fix is being implemented.

Posted Jan 24, 2023 - 15:00 UTC

This incident affected: Bridge.