On Friday 16th September 2022 at 00:00 BST, Amazon Web Services began a maintenance window impacting one of our AWS Direct Connect lines. At 01:32 BST, the AWS Direct Connect line went down, making AWS services unreachable from our data centre. This caused an outage of the following services:
• Portal Hub and the Monitoring Portal
• Interactive Dashboard
• Notification Service (mobile application push notifications and webhooks)
• Alerting (SMS, voice and e-mail)
The on-call engineer responded to alerts at 01:35 BST and began an investigation. At 02:16 BST, the on-call engineer raised a code amber alert to request help from colleagues. By 02:42 BST, web portals had recovered as the AWS Direct Connect line had been restored following completion of the maintenance. It took until 05:00 BST for all push notifications and webhooks to be fully processed and sent to customers.
No tests were missed.
Posted Sep 16, 2022 - 05:00 BST
Investigating
We are investigating a possible issue with MI. The impact and scope are not yet understood. More updates to follow.