All times are shown in UTC
Connections to us-east-1 are currently all failing. We are investigating why.
EU and AP datacenters are operating as normal; client libraries using fallbacks should automatically retry against one of those, so affected clients should experience only elevated latency.
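For readers unfamiliar with the fallback behaviour mentioned above, the following is a minimal sketch of the general idea, not the actual client library implementation. The function name `request_with_fallback`, the `attempt` callable, and the hostnames are all hypothetical and used only for illustration.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def request_with_fallback(hosts: Sequence[str], attempt: Callable[[str], T]) -> T:
    """Try each host in order and return the first successful result.

    `attempt` is a caller-supplied function (hypothetical here) that performs
    the request against one host and raises ConnectionError on failure.
    """
    last_error: Optional[ConnectionError] = None
    for host in hosts:
        try:
            return attempt(host)
        except ConnectionError as exc:
            # Remember the failure and move on to the next fallback host.
            last_error = exc
    raise ConnectionError(f"all hosts failed: {list(hosts)}") from last_error


if __name__ == "__main__":
    # Simulated transport: the primary (hypothetical) US East host always fails.
    def simulated_attempt(host: str) -> str:
        if host.startswith("us-east-1"):
            raise ConnectionError("TLS handshake failed")
        return f"connected via {host}"

    hosts = ["us-east-1.example.com", "eu-west-1.example.com", "ap-southeast-1.example.com"]
    print(request_with_fallback(hosts, simulated_attempt))
```

Each failed attempt costs time before the next host is tried, which is why a client with working fallbacks would see elevated latency and not outright failure.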
AWS are still not reporting any underlying issues. This pattern is not unusual; we frequently identify problems.
We have identified the problem as a TLS termination issue in the load balancing layers of the US East clusters. We are reporting this to AWS now, but will continue to redirect all traffic away from US East 1 until AWS resolve the underlying issue.
10th Feb 08:16 PM
After investigating the issue in more depth, we know that the load balancing TLS issue is isolated to a single availability zone. We continue to liaise with AWS on the issue, and will continue to direct traffic to other regions to ensure stability for all customers.
10th Feb 08:29 PM
AWS have finally acknowledged the issue with their elastic load balancers, more than 1 hour after we detected the problem and took action to address the fault.
Given this is now acknowledged, we will continue to direct traffic away from US East 1.
Given the AWS load balancers in US East have been stable for a few hours, we've progressively migrated all traffic back to the US East region.
During this time we discovered some material performance issues for some customers consuming from the Reactor Queues in US East 1.
We are now starting to compile a full post mortem and will post an update as soon as possible.
There have been no recurrences of the AWS NLB issue and traffic to us-east is fully enabled. Our investigation is ongoing and we will publish our incident report early next week.
19th Feb 09:42 AM
The report summarising our investigation and conclusions for this incident is now available: https://gist.github.com/paddybyers/47e3f4490330b3c8735f643e8e5ed923
Resolved in about 6 hours