Incident Details

All times are shown in UTC

1st October 2018 04:46:00 PM

Increased error rates in eu-west-1 and eu-central-1 regions

This is a continuation of the incident at https://status.ably.io/incidents/561, which was mistakenly marked as resolved. All notes from the previous incident are included below.

We are investigating increased error rates in our two European data centers.

1st Oct 04:42 PM

Whilst AWS is not yet reporting any issues on the status or personal health dashboards, we believe this issue is caused by a fault in the eu-central-1 and possibly eu-west-1 AWS regions. We believe this because dedicated clusters, which are isolated from the global traffic in those regions, are also affected.

We will continue to investigate the issue and take action to minimise the impact.

1st Oct 04:52 PM

AWS continues to report no issues, yet reports on Twitter indicate the issues are widespread.

We have seen that eu-west-1 continues to exhibit problems, so we are routing traffic away from eu-west-1 for now.

1st Oct 05:08 PM

We are seeing stability return in eu-west-1 and eu-central-1. All eu-west-1 traffic is still being routed to other regions. Once eu-west-1 settles fully we'll redirect traffic back.

We are now investigating any residual issues caused by the partitions and instability to ensure there is no longer-term impact on customer traffic.

1st Oct 05:23 PM

Error rates have returned to normal in all regions apart from eu-central-1.

We are now investigating the errors in eu-central-1, although the error rate in that region is very low.

1st Oct 05:52 PM

As AWS issues in eu-west-1 and eu-central-1 continue (AWS has now confirmed networking issues), we are re-routing all traffic for the global cluster away from all EU regions. Please note that any traffic restricted to the EU (for example, because of EU-only storage options or compliance reasons) will unfortunately continue to be routed to these clusters.

1st Oct 06:47 PM

We believe the EU regions are now reaching stability and intend to route traffic back to the EU in the next 15 minutes, once some final testing is complete.

1st Oct 07:22 PM

All global traffic is now being routed back to both EU datacenters, and everything appears to be normal. We'll continue to monitor closely.


in about 2 hours