Incident Details

All times are shown in UTC

29th February 2020 09:23:00 PM

Performance issues in all regions due to database layer issues

We are seeing elevated error rates and latencies in all regions, caused by the continued intermittent performance issues in our database layer.

29th Feb 09:36 PM

As with yesterday's incident, this one resolved itself after 9 minutes. We continue to investigate as a top priority.

1st Mar 11:31 PM

We have now identified the root cause of the recent latency issues in the global persistence layer, and have rolled out updates to that layer that keep latencies consistently low.

The primary cause was an inadequate rate limiter in one area of our system, which allowed our persistence layer to become overloaded. That overload increased global service latencies for operations that rely on the persistence layer (primarily history, push registrations, and persisted tokens).
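
For readers unfamiliar with the term, a rate limiter of the kind mentioned above can be sketched as a token bucket. This is only an illustration of the general technique, not the limiter used in our system: the class name TokenBucket, its parameters (rate, capacity, clock) and the allow method are all hypothetical.

```python
import time
from typing import Callable


class TokenBucket:
    """Illustrative token-bucket limiter (hypothetical, not the production code)."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity    # start with a full bucket
        self.clock = clock        # injectable so the limiter can be tested
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True if an operation of the given cost may proceed now."""
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        # Not enough tokens: the caller should reject or queue the operation
        # instead of passing it on to the persistence layer.
        return False
```

A caller would check allow() before each persistence operation and shed or delay work when it returns False, which bounds the load that can reach the database during a burst.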

A full post mortem will follow soon.

10th Mar 08:23 PM

Our engineering and ops teams have completed the post mortem of this incident and summarised all the actions we have taken to avoid future disruption to our service.

See https://gist.github.com/pauln-ably/03098db1095f4ef61aac801ae987dac2

