All times are shown in UTC
We are experiencing elevated error rates and latencies in all regions due to continued intermittent performance issues in our database layer.
As with yesterday's incident, this one resolved itself after 9 minutes. We continue to investigate this as a top priority.
1st Mar 11:31 PM
We have now identified the root cause of the recent latency issues in the global persistence layer, and have rolled out updates to that layer so that latencies are now consistently low.
The primary cause of the problem was an inadequate rate limiter in one area of our system, which allowed our persistence layer to be overloaded and thus impact the global service latencies for operations that rely on the persistence layer (primarily history, push registrations, and persisted tokens).
A full post mortem will follow soon.
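For readers unfamiliar with the term, the kind of rate limiting described above can be illustrated with a token bucket, a common general-purpose technique. This is only a minimal sketch under stated assumptions, not the limiter used in our system: the class name `TokenBucket`, its parameters, and the injectable `clock` are all hypothetical and chosen for illustration.

```python
import time
from typing import Callable


class TokenBucket:
    """Illustrative limiter: admits about `rate` operations per second,
    with bursts of up to `capacity` operations. All names are hypothetical."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if rate <= 0 or capacity <= 0:
            raise ValueError("rate and capacity must be positive")
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity      # start with a full bucket
        self._last = clock()         # time of the last refill

    def try_acquire(self, cost: float = 1.0) -> bool:
        """Return True if the operation may proceed, False if it should be shed."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill in proportion to elapsed time, never beyond capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

A caller in front of a persistence layer would check `try_acquire()` before each write and reject or defer the operation when it returns False, so that a burst of requests cannot overload the layer behind it.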
10th Mar 08:23 PM
Our engineering and ops teams have completed the post mortem of this incident and summarised all the actions we have taken to ensure we avoid any future disruption to our service.
See https://gist.github.com/pauln-ably/03098db1095f4ef61aac801ae987dac2
Resolved in 11 minutes