Incident Details

All times are shown in UTC

17th September 2018 02:40:39 PM

Increased error rates in California (us-west-1)

At 14:40 today, an instance came online in the us-west-1 region.

Shortly after bootstrapping, the instance's performance degraded significantly, although it continued to service most requests. From a consensus and health checking perspective, this is hugely problematic: the instance performed the bulk of the work it was supposed to, yet an unacceptable number of requests failed.

We have since fixed the issue, but we now need to revisit how we manage our automated health checks so that they detect similar failures in the future.
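
To illustrate the kind of failure described above, here is a minimal sketch of a health check that looks at the recent failure rate rather than a single pass/fail probe. It is not our actual health checking code: the class name, window size, threshold, and minimum sample count are all hypothetical values chosen for illustration.

```python
from collections import deque


class ErrorRateHealthCheck:
    """Hypothetical check: unhealthy when the recent failure rate is too high."""

    def __init__(self, window_size=100, max_error_rate=0.05, min_samples=20):
        # Keep only the most recent request outcomes (True = success).
        self.results = deque(maxlen=window_size)
        self.max_error_rate = max_error_rate  # assumed threshold
        self.min_samples = min_samples        # assumed warm-up size

    def record(self, success):
        """Record the outcome of one request."""
        self.results.append(bool(success))

    def error_rate(self):
        """Fraction of failed requests in the current window."""
        if not self.results:
            return 0.0
        failures = sum(1 for ok in self.results if not ok)
        return failures / len(self.results)

    def is_healthy(self):
        """Healthy until enough samples exist, then judged on failure rate."""
        if len(self.results) < self.min_samples:
            return True
        return self.error_rate() <= self.max_error_rate


# An instance that serves most requests but fails one in five.
check = ErrorRateHealthCheck()
for i in range(100):
    check.record(i % 5 != 0)
print(check.is_healthy())
```

An instance that answers a simple liveness probe would pass a binary check in this situation, whereas a rate-based check like the one sketched here would flag it once enough requests have been observed.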

We will continue to investigate and post updates, followed by a post mortem in due course.

17th Sep 04:00 PM

Error rates have returned to normal.

We will continue to investigate the root cause and update with a post mortem.


in about 1 hour