Incident log archive

All times are shown in UTC

February 2020

27th February 2020 09:23:00 PM

Performance issues in all regions due to database layer issues

We are investigating performance issues in all regions due to an issue with our database layer (Cassandra)

27th Feb 10:41 PM

We had elevated Cassandra latencies for 9 minutes, between 21:23 and 21:32 UTC. This is essentially the same issue as occurred earlier today; we are still investigating the root cause.

10th Mar 08:24 PM

Please see https://status.ably.io/incidents/695 for the post mortem of this disruption.


Resolved in 9 minutes
27th February 2020 11:05:42 AM

Performance issues in all regions due to database layer issues

We are investigating performance issues in all regions due to an issue with our database layer (Cassandra)

27th Feb 11:27 AM

Error rates have dropped back to normal. We are continuing to investigate.

27th Feb 01:11 PM

Error rates are back to normal. A small segment of the keyspace was unable to achieve quorum for a two-hour period; sufficient replicas are now back online to achieve quorum for the entire keyspace, and several more instances are in the process of being brought online. We will review our global replication strategy for this persistence layer as part of a post-mortem.
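The quorum requirement mentioned above is simple majority arithmetic. As a minimal sketch (assuming a standard Cassandra-style QUORUM consistency level and a hypothetical replication factor, not Ably's actual configuration):

```python
# Minimal sketch of quorum arithmetic (assumption: a standard
# Cassandra-style QUORUM consistency level; the replication factors
# used here are hypothetical, not Ably's actual configuration).
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas needed for a QUORUM read/write."""
    return replication_factor // 2 + 1

def can_achieve_quorum(replication_factor: int, replicas_online: int) -> bool:
    """A keyspace segment stays available only while a majority is online."""
    return replicas_online >= quorum(replication_factor)
```

With a replication factor of 3, for example, losing two of the three replicas for a segment of the keyspace blocks quorum for that segment until at least one replica returns, which is consistent with the two-hour partial unavailability described above.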

10th Mar 08:24 PM

Please see https://status.ably.io/incidents/695 for the post mortem of this disruption.


Resolved in about 2 hours

December 2019

3rd December 2019 04:30:00 PM

Minor transient disruption to channel lifecycle webhooks over the next day or two

Customers using channel lifecycle webhooks may experience some brief transient disruption (which in some cases may very briefly include duplicate or lost channel lifecycle webhooks) at some point over the next day or two, while we transition channel lifecycle webhooks over to a new architecture (message rules on the channel lifecycle metachannel). The result will be more dependable channel lifecycle webhooks, as they will now gain the reliability benefits of running on top of Ably's robust, globally distributed channels, rather than having all lifecycle events for an app funnelled through a single point, as they were previously.


Resolved in 2 days

September 2019

30th September 2019 05:40:00 AM

Capacity issues in ap-southeast-1 (Singapore) region

Since 05:40 UTC today, the cluster in the ap-southeast-1 region has been unable to obtain sufficient capacity to meet demand. As a result, connections in the region are experiencing slightly higher latencies.

Until more capacity is available, we are diverting traffic to ap-southeast-2 (Sydney).

30th Sep 03:46 PM

AWS capacity has now come online in the Singapore region (ap-southeast-1). All traffic is being routed back to this region now.


Resolved in about 10 hours
25th September 2019 11:54:00 AM

Elevated rate of 5xx errors in us-east-1

We had a higher than normal level of 5xx errors from our routing layer in us-east-1 between 11:54 and 13:17 UTC. We believe we have identified the issue, have instituted a workaround, and are working on a fix. Service should be generally unaffected as rejected requests will have been rerouted to other regions by our client library fallback functionality.


Resolved in about 1 hour
25th September 2019 10:45:20 AM

EU performance issues

In both EU West and EU Central there was a sharp rise in load at 10:45 UTC, which subsided at 10:49 UTC (4 minutes later).

We have manually intervened to accelerate capacity provision, and our monitoring systems indicate traffic is being routed to other regions as expected whilst the capacity issue remains.


Resolved in 4 minutes

July 2019

25th July 2019 10:00:00 AM

Issues in ap-southeast-2 (Sydney) due to data center connectivity issues

From 10:00 to 10:05 UTC, our ap-southeast-2 (Sydney) data center experienced connectivity issues with other datacenters. Full connectivity was restored after five minutes; other datacenters were unaffected.


Resolved in 5 minutes
24th July 2019 01:13:22 AM

Our automated health check system has reported an issue with realtime cluster health in ap-southeast-1-a

This incident was created automatically by our automated health check system as it has identified a fault. We are now looking into this issue.

24th Jul 01:14 AM

Our health check system has reported this issue as resolved.
We will continue to investigate the issue and will update this incident shortly.

24th Jul 10:14 AM

Due to a load spike, message transit latencies for messages through the Asia Singapore datacenter may have been slower than normal for a period of around 10 minutes. The issue resolved itself automatically through autoscaling.


Resolved in 5 minutes
2nd July 2019 01:50:05 PM

Cloudflare issues affecting fallback hosts, CDN, and website

Fallback realtime hosts (*.ably-realtime.com), the Ably website and CDN (affecting website assets and library access) are having availability problems due to Cloudflare issues: https://www.cloudflarestatus.com/incidents/tx4pgxs6zxdr .

The primary realtime hosts (rest.ably.io, realtime.ably.io) do not use Cloudflare and are still working fine, so the service is still up.

We are in the process of bypassing Cloudflare on selected high-priority hosts (the website, status site, and CDN).

Update 14:05 UTC: Cloudflare has been bypassed for the website, status site, and CDN. Fallback hosts are still going through Cloudflare, but as primary hosts are all up (and have been the whole time), this should have no effect on service status.

2nd Jul 02:25 PM

Cloudflare is back up, so fallback hosts are now responding as normal.


Resolved in 36 minutes

June 2019

24th June 2019 11:35:30 AM

Ably Website and CDN availability issues

The Ably website and CDN (affecting website assets and library access) are having availability problems due to the global Cloudflare outage. We are redirecting away from Cloudflare and service should resume shortly.

24th Jun 11:46 AM

All Cloudflare-mediated endpoints have been moved away from Cloudflare.


Resolved in 11 minutes
24th June 2019 10:42:20 AM

Fallback endpoints unavailable globally: Cloudflare issue

We are re-routing fallback endpoints at the moment.

More information as we have it.

24th Jun 11:08 AM

Fallback endpoints are now restored, circumventing Cloudflare.


Resolved in about 1 hour

April 2019

20th April 2019 08:21:35 AM

Network timeouts in us-west-1 datacenter

We are seeing a high number of timeouts in the us-west-1 datacenter at present.

20th Apr 08:57 AM

We are investigating the root cause of the issue. If this issue is not resolved soon, we will temporarily redirect traffic away from the us-west-1 datacenter until the underlying issues are resolved.

20th Apr 09:19 AM

All intermittent timeouts in the us-west-1 region have stopped since 09:21 UTC.

We believe the underlying issue was a networking issue in the underlying AWS datacenter, but have not been able to confirm that. However, for the last hour, the datacenter appears to be healthy.

Clients closest to us-west-1 experiencing timeouts would have been automatically reconnected to an alternative datacenter with our automatic fallback capability. See https://support.ably.io/a/solutions/articles/3000044636 for more details.
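The fallback behaviour referred to above can be sketched roughly as follows. This is an illustrative outline only: the `send` callable and the specific fallback host names are hypothetical (the `*.ably-realtime.com` pattern is taken from the Cloudflare incident elsewhere in this log), and the real client libraries' retry policy may differ.

```python
import random

# Illustrative sketch of client-side fallback: try the primary endpoint,
# and on timeout retry against fallback hosts. Host names below are
# assumptions based on patterns mentioned in this log, not the actual
# client library configuration.
PRIMARY_HOST = "rest.ably.io"
FALLBACK_HOSTS = [f"{c}.ably-realtime.com" for c in "abcde"]  # hypothetical

def request_with_fallback(send, primary=PRIMARY_HOST, fallbacks=FALLBACK_HOSTS):
    """Try the primary host; on timeout, retry fallbacks in random order."""
    hosts = [primary] + random.sample(fallbacks, k=len(fallbacks))
    last_error = None
    for host in hosts:
        try:
            return send(host)  # `send` is a caller-supplied request function
        except TimeoutError as err:
            last_error = err   # remember the failure, move to the next host
    raise last_error
```

Randomising the fallback order spreads retried traffic across the fallback hosts rather than piling it onto the first one in the list.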


Resolved in about 1 hour
14th April 2019 10:30:00 AM

Error rates climbing in us-east-1

We have temporarily routed traffic away from our us-east-1 datacenter whilst we investigate the cause of the increased errors there. All traffic is being routed automatically to the nearest datacenters.

14th Apr 11:49 AM

The us-east-1 (North Virginia) datacenter is healthy and active again. Traffic is now being routed to this datacenter as normal.


Resolved in about 1 hour

February 2019

26th February 2019 12:00:00 AM

Occasional timeouts when querying channel history in some circumstances

In the last three days (since the 26th of February), a small proportion of channels have experienced timeouts when querying channel history, if a message was published in the same region as the query within 16 seconds of the query being made. We are rolling out a fix now and investigating why this was not caught by our test suites. We apologise for the inconvenience.


Resolved in 4 days

January 2019

29th January 2019 05:45:00 PM

Elevated error rates in ap-southeast-1

Users connected to the ap-southeast-1 region (Asia Singapore) may have experienced elevated latencies and/or errors in the past half hour.

Other data centers were unaffected, so client libraries should have transparently redirected traffic to another datacenter through normal fallback functionality.

As a precaution, we have shut down that data center; users who normally connect to ap-southeast-1 will now likely connect to either ap-southeast-2 (Australia), us-west-1 (California), or eu-central-1 (Frankfurt).

29th Jan 09:59 PM

The ap-southeast-1 region is now fully operational once again.
We're continuing to investigate the underlying issue.


Resolved in 34 minutes