
Incident log archive

All times are shown in UTC

August 2016

22nd August 2016 06:56:19 PM

AWS issues in Asia are affecting our distributed database

AWS has reported EBS issues in a number of Asian data centres resulting in slow access to our data in those regions.

We're looking into this now.

22nd Aug 05:51 PM

An additional fault has been detected in ap-southeast-1-a

22nd Aug 05:52 PM

An additional fault has been detected in ap-southeast-2-a

22nd Aug 05:52 PM

An additional fault has been detected in ap-northeast-1-a

Resolved

22nd August 2016 06:06:37 PM

Asia data centres affected by AWS EBS issues

AWS has reported EBS issues in a number of Asian data centres resulting in slow access to our data in those regions.

We're working on a mitigation while we wait for Amazon to resolve the EBS issues.

22nd Aug 06:29 PM

The underlying disk issues in our database have been resolved, and performance in Asia has returned to normal.

Resolved

20th August 2016 06:00:00 PM

South America datacentre shut down due to connectivity issues

Due to continuing connectivity issues with our sa-east-1 (South America) data centre since 16:30 UTC today, which have been causing performance issues globally, we have decided to temporarily shut down that data centre until we can identify the root cause.

Traffic from South America will be automatically routed to the closest available data centre, most likely US-East or US-West. Customers in the region will experience slightly higher latency, but should otherwise be unaffected.
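As a rough illustration of how this kind of nearest-datacentre fallback can work (the hostnames and the /health probe path below are hypothetical assumptions for the sketch, not Ably's actual routing logic):

```typescript
// Hypothetical nearest-datacentre fallback: probe each candidate endpoint
// and route to the one with the lowest round-trip time. The hostnames and
// probe path are illustrative, not Ably's actual implementation.
const FALLBACK_ENDPOINTS = [
  "https://us-east-1.realtime.example.com",
  "https://us-west-1.realtime.example.com",
];

async function measureRtt(endpoint: string): Promise<number> {
  const start = Date.now();
  try {
    await fetch(`${endpoint}/health`, { method: "HEAD" });
    return Date.now() - start;
  } catch {
    // Unreachable endpoints sort last, so a healthy region always wins.
    return Number.POSITIVE_INFINITY;
  }
}

async function pickClosestEndpoint(): Promise<string> {
  const rtts = await Promise.all(FALLBACK_ENDPOINTS.map(measureRtt));
  return FALLBACK_ENDPOINTS[rtts.indexOf(Math.min(...rtts))];
}

pickClosestEndpoint().then((endpoint) => console.log(`routing to ${endpoint}`));
```

In practice this kind of selection usually happens in DNS or at the load-balancer layer rather than in the client, but the principle is the same: unhealthy or distant regions fall to the bottom of the candidate list.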

Resolved

20th August 2016 04:30:00 PM

Issues with the South America East datacentre are causing performance problems worldwide

We are investigating and hope to resolve this shortly.

Closed

17th August 2016 11:00:00 PM

Faulty node

Intermittent 500 responses were reported in our Sydney data centre on Tue 16 Aug at 7:06 PM UTC.

We traced the problem to a single node in the ap-southeast-2 data centre and resolved it by redeploying all nodes in that region.

We are investigating the root cause of the issue.

Resolved

16th August 2016 07:06:12 PM

Faulty node

Intermittent 504 responses were reported in us-east-1 on Tue 16 Aug at 7:06 PM UTC.

We traced the problem to a single node in the us-east-1 data centre and resolved it within 2 hours by restarting the faulty node.

However, the root cause appears to be a bug in our RPC layer, so we are working to get the issue resolved upstream and to ensure we are notified of this type of fault earlier in future.
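A minimal sketch of the kind of proactive monitoring this implies: probe an endpoint repeatedly and alert when the 5xx rate over a rolling window crosses a threshold. The target URL, window size, and threshold here are assumptions for illustration, not the actual monitoring configuration.

```typescript
// Illustrative rolling-window probe: alert when more than 10% of the last
// 20 probes return a 5xx response. All values here are hypothetical.
const TARGET = "https://rest.realtime.example.com/time";
const WINDOW = 20;
const THRESHOLD = 0.1;

const recent: boolean[] = []; // true = probe saw a server error

async function probeOnce(): Promise<void> {
  let sawError = false;
  try {
    const res = await fetch(TARGET);
    sawError = res.status >= 500;
  } catch {
    sawError = true; // treat network failures like server errors
  }
  recent.push(sawError);
  if (recent.length > WINDOW) recent.shift();

  const errorRate = recent.filter(Boolean).length / recent.length;
  if (recent.length === WINDOW && errorRate > THRESHOLD) {
    console.error(`ALERT: ${(errorRate * 100).toFixed(0)}% 5xx over last ${WINDOW} probes`);
  }
}

setInterval(probeOnce, 5_000); // probe every 5 seconds
```

The rolling window is what catches *intermittent* failures like this one: a single bad response never trips the alert, but a sustained elevated error rate does.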

Resolved


July 2016

17th July 2016 04:00:00 AM

Reliability issues for certain channels for users in some regions

Some users in certain regions, including ap-northeast, ap-southeast, and sa-east, have been experiencing issues receiving messages from users in other regions on certain channels, due to an inter-region communication issue. We have applied a temporary fix and are preparing a permanent one. Anyone still experiencing problems should contact us asap - thanks.

Resolved

9th July 2016 10:26:39 PM

Heroku platform issues affecting reliability of our website platform

Our website, which is hosted on Heroku, was intermittently offline for around an hour due to Heroku platform issues.

9th Jul 10:27 PM

Our health check system has reported this issue as resolved.
We will continue to investigate the issue and will update this incident shortly.

Resolved


June 2016

14th June 2016 03:24:17 AM

Our automated health check system has reported an issue with realtime cluster health in ap-southeast-1-a

This incident was created automatically by our automated health check system as it has identified a fault.

This was caused by a temporary network issue between data centres.

14th Jun 03:30 AM

The temporary networking issue resolved itself.

Resolved

6th June 2016 12:04:55 PM

Cluster health issues

We are seeing some connection issues between our regions in Australia, Singapore and Oregon (US).

As a result, those regions may experience some delays during this time.

28th May 12:48 PM

An additional fault has been detected in ap-southeast-2-a

28th May 12:48 PM

Our health check system has reported this issue as resolved.
We will continue to investigate the issue and will update this incident shortly.

7th Jun 02:02 PM

An additional fault has been detected in us-west-2-a

7th Jun 02:38 PM

Our health check system has reported this issue as resolved.
We will continue to investigate the issue and will update this incident shortly.

Resolved


May 2016

28th May 2016 12:04:24 PM

Scheduled South America & Singapore data centre downtime

Because a required upgrade for our South American data centre (sa-east-1) has no compatible in-place upgrade path, we are replacing the entire data centre with a new one. During this time, the data centre in South America will be offline and customers will be routed to the nearest data centre in US East.

We expect the data centre in South America to be offline for no more than 15 minutes.

28th May 12:13 PM

Our data centre in Singapore also needed upgrading in the same way and unfortunately did not have a compatible upgrade path. Singapore will also be offline for roughly 15 minutes. All traffic will be routed to Sydney, Tokyo or US West during this time.

28th May 02:58 PM

We hit some unforeseen issues with the upgrades, which resulted in our Australian, Singaporean and South American data centres being offline at different times over the last 4 hours.

During this time, the overall realtime service remained online, and all traffic for each data centre was routed to alternate data centres.

The work is now complete, and all data centres are back online.

Resolved

22nd May 2016 11:40:26 PM

Website performance

For 5 minutes, the Redis cache used by the website was unavailable following a scheduled upgrade. During this time, website performance degraded significantly.
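A common way to soften a hard dependency on a cache is to degrade to the primary data source when the cache is unreachable. A sketch under that assumption (the node-redis client usage is standard, but loadFromDatabase is a hypothetical stand-in; this is not necessarily how our website is built):

```typescript
import { createClient } from "redis";

// Graceful-degradation sketch: serve from Redis when healthy, fall back to
// the slower primary source when not. loadFromDatabase is a hypothetical
// stand-in for the website's real data layer.
const redis = createClient({ url: "redis://localhost:6379" });
redis.on("error", (err) => console.warn("redis unavailable:", err.message));

async function loadFromDatabase(key: string): Promise<string> {
  return `value-for-${key}`; // placeholder for a real database query
}

async function getCached(key: string): Promise<string> {
  try {
    const hit = await redis.get(key);
    if (hit !== null) return hit;
    const value = await loadFromDatabase(key);
    await redis.set(key, value, { EX: 60 }); // cache for 60 seconds
    return value;
  } catch {
    // Cache down (e.g. during an upgrade): serve from the database
    // instead of failing the request outright.
    return loadFromDatabase(key);
  }
}

async function main(): Promise<void> {
  await redis.connect().catch(() => { /* start degraded if Redis is down */ });
  console.log(await getCached("homepage"));
}
main();
```

With a fallback like this, a cache outage costs latency rather than availability, which matches the degradation seen here.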

22nd May 11:45 PM

Our health check system has reported this issue as resolved.
We will continue to investigate the issue and will update this incident shortly.

Resolved


April 2016

20th April 2016 05:13:15 PM

Partial database write operation disruption

At 5.13pm today, our database cluster experienced connectivity issues between the three regions in which the database is hosted. As a result, quorum writes for some operations failed because a consensus between the available servers could not be reached.

Most of our realtime operations were unaffected during this time, however the following operations for some customers were failing:

- API Key creation
- Stats
- Message history

At approximately 6.30pm the issue was resolved.
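For context on why the writes failed: with data replicated across three regions, a quorum write needs acknowledgements from a majority of replicas, so it survives the loss of one region but not two. A toy illustration (one replica per region is an assumption for the sketch, not our actual topology):

```typescript
// Toy quorum check: with replication factor 3, a quorum write needs
// floor(3 / 2) + 1 = 2 acknowledgements.
function quorumSize(replicationFactor: number): number {
  return Math.floor(replicationFactor / 2) + 1;
}

function canWrite(reachableReplicas: number, replicationFactor: number): boolean {
  return reachableReplicas >= quorumSize(replicationFactor);
}

console.log(canWrite(3, 3)); // true  - all three regions reachable
console.log(canWrite(2, 3)); // true  - one region partitioned away
console.log(canWrite(1, 3)); // false - two regions unreachable: writes fail
```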

Resolved

9th April 2016 11:30:14 AM

Our automated health check system has reported an issue with our website

This incident was created automatically by our automated health check system as it has identified a fault. We are now looking into this issue.

9th Apr 12:32 PM

Our health check system has reported this issue as resolved.
We will continue to investigate the issue and will update this incident shortly.

Resolved

7th April 2016 04:17:34 AM

Our automated health check system has reported an issue with our website

This incident was created automatically by our automated health check system as it has identified a fault. We are now looking into this issue.

7th Apr 04:18 AM

Our health check system has reported this issue as resolved.
We will continue to investigate the issue and will update this incident shortly.

Resolved
