www.ably.io

Incident log archive

All times are shown in UTC

August 2016

20th August 2016 04:30:00 PM

Issues with the South America East datacentre are causing performance problems worldwide

We are investigating and hope to resolve this shortly

Closed

17th August 2016 11:00:00 PM

Faulty node

Intermittent 500 responses were reported in our Sydney data center on Tue 16 Aug at 7:06 PM UTC.

We traced the problem to a single node in the ap-southeast-2 data center and resolved it by redeploying all nodes in that region.

We are investigating the root cause of the issue.

Resolved

16th August 2016 07:06:12 PM

Faulty node

Intermittent 504 responses were reported in us-east-1 on Tue 16 Aug at 7:06 PM UTC.

We traced the problem to a single node in the us-east-1 data center and resolved it within 2 hours by restarting that node.

However, the root cause appears to be a bug in our RPC layer, so we have taken action to get the issue resolved upstream and to ensure we are notified of this type of fault in advance going forward.

Resolved

July 2016

17th July 2016 04:00:00 AM

Reliability issues for certain channels for users in some regions

Some users in certain regions, including ap-northeast, ap-southeast, and sa-east, have been experiencing issues receiving messages from users in other regions on certain channels, due to an inter-region communication issue. We have applied a temporary fix and are preparing a permanent one. Anyone still experiencing problems should contact us as soon as possible. Thanks.

Resolved

9th July 2016 10:26:39 PM

Heroku platform issues affecting reliability of our website platform

Our website, which is hosted with Heroku, was offline intermittently for around an hour due to Heroku issues.

9th Jul 10:27 PM

Our health check system has reported this issue as resolved.
We will continue to investigate the issue and will update this incident shortly.

Resolved

June 2016

14th June 2016 03:24:17 AM

Our automated health check system has reported an issue with realtime cluster health in ap-southeast-1-a

This incident was created automatically by our automated health check system as it has identified a fault.

This was caused by a temporary network issue between data centers.

14th Jun 03:30 AM

The temporary networking issue resolved itself.

Resolved

6th June 2016 12:04:55 PM

Cluster health issues

We are seeing some connection issues between our regions in Australia, Singapore and Oregon, US.

As a result, those regions may experience some delays during this time.

28th May 12:48 PM

An additional fault has been detected in ap-southeast-2-a

28th May 12:48 PM

Our health check system has reported this issue as resolved.
We will continue to investigate the issue and will update this incident shortly.

7th Jun 02:02 PM

An additional fault has been detected in us-west-2-a

7th Jun 02:38 PM

Our health check system has reported this issue as resolved.
We will continue to investigate the issue and will update this incident shortly.

Resolved

May 2016

28th May 2016 12:04:24 PM

Scheduled South America & Singapore data centre downtime

Due to an incompatible upgrade needed for our South American data centre (sa-east-1), we are replacing the entire data centre with a new one. During this time, the data centre in South America will be offline and customers will be routed to the nearest data centre in US East.

We expect the data centre in South America to be offline for no more than 15 minutes.

28th May 12:13 PM

Our data centre in Singapore also needed upgrading in the same way and unfortunately did not have a compatible upgrade path. Singapore will also be offline for roughly 15 minutes. All traffic will be routed to Sydney, Tokyo or US West during this time.

28th May 02:58 PM

We hit some unforeseen issues with the upgrades, which resulted in our Australian, Singaporean and South American data centres being offline at different times over the last 4 hours.

During this time, the overall realtime service remained online, and all traffic for each data centre was routed to alternate data centres.

The work is now complete, and all data centres are back online.

Resolved

22nd May 2016 11:40:26 PM

Website performance

The Redis cache used for our website was unavailable for 5 minutes following a scheduled upgrade. During this time, website performance degraded significantly.

22nd May 11:45 PM

Our health check system has reported this issue as resolved.
We will continue to investigate the issue and will update this incident shortly.

Resolved

April 2016

20th April 2016 05:13:15 PM

Partial database write operation disruption

At 5.13pm today, our database cluster experienced connectivity issues between the three regions in which the databases are hosted. As a result, quorum writes for some operations failed because a consensus between the available servers was not possible.

Most of our realtime operations were unaffected during this time; however, the following operations were failing for some customers:

- API Key creation
- Stats
- Message history

At approximately 6.30pm the issue was resolved.

Resolved

9th April 2016 11:30:14 AM

Our automated health check system has reported an issue with website in website

This incident was created automatically by our automated health check system as it has identified a fault. We are now looking into this issue.

9th Apr 12:32 PM

Our health check system has reported this issue as resolved.
We will continue to investigate the issue and will update this incident shortly.

Resolved

7th April 2016 04:17:34 AM

Our automated health check system has reported an issue with website in website

This incident was created automatically by our automated health check system as it has identified a fault. We are now looking into this issue.

7th Apr 04:18 AM

Our health check system has reported this issue as resolved.
We will continue to investigate the issue and will update this incident shortly.

Resolved

March 2016

16th March 2016 02:34:26 AM

Our automated health check system has reported an issue with website in website

This incident was created automatically by our automated health check system as it has identified a fault. We are now looking into this issue.

16th Mar 02:40 AM

Our health check system has reported this issue as resolved.
We will continue to investigate the issue and will update this incident shortly.

16th Mar 02:42 AM

The ably.io website experienced caching issues for 6 minutes, but the issue is now fixed.

Resolved

February 2016

24th February 2016 03:33:16 PM

Scheduled maintenance: 11pm UTC on 24 Feb 2016

We have scheduled maintenance of our global realtime infrastructure for 11pm UTC on 24 Feb 2016.

Unfortunately, we expect a few minutes of downtime globally as we are rolling out a change that requires all nodes to be upgraded simultaneously. This is because we are changing the communication protocol that all instances use, which is being done to help improve stability during network outages.

26th Feb 01:08 AM

This maintenance is now complete.

Closed

20th February 2016 01:15:05 PM

Our automated health check system has reported an issue with realtime cluster health in us-east-1-a

This incident was created automatically by our automated health check system as it has identified a fault. We are now looking into this issue.

20th Feb 01:16 PM

Our health check system has reported this issue as resolved.
We will continue to investigate the issue and will update this incident shortly.

20th Feb 02:07 PM

For approximately 1 minute at 13:15, requests to one of our data centres were timing out. We are investigating the root cause, but the service has now returned to normal.

Resolved
