www.ably.io

Incident log archive

All times are shown in UTC

July 2016

17th July 2016 04:00:00 AM

Reliability issues for certain channels for users in some regions

Some users in certain regions, including ap-northeast, ap-southeast, and sa-east, have been experiencing issues receiving messages on certain channels from users in other regions, due to an inter-region communication issue. We have applied a temporary fix and are preparing a permanent one. Anyone still experiencing problems should contact us as soon as possible - thanks.

Resolved

over 2 years ago
9th July 2016 10:26:39 PM

Heroku platform issues affecting reliability of our website platform

Our website, which is hosted with Heroku, was offline intermittently for around an hour due to Heroku platform issues.

9th Jul 10:27 PM

Our health check system has reported this issue as resolved.
We will continue to investigate the issue and will update this incident shortly.

Resolved

over 2 years ago

June 2016

14th June 2016 03:24:17 AM

Our automated health check system has reported an issue with realtime cluster health in ap-southeast-1-a

This incident was created automatically by our automated health check system as it has identified a fault.

This was caused by a temporary network issue between data centers.

14th Jun 03:30 AM

The temporary networking issue resolved itself.

Resolved

over 2 years ago
6th June 2016 12:04:55 PM

Cluster health issues

We are seeing some connection issues between our regions in Australia, Singapore and Oregon (US).

As a result, those regions may experience some delays during this time.

28th May 12:48 PM

An additional fault has been detected in ap-southeast-2-a

28th May 12:48 PM

Our health check system has reported this issue as resolved.
We will continue to investigate the issue and will update this incident shortly.

7th Jun 02:02 PM

An additional fault has been detected in us-west-2-a

7th Jun 02:38 PM

Our health check system has reported this issue as resolved.
We will continue to investigate the issue and will update this incident shortly.

Resolved

over 2 years ago

May 2016

28th May 2016 12:04:24 PM

Scheduled South America & Singapore data centre downtime

Due to an upgrade needed for our South American data centre (sa-east-1) that has no compatible upgrade path, we are replacing the entire data centre with a new one. During this time, the data centre in South America will be offline and customers will be routed to the nearest data centre, in US East.

We expect the data centre in South America to be offline for no more than 15 minutes.
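For readers curious how this kind of failover behaves in general, here is a minimal, hypothetical sketch of nearest-healthy-region routing; the region latencies and selection logic are illustrative assumptions, not Ably's actual routing implementation.

```python
# Hypothetical sketch of nearest-healthy-data-centre fallback routing.
# Region names and latency figures are illustrative only.

LATENCY_MS = {  # approximate client-to-region latency for a client in South America
    "sa-east-1": 20,
    "us-east-1": 120,
    "us-west-2": 180,
}

def pick_region(healthy_regions: set[str]) -> str:
    """Route to the lowest-latency region that is currently healthy."""
    candidates = [r for r in LATENCY_MS if r in healthy_regions]
    return min(candidates, key=LATENCY_MS.get)

# Normal operation: the local South American region is used.
assert pick_region({"sa-east-1", "us-east-1", "us-west-2"}) == "sa-east-1"

# During the scheduled downtime, sa-east-1 is offline, so traffic
# falls back to the nearest alternative, US East.
assert pick_region({"us-east-1", "us-west-2"}) == "us-east-1"
```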

28th May 12:13 PM

Our data centre in Singapore also needed upgrading in the same way and unfortunately did not have a compatible upgrade path. Singapore will also be offline for roughly 15 minutes. All traffic will be routed to Sydney, Tokyo or US West during this time.

28th May 02:58 PM

We hit some unforeseen issues with the upgrades, which resulted in our Australian, Singapore and South American data centres being offline for the upgrade at different times over the last 4 hours.

During this time, the overall realtime service remained online, and all traffic for each data centre was routed to alternate data centres.

The work is now complete, and all data centres are back online.

Resolved

over 2 years ago
22nd May 2016 11:40:26 PM

Website performance

For 5 minutes, the Redis cache used by our website was unavailable following a scheduled upgrade. During this time, website performance degraded significantly.

22nd May 11:45 PM

Our health check system has reported this issue as resolved.
We will continue to investigate the issue and will update this incident shortly.

Resolved

over 2 years ago

April 2016

20th April 2016 05:13:15 PM

Partial database write operation disruption

At 5.13pm today, our database cluster experienced connectivity issues between the three regions in which the databases are hosted. As a result, quorum writes for some operations failed because a consensus between the available servers could not be reached.

Most of our realtime operations were unaffected during this time; however, the following operations were failing for some customers:

- API Key creation
- Stats
- Message history

At approximately 6.30pm the issue was resolved.
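To illustrate why losing inter-region connectivity blocks quorum writes, here is a minimal sketch assuming a simple majority quorum across the three database regions; it is not Ably's actual database code. A write needs acknowledgements from a majority of replicas, and a region cut off from its peers cannot gather that majority.

```python
# Minimal sketch of a majority-quorum write check (hypothetical; not Ably's code).
# With N replicas, a write succeeds only if more than half acknowledge it.

def quorum_size(total_replicas: int) -> int:
    """Smallest number of acknowledgements that forms a majority."""
    return total_replicas // 2 + 1

def quorum_write_succeeds(acks: int, total_replicas: int) -> bool:
    return acks >= quorum_size(total_replicas)

# Three regions, all reachable: 3 acks >= 2 required, so writes succeed.
assert quorum_write_succeeds(acks=3, total_replicas=3)

# Inter-region connectivity issues leave only 1 replica reachable from a
# given region: 1 ack < 2 required, so quorum writes (e.g. API key creation,
# stats, message history) fail until connectivity is restored.
assert not quorum_write_succeeds(acks=1, total_replicas=3)
```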

Resolved

over 2 years ago
9th April 2016 11:30:14 AM

Our automated health check system has reported an issue with our website

This incident was created automatically by our automated health check system as it has identified a fault. We are now looking into this issue.

9th Apr 12:32 PM

Our health check system has reported this issue as resolved.
We will continue to investigate the issue and will update this incident shortly.

Resolved

over 2 years ago
7th April 2016 04:17:34 AM

Our automated health check system has reported an issue with our website

This incident was created automatically by our automated health check system as it has identified a fault. We are now looking into this issue.

7th Apr 04:18 AM

Our health check system has reported this issue as resolved.
We will continue to investigate the issue and will update this incident shortly.

Resolved

over 2 years ago

March 2016

16th March 2016 02:34:26 AM

Our automated health check system has reported an issue with our website

This incident was created automatically by our automated health check system as it has identified a fault. We are now looking into this issue.

16th Mar 02:40 AM

Our health check system has reported this issue as resolved.
We will continue to investigate the issue and will update this incident shortly.

16th Mar 02:42 AM

The ably.io website experienced caching issues for 6 minutes, but the issue is now fixed.

Resolved

over 2 years ago

February 2016

24th February 2016 03:33:16 PM

Scheduled maintenance: 11pm UTC on 24 Feb 2016

We have scheduled maintenance to our global realtime infrastructure, taking place at 11pm UTC on 24 Feb 2016.

Unfortunately we expect a few minutes of downtime globally, as we are rolling out a change that requires all nodes to be upgraded simultaneously. This is because we are changing the communication protocol used between all instances, which is being done to help improve stability during network outages.
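To sketch why an incompatible protocol change forces a simultaneous rather than rolling upgrade, here is a hypothetical illustration under the assumption that old and new protocol versions cannot interoperate; it is not Ably's actual code, and the node names and version numbers are invented.

```python
# Hypothetical sketch: why an incompatible wire-protocol change cannot be
# rolled out node by node. Node names and protocol versions are illustrative.

from itertools import combinations

def fully_connected(node_versions: dict[str, int]) -> bool:
    """The cluster stays whole only if every pair of nodes can talk,
    i.e. every node speaks the same (incompatible-across-versions) protocol."""
    return all(v_a == v_b
               for (_, v_a), (_, v_b) in combinations(node_versions.items(), 2))

nodes = ["us-east-1-a", "eu-west-1-a", "ap-southeast-1-a"]

# Rolling upgrade: at some point old and new protocol versions coexist,
# so parts of the cluster cannot communicate with each other.
mid_rolling = {"us-east-1-a": 2, "eu-west-1-a": 1, "ap-southeast-1-a": 1}
assert not fully_connected(mid_rolling)

# Simultaneous upgrade: all nodes restart on the new protocol together,
# which costs a few minutes of downtime but never splits the cluster.
after_simultaneous = {n: 2 for n in nodes}
assert fully_connected(after_simultaneous)
```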

26th Feb 01:08 AM

This maintenance is now complete.

Closed

over 2 years ago
20th February 2016 01:15:05 PM

Our automated health check system has reported an issue with realtime cluster health in us-east-1-a

This incident was created automatically by our automated health check system as it has identified a fault. We are now looking into this issue.

20th Feb 01:16 PM

Our health check system has reported this issue as resolved.
We will continue to investigate the issue and will update this incident shortly.

20th Feb 02:07 PM

For approximately 1 minute at 13:15, requests to one of our data centres were timing out. We are investigating the root cause, but the service has now returned to normal.

Resolved

over 2 years ago

January 2016

28th January 2016 07:01:55 AM

Degraded performance and intermittent failures

Following a routine recycling of our databases, our realtime system is attempting to connect to old database servers, in all regions, that no longer exist.

This has resulted in an increased rate of 500 errors, 401 errors and timeouts.

We are looking into this now.

28th Jan 01:19 PM

We have identified the root cause of the issue and have fixed it.

We are also going to modify our monitoring systems to better detect intermittent failures, as during this incident they were unfortunately reporting everything as healthy.

Resolved

almost 3 years ago
26th January 2016 11:00:32 AM

South America region taken offline due to AWS transient issues

At roughly 11:00am GMT, Amazon EC2 and ELB in the South America region became unstable, with intermittent transient issues.

We have taken the region offline and routed traffic to all other healthy data centres until Amazon resolves the issue.

26th Jan 01:17 PM

Our health check system has reported this issue as resolved.
We will continue to investigate the issue and will update this incident shortly.

26th Jan 01:18 PM

The issue has been resolved and all traffic is now being routed to South America again.

Update from Amazon:

"Between 2:15 AM and 4:33 AM PST we experienced connectivity issues for instances in a single Availability Zone and increased error rates for the EC2 APIs in the SA-EAST-1 Region. The issue has been resolved and the service is operating normally."

Resolved

almost 3 years ago
19th January 2016 08:17:17 PM

Our automated health check system detected a 1-minute issue with our realtime cluster in Asia (Singapore)

This incident was created automatically by our automated health check system as it has identified a fault.

The issue appears to be a brief networking issue in that region.

19th Jan 08:18 PM

Our health check system has reported this issue as resolved.
We will continue to investigate the issue and will update this incident shortly.

Resolved

almost 3 years ago