On August 13th at 9:43 UTC, some services became unavailable.
The cause was a hardware reset on the provider's side, which disrupted the connection to one of our databases. The failover mechanism did not work as intended: it did switch databases, but not immediately. Our monitoring system detected the issue first, and then users began reporting it.
Everything was restored within 9 minutes, by 9:52 UTC.
Some services recovered automatically, while we restored others manually.
However, WebSocket connections remained unstable for an hour.
What we are doing to prevent this from happening again:
- We will improve the failover mechanism.