Over the past 2 days, Narrator has experienced a higher-than-average error rate. We have now fixed the issue, and I apologize to everyone for the inconvenience.
The Change
As Narrator continues to grow, we aimed to improve our servers' processing and security with the following changes:
The Deploy
We deployed our changes in steps over the last couple of months.
Over the weekend, some small edge-case bugs were found and resolved, but no major issues occurred.
The Timeline
On Monday, we began noticing far more of our requests timing out. This was very surprising because we had many auto-scaling measures in place.
As we began investigating the issues, we realized:
On Monday night, we updated our workers and noticed the problem was solved.
We kept monitoring the system to ensure the problem was permanently solved.
At 8am on Tuesday, we saw another spike of timeouts.
We investigated and realized that our auto-scaling was not working due to a bug in our Health Check that helps decide where to route traffic.
At 1pm on Tuesday, we implemented a fix to bring our systems back up.
On Tuesday, we continued to monitor all our endpoints
The Issue
Our Health Check process checks our servers to ensure they are up and running, which helps us distribute the load. A bug was causing the health request to fail: we were blocking IP-based requests to stop malicious behavior, and the Health Check requests were being blocked as well. As a result, servers were flagged as unhealthy and all our traffic was routed to fewer and fewer servers, which led to timeouts and the “Failed to Fetch” error.
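To make the failure mode concrete, here is a minimal Python sketch of how a blocked health probe can shrink the pool of servers that receive traffic. It is an illustration only, not our production code: the `Server` class, the `health_check` and `route` functions, and the round-robin routing are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    healthy: bool = True


def health_check(server: Server, probe_blocked: bool) -> bool:
    """Hypothetical probe: it fails whenever the probe request itself is
    blocked, even though the server is actually up and running."""
    return not probe_blocked


def route(servers: list[Server], requests: int) -> dict[str, int]:
    """Spread requests round-robin across servers currently flagged healthy."""
    pool = [s for s in servers if s.healthy]
    if not pool:
        return {}
    load = {s.name: 0 for s in pool}
    for i in range(requests):
        load[pool[i % len(pool)].name] += 1
    return load


servers = [Server("a"), Server("b"), Server("c"), Server("d")]
before = route(servers, 100)  # every server shares the load

# The probe to three of the four servers gets blocked, so they are
# wrongly flagged as unhealthy and dropped from the pool.
for s in servers[:3]:
    s.healthy = health_check(s, probe_blocked=True)

after = route(servers, 100)  # one server now takes all the traffic
print(before)
print(after)
```

In this sketch the servers never actually go down. Only the probe fails, which is why adding capacity alone would not have helped.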
The Solution
We fixed the Health Check requests by flagging their traffic so that the Health Check code can succeed.
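One way such a fix could look is sketched below in Python. This is a hedged illustration under stated assumptions, not our actual implementation. It assumes "IP-based requests" means requests whose Host is a raw IP address rather than a domain name, and that Health Check traffic is flagged with a header carrying a shared token. The header name, the token, and both function names are hypothetical.

```python
import hmac
import ipaddress

HEALTH_CHECK_HEADER = "X-Health-Check"  # hypothetical header name
HEALTH_CHECK_TOKEN = "replace-me"       # hypothetical shared secret


def host_is_ip(host: str) -> bool:
    """Return True if the Host value is a raw IP address.

    Drops a port from IPv4-style hosts; bracketed IPv6 with a port is
    not handled in this sketch.
    """
    hostname = host.split(":")[0] if host.count(":") == 1 else host
    try:
        ipaddress.ip_address(hostname.strip("[]"))
        return True
    except ValueError:
        return False


def allow_request(host: str, headers: dict[str, str]) -> bool:
    """Block IP-based requests unless they are flagged as Health Check traffic."""
    supplied = headers.get(HEALTH_CHECK_HEADER, "")
    # Constant-time comparison so the token cannot be guessed via timing.
    flagged = hmac.compare_digest(supplied.encode(), HEALTH_CHECK_TOKEN.encode())
    if flagged:
        return True
    return not host_is_ip(host)
```

With this shape, the rule that blocks malicious IP-based requests stays in place, and only traffic carrying the flag bypasses it.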
The Learning
We take this situation very seriously, and we are updating our deployment strategy so that this does not happen in the future.
Our new deploy process will now be:
By adding a deploy to some high-usage customers, we can verify that auto-scaling and load are handled.
I hope you can see that we take these issues incredibly seriously, and we are very sorry for your experience. If you have any concerns or recommendations, please feel free to email ahmed@narrator.ai.
Thanks