At 9:04 AM CET, an alert was received concerning the response time of our backend tasks. After analysis, we discovered that the daily scaling of our infrastructure had not run, leaving too few backend tasks to support the load.
At 9:34 AM CET, the number of backend tasks was back to its nominal value.
At 9:41 AM CET, an alert was received concerning a high CPU level on one of our database servers. After analysis, we discovered at 9:50 AM CET that it was due to the unintended activation of a feature deployed to production on Friday, June 9th at 1:00 PM CET. A fix was developed and, in parallel with its deployment, a corrective action was performed on the affected client database to reduce the impact of the issue.
At 10:15 AM CET, the CPU usage of the affected database server was back to its nominal value, and both incidents were resolved.
Following these incidents, we implemented the actions below:
Setup of an alert monitoring the number of active backend tasks in real time (a minimal sketch follows this list)
Integration of alerts into our day-to-day tools to improve reactivity
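As an illustration of the first action, here is a minimal polling sketch that raises an alert when the number of active backend tasks falls below its expected nominal value. The metric endpoint, webhook URL, threshold, and polling interval are hypothetical placeholders; the actual monitoring stack behind app.napta.io is not described in this report.

```python
import time
import requests

# Hypothetical internal endpoints and values, used only for illustration.
METRICS_URL = "https://metrics.example.internal/backend_tasks/active"
ALERT_WEBHOOK_URL = "https://chat.example.internal/hooks/ops-alerts"
MIN_ACTIVE_TASKS = 10        # nominal task count expected after daily scaling
POLL_INTERVAL_SECONDS = 60   # "real time" here means a one-minute poll

def active_task_count() -> int:
    """Fetch the current number of active backend tasks from the metrics endpoint."""
    response = requests.get(METRICS_URL, timeout=5)
    response.raise_for_status()
    return int(response.json()["active_tasks"])

def send_alert(message: str) -> None:
    """Push the alert into the team's day-to-day chat tool via a webhook."""
    requests.post(ALERT_WEBHOOK_URL, json={"text": message}, timeout=5)

def main() -> None:
    # Poll the metric and alert whenever the task count drops below the threshold.
    while True:
        count = active_task_count()
        if count < MIN_ACTIVE_TASKS:
            send_alert(
                f"Only {count} backend tasks active "
                f"(expected at least {MIN_ACTIVE_TASKS}); "
                "daily scaling may not have run."
            )
        time.sleep(POLL_INTERVAL_SECONDS)

if __name__ == "__main__":
    main()
```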
Posted Jun 14, 2023 - 15:00 UTC
Resolved
This incident has been resolved.
Posted Jun 12, 2023 - 08:15 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Jun 12, 2023 - 08:12 UTC
Identified
The issue has been identified and a fix is being implemented.
Posted Jun 12, 2023 - 07:55 UTC
Investigating
We are still seeing degraded performance for some of our clients.
Posted Jun 12, 2023 - 07:45 UTC
Monitoring
The issue is mitigated and we're now monitoring it.
Posted Jun 12, 2023 - 07:35 UTC
Identified
We've identified the cause of the problem and are working on it.
Posted Jun 12, 2023 - 07:32 UTC
Investigating
We are currently experiencing degraded performance on app.napta.io; our team is looking into this.