Degraded performance

Incident Report for Napta

Postmortem

  • At 9:04 AM CEST, we received an alert about the response time of our backend tasks. Our analysis showed that the daily scaling of our infrastructure had not run, so there were not enough backend tasks to handle the load.
  • At 9:34 AM CEST, the number of backend tasks was back to its nominal value.
  • At 9:41 AM CEST, we received an alert about high CPU usage on one of our database servers. At 9:50 AM CEST, our analysis showed that this was caused by the unintended activation of a feature deployed to production on Friday, June 9 at 1:00 PM CEST. We developed a fix and, in parallel with its deployment, applied a corrective action to the affected client's database to limit the impact of the issue.
  • At 10:15 AM CEST, CPU usage on the affected database server was back to its nominal level, and both incidents were resolved.

Following these incidents, we implemented the following actions:

  • Set up an alert that monitors the number of active backend tasks in real time
  • Integrated alerts into our day-to-day tools so that we can react faster
Posted Jun 14, 2023 - 15:00 UTC

Resolved

This incident has been resolved.
Posted Jun 12, 2023 - 08:15 UTC

Monitoring

A fix has been implemented and we are monitoring the results.
Posted Jun 12, 2023 - 08:12 UTC

Identified

The issue has been identified and a fix is being implemented.
Posted Jun 12, 2023 - 07:55 UTC

Investigating

We are still seeing degraded performance for some of our clients.
Posted Jun 12, 2023 - 07:45 UTC

Monitoring

The issue is mitigated and we're now monitoring it.
Posted Jun 12, 2023 - 07:35 UTC

Identified

We've identified the cause of the problem and are working on it.
Posted Jun 12, 2023 - 07:32 UTC

Investigating

We are currently experiencing degraded performance on app.napta.io. Our team is looking into this.
Posted Jun 12, 2023 - 07:30 UTC
This incident affected: Application.