Downtime Incident

📋 What You Need Before Starting

Make sure these are ready:
  • Incident.io Setup: For managing incidents.
  • ClickStack: For checking logs and errors.
  • Checkly Debugging: For testing and monitoring.

🚨 Stay Calm and Take Action

Don’t panic! Follow these steps to fix the issue.
  1. Tell Your Users:
    • Let your users know there’s an issue. Post on Community and Discord.
    • Example message: “We’re looking into a problem with our services. Thanks for your patience!”
  2. Find Out What’s Wrong:
    • Gather details. What’s not working? When did it start?
  3. Update the Status Page:
    • Use Incident.io to update the status page. Set it to “Investigating” or “Partial Outage”.

๐Ÿ” Check for Infrastructure Problems

  1. Look at DigitalOcean:
    • Check if the CPU, memory, or disk usage is too high.
    • If it is:
      • Temporarily increase the machine size to relieve the load.
      • Keep looking for the root cause.
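The "too high" check above can be sketched as a small threshold comparison. This is only an illustration: the limits, `find_hot_resources`, and `read_local_usage` are hypothetical names and values, not part of DigitalOcean's tooling, and the DigitalOcean graphs remain the source of truth. The local reader uses only the Python standard library, so it reports load per core and disk usage but not memory.

```python
import os
import shutil

# Assumed alert thresholds (fractions of capacity); tune them for your machines.
LIMITS = {"cpu": 0.90, "memory": 0.90, "disk": 0.85}


def find_hot_resources(usage, limits=LIMITS):
    """Return the names of resources whose usage is at or above its limit."""
    return sorted(
        name for name, value in usage.items()
        if name in limits and value >= limits[name]
    )


def read_local_usage(path="/"):
    """Rough local readings; os.getloadavg() is only available on Unix."""
    disk = shutil.disk_usage(path)
    load_1min, _, _ = os.getloadavg()
    return {
        # 1-minute load average per core, used here as a proxy for CPU pressure.
        "cpu": load_1min / (os.cpu_count() or 1),
        "disk": disk.used / disk.total,
    }
```

Running `find_hot_resources(read_local_usage())` on the affected machine returns the resources that crossed the assumed limits. An empty list suggests the problem is somewhere else.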

📜 Check Logs and Errors

  1. Use ClickStack:
  2. Check Sentry:
    • Look for grouped errors (errors that happen a lot).
    • Try to reproduce the error and fix it if possible.
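To make "grouped errors" concrete, here is a minimal sketch that counts error events by type and message and lists the most frequent groups. Sentry does its own grouping; the event shape and the `top_error_groups` helper below are assumptions for illustration only, for example when working from exported or copied error data.

```python
from collections import Counter


def top_error_groups(events, limit=3):
    """Count events by (error type, message) and return the most frequent groups."""
    counts = Counter((event["type"], event["message"]) for event in events)
    return counts.most_common(limit)


# Hypothetical sample data, not real Sentry output.
sample = [
    {"type": "TimeoutError", "message": "db query timed out"},
    {"type": "KeyError", "message": "'flow_id'"},
    {"type": "TimeoutError", "message": "db query timed out"},
]

for (error_type, message), count in top_error_groups(sample):
    print(f"{count}x {error_type}: {message}")
```

The group at the top of the list is usually the best one to try to reproduce first.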

๐Ÿ› ๏ธ Debugging with Checkly

  1. Check Checkly Logs:
    • Watch the video recordings of failed checks to see what went wrong.
    • If the issue is a timeout, it might mean there’s a bigger performance problem.
    • If it’s an E2E test failure due to UI changes, it’s likely not urgent.
      • Update the test to match the new UI and the check should pass again.
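The triage rule above (timeouts may be serious, E2E failures from UI changes usually are not) can be written down as a rough sketch. The function name, the `check_type` values, and the keyword hints are all assumptions; Checkly does not provide this classification, and real error messages may look different.

```python
# Assumed keywords that hint at a UI change in an E2E failure message.
UI_CHANGE_HINTS = ("selector", "locator", "element")


def triage_check_failure(check_type, error_message):
    """Map a failed check to a rough urgency, following the guidance above."""
    message = error_message.lower()
    # Timeouts are checked first so that a possible performance problem is never downgraded.
    if "timeout" in message or "timed out" in message:
        return "urgent: possible performance problem"
    if check_type == "e2e" and any(hint in message for hint in UI_CHANGE_HINTS):
        return "not urgent: likely a UI change, update the test"
    return "unknown: investigate further"
```

A timeout while waiting for a page element can be either a slow page or a changed UI, so this sketch deliberately errs on the urgent side. The video recording is what settles it.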

🚨 When Should You Ask for Help?

Ask for help right away if:
  • Flows are failing.
  • The whole platform is down.
  • There’s a lot of data loss or corruption.
  • You’re not sure what is causing the issue.
  • You’ve spent more than 5 minutes and still don’t know what’s wrong.
💡 How to Ask for Help:
  • Use Incident.io to create a critical alert.
  • Go to the Slack incident channel and escalate the issue to the engineering team.
If you’re unsure, ask for help! It’s better to be safe than sorry.
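The escalation checklist above can be expressed as a small decision helper. This is a sketch under stated assumptions: `IncidentState`, its fields, and `escalation_reasons` are hypothetical names, and the helper only reports reasons. It does not call Incident.io or Slack.

```python
from dataclasses import dataclass

MAX_MINUTES_WITHOUT_DIAGNOSIS = 5  # taken from the checklist above


@dataclass
class IncidentState:
    flows_failing: bool = False
    platform_down: bool = False
    data_loss: bool = False
    unsure_of_cause: bool = False
    minutes_without_diagnosis: float = 0.0


def escalation_reasons(state):
    """Return every checklist condition that currently applies."""
    reasons = []
    if state.flows_failing:
        reasons.append("flows are failing")
    if state.platform_down:
        reasons.append("platform is down")
    if state.data_loss:
        reasons.append("data loss or corruption")
    if state.unsure_of_cause:
        reasons.append("cause is unknown")
    if state.minutes_without_diagnosis > MAX_MINUTES_WITHOUT_DIAGNOSIS:
        reasons.append(f"no diagnosis after {MAX_MINUTES_WITHOUT_DIAGNOSIS} minutes")
    return reasons
```

Any non-empty result means it is time to escalate, and the listed reasons can be pasted straight into the incident channel.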

💡 Helpful Tips

  1. Stay Organized:
    • Keep a list of steps to follow during downtime.
    • Write down everything you do so you can refer to it later.
  2. Communicate Clearly:
    • Keep your team and users updated.
    • Use simple language in your updates.
  3. Take Care of Yourself:
    • If you feel stressed, take a short break. Grab a coffee ☕, take a deep breath, and tackle the problem step by step.
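For the "Stay Organized" tip, one lightweight way to write down everything you do is a timestamped action log. The `IncidentLog` class below is a hypothetical helper, shown only as a sketch. A shared document or the incident channel works just as well.

```python
from datetime import datetime, timezone


class IncidentLog:
    """Append-only, timestamped notes of what was done during an incident."""

    def __init__(self):
        self.entries = []

    def note(self, action, when=None):
        # Default to the current time in UTC so entries from different people line up.
        when = when or datetime.now(timezone.utc)
        self.entries.append((when, action))

    def render(self):
        return "\n".join(f"{when:%H:%M} UTC  {action}" for when, action in self.entries)
```

Calling `log.note("Posted notice on Discord")` after each step gives you a timeline to refer to later, and `log.render()` produces text that can be pasted into the incident write-up.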