What You Need Before Starting
Make sure these are ready:
- Incident.io Setup: For managing incidents.
- ClickStack: For checking logs and errors.
- Checkly Debugging: For testing and monitoring.
Stay Calm and Take Action
Don't panic! Follow these steps to fix the issue.
- Tell Your Users:
  - Let your users know there's an issue. Post on Community and Discord.
  - Example message: "We're looking into a problem with our services. Thanks for your patience!"
- Find Out What's Wrong:
  - Gather details. What's not working? When did it start?
- Update the Status Page:
  - Use Incident.io to update the status page. Set it to "Investigating" or "Partial Outage".
Check for Infrastructure Problems
- Look at DigitalOcean:
  - Check if the CPU, memory, or disk usage is too high.
  - If it is:
    - Increase the machine size temporarily to fix the issue.
    - Keep looking for the root cause.
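The check above can be sketched as a small helper that compares usage readings against limits. This is only an illustration: the threshold values, the `find_overloaded` name, and the sample readings are assumptions, not values from DigitalOcean or from this runbook. Copy the real numbers from the DigitalOcean graphs and adjust the limits to your own machines.

```python
# Hypothetical limits in percent; tune them to your own machines.
THRESHOLDS = {"cpu": 85.0, "memory": 90.0, "disk": 90.0}


def find_overloaded(usage, thresholds=THRESHOLDS):
    """Return the sorted names of resources whose usage exceeds its limit."""
    return sorted(
        name
        for name, limit in thresholds.items()
        if usage.get(name, 0.0) > limit
    )


# Made-up readings, as if copied by hand from the monitoring graphs.
readings = {"cpu": 97.0, "memory": 62.5, "disk": 91.2}

overloaded = find_overloaded(readings)
if overloaded:
    print("Too high:", ", ".join(overloaded))
    print("Consider a temporary resize, then keep looking for the root cause.")
else:
    print("Resource usage looks normal; the problem is probably elsewhere.")
```

A resize only buys time. The helper deliberately reports which resource is saturated so that the root-cause search can start there.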
Check Logs and Errors
- Use ClickStack to check logs and errors.
- Check Sentry:
  - Look for grouped errors (errors that happen a lot).
  - Try to reproduce the error and fix it if possible.
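Sentry groups errors for you, but the same idea can be applied to log lines you have exported yourself. The sketch below counts events by error type and message to surface the most frequent groups. It does not call any Sentry or ClickStack API; the `top_error_groups` name, the `type` and `message` field names, and the sample events are all assumptions made for illustration.

```python
from collections import Counter


def top_error_groups(events, limit=3):
    """Count events by (type, message) and return the most frequent groups."""
    counts = Counter((event["type"], event["message"]) for event in events)
    return counts.most_common(limit)


# Made-up sample events; real exports will have different fields.
events = [
    {"type": "TimeoutError", "message": "upstream timed out"},
    {"type": "KeyError", "message": "user_id"},
    {"type": "TimeoutError", "message": "upstream timed out"},
    {"type": "TimeoutError", "message": "upstream timed out"},
]

for (error_type, message), count in top_error_groups(events):
    print(f"{count}x {error_type}: {message}")
```

Start with the group at the top of the list: the error that happens most often during the incident window is usually the best candidate to reproduce first.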
Debugging with Checkly
- Check Checkly Logs:
  - Watch the video recordings of failed checks to see what went wrong.
  - If the issue is a timeout, it might mean there's a bigger performance problem.
  - If it's an E2E test failure due to UI changes, it's likely not urgent.
    - Fix the test and the issue will go away.
When Should You Ask for Help?
Ask for help right away if:
- Flows are failing.
- The whole platform is down.
- There's a lot of data loss or corruption.
- You're not sure what is causing the issue.
- You've spent more than 5 minutes and still don't know what's wrong.
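The escalation rules above can be written down as a simple checklist function. This is a minimal sketch, assuming you track the incident in a small record; the `IncidentState` fields and the `escalation_reasons` name are hypothetical, and only the conditions and the 5-minute limit come from the list above.

```python
from dataclasses import dataclass

ESCALATION_MINUTES = 5  # time limit taken from the list above


@dataclass
class IncidentState:
    """Hypothetical snapshot of what you know about the incident so far."""
    flows_failing: bool = False
    platform_down: bool = False
    major_data_loss: bool = False
    cause_known: bool = True
    minutes_spent: float = 0.0


def escalation_reasons(state):
    """Return every reason to ask for help now; an empty list means none."""
    reasons = []
    if state.flows_failing:
        reasons.append("flows are failing")
    if state.platform_down:
        reasons.append("the whole platform is down")
    if state.major_data_loss:
        reasons.append("a lot of data loss or corruption")
    if not state.cause_known:
        reasons.append("cause is unknown")
        if state.minutes_spent > ESCALATION_MINUTES:
            reasons.append("more than 5 minutes without a diagnosis")
    return reasons


state = IncidentState(cause_known=False, minutes_spent=7)
reasons = escalation_reasons(state)
if reasons:
    print("Ask for help now:", "; ".join(reasons))
```

Any single reason is enough to escalate. Returning the full list makes it easy to paste the reasons into the message you send when asking for help.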
How to Ask for Help:
- Use Incident.io to create a critical alert.
- Go to the Slack incident channel and escalate the issue to the engineering team.
If you're unsure, ask for help! It's better to be safe than sorry.
Helpful Tips
- Stay Organized:
  - Keep a list of steps to follow during downtime.
  - Write down everything you do so you can refer to it later.
- Communicate Clearly:
  - Keep your team and users updated.
  - Use simple language in your updates.
- Take Care of Yourself:
  - If you feel stressed, take a short break. Grab a coffee, take a deep breath, and tackle the problem step by step.