Paul Zaich from Checkr tells us about a critical outage that occurred, what caused it and how they tracked down and fixed the issue. The conversation ranges through troubleshooting complex systems, building team culture, blameless post-mortems, and monitoring the right things to make sure your applications don’t fail or alert you when they do.
Panel
- Charles Max Wood
- Dave Kimura
- Luke Stutters
Guest
- Paul Zaich
Links
Picks
- Blood Pressure Monitor – Dave
- eft – Luke
- Ruby one-liners cookbook – Paul
- Podcast Growth Summit – Chuck
- Most Valuable Dev – Chuck
- Most Valuable Dev Summit – Chuck
- Mushroom Wars – Chuck
- Gmelius – Chuck
Podcast: Play in new window | Download