DevOps 007: Monitoring in a Technical Environment



  • Nell Shamrell-Harrington
  • Scott Nixon

Episode Summary

In this episode of the Adventures in DevOps podcast, panelists Nell Shamrell-Harrington and Scott Nixon talk about monitoring in the software world. They start by distinguishing monitoring from alerting. They discuss how logging fits into monitoring, the two main types of logs (structured and unstructured), log management in a DevOps environment, information storage, log parsing, and log aggregation. They list the two major kinds of monitoring software, pull and push. Nell explains what the terms mean and how each works, and Scott gives examples of each, including syslog and health checks.

They then talk about what it means for a system to be "working" and, consequently, when something can be considered non-functional. This leads to the important question of what exactly should be monitored. They explain how far one should go with monitoring and how to determine the significance of events in general. They discuss some concepts from Mike Julian's book "Practical Monitoring", including anti-patterns such as tool obsession, what not to do in monitoring, and the fact that businesses need to customize their systems based on what works for them. They talk about the tool Nagios, the benefits of using the default monitoring tools that cloud platforms provide natively, using monitoring as a crutch, and manual configuration. They then discuss some good practices, namely composable monitoring, performance monitoring from the users' perspective, the mantra "buy, not build", and continual improvement. They briefly touch on security in monitoring and wrap up the episode with picks.



Nell Shamrell-Harrington:

Scott Nixon: