Erik and Darby have been reviewing our current monitoring system (Nagios) and drafting a strategy for the future of our monitoring and notification. This week, they presented a draft of the strategy document to the Production Services team. The doc still needs a few edits, so a final version may be available for distribution in another week or so.
This is excellent work, and I thank Erik and Darby for their efforts on this project. In the spirit of Simplify, Standardize, Automate, it's important for us to re-examine the tools we use. Are we using the right tools? How can we improve what we do? This system monitoring strategy should help us improve system reliability.