The Problem with MTTR: Learning from Incident Reports
devinterrupted.substack.com
Tracking Mean Time To Restore (MTTR) is standard industry practice for incident response and analysis, but should it be? Courtney Nash, an Internet Incident Librarian, argues that MTTR is not a reliable metric - and we think she's got a point. We caught up with Courtney at the DevOps Enterprise Summit in Las Vegas, where she was making her case against MTTR in favor of alternative metrics (SLOs and cost of coordination data), practices (Near Miss analysis), and mindsets (humans are the solution, not the problem) to help organization better learn from their incidents.
The Problem with MTTR: Learning from Incident Reports
The Problem with MTTR: Learning from Incident…
The Problem with MTTR: Learning from Incident Reports
Tracking Mean Time To Restore (MTTR) is standard industry practice for incident response and analysis, but should it be? Courtney Nash, an Internet Incident Librarian, argues that MTTR is not a reliable metric - and we think she's got a point. We caught up with Courtney at the DevOps Enterprise Summit in Las Vegas, where she was making her case against MTTR in favor of alternative metrics (SLOs and cost of coordination data), practices (Near Miss analysis), and mindsets (humans are the solution, not the problem) to help organization better learn from their incidents.