The Problem with MTTR: Learning from Incident Reports

Feb 07, 2023

Tracking Mean Time To Restore (MTTR) is standard industry practice for incident response and analysis, but should it be?

Courtney Nash, an Internet Incident Librarian, argues that MTTR is not a reliable metric, and we think she's got a point.

We caught up with Courtney at the DevOps Enterprise Summit in Las Vegas, where she was making her case against MTTR in favor of alternative metrics (SLOs and cost of coordination data), practices (Near Miss analysis), and mindsets (humans are the solution, not the problem) to help organizations learn better from their incidents.

Episode Highlights:

(1:54) The end of MTTR?
(4:50) Library of incidents
(13:20) What is an incident?
(19:41) Cost of coordination
(22:13) Near misses
(24:21) Mental models
(28:16) Role of language in shaping public discourse
(29:33) Learnings from The Void


On Feb. 15th, learn how the smartest engineers increased productivity by up to 30% without working longer hours or burning out their teams

Every engineering team could use some easy wins right now, including yours. We’re talking quick ways to improve productivity without wasting more hours at work, spending a lot of money or burning out already-stretched teams.

That’s why LinearB teamed up with dozens of engineering orgs to crowdsource the real ways smart companies have improved their productivity quickly and, most importantly, easily.

On February 15th, LinearB will be presenting this playbook at our free, virtual Scaling Developer Efficiency workshop.

Designed to help you increase productivity by up to 30%, this free online workshop will empower your team with strategies and solutions that let them focus on what they do best: code.


Originally published at Dev Interrupted
