What Meta’s diffs per developer metric revealed about engineering at scale
What happens when effort metrics don’t tell the full story?
A guest article by James Everingham, CEO and co-founder of Guild.ai, a platform in stealth where developers build, reuse, and evolve intelligent software together. Join the waitlist today.
When I walked back into Meta for my second tour at the company, I expected the usual re-entry: relearning systems, reconnecting with people, and getting a feel for what had changed since I’d last been there. This time, I had joined DevInfra, the team responsible for internal developer workflows, and I assumed the first few weeks would be about understanding a new organization and its stack.
Instead, what I ran into first was a number. DDM: diffs per developer per month. It showed up everywhere. In reviews. In planning conversations. In leadership updates.
Before COVID, DDM hovered at a familiar level. During COVID, it dropped sharply. Leadership wanted to get DDM back to pre-COVID levels, which made sense at the time. In the middle of a global shock, everyone was looking for stable markers in an unstable moment. Seeing a key productivity metric fall was worrying, and of course people wanted to understand the drop and respond.
The gravity of work changed, the metric didn’t
But as I started digging, it became clear that DDM was reflecting more than effort. The codebase had grown massively, and the org had grown too. More engineers were working in the same areas, systems were more tightly coupled, and expectations around privacy, security, and safety were higher across the board. The work itself had gotten heavier. A simple diff in 2019 was not the same unit of work a few years later. The metric hadn’t changed, but the reality behind DDM had.
At some point, I started using a simple analogy to explain what was happening: Imagine you ran a sub-five-minute mile on Earth. That’s an extraordinary achievement. Now imagine someone asks why you can’t run the same mile on Jupiter. It’s the same runner, the same stopwatch, but completely different gravity.
This isn’t a Meta-specific issue; it’s a challenge that every large, fast-moving engineering organization faces. The environment changes faster than the metrics meant to capture it. The unit of work gets heavier, but the chart keeps treating every diff as equal.
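To make the "every diff counts as one" problem concrete, here is a minimal sketch. The weighting scheme and field names are entirely hypothetical, invented for illustration; this is not Meta's actual methodology. It contrasts a raw diffs-per-developer count with a variant that scales each change by a crude effort proxy:

```python
# Hypothetical illustration: raw DDM treats every diff as one unit,
# while a weighted variant acknowledges that some changes are "heavier".
# The lines_touched and coupling_factor fields are invented for this sketch.

def raw_ddm(diffs, developers):
    """Diffs per developer per month: every diff counts as exactly 1."""
    return len(diffs) / developers

def weighted_ddm(diffs, developers):
    """Same population of diffs, each scaled by a rough complexity signal."""
    total = sum(d["lines_touched"] * d["coupling_factor"] for d in diffs)
    return total / developers

# Two months with the SAME number of diffs, but heavier work in the second
# (more tightly coupled systems, stricter privacy/safety constraints, etc.).
month_2019 = [{"lines_touched": 50, "coupling_factor": 1.0} for _ in range(100)]
month_2023 = [{"lines_touched": 50, "coupling_factor": 2.5} for _ in range(100)]

print(raw_ddm(month_2019, 10))       # 10.0
print(raw_ddm(month_2023, 10))       # 10.0  -- identical on the chart
print(weighted_ddm(month_2019, 10))  # 500.0
print(weighted_ddm(month_2023, 10))  # 1250.0 -- the "gravity" the raw count misses
```

The point isn't that a coupling factor is the right correction; any single weight would have its own Goodhart problems. It's that a flat count is blind, by construction, to the work getting heavier.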
And once DDM became a target, as happens with almost any widely adopted metric, something predictable happened. Goodhart’s Law kicked in: when a measure becomes a target, it ceases to be a good measure.
People naturally started paying attention to what the system rewarded. It became tempting to slice work thinner, to favor low-risk changes that count, and to avoid the gnarly, high-leverage problems that take time and don’t show up cleanly in a metric. Not because anyone set out to game the system in some cartoonish way, but because when a number becomes important, systems shape behavior.
Motion vs. outcomes
Over time, a shift followed, not because anyone intended it, but because people adapt quickly to whatever the system emphasizes. No one wants to be on the wrong side of the chart. Big swings feel risky. Safe, visible activity starts to feel safer. And you can see how that slowly hardens into a toxic pattern: optimizing for motion rather than outcomes, for being seen as productive rather than doing the work that actually moves the business.
Even if you’ve never worked at Meta, you’ve still seen this movie. Swap DDM for tickets closed, story points burned, PRs merged, deployments per week, or cycle time taken out of context; pick your flavor. Every org has a number that’s convenient to count, and every org is tempted to turn that number into a proxy for value. The trap is always the same: the system evolves, the metric stays frozen, and leaders end up telling themselves a simple story that no longer matches how the work actually gets done.
Build a culture that reinforces positive behaviors
So what do you do if you’re a technology leader? For me, the shift is this: treat simple productivity metrics as clues, not verdicts. When a number drops, start by asking what changed in the codebase, the architecture, the risk profile, or the constraints around safety and privacy. Where did the work get heavier in ways the metric can’t see?
Then pair output metrics with questions about flow and impact.
How hard is it to get a straightforward change into production?
Where are reviews or builds slowing people down?
What work is actually improving the product, the customer experience, or the business?
Leadership is still a huge lever, but mostly in how you choose and interpret metrics, and whether you reward teams for chasing counts or for creating real outcomes in a system that’s constantly changing.
In an environment where the gravity keeps increasing, the real test isn’t whether you can hold a number steady; it’s whether you notice when the gravity has changed and adjust what you measure and celebrate. Because in the end, teams follow the work their leaders choose to see.
James is the CEO and co-founder of Guild.ai, a company currently in stealth, building a community where developers build, reuse, and evolve intelligent software together. He and his team built the first internal AI agent at Meta, which delivered immediate impact on how Meta engineers use AI tooling. During his time at Meta, James co-created Diem and served as Instagram’s Head of Engineering.



