Core 4 fails the vanity metric test
The consultants are back with yet another measurement framework
I wonder how much better off we might be if we focused less on creating academic frameworks like Core 4. After all, we already have proven methodologies like DORA and SPACE that address the fundamental aspects of engineering productivity. Why do we need another framework that repackages existing metrics while adding problematic new ones?
The answer is that there's money to be made in creating proprietary measurement systems that require expensive consulting engagements and specialized tooling to implement. However, a troubling pattern emerges when you apply rigorous evaluation criteria to Core 4's metrics. The framework’s additions to the productivity measurement space fail basic tests for meaningful measurement.
I think it's time for the engineering community to move beyond frameworks and focus on the things that improve developers' lives. So if you're an engineering leader thinking about purchasing a tool to improve developer productivity or DevEx, or beginning an engineering metrics program, this article will cover the critical pitfalls to avoid.
Understanding the difference between valuable and vanity metrics
Before we dive into Core 4 specifically, let's establish what makes a metric valuable versus vanity. I’m a big fan of John Cutler's approach to defining vanity metrics: he classifies them as metrics that "make us feel good but don't help us do better work or make better decisions." He outlines eight criteria that separate meaningful metrics from vanity metrics:
Actionable: The team can directly influence the metric through their work
Timely/Responsive: Changes in the metric quickly reflect actual changes in behavior or performance
Contextual/Normalized: The metric accounts for relevant variables and doesn't oversimplify
Aligned with Goals: The metric reflects the team or organization's strategic objectives
Unambiguous: Easy to interpret; people don't argue about what it means
Resistant to Gaming: Hard to manipulate without improving underlying performance
Drives Right Behavior: Encourages collaboration, long-term thinking, and value creation
Comparability: Can be meaningfully compared across teams or periods
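To make the rubric concrete, here's a minimal sketch of how you might score a candidate metric against Cutler's eight criteria. The criterion names and the example judgments are my own illustrative shorthand, not part of any real tool or of Cutler's writing:

```python
# Hypothetical rubric: score a metric against Cutler's eight criteria.
# Criterion keys and example judgments are illustrative assumptions.

CRITERIA = [
    "actionable", "timely", "contextual", "aligned",
    "unambiguous", "gaming_resistant", "drives_right_behavior", "comparable",
]

def vanity_score(judgments: dict) -> tuple:
    """Return (number of failed criteria, list of the failures)."""
    failures = [c for c in CRITERIA if not judgments.get(c, False)]
    return len(failures), failures

# Example: a metric judged to pass only "timely" and "unambiguous".
prs_per_engineer = {c: False for c in CRITERIA}
prs_per_engineer.update(timely=True, unambiguous=True)

n_failed, failed = vanity_score(prs_per_engineer)
```

The judgments themselves are still subjective, of course; the point of writing them down is that the team argues once about each criterion instead of endlessly about a composite score.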
The results don't look terrific when we apply these criteria to Core 4's metrics. The framework is filled with metrics that fail multiple criteria, while it borrows its most valuable metrics from existing frameworks like DORA and SPACE.
Let’s break down how Core 4’s additions to the developer productivity discussion fail to meet engineering leadership needs.
The pitfalls of Core 4's key metrics
Developer Experience Index™ (DXI) is the ultimate black box
Perhaps the most egregious example in Core 4 is the Developer Experience Index™: a proprietary composite metric based on survey responses that fails virtually every criterion for meaningful measurement:
Actionable: Black box proprietary formula makes it impossible to know what actions will improve it.
Timely/responsive: Survey-based, so changes lag behind actual improvements.
Contextual/normalized: Composite metric obscures important contextual factors.
Aligned with goals: Unclear what specific goals it measures due to proprietary nature.
Unambiguous: Proprietary formula makes interpretation impossible.
Resistant to gaming: Unknown methodology makes gaming unpredictable but possible.
Drives good behavior: Can't drive behavior when the formula is unknown.
Comparability: Proprietary benchmarking against unspecified peer groups.
This metric is precisely the opaque, proprietary solution that puts engineering leaders on the defensive. When executives ask how improving your DXI score will drive bottom-line growth, what's your answer? "Well, we don't know how it's calculated, but the consultants say it's important."
Revenue per engineer fails every vanity test
Core 4's inclusion of revenue per engineer is the most telling indicator of the framework's fundamental misunderstanding of engineering productivity. This metric fails every single criterion:
Actionable: Engineers don't control pricing, customer mix, or sales strategy.
Timely/responsive: Revenue is a lagging indicator influenced by many external factors.
Contextual/normalized: Ignores differences in product type, company stage, and monetization strategy.
Aligned with goals: Incentivizes headcount reduction over value creation.
Unambiguous: Multiple definitions of revenue (ARR, GAAP, gross vs. net) and engineer roles create ambiguity.
Resistant to gaming: Easily gamed by reducing headcount or reclassifying roles.
Drives good behavior: Promotes fear and internal competition rather than collaboration.
Comparability: Meaningless across different business models and contexts.
PRs per engineer is the easiest metric to game
Another problematic metric is PRs per engineer (branded as diffs per engineer in Core 4), which consistently fails at driving meaningful behavior. It succeeds at being timely and unambiguous, but it fails at every other requirement:
Actionable: Engineers can easily game this by submitting smaller, more frequent PRs without improving actual productivity.
Contextual/normalized: Ignores PR complexity, code quality, review time, or business value delivered.
Aligned with goals: May incentivize quantity over quality, leading to technical debt and poor code practices.
Resistant to gaming: Trivial to fake by generating purposeless PRs.
Drives good behavior: Encourages rushed, fragmented work rather than thoughtful development.
Comparability: Meaningless across teams with different codebases, complexity levels, or development practices.
This metric is the epitome of effort-based measurement: it rewards activity while missing the point of productivity entirely.
The problem with subjective perception metrics
Core 4 heavily relies on subjective perception metrics like perceived rate of delivery, ease of delivery, and perceived software quality. While developer sentiment certainly matters, using perception as a proxy for measurable outcomes is problematic:
Timely/Responsive: Perception can lag behind reality or be influenced by external factors unrelated to objective performance.
Contextual/Normalized: It’s heavily dependent on individual bias, team dynamics, and personal expectations.
Unambiguous: Subjective measures are interpreted differently by different people.
Resistant to Gaming: Perception can be influenced by managing expectations and communication style rather than actual improvement.
Your most vocal critics might be your highest performers, while your quietest team could be struggling but afraid to speak up. Using subjective composite metrics as a proxy for productivity introduces risks that might cause you to misidentify where to focus improvements.
The proven effectiveness of DORA and SPACE metrics
Core 4's most valuable metrics aren't new: they're established measures, particularly from DORA, backed by years of research, and they generally pass the vanity test:
Lead time is actionable (teams can improve processes), timely (reflects improvements quickly), and resistant to gaming (hard to fake without real improvements).
Deployment frequency encourages better practices like smaller batch sizes and efficient automation.
Change failure rate and failed deployment recovery time drive investment in testing, monitoring, incident response, and overall software quality.
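One reason the DORA metrics pass the transparency test is that anyone can compute them from deployment records. Here's a minimal sketch under an assumed schema (the field names and sample data are mine, not from DORA's own tooling):

```python
from datetime import datetime
from statistics import median

# Assumed minimal schema: when each change was committed, when it shipped,
# whether the deployment failed, and how long recovery took.
deployments = [
    {"committed": datetime(2024, 5, 1, 9),  "deployed": datetime(2024, 5, 2, 9),
     "failed": False, "recovery_hours": 0.0},
    {"committed": datetime(2024, 5, 2, 10), "deployed": datetime(2024, 5, 2, 15),
     "failed": True,  "recovery_hours": 1.5},
    {"committed": datetime(2024, 5, 3, 9),  "deployed": datetime(2024, 5, 3, 11),
     "failed": False, "recovery_hours": 0.0},
]

def dora_metrics(deploys, window_days=7):
    """Compute the four DORA keys over a reporting window."""
    lead_times = [(d["deployed"] - d["committed"]).total_seconds() / 3600
                  for d in deploys]
    failures = [d for d in deploys if d["failed"]]
    return {
        "lead_time_hours_median": median(lead_times),
        "deploys_per_week": len(deploys) / (window_days / 7),
        "change_failure_rate": len(failures) / len(deploys),
        "recovery_hours_median": (median(d["recovery_hours"] for d in failures)
                                  if failures else 0.0),
    }

m = dora_metrics(deployments)
```

Twenty lines of readable code, no vendor, no black box: that's exactly the property the DXI lacks.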
Further, SPACE already encompasses developer satisfaction, well-being, and perceived productivity. Core 4 took these proven metrics and layered problematic new ones on top, creating a framework that's worse than the sum of its parts. Read my Core 4 vanity metric analysis for more information about the good and bad aspects of Core 4.
Six critical productivity framework pitfalls to avoid
1. Don't fall into the McKinsey-esque effort-based metrics trap
Metrics like PRs per engineer represent the worst kind of effort-based measurement because they treat developers like factory workers who are measured on their ability to churn out widgets. This approach encourages stack-ranking developers based on activity rather than value creation, which is a culture killer that drives away your best talent. The consulting world loves effort-based metrics because they're easy to measure and create the illusion of objectivity. However, measuring effort without measuring outcomes is like judging a chef by how many ingredients they use rather than how the food tastes.
Elite engineering teams at major tech companies sometimes use effort metrics for two big reasons:
They are conducting controlled experiments in which they have isolated all other factors that might impact productivity, so PRs per engineer can serve as a proxy for velocity.
They have an extremely high degree of trust that the metric won’t be gamed because it is not factored into the overall productivity analysis.
However, the typical engineering team doesn't have the resources for the first scenario, and it's extremely difficult to build the culture and psychological safety needed for the second. For most organizations, PRs per engineer is a vanity metric at best and a dangerous distraction at worst.
Instead, the healthiest engineering cultures focus on measuring the outcomes that matter: code quality, business impact, customer value, and team collaboration. By aligning metrics with the organization's real goals, leaders build teams that are empowered to solve problems, not just check boxes. In other words, don’t optimize for activity; optimize for results. That’s how you attract, retain, and unlock the full potential of great engineers.
2. Don't bet success on opaque, proprietary metrics
When you can't explain how you calculated a metric, you can't explain what actions will improve it. This creates a dependency on specific vendors and their consulting services, which is exactly the opposite of what a measurement framework should do. Meaningful metrics should be transparent, reproducible, and vendor-agnostic, unlike the Developer Experience Index™. If your productivity measurement requires a specific tool to calculate, you're not measuring productivity; you're measuring vendor lock-in.
Instead, build your measurement program on transparent, well-defined metrics that anyone on your team can understand and act upon. Use established frameworks like DORA metrics (lead time, deployment frequency, change failure rate, recovery time) that have clear definitions and proven correlations with organizational performance. When adopting a metric, ensure your team can explain how it's calculated and what specific actions will improve it. This transparency builds trust and enables distributed decision-making across your organization.
3. Don't buy into perception metrics for quantifiable things
While developer sentiment matters, particularly when building a DevEx roadmap, using perception as a proxy for measurable outcomes is problematic. Perceived rate of delivery doesn't tell you whether you're actually delivering faster; it tells you whether people feel like you are. Perception metrics are subject to bias, timing effects, and communication issues. They should complement objective measurements, not replace them. When executives ask about productivity improvements, they want data, not feelings.
Instead, use perception metrics to complement objective measurements, not replace them. Measure actual deployment frequency alongside subjective data about sources of developer friction. Track real lead times while also surveying developer satisfaction with delivery processes. This dual approach helps you identify disconnects between reality and perception, which often reveal communication gaps or hidden friction points. When executives ask about productivity improvements, lead with the objective data and use perception metrics to provide context about adoption and team sentiment.
4. Don't focus on lagging indicators without clear definitions
Metrics like revenue per engineer and R&D as a percentage of revenue are lagging indicators that lack clear, actionable definitions. What counts as revenue? GAAP? ARR? Gross vs. net? What constitutes R&D? These ambiguities make the metrics meaningless for decision-making. Lagging indicators have their place, but they should be used to validate improvements, not drive them. By the time these metrics move, you've already missed opportunities to course-correct.
Instead, focus on leading indicators that can guide immediate action. Identify the indicators that predict revenue impact: feature adoption rates, customer satisfaction scores, or time-to-market for new capabilities. These provide early signals to course-correct.
5. Don't frame engineering as a cost center
Core 4's emphasis on metrics like revenue per engineer perpetuates the harmful view of engineering as a cost center rather than a strategic investment. This framing encourages cost-cutting over capability-building. Engineering should be viewed as an investment in the organization's future capacity to deliver value. Metrics should reflect this investment mindset, focusing on capability development, innovation potential, and strategic impact rather than cost efficiency.
Instead, adopt metrics that reflect engineering's strategic value and investment potential. Track your overall investments into DevEx, maintenance and upkeep, and new feature innovation to ensure long-term sustainable engineering success. Frame engineering discussions around building organizational capabilities rather than reducing costs. This investment mindset helps executives understand that engineering improvements compound over time, creating sustained competitive advantages rather than short-term cost savings.
6. Don't spend all your time measuring instead of improving
The most insidious pitfall is analysis paralysis: spending so much time measuring and collecting data that you never actually improve anything. Metrics dashboards and surveys don't drive improvement; action does. Core 4 perfectly represents this: a complex framework that requires significant investment to implement but provides little clear guidance on what to actually do with the results.
Instead, adopt a bias toward action with lightweight measurement. Start with 3-5 key metrics that directly tie to your biggest pain points. Spend 10% of your time measuring and 90% of your time improving. Set up simple dashboards that highlight trends rather than comprehensive scorecards that require analysis. Most importantly, establish clear action triggers: when review time exceeds X hours, we investigate bottlenecks. When deployment frequency drops below Y per week, we examine our release process. The goal is continuous improvement, not comprehensive measurement.
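Action triggers like these are simple enough to encode directly, so the team debates the thresholds once and then just responds when they fire. A sketch, with hypothetical metric names and threshold values:

```python
# Hypothetical action triggers: metric names, thresholds, and actions are
# illustrative assumptions, not prescribed values.
TRIGGERS = [
    # (metric name, condition on the value, action to take when it fires)
    ("median_review_hours", lambda v: v > 24,   "investigate review bottlenecks"),
    ("deploys_per_week",    lambda v: v < 5,    "examine the release process"),
    ("change_failure_rate", lambda v: v > 0.15, "audit test and rollout coverage"),
]

def fired_actions(snapshot: dict) -> list:
    """Return the actions whose trigger condition this week's snapshot meets."""
    return [action for name, hit, action in TRIGGERS
            if name in snapshot and hit(snapshot[name])]

week = {"median_review_hours": 30,
        "deploys_per_week": 8,
        "change_failure_rate": 0.2}
actions = fired_actions(week)
```

The value isn't in the code; it's in forcing the team to agree, in advance, on what number triggers what action.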
Moving past metric theatrics towards continuous improvement
Core 4's failure to pass this vanity metric test reveals a deeper issue: the consulting industry's incentive to create complex, proprietary solutions to sell rather than to help organizations actually improve. The metrics that work in Core 4 already existed in DORA and SPACE, and the new metrics it introduces are vanity metrics that feel good but don't drive improvement. Even worse, some of Core 4's additions are actively harmful and encourage the wrong behaviors.
Focus on the proven metrics from established frameworks and invest your time and energy in improving development workflows. Take the actions that reduce friction, improve automation, and enhance developer experience. The metrics will follow.
The goal isn't to measure developer productivity perfectly; it's to improve it continuously. That happens through action, not analysis. Stop measuring developer productivity like a black box. Start treating it like the strategic capability it is, and focus on making real improvements that benefit both your developers and your business. The rest is just consultant theater.