The hidden costs of pre-computing data | Chalk's Elliot Marx
Plus, Google’s new moat, talking to spreadsheets, and the 2026 Benchmarks.
Is your engineering team wasting budget and sacrificing latency by pre-computing data that most users never see? Chalk co-founder Elliot Marx joins Andrew Zigler to explain why the future of AI relies on real-time pipelines rather than traditional storage. They dive into solving compute challenges for major fintechs, the value of incrementalism, and why strong fundamental problem-solving skills still beat specific language expertise in the age of AI assistants.
Recorded live at the Engineering Leadership Conference.
1. OpenAI’s “Code Red” moment
OpenAI CEO Sam Altman has declared a “code red” as Google Gemini 3 surges ahead in performance benchmarks, reportedly cancelling planned marketing initiatives in response. Google is finally demonstrating the power of incumbency, proving that their moat doesn’t come from the model alone, but from owning the entire stack. This ranges from the TPUs training the system to the distributed cloud network serving the inference. For engineering leaders, the takeaway is agility: the market is volatile, so design your workflows to be model-agnostic. Don’t get locked in when the leaderboard changes every week.
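"Model-agnostic" can be as simple as routing every call through one thin interface so swapping providers is a config change, not a migration. A minimal sketch, where `fake_openai`, `fake_gemini`, and `complete` are hypothetical stand-ins rather than real SDK calls:

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Completion:
    text: str
    model: str

# Each provider is just a callable; real clients would wrap the vendor SDK.
# These two are illustrative stubs, not actual API integrations.
def fake_openai(prompt: str) -> Completion:
    return Completion(text=f"openai:{prompt}", model="gpt")

def fake_gemini(prompt: str) -> Completion:
    return Completion(text=f"gemini:{prompt}", model="gemini")

PROVIDERS: Dict[str, Callable[[str], Completion]] = {
    "openai": fake_openai,
    "gemini": fake_gemini,
}

def complete(prompt: str, provider: str = "openai") -> Completion:
    """Single entry point: the rest of the codebase never imports a vendor SDK."""
    return PROVIDERS[provider](prompt)

# Switching leaders when the benchmarks shift is one argument, not a rewrite.
a = complete("hello", provider="openai")
b = complete("hello", provider="gemini")
```

The point is the seam, not the stubs: when the leaderboard flips, only the registry changes.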
2. I just want to talk to my spreadsheet
Google has quietly launched Workspace Studio, a platform for creating no-code AI agents that automate workflows across Docs, Sheets, and Drive. While much of the industry focuses on AI for coding, this represents the shift in general knowledge work. It is time to start thinking about how agents can handle the mundane toil of your inbox and data entry, freeing you up for the deep work.
Read: Create AI agents to automate work with Google Workspace Studio
3. Stop measuring engineering like a factory
Many common metrics like velocity, commit frequency, and lines of code are totally disconnected from business outcomes. In this reality check, James J. Boyer argues that you cannot operate software engineering like an assembly line; moving faster doesn’t matter if you ship the wrong things. Instead of adopting a generic framework, leaders should use metrics to identify specific friction points, bottlenecks, and feedback loops. Use data to make the system legible, not just to generate a report for executives who want to see a “return on investment” without understanding the context.
Read: Field Notes From the Efficiency Era, Part 3: Metrics That Actually Matter
4. Join the 2026 Benchmarks Roundtable
Investing in AI but seeing mixed results? You aren’t alone. Our latest data shows that while some tools like Devin are seeing rising acceptance rates, others like Copilot are slipping.
Join us live on December 10th or 11th for an exclusive virtual workshop. We’ll be debuting the 2026 Software Engineering Benchmarks Report, featuring 20+ metrics and 3 new AI insights. Come for the data, stay for the strategy session with CircleCI CTO Rob Zuber and Apollo GraphQL SVP Smruti Patel. Sign up today and the full report will be delivered right to your inbox.
5. From “vibes” to engineering: A guide to Evals
If you are building agentic systems, Hamel Husain is required reading. His latest guide outlines the transition from subjective, “vibe-based” development to structured error analysis. If you don’t understand the reasoning and the documents your LLM pulled, it is just a black box in a black box. The guide distinguishes between deterministic challenges (which code solves best) and subjective ones (where an LLM-as-judge fits). Crucially, when using an LLM as a judge, aim for binary pass/fail judgments rather than arbitrary scoring scales to keep your evals actionable.
Read: A pragmatic guide to LLM evals for devs
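The binary pass/fail recommendation is easy to sketch. In this hypothetical example, `call_llm` is a stub standing in for a real model client, and the rubric is invented for illustration; the shape to notice is that the judge returns a verdict you can count, not a score you have to interpret:

```python
def call_llm(prompt: str) -> str:
    # Stub for a real LLM call. Here we simulate a judge that only
    # passes answers containing an explicit "[source]" citation.
    return "PASS" if "[source]" in prompt else "FAIL"

def judge(question: str, answer: str) -> bool:
    """Ask the judge for a binary verdict rather than a 1-10 score."""
    prompt = (
        "You are grading an answer. Reply PASS only if the answer "
        f"cites a source, else FAIL.\nQuestion: {question}\nAnswer: {answer}"
    )
    return call_llm(prompt).strip().upper() == "PASS"

results = [
    judge("What is our refund window?", "30 days [source]"),
    judge("What is our refund window?", "Probably a month?"),
]
# Binary verdicts aggregate into a pass rate you can track over time.
pass_rate = sum(results) / len(results)
```

A pass rate moves in one direction or the other after each change; an average of arbitrary 7s and 8s rarely tells you anything actionable.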
6. The unsung hero of infrastructure
In an industry obsessed with the new and shiny, Lalit Maganti, a Senior Staff Engineer at Google, makes the case for ignoring the spotlight. He describes how long-term stewardship of infrastructure and developer tools leads to compounding returns and deep technical impact, often far greater than hopping between product launches. By building deep expertise and ownership of core systems, you earn the social capital to say “no” to bad ideas and protect organizational quality. It is a reminder that promotion often comes from peer testimonials and sustained utility, not just executive visibility.

Great point about real-time pipelines vs pre-computed storage. Most teams optimize for the 90th percentile use case and waste massive compute on data that never gets accessed. The latency vs compute tradeoff is interesting, though, since real-time lookups can bottleneck during peak load if you're not careful. Curious how Chalk handles caching strategies when features need millisecond-level consistency but you still want to avoid redundant computation.
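One common middle ground for the tradeoff raised here is cache-aside with a short TTL: compute a feature on first request, reuse it within the freshness window, and recompute once it goes stale. A minimal sketch (this is a generic pattern, not Chalk's actual design; `FeatureCache` and `expensive_feature` are hypothetical names):

```python
import time

class FeatureCache:
    """Cache-aside with TTL: bounds staleness while avoiding redundant compute."""

    def __init__(self, compute_fn, ttl: float):
        self.compute_fn = compute_fn
        self.ttl = ttl
        self._store = {}  # key -> (value, computed_at)

    def get(self, key):
        entry = self._store.get(key)
        now = time.monotonic()
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]           # fresh enough: skip recomputation
        value = self.compute_fn(key)  # missing or stale: recompute
        self._store[key] = (value, now)
        return value

calls = []

def expensive_feature(user_id):
    calls.append(user_id)  # track how many real computations happen
    return len(user_id) * 7

cache = FeatureCache(expensive_feature, ttl=5.0)
a = cache.get("u1")
b = cache.get("u1")  # served from cache; expensive_feature runs only once
```

Shrinking `ttl` toward zero buys consistency at the cost of compute; millisecond-level consistency effectively means recomputing every time, which is where the peak-load bottleneck concern comes in.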