All articles

Six Agents, One Pull Request, and the Emerging Art of 'Skills-as-Infrastructure'

Snap engineering manager Jeffrey Lee-Chan joined The Read Replica to discuss his six-agent AI system to automate code reviews, and the future of AI-native enterprise codebases.

Credit: The Read Replica

Make The Read Replica one of your go-to sources on Google

It starts with letting go of your code to the point where the time to review shortens... Eventually you find yourself asking, 'How do I fully parallelize now that things are more efficient and I've orchestrated my own trust?'

Jeffrey Lee-Chan

Engineering Manager

Snap

For most of software engineering's history, skill lived solely within the engineer. You learned how a system worked, built intuition for where bugs hid, developed judgment about what to review closely and what to trust. The tools and frameworks were silent abstractions of underlying binary switches, and the human was the primary skill layer.

AI coding tools have undeniably inverted that assumption. AI-enabled engineers experiment each day to find the line where humans stop being the ones with the skills, and instead architect the systems that hold the skills in quasi-deterministic natural language.

Jeffrey Lee-Chan doesn't review his own code anymore. Or rather, he does, but not the way he used to.

Lee-Chan has spent nearly two decades building software at scale. He spent a decade at Google, where he led engineering on YouTube's notification systems, social features, and creator monetization platform. For the past seven years, he's managed engineering at Snap, where his growth team is responsible for notifications, activity feeds, and communities on a platform that processes over four billion daily interactions.

Instead of reading through pull requests line by line, Lee-Chan has devised a system to send them through five separate AI code review agents, each configured to catch different categories of issues. Then he sends the results to a sixth.

"Even with five agentic code reviewers, I still notice some gaps," he told the Read Replica in a recent interview. "I have a sixth and final agent that reviews all of the others and then makes a final decision that is ultimately surfaced to me." It's a system that keeps the human in the loop feeling in control, but also thrives on a gradual relinquishment of control as trust is built over time and models improve. And when the best models can be run relatively inexpensively, very few blockers remain in the way of parallel multi-agent productivity systems.

Adversarial by design

The new question in developer productivity isn't about if AI can write code, or even if it can write it well. The subjective concept of "good code" is almost an afterthought when the task at hand calls for brute force productivity in a non-sensitive environment that is likely to be refactored later. The bigger issue is the trap of false positives from models that aim to please.

Lee-Chan's architecture is adversarial by design. He draws a comparison to writing with AI: "You might say, 'I've written the most beautiful essay ever!' And Claude will tend to agree with me. But in a moment of self-aware reflection I remember, 'Well when was the last time you disagreed with me?'"

The tension is the same in coding. One solution is to put the work in front of a second AI, in a separate context window, with an explicit mission to critique. "If you don't have multi-agent critical feedback in your coding, it's going to sneak in some bugs, for sure."

Lee-Chan's meta-agent, the one that rules the other five, is fully custom. Its prompt reads everything in the pull request including every comment from every other reviewer, and critically, the stated goal of the PR itself. The result is something that would have been heresy two years ago: he now prefers AI code reviewers to human ones.

"When it's set up right, it'll find bugs that a human would never find, and reply a lot faster." He still sees a role for human judgment, but it's narrowing. "Humans should ideally be more focused on what's the purpose of what you're doing. As time goes on, that will evolve into less and less of the actual code."

The measurement problem no one has solved

On the subject of subjectivity, the measurement of developer output has never benefited from consensus.

According to a 2025 JetBrains developer survey, 66% of developers don't believe current productivity metrics reflect their actual contributions. Another 2025 DORA Report found that AI tools create a paradox where an improvement in code quality was met with a near-equal reduction in delivery stability. And a randomized controlled trial by METR found that experienced open-source developers were about 20% slower when using AI coding tools despite predicting beforehand they would be about 25% faster.

Lee-Chan thinks the industry's obsession with measuring AI-driven productivity is asking the wrong question entirely.

"I don't think anyone has measurement solved," he said. "And even before AI, it was pretty hard to solve." Some organizations are tracking token consumption. Others are counting pull requests. Lee-Chan thinks both approaches miss the point.

"Some leaders default to measuring how many tokens someone uses. But that only means someone is doing 'something', but it doesn't necessarily mean they're getting a good outcome," he said. "Pretty much every engineer I've talked to at companies of all sizes is unsure how to measure what true productivity looks like."

A maturity model for letting go

Lee-Chan described AI adoption as a ladder, with each rung representing a progressively deeper shift in how engineers relate to their code. The first rungs are familiar: autocomplete, then code assist, then allowing AI to generate code that you still review. But the rungs get harder.

"It starts with letting go of your code to the point where the time to review shortens. Then you parallelize, with multiple agents running at the same time. Then you let them run autonomously for more than a few hours," he said. "From there you optimize that codebase to make it more AI-friendly, and eventually you find yourself asking, 'How do I fully parallelize now that things are more efficient and I've orchestrated my own trust?'"

But the top of the ladder is a step that even he finds psychologically difficult: letting agents make permanent code changes without human review. "Even for me, as someone who experiments a lot, it's hard for me to let go even when the consequences are low."

The psychological difficulty points to something structurally important. If the industry is heading toward agent-authored and agent-reviewed code, even the most robust systems and bravest engineers will reach a trust plateau beyond which returns diminish.

The rewrite argument

"Honestly, I think many legacy codebases should be rewritten and modernized now, because AI has made it possible," he said. Before AI, a full codebase rewrite was a multi-quarter bet that leadership rarely approved. The labor costs were prohibitive, the risks were high, and the pragmatic answer was always to keep patching.

Lee-Chan argued the economics have inverted. AI can handle the grunt work of a rewrite, and the cost of not rewriting, or maintaining a codebase that agents can't effectively operate in, is climbing.

"You can section off parts of the codebase where there's relative independence. You can get that agent-friendly first, then iterate on the rest."

The hardest cases are the ones every enterprise knows well: legacy systems that nobody fully understands, he said. "Big old legacy codebases that most people don't even understand are the most challenging ones for sure." But even for those, he sees a floor. The early rungs of his maturity ladder, such as autocomplete, basic code assist, are accessible regardless of codebase quality. "No matter how ugly or old a codebase is, I think you can get to that basic stage rather quickly," he said, even if the higher rungs, where agents operate autonomously, require a clean foundation.

The infrastructure the agents need

The rewrite argument carries an implication that extends beyond the application layer: if you're rebuilding a codebase for autonomous agents, the data layer underneath it matters enormously.

This is a challenge the industry is actively confronting. Snap's own engineering blog recently documented a major migration of its data orchestration platform, moving from a monolithic single-cluster architecture to a tiered multi-cluster system. The post describes familiar growing pains at scale: Postgres as a metadata database becoming a scaling bottleneck, blast radius concerns when a single misconfiguration could affect thousands of data pipelines, and the operational burden of systems that weren't designed for the pace at which they now need to change. After the migration, a production incident that would have been catastrophic under the old architecture was contained in hours.

These are the same structural problems that multiply when agents are the ones reading, writing, and deploying code. An agent doesn't intuitively know which database tables contain sensitive data. It doesn't default to RLS. It doesn't understand blast radius unless the architecture enforces it.

The platform ecosystem is beginning to respond. In January 2026, Supabase released Agent Skills as part of its Postgres best practices. It's a set of ~30 rules across eight categories designed to teach AI coding agents to write correct Postgres code. The skills cover query performance, schema design, connection management, and security, including RLS policy implementation. They work across Claude Code, Cursor, GitHub Copilot, and other AI coding tools, and activate automatically when an agent encounters a relevant task.

The approach represents a shift in thinking about developer tooling: instead of relying on human engineers to catch agent mistakes after the fact, you embed the knowledge into the agent's workflow before it writes the first line. Skills-as-infrastructure, rather than review-as-infrastructure.

It's a pattern that the data layer may be uniquely positioned to pioneer. When 20 documented AI app data breaches since January 2025 trace back to the same root causes, the argument for building guardrails into the platform rather than bolting them on becomes hard to ignore.

From code assist to idea assist

Lee-Chan sees one more shift on the horizon that most organizations are under-indexing on.

"I often find myself asking coding agents the simple question of, 'How would you solve this?' And it will come up with some idea to perhaps monitor some things that I never thought of monitoring. Great. Let's just do that!"

He frames this as a transition from "code assist" to "idea assist", and ultimately, to a world where the AI is the idea driver, not the human. "Every time I see a glimpse of it, I'm pleasantly surprised. My advice for other engineers is to push AI as far as you can now, so each time the technology improves, I've got something ready to go."

The developers and engineering leaders who are building the review architectures, agent-friendly codebases, and the data layers that enforce security by default are the ones who won't be waiting when the next step change arrives. But the ones who are still trying to measure productivity via tokens might be.

To Stop AI Amplifying Security Issues, Some Experts Are Grounding Security In Hardware

How a Laid-Off Atlassian Engineer's Sovereign Breakdown Started a Timer On Every Vibe-Coded Codebase

Authorization Has Always Been Hard, and a New Generation of Builders Is Discovering Why

What Does a Compiler Look Like When Its Audience Isn't Human? Vercel's Zero Just Soft-Launched the Answer

After pgBackRest's Close Call, Postgres Shops Are Drafting New Runbooks For Continuity

pgBackRest Is Back, But Open Source Has a Stewardship Problem It Can't Keep Ignoring

AI-Generated Pull Requests Are Crashing Postgres Instances Daily, And the Only Way Through Is Architectural

Decoding 'Architecture As An Algorithm' and Production AI Explainability In Regulated Industries

The Case Of 'A Billion APIs': Overcoming AI-Induced Monolithic Tech Debt No One Can Maintain

Tell One AI Model Another Will Review Its Work, and Output Quality Jumps 1.5x

Old-School Speccing Makes a Comeback As Devs Find The Limits of Natural Language LLM Interactions

How Mississippi Built The Nation's First Statewide AI Initiative For Education And Workforce Readiness

From Banking to Blackboard: An Engineer's Case For CI/CD-Level Governance Of AI Code

Why The Success Of AI In Regulated Industries Depends On Compliance-As-Architecture

AI Parallelized Biomedical Research, But The Verification Layer Needs To Match The Throughput

Grainger's MarTech Lead On How Disciplined AI Governance Prevents Expensive Shelfware

You Can't 'Write Off' AI QA Tax at the Read/Write Layer, You Can Only Move When The Bill Comes Due

What Happens When Read-Only Access is the Only AI Guardrail That Actually Holds?

When AI Agents Go 'God Mode,' the Security Perimeter Must Move to the Database

One Layer Away, Always: A Framework for AI in Production Data Systems

You Can't Vibe Code A Payments Platform, But Even Regulated Industries Are Rethinking Writing By Hand

AI Readiness Starts With the Queries Nobody Bothered to Optimize

Agents Collapsed The Wall Between Analytics And Operations, And Postgres Is What's On The Other Side

A Financial Service Engineer's Search For Provability Amid The MCPification Of Everything

You Can't Outrun Governance, but Reducing Hurdles to Production-Readiness Starts With Condensing AI Pipelines

Let The Agents Cook. But First, Secure The Infrastructure.

The Traditional OLTP Playbook Breaks RAG. Here's What Replaces It.

Same IDE, Two Developers: One Shipped A Month Of Front-End In An Afternoon. The Other Wiped The Drive.