One Layer Away, Always: A Framework for AI in Production Data Systems

Anton Krupnov, Senior Data Engineer at DoorDash, on the architectural principle that keeps AI one layer removed from production code, production data, and production decisions.

Credit: The Read Replica

AI, in a way, tests how well your team and organization control your product. For teams that have good tests and a culture of code review, bringing in AI just accelerates the product, and they still have control. For chaotic teams, AI brings mess and problems.

Anton Krupnov

Senior Data Engineer

Every data engineering team is having the same argument right now. AI tools are fast, available, and increasingly capable, but the question of which ones should touch production systems, and how, remains largely unsettled. The teams getting it right aren't the ones adopting fastest. They're the ones drawing the sharpest boundaries.

Anton Krupnov has a specific framework for where those boundaries should be. A senior data engineer with 20 years of experience in Java and distributed systems at some of the most impactful companies of the last decade, Krupnov has built backend infrastructure at Airbnb, Pinterest, and most recently at DoorDash, where he works on data pipelines running across Airflow, Spark, Snowflake, Databricks, and Delta Lake. His scope spans more than 60 pipelines processing over five terabytes of data daily. It's the kind of environment where a bad deployment doesn't just break a feature; it corrupts the data that downstream teams depend on for decisions.

His rule is deceptively simple: AI should always operate one layer removed from production. Not writing the code that ships. Not accessing the data that matters. Not making decisions that can't be safely undone. Applied consistently across code generation, data access, and testing, it produces an architecture that's both more aggressive and more disciplined than what most teams are running.

The tool that builds the tool

The most concrete example Krupnov describes is a code generation workflow that inverts how most teams think about AI-assisted development.

His team built an internal tool that generates data pipelines. When they wanted to scale the number of pipelines, they used AI to improve the code generation tool itself, not to write the pipeline code directly. Then they manually validated every step of one prototype pipeline, confirmed it behaved correctly, and let the improved tool scale that validated pattern across dozens of additional pipelines with different parameters.

"We didn't deploy AI-generated code," Krupnov told The Read Replica. "We deployed code generated by a tool that was generated by AI."

The distinction sounds semantic. It isn't. In Krupnov's architecture, AI never touches production output directly. It operates at the tooling layer, improving the generator rather than the generated artifact. The prototype pipeline is still human-validated. The scaled pipelines are regenerated on every run, with all AI-introduced changes visible and auditable against the prototype.

"AI helps with repetitive tasks, with routines," Krupnov said. "We were still controlling the prototype. But we weren't controlling every other pipeline individually. We were confident because each one was regenerated from the same validated pattern."
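The shape of that workflow can be sketched in a few lines. This is a minimal illustration, not DoorDash's actual tool: the template, the pipeline names, and the parameters are all hypothetical, standing in for a real generator whose template is validated once by hand and then scaled across many parameter sets.

```python
"""Sketch of the 'tool builds the tool' pattern: AI improves the
generator, humans validate one prototype, the generator scales it.
All names and fields here are illustrative."""
from string import Template

# Hypothetical prototype template, validated by hand exactly once.
PROTOTYPE = Template("""\
# pipeline: $name (regenerated on every run -- do not edit by hand)
SOURCE_TABLE = "$source"
TARGET_TABLE = "$target"
SCHEDULE = "$schedule"
""")

# Per-pipeline inputs are parameters, not code.
PIPELINE_SPECS = [
    {"name": "orders_daily", "source": "raw.orders",
     "target": "dw.orders", "schedule": "@daily"},
    {"name": "refunds_daily", "source": "raw.refunds",
     "target": "dw.refunds", "schedule": "@daily"},
]

def generate_all() -> dict[str, str]:
    # Every pipeline is re-derived from the same validated pattern, so
    # any AI-introduced change to the template is auditable in one place.
    return {spec["name"]: PROTOTYPE.substitute(spec)
            for spec in PIPELINE_SPECS}

for name, code in generate_all().items():
    print(f"--- {name} ---")
    print(code)
```

The key property is that no generated pipeline is ever edited individually: a change to the template regenerates all of them, which keeps the audit surface at one file instead of dozens.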

Krupnov's pattern is an architectural answer to that scaling problem. He's using AI to propagate a validated pattern across dozens of pipelines, a fundamentally different task where the value isn't speed but reach.

Meta's engineering team recently arrived at a strikingly similar architecture. In an April 2026 blog post, they described building a swarm of more than 50 specialized AI agents to map tribal knowledge across a large-scale data processing pipeline spanning four repositories and over 4,100 files. The AI didn't write pipeline code. It produced structured context files: navigation guides that encoded the undocumented conventions, naming patterns, and cross-module dependencies that previously existed only in engineers' heads. Preliminary tests showed 40% fewer AI agent tool calls per task, and guidance for complex workflows that once took two days of research now arriving in about 30 minutes.

Same structural move: AI improves the layer that supports production, not production itself.

The resilience test

What makes Krupnov's framework more than a personal preference is his theory about why it matters organizationally. He argues that AI tools function as a stress test for engineering discipline not because the tools are dangerous, but because they amplify whatever process quality already exists.

"AI tests how good your team and your organization are at controlling your product," Krupnov said. "For teams that already had good tests and a culture of code review, bringing in AI just accelerated everything. They still had their controls. For chaotic teams, AI brings mess and problems. The AI itself is a test of your resilience as a team, as a company."

A peer-reviewed study from NYU's Center for Cybersecurity, published in ACM Transactions on Software Engineering and Methodology, analyzed Copilot-generated code in real GitHub repositories and found that roughly 30% of snippets contained security weaknesses across 43 different CWE categories, including eight from the CWE Top 25. Independent testing through 2026, across newer models and broader benchmarks, has consistently landed in the same range: between 25% and 45%, depending on the model and methodology. Models are getting dramatically better at writing syntactically correct code. They're not getting better at writing secure code.

For well-controlled teams with automated security scanning in CI/CD, rigorous code review, and established testing cultures, AI-generated code gets caught before it ships. For teams without those controls, AI scales the same vulnerabilities at the same rate it scales features.

Krupnov's framework says: figure out which kind of team you are before you figure out which AI tools to adopt.

The line that can't be crossed

The sharpest boundary in Krupnov's framework is around data access. His team runs an internal AI agent with read-only access to their knowledge base: documentation, onboarding instructions, answers to broad technical questions. He described it as genuinely useful.

"I don't have to search documentation anymore," Krupnov said. "I don't have to go to different people and ask. The AI agent can analyze what a document is about and provide good suggestions, even when the documentation itself isn't great."

But the agent can't write to production data, modify it, or access it beyond what's in the knowledge base. Krupnov drew that line without ambiguity. "I don't know of any case where AI accesses production data and changes it," he said. "To me, nothing sounds more dangerous."

The industry data suggests he's right to be cautious. MCP, the Model Context Protocol that connects AI agents to external tools and data, has seen explosive adoption, but security hasn't kept pace. A scan reported by Dark Reading discovered 1,862 MCP servers exposed to the public internet, almost all without authentication. OWASP now classifies shadow MCP deployments, where employees connect AI agents to production systems without security review, IT oversight, or a procurement process, as a Top 10 MCP risk category.

The financial cost of getting this wrong is already measurable. IBM's Cost of a Data Breach Report found that shadow AI was a factor in 20% of breaches, adding an average of $670,000 to breach costs. Of the organizations that reported AI-related security incidents, 97% lacked proper AI access controls.

Krupnov's read-only, knowledge-base-only model is the principle of least privilege applied to AI: access granted only to what's needed, only for as long as it's needed, and never "just in case."
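That access model is simple enough to express directly. The sketch below is illustrative, not any real platform's API: the scope names and the `AgentGrant` type are hypothetical, but they capture the two properties Krupnov insists on, a narrow read-only scope and a hard expiry.

```python
"""Sketch of least-privilege access for an AI agent: read-only,
scoped to the knowledge base, and time-bounded. The scope names
and this grant type are illustrative, not a real API."""
import time
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentGrant:
    scopes: frozenset[str]   # e.g. {"kb:read"} -- no write, no prod data
    expires_at: float        # hard time bound; no open-ended access

    def allows(self, action: str) -> bool:
        # An action is permitted only if it is explicitly in scope
        # AND the grant has not yet expired.
        return action in self.scopes and time.time() < self.expires_at

def grant_kb_readonly(ttl_seconds: int = 3600) -> AgentGrant:
    # The default grant: knowledge-base reads only, expiring in an hour.
    return AgentGrant(scopes=frozenset({"kb:read"}),
                      expires_at=time.time() + ttl_seconds)

grant = grant_kb_readonly()
print(grant.allows("kb:read"))     # documentation lookups: allowed
print(grant.allows("prod:write"))  # modifying production data: never granted
```

The deny-by-default structure matters more than the specifics: anything not named in the scope set, including every form of production write, is refused without a special case.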

Monkey testing 2.0

If Krupnov's framework is about keeping AI out of production code and away from production data, testing is where it gets to prove its value. His reasoning follows the same logic that governs the rest of his architecture. It's about the failure mode.

"Testing is where AI works really well," Krupnov said. "You can tell AI what the expectations are and ask it to generate tests. If a test fails, you can check whether it's a problem in your system or a problem in the test. Worst case, the test is useless. That's not a big deal."

The contrast with production code is the point. A useless test costs nothing. A shipped vulnerability costs everything from remediation time to regulatory penalties. The failure mode determines where AI belongs.
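That asymmetry is easy to see in practice. The sketch below is a hypothetical example, with `normalize_email` standing in for any function under test: the cases are the kind an AI might generate from a stated expectation, and a failure triggers triage rather than a production incident.

```python
"""Sketch of the failure-mode logic for AI-generated tests. The
function and cases are hypothetical; the point is that a failing
generated test is cheap to triage, not costly to ship."""

def normalize_email(addr: str) -> str:
    # Hypothetical function under test.
    return addr.strip().lower()

# Cases of the kind an AI might generate from the stated expectation
# "normalization is idempotent and whitespace-insensitive".
GENERATED_CASES = [
    ("  Dasher@Example.COM ", "dasher@example.com"),
    ("dasher@example.com", "dasher@example.com"),  # idempotence check
]

for raw, expected in GENERATED_CASES:
    got = normalize_email(raw)
    # A mismatch means either a bug in the system (useful) or a bad
    # generated test (cheap to discard) -- never a production incident.
    assert got == expected, f"{raw!r}: expected {expected!r}, got {got!r}"
print("all generated cases pass")
```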

Krupnov connects this to an older idea. "There was a concept maybe ten years ago of 'monkey testing', where you do random things in different parts of the system and see how it goes," he said. "I think AI is perfect for this."

He's referencing a lineage that runs through Netflix's Chaos Monkey and the broader chaos engineering movement of deliberately introducing failures to validate system resilience. That lineage has found new life in AI-powered testing tools. Community extensions like Gremlins Forge now use large language models to generate semantically valid attack inputs rather than purely random ones: they understand login flows, payment forms, and shopping carts well enough to dismantle them in realistic ways. It's the monkey testing concept, but with comprehension.
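The difference between the two generations of the idea fits in a few lines. This is a toy illustration, not any real tool's implementation: `model_suggestions` stands in for LLM output, and `fragile_handler` is a hypothetical login handler.

```python
"""Sketch contrasting classic monkey testing (random inputs) with
LLM-guided testing (semantically valid but adversarial inputs).
The suggestion list and handler are illustrative stand-ins."""
import random
import string

def random_monkey_input(n: int = 12) -> str:
    # Classic monkey testing: random characters, no comprehension
    # of what the form under test actually expects.
    return "".join(random.choice(string.printable) for _ in range(n))

# What an LLM-guided tool might emit for a login form: inputs that
# are structurally valid but probe realistic edge cases.
model_suggestions = [
    {"email": "a@b.co", "password": ""},                          # empty password
    {"email": "user+tag@example.com", "password": "x" * 10_000},  # oversized field
    {"email": "USER@EXAMPLE.COM", "password": "pw"},              # case handling
]

def exercise_login(handler, cases):
    # Fire each case at the handler and record which ones crash it.
    failures = []
    for case in cases:
        try:
            handler(case)
        except Exception as exc:
            failures.append((case, exc))
    return failures

def fragile_handler(case):
    # Hypothetical handler with one unhandled edge case.
    if not case["password"]:
        raise ValueError("unhandled empty password")

print(len(exercise_login(fragile_handler, model_suggestions)))  # → 1
```

A random string almost never forms a valid email, so the classic monkey rarely gets past input validation; the model-suggested cases reach the logic behind the form.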

Avoiding what breaks next

The same "one layer away" principle applies to the infrastructure underneath. Managed Postgres platforms now offer database branching: isolated environments, with their own API credentials, that spin off from production and let teams experiment and test without touching live data. When the experiment validates, changes merge back. When it doesn't, nothing in production was ever at risk. It's the same architectural boundary Krupnov draws in code, applied to the data layer.
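The branch-then-merge boundary reduces to a small invariant. The sketch below is a toy model, not any platform's API: `BranchingClient` and its in-memory tables are hypothetical, and real platforms use copy-on-write storage rather than deep copies, but the guarantee being modeled is the same.

```python
"""Toy model of database branching: experiments run against an
isolated copy, and production changes only on an explicit merge.
The client, tables, and data are all hypothetical."""
import copy

class BranchingClient:
    def __init__(self):
        self.production = {"users": ["alice", "bob"]}
        self.branches: dict[str, dict] = {}

    def create_branch(self, name: str) -> dict:
        # Real platforms do copy-on-write; a deep copy models the
        # isolation guarantee well enough here.
        self.branches[name] = copy.deepcopy(self.production)
        return self.branches[name]

    def merge(self, name: str) -> None:
        # Only an explicitly validated branch ever becomes production.
        self.production = self.branches.pop(name)

client = BranchingClient()
branch = client.create_branch("experiment-1")
branch["users"].append("carol")            # experiment runs on the branch
print(client.production["users"])          # production untouched so far
client.merge("experiment-1")               # validated change merges back
print("carol" in client.production["users"])
```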

Krupnov is arguing that the cost of imperfection is fundamentally different in testing than in production. A bad test reveals either a bug in your system (useful) or a bug in the test itself (harmless). A bad deployment reveals itself in your customers' data.

"I think AI is going to get access to places where it shouldn't," Krupnov said. "And AI is going to leak a lot of sensitive data." His prescription is the same principle that runs through everything else: minimize access, bound it in time, and never grant it just in case. AI didn't create the need for engineering discipline. It just made the cost of not having it impossible to ignore.