All articles

Let The Agents Cook. But First, Secure The Infrastructure.

Docker Developer Relations Engineer Oleg Šelajev on his approach to risk management and securing data infrastructure in the era of MCP.

Make The Read Replica one of your go-to sources on Google

The amount of work you can do if you let the agent cook by itself is incredible. But you can't be sitting there approving every action. That doesn't scale.

Oleg Šelajev

Developer Relations Engineer

Docker

AI agents are enterprise-ready. But are enterprises agent-ready? Every engineer and architect, from the CTO down, has gotten the productivity memo: let the agents cook.

"Every CTO and CEO is pushing the AI-first narrative. You have KPIs for how many tokens you burn a week. And then at the same companies, the security teams are screaming and pulling hair because they don't fully understand how to manage the risks," said Oleg Šelajev, Developer Relations Engineer at Docker.

Šelajev works on Docker's AI sandboxes initiative, MCP Toolkit, and Compose for Agents. Previously, he was the founding Head of DevRel at AtomicJar, the company behind Testcontainers, leading the function from post-seed through Docker's 2023 acquisition. He's also a Java Champion and a Microsoft MVP, and he is currently dogfooding the isolation tooling he ships.

"Literally nobody has fully solved agentic risk management, as evidenced by the fact that the frontier AI labs don't even have an answer yet themselves."

Start from the bottom (or don't start at all)

Šelajev's framework for thinking about agent security is architectural, and it starts lower in the stack than most enterprises are looking. "Start from the bottom where the physical limitations of the systems agents run on," he said. "My first stop is ensuring proper sandboxing. Then go up the stack and start thinking about which data sources you want to connect." Physical isolation first, then data source governance, then monitoring, then centralized protocol governance at the top. Skip a layer and you're building on sand.

From there, it's a triage problem: what do you connect first, and what's the blast radius if something goes wrong? "If my agent goes rogue and just wrecks my machine, it's not a fatal problem. If it finds my AWS keys and goes to production and switches off some services, that's still relatively manageable. But if it leaks my customer data, connects to my financial systems, and bets everything on Bitcoin, now that's very, very bad."

Šelajev's logic is to start where failure is embarrassing, not catastrophic, and build trust incrementally as you work up the stack. "We're not coming back to typing code character-by-character," he said. But the speed at which agents ship doesn't mean the security posture can skip steps.

"If you want to run autonomous agentic systems right now with your live data, without human oversight, you will be running a security risk. It's a hard problem that is probably here to stay for at least a few months, even in this rapid AI timeline." Cisco's data from RSA Conference 2026 frames the same gap from the vendor side: 85% of enterprise customers experimenting with AI agents, just 5% in production. The distance between those numbers is the governance problem. Another report from API management company Gravitee found that 88% of organizations reported confirmed or suspected AI agent security incidents in the past year. Only 14.4% deployed agents with full security or IT approval, and yet 82% of executives reported confidence that their existing policies were sufficient.

When data becomes code

The security models enterprises rely on were built for a clean separation: codebases are ever-evolving targets, data is inert. You review the code, you sandbox the code, and the data just sits until called by the code. AI agents have collapsed that distinction by putting data on the move and calling it into build cycles. "Now more than ever, data is code," Šelajev said. "People are uploading rogue CSV files with eval instructions to AI agents, and just evaluating that, the agent is fully compromised."

Once an agent ingests external input like an email, a support ticket, or a webhook payload, the content becomes part of the reasoning chain, and the reasoning chain drives action. What happens in between is largely invisible. "If you connect your internal Grafana instance, for example, you're probably running a lower risk of prompt injection than if you connect your email system. One raw email and your agent will do whatever it's told."

In March, Claude shipped its channels feature and Claude Cowork outside of the IDE, making it easier to pipe third-party data into an agent that already has significant power on your machine. "I would do that very, very carefully," Šelajev said.

Where MCP fits in the governance stack

If nobody's watching your agents, the burden falls to the infrastructure itself. For enterprises building agent workflows at scale, that governance layer is increasingly MCP. For individual developers, deterministic CLI tools often make more sense. They're faster and the security surface is smaller. "For software development workflows and for agents they run locally, command line tools are probably superior," Šelajev said.

The enterprise problem is different. Hundreds of devs connecting agents to corporate systems need centralized visibility, enforceable policy at the connection layer, and audit trails that survive an incident review. CLI scripts don't give you that. "MCPs are much better at enforcing enterprise rules, being more auditable and more observable." The alternative is what he calls "a CISO's nightmare": every developer installing skills from random repositories with no central oversight.

The MCP Dev Summit happened this week in New York, with maintainers from Anthropic, AWS, Microsoft, and OpenAI laying out an enterprise security roadmap. Microsoft published its internal governance playbook. And security researchers have catalogued over 7,000 internet-exposed MCP servers, roughly half of all known deployments, many with no authorization controls at all. The protocol gives security teams what they need: audit trails, centralized policy enforcement, and a way to govern what agents can connect to. Whether enterprises deploy it with discipline or let it sprawl ungoverned is the open question.

Is your sandbox really sandboxed?

With the data layer as a new attack surface, the question becomes whether current security tooling can handle it. On macOS, some coding agents use a system feature like seatbelt to restrict access. Šelajev isn't impressed. "It's completely impractical and inadequate for long-running agentic coding tasks because it's severely limiting what you can do without actually providing you strong guarantees. It gives you the illusion of security, which is even worse."

The problem runs deeper than any single implementation. Some agent frameworks allow the agent to modify its own configuration, including the rules about what it can and can't do. "It makes security guardrails a recommendation instead of an enforceable set of rules," Šelajev said. Permission prompts aren't much better. Anthropic's own data on Claude Code's auto mode (an ostensibly safer evolution of --dangerously-skip-permissions) shows that users approved 93% of permission prompts. At that rate, it's nothing more than a speed bump everyone runs over.

The NanoClaw story made the gap tangible. Developer Gavriel Cohen built it as a weekend project after discovering that OpenClaw had stored all his WhatsApp messages in unencrypted plaintext on his local machine. It collected 22,000 GitHub stars in weeks. Šelajev worked on integrating it with Docker Sandboxes, which run each agent in a dedicated microVM with its own kernel and network stack. Container-level isolation wasn't enough. "The amount of work you can do if you let the agent cook by itself is incredible," Šelajev said. "But you can't be sitting there approving every action. That doesn't scale."

The infrastructure underneath

Every team wants to let their agents cook. Few have built the infrastructure to do it safely. The FOMO is running in both directions: leadership afraid of falling behind, individual contributors afraid everyone else is shipping faster. Once people experience what autonomous agents can do, "it's going to take some convincing to make them abandon that. It's hard to put the genie back in the bottle."

The enterprises that build security from the infrastructure up, isolation at the base, enforcement at the data layer, governance at the protocol level, are the ones that won't end up as the cautionary tale. Platforms building RLS, scoped permissions, and audit logging into architecture by default are positioned for a world where the application layer can no longer be trusted to enforce its own rules. The ones waiting for someone else to solve it are, in Šelajev's words, "one major incident from being an example for everyone else."

The signal, once a week

Reporting, contributor perspectives and sharp notes from the people building with Supabase in the real world. No noise, no spam.

Thank you! Your submission has been received!

Oops! Something went wrong while submitting the form.

Side-Stepping The Data Movement Tax: How Fragmented Data Stacks Are Converging Back Onto Postgres

Agent Workloads Are Just Beginning To Expose What Row-Level Security Really Costs

Repack Took a Decade To Graduate Into Postgres Core. The Features Still Waiting Could Take Longer.

OWASP Checked Its AI Risk Rankings Against 6,600 Real Incidents And The Surprises Are All Below The Model

Postgres' Logical Replication Stream Gets A Second Life As A Cache Invalidation Engine

'There Is No Boss': How Postgres Decides Its Future In Public

Everyone Is Handing AI Agents The Keys To Production Postgres. POSETTE’s Sharpest Talks Were About The Brakes.

The Best Lesson From POSETTE’s Vendor Day: When Your Postgres Feels Slow, It's Rarely Postgres

What A Morning With Postgres’ Maintainers Reveals About How The Database Really Gets Made

POSETTE '26: Postgres Has Stopped Trying To Prove It Belongs And Started Absorbing The Stack Around It

How Postgres' Rise To Enterprise Default Is Outpacing The Operational Model Behind It

What Happens To Agent Memory When You Swap The Model? Two Practitioners Are Building The Answer In The Database Layer.

The Line Between The Database And AI Memory Layer Is Getting Blurry. The Trick Is Making Sure Your Schemas Are Not Paying For It.

To Stop AI Amplifying Security Issues, Some Experts Are Grounding Security In Hardware

How a Laid-Off Atlassian Engineer's Sovereign Breakdown Started a Timer On Every Vibe-Coded Codebase

Authorization Has Always Been Hard, and a New Generation of Builders Is Discovering Why

What Does a Compiler Look Like When Its Audience Isn't Human? Vercel's Zero Just Soft-Launched the Answer

After pgBackRest's Close Call, Postgres Shops Are Drafting New Runbooks For Continuity

pgBackRest Is Back, But Open Source Has a Stewardship Problem It Can't Keep Ignoring

AI-Generated Pull Requests Are Crashing Postgres Instances Daily, And the Only Way Through Is Architectural

Decoding 'Architecture As An Algorithm' and Production AI Explainability In Regulated Industries

The Case Of 'A Billion APIs': Overcoming AI-Induced Monolithic Tech Debt No One Can Maintain

Tell One AI Model Another Will Review Its Work, and Output Quality Jumps 1.5x

Old-School Speccing Makes a Comeback As Devs Find The Limits of Natural Language LLM Interactions

How Mississippi Built The Nation's First Statewide AI Initiative For Education And Workforce Readiness

From Banking to Blackboard: An Engineer's Case For CI/CD-Level Governance Of AI Code

Why The Success Of AI In Regulated Industries Depends On Compliance-As-Architecture

AI Parallelized Biomedical Research, But The Verification Layer Needs To Match The Throughput

Grainger's MarTech Lead On How Disciplined AI Governance Prevents Expensive Shelfware

You Can't 'Write Off' AI QA Tax at the Read/Write Layer, You Can Only Move When The Bill Comes Due

What Happens When Read-Only Access is the Only AI Guardrail That Actually Holds?

When AI Agents Go 'God Mode,' the Security Perimeter Must Move to the Database

One Layer Away, Always: A Framework for AI in Production Data Systems

You Can't Vibe Code A Payments Platform, But Even Regulated Industries Are Rethinking Writing By Hand

AI Readiness Starts With the Queries Nobody Bothered to Optimize

Agents Collapsed The Wall Between Analytics And Operations, And Postgres Is What's On The Other Side

A Financial Service Engineer's Search For Provability Amid The MCPification Of Everything

You Can't Outrun Governance, but Reducing Hurdles to Production-Readiness Starts With Condensing AI Pipelines