Let The Agents Cook. But First, Secure The Infrastructure.
Docker Developer Advocate Oleg Šelajev on why your data is code now, your sandbox might not be real, and MCP is becoming the enterprise governance layer.

The amount of work you can do if you let the agent cook by itself is incredible. But you can't be sitting there approving every action. That doesn't scale.
AI agents are enterprise-ready. But are enterprises agent-ready? Every engineer and architect, from the CTO down, has gotten the productivity memo: let the agents cook.
"Every CTO and CEO is pushing the AI-first narrative. You have KPIs for how many tokens you burn a week. And then at the same companies, the security teams are screaming and pulling hair because they don't fully understand how to manage the risks," said Oleg Šelajev, Developer Relations Engineer at Docker.
Šelajev works on Docker's AI sandboxes initiative, MCP Toolkit, and Compose for Agents. Previously, he was the founding Head of DevRel at AtomicJar, the company behind Testcontainers, leading the function from post-seed through Docker's 2023 acquisition. He's also a Java Champion and a Microsoft MVP, and he is currently dogfooding the isolation tooling he ships.
"Literally nobody has fully solved agentic risk management, as evidenced by the fact that the frontier AI labs don't even have an answer yet themselves."
Start from the bottom (or don't start at all)
Šelajev's framework for thinking about agent security is architectural, and it starts lower in the stack than most enterprises are looking. "Start from the bottom where the physical limitations of the systems agents run," he said. "My first stop is ensuring proper sandboxing. Then go up the stack and start thinking about which data sources you want to connect." Physical isolation first, then data source governance, then monitoring, then centralized protocol governance at the top. Skip a layer and you're building on sand.
From there, it's a triage problem: what do you connect first, and what's the blast radius if something goes wrong? "If my agent goes rogue and just wrecks my machine, it's not a fatal problem. If it finds my AWS keys and goes to production and switches off some services, that's still relatively manageable. But if it leaks my customer data, connects to my financial systems, and bets everything on Bitcoin, now that's very, very bad."
Šelajev's logic is to start where failure is embarrassing, not catastrophic, and build trust incrementally as you work up the stack. "We're not coming back to typing code character-by-character," he said. But the speed at which agents ship doesn't mean the security posture can skip steps.
"If you want to run autonomous agentic systems right now with your live data, without human oversight, you will be running a security risk. It's a hard problem that is probably here to stay for at least a few months, even in this rapid AI timeline." Cisco's data from RSA Conference 2026 frames the same gap from the vendor side: 85% of enterprise customers experimenting with AI agents, just 5% in production. The distance between those numbers is the governance problem. Another report from API management company Gravitee found that 88% of organizations reported confirmed or suspected AI agent security incidents in the past year. Only 14.4% deployed agents with full security or IT approval, and yet 82% of executives reported confidence that their existing policies were sufficient.
When data becomes code
The security models enterprises rely on were built for a clean separation: codebases are ever-evolving targets, data is inert. You review the code, you sandbox the code, and the data just sits until called by the code. AI agents have collapsed that distinction by putting data on the move and calling it into build cycles. "Now more than ever, data is code," Šelajev said. "People are uploading rogue CSV files with eval instructions to AI agents, and just evaluating that, the agent is fully compromised."
Once an agent ingests external input like an email, a support ticket, or a webhook payload, the content becomes part of the reasoning chain, and the reasoning chain drives action. What happens in between is largely invisible. "If you connect your internal Grafana instance, for example, you're probably running a lower risk of prompt injection than if you connect your email system. One raw email and your agent will do whatever it's told."
In March, Claude shipped its channels feature and Claude Cowork outside of the IDE, making it easier to pipe third-party data into an agent that already has significant power on your machine. "I would do that very, very carefully," Šelajev said.
Where MCP fits in the governance stack
If nobody's watching your agents, the burden falls to the infrastructure itself. For enterprises building agent workflows at scale, that governance layer is increasingly MCP. For individual developers, deterministic CLI tools often make more sense. They're faster and the security surface is smaller. "For software development workflows and for agents they run locally, command line tools are probably superior," Šelajev said.
The enterprise problem is different. Hundreds of devs connecting agents to corporate systems need centralized visibility, enforceable policy at the connection layer, and audit trails that survive an incident review. CLI scripts don't give you that. "MCPs are much better at enforcing enterprise rules, being more auditable and more observable." The alternative is what he calls "a CISO's nightmare": every developer installing skills from random repositories with no central oversight.
The MCP Dev Summit happened this week in New York, with maintainers from Anthropic, AWS, Microsoft, and OpenAI laying out an enterprise security roadmap. Microsoft published its internal governance playbook. And security researchers have catalogued over 7,000 internet-exposed MCP servers, roughly half of all known deployments, many with no authorization controls at all. The protocol gives security teams what they need: audit trails, centralized policy enforcement, and a way to govern what agents can connect to. Whether enterprises deploy it with discipline or let it sprawl ungoverned is the open question.
Is your sandbox really a sandbox?
With the data layer as a new attack surface, the question becomes whether current security tooling can handle it. On macOS, some coding agents use a system feature like seatbelt to restrict access. Šelajev isn't impressed. "It's completely impractical and inadequate for long-running agentic coding tasks because it's severely limiting what you can do without actually providing you strong guarantees. It gives you the illusion of security, which is even worse."
The problem runs deeper than any single implementation. Some agent frameworks allow the agent to modify its own configuration, including the rules about what it can and can't do. "It makes security guardrails a recommendation instead of an enforceable set of rules," Šelajev said. Permission prompts aren't much better. Anthropic's own data on Claude Code's auto mode (an ostensibly safer evolution of --dangerously-skip-permissions) shows that users approved 93% of permission prompts. At that rate, it's nothing more than a speed bump everyone runs over.
The NanoClaw story made the gap tangible. Developer Gavriel Cohen built it as a weekend project after discovering that OpenClaw had stored all his WhatsApp messages in unencrypted plaintext on his local machine. It collected 22,000 GitHub stars in weeks. Šelajev worked on integrating it with Docker Sandboxes, which run each agent in a dedicated microVM with its own kernel and network stack. Container-level isolation wasn't enough. "The amount of work you can do if you let the agent cook by itself is incredible," Šelajev said. "But you can't be sitting there approving every action. That doesn't scale."
The infrastructure underneath
Every team wants to let their agents cook. Few have built the infrastructure to do it safely. The FOMO is running in both directions: leadership afraid of falling behind, individual contributors afraid everyone else is shipping faster. Once people experience what autonomous agents can do, "it's going to take some convincing to make them abandon that. It's hard to put the genie back in the bottle."
The enterprises that build security from the infrastructure up, isolation at the base, enforcement at the data layer, governance at the protocol level, are the ones that won't end up as the cautionary tale. Platforms like Supabase are building RLS, scoped permissions, and audit logging into architecture by default are positioned for a world where the application layer can no longer be trusted to enforce its own rules. The ones waiting for someone else to solve it are, in Šelajev's words, "one major incident from being an example for everyone else."




