Same IDE, Two Developers: One Shipped a Month of Front-End in an Afternoon. The Other Wiped the Drive.
CBOS senior developer Damian Matthews on why multi-tool AI pipelines through MCP can collapse weeks of work into hours, but only when the person driving them understands the architecture underneath.

Doing the heavy implementation yourself ingrains it. Use AI to help write the code, but make sure you understand exactly what it’s doing, and never commit anything you wouldn’t write yourself.
Two weeks after Google launched Antigravity in November 2025, a developer asked the agent to clear a project cache. Turbo mode issued an rmdir against the root of the D: drive instead of the project folder; the deletion bypassed the Recycle Bin, and the data was unrecoverable. It wasn't an isolated incident. Months earlier, a prominent vibe coding agent famously deleted a user's production database and then reportedly fabricated data to cover it up.
The distance between compressing a month of work into an afternoon and wiping an entire drive isn't the model, the IDE, or the protocol. It comes down to whether the person driving the pipeline knows what's underneath it.
Damian Matthews didn't do either of those things. But he watched both ends of the spectrum play out from the same office.
The pipeline that shouldn't have worked this fast
Matthews is a Senior Software Developer at CBOS, a software and digital transformation firm based in South Africa. Self-taught, he came up through support technician roles at Spinnaker Software, moved into test engineering at Autedi Digital Transformation, and worked his way through senior dev positions at Exclr8 Business Automation before landing at CBOS.
Recently, Matthews' CTO needed a fresh UI for a client. Rather than assigning the work to a dedicated front-end team, the CTO wired up a local MCP server, connected Google's Stitch to it, piped the same MCP instance into Antigravity, and linked the whole chain to Figma. When the CTO noticed the app lacked a login screen, he prompted Stitch, and the pipeline pulled design context through MCP, generated a cohesive login screen matching the existing app's look and feel, and pushed the result back into the IDE.
"What he accomplished in a late afternoon would have probably taken a team that's skilled and proficient in the front end two to four weeks," Matthews told The Read Replica.
Matthews didn't build the pipeline. But he understood immediately why it worked, and why it wouldn't have in most people's hands. The CTO already knew the architecture from a full stack perspective. He understood which layers needed to talk to each other, what the existing app's design language was, and where the AI output would slot into the broader system.
The CTO's relatively simple pipeline is a case study in what most organizations can't do yet. And if the setup seems obvious to you, you're likely well ahead of the adoption curve.
Why 10 percent gains stay at 10 percent
Bain's 2025 Technology Report puts the productivity gain from AI coding assistants at 10-15%, but writing and testing code accounts for only 25-35% of the enterprise development process. Speed up that slice alone, and the returns are marginal across the board. Companies pairing AI with end-to-end process transformation see 25-30% gains, Bain found, but those companies are redesigning workflows, not just bolting on autocomplete.
Matthews' CTO wasn't just using AI to write code faster; he was orchestrating a multi-tool pipeline that collapsed weeks of cross-functional effort into a single session. That tracks with Bain's own conclusion: the real payoff comes from applying AI across the lifecycle, not just to code generation.
Meanwhile, Faros AI's data across more than 10,000 developers tells the same story from the other direction. High AI adoption correlates with over 150% larger pull requests and around 10% more bugs per developer. Developers on those teams completed more tasks and merged nearly twice as many PRs, but review time ballooned on the back end. The throughput is real. Whether anyone is reviewing what ships is a different question.
YC's Garry Tan recently claimed 37,000 lines of code per day across five agentic projects. But a senior engineer who inspected the output found bloat and rookie mistakes visible without touching the back end. "Right now we're in a moment where AI lets you generate code faster than any human can review it," the engineer told Fast Company, "and the answer from people like Garry seems to be 'so stop reviewing.'"
The JSON jigsaw
Matthews isn't surprised by any of this. The core problem, as he sees it, is architectural. Modern applications span a front end, an API layer, and a data store, and AI tooling still operates within one surface at a time.
"AI can assist in debugging in certain aspects, but it will usually be in a very granular sense where one thing is not working as expected," Matthews said. "It's not going to have the context of, 'Well, your front end layer is expecting this and your API is just returning results with a different JSON object.'"
The code compiles. The function returns data. But the serialization format between layers doesn't match, and the front end can't deserialize what the API sends. In a decoupled architecture, that mismatch lives in the seam between services where AI's context window runs out. The CTO's pipeline worked precisely because it bridged those seams through MCP, giving the AI chain visibility across tools that normally can't see each other. But someone had to know which seams existed in the first place.
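The seam Matthews describes is easy to reproduce. In this hypothetical sketch (field names invented for illustration), the API serializes snake_case keys while the front end's type expects camelCase; everything compiles, everything parses, and the data still vanishes at the boundary:

```typescript
// Hypothetical illustration of a cross-layer seam mismatch.
// The front end's contract: camelCase fields.
interface UserProfile {
  userName: string;
  lastLogin: string;
}

// What the API actually returns: the same data, snake_case keys.
const apiResponse = '{"user_name": "dmatthews", "last_login": "2026-02-01"}';

// Compiles cleanly, parses cleanly -- no error anywhere in either layer.
const profile = JSON.parse(apiResponse) as UserProfile;

// But at runtime the fields the front end reads are simply undefined.
console.log(profile.userName);  // undefined
console.log(profile.lastLogin); // undefined
```

Neither layer is broken in isolation, which is exactly why a tool scoped to one surface at a time never flags it: the failure only exists when you can see both sides of the seam at once.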
The ecosystem is starting to catch up to this problem at the protocol level. Most of the major DX platforms have shipped MCP server implementations that let AI tools connect directly to the database layer, standardizing how LLMs talk to backend services instead of leaving developers to wire those connections by hand. But the tooling only works when someone understands the architecture it's connecting to.
DIY debugging in the age of agents
"What really drives progress in this industry is grit," Matthews said. "The ability to stick with something and figure it out no matter what." But his concern is that AI-generated implementations let junior devs skip the reps that build those instincts: DevTools, Postman, console logs at each layer, tracing the data until the divergence shows up. That's the work that determines whether the app actually functions in production.
"Doing the heavy implementation yourself ingrains it," he said. "Use AI to help write the code, but make sure you understand exactly what it's doing, and never commit anything you wouldn't write yourself."
He's not anti-AI. He's anti-skipping. The distinction matters because the failure mode isn't AI-generated code that doesn't work, but code that fails in the real world because nobody understands how the layers connect in prod. A junior developer running the same tools without that literacy might generate output that looks correct in isolation and breaks at every integration point.
Stay away from React?
Faster output also means faster intake, and the checkpoints that assume human oversight, like dependency review and package audits, are the first casualties when code ships at machine speed. Matthews doesn't hedge on where the risk is highest.
"I stay away from React. There's been too many recent data breaches and security issues. I tell people to watch your npm packages, and always make sure that they're clear."
The past six months have made that hard to argue with. React2Shell, disclosed in December 2025, was a CVSS 10.0 remote code execution vulnerability in React Server Components that state-sponsored threat groups exploited within hours. In September 2025, the Shai-Hulud worm compromised 18 widely used npm packages, including chalk and debug, with a combined 2.6 billion weekly downloads. CISA issued an alert. A second wave hit in November with destructive payloads. And just last week, Google Threat Intelligence reported that North Korean threat actors compromised the axios npm package in yet another supply chain attack.
When dependencies get pulled in at machine speed without anyone auditing the tree, every one of those incidents becomes a deployment waiting to happen.
Matthews' team has shifted their testing approach accordingly, moving from strict TDD to Cucumber, a BDD framework that describes expected behavior in human-readable language and executes it programmatically.
"You can think of it as old-school Selenium where you create your front-end tests where the browser goes and does all the clicking," Matthews explained, "but you base it off on using a very verbose language, so you tell it what you want to do."
The logic is straightforward: when AI generates code that blows past a function's intended scope, narrow unit tests produce noise. Behavioral guardrails that test what the user actually sees catch what rigid syntax tests miss. It's a testing philosophy built for a world where the code comes fast and the question isn't whether it runs, but whether it does what it was supposed to.
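Cucumber itself binds Gherkin scenarios to step definitions; as a rough sketch of the underlying idea in plain TypeScript (no Cucumber library involved, the scenario and app state are invented for illustration), the test reads as user intent and asserts on what the user would see, not on internal call structure:

```typescript
// Minimal Given/When/Then sketch of behavior-first testing -- not Cucumber,
// just the shape of it. Steps run in order; each asserts observable behavior.
type Step = () => void;

function scenario(name: string, steps: Record<string, Step>): void {
  console.log(`Scenario: ${name}`);
  for (const [description, run] of Object.entries(steps)) {
    run(); // a throwing step fails the scenario
    console.log(`  ok: ${description}`);
  }
}

// Hypothetical app state standing in for a real UI under test.
let screen = "home";
let loggedIn = false;

scenario("User logs in from the home screen", {
  "Given the user is on the home screen": () => {
    if (screen !== "home") throw new Error("not on home screen");
  },
  "When the user submits valid credentials": () => {
    loggedIn = true;
    screen = "dashboard";
  },
  "Then the dashboard is displayed": () => {
    if (!loggedIn || screen !== "dashboard") throw new Error("login failed");
  },
});
```

A wildly over-scoped AI-generated implementation passes or fails this test on exactly one criterion: did the user end up where they were supposed to. That's the guardrail Matthews is describing.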
Matthews sees AI closing that gap eventually. "It will get there, without a doubt," he said of full-context debugging across the stack. But right now, the distance between using these tools well and using them catastrophically has never been wider. The tooling doesn't care which side you're on. It just amplifies whatever's already there, and engineers at every skill level still have to fight the urge to let AI take the reins entirely.




