All articles

Tell One AI Model Another Will Review Its Work, and Output Quality Jumps 1.5x

Vishal Sharma, who has built enterprise architecture across JPMorgan Chase, PwC, and Prudential for two decades, on how the same social pressure that makes engineers write better code under review now structures multi-model development workflows in financial services.

Credit: Read Replica

AI governance should be a curve ahead compared to AI experimentation. That would take us far.

Vishal Sharma

VP of Software Engineering

Tier 1 Financial Services

Engineers write better code when they know someone is reviewing it. The presence of a reviewer, even an implied one, changes the quality of the output. AI models, it turns out, respond to the same pressure.

Vishal Sharma has spent 20 years building enterprise architecture across JPMorgan Chase, PwC, Prudential, and Cognizant. He holds TOGAF and AWS Solutions Architect certifications alongside recent credentials in Vercel v0's agentic builder and vibe coding fundamentals. Today he leads software engineering across systems that process proxy voting, trade settlement, and shareholder communications for global capital markets.

Sharma ran a test, asking Claude to generate an application from a blank project. The output was clean. It passed tests and quality analysis. Then he changed one variable in the prompt: he told Claude that GPT would review its work. The quality increased by roughly 1.5x.

"It's similar to a human doing it," Sharma told The Read Replica. "When humans develop their code, if I have a tester sitting on my side, I'll be extra cautious. But if I'm the one developing it, I'm the one testing it, I'm the one validating it, then I might cut corners."

The finding replicates a known organizational behavior at the model level. Models produce better output when the prompt implies scrutiny from a second system, just as engineers produce better work under peer review. The mechanism differs, but the outcome follows the same curve.

Multi-model workflows as a development architecture

Sharma's teams have scaled that same adversarial pressure across the entire development lifecycle. Engineering teams integrate copilots directly into their IDEs, selecting between models for different phases: one model to plan the system, a second to generate application code, a third to write and execute tests. Major financial institutions now track these multi-model workflows through internal dashboards, measuring which model combinations produce the best results per phase.

Each model operates in a separate context, evaluating the prior model's output with no shared incentive to preserve it. The architecture creates a feedback loop where the system itself enforces quality rather than relying on process or human discipline. "I've challenged teams: Don't write anything on the editor. Use the prompt to make everything happen," he said.

The result changes what engineering work looks like in practice. Product managers generate functional UIs in tools like Lovable that ship as usable code, cutting UI development cycles by three-fourths. API product managers who once spent weeks writing OpenAPI specifications now generate them through prompts and validate the output. "It was developers when we started, then engineering. Now an individual software engineer is becoming a software director. The work becomes about directing," Sharma said.

The organizational design implications are already visible at companies where AI forces teams to rethink which humans sit at each stage. The role survives, but the job description is unrecognizable. Execution gives way to validation.

When automation starts thinking

The adversarial review architecture addresses one side of the problem: code quality. The other side is what happens when AI replaces the deterministic systems that already run in production.

Financial services runs on automation that was never designed to make decisions. Sharma has watched the same workflows his teams maintain migrate from scripted rule engines to something less predictable. "It was done as part of RPA, but that was not intelligent automation. That was a lot of if-else statements. Now it has evolved into thinking like a human."

The distinction matters because the failure modes are different. A broken RPA script fails visibly at a known branch point. A model-driven workflow can fail silently by choosing the wrong path with high confidence. That silent failure is the same correctness problem financial services engineers are now confronting across the stack, from code generation to operational automation to the governance layer that has to account for both.

Governance at the data layer

Silent failures in model-driven systems are precisely the kind of risk that regulators haven't yet written rules for. Financial institutions operate under SEC, FINRA, and international regulators whose pace of oversight now trails the speed at which banks deploy AI. That gap is where compliance risk compounds.

"Trust is a fabric that cannot be an afterthought," Sharma said. "For us, that is part of our DNA."

In practice, that means governing what data AI systems can touch before they touch it. Sharma's teams enforce containment at the enterprise perimeter, ensuring models train on internal data without that data leaving the organization. They classify and tag what can and cannot be surfaced to AI systems, a concern other financial institutions are encoding directly into compliance architecture. And they evaluate whether a system introduced into production opens attack surfaces that did not previously exist. "I've seen many of these fancy things which look great on the surface. But when we dive deep, it is breaking one or two or many security concerns."

Why demos die in production

The failure modes that kill AI initiatives at the production threshold don't arrive one at a time. They compound.

"When experiments are performed in a very controlled environment, it's like reading out of a script. That's why it looks successful," Sharma said. Demos run on clean data with predefined workflows. Production doesn't. And the data problem runs deeper than format. "One thing can make my data pipeline super fast. But what if my data is not right?" he said. "Documentation is really outdated. Missing fields, duplicate records, siloed systems." IBM's 2026 banking outlook identified the same pattern: AI capability outpacing the data infrastructure required to support it.

Even when the data is clean, the organizational scaffolding often isn't. "When you're doing a proof of concept, it can survive through your enthusiasm. But productionized systems need ownership." And ownership alone doesn't solve for regulatory exposure. Features that work in a sandbox may violate compliance constructs that only surface under production scrutiny. "If it goes to production, will it be governable?"

Bad data feeds an ungoverned model that nobody owns, running in an environment never stress-tested against production load. The teams solving for this at the governance layer rather than the model layer are the ones whose pilots graduate.

Governance, a curve ahead

"Not every experiment gets into production. But a governed one will, because it's built for a regulated environment." Governed experiments have the accountability structures, the review loops, the data classification, and the ownership in place before the system reaches the point where failure carries real consequences. The multi-model review pattern Sharma discovered in his own coding experiments is the same principle applied at the prompt level: systems that know they are being watched perform better than systems that don't.

"AI governance should be a curve ahead compared to AI experimentation," Sharma said. "That would take us far."

The teams operating in that gap between what AI can do and what governance allows are defining what production-grade AI actually looks like in regulated industries. The question remains: Are the organizational structures around the models mature enough to let them run, and disciplined enough to stop them when they shouldn't?

The views and opinions expressed are those of Vishal Sharma and do not represent the official policy or position of any organization.

pgBackRest Is Back, But Open Source Has a Stewardship Problem It Can't Keep Ignoring

AI-Generated Pull Requests Are Crashing Postgres Instances Daily, And the Only Way Through Is Architectural

Decoding 'Architecture As An Algorithm' and Production AI Explainability In Regulated Industries

The Case Of 'A Billion APIs': Overcoming AI-Induced Monolithic Tech Debt No One Can Maintain