From Banking to Blackboard: An Engineer's Case For CI/CD-Level Governance Of AI Code
Blackboard senior engineer Matt Erman on what a career in banking and FERPA-regulated EdTech teaches you about governing AI code, and why his team built enforcement into the pipeline before most organizations knew they needed it.

The majority of new code written globally is now AI-generated or AI-assisted. Productivity gains are hit or miss depending on the harness, but code churn is up, trust in AI output is down, and the teams shipping the most AI-assisted code are increasingly the ones filing the most bug reports. The industry keeps diagnosing this as a tooling problem and promising that better models will fix it. But a growing number of practitioners think the problem is simpler and harder to solve: the tools work fine. The discipline doesn't.
Matt Erman has been watching this play out from inside systems where undisciplined code has consequences that go beyond technical debt. Erman spent the first half of his career in banking, building full-stack applications at HSBC Global Banking and Markets and then M&T Bank, environments where consistency requirements are non-negotiable and regulatory exposure punishes sloppy deployments. For the past four years, he's been a senior software engineer at Blackboard, where student data privacy rules under FERPA create a similar compliance surface.
He's also self-taught. Erman started programming at seven, was working professionally before finishing his CS degree, and spent a decade as lead developer on an open-source career simulation game built in VB.NET and C# with an embedded SQLite database. The combination of regulated enterprise work and hobbyist builder instincts puts him in an unusual spot in the AI coding debate: he uses these tools every day and has strong opinions about how.
"Writing code is the lowest level of stuff that we do. That's the grunt level work," Erman told The Read Replica. "If you can abstract that away through an AI tool and oversee what it's writing, it can write code twenty times faster than you can. But it can also produce twenty times as much code that has nothing to do with what you want, or that's not designed for the way your system is architected."
The pipeline as the last line of defense
Most AI-assisted review setups are advisory. Erman's team built one that isn't. At Blackboard, they run AI-automated code reviews and accessibility checks inside their build pipelines using GitHub Copilot and multiple models. "If something's not accessible, it gets flagged and you have to go back in and fix it," Erman said. "It doesn't move forward until you do."
That's more than most organizations have in place. GitHub Copilot's code review feature, which reached general availability in April 2025 and hit a million users in its first month, leaves comments on pull requests but cannot gate merges based on its findings. Teams that want enforcement-level AI review are building custom integrations or using dedicated tools that offer CI/CD gating.
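What that gating looks like in practice doesn't have to be exotic. As a rough sketch, assuming a pipeline step that runs after an AI review and writes its findings to a JSON file (the file name, schema, and severity labels here are illustrative assumptions, not Blackboard's setup or any vendor's format), a few lines of Python are enough to fail the build and block the merge:

```python
#!/usr/bin/env python3
"""Merge-gate sketch: fail the CI step if an upstream review step
reported blocking findings. File name and JSON shape are assumed."""
import json
import sys
from pathlib import Path

FINDINGS_FILE = Path("review-findings.json")       # hypothetical output of the review step
BLOCKING = {"error", "security", "accessibility"}  # severities that stop the merge


def main() -> int:
    if not FINDINGS_FILE.exists():
        print("No findings report produced; failing closed.")
        return 1  # a missing report should not silently pass

    findings = json.loads(FINDINGS_FILE.read_text())
    blockers = [f for f in findings if f.get("severity") in BLOCKING]
    for f in blockers:
        print(f"BLOCKED: {f.get('file')}:{f.get('line')} {f.get('message')}")

    # A non-zero exit fails the CI job; a required status check on the
    # branch then keeps the pull request from merging until it passes.
    return 1 if blockers else 0


if __name__ == "__main__":
    sys.exit(main())
```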
The reason enforcement matters more than suggestion is written in the security data. A recent report from a security vendor tested over 100 large language models and found that AI-generated code introduces security vulnerabilities in roughly 45% of cases, a security pass rate of about 55% that has held virtually flat despite dramatic capability gains in the underlying models. Syntax correctness has climbed; security hasn't moved with it.
If the models aren't improving at security on their own, the enforcement layer has to exist downstream. Erman's team has it. Most teams don't.
Inverse confidence
Recent vendor reports found that senior developers report the largest quality gains from AI, at 60%, yet also the lowest confidence shipping AI-generated code without review. Junior developers report smaller quality improvements but are far more willing to ship unreviewed. The Stack Overflow 2025 Developer Survey puts numbers on the trust gap: experienced developers show the lowest "highly trust" rate, at 2.6%, and the highest "highly distrust" rate, at 20%.
Erman sees it as a force multiplier problem. "If you're an engineer who doesn't really understand the application, doesn't know what's going on, and you use AI, you're basically multiplying that times ten," he said. "Whereas if you have an understanding and you know how to use it properly, you're multiplying it in the good direction."
Scale that dynamic across a full engineering org and the multiplier cuts in both directions at once. "The role is changing from being a player in the orchestra to being the conductor," Erman said. But the metaphor has a structural problem, and he knows it. "The problem is that everyone's conducting the orchestra at the same time. Without set guidelines for the team, it's the Wild West. As one person is using it one way, another person's using it another way, and you end up with a mess."
Rules that travel with the code
The pipeline catches what ships. But Erman's team also governs what gets generated in the first place. For most enterprise teams, AI governance lives in a policy document that stands in for actual enforcement: someone in engineering leadership writes up acceptable-use guidelines, distributes them via Confluence or Notion, and hopes for the best. Erman's team took a different approach. "We create dedicated instruction files that the AI follows. If it's not doing what it's supposed to be doing, you tell it, 'No, this is not what we discussed.' And you keep going through iterations until it gets there."
The instruction files he describes are part of a rapidly maturing ecosystem that barely existed eighteen months ago. Cursor's .cursor/rules/ directory injects coding standards, architectural constraints, and security requirements into every AI interaction, with glob patterns that activate different rule sets based on file type and context. GitHub Copilot supports a similar pattern through .github/copilot-instructions.md. Claude Code uses CLAUDE.md. Each tool has its own implementation, but the concept converges: governance artifacts that travel with the codebase, not in a wiki nobody reads.
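The activation mechanism behind those rule sets is simple enough to sketch: roughly, the tooling matches the file being worked on against glob patterns and injects whichever instruction files apply, so standards travel with every generation. The patterns, file names, and mapping below are illustrative assumptions, not Cursor's or Copilot's actual lookup logic:

```python
"""Sketch of glob-based rule activation: pick which instruction files
apply to a given source file. The RULES mapping is hypothetical."""
from fnmatch import fnmatch

RULES = {
    "*.tsx": "rules/frontend-accessibility.md",  # UI code gets accessibility rules
    "*.sql": "rules/data-access.md",             # queries get data-handling rules
    "*": "rules/base-standards.md",              # always-on team conventions
}


def rules_for(path: str) -> list[str]:
    """Return every rule file whose pattern matches the given path."""
    return [rule for pattern, rule in RULES.items() if fnmatch(path, pattern)]


if __name__ == "__main__":
    for path in ("src/components/GradeTable.tsx", "db/migrations/001_init.sql"):
        print(path, "->", rules_for(path))
```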
The instruction file is only part of it. Erman's team treats AI output the way they'd treat a junior engineer's first pass. "You don't ever just ask the AI to do something for you," he said. "You have a plan, you work with it as a partner, and you go through multiple iterations. You're not just taking the first thing it gives you." An Ox Security report analyzing hundreds of repositories landed on the same conclusion, describing AI-generated code as highly functional but systematically lacking in architectural judgment. The instruction files and iterative review Erman describes are the oversight that closes that gap. "AI makes it easy to cheat, because it can do the work and you don't have to do anything. But that just causes chaos and a lot of extra work fixing the issues it creates."
The governance gap makes this urgent: 98% of organizations report unsanctioned AI use, while only 37% have AI governance policies in place. Instruction files are the ground-level answer, governance that's version-controlled, code-reviewed, and enforced by the tool itself before the first line is written.
The infrastructure AI can't provide
When asked what he'd build if budget weren't a constraint, Erman didn't ask for better AI. He asked for better infrastructure. "I would really like to set up ephemeral environments," he said. "We end up with roadblocks because people are testing on the same pipeline at the same time. With ephemeral environments, each one is individual to that person. One person testing one feature won't interfere with the other, and it gets destroyed as soon as it's done." The catch is cost. "Unfortunately, the reality is it would be very costly because of the way we run our pipelines," he said.
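The concept itself is straightforward to sketch: spin up an isolated environment keyed to a branch, run the tests against it, and tear it down when the run finishes. Assuming a Kubernetes-backed setup, purely an assumption for illustration and not a description of Blackboard's pipeline, the lifecycle looks roughly like this:

```python
"""Ephemeral-environment sketch: one throwaway namespace per branch,
created for a test run and destroyed afterwards. Assumes a Kubernetes
cluster and kubectl on the PATH, purely for illustration."""
import re
import subprocess


def env_name(branch: str) -> str:
    # Derive a DNS-safe, unique namespace name from the branch name.
    return "ephemeral-" + re.sub(r"[^a-z0-9-]+", "-", branch.lower()).strip("-")[:50]


def create(branch: str) -> str:
    ns = env_name(branch)
    subprocess.run(["kubectl", "create", "namespace", ns], check=True)
    # Deploy the application into the namespace here (helm install, kubectl apply, ...).
    return ns


def destroy(branch: str) -> None:
    # Tear the whole environment down as soon as the run finishes,
    # so one person's test can't interfere with another's.
    subprocess.run(["kubectl", "delete", "namespace", env_name(branch)], check=True)


if __name__ == "__main__":
    branch = "feature/gradebook-export"
    ns = create(branch)
    try:
        print(f"running tests against {ns} ...")
    finally:
        destroy(branch)
```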
The tension is revealing. His team has invested in AI at the application layer: instruction files, pipeline-level enforcement, daily use of Copilot across multiple models. But the infrastructure that would let developers actually use those tools without stepping on each other remains unfunded. AI may give teams the ability to produce code faster, but it doesn't give them more environments to test it in. Better models will keep coming. The discipline to govern what they produce won't ship itself.





