
May 11, 2026

Old-School Speccing Makes a Comeback as Devs Find the Limits of Natural Language LLM Interactions

Gary Ang, former Lead of AI Supervision at the Monetary Authority of Singapore, discusses the evolution of spec coding in the era of LLM-driven development.


The limitations of language should not limit the quality of the work. Natural language, with its ambiguity and interpretive pitfalls, only gets developers so far. Prompts describe intent, but they do not draw hard boundaries. And when systems depend on interpretation, behavior shifts in ways engineers cannot reliably predict or reproduce.

This is why the discipline of speccing is regaining value for AI-native developers.

We spoke with Gary Ang, an independent consultant who provides AI training and advisory services, focused on AI governance and risk management in financial services. As the former Lead of AI Supervision at the Monetary Authority of Singapore, he developed the country’s first AI risk management guidelines for the financial sector, and he holds a PhD in computer science with 11 peer-reviewed publications. In our interview, Ang noted:

“There’s no real need for a large language model to actually try to write code on the fly. If you could write the function beforehand and then just tell the model to use that tool, that’s higher trust.”

That idea shows up in how teams are shifting work in practice. They are pulling control out of prompts and pushing it into specifications that define inputs, outputs, and allowed actions before anything runs.
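
In practice, that can be as simple as a closed registry of pre-written functions, with a dispatcher that refuses anything outside the list. The sketch below is a minimal illustration of the pattern; the registry, the `get_balance` stub, and the tool names are invented for this example, not taken from Ang's work:

```python
from typing import Callable

# Hypothetical closed tool registry: every action the model may take
# is written and reviewed before anything runs.
ALLOWED_TOOLS: dict[str, Callable] = {}

def tool(fn: Callable) -> Callable:
    """Register a pre-written function as an allowed tool."""
    ALLOWED_TOOLS[fn.__name__] = fn
    return fn

@tool
def get_balance(account_id: str) -> int:
    # Stub standing in for a real, human-written implementation.
    return 1_000

def dispatch(tool_name: str, arguments: dict):
    """Execute only registered tools; reject everything else."""
    if tool_name not in ALLOWED_TOOLS:
        raise PermissionError(f"tool {tool_name!r} is not in the allowed list")
    return ALLOWED_TOOLS[tool_name](**arguments)

# The model's role shrinks to choosing a tool and its arguments:
print(dispatch("get_balance", {"account_id": "A-17"}))  # 1000
```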

Why natural language breaks at scale in AI systems

“The real change is that you start designing what the system is allowed to do up front, instead of trying to steer it once it is already running,” Ang said. Natural language works fine when humans are in the loop. It gets messy when it becomes system logic. The more steps you add, the more that mess compounds.

In a single interaction, ambiguity is manageable. In multi-step workflows or agent chains, it stacks. One agent interprets a prompt slightly differently, another builds on that output, and the drift spreads quietly through the system. By the time it reaches production behavior, no single step looks obviously wrong.

What’s missing are constraints. Teams do not fully pin down what the system is allowed to do, so behavior becomes something they infer after the fact rather than something they define in advance. In Ang’s view, this is where the power of LLMs becomes a liability:

“You can end up in situations where everything looks fine at each step, but the system still produces the wrong outcome because nobody defined the boundaries tightly enough,” Ang said. 

Spec-first engineering: replacing language with structured constraints

Spec-first engineering replaces descriptive prompts with explicit structure. Instead of guiding a model with language, teams define what the system must accept, produce, and execute before it runs.

At its core, this looks simple: inputs are fixed and typed, not loosely described; outputs must match schemas and validation rules; and actions come from a closed list of allowed tools or functions.

Instead of asking a model to “figure out” code or queries, teams give it tools and force it to choose from them. Instead of accepting flexible responses, they validate structure at every step.
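
Validating structure at every step often means checking model output against a declared schema before anything executes. Below is a minimal sketch of one way to do that using Pydantic; the `RefundDecision` schema and the raw response are hypothetical:

```python
from pydantic import BaseModel, ValidationError

class RefundDecision(BaseModel):
    """Output schema: the model's answer must fit this shape exactly."""
    order_id: str
    approve: bool
    amount_cents: int

# Stand-in for a raw model response.
raw = '{"order_id": "o-42", "approve": true, "amount_cents": 1999}'

try:
    decision = RefundDecision.model_validate_json(raw)  # Pydantic v2
except ValidationError as err:
    # Malformed or out-of-schema output never reaches execution.
    raise SystemExit(f"rejected model output: {err}")

print(decision.amount_cents)  # 1999 -- typed, not loosely described
```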

The key shift is timing. Teams enforce structure before execution, not after failure. That changes how systems behave in practice. Ambiguity is not deferred until later; it never enters the system in the first place.

“If you break a system down into small, well-defined spec units, you give yourself a fighting chance of actually understanding what it will do and testing it properly,” Ang said. 

Constraint design as the new security model

AI systems push security away from application-layer fixes, which assume predictable execution. Models do not behave that way, so teams now have to constrain behavior at the level where decisions happen.

Ang identified the three constraints that matter most: tight permission scoping, so agents only access what they need; deterministic tool use instead of free-form generation; and enforcement at the database layer where possible.

Database-level controls are especially strong because they sit below the application. They can restrict data access at a granular level, based on rules, regardless of how queries are generated.

Ang framed this as a permission problem, in which you simply reduce what models are allowed to touch:

“If you give an agent access to a database, you need to be deliberate about every permission. Read-only should actually mean read-only, and anything beyond that should require explicit checks.”
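
As a small illustration of read-only actually meaning read-only, the sketch below hands the agent a SQLite connection opened in read-only mode via Python's standard library, so the restriction holds below the application no matter how a query was generated. The file and table names are placeholders; a production system would more likely enforce this with database roles and grants:

```python
import sqlite3

# Setup outside the agent's reach: a normal, writable connection.
admin = sqlite3.connect("app.db")
admin.execute("CREATE TABLE IF NOT EXISTS accounts (id TEXT, balance INT)")
admin.commit()
admin.close()

# The agent only ever receives a connection opened read-only,
# so the restriction holds regardless of what queries it produces.
agent_db = sqlite3.connect("file:app.db?mode=ro", uri=True)
agent_db.execute("SELECT id, balance FROM accounts")  # fine

try:
    agent_db.execute("DELETE FROM accounts")  # blocked below the application
except sqlite3.OperationalError as err:
    print(err)  # "attempt to write a readonly database"
```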

What LLM speccing enables: predictability in agentic systems

Agentic systems prove the point especially well. They fail in ways that rarely trace back to a single broken component. The issue usually sits in how components interact over time. That makes debugging feel more like reconstructing a chain of decisions than fixing a single problem.

"If you have multiple agents in a workflow, that means you not only have to test each component or each agent, you also need to look at how it behaves across the trajectory. That's going to be harder than just testing a single unit,” Ang said. 

When teams constrain each agent and tightly define how they interact, the system becomes easier to reason about. Behavior stops shifting unpredictably across runs because the system has fewer degrees of freedom. Constraining how the whole system behaves prevents more runtime problems than fine-tuning individual agents does.
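
One way to pin down how agents interact is to validate every handoff against a schema, so drift cannot propagate from one step to the next. Here is a sketch under the same Pydantic assumption as above, with hypothetical stubs standing in for LLM-backed agents:

```python
from pydantic import BaseModel

class Ticket(BaseModel):
    customer_id: str
    issue: str

class Resolution(BaseModel):
    customer_id: str
    action: str

def triage(payload: dict) -> dict:
    # Stub standing in for an LLM-backed triage agent.
    return {"customer_id": payload["customer_id"], "issue": payload["text"][:80]}

def resolve(payload: dict) -> dict:
    # Stub standing in for an LLM-backed resolution agent.
    return {"customer_id": payload["customer_id"], "action": "refund"}

def run_pipeline(steps, payload: dict) -> dict:
    """Run agents in sequence, validating every handoff against a schema."""
    for agent, schema in steps:
        raw = agent(payload)
        payload = schema.model_validate(raw).model_dump()  # drift stops here
    return payload

result = run_pipeline(
    [(triage, Ticket), (resolve, Resolution)],
    {"customer_id": "c-9", "text": "Package never arrived."},
)
print(result)  # {'customer_id': 'c-9', 'action': 'refund'}
```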

Less reliance on natural language reduces flexibility, but it increases predictability. In production environments, predictability is often the constraint that matters most.