
The Line Between The Database And AI Memory Layer Is Getting Blurry. The Trick Is Making Sure Your Schemas Are Not Paying For It.

Are teams putting data in the wrong layer when it comes to AI? We spoke with KPMG's Head of AI and enterprise data engineers about what proper data and memory management looks like in practice.

Credit: The Read Replica


The database remains as a source of truth. The memory layer becomes a source of context. But most architectures don't draw that line clearly enough.

Ashish Chandra

Partner and Global Head of AI

KPMG

The views and opinions expressed are those of the individuals and do not represent the official policy or position of any organization.

Enterprises are pouring the bulk of their AI budgets into models. According to KPMG's Ashish Chandra, that's the wrong line item. It's the infrastructure that determines whether an AI system holds up in production, and most teams can't answer the question that shapes everything underneath: where does the database layer end and the memory layer begin?

"The database remains as a source of truth. The memory layer becomes a source of context," Chandra said. "But most architectures don't draw that line clearly enough." As Partner and Global Head of AI at KPMG, Chandra leads AI transformation across enterprise clients in financial services. Before that, he spent nearly two decades at Standard Chartered Bank leading technology innovation across retail, wealth, and corporate banking. He also runs a home lab where he builds agents and operates inference on his own hardware, surfacing the same infrastructure bottlenecks his clients encounter at production scale.

Get the line wrong and the costs cascade: the budget goes to the wrong layer, the schema stores data where it cannot be governed, and the system breaks the first time someone asks it to explain a decision it made six months ago.

The database holds the facts. The memory layer holds the reasoning.

Traditional enterprise architecture stacks users, applications, business logic, and databases. The database stores transactions, maintains records, and supports reporting as the system of record. ERP, CRM, and HRM systems all sit on the same infrastructure pattern, and the database's job is to answer a narrow set of questions: what was ordered, what was paid, what was filed.

AI-native architectures insert an agent layer and an enterprise memory layer between the application and the database. "This is exactly where many AI strategies become disconnected from operational reality," Chandra said. "The database layer doesn't disappear. In fact, it becomes more important than ever." The database answers what happened: an invoice was paid, a tax return was filed, an audit review was completed. The memory layer answers why it happened, drawing on prior decisions, institutional context, and process history stored across vector databases, knowledge graphs, and semantic memory systems.

The agent layer sits on top of both and decides what should happen next. Get the layering wrong and the agent is reasoning over data it should be joining, or joining data it should be searching semantically, and the failure mode is subtle enough that most teams don't catch it until production.

Chandra points to a fuel consumption agent he's been building inside a mining digital twin as an example. The agent needs process data, maintenance history from an ERP, previous optimization decisions from a memory layer, engineering knowledge from a knowledge graph, and plant constraints from a vector store. Five sources, five layers, each there for a different reason. "Many executives assume the LLM is the slow part. In reality, the memory search, context assembly, permission checks, and prompt construction are where the time goes," he added. At scale, he estimates inference accounts for roughly 20 to 30 percent of total response time. The rest is infrastructure.
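One way to test that claim against a real system is to time each stage and compute its share of the total. The sketch below does only the arithmetic. The stage names follow Chandra's list; the function name `stage_shares` and the millisecond figures are made-up placeholders, not measurements from his system.

```python
from typing import Dict


def stage_shares(durations_ms: Dict[str, float]) -> Dict[str, float]:
    """Return each stage's share of total response time, in percent."""
    total = sum(durations_ms.values())
    if total <= 0:
        raise ValueError("total duration must be positive")
    return {stage: round(100.0 * ms / total, 1) for stage, ms in durations_ms.items()}


# Illustrative placeholder timings, chosen only to show the calculation.
example_ms = {
    "memory_search": 300.0,
    "context_assembly": 200.0,
    "permission_checks": 100.0,
    "prompt_construction": 150.0,
    "llm_inference": 250.0,
}

shares = stage_shares(example_ms)
# Everything that is not model inference counts as infrastructure here.
infrastructure_share = round(100.0 - shares["llm_inference"], 1)
```

With these placeholder numbers, inference comes to 25 percent of response time and the remaining 75 percent is infrastructure, which is the shape of the split Chandra describes.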

The proof is in the budget

The spending reflects the confusion. Chandra estimates that enterprises are typically allocating roughly 70 percent of AI spend to models, 20 percent to infrastructure, and 10 percent to data. He argues the ratio should be closer to 10 percent on the model, 40 percent on data, 30 percent on the memory and retrieval layer, and 20 percent on operations. The direction tracks with BCG's widely cited 10/20/70 framework for enterprise AI investment, which allocates just 10 percent to algorithms and models.

For Kishan Raj VG, a data engineer at Midtown Athletic Clubs who builds data pipelines and quality frameworks for a multi-location fitness brand, that 40 percent data allocation landed on his desk as an entirely new category of work. "We never used to tag our tables before AI came into play," Kishan said. "There was no need to. I knew exactly what data was in every table. But now we're tagging tables, defining every column, and writing rules for how each metric gets calculated." Row-level quality checks run before data enters the transformation layer. All of it became primary work once AI replaced the human analyst as the consumer of the data.
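The work Kishan describes can be sketched in a few lines: a written definition for every column, plus a row-level check that runs before data reaches the transformation layer. This is an illustrative sketch, not Midtown's framework. The table, the column names, the rules, and the helper `split_rows` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass(frozen=True)
class ColumnSpec:
    name: str
    description: str               # plain-language meaning of the column
    check: Callable[[Any], bool]   # row-level rule the value must satisfy


# Hypothetical definitions for a hypothetical billing table.
BILLING_SPEC: List[ColumnSpec] = [
    ColumnSpec("member_id", "Unique member identifier",
               lambda v: isinstance(v, int) and v > 0),
    ColumnSpec("amount_usd", "Billed amount in US dollars",
               lambda v: isinstance(v, (int, float)) and v >= 0),
    ColumnSpec("club", "Club location code",
               lambda v: isinstance(v, str) and v != ""),
]

Row = Dict[str, Any]


def split_rows(rows: List[Row], spec: List[ColumnSpec]) -> Tuple[List[Row], List[Tuple[Row, List[str]]]]:
    """Separate rows that pass every column check from rows that fail any."""
    passed: List[Row] = []
    rejected: List[Tuple[Row, List[str]]] = []
    for row in rows:
        # A missing column counts as a failure, same as a bad value.
        failures = [c.name for c in spec
                    if c.name not in row or not c.check(row[c.name])]
        if failures:
            rejected.append((row, failures))
        else:
            passed.append(row)
    return passed, rejected
```

Rejected rows come back with the names of the columns that failed, so a bad value is stopped and explained before an AI consumer can repeat it.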

"AI reads whatever is in the table," Kishan said. "If the numbers aren't right, it'll just repeat them as if they are." If a mid-market fitness brand with a lean data team is doing this work, the idea that it's optional at enterprise scale is difficult to defend.

What belongs where

The rule for the database layer isn't complicated. If it needs integrity, governance, or joins, it belongs in Postgres. In an AI-native system, that scope extends well past traditional transactional records to include agent state, task status, workflow steps, conversation metadata, and governance records. The acquisition spree tells the story faster than any architecture whitepaper. Snowflake spent $250 million on Crunchy Data, Databricks acquired Neon for roughly $1 billion, and as VentureBeat noted, PostgreSQL "will be more relevant than it has ever been before."

Specialized systems earn their place when the workload stops being relational: vector search, whether through the pgvector extension inside Postgres or a dedicated vector database like Pinecone or Milvus, for semantic recall; graph databases like Neo4j for relationships and reasoning paths; time-series databases for high-volume temporal signals. The temptation on day one is to split the stack early. Chandra's advice is to resist it.

Keep vector search inside Postgres using pgvector until retrieval scale actually forces the move. A separate vector database before you have the data to justify it adds infrastructure complexity and a synchronization surface that creates more problems than it solves. "Don't start with MongoDB because you think AI data is unstructured," Chandra said. "That can be a costly mistake."

The distinction that matters most, though, is between a vector database and a memory layer, which are not the same thing. Vector search retrieves documents; it does not reconstruct the decision history, institutional knowledge, or process lineage that constitute organizational memory. Chandra's practical test: put it in Postgres when the question is "is this true, approved, governed, or auditable?" Put it in specialized memory when the question is "can we retrieve, relate, simulate, search, or reason over this at scale?"
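Chandra's test reads naturally as a small decision function. The sketch below is one possible encoding, not his implementation. The names `DataNeeds` and `choose_layer` are hypothetical, and two choices are assumptions layered on top of his rule: truth-type needs win when both kinds of question apply, and Postgres is the default when neither does, in line with the advice not to split the stack early.

```python
from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    POSTGRES = "postgres"
    SPECIALIZED_MEMORY = "specialized_memory"


@dataclass(frozen=True)
class DataNeeds:
    # "Is this true, approved, governed, or auditable?"
    integrity: bool = False
    governance: bool = False
    joins: bool = False
    auditable: bool = False
    # "Can we retrieve, relate, simulate, search, or reason over this at scale?"
    semantic_retrieval: bool = False
    relationship_reasoning: bool = False


def choose_layer(needs: DataNeeds) -> Layer:
    """Pick a home for a piece of data based on the questions asked of it."""
    if needs.integrity or needs.governance or needs.joins or needs.auditable:
        return Layer.POSTGRES            # assumption: truth needs take priority
    if needs.semantic_retrieval or needs.relationship_reasoning:
        return Layer.SPECIALIZED_MEMORY
    return Layer.POSTGRES                # assumption: default to the system of record
```

Under this encoding, agent task status (governed, auditable) lands in Postgres, while a corpus that is only ever searched semantically lands in specialized memory.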

Kishan's team at Midtown pressure-tests that boundary every time a model goes to production. If the truth layer is wrong, the AI is wrong, and there's no human in the loop to catch it. "We pulled six months of finalized billing reports from our finance team. Then we asked the same questions to AI and compared the answers. If the numbers didn't match, we knew exactly where the problem was."
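The comparison Kishan describes amounts to a reconciliation check: finalized figures on one side, AI answers to the same questions on the other. A minimal sketch, assuming both sides can be reduced to a number per question. The function name `reconcile`, the keys, and the tolerance are hypothetical, and nothing here reflects Midtown's actual tooling.

```python
from typing import Dict, List, Optional, Tuple

Mismatch = Tuple[str, float, Optional[float]]


def reconcile(finalized: Dict[str, float],
              ai_answers: Dict[str, float],
              tolerance: float = 0.01) -> List[Mismatch]:
    """Return (question, expected, actual) for every answer that disagrees.

    A question the AI did not answer is reported with actual=None.
    """
    mismatches: List[Mismatch] = []
    for question, expected in finalized.items():
        actual = ai_answers.get(question)
        if actual is None or abs(actual - expected) > tolerance:
            mismatches.append((question, expected, actual))
    return mismatches
```

An empty result means the AI reproduced the finance team's numbers; anything else points at the specific question, and so the specific table or metric rule, that needs attention.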

The control plane extends beyond the database

Drawing the line is an architecture decision. Enforcing controls across it is where most teams stall. Governance is now a fast-growing line item, with Gartner projecting spending on AI governance platforms to reach $492 million in 2026 and surpass $1 billion by 2030.

Debasish Bhattacharjee is an engineering leader who has spent 22 years scaling AI systems at Fortune 500 organizations including Oracle, Broadcom (formerly CA Technologies), Macy's, IBM, and most recently SAP. At SAP, he shipped a production RAG chatbot that replaced tier-one support and saved $19 million annually. He has watched teams identify the governance gap between these layers and then do almost nothing to close it.

"Guardrails are scattered across all three layers right now, and that's the problem," Bhattacharjee told The Read Replica. "Prompt-level guardrails fail under pressure. Code-level is where real enforcement happens. The data layer is the maturity gap."

The pattern that holds: the agent calls a backend service that enforces RBAC and policy checks before returning data. Secrets stay in infrastructure vaults, never in the LLM context window. But most teams stop at logging. "Observability does not equal enforcement," Bhattacharjee said. "Logging a bad decision after it happens doesn't prevent it."
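The difference between logging and enforcement is easiest to see in code. In the sketch below, the backend service refuses before any data is returned, and the agent-facing wrapper turns that refusal into a structured result. Everything here is a hypothetical illustration: the roles, the permission strings, the in-memory data, and the function names.

```python
from typing import Dict, List, Set


class PermissionDenied(Exception):
    """Raised by the backend before any data leaves it."""


# Hypothetical role-to-permission mapping.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "finance_analyst": {"billing:read"},
    "front_desk": {"schedule:read"},
}

# Stand-in for a governed table.
_BILLING_ROWS: List[Dict[str, object]] = [{"invoice": "INV-1", "amount_usd": 120.0}]


def fetch_billing(role: str) -> List[Dict[str, object]]:
    """Backend service: enforce the policy check, then return data."""
    if "billing:read" not in ROLE_PERMISSIONS.get(role, set()):
        raise PermissionDenied(f"role '{role}' lacks billing:read")
    return [dict(row) for row in _BILLING_ROWS]


def agent_tool_call(role: str) -> Dict[str, object]:
    """What the agent sees: either rows or an explicit denial, never both."""
    try:
        rows = fetch_billing(role)
    except PermissionDenied as exc:
        return {"status": "denied", "rows": [], "reason": str(exc)}
    return {"status": "ok", "rows": rows, "reason": ""}
```

The denial happens in code the model cannot talk its way around, which is the distinction Bhattacharjee draws between prompt-level and code-level guardrails.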

Even when enforcement does reach the data layer, it introduces its own complications. At Midtown, Kishan's team has found that row-level governance creates a new problem for AI to solve. "When governance goes to row level, it becomes tougher for AI to read the data," he said. "If someone with less permissions starts asking questions they don't have permissions to, the AI should be robust enough to answer them saying you don't have permissions, and at the same time it should not be hallucinating."

Row-level security enforced at the database layer has to travel with the data through retrieval and into the context window. Infrastructure leaders at other enterprises have landed on read-only access as the only guardrail that consistently holds, and it is a version of the same principle: if enforcement doesn't live at the data layer, it doesn't hold under pressure. "This is not an LLM ops problem. It's an infrastructure problem most teams haven't tackled yet," said Bhattacharjee.
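One way to make permissions travel with the data is to carry them on every retrieved chunk and filter before the context window is assembled. The sketch below also counts what was withheld, so the agent can say "you don't have permission" instead of guessing, which is the behavior Kishan asks for. The types, the role names, and the three policy labels are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Chunk:
    text: str
    allowed_roles: FrozenSet[str]   # permission metadata carried from the data layer


@dataclass(frozen=True)
class ContextResult:
    context: List[str]   # text the model is allowed to see
    withheld: int        # number of chunks removed by the permission filter


def build_context(chunks: List[Chunk], role: str) -> ContextResult:
    """Drop every chunk the caller's role may not read before prompting."""
    visible = [c.text for c in chunks if role in c.allowed_roles]
    return ContextResult(context=visible, withheld=len(chunks) - len(visible))


def answer_policy(result: ContextResult) -> str:
    """Decide how the agent should respond given what survived filtering."""
    if not result.context and result.withheld > 0:
        return "refuse_no_permission"   # data exists, caller may not see it
    if not result.context:
        return "refuse_no_data"         # nothing relevant was retrieved
    return "answer_from_context"
```

Because restricted text never reaches the prompt, the model has nothing to leak, and the withheld count gives it a truthful reason for declining.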

Every AI decision needs a paper trail

Chandra's day-one schema for an AI-native system starts with Postgres as the primary system of record, object storage for large artifacts, pgvector for embeddings and chunks, and Redis only for session cache, rate limits, and job queues rather than as a memory of record. The most important table in that schema is not messages. It's agent_runs. "Every AI action must be traceable. Who asked, what context was retrieved, which model was used, which prompt version, which tools were called, what confidence score," said Chandra.
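A rough sketch of what an `agent_runs` table could look like, built from the fields Chandra lists. The column names and types are assumptions, not his schema, and SQLite stands in for Postgres only so the example runs anywhere without a server.

```python
import json
import sqlite3
from typing import List, Optional

# In-memory SQLite as a stand-in for Postgres (assumption for portability).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE agent_runs (
        run_id            INTEGER PRIMARY KEY,
        requested_by      TEXT NOT NULL,   -- who asked
        retrieved_context TEXT NOT NULL,   -- JSON list of context references
        model             TEXT NOT NULL,   -- which model was used
        prompt_version    TEXT NOT NULL,   -- which prompt version
        tools_called      TEXT NOT NULL,   -- JSON list of tool names
        confidence        REAL CHECK (confidence BETWEEN 0 AND 1),
        created_at        TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    )
""")


def record_run(db: sqlite3.Connection, requested_by: str,
               retrieved_context: List[str], model: str, prompt_version: str,
               tools_called: List[str], confidence: Optional[float]) -> int:
    """Insert one traceable agent action and return its run_id."""
    cur = db.execute(
        "INSERT INTO agent_runs (requested_by, retrieved_context, model, "
        "prompt_version, tools_called, confidence) VALUES (?, ?, ?, ?, ?, ?)",
        (requested_by, json.dumps(retrieved_context), model,
         prompt_version, json.dumps(tools_called), confidence),
    )
    db.commit()
    return cur.lastrowid


# Hypothetical values, for illustration only.
first_run = record_run(conn, "analyst@example.com", ["doc-1", "doc-2"],
                       "model-x", "v3", ["search", "sql_query"], 0.82)
```

In Postgres the two JSON columns would more naturally be `jsonb`, and the table would sit next to the governed records it references, so a six-month-old decision can be explained with an ordinary query.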

The traceability he's describing requires instrumentation that most teams don't have. Standard application performance monitoring was built for synchronous request-response pairs, and it captures almost none of what matters when an agent is selecting tools, assembling context, and making decisions across multiple layers. The AI-generated workloads already hitting production Postgres compound fast when there's no structured record of why an agent did what it did.

"You need to capture why the agent selected tool A over tool B at a decision point," Bhattacharjee said. "We logged every reasoning step: the thought, the action selection, and the observation, with timestamp and confidence score." He calls this "tool call genealogy": the full dependency chain where an agent called one tool, which triggered a second asynchronously, which failed and triggered a retry of a third. That's a directed acyclic graph, not a trace.

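A minimal sketch of what "tool call genealogy" might look like as data: each call records which earlier calls triggered it, so the chain Bhattacharjee describes can be walked backward from any node. The `ToolCall` fields and the `ancestors` helper are hypothetical, not his team's logging format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class ToolCall:
    call_id: str
    tool: str
    status: str                                            # "ok" or "failed"
    triggered_by: List[str] = field(default_factory=list)  # parent call_ids
    thought: str = ""                                      # why this tool was chosen
    confidence: Optional[float] = None


def ancestors(calls: Dict[str, ToolCall], call_id: str) -> List[str]:
    """Walk parent links breadth-first; return ancestor ids, nearest first."""
    seen: Set[str] = set()
    order: List[str] = []
    queue = list(calls[call_id].triggered_by)
    while queue:
        parent = queue.pop(0)
        if parent in seen:
            continue          # a node with several children is visited once
        seen.add(parent)
        order.append(parent)
        queue.extend(calls[parent].triggered_by)
    return order


# The chain from the quote: one call triggers a second, which fails
# and triggers a retry through a third.
calls = {
    "c1": ToolCall("c1", "tool_a", "ok", thought="start with a lookup"),
    "c2": ToolCall("c2", "tool_b", "failed", triggered_by=["c1"]),
    "c3": ToolCall("c3", "tool_c", "ok", triggered_by=["c2"]),
}
lineage_of_retry = ancestors(calls, "c3")
```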
"An agent might call five tools to answer one user question. You need to know the cost per decision, not per tool invocation."

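Rolling cost up to the decision rather than the invocation is a grouping step, provided every tool invocation is tagged with the decision it served. A small sketch under that assumption; the `Invocation` fields, the integer micro-dollar unit, and the figures are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Invocation:
    decision_id: str      # the user question or decision this call served
    tool: str
    cost_micro_usd: int   # integer micro-dollars avoid float rounding drift


def cost_per_decision(invocations: Iterable[Invocation]) -> Dict[str, int]:
    """Sum invocation costs by the decision they belong to."""
    totals: Dict[str, int] = defaultdict(int)
    for inv in invocations:
        totals[inv.decision_id] += inv.cost_micro_usd
    return dict(totals)


# Five tool calls answer one question; a second question needs only one.
invocations = [
    Invocation("d1", "search", 1200),
    Invocation("d1", "sql_query", 300),
    Invocation("d1", "graph_lookup", 450),
    Invocation("d1", "search", 800),
    Invocation("d1", "summarize", 250),
    Invocation("d2", "sql_query", 500),
]
totals = cost_per_decision(invocations)
```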
Models come and go. The data layer carries forward.

"Organizations think they're buying AI," said Chandra. "In reality, they're building memory systems. The long-term value is decisions, expertise, institutional knowledge, and process history."

The line between the database layer and the memory layer is getting drawn whether teams are deliberate about it or not. Every agent deployed without that distinction made clearly adds to an architectural debt that compounds with each decision the system makes and cannot explain.

