Enterprise Innovators

April 10, 2026

The Traditional OLTP Playbook Breaks RAG. Here's What Replaces It.

AI/ML engineering leader Debasish Bhattacharjee sat down with The Read Replica to discuss why production RAG succeeds or fails at the data layer, and why the hardest calls around lineage and observability are pulling directors back into the codebase.

Credit: The Read Replica

You can't make good architecture decisions for a RAG pipeline from a spreadsheet.

Debasish Bhattacharjee

Engineering Leader

AI/ML Advisor

Enterprise teams are consolidating onto Postgres for AI workloads at a pace the industry hasn't seen for any database category. Snowflake spent $250 million on Crunchy Data, Databricks picked up Neon for roughly $1 billion, and database startups are raising fresh rounds as newer players expand. The teams building AI-native pipelines are reaching for the database they already know, but there's a problem with that instinct.

The schema design principles that make Postgres reliable for transactional workloads, like its ruthless normalization, minimal storage footprint, and enforced consistency, sit in direct tension with the retrieval performance that has become table stakes for AI. Teams trained on OLTP that ship a RAG system on Postgres without scoping the current landscape of vector options can easily drown in the mixed signals coming out of vendor docs.

Debasish Bhattacharjee is an AI/ML expert with 22 years of experience scaling systems across Fortune 500 organizations including Oracle, IBM, Broadcom, and SAP, alongside independent advisory work. In a recent enterprise initiative, he led a 12-person team that shipped a production RAG chatbot in four months, replacing tier-one support entirely and saving the company $19 million a year. He stayed 50% hands-on through the build, blocking four-hour, no-meeting chunks to write code and sitting in on every architecture review, because retrieval pipeline design is nearly impossible to reverse once shipped into a mature product.

"You can't make good architecture decisions for a RAG pipeline from a spreadsheet," Bhattacharjee told The Read Replica in a recent interview. "You have to understand retrieval, data quality, the tradeoffs." Tools like Claude Code have collapsed the context-switching cost that used to make that impractical for directors who can now jump into a PR, understand the context in thirty seconds, and jump out.

RAG is not dead

The late-2025 "RAG is dead" commentary in forums was likely a byproduct of the AI-induced Stack Overflow diaspora. Tactical commentary gave way to armchair philosophizing. Yes, retrieval pipelines are evolving with GraphRAG, agentic memory architectures, and longer context windows, but the data-layer problems Bhattacharjee spent four months solving carry over to every one of them. Lineage, versioning, low-latency retrieval, and systematic human evaluation are the difference between a system that gets debugged in minutes and one that hallucinates silently in production.

The deeper lesson is a schema-design principle that enterprise teams trained on OLTP are going to keep relearning: traditional database design minimizes storage and enforces consistency, while AI workloads minimize query latency and maximize observability. These are borderline opposite goals. Postgres can serve both. But only if you stop treating your vector store like a black box and start treating it like a system that needs the same observability, versioning, and lineage discipline you'd demand from any production data pipeline.

"A RAG system doesn't fail because of the model," Bhattacharjee said. "It fails because the data layer wasn't designed for retrieval, versioning, and observability from day one."

The invisible month

Bhattacharjee laid out a timeline for his own implementation.

Month one was entirely data work: cleaning, deduplicating, and reconciling five years of outdated support documentation. Month two was pipeline architecture: chunking, embedding, retrieval, and reranking decisions. Month three was integration with live support: routing logic, human fallback, confidence thresholds. Month four was hardening, load testing, and security.

"Nobody warns you about that first month," Bhattacharjee said. "Everyone talks about model selection and prompt engineering, but the data quality piece is what made the $19 million possible. That's invisible in the timeline."

He's not alone in that assessment. HBR analysis published in May 2025 reaches the same conclusion at the industry level, identifying unstructured data quality as the primary bottleneck for enterprise gen AI value. Bain research found that AI pilots often succeed because they run on manually cleaned, offline datasets, and then stall when they hit the messy reality of production data. The pattern Bhattacharjee described isn't an edge case. It's the norm that most teams discover too late.

Picking Postgres and moving on

The team skipped the vendor evaluation phase entirely. They chose pgvector on Postgres, built a baseline in two weeks, and iterated on quality from there.
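
A baseline like that stays small. As an illustrative sketch rather than the team's actual schema (table and column names are hypothetical, and the 1536-dimension embedding assumes an OpenAI-style model), a pgvector starting point needs little more than this:

-- Enable pgvector (available on most managed Postgres offerings).
CREATE EXTENSION IF NOT EXISTS vector;

-- Hypothetical chunk table: one row per document chunk.
CREATE TABLE doc_chunks (
    id        bigserial PRIMARY KEY,
    content   text NOT NULL,
    embedding vector(1536) NOT NULL
);

-- Nearest-neighbor retrieval; <=> is pgvector's cosine distance operator.
SELECT id, content
FROM doc_chunks
ORDER BY embedding <=> $1
LIMIT 10;

With no index at all, that SELECT is an exact sequential scan, which is often acceptable for a two-week baseline and gives a ground-truth recall reference before any approximate index is added.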

"Tooling was not the bottleneck," Bhattacharjee said. "The speed came from organizational design. A 12-engineer team, two-week sprints, one technical leader making fast architecture calls, and zero committee approval."

The performance data backs the call. Tiger Data benchmarks show Postgres with pgvectorscale hitting 99 percent recall on 50 million vectors with more throughput than Qdrant in the same configuration. For a team that needed to move fast, pgvector on a database they already understood was the lowest-risk starting point.

But Bhattacharjee's experience also surfaces where that bet gets tested.

Where one index strategy breaks

Under a million vectors, pgvector's IVFFlat index worked fine for the team without much tuning. His RAG chatbot was different: millions of document chunks across multiple product lines. Bhattacharjee faced a choice between tighter recall with aggressive reindexing, which creates production friction every few weeks, and faster queries with fewer reindex cycles and slightly looser recall on edge cases.

He solved it with a hybrid approach. IVFFlat handled the first approximate search in roughly 100 milliseconds. A cross-encoder reranker caught whatever the first pass missed. End-to-end latency stayed under three seconds at high recall.
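
In pgvector terms, that tradeoff lives in two knobs: the number of IVFFlat lists fixed at index build time, and the number of probes scanned at query time. The values below are illustrative, not his team's settings:

-- Build the approximate index; more lists means finer partitioning
-- but more expensive periodic reindexing as the data drifts.
CREATE INDEX ON doc_chunks
    USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 1000);

-- Probing more lists at query time raises recall at the cost of latency.
SET ivfflat.probes = 20;

-- Fast approximate first pass; a cross-encoder reranks these candidates downstream.
SELECT id, content
FROM doc_chunks
ORDER BY embedding <=> $1
LIMIT 50;

Over-fetching candidates in the first pass and letting the reranker trim them is what lets a slightly looser approximate index still land at high end-to-end recall.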

"The honest call," Bhattacharjee said, "is that once you pass a few million vectors with frequent reindexing, a dedicated vector store earns operational isolation. You don't want vector search spikes tanking your transactional database." His advice: start with pgvector, but have the intellectual honesty to recognize when you've outgrown it.

Flip the OLTP playbook

Bhattacharjee's RAG schema inverted the traditional Postgres OLTP playbook. "We denormalized aggressively," he said. "Embedding metadata, chunk boundaries, and source document versions, all in the same row, in a JSONB column. One query instead of five joins. You pay a bit more on write consistency, but you gain massive read throughput and simpler indexing."
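
A minimal sketch of that shape, with hypothetical names rather than the production schema:

-- Denormalized layout: retrieval metadata rides in the same row as the vector.
CREATE TABLE rag_chunks (
    id        bigserial PRIMARY KEY,
    content   text NOT NULL,
    embedding vector(1536) NOT NULL,
    meta      jsonb NOT NULL  -- source doc version, chunk boundaries, embedding model
);

-- One query, zero joins: filter on metadata and rank by similarity in a single pass.
SELECT id, content, meta->>'source_version' AS source_version
FROM rag_chunks
WHERE meta @> '{"product_line": "billing"}'
ORDER BY embedding <=> $1
LIMIT 10;

A GIN index on the JSONB column keeps those metadata filters cheap, which is part of the "simpler indexing" payoff he describes.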

The second inversion is temporal. OLTP tracks transactions; AI workloads track lineage and provenance. Bhattacharjee added timestamp columns everywhere: when embeddings were generated, when documents were chunked, when the embedding model changed.

"That reindexing strategy only works if you can ask which vectors are stale. That requires denormalized temporal metadata baked into the schema from day one, not bolted on later." And that carried through to rigor in testing.

"There is no unit test for 'was that helpful?'" Bhattacharjee said. "We built human evaluation rubrics. Support agents graded chatbot responses weekly. That feedback loop was the single most important thing we shipped."

The implication for anyone designing a Postgres schema for AI workloads is concrete: your instincts are likely outdated. The normalization discipline that makes OLTP systems reliable will make your retrieval system slow, opaque, and hard to debug.

Versioning the hard way

The team learned embedding versioning through production pain. They would update a source document or re-embed with a newer model, but the vectors didn't carry information about which version they represented. Stale chunks sat alongside fresh ones. When a support agent flagged a bad answer, the engineers couldn't trace it back to the document version or embedding model that produced it.

The fix had three layers. First, metadata tagging: every chunk stored a document version identifier and an embedding model version in its metadata. Second, gradual rollout: when they switched embedding models, they ran old and new indexes in parallel for several weeks before pruning. Third, an audit trail that logged which version of the knowledge base each query hit, so a bad answer from three weeks ago could be replayed against the state of the system at that time.
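
The third layer, the audit trail, is the one teams most often skip. A hypothetical shape for it:

-- Illustrative audit log: enough context to replay a query against
-- the knowledge-base state that produced its answer.
CREATE TABLE query_audit (
    id              bigserial PRIMARY KEY,
    question        text NOT NULL,
    kb_version      text NOT NULL,      -- knowledge-base snapshot identifier
    embedding_model text NOT NULL,      -- model version in use at query time
    chunk_ids       bigint[] NOT NULL,  -- chunks the retriever returned
    answered_at     timestamptz NOT NULL DEFAULT now()
);

With that row, the bad answer from three weeks ago stops being a mystery: you know which snapshot, which model, and which chunks were in play.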

Platforms are starting to build this kind of lineage natively, with automatic embedding generation that handles versioning at the infrastructure level, so teams don't have to learn the lesson the way Bhattacharjee's team did.