
What Happens To Agent Memory When You Swap The Model? Two Practitioners Are Building The Answer In The Database Layer.

A startup founder running three databases for a single agent and a Snap engineer burning 5 billion tokens a day on an AI RPG are converging on the same conclusion: the database layer is the only part of the stack that survives a model swap.

Credit: The Read Replica



The model is the most hyped, and most disposable, component in the stack. It gets deprecated, swapped for a cheaper alternative, or replaced by the next generation every few months. The memory layer doesn't. The context graphs, the user state, and the knowledge that gives an agent its identity across sessions all live in the database. And unlike the model, that layer compounds.

Joshua Sum is the Founder and CEO of Morphic, a company building enterprise software that adapts to how each user works rather than forcing users to adapt to it. That product vision puts him at the center of a question most AI teams are still avoiding: what database architecture do you build when the agent's memory is the product? Sum's stack runs on Supabase, MongoDB, and HydroDB simultaneously, and the combination is deliberate. "Piecing all these together is actually what delivers value to the end user," he told The Read Replica.

The harness outlasts the model

Sum's framing starts with a provocation backed by benchmarks. "You give the same model a different harness, you can see upwards of 40 to 50 percent improvements across all the benchmarks," he said. The harness, in his definition, is everything around the model: retrieval, tool selection, memory management, context compaction, execution environments. The model handles reasoning. The harness handles everything else.

That framing inverts the conventional assumption about where competitive advantage lives. "People aren't buying the best model or the best partners," Sum said. "They're buying the best outcomes." And outcomes, in his experience, depend more on the infrastructure wrapping around the model than on which model you choose. "If you have the right harness and the right memory layer, even older models or free open-source models would vastly outperform the most intelligent models if you gave them a very bad harness."

The deeper implication is about durability. Models get swapped; the harness persists. And the most durable part of the harness is the memory layer, because it captures something models can't: tribal knowledge. "If you want to actually capture what people are thinking, how the processes work, the only way is at the harness level. Even if you change the model, you keep all that knowledge in the memory," said Sum.
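To make the durability point concrete, here is a minimal sketch of a harness whose memory outlives a model swap. Everything in it is hypothetical: the `Harness` class, the two stand-in model functions, and the in-process list that plays the role of a database-backed store are illustrations, not Morphic's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# A "model" is reduced to a function from prompt text to reply text.
# Real model clients are far richer; this is an illustrative stand-in.
Model = Callable[[str], str]


@dataclass
class Harness:
    model: Model
    # Stand-in for a database-backed memory layer (assumption: in production
    # this would live outside the process, in a real store).
    memory: List[str] = field(default_factory=list)

    def remember(self, fact: str) -> None:
        self.memory.append(fact)

    def ask(self, question: str) -> str:
        # The harness, not the model, decides which context is supplied.
        context = "\n".join(self.memory)
        return self.model(f"{context}\n\nQ: {question}")

    def swap_model(self, new_model: Model) -> None:
        # Only the reasoning component changes; memory is untouched.
        self.model = new_model


def model_a(prompt: str) -> str:
    return f"[model-a] saw {len(prompt.splitlines())} lines"


def model_b(prompt: str) -> str:
    return f"[model-b] saw {len(prompt.splitlines())} lines"


harness = Harness(model=model_a)
harness.remember("Invoices are approved by finance on Fridays.")
harness.remember("The ops team prefers CSV exports.")

before = harness.ask("Who approves invoices?")
harness.swap_model(model_b)
after = harness.ask("Who approves invoices?")
```

Both calls receive the same context, because the accumulated facts belong to the harness rather than to whichever model happens to be plugged in.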

Why one database isn't enough

Sum didn't land on three databases by accident. Each one solves a different category of state, and the categories don't collapse into each other.

Structured operational data goes to Supabase, which gives Morphic a full Postgres instance. When the workload is well-defined, relational, and needs to be optimized for cost, speed, and regional availability, Postgres is the natural fit. "If you're doing something very structured, something like a Postgres would work," Sum said.

Flexible application state goes to MongoDB. When agents morph to each user's workflow, the data they accumulate doesn't fit a fixed schema. Preferences, tool configurations, and interaction patterns vary per person, and the shape of that data changes as the agent learns. "You need something more flexible where your product and your agents morph to the user," Sum said. NoSQL handles that more naturally than a relational model trying to anticipate every variation upfront.
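The contrast between the first two categories can be sketched in a few lines. This is an illustration under stated assumptions, not Morphic's schema: `sqlite3` stands in for Postgres so the example runs anywhere, a plain dictionary stands in for a document store, and the table, field, and user names are invented.

```python
import json
import sqlite3

# Structured operational data: a fixed schema, enforced by the database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices ("
    " id INTEGER PRIMARY KEY,"
    " customer TEXT NOT NULL,"
    " amount_cents INTEGER NOT NULL)"
)
conn.execute(
    "INSERT INTO invoices (customer, amount_cents) VALUES (?, ?)",
    ("acme", 12500),
)
total = conn.execute(
    "SELECT SUM(amount_cents) FROM invoices WHERE customer = ?",
    ("acme",),
).fetchone()[0]

# Flexible application state: each user's document can take a different shape.
user_state = {
    "u1": {"preferred_tools": ["calendar"], "tone": "brief"},
    "u2": {"shortcuts": {"weekly_report": ["export", "email"]}},
}

# The agent learns something new about u1. No migration is needed, and
# u2's document is unaffected.
user_state["u1"]["review_day"] = "friday"
serialized = json.dumps(user_state["u1"], sort_keys=True)
```

The relational side rejects rows that do not match the declared columns, while the document side lets each user's record grow fields independently.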

The third layer is where Sum sees the most architectural novelty. HydroDB stores context as a versioned graph that preserves every state transition, connecting entities through relationships rather than rows. "Context graphs and context databases are useful for retrieval, doing things that are more logical instead of cosine similarity," Sum said. "You're finding logical understanding of people's ontologies and then searching through it faster."

The temporal element is what makes it more than a retrieval layer. Sum tracks how the graph evolves over time, so agents can look back and see what the state of a company or a user was at any given point. That's the kind of continuity that gives an agent institutional memory across sessions and across model swaps.
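One way to picture that temporal behavior is an append-only log of relationship versions with an "as of" lookup. The sketch below is an assumption-laden illustration, not HydroDB's data model or API: `EdgeVersion`, `VersionedGraph`, and the integer logical timestamps are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class EdgeVersion:
    subject: str
    relation: str
    obj: str
    valid_from: int  # logical timestamp at which this fact became true


class VersionedGraph:
    def __init__(self) -> None:
        # Append-only: earlier versions are never overwritten or deleted.
        self._log: List[EdgeVersion] = []

    def assert_edge(self, subject: str, relation: str, obj: str, at: int) -> None:
        self._log.append(EdgeVersion(subject, relation, obj, at))

    def as_of(self, subject: str, relation: str, at: int) -> Optional[str]:
        # Return the most recent version that was already valid at time `at`.
        candidates = [
            e for e in self._log
            if e.subject == subject and e.relation == relation and e.valid_from <= at
        ]
        if not candidates:
            return None
        return max(candidates, key=lambda e: e.valid_from).obj


graph = VersionedGraph()
graph.assert_edge("alice", "reports_to", "bob", at=1)
graph.assert_edge("alice", "reports_to", "carol", at=5)

earlier = graph.as_of("alice", "reports_to", at=3)
latest = graph.as_of("alice", "reports_to", at=7)
unknown = graph.as_of("alice", "reports_to", at=0)
```

Because nothing is overwritten, a question about the past is answered from the same log as a question about the present, whichever model is asking.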

Five billion tokens a day, and the model still forgets

Jeffrey Lee-Chan is Head of Vibe Coding at Snap, where he's built multi-agent code review pipelines that run without human oversight. But the project that taught him the most about agent memory isn't his work at a multibillion-dollar public company. It's an AI-powered role-playing game he builds on his own time, and it consumes roughly 5 billion tokens a day.

"The AI RPG is totally different," Lee-Chan said. "The model is responsible for creating a world that seems realistic and it doesn't forget who your uncle is." Every session has to pick up where the last one left off. Characters, relationships, events, decisions, all of it has to persist. And the model can't hold all of it in the context window. "You have a finite context space. You have context rot. You need to do a search somehow."

That volume forced Lee-Chan to think about agent memory as a cost engineering problem. The solutions he found changed how he thinks about state management entirely.

The past is immutable. That's a caching strategy.

History doesn't change. So Lee-Chan stopped recomputing it. In his RPG, events that already happened and decisions already made never get rewritten, which means they can be cached. "Most people aren't going to change the past in the game," he said. "So all of those things can stay as cache tokens." By treating settled state as cache-friendly and only recomputing what's actively evolving, he cut costs significantly without losing any fidelity in the agent's responses.

The same principle applies well beyond games. Any agent system has a mix of settled state (completed actions, confirmed decisions, historical context) and active state (current reasoning, in-progress tasks, live user interaction). Treating those differently at the data layer is the difference between a system that scales affordably and one that recomputes everything on every request.
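The settled-versus-active split can be sketched as prompt assembly that keeps immutable history in a stable prefix and puts only the live turn at the end. This assumes a provider whose prompt cache reuses exactly matching prefixes; caching rules differ by provider, and the function name, separator, and sample events here are hypothetical rather than taken from Lee-Chan's game.

```python
import os
from typing import List


def build_prompt(settled_events: List[str], active_turn: str) -> str:
    # Settled history goes first, in a fixed order, so its text is identical
    # from one request to the next. Only the tail changes.
    prefix = "\n".join(settled_events)
    return f"{prefix}\n---\n{active_turn}"


history = [
    "Day 1: You met your uncle Tobias.",
    "Day 2: You sold the mill.",
]

p1 = build_prompt(history, "Player: I visit the market.")
p2 = build_prompt(history, "Player: I ask about my uncle.")

# Length of the text the two requests have in common, character by character.
shared = len(os.path.commonprefix([p1, p2]))

# When a new event settles, it is appended. The earlier prefix is extended,
# never rewritten, so previously cached text still matches.
history_next = history + ["Day 3: You visited the market."]
p3 = build_prompt(history_next, "Player: I head home.")
old_prefix = "\n".join(history)
```

The design choice that matters is ordering: anything that might change belongs after everything that cannot, since a single edit early in the prompt would invalidate the shared prefix for all the text that follows it.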

Lee-Chan also adopted RTK, a CLI proxy that compresses command outputs before they reach the context window, and saw his token usage drop roughly in half. "Maybe I'll pay an hour of maintenance every week, but that's totally worth it for most people," he said.
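The general idea of shrinking command output before it reaches the context window can be illustrated with two simple heuristics: collapsing repeated lines and eliding the middle of long output. This is a generic sketch, not RTK's algorithm or interface; the function name, the `max_lines` parameter, and the marker formats are assumptions.

```python
from itertools import groupby
from typing import List


def compress_output(text: str, max_lines: int = 6) -> str:
    if max_lines < 2:
        raise ValueError("max_lines must be at least 2")

    # Step 1: collapse runs of identical consecutive lines into one line
    # annotated with a repeat count.
    collapsed: List[str] = []
    for line, run in groupby(text.splitlines()):
        count = sum(1 for _ in run)
        collapsed.append(line if count == 1 else f"{line}  (x{count})")

    if len(collapsed) <= max_lines:
        return "\n".join(collapsed)

    # Step 2: keep the head and tail, and replace the middle with a marker.
    head = max_lines // 2
    tail = max_lines - head
    omitted = len(collapsed) - max_lines
    kept = (
        collapsed[:head]
        + [f"... {omitted} lines omitted ..."]
        + collapsed[len(collapsed) - tail:]
    )
    return "\n".join(kept)


noisy = "\n".join(["warning: retrying"] * 50 + ["ok"])
short = compress_output(noisy)

long_log = "\n".join(f"line {i}" for i in range(10))
trimmed = compress_output(long_log, max_lines=4)
```

Heuristics like these are lossy, which is where the maintenance Lee-Chan mentions tends to come from: the rules need tuning so they never drop the one line the agent actually needed.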

Personal stacks lead enterprise

The pattern Lee-Chan describes isn't isolated to side projects. He built an MCP server at Snap that connects to roughly 30 internal services, inspired by the personal tooling he'd already proven on his own time. "Instead of going to a website and copying and pasting something into my Claude Code, it can already talk to 30 different things at Snap," he said.

The idea didn't come from an enterprise directive, and he sees this as a structural pattern. "The personal stuff will always be ahead of enterprise for the power users. And then you figure out how to bring it into your workplace, with more restrictions." The emerging indie stack he describes is a central AI agent controlling a fleet of background workers. "I think you could have shareable concepts, and they might not even be code," he said. "It's a concept you share around."

That bottom-up adoption is what turns experimental tooling into production infrastructure. And the teams that build their database layer to support persistent agent state early are the ones who won't have to rebuild it later.

The durable layer

Sum and Lee-Chan arrived at the same conclusion from different starting points. Sum designed a multi-database architecture where each store handles a different type of agent state. Lee-Chan learned, through the daily pressure of 5 billion tokens and a world that has to remember everything, that the data layer is where persistence either works or doesn't.

The model is a component you swap. The memory layer is what gives the agent its identity. "Even if you change the model, you keep all that knowledge in the memory," Sum said. The teams building that layer deliberately, choosing the right stores for the right types of state and designing for persistence from day one, are building the only part of the AI stack that survives the next model cycle.

