Postgres' Logical Replication Stream Gets A Second Life As A Cache Invalidation Engine

Two co-founders building a Postgres caching proxy discovered that logical replication is a better invalidation mechanism than anything the caching ecosystem has come up with on its own. The hard part was teaching the system to understand SQL.

"The value is: don't serve stale data, and don't make the user have to do anything." (James Nelson, Co-Founder & CTO, PgCache)

Logical replication landed in PostgreSQL 10 and the ecosystem immediately knew what to do with it: read replicas. The feature gave teams a way to subscribe to row-level changes on specific tables without copying the entire WAL byte-for-byte, and the primary use case was obvious. Spin up a subscriber, point reads at it, free up the primary for writes.

Nearly a decade later, that framing has calcified. Most Postgres teams encounter logical replication in essentially one context: standing up a replica, whether for read scaling or for a major version upgrade. The official documentation lists half a dozen other applications, from cross-platform replication to database consolidation, but the mental model has locked in. Logical replication means read replicas.

But there's a more interesting application that almost nobody is building for: using the logical replication stream as an automatic cache invalidation mechanism.

The engineer who wanted to cache from the edge

James Nelson is the technical co-founder of PgCache, a Postgres caching proxy that uses logical replication to keep cached query results in sync with the origin database. His co-founder, Philip Johnston, handles the product and go-to-market side. The two are building what they call a "smart read replica," a term they landed on after the caching explanation kept losing people. PgCache is early. It's source-available under the Elastic License v2, and production deployments are minimal so far. But the engineering question it's built on stands regardless of where the product is.

It all started with a latency problem. Nelson had been working at a digital asset management company that served assets from the edge but still routed every database query back to a single origin in New York. "I was like, 'hey, I want to serve 100% of this from the edge. There's got to be caching tools where I could stick it on the edge and keep it up to date,'" he told The Read Replica. "But it didn't exist."

He'd had his eye on logical replication, and one question kept nagging: what if you could build a cache that consumed the change stream and used it to invalidate or update cached results automatically, without any application-side code? No TTLs to guess at, no hand-rolled invalidation logic. The database would tell the cache what changed, and the cache would figure out which results were affected. The idea stayed with him until he connected with Johnston and decided the problem was worth solving properly.

Caching has a cognitive problem

The reason logical replication got pigeonholed into the read replica use case is partly technical and partly psychological. On the technical side, the feature was designed to replicate table data between Postgres instances, and read replicas are the most natural consumer. On the psychological side, caching has become what Johnston called "a cognitive monolith."

"With Redis and memcached and all these key-value store approaches to caching, where you're setting time-to-live and doing all this stuff as ways of managing invalidation, caching has become really heavy for people to think about," Johnston said. The traditional approach forces developers through a gauntlet of decisions before they write a single line of cache logic: what to cache, how to key it, when to expire it, and how to handle the inevitable stale reads when the invalidation logic misses an edge case.

The mechanics of that gauntlet are familiar to anyone who's tried it. You pick a TTL, but TTLs are a guess: too short and you're hammering the database on every expiry cycle, too long and you're serving stale data. You can write through the cache on every update instead, but then you need to track which cached keys a given row touches, and that mapping gets complicated the moment you're caching join results or aggregations. A user updates their email, and now you have to remember every cached query that included that user's profile. Miss one and the application keeps serving the old value with no error and nothing in the logs to flag the inconsistency.
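
To make that failure mode concrete, here is a minimal Python sketch of the hand-rolled approach, assuming an in-memory TTL cache and a hypothetical `KEYS_FOR_USER` map that a developer keeps in sync by hand. The key names (`user_profile:42`, `team_roster:7`) and the `update_email` helper are illustrative, not taken from any real codebase.

```python
import time


class TTLCache:
    """Minimal TTL cache: each entry expires a fixed number of seconds after it is set."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the example can fake the passage of time
        self.entries = {}  # key -> (value, expires_at)

    def get(self, key):
        hit = self.entries.get(key)
        if hit is None:
            return None
        value, expires_at = hit
        if self.clock() >= expires_at:
            del self.entries[key]  # expired: the caller has to go back to the database
            return None
        return value

    def set(self, key, value):
        self.entries[key] = (value, self.clock() + self.ttl)

    def delete(self, key):
        self.entries.pop(key, None)


# Hypothetical hand-maintained map from a user row to every cache key that embeds it.
# Each new cached query that includes user data must be registered here by hand.
KEYS_FOR_USER = {
    42: ["user_profile:42"],  # "team_roster:7" also includes user 42 but was never added
}


def update_email(db, cache, user_id, email):
    """Write to the 'database', then invalidate only the keys we remembered to track."""
    db["users"][user_id]["email"] = email
    for key in KEYS_FOR_USER.get(user_id, []):
        cache.delete(key)


if __name__ == "__main__":
    now = [0.0]
    cache = TTLCache(ttl_seconds=300, clock=lambda: now[0])
    db = {"users": {42: {"email": "old@example.com"}}}
    cache.set("user_profile:42", {"email": "old@example.com"})
    cache.set("team_roster:7", [{"id": 42, "email": "old@example.com"}])

    update_email(db, cache, 42, "new@example.com")
    print(cache.get("user_profile:42"))  # None: invalidated, next read hits the database
    print(cache.get("team_roster:7"))    # still shows old@example.com, silently stale
    now[0] = 301.0
    print(cache.get("team_roster:7"))    # None: only the TTL finally evicts it
```

The profile key is cleared on write, but the roster entry keeps returning the old email until the 300-second TTL lapses, and nothing in the code signals the miss.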

Nelson has been on the other side of that calculus. "In my career I've been like, 'yeah, it'd be nice if I had caching. I don't want to spend two weeks figuring this out. I don't know what the invalidation scenario is,'" he said. "'Caching would be great, but it's not going to work.'" The cost-benefit math often kills the project before it starts. Teams look at the engineering time required to implement and maintain cache invalidation logic, weigh it against the read pressure they're dealing with, and decide to just throw more hardware at the database instead.

Phil Karlton's famous line about cache invalidation being one of the two hard problems in computer science has persisted for decades because the fundamental challenge hasn't changed. When a row changes in the database, figuring out which cached results that row touches is genuinely difficult, and the difficulty scales with query complexity.

Nelson's initial instinct was that logical replication could eliminate invalidation entirely. "My original first hope was, 'hey, we're never going to have to invalidate anything,'" he said. "That ran into a pretty big stone wall pretty quickly."

But the replication stream gave him something more nuanced. "The value is: don't serve stale data, and don't make the user have to do anything," Nelson said. The goal isn't keeping every cached result perfectly fresh. It's failing safe: the cache either serves data it knows is correct or gets out of the way and forwards the query to the origin.
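
That rule (serve only what the cache can vouch for, forward everything else) fits in a few lines. This is a minimal Python illustration, not PgCache's code: `SafeCache`, `Entry`, and the `cacheable` flag are hypothetical names, and `origin` stands in for whatever function actually runs SQL against the origin database.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Entry:
    rows: List[tuple]
    valid: bool = True  # set to False when a change event may have affected this result


class SafeCache:
    """Serve a cached result only when it is known to be current; otherwise forward."""

    def __init__(self, origin: Callable[[str], List[tuple]]):
        self.origin = origin  # stand-in for a real call to the origin database
        self.entries: Dict[str, Entry] = {}

    def query(self, sql: str, cacheable: bool = True) -> List[tuple]:
        entry = self.entries.get(sql)
        if cacheable and entry is not None and entry.valid:
            return entry.rows  # a cached result the cache can vouch for
        rows = self.origin(sql)  # unsupported or possibly stale: get out of the way
        if cacheable:
            self.entries[sql] = Entry(rows)
        return rows

    def invalidate(self, sql: str) -> None:
        """Called when the change stream shows this result may no longer be correct."""
        entry = self.entries.get(sql)
        if entry is not None:
            entry.valid = False
```

The important property is the default: anything the cache is unsure about goes to the origin. A production version would also have to handle a change event arriving while an origin fetch for the same query is still in flight, which this sketch ignores.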

The SQL parsing wall

The hardest engineering problem in the system has nothing to do with replication and everything to do with SQL comprehension. When a row changes on the origin database and the change arrives on the replication stream, the cache needs to determine which cached query results are affected. For a single-table query with a simple WHERE clause, that's tractable. For joins, it gets hard fast.
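
For the tractable case, the check can be sketched in Python. This assumes change events have already been decoded from the stream into `Change` objects (a hypothetical shape, not a real decoder API), and that a cached query is a single-table `SELECT` whose `WHERE` clause is a conjunction of equality tests. It also leans conservative where the stream is silent: with Postgres's default replica identity, an `UPDATE` typically arrives without the old row and a `DELETE` carries only the primary-key columns (`REPLICA IDENTITY FULL` sends the whole old row), so missing before-image values are treated as a possible match.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CachedQuery:
    """A cached single-table query, e.g.
    SELECT * FROM orders WHERE customer_id = 7 AND status = 'open'."""
    table: str
    equals: Tuple[Tuple[str, object], ...]  # (column, value) pairs, ANDed together


@dataclass
class Change:
    """A decoded row change (hypothetical shape)."""
    op: str              # "INSERT", "UPDATE", or "DELETE"
    table: str
    old: Optional[dict]  # before-image; None for INSERT or when the stream omits it
    new: Optional[dict]  # after-image; None for DELETE


def _may_match(query: CachedQuery, row: dict) -> bool:
    # A column absent from the row (e.g. a key-only old tuple) counts as
    # "could match", so gaps in the stream push us toward invalidating.
    return all(col not in row or row[col] == val for col, val in query.equals)


def affected(query: CachedQuery, change: Change) -> bool:
    """True if the change could alter the cached result set."""
    if change.table != query.table:
        return False
    if change.op == "UPDATE" and change.old is None:
        return True  # no before-image: the row may have just left the result set
    was_in = change.old is not None and _may_match(query, change.old)
    is_in = change.new is not None and _may_match(query, change.new)
    return was_in or is_in
```

A row that stops matching (say `status` flips from `'open'` to `'closed'`) still counts, because it has to be removed from the cached result. Once a second table enters the query, a change on either side can add or remove joined rows, and a per-row predicate test like this one no longer suffices.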

"The first test was feasibility," said Nelson. "If I have a single table and simple queries, can I access the stream and keep the data up to date? Great. Worked for single table. Let me work on joins. Oh yeah, this is a bunch harder."

The complexity curve steepens at every level. Two-table joins are manageable. Three- and four-table joins introduce cases where the cache can't keep data fresh from the stream alone and has to fall back to invalidation, fetching the result fresh from the origin. Subqueries add another layer. Correlated subqueries, GROUP BY, HAVING clauses, window functions: each one expands the space of query shapes the system needs to understand well enough to make a correctness decision.
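
Invalidation is the safe floor. One common way to implement it is a table-level dependency index: any change to a table drops every cached result that reads from it. The Python sketch below assumes the parser has already extracted the set of tables each cached query reads; it is a generic technique rather than PgCache's implementation, and `TableLevelInvalidator` is a hypothetical name.

```python
from collections import defaultdict


class TableLevelInvalidator:
    """Coarse but safe fallback: a change to any table drops every cached
    result whose query reads from that table."""

    def __init__(self):
        self.results = {}                # cache key -> cached rows
        self.tables_of = {}              # cache key -> tables its query reads
        self.keys_of = defaultdict(set)  # table -> cache keys that depend on it

    def store(self, key, tables, rows):
        self.results[key] = rows
        self.tables_of[key] = set(tables)
        for table in tables:
            self.keys_of[table].add(key)

    def on_change(self, table):
        """Handle a change event for `table`; return the cache keys that were dropped."""
        dropped = set(self.keys_of.get(table, ()))
        for key in dropped:
            self.results.pop(key, None)
            # Unregister the key from every table it read, not just this one.
            for t in self.tables_of.pop(key, ()):
                self.keys_of[t].discard(key)
        return dropped
```

The trade-off is hit rate: a busy table invalidates every join that touches it, even when the changed row would never have appeared in the result. That is the price of never serving a stale join.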

Nelson spent a large portion of his development time on exactly this problem: mapping the landscape of SQL constructs and figuring out which ones the cache could handle reliably. "We talked to a bunch of people and they shared their SQL, and man, there's some crazy SQL out there," he said. Some query shapes will remain unsupported for a while.
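
One way to encode the outcome of that survey is a policy table keyed on what a parser finds in a query. The sketch below is illustrative only: the feature labels, which constructs land in which bucket, and the two-table cut-off are assumptions chosen to mirror the complexity curve described above, not PgCache's actual support matrix.

```python
from enum import Enum
from typing import Set


class Policy(Enum):
    MAINTAIN = "apply row changes from the stream to the cached result"
    INVALIDATE = "drop the result on relevant changes and refetch from origin"
    BYPASS = "never cache; always forward to origin"


# Illustrative buckets only: labels a hypothetical SQL parser might attach.
UNSUPPORTED = {
    "correlated_subquery",
    "window_function",
    "volatile_function",  # e.g. now() or random(): results change without any row changing
}
NEEDS_REFETCH = {"subquery", "group_by", "having", "aggregate"}
MAX_MAINTAINED_TABLES = 2  # assumed cut-off: beyond this, fall back to invalidation


def choose_policy(table_count: int, features: Set[str]) -> Policy:
    """Pick the most precise policy that can still keep the result correct."""
    if features & UNSUPPORTED:
        return Policy.BYPASS
    if table_count > MAX_MAINTAINED_TABLES or features & NEEDS_REFETCH:
        return Policy.INVALIDATE
    return Policy.MAINTAIN
```

The ordering matters: an unsupported construct wins over everything else, so a query never gets cached under a policy that can't keep it correct.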

This is the piece of the problem that generalizes beyond any single product. Any system that tries to sit between the application and the database and make intelligent decisions about data freshness runs into the same wall. SQL is expressive enough that understanding the relationship between a data change and a query result is itself a hard problem, and the difficulty isn't linear in the number of tables involved.

What the replication stream can't see

The limits of logical replication as a caching primitive show up most clearly around row-level security. RLS lets Postgres enforce per-user or per-role access policies at the database level: same query, different results depending on who's asking.

"There's not enough in the logical replication stream to actually support RLS. It's going to require some creative solutions on how to get the necessary information from origin to the caches without violating security," said Nelson.

For now, PgCache detects RLS-enabled tables and declines to cache them. It's an honest architectural constraint, and it says something broader about what logical replication was designed to expose. The stream gives you row-level changes with table and column information, but it doesn't carry session context or role information. Any downstream consumer that needs to make access-control decisions will have to find another path to that data.
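
Detecting RLS-enabled tables doesn't need the stream at all; the system catalogs already record it. A minimal Python sketch, assuming the cache can run one catalog query against the origin and knows which tables each query reads: `pg_class.relrowsecurity` is true for tables where `ALTER TABLE ... ENABLE ROW LEVEL SECURITY` has been run. The helper names are hypothetical, and the SQL is shown as a string rather than executed.

```python
from typing import Iterable, Set, Tuple

# Catalog query a cache could run against the origin (shown, not executed here).
# pg_class.relrowsecurity is true after ALTER TABLE ... ENABLE ROW LEVEL SECURITY;
# relkind 'r' is an ordinary table and 'p' a partitioned table.
RLS_LOOKUP_SQL = """
SELECT n.nspname || '.' || c.relname AS table_name, c.relrowsecurity
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind IN ('r', 'p')
"""


def rls_tables(catalog_rows: Iterable[Tuple[str, bool]]) -> Set[str]:
    """Collect the tables whose (table_name, relrowsecurity) row says RLS is on."""
    return {name for name, enabled in catalog_rows if enabled}


def is_cacheable(query_tables: Iterable[str], rls_enabled: Set[str]) -> bool:
    # The change stream carries no role or session context, so a result computed
    # under one role's policies can't safely be served to another role.
    return not (set(query_tables) & rls_enabled)
```

Because logical replication doesn't carry DDL, a later `ENABLE ROW LEVEL SECURITY` won't appear in the stream either, so a real system would need to re-check the catalog rather than rely on a one-time lookup.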

PostgreSQL 19, currently in beta, introduces the WAIT FOR LSN command, which lets a session on a replica block until the replica has replayed WAL up to a specific log sequence number (LSN). It's a step toward better consistency guarantees on the replication stream, and Nelson flagged it as something PgCache intends to support. But it doesn't solve the RLS visibility gap. That's a deeper limitation baked into what the stream was designed to carry.
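
A cache can apply the same idea, waiting on a log position, without blocking. A minimal Python sketch, assuming the application or proxy records `pg_current_wal_lsn()` on the origin after a write and the cache tracks how far it has applied the stream; both helper names are hypothetical, and this is not a description of how PgCache plans to use the feature. A `pg_lsn` prints as two hexadecimal numbers separated by a slash, the high and low 32 bits of a 64-bit WAL position, so comparing positions client-side means recombining them.

```python
def parse_lsn(text: str) -> int:
    """Convert a pg_lsn string such as '16/B374D848' into a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def can_serve(applied_lsn: str, required_lsn: str) -> bool:
    """True once the cache has applied the stream at least up to required_lsn.

    required_lsn: e.g. pg_current_wal_lsn() read on the origin right after the
    client's write. applied_lsn: the last position this cache has applied.
    If this returns False, the cache can wait briefly or forward to the origin.
    """
    return parse_lsn(applied_lsn) >= parse_lsn(required_lsn)


print(can_serve("16/B374D848", "16/B374D000"))  # True
print(can_serve("0/FFFFFFFF", "1/0"))            # False
```

Inside Postgres, `pg_lsn` values compare directly with ordinary operators; the parsing only matters once positions leave the database as text.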

The read replica framing isn't wrong, but it's incomplete

The replication stream is a change data feed. Read replicas are one consumer of that feed, and they've been such a dominant one that the ecosystem has largely stopped imagining others. But the primitive underneath, a structured, real-time stream of row-level changes delivered to subscribers, has applications that the read replica framing obscures.

Johnston described PgCache as sitting "between a read replica and a cache," and that positioning captures something real about the architectural gap. A traditional read replica copies your entire database, including all the cold data your application never touches. A traditional cache requires developers to make all the decisions about what to store and when to throw it away. The space between those two approaches, where the database's own change stream drives cache behavior automatically, is mostly unexplored.

"If nobody ever spun up a replica just to handle read load again, that would be success," said Nelson. It's an ambitious target. But if the idea gains traction, the implications reach further than one product's adoption numbers. If the replication stream becomes a standard invalidation mechanism, caching stops being a separate engineering project that teams budget weeks for and starts being something closer to a configuration decision. The read replica itself stops being a binary, a full copy of your database or nothing, and becomes a spectrum where teams can choose how much data to replicate and how. The application-side cache logic that currently lives in hundreds of lines of Redis integration code collapses into a connection string change.

That's a meaningful shift in how Postgres teams think about read scaling. And the underlying argument, that Postgres ships primitives the ecosystem tends to slot into a single canonical use case and then stop thinking about, extends beyond caching. The same pattern plays out with foreign data wrappers and with the extension framework more broadly. Postgres keeps handing the community building blocks that get filed into one obvious application and left there. The interesting work happens when someone pulls them out of that default context.
