Database Replication: Primary-Replica, Sync vs Async

The Bug That Taught Me How Replica Lag Actually Behaves

A customer opened a ticket claiming their order history was missing the purchase they had just completed. Refresh the page, still missing. Wait a minute, there it was. The checkout service wrote to the primary PostgreSQL instance, then immediately redirected to an order history page that read from a replica. Async streaming replication was running at a healthy 80 ms lag -- which, for a user clicking "View my order" 200 ms after checkout, is an eternity.

The replica was not broken. The dashboard said green. pg_stat_replication on the primary showed a few kilobytes of replay_lag_bytes, well under every alert threshold we had. The bug was that we had treated replication as a binary -- synced or not synced -- when it is actually a distribution of lag measured in milliseconds, and our read-your-writes code path sat squarely in the tail of that distribution.

This guide walks through PostgreSQL streaming replication the way I wish it had been taught to me before that incident: WAL and how it actually gets to the replica, the real difference between async and synchronous commit modes, the lag signals that matter versus the ones that lie, and the failover mechanics that bite at 3 AM. Managed RDS is covered at the end, but the fundamentals below apply equally to self-managed PostgreSQL and to every cloud provider that wraps it.

Physical Replication in PostgreSQL: The 30-Second Mental Model

Every change to a PostgreSQL database -- every INSERT, UPDATE, DELETE, and DDL statement -- is first written to the Write-Ahead Log (WAL) before the actual data pages are modified. Physical streaming replication works by piping those WAL records over a TCP connection to one or more follower (replica) servers, which replay them in the same order. The replica ends up as a byte-for-byte copy of the primary, available for read-only queries or as a failover target.

Two variants exist: physical (ships raw WAL bytes, whole-cluster, read-only replicas) and logical (decodes WAL into row-level change events, per-table, writable targets). Physical is what you want for HA and read scaling; logical is for selective sync, cross-version migration, and change data capture pipelines.

Physical vs Logical: The Full Comparison

Feature	Physical (Streaming)	Logical
What's replicated	Byte-level WAL changes	Row-level change events
Replica is writable?	No (read-only)	Yes
Selective replication	Entire cluster only	Per-table, per-database
Cross-version support	Same major version only	Different major versions OK
DDL replication	Yes (automatic)	No (manual)
Performance overhead	Lower	Higher (decoding step)
Use case	HA failover, read replicas	Selective sync, migrations, CDC

The rest of this guide focuses on physical streaming replication -- the variant that covers HA and read scaling and the one that produced the stale-read bug above. Logical replication gets a dedicated treatment in the sharding and CDC articles.

How to Set Up PostgreSQL Streaming Replication

Here's the step-by-step process for configuring physical streaming replication between a primary and a replica.

Configure the primary server -- set wal_level = replica, max_wal_senders = 10, and wal_keep_size = 1GB in postgresql.conf
Create a replication user -- CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'secure_password';
Update pg_hba.conf -- allow the replica's IP to connect with the replication role
Take a base backup -- use pg_basebackup -h primary_host -U replicator -D /var/lib/postgresql/data -Fp -Xs -P
Configure the replica -- create standby.signal and set primary_conninfo in postgresql.conf
Start the replica -- it connects to the primary and begins streaming WAL
Verify replication -- check pg_stat_replication on the primary and pg_stat_wal_receiver on the replica

-- On the primary: check replication status
SELECT client_addr, state, sent_lsn, write_lsn, flush_lsn, replay_lsn,
       (sent_lsn - replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;

-- On the replica: confirm it's receiving WAL
SELECT status, received_lsn, latest_end_lsn,
       latest_end_time
FROM pg_stat_wal_receiver;

# postgresql.conf on the primary
wal_level = replica
max_wal_senders = 10
wal_keep_size = 1GB
hot_standby = on

# postgresql.conf on the replica
primary_conninfo = 'host=primary_host port=5432 user=replicator password=secure_password'
hot_standby = on

Synchronous vs Asynchronous Replication

This is where the real engineering trade-off lives. Async replication is the default. Sync replication is available but comes with a significant cost.

Asynchronous Replication (Default)

The primary writes WAL to disk, sends it to replicas, and commits the transaction immediately -- without waiting for replicas to acknowledge. This means the primary is never slowed down by replica performance, but there's a window where the replica is behind. If the primary crashes during that window, those transactions are lost on the replica.

Synchronous Replication

The primary waits for at least one replica to confirm it has received (and optionally flushed to disk) the WAL before committing. This guarantees zero data loss on failover, but every write is now limited by network latency to the replica.

-- Enable synchronous replication on the primary
-- postgresql.conf
synchronous_standby_names = 'FIRST 1 (replica1, replica2)'

-- Control the durability guarantee per-transaction
SET synchronous_commit = 'remote_apply';  -- Strongest: wait for replay
SET synchronous_commit = 'remote_write';  -- Wait for OS write (not fsync)
SET synchronous_commit = 'on';           -- Wait for local fsync only (async)
SET synchronous_commit = 'off';          -- Don't wait for any fsync

synchronous_commit	Data Loss Risk	Write Latency Impact	Use Case
`remote_apply`	Zero	Highest (replica replay time)	Financial transactions
`remote_write`	Near-zero (OS crash only)	Moderate (network round-trip)	Most HA setups
`on`	Replica may lag	None (async to replica)	Default, most workloads
`off`	Local crash loses data	Lowest	Bulk imports, non-critical writes

Pro tip: You can set synchronous_commit per transaction. Use remote_apply for payment processing and off for logging or analytics writes. This gives you strong guarantees where they matter without penalizing every write on the system.

Replica Lag: The Silent Problem

Replica lag is the delay between a change being committed on the primary and that change being visible on the replica. With async replication, some lag is expected. The question is how much and how to detect it.

-- Monitor lag on the replica itself
SELECT now() - pg_last_xact_replay_timestamp() AS replay_lag;

-- Monitor from the primary (more accurate)
SELECT client_addr,
       pg_wal_lsn_diff(sent_lsn, replay_lsn) AS replay_lag_bytes,
       replay_lag
FROM pg_stat_replication;

-- Set up alerting: lag over 30 seconds is a problem
-- lag over 5 minutes means your replica might be useless for reads

Common causes of replica lag: the replica has slower disks than the primary, long-running queries on the replica block WAL replay (check max_standby_streaming_delay), or a burst of writes on the primary overwhelms the replica's replay capacity. I've seen teams route read traffic to replicas without monitoring lag, then spend days debugging why users see stale data.

Watch out: If your application reads from a replica immediately after writing to the primary, you'll get stale reads. Either read from the primary after writes, use synchronous replication, or add application-level logic to handle eventual consistency.

Failover: The Hard Part

Setting up replication is straightforward. Handling failover correctly is where teams stumble. When the primary goes down, you need to:

Detect the failure -- is the primary actually down, or is there a network partition?
Promote a replica -- run pg_promote() or pg_ctl promote on the chosen replica
Redirect application traffic -- update DNS, load balancer, or connection string
Handle the old primary -- when it comes back, it can't rejoin as primary (split-brain risk)
Rebuild replication -- use pg_rewind to re-sync the old primary as a new replica

# Promote a replica to primary
pg_ctl promote -D /var/lib/postgresql/data

# Or from SQL (PostgreSQL 12+)
SELECT pg_promote();

# Rewind the old primary to follow the new primary
pg_rewind --target-pgdata=/var/lib/postgresql/data           --source-server="host=new_primary port=5432"

Automated failover tools like Patroni, repmgr, and pg_auto_failover handle steps 1 through 4. Patroni with etcd is the most battle-tested option and the one I'd recommend for production. It handles leader election, fencing, and replica promotion with minimal manual intervention.

Managed PostgreSQL Replication: Provider Comparison

If you're running managed PostgreSQL, replication setup and failover are handled for you -- but the details differ significantly.

Provider	Failover Time	Read Replicas	Cross-Region	Starting Price/mo
AWS RDS	60-120 seconds	Up to 15	Yes	~$190 (Multi-AZ)
Google Cloud SQL	~60 seconds	Up to 10	Yes	~$175 (HA)
Azure Database	~60 seconds	Up to 5	Yes	~$200 (HA)
Supabase	Automatic	Up to 2	No	$25 (Pro)
Neon	Instant (compute)	Via branching	Limited	$19 (Launch)

Managed services handle the operational complexity, but you still need to understand replication to make informed decisions about consistency, read routing, and failover behavior. Don't assume "managed" means "no replication knowledge needed."

When to Use Each Replication Strategy

Here's my decision framework after setting up replication at dozens of companies:

Start with a single primary and async streaming replicas -- this covers 80% of use cases
Add synchronous replication only for zero-data-loss requirements -- financial systems, compliance-driven workloads
Use logical replication for selective sync -- when you need specific tables replicated to a different system or PostgreSQL version
Deploy Patroni for automated failover -- manual failover is a liability at 3 AM
Monitor replica lag from day one -- set alerts at 10 seconds and 60 seconds

Frequently Asked Questions

What is the difference between physical and logical replication in PostgreSQL?

Physical replication ships raw WAL bytes and creates an exact binary copy of the entire cluster. The replica is read-only. Logical replication decodes WAL into row-level changes and can replicate specific tables to a writable target. Use physical for HA failover and read scaling. Use logical for selective sync, cross-version migrations, or change data capture.

How much replica lag is acceptable?

It depends on your application's tolerance for stale reads. For most web applications, under 1 second of lag is fine. For financial dashboards or inventory systems, even 100 milliseconds matters. Monitor with pg_stat_replication and alert at thresholds that match your business requirements. If zero lag is required, use synchronous replication.

Can a PostgreSQL replica handle write queries?

Physical replicas are strictly read-only. Any write attempt returns an error. Logical replication targets can accept writes, but you must handle conflicts manually -- if the same row is modified on both sides, PostgreSQL won't resolve it automatically. Bidirectional logical replication is possible but complex and error-prone.

What happens when the primary server crashes?

With async replication, transactions that were committed on the primary but not yet shipped to replicas are lost. With synchronous replication (remote_apply), no committed transactions are lost. In both cases, you need to promote a replica to primary and redirect traffic. Tools like Patroni automate this process.

How does Patroni handle PostgreSQL failover?

Patroni uses a distributed consensus store (etcd, ZooKeeper, or Consul) to elect a leader. When the primary becomes unreachable, Patroni nodes hold an election, the winning replica promotes itself, and other replicas reconfigure to follow the new primary. The entire process typically completes in 10-30 seconds with properly tuned timeouts.

Should I use replication or backups for disaster recovery?

Both. Replication provides fast failover (seconds to minutes) but protects against hardware failure, not human error. If someone runs DROP TABLE, that statement replicates immediately. Backups (pg_dump or continuous WAL archiving with pg_basebackup) let you restore to a specific point in time. Use replication for availability and backups for recovery.

What is WAL shipping and how does it relate to streaming replication?

WAL shipping is the older method where completed WAL files (16 MB each by default) are copied to a replica via archive_command. Streaming replication is the modern approach where WAL records are sent to replicas in real time over a TCP connection. Streaming has much lower lag. WAL archiving is still useful as a fallback if the streaming connection drops.

Database Replication Explained: Primary-Replica, Sync vs Async