Database Replication Explained: Primary-Replica, Sync vs Async
A complete guide to PostgreSQL streaming replication. Covers WAL shipping, synchronous vs asynchronous modes, replica lag monitoring, failover with Patroni, and physical vs logical replication trade-offs.
Infrastructure engineer with 10+ years building production systems on AWS, GCP,…

The Bug That Taught Me How Replica Lag Actually Behaves
A customer opened a ticket claiming their order history was missing the purchase they had just completed. Refresh the page, still missing. Wait a minute, there it was. The checkout service wrote to the primary PostgreSQL instance, then immediately redirected to an order history page that read from a replica. Async streaming replication was running at a healthy 80 ms lag -- which, for a user clicking "View my order" 200 ms after checkout, is an eternity.
The replica was not broken. The dashboard said green. pg_stat_replication on the primary showed a few kilobytes of replay_lag_bytes, well under every alert threshold we had. The bug was that we had treated replication as a binary -- synced or not synced -- when it is actually a distribution of lag measured in milliseconds, and our read-your-writes code path sat squarely in the tail of that distribution.
This guide walks through PostgreSQL streaming replication the way I wish it had been taught to me before that incident: WAL and how it actually gets to the replica, the real difference between async and synchronous commit modes, the lag signals that matter versus the ones that lie, and the failover mechanics that bite at 3 AM. Managed RDS is covered at the end, but the fundamentals below apply equally to self-managed PostgreSQL and to every cloud provider that wraps it.
Physical Replication in PostgreSQL: The 30-Second Mental Model
Every change to a PostgreSQL database -- every INSERT, UPDATE, DELETE, and DDL statement -- is first written to the Write-Ahead Log (WAL) before the actual data pages are modified. Physical streaming replication works by piping those WAL records over a TCP connection to one or more follower (replica) servers, which replay them in the same order. The replica ends up as a byte-for-byte copy of the primary, available for read-only queries or as a failover target.
Two variants exist: physical (ships raw WAL bytes, whole-cluster, read-only replicas) and logical (decodes WAL into row-level change events, per-table, writable targets). Physical is what you want for HA and read scaling; logical is for selective sync, cross-version migration, and change data capture pipelines.
Physical vs Logical: The Full Comparison
| Feature | Physical (Streaming) | Logical |
|---|---|---|
| What's replicated | Byte-level WAL changes | Row-level change events |
| Replica is writable? | No (read-only) | Yes |
| Selective replication | Entire cluster only | Per-table, per-database |
| Cross-version support | Same major version only | Different major versions OK |
| DDL replication | Yes (automatic) | No (manual) |
| Performance overhead | Lower | Higher (decoding step) |
| Use case | HA failover, read replicas | Selective sync, migrations, CDC |
The rest of this guide focuses on physical streaming replication -- the variant that covers HA and read scaling and the one that produced the stale-read bug above. Logical replication gets a dedicated treatment in the sharding and CDC articles.
How to Set Up PostgreSQL Streaming Replication
Here's the step-by-step process for configuring physical streaming replication between a primary and a replica.
- Configure the primary server -- set
wal_level = replica,max_wal_senders = 10, andwal_keep_size = 1GBinpostgresql.conf - Create a replication user --
CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'secure_password'; - Update pg_hba.conf -- allow the replica's IP to connect with the replication role
- Take a base backup -- use
pg_basebackup -h primary_host -U replicator -D /var/lib/postgresql/data -Fp -Xs -P - Configure the replica -- create
standby.signaland setprimary_conninfoinpostgresql.conf - Start the replica -- it connects to the primary and begins streaming WAL
- Verify replication -- check
pg_stat_replicationon the primary andpg_stat_wal_receiveron the replica
-- On the primary: check replication status
SELECT client_addr, state, sent_lsn, write_lsn, flush_lsn, replay_lsn,
(sent_lsn - replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;
-- On the replica: confirm it's receiving WAL
SELECT status, received_lsn, latest_end_lsn,
latest_end_time
FROM pg_stat_wal_receiver;
# postgresql.conf on the primary
wal_level = replica
max_wal_senders = 10
wal_keep_size = 1GB
hot_standby = on
# postgresql.conf on the replica
primary_conninfo = 'host=primary_host port=5432 user=replicator password=secure_password'
hot_standby = on
Synchronous vs Asynchronous Replication
This is where the real engineering trade-off lives. Async replication is the default. Sync replication is available but comes with a significant cost.
Asynchronous Replication (Default)
The primary writes WAL to disk, sends it to replicas, and commits the transaction immediately -- without waiting for replicas to acknowledge. This means the primary is never slowed down by replica performance, but there's a window where the replica is behind. If the primary crashes during that window, those transactions are lost on the replica.
Synchronous Replication
The primary waits for at least one replica to confirm it has received (and optionally flushed to disk) the WAL before committing. This guarantees zero data loss on failover, but every write is now limited by network latency to the replica.
-- Enable synchronous replication on the primary
-- postgresql.conf
synchronous_standby_names = 'FIRST 1 (replica1, replica2)'
-- Control the durability guarantee per-transaction
SET synchronous_commit = 'remote_apply'; -- Strongest: wait for replay
SET synchronous_commit = 'remote_write'; -- Wait for OS write (not fsync)
SET synchronous_commit = 'on'; -- Wait for local fsync only (async)
SET synchronous_commit = 'off'; -- Don't wait for any fsync
| synchronous_commit | Data Loss Risk | Write Latency Impact | Use Case |
|---|---|---|---|
remote_apply | Zero | Highest (replica replay time) | Financial transactions |
remote_write | Near-zero (OS crash only) | Moderate (network round-trip) | Most HA setups |
on | Replica may lag | None (async to replica) | Default, most workloads |
off | Local crash loses data | Lowest | Bulk imports, non-critical writes |
Pro tip: You can set
synchronous_commitper transaction. Useremote_applyfor payment processing andofffor logging or analytics writes. This gives you strong guarantees where they matter without penalizing every write on the system.
Replica Lag: The Silent Problem
Replica lag is the delay between a change being committed on the primary and that change being visible on the replica. With async replication, some lag is expected. The question is how much and how to detect it.
-- Monitor lag on the replica itself
SELECT now() - pg_last_xact_replay_timestamp() AS replay_lag;
-- Monitor from the primary (more accurate)
SELECT client_addr,
pg_wal_lsn_diff(sent_lsn, replay_lsn) AS replay_lag_bytes,
replay_lag
FROM pg_stat_replication;
-- Set up alerting: lag over 30 seconds is a problem
-- lag over 5 minutes means your replica might be useless for reads
Common causes of replica lag: the replica has slower disks than the primary, long-running queries on the replica block WAL replay (check max_standby_streaming_delay), or a burst of writes on the primary overwhelms the replica's replay capacity. I've seen teams route read traffic to replicas without monitoring lag, then spend days debugging why users see stale data.
Watch out: If your application reads from a replica immediately after writing to the primary, you'll get stale reads. Either read from the primary after writes, use synchronous replication, or add application-level logic to handle eventual consistency.
Failover: The Hard Part
Setting up replication is straightforward. Handling failover correctly is where teams stumble. When the primary goes down, you need to:
- Detect the failure -- is the primary actually down, or is there a network partition?
- Promote a replica -- run
pg_promote()orpg_ctl promoteon the chosen replica - Redirect application traffic -- update DNS, load balancer, or connection string
- Handle the old primary -- when it comes back, it can't rejoin as primary (split-brain risk)
- Rebuild replication -- use
pg_rewindto re-sync the old primary as a new replica
# Promote a replica to primary
pg_ctl promote -D /var/lib/postgresql/data
# Or from SQL (PostgreSQL 12+)
SELECT pg_promote();
# Rewind the old primary to follow the new primary
pg_rewind --target-pgdata=/var/lib/postgresql/data --source-server="host=new_primary port=5432"
Automated failover tools like Patroni, repmgr, and pg_auto_failover handle steps 1 through 4. Patroni with etcd is the most battle-tested option and the one I'd recommend for production. It handles leader election, fencing, and replica promotion with minimal manual intervention.
Managed PostgreSQL Replication: Provider Comparison
If you're running managed PostgreSQL, replication setup and failover are handled for you -- but the details differ significantly.
| Provider | Failover Time | Read Replicas | Cross-Region | Starting Price/mo |
|---|---|---|---|---|
| AWS RDS | 60-120 seconds | Up to 15 | Yes | ~$190 (Multi-AZ) |
| Google Cloud SQL | ~60 seconds | Up to 10 | Yes | ~$175 (HA) |
| Azure Database | ~60 seconds | Up to 5 | Yes | ~$200 (HA) |
| Supabase | Automatic | Up to 2 | No | $25 (Pro) |
| Neon | Instant (compute) | Via branching | Limited | $19 (Launch) |
Managed services handle the operational complexity, but you still need to understand replication to make informed decisions about consistency, read routing, and failover behavior. Don't assume "managed" means "no replication knowledge needed."
When to Use Each Replication Strategy
Here's my decision framework after setting up replication at dozens of companies:
- Start with a single primary and async streaming replicas -- this covers 80% of use cases
- Add synchronous replication only for zero-data-loss requirements -- financial systems, compliance-driven workloads
- Use logical replication for selective sync -- when you need specific tables replicated to a different system or PostgreSQL version
- Deploy Patroni for automated failover -- manual failover is a liability at 3 AM
- Monitor replica lag from day one -- set alerts at 10 seconds and 60 seconds
Frequently Asked Questions
What is the difference between physical and logical replication in PostgreSQL?
Physical replication ships raw WAL bytes and creates an exact binary copy of the entire cluster. The replica is read-only. Logical replication decodes WAL into row-level changes and can replicate specific tables to a writable target. Use physical for HA failover and read scaling. Use logical for selective sync, cross-version migrations, or change data capture.
How much replica lag is acceptable?
It depends on your application's tolerance for stale reads. For most web applications, under 1 second of lag is fine. For financial dashboards or inventory systems, even 100 milliseconds matters. Monitor with pg_stat_replication and alert at thresholds that match your business requirements. If zero lag is required, use synchronous replication.
Can a PostgreSQL replica handle write queries?
Physical replicas are strictly read-only. Any write attempt returns an error. Logical replication targets can accept writes, but you must handle conflicts manually -- if the same row is modified on both sides, PostgreSQL won't resolve it automatically. Bidirectional logical replication is possible but complex and error-prone.
What happens when the primary server crashes?
With async replication, transactions that were committed on the primary but not yet shipped to replicas are lost. With synchronous replication (remote_apply), no committed transactions are lost. In both cases, you need to promote a replica to primary and redirect traffic. Tools like Patroni automate this process.
How does Patroni handle PostgreSQL failover?
Patroni uses a distributed consensus store (etcd, ZooKeeper, or Consul) to elect a leader. When the primary becomes unreachable, Patroni nodes hold an election, the winning replica promotes itself, and other replicas reconfigure to follow the new primary. The entire process typically completes in 10-30 seconds with properly tuned timeouts.
Should I use replication or backups for disaster recovery?
Both. Replication provides fast failover (seconds to minutes) but protects against hardware failure, not human error. If someone runs DROP TABLE, that statement replicates immediately. Backups (pg_dump or continuous WAL archiving with pg_basebackup) let you restore to a specific point in time. Use replication for availability and backups for recovery.
What is WAL shipping and how does it relate to streaming replication?
WAL shipping is the older method where completed WAL files (16 MB each by default) are copied to a replica via archive_command. Streaming replication is the modern approach where WAL records are sent to replicas in real time over a TCP connection. Streaming has much lower lag. WAL archiving is still useful as a fallback if the streaming connection drops.
Written by
Abhishek Patel
Infrastructure engineer with 10+ years building production systems on AWS, GCP, and bare metal. Writes practical guides on cloud architecture, containers, networking, and Linux for developers who want to understand how things actually work under the hood.
Related Articles
SQLite at the Edge: When libSQL Beats Postgres
SQLite at the edge via libSQL embedded replicas and Cloudflare D1 delivers 2-5ms reads worldwide versus 20-100ms for Postgres read replicas. Real benchmarks, pricing comparisons, production failure modes, and a decision framework for when edge SQLite wins and when Postgres-with-replicas is still the right call.
15 min read
DatabasesSnowflake vs BigQuery vs Databricks vs Redshift (2026): Which Data Warehouse?
Snowflake wins on concurrency, BigQuery on serverless simplicity, Databricks on ML, Redshift on AWS depth. Real 2026 pricing, TPC-DS benchmarks, and a clear decision matrix.
16 min read
DatabasesPostgres as a Queue: When You Don't Need Kafka or RabbitMQ
Build a production-grade task queue entirely in PostgreSQL using SELECT FOR UPDATE SKIP LOCKED, LISTEN/NOTIFY, exponential backoff retries, and dead-letter handling. Covers PGMQ, Graphile Worker, River, and when a dedicated broker actually earns its keep.
14 min read
Enjoyed this article?
Get more like this in your inbox. No spam, unsubscribe anytime.