Databases

PostgreSQL Indexes Explained: B-Tree, GIN, GiST, Hash, and BRIN

A complete guide to PostgreSQL index types. Learn when to use B-Tree, Hash, GIN, GiST, BRIN, and partial indexes with real SQL examples, EXPLAIN ANALYZE verification, and index maintenance strategies.

A
Abhishek Patel14 min read

Infrastructure engineer with 10+ years building production systems on AWS, GCP,…

PostgreSQL Indexes Explained: B-Tree, GIN, GiST, Hash, and BRIN
PostgreSQL Indexes Explained: B-Tree, GIN, GiST, Hash, and BRIN

The Single Biggest Performance Win in PostgreSQL

A missing PostgreSQL index is the most common reason queries go from 5 milliseconds to 5 seconds. Adding the right index on the right column is the single most impactful query optimization available to you -- more than rewriting SQL, more than tuning work_mem, more than throwing hardware at the problem. But PostgreSQL ships with six index types, and choosing the wrong one can make writes slower without helping reads at all.

This guide covers every index type PostgreSQL offers, when each one is the right choice, and how to verify your indexes are actually being used. If you've ever stared at a slow query and wondered whether to add an index or rewrite the query, start here.

What Is a PostgreSQL Index?

Definition: A PostgreSQL index is a separate data structure, maintained alongside the table, that provides the database with a fast lookup path to rows matching a query condition. Without an index, PostgreSQL must scan every row in the table sequentially. With one, it can jump directly to matching rows.

Indexes don't change your data. They're an additional structure that the query planner considers when deciding how to execute a query. PostgreSQL creates them in the background (with CONCURRENTLY) or blocks writes during creation (without it). Every index carries a cost: disk space, and slower writes because the index must be updated on every INSERT, UPDATE, or DELETE.

PostgreSQL Index Types: Complete Comparison

Here's when to use each index type. If you're unsure, B-Tree is almost certainly the right choice.

Index TypeBest ForSupports ORDER BYSupports Range QueriesMulti-Column
B-TreeEquality, range, sortingYesYesYes
HashEquality onlyNoNoNo
GINFull-text search, JSONB, arraysNoNoYes
GiSTGeometry, ranges, nearest-neighborNoYes (ranges)Yes
BRINHuge tables with naturally ordered dataNoYesYes
PartialSubsets of rows matching a WHERE clauseDepends on typeDepends on typeYes

B-Tree: The Default (And Usually the Right Choice)

When you run CREATE INDEX without specifying a type, you get a B-Tree. It handles equality (=), range (<, >, BETWEEN), sorting (ORDER BY), and IS NULL checks. B-Tree indexes store data in sorted order, which is why they're so versatile.

-- Basic B-Tree index
CREATE INDEX idx_orders_customer_id ON orders (customer_id);

-- Multi-column B-Tree (column order matters!)
CREATE INDEX idx_orders_status_created ON orders (status, created_at DESC);

-- The planner can use this for:
-- WHERE status = 'pending'
-- WHERE status = 'pending' AND created_at > '2025-01-01'
-- ORDER BY status, created_at DESC
-- But NOT for: WHERE created_at > '2025-01-01' alone (first column not referenced)

Pro tip: Multi-column B-Tree indexes follow the "leftmost prefix" rule. An index on (a, b, c) can satisfy queries on (a), (a, b), or (a, b, c) -- but not (b) or (c) alone. Put the most selective column first, or the column that appears in equality conditions.

B-Tree is the right choice for 90% of indexes you'll create. It's well-optimized, well-understood, and covers most query patterns. Only consider other types when B-Tree can't handle your data type or query pattern.

Hash Indexes: Rarely the Right Choice

Hash indexes support only equality comparisons (=). They can't handle range queries, sorting, or IS NULL. Before PostgreSQL 10, they weren't even crash-safe -- they weren't written to the WAL.

-- Hash index (equality only)
CREATE INDEX idx_sessions_token ON sessions USING hash (token);

-- Works for: WHERE token = 'abc123'
-- Does NOT work for: WHERE token LIKE 'abc%' or ORDER BY token

In theory, hash indexes are faster than B-Tree for pure equality lookups because the hash function maps directly to a bucket. In practice, the difference is negligible for most workloads, and you lose all the flexibility of B-Tree. I've used hash indexes exactly twice in production over 12 years -- both times on lookup tables with hundreds of millions of rows where every query was an exact match on a UUID column.

GIN: Full-Text Search, JSONB, and Arrays

GIN (Generalized Inverted Index) is purpose-built for values that contain multiple elements. Think full-text search vectors, JSONB documents, and array columns. GIN creates an entry for each element and points back to the rows containing it.

-- Full-text search with GIN
CREATE INDEX idx_articles_search ON articles USING gin (to_tsvector('english', title || ' ' || body));

-- Query using the index
SELECT * FROM articles
WHERE to_tsvector('english', title || ' ' || body) @@ to_tsquery('english', 'postgresql & indexes');

-- JSONB containment with GIN
CREATE INDEX idx_events_data ON events USING gin (metadata jsonb_path_ops);

-- Query JSONB
SELECT * FROM events WHERE metadata @> '{"source": "api", "version": 2}';

-- Array overlap with GIN
CREATE INDEX idx_posts_tags ON posts USING gin (tags);
SELECT * FROM posts WHERE tags @> ARRAY['postgresql', 'performance'];

Warning: GIN indexes are expensive to build and update. Every element in the indexed value gets its own index entry. A JSONB column with 50 keys per row creates 50 index entries per row. On write-heavy tables, consider gin_pending_list_limit to batch index updates, accepting slightly stale reads.

If you're building a search feature on top of PostgreSQL and don't want to bring in Elasticsearch, GIN with tsvector is the way to do it. It's remarkably capable for most use cases.

GiST: Geometry, Ranges, and Nearest-Neighbor

GiST (Generalized Search Tree) handles data types where you need overlap, containment, or proximity queries. PostGIS uses GiST extensively for spatial data. It's also the go-to for range types and nearest-neighbor searches.

-- Spatial index with PostGIS
CREATE INDEX idx_locations_geom ON locations USING gist (geom);

-- Find points within 5km
SELECT * FROM locations
WHERE ST_DWithin(geom, ST_MakePoint(-73.9857, 40.7484)::geography, 5000);

-- Range type index
CREATE INDEX idx_reservations_during ON reservations USING gist (during);

-- Find overlapping reservations
SELECT * FROM reservations
WHERE during && tsrange('2025-06-01', '2025-06-15');

-- Nearest-neighbor search (K-NN)
SELECT * FROM locations
ORDER BY geom <-> ST_MakePoint(-73.9857, 40.7484)::geometry
LIMIT 10;

GiST is lossy for some data types -- it may return false positives that PostgreSQL filters out in a recheck step. This is fine; the planner handles it automatically. The key advantage over B-Tree is that GiST can answer "find me the 10 nearest points" efficiently, something B-Tree cannot do at all.

BRIN: Tiny Indexes for Huge Tables

BRIN (Block Range Index) is the smallest index type available. Instead of indexing every row, it stores summary information (min/max values) for ranges of physical table blocks. This works brilliantly when data is inserted in roughly sorted order -- think time-series data, log tables, or append-only event streams.

-- BRIN index on a time-series table
CREATE INDEX idx_events_created ON events USING brin (created_at);

-- A table with 500 million rows might have:
-- B-Tree index: 10 GB
-- BRIN index: 50 MB

-- BRIN works because rows inserted at 10:00 AM are physically near other 10:00 AM rows
-- The planner knows "block 4500-4600 contains timestamps between 10:00 and 10:15"

Pro tip: BRIN is useless if your table has been heavily updated or if data wasn't inserted in order. Run SELECT correlation FROM pg_stats WHERE tablename = 'events' AND attname = 'created_at' -- if the value is close to 1.0 or -1.0, BRIN will work well. Below 0.9, stick with B-Tree.

BRIN's storage savings are dramatic. For a 100 GB table, a B-Tree index on a timestamp column might consume 2-3 GB. The equivalent BRIN index uses a few megabytes. On append-only tables where you query by time range, BRIN is almost always the right call.

Partial Indexes: Index Only What Matters

A partial index includes a WHERE clause that limits which rows are indexed. If 95% of your orders are completed and you only ever query pending ones, why index all of them?

-- Only index pending orders
CREATE INDEX idx_orders_pending ON orders (created_at)
WHERE status = 'pending';

-- This index is tiny compared to a full index
-- And it's faster because the planner skips 95% of the table

-- Partial unique index (enforce uniqueness only for active records)
CREATE UNIQUE INDEX idx_users_email_active ON users (email)
WHERE deleted_at IS NULL;

-- Allows duplicate emails for soft-deleted users
-- But enforces unique emails for active users

Partial indexes are one of PostgreSQL's most underused features. They're smaller, faster to scan, and cheaper to maintain. Any time you have a status column where you only query a small subset, a partial index should be your first thought.

Verifying Index Usage with EXPLAIN ANALYZE

Creating an index doesn't guarantee PostgreSQL will use it. The query planner might decide a sequential scan is faster -- and it's often right for small tables or queries that return a large percentage of rows. Here's how to check:

-- Always use EXPLAIN (ANALYZE, BUFFERS) -- not just EXPLAIN
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders WHERE customer_id = 12345;

-- Good output (index is used):
-- Index Scan using idx_orders_customer_id on orders
--   Index Cond: (customer_id = 12345)
--   Buffers: shared hit=4
--   Planning Time: 0.1 ms
--   Execution Time: 0.05 ms

-- Bad output (sequential scan despite index existing):
-- Seq Scan on orders
--   Filter: (customer_id = 12345)
--   Rows Removed by Filter: 999900
--   Buffers: shared hit=8334
--   Planning Time: 0.1 ms
--   Execution Time: 120.5 ms

Steps to Diagnose an Unused Index

  1. Run EXPLAIN (ANALYZE, BUFFERS) on the actual slow query -- not a simplified version
  2. Check if the planner chose a Seq Scan when you expected an Index Scan
  3. Verify the WHERE clause matches the indexed columns exactly (including type casts)
  4. Check pg_stat_user_indexes to see if the index has ever been used: SELECT * FROM pg_stat_user_indexes WHERE relname = 'orders'
  5. Run ANALYZE orders to update table statistics -- the planner relies on these
  6. Check if the table is small enough that a sequential scan is genuinely faster

Note: EXPLAIN ANALYZE actually executes the query. On a DELETE or UPDATE, wrap it in a transaction and roll back: BEGIN; EXPLAIN ANALYZE DELETE FROM ...; ROLLBACK;

Index Bloat and Maintenance

PostgreSQL's MVCC model means that UPDATE and DELETE operations don't immediately remove old row versions. Dead tuples accumulate in indexes, causing bloat. A bloated index is larger than necessary and slower to scan.

-- Check index bloat (simplified)
SELECT
  schemaname || '.' || indexrelname AS index_name,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
  idx_scan AS times_used,
  idx_tup_read AS tuples_read
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC
LIMIT 20;

-- Rebuild a bloated index without locking the table
REINDEX INDEX CONCURRENTLY idx_orders_customer_id;

-- Or rebuild all indexes on a table
REINDEX TABLE CONCURRENTLY orders;

Autovacuum handles most dead tuple cleanup, but heavily updated tables can outpace it. If you see index sizes growing disproportionately to table size, increase autovacuum_vacuum_scale_factor for that table or schedule manual REINDEX CONCURRENTLY operations during low-traffic windows.

The Write Cost of Indexes

Every index on a table adds overhead to write operations. For read-heavy tables (analytics dashboards, content sites), this rarely matters. For write-heavy tables (event streams, logging, high-throughput queues), it's significant.

Operation0 Indexes3 Indexes6 Indexes10 Indexes
INSERT1x (baseline)1.5-2x slower2-3x slower3-5x slower
UPDATE (indexed col)1x1.5-2x slower2-3x slower3-5x slower
DELETE1x1.2-1.5x slower1.5-2x slower2-3x slower

Warning: Don't index every column "just in case." Every unused index is wasted disk space and wasted write performance. Audit unused indexes quarterly: SELECT indexrelname, idx_scan FROM pg_stat_user_indexes WHERE idx_scan = 0 AND schemaname = 'public'

Managed PostgreSQL: Hosting and Pricing Comparison

Index behavior and maintenance differ across managed PostgreSQL providers. Some handle REINDEX automatically; others leave it to you. Here's a comparison of popular options for a production workload (4 vCPU, 16 GB RAM, 100 GB storage):

ProviderStarting Price/moAuto-Vacuum TuningRead ReplicasConnection Pooling
AWS RDS~$190ConfigurableUp to 15RDS Proxy ($)
Google Cloud SQL~$175ConfigurableUp to 10Built-in (AlloyDB)
Azure Database~$200ConfigurableUp to 5PgBouncer built-in
Supabase$25 (Pro)ManagedUp to 2Built-in (Supavisor)
Neon$19 (Launch)ManagedVia branchingBuilt-in
Crunchy Bridge~$100ManagedUp to 5Built-in

If you're running PostgreSQL in production, a managed service is almost always worth it. The operational overhead of handling backups, failovers, and vacuum tuning is real. Neon and Supabase are compelling options for startups. AWS RDS and Cloud SQL remain the safe choices for enterprise workloads where you need fine-grained control over index maintenance and vacuum settings.

Practical Index Strategy: A Decision Framework

When deciding what to index, follow this order:

  1. Start with no indexes beyond the primary key. Measure actual query performance first.
  2. Check slow query logs -- focus on queries that run frequently, not just the slowest one-off reports.
  3. Add B-Tree indexes on columns in WHERE clauses and JOIN conditions.
  4. Consider partial indexes whenever you're filtering on a status column with skewed distribution.
  5. Use GIN for JSONB and full-text search -- don't try to force B-Tree on these data types.
  6. Switch to BRIN for time-series tables with hundreds of millions of rows where data arrives in order.
  7. Verify with EXPLAIN ANALYZE after every index change. Trust the planner's output, not your assumptions.

Frequently Asked Questions

How many indexes should a PostgreSQL table have?

There's no fixed limit, but most tables work well with 3-5 indexes. Every index adds write overhead and consumes disk space. Start with indexes on primary key and foreign keys, then add indexes based on actual slow queries from pg_stat_statements. Audit unused indexes quarterly and drop any with zero scans.

Does PostgreSQL automatically create indexes?

PostgreSQL automatically creates a unique index for every PRIMARY KEY and UNIQUE constraint. It does not create indexes on foreign key columns -- this is a common source of slow JOINs and you should add these manually. It also does not create indexes based on query patterns; that's your job.

When should I use GIN instead of B-Tree?

Use GIN when the column contains composite values: JSONB documents, arrays, or full-text search vectors. B-Tree can't index the internal structure of these types. If you need to query "find rows where the JSONB column contains this key" or "where the array includes these values," GIN is the only option that works efficiently.

What is index bloat and how do I fix it?

Index bloat occurs when dead tuples from updates and deletes accumulate in the index structure. The index grows larger without containing more useful data. Fix it with REINDEX CONCURRENTLY, which rebuilds the index without locking the table. Monitor bloat by comparing the index size to the estimated ideal size using the pgstattuple extension.

Can I create an index without locking the table?

Yes. Use CREATE INDEX CONCURRENTLY. This builds the index in the background without blocking reads or writes. It takes longer than a regular CREATE INDEX because it scans the table twice, but it's essential for production databases. If the build fails partway through, you'll have an invalid index that you need to drop and retry.

Why is PostgreSQL not using my index?

The query planner may skip your index for several reasons: the table is small enough that a sequential scan is faster, the query returns too many rows (typically over 10-15% of the table), statistics are outdated (run ANALYZE), or the WHERE clause doesn't match the index columns due to type casting or function calls. Always check with EXPLAIN ANALYZE.

What is the difference between BRIN and B-Tree for time-series data?

BRIN stores min/max summaries per block range, making it 100-200x smaller than a B-Tree on the same column. It works well when data is physically ordered (high correlation). B-Tree is faster for point lookups and sorting but uses far more storage. For append-only time-series tables over 100 million rows, BRIN is usually the better trade-off.

A

Written by

Abhishek Patel

Infrastructure engineer with 10+ years building production systems on AWS, GCP, and bare metal. Writes practical guides on cloud architecture, containers, networking, and Linux for developers who want to understand how things actually work under the hood.

Enjoyed this article?

Get more like this in your inbox. No spam, unsubscribe anytime.

Comments

Loading comments...

Leave a comment

Stay in the loop

New articles delivered to your inbox. No spam.