Vector Databases: What They Are, How They Work, and When You Need One
A practical guide to vector databases -- how embeddings and ANN algorithms work, and an honest comparison of Pinecone, Weaviate, Qdrant, and pgvector.

The Search Problem That SQL Can't Solve
Vector databases store and search high-dimensional numerical representations of data -- embeddings -- using similarity metrics instead of exact matches. If you've built anything with LLMs, recommendation systems, or image search, you've either used one already or you're about to. They're the backbone of RAG pipelines, semantic search, and any application where "find me something similar" matters more than "find me an exact match."
I've deployed vector search in production using everything from a hastily added pgvector extension to fully managed Pinecone clusters. The technology is simpler than vendors want you to believe, but the tradeoffs are real. Here's what actually matters when choosing and using a vector database.
What Is a Vector Database?
Definition: A vector database is a specialized data store designed to index, store, and query high-dimensional vectors (embeddings) efficiently. It enables approximate nearest neighbor (ANN) search to find the most similar vectors to a given query vector, typically using distance metrics like cosine similarity or Euclidean distance.
Traditional databases search by exact field values -- give me all users where email = 'x'. Vector databases search by proximity in embedding space -- give me the 10 vectors most similar to this query vector. That's a fundamentally different operation, and it requires fundamentally different data structures.
How Embeddings Work
Before you can search vectors, you need vectors. An embedding model converts raw data (text, images, audio) into a fixed-length array of floating point numbers. The magic is that semantically similar inputs produce numerically similar outputs.
from openai import OpenAI

client = OpenAI()

# These two sentences mean similar things
texts = [
    "How do I reset my password?",
    "I forgot my login credentials",
]

response = client.embeddings.create(
    model="text-embedding-3-small",
    input=texts,
)

# These vectors will be close together in embedding space
vec_a = response.data[0].embedding  # 1536 dimensions
vec_b = response.data[1].embedding  # 1536 dimensions
The resulting vectors typically have 384 to 3072 dimensions. Each dimension captures some learned feature of the input. No single dimension has a human-interpretable meaning -- it's the full vector's position in high-dimensional space that encodes semantic relationships.
Similarity Metrics: How "Close" Gets Calculated
When you query a vector database, it calculates the distance between your query vector and every stored vector (or a smart subset). The three main metrics:
| Metric | Formula | Range | When to Use |
|---|---|---|---|
| Cosine Similarity | dot(A,B) / (||A|| * ||B||) | -1 to 1 | Most text embeddings. Direction matters, magnitude doesn't. |
| Euclidean (L2) | sqrt(sum((a-b)^2)) | 0 to infinity | When magnitude matters (e.g., signal strength) |
| Dot Product | sum(a*b) | -inf to inf | Pre-normalized vectors, where it matches cosine similarity at lower compute cost |
Pro tip: For text embeddings from OpenAI, Cohere, or most open-source models, use cosine similarity. These models are trained so that directional similarity captures semantic meaning. If your vectors are already L2-normalized (many models do this by default), cosine similarity and dot product are mathematically equivalent, and dot product is faster.
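All three metrics are easy to verify by hand. Here's a minimal pure-Python sketch using toy 4-dimensional vectors in place of real embeddings, including a check of the normalization equivalence mentioned above:

```python
import math

def dot(a, b):
    # sum(a*b)
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    # Euclidean length ||A||
    return math.sqrt(dot(a, a))

def cosine_similarity(a, b):
    # dot(A,B) / (||A|| * ||B||), range -1 to 1
    return dot(a, b) / (norm(a) * norm(b))

def euclidean_distance(a, b):
    # sqrt(sum((a-b)^2)), range 0 to infinity
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy 4-dimensional vectors standing in for real embeddings
a = [0.1, 0.3, 0.5, 0.2]
b = [0.2, 0.25, 0.45, 0.3]

# After L2-normalization, cosine similarity and plain dot product agree
a_n = [x / norm(a) for x in a]
b_n = [x / norm(b) for x in b]
print(abs(cosine_similarity(a, b) - dot(a_n, b_n)) < 1e-12)  # True
```

In production you'd use NumPy or let the database compute distances, but the math is exactly this simple.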
ANN Algorithms: The Engine Under the Hood
Exact nearest neighbor search compares your query against every vector in the database. That's O(n) per query -- fine for 10,000 vectors, impossibly slow for 10 million. Approximate Nearest Neighbor (ANN) algorithms trade a tiny amount of accuracy for massive speed improvements.
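For a sense of scale, the exact O(n) baseline fits in a few lines. The function names here are illustrative, and vectors are assumed to be plain Python lists:

```python
import heapq
import math

def cosine(a, b):
    d = sum(x * y for x, y in zip(a, b))
    return d / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def exact_top_k(query, vectors, k=3):
    # Brute force: score every stored vector against the query -- O(n) per query
    scored = ((cosine(query, v), i) for i, v in enumerate(vectors))
    # Keep only the k highest-similarity (score, index) pairs
    return heapq.nlargest(k, scored)

corpus = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.5, 0.5]]
top = exact_top_k([1.0, 0.05], corpus, k=2)
print([i for _, i in top])  # [0, 1]
```

Every ANN algorithm below is a strategy for avoiding that full scan.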
HNSW (Hierarchical Navigable Small World)
The most popular ANN algorithm. HNSW builds a multi-layer graph where each node is a vector and edges connect similar vectors. Search starts at the top layer (fewest nodes, longest edges) and navigates down to finer-grained layers. Think of it like zooming in on a map -- you start at continent level and drill down to street level.
- Query speed: Sub-millisecond for millions of vectors
- Recall: 95-99% typical (configurable via the ef_search parameter)
- Memory: High -- the full graph lives in RAM
- Build time: Slow initial index construction
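The full HNSW algorithm is involved, but its core move -- greedy descent toward the query over a neighborhood graph -- is simple to sketch. This toy version searches a single layer over hand-built 1-D "vectors"; the real algorithm repeats this across layers and keeps a candidate beam of size ef_search instead of a single current node:

```python
def greedy_search(graph, vectors, entry, query):
    # Repeatedly hop to whichever neighbor is closest to the query,
    # stopping at a local minimum -- the essence of one HNSW layer.
    current = entry
    best = abs(vectors[current] - query)
    improved = True
    while improved:
        improved = False
        for neighbor in graph[current]:
            d = abs(vectors[neighbor] - query)
            if d < best:
                current, best = neighbor, d
                improved = True
    return current

# Toy 1-D "embedding space": points on a line, chained as neighbors
vectors = [0.0, 1.0, 2.0, 3.0, 4.0]
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(greedy_search(graph, vectors, entry=0, query=3.6))  # 4
```

The upper layers exist to pick a good entry point, so this walk touches only a logarithmic fraction of the graph instead of every node.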
IVF-PQ (Inverted File Index with Product Quantization)
Clusters vectors into partitions (IVF), then compresses each vector using product quantization (PQ). At query time, only a subset of clusters is searched, and distance calculations use the compressed representations.
- Query speed: Fast, but typically slower than HNSW
- Recall: 85-95% depending on configuration
- Memory: Much lower than HNSW -- PQ compresses vectors by 4-32x
- Build time: Requires a training step to learn cluster centroids
Direct answer: For most applications with under 10 million vectors, HNSW is the right choice. It gives the best recall-to-speed tradeoff. Switch to IVF-PQ when you have hundreds of millions of vectors and HNSW's memory requirements become impractical.
Vector Database Comparison: Pinecone vs Weaviate vs Qdrant vs pgvector
The market has exploded with options. Here's an honest comparison based on production experience:
| Feature | Pinecone | Weaviate | Qdrant | pgvector |
|---|---|---|---|---|
| Type | Managed SaaS | Open-source + Cloud | Open-source + Cloud | PostgreSQL extension |
| ANN Algorithm | Proprietary | HNSW | HNSW | HNSW + IVFFlat |
| Max Vectors | Billions | Billions | Billions | Millions (practical) |
| Hybrid Search | Yes (sparse+dense) | Yes (BM25+vector) | Yes (sparse+dense) | With pg_trgm or full-text |
| Filtering | Metadata filters | GraphQL-style filters | Payload filters | Standard SQL WHERE |
| Self-hosting | No | Yes (Docker/K8s) | Yes (Docker/K8s) | Yes (any Postgres host) |
| Pricing (starter) | Free tier, then ~$70/mo | Free self-hosted, cloud from $25/mo | Free self-hosted, cloud from $25/mo | Free (part of Postgres) |
| Latency (1M vectors) | ~10-20ms | ~5-15ms | ~5-10ms | ~20-50ms |
Decision Guide: Which Vector Database Should You Pick?
- Already using PostgreSQL and have fewer than 5M vectors? Start with pgvector. Zero new infrastructure. Add the extension, create an HNSW index, and you're searching vectors alongside your relational data in the same transaction.
- Need managed infrastructure and don't want to think about scaling? Pinecone is the easiest path. You trade control for convenience. Pricing scales linearly, which can get expensive at high volume.
- Want self-hosted with strong performance and flexibility? Qdrant has the best raw performance in benchmarks and a clean API. Weaviate offers more built-in modules (vectorizers, rerankers) but is heavier to operate.
- Building a multi-modal search system? Weaviate's module ecosystem handles text, image, and multi-modal vectors with built-in vectorization. Qdrant is catching up here.
- Need to search billions of vectors? Pinecone or Qdrant Cloud. pgvector won't cut it at that scale.
-- pgvector: Add vector search to existing PostgreSQL
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    content TEXT NOT NULL,
    metadata JSONB,
    embedding vector(1536)  -- match your model's dimensions
);

-- Create HNSW index for fast similarity search
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

-- Query: find the 5 most similar documents
SELECT id, content, 1 - (embedding <=> $1::vector) AS similarity
FROM documents
ORDER BY embedding <=> $1::vector
LIMIT 5;
Watch out: pgvector performance degrades significantly above 5-10 million vectors on a single node, especially with complex metadata filters. If your dataset is growing toward that range, plan your migration path early. Retrofitting a dedicated vector database into a production system is painful.
Vector Database Costs at Scale
Pricing varies dramatically. Here's a realistic comparison for a common scenario -- 1 million 1536-dimension vectors, 100 queries per second:
| Solution | Monthly Cost Estimate | What's Included |
|---|---|---|
| pgvector (self-managed) | $0 (uses existing DB) | Extension is free; cost is Postgres hosting |
| Pinecone (Standard) | $70-200 | Managed infra, backups, monitoring |
| Qdrant Cloud | $65-150 | Managed cluster, automatic scaling |
| Weaviate Cloud | $25-100 | Managed cluster, built-in vectorizers |
| Self-hosted Qdrant (AWS) | $50-100 | EC2 instance costs only |
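For context on those prices, the raw memory footprint of the scenario is easy to estimate, assuming uncompressed float32 vectors and ignoring index overhead (HNSW adds a significant amount on top):

```python
n_vectors = 1_000_000
dims = 1536
bytes_per_float = 4  # float32

raw_bytes = n_vectors * dims * bytes_per_float
print(f"{raw_bytes / 2**30:.1f} GiB of raw vector data")  # ~5.7 GiB
```

Roughly 6 GB of vectors fits comfortably in RAM on modest hardware, which is why 1M vectors is cheap everywhere; at 100M+ the same arithmetic explains why memory-hungry HNSW gives way to quantization.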
Frequently Asked Questions
Do I need a vector database, or can I use a regular database?
If you're searching fewer than 100,000 vectors and latency isn't critical, you can compute cosine similarity in application code against vectors stored in any database. Above that scale, you need specialized indexing. pgvector gives you that without leaving PostgreSQL. Purpose-built vector databases are only necessary when pgvector's limitations become bottlenecks.
What's the difference between a vector database and a vector index?
A vector index (like FAISS or Annoy) is a library that builds an in-memory search structure. A vector database adds persistence, CRUD operations, metadata filtering, replication, and access control on top of an index. If you need durability and concurrent access, you need a database, not just an index.
How many dimensions should my embeddings have?
More dimensions capture more nuance but cost more storage and slow down search. For text, 1536 dimensions (OpenAI's text-embedding-3-small) is a solid default. If storage is tight, OpenAI's embedding-3 models support reducing output to 512 dimensions with minimal quality loss. For specialized domains, evaluate candidate models against the MTEB benchmarks.
Can vector databases handle real-time updates?
Yes, but with caveats. Most vector databases support upserts, but HNSW index updates are more expensive than inserts. Qdrant and Weaviate handle this well. pgvector requires periodic reindexing for optimal performance after large batch updates. Plan for eventual consistency in high-write scenarios.
What is the "curse of dimensionality" in vector search?
As vector dimensions increase, all points tend to become equidistant from each other, making distance metrics less meaningful. In practice, this matters less than theory suggests because real-world embeddings occupy a much lower-dimensional manifold. ANN algorithms like HNSW work well up to several thousand dimensions.
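The concentration effect is easy to observe with synthetic data. This sketch compares the relative spread of pairwise distances (standard deviation over mean) between random points at low and high dimension; the shrinking spread is exactly the "everything becomes equidistant" effect:

```python
import math
import random
import statistics

def relative_distance_spread(dim, n_points=100, seed=42):
    # Uniform random points in [0,1]^dim. Real embeddings sit on a
    # lower-dimensional manifold, which is why they behave better than this.
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    dists = [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]
    return statistics.stdev(dists) / statistics.mean(dists)

low = relative_distance_spread(2)
high = relative_distance_spread(512)
print(low > high)  # True: distances concentrate as dimension grows
```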
Should I use a vector database or a full-text search engine like Elasticsearch?
Use both. Elasticsearch excels at keyword search, faceting, and structured filtering. Vector databases excel at semantic similarity. Modern Elasticsearch supports dense vector search, and most vector databases now support keyword search. The trend is convergence, but purpose-built tools still outperform hybrids in their specialty.
Start Simple, Scale Deliberately
My strong recommendation: start with pgvector if you're already on PostgreSQL. It handles more than most people think, the operational overhead is zero, and you get the massive advantage of querying vectors and relational data together. Only move to a dedicated vector database when you've hit a concrete performance wall -- not when a vendor's marketing page tells you that you need one. The best infrastructure is the kind you don't have to manage.
Written by
Abhishek Patel
Infrastructure engineer with 10+ years building production systems on AWS, GCP, and bare metal. Writes practical guides on cloud architecture, containers, networking, and Linux for developers who want to understand how things actually work under the hood.