
Vector Databases: What They Are, How They Work, and When You Need One

A practical guide to vector databases -- how embeddings and ANN algorithms work, and an honest comparison of Pinecone, Weaviate, Qdrant, and pgvector.

Abhishek Patel · 9 min read


The Search Problem That SQL Can't Solve

Vector databases store and search high-dimensional numerical representations of data -- embeddings -- using similarity metrics instead of exact matches. If you've built anything with LLMs, recommendation systems, or image search, you've either used one already or you're about to. They're the backbone of RAG pipelines, semantic search, and any application where "find me something similar" matters more than "find me an exact match."

I've deployed vector search in production using everything from a hastily added pgvector extension to fully managed Pinecone clusters. The technology is simpler than vendors want you to believe, but the tradeoffs are real. Here's what actually matters when choosing and using a vector database.

What Is a Vector Database?

Definition: A vector database is a specialized data store designed to index, store, and query high-dimensional vectors (embeddings) efficiently. It enables approximate nearest neighbor (ANN) search to find the most similar vectors to a given query vector, typically using distance metrics like cosine similarity or Euclidean distance.

Traditional databases search by exact field values -- give me all users where email = 'x'. Vector databases search by proximity in embedding space -- give me the 10 vectors most similar to this query vector. That's a fundamentally different operation, and it requires fundamentally different data structures.

How Embeddings Work

Before you can search vectors, you need vectors. An embedding model converts raw data (text, images, audio) into a fixed-length array of floating point numbers. The magic is that semantically similar inputs produce numerically similar outputs.

from openai import OpenAI

client = OpenAI()

# These two sentences mean similar things
texts = [
    "How do I reset my password?",
    "I forgot my login credentials"
]

response = client.embeddings.create(
    model="text-embedding-3-small",
    input=texts
)

# These vectors will be close together in embedding space
vec_a = response.data[0].embedding  # 1536 dimensions
vec_b = response.data[1].embedding  # 1536 dimensions

The resulting vectors typically have 384 to 3072 dimensions. Each dimension captures some learned feature of the input. No single dimension has a human-interpretable meaning -- it's the full vector's position in high-dimensional space that encodes semantic relationships.

Similarity Metrics: How "Close" Gets Calculated

When you query a vector database, it calculates the distance between your query vector and every stored vector (or a smart subset). The three main metrics:

| Metric | Formula | Range | When to Use |
|---|---|---|---|
| Cosine similarity | dot(A,B) / (‖A‖ * ‖B‖) | -1 to 1 | Most text embeddings. Direction matters, magnitude doesn't. |
| Euclidean (L2) | sqrt(sum((a-b)^2)) | 0 to infinity | When magnitude matters (e.g., signal strength) |
| Dot product | sum(a*b) | -inf to inf | Normalized vectors where you want speed over precision |

Pro tip: For text embeddings from OpenAI, Cohere, or most open-source models, use cosine similarity. These models are trained so that directional similarity captures semantic meaning. If your vectors are already L2-normalized (many models do this by default), cosine similarity and dot product are mathematically equivalent, and dot product is faster.
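To make that equivalence concrete, here's a small pure-Python sketch (toy vectors standing in for real embeddings) showing that once vectors are L2-normalized, the dot product gives the same result as cosine similarity:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(sum(x * x for x in a))

def cosine(a, b):
    # Cosine similarity: dot(A, B) / (||A|| * ||B||)
    return dot(a, b) / (norm(a) * norm(b))

def l2_normalize(a):
    n = norm(a)
    return [x / n for x in a]

a = [0.3, 0.8, 0.5]
b = [0.2, 0.9, 0.4]

# After L2 normalization, dot product and cosine similarity coincide,
# and the dot product skips two norm computations per comparison.
na, nb = l2_normalize(a), l2_normalize(b)
assert abs(dot(na, nb) - cosine(a, b)) < 1e-12
```

This is why many production systems normalize vectors once at write time and use dot product at query time.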

ANN Algorithms: The Engine Under the Hood

Exact nearest neighbor search compares your query against every vector in the database. That's O(n) per query -- fine for 10,000 vectors, impossibly slow for 10 million. Approximate Nearest Neighbor (ANN) algorithms trade a tiny amount of accuracy for massive speed improvements.
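For contrast, exact search is trivial to write. Here's a brute-force top-k sketch over a toy corpus (hypothetical document IDs and 3-dimension vectors for illustration), which is exactly the O(n)-per-query baseline that ANN algorithms exist to avoid:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def exact_top_k(query, vectors, k=2):
    """Score every stored vector against the query -- O(n) per query."""
    scored = sorted(vectors.items(),
                    key=lambda kv: cosine(query, kv[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

docs = {
    "a": [0.9, 0.1, 0.0],
    "b": [0.8, 0.2, 0.1],
    "c": [0.0, 0.1, 0.9],
}

print(exact_top_k([1.0, 0.0, 0.0], docs))  # ['a', 'b']
```

Perfectly accurate, and perfectly fine for small collections; it just doesn't scale past a few hundred thousand vectors.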

HNSW (Hierarchical Navigable Small World)

The most popular ANN algorithm. HNSW builds a multi-layer graph where each node is a vector and edges connect similar vectors. Search starts at the top layer (fewest nodes, longest edges) and navigates down to finer-grained layers. Think of it like zooming in on a map -- you start at continent level and drill down to street level.

  • Query speed: Sub-millisecond for millions of vectors
  • Recall: 95-99% typical (configurable via ef_search parameter)
  • Memory: High -- the full graph lives in RAM
  • Build time: Slow initial index construction
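The core move in HNSW is a greedy walk over a proximity graph: hop to whichever neighbor is closest to the query until no neighbor improves. Here's a heavily simplified single-layer sketch with hand-built toy 2-D vectors and a hand-built neighbor graph (real HNSW adds the layer hierarchy and an ef_search candidate queue):

```python
import math

# Toy dataset: node id -> 2-D vector
vectors = {
    0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.1),
    3: (3.0, 0.0), 4: (3.0, 3.0), 5: (0.0, 3.0),
}
# Toy proximity graph: each node links to nearby nodes
graph = {0: [1, 5], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [0, 4]}

def dist(a, b):
    return math.dist(a, b)  # Euclidean distance

def greedy_search(query, entry=0):
    """Hop to the neighbor closest to the query until stuck at a local minimum."""
    current = entry
    while True:
        best = min(graph[current], key=lambda n: dist(vectors[n], query))
        if dist(vectors[best], query) >= dist(vectors[current], query):
            return current  # no neighbor is closer: done
        current = best

print(greedy_search((2.9, 0.2)))  # lands on node 3 at (3.0, 0.0)
```

The hierarchy in real HNSW exists to make this walk start near the right region instead of traversing the whole graph, which is what keeps queries sub-millisecond.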

IVF-PQ (Inverted File Index with Product Quantization)

Clusters vectors into partitions (IVF), then compresses each vector using product quantization (PQ). At query time, only a subset of clusters is searched, and distance calculations use the compressed representations.

  • Query speed: Fast, but typically slower than HNSW
  • Recall: 85-95% depending on configuration
  • Memory: Much lower than HNSW -- PQ compresses vectors by 4-32x
  • Build time: Requires a training step to learn cluster centroids
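The IVF half of the idea fits in a few lines. This toy sketch uses two hand-picked centroids in place of the learned k-means centroids, and skips PQ compression entirely; it just shows how partitioning lets a query scan a fraction of the data:

```python
import math

def dist(a, b):
    return math.dist(a, b)

centroids = [(0.0, 0.0), (10.0, 10.0)]  # stand-ins for k-means output
vectors = [(0.1, 0.2), (0.3, 0.1), (9.8, 10.1), (10.2, 9.9)]

# Build the inverted file: centroid index -> ids of vectors in that partition
lists = {i: [] for i in range(len(centroids))}
for vid, v in enumerate(vectors):
    nearest = min(range(len(centroids)), key=lambda i: dist(centroids[i], v))
    lists[nearest].append(vid)

def ivf_search(query, nprobe=1):
    """Scan only the nprobe partitions whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda i: dist(centroids[i], query))
    candidates = [vid for i in order[:nprobe] for vid in lists[i]]
    return min(candidates, key=lambda vid: dist(vectors[vid], query))

print(ivf_search((9.9, 10.0)))  # scans only the partition near (10, 10)
```

The nprobe knob is the recall/speed dial: probing more partitions recovers vectors that fell just outside the nearest cluster, at the cost of more distance calculations.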

Direct answer: For most applications with under 10 million vectors, HNSW is the right choice. It gives the best recall-to-speed tradeoff. Switch to IVF-PQ when you have hundreds of millions of vectors and HNSW's memory requirements become impractical.

Vector Database Comparison: Pinecone vs Weaviate vs Qdrant vs pgvector

The market has exploded with options. Here's an honest comparison based on production experience:

| Feature | Pinecone | Weaviate | Qdrant | pgvector |
|---|---|---|---|---|
| Type | Managed SaaS | Open-source + Cloud | Open-source + Cloud | PostgreSQL extension |
| ANN algorithm | Proprietary | HNSW | HNSW | HNSW + IVFFlat |
| Max vectors | Billions | Billions | Billions | Millions (practical) |
| Hybrid search | Yes (sparse + dense) | Yes (BM25 + vector) | Yes (sparse + dense) | With pg_trgm or full-text |
| Filtering | Metadata filters | GraphQL-style filters | Payload filters | Standard SQL WHERE |
| Self-hosting | No | Yes (Docker/K8s) | Yes (Docker/K8s) | Yes (any Postgres host) |
| Pricing (starter) | Free tier, then ~$70/mo | Free self-hosted; cloud from $25/mo | Free self-hosted; cloud from $25/mo | Free (part of Postgres) |
| Latency (1M vectors) | ~10-20 ms | ~5-15 ms | ~5-10 ms | ~20-50 ms |

Decision Guide: Which Vector Database Should You Pick?

  1. Already using PostgreSQL and have fewer than 5M vectors? Start with pgvector. Zero new infrastructure. Add the extension, create an HNSW index, and you're searching vectors alongside your relational data in the same transaction.
  2. Need managed infrastructure and don't want to think about scaling? Pinecone is the easiest path. You trade control for convenience. Pricing scales linearly, which can get expensive at high volume.
  3. Want self-hosted with strong performance and flexibility? Qdrant has the best raw performance in benchmarks and a clean API. Weaviate offers more built-in modules (vectorizers, rerankers) but is heavier to operate.
  4. Building a multi-modal search system? Weaviate's module ecosystem handles text, image, and multi-modal vectors with built-in vectorization. Qdrant is catching up here.
  5. Need to search billions of vectors? Pinecone or Qdrant Cloud. pgvector won't cut it at that scale.
-- pgvector: Add vector search to existing PostgreSQL
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    content TEXT NOT NULL,
    metadata JSONB,
    embedding vector(1536)  -- match your model's dimensions
);

-- Create HNSW index for fast similarity search
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

-- Query: find 5 most similar documents
SELECT id, content, 1 - (embedding <=> $1::vector) AS similarity
FROM documents
ORDER BY embedding <=> $1::vector
LIMIT 5;

Watch out: pgvector performance degrades significantly above 5-10 million vectors on a single node, especially with complex metadata filters. If your dataset is growing toward that range, plan your migration path early. Retrofitting a dedicated vector database into a production system is painful.

Vector Database Costs at Scale

Pricing varies dramatically. Here's a realistic comparison for a common scenario -- 1 million 1536-dimension vectors, 100 queries per second:

| Solution | Monthly Cost Estimate | What's Included |
|---|---|---|
| pgvector (self-managed) | $0 (uses existing DB) | Extension is free; cost is Postgres hosting |
| Pinecone (Standard) | $70-200 | Managed infra, backups, monitoring |
| Qdrant Cloud | $65-150 | Managed cluster, automatic scaling |
| Weaviate Cloud | $25-100 | Managed cluster, built-in vectorizers |
| Self-hosted Qdrant (AWS) | $50-100 | EC2 instance costs only |

Frequently Asked Questions

Do I need a vector database, or can I use a regular database?

If you're searching fewer than 100,000 vectors and latency isn't critical, you can compute cosine similarity in application code against vectors stored in any database. Above that scale, you need specialized indexing. pgvector gives you that without leaving PostgreSQL. Purpose-built vector databases are only necessary when pgvector's limitations become bottlenecks.

What's the difference between a vector database and a vector index?

A vector index (like FAISS or Annoy) is a library that builds an in-memory search structure. A vector database adds persistence, CRUD operations, metadata filtering, replication, and access control on top of an index. If you need durability and concurrent access, you need a database, not just an index.

How many dimensions should my embeddings have?

More dimensions capture more nuance but require more storage and slower search. For text, 1536 dimensions (OpenAI's text-embedding-3-small) is a solid default. If storage is tight, OpenAI's model supports dimension reduction to 512 with minimal quality loss. For specialized domains, test with MTEB benchmarks.

Can vector databases handle real-time updates?

Yes, but with caveats. Most vector databases support upserts, but HNSW index updates are more expensive than inserts. Qdrant and Weaviate handle this well. pgvector requires periodic reindexing for optimal performance after large batch updates. Plan for eventual consistency in high-write scenarios.

What is the "curse of dimensionality" in vector search?

As vector dimensions increase, all points tend to become equidistant from each other, making distance metrics less meaningful. In practice, this matters less than theory suggests because real-world embeddings occupy a much lower-dimensional manifold. ANN algorithms like HNSW work well up to several thousand dimensions.

Should I use a vector database or a full-text search engine like Elasticsearch?

Use both. Elasticsearch excels at keyword search, faceting, and structured filtering. Vector databases excel at semantic similarity. Modern Elasticsearch supports dense vector search, and most vector databases now support keyword search. The trend is convergence, but purpose-built tools still outperform hybrids in their specialty.

Start Simple, Scale Deliberately

My strong recommendation: start with pgvector if you're already on PostgreSQL. It handles more than most people think, the operational overhead is zero, and you get the massive advantage of querying vectors and relational data together. Only move to a dedicated vector database when you've hit a concrete performance wall -- not when a vendor's marketing page tells you that you need one. The best infrastructure is the kind you don't have to manage.


Written by

Abhishek Patel

Infrastructure engineer with 10+ years building production systems on AWS, GCP, and bare metal. Writes practical guides on cloud architecture, containers, networking, and Linux for developers who want to understand how things actually work under the hood.
