Vector Databases: What They Are, How They Work, and When You Need One
A practical guide to vector databases -- how embeddings and ANN algorithms work, and an honest comparison of Pinecone, Weaviate, Qdrant, and pgvector.

The Search Problem That SQL Can't Solve
Vector databases store and search high-dimensional numerical representations of data -- embeddings -- using similarity metrics instead of exact matches. If you've built anything with LLMs, recommendation systems, or image search, you've either used one already or you're about to. They're the backbone of RAG pipelines, semantic search, and any application where "find me something similar" matters more than "find me an exact match."
I've deployed vector search in production using everything from a hastily added pgvector extension to fully managed Pinecone clusters. The technology is simpler than vendors want you to believe, but the tradeoffs are real. Here's what actually matters when choosing and using a vector database.
What Is a Vector Database?
Definition: A vector database is a specialized data store designed to index, store, and query high-dimensional vectors (embeddings) efficiently. It enables approximate nearest neighbor (ANN) search to find the most similar vectors to a given query vector, typically using distance metrics like cosine similarity or Euclidean distance.
Traditional databases search by exact field values -- give me all users where email = 'x'. Vector databases search by proximity in embedding space -- give me the 10 vectors most similar to this query vector. That's a fundamentally different operation, and it requires fundamentally different data structures.
How Embeddings Work
Before you can search vectors, you need vectors. An embedding model converts raw data (text, images, audio) into a fixed-length array of floating point numbers. The magic is that semantically similar inputs produce numerically similar outputs.
from openai import OpenAI

client = OpenAI()

# These two sentences mean similar things
texts = [
    "How do I reset my password?",
    "I forgot my login credentials",
]

response = client.embeddings.create(
    model="text-embedding-3-small",
    input=texts,
)

# These vectors will be close together in embedding space
vec_a = response.data[0].embedding  # 1536 dimensions
vec_b = response.data[1].embedding  # 1536 dimensions
The resulting vectors typically have 384 to 3072 dimensions. Each dimension captures some learned feature of the input. No single dimension has a human-interpretable meaning -- it's the full vector's position in high-dimensional space that encodes semantic relationships.
Similarity Metrics: How "Close" Gets Calculated
When you query a vector database, it calculates the distance between your query vector and every stored vector (or a smart subset). The three main metrics:
| Metric | Formula | Range | When to Use |
|---|---|---|---|
| Cosine Similarity | dot(A,B) / (||A|| * ||B||) | -1 to 1 | Most text embeddings. Direction matters, magnitude doesn't. |
| Euclidean (L2) | sqrt(sum((a-b)^2)) | 0 to infinity | When magnitude matters (e.g., signal strength) |
| Dot Product | sum(a*b) | -inf to inf | Pre-normalized vectors, where it matches cosine similarity at lower compute cost |
Pro tip: For text embeddings from OpenAI, Cohere, or most open-source models, use cosine similarity. These models are trained so that directional similarity captures semantic meaning. If your vectors are already L2-normalized (many models do this by default), cosine similarity and dot product are mathematically equivalent, and dot product is faster.
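All three metrics are easy to verify by hand. Here's a minimal pure-Python sketch using toy 4-dimensional vectors in place of real embeddings, including a check of the normalization equivalence mentioned above:

```python
import math

def dot(a, b):
    # sum(a*b)
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    # Euclidean length ||A||
    return math.sqrt(dot(a, a))

def cosine_similarity(a, b):
    # dot(A,B) / (||A|| * ||B||), range -1 to 1
    return dot(a, b) / (norm(a) * norm(b))

def euclidean_distance(a, b):
    # sqrt(sum((a-b)^2)), range 0 to infinity
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy 4-dimensional vectors standing in for real embeddings
a = [0.1, 0.3, 0.5, 0.2]
b = [0.2, 0.25, 0.45, 0.3]

# After L2-normalization, cosine similarity and plain dot product agree
a_n = [x / norm(a) for x in a]
b_n = [x / norm(b) for x in b]
print(abs(cosine_similarity(a, b) - dot(a_n, b_n)) < 1e-12)  # True
```

In production you'd use NumPy or let the database compute distances, but the math is exactly this simple.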
ANN Algorithms: The Engine Under the Hood
Exact nearest neighbor search compares your query against every vector in the database. That's O(n) per query -- fine for 10,000 vectors, impossibly slow for 10 million. Approximate Nearest Neighbor (ANN) algorithms trade a tiny amount of accuracy for massive speed improvements.
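For a sense of scale, the exact O(n) baseline fits in a few lines. The function names here are illustrative, and vectors are assumed to be plain Python lists:

```python
import heapq
import math

def cosine(a, b):
    d = sum(x * y for x, y in zip(a, b))
    return d / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def exact_top_k(query, vectors, k=3):
    # Brute force: score every stored vector against the query -- O(n) per query
    scored = ((cosine(query, v), i) for i, v in enumerate(vectors))
    # Keep only the k highest-similarity (score, index) pairs
    return heapq.nlargest(k, scored)

corpus = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.5, 0.5]]
top = exact_top_k([1.0, 0.05], corpus, k=2)
print([i for _, i in top])  # [0, 1]
```

Every ANN algorithm below is a strategy for avoiding that full scan.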
HNSW (Hierarchical Navigable Small World)
The most popular ANN algorithm. HNSW builds a multi-layer graph where each node is a vector and edges connect similar vectors. Search starts at the top layer (fewest nodes, longest edges) and navigates down to finer-grained layers. Think of it like zooming in on a map -- you start at continent level and drill down to street level.
- Query speed: Sub-millisecond for millions of vectors
- Recall: 95-99% typical (configurable via the ef_search parameter)
- Memory: High -- the full graph lives in RAM
- Build time: Slow initial index construction
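The full HNSW algorithm is involved, but its core move -- greedy descent toward the query over a neighborhood graph -- is simple to sketch. This toy version searches a single layer over hand-built 1-D "vectors"; the real algorithm repeats this across layers and keeps a candidate beam of size ef_search instead of a single current node:

```python
def greedy_search(graph, vectors, entry, query):
    # Repeatedly hop to whichever neighbor is closest to the query,
    # stopping at a local minimum -- the essence of one HNSW layer.
    current = entry
    best = abs(vectors[current] - query)
    improved = True
    while improved:
        improved = False
        for neighbor in graph[current]:
            d = abs(vectors[neighbor] - query)
            if d < best:
                current, best = neighbor, d
                improved = True
    return current

# Toy 1-D "embedding space": points on a line, chained as neighbors
vectors = [0.0, 1.0, 2.0, 3.0, 4.0]
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(greedy_search(graph, vectors, entry=0, query=3.6))  # 4
```

The upper layers exist to pick a good entry point, so this walk touches only a logarithmic fraction of the graph instead of every node.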
IVF-PQ (Inverted File Index with Product Quantization)
Clusters vectors into partitions (IVF), then compresses each vector using product quantization (PQ). At query time, only a subset of clusters is searched, and distance calculations use the compressed representations.
- Query speed: Fast, but typically slower than HNSW
- Recall: 85-95% depending on configuration
- Memory: Much lower than HNSW -- PQ compresses vectors by 4-32x
- Build time: Requires a training step to learn cluster centroids
Direct answer: For most applications with under 10 million vectors, HNSW is the right choice. It gives the best recall-to-speed tradeoff. Switch to IVF-PQ when you have hundreds of millions of vectors and HNSW's memory requirements become impractical.
Vector Database Comparison: Pinecone vs Weaviate vs Qdrant vs pgvector
The market has exploded with options. Here's an honest comparison based on production experience:
| Feature | Pinecone | Weaviate | Qdrant | pgvector |
|---|---|---|---|---|
| Type | Managed SaaS | Open-source + Cloud | Open-source + Cloud | PostgreSQL extension |
| ANN Algorithm | Proprietary | HNSW | HNSW | HNSW + IVFFlat |
| Max Vectors | Billions | Billions | Billions | Millions (practical) |
| Hybrid Search | Yes (sparse+dense) | Yes (BM25+vector) | Yes (sparse+dense) | With pg_trgm or full-text |
| Filtering | Metadata filters | GraphQL-style filters | Payload filters | Standard SQL WHERE |
| Self-hosting | No | Yes (Docker/K8s) | Yes (Docker/K8s) | Yes (any Postgres host) |
| Pricing (starter) | Free tier, then ~$70/mo | Free self-hosted, cloud from $25/mo | Free self-hosted, cloud from $25/mo | Free (part of Postgres) |
| Latency (1M vectors) | ~10-20ms | ~5-15ms | ~5-10ms | ~20-50ms |
Decision Guide: Which Vector Database Should You Pick?
- Already using PostgreSQL and have fewer than 5M vectors? Start with pgvector. Zero new infrastructure. Add the extension, create an HNSW index, and you're searching vectors alongside your relational data in the same transaction.
- Need managed infrastructure and don't want to think about scaling? Pinecone is the easiest path. You trade control for convenience. Pricing scales linearly, which can get expensive at high volume.
- Want self-hosted with strong performance and flexibility? Qdrant has the best raw performance in benchmarks and a clean API. Weaviate offers more built-in modules (vectorizers, rerankers) but is heavier to operate.
- Building a multi-modal search system? Weaviate's module ecosystem handles text, image, and multi-modal vectors with built-in vectorization. Qdrant is catching up here.
- Need to search billions of vectors? Pinecone or Qdrant Cloud. pgvector won't cut it at that scale.
-- pgvector: Add vector search to existing PostgreSQL
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    content TEXT NOT NULL,
    metadata JSONB,
    embedding vector(1536)  -- match your model's dimensions
);

-- Create HNSW index for fast similarity search
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

-- Query: find the 5 most similar documents
SELECT id, content, 1 - (embedding <=> $1::vector) AS similarity
FROM documents
ORDER BY embedding <=> $1::vector
LIMIT 5;
Watch out: pgvector performance degrades significantly above 5-10 million vectors on a single node, especially with complex metadata filters. If your dataset is growing toward that range, plan your migration path early. Retrofitting a dedicated vector database into a production system is painful.
Vector Database Costs at Scale
Pricing varies dramatically. Here's a realistic comparison for a common scenario -- 1 million 1536-dimension vectors, 100 queries per second:
| Solution | Monthly Cost Estimate | What's Included |
|---|---|---|
| pgvector (self-managed) | $0 (uses existing DB) | Extension is free; cost is Postgres hosting |
| Pinecone (Standard) | $70-200 | Managed infra, backups, monitoring |
| Qdrant Cloud | $65-150 | Managed cluster, automatic scaling |
| Weaviate Cloud | $25-100 | Managed cluster, built-in vectorizers |
| Self-hosted Qdrant (AWS) | $50-100 | EC2 instance costs only |
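For context on those prices, the raw memory footprint of the scenario is easy to estimate, assuming uncompressed float32 vectors and ignoring index overhead (HNSW adds a significant amount on top):

```python
n_vectors = 1_000_000
dims = 1536
bytes_per_float = 4  # float32

raw_bytes = n_vectors * dims * bytes_per_float
print(f"{raw_bytes / 2**30:.1f} GiB of raw vector data")  # ~5.7 GiB
```

Roughly 6 GB of vectors fits comfortably in RAM on modest hardware, which is why 1M vectors is cheap everywhere; at 100M+ the same arithmetic explains why memory-hungry HNSW gives way to quantization.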
Frequently Asked Questions
Do I need a vector database, or can I use a regular database?
If you're searching fewer than 100,000 vectors and latency isn't critical, you can compute cosine similarity in application code against vectors stored in any database. Above that scale, you need specialized indexing. pgvector gives you that without leaving PostgreSQL. Purpose-built vector databases are only necessary when pgvector's limitations become bottlenecks.
What's the difference between a vector database and a vector index?
A vector index (like FAISS or Annoy) is a library that builds an in-memory search structure. A vector database adds persistence, CRUD operations, metadata filtering, replication, and access control on top of an index. If you need durability and concurrent access, you need a database, not just an index.
How many dimensions should my embeddings have?
More dimensions capture more nuance but cost more storage and slow down search. For text, 1536 dimensions (OpenAI's text-embedding-3-small) is a solid default. If storage is tight, OpenAI's embedding-3 models support reducing output to 512 dimensions with minimal quality loss. For specialized domains, evaluate candidate models against the MTEB benchmarks.
Can vector databases handle real-time updates?
Yes, but with caveats. Most vector databases support upserts, but HNSW index updates are more expensive than inserts. Qdrant and Weaviate handle this well. pgvector requires periodic reindexing for optimal performance after large batch updates. Plan for eventual consistency in high-write scenarios.
What is the "curse of dimensionality" in vector search?
As vector dimensions increase, all points tend to become equidistant from each other, making distance metrics less meaningful. In practice, this matters less than theory suggests because real-world embeddings occupy a much lower-dimensional manifold. ANN algorithms like HNSW work well up to several thousand dimensions.
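The concentration effect is easy to observe with synthetic data. This sketch compares the relative spread of pairwise distances (standard deviation over mean) between random points at low and high dimension; the shrinking spread is exactly the "everything becomes equidistant" effect:

```python
import math
import random
import statistics

def relative_distance_spread(dim, n_points=100, seed=42):
    # Uniform random points in [0,1]^dim. Real embeddings sit on a
    # lower-dimensional manifold, which is why they behave better than this.
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    dists = [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]
    return statistics.stdev(dists) / statistics.mean(dists)

low = relative_distance_spread(2)
high = relative_distance_spread(512)
print(low > high)  # True: distances concentrate as dimension grows
```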
Should I use a vector database or a full-text search engine like Elasticsearch?
Use both. Elasticsearch excels at keyword search, faceting, and structured filtering. Vector databases excel at semantic similarity. Modern Elasticsearch supports dense vector search, and most vector databases now support keyword search. The trend is convergence, but purpose-built tools still outperform hybrids in their specialty.
Start Simple, Scale Deliberately
My strong recommendation: start with pgvector if you're already on PostgreSQL. It handles more than most people think, the operational overhead is zero, and you get the massive advantage of querying vectors and relational data together. Only move to a dedicated vector database when you've hit a concrete performance wall -- not when a vendor's marketing page tells you that you need one. The best infrastructure is the kind you don't have to manage.
Written by
Abhishek Patel
Infrastructure engineer with 10+ years building production systems on AWS, GCP, and bare metal. Writes practical guides on cloud architecture, containers, networking, and Linux for developers who want to understand how things actually work under the hood.