Vector Database Selection Guide: Pinecone vs Weaviate vs pgvector
A benchmark-driven comparison of the top vector databases for production RAG. Recall@k, indexing speed, query latency, and cost at 1M, 10M, and 100M vectors, so you can make the right call before you build.
The Decision That's Harder Than It Looks
Most teams pick a vector database the same way they pick any database: familiarity, a blog post, or what the demo used. The choice holds up fine until query latency starts showing up in P99 dashboards or the monthly bill exceeds the engineering team's salary.
The right choice depends on three variables: corpus size, query latency requirement, and operational complexity tolerance. I'll give you the numbers to make that call.
How Vector Search Works
A vector database stores high-dimensional embeddings (typically 768–3072 dimensions) and supports approximate nearest neighbour (ANN) search, finding the k vectors most similar to a query vector.
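Recall@k, the metric used throughout this guide, is just the fraction of the true top-k neighbours that the approximate index actually returned. A minimal sketch (the function name is mine, not from any library):

```python
def recall_at_k(approx_ids, exact_ids, k=10):
    """Fraction of the true top-k neighbours that the ANN index returned.

    approx_ids: ids returned by the approximate index for one query
    exact_ids:  ground-truth ids from an exhaustive (exact) search
    """
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k

# The index found 9 of the 10 true neighbours (99 is a miss):
print(recall_at_k([1, 2, 3, 4, 5, 6, 7, 8, 9, 99],
                  [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))  # → 0.9
```

In practice you average this over a few hundred held-out queries; a single query tells you little.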
Two dominant index types:
HNSW (Hierarchical Navigable Small World): Builds a multi-layer graph where each node connects to its nearest neighbours at each layer. Search traverses from coarse to fine layers. Fast queries (~1–5ms), high recall (0.95+), but high memory footprint (the graph structure adds ~20–40% overhead on top of raw vector storage). Best for latency-sensitive workloads.
IVF-Flat (Inverted File Index): Clusters vectors into buckets around centroids. Search probes the few centroids nearest the query (the nprobe parameter) and scans every vector in those buckets. Lower memory footprint than HNSW. Recall depends on nprobe: more probes mean higher recall but slower queries. Best for large corpora where HNSW's memory overhead becomes prohibitive.
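The IVF recall/latency trade-off is easy to see in a toy simulation. This is a deliberately naive sketch in plain Python (random data, randomly sampled centroids, Euclidean distance; real engines train centroids with k-means and use SIMD), but the nprobe behaviour is the same:

```python
import math
import random

random.seed(0)
DIM, N, K, NLIST = 8, 2000, 10, 20

vectors = [[random.random() for _ in range(DIM)] for _ in range(N)]
query = [random.random() for _ in range(DIM)]

# Exact search: scan everything. This is the ground truth ANN approximates.
exact = sorted(range(N), key=lambda i: math.dist(vectors[i], query))[:K]

# IVF build step: pick centroids, assign each vector to its nearest bucket.
centroids = random.sample(vectors, NLIST)
buckets = {c: [] for c in range(NLIST)}
for i, v in enumerate(vectors):
    c = min(range(NLIST), key=lambda c: math.dist(centroids[c], v))
    buckets[c].append(i)

def ivf_search(nprobe):
    # Probe only the nprobe buckets whose centroids are nearest the query,
    # then rank just the candidates inside them.
    probe = sorted(range(NLIST), key=lambda c: math.dist(centroids[c], query))[:nprobe]
    cand = [i for c in probe for i in buckets[c]]
    return sorted(cand, key=lambda i: math.dist(vectors[i], query))[:K]

for nprobe in (1, 5, NLIST):
    recall = len(set(ivf_search(nprobe)) & set(exact)) / K
    print(f"nprobe={nprobe:2d}  recall@10={recall:.2f}")
```

With nprobe equal to NLIST every bucket is scanned, so the result matches exact search (recall 1.0) at exact-search cost; lower nprobe scans fewer vectors and misses neighbours that landed in unprobed buckets.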
The index type is the primary determinant of the recall/latency trade-off. Choose wrong and you're stuck rebuilding.
Pinecone
Fully managed. No infrastructure to operate. The query API abstracts the index type. Pinecone chooses HNSW or IVF internally based on corpus size.
Performance (per published benchmarks): At 1M vectors, 768-dim, recall@10 = 0.97, P50 query latency = 8ms, P99 = 22ms. At 100M vectors, latency rises to P50 = 18ms, P99 = 65ms.
Cost: $0.096/hour per pod (s1 standard). At 10M vectors, expect 1–2 pods → ~$70–140/month. Storage at $0.025/1M vectors/month is additive.
Where it wins: You want production SLAs without operating infrastructure. Pinecone's uptime guarantees (99.9% SLA) and managed scaling are worth the premium for teams that don't want to own the operational surface.
Where it loses: Cost at scale (100M+ vectors becomes expensive fast), no self-hosted option (vendor lock-in is real), and metadata filtering performance degrades at high cardinality.
Weaviate
Open-source, self-hosted or managed cloud. Uses HNSW by default with dynamic segment management to control memory usage.
Performance: At 1M vectors, recall@10 = 0.95, P50 = 6ms, P99 = 28ms (self-hosted, 4-core, 32GB). At 10M vectors, P99 rises to ~80ms on the same hardware. HNSW's memory pressure starts showing.
Differentiator: Weaviate's hybrid search (BM25 + vector) is first-class, not bolted on. The GraphQL query API is expressive for complex filtering. Module system allows inline embedding generation (no separate embedding service required).
Cost (cloud): Roughly 60–70% of Pinecone at equivalent performance. Self-hosted on AWS EC2 (r6i.2xlarge, 64GB): ~$0.50/hour → ~$360/month running continuously.
Where it wins: Teams with complex filtering requirements, hybrid search needs, or the operational capability to self-host. Also useful when you want to avoid vendor lock-in.
Where it loses: Operational overhead is real. HNSW memory pressure at 50M+ vectors requires careful instance sizing. The GraphQL API has a learning curve.
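That memory-pressure claim is easy to sanity-check with arithmetic. A back-of-envelope estimator (the 30% graph overhead is an assumed midpoint of the 20–40% range quoted earlier, not a measured figure):

```python
def hnsw_memory_gb(n_vectors, dim, bytes_per_float=4, graph_overhead=0.30):
    """Back-of-envelope RAM estimate for an HNSW index.

    Raw vectors cost n * dim * 4 bytes (float32); the graph layers add
    roughly 20-40% on top. 30% here is an assumption, not a benchmark.
    """
    raw_bytes = n_vectors * dim * bytes_per_float
    return raw_bytes * (1 + graph_overhead) / 1024**3

# 10M 768-dim float32 vectors: ~28.6 GiB raw, ~37 GiB with graph overhead.
print(f"{hnsw_memory_gb(10_000_000, 768):.1f} GiB")
```

That ~37 GiB already exceeds the 32GB box used in the 10M benchmark above, which is why P99 degrades there: once the index no longer fits comfortably in RAM, graph traversal starts paying for page faults.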
pgvector
PostgreSQL extension. Embeddings are stored in a vector column type, and HNSW and IVF-Flat indexes are built on that column.
Performance: At 1M vectors (HNSW, m=16, ef_construction=64): recall@10 = 0.92, P50 = 12ms, P99 = 45ms on a db.r6g.2xlarge (8 vCPU, 64GB RDS). At 5M vectors, P99 degrades to ~180ms on the same instance. HNSW memory pressure is the bottleneck.
The actual argument for pgvector: You already run Postgres. Adding a vector column to an existing table means no additional infrastructure, no new operational surface, transactional consistency between your relational data and vector data, and zero additional cost (beyond instance size).
Cost: Whatever your Postgres instance costs. An RDS db.r6g.2xlarge at ~$0.48/hour → ~$345/month. At 1M vectors with metadata, this is dramatically cheaper than managed vector DB options.
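Putting the three cost figures from this guide side by side (a 30-day month; these are the self-hosted EC2 / RDS instance rates and Pinecone pod rates quoted above — list prices change, so treat the output as illustrative, not a quote):

```python
HOURS_PER_MONTH = 720  # 30-day month

def monthly(rate_per_hour, units=1):
    """Flat always-on monthly cost for a given hourly rate."""
    return rate_per_hour * units * HOURS_PER_MONTH

# Figures from this guide, at roughly the 10M-vector mark:
pinecone_cost = monthly(0.096, units=2) + 10 * 0.025  # 2 s1 pods + storage
weaviate_cost = monthly(0.50)                         # r6i.2xlarge, self-hosted
pgvector_cost = monthly(0.48)                         # db.r6g.2xlarge RDS

print(f"Pinecone ~${pinecone_cost:.0f}/mo, "
      f"Weaviate ~${weaviate_cost:.0f}/mo, "
      f"pgvector ~${pgvector_cost:.0f}/mo")
```

Note what this arithmetic hides: the self-hosted numbers buy you an instance, not a service — on-call, upgrades, and capacity planning are extra, and that operational cost is exactly the premium Pinecone charges for.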
Where it wins: Corpora under ~2M vectors, teams that want to keep the stack simple, and use cases requiring transactional consistency between vectors and relational data.
Where it loses: Query latency at 5M+ vectors on modest hardware becomes a liability. If recall@10 below 0.90 is unacceptable, you'll need careful HNSW tuning. Horizontal scaling requires Citus or read replicas, which adds complexity.
Decision Framework
| Scenario | Recommendation |
|---|---|
| < 2M vectors | pgvector (simplest, cheapest, sufficient performance) |
| 2M–20M vectors | Weaviate self-hosted or Pinecone, depending on operational tolerance |
| 20M+ vectors | Pinecone (managed) or Weaviate cloud with IVF for memory efficiency |
| Latency P99 < 20ms required | Pinecone or Weaviate HNSW. pgvector won't hold at this scale. |
| Hybrid search required | Weaviate. Native BM25+vector is best-in-class. |
| Transactional consistency required | pgvector, the only option that gives you ACID guarantees |
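The table above, as a first-pass helper function. One assumption of mine: the hard-requirement rows (transactional consistency, hybrid search, latency) take precedence over pure corpus size, which is my reading of the table rather than something it states explicitly:

```python
def pick_vector_db(corpus_size, p99_under_20ms=False, hybrid_search=False,
                   transactional=False):
    """First-pass recommendation mirroring the decision table.

    Hard requirements are checked before corpus size (an assumed
    precedence). Treat the result as a starting point, not a verdict.
    """
    if transactional:
        return "pgvector"                  # only option with ACID guarantees
    if hybrid_search:
        return "Weaviate"                  # native BM25 + vector
    if p99_under_20ms:
        return "Pinecone or Weaviate (HNSW)"
    if corpus_size < 2_000_000:
        return "pgvector"
    if corpus_size <= 20_000_000:
        return "Weaviate (self-hosted) or Pinecone"
    return "Pinecone or Weaviate cloud (IVF)"

print(pick_vector_db(1_000_000))                      # → pgvector
print(pick_vector_db(50_000_000))                     # → Pinecone or Weaviate cloud (IVF)
print(pick_vector_db(5_000_000, hybrid_search=True))  # → Weaviate
```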
Further Reading
- ANN Benchmarks: standardised recall/throughput comparison across libraries
- pgvector GitHub: HNSW parameter tuning documentation
- Pinecone documentation: choosing index type and pod configuration
Related Concepts
- Distributed read cache: A shared cache layer across multiple nodes used to absorb read traffic from the primary database and reduce latency on hot data paths. The difference between a 2ms and a 200ms read at scale.
- Vector database: A database purpose-built to store and query high-dimensional embedding vectors. The retrieval layer that makes semantic search and RAG pipelines possible at production scale.
- Embeddings: Numerical vector representations of text, images, or other data that encode semantic meaning. The translation layer that converts unstructured content into a form that can be compared mathematically.