Vector Database Selection Guide: Pinecone vs Weaviate vs pgvector
A benchmark-driven comparison of the top vector databases for production RAG. Recall@k, indexing speed, query latency, and cost at 1M, 10M, and 100M vectors, so you can make the right call before you build.
The Decision That's Harder Than It Looks
Most teams pick a vector database the same way they pick any database: familiarity, a blog post, or what the demo used. The choice holds up fine until query latency starts showing up in P99 dashboards or the monthly bill exceeds the engineering team's salary.
The right choice depends on three variables: corpus size, query latency requirement, and operational complexity tolerance. I'll give you the numbers to make that call.
How Vector Search Works
A vector database stores high-dimensional embeddings (typically 768–3072 dimensions) and supports approximate nearest neighbour (ANN) search, finding the k vectors most similar to a query vector.
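Recall@k, the metric used throughout this guide, is just the fraction of the true top-k neighbours that the approximate index actually returned. A minimal sketch (the function name is mine, not from any library):

```python
def recall_at_k(approx_ids, exact_ids, k=10):
    """Fraction of the true top-k neighbours that the ANN index returned.

    approx_ids: ids returned by the approximate index for one query
    exact_ids:  ground-truth ids from an exhaustive (exact) search
    """
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k

# The index found 9 of the 10 true neighbours (99 is a miss):
print(recall_at_k([1, 2, 3, 4, 5, 6, 7, 8, 9, 99],
                  [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))  # → 0.9
```

In practice you average this over a few hundred held-out queries; a single query tells you little.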
Two dominant index types:
HNSW (Hierarchical Navigable Small World): Builds a multi-layer graph where each node connects to its nearest neighbours at each layer. Search traverses from coarse to fine layers. Fast queries (~1–5ms), high recall (0.95+), but high memory footprint (the graph structure adds ~20–40% overhead on top of raw vector storage). Best for latency-sensitive workloads.
IVF-Flat (Inverted File Index): Clusters vectors into buckets around centroids. Search probes the few centroids nearest the query (the nprobe parameter) and scans every vector in those buckets. Lower memory footprint than HNSW. Recall depends on nprobe: more probes mean higher recall but slower queries. Best for large corpora where HNSW's memory overhead becomes prohibitive.
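The IVF recall/latency trade-off is easy to see in a toy simulation. This is a deliberately naive sketch in plain Python (random data, randomly sampled centroids, Euclidean distance; real engines train centroids with k-means and use SIMD), but the nprobe behaviour is the same:

```python
import math
import random

random.seed(0)
DIM, N, K, NLIST = 8, 2000, 10, 20

vectors = [[random.random() for _ in range(DIM)] for _ in range(N)]
query = [random.random() for _ in range(DIM)]

# Exact search: scan everything. This is the ground truth ANN approximates.
exact = sorted(range(N), key=lambda i: math.dist(vectors[i], query))[:K]

# IVF build step: pick centroids, assign each vector to its nearest bucket.
centroids = random.sample(vectors, NLIST)
buckets = {c: [] for c in range(NLIST)}
for i, v in enumerate(vectors):
    c = min(range(NLIST), key=lambda c: math.dist(centroids[c], v))
    buckets[c].append(i)

def ivf_search(nprobe):
    # Probe only the nprobe buckets whose centroids are nearest the query,
    # then rank just the candidates inside them.
    probe = sorted(range(NLIST), key=lambda c: math.dist(centroids[c], query))[:nprobe]
    cand = [i for c in probe for i in buckets[c]]
    return sorted(cand, key=lambda i: math.dist(vectors[i], query))[:K]

for nprobe in (1, 5, NLIST):
    recall = len(set(ivf_search(nprobe)) & set(exact)) / K
    print(f"nprobe={nprobe:2d}  recall@10={recall:.2f}")
```

With nprobe equal to NLIST every bucket is scanned, so the result matches exact search (recall 1.0) at exact-search cost; lower nprobe scans fewer vectors and misses neighbours that landed in unprobed buckets.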
The index type is the primary determinant of the recall/latency trade-off. Choose wrong and you're stuck rebuilding.
Pinecone
Fully managed. No infrastructure to operate. The query API abstracts the index type. Pinecone chooses HNSW or IVF internally based on corpus size.
Performance (per published benchmarks): At 1M vectors, 768-dim, recall@10 = 0.97, P50 query latency = 8ms, P99 = 22ms. At 100M vectors, latency rises to P50 = 18ms, P99 = 65ms.
Cost: $0.096/hour per pod (s1 standard). At 10M vectors, expect 1–2 pods → ~$70–140/month. Storage at $0.025/1M vectors/month is additive.
Where it wins: You want production SLAs without operating infrastructure. Pinecone's uptime guarantees (99.9% SLA) and managed scaling are worth the premium for teams that don't want to own the operational surface.
Where it loses: Cost at scale (100M+ vectors becomes expensive fast), no self-hosted option (vendor lock-in is real), and metadata filtering performance degrades at high cardinality.
Weaviate
Open-source, self-hosted or managed cloud. Uses HNSW by default with dynamic segment management to control memory usage.
Performance: At 1M vectors, recall@10 = 0.95, P50 = 6ms, P99 = 28ms (self-hosted, 4-core, 32GB). At 10M vectors, P99 rises to ~80ms on the same hardware. HNSW's memory pressure starts showing.
Differentiator: Weaviate's hybrid search (BM25 + vector) is first-class, not bolted on. The GraphQL query API is expressive for complex filtering. Module system allows inline embedding generation (no separate embedding service required).
Cost (cloud): Roughly 60–70% of Pinecone at equivalent performance. Self-hosted on AWS EC2 (r6i.2xlarge, 64GB): ~$0.50/hour → ~$360/month running continuously.
Where it wins: Teams with complex filtering requirements, hybrid search needs, or the operational capability to self-host. Also useful when you want to avoid vendor lock-in.
Where it loses: Operational overhead is real. HNSW memory pressure at 50M+ vectors requires careful instance sizing. The GraphQL API has a learning curve.
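That memory-pressure claim is easy to sanity-check with arithmetic. A back-of-envelope estimator (the 30% graph overhead is an assumed midpoint of the 20–40% range quoted earlier, not a measured figure):

```python
def hnsw_memory_gb(n_vectors, dim, bytes_per_float=4, graph_overhead=0.30):
    """Back-of-envelope RAM estimate for an HNSW index.

    Raw vectors cost n * dim * 4 bytes (float32); the graph layers add
    roughly 20-40% on top. 30% here is an assumption, not a benchmark.
    """
    raw_bytes = n_vectors * dim * bytes_per_float
    return raw_bytes * (1 + graph_overhead) / 1024**3

# 10M 768-dim float32 vectors: ~28.6 GiB raw, ~37 GiB with graph overhead.
print(f"{hnsw_memory_gb(10_000_000, 768):.1f} GiB")
```

That ~37 GiB already exceeds the 32GB box used in the 10M benchmark above, which is why P99 degrades there: once the index no longer fits comfortably in RAM, graph traversal starts paying for page faults.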
pgvector
PostgreSQL extension. Embeddings are stored in a vector column type, and HNSW and IVF-Flat indexes are built on that column.
Performance: At 1M vectors (HNSW, m=16, ef_construction=64): recall@10 = 0.92, P50 = 12ms, P99 = 45ms on a db.r6g.2xlarge (8 vCPU, 64GB RDS). At 5M vectors, P99 degrades to ~180ms on the same instance. HNSW memory pressure is the bottleneck.
The actual argument for pgvector: You already run Postgres. Adding a vector column to an existing table means no additional infrastructure, no new operational surface, transactional consistency between your relational data and vector data, and zero additional cost (beyond instance size).
Cost: Whatever your Postgres instance costs. An RDS db.r6g.2xlarge at ~$0.48/hour → ~$345/month. At 1M vectors with metadata, this is dramatically cheaper than managed vector DB options.
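Putting the three cost figures from this guide side by side (a 30-day month; these are the self-hosted EC2 / RDS instance rates and Pinecone pod rates quoted above — list prices change, so treat the output as illustrative, not a quote):

```python
HOURS_PER_MONTH = 720  # 30-day month

def monthly(rate_per_hour, units=1):
    """Flat always-on monthly cost for a given hourly rate."""
    return rate_per_hour * units * HOURS_PER_MONTH

# Figures from this guide, at roughly the 10M-vector mark:
pinecone_cost = monthly(0.096, units=2) + 10 * 0.025  # 2 s1 pods + storage
weaviate_cost = monthly(0.50)                         # r6i.2xlarge, self-hosted
pgvector_cost = monthly(0.48)                         # db.r6g.2xlarge RDS

print(f"Pinecone ~${pinecone_cost:.0f}/mo, "
      f"Weaviate ~${weaviate_cost:.0f}/mo, "
      f"pgvector ~${pgvector_cost:.0f}/mo")
```

Note what this arithmetic hides: the self-hosted numbers buy you an instance, not a service — on-call, upgrades, and capacity planning are extra, and that operational cost is exactly the premium Pinecone charges for.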
Where it wins: Corpora under ~2M vectors, teams that want to keep the stack simple, and use cases requiring transactional consistency between vectors and relational data.
Where it loses: Query latency at 5M+ vectors on modest hardware becomes a liability. If recall@10 below 0.90 is unacceptable, you'll need careful HNSW tuning. Horizontal scaling requires Citus or read replicas, which adds complexity.
Decision Framework
| Scenario | Recommendation |
|---|---|
| < 2M vectors | pgvector (simplest, cheapest, sufficient performance) |
| 2M–20M vectors | Weaviate self-hosted or Pinecone, depending on operational tolerance |
| 20M+ vectors | Pinecone (managed) or Weaviate cloud with IVF for memory efficiency |
| Latency P99 < 20ms required | Pinecone or Weaviate HNSW. pgvector won't hold at this scale. |
| Hybrid search required | Weaviate. Native BM25+vector is best-in-class. |
| Transactional consistency required | pgvector, the only option that gives you ACID guarantees |
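The table above, as a first-pass helper function. One assumption of mine: the hard-requirement rows (transactional consistency, hybrid search, latency) take precedence over pure corpus size, which is my reading of the table rather than something it states explicitly:

```python
def pick_vector_db(corpus_size, p99_under_20ms=False, hybrid_search=False,
                   transactional=False):
    """First-pass recommendation mirroring the decision table.

    Hard requirements are checked before corpus size (an assumed
    precedence). Treat the result as a starting point, not a verdict.
    """
    if transactional:
        return "pgvector"                  # only option with ACID guarantees
    if hybrid_search:
        return "Weaviate"                  # native BM25 + vector
    if p99_under_20ms:
        return "Pinecone or Weaviate (HNSW)"
    if corpus_size < 2_000_000:
        return "pgvector"
    if corpus_size <= 20_000_000:
        return "Weaviate (self-hosted) or Pinecone"
    return "Pinecone or Weaviate cloud (IVF)"

print(pick_vector_db(1_000_000))                      # → pgvector
print(pick_vector_db(50_000_000))                     # → Pinecone or Weaviate cloud (IVF)
print(pick_vector_db(5_000_000, hybrid_search=True))  # → Weaviate
```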
Further Reading
- ANN Benchmarks: standardised recall/throughput comparison across libraries
- pgvector GitHub: HNSW parameter tuning documentation
- Pinecone documentation: choosing index type and pod configuration
Related Concepts
- Distributed read cache: A shared cache layer across multiple nodes used to absorb read traffic from the primary database and reduce latency on hot data paths. The difference between a 2ms and a 200ms read at scale.
- Vector database: A database purpose-built to store and query high-dimensional embedding vectors. The retrieval layer that makes semantic search and RAG pipelines possible at production scale.
- Embeddings: Numerical vector representations of text, images, or other data that encode semantic meaning. The translation layer that converts unstructured content into a form that can be compared mathematically.