AI Systems Glossary

Vector Database

A database purpose-built to store and query high-dimensional embedding vectors. The retrieval layer that makes semantic search and RAG pipelines possible at production scale.

What Problem It Solves

Standard databases filter rows by exact match or range predicates. A vector database answers a different question: "Which stored vectors are most similar to this query vector?" That operation, approximate nearest neighbour (ANN) search, is the retrieval primitive behind semantic search, recommendation systems, and RAG pipelines.

Without a vector database, similarity search over 10M embeddings requires brute-force cosine distance computation against every stored vector. At 768 dimensions, that is roughly 7.7 billion floating-point multiplications per query (about 15 billion FLOPs counting the additions). Impractical at any meaningful latency budget.
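The brute-force baseline is easy to state in code. A minimal NumPy sketch on a scaled-down hypothetical corpus (100k vectors rather than 10M, so it runs quickly):

```python
import numpy as np

# Hypothetical corpus: 100k embeddings at 768 dimensions.
rng = np.random.default_rng(0)
corpus = rng.standard_normal((100_000, 768)).astype(np.float32)
query = rng.standard_normal(768).astype(np.float32)

# Normalise once so cosine similarity reduces to a dot product.
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)
query /= np.linalg.norm(query)

# Brute force: one dot product against every stored vector,
# i.e. 100k * 768 multiplications per query.
scores = corpus @ query
top10 = np.argsort(-scores)[:10]  # indices of the 10 nearest neighbours
```

The cost scales linearly with corpus size, which is exactly what the ANN indexes below avoid.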

How It Works

Vector databases build an index over the embedding space that allows fast approximate retrieval at the cost of occasionally missing the true nearest neighbour.

HNSW (Hierarchical Navigable Small World): The dominant index type for latency-sensitive workloads. Builds a multi-layer graph where each node connects to its nearest neighbours at each layer. Search starts at the sparse top layer and descends through progressively denser layers, narrowing in on the query's neighbourhood. Query latency: 1–5ms at recall@10 of 0.95+. Memory-intensive: the graph structure adds 20–40% overhead on top of raw vector storage.
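The core mechanism, greedy traversal of a neighbour graph, can be sketched on a single layer. This is a toy illustration, not production HNSW (which builds the graph incrementally and stacks coarser layers on top for entry-point selection); all names and parameters here are made up for the sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
vectors = rng.standard_normal((2_000, 64)).astype(np.float32)
M = 8  # neighbours kept per node, analogous to HNSW's M parameter

# Build a single-layer neighbour graph: connect each node to its M exact
# nearest neighbours, using the squared-distance identity |a-b|^2 =
# |a|^2 + |b|^2 - 2ab to avoid materialising pairwise differences.
sq = (vectors ** 2).sum(axis=1)
d2 = sq[:, None] + sq[None, :] - 2.0 * vectors @ vectors.T
neighbours = np.argsort(d2, axis=1)[:, 1:M + 1]  # column 0 is the node itself

def greedy_search(query, entry=0):
    """Walk the graph, always moving to the neighbour closest to the query."""
    cur = entry
    cur_d = np.sum((vectors[cur] - query) ** 2)
    while True:
        cand = neighbours[cur]
        cand_d = np.sum((vectors[cand] - query) ** 2, axis=1)
        best = cand_d.argmin()
        if cand_d[best] >= cur_d:
            return cur  # local minimum: the approximate nearest neighbour
        cur, cur_d = cand[best], cand_d[best]
```

Real HNSW keeps a candidate beam (the ef parameter) rather than a single walker, and uses the upper layers to choose a good entry point; both changes raise recall at some latency cost.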

IVF-Flat (Inverted File Index): Clusters vectors into buckets. Search probes the nprobe nearest centroids and exhaustively checks all vectors within those buckets. Lower memory footprint than HNSW. Recall depends on the probe count. Better for large corpora where HNSW's memory cost becomes prohibitive (above 50M vectors on standard hardware).
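A stripped-down IVF sketch, with randomly sampled centroids standing in for the k-means step a real index would run (all sizes and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
vectors = rng.standard_normal((10_000, 64)).astype(np.float32)

# Random centroids stand in for k-means in this sketch.
n_lists, n_probe = 64, 4
centroids = vectors[rng.choice(len(vectors), n_lists, replace=False)]

# Assign every vector to its nearest centroid (the "inverted lists").
assign = np.linalg.norm(vectors[:, None] - centroids[None], axis=-1).argmin(1)

def ivf_search(query, k=10):
    # Probe only the n_probe closest lists instead of the whole corpus,
    # then brute-force within the surviving candidates.
    probe = np.linalg.norm(centroids - query, axis=1).argsort()[:n_probe]
    cand = np.flatnonzero(np.isin(assign, probe))
    d = np.linalg.norm(vectors[cand] - query, axis=1)
    return cand[d.argsort()[:k]]
```

With n_probe = 4 of 64 lists, each query touches roughly 1/16 of the corpus; raising n_probe trades speed back for recall, which is the knob the text describes.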

The index type is the primary determinant of the recall/latency tradeoff. HNSW for low latency, IVF for memory efficiency at scale.

The Recall vs Speed Tradeoff

ANN search trades exactness for speed. Recall@10 of 0.95 means 95% of queries return the true top-10 nearest neighbours. The 5% miss rate is acceptable in most production RAG pipelines because the LLM compensates for slight retrieval imprecision. If you need 0.99+ recall, you pay a significant latency premium: roughly 2–5x slower queries depending on the index type and parameters.
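Recall@k is straightforward to measure against a brute-force baseline. A hypothetical evaluation harness (the function names are made up for this sketch):

```python
import numpy as np

rng = np.random.default_rng(3)
corpus = rng.standard_normal((5_000, 64)).astype(np.float32)

def exact_top_k(query, k=10):
    # Ground truth: brute-force nearest neighbours.
    return set(np.linalg.norm(corpus - query, axis=1).argsort()[:k])

def recall_at_k(approx_search, queries, k=10):
    # Fraction of true top-k neighbours the approximate index returned,
    # averaged over the query set.
    hits = total = 0
    for q in queries:
        truth = exact_top_k(q, k)
        hits += len(truth & set(approx_search(q, k)))
        total += k
    return hits / total

queries = rng.standard_normal((20, 64)).astype(np.float32)
# Sanity check: exact search measured against itself has recall 1.0.
assert recall_at_k(exact_top_k, queries) == 1.0
```

Plugging an ANN index in as approx_search and sweeping its parameters (ef for HNSW, nprobe for IVF) traces out the recall/latency curve the text describes.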

Metadata Filtering

Production systems rarely do pure vector search. Queries typically combine ANN search with metadata filters: "find semantically similar documents that belong to tenant_id=abc and were published after 2024-01-01." The implementation matters significantly: pre-filtering (apply metadata filter before vector search) reduces the effective corpus, which hurts recall. Post-filtering (run ANN on the full corpus, then filter results) can waste compute returning results that are subsequently discarded. Most mature vector databases (Pinecone, Weaviate, Qdrant) implement filtered HNSW, which integrates metadata constraints directly into the graph traversal.
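The two naive strategies can be contrasted directly. A sketch using exact search for clarity (with a real ANN index, the pre-filtered subset additionally degrades the index's graph connectivity; the tenant field and overfetch factor are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(4)
vectors = rng.standard_normal((10_000, 64)).astype(np.float32)
tenant = rng.integers(0, 10, size=10_000)  # hypothetical metadata field

def pre_filter_search(query, t, k=10):
    # Pre-filtering: apply the metadata predicate first, then search the
    # surviving subset. The effective corpus shrinks to ~1/10 here.
    idx = np.flatnonzero(tenant == t)
    d = np.linalg.norm(vectors[idx] - query, axis=1)
    return idx[d.argsort()[:k]]

def post_filter_search(query, t, k=10, overfetch=4):
    # Post-filtering: search the full corpus, then discard non-matching
    # results. Overfetching (k * overfetch) hedges against the filter
    # discarding most of the top hits, at the cost of wasted compute,
    # and can still return fewer than k results for selective filters.
    d = np.linalg.norm(vectors - query, axis=1)
    cand = d.argsort()[:k * overfetch]
    return cand[tenant[cand] == t][:k]
```

Filtered HNSW avoids both failure modes by checking the predicate during graph traversal, so the search only ever expands nodes that satisfy the filter.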

When to Use pgvector Instead

pgvector (Postgres extension) handles vector search adequately up to ~2M vectors at 768 dimensions with HNSW indexing. P99 query latency stays under 50ms on a well-provisioned instance. The argument for pgvector: you already run Postgres, transactional consistency between vectors and relational data is free, and operational complexity stays flat. Beyond 5M vectors or with strict latency requirements under load, dedicated vector databases outperform pgvector consistently.

Interview Tip

The question that separates surface-level answers from strong ones: "what happens to recall when you filter by metadata?" Candidates who understand pre-filtering vs. post-filtering vs. filtered HNSW show they have thought about production query patterns, not just the happy-path similarity search. At L5+, interviewers also expect you to know the rough vector count threshold where pgvector becomes inadequate and a dedicated store is warranted.