Distributed Caching
A shared cache layer across multiple nodes used to absorb read traffic from the primary database and reduce latency on hot data paths. The difference between a 2ms and a 200ms read at scale.
What It Actually Does
Distributed caching intercepts read requests before they reach the database. On a cache hit, you return data in ~1ms instead of ~10–50ms (for a well-tuned database) or ~200ms+ (for a slow query or cold disk). At high read QPS, the database never sees most of the traffic. The cache doesn't make individual queries faster. It eliminates them.
Redis vs Memcached
Memcached is a pure key-value cache. It's fast, simple, and multi-threaded. If you need raw multi-threaded throughput on simple string values, Memcached scales cleanly.
Redis is a data structure server that also functions as a cache. It supports strings, hashes, sorted sets, lists, bitmaps, and streams. It's single-threaded per instance (Redis 6+ added I/O threading, but command execution is still serial). The richer data model makes Redis the default choice for most use cases: leaderboards (sorted sets), session storage (hashes), pub/sub, and rate limiting (INCR with TTL).
In practice: choose Memcached only if you've benchmarked and found Redis's threading model is the bottleneck. That's rare.
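The rate-limiting pattern mentioned above (INCR with a TTL) can be sketched without a live Redis server. Below is a minimal fixed-window limiter in plain Python that mirrors the INCR-then-EXPIRE logic; the class name and the dict-backed storage are illustrative stand-ins, not a Redis client:

```python
import time

class FixedWindowRateLimiter:
    """Mirrors the Redis INCR-with-TTL pattern in plain Python.

    In Redis this would be: INCR key, then EXPIRE key <window> when the
    counter is first created.
    """

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self._counters = {}  # key -> (window_start, count)

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        start, count = self._counters.get(key, (now, 0))
        if now - start >= self.window:   # "TTL" expired: reset the window
            start, count = now, 0
        count += 1
        self._counters[key] = (start, count)
        return count <= self.limit

limiter = FixedWindowRateLimiter(limit=3, window_seconds=60)
results = [limiter.allow("user:42", now=t) for t in (0, 1, 2, 3)]
# first three requests pass, the fourth is rejected in the same window
```

Against real Redis, you'd run the increment and the expiry atomically on the server (e.g. in a Lua script or pipeline) so concurrent clients can't race between the two commands.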
Eviction Policies
LRU (Least Recently Used): Evicts the key that hasn't been accessed for the longest time. The right default for most workloads.
LFU (Least Frequently Used): Evicts the key accessed fewest times. Better for access patterns where some keys are consistently hot over long periods and LRU would evict them during a one-time scan.
TTL (Time to Live): Keys expire after a set duration regardless of access. Use this when cache freshness is a correctness requirement, not just a preference.
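LRU is simple enough to sketch in a few lines. Here is a toy in-process version built on Python's OrderedDict; note that Redis's own LRU is approximate and sample-based, so treat this as the textbook algorithm rather than what Redis literally does:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU eviction: the least recently *accessed* key is dropped first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()  # insertion order doubles as recency order

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the LRU entry

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")      # touch "a" so it is most recently used
cache.put("c", 3)   # evicts "b", the least recently used key
```

This is also the shape of the scan problem mentioned under LFU: a one-time pass over many cold keys touches each once, and under LRU that is enough to push the consistently hot keys out.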
Cache Invalidation: The Hard Problem
There are three common strategies, and each has failure modes:
Cache-aside (lazy loading): The application checks the cache first; on a miss, reads from the DB and populates the cache. Simple, but risks a thundering herd on cold start. If your cache restarts, every concurrent request is a miss and hammers the database simultaneously.
Write-through: On every DB write, update the cache immediately. Keeps the cache warm but adds ~1ms write latency and risks write amplification under high write throughput.
Write-behind (write-back): Write to cache first, flush to DB asynchronously. Lowest write latency, but you risk data loss if the cache node fails before flushing. Not appropriate when durability matters.
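The cache-aside read path is worth seeing in code. A minimal sketch follows; `get_user`, `fetch_user`, and the stand-in `DictCache`/`CountingDB` classes are hypothetical names for illustration, not a real client library:

```python
def get_user(user_id, cache, db, ttl_seconds=300):
    """Cache-aside: check the cache, fall back to the DB on a miss,
    then populate the cache so later readers hit."""
    key = f"user:{user_id}"
    value = cache.get(key)
    if value is not None:
        return value                    # hit: fast path, DB never sees it
    value = db.fetch_user(user_id)      # miss: one DB round trip
    cache.set(key, value, ttl_seconds)  # lazy population
    return value

# In-memory stand-ins so the flow is observable without real infrastructure.
class DictCache:
    def __init__(self):
        self._d = {}
    def get(self, k):
        return self._d.get(k)
    def set(self, k, v, ttl_seconds):
        self._d[k] = v                  # TTL ignored in this toy cache

class CountingDB:
    def __init__(self):
        self.queries = 0
    def fetch_user(self, uid):
        self.queries += 1
        return {"id": uid, "name": "Ada"}

cache, db = DictCache(), CountingDB()
get_user(42, cache, db)
get_user(42, cache, db)   # second call is a cache hit
# db.queries == 1
```

The thundering-herd risk is visible here: nothing stops a thousand concurrent callers from all missing on the same key and each running `db.fetch_user` before any of them populates the cache.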
Consistent Hashing for Cache Key Distribution
When you have multiple cache nodes, you need a deterministic way to route "key X always goes to node Y." Consistent hashing solves this with minimal key remapping when nodes are added or removed. Without it, scaling the cache cluster causes a full cache miss storm as keys remap to new nodes.
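A hash ring with virtual nodes can be sketched briefly. This is an illustrative implementation, not a production library; MD5 is used here only as a stable, well-distributed hash, and the replica count of 100 is an arbitrary choice:

```python
import bisect
import hashlib

class HashRing:
    """Consistent-hash ring sketch with virtual nodes (replicas)."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self._ring = []  # sorted list of (hash, node) points on the ring
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.replicas):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def node_for(self, key):
        h = self._hash(key)
        idx = bisect.bisect(self._ring, (h, ""))  # first point clockwise of h
        if idx == len(self._ring):
            idx = 0                               # wrap around the ring
        return self._ring[idx][1]
```

The property that matters: removing a node only remaps the keys that lived on it; every other key keeps its assignment, which is exactly what prevents the cache-miss storm that naive `hash(key) % num_nodes` routing causes.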
Interview Tip
The question that separates good answers from great ones: "what happens when your cache restarts?" If you can describe the thundering herd problem on cold start and name a mitigation (request coalescing, probabilistic early expiration, or a warm-up script that pre-populates from the DB), you've demonstrated production thinking that most candidates don't show.
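One of those mitigations, request coalescing, fits in a short sketch: concurrent misses for the same key share a single backend load instead of each hitting the database. The `Coalescer` name is hypothetical, and a real version would also evict or TTL its memoized results:

```python
import threading
import time

class Coalescer:
    """Request coalescing: concurrent misses for one key share one DB load."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # key -> Event guarding an in-progress load
        self._results = {}   # key -> loaded value (never evicted in this toy)

    def get(self, key, loader):
        with self._lock:
            if key in self._results:
                return self._results[key]
            event = self._inflight.get(key)
            leader = event is None
            if leader:
                event = threading.Event()
                self._inflight[key] = event
        if leader:
            value = loader(key)      # only the first caller hits the database
            with self._lock:
                self._results[key] = value
                del self._inflight[key]
            event.set()
            return value
        event.wait()                 # followers block until the leader finishes
        with self._lock:
            return self._results[key]

calls = []
def slow_load(key):
    calls.append(key)                # count real backend hits
    time.sleep(0.05)                 # simulate a slow DB query
    return f"value:{key}"

c = Coalescer()
threads = [threading.Thread(target=c.get, args=("hot", slow_load))
           for _ in range(10)]
for t in threads: t.start()
for t in threads: t.join()
# ten concurrent misses, but slow_load ran exactly once
```

Go's `singleflight` package and Guava's `LoadingCache` implement the same idea; naming either in an interview lands the point quickly.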
Related Concepts
Consistent Hashing: A distributed hashing scheme that minimizes key remapping when nodes are added or removed.
Database Sharding: Horizontal partitioning of a database across multiple machines to distribute load beyond a single server's capacity.
Load Balancing: Distributes incoming traffic across multiple servers to prevent any single node from becoming a bottleneck. The mechanism that makes horizontal scaling functional in practice.