Glossary: System Design

Message Queue

Asynchronous communication buffer between services. Decouples producers from consumers and provides durability during traffic spikes.

The Problem Message Queues Solve

Synchronous service-to-service calls create a tight dependency: if the downstream service is slow, the caller blocks. If the downstream service is down, the caller fails. If both services are healthy but traffic spikes, the downstream service gets overwhelmed because it must process every request at the rate they arrive.

A message queue breaks all three of these couplings. The producer writes a message to the queue and continues immediately. The consumer reads from the queue at whatever rate it can sustain. The queue absorbs the difference. This pattern shifts the failure mode from cascading errors (synchronous) to queue depth growth (asynchronous), which is far easier to monitor and recover from.
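The decoupling above can be sketched in-process with Python's standard `queue` module. This is an illustration of the pattern, not a real broker: the producer enqueues a burst and returns immediately, while the consumer drains at its own pace.

```python
import queue
import threading

# In-process sketch of producer/consumer decoupling. The producer's
# put() returns immediately; the consumer drains the queue at its own
# rate; the queue absorbs the difference.
buf = queue.Queue()

def producer():
    for i in range(100):           # bursty producer: 100 messages at once
        buf.put({"id": i})         # returns immediately (unbounded queue)

def consumer(processed):
    while True:
        msg = buf.get()
        if msg is None:            # sentinel: producer is done
            break
        processed.append(msg["id"])  # slow processing would happen here

processed = []
t = threading.Thread(target=consumer, args=(processed,))
t.start()
producer()                         # completes without waiting for the consumer
buf.put(None)                      # signal shutdown
t.join()
```

A real broker adds durability and network transport on top of this shape, but the rate decoupling is the same.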

Core Mechanics

Producers write messages to a named queue or topic. The write is acknowledged when the broker has durably persisted the message (for durable queues), not when a consumer processes it. Producer throughput is independent of consumer throughput.

Consumers pull messages from the queue (or receive them via push, depending on the system). After processing, the consumer sends an acknowledgement. The broker deletes the message only after receiving this acknowledgement. If the consumer crashes before acknowledging, the broker re-delivers the message to another consumer.

This acknowledgement model is called at-least-once delivery: every message is delivered at minimum once, but may be delivered more than once if the consumer crashes after processing but before acknowledging. The implication is critical: consumers must be idempotent. Processing the same message twice must produce the same result as processing it once.
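A minimal sketch of an idempotent consumer under at-least-once delivery. The dedupe set and the `handle` function are illustrative names, not a real broker API; in production the dedupe store would be a database unique constraint or similar.

```python
# Idempotent consumer sketch: the broker may redeliver a message
# (crash after processing, before ack), so we record processed message
# IDs and skip duplicates.
seen_ids = set()           # in production: DB unique constraint or Redis SET
balance = {"total": 0}

def handle(message):
    msg_id, amount = message["id"], message["amount"]
    if msg_id in seen_ids:         # duplicate delivery: ack and do nothing
        return
    balance["total"] += amount     # the side effect runs effectively once
    seen_ids.add(msg_id)

# Message 1 is redelivered after a crash-before-ack:
for m in [{"id": 1, "amount": 50}, {"id": 2, "amount": 30},
          {"id": 1, "amount": 50}]:
    handle(m)
```

Processing the duplicate a second time changes nothing: the total is 80, not 130.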

Delivery Guarantees

Guarantee     | Meaning                            | Tradeoff
At-most-once  | Message delivered 0 or 1 times     | Possible data loss, no duplicates
At-least-once | Message delivered 1 or more times  | No data loss, possible duplicates
Exactly-once  | Message delivered exactly once     | Highest complexity, significant overhead

Exactly-once delivery is achievable but expensive. Kafka supports it via idempotent producers and transactional APIs, but the throughput cost is meaningful: roughly 20-40% lower than at-least-once. Most production systems design for at-least-once delivery with idempotent consumers rather than paying the exactly-once overhead.

Kafka vs SQS vs RabbitMQ

               | Kafka                                      | SQS                                              | RabbitMQ
Model          | Distributed log (pull)                     | Managed queue (poll)                             | Message broker (push/pull)
Retention      | Time-based (7-day default, configurable)   | Until consumed (up to 14 days)                   | Until consumed
Ordering       | Strict within a partition                  | Best-effort (FIFO queue option)                  | Strict within a queue
Throughput     | Millions of messages/second per cluster    | Nearly unlimited (standard); ~3,000/second with batching (FIFO) | ~50,000 messages/second
Consumer model | Consumer groups (each group gets all messages) | Competing consumers (each message to one)    | Competing consumers
Replay         | Yes, consumers can rewind to any offset    | No, consumed messages are gone                   | No
Use case       | Event streaming, audit log, multi-consumer fan-out | Task queues, AWS-native workloads        | Complex routing, RPC, per-message TTL

The architectural difference that matters most: Kafka is a log, not a queue. Every message is retained for a configurable period. Multiple independent consumer groups each get the full stream. This makes Kafka the right choice when multiple downstream systems need the same events (analytics pipeline, real-time processing, and audit logging all consuming the same event stream). SQS and RabbitMQ are point-to-point: once consumed, the message is gone.

Consumer Groups and Partitioning (Kafka)

Kafka splits each topic into N partitions. Each partition is an ordered, append-only log. Within a consumer group, each partition is assigned to exactly one consumer. Adding consumers up to the partition count scales read throughput linearly; adding more consumers than partitions does nothing (the extras sit idle).

Partition count is the primary lever for Kafka throughput tuning. A topic with 1 partition is processed serially by one consumer per group, regardless of how many consumers exist. Increasing to 32 partitions lets up to 32 consumers process in parallel, raising the throughput ceiling 32-fold (assuming the bottleneck is consumer processing, not the broker).
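The partition-to-consumer assignment can be sketched as a simple round-robin (real Kafka assignors are more sophisticated, e.g. range and sticky strategies; this is an illustration):

```python
# Sketch of partition assignment within one consumer group: each
# partition goes to exactly one consumer, and consumers beyond the
# partition count receive nothing.
def assign(partitions: int, consumers: list[str]) -> dict[str, list[int]]:
    assignment = {c: [] for c in consumers}
    for p in range(partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment

# 4 partitions, 6 consumers: two consumers sit idle.
a = assign(4, ["c1", "c2", "c3", "c4", "c5", "c6"])
```

Here `c5` and `c6` get no partitions, which is why adding consumers past the partition count buys nothing.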

Ordering is guaranteed only within a partition. If event ordering across a set of related messages is required (all events for a given user ID must be processed in order), use the user ID as the partition key. All messages with the same key land on the same partition and are delivered in order.
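Key-based routing reduces to hashing the key modulo the partition count. A sketch (Kafka's default partitioner uses murmur2; CRC32 here is a stand-in for illustration):

```python
import zlib

# Same key -> same partition -> per-key ordering preserved.
NUM_PARTITIONS = 32

def partition_for(key: str) -> int:
    # Deterministic hash of the partition key, modulo partition count.
    return zlib.crc32(key.encode()) % NUM_PARTITIONS

# Every event keyed by "user-42" lands on one partition, in order:
partitions_hit = {partition_for("user-42") for _ in range(1000)}
```

Note the flip side: repartitioning a topic changes the key-to-partition mapping, so per-key ordering guarantees only hold while the partition count is stable.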

Backpressure and Dead Letter Queues

Backpressure occurs when consumers are slower than producers: queue depth grows unboundedly. The right response is to monitor queue depth as a primary operational metric and scale consumers horizontally when it exceeds a threshold. SQS queue depth is directly available in CloudWatch. Kafka consumer lag (the number of unconsumed messages per consumer group) is the equivalent metric.
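Consumer lag is simple arithmetic over offsets: per partition, it is the log-end offset minus the group's committed offset. A sketch with illustrative numbers:

```python
# Consumer lag sketch: lag = latest (log-end) offset minus the
# consumer group's committed offset, per partition.
log_end_offsets = {0: 10_500, 1: 9_800, 2: 11_200}  # latest offset per partition
committed       = {0: 10_100, 1: 9_800, 2:  7_000}  # group's committed position

lag = {p: log_end_offsets[p] - committed[p] for p in log_end_offsets}
total_lag = sum(lag.values())   # 400 + 0 + 4,200 = 4,600 unconsumed messages
```

Partition 2 is 4,200 messages behind while partition 1 is fully caught up: lag that is concentrated on one partition points at a hot key or a stuck consumer, while uniform lag points at underprovisioned consumers.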

Dead letter queues (DLQs) hold messages that failed processing after N retry attempts. Without a DLQ, a poison message (one that always fails, due to a bug or malformed payload) blocks consumer processing indefinitely or gets retried forever. A DLQ sidelines the bad message so normal processing continues. Operators can inspect DLQ contents to diagnose and fix the underlying issue, then replay or discard the messages.
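The retry-then-sideline flow can be sketched as follows (all names are illustrative; real brokers like SQS implement this via a redrive policy and `maxReceiveCount`):

```python
# DLQ sketch: retry a message up to MAX_RETRIES, then sideline it so
# a poison message cannot block normal processing.
MAX_RETRIES = 3
dead_letter_queue = []

def process_with_dlq(message, handler):
    for attempt in range(MAX_RETRIES):
        try:
            handler(message)
            return True                 # processed: ack and delete
        except Exception:
            continue                    # in production: backoff between attempts
    dead_letter_queue.append(message)   # exhausted retries: sideline it
    return False

def handler(msg):
    if msg["payload"] is None:          # a malformed payload always fails
        raise ValueError("malformed payload")

ok  = process_with_dlq({"id": 1, "payload": "ok"}, handler)
bad = process_with_dlq({"id": 2, "payload": None}, handler)
```

The healthy message succeeds, the poison message lands in the DLQ after three attempts, and the consumer keeps moving.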

When to Use a Message Queue

Propose a message queue in system design whenever:

  • An operation is inherently async (email delivery, image resizing, push notifications)
  • A producer emits events that multiple independent consumers need (fan-out)
  • You need to absorb traffic spikes without dropping requests
  • You need durable handoff between two services with different availability SLAs

Do not use a message queue when:

  • The caller needs the result synchronously (use RPC or HTTP)
  • Latency must be under ~5ms end-to-end (queue serialization and network add overhead)
  • Duplicate processing is unacceptable and the operation cannot be made idempotent (at-least-once delivery will eventually redeliver)

Interview Tip

Most candidates mention a message queue when an async operation appears in the design, which is correct. The follow-up that separates strong answers: "what if the consumer crashes after processing but before acknowledging?" The expected answer is at-least-once delivery and idempotent consumer design. Concretely: use a database unique constraint or a Redis SET with the message ID to deduplicate. If you also mention dead letter queues and consumer lag as the primary operational metric, you have covered the production concerns that distinguish senior-level answers.
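The database-unique-constraint dedupe mentioned above looks like this in practice (sqlite is used here only to keep the sketch self-contained; the table and column names are illustrative):

```python
import sqlite3

# Dedupe via a unique constraint: inserting a message ID twice
# violates the primary key, so a redelivery is detected and the side
# effect is skipped.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE processed (message_id TEXT PRIMARY KEY)")

def process_once(message_id: str) -> bool:
    try:
        db.execute("INSERT INTO processed VALUES (?)", (message_id,))
        db.commit()
    except sqlite3.IntegrityError:
        return False   # duplicate delivery: ack, but skip the side effect
    # ... real side effect (send email, charge card) runs here ...
    return True

first  = process_once("msg-123")
second = process_once("msg-123")   # redelivery is a no-op
```

In a real system, the ID insert and the side effect should share one transaction so a crash between them cannot record a message as processed without its effect (or vice versa).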