Glossary: Data Systems

Blob Storage

Object storage for unstructured binary data: images, videos, documents, ML model weights. Designed for durability and throughput at scale, not low-latency random access.

Object Storage vs Block Storage vs File Storage

These three storage paradigms are frequently confused in system design discussions, and the confusion costs candidates points.

Block storage (AWS EBS, GCP Persistent Disk) presents raw storage volumes to a host, which the OS mounts like a local disk. It offers low latency (~0.1–1ms) and random reads and writes, but a volume attaches to a single instance at a time. Use it for databases and OS volumes.

File storage (NFS, AWS EFS) is a shared filesystem mounted by multiple instances simultaneously. Higher latency than block storage (~1–10ms). Use it when multiple services need access to the same directory tree.

Object storage (S3, GCS, Azure Blob) stores discrete objects (files) identified by a key. There is no true directory hierarchy: prefixes simulate one, but the namespace is flat under the hood. Overwrites were historically eventually consistent, though S3 has offered strong read-after-write consistency since December 2020. It is designed for durability (S3 targets 11 nines: 99.999999999%) and throughput, not latency: expect 20–200ms per GET request. Scales to exabytes.
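A quick bit of arithmetic shows what 11 nines of annual durability means in practice (object count is illustrative):

```python
# Expected objects lost per year = n_objects * (1 - annual durability).
durability = 0.99999999999          # 11 nines (S3's stated design target)
n_objects = 1_000_000_000           # one billion stored objects

expected_losses_per_year = n_objects * (1 - durability)
print(f"{expected_losses_per_year:.2f}")  # ~0.01 objects per year
```

In other words, with a billion objects you would expect to lose roughly one object per century, which is why backups to object storage are treated as safe by default.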

The rule of thumb: if you're storing user-uploaded content, ML artifacts, backups, or static assets, it's object storage. If you're running a database, it's block storage.

How Uploads Work at Scale

For objects under ~100MB, a standard PUT works fine. Above that, use multipart upload: the client splits the object into chunks (typically 5–100MB each), uploads them in parallel, and the storage service reassembles them. A 10GB video upload with 100MB chunks runs 100 parallel part uploads, cutting wall-clock time from ~10 minutes to ~1 minute on a good connection.
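The part-count and serial-transfer math from the 10GB example can be sketched as follows; the bandwidth figure is illustrative, chosen so the serial upload lands at roughly 10 minutes as in the text (real parallel speedup falls short of the ideal, which is why ~1 minute is hedged):

```python
import math

def multipart_parts(object_bytes: int, part_bytes: int) -> int:
    """Number of part uploads needed for a multipart upload."""
    return math.ceil(object_bytes / part_bytes)

GB, MB = 10**9, 10**6
parts = multipart_parts(10 * GB, 100 * MB)   # 100 parts for a 10GB object

bandwidth = 17 * MB                          # ~17 MB/s uploads 10GB in ~10 min
serial_minutes = 10 * GB / bandwidth / 60
print(parts, round(serial_minutes))
```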

Never stream large uploads through your application server. Generate a presigned URL on the backend (an authenticated S3/GCS URL valid for a short window) and let the client upload directly to the storage service. This eliminates your server from the data path entirely.
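The presigned-URL idea can be sketched with a plain HMAC: the backend signs the object key plus an expiry using a secret shared with the storage service, so the client uploads directly and the storage service verifies the grant without calling back to your server. This is a minimal illustration of the concept, not AWS Signature V4; the secret and URL shape are hypothetical.

```python
import hashlib
import hmac
import time

SECRET = b"shared-secret"  # hypothetical; real services use managed credentials

def presign(key: str, ttl_s: int = 900, now=None) -> str:
    """Return a short-lived upload URL for `key`, signed by the backend."""
    expires = int((now if now is not None else time.time()) + ttl_s)
    msg = f"PUT:{key}:{expires}".encode()
    sig = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return f"/upload/{key}?expires={expires}&sig={sig}"

def verify(key: str, expires: int, sig: str, now=None) -> bool:
    """Storage-side check: signature valid and link not expired."""
    if (now if now is not None else time.time()) > expires:
        return False
    msg = f"PUT:{key}:{expires}".encode()
    expected = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected)

url = presign("videos/cat.mp4", ttl_s=900, now=1_000_000)
print(url)
```

The short expiry window is the security control: a leaked URL grants a single operation on a single key, and only briefly.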

Storage Classes and Cost

Object storage tiers by access frequency:

Class              Access latency   Cost/GB/month   Use case
Standard (hot)     ~20ms            ~$0.023         Active user content
Infrequent Access  ~20ms            ~$0.0125        Backups, old media
Glacier (cold)     minutes–hours    ~$0.004         Compliance archives

In practice: set lifecycle rules to transition objects to cheaper tiers automatically after 30/90/365 days.
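The payoff of lifecycle tiering is easy to quantify with the per-GB prices from the table; the 50TB split below is a made-up example:

```python
# Rough US-region list prices per GB/month, from the table above.
PRICE_PER_GB = {"standard": 0.023, "infrequent": 0.0125, "glacier": 0.004}

def monthly_cost(gb_by_class: dict) -> float:
    """Total monthly storage cost for GB amounts keyed by storage class."""
    return sum(PRICE_PER_GB[cls] * gb for cls, gb in gb_by_class.items())

# 50 TB total: 10 TB hot, 15 TB infrequent, 25 TB archived by lifecycle rules.
tiered = monthly_cost({"standard": 10_000, "infrequent": 15_000, "glacier": 25_000})
all_hot = monthly_cost({"standard": 50_000})
print(round(tiered), round(all_hot))
```

For this split, tiering cuts the monthly bill by more than half versus keeping everything in the hot class.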

When Not to Use Object Storage for Reads

Object storage latency is unsuitable for high-frequency reads on small objects. Serving profile images directly from S3 at 10k RPS means 10k requests at 20–100ms each, which is fine for throughput, but the latency shows. The standard pattern is S3 as the origin with a CDN in front: the CDN caches the object at the edge, and S3 is only hit on cache misses.
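The CDN-in-front pattern can be sketched as an LRU edge cache where only misses fall through to the origin; the cache size, request trace, and origin stand-in below are made up to show the effect on origin traffic:

```python
from collections import OrderedDict

class EdgeCache:
    """Toy LRU edge cache in front of an object-storage origin."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.cache = OrderedDict()
        self.origin_fetches = 0

    def get(self, key: str, origin_fetch) -> bytes:
        if key in self.cache:
            self.cache.move_to_end(key)        # refresh LRU position
            return self.cache[key]
        self.origin_fetches += 1               # miss: hit the origin
        value = origin_fetch(key)
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)     # evict least-recently used
        return value

cdn = EdgeCache(capacity=100)
origin = lambda key: b"bytes-for-" + key.encode()  # stand-in for an S3 GET

# Skewed trace: 10 hot profile images, each requested 100 times.
for _ in range(100):
    for i in range(10):
        cdn.get(f"profile/{i}.jpg", origin)

print(cdn.origin_fetches)  # 10 origin fetches for 1,000 requests
```

With a skewed access pattern like this, the origin sees only the first fetch per object; every repeat is served from the edge at single-digit-millisecond latency.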

Interview Tip

The tell that separates good answers from surface-level ones is presigned URLs. Most candidates describe uploading files through the application server. When you suggest presigned URLs to bypass the server entirely, you're signalling that you understand the data path cost (bandwidth, memory, and processing), not just the happy path. Pair it with multipart upload for large objects and CDN for read serving, and you've covered the full production picture.