System Design Interview – An Insider's Guide
Alex Xu's book is a practical playbook for system design interviews. Its real value is a repeatable framework for decomposing large, open-ended problems, backed by concrete examples of real-world systems.
---
The Interview Framework
Four steps to structure any system design interview:
Step 1 — Understand the problem (3–10 min)
Ask before designing. Never assume.
Step 2 — High-level design (10–15 min)
Sketch the major components and data flow. Get buy-in before diving deep.
Client → Load Balancer → API Servers → Cache → Database
→ Message Queue → Workers
→ CDN (for static assets)
Step 3 — Deep dive (10–25 min)
Pick the 2–3 components that matter most for this problem and go deep on those.
Step 4 — Wrap up (3–5 min)
---
Back-of-Envelope Estimation
Quick math to validate scale assumptions before designing.
Common Numbers to Memorize
| Resource | Latency |
|---|---|
| L1 cache reference | 0.5 ns |
| Main memory reference | 100 ns |
| SSD random read | 150 µs |
| HDD seek | 10 ms |
| Round-trip within same datacenter | 0.5 ms |
| Round-trip US to Europe | 150 ms |
| Quantity | Approximate value |
|---|---|
| 1 million req/day | ~12 req/sec |
| 1 billion req/day | ~11,500 req/sec |
| 1 byte | 1 B |
| 1 KB | 10³ B |
| 1 MB | 10⁶ B |
| 1 GB | 10⁹ B |
| 1 TB | 10¹² B |
Example: Twitter-like System
300M MAU, 50% daily active → 150M DAU
Each user tweets 2x/day → 300M tweets/day
Tweet size: 140 chars ≈ 280 bytes (assuming 2 bytes per character)
Write QPS = 300M / 86400 ≈ 3500 tweets/sec
Peak QPS = 3500 × 2 = 7000 tweets/sec (assuming peak traffic is 2× the average)
Storage/day = 300M × 280B = 84 GB/day
Storage/5yr = 84 × 365 × 5 ≈ 153 TB
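The arithmetic above can be reproduced with a small script. This is a minimal sketch: the function name `estimate` and its parameters are made up for illustration, and the 2× peak factor and 5-year horizon are the same assumptions used in the example.

```python
SECONDS_PER_DAY = 86_400


def estimate(mau, daily_active_ratio, tweets_per_user, bytes_per_tweet,
             peak_factor=2, years=5):
    """Back-of-envelope numbers for a Twitter-like write path."""
    dau = mau * daily_active_ratio
    tweets_per_day = dau * tweets_per_user
    write_qps = tweets_per_day / SECONDS_PER_DAY
    storage_per_day_gb = tweets_per_day * bytes_per_tweet / 1e9
    return {
        "dau": dau,
        "tweets_per_day": tweets_per_day,
        "write_qps": write_qps,
        "peak_qps": write_qps * peak_factor,  # assumed peak multiplier
        "storage_per_day_gb": storage_per_day_gb,
        "storage_total_tb": storage_per_day_gb * 365 * years / 1e3,
    }


est = estimate(mau=300_000_000, daily_active_ratio=0.5,
               tweets_per_user=2, bytes_per_tweet=280)
for name, value in est.items():
    print(f"{name}: {value:,.1f}")
```

The unrounded write QPS is about 3,472, which the notes round to 3,500 before doubling it for the peak figure.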
---
Building Blocks
Load Balancer
Distributes traffic across multiple servers. Prevents any single server from becoming a bottleneck.
Client
↓
Load Balancer (round-robin / least-connections / IP hash)
↓ ↓ ↓
Server 1 Server 2 Server 3
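The three selection strategies named in the diagram can be sketched in a few lines. This is an illustrative sketch only: the class and function names are hypothetical, and a real load balancer also handles health checks, weights, and timeouts.

```python
import hashlib
import itertools


class RoundRobinBalancer:
    """Hands out servers in a fixed rotation."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LeastConnectionsBalancer:
    """Picks the server with the fewest in-flight requests."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def acquire(self):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        self.active[server] -= 1


def pick_by_ip_hash(servers, client_ip):
    """Maps the same client IP to the same server while the pool is unchanged."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]
```

IP hashing gives sticky routing without shared state, but changing the number of servers remaps most clients.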
CDN (Content Delivery Network)
Caches static content (images, JS, CSS, video) at edge nodes geographically close to users.
User (Jakarta) → CDN edge (Singapore) → Cache hit → served locally
                                      → Cache miss → Origin server → CDN → User
Use CDN for: static assets, video content, anything that doesn't change per user.
Caching
Reduces database load. Always ask: what's the cache invalidation strategy?
| Strategy | How it works | Best for |
|---|---|---|
| Cache-aside | App checks cache first, loads from DB on miss, writes to cache | Read-heavy workloads |
| Write-through | Write to cache and DB simultaneously | Data that must be consistent |
| Write-back | Write to cache only, flush to DB asynchronously | Write-heavy, can tolerate brief data loss |
| Read-through | Cache sits in front of DB, handles misses automatically | Simpler app code |
Cache-aside:
Read:  Cache hit → return
       Cache miss → DB → store in cache → return
Write: Update DB → invalidate cache entry
Eviction policies: LRU (Least Recently Used), LFU (Least Frequently Used), TTL-based.
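A minimal sketch of cache-aside with LRU eviction, assuming a plain dict stands in for the database and an in-process `OrderedDict` stands in for the cache. The class name `CacheAside` and the `db_reads` counter are hypothetical and exist only to make the behavior visible.

```python
from collections import OrderedDict


class CacheAside:
    def __init__(self, db, capacity=2):
        self.db = db                  # dict used as a stand-in database
        self.capacity = capacity
        self.cache = OrderedDict()    # oldest entry first, newest last
        self.db_reads = 0

    def read(self, key):
        if key in self.cache:         # cache hit
            self.cache.move_to_end(key)
            return self.cache[key]
        self.db_reads += 1            # cache miss: load from the database
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)  # evict least recently used
        return value

    def write(self, key, value):
        self.db[key] = value          # update the source of truth first
        self.cache.pop(key, None)     # then invalidate the cached copy
```

Invalidating on write, rather than updating the cache, means the next read repopulates the entry from the database.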
Database Replication
Primary (writes) → Replica 1 (reads)
                 → Replica 2 (reads)
                 → Replica 3 (reads)
Database Sharding
Split data horizontally across multiple database instances.
User IDs 0–9M → Shard 1
User IDs 10–19M → Shard 2
User IDs 20–29M → Shard 3
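Range-based routing like this can be sketched with a sorted list of boundaries. The sketch assumes each range is half-open (shard 1 holds IDs from 0 up to but not including 10M, and so on); the shard names and the function `shard_for` are hypothetical.

```python
import bisect

# Exclusive upper bound of each shard's user-ID range (assumed boundaries).
SHARD_UPPER_BOUNDS = [10_000_000, 20_000_000, 30_000_000]
SHARD_NAMES = ["shard-1", "shard-2", "shard-3"]


def shard_for(user_id):
    """Returns the shard that owns user_id under range-based sharding."""
    if user_id < 0:
        raise ValueError("user_id must be non-negative")
    index = bisect.bisect_right(SHARD_UPPER_BOUNDS, user_id)
    if index == len(SHARD_NAMES):
        raise KeyError(f"no shard covers user_id {user_id}")
    return SHARD_NAMES[index]
```

Adding a fourth range only requires appending a boundary and a name, but existing ranges stay uneven if some users are much more active than others.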
Challenges:
---
Consistent Hashing
Limits how many keys must move when nodes are added or removed, avoiding the mass remapping that plain hash(key) % N causes during resharding.
Virtual ring: 0 ────────────────────── 360°
Nodes placed at hash(nodeId) positions on the ring
Keys assigned to the next clockwise node
Adding a node: only keys between previous node and new node move
Removing a node: only that node's keys move to its successor
Used by: Cassandra, DynamoDB, Amazon ElastiCache
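The ring described above can be sketched with a sorted list and binary search. This is a minimal sketch under assumptions: SHA-256 is used as the hash, each node is placed at 100 virtual positions to even out the load, and the names `HashRing`, `add`, `remove`, and `lookup` are hypothetical.

```python
import bisect
import hashlib


def ring_hash(key):
    """Maps a string to a position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []               # sorted list of (position, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (ring_hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def lookup(self, key):
        if not self._ring:
            raise LookupError("ring is empty")
        # First position clockwise from the key, wrapping around at the end.
        index = bisect.bisect_left(self._ring, (ring_hash(key), ""))
        return self._ring[index % len(self._ring)][1]
```

When a node is added, every key that changes owner moves to the new node, and removing that node sends those keys back to their previous owners.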
---
Rate Limiter
Protects services from being overwhelmed by too many requests.
Algorithms
Token Bucket — a bucket holds tokens, each request consumes one, tokens refill at a fixed rate.
Bucket capacity: 10 tokens
Refill rate: 2 tokens/sec
Burst of 10 requests → consumed immediately
Next request → must wait for refill
Allows bursting. Used by: AWS, Stripe.
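A minimal token bucket sketch using the numbers above (capacity 10, refill 2 tokens/sec). The caller passes the current time explicitly so the behavior is deterministic; the class name `TokenBucket` and the method `allow` are hypothetical.

```python
class TokenBucket:
    def __init__(self, capacity, refill_rate, now=0.0):
        self.capacity = capacity
        self.refill_rate = refill_rate   # tokens added per second
        self.tokens = float(capacity)    # start with a full bucket
        self.last_refill = now

    def allow(self, now):
        """Returns True and consumes a token if one is available at time now."""
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(capacity=10, refill_rate=2)
burst = [bucket.allow(now=0.0) for _ in range(11)]
print(burst.count(True), "allowed,", burst.count(False), "rejected")
```

In production the time would come from a monotonic clock, and the token count would live in shared storage if several servers enforce the same limit.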
Leaky Bucket — requests enter a queue, processed at a fixed output rate. Queue drops excess.
Incoming: burst of 100 req
Queue: holds 10
Output: 2 req/sec steady
→ 90 requests dropped
Smooths output. No bursting.
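The leaky bucket numbers above can be reproduced with a bounded queue. This is a sketch only; the names `LeakyBucket`, `offer`, and `leak` are hypothetical, and the drain step is driven manually here rather than by a timer.

```python
from collections import deque


class LeakyBucket:
    def __init__(self, queue_size, leak_rate):
        self.queue = deque()
        self.queue_size = queue_size
        self.leak_rate = leak_rate       # requests processed per second

    def offer(self, request):
        """Queues the request, or returns False if the queue is full."""
        if len(self.queue) >= self.queue_size:
            return False
        self.queue.append(request)
        return True

    def leak(self, seconds):
        """Processes queued requests at the fixed output rate."""
        count = min(len(self.queue), int(seconds * self.leak_rate))
        return [self.queue.popleft() for _ in range(count)]


bucket = LeakyBucket(queue_size=10, leak_rate=2)
accepted = sum(bucket.offer(i) for i in range(100))
print(accepted, "queued,", 100 - accepted, "dropped")
```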
Fixed Window Counter — count requests per fixed time window (e.g., per minute).
Problem: a burst that straddles a window boundary can let through up to twice the limit, because each half of the burst is counted in a different window.
Sliding Window Log — store timestamp of each request; count requests in the last N seconds.
Most accurate, but memory-heavy.
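A minimal sliding window log sketch. It assumes only accepted requests are logged (some variants also log rejected ones), and the names `SlidingWindowLog` and `allow` are hypothetical.

```python
from collections import deque


class SlidingWindowLog:
    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.timestamps = deque()        # one entry per accepted request

    def allow(self, now):
        # Drop timestamps that have fallen out of the window.
        cutoff = now - self.window
        while self.timestamps and self.timestamps[0] <= cutoff:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False
```

Memory grows with the limit, since every accepted request in the window keeps its own timestamp.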
Sliding Window Counter — hybrid: fixed window counter + weighted count from previous window.
Rate = current_window_count + prev_window_count × overlap_ratio
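The formula above can be sketched as follows, where `overlap_ratio` is the fraction of the previous window still covered by the sliding window. The class and method names are hypothetical, and time is passed in explicitly and assumed not to go backwards.

```python
class SlidingWindowCounter:
    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.window_index = 0
        self.current_count = 0
        self.prev_count = 0

    def _roll(self, now):
        index = int(now // self.window)
        if index == self.window_index + 1:
            self.prev_count, self.current_count = self.current_count, 0
        elif index > self.window_index + 1:
            self.prev_count, self.current_count = 0, 0
        self.window_index = index

    def estimated_rate(self, now):
        self._roll(now)
        overlap_ratio = 1.0 - (now % self.window) / self.window
        return self.current_count + self.prev_count * overlap_ratio

    def allow(self, now):
        if self.estimated_rate(now) + 1 <= self.limit:
            self.current_count += 1
            return True
        return False
```

With a limit of 10 per minute, 8 requests in the previous minute, and 15 seconds into the current one, the estimate starts at 8 × 0.75 = 6, so 4 more requests fit.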
Where to Store Counters
Redis with atomic increment (INCR) plus a TTL on the counter key is a common choice for distributed rate limiting, because every rate-limiter instance reads and updates the same counters.
---
Unique ID Generator
Requirements
Approaches
UUID — 128-bit random identifier.
550e8400-e29b-41d4-a716-446655440000
Simple, no coordination. Not sortable, not compact.
Database auto-increment — does not work across multiple databases.
Snowflake ID (Twitter) — 64-bit integer composed of:
| 1 bit (sign) | 41 bits (timestamp ms) | 10 bits (machine ID) | 12 bits (sequence) |
Used by: Twitter, Discord. Instagram uses a similar time-based scheme with a different bit layout.
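The bit layout above can be packed and unpacked with shifts and masks. This is a sketch under assumptions: `EPOCH_MS` is a hypothetical custom epoch, the 10 machine bits are treated as a single ID, and the clock is injectable so the example is deterministic. The names `SnowflakeGenerator` and `decode` are illustrative.

```python
import threading
import time

EPOCH_MS = 1_700_000_000_000  # hypothetical custom epoch, in milliseconds


class SnowflakeGenerator:
    def __init__(self, machine_id, epoch_ms=EPOCH_MS, clock=None):
        if not 0 <= machine_id < 1024:
            raise ValueError("machine_id must fit in 10 bits")
        self.machine_id = machine_id
        self.epoch_ms = epoch_ms
        self.clock = clock or (lambda: int(time.time() * 1000))
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = self.clock()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF
                if self.sequence == 0:
                    # 4096 IDs used in this millisecond: wait for the next one.
                    while now <= self.last_ms:
                        now = self.clock()
            else:
                self.sequence = 0
            self.last_ms = now
            return ((now - self.epoch_ms) << 22) | (self.machine_id << 12) | self.sequence


def decode(snowflake_id):
    """Returns (ms since epoch, machine ID, sequence)."""
    return (snowflake_id >> 22, (snowflake_id >> 12) & 0x3FF, snowflake_id & 0xFFF)
```

Because the timestamp occupies the high bits, IDs generated later sort after earlier ones, and 41 bits of milliseconds last roughly 69 years from the chosen epoch.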
---
URL Shortener
Core flow:
POST /shorten { url: "https://long-url.com/..." }
→ generate short code
→ store { short_code → long_url } in DB
→ return "https://short.ly/abc123"
GET /abc123
→ lookup short_code in cache
→ 301 (permanent) or 302 (temporary) redirect to long_url
Short code generation:
Base-62 encoding of auto-incremented ID
Characters: [0-9 A-Z a-z] = 62 symbols
ID = 11157310 → base62 → "koWc" (using the symbol order above: 0-9, A-Z, a-z)
7 chars → 62^7 = 3.5 trillion URLs
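A minimal base-62 encoder and decoder, assuming the symbol order listed above (digits, then uppercase, then lowercase). The function names `encode` and `decode` are hypothetical.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
BASE = len(ALPHABET)  # 62


def encode(number):
    """Converts a non-negative integer ID to a base-62 short code."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, BASE)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def decode(code):
    """Converts a base-62 short code back to its integer ID."""
    number = 0
    for char in code:
        number = number * BASE + ALPHABET.index(char)
    return number
```

Since the code is derived from a unique auto-incremented ID, no collision check is needed, at the cost of short codes being guessable in sequence.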
Schema:
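CREATE TABLE urls (
    id         BIGINT PRIMARY KEY AUTO_INCREMENT,  -- source value for the base-62 short code
    short_code VARCHAR(8) UNIQUE NOT NULL,         -- same name as short_code in the core flow
    long_url   TEXT NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);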
---
News Feed (Timeline)
Two core models for delivering posts to followers.
Fanout on Write (Push)
When a user posts, immediately push to all followers' feed caches.
User A posts → fetch A's followers → write to each follower's feed cache
Fanout on Read (Pull)
When a user opens their feed, fetch recent posts from everyone they follow.
User B opens feed → fetch B's followees → merge their recent posts → return
Hybrid (Used by Twitter, Instagram)
Feed = precomputed_feed_cache + real-time_celebrity_posts (fetched on read)
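A sketch of the hybrid model with in-memory dictionaries standing in for the follower graph, post store, and feed cache. The `CELEBRITY_THRESHOLD` value and all names are hypothetical, and the sketch ignores the case where an author crosses the threshold after earlier posts were already pushed.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 3  # hypothetical follower count cutoff


class FeedService:
    def __init__(self):
        self.followers = defaultdict(set)    # author -> followers
        self.following = defaultdict(set)    # user -> authors they follow
        self.posts = defaultdict(list)       # author -> [(timestamp, text)]
        self.feed_cache = defaultdict(list)  # user -> [(timestamp, author, text)]

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author):
        return len(self.followers[author]) >= CELEBRITY_THRESHOLD

    def post(self, author, timestamp, text):
        self.posts[author].append((timestamp, text))
        if not self.is_celebrity(author):
            # Fanout on write: push into every follower's cached feed.
            for follower in self.followers[author]:
                self.feed_cache[follower].append((timestamp, author, text))

    def feed(self, user, limit=10):
        items = list(self.feed_cache[user])
        # Fanout on read: pull celebrity posts at request time.
        for author in self.following[user]:
            if self.is_celebrity(author):
                items.extend((ts, author, text) for ts, text in self.posts[author])
        return sorted(items, reverse=True)[:limit]
```

The split keeps writes cheap for accounts with huge follower counts while ordinary posts are still served from a precomputed cache.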
---
Chat System
Transport
Use WebSocket for full-duplex, low-latency messaging.
Client ←→ WebSocket connection ←→ Chat Server
HTTP long-polling is an alternative, but each client must repeatedly open and hold a connection while waiting for messages, which is less efficient.
Message Storage
Message ID requirements:
Use Snowflake IDs or local sequence numbers per chat room.
Presence (Online Status)
user:{id}:status = online in Redis with a TTL
Group Chat Delivery
Message sent to group → message stored → one copy per group, not per member
Member reads → fetch messages after their last_read_message_id
Avoid fanout to every member on send — instead track read position per member.
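The read-position approach can be sketched with one shared message list per group and one integer per member. This is a minimal in-memory sketch; `GroupChat`, `send`, and `fetch_unread` are hypothetical names, and a simple counter stands in for the message ID generator.

```python
class GroupChat:
    def __init__(self):
        self.messages = []     # one copy per group: (message_id, sender, text)
        self.last_read = {}    # member -> last_read_message_id
        self._next_id = 1      # stand-in for a per-room sequence number

    def send(self, sender, text):
        message_id = self._next_id
        self._next_id += 1
        self.messages.append((message_id, sender, text))
        return message_id

    def fetch_unread(self, member):
        """Returns messages after the member's read position and advances it."""
        last = self.last_read.get(member, 0)
        unread = [m for m in self.messages if m[0] > last]
        if unread:
            self.last_read[member] = unread[-1][0]
        return unread
```

Sending costs one write regardless of group size; the per-member cost is paid only when a member actually reads.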
---
Search Autocomplete
Suggest completions as the user types.
Trie Data Structure
Input: "be"
root
├── b
│ └── e (count: 40)
│ ├── a (count: 35) → "bea..."
│ └── e (count: 20) → "bee..."
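A minimal trie sketch for prefix suggestions. It assumes each node's count is the number of times that exact query was searched, and the sample queries ("bear", "bee") are made up to match the "bea..." and "bee..." branches above. The class and method names are hypothetical.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.count = 0    # searches for the exact query ending at this node


class AutocompleteTrie:
    def __init__(self):
        self.root = TrieNode()

    def add(self, query, count=1):
        node = self.root
        for char in query:
            node = node.children.setdefault(char, TrieNode())
        node.count += count

    def suggest(self, prefix, k=5):
        """Returns up to k queries starting with prefix, most searched first."""
        node = self.root
        for char in prefix:
            node = node.children.get(char)
            if node is None:
                return []
        results = []
        stack = [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.count:
                results.append((current.count, text))
            for char, child in current.children.items():
                stack.append((child, text + char))
        results.sort(key=lambda item: (-item[0], item[1]))
        return [text for _, text in results[:k]]
```

Walking the whole subtree on every keystroke is slow for popular prefixes, which is why production designs often cache the top-k results at each node.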
Scaling
At small scale the trie fits in memory on a single server. At larger scale, shard it by prefix range:
Shard 1: a–f prefixes
Shard 2: g–m prefixes
Shard 3: n–z prefixes
---
Video Streaming (YouTube-like)
Upload Flow
Client → Chunked upload → Raw storage (S3)
→ Transcoding workers (multiple resolutions: 360p, 720p, 1080p, 4K)
→ Transcoded storage (S3)
→ CDN distribution
→ Metadata DB (title, description, author)
Streaming Flow
Client → CDN edge node → cache hit → video chunks served
                       → cache miss → origin S3 → CDN → client
Protocol: DASH or HLS (adaptive bitrate — adjusts quality based on bandwidth)
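The adaptive bitrate idea can be illustrated with a simple selection rule: pick the highest rendition whose bitrate fits within the measured bandwidth. This is a sketch only; the bitrate figures, the safety margin, and the function name `pick_rendition` are assumptions, and real DASH/HLS players use more elaborate heuristics.

```python
# Hypothetical bitrates in kbps for each transcoded resolution.
RENDITIONS = [("360p", 1_000), ("720p", 3_000), ("1080p", 6_000), ("4K", 20_000)]


def pick_rendition(measured_kbps, safety_margin=0.8):
    """Returns the best rendition that fits within a fraction of the bandwidth."""
    budget = measured_kbps * safety_margin
    best = RENDITIONS[0][0]              # always fall back to the lowest quality
    for name, bitrate_kbps in RENDITIONS:
        if bitrate_kbps <= budget:
            best = name
    return best


for bandwidth in (500, 5_000, 30_000):
    print(bandwidth, "kbps ->", pick_rendition(bandwidth))
```

The player re-evaluates this choice for each chunk, which is why video is stored as short segments at every resolution.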
Why chunked upload?
---
Notification System
Deliver push notifications, SMS, and email across multiple channels.
Architecture
Event sources (API servers, schedulers)
↓
Notification Service
↓
Message Queue (decouple and buffer)
↓
Workers per channel:
- iOS/Android → APNs / FCM
- SMS → Twilio, SNS
- Email → SendGrid, SES
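The queue-per-channel layout can be sketched with in-memory queues and pluggable sender functions. This is an illustrative sketch: the sender callables are hypothetical stand-ins for the APNs, FCM, SMS, and email providers, and no real provider API is called.

```python
from collections import deque


class NotificationService:
    def __init__(self, senders):
        self.senders = senders                       # channel -> callable(recipient, body)
        self.queues = {channel: deque() for channel in senders}

    def enqueue(self, channel, recipient, body):
        """Buffers a notification on its channel's queue."""
        if channel not in self.queues:
            raise ValueError(f"unknown channel: {channel}")
        self.queues[channel].append((recipient, body))

    def run_workers(self):
        """Drains every queue, handing each message to its channel's sender."""
        delivered = 0
        for channel, pending in self.queues.items():
            while pending:
                recipient, body = pending.popleft()
                self.senders[channel](recipient, body)
                delivered += 1
        return delivered
```

Separate queues keep a slow or failing channel, such as an SMS provider outage, from blocking delivery on the others.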
Key Design Decisions
---
Design Trade-off Cheat Sheet
| Decision | Option A | Option B | When to pick A (vs B) |
|---|---|---|---|
| Consistency vs availability | Strong (CP) | Eventually consistent (AP) | Finance, inventory |
| SQL vs NoSQL | Relational | Document / Wide-column | Structured + joins vs scale + flexibility |
| Cache invalidation | TTL | Event-driven invalidation | Acceptable staleness vs strict freshness |
| Fanout | On write (push) | On read (pull) | Read-heavy vs write-heavy, few vs many followers |
| Sync vs async | Synchronous API | Message queue | Low latency vs high throughput / resilience |
| Monolith vs microservices | Single service | Independent services | Small team / early stage vs scale / team autonomy |