System Design Interview – An Insider's Guide

Alex Xu's book is a practical playbook for system design interviews. Its real value is a repeatable framework for decomposing large, open-ended problems, backed by concrete examples of real-world systems.

---

The Interview Framework

Four steps to structure any system design interview:

Step 1 — Understand the problem (3–10 min)

Ask before designing. Never assume.

Who are the users? What do they do?

What scale? (DAU, QPS, storage)

What features are in scope?

What are the consistency, availability, latency requirements?

Step 2 — High-level design (10–15 min)

Sketch the major components and data flow. Get buy-in before diving deep.

Client → Load Balancer → API Servers → Cache → Database
                                     → Message Queue → Workers
                                     → CDN (for static assets)

Step 3 — Deep dive (10–25 min)

Pick 2–3 components that matter most for this problem and go deep:

Data model and schema

Critical algorithms or data structures

Bottlenecks and how to resolve them

Step 4 — Wrap up (3–5 min)

Identify bottlenecks and future scaling concerns

Summarize design decisions and trade-offs made

---

Back-of-Envelope Estimation

Quick math to validate scale assumptions before designing.

Common Numbers to Memorize

Resource	Latency
L1 cache reference	0.5 ns
Main memory reference	100 ns
SSD random read	150 µs
HDD seek	10 ms
Round-trip within same datacenter	0.5 ms
Round-trip US to Europe	150 ms

Unit	Value
1 million req/day	~12 req/sec
1 billion req/day	~11,500 req/sec
1 byte	1 B
1 KB	10³ B
1 MB	10⁶ B
1 GB	10⁹ B
1 TB	10¹² B

Example: Twitter-like System

300M MAU, 50% daily active → 150M DAU
Each user tweets 2x/day → 300M tweets/day
Tweet size: 140 chars ≈ 280 bytes

Write QPS = 300M / 86400 ≈ 3500 tweets/sec
Peak QPS  = 3500 × 2     = 7000 tweets/sec

Storage/day = 300M × 280B = 84 GB/day
Storage/5yr = 84 × 365 × 5 ≈ 153 TB

---

Building Blocks

Load Balancer

Distributes traffic across multiple servers. Prevents any single server from becoming a bottleneck.

Client
  ↓
Load Balancer (round-robin / least-connections / IP hash)
  ↓         ↓         ↓
Server 1  Server 2  Server 3

Round-robin — even distribution, simple

Least connections — routes to server with fewest active connections

IP hash — same client always hits same server (useful for session stickiness)

CDN (Content Delivery Network)

Caches static content (images, JS, CSS, video) at edge nodes geographically close to users.

User (Jakarta) → CDN edge (Singapore) → Cache hit → served locally
                                      → Cache miss → Origin server → CDN → User

Use CDN for: static assets, video content, anything that doesn't change per user.

Caching

Reduces database load. Always ask: what's the cache invalidation strategy?

Strategy	How it works	Best for
Cache-aside	App checks cache first, loads from DB on miss, writes to cache	Read-heavy workloads
Write-through	Write to cache and DB simultaneously	Data that must be consistent
Write-back	Write to cache only, flush to DB asynchronously	Write-heavy, can tolerate brief data loss
Read-through	Cache sits in front of DB, handles misses automatically	Simpler app code

Cache-aside:
  Read:  Cache hit → return
         Cache miss → DB → store in cache → return
  Write: Update DB → invalidate cache entry

Eviction policies: LRU (Least Recently Used), LFU (Least Frequently Used), TTL-based.

Database Replication

Primary (writes) → Replica 1 (reads)
                → Replica 2 (reads)
                → Replica 3 (reads)

Improves read throughput and fault tolerance

Introduces replication lag — reads from replicas may be stale

Database Sharding

Split data horizontally across multiple database instances.

User IDs 0–9M   → Shard 1
User IDs 10–19M → Shard 2
User IDs 20–29M → Shard 3

Challenges:

Resharding — moving data when a shard is full or imbalanced

Celebrity (hotspot) problem — one shard overloaded by popular data

Cross-shard joins — must be done in application layer

---

Consistent Hashing

Solves the resharding problem when adding or removing nodes.

Virtual ring: 0 ────────────────────── 360°

Nodes placed at hash(nodeId) positions on the ring
Keys assigned to the next clockwise node

Adding a node: only keys between previous node and new node move
Removing a node: only that node's keys move to its successor

Minimizes key redistribution when scaling

Virtual nodes (vnodes) — each physical node gets multiple positions on the ring to balance load

Used by: Cassandra, DynamoDB, Amazon ElastiCache

---

Rate Limiter

Protects services from being overwhelmed by too many requests.

Algorithms

Token Bucket — a bucket holds tokens, each request consumes one, tokens refill at a fixed rate.

Bucket capacity: 10 tokens
Refill rate: 2 tokens/sec

Burst of 10 requests → consumed immediately
Next request → must wait for refill

Allows bursting. Used by: AWS, Stripe.

Leaky Bucket — requests enter a queue, processed at a fixed output rate. Queue drops excess.

Incoming: burst of 100 req
Queue: holds 10
Output: 2 req/sec steady

→ 90 requests dropped

Smooths output. No bursting.

Fixed Window Counter — count requests per fixed time window (e.g., per minute).

Problem: burst at window boundary counts across two windows.

Sliding Window Log — store timestamp of each request; count requests in the last N seconds.

Most accurate, but memory-heavy.

Sliding Window Counter — hybrid: fixed window counter + weighted count from previous window.

Rate = current_window_count + prev_window_count × overlap_ratio

Where to Store Counters

Redis with atomic increment (INCR) + TTL is the standard solution for distributed rate limiting.

---

Unique ID Generator

Requirements

Globally unique

Sortable by time (optional but often required)

High throughput

Approaches

UUID — 128-bit random identifier.

550e8400-e29b-41d4-a716-446655440000

Simple, no coordination. Not sortable, not compact.

Database auto-increment — does not work across multiple databases.

Snowflake ID (Twitter) — 64-bit integer composed of:

| 1 bit (sign) | 41 bits (timestamp ms) | 10 bits (machine ID) | 12 bits (sequence) |

Sortable by time

~4096 IDs/ms per machine

Requires synchronized clocks (NTP)

Used by: Twitter, Discord, Instagram

---

URL Shortener

Core flow:

POST /shorten  { url: "https://long-url.com/..." }
  → generate short code
  → store { short_code → long_url } in DB
  → return "https://short.ly/abc123"

GET /abc123
  → lookup short_code in cache
  → 301 (permanent) or 302 (temporary) redirect to long_url

Short code generation:

Base-62 encoding of auto-incremented ID
Characters: [0-9 A-Z a-z] = 62 symbols

ID = 11157310 → base62 → "B3D3"
7 chars → 62^7 = 3.5 trillion URLs

Use 301 for SEO (browser caches redirect, reduces server load)

Use 302 for analytics (every click hits server, can track)

Schema:

CREATE TABLE urls (
  id        BIGINT PRIMARY KEY AUTO_INCREMENT,
  short_key VARCHAR(8) UNIQUE NOT NULL,
  long_url  TEXT NOT NULL,
  created_at TIMESTAMP DEFAULT NOW()
);

---

News Feed (Timeline)

Two core models for delivering posts to followers.

Fanout on Write (Push)

When a user posts, immediately push to all followers' feed caches.

User A posts → fetch A's followers → write to each follower's feed cache

Reads are fast (pre-computed feed)

Writes are expensive for users with millions of followers (celebrities)

Fanout on Read (Pull)

When a user opens their feed, fetch recent posts from everyone they follow.

User B opens feed → fetch B's followees → merge their recent posts → return

Writes are cheap

Reads are slow and heavy

Hybrid (Used by Twitter, Instagram)

Regular users: fanout on write

Celebrity users (>N followers): fanout on read

Feed = precomputed_feed_cache + real-time_celebrity_posts (fetched on read)

---

Chat System

Transport

Use WebSocket for full-duplex, low-latency messaging.

Client ←→ WebSocket connection ←→ Chat Server

HTTP long-polling is an alternative but wastes connections.

Message Storage

Recent messages: in-memory cache (Redis) for fast retrieval

History: append-only storage (Cassandra or HBase) — high write throughput, time-range scans

Message ID requirements:

Unique within a chat

Sortable by time

Use Snowflake IDs or local sequence numbers per chat room.

Presence (Online Status)

On connect: set user:{id}:status = online in Redis with a TTL

Heartbeat every 30s resets TTL

On disconnect / TTL expiry: status becomes offline

Fanout presence updates to friends via pub/sub (Redis Pub/Sub)

Group Chat Delivery

Message sent to group → message stored → one copy per group, not per member
Member reads → fetch messages after their last_read_message_id

Avoid fanout to every member on send — instead track read position per member.

---

Search Autocomplete

Suggest completions as the user types.

Trie Data Structure

Input: "be"

        root
        ├── b
        │   └── e (count: 40)
        │       ├── a (count: 35)  → "bea..."
        │       └── e (count: 20)  → "bee..."

Each node stores prefix counts

Traverse to current prefix node, collect top-k children by count

Scaling

Trie fits in memory for a single server. At scale:

Precompute top-k completions per prefix and cache them

Update asynchronously (rebuild trie from query logs daily/hourly, not on every search)

Shard trie by first character(s) of prefix

Shard 1: a–f prefixes
Shard 2: g–m prefixes
Shard 3: n–z prefixes

---

Video Streaming (YouTube-like)

Upload Flow

Client → Chunked upload → Raw storage (S3)
                        → Transcoding workers (multiple resolutions: 360p, 720p, 1080p, 4K)
                        → Transcoded storage (S3)
                        → CDN distribution
                        → Metadata DB (title, description, author)

Streaming Flow

Client → CDN edge node → cache hit → video chunks served
                       → cache miss → origin S3 → CDN → client

Protocol: DASH or HLS (adaptive bitrate — adjusts quality based on bandwidth)

Why chunked upload?

Resumable on failure

Can parallelize transcoding per chunk

---

Notification System

Deliver push notifications, SMS, and email across multiple channels.

Architecture

Event sources (API servers, schedulers)
  ↓
Notification Service
  ↓
Message Queue (decouple and buffer)
  ↓
Workers per channel:
  - iOS/Android → APNs / FCM
  - SMS        → Twilio, SNS
  - Email      → SendGrid, SES

Key Design Decisions

At-least-once delivery — use idempotent operations; recipients may get duplicates during retries

Priority queues — urgent alerts (security, payment) get a dedicated high-priority queue

Rate limiting — cap notifications per user per day to avoid spam

Opt-out support — check user preference before sending each notification

---

Design Trade-off Cheat Sheet

Decision	Option A	Option B	Pick A when…
Consistency vs availability	Strong (CP)	Eventually consistent (AP)	Finance, inventory
SQL vs NoSQL	Relational	Document / Wide-column	Structured + joins vs scale + flexibility
Cache invalidation	TTL	Event-driven invalidation	Acceptable staleness vs strict freshness
Fanout	On write (push)	On read (pull)	Read-heavy vs write-heavy, few vs many followers
Sync vs async	Synchronous API	Message queue	Low latency vs high throughput / resilience
Monolith vs microservices	Single service	Independent services	Small team / early stage vs scale / team autonomy

---

Resources

System Design Interview Vol. 1 — Alex Xu

System Design Interview Vol. 2 — Alex Xu & Sahn Lam

ByteByteGo Newsletter

System Design Interview – An Insider's Guide

On This Page

System Design Interview – An Insider's Guide

The Interview Framework

Back-of-Envelope Estimation

Common Numbers to Memorize

Example: Twitter-like System

Building Blocks

Load Balancer

CDN (Content Delivery Network)

Caching

Database Replication

Database Sharding

Consistent Hashing

Rate Limiter

Algorithms

Where to Store Counters

Unique ID Generator

Requirements

Approaches

URL Shortener

News Feed (Timeline)

Fanout on Write (Push)

Fanout on Read (Pull)

Hybrid (Used by Twitter, Instagram)

Chat System

Transport

Message Storage

Presence (Online Status)

Group Chat Delivery

Search Autocomplete

Trie Data Structure

Scaling

Video Streaming (YouTube-like)

Upload Flow

Streaming Flow

Notification System

Architecture

Key Design Decisions

Design Trade-off Cheat Sheet

Resources

Topics

Found This Helpful?