System Design Interview Guide 2026

How to Approach Any System Design Question — A 45-Minute Framework

System design interviews test whether you can think like a senior engineer — not just write code, but design scalable, fault-tolerant systems from scratch in 45 minutes. Most candidates fail not because they lack knowledge, but because they lack a structured approach.

This guide gives you a proven 7-step framework that works for any system design question — URL shortener, chat app, payment system, notification service, search engine, or social media feed. Use it as a template in every interview.

What this guide covers: The full 45-minute framework, capacity estimation math, common architecture patterns, trade-off vocabulary, and red flags to avoid.

The 45-Minute Interview Timeline

0–5 min     Clarify requirements
5–10 min    Capacity estimation
10–20 min   High-level design
20–25 min   API design
25–35 min   Deep dive
35–45 min   Trade-offs & scale

These are rough guides, not rigid rules. Let the interviewer's interests guide where you spend more time. If they keep asking about the database, stay there longer.

1. Clarify Requirements (0–5 min)

Never start designing immediately. Ask clarifying questions first. This demonstrates senior engineering thinking and prevents you from designing the wrong system.

Functional requirements — what the system must do:

  • What are the core features? (e.g., for URL shortener: shorten a URL, redirect to original)
  • What features are out of scope? (Don't gold-plate — 45 min is short)
  • Who are the users? (consumers, businesses, internal systems?)

Non-functional requirements — how the system must perform:

  • Scale: How many users? Daily active users (DAU)? Requests per second (RPS)?
  • Latency: What's the acceptable p99 latency? (e.g., <100ms for search autocomplete)
  • Availability: 99.9%? 99.99%? (99.9% ≈ 8.8 hrs downtime/yr; 99.99% ≈ 53 min/yr)
  • Consistency: Strong consistency or eventual consistency acceptable?
  • Durability: Can we lose any data? (financial = no; analytics = maybe)
  • Read/write ratio: Read-heavy or write-heavy? (URL shortener: 100:1 read-heavy)
Write your requirements on the whiteboard/shared doc so both you and the interviewer are aligned. Refer back to them when making design decisions.
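The downtime figures in the availability bullet come from one line of arithmetic: allowed downtime = (1 − availability) × hours per year. A small sketch (the function name is illustrative, and leap years are ignored):

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; ignores leap years


def downtime_per_year(availability_pct):
    """Return the allowed downtime per year as (hours, minutes)."""
    hours = (1 - availability_pct / 100) * HOURS_PER_YEAR
    return hours, hours * 60


for nines in (99.0, 99.9, 99.99, 99.999):
    hours, minutes = downtime_per_year(nines)
    print(f"{nines}% -> {hours:.2f} h/yr ({minutes:.1f} min/yr)")
```

Each extra "nine" cuts the budget by 10×, which is why 99.99% usually demands automated failover rather than a human on call.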

2. Capacity Estimation (5–10 min)

Back-of-envelope calculations show you think in systems, not just code. Interviewers at FAANG explicitly look for this. You don't need exact numbers — order-of-magnitude estimates are fine.

Key numbers to memorise:

Power of 2:
  2^10 = 1K,  2^20 = 1M,  2^30 = 1B

Latency (approximate):
  L1 cache hit:        0.5 ns
  Main memory access:  100 ns
  SSD random read:     100 µs
  HDD random seek:     10 ms
  Network round-trip (same DC): 0.5 ms
  Network round-trip (cross-continent): 150 ms

Throughput (rough):
  SSD read:   500 MB/s
  Network:    1-10 Gbps = 125 MB/s – 1.25 GB/s
  One server: ~10K–100K RPS (depends on workload)

Example estimation — URL Shortener at 100M URLs, 1B redirects/day:

# Write QPS:
100M URLs / (365 days × 86400 sec) ≈ 3 URLs/sec (very low)

# Read QPS (given 1B redirects/day):
1B redirects / 86400 sec ≈ 11,600 redirects/sec ≈ 12K RPS
# Read:write ≈ 11,600 / 3.2 ≈ 3,600:1, even more read-heavy than the 100:1 rule of thumb

# Storage (5 years, 100M URLs/yr):
URL record ≈ 500 bytes
100M URLs × 500 bytes = 50 GB/yr → 250 GB over 5 years

# Cache (80/20 rule — top 20% URLs get 80% traffic):
20% of one year's URLs: 20M × 500 bytes ≈ 10 GB in Redis (20% of all 5 years' data ≈ 50 GB)

# Bandwidth:
Read: 12K RPS × 500 bytes ≈ 6 MB/s → trivial
Write: 3 RPS × 500 bytes ≈ negligible
Show your math step-by-step on the board. Don't just say "we'll need Redis" — say "we have 12K RPS, caching 20% of 100M URLs = 10GB of hot URLs in Redis."
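The same back-of-envelope math can be replayed as a short script, which is handy for practising with different inputs. This is a sketch: the inputs (100M new URLs/yr, 1B redirects/day, 500-byte records, 5 years, a 20% hot set) are the example's assumptions, and the function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY


def estimate(new_urls_per_year=100_000_000, redirects_per_day=1_000_000_000,
             record_bytes=500, years=5, hot_fraction=0.20):
    """Order-of-magnitude capacity estimate for the URL shortener example."""
    write_qps = new_urls_per_year / SECONDS_PER_YEAR
    read_qps = redirects_per_day / SECONDS_PER_DAY
    storage_gb = new_urls_per_year * record_bytes * years / 1e9
    # 80/20 rule: cache the hottest 20% of one year's URLs.
    cache_gb = new_urls_per_year * hot_fraction * record_bytes / 1e9
    read_mb_per_s = read_qps * record_bytes / 1e6
    return {
        "write_qps": round(write_qps, 1),
        "read_qps": round(read_qps),
        "read_write_ratio": round(read_qps / write_qps),
        "storage_gb": storage_gb,
        "cache_gb": cache_gb,
        "read_mb_per_s": round(read_mb_per_s, 1),
    }


for name, value in estimate().items():
    print(f"{name:>16}: {value}")
```

Changing one argument (say, `record_bytes=1000`) and rerunning is a quick way to see which numbers actually drive the design.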

3. High-Level Design (10–20 min)

Draw the skeleton of the system — the key components and how data flows between them. Start simple, then add complexity.

Standard HLD template (works for almost every system):

Client → CDN (static assets / global edge caching)
       ↓
     DNS → Load Balancer (L4/L7)
       ↓
   API Gateway (auth, rate limiting, routing)
       ↓
   App Servers (stateless, horizontally scalable)
       ↓
   ┌──────────────────────────────────────┐
   │  Cache Layer   │  Message Queue      │
   │  (Redis/MC)    │  (Kafka/RabbitMQ)   │
   └──────────────────────────────────────┘
       ↓
   Primary DB (PostgreSQL / MySQL / DynamoDB)
   + Read Replicas
   + Object Store (S3 for files/media)
  • Start left to right: client → network → server → storage
  • Label data flows: show what data moves where and why
  • Keep it simple: no more than 5–7 boxes in HLD; drill down in deep dive
  • Explain as you draw: narrate your thinking — "I'm putting a cache here because our read:write ratio is 100:1"
Common mistake: Jumping to the database choice before establishing the HLD. Draw the skeleton first, then decide components.

4. API Design (20–25 min)

Define the core API endpoints. This forces clarity on what the system actually does and what data it needs.

REST API pattern (use for most systems):

# URL Shortener API:
POST   /api/v1/urls
  Body: { "original_url": "https://...", "custom_alias": "mylink", "expire_at": "2027-01-01" }
  Response: { "short_url": "https://short.ly/abc123", "short_code": "abc123" }

GET    /api/v1/urls/{short_code}
  Response: 301 Redirect to original_url (or 302 for analytics tracking)

GET    /api/v1/urls/{short_code}/stats
  Response: { "clicks": 12500, "created_at": "...", "last_accessed": "..." }

DELETE /api/v1/urls/{short_code}
  Response: 204 No Content
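To make the contract concrete, here is a framework-free, in-memory sketch of the four endpoints. It is illustrative only: `UrlService`, its `(status, payload)` return convention and the random 7-character codes are assumptions (`expire_at` is not handled), and a real service would sit behind a web framework and use the storage described in the next step.

```python
import secrets
import string
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

BASE_URL = "https://short.ly/"  # domain from the example response above
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase  # Base62


@dataclass
class UrlRecord:
    original_url: str
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    last_accessed: Optional[datetime] = None
    clicks: int = 0


class UrlService:
    """Hypothetical in-memory stand-in for the API; methods return (status, payload)."""

    def __init__(self, track_clicks=True):
        self.urls = {}                    # short_code -> UrlRecord
        self.track_clicks = track_clicks  # True -> 302 (countable), False -> 301 (cacheable)

    def _random_code(self):
        return "".join(secrets.choice(ALPHABET) for _ in range(7))

    def create(self, original_url, custom_alias=None):
        """POST /api/v1/urls"""
        if custom_alias is not None:
            if custom_alias in self.urls:
                return 409, {"error": "alias already taken"}
            code = custom_alias
        else:
            code = self._random_code()
            while code in self.urls:      # retry on a (rare) random collision
                code = self._random_code()
        self.urls[code] = UrlRecord(original_url)
        return 201, {"short_url": BASE_URL + code, "short_code": code}

    def redirect(self, code):
        """GET /api/v1/urls/{short_code}: payload holds the Location header."""
        record = self.urls.get(code)
        if record is None:
            return 404, {}
        record.clicks += 1
        record.last_accessed = datetime.now(timezone.utc)
        return (302 if self.track_clicks else 301), {"Location": record.original_url}

    def stats(self, code):
        """GET /api/v1/urls/{short_code}/stats"""
        record = self.urls.get(code)
        if record is None:
            return 404, {}
        return 200, {
            "clicks": record.clicks,
            "created_at": record.created_at.isoformat(),
            "last_accessed": record.last_accessed.isoformat() if record.last_accessed else None,
        }

    def delete(self, code):
        """DELETE /api/v1/urls/{short_code}"""
        if self.urls.pop(code, None) is None:
            return 404, {}
        return 204, {}
```

With `track_clicks=False` the service answers 301, so browsers may cache the redirect and later clicks never reach the `clicks` counter.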

Key API design decisions to mention:

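  • 301 vs 302 redirect: 301 (permanent) is cached by the browser, so repeat clicks never reach the server: less load, but you can't count them. 302 (temporary) is not cached by default, so every click hits the server: analytics are possible, at the cost of more traffic.
  • Idempotency: if the same URL is shortened twice, do we return the existing short code (deduplicate) or create a new one? Decide explicitly, and make sure a client retrying the POST doesn't create duplicates.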
  • Pagination: for list endpoints, always paginate. Use cursor-based (not offset) for large datasets.
  • Versioning: /api/v1/ — always version your API from day 1.
  • Auth: API key in header (X-API-Key) or Bearer JWT token.
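Cursor-based pagination remembers "the last ID I returned" instead of a row offset, so the database can seek straight to the next page through the index rather than scanning and discarding `OFFSET` rows, and concurrent inserts don't shift pages. A minimal sketch over an in-memory list (the row data, function names and the base64-JSON cursor format are assumptions):

```python
import base64
import json

# Hypothetical rows (id, short_code), already sorted by id as an index scan would return them.
ROWS = [(i, f"code{i}") for i in range(1, 26)]


def encode_cursor(last_id):
    # Opaque to clients, so the server can change the cursor format later.
    return base64.urlsafe_b64encode(json.dumps({"after": last_id}).encode()).decode()


def decode_cursor(cursor):
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))["after"]


def list_urls(limit=10, cursor=None):
    """Return one page plus the cursor for the next page (None on the last page)."""
    after = decode_cursor(cursor) if cursor else 0
    # SQL equivalent: WHERE id > :after ORDER BY id LIMIT :limit + 1
    page = [row for row in ROWS if row[0] > after][: limit + 1]
    has_more = len(page) > limit          # the extra row only signals "more exists"
    page = page[:limit]
    next_cursor = encode_cursor(page[-1][0]) if has_more else None
    return {"items": page, "next_cursor": next_cursor}
```

Fetching `limit + 1` rows is a common trick to detect the last page without a separate `COUNT(*)` query.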

5. Data Model & Storage (Deep Dive Part 1, ~5 min)

Choose your database and define the schema. Explain why you chose this database — this is where most interviewers probe.

Database decision framework:

  • Relational (PostgreSQL, MySQL): structured data, ACID transactions, complex queries, <10TB
  • Document (MongoDB): flexible schema, nested objects, varied data shapes
  • Key-Value (DynamoDB, Redis): simple lookups by ID, ultra-high throughput, low latency
  • Wide-Column (Cassandra, HBase): time-series, write-heavy, multi-datacenter, 100M+ rows
  • Graph (Neo4j): relationship-heavy data such as social graphs and fraud detection
  • Search (Elasticsearch): full-text search, faceted filtering, log analytics

Example — URL Shortener schema:

-- SQL (works fine at our scale: 250GB over 5 years)
CREATE TABLE urls (
  id          BIGINT PRIMARY KEY,          -- auto-increment or snowflake ID
  short_code  VARCHAR(8)  NOT NULL UNIQUE, -- "abc123"
  original_url TEXT       NOT NULL,
  user_id     BIGINT,                      -- nullable (anonymous allowed)
  created_at  TIMESTAMP   DEFAULT NOW(),
  expire_at   TIMESTAMP,
  click_count BIGINT      DEFAULT 0            -- optional; update in async batches, not per redirect
);
-- No extra index on short_code needed: the UNIQUE constraint already creates one

-- For analytics (separate service, avoid write contention):
CREATE TABLE clicks (
  id          BIGINT PRIMARY KEY,
  short_code  VARCHAR(8),
  clicked_at  TIMESTAMP,
  ip_country  VARCHAR(3),
  referrer    TEXT
);
CREATE INDEX idx_clicks_code_time ON clicks(short_code, clicked_at);  -- per-URL stats queries
Separate the hot-path (redirect: read urls table) from analytics (write clicks table). Never do analytics writes in the critical path of a redirect.
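A sketch of that separation, using SQLite and an in-process queue so it runs anywhere. The schema is a trimmed SQLite adaptation of the tables above, and `redirect`/`flush_clicks` are hypothetical names; in production the queue would be Kafka or RabbitMQ and the writer a separate consumer service.

```python
import queue
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE urls (id INTEGER PRIMARY KEY, short_code TEXT NOT NULL UNIQUE,
                   original_url TEXT NOT NULL);
CREATE TABLE clicks (id INTEGER PRIMARY KEY, short_code TEXT,
                     clicked_at TEXT DEFAULT CURRENT_TIMESTAMP, referrer TEXT);
""")
db.execute("INSERT INTO urls (short_code, original_url) VALUES (?, ?)",
           ("abc123", "https://example.com/article"))

click_events = queue.Queue()  # stand-in for a message queue


def redirect(short_code, referrer=None):
    """Hot path: one indexed read, then enqueue the click; no analytics write here."""
    row = db.execute("SELECT original_url FROM urls WHERE short_code = ?",
                     (short_code,)).fetchone()
    if row is None:
        return None
    click_events.put((short_code, referrer))
    return row[0]


def flush_clicks(max_batch=1000):
    """Background job: drain queued events and insert them in one transaction."""
    batch = []
    while len(batch) < max_batch:
        try:
            batch.append(click_events.get_nowait())
        except queue.Empty:
            break
    with db:  # commits the whole batch at once
        db.executemany("INSERT INTO clicks (short_code, referrer) VALUES (?, ?)", batch)
    return len(batch)
```

Batching also reduces write contention on `clicks`: one transaction per flush instead of one per redirect.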

6. Deep Dive: Core Components (25–35 min)

Pick 2–3 of the most interesting or challenging components and explain them in depth. Let the interviewer's follow-up questions guide you.

Common deep-dive areas by system type:

  • URL Shortener: short-code generation (hash vs counter vs distributed ID, then Base62 encoding), collision handling, caching strategy
  • Chat System: WebSocket connection management, message delivery guarantees, offline message delivery
  • Rate Limiter: token bucket vs sliding window algorithm, distributed implementation with Redis
  • Search Autocomplete: trie data structure, ranking algorithm, real-time update pipeline
  • Notification System: fan-out strategies, third-party provider integration, retry logic

Deep dive example — URL short code generation:

# Option 1: MD5 hash of original URL → Base62 encode → take first 7 chars
# Problem: truncation can collide (check DB, retry); same URL always yields the same code (free dedup, but no per-user links)

# Option 2: Auto-increment ID → Base62 encode
# ID: 1234567 → Base62: "5BAN" (7 chars → 62^7 ≈ 3.5 trillion combinations)
# Advantages: no collisions; short codes; IDs are ordered
# Problems: single point of failure for ID generation; sequential codes are guessable

# Option 3: Distributed ID generator (Snowflake-style)
# 64-bit ID: [sign(1)] [timestamp(41)] [datacenter(5)] [machine(5)] [sequence(12)]
# Generates ~4096 unique IDs/millisecond/machine
# No runtime coordination between nodes (only unique datacenter/machine IDs assigned up front)

# Base62 encoding:
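CHARS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"  # 62 symbols
def to_base62(num):
    if num < 0:
        raise ValueError("num must be non-negative")
    result = []
    while num:
        num, rem = divmod(num, 62)   # peel off the least significant digit
        result.append(CHARS[rem])
    return ''.join(reversed(result)) or '0'

def from_base62(code):               # inverse: Base62 string → integer ID
    num = 0
    for ch in code:
        num = num * 62 + CHARS.index(ch)
    return num

to_base62(123456789)   # → "8M0kX" (5 chars; 62^5 ≈ 916M codes, 62^7 ≈ 3.5 trillion)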

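Option 3 can be sketched as a small generator that follows the bit layout in the listing above. The custom epoch, class name and clock-skew policy (refuse to issue IDs if the clock moves backwards) are assumptions for illustration, not a reference implementation.

```python
import threading
import time

# Layout: sign(1) | timestamp(41) | datacenter(5) | machine(5) | sequence(12)
EPOCH_MS = 1_704_067_200_000  # assumed custom epoch: 2024-01-01T00:00:00Z


class SnowflakeGenerator:
    def __init__(self, datacenter_id, machine_id):
        if not (0 <= datacenter_id < 32 and 0 <= machine_id < 32):
            raise ValueError("datacenter_id and machine_id must fit in 5 bits")
        self.datacenter_id = datacenter_id
        self.machine_id = machine_id
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def _now_ms(self):
        return int(time.time() * 1000) - EPOCH_MS

    def next_id(self):
        with self.lock:
            now = self._now_ms()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards; refusing to generate IDs")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF  # 12-bit sequence
                if self.sequence == 0:
                    # All 4096 IDs for this millisecond are used: wait for the next one.
                    while now <= self.last_ms:
                        now = self._now_ms()
            else:
                self.sequence = 0
            self.last_ms = now
            return (now << 22) | (self.datacenter_id << 17) | (self.machine_id << 12) | self.sequence


def parse_id(snowflake_id):
    """Split an ID back into its fields (useful when debugging)."""
    return {
        "timestamp_ms": (snowflake_id >> 22) + EPOCH_MS,
        "datacenter_id": (snowflake_id >> 17) & 0x1F,
        "machine_id": (snowflake_id >> 12) & 0x1F,
        "sequence": snowflake_id & 0xFFF,
    }
```

Passing an ID through `to_base62` from the listing above turns it into a short code; because the timestamp occupies the high bits, codes from one machine sort by creation time.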
Deep dive example — caching strategy:

# Cache-aside (most common):
# 1. GET /r/abc123 → check Redis cache
# 2. Cache hit → return original URL immediately (sub-ms)
# 3. Cache miss → query DB → write to cache → return URL
# TTL: 24h for popular URLs; 1h for unpopular

# What to cache: top 20% of URLs get 80% of traffic (Pareto principle)
# Cache size: 20M hot URLs (20% of one year's 100M) × 500 bytes = 10 GB → fits in one Redis node
# Cache eviction: LRU (least recently used)

# Cache warming on startup: pre-load most-clicked URLs from DB
# Write-through: on new URL creation, also write to cache (avoids cold start)
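The cache-aside flow can be sketched with a tiny LRU-plus-TTL cache standing in for Redis and a dict standing in for the database. The class, the `popular_codes` parameter and the dict-backed DB are assumptions; the TTL values mirror the listing (24h popular, 1h otherwise).

```python
import time
from collections import OrderedDict


class LRUCacheWithTTL:
    """Stand-in for Redis: per-key expiry plus LRU eviction when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()  # key -> (value, expires_at); order = recency

    def get(self, key):
        item = self.data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if time.monotonic() >= expires_at:
            del self.data[key]        # expired: treat as a miss
            return None
        self.data.move_to_end(key)    # mark as most recently used
        return value

    def set(self, key, value, ttl_seconds):
        self.data[key] = (value, time.monotonic() + ttl_seconds)
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used key


def get_original_url(code, cache, db, popular_codes=frozenset()):
    """Cache-aside read: cache first, then DB on a miss, then populate the cache."""
    url = cache.get(code)
    if url is not None:
        return url                    # cache hit
    url = db.get(code)                # cache miss: read from the DB
    if url is not None:
        ttl = 24 * 3600 if code in popular_codes else 3600
        cache.set(code, url, ttl)
    return url
```

Note that a hit can return a value the DB has since changed until its TTL expires; for redirects that is usually acceptable.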
7. Scale, Fault Tolerance & Trade-offs (35–45 min)

Proactively identify bottlenecks and explain how your design handles them. This is where senior vs junior candidates diverge — juniors build a system; seniors build a resilient system.

Bottlenecks to identify and address:

  • Single point of failure (SPOF): every component should have redundancy. No SPOF = multiple instances behind a load balancer, multi-AZ deployment.
  • Database hotspot: heavy reads on one server → read replicas. Heavy writes → sharding (by user ID, geographic region, or consistent hash).
  • Cache stampede: a hot key expires and many concurrent requests miss at once and all hit the DB → let one request rebuild the value (mutex / request coalescing) or use probabilistic early expiry.
  • Thundering herd (mass expiry): many keys cached at the same moment expire together → add random jitter to TTLs so expiries spread out.

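Both mitigations fit in a few lines. This in-process sketch uses a per-key lock ("single flight") so only one caller reloads a missing key, plus TTL jitter; the class name and the 10% jitter fraction are assumptions. Across many servers the lock would live in a shared store instead (for example, a Redis key set with NX and an expiry).

```python
import random
import threading


def jittered_ttl(base_seconds, jitter_fraction=0.1):
    """Base TTL plus or minus up to 10% random jitter, so keys don't expire in lockstep."""
    delta = base_seconds * jitter_fraction
    return base_seconds + random.uniform(-delta, delta)


class SingleFlight:
    """Per-key lock: one caller rebuilds a missing value; the rest wait and reuse it."""

    def __init__(self):
        self._locks = {}                 # never cleaned up in this sketch
        self._guard = threading.Lock()

    def _lock_for(self, key):
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())

    def get_or_load(self, key, cache, loader):
        value = cache.get(key)
        if value is not None:
            return value
        with self._lock_for(key):
            value = cache.get(key)       # re-check: another thread may have filled it
            if value is None:
                value = loader(key)      # only this thread reaches the DB
                cache[key] = value
            return value
```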
Scaling levers to know:

Vertical scaling:   bigger machines (quick but expensive, has limits)
Horizontal scaling: more machines (preferred, infinite in theory)
Caching:            CDN → L1 cache (app) → L2 cache (Redis) → DB
Database:
  - Read replicas:  offload read traffic from primary
  - Sharding:       partition data across multiple DB instances
  - Index tuning:   right indexes on the right columns
Async processing:   move non-critical work to message queue (email, analytics)
CDN:                static assets + geographically distributed caching

Key trade-offs to demonstrate:

  • Consistency vs Availability (CAP): "For this URL shortener, we choose availability — a slightly stale redirect is acceptable. We use eventual consistency for analytics counts."
  • Write latency vs read performance: "We could skip the cache write on URL creation to reduce write latency, but new URLs would then miss the cache on their first redirects. Given our read-heavy traffic, we write through to the cache; the extra write latency is worth it."
  • Simplicity vs Scalability: "A monolith works fine for 1K RPS. At 100K RPS we'd extract the redirect service since it's by far the hottest code path."
  • Cost vs Performance: "We could cache everything in Redis (fastest) or use a smaller cache with DB fallback (cheaper). At an assumed $0.02/GB/hr, 50 GB ≈ $1/hr (about $730/month); we'd weigh that against the cost of extra DB capacity."
End with: "Given more time I would also design X, Y, Z" — shows you can think beyond the 45-minute scope. Good candidates have follow-up ideas ready.

Common Architecture Patterns Every Candidate Should Know

CDN

Cache static assets and API responses at edge nodes near users. Reduces latency from 150ms to <10ms for cached content.

Read Replicas

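Copy the primary DB to 1–3 read-only replicas. Route reads to replicas and writes to the primary; replicas lag slightly, so reads that must see a just-committed write go to the primary. Read capacity grows roughly with the number of replicas.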

Message Queue

Decouple producer/consumer. Absorbs traffic spikes. Enables async processing (email, analytics, notifications).

Consistent Hashing

Distribute data across nodes so adding/removing a node only remaps 1/N of keys. Essential for cache clusters and sharding.
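A minimal hash ring with virtual nodes shows the "only ~1/N of keys move" property. MD5 is used here purely for even key spread, not security; the class name and the 100 virtual nodes per server are assumptions.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Hash ring with virtual nodes: each physical node owns many small arcs."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, node)
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key):
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def add_node(self, node):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def get_node(self, key):
        """Owner = first virtual node clockwise from the key's hash (wrapping around)."""
        if not self._ring:
            raise LookupError("ring is empty")
        idx = bisect.bisect(self._ring, (self._hash(key), ""))
        return self._ring[idx % len(self._ring)][1]
```

When a fourth node joins, only keys whose hashes fall on its new arcs move (about a quarter of them), and they all move to the new node; with `hash(key) % N` sharding, most keys would move.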

Bloom Filter

Space-efficient probabilistic set membership test. Use to avoid DB lookups for definitely non-existent keys (zero false negatives).
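A compact Bloom filter sketch: a bit array sized with the standard formulas m = −n·ln p / (ln 2)² and k = (m/n)·ln 2, with k positions derived from one SHA-256 digest by double hashing. The class name and defaults are assumptions.

```python
import hashlib
import math


class BloomFilter:
    """Bit array + k hash positions. False positives possible; false negatives impossible."""

    def __init__(self, expected_items, false_positive_rate=0.01):
        n, p = expected_items, false_positive_rate
        self.size = math.ceil(-n * math.log(p) / math.log(2) ** 2)   # m bits
        self.num_hashes = max(1, round(self.size / n * math.log(2)))  # k
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item):
        # Double hashing: position_i = h1 + i * h2 (mod m), from two 64-bit digest slices.
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.size for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))
```

For the URL shortener, a `False` from `might_contain(short_code)` means the code definitely doesn't exist, so the 404 can be returned without touching the DB.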

Database Sharding

Partition data horizontally across multiple DB instances. Shard key choice is critical — poor choice creates hot shards.

Circuit Breaker

Stop calling a failing downstream service. Fail fast + return cached/default response. Prevents cascade failures.
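A single-threaded circuit breaker sketch with the usual three states. The thresholds, the injectable `clock` (which makes it testable) and the `fallback` argument are assumptions, not any particular library's API.

```python
import time


class CircuitOpenError(Exception):
    pass


class CircuitBreaker:
    """CLOSED -> OPEN after N consecutive failures; OPEN -> HALF_OPEN after a cooldown."""

    def __init__(self, failure_threshold=5, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, func, *args, fallback=None):
        if self.state == "OPEN":
            if self.clock() - self.opened_at >= self.reset_timeout:
                self.state = "HALF_OPEN"   # let the next request through as a trial
            elif fallback is not None:
                return fallback()          # fail fast with a cached/default response
            else:
                raise CircuitOpenError("downstream marked unhealthy")
        try:
            result = func(*args)
        except Exception:
            self.failures += 1
            if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
                self.state = "OPEN"        # trip (or re-trip) the breaker
                self.opened_at = self.clock()
            raise
        self.state = "CLOSED"              # any success resets the breaker
        self.failures = 0
        return result
```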

Leader Election

One node is "leader" for writes; others are followers. Used in databases, distributed locks (ZooKeeper, etcd).

Event Sourcing

Store every state change as an immutable event. Rebuild current state by replaying events. Audit log for free.
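Event sourcing reduces to an append-only list plus a pure `apply` function folded over it. The event types below (`UrlCreated`, `UrlClicked`, `UrlDeleted`) are hypothetical, chosen to match the URL shortener example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    kind: str                      # hypothetical types: UrlCreated, UrlClicked, UrlDeleted
    short_code: str
    original_url: Optional[str] = None


def apply(state, event):
    """Pure transition: returns a new state dict and never mutates the old one."""
    new_state = dict(state)
    if event.kind == "UrlCreated":
        new_state[event.short_code] = {"url": event.original_url, "clicks": 0}
    elif event.kind == "UrlClicked":
        entry = new_state[event.short_code]
        new_state[event.short_code] = {**entry, "clicks": entry["clicks"] + 1}
    elif event.kind == "UrlDeleted":
        new_state.pop(event.short_code, None)
    else:
        raise ValueError(f"unknown event kind: {event.kind}")
    return new_state


def replay(events):
    """Rebuild current state by applying every event, oldest first."""
    state = {}
    for event in events:
        state = apply(state, event)
    return state


event_log = [  # append-only: events are added, never edited
    Event("UrlCreated", "abc123", "https://example.com"),
    Event("UrlClicked", "abc123"),
    Event("UrlClicked", "abc123"),
]
```

Replaying a prefix of the log (`replay(event_log[:1])`) gives the state at that point in time, which is where the "audit log for free" comes from.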

CQRS

Command Query Responsibility Segregation. Separate read/write models. Read model optimised for queries; write model for commands.

Saga Pattern

Distributed transaction across microservices via a chain of local transactions with compensating actions on failure.
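An orchestrated saga can be sketched as a list of (action, compensation) pairs: run actions in order and, if one fails, run the compensations of the completed steps in reverse. This is a simplification under stated assumptions: a real orchestrator also persists progress and retries compensations that themselves fail.

```python
class SagaFailed(Exception):
    """Raised after a step fails and the earlier steps have been compensated."""


def run_saga(steps):
    """steps: list of (name, action, compensation) callables.

    Each action stands in for one service's local transaction;
    each compensation semantically undoes it.
    """
    completed = []  # compensations for the steps that succeeded, in execution order
    for name, action, compensate in steps:
        try:
            action()
        except Exception as exc:
            for undo in reversed(completed):  # undo in reverse order
                undo()
            raise SagaFailed(f"step {name!r} failed; ran {len(completed)} compensation(s)") from exc
        completed.append(compensate)
```

For a hypothetical order flow (reserve inventory → charge payment → schedule shipping), a shipping failure triggers "refund payment", then "release inventory".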

Rate Limiting

Token bucket: steady-state rate + burst. Sliding window: precise per-minute rate. Implemented in API Gateway or app layer.
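A single-process token bucket sketch: tokens refill continuously at `rate` per second up to `capacity`, and each request spends one. The injectable `clock` is an assumption for testability; a distributed limiter would keep the same state in a shared store such as Redis and update it atomically.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start full, so an initial burst is allowed
        self.last_refill = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        # Refill for the elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

With `rate=2, capacity=5`, a client can burst 5 requests at once, then sustain 2 per second.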

Red Flags That Kill System Design Interviews

  • Starting to design without asking requirements — you might solve the wrong problem
  • Overcomplicating the HLD — drawing 15 boxes when 5 will do. Complexity without justification is a red flag.
  • Choosing a technology without explaining why — "I'll use Kafka" is wrong. "I'll use Kafka because we need async fan-out to 5 downstream services" is right.
  • Ignoring failure scenarios — what happens when the cache fails? When the DB is unavailable? Senior candidates design for failure.
  • Not mentioning trade-offs — every design decision has a cost. If you don't acknowledge it, the interviewer thinks you don't see it.
  • Designing everything in detail upfront — interviewers guide you to interesting areas. Follow their lead.
  • Going silent — always narrate. If you're thinking, say "I'm thinking about whether to use SQL or NoSQL here because…"

Pre-Interview Preparation Checklist

  • Memorise key numbers: latency, throughput, storage units, powers of 2
  • Know the standard HLD template (client → LB → API → cache → DB)
  • Practice 3–5 classic problems end-to-end (URL shortener, rate limiter, chat)
  • Understand CAP theorem and be able to apply it to real databases
  • Know consistent hashing: it comes up often (distributed caches, sharding, partitioned stores)
  • Understand SQL vs NoSQL trade-offs with concrete examples
  • Know at least one caching strategy in depth (cache-aside, write-through, write-behind)
  • Understand message queues and when to use them (async, fan-out, backpressure)
  • Practice drawing diagrams fast — even rough boxes + arrows are fine
  • Read the design articles on the rest of this hub — each builds intuition

What to Study Next