System design interviews test whether you can think like a senior engineer — not just write code, but design scalable, fault-tolerant systems from scratch in 45 minutes. Most candidates fail not because they lack knowledge, but because they lack a structured approach.
This guide gives you a proven 7-step framework that works for any system design question — URL shortener, chat app, payment system, notification service, search engine, or social media feed. Use it as a template in every interview.
These are rough guides, not rigid rules. Let the interviewer's interests guide where you spend more time. If they keep asking about the database, stay there longer.
Never start designing immediately. Ask clarifying questions first. This demonstrates senior engineering thinking and prevents you from designing the wrong system.
Functional requirements — what the system must do:
Non-functional requirements — how the system must perform:
Back-of-envelope calculations show you think in systems, not just code. Interviewers at FAANG explicitly look for this. You don't need exact numbers — order-of-magnitude estimates are fine.
Key numbers to memorise:
Power of 2: 2^10 = 1K, 2^20 = 1M, 2^30 = 1B

Latency (approximate):
  L1 cache hit: 0.5 ns
  Main memory access: 100 ns
  SSD random read: 100 µs
  HDD random seek: 10 ms
  Network round-trip (same DC): 0.5 ms
  Network round-trip (cross-continent): 150 ms

Throughput (rough):
  SSD read: 500 MB/s
  Network: 1-10 Gbps = 125 MB/s – 1.25 GB/s
  One server: ~10K–100K RPS (depends on workload)
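As a rough illustration of how these figures get used, the sketch below turns the throughput numbers into transfer times. The helper name `transfer_seconds` and the 1 GB payload are assumptions made for the example.

```python
# Rough transfer-time estimates from the throughput figures above.
# transfer_seconds is a hypothetical helper; the 1 GB payload is an assumed example.

SSD_READ_MB_PER_S = 500       # SSD read throughput
NET_1GBPS_MB_PER_S = 125      # 1 Gbps link
NET_10GBPS_MB_PER_S = 1250    # 10 Gbps link


def transfer_seconds(size_mb: float, throughput_mb_per_s: float) -> float:
    """Time to move size_mb at a sustained throughput, ignoring latency."""
    return size_mb / throughput_mb_per_s


payload_mb = 1000  # 1 GB in decimal units, which is fine for back-of-envelope work
print(transfer_seconds(payload_mb, SSD_READ_MB_PER_S))    # 2.0
print(transfer_seconds(payload_mb, NET_1GBPS_MB_PER_S))   # 8.0
print(transfer_seconds(payload_mb, NET_10GBPS_MB_PER_S))  # 0.8
```

In an interview you would do this in your head; the point is that a 1 Gbps link, not the SSD, is the slower step when shipping 1 GB between machines.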
Example estimation for a URL shortener at 100M new URLs/year and 1B redirects/day:

# Write QPS:
100M URLs/yr / (365 days × 86,400 sec) ≈ 3 URLs/sec (very low)

# Read QPS (very read-heavy: roughly 3,650:1 read:write at these volumes):
1B redirects/day / 86,400 sec ≈ 11,600 redirects/sec → ~12K RPS

# Storage (5 years, 100M URLs/yr):
URL record ≈ 500 bytes
100M URLs × 500 bytes = 50 GB/yr → 250 GB over 5 years

# Cache (80/20 rule: top 20% of URLs get 80% of traffic):
20% of 250 GB ≈ 50 GB in Redis

# Bandwidth:
Read: 12K RPS × 500 bytes ≈ 6 MB/s → trivial
Write: 3 RPS × 500 bytes ≈ 1.5 KB/s → negligible
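If you want to sanity-check this kind of arithmetic before an interview, a few lines of Python are enough. This is a minimal sketch using the assumed inputs from the example above (100M URLs per year, 1B redirects per day, 500-byte records, 5-year horizon); the variable names are illustrative.

```python
# Back-of-envelope estimate for the URL shortener example.
# All inputs are the assumed figures from the estimation above.

SECONDS_PER_DAY = 86_400

urls_per_year = 100_000_000
redirects_per_day = 1_000_000_000
record_bytes = 500
years = 5

write_qps = urls_per_year / (365 * SECONDS_PER_DAY)
read_qps = redirects_per_day / SECONDS_PER_DAY
storage_gb = urls_per_year * record_bytes * years / 1e9
cache_gb = 0.20 * storage_gb                      # cache the hottest 20% of records
read_bandwidth_mb_s = read_qps * record_bytes / 1e6

print(f"write QPS      ~ {write_qps:.1f}")
print(f"read QPS       ~ {read_qps:,.0f}")
print(f"storage        ~ {storage_gb:.0f} GB over {years} years")
print(f"cache          ~ {cache_gb:.0f} GB")
print(f"read bandwidth ~ {read_bandwidth_mb_s:.1f} MB/s")
```

Round aggressively when you say the numbers out loud: "about 3 writes per second, about 12K reads per second, 250 GB of storage" is the level of precision interviewers expect.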
Draw the skeleton of the system — the key components and how data flows between them. Start simple, then add complexity.
Standard HLD template (works for almost every system):
Client → CDN (static assets / global edge caching)
↓
DNS → Load Balancer (L4/L7)
↓
API Gateway (auth, rate limiting, routing)
↓
App Servers (stateless, horizontally scalable)
↓
┌──────────────────────────────────────┐
│ Cache Layer │ Message Queue │
│ (Redis/MC) │ (Kafka/RabbitMQ) │
└──────────────────────────────────────┘
↓
Primary DB (PostgreSQL / MySQL / DynamoDB)
+ Read Replicas
+ Object Store (S3 for files/media)
Define the core API endpoints. This forces clarity on what the system actually does and what data it needs.
REST API pattern (use for most systems):
# URL Shortener API:
POST /api/v1/urls
  Body: { "original_url": "https://...", "custom_alias": "mylink", "expire_at": "2027-01-01" }
  Response: { "short_url": "https://short.ly/abc123", "short_code": "abc123" }
GET /api/v1/urls/{short_code}
  Response: 301 redirect to original_url (browsers cache a 301, so use 302 if every click must be counted)
GET /api/v1/urls/{short_code}/stats
  Response: { "clicks": 12500, "created_at": "...", "last_accessed": "..." }
DELETE /api/v1/urls/{short_code}
  Response: 204 No Content
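To make the endpoint semantics concrete, here is a minimal in-memory sketch of the four operations. `UrlService` and `UrlRecord` are hypothetical names, the alias is required here to keep code generation out of scope, and the 201, 404 and 409 status codes are assumptions rather than part of the API listed above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, Tuple


@dataclass
class UrlRecord:
    original_url: str
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    clicks: int = 0


class UrlService:
    """Hypothetical in-memory stand-in for the URL shortener endpoints."""

    def __init__(self) -> None:
        self._urls: Dict[str, UrlRecord] = {}

    def create(self, original_url: str, custom_alias: str) -> Tuple[int, dict]:
        # POST /api/v1/urls
        if custom_alias in self._urls:
            return 409, {"error": "alias already taken"}  # assumed status code
        self._urls[custom_alias] = UrlRecord(original_url)
        return 201, {
            "short_url": f"https://short.ly/{custom_alias}",
            "short_code": custom_alias,
        }

    def resolve(self, short_code: str) -> Tuple[int, dict]:
        # GET /api/v1/urls/{short_code}
        record = self._urls.get(short_code)
        if record is None:
            return 404, {"error": "not found"}
        record.clicks += 1
        return 301, {"Location": record.original_url}

    def stats(self, short_code: str) -> Tuple[int, dict]:
        # GET /api/v1/urls/{short_code}/stats
        record = self._urls.get(short_code)
        if record is None:
            return 404, {"error": "not found"}
        return 200, {"clicks": record.clicks, "created_at": record.created_at.isoformat()}

    def delete(self, short_code: str) -> Tuple[int, dict]:
        # DELETE /api/v1/urls/{short_code}
        if self._urls.pop(short_code, None) is None:
            return 404, {"error": "not found"}
        return 204, {}


service = UrlService()
print(service.create("https://example.com/some/long/path", "mylink"))
print(service.resolve("mylink"))
print(service.stats("mylink")[1]["clicks"])  # 1
print(service.delete("mylink"))              # (204, {})
```

Walking through a sketch like this out loud is a quick way to surface edge cases (duplicate aliases, missing codes) before the interviewer asks about them.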
Key API design decisions to mention:
Choose your database and define the schema. Explain why you chose this database — this is where most interviewers probe.
Database decision framework:
Relational (SQL): structured data, ACID transactions, complex queries, <10TB
Document store: flexible schema, nested objects, varied data shapes
Key-value store: simple lookups by ID, ultra-high throughput, low latency
Wide-column store: time-series, write-heavy, multi-datacenter, 100M+ rows
Graph database: relationship-heavy data such as social graphs and fraud detection
Search engine: full-text search, faceted filtering, log analytics
Example — URL Shortener schema:
-- SQL (works fine at our scale: 250 GB over 5 years)
CREATE TABLE urls (
    id           BIGINT PRIMARY KEY,          -- auto-increment or snowflake ID
    short_code   VARCHAR(8) NOT NULL UNIQUE,  -- "abc123"; UNIQUE already creates the lookup index
    original_url TEXT NOT NULL,
    user_id      BIGINT,                      -- nullable (anonymous allowed)
    created_at   TIMESTAMP DEFAULT NOW(),
    expire_at    TIMESTAMP,
    click_count  BIGINT DEFAULT 0
);

-- For analytics (separate service, avoid write contention):
CREATE TABLE clicks (
    id         BIGINT PRIMARY KEY,
    short_code VARCHAR(8),
    clicked_at TIMESTAMP,
    ip_country VARCHAR(3),
    referrer   TEXT
);
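One way to keep click recording off the redirect path, in line with the separate clicks table, is to hand each click to a queue and let a background worker persist it. The sketch below uses an in-process `queue.Queue` as a stand-in for a real message queue; `URLS`, `recorded_clicks`, `redirect` and `analytics_worker` are hypothetical names, and the dictionaries stand in for the two tables.

```python
import queue
import threading
from typing import Dict, List, Optional

URLS: Dict[str, str] = {"abc123": "https://example.com/long/path"}  # stand-in for the urls table
recorded_clicks: List[dict] = []                                     # stand-in for the clicks table
click_queue: "queue.Queue[Optional[dict]]" = queue.Queue()


def redirect(short_code: str) -> Optional[str]:
    """Hot path: look up the URL, enqueue a click event, return immediately."""
    url = URLS.get(short_code)
    if url is not None:
        click_queue.put({"short_code": short_code})  # no analytics write here
    return url


def analytics_worker() -> None:
    """Background consumer: drains click events and writes them to storage."""
    while True:
        event = click_queue.get()
        if event is None:  # sentinel used to stop the worker
            break
        recorded_clicks.append(event)


worker = threading.Thread(target=analytics_worker, daemon=True)
worker.start()

redirect("abc123")
redirect("abc123")
redirect("missing")      # unknown code: nothing is enqueued

click_queue.put(None)    # ask the worker to stop once the queue is drained
worker.join()
print(len(recorded_clicks))  # 2
```

In a real deployment the queue would live outside the app server process so that clicks survive a restart; the shape of the hot path stays the same.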
Separate the redirect path (reads from the urls table) from analytics (writes to the clicks table). Never do analytics writes in the critical path of a redirect.

Pick 2–3 of the most interesting or challenging components and explain them in depth. Let the interviewer's follow-up questions guide you.
Common deep-dive areas by system type:
Deep dive example — URL short code generation:
# Option 1: MD5 hash of original URL → take first 7 chars → Base62 encode
#   Problem: collision risk; same URL → same code (dedup needed)
# Option 2: Auto-increment ID → Base62 encode
#   ID: 1234567 → Base62: "5BAN" (7 chars allow 62^7 ≈ 3.5 trillion combinations)
#   Advantages: no collision; short and ordered
#   Problems: single point of failure for ID generation; sequential codes are easy to guess
# Option 3: Distributed ID generator (Snowflake-style)
#   64-bit ID: [timestamp(41)] [datacenter(5)] [machine(5)] [sequence(12)]
#   Generates up to 4096 unique IDs/millisecond/machine
#   No coordination needed between nodes
#   Note: a full 64-bit ID needs up to 11 Base62 chars, so codes are longer than 7
# Base62 encoding:
CHARS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

def to_base62(num):
    """Encode a non-negative integer using the CHARS alphabet."""
    result = []
    while num:
        result.append(CHARS[num % 62])  # least-significant digit first
        num //= 62
    return ''.join(reversed(result)) or '0'

to_base62(123456789)  # → "8M0kX" (5 chars; 62^5 ≈ 916M codes, 62^7 ≈ 3.5 trillion)
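Option 3 can be sketched in a few lines. This is a simplified illustration of a Snowflake-style layout, not a production generator: `SnowflakeGenerator` is a hypothetical class, `EPOCH_MS` is an arbitrary assumed epoch, and a backwards clock step is handled by simply reusing the last timestamp.

```python
import threading
import time

EPOCH_MS = 1_600_000_000_000  # assumed custom epoch (milliseconds since Unix epoch)


class SnowflakeGenerator:
    """Simplified Snowflake-style ID generator: 41-bit time, 5+5-bit node, 12-bit sequence."""

    def __init__(self, datacenter_id: int, machine_id: int) -> None:
        if not 0 <= datacenter_id < 32 or not 0 <= machine_id < 32:
            raise ValueError("datacenter_id and machine_id must each fit in 5 bits")
        self.datacenter_id = datacenter_id
        self.machine_id = machine_id
        self.sequence = 0
        self.last_ms = -1
        self._lock = threading.Lock()

    @staticmethod
    def _now_ms() -> int:
        return time.time_ns() // 1_000_000

    def next_id(self) -> int:
        with self._lock:
            # Simplification: never let the timestamp go backwards.
            now = max(self._now_ms(), self.last_ms)
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF
                if self.sequence == 0:
                    # 4096 IDs already issued this millisecond: wait for the next one.
                    while now <= self.last_ms:
                        now = self._now_ms()
            else:
                self.sequence = 0
            self.last_ms = now
            return (
                ((now - EPOCH_MS) << 22)      # 41-bit timestamp above 5 + 5 + 12 bits
                | (self.datacenter_id << 17)  # 5-bit datacenter above 5 + 12 bits
                | (self.machine_id << 12)     # 5-bit machine above the 12-bit sequence
                | self.sequence
            )


generator = SnowflakeGenerator(datacenter_id=1, machine_id=2)
first, second = generator.next_id(), generator.next_id()
print(first < second)        # True: IDs from one generator are increasing
print((first >> 12) & 0x1F)  # 2: the machine ID can be read back out
```

Each machine only needs its own datacenter and machine IDs, which is why no coordination is required at generation time. The trade-off worth mentioning is length: these IDs are much larger than a simple counter, so the Base62 code is longer.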
Deep dive example — caching strategy:
# Cache-aside (most common):
#   1. GET /r/abc123 → check Redis cache
#   2. Cache hit → return original URL immediately (sub-ms)
#   3. Cache miss → query DB → write to cache → return URL
# TTL: 24h for popular URLs; 1h for unpopular
# What to cache: top 20% of URLs get 80% of traffic (Pareto principle)
# Cache size: 10M hot URLs × 500 bytes = 5GB → fits in one Redis node
# Cache eviction: LRU (least recently used)
# Cache warming on startup: pre-load most-clicked URLs from DB
# Write-through: on new URL creation, also write to cache (avoids cold start)
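The cache-aside steps can be sketched with a small in-process cache standing in for Redis. `TTLLRUCache`, `resolve`, `DB` and `db_stats` are hypothetical names for this example, and the dictionary plays the role of the database.

```python
import time
from collections import OrderedDict
from typing import Callable, Dict, Optional, Tuple


class TTLLRUCache:
    """Tiny LRU cache with a per-entry TTL (a stand-in for Redis in this sketch)."""

    def __init__(self, capacity: int, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.ttl = ttl_seconds
        self.clock = clock
        self._data: "OrderedDict[str, Tuple[str, float]]" = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._data[key]        # expired: treat as a miss
            return None
        self._data.move_to_end(key)    # mark as most recently used
        return value

    def put(self, key: str, value: str) -> None:
        self._data[key] = (value, self.clock() + self.ttl)
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry


DB: Dict[str, str] = {"abc123": "https://example.com/a"}  # stand-in for the urls table
db_stats = {"reads": 0}


def resolve(short_code: str, cache: TTLLRUCache) -> Optional[str]:
    """Cache-aside read: try the cache, fall back to the DB, then populate the cache."""
    url = cache.get(short_code)
    if url is not None:
        return url                     # cache hit
    db_stats["reads"] += 1             # cache miss: go to the database
    url = DB.get(short_code)
    if url is not None:
        cache.put(short_code, url)
    return url


cache = TTLLRUCache(capacity=1000, ttl_seconds=3600)
resolve("abc123", cache)
resolve("abc123", cache)
print(db_stats["reads"])  # 1: the second lookup was served from the cache
```

Note that misses for unknown codes are not cached here, so repeated lookups of a non-existent code always reach the database; that is a good follow-up point to raise yourself.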
Proactively identify bottlenecks and explain how your design handles them. This is where senior vs junior candidates diverge — juniors build a system; seniors build a resilient system.
Bottlenecks to identify and address:
Scaling levers to know:
Vertical scaling: bigger machines (quick but expensive, has limits)
Horizontal scaling: more machines (preferred, infinite in theory)
Caching: CDN → L1 cache (app) → L2 cache (Redis) → DB
Database:
  - Read replicas: offload read traffic from primary
  - Sharding: partition data across multiple DB instances
  - Index tuning: right indexes on the right columns
Async processing: move non-critical work to message queue (email, analytics)
CDN: static assets + geographically distributed caching
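For the sharding lever, interviewers often ask how keys are routed to shards and what happens when a shard is added. One common answer is a consistent-hash ring; the sketch below is a minimal version, where `HashRing`, the shard names and the 100 virtual nodes per shard are assumptions for illustration.

```python
import bisect
import hashlib
from typing import Dict, List


def _hash(value: str) -> int:
    """Map a string to a 64-bit point on the ring."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, nodes: List[str], vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._points: List[int] = []      # sorted ring positions
        self._owner: Dict[int, str] = {}  # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def get_node(self, key: str) -> str:
        if not self._points:
            raise ValueError("ring has no nodes")
        # First ring position clockwise from the key's hash, wrapping around at the end.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing(["db-1", "db-2", "db-3"])
keys = [f"short-code-{i}" for i in range(1000)]
before = {key: ring.get_node(key) for key in keys}

ring.add_node("db-4")
after = {key: ring.get_node(key) for key in keys}

moved = [key for key in keys if before[key] != after[key]]
print(f"{len(moved) / len(keys):.0%} of keys moved")           # expect roughly a quarter
print(all(after[key] == "db-4" for key in moved))              # True: keys only move to the new node
```

With plain `hash(key) % N` routing, changing N remaps most keys; with the ring, only the keys that now belong to the new shard move.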
Key trade-offs to demonstrate:
CDN: Cache static assets and API responses at edge nodes near users. Reduces latency from 150ms to <10ms for cached content.
Read replicas: Copy the primary DB to 1–3 read-only replicas. Route reads to replicas and writes to the primary. Read capacity grows roughly with the number of replicas, at the cost of replication lag.
Message queue: Decouple producer/consumer. Absorbs traffic spikes. Enables async processing (email, analytics, notifications).
Consistent hashing: Distribute data across nodes so adding/removing a node only remaps about 1/N of keys. Essential for cache clusters and sharding.
Bloom filter: Space-efficient probabilistic set membership test. Use to avoid DB lookups for definitely non-existent keys (zero false negatives, occasional false positives).
Database sharding: Partition data horizontally across multiple DB instances. Shard key choice is critical: a poor choice creates hot shards.
Circuit breaker: Stop calling a failing downstream service. Fail fast and return a cached/default response. Prevents cascading failures.
Leader election: One node is "leader" for writes; others are followers. Used in databases and distributed locks (ZooKeeper, etcd).
Event sourcing: Store every state change as an immutable event. Rebuild current state by replaying events. Audit log for free.
CQRS: Command Query Responsibility Segregation. Separate read/write models. Read model optimised for queries; write model for commands.
Saga pattern: Distributed transaction across microservices via a chain of local transactions with compensating actions on failure.
Rate limiting: Token bucket: steady-state rate + burst. Sliding window: precise per-minute rate. Implemented in API Gateway or app layer.
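A token bucket is short enough to write on a whiteboard. The sketch below is a minimal single-process version; `TokenBucket`, its parameter names and the injectable `clock` are assumptions for illustration, and a real gateway would usually keep this state in a shared store.

```python
import time
from typing import Callable


class TokenBucket:
    """Minimal token bucket: refills at rate_per_sec, allows bursts up to `burst` tokens."""

    def __init__(self, rate_per_sec: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate_per_sec
        self.capacity = float(burst)
        self.tokens = float(burst)   # start full so an initial burst is allowed
        self.clock = clock
        self.updated_at = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        elapsed = now - self.updated_at
        self.updated_at = now
        # Refill for the elapsed time, never exceeding the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Usage with a fake clock so the behaviour is deterministic.
fake_now = [0.0]
bucket = TokenBucket(rate_per_sec=2, burst=5, clock=lambda: fake_now[0])

print([bucket.allow() for _ in range(6)])  # [True, True, True, True, True, False]
fake_now[0] = 1.0                          # one second later: 2 tokens refilled
print([bucket.allow() for _ in range(3)])  # [True, True, False]
```

The two parameters map directly onto the description: `rate_per_sec` is the steady-state rate and `burst` is how far a client can briefly exceed it.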