Design a Chat System (like WhatsApp / Slack)

Real-time messaging at scale — WebSockets, message delivery, group fan-out, offline users, and presence

June 2026 20 min read Senior Engineers

A chat system is one of the richest system design problems — it touches real-time communication protocols, message delivery guarantees, storage at massive scale, group chat fan-out strategies, and offline/online presence. This walkthrough covers a WhatsApp-scale design: 500M daily active users, 100B messages per day.

New here? The System Design Interview Guide explains the framework used throughout this article.

1 Requirements Clarification

Functional

1:1 messaging (two users)
Group messaging (up to 500 members)
Message delivery receipts (sent / delivered / read)
Online presence indicator
Push notifications for offline users
Message history (persist indefinitely)
Media messages (images, video — out of scope for core design)

Non-Functional

500M DAU
Each user sends avg 50 messages/day → 500M × 50 = 25B messages/day
Message latency <100ms (sender to receiver)
At-least-once delivery guarantee
High availability: 99.99%
Eventual consistency for message ordering
Message history stored for lifetime of account

Key clarification: Does the system need end-to-end encryption (E2E)? For WhatsApp-style: yes (Signal Protocol). For Slack-style: no (server-side encryption only). This changes the server's ability to read/process messages.

2 Capacity Estimation

# Messages:
500M users × 50 messages/day = 25B messages/day
25B ÷ 86,400 sec = ~290,000 messages/sec → 290K msg/sec peak

# Storage per message:
sender_id (8B) + receiver_id (8B) + timestamp (8B) + content (100B avg) + metadata (20B)
≈ 144 bytes per message

# Daily storage:
25B messages × 144B = 3.6 TB/day → 1.3 PB/year

# WebSocket connections:
500M DAU, assume 50% active simultaneously = 250M concurrent WebSocket connections
Each connection needs a chat server → at 100K connections/server → 2,500 chat servers

# Presence updates:
Every online user sends heartbeat every 5s:
250M online users ÷ 5 sec = 50M presence events/sec → handled by dedicated presence service

Scale insight: 250M concurrent WebSocket connections is the defining constraint. This drives the entire architecture — you can't handle this with HTTP polling or a single server cluster.

3 Communication Protocol — Why WebSockets

Protocol	Direction	Latency	Use Case
HTTP Polling	Client → Server only	High (reconnect overhead)	Not suitable — wastes bandwidth, adds latency
HTTP Long Polling	Client → Server (hold open)	Medium	Acceptable fallback, but wastes connections
Server-Sent Events (SSE)	Server → Client only	Low	Good for one-way feeds (notifications), not chat
WebSocket	Full duplex (both ways)	Very low	Chosen — persistent bidirectional connection, sub-10ms overhead

WebSocket enables the server to push messages to clients immediately without the client polling. Once the WebSocket handshake completes (over HTTP Upgrade), the connection stays open — no TCP handshake overhead per message.

# WebSocket connection lifecycle:
Client → HTTP GET /ws HTTP/1.1
         Upgrade: websocket
         Connection: Upgrade
         Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==

Server → HTTP/1.1 101 Switching Protocols
         Upgrade: websocket
         Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=

# After handshake: full-duplex binary/text frames
# Both client and server can send at any time
# Keep-alive: ping/pong frames every 30s to detect dead connections

4 High-Level Design

Architecture Components

Mobile/Web Client

⇄ WS

Load Balancer (L4 sticky)

⇄ WS

Chat Server (stateful)

↓

Message Queue (Kafka)

→

Message Delivery Worker

→

Push Notification Service

↓

Cassandra (message storage)

Presence Service

↔

Redis (online status)

API Service (REST)

↔

User Service

↔

MySQL (user/group metadata)

Key Insight: Stateful Chat Servers

Chat servers are stateful — each WebSocket connection is pinned to a specific server. The load balancer uses sticky sessions (by user ID) to ensure reconnects go to the same server. This is different from stateless HTTP servers.

The consequence: you can't just "add more chat servers" without routing logic. When User A (on Server 1) sends a message to User B (on Server 7), Server 1 needs a way to reach Server 7.

Solution: pub/sub via Redis or Kafka. Chat servers subscribe to a user-specific channel. When a message arrives for User B, it's published to User B's channel — whichever server holds User B's connection picks it up and delivers it.

5 1:1 Message Flow (The Core Problem)

Sender → Receiver (both online)

# Step 1: Alice sends message to Bob via WebSocket
Alice → ChatServer-A: {
  "type": "message",
  "to": "bob_id",
  "content": "Hey Bob!",
  "client_msg_id": "alice-1234"   // dedup key — client generates
}

# Step 2: ChatServer-A assigns a global message ID and persists to Cassandra
msg_id = generate_snowflake_id()
cassandra.insert(conversation_id, msg_id, sender="alice", content="Hey Bob!", ...)
# Cassandra write is async — don't block the send path

# Step 3: ChatServer-A publishes to User B's Redis pub/sub channel
redis.publish("user:bob_id:messages", json.dumps({
  "msg_id": msg_id,
  "from": "alice_id",
  "content": "Hey Bob!",
  "timestamp": now()
}))

# Step 4: ChatServer-B (which holds Bob's WebSocket) receives from Redis pub/sub
# and immediately pushes to Bob's WebSocket connection
Bob receives message in <10ms

# Step 5: ChatServer-A sends ACK back to Alice
Alice ← ChatServer-A: { "type": "ack", "client_msg_id": "alice-1234", "msg_id": msg_id, "status": "sent" }
# Alice's client changes status: ✓ (sent)

# Step 6: Bob's client sends delivery receipt
Bob → ChatServer-B: { "type": "receipt", "msg_id": msg_id, "status": "delivered" }
ChatServer-B publishes receipt to Alice's channel → Alice sees ✓✓ (delivered)

# Step 7: Bob opens message — read receipt
Bob → ChatServer-B: { "type": "receipt", "msg_id": msg_id, "status": "read" }
→ Alice sees ✓✓ (blue) (read)

The client-generated client_msg_id is critical for deduplication. If Alice's WebSocket drops after sending but before receiving the ACK, she retries — the server detects the duplicate client_msg_id and doesn't store a second copy.

Sender Online, Receiver Offline

# Same steps 1–3 above. But when Redis pub/sub fires:
ChatServer-B tries to deliver → Bob has no active WebSocket connection

# Fallback flow:
1. Message Delivery Worker receives the Kafka event (all messages go to Kafka for durability)
2. Checks presence service: bob_id → OFFLINE
3. Queries Bob's device push tokens from User Service
4. Sends push notification via:
   - APNs (Apple Push Notification service) for iOS
   - FCM (Firebase Cloud Messaging) for Android
   - Web Push for browsers

# When Bob comes back online:
1. Bob's client connects WebSocket to ChatServer (any server via sticky LB)
2. Client sends "sync" request: { "last_msg_id": "msg-4320" }
3. Chat server fetches all messages with msg_id > 4320 from Cassandra
4. Sends missed messages in order
5. Bob's client sends delivery receipts for all received messages

6 Group Chat Fan-out

Group messages are the hardest part of chat system design. A message sent to a 500-member group must be delivered to all 500 members. This is fan-out — 1 write triggers N deliveries.

Fan-out Strategies

Strategy	How It Works	When to Use
Fan-out on write	When a message is sent, immediately publish to each member's Redis channel (500 pub/sub publishes)	Online-heavy groups — most members are online; direct delivery is fast
Fan-out on read	Store message once; each member pulls new messages on sync	Large groups (1000+) — most members are offline; lazy delivery is cheaper
Hybrid	Fan-out on write for online members; fan-out on read for offline members	Recommended — WhatsApp/Slack approach

Hybrid Fan-out Implementation

# Step 1: Alice sends message to Group-123 (500 members)
ChatServer-A receives message → assigns msg_id → persists to Cassandra

# Step 2: Fetch group members and their online status
members = group_service.get_members("group-123")   # [alice, bob, carol, ... 500 members]
online_members, offline_members = presence_service.partition(members)

# Step 3: Fan-out to online members via Redis pub/sub
for user_id in online_members:
    redis.publish(f"user:{user_id}:messages", message_payload)
# If 100 members are online: 100 Redis publishes in parallel

# Step 4: For offline members: record in "undelivered" table
# Next time they come online, sync fetches messages from Cassandra
cassandra.insert("group_undelivered", group_id="group-123",
                 offline_members=offline_members, msg_id=msg_id)

# Step 5: Send push notifications to offline members' devices
for user_id in offline_members:
    push_notification_service.send(user_id, preview=message[:50])

# Cassandra group messages schema:
CREATE TABLE group_messages (
  group_id    UUID,
  msg_id      BIGINT,    -- Snowflake ID (ordered by time)
  sender_id   UUID,
  content     TEXT,
  sent_at     TIMESTAMP,
  PRIMARY KEY (group_id, msg_id)   -- partition by group, cluster by time
) WITH CLUSTERING ORDER BY (msg_id DESC);

Hard limit on fan-out: For groups >500 members, fan-out on write becomes expensive. Twitter's approach for mega-celebrities: pre-compute feeds, use fan-out on read. For chat, cap groups at 500 (WhatsApp) or 1000 members and always use hybrid strategy.

7 Data Model & Storage

Why Cassandra? Message storage has specific characteristics that make Cassandra ideal: append-heavy writes, time-ordered reads (fetch messages in a conversation in time order), linear horizontal scaling, and multi-datacenter replication.

-- Direct messages (1:1 chats):
-- conversation_id = deterministic hash of (min(user1_id, user2_id), max(...))
CREATE TABLE messages (
  conversation_id  UUID,
  msg_id           BIGINT,       -- Snowflake ID: naturally time-ordered
  sender_id        UUID,
  recipient_id     UUID,
  content          TEXT,
  msg_type         TINYINT,      -- 0=text, 1=image, 2=video, 3=file
  sent_at          TIMESTAMP,
  status           TINYINT,      -- 0=sent, 1=delivered, 2=read
  PRIMARY KEY (conversation_id, msg_id)
) WITH CLUSTERING ORDER BY (msg_id DESC)   -- newest first
  AND gc_grace_seconds = 864000;            -- 10-day tombstone window

-- Group messages (same table, group_id as conversation_id):
-- PRIMARY KEY (group_id, msg_id)

-- User last-seen per conversation (for sync on reconnect):
CREATE TABLE conversation_cursors (
  user_id          UUID,
  conversation_id  UUID,
  last_read_msg_id BIGINT,
  PRIMARY KEY (user_id, conversation_id)
);

MySQL for User & Group Metadata

-- Relational structure works well for user/group metadata (smaller scale, complex queries)
CREATE TABLE users (
  id           BIGINT PRIMARY KEY,
  username     VARCHAR(64) UNIQUE NOT NULL,
  phone        VARCHAR(20) UNIQUE,
  display_name VARCHAR(128),
  avatar_url   TEXT,
  created_at   TIMESTAMP DEFAULT NOW()
);

CREATE TABLE group_members (
  group_id     BIGINT NOT NULL,
  user_id      BIGINT NOT NULL,
  role         ENUM('member','admin','owner') DEFAULT 'member',
  joined_at    TIMESTAMP DEFAULT NOW(),
  PRIMARY KEY (group_id, user_id),
  INDEX idx_user_groups (user_id)   -- "what groups is this user in?"
);

Media Storage

# Images/video are NOT stored in Cassandra — use object storage
# Flow:
1. Client requests pre-signed S3 upload URL from Media Service
2. Client uploads directly to S3 (bypasses chat server — no bandwidth bottleneck)
3. Media Service stores S3 key + generates CDN URL
4. Message record stores the CDN URL, not the file itself
5. Client renders image via CDN (global, <10ms)

# Media thumbnails: generated async by Lambda function triggered on S3 upload
# Retention: user-controlled; media can expire while message metadata remains

8 Presence System (Online / Last Seen)

# Online presence stored in Redis (TTL-based):
# Key: "presence:{user_id}" → Value: "online"  TTL: 10 seconds
# Client sends WebSocket heartbeat every 5 seconds:
Client → ChatServer: { "type": "heartbeat" }
ChatServer → redis.setex(f"presence:{user_id}", 10, "online")
# If no heartbeat for 10s → key expires → user is "offline"

# Checking another user's status:
def get_status(user_id: str) -> str:
    if redis.exists(f"presence:{user_id}"):
        return "online"
    last_seen = redis.get(f"last_seen:{user_id}")  # stored on disconnect
    return f"last seen {format_time(last_seen)}"

# On WebSocket disconnect:
def on_disconnect(user_id: str):
    redis.delete(f"presence:{user_id}")
    redis.set(f"last_seen:{user_id}", now().isoformat())
    # Publish offline event so friends' clients update status
    redis.publish(f"presence_events", json.dumps({"user_id": user_id, "status": "offline"}))

# Scaling presence:
# 50M presence events/sec (heartbeats) → Redis Cluster, 10 shards
# Each shard handles 5M events/sec → Redis handles ~500K ops/sec per node → fine
# Presence is eventually consistent — 10s delay in showing "offline" is acceptable

Privacy: "Last seen" should be user-controlled. Store the raw timestamp in Redis/MySQL, but only expose it based on the viewing user's relationship and the target user's privacy settings.

9 Scaling to 250M Concurrent Connections

Chat Server Capacity

# Each chat server handles:
# - 100,000 concurrent WebSocket connections (typical for Go/Rust; ~50K for Node.js)
# - CPU: mostly idle (event-driven, waiting for messages)
# - Memory: 10KB per connection × 100K = ~1GB RAM per server

# For 250M connections:
250M ÷ 100K connections/server = 2,500 chat servers

# Load balancer: L4 (not L7) for WebSockets
# L4 sticky sessions: hash(user_id) → always same server
# Health check: TCP ping every 5s

Service Discovery — How Servers Find Each Other

# Problem: Alice (ChatServer-A) sends message to Bob (ChatServer-B)
# How does ChatServer-A know which server holds Bob's connection?

# Solution 1: Redis pub/sub (recommended for <10K servers)
# Each user subscribes their chat server to "user:{user_id}" channel in Redis
# Publishing to that channel reaches the right server automatically
# Redis Cluster: 1000 channels per server × 2500 servers = 2.5M channels → Redis handles fine

# Solution 2: Consistent hash ring
# User IDs mapped to servers via consistent hashing
# ChatServer-A looks up "Bob → Server-B" in the hash ring → routes directly

# Solution 3: Service registry (ZooKeeper / etcd)
# Each server registers which user IDs it serves
# Other servers query the registry → direct gRPC call
# Overkill for most chat systems; used at extreme scale (Meta)

Message Ordering

# Challenge: messages from multiple senders in a group may arrive out of order
# Solution: Snowflake IDs are monotonically increasing (within a single generator node)
# For global ordering: use logical clocks (Lamport timestamps)

# In practice, "good enough" ordering:
# 1. Store Snowflake msg_id in Cassandra (time-ordered per partition)
# 2. Client sorts messages by msg_id before display
# 3. Clock skew (NTP) can cause occasional out-of-order — accept this (common in WhatsApp)
# 4. For "last message" preview in conversation list: use server-side timestamp, not client

10 Trade-offs & Design Decisions

At-least-once vs exactly-once delivery: We guarantee at-least-once — messages may be delivered twice in failure scenarios (client retries). The client_msg_id deduplication on the client side removes visible duplicates. Exactly-once would require distributed transactions — prohibitively expensive at this scale.
Cassandra vs MySQL for messages: Cassandra wins here because: (1) 3.6 TB/day write volume is far beyond a single MySQL primary, (2) Cassandra partitions perfectly by conversation — all messages for a conversation are co-located on the same node, (3) linear horizontal scale by adding nodes. MySQL would work for <10M messages/day.
Redis pub/sub vs Kafka for real-time delivery: Redis pub/sub is used for immediate in-memory delivery (sub-millisecond). Kafka is used as the durable backup — every message goes to Kafka regardless. If a chat server crashes before delivering, the Kafka consumer ensures eventual delivery.
WebSocket vs QUIC: WebSocket over TCP is the industry standard. QUIC (HTTP/3) would reduce connection setup time and handle packet loss better on mobile — but WebSocket support is near-universal while QUIC is still maturing in mobile SDKs.
E2E encryption impact: With E2E encryption (WhatsApp/Signal Protocol), the server stores ciphertext only. This prevents server-side spam filtering, message search, and backup cloud reads. Without E2E (Slack), the server can index message content for search, apply ML to detect spam, and provide keyword notifications.

What to Study Next

Design a Notification System — the push notification service used in this article
Design a Twitter/Social Feed — the fan-out problem at even larger scale
Design a Rate Limiter — essential for spam protection in chat
Consistent Hashing — used for routing WebSocket connections
All System Design Topics →