Design Twitter / Social Media News Feed

Fan-out strategies, the celebrity problem, timeline generation, feed ranking, and 500M DAU scale

A social media news feed is one of the most read-heavy, write-complex systems you can design. When Cristiano Ronaldo posts a tweet (taken here as 800M followers, a round worst-case figure), that single write must fan out to 800 million timelines, and every one of those users still expects their feed to load within 100 milliseconds.

The central tension of feed design: fan-out on write vs fan-out on read. This single decision shapes the entire architecture. This article explains both approaches, the celebrity (hotspot) problem, and how real systems (Twitter, Instagram, Facebook) solve it with a hybrid strategy.

Pre-reading: The System Design Interview Guide covers the general framework used here.

1 Requirements

Functional

  • Post a tweet (text, images, video)
  • Follow / unfollow users
  • View home timeline (tweets from people you follow, reverse-chronological)
  • Like, retweet, reply
  • View user profile timeline (all tweets by one user)
  • Trending topics / hashtags
  • Search tweets (out of scope for this design)

Non-Functional

  • 500M DAU
  • Read-heavy: 100:1 read:write ratio
  • 300M tweets/day written (~3,500 TPS)
  • 30B timeline reads/day (~350K RPS)
  • Feed load latency <100ms (p99)
  • High availability: 99.99%
  • Eventual consistency for feed (slight delay acceptable)

# Scale breakdown:
300M tweets/day ÷ 86,400 = ~3,500 tweets/sec (write)
Assume 200 followers avg per user:
  3,500 × 200 = 700K fan-out operations/sec (just for average users)
One celebrity tweet (50M followers) adds 50M timeline writes on top:
  delivering them within ~5 sec means a burst of 10M+ fan-outs/sec

30B feed reads/day ÷ 86,400 = ~350K feed reads/sec (read)
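
The arithmetic above can be reproduced with a few lines of Python. This is only a back-of-envelope sketch: the inputs are the figures from the requirements list, the 200-follower average is the assumption stated in the breakdown, and all rates are daily averages (peak traffic will be higher).

```python
SECONDS_PER_DAY = 86_400

def per_second(events_per_day: float) -> float:
    """Convert a daily event count into an average per-second rate."""
    return events_per_day / SECONDS_PER_DAY

tweets_per_day = 300_000_000          # from the non-functional requirements
feed_reads_per_day = 30_000_000_000   # from the non-functional requirements
avg_followers = 200                   # assumption stated in the breakdown

write_tps = per_second(tweets_per_day)
read_rps = per_second(feed_reads_per_day)
fanout_ops_per_sec = write_tps * avg_followers

print(f"tweet writes/sec: {write_tps:,.0f}")
print(f"feed reads/sec:   {read_rps:,.0f}")
print(f"fan-out ops/sec:  {fanout_ops_per_sec:,.0f}")
print(f"read:write ratio: {feed_reads_per_day / tweets_per_day:.0f}:1")
```

The exact averages (about 3,472 writes/sec and 347K reads/sec) are what the text rounds to 3,500 TPS and 350K RPS.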

2 The Core Question: Fan-Out on Write vs Read

When user A posts a tweet, it needs to appear in the feeds of all A's followers. "Fan-out" is the process of distributing that tweet to N followers. The key decision: when does the fan-out happen?

Fan-Out on Write (Push)

When a tweet is posted, immediately write it to every follower's timeline cache.

  • Read is fast — pre-built timeline in Redis
  • Write is slow — must push to N followers
  • Wastes writes for inactive followers
  • Celebrity problem: 800M writes per tweet
  • Follow/unfollow is complex (rebuild timelines)

Fan-Out on Read (Pull)

When a user opens their feed, query tweets from all accounts they follow and merge.

  • Write is trivially fast — just store the tweet
  • Read is slow — merge K sorted lists from DB
  • Scales poorly if user follows 10,000 accounts
  • Same tweet computed for every reader
  • No wasted work for inactive users

Twitter/Instagram's approach: hybrid. Fan-out on write for normal users (up to ~5K followers), fan-out on read for celebrities (more than ~5K followers), and merge both at read time. The 5K cutoff is this article's illustrative threshold; real systems tune it.
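
To make the hybrid rule concrete, here is a minimal sketch of the per-author decision and what it means for one reader. The function names (`delivery_mode`, `cost_for_reader`) are hypothetical, and the 5,000 cutoff is the article's illustrative value.

```python
CELEBRITY_THRESHOLD = 5_000  # the article's illustrative cutoff; a tuning parameter

def delivery_mode(follower_count: int, threshold: int = CELEBRITY_THRESHOLD) -> str:
    """Return 'push' (fan-out on write) or 'pull' (fan-out on read) for an author."""
    return "pull" if follower_count > threshold else "push"

def cost_for_reader(followed_follower_counts: list[int]) -> dict:
    """For one reader, count followed authors merged at read time vs. already pre-built."""
    modes = [delivery_mode(c) for c in followed_follower_counts]
    return {"pulled_at_read": modes.count("pull"), "prebuilt": modes.count("push")}

# A reader following three ordinary accounts and two celebrities
reader_cost = cost_for_reader([120, 4_999, 5_000, 5_001, 800_000_000])
print(reader_cost)
```

The read-time cost grows with the number of celebrities a user follows, not with the number of followers those celebrities have, which is what keeps the read path bounded.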

3 High-Level Design

=== Write Path (posting a tweet) ===
User → API Gateway → Tweet Service
                          ↓
                     PostgreSQL (persist tweet: id, user_id, content, created_at)
                          ↓
                     Kafka: "new-tweet" event
                          ↓
                   Fan-out Service (async)
                    ↙           ↘
         [Normal user]        [Celebrity user]
         Push tweet_id        Skip push (too many followers)
         to followers'        Tweet stays in DB only
         Redis timelines      Merged at read time

=== Read Path (loading home feed) ===
User → API Gateway → Feed Service
                          ↓
                     Redis: get pre-built timeline (list of tweet IDs)
                          ↓
                    Merge in celebrity tweets:
                    For each celebrity the user follows:
                      fetch latest N tweets from Tweet DB
                      interleave by timestamp
                          ↓
                    Hydrate tweet IDs → full tweet objects (Tweet Cache / DB)
                          ↓
                    Apply ranking (ML model or recency)
                          ↓
                    Return paginated feed (20 tweets per page)
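
The two paths above can be sketched as an in-memory toy. This is not the production design: dictionaries and deques stand in for PostgreSQL and Redis, a counter stands in for time-ordered IDs, there is no Kafka, and the class name `HybridFeed` and the tiny threshold are assumptions chosen so the example exercises both branches.

```python
from collections import defaultdict, deque
from itertools import count

CELEBRITY_THRESHOLD = 2   # tiny on purpose so the toy triggers both paths
TIMELINE_LIMIT = 800

class HybridFeed:
    def __init__(self):
        self._ids = count(1)                        # stand-in for time-ordered tweet IDs
        self.followers = defaultdict(set)           # author -> followers
        self.following = defaultdict(set)           # user -> authors
        self.tweets_by_author = defaultdict(list)   # stand-in for the tweet DB
        # stand-in for Redis lists capped at TIMELINE_LIMIT entries
        self.timelines = defaultdict(lambda: deque(maxlen=TIMELINE_LIMIT))

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author):
        return len(self.followers[author]) > CELEBRITY_THRESHOLD

    def post(self, author):
        """Write path: persist, then push only if the author is not a celebrity."""
        tweet_id = next(self._ids)
        self.tweets_by_author[author].append(tweet_id)
        if not self.is_celebrity(author):
            for follower in self.followers[author]:
                self.timelines[follower].appendleft(tweet_id)  # newest first
        return tweet_id

    def home_feed(self, user, limit=20):
        """Read path: pre-built timeline plus recent tweets pulled from celebrities."""
        ids = set(self.timelines[user])              # a set also removes duplicates
        for author in self.following[user]:
            if self.is_celebrity(author):
                ids.update(self.tweets_by_author[author][-limit:])
        return sorted(ids, reverse=True)[:limit]     # newest first

feed = HybridFeed()
for fan in ("a", "b", "c"):
    feed.follow(fan, "star")      # 3 followers > threshold, so "star" is pulled
feed.follow("a", "bob")           # 1 follower, so "bob" is pushed
bob_tweet = feed.post("bob")
star_tweet = feed.post("star")
print(feed.home_feed("a"), feed.home_feed("b"))
```

Note that the celebrity's tweet never enters any stored timeline; it only appears because `home_feed` merges it in on each read.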

4 Fan-Out Service — The Heart of the System

CELEBRITY_THRESHOLD = 5000   # tuning parameter, not a hard rule
TIMELINE_MAX_LEN = 800       # tweet IDs kept per timeline

def fan_out_tweet(tweet_id: str, author_id: str):
    author = user_service.get(author_id)

    # Celebrity check: above the threshold, followers pull the tweet at read time
    if author.follower_count > CELEBRITY_THRESHOLD:
        # Just record the tweet as "from celebrity"; no push to followers
        celebrity_tweets.add(author_id, tweet_id)
        return

    # Normal user: fan-out to all followers' Redis timelines
    followers = follower_service.get_all_followers(author_id)
    # followers = [user_id_1, user_id_2, ..., user_id_200]

    # Batch write to Redis (pipeline for efficiency)
    pipe = redis.pipeline()
    for follower_id in followers:
        key = f"timeline:{follower_id}"
        pipe.lpush(key, tweet_id)                   # prepend tweet_id to timeline list
        pipe.ltrim(key, 0, TIMELINE_MAX_LEN - 1)    # keep only latest 800 tweet IDs
    pipe.execute()   # one round-trip per Redis node instead of one per follower

    # Throughput: "new-tweet" events are partitioned by author_id in Kafka,
    # so tweets from different authors are fanned out by different consumers in parallel

def get_home_feed(user_id: str, page: int = 0, page_size: int = 20):
    # Fetch enough candidates to cover every page up to the requested one,
    # so merged pages neither repeat nor skip tweets
    window = (page + 1) * page_size

    # Step 1: Get pre-built timeline from Redis (tweet IDs from normal users)
    timeline_key = f"timeline:{user_id}"
    tweet_ids = redis.lrange(timeline_key, 0, window - 1)

    # Step 2: Pull recent tweet IDs from celebrities the user follows
    celebrities_followed = user_service.get_celebrity_follows(user_id)
    celebrity_tweet_ids = []
    for celeb_id in celebrities_followed:
        recent = tweet_store.get_recent_by_user(celeb_id, limit=window)
        celebrity_tweet_ids.extend(recent)

    # Step 3: Merge newest-first (K-way merge), then slice out the requested page
    all_ids = merge_by_timestamp(tweet_ids, celebrity_tweet_ids)
    page_ids = all_ids[page * page_size : window]

    # Step 4: Hydrate tweet IDs → full tweet objects
    tweets = tweet_cache.multiget(page_ids)

    # Step 5: Apply engagement ranking (ML or weighted recency)
    ranked = ranking_service.rank(tweets, user_id)
    return ranked

The timeline Redis list stores only tweet IDs (8 bytes each), not full tweet objects. 800 tweet IDs = 6.4KB per user. For 500M users: 500M × 6.4KB = 3.2TB of raw ID payload. Redis adds per-entry and per-key overhead on top, so treat this as a lower bound. That's large but manageable with a Redis Cluster.
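
The sizing above is easy to re-run with different assumptions. In this sketch the function name and the `overhead_factor` parameter are hypothetical; the default of 1.0 reproduces the raw-payload figure from the text, and a real deployment would measure its own overhead rather than guess it.

```python
def timeline_storage_bytes(users: int,
                           ids_per_timeline: int = 800,
                           bytes_per_id: int = 8,
                           overhead_factor: float = 1.0) -> int:
    """Estimate total bytes needed to hold every user's timeline of tweet IDs."""
    return int(users * ids_per_timeline * bytes_per_id * overhead_factor)

raw_bytes = timeline_storage_bytes(500_000_000)
print(f"raw ID payload: {raw_bytes / 1e12:.1f} TB")

# Shrinking the per-user cap is the cheapest lever: 400 IDs per timeline halves the total
half_cap_bytes = timeline_storage_bytes(500_000_000, ids_per_timeline=400)
print(f"with a 400-ID cap: {half_cap_bytes / 1e12:.1f} TB")
```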

5 Data Model

-- Tweets (PostgreSQL, sharded by user_id):
CREATE TABLE tweets (
  id          BIGINT PRIMARY KEY,   -- Snowflake ID (time-ordered)
  user_id     BIGINT NOT NULL,
  content     VARCHAR(280),
  media_ids   BIGINT[],             -- references to S3 objects
  reply_to_id BIGINT,               -- NULL for top-level tweets
  retweet_id  BIGINT,               -- NULL for original tweets
  like_count  INT DEFAULT 0,
  retweet_count INT DEFAULT 0,
  created_at  TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_tweets_user_time ON tweets(user_id, id DESC);  -- profile timeline

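-- Follows (sharded by follower_id). Shown here as one table plus an index;
-- at scale, a second copy sharded by followed_id answers "who follows me?"
-- without scanning every shard: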
CREATE TABLE follows (
  follower_id  BIGINT NOT NULL,
  followed_id  BIGINT NOT NULL,
  created_at   TIMESTAMPTZ DEFAULT NOW(),
  PRIMARY KEY (follower_id, followed_id)
);
CREATE INDEX idx_follows_followed ON follows(followed_id);  -- "who follows me?"

-- Social graph in Redis (for fast fan-out):
-- "followers:{user_id}" → Set of follower_ids (for normal users only; too large for celebs)
-- Redis SET: SMEMBERS "followers:user123" → all follower IDs

-- Timeline in Redis:
-- "timeline:{user_id}" → List of tweet_ids, max 800 items
-- LPUSH "timeline:user456" "tweet_id"   (push to front = most recent first)
-- LRANGE "timeline:user456" 0 19        (first 20 tweet IDs = page 1)
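
The tweets table relies on Snowflake IDs being time-ordered, which is what lets `ORDER BY id DESC` double as "newest first". The sketch below shows one way such an ID can be composed. It assumes the commonly described 64-bit layout (41-bit millisecond timestamp, 10-bit worker ID, 12-bit sequence) and epoch constant; the class and helper names are hypothetical, and a production generator needs more careful clock handling.

```python
import threading
import time

EPOCH_MS = 1288834974657   # custom epoch commonly associated with Snowflake (assumption)
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1

class SnowflakeGenerator:
    def __init__(self, worker_id: int, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        self._clock = clock          # injectable so the generator can be tested
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # 4096 IDs already issued this millisecond: wait for the next one
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            return (((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                    | (self.worker_id << SEQUENCE_BITS)
                    | self._sequence)

def timestamp_ms_of(snowflake_id: int) -> int:
    """Recover the creation time (ms since the Unix epoch) from an ID."""
    return (snowflake_id >> (WORKER_BITS + SEQUENCE_BITS)) + EPOCH_MS

gen = SnowflakeGenerator(worker_id=7)
first, second = gen.next_id(), gen.next_id()
print(first < second, timestamp_ms_of(first))
```

Because the timestamp occupies the high bits, sorting by ID sorts by creation time, and `created_at` can be recovered from the ID when needed.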

Tweet Cache (separate from timeline)

# Timeline stores only tweet IDs → need to hydrate to full objects
# Tweet cache: Redis hash, "tweet:{tweet_id}" → serialized tweet object
# TTL: 7 days (tweets older than 7 days rarely viewed on home feed)
# Size: 500 bytes per tweet × 300M tweets/day × 7 days = ~1 TB
# Use Redis Cluster: 20 shards × 50 GB each = 1 TB total

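# Cache miss → query PostgreSQL → write to cache
# Assumes the Redis client uses decode_responses=True (str keys and values) and that
# Tweet.from_dict() is a helper (hypothetical) that restores typed fields from strings
def get_tweet(tweet_id: str) -> Tweet | None:
    key = f"tweet:{tweet_id}"
    cached = redis.hgetall(key)          # returns an empty dict on a miss
    if cached:
        return Tweet.from_dict(cached)
    tweet = db.query("SELECT * FROM tweets WHERE id = %s", tweet_id)
    if tweet is None:
        return None                      # unknown or deleted tweet: nothing to cache
    pipe = redis.pipeline()              # set value and TTL together
    pipe.hset(key, mapping=tweet.to_dict())
    pipe.expire(key, 604800)             # 7 days
    pipe.execute()
    return tweet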
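
Hydrating a page of 20 IDs one at a time would cost 20 round-trips, so the read path batches the lookup. The sketch below illustrates that cache-aside pattern in memory. `TTLCache`, `multiget_tweets`, and the `load_from_db` callable are hypothetical stand-ins for Redis and PostgreSQL, not real client APIs.

```python
import time

class TTLCache:
    """Minimal in-memory cache with per-entry expiry (stand-in for Redis + EXPIRE)."""
    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]          # lazily evict expired entries
            return None
        return value

    def set(self, key, value):
        self._data[key] = (value, self._clock() + self._ttl)

def multiget_tweets(tweet_ids, cache, load_from_db):
    """Return tweets in the order of tweet_ids, loading all misses in one DB call."""
    found, misses = {}, []
    for tweet_id in tweet_ids:
        hit = cache.get(tweet_id)
        if hit is not None:
            found[tweet_id] = hit
        else:
            misses.append(tweet_id)
    if misses:
        for tweet_id, tweet in load_from_db(misses).items():
            cache.set(tweet_id, tweet)   # populate the cache for the next reader
            found[tweet_id] = tweet
    # IDs missing from both cache and DB (for example deleted tweets) are skipped
    return [found[t] for t in tweet_ids if t in found]

fake_db = {1: "tweet one", 2: "tweet two"}
db_calls = []

def load_from_db(ids):
    db_calls.append(list(ids))
    return {i: fake_db[i] for i in ids if i in fake_db}

cache = TTLCache(ttl_seconds=7 * 24 * 3600)
print(multiget_tweets([2, 1, 3], cache, load_from_db))   # one DB call for all misses
print(multiget_tweets([1, 2], cache, load_from_db))      # served from cache
```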

6 The Celebrity Problem in Depth

This is the question interviewers most expect you to address proactively. Elon Musk has 200M+ followers. If you fan-out his tweet on write:

# The math:
200M followers × 1 Redis LPUSH = 200M Redis operations
Each LPUSH takes ~50µs → 200M × 50µs = 10,000 seconds ≈ 2.8 hours
→ "tweet posted 2.8 hours ago still hasn't appeared in followers' feeds": unacceptable

# Even with parallel workers:
# 1000 parallel fan-out workers: 10,000 sec / 1000 = 10 sec per tweet
# Still too slow for real-time

# Twitter's publicly described hybrid approach (the 5K cutoff below is illustrative):
# 1. Check follower count on post:
#    - <5K followers: fan-out on write (push to followers' Redis timelines)
#    - >5K followers: skip push — only store tweet in DB

# 2. At feed read time:
#    a. Fetch user's pre-built Redis timeline (normal accounts)
#    b. For each celebrity the user follows:
#       SELECT id FROM tweets WHERE user_id = celeb_id ORDER BY id DESC LIMIT 20
#       (This is fast: PostgreSQL + index + tweet cache)
#    c. K-way merge, newest first, by tweet id (Snowflake ids are time-ordered);
#       K = number of celebs followed + 1 for the pre-built timeline
#    d. De-duplicate (edge case: retweet from normal user + original from celeb)

# Identifying celebrities:
# Store follower_count in Redis: "user:{id}:follower_count"
# Update asynchronously on each follow/unfollow
# When count crosses 5K threshold: stop pre-computing fan-out for this user
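
The serial and parallel fan-out times above follow from one formula. The sketch below re-derives them and inverts the formula to ask how many workers a latency target would need. It assumes perfectly parallel workers and the ~50µs per write used in the text; both function names are hypothetical.

```python
import math

def fanout_seconds(followers: int, per_write_us: float = 50.0, workers: int = 1) -> float:
    """Time to push one tweet to every follower, assuming perfectly parallel workers."""
    return followers * per_write_us / 1_000_000 / workers

def workers_needed(followers: int, target_seconds: float, per_write_us: float = 50.0) -> int:
    """Smallest worker count that meets the target under the same assumptions."""
    return math.ceil(fanout_seconds(followers, per_write_us) / target_seconds)

serial = fanout_seconds(200_000_000)
parallel = fanout_seconds(200_000_000, workers=1000)
print(f"serial:   {serial:,.0f} s ({serial / 3600:.1f} h)")
print(f"parallel: {parallel:.0f} s with 1000 workers")
print(f"workers for a 5 s target: {workers_needed(200_000_000, target_seconds=5.0):,}")
```

Real workers share Redis nodes and network links, so actual scaling is worse than this linear model suggests, which strengthens the case for pulling celebrity tweets instead.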

K-Way Merge at Read Time

# User follows 5 celebrities. For each, we have their recent 20 tweets.
# Need to merge 5 × 20 = 100 tweets and pick top-20 by timestamp.

import heapq

def k_way_merge_feeds(feeds: list[list[Tweet]], limit: int = 20) -> list[Tweet]:
    # feeds[0] = celeb_0's tweets (sorted desc by id/timestamp)
    # feeds[1] = celeb_1's tweets, etc.
    # Also include pre-built timeline as one of the feeds

    # heapq is a min-heap: negating the time-ordered tweet id pops the newest first.
    # The feed index i breaks ties, so Tweet objects are never compared.
    heap = []
    iterators = [iter(feed) for feed in feeds]

    for i, it in enumerate(iterators):
        tweet = next(it, None)
        if tweet is not None:
            heapq.heappush(heap, (-tweet.id, i, tweet))

    result = []
    seen_ids = set()   # de-duplicate tweets that appear in more than one feed
    while heap and len(result) < limit:
        _, i, tweet = heapq.heappop(heap)
        if tweet.id not in seen_ids:
            seen_ids.add(tweet.id)
            result.append(tweet)
        next_tweet = next(iterators[i], None)
        if next_tweet is not None:
            heapq.heappush(heap, (-next_tweet.id, i, next_tweet))

    return result

# Complexity: O(N log K) worst case, where N = total tweets and K = number of feeds
#   (celebrities followed + the pre-built timeline)
# For K=10 celebrities, N=200 tweets: trivial, <1ms
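
When the feeds hold bare tweet IDs rather than objects, the standard library can do the same merge. The sketch below uses `heapq.merge` with `reverse=True`, which expects every input to be sorted newest-first already; the function name `merge_tweet_ids` is hypothetical.

```python
import heapq

def merge_tweet_ids(feeds: list[list[int]], limit: int = 20) -> list[int]:
    """Merge newest-first ID lists into one newest-first page without duplicates."""
    result = []
    previous = None
    for tweet_id in heapq.merge(*feeds, reverse=True):   # lazy K-way merge
        if tweet_id == previous:
            continue              # duplicates are adjacent in sorted output
        result.append(tweet_id)
        previous = tweet_id
        if len(result) == limit:
            break                 # stop early: the rest of the inputs is never consumed
    return result

prebuilt_timeline = [9, 5, 1]     # from Redis, newest first
celeb_a = [8, 5, 2]               # tweet 5 also arrived via a retweet in the timeline
celeb_b = [7]
page = merge_tweet_ids([prebuilt_timeline, celeb_a, celeb_b], limit=5)
print(page)
```

Because the merge is lazy, asking for 20 items touches only about 20 elements plus one per feed, regardless of how long the inputs are.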

7 Feed Ranking

Twitter (now X) switched from pure reverse-chronological to algorithmic ranking. Here's how to design the ranking layer:

# Simple ranking: reverse chronological (timestamp desc) — easy, transparent
# This is the "latest" tab on Twitter

# Algorithmic ranking: ML model scores each tweet in the candidate set
# Input features:
#   - Tweet age (newer = higher base score)
#   - Author relationship strength (DMed them? replied? close friend?)
#   - Engagement velocity (likes/retweets in first 10 min)
#   - Media type (video gets boost)
#   - Topic relevance (user's interest graph vs tweet topics)
#   - Diversity penalty (same author showing too often)

# Ranking pipeline:
from collections import defaultdict

def rank_feed(candidate_tweets: list[Tweet], user_id: str) -> list[Tweet]:
    user_profile = user_model_store.get(user_id)   # interest graph, follow patterns

    scored = []
    for tweet in candidate_tweets:
        score = ranking_model.score(
            tweet=tweet,
            user_profile=user_profile,
            recency_weight=0.4,
            engagement_weight=0.3,
            relevance_weight=0.3
        )
        scored.append((score, tweet))

    # Sort on the score only: sorting the raw tuples would try to compare
    # Tweet objects whenever two scores tie
    scored.sort(key=lambda pair: pair[0], reverse=True)

    # Apply diversity: no author appears more than 3 times in top 20
    result = []
    author_count = defaultdict(int)
    for score, tweet in scored:
        if author_count[tweet.user_id] < 3:
            result.append(tweet)
            author_count[tweet.user_id] += 1
        if len(result) == 20:
            break
    return result

# ML model runs in <10ms for 100 candidate tweets (batch inference on GPU or optimized CPU)
# For offline training: Spark ML on interaction logs (clicks, dwells, likes, shares)
# Retrain daily; deploy as TensorFlow Serving endpoint
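
Before an ML model exists, the same three weights can drive a hand-written scoring function. The sketch below is one possible weighted-recency baseline, not the model described above: the half-life decay, the log-squashed engagement term, the double weight on retweets, and the `Candidate` fields are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass

@dataclass
class Candidate:
    tweet_id: int
    age_seconds: float
    likes: int
    retweets: int
    topic_affinity: float   # 0..1, how well the tweet matches the user's interests

def score(c: Candidate,
          recency_weight: float = 0.4,
          engagement_weight: float = 0.3,
          relevance_weight: float = 0.3,
          half_life_seconds: float = 3600.0) -> float:
    # 1.0 for a brand-new tweet, 0.5 after one half-life, 0.25 after two, ...
    recency = 0.5 ** (c.age_seconds / half_life_seconds)
    # Squash raw counts into [0, 1) so viral tweets cannot swamp the other terms
    engagement = 1.0 - 1.0 / (1.0 + math.log1p(c.likes + 2 * c.retweets))
    relevance = min(max(c.topic_affinity, 0.0), 1.0)
    return (recency_weight * recency
            + engagement_weight * engagement
            + relevance_weight * relevance)

candidates = [
    Candidate(1, age_seconds=60, likes=0, retweets=0, topic_affinity=0.1),
    Candidate(2, age_seconds=7200, likes=5000, retweets=900, topic_affinity=0.9),
    Candidate(3, age_seconds=86400, likes=10, retweets=1, topic_affinity=0.2),
]
ranked_ids = [c.tweet_id for c in sorted(candidates, key=score, reverse=True)]
print(ranked_ids)
```

A baseline like this is also useful as a fallback when the model-serving endpoint is slow or unavailable.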

8 Scaling the Full System

Fan-Out at Celebrity Scale (Async Workers)

# Why celebrities are pulled rather than pushed: suppose Ronaldo's tweet (800M followers) were fanned out on write.
# 800M fan-out writes needed → distribute across Kafka + workers

# Kafka topic: "fan-out-jobs"
# Message: { tweet_id, author_id, follower_batch_start, follower_batch_end }
# Fan-out service splits Ronaldo's 800M followers into batches of 10K
# 800M / 10K = 80,000 Kafka messages
# 1000 fan-out workers, each processes 80 messages = 80 × 10K = 800K Redis writes each
# Each Redis write: ~50µs × 800K = 40 seconds per worker
# All 1000 workers in parallel: ~40 seconds total for 800M fan-outs

# Acceptable? For chronological feed: 40-second lag is bad
# For algorithmic feed: 40-second lag is often OK (the tweet ranks well when it appears)
# Twitter's delivery target is often quoted as about 5 seconds; hitting that here would take roughly 8x the workers (~8,000)
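
The batching step above can be sketched as a generator that yields one job per follower range. The field names follow the message format in the text; treating the range as a half-open offset window into the author's follower list is an assumption, as are the names `FanoutJob` and `make_fanout_jobs`.

```python
from dataclasses import dataclass
from typing import Iterator

@dataclass(frozen=True)
class FanoutJob:
    tweet_id: int
    author_id: int
    follower_batch_start: int   # inclusive offset into the author's follower list
    follower_batch_end: int     # exclusive offset

def make_fanout_jobs(tweet_id: int, author_id: int, follower_count: int,
                     batch_size: int = 10_000) -> Iterator[FanoutJob]:
    """Split one tweet's fan-out into independent jobs that workers can run in parallel."""
    for start in range(0, follower_count, batch_size):
        end = min(start + batch_size, follower_count)   # last batch may be short
        yield FanoutJob(tweet_id, author_id, start, end)

small = list(make_fanout_jobs(tweet_id=1, author_id=2, follower_count=25_000))
print(small)

# Generated lazily, so even the 800M-follower case never holds all jobs in memory
job_count = sum(1 for _ in make_fanout_jobs(1, 2, 800_000_000))
print(job_count)
```

Each job is self-describing, so a failed batch can be retried on its own without redoing the rest of the fan-out.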

PostgreSQL Sharding for Tweets

# 300M tweets/day × 365 × 5 years = 547.5B tweets
# At 500 bytes each: ~274 TB of tweet data
# Way beyond a single PostgreSQL instance

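# Shard by user_id (hash into 1024 logical buckets, then map bucket ranges to physical shards):
# Shard 0: user_ids where hash(user_id) % 1024 in [0, 255]
# → profile timeline queries are single-shard (all tweets by user in same shard)
# → adding shards means moving whole buckets, not rehashing every user
# Home feed reads come from pre-computed Redis timelines → avoids cross-shard joins

# Alternative: shard by hash of tweet_id
# Pros: uniform write distribution (hot inserts spread evenly)
# Cons: per-user queries (profile timeline, celebrity pull at read time) must check all shards → complex

# Use Citus (PostgreSQL extension) to distribute tables across nodes. Amazon Aurora scales
# reads and storage per cluster but, in its standard configuration, keeps a single writer,
# so it still needs application-level sharding at this size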
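
The bucket-then-shard routing can be sketched in a few lines. This assumes 1024 logical buckets split evenly across 4 physical shards, which matches the "Shard 0 = buckets 0 to 255" example; the SHA-256 based hash and both function names are illustrative choices, not something the design prescribes.

```python
import hashlib

LOGICAL_BUCKETS = 1024

def bucket_for_user(user_id: int) -> int:
    """Map a user to a logical bucket with a hash that is stable across processes."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % LOGICAL_BUCKETS

def shard_for_bucket(bucket: int, physical_shards: int = 4) -> int:
    """Map a logical bucket to a physical shard using contiguous, equal-sized ranges."""
    if LOGICAL_BUCKETS % physical_shards != 0:
        raise ValueError("physical_shards must divide the bucket count evenly")
    return bucket // (LOGICAL_BUCKETS // physical_shards)

def shard_for_user(user_id: int, physical_shards: int = 4) -> int:
    return shard_for_bucket(bucket_for_user(user_id), physical_shards)

for uid in (1, 42, 123_456_789):
    print(uid, bucket_for_user(uid), shard_for_user(uid))
```

Going from 4 to 8 shards only changes `shard_for_bucket`: each shard hands half of its bucket range to a new node, while `bucket_for_user` and every stored bucket assignment stay the same.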

CDN for Media

# Tweets often contain images/video (GIFs, video clips)
# 300M tweets/day × 40% with media × 100KB avg image = ~12 TB/day of media uploads
# Store in S3; serve via CloudFront CDN

# Tweet content flow:
# 1. Client uploads media directly to S3 (pre-signed URL from Media Service)
# 2. S3 trigger → Lambda → generate thumbnails + transcode video → store variants
# 3. Tweet object stores CDN URLs: { "thumb": "cdn.x.com/img/abc_thumb.jpg", "full": "..." }
# 4. CDN serves images globally at edge nodes → <10ms latency for cached media

9 Trade-offs & Design Decisions

Decision | Choice | Trade-off
Fan-out strategy | Hybrid (write for normal, read for celebrity) | More complex than either pure strategy, but solves the celebrity write amplification AND the slow read problem for normal users.
Celebrity threshold | 5,000 followers | Lower threshold = more accounts treated as celebrities (more read-time merges, slower feed load). Higher = more fan-out writes. 5K is a tuning parameter, not a hard rule.
Timeline storage | Redis List (tweet IDs only) | Memory-efficient (8 bytes per tweet ID vs 500 bytes for full object). Trade-off: requires a second network call to hydrate IDs → tweet objects. Mitigated by tweet cache hit rate (~99% for recent tweets).
Feed freshness | Eventual consistency (seconds delay OK) | Relaxed consistency enables async fan-out via Kafka. A tweet appearing 5 seconds late is rarely noticed by users. Strict real-time consistency would require synchronous fan-out, which is impractical at 800M follower scale.
Ranking | Algorithmic (ML-based) | Better engagement, but less transparent ("why am I seeing this?"). Reverse-chronological is simpler and predictable. Twitter offers both tabs, an acknowledgment that neither is universally better.
