Consistent hashing is one of the most important concepts in distributed systems — and one of the most frequently tested in system design interviews. It appears in Redis Cluster, Cassandra, Amazon DynamoDB, Akamai CDN, and many other systems you'll use or design.
This article builds from first principles: why naive modulo hashing breaks when you add or remove servers, how the hash ring solves the problem, and how virtual nodes handle uneven distribution.
Suppose you have 3 cache servers (A, B, C) and want to distribute keys evenly. The simplest approach:
# Naive hashing: server = hash(key) % N
# N = number of servers
def get_server(key, servers):
return servers[hash(key) % len(servers)]
# Works great with N=3:
hash("user_123") % 3 = 1 → Server B
hash("user_456") % 3 = 0 → Server A
hash("user_789") % 3 = 2 → Server C
# N changes from 3 to 4
# Now: hash(key) % 4
hash("user_123") % 3 = 1 → was Server B
hash("user_123") % 4 = 3 → now Server D ← MOVED
hash("user_456") % 3 = 0 → was Server A
hash("user_456") % 4 = 0 → still Server A ← OK
hash("user_789") % 3 = 2 → was Server C
hash("user_789") % 4 = 1 → now Server B ← MOVED
# For N=3 → N=4: approximately 75% of all keys get remapped to different servers
# In a cache: 75% of requests suddenly miss (cache invalidated) → massive DB load spike
# In a database cluster: 75% of data must migrate to a new shard → huge I/O storm
# This is why naive modulo hashing doesn't work for distributed systems:
# Adding or removing ONE server remaps ~(N-1)/N of all keys
K/N keys are remapped (where K = total keys, N = total servers). For 1 server added to a 10-server cluster: only ~10% of keys move. Naive hashing remaps ~91% of keys.Consistent hashing maps both servers and keys onto a circular ring (0 to 2³²-1 for 32-bit hashes, or 0 to 2¹²⁸-1 for MD5).
class ConsistentHashRing:
def __init__(self, servers=None, replicas=150):
self.replicas = replicas # virtual nodes per server
self.ring = {} # position → server_name
self.sorted_keys = [] # sorted list of positions
for server in (servers or []):
self.add_server(server)
def _hash(self, key: str) -> int:
return int(hashlib.md5(key.encode()).hexdigest(), 16)
def add_server(self, server: str):
for i in range(self.replicas):
key = f"{server}:{i}" # virtual node: "Server_A:0", "Server_A:1", etc.
position = self._hash(key)
self.ring[position] = server
bisect.insort(self.sorted_keys, position)
def remove_server(self, server: str):
for i in range(self.replicas):
key = f"{server}:{i}"
position = self._hash(key)
del self.ring[position]
self.sorted_keys.remove(position)
def get_server(self, key: str) -> str:
if not self.ring:
return None
position = self._hash(key)
# Find the first position in sorted_keys >= position (clockwise)
idx = bisect.bisect_left(self.sorted_keys, position)
if idx == len(self.sorted_keys):
idx = 0 # wrap around (circular ring)
return self.ring[self.sorted_keys[idx]]
# Usage:
ring = ConsistentHashRing(["Server_A", "Server_B", "Server_C"], replicas=150)
ring.get_server("user_123") # → "Server_A"
ring.get_server("user_456") # → "Server_B"
ring.add_server("Server_D")
ring.get_server("user_456") # → may now be "Server_D" (only moved if D is closer)
With only 3 real server positions on the ring, distribution can be very uneven by chance:
Each physical server is assigned multiple positions on the ring (virtual nodes). With 150 virtual nodes per server, each server ends up with approximately 1/N of the ring, regardless of where real servers happen to hash.
# What if Server_A has 2× the RAM of Server_B?
# Assign 2× as many virtual nodes to Server_A:
ring = ConsistentHashRing()
ring.add_server("Server_A", weight=2) # 300 virtual nodes (2 × 150)
ring.add_server("Server_B", weight=1) # 150 virtual nodes
ring.add_server("Server_C", weight=1) # 150 virtual nodes
# Result:
# Server_A gets 300/600 = 50% of keys (2× the data → needs 2× the memory ✓)
# Server_B gets 150/600 = 25%
# Server_C gets 150/600 = 25%
# This is exactly how Cassandra's vnodes work:
# Each node's token count is proportional to its disk/memory capacity
Most distributed systems don't store data on just one server — they replicate it for fault tolerance. With consistent hashing, replication is elegant:
# Replication factor RF = 3 (store data on 3 servers)
# Rule: store the key on the server it hashes to, PLUS the next (RF-1) clockwise servers
def get_servers_for_key(ring, key, rf=3):
"""Returns list of RF servers where this key should be stored."""
primary_server = ring.get_server(key)
servers = [primary_server]
# Walk clockwise to find next RF-1 unique servers
position = ring._hash(key)
idx = bisect.bisect_left(ring.sorted_keys, position)
seen = {primary_server}
while len(servers) < rf:
idx = (idx + 1) % len(ring.sorted_keys)
server = ring.ring[ring.sorted_keys[idx]]
if server not in seen: # skip virtual nodes of same physical server
servers.append(server)
seen.add(server)
return servers # e.g., ["Server_A", "Server_C", "Server_B"]
# Example:
# key "user_123" → primary: Server_A → replicas: Server_C, Server_B
# Written to all three. Reads can come from any replica.
# If Server_A fails: read from Server_C or Server_B.
# When Server_A recovers: hinted handoff or anti-entropy (gossip) repairs it.
| System | How It Uses Consistent Hashing | Key Detail |
|---|---|---|
| Amazon DynamoDB | Partition key → hash → ring → node | Original inspiration (Dynamo paper 2007). Uses virtual nodes for load balancing. |
| Apache Cassandra | Partition key hashed to token (64-bit int), mapped to ring | Each node owns a range of tokens. VNodes (256 per node default). Replication walks clockwise. |
| Redis Cluster | Uses "hash slots" (16,384 slots) — similar concept but fixed slots, not a ring | CRC16(key) % 16384 = slot. Slots assigned to nodes. Remapping = move slot assignment. |
| Akamai CDN | Route request to nearest edge server using consistent hashing | Adding a new PoP only steals cache from adjacent PoPs on the ring. |
| HAProxy / nginx | Consistent hashing for sticky load balancing (same client → same upstream) | balance uri consistent in HAProxy. Used for upstream cache servers. |
| Memcached clients | Client-side consistent hashing to pick which Memcached server holds each key | libmemcached, Spymemcached implement consistent hashing natively. |
# Redis uses a variant: 16,384 fixed hash slots (not a continuous ring) # CRC16(key) % 16384 = slot number (0–16383) # Slots are assigned to master nodes (not dynamically, but by configuration) # 3-node cluster example: # Master A: slots 0–5460 (~33%) # Master B: slots 5461–10922 (~33%) # Master C: slots 10923–16383 (~33%) # Resharding: to add Master D, move some slots from A, B, C to D # Only data in the moved slots needs to migrate (keys in other slots stay put) # Key difference from pure consistent hashing: # Redis Cluster: slot ranges assigned manually (predictable but less flexible) # Cassandra: token ranges assigned dynamically (more automatic, with VNodes)
Bring up consistent hashing whenever you're:
"To distribute [data/requests] across N [servers/shards], I'll use consistent hashing. Each server is mapped to multiple positions on a hash ring using virtual nodes. When a key comes in, we hash it to a ring position and find the first clockwise server. The benefit over naive modulo hashing: when we add or remove a server, only ~1/N of keys are remapped instead of most of them — so there's no cache invalidation storm or mass data migration when we scale."
| Approach | Keys Remapped on Scale | Hot Spot Risk | Complexity | Used By |
|---|---|---|---|---|
| Modulo hashing | (N-1)/N ≈ 90% | Low | Very low | Single-node caches, toy systems |
| Consistent hashing | ~1/N ≈ 10% | Low (with VNodes) | Medium | Cassandra, DynamoDB, Redis Cluster, CDNs |
| Range sharding | Minimal (explicit) | High (sequential keys) | Low | HBase, Bigtable, DynamoDB (sort key ranges) |
| Directory sharding | Zero (lookup table) | Low | High (lookup is a SPOF) | Custom sharding in legacy systems |