CAP Theorem Explained with Real Examples

Consistency, Availability, Partition Tolerance — what they actually mean and how to apply them in interviews

The CAP theorem is one of the most cited — and most misunderstood — concepts in distributed systems. Many engineers can recite it, but struggle to apply it correctly when choosing a database or explaining a design trade-off in an interview.

This article explains CAP theorem clearly, with real database examples, common misconceptions, and exactly how to use it in system design discussions.

Why it matters in interviews: Every system design problem involves a trade-off between consistency and availability. Interviewers expect you to make this explicit and justify your choice.

1 What is the CAP Theorem?

Proposed by Eric Brewer in 2000, the CAP theorem states:

A distributed system can guarantee at most 2 of the following 3 properties simultaneously.

C — Consistency

Every read receives the most recent write (or an error). All nodes see the same data at the same time.

A — Availability

Every request receives a response (not an error) — but it might be stale data.

P — Partition Tolerance

The system continues to operate despite network partitions (some messages between nodes are dropped or delayed).

The key insight (often misunderstood): In a real distributed system, network partitions are unavoidable — networks fail, packets are dropped. Therefore, P is not optional. You must tolerate partitions. The real choice is between C and A when a partition occurs: do you return stale data (AP) or return an error (CP)?

2 Each Property Explained with a Bank Example

Scenario: Bank with two servers (Node A in Mumbai, Node B in Delhi)

# Your account has $1,000.
# You withdraw $200 from an ATM in Mumbai → Node A records balance = $800.
# The network between Mumbai and Delhi goes down (partition occurs).
# Your partner tries to withdraw $500 from an ATM in Delhi → Node B still shows $1,000.

What happens depends on the system's CAP choice:

CP System (choose Consistency over Availability)

# Node B detects it cannot reach Node A → network partition
# Node B refuses to serve the withdrawal: "Service unavailable, try later"
# Your partner cannot withdraw during the outage
# But: when the network recovers, both nodes are in sync — no overdraft

# Real-world CP systems: HBase, Zookeeper, Redis (single-node), MongoDB (with w:majority)
# Use case: banking, inventory management, booking systems
# Trade-off: Users may get errors during network issues, but data is always correct

AP System (choose Availability over Consistency)

# Node B detects it cannot reach Node A → network partition
# Node B still serves the withdrawal: returns $1,000 balance → allows $500 withdrawal
# Your partner gets $500
# When network recovers: Node A says $800, Node B says $500 → CONFLICT
# System must resolve the conflict (last-write-wins, merge, or manual review)

# Real-world AP systems: Cassandra, DynamoDB, CouchDB, Riak
# Use case: shopping carts, social media likes, product catalogs
# Trade-off: Always available, but may show stale data; conflicts need resolution

CA System (Consistency + Availability, no Partition Tolerance)

# This only works if there is never a network partition → single-machine systems
# MySQL, PostgreSQL on a single node = "CA" (but not truly distributed)
# In a cluster: you must choose CP or AP when partition occurs

# The CAP "CA" category is largely theoretical for distributed systems
# Don't say "we'll use CA" in an interview — it means single-node, which doesn't scale

3 What Consistency and Availability Actually Mean

Consistency — not all or nothing

CAP's "Consistency" is specifically linearizability (the strongest form): after a write completes, any subsequent read from any node must return that write's value. This is different from the "C" in ACID (which means data integrity constraints are maintained).

In practice, consistency is a spectrum:

LevelGuaranteeExample
Strong (Linearizable)All reads see the latest write instantly, globallyGoogle Spanner, etcd
SequentialAll nodes agree on the order of operations, but reads may be slightly behindZooKeeper
CausalCausally related operations appear in orderMongoDB (causal sessions)
EventualAll nodes will eventually converge to the same value (no timing guarantee)Cassandra, DynamoDB
Read-your-writesAfter writing, you always read your own write (others may see old value)Most social apps
In system design interviews, when you say "eventual consistency," you mean: "If no new writes occur, all replicas will eventually converge to the same value." For most non-financial applications, this is perfectly acceptable.

Availability — not 100% or 0%

CAP's "Availability" means: every request to a non-failed node gets a response. In practice, availability is measured as a percentage uptime. A CP system can still be "highly available" (99.99%) — it just may fail requests during the brief period of a partition, which is rare.

4 Real Databases Classified

PostgreSQL / MySQL

Single node or synchronous replication

CP (single region)
Google Spanner

TrueTime API for global strong consistency

CP (global)
HBase

Strong consistency via HDFS, sacrifices availability during failover

CP
ZooKeeper / etcd

Distributed coordination, leader election

CP
Redis (Cluster)

Async replication → AP; single node → CP

AP (cluster)
Cassandra

Tunable consistency. Default AP; stronger with QUORUM

AP (default)
DynamoDB

Eventual consistency by default; strong consistency per-item available

AP (default)
CouchDB / Riak

Multi-master, conflict resolution via CRDT/vector clocks

AP
MongoDB

Replica set with writeConcern=majority = CP; default = AP

AP / CP (tunable)
Elasticsearch

AP by default; NRT (near real-time) indexing

AP
Apache Kafka

Strong ordering within partition; replication lag = AP

CP (within partition)
SQLite

Single-file, single-process — no partition tolerance

CA (not distributed)
Important nuance: Most modern databases let you tune the consistency level per operation. Cassandra can act as CP if you use QUORUM reads and writes. DynamoDB has a "strongly consistent reads" option. The default matters most for interview discussions.

5 How to Apply CAP in System Design Interviews

When you choose a database or describe a distributed component, always articulate your CAP trade-off explicitly. Here's a template:

"For the [component], I'm choosing [CP/AP] because:
 - This system values [consistency/availability] more because [business reason]
 - During a network partition, it's better to [return an error / return stale data]
   because [user impact is acceptable / data correctness is critical]"

Decision Guide by System Type

SystemChooseReasoning
Banking / PaymentsCPCannot risk showing wrong balance. Error is better than data corruption.
Inventory / BookingCPOverselling = revenue loss. Must prevent double-booking.
Shopping CartAPStale cart contents are annoying but harmless. Availability matters more.
Social Media FeedAPUsers can tolerate seeing a post 5 seconds late. Always return a feed.
URL Shortener redirectsAPSlightly stale cache is fine. Never return 503 for a redirect.
User authenticationCPStale session data = security risk. Better to force re-login.
DNSAPDNS must always respond. Temporary stale records are acceptable.
Leader election (ZooKeeper)CPTwo nodes thinking they're leader = split-brain disaster.

Sample Interview Answer

# Q: "Why did you choose Cassandra for the message store in your chat system?"

# A: "Messages are write-heavy (billions per day) and read in time-ordered batches
# per conversation — a perfect fit for Cassandra's partition key + clustering key model.

# For the CAP trade-off: I'm choosing AP (availability over consistency).
# If two users write messages simultaneously and there's a brief network partition,
# it's acceptable for one user to briefly see messages in slightly wrong order.
# The system will converge within milliseconds once the partition heals.

# For financial systems or authentication, I'd choose CP — but for chat,
# message delivery reliability and speed matter more than perfect global ordering.

# I'm using QUORUM reads/writes for the 'unread message count' feature though,
# since showing the wrong badge number (0 vs 3 unread) would be annoying and visible.
# This is Cassandra's strength — tunable consistency per operation."

6 Beyond CAP: The PACELC Model

CAP only describes behaviour during partitions. Most of the time (99.9%+), networks are healthy — but there's still a latency vs consistency trade-off. The PACELC model (Daniel Abadi, 2012) captures this more completely:

PACELC:
If Partition (P): trade off Availability (A) vs Consistency (C)
Else (E, no partition): trade off Latency (L) vs Consistency (C)

Reading: "PAC" then "ELC"

Database examples:
- DynamoDB: PA/EL (available during partition; low latency over consistency normally)
- BigTable/HBase: PC/EC (consistent during partition; consistent normally, higher latency)
- Cassandra: PA/EL (AP; tunable, defaults to low latency)
- VoltDB/Spanner: PC/EC (strict consistency everywhere, higher latency)
- PostgreSQL (single region): PC/EC (consistent, accepts latency cost of sync replication)
PACELC is not required for most interviews, but mentioning it shows depth. Say: "PACELC extends CAP by also capturing the latency-consistency trade-off in normal operation, not just during partitions."

7 Common Misconceptions to Avoid in Interviews

  • "We'll pick CA (Consistency + Availability)" — This doesn't exist in distributed systems. P is mandatory. If you say CA, the interviewer knows you've misunderstood the theorem.
  • "AP means we sacrifice all consistency" — No. AP means eventual consistency — data will converge. It doesn't mean random or wrong data forever.
  • "CP databases are always more correct" — Not necessarily. CP databases that can't reach quorum will return errors — which may cause worse user experience than stale data in an AP system.
  • "CAP picks are permanent" — Most databases let you tune consistency level per operation. You can use DynamoDB (AP by default) with strongly consistent reads when you need it.
  • "ACID and CAP are the same thing" — They're different concepts. ACID (Atomicity, Consistency, Isolation, Durability) is about transaction correctness within a database. CAP is about distributed system behaviour under network failures. "Consistency" means different things in each.

8 Quick Reference Cheat Sheet

CAP Theorem:
  C = Linearizability (all nodes see same data at same time)
  A = Every request gets a response (may be stale)
  P = System works despite network partition (P is ALWAYS required)
  → Real choice: CP (error on partition) or AP (stale data on partition)

CP Databases: HBase, ZooKeeper, etcd, Spanner, PostgreSQL (single-node)
AP Databases: Cassandra, DynamoDB, CouchDB, Riak, Redis (cluster)
Tunable:      MongoDB (writeConcern), Cassandra (consistency level), DynamoDB (read type)

Consistency Levels (weakest → strongest):
  Eventual → Read-your-writes → Session → Bounded staleness → Strong (Linearizable)

When to choose CP: banking, inventory, booking, auth, leader election, config stores
When to choose AP: social feeds, shopping carts, analytics, DNS, autocomplete, search

Interview one-liner:
"Since network partitions are unavoidable in distributed systems, the real trade-off
 is CP (return error on partition) vs AP (return possibly stale data on partition).
 For this [component], I choose [CP/AP] because [business justification]."

What to Study Next