Kubernetes StatefulSets: Running Databases and Stateful Apps (2026)
Running stateful workloads on Kubernetes requires a fundamentally different mental model than stateless services. Databases, message brokers, and distributed caches have identity — a specific pod is not interchangeable with another because it holds unique data or occupies a specific role in a cluster quorum. Kubernetes StatefulSets were designed precisely for this. This guide covers the StatefulSet contract, how persistent storage is provisioned per-pod, how headless services enable stable DNS identities, and when to reach for the Operator pattern instead of managing StatefulSets directly.
Table of Contents
- StatefulSet vs Deployment: Key Differences
- Stable Network Identity and Headless Services
- PVC Templates: Per-Pod Persistent Storage
- Ordered Pod Management and Scaling
- Update Strategies
- Real-World Use Cases: Cassandra, Kafka, Elasticsearch
- The Operator Pattern: Beyond Raw StatefulSets
- Troubleshooting StatefulSets
- Frequently Asked Questions
StatefulSet vs Deployment: Key Differences
Understanding what a StatefulSet guarantees — and what it does not — is essential before deploying any stateful workload:
| Attribute | Deployment | StatefulSet |
|---|---|---|
| Pod names | ReplicaSet hash plus random suffix (pod-7c9d8f6b5-abc12) | Ordinal index (pod-0, pod-1) |
| Pod identity | Interchangeable | Stable, preserved across restarts and rescheduling |
| DNS hostname | Service virtual IP only (no per-pod records) | Stable per-pod DNS via headless service |
| Storage | Shared or ephemeral | Dedicated PVC per pod, retained on delete by default |
| Startup order | Parallel | Sequential (0, 1, 2...) by default; Parallel optional |
| Scale-down order | No ordering guarantee | Reverse sequential (...2, 1, 0) |
| Rolling update | Replaces pods within maxSurge/maxUnavailable limits | Replaces pods one at a time in reverse ordinal order |
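The ordering rows above map to two StatefulSet fields whose defaults are rarely written out. A minimal sketch of the relevant spec fragment with those defaults made explicit (illustrative only; the rest of the spec is omitted):

```yaml
# Fragment of a StatefulSet spec; other required fields omitted.
spec:
  # Default. Pods are created in order 0, 1, 2..., each only after the previous
  # one is Running and Ready, and scaled down in reverse order.
  # The alternative, Parallel, creates and terminates pods without waiting.
  podManagementPolicy: OrderedReady
  updateStrategy:
    # Default. Pods are updated from the highest ordinal down, one at a time.
    type: RollingUpdate
```

Note that podManagementPolicy only governs scaling; switching it to Parallel does not change the ordering of rolling updates.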
Stable Network Identity and Headless Services
A StatefulSet requires a headless service (a Service with clusterIP: None), which you create yourself and reference from the StatefulSet's serviceName field. This service does not load-balance; instead, DNS publishes an individual A record for each pod, enabling peer-to-peer communication by stable hostname.
apiVersion: v1
kind: Service
metadata:
  name: cassandra        # Must match the StatefulSet's serviceName
  namespace: data
  labels:
    app: cassandra
spec:
  clusterIP: None        # Headless: no virtual IP, per-pod DNS records
  selector:
    app: cassandra
  ports:
  - port: 9042
    name: cql
  - port: 7000
    name: intra-node
With this headless service and a StatefulSet named cassandra in namespace data, each pod gets a stable DNS entry:
cassandra-0.cassandra.data.svc.cluster.local
cassandra-1.cassandra.data.svc.cluster.local
cassandra-2.cassandra.data.svc.cluster.local
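These DNS names resolve directly to the pod IP. When a pod is rescheduled to a different node, it keeps its hostname and the A record is updated to point at the new pod IP. Applications can therefore hardcode these hostnames for peer discovery, as long as they re-resolve the name rather than caching the IP. This is a critical feature for distributed databases like Cassandra that use gossip protocols.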
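To confirm these records from inside the cluster, a throwaway Pod can query both a per-pod name and the headless Service name. In this sketch the Pod name dns-check is hypothetical, and busybox:1.28 is assumed only because its nslookup is commonly used for cluster DNS debugging:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: dns-check          # Hypothetical throwaway Pod for DNS verification
  namespace: data
spec:
  restartPolicy: Never
  containers:
  - name: lookup
    image: busybox:1.28
    command:
    - sh
    - -c
    # Per-pod record first, then the headless Service name, which should
    # return one A record per ready pod rather than a single virtual IP
    - nslookup cassandra-0.cassandra.data.svc.cluster.local && nslookup cassandra.data.svc.cluster.local
```

Read the result with kubectl logs dns-check -n data, then delete the Pod.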
PVC Templates: Per-Pod Persistent Storage
The volumeClaimTemplates field is what gives each StatefulSet pod its own storage: the controller automatically creates a dedicated PersistentVolumeClaim for every pod from the template:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cassandra
  namespace: data
spec:
  serviceName: cassandra          # Must match the headless Service name
  replicas: 3
  selector:
    matchLabels:
      app: cassandra
  template:
    metadata:
      labels:
        app: cassandra
    spec:
      terminationGracePeriodSeconds: 60
      securityContext:
        runAsUser: 999
        fsGroup: 999
      containers:
      - name: cassandra
        image: cassandra:4.1
        ports:
        - containerPort: 7000
          name: intra-node
        - containerPort: 9042
          name: cql
        resources:
          requests:
            cpu: "500m"
            memory: "2Gi"
          limits:
            cpu: "2"
            memory: "4Gi"
        env:
        - name: CASSANDRA_SEEDS
          value: "cassandra-0.cassandra.data.svc.cluster.local"
        - name: CASSANDRA_CLUSTER_NAME
          value: "MyCluster"
        - name: MAX_HEAP_SIZE
          value: "2048M"
        - name: HEAP_NEWSIZE
          value: "512M"
        readinessProbe:
          exec:
            command:
            - /bin/bash
            - -c
            - nodetool status | grep -E "^UN\s+$(hostname -i)"
          initialDelaySeconds: 90
          periodSeconds: 30
          timeoutSeconds: 10
        volumeMounts:
        - name: cassandra-data
          # Mount the whole data root so data, commitlog, hints, and
          # saved_caches all persist (mounting only .../data loses the commitlog)
          mountPath: /var/lib/cassandra
  volumeClaimTemplates:
  - metadata:
      name: cassandra-data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: fast-ssd      # Use your provisioner class
      resources:
        requests:
          storage: 500Gi
This creates PVCs named cassandra-data-cassandra-0, cassandra-data-cassandra-1, and cassandra-data-cassandra-2, following the pattern <template name>-<StatefulSet name>-<ordinal>. Critically, if cassandra-1 is deleted and rescheduled, it re-attaches to cassandra-data-cassandra-1, so its data survives the reschedule and the pod keeps its identity.
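For reference, the claim generated for ordinal 1 is essentially the template's spec under a derived name. A simplified sketch of cassandra-data-cassandra-1 (labels, annotations, and status that the controller and provisioner add are omitted here):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cassandra-data-cassandra-1   # <template name>-<StatefulSet name>-<ordinal>
  namespace: data                    # Same namespace as the StatefulSet
spec:                                # Copied from volumeClaimTemplates[0].spec
  accessModes: ["ReadWriteOnce"]
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 500Gi
```

Because the name is deterministic, a replacement cassandra-1 Pod binds to this existing claim instead of provisioning a new volume.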
The PVCs created from volumeClaimTemplates are not deleted when you scale down or delete the StatefulSet, unless you configure a persistentVolumeClaimRetentionPolicy. This is intentional: it protects your data. Use kubectl delete pvc explicitly when you truly want to discard data, and set persistentVolumeClaimRetentionPolicy.whenScaled (or whenDeleted) to Delete only on ephemeral test clusters.
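A sketch of the retention policy fragment for a disposable test cluster, assuming a Kubernetes version where the field is available (it is enabled by default from 1.27):

```yaml
# Fragment of a StatefulSet spec; intended for ephemeral test clusters only.
spec:
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Retain   # Default: keep PVCs when the StatefulSet itself is deleted
    whenScaled: Delete    # Remove the PVCs of pods dropped by a scale-down
```

Both fields default to Retain, which is the production-safe behavior.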