Kubernetes Pod Disruption Budgets: High Availability Guide (2026)
Understanding Kubernetes Disruptions
In a production Kubernetes cluster, pods are evicted and rescheduled constantly. Not all of these events are equal — some are planned by operators and some are imposed by the infrastructure. Kubernetes formalises this distinction as voluntary and involuntary disruptions.
Voluntary Disruptions
Voluntary disruptions are operator-initiated actions that intentionally remove pods from a node:
- Node drain: kubectl drain evicts the pods on a node before maintenance or decommissioning.
- Node upgrades: rolling OS or Kubernetes version upgrades drain nodes one at a time.
- Cluster autoscaler scale-down: the autoscaler evicts pods to consolidate workloads onto fewer nodes and reduce cost.
- Deployment rollouts: new pod versions replace old ones; this is voluntary at the application layer and is paced by the Deployment's rollout strategy rather than by PDBs.
- Manual pod deletion: kubectl delete pod for debugging or forced restarts. Direct deletion does not go through the Eviction API, so PDBs do not apply.
Involuntary Disruptions
Involuntary disruptions are caused by failures outside operator control:
- Node hardware failure: a physical or virtual machine dies unexpectedly.
- Kernel panic or OS crash: the node becomes NotReady and Kubernetes evicts its pods.
- Out-of-memory (OOM) kill: the kernel OOM killer terminates containers that exceed their memory limits, and the kubelet evicts pods when the node itself runs short of memory.
- Network partition: a node becomes unreachable and its pods are eventually evicted.
- Cloud provider preemption: spot/preemptible instances are reclaimed by the cloud provider.
PDB Spec: minAvailable vs maxUnavailable
A PodDisruptionBudget is a namespaced resource that tells the Kubernetes Eviction API how many pods of a given selector must remain healthy during a voluntary disruption. The spec has two mutually exclusive fields — use exactly one.
minAvailable
Specifies the minimum number (or percentage) of pods that must remain available after the disruption. If evicting a pod would drop availability below this threshold, the eviction is denied.
# Absolute number: at least 2 pods must always be running
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
  namespace: production
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web-frontend

# Percentage: at least 75% of pods must always be running
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb-pct
  namespace: production
spec:
  minAvailable: "75%"
  selector:
    matchLabels:
      app: web-frontend
maxUnavailable
Specifies the maximum number (or percentage) of pods that may be unavailable at any time. This is the mirror of minAvailable and is often more intuitive when you think in terms of "how much can I take down at once?"
# Allow at most 1 pod to be unavailable at a time
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
  namespace: production
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: api-service

# Allow at most 25% of pods to be unavailable
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb-pct
  namespace: production
spec:
  maxUnavailable: "25%"
  selector:
    matchLabels:
      app: api-service
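With 4 replicas and minAvailable: "75%", Kubernetes requires at least 3 pods (4 × 0.75 = 3), so at most 1 can be evicted at a time. When a percentage does not map to a whole number of pods, Kubernetes rounds up: 75% of 5 replicas (3.75) requires 4 pods.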
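The percentage arithmetic can be checked with a few lines of shell. This is a minimal sketch: the function name min_available_from_percent is hypothetical, and it assumes that Kubernetes rounds a percentage up to the next whole pod, as the Kubernetes documentation describes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: number of pods a percentage minAvailable requires.
# Assumes percentages are rounded up to the nearest whole pod.
min_available_from_percent() {
  local replicas="$1" percent="$2"
  # Integer ceiling of replicas * percent / 100
  echo $(( (replicas * percent + 99) / 100 ))
}

min_available_from_percent 4 75   # 4 replicas at 75%
min_available_from_percent 7 50   # 7 replicas at 50%
```

The two calls print 3 and 4: 75% of 4 is exactly 3, while 50% of 7 (3.5) is rounded up to 4.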
API Version
From Kubernetes 1.21+, policy/v1 is the stable API. The older policy/v1beta1 was removed in 1.25. Always use policy/v1 for new PDBs.
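Before upgrading a cluster past 1.25, it can help to search your manifests for the removed API version. The following is a rough sketch; the function name find_beta_pdbs and the ./manifests path are assumptions, and it only catches plain YAML files where apiVersion starts a line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list manifest files under a directory that still
# declare the removed policy/v1beta1 API version.
find_beta_pdbs() {
  local dir="$1"
  # -r: recurse, -l: print file names only, -E: extended regex.
  # "|| true" keeps the function from failing when nothing matches.
  grep -rlE '^apiVersion:[[:space:]]*policy/v1beta1' "$dir" || true
}

# Usage (the path is an assumption):
#   find_beta_pdbs ./manifests
```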
PDB with Deployments: Protecting a Web Service
The most common use case is protecting a stateless web or API service running as a Deployment. Here is a complete example for a 3-replica frontend deployment. See our Kubernetes Deployments guide for deployment fundamentals.
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-frontend
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-frontend
  template:
    metadata:
      labels:
        app: web-frontend
    spec:
      containers:
      - name: frontend
        image: myregistry/frontend:v2.4.1
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: "250m"
            memory: "256Mi"
          limits:
            cpu: "500m"
            memory: "512Mi"
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 10

# pdb.yaml: companion PDB for the deployment above
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-frontend-pdb
  namespace: production
spec:
  minAvailable: 2   # always keep at least 2 of 3 pods running
  selector:
    matchLabels:
      app: web-frontend
With this configuration, a node drain can evict at most 1 frontend pod at a time; further evictions are refused until a replacement pod becomes Ready. This keeps at least two-thirds of capacity available through voluntary disruptions. Involuntary failures can still push availability below that level.
PDB with StatefulSets: Database Quorum Protection
Stateful workloads like databases, message brokers, and distributed caches require quorum — a majority of members must be available for the cluster to remain writable. A PDB is critical here. See our StatefulSets guide for the underlying concepts.
# statefulset.yaml (3-node database cluster)
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres-cluster
  namespace: data
spec:
  serviceName: postgres
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:16
        ports:
        - containerPort: 5432
        env:
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: postgres-secret
              key: password

# pdb-database.yaml: quorum guard for 3-node Postgres
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: postgres-pdb
  namespace: data
spec:
  minAvailable: 2   # quorum = majority of 3 = 2
  selector:
    matchLabels:
      app: postgres
For a 5-node cluster (e.g., etcd or Kafka), quorum is 3, so set minAvailable: 3. Never set minAvailable below the quorum threshold, or a node drain can take the cluster below quorum and make it unavailable for writes.
For Kafka, size minAvailable from the replication settings of your most critical topics. If replication.factor=3 and min.insync.replicas=2, set minAvailable: 2 so at least 2 brokers holding in-sync replicas are always up.
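The quorum values above follow the usual majority rule, floor(n / 2) + 1. A small sketch of that calculation, with quorum_size as a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: majority quorum for a cluster of N members.
quorum_size() {
  local members="$1"
  # Integer division gives floor(members / 2)
  echo $(( members / 2 + 1 ))
}

for n in 3 5 7; do
  echo "members=$n minAvailable=$(quorum_size "$n")"
done
```

This prints minAvailable values of 2, 3, and 4 for clusters of 3, 5, and 7 members.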
kubectl drain and the Eviction API
When you run kubectl drain <node>, Kubernetes does not simply delete pods. It calls the Eviction API for each pod, which enforces all active PDBs before proceeding.
Eviction Flow
- kubectl sends an Eviction object to the API server for a pod.
- The API server checks all PDBs whose selector matches the pod.
- If evicting the pod would violate any PDB, the API server returns 429 Too Many Requests.
- kubectl retries the eviction on a backoff until the PDB allows it (e.g., when a replacement pod becomes ready).
- Once allowed, the pod is gracefully terminated (SIGTERM, then SIGKILL after terminationGracePeriodSeconds).
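To make the first step concrete, here is a sketch of the Eviction object that is sent for a single pod. The helper name eviction_body and the pod name web-frontend-abc12 are hypothetical. The commented kubectl create --raw invocation is one way such a body could be submitted and is shown as an assumption, not as what kubectl drain does internally.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Eviction object for one pod as JSON.
eviction_body() {
  local namespace="$1" pod="$2"
  printf '{"apiVersion":"policy/v1","kind":"Eviction","metadata":{"name":"%s","namespace":"%s"}}\n' \
    "$pod" "$namespace"
}

eviction_body production web-frontend-abc12

# Against a real cluster, the body would be POSTed to the pod's eviction
# subresource, for example (not run here):
#   eviction_body production web-frontend-abc12 |
#     kubectl create --raw \
#       /api/v1/namespaces/production/pods/web-frontend-abc12/eviction -f -
# A 429 response means a PDB currently forbids the eviction.
```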
The --force and --disable-eviction Flags
kubectl drain --force does not bypass PDB checks. It allows the drain to continue when the node runs pods that are not managed by a controller (bare pods), which kubectl otherwise refuses to remove. The flag that skips PDBs is --disable-eviction: it deletes pods directly instead of calling the Eviction API. Never use it in production unless you fully understand the consequences, because it can take a quorum-sensitive service offline instantly.
# Safe drain (respects PDBs)
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data
# Dangerous drain (bypasses PDBs; production risk!)
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data --disable-eviction
Bare pods removed with --force will NOT be rescheduled anywhere, because no controller recreates them. Only use --force in controlled recovery scenarios where data loss is acceptable.
Node Upgrade Workflow: Cordon → Drain → Upgrade → Uncordon
The standard pattern for zero-downtime node maintenance is a four-step process. Here is a complete bash script you can adapt for your cluster:
#!/bin/bash
# node-upgrade.sh: safe rolling node upgrade respecting PDBs
set -euo pipefail

# Default to an empty string so the usage check runs instead of an
# "unbound variable" error under set -u.
NODE="${1:-}"
if [ -z "$NODE" ]; then
  echo "Usage: $0 <node-name>" >&2
  exit 1
fi

echo "=== Step 1: Cordon node (mark unschedulable) ==="
kubectl cordon "$NODE"

echo "=== Step 2: Drain node (evict pods, respects PDBs) ==="
kubectl drain "$NODE" \
  --ignore-daemonsets \
  --delete-emptydir-data \
  --grace-period=60 \
  --timeout=300s

echo "=== Step 3: Perform maintenance ==="
echo "Node $NODE is now empty. Perform your upgrade here."
echo "Press ENTER when upgrade is complete..."
read -r

echo "=== Step 4: Uncordon node (mark schedulable again) ==="
kubectl uncordon "$NODE"

echo "=== Done! Node $NODE is back in rotation ==="
kubectl get node "$NODE"
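The --timeout=300s flag sets a 5-minute deadline for the entire drain. If PDBs still block evictions when the deadline passes, kubectl drain exits with a non-zero status rather than retrying indefinitely. Because of set -e the script stops at that point, and the node stays cordoned until you rerun the drain or uncordon it. Tune the timeout based on your pod startup time.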
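When a drain does time out, the next question is usually which budget blocked it. The following is a minimal sketch of a wrapper that prints PDB status on failure. The function name drain_with_report is hypothetical, and the flags simply mirror the script above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: drain a node and, if the drain fails or times out,
# list PDB status so the blocking budget is easy to spot.
drain_with_report() {
  local node="$1"
  if kubectl drain "$node" \
      --ignore-daemonsets \
      --delete-emptydir-data \
      --timeout=300s; then
    echo "drain of $node succeeded"
  else
    echo "drain of $node failed; the node stays cordoned. PDB status:" >&2
    kubectl get pdb --all-namespaces >&2
    return 1
  fi
}

# Usage:
#   drain_with_report node-1
```

Calling kubectl inside the if condition keeps a failed drain from aborting a script that runs under set -e, so the report is still printed.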
Cluster Autoscaler and PDB Interaction
The Cluster Autoscaler (CA) scales down nodes by evicting pods and terminating idle nodes. PDBs directly affect this process — if a PDB would be violated by evicting a pod on a candidate scale-down node, the CA skips that node.
How CA Respects PDBs
- CA calls the same Eviction API as kubectl drain.
- If any pod on the candidate node is protected by a PDB that would be violated, the node is marked "not safe to remove" and skipped for that cycle.
- CA re-evaluates the node on later scans. With default settings, a node must also stay unneeded for about 10 minutes before it is removed.
The safe-to-evict Annotation
The safe-to-evict: "true" annotation does not override a PDB; evictions still go through the Eviction API. What it does is tell CA that a pod it would otherwise refuse to move (for example a bare pod or a pod using local storage) is not a reason to keep the node. For such pods (e.g., batch jobs, log shippers), add the following annotation:
# Tell CA this pod should not block scale-down of its node
apiVersion: v1
kind: Pod
metadata:
  name: log-shipper
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
spec:
  containers:
  - name: fluentd
    image: fluentd:v1.16
Conversely, to prevent CA from ever evicting a particular pod:
cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
Setting safe-to-evict: "false" on even one pod blocks CA from removing the node that pod runs on for as long as the pod is there. Use it sparingly, since it can lead to wasted cloud spend on idle nodes.
For more on resource-aware scheduling, see our Kubernetes Resource Management guide.
Reading PDB Status: disruptionsAllowed, currentHealthy, desiredHealthy
After applying a PDB, use kubectl get pdb to observe its live status. Understanding these fields is essential for diagnosing why a drain is stuck.
# List all PDBs in a namespace
kubectl get pdb -n production
# Detailed view including status
kubectl get pdb web-frontend-pdb -n production -o yaml
Example output:
NAME               MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
web-frontend-pdb   2               N/A               1                     3d
The key status fields in the YAML output:
status:
  conditions:
  - lastTransitionTime: "2026-06-10T08:00:00Z"
    message: ""
    reason: SufficientPods
    status: "True"
    type: DisruptionAllowed
  currentHealthy: 3       # pods currently passing readiness probe
  desiredHealthy: 2       # minimum required healthy pods (= minAvailable)
  disruptionsAllowed: 1   # how many pods can be evicted right now
  expectedPods: 3         # total pods matched by selector
  observedGeneration: 1
Field meanings:
- currentHealthy: pods that are Running and passing their readiness probe.
- desiredHealthy: the floor derived from your minAvailable or maxUnavailable spec.
- disruptionsAllowed: currentHealthy - desiredHealthy, never less than 0. When this is 0, all evictions are blocked.
- expectedPods: the total pods matched by the PDB selector, as reported by the controller.
If kubectl drain hangs, run kubectl get pdb -n <ns> and look for ALLOWED DISRUPTIONS: 0. Then check why currentHealthy is at or below desiredHealthy; usually a pod is stuck in Pending or failing its readiness probe.
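On a cluster with many budgets, scanning the table by eye gets tedious. The sketch below filters the listing down to budgets that currently allow no disruptions. The function name blocked_pdbs is hypothetical, and the field positions assume the default columns that kubectl get pdb --all-namespaces --no-headers prints (namespace, name, min available, max unavailable, allowed disruptions, age).

```shell
#!/usr/bin/env bash
# Hypothetical filter: read PDB listing lines on stdin and print
# namespace/name for every budget whose ALLOWED DISRUPTIONS column is 0.
blocked_pdbs() {
  # Field 5 is ALLOWED DISRUPTIONS in the assumed column layout
  awk '$5 == "0" { print $1 "/" $2 }'
}

# Usage against a cluster:
#   kubectl get pdb --all-namespaces --no-headers | blocked_pdbs

# Demonstration with sample lines in the same layout:
printf '%s\n' \
  'production web-frontend-pdb 2 N/A 1 3d' \
  'data postgres-pdb 2 N/A 0 3d' | blocked_pdbs
```

The demonstration prints data/postgres-pdb, the only sample budget with 0 allowed disruptions.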
Common Pitfall: PDB with a Single Replica
The most frequent misconfiguration seen in production clusters is applying minAvailable: 1 to a single-replica Deployment. This seems harmless but creates a permanent drain blocker.
# DANGEROUS: single-replica deployment with minAvailable: 1
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: single-app-pdb
spec:
  minAvailable: 1   # requires 1 pod, but there is only 1 pod total
  selector:
    matchLabels:
      app: single-app
With 1 replica and minAvailable: 1, disruptionsAllowed is always 0. No eviction is ever permitted. Node drains will hang indefinitely waiting for this pod to be evictable — which it never will be.
Solutions:
- Scale the deployment to at least 2 replicas so the math works: currentHealthy(2) - desiredHealthy(1) = 1 disruption allowed.
- Use maxUnavailable: 1 instead of minAvailable: 1. For a single-replica deployment, desiredHealthy becomes 0, so the one pod can be evicted and drains proceed, at the cost of brief downtime while the replacement starts.
- Remove the PDB entirely if the workload does not require HA and can tolerate brief downtime during maintenance.
Setting minAvailable: "100%" (or maxUnavailable: 0) on any deployment is equivalent: it permanently blocks all voluntary disruptions. Only use 100% if you truly require zero tolerance for pod eviction.
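The difference between the two fields on small deployments is easiest to see by working the numbers. This sketch assumes every replica is healthy and handles integer values only; allowed_disruptions is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: disruptions a PDB allows when every replica is healthy.
# mode is "min" (minAvailable) or "max" (maxUnavailable); integers only.
allowed_disruptions() {
  local mode="$1" value="$2" replicas="$3"
  local desired allowed
  if [ "$mode" = "min" ]; then
    desired="$value"
  else
    desired=$(( replicas - value ))
  fi
  # Neither the healthy floor nor the budget can go below zero
  if [ "$desired" -lt 0 ]; then desired=0; fi
  allowed=$(( replicas - desired ))
  if [ "$allowed" -lt 0 ]; then allowed=0; fi
  echo "$allowed"
}

allowed_disruptions min 1 1   # single replica, minAvailable: 1
allowed_disruptions min 1 2   # two replicas,   minAvailable: 1
allowed_disruptions max 1 1   # single replica, maxUnavailable: 1
```

The calls print 0, 1, and 1: only the single-replica minAvailable: 1 case blocks every eviction.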
PDB for Jobs and CronJobs
PDBs can technically be applied to pods created by Jobs and CronJobs, but the semantics are different and often counter-productive.
When PDB Applies to Jobs
- If a Job pod is running and a PDB with a matching selector is present, eviction of that pod will be gated by the budget.
- This means a node drain may be blocked until a Job pod completes — which could be hours or days for long-running batch jobs.
When PDB Does NOT Apply
- Completed pods (phase Succeeded or Failed) are not counted and are not subject to PDB eviction checks.
- CronJob-spawned pods follow the same rules as regular Job pods once running.
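Recommendation: For batch Jobs that can be retried, skip the PDB and mark the pods with the safe-to-evict: "true" annotation. For long-running Jobs that must complete without interruption, a PDB with minAvailable: 1 blocks eviction only while disruptionsAllowed is 0; if the Job runs more pods in parallel than minAvailable, the surplus pods remain evictable. Because Jobs are not one of the built-in scalable controllers, their PDBs must use an integer minAvailable, not maxUnavailable or a percentage.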
# Job with safe-to-evict: allows CA to evict without a PDB
apiVersion: batch/v1
kind: Job
metadata:
  name: data-migration
spec:
  template:
    metadata:
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
    spec:
      restartPolicy: OnFailure
      containers:
      - name: migrator
        image: myregistry/migrator:v1.2
Testing PDBs: Simulate a Drain in Dev
Before relying on a PDB in production, validate it in a staging or dev cluster. The goal is to verify that traffic is not dropped during a drain.
Step 1: Start a continuous curl loop
# In one terminal: fire requests every 0.5 seconds and log failures.
# --max-time keeps a hung connection from stalling the loop and hiding an outage.
while true; do
  HTTP_CODE=$(curl -s -o /dev/null --max-time 2 -w "%{http_code}" http://your-service-endpoint/healthz)
  if [ "$HTTP_CODE" != "200" ]; then
    echo "$(date) - FAILED: HTTP $HTTP_CODE"
  else
    echo "$(date) - OK: $HTTP_CODE"
  fi
  sleep 0.5
done
Step 2: Drain a node while the curl loop runs
# In a second terminal — drain the node
kubectl drain dev-node-1 \
  --ignore-daemonsets \
  --delete-emptydir-data \
  --grace-period=30 \
  --timeout=120s
Step 3: Observe
With a correctly configured PDB and readiness probe, the curl loop should show no failures. The drain will proceed one pod at a time, each replacement becoming ready before the next eviction is allowed.
Run kubectl get pods -n production -w to watch pod transitions in real time. After the first eviction, you should see each further old pod enter Terminating only after the previous replacement reaches Running/Ready.
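If the loop output from Step 1 is saved to a file, the result of the test can be summarized instead of read line by line. This is a rough sketch: summarize_probe_log and the drain-test.log file name are hypothetical, and the filter relies only on the "FAILED: HTTP" text that the loop prints.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count total and failed requests in the probe log
# read from stdin.
summarize_probe_log() {
  awk '
    { total++ }
    /FAILED: HTTP/ { failed++ }
    END { printf "requests=%d failed=%d\n", total, failed }
  '
}

# Usage, assuming the loop output was saved with "| tee drain-test.log":
#   summarize_probe_log < drain-test.log

# Demonstration with three sample lines:
printf '%s\n' \
  'ts - OK: 200' \
  'ts - FAILED: HTTP 503' \
  'ts - OK: 200' | summarize_probe_log
```

The demonstration prints requests=3 failed=1. A drain that respects the PDB should leave the failed count at 0.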
Multi-Zone HA: PDB + topologySpreadConstraints
A PDB alone does not guarantee zone-level availability. If all replicas happen to land on nodes in the same availability zone, a single zone failure takes down your entire service — regardless of PDB. Combine PDB with topologySpreadConstraints to enforce zone spread. See also our Affinity Guide and Taints & Tolerations guide.
# Full HA deployment: 6 replicas spread across 3 zones + PDB
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ha-web-service
  namespace: production
spec:
  replicas: 6
  selector:
    matchLabels:
      app: ha-web
  template:
    metadata:
      labels:
        app: ha-web
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: ha-web
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: ha-web
      containers:
      - name: web
        image: myregistry/web:v3.1
        ports:
        - containerPort: 8080
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
          periodSeconds: 5

# PDB companion: allow at most 1 disruption at a time
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: ha-web-pdb
  namespace: production
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: ha-web
With this setup:
- topologySpreadConstraints ensures that, at scheduling time, no single zone or node has more than 1 extra pod compared to others (maxSkew: 1).
- The PDB ensures at most 1 pod is evicted at a time during any drain operation.
- Even if an entire AZ fails (involuntary), 4 of 6 pods survive across the remaining 2 zones, which is enough to serve traffic provided the service is sized to run on two-thirds of its replicas.
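The zone-failure figure can be generalized to other replica counts. The sketch below assumes pods are spread as evenly as maxSkew: 1 allows and that the fullest zone is the one that fails; pods_after_zone_loss is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pods left after the fullest zone fails, assuming an
# even spread (largest zone holds ceil(replicas / zones) pods).
pods_after_zone_loss() {
  local replicas="$1" zones="$2"
  local largest_zone=$(( (replicas + zones - 1) / zones ))
  echo $(( replicas - largest_zone ))
}

pods_after_zone_loss 6 3   # the deployment above
pods_after_zone_loss 7 3   # uneven spread: zones hold 3, 2, 2
```

Both calls print 4. The seventh replica adds no surviving capacity in the worst case, because it lands in the zone that is lost.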
Monitoring PDB Health
Integrate PDB status into your observability stack. Prometheus (via kube-state-metrics) exposes kube_poddisruptionbudget_status_pod_disruptions_allowed. Alert when this stays at 0 for an extended period, since that indicates a stuck pod or an overly strict budget blocking future maintenance. See our Kubernetes Monitoring with Prometheus guide for the full setup.
# Prometheus alert: PDB blocking all disruptions for 15 minutes
groups:
- name: pdb-alerts
  rules:
  - alert: PDBDisruptionsAllowedZero
    expr: kube_poddisruptionbudget_status_pod_disruptions_allowed == 0
    for: 15m
    labels:
      severity: warning
    annotations:
      summary: "PDB {{ $labels.poddisruptionbudget }} has 0 disruptions allowed"
      description: "Node drains will be blocked until this PDB allows at least 1 disruption."
Summary
Pod Disruption Budgets are a small but critical piece of Kubernetes high-availability architecture. Here is the quick-reference checklist:
- Always create a PDB for every workload with 2 or more replicas in production.
- Use minAvailable for quorum-sensitive stateful workloads; maxUnavailable is often more intuitive for stateless services.
- Never set minAvailable: 1 on a single-replica deployment, because it permanently blocks drains.
- Define readiness probes on all PDB-protected pods so currentHealthy reflects actual service health.
- Combine PDB with topologySpreadConstraints for zone-level resilience.
- Monitor disruptionsAllowed in Prometheus and alert on persistent zeros.
- Test your PDB with a curl loop + kubectl drain before relying on it in production.