Kubernetes etcd Backup and Restore
etcd is the distributed key-value store that serves as the single source of truth for all Kubernetes cluster state — every pod, service, configmap, secret, and custom resource is stored in etcd. If etcd data is lost or corrupted without a backup, the entire cluster configuration is gone. A disciplined etcd backup strategy with tested restore procedures is therefore a non-negotiable requirement for production Kubernetes clusters.
Understanding etcd in Kubernetes
In a standard kubeadm-deployed cluster, etcd runs as a static pod on each control plane node. Its data directory is typically /var/lib/etcd. The kube-apiserver is the only component that communicates directly with etcd — all other components (controller manager, scheduler, kubelet) interact with the cluster state through the API server.
etcd uses the Raft consensus algorithm to maintain consistency across its members. In a 3-member etcd cluster, the cluster can tolerate 1 node failure. In a 5-member cluster, 2 failures can be tolerated. The minimum recommended configuration for production is 3 etcd members, deployed across separate availability zones.
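The failure-tolerance figures above follow from Raft's majority rule: a cluster of n members needs n/2 + 1 of them (integer division) to agree, and can lose the rest. A minimal sketch of that arithmetic follows; the function names quorum and fault_tolerance are illustrative and not part of any etcd tooling.

```shell
#!/bin/sh
# Illustrative helpers; these names are hypothetical and not part of etcdctl.

# quorum N: smallest majority of an N-member cluster
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# fault_tolerance N: members that can fail while a majority survives
fault_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 3 4 5; do
  echo "members=$n quorum=$(quorum "$n") tolerated_failures=$(fault_tolerance "$n")"
done
```

A 4-member cluster tolerates the same single failure as a 3-member one, which is why odd member counts are the norm.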
# Check etcd member list and health
sudo ETCDCTL_API=3 etcdctl \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
member list
# Check etcd cluster health
sudo ETCDCTL_API=3 etcdctl \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
endpoint health
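Tip: to confirm the certificate paths and data directory your cluster actually uses, inspect the etcd pod's arguments with kubectl describe pod etcd-master -n kube-system | grep -i -A20 command. Replace etcd-master with your pod's name, which follows the pattern etcd-<node-name>.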
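The connection flags repeated in the commands above can also be supplied through the environment, since etcdctl v3 reads flags from ETCDCTL_-prefixed variables. The sketch below assumes the kubeadm default paths used in this article; adjust them for other installs.

```shell
#!/bin/sh
# etcdctl v3 reads flags from ETCDCTL_-prefixed environment variables.
# Paths are the kubeadm defaults used in this article (an assumption for your cluster).
export ETCDCTL_API=3
export ETCDCTL_ENDPOINTS=https://127.0.0.1:2379
export ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt
export ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt
export ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key

# With the variables exported (in a root shell, because sudo resets the
# environment by default), the commands shrink to:
#   etcdctl member list
#   etcdctl endpoint health
```

Avoid combining these variables with the equivalent command-line flags on the same invocation, as etcdctl may reject the duplicate setting.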
Manual Backup with etcdctl
The etcdctl snapshot save command creates a point-in-time snapshot of etcd data. The snapshot is consistent and safe to take on a live cluster — etcd creates the snapshot atomically without interrupting cluster operations.
# Create a snapshot — run on a control plane node
BACKUP_FILE="/backup/etcd-$(date +%Y%m%d-%H%M%S).db"
sudo ETCDCTL_API=3 etcdctl snapshot save "$BACKUP_FILE" \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key
# Verify the snapshot integrity
sudo ETCDCTL_API=3 etcdctl snapshot status "$BACKUP_FILE" \
--write-out=table
The snapshot status output shows the snapshot hash, revision, total keys, and total size. A typical production cluster snapshot is 50-500 MB depending on the number of resources and secrets stored.
# Example output of snapshot status
+----------+----------+------------+------------+
|   HASH   | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
| 0x3d2a1b |  2847391 |      12453 |     128 MB |
+----------+----------+------------+------------+
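snapshot status validates the snapshot where it was taken, but it does not help once the file has been copied elsewhere. One option, sketched here as an assumption rather than anything etcdctl does itself, is to store a SHA-256 sidecar file next to the snapshot and re-check it before a restore. The function names and the .sha256 convention are hypothetical, and sha256sum is assumed to be available (GNU coreutils).

```shell
#!/bin/sh
# Sketch only: the .sha256 sidecar convention is an assumption, not an etcdctl feature.

# record_checksum FILE: refuse missing or empty snapshots, then write FILE.sha256
record_checksum() {
  snapshot="$1"
  if [ ! -s "$snapshot" ]; then
    echo "snapshot missing or empty: $snapshot" >&2
    return 1
  fi
  dir=$(dirname "$snapshot")
  base=$(basename "$snapshot")
  # Run inside the directory so the sidecar stores a relative file name
  ( cd "$dir" && sha256sum "$base" > "$base.sha256" )
}

# verify_checksum FILE: succeed only if FILE still matches FILE.sha256
verify_checksum() {
  snapshot="$1"
  dir=$(dirname "$snapshot")
  base=$(basename "$snapshot")
  ( cd "$dir" && sha256sum -c "$base.sha256" >/dev/null 2>&1 )
}
```

Calling record_checksum "$BACKUP_FILE" right after the snapshot is saved, and verify_checksum on the downloaded copy before restoring, catches corruption introduced in transit or at rest.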
Automated Backup CronJob
Manual backups are error-prone and easily forgotten. Automate etcd backups using a Kubernetes CronJob that runs on the control plane node using the host network and etcd certificate mounts.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: etcd-backup
  namespace: kube-system
spec:
  schedule: "0 */6 * * *" # Every 6 hours
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      template:
        spec:
          hostNetwork: true
          nodeSelector:
            node-role.kubernetes.io/control-plane: ""
          tolerations:
            - key: node-role.kubernetes.io/control-plane
              effect: NoSchedule
          restartPolicy: OnFailure
          containers:
            - name: etcd-backup
              image: bitnami/etcd:3.5
              # Assumption: the image defaults to a non-root user, which cannot
              # read the root-owned key files mounted from the host
              securityContext:
                runAsUser: 0
              command:
                - /bin/sh
                - -c
                - |
                  # Abort on the first error so a failed snapshot never prunes old backups
                  set -e
                  BACKUP_FILE="/backup/etcd-$(date +%Y%m%d-%H%M%S).db"
                  ETCDCTL_API=3 etcdctl snapshot save "$BACKUP_FILE" \
                    --endpoints=https://127.0.0.1:2379 \
                    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
                    --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
                    --key=/etc/kubernetes/pki/etcd/healthcheck-client.key
                  ETCDCTL_API=3 etcdctl snapshot status "$BACKUP_FILE" --write-out=table
                  # Keep only last 10 backups
                  ls -t /backup/*.db | tail -n +11 | xargs rm -f
                  echo "Backup complete: $BACKUP_FILE"
              volumeMounts:
                - name: etcd-certs
                  mountPath: /etc/kubernetes/pki/etcd
                  readOnly: true
                - name: backup-dir
                  mountPath: /backup
          volumes:
            - name: etcd-certs
              hostPath:
                path: /etc/kubernetes/pki/etcd
            - name: backup-dir
              hostPath:
                path: /var/etcd-backups
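The retention step in the CronJob relies on file modification times. Because the snapshot names embed a sortable timestamp, the same "keep the newest 10" rule can be applied by name instead, which is easy to try out against a scratch directory. This is a sketch under that naming assumption; prune_backups is a hypothetical helper.

```shell
#!/bin/sh
# prune_backups DIR KEEP: delete all but the KEEP newest etcd-*.db files in DIR.
# Names of the form etcd-YYYYmmdd-HHMMSS.db sort chronologically, so reverse
# name order lists the newest snapshot first.
prune_backups() {
  dir="$1"
  keep="$2"
  find "$dir" -maxdepth 1 -type f -name 'etcd-*.db' \
    | sort -r \
    | tail -n +"$((keep + 1))" \
    | while IFS= read -r old; do
        rm -f -- "$old"
      done
}

# Example: prune_backups /backup 10
```

Sorting by name keeps the result stable even if a file's modification time changes, for example after it is copied.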
Storing Backups in S3
Local disk backups are insufficient — if the control plane node is lost, the backups go with it. Upload every snapshot to S3 (or GCS/Azure Blob) immediately after creation.
#!/bin/bash
# etcd-backup-s3.sh — run as CronJob or systemd timer
set -euo pipefail
BACKUP_FILE="/tmp/etcd-$(date +%Y%m%d-%H%M%S).db"
S3_BUCKET="s3://my-company-etcd-backups"
CLUSTER_NAME="${CLUSTER_NAME:-production}"
# Take snapshot
ETCDCTL_API=3 etcdctl snapshot save "$BACKUP_FILE" \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
--key=/etc/kubernetes/pki/etcd/healthcheck-client.key
# Verify
ETCDCTL_API=3 etcdctl snapshot status "$BACKUP_FILE"
# Upload to S3
aws s3 cp "$BACKUP_FILE" \
  "${S3_BUCKET}/${CLUSTER_NAME}/$(basename "$BACKUP_FILE")" \
  --sse aws:kms \
  --sse-kms-key-id alias/etcd-backups
# Delete local temp file
rm -f "$BACKUP_FILE"
echo "Backup uploaded to ${S3_BUCKET}/${CLUSTER_NAME}/"
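# Delete backups older than 30 days
# (YYYYMMDD strings compare chronologically as plain text)
cutoff=$(date -d '30 days ago' +%Y%m%d)
aws s3 ls "${S3_BUCKET}/${CLUSTER_NAME}/" \
  | awk '{print $4}' \
  | while read -r key; do
      # Skip keys without a date stamp; a failed grep would otherwise abort the script under set -e
      backup_date=$(echo "$key" | grep -oP '\d{8}' | head -n 1) || continue
      if [[ "$backup_date" < "$cutoff" ]]; then
        aws s3 rm "${S3_BUCKET}/${CLUSTER_NAME}/$key"
      fi
    done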
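The age check in the cleanup loop depends on the date embedded in each object name. Isolated from the S3 calls, that logic can be sketched and exercised offline as follows. The helpers key_date and is_expired are hypothetical, and the sketch assumes keys named etcd-YYYYmmdd-HHMMSS.db as produced by the script above.

```shell
#!/bin/sh
# key_date KEY: print YYYYMMDD from a key named etcd-YYYYmmdd-HHMMSS.db, or nothing
key_date() {
  printf '%s\n' "$1" | sed -n 's/^etcd-\([0-9]\{8\}\)-[0-9]\{6\}\.db$/\1/p'
}

# is_expired KEY CUTOFF: succeed when KEY's date is strictly before CUTOFF (YYYYMMDD)
is_expired() {
  stamp=$(key_date "$1")
  [ -n "$stamp" ] && [ "$stamp" -lt "$2" ]
}

# Example cutoff; the backup script derives it with: date -d '30 days ago' +%Y%m%d
cutoff=20260517
for key in etcd-20260101-060000.db etcd-20260616-060000.db notes.txt; do
  if is_expired "$key" "$cutoff"; then
    echo "delete $key"
  else
    echo "keep   $key"
  fi
done
```

Keys that do not match the naming pattern are never treated as expired, so unrelated objects under the same prefix are left alone.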
Restoring etcd from a Snapshot
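Restoring etcd is a disruptive operation: the API server and the other control plane components must be stopped while the data directory is replaced. For a cluster with a single control plane node, the procedure is straightforward. For HA clusters with multiple etcd members, stop every member and restore each one from the same snapshot file, giving each member its own --name and --initial-advertise-peer-urls and the full member list in --initial-cluster, before starting any of them again.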
# Step 1: Download the snapshot from S3
aws s3 cp s3://my-company-etcd-backups/production/etcd-20260616-060000.db /tmp/restore.db
# Step 2: Move the API server, controller manager, and scheduler static pod manifests
# out of the manifests directory (the kubelet then stops those pods; no systemctl needed)
sudo mv /etc/kubernetes/manifests/kube-apiserver.yaml /tmp/
sudo mv /etc/kubernetes/manifests/kube-controller-manager.yaml /tmp/
sudo mv /etc/kubernetes/manifests/kube-scheduler.yaml /tmp/
# Wait for containers to stop
sleep 30
# Step 3: Stop etcd (by removing its manifest too)
sudo mv /etc/kubernetes/manifests/etcd.yaml /tmp/
sleep 10
# Step 4: Move the old etcd data directory aside (kept as a fallback until the restore is verified)
sudo mv /var/lib/etcd /var/lib/etcd.bak
# Step 5: Restore the snapshot
sudo ETCDCTL_API=3 etcdctl snapshot restore /tmp/restore.db \
--data-dir=/var/lib/etcd \
--name=master-01 \
--initial-cluster="master-01=https://10.0.0.1:2380" \
--initial-cluster-token=etcd-cluster-1 \
--initial-advertise-peer-urls=https://10.0.0.1:2380
# Step 6: Restore the manifests — Kubernetes will restart all control plane pods
sudo mv /tmp/etcd.yaml /etc/kubernetes/manifests/
sleep 15
sudo mv /tmp/kube-apiserver.yaml /etc/kubernetes/manifests/
sudo mv /tmp/kube-controller-manager.yaml /etc/kubernetes/manifests/
sudo mv /tmp/kube-scheduler.yaml /etc/kubernetes/manifests/
# Step 7: Wait for the API server to respond and all nodes to report Ready
kubectl wait --for=condition=Ready node --all --timeout=300s
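For an HA cluster, the restore command in Step 5 is run once per member, with the same snapshot and the same --initial-cluster list but a member-specific name and peer URL. The dry-run sketch below only prints those commands so the differences are easy to compare. The member names and IP addresses are placeholders, and the helper functions are hypothetical.

```shell
#!/bin/sh
# Dry-run sketch: prints one restore command per member instead of running it.
# Member names and IPs are placeholders, not values from a real cluster.
MEMBERS="master-01=10.0.0.1 master-02=10.0.0.2 master-03=10.0.0.3"

# initial_cluster: build the comma-separated name=peer-url list shared by all members
initial_cluster() {
  out=""
  for entry in $MEMBERS; do
    name=${entry%%=*}
    ip=${entry#*=}
    out="${out:+$out,}${name}=https://${ip}:2380"
  done
  echo "$out"
}

# restore_command NAME IP: print the restore command to run on that member's node
restore_command() {
  echo "ETCDCTL_API=3 etcdctl snapshot restore /tmp/restore.db" \
    "--data-dir=/var/lib/etcd" \
    "--name=$1" \
    "--initial-cluster=$(initial_cluster)" \
    "--initial-cluster-token=etcd-cluster-1" \
    "--initial-advertise-peer-urls=https://$2:2380"
}

for m in $MEMBERS; do
  restore_command "${m%%=*}" "${m#*=}"
done
```

Each printed command is meant to be run on its own node, after every etcd member has been stopped and its old data directory moved aside.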
Application-Level Backup with Velero
etcd snapshots back up all cluster state but do not back up persistent volume data. Velero is a CNCF project that provides application-consistent backups including PersistentVolumes, making it complementary to etcd snapshots rather than a replacement.
# Install Velero with S3 backend
velero install \
--provider aws \
--plugins velero/velero-plugin-for-aws:v1.8.0 \
--bucket my-velero-backups \
--backup-location-config region=us-east-1 \
--snapshot-location-config region=us-east-1 \
--secret-file ./credentials-velero
# Create a scheduled backup of the production namespace
velero schedule create production-daily \
--schedule="0 2 * * *" \
--include-namespaces production \
--ttl 720h # 30 days retention
# Restore from a specific backup
velero restore create --from-backup production-daily-20260615020000
Testing Your Backup and Restore
An untested backup is not a backup — it is hope. Run a full restore drill at least quarterly in a non-production environment. Document the exact commands, time taken, and any surprises encountered. The worst time to discover your restore procedure is broken is during an actual disaster.
- Spin up a temporary cluster (e.g., with kind or a cloud VM)
- Restore your production etcd snapshot to it
- Verify that key resources are present with kubectl get namespaces and kubectl get deployments,secrets,configmaps -A (kubectl get all does not list namespaces, secrets, or configmaps)
- Test that your most critical applications can be started from the restored state
- Measure the actual recovery time, from snapshot download to cluster ready, and compare it with your Recovery Time Objective (RTO)
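To make the timing step of a drill repeatable, each phase can be wrapped in a small timer. The sketch below uses sleep as a stand-in for the real download and restore commands; timed is a hypothetical helper, not an existing tool.

```shell
#!/bin/sh
# timed LABEL COMMAND...: run COMMAND, print "LABEL: Ns", and return COMMAND's exit status
timed() {
  label="$1"
  shift
  start=$(date +%s)
  "$@"
  status=$?
  end=$(date +%s)
  echo "$label: $((end - start))s"
  return "$status"
}

# Placeholder steps; swap in the real download, restore, and wait commands.
drill_start=$(date +%s)
timed "download snapshot" sleep 1
timed "restore snapshot" sleep 1
echo "total recovery time: $(( $(date +%s) - drill_start ))s"
```

Recording the per-phase numbers from every drill shows which step dominates the recovery time and whether it is drifting as the cluster grows.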
Monitoring etcd Health
Proactive monitoring catches etcd degradation before it becomes a disaster. Key metrics and alerts to configure:
# Prometheus alerting rules for etcd
groups:
  - name: etcd
    rules:
      - alert: EtcdMemberDown
        expr: up{job="etcd"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "etcd member {{ $labels.instance }} is down"
      - alert: EtcdHighCommitDuration
        expr: histogram_quantile(0.99, rate(etcd_disk_backend_commit_duration_seconds_bucket[5m])) > 0.25
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "etcd commit p99 latency is {{ $value }}s; disk may be slow"
      - alert: EtcdDatabaseSizeHigh
        # Threshold assumes --quota-backend-bytes has been raised to 8 GB; the etcd default quota is 2 GB
        expr: etcd_mvcc_db_total_size_in_bytes > 6e9
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "etcd database size is {{ $value | humanize }}B, approaching the 8GB limit"
      - alert: EtcdNoRecentBackup
        # etcd does not emit this metric; the backup job must export it
        expr: time() - etcd_backup_last_success_timestamp > 86400
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "No successful etcd backup in the last 24 hours"
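Tip: set --auto-compaction-retention=8 (hours) in the etcd pod arguments to keep the database size manageable. Compaction frees space inside the database file but does not shrink the file on disk; run etcdctl defrag to reclaim it.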
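The EtcdNoRecentBackup rule above depends on a metric named etcd_backup_last_success_timestamp, which etcd itself does not expose. One way to provide it, sketched here under the assumption that node_exporter runs with its textfile collector enabled, is to have the backup script write the timestamp to a .prom file after each successful run. The directory path and the helper name write_backup_metric are assumptions.

```shell
#!/bin/sh
# Assumption: node_exporter runs with --collector.textfile.directory set to this path.
TEXTFILE_DIR="${TEXTFILE_DIR:-/var/lib/node_exporter/textfile_collector}"

# write_backup_metric DIR: record the current time as the last successful backup
write_backup_metric() {
  dir="$1"
  tmp="$dir/etcd_backup.prom.$$"
  {
    echo "# HELP etcd_backup_last_success_timestamp Unix time of the last successful etcd backup"
    echo "# TYPE etcd_backup_last_success_timestamp gauge"
    echo "etcd_backup_last_success_timestamp $(date +%s)"
  } > "$tmp"
  # Rename into place so the collector never reads a half-written file
  mv "$tmp" "$dir/etcd_backup.prom"
}

# Call this as the last step of the backup script, after the upload succeeds:
# write_backup_metric "$TEXTFILE_DIR"
```

Because the file is only updated when the whole backup succeeds, a failing job leaves the timestamp stale and the alert fires after 24 hours.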