Kubernetes etcd Backup and Restore

etcd is the distributed key-value store that serves as the single source of truth for all Kubernetes cluster state — every pod, service, configmap, secret, and custom resource is stored in etcd. If etcd data is lost or corrupted without a backup, the entire cluster configuration is gone. A disciplined etcd backup strategy with tested restore procedures is therefore a non-negotiable requirement for production Kubernetes clusters.

Understanding etcd in Kubernetes
Manual Backup with etcdctl
Automated Backup CronJob
Storing Backups in S3
Restoring etcd from a Snapshot
Application-Level Backup with Velero
Testing Your Backup and Restore
Monitoring etcd Health

Understanding etcd in Kubernetes

In a standard kubeadm-deployed cluster, etcd runs as a static pod on each control plane node. Its data directory is typically /var/lib/etcd. The kube-apiserver is the only component that communicates directly with etcd — all other components (controller manager, scheduler, kubelet) interact with the cluster state through the API server.

etcd uses the Raft consensus algorithm to maintain consistency across its members. In a 3-member etcd cluster, the cluster can tolerate 1 node failure. In a 5-member cluster, 2 failures can be tolerated. The minimum recommended configuration for production is 3 etcd members, deployed across separate availability zones.
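The tolerance figures above follow from Raft's majority rule: a cluster of N members needs floor(N/2) + 1 of them to agree, so it survives N minus that many failures. A minimal sketch of the arithmetic follows; the function names are illustrative and not part of any etcd tooling.

```shell
#!/usr/bin/env bash
# Quorum and fault tolerance for an etcd cluster of N members.
# quorum = floor(N/2) + 1; tolerated failures = N - quorum.

quorum() {
  local members=$1
  echo $(( members / 2 + 1 ))
}

fault_tolerance() {
  local members=$1
  echo $(( members - $(quorum "$members") ))
}

# Print the figures for cluster sizes 1 through 5
for n in 1 2 3 4 5; do
  printf 'members=%d quorum=%d tolerated_failures=%d\n' \
    "$n" "$(quorum "$n")" "$(fault_tolerance "$n")"
done
```

The output shows that 4 members tolerate no more failures than 3, which is why odd cluster sizes are the usual choice.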

# List etcd cluster members
sudo ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  member list

# Check etcd cluster health
sudo ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  endpoint health

Certificate paths: etcdctl needs the etcd TLS certificates whenever it talks to a secured etcd cluster, which is the default for kubeadm. The paths shown above are the standard kubeadm locations. The etcd static pod is named etcd-<node-name>, so verify your paths with kubectl describe pod etcd-<node-name> -n kube-system | grep -i -A 20 command (the -i matters because the field is printed as "Command:").

Manual Backup with etcdctl

The etcdctl snapshot save command creates a point-in-time snapshot of etcd data. The snapshot is consistent and safe to take on a live cluster — etcd creates the snapshot atomically without interrupting cluster operations.

# Create a snapshot; run on a control plane node
sudo mkdir -p /backup
BACKUP_FILE="/backup/etcd-$(date +%Y%m%d-%H%M%S).db"

sudo ETCDCTL_API=3 etcdctl snapshot save "$BACKUP_FILE" \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

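# Verify the snapshot integrity. etcd 3.5 deprecates this subcommand in favor of
# etcdutl snapshot status, which takes the same arguments.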
sudo ETCDCTL_API=3 etcdctl snapshot status "$BACKUP_FILE" \
  --write-out=table

The snapshot status output shows the snapshot hash, revision, total keys, and total size. A typical production cluster snapshot is 50-500 MB depending on the number of resources and secrets stored.

# Example output of snapshot status
+----------+----------+------------+------------+
|   HASH   | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
| 0x3d2a1b | 2847391  |    12453   |   128 MB   |
+----------+----------+------------+------------+
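A backup script can apply a cheap sanity check before it trusts or uploads a snapshot. The sketch below only confirms that the file exists and is not implausibly small; it complements snapshot status rather than replacing it. The function name and the default size floor are assumptions to adjust for your cluster.

```shell
#!/usr/bin/env bash
# Cheap pre-upload sanity check for a snapshot file.
# min_bytes is a hypothetical floor; pick one below your smallest real snapshot.

check_snapshot() {
  local file=$1 min_bytes=${2:-1048576}
  if [ ! -s "$file" ]; then
    echo "snapshot missing or empty: $file" >&2
    return 1
  fi
  local size
  # Arithmetic expansion strips any padding that wc may print
  size=$(( $(wc -c < "$file") ))
  if [ "$size" -lt "$min_bytes" ]; then
    echo "snapshot suspiciously small: $size bytes (< $min_bytes)" >&2
    return 1
  fi
  echo "snapshot ok: $file ($size bytes)"
}

# Example: check_snapshot "$BACKUP_FILE" && echo "safe to upload"
```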

Automated Backup CronJob

Manual backups are error-prone and easily forgotten. Automate etcd backups with a Kubernetes CronJob that is scheduled onto a control plane node, uses the host network to reach etcd on 127.0.0.1, and mounts the etcd certificates from the host.

apiVersion: batch/v1
kind: CronJob
metadata:
  name: etcd-backup
  namespace: kube-system
spec:
  schedule: "0 */6 * * *"   # Every 6 hours
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      template:
        spec:
          hostNetwork: true
          nodeSelector:
            node-role.kubernetes.io/control-plane: ""
          tolerations:
            - key: node-role.kubernetes.io/control-plane
              effect: NoSchedule
          restartPolicy: OnFailure
          containers:
            - name: etcd-backup
              # Pin an exact tag or digest that matches your etcd version
              image: bitnami/etcd:3.5
              # The kubeadm etcd keys and the hostPath backup directory are
              # root-owned, and this image runs as a non-root user by default
              securityContext:
                runAsUser: 0
              command:
                - /bin/sh
                - -c
                - |
                  set -eu   # stop before pruning if the snapshot fails
                  BACKUP_FILE="/backup/etcd-$(date +%Y%m%d-%H%M%S).db"
                  ETCDCTL_API=3 etcdctl snapshot save "$BACKUP_FILE" \
                    --endpoints=https://127.0.0.1:2379 \
                    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
                    --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
                    --key=/etc/kubernetes/pki/etcd/healthcheck-client.key
                  ETCDCTL_API=3 etcdctl snapshot status "$BACKUP_FILE" --write-out=table
                  # Keep only last 10 backups
                  ls -t /backup/*.db | tail -n +11 | xargs rm -f
                  echo "Backup complete: $BACKUP_FILE"
              volumeMounts:
                - name: etcd-certs
                  mountPath: /etc/kubernetes/pki/etcd
                  readOnly: true
                - name: backup-dir
                  mountPath: /backup
          volumes:
            - name: etcd-certs
              hostPath:
                path: /etc/kubernetes/pki/etcd
            - name: backup-dir
              hostPath:
                path: /var/etcd-backups
                type: DirectoryOrCreate

Storing Backups in S3

Local disk backups are insufficient — if the control plane node is lost, the backups go with it. Upload every snapshot to S3 (or GCS/Azure Blob) immediately after creation.

#!/bin/bash
# etcd-backup-s3.sh — run as CronJob or systemd timer

set -euo pipefail

BACKUP_FILE="/tmp/etcd-$(date +%Y%m%d-%H%M%S).db"
S3_BUCKET="s3://my-company-etcd-backups"
CLUSTER_NAME="${CLUSTER_NAME:-production}"

# Take snapshot
ETCDCTL_API=3 etcdctl snapshot save "$BACKUP_FILE" \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key

# Verify
ETCDCTL_API=3 etcdctl snapshot status "$BACKUP_FILE"

# Upload to S3
aws s3 cp "$BACKUP_FILE" \
  "${S3_BUCKET}/${CLUSTER_NAME}/$(basename "$BACKUP_FILE")" \
  --sse aws:kms \
  --sse-kms-key-id alias/etcd-backups

# Delete local temp file
rm -f "$BACKUP_FILE"

echo "Backup uploaded to ${S3_BUCKET}/${CLUSTER_NAME}/"

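# Delete backups older than 30 days (date -d and grep -P require GNU tools)
cutoff=$(date -d '30 days ago' +%Y%m%d)
aws s3 ls "${S3_BUCKET}/${CLUSTER_NAME}/" \
  | awk '{print $4}' \
  | while read -r key; do
    # Extract the YYYYMMDD stamp; skip keys that do not carry one
    # (without "|| true", a non-matching key would abort the script under set -e)
    stamp=$(echo "$key" | grep -oP '\d{8}' | head -n 1 || true)
    [[ -n "$stamp" ]] || continue
    # Fixed-width date stamps compare correctly as strings
    if [[ "$stamp" < "$cutoff" ]]; then
      aws s3 rm "${S3_BUCKET}/${CLUSTER_NAME}/$key"
    fi
  done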
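The age test in the retention loop depends only on the date stamp embedded in each object name, so it can be isolated and checked without touching S3. A minimal sketch, assuming the etcd-YYYYMMDD-HHMMSS.db naming used by the script above; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Decide whether a backup object is older than a cutoff, based only on the
# YYYYMMDD stamp in its name (etcd-YYYYMMDD-HHMMSS.db).

is_expired() {
  local key=$1 cutoff=$2
  # Keys without the expected stamp are never treated as expired
  [[ $key =~ etcd-([0-9]{8})-[0-9]{6}\.db$ ]] || return 1
  # Fixed-width digit strings order the same way as the dates they encode
  [[ "${BASH_REMATCH[1]}" < "$cutoff" ]]
}

# Example: is_expired etcd-20260101-060000.db 20260201 && echo "delete"
```

Treating unrecognized names as not expired means a stray object in the bucket is left alone rather than deleted by accident.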

Encryption at rest: Always encrypt etcd backups using S3 server-side encryption with KMS. etcd snapshots contain all Kubernetes Secrets in plaintext (unless you have encryption at rest enabled on the API server). A leaked snapshot reveals every secret in your cluster.

Restoring etcd from a Snapshot

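Restoring etcd is a disruptive operation that requires stopping the API server and the other control plane components. For a cluster with a single control plane node, the procedure is straightforward. For HA clusters with multiple etcd members, you must restore every member from the same snapshot, each into a fresh data directory with its own --name and --initial-advertise-peer-urls but the same --initial-cluster list, before starting any of them.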

# Step 1: Download the snapshot from S3
aws s3 cp s3://my-company-etcd-backups/production/etcd-20260616-060000.db /tmp/restore.db

# Step 2: Move the control plane static pod manifests out of the manifests directory
# (the kubelet stops each pod once its manifest disappears; no systemctl needed)
sudo mv /etc/kubernetes/manifests/kube-apiserver.yaml /tmp/
sudo mv /etc/kubernetes/manifests/kube-controller-manager.yaml /tmp/
sudo mv /etc/kubernetes/manifests/kube-scheduler.yaml /tmp/

# Wait for containers to stop
sleep 30

# Step 3: Stop etcd (by removing its manifest too)
sudo mv /etc/kubernetes/manifests/etcd.yaml /tmp/
sleep 10

# Step 4: Move the old etcd data directory aside (keep it until the restore is verified)
sudo mv /var/lib/etcd /var/lib/etcd.bak

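# Step 5: Restore the snapshot. --name and the peer URLs must match the values in etcd.yaml.
# etcd 3.5 deprecates etcdctl snapshot restore in favor of etcdutl snapshot restore,
# which accepts the same flags shown here.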
sudo ETCDCTL_API=3 etcdctl snapshot restore /tmp/restore.db \
  --data-dir=/var/lib/etcd \
  --name=master-01 \
  --initial-cluster="master-01=https://10.0.0.1:2380" \
  --initial-cluster-token=etcd-cluster-1 \
  --initial-advertise-peer-urls=https://10.0.0.1:2380

# Step 6: Restore the manifests — Kubernetes will restart all control plane pods
sudo mv /tmp/etcd.yaml /etc/kubernetes/manifests/
sleep 15
sudo mv /tmp/kube-apiserver.yaml /etc/kubernetes/manifests/
sudo mv /tmp/kube-controller-manager.yaml /etc/kubernetes/manifests/
sudo mv /tmp/kube-scheduler.yaml /etc/kubernetes/manifests/

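# Step 7: Wait for the API server to answer, then for the nodes to report Ready
# (kubectl wait fails immediately while the API server is still unreachable)
until kubectl get --raw='/readyz' >/dev/null 2>&1; do sleep 5; done
kubectl wait --for=condition=Ready node --all --timeout=300s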
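For an HA cluster, the restore in step 5 is repeated on every member with a shared --initial-cluster value. The sketch below builds that value and prints the command each member would run, without executing anything. The helper names, the members master-02 and master-03, and their addresses are hypothetical; substitute the names and peer URLs from your own etcd.yaml files.

```shell
#!/usr/bin/env bash
# Build the --initial-cluster value shared by every member of an HA restore.
# Arguments are name=ip pairs.

initial_cluster() {
  local pair out=""
  for pair in "$@"; do
    # name is the text before the first "=", ip is the text after it
    out+="${out:+,}${pair%%=*}=https://${pair#*=}:2380"
  done
  echo "$out"
}

# Print the restore command for each member (nothing is executed).
print_restore_commands() {
  local snapshot=$1; shift
  local cluster pair
  cluster=$(initial_cluster "$@")
  for pair in "$@"; do
    printf 'on %s: etcdctl snapshot restore %s --data-dir=/var/lib/etcd --name=%s --initial-cluster=%s --initial-cluster-token=etcd-cluster-1 --initial-advertise-peer-urls=https://%s:2380\n' \
      "${pair%%=*}" "$snapshot" "${pair%%=*}" "$cluster" "${pair#*=}"
  done
}

print_restore_commands /tmp/restore.db \
  master-01=10.0.0.1 master-02=10.0.0.2 master-03=10.0.0.3
```

Every member receives the same snapshot file and the same cluster list; only --name and the advertised peer URL differ.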

Application-Level Backup with Velero

etcd snapshots capture all cluster state but not the data stored on persistent volumes. Velero is an open-source tool that backs up Kubernetes resources together with PersistentVolume data, and it can run pre- and post-backup hooks to help with application consistency. That makes it complementary to etcd snapshots rather than a replacement.

# Install Velero with S3 backend
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.8.0 \
  --bucket my-velero-backups \
  --backup-location-config region=us-east-1 \
  --snapshot-location-config region=us-east-1 \
  --secret-file ./credentials-velero

# Create a scheduled backup of the production namespace
velero schedule create production-daily \
  --schedule="0 2 * * *" \
  --include-namespaces production \
  --ttl 720h    # 30 days retention

# Restore from a specific backup
velero restore create --from-backup production-daily-20260615020000

Testing Your Backup and Restore

An untested backup is not a backup — it is hope. Run a full restore drill at least quarterly in a non-production environment. Document the exact commands, time taken, and any surprises encountered. The worst time to discover your restore procedure is broken is during an actual disaster.

Spin up a temporary cluster (e.g., with kind or a cloud VM)
Restore your production etcd snapshot to it
Verify that key resources are present with kubectl get namespaces and kubectl get deployments,secrets,configmaps -A (kubectl get all omits namespaces, secrets, and configmaps)
Test that your most critical applications can be started from the restored state
Measure the actual recovery time, from snapshot download to cluster ready, and compare it against your Recovery Time Objective (RTO)
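Timing each step of the drill makes the comparison against the RTO concrete. A minimal sketch using bash's SECONDS counter; the helper names and the 1800-second target are assumptions, not values from any standard.

```shell
#!/usr/bin/env bash
# Time individual drill steps and compare the total against an RTO target.
# RTO_TARGET_SECONDS is a hypothetical default; use the figure your team agreed on.
RTO_TARGET_SECONDS=${RTO_TARGET_SECONDS:-1800}

# Run a command, report how long it took, and pass its exit code through.
timed_step() {
  local label=$1; shift
  local start=$SECONDS
  "$@"
  local rc=$?
  printf '%-28s %4ds (exit %d)\n' "$label" "$(( SECONDS - start ))" "$rc"
  return "$rc"
}

# PASS when the elapsed time is within the target, FAIL otherwise.
rto_verdict() {
  local elapsed=$1 target=$2
  if [ "$elapsed" -le "$target" ]; then echo "PASS"; else echo "FAIL"; fi
}

# Example drill skeleton (commands are placeholders for your own steps):
#   drill_start=$SECONDS
#   timed_step "download snapshot" aws s3 cp "$SNAPSHOT_URL" /tmp/restore.db
#   timed_step "restore etcd"      ./restore-etcd.sh /tmp/restore.db
#   rto_verdict "$(( SECONDS - drill_start ))" "$RTO_TARGET_SECONDS"
```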

Monitoring etcd Health

Proactive monitoring catches etcd degradation before it becomes a disaster. Key metrics and alerts to configure:

# Prometheus alerting rules for etcd
groups:
  - name: etcd
    rules:
      - alert: EtcdMemberDown
        expr: up{job="etcd"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "etcd member {{ $labels.instance }} is down"

      - alert: EtcdHighCommitDuration
        expr: histogram_quantile(0.99, rate(etcd_disk_backend_commit_duration_seconds_bucket[5m])) > 0.25
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "etcd commit p99 latency is {{ $value }}s — disk may be slow"

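      # Compare against the configured quota rather than a fixed size:
      # the default quota is 2 GiB, and 8 GiB is the suggested maximum
      - alert: EtcdDatabaseSizeHigh
        expr: etcd_mvcc_db_total_size_in_bytes / etcd_server_quota_backend_bytes > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "etcd database on {{ $labels.instance }} is at {{ $value | humanizePercentage }} of its backend quota"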

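      # etcd_backup_last_success_timestamp is not exported by etcd;
      # the backup job must publish it (Unix time of the last successful run)
      - alert: EtcdNoRecentBackup
        expr: time() - etcd_backup_last_success_timestamp > 86400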
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "No successful etcd backup in the last 24 hours"
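The EtcdNoRecentBackup rule relies on a metric that etcd does not export, so the backup job has to publish it. One possible approach, sketched here, is to write it for node_exporter's textfile collector at the end of a successful run. The function name, the output file name, and the directory are assumptions; the directory must match the one passed to node_exporter with --collector.textfile.directory.

```shell
#!/usr/bin/env bash
# Publish the backup timestamp for node_exporter's textfile collector.

write_backup_metric() {
  local dir=$1
  local tmp
  # The collector only reads *.prom files, so this temp name is ignored
  tmp=$(mktemp "$dir/etcd_backup.prom.XXXXXX")
  {
    echo '# HELP etcd_backup_last_success_timestamp Unix time of the last successful etcd backup.'
    echo '# TYPE etcd_backup_last_success_timestamp gauge'
    echo "etcd_backup_last_success_timestamp $(date +%s)"
  } > "$tmp"
  chmod 644 "$tmp"
  # Rename within the same directory so the collector never reads a partial file
  mv "$tmp" "$dir/etcd_backup.prom"
}

# Example: call only after the snapshot was saved and uploaded successfully
#   write_backup_metric /var/lib/node_exporter/textfile
```

If the metric has never been written, the rule's expression returns no series and the alert stays silent, so it is worth running the backup once before relying on it.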

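Compaction: etcd on its own keeps every historical revision of every key, so an uncompacted database grows without bound. In a Kubernetes cluster the kube-apiserver already requests compaction periodically (every 5 minutes by default, via --etcd-compaction-interval), and you can additionally set --auto-compaction-retention=8 (hours, in the default periodic mode) in the etcd pod arguments. Compaction only frees space inside the database file; run etcdctl defrag on each member, one at a time, to shrink the file on disk.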