Velero: Kubernetes Backup and Disaster Recovery Guide (2026)
Velero is the de facto open-source tool for Kubernetes backup, restore, and migration. It captures cluster state (Kubernetes objects and persistent volume data) and stores it in object storage such as S3, GCS, or Azure Blob. This guide covers installation, scheduling, restoring after a disaster, CSI snapshot integration, and cross-cluster workload migration.
Why Velero for Kubernetes DR
Kubernetes itself does not provide a built-in backup mechanism for workload data or cluster state beyond etcd snapshots. etcd snapshots restore the entire cluster but are impractical for granular namespace or application-level recovery. Velero fills this gap:
- Namespace-scoped backup — back up only the namespaces relevant to an application.
- PV data backup — integrates with cloud provider snapshots and CSI to back up persistent volume data alongside object definitions.
- Cluster migration — move workloads between clusters or cloud providers without manual YAML exports.
- GitOps complement — even with ArgoCD or Flux managing manifests, Velero captures runtime state (Secrets, ConfigMaps, PVC data) that git doesn't store.
Velero Architecture
Velero runs as a Deployment in your cluster and exposes custom resources that define backup behaviour:
- BackupStorageLocation (BSL) — points to an object-storage bucket (S3, GCS, Azure Blob) where backups are written.
- VolumeSnapshotLocation (VSL) — points to a cloud provider snapshot API used to snapshot PV disks.
- Backup — a single on-demand or scheduled backup of selected resources.
- Schedule — a cron-based schedule that creates Backup objects automatically.
- Restore — restores a Backup into the same or a different cluster.
Velero uses plugins for cloud providers. AWS, GCP, and Azure each have an official plugin that handles authentication, BSL access, and volume snapshots.
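To make the BSL resource more concrete, the sketch below writes an illustrative BackupStorageLocation manifest to a local file. The bucket name and region reuse the example values from this guide's install commands, and the file name bsl-default.yaml is an assumption. `velero install` normally creates an equivalent object for you, so treat this as a reference for the shape of the resource rather than a required step.

```shell
# Write an illustrative BackupStorageLocation manifest to a local file.
# Bucket and region are example values, not real infrastructure.
cat > bsl-default.yaml <<'EOF'
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: default
  namespace: velero
spec:
  provider: aws
  objectStorage:
    bucket: my-velero-backups
  config:
    region: us-east-1
EOF

# To apply it to a cluster that already runs Velero:
#   kubectl apply -f bsl-default.yaml
```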
Installing Velero with S3
Install the Velero CLI and deploy the server using an AWS S3 bucket:
# Download and install the Velero CLI (Linux amd64 shown; use the matching release asset for macOS or other architectures)
curl -L https://github.com/vmware-tanzu/velero/releases/download/v1.13.0/velero-v1.13.0-linux-amd64.tar.gz | tar xz
sudo mv velero-v1.13.0-linux-amd64/velero /usr/local/bin/
# Create an S3 bucket and IAM credentials file
# credentials-velero:
# [default]
# aws_access_key_id=AKIAIOSFODNN7EXAMPLE
# aws_secret_access_key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
velero install \
--provider aws \
--plugins velero/velero-plugin-for-aws:v1.9.0 \
--bucket my-velero-backups \
--backup-location-config region=us-east-1 \
--snapshot-location-config region=us-east-1 \
--secret-file ./credentials-velero
Verify the installation:
kubectl get pods -n velero
# NAME READY STATUS RESTARTS
# velero-7d9b6c4f8d-xz4qp 1/1 Running 0
velero backup-location get
# NAME PROVIDER BUCKET/PREFIX PHASE LAST VALIDATED
# default aws my-velero-backups Available 2026-06-11
Creating and Managing Backups
Create an on-demand backup of a namespace:
# Back up the production namespace
velero backup create prod-backup-20260611 \
--include-namespaces production \
--storage-location default \
--volume-snapshot-locations default \
--wait
# Verify the backup
velero backup describe prod-backup-20260611 --details
# List all backups
velero backup get
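The backup name above embeds the date by hand. A small helper along the following lines could generate such names automatically; the function name `backup_name` is hypothetical and not part of Velero.

```shell
# Build a date-stamped backup name such as prod-backup-20260611 (UTC date).
backup_name() {
  printf '%s-%s\n' "$1" "$(date -u +%Y%m%d)"
}

# Usage (requires the velero CLI and a configured cluster):
#   velero backup create "$(backup_name prod-backup)" \
#     --include-namespaces production --wait
```

Using the UTC date keeps names consistent when backups are triggered from machines in different time zones.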
Back up multiple namespaces and exclude certain resources:
velero backup create app-stack-backup \
--include-namespaces frontend,backend,databases \
--exclude-resources pods,events \
--ttl 720h \
--wait
The --ttl flag sets the backup retention period. Once it expires, Velero deletes the backup from object storage along with its associated volume snapshots. The default is 720h (30 days).
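Because TTL values are written as durations in hours, retention policies stated in days need converting. The following is a minimal sketch of that arithmetic; `ttl_from_days` is a hypothetical helper, not a Velero command.

```shell
# Convert a retention period in whole days to a duration string for --ttl.
ttl_from_days() {
  printf '%dh0m0s\n' "$(( $1 * 24 ))"
}

ttl_from_days 7    # prints 168h0m0s
ttl_from_days 30   # prints 720h0m0s

# Usage (requires the velero CLI and a configured cluster):
#   velero backup create app-stack-backup --ttl "$(ttl_from_days 30)"
```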
Back up specific label-selected resources cluster-wide:
velero backup create billing-service-backup \
--selector app=billing \
--include-cluster-resources=true
Scheduling Automated Backups
Schedules use standard cron syntax to create Backup objects automatically:
# Daily backup of production namespace at 2 AM UTC
velero schedule create daily-prod \
--schedule="0 2 * * *" \
--include-namespaces production \
--ttl 168h
# Weekly full-cluster backup on Sunday at midnight
velero schedule create weekly-full \
--schedule="0 0 * * 0" \
--include-cluster-resources=true \
--ttl 720h
# List schedules
velero schedule get
You can also define schedules as Kubernetes custom resources:
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-prod
  namespace: velero
spec:
  schedule: "0 2 * * *"
  template:
    includedNamespaces:
      - production
    includeClusterResources: true
    storageLocation: default
    volumeSnapshotLocations:
      - default
    ttl: 168h0m0s
Restoring from Backup
Restore an entire backup into the original namespace:
# Restore the full backup
velero restore create --from-backup prod-backup-20260611 --wait
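# Check restore status (an unnamed restore is called <backup-name>-<timestamp>; use the name printed by the create command)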
velero restore describe prod-backup-20260611-20260611120000 --details
# List all restores
velero restore get
Restore to a different namespace (useful for staging environments):
velero restore create prod-to-staging \
--from-backup prod-backup-20260611 \
--namespace-mappings production:staging \
--wait
Restore only specific resources from a backup:
velero restore create selective-restore \
--from-backup prod-backup-20260611 \
--include-resources deployments,services,configmaps \
--wait
CSI Volume Snapshots
CSI snapshot integration lets Velero take point-in-time snapshots of persistent volumes using the Kubernetes CSI snapshot API, independent of cloud-provider plugins:
# Install the CSI snapshot controller (if not already present). The VolumeSnapshot CRDs from the same repository are also required.
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/main/deploy/kubernetes/snapshot-controller/rbac-snapshot-controller.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/main/deploy/kubernetes/snapshot-controller/setup-snapshot-controller.yaml
# Install Velero with CSI feature flag
velero install \
--provider aws \
--plugins velero/velero-plugin-for-aws:v1.9.0,velero/velero-plugin-for-csi:v0.7.0 \
--bucket my-velero-backups \
--backup-location-config region=us-east-1 \
--features=EnableCSI \
--secret-file ./credentials-velero
Create a VolumeSnapshotClass and label it so that Velero selects it for the matching CSI driver:
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-aws-vsc
  labels:
    velero.io/csi-volumesnapshot-class: "true"
driver: ebs.csi.aws.com
deletionPolicy: Retain
With CSI enabled, Velero automatically snapshots PVCs during backup and restores them during recovery — no separate volume snapshot location configuration needed.
Cross-Cluster Migration
Velero's most powerful use case is migrating workloads between clusters — for example, from on-prem to cloud, or from one cloud region to another:
# Step 1: On source cluster — install Velero and back up the workload
velero backup create migration-source \
--include-namespaces my-app \
--include-cluster-resources=true \
--wait
# Step 2: On target cluster — install Velero pointing to the SAME S3 bucket
velero install \
--provider aws \
--plugins velero/velero-plugin-for-aws:v1.9.0 \
--bucket my-velero-backups \
--backup-location-config region=us-east-1 \
--secret-file ./credentials-velero
# Step 3: On target cluster — sync and restore
velero backup get # migration-source should appear once Velero has synced with the bucket
velero restore create --from-backup migration-source --wait
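While both clusters share one bucket, it can be useful to stop the target cluster from modifying or expiring the source cluster's backups. One option is to mark the target's BackupStorageLocation as read-only through its `accessMode` field. The sketch below only builds the JSON merge patch; the helper name `bsl_access_mode_patch` is hypothetical, and the BSL name `default` in the usage comment assumes the install commands shown above.

```shell
# Build a JSON merge patch that sets a BackupStorageLocation's access mode.
# Accepted modes are ReadOnly and ReadWrite.
bsl_access_mode_patch() {
  case "$1" in
    ReadOnly|ReadWrite) printf '{"spec":{"accessMode":"%s"}}\n' "$1" ;;
    *) echo "unsupported access mode: $1" >&2; return 1 ;;
  esac
}

# Usage on the target cluster (requires kubectl access):
#   kubectl patch backupstoragelocation default -n velero \
#     --type merge --patch "$(bsl_access_mode_patch ReadOnly)"
```

Restores still work from a read-only location; switch back to ReadWrite once the target cluster should start writing its own backups.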
Production Best Practices
- Test restores regularly: schedule monthly restore drills into a staging namespace to verify backup integrity. An untested backup is not a backup.
- Match schedule and TTL to your recovery objectives: backup frequency bounds how much data you can lose (RPO), while TTL bounds how far back you can restore. Daily backups with a 30-day TTL are a common baseline.
- Back up Velero's own namespace: include the velero namespace in your backups so you can recover Velero itself along with its BSL and VSL configurations.
- Enable resource filtering: exclude ephemeral resources like Pods and Events (--exclude-resources pods,events) to reduce backup size and noise.
- Encrypt your S3 bucket: Velero backups contain Secrets. Enable S3 server-side encryption (SSE-S3 or SSE-KMS) and restrict bucket access via IAM policies.
- Monitor backup status: use velero backup get in a cron job or alerting rule to detect failed backups. Velero exposes Prometheus metrics at :8085/metrics.
- Separate BSL per environment: use distinct S3 buckets or prefixes for production, staging, and dev backups to prevent accidental cross-environment restores.
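The monitoring practice above could be scripted roughly as follows. This is a sketch under assumptions: the function names are hypothetical, and it treats any phase whose name contains "Failed" (such as Failed, PartiallyFailed, or FailedValidation) as unhealthy. The functions only parse text, so they can be tried without a cluster.

```shell
# Succeeds unless the phase name indicates a failure
# (for example Failed, PartiallyFailed, FailedValidation).
backup_phase_ok() {
  case "$1" in
    *Failed*) return 1 ;;
    *) return 0 ;;
  esac
}

# Reads "name<TAB>phase" lines on stdin, prints each failing backup,
# and returns non-zero if at least one was found.
report_unhealthy() {
  bad=0
  tab="$(printf '\t')"
  while IFS="$tab" read -r name phase; do
    [ -n "$name" ] || continue
    if ! backup_phase_ok "$phase"; then
      echo "UNHEALTHY $name $phase"
      bad=1
    fi
  done
  return "$bad"
}

# Usage from a cron job or alerting hook (requires kubectl access):
#   kubectl get backups.velero.io -n velero \
#     -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}' \
#     | report_unhealthy
```

The non-zero exit status lets a cron wrapper or CI job raise an alert without parsing the output.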
FAQ
Does Velero back up etcd directly?
No. Velero uses the Kubernetes API server to enumerate and export objects. It does not access etcd directly. For full cluster recovery without an API server, use etcd snapshots in addition to Velero.
Can Velero back up StatefulSets and their data?
Yes, with CSI snapshots or a cloud provider volume snapshot plugin. Without snapshot integration, Velero backs up the StatefulSet and PVC objects but not the data on the volumes.
How long does a restore take?
Object restoration (YAML re-application) is fast, typically seconds to minutes. PV data restore time depends on volume size and the snapshot provider's restore speed, and can range from minutes to hours for large volumes.
Is Velero free to use?
Yes. Velero is open source (Apache 2.0) and maintained by VMware Tanzu. You pay for the underlying object storage and volume snapshots from your cloud provider, not for Velero itself.