AWS Backup: Centralized Data Protection Strategy
Managing backups across dozens of AWS services — EC2 volumes, RDS databases, DynamoDB tables, EFS file systems, S3 buckets — using individual service controls quickly becomes unmanageable. AWS Backup is a fully managed service that centralizes and automates data protection across AWS services and hybrid environments. This guide covers everything from creating your first backup plan to advanced cross-account, cross-region protection with Audit Manager compliance reporting.
1. Core Concepts: Plans, Vaults, and Rules
Before diving into configuration, it's important to understand the five building blocks of AWS Backup:
- Backup Plan — a policy that defines when to back up, how long to retain backups, and where to store them. A plan contains one or more backup rules.
- Backup Rule — specifies the schedule (cron or rate), backup window, lifecycle (warm/cold storage transition and deletion), and the target vault.
- Backup Vault — an encrypted logical container for recovery points (backups). Each vault has an access policy and an optional vault lock for immutability.
- Recovery Point — a snapshot or backup of a specific resource at a point in time. Identified by a unique ARN.
- Resource Assignment — rules that map AWS resources (by ARN, tag, or resource type) to a backup plan.
Supported Services
As of 2026, AWS Backup supports the following services:
| Service | Resource Types | Notes |
|---|---|---|
| Amazon EC2 | Instances, EBS Volumes | AMI + EBS snapshots |
| Amazon RDS | DB Instances, Clusters | Includes Aurora |
| Amazon DynamoDB | Tables | On-demand and continuous backups |
| Amazon EFS | File Systems | Full and incremental backups |
| Amazon S3 | Buckets | Object-level continuous backup |
| Amazon FSx | Windows, Lustre, NetApp ONTAP | File system backups |
| AWS Storage Gateway | Volumes | Hybrid on-premises volumes |
| Amazon Redshift | Clusters | Automated snapshots |
| Amazon DocumentDB | Clusters | Includes cluster snapshots |
| Amazon Neptune | Clusters | Graph database backups |
| VMware Cloud on AWS | VMs | Hybrid workloads |
2. Creating a Backup Plan
You can create backup plans via the console, CLI, CloudFormation, or Terraform. Here's a production-ready plan using the AWS CLI: daily backups retained for 35 days, plus monthly backups that move to cold storage after 7 days and are retained for 1 year:
# 1. Create a backup vault with KMS encryption
aws backup create-backup-vault \
--backup-vault-name prod-backup-vault \
--encryption-key-arn arn:aws:kms:us-east-1:123456789012:key/mrk-abc123 \
--backup-vault-tags Environment=production,Team=platform
# 2. Create the backup plan JSON
cat > backup-plan.json <<'EOF'
{
"BackupPlanName": "prod-backup-plan",
"Rules": [
{
"RuleName": "daily-backup",
"TargetBackupVaultName": "prod-backup-vault",
"ScheduleExpression": "cron(0 5 ? * * *)",
"StartWindowMinutes": 60,
"CompletionWindowMinutes": 180,
"Lifecycle": {
"DeleteAfterDays": 35
},
"CopyActions": []
},
{
"RuleName": "monthly-backup",
"TargetBackupVaultName": "prod-backup-vault",
"ScheduleExpression": "cron(0 5 1 * ? *)",
"StartWindowMinutes": 60,
"CompletionWindowMinutes": 360,
"Lifecycle": {
"MoveToColdStorageAfterDays": 7,
"DeleteAfterDays": 365
},
"CopyActions": []
}
]
}
EOF
# 3. Create the backup plan
aws backup create-backup-plan \
--backup-plan file://backup-plan.json
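AWS schedule expressions use six fields (minutes, hours, day-of-month, month, day-of-week, year), and one of the two day fields has to be `?`. As a rough pre-flight check before submitting a plan, a sketch like the following could catch the most common mistake. `check_aws_cron` is a hypothetical helper, not part of any AWS SDK, and it only checks the field count and the `?` rule, not the values in each field.

```python
import re


def check_aws_cron(expression: str) -> list[str]:
    """Return a list of problems found in an AWS cron(...) expression.

    Simplified on purpose: verifies only the six-field shape and that
    exactly one of day-of-month / day-of-week is '?'.
    """
    match = re.fullmatch(r"cron\((.*)\)", expression.strip())
    if not match:
        return ["expression must be wrapped in cron(...)"]
    fields = match.group(1).split()
    if len(fields) != 6:
        return [f"expected 6 fields, found {len(fields)}"]
    _minutes, _hours, day_of_month, _month, day_of_week, _year = fields
    problems = []
    # Both set, or both '?', is rejected by this simplified check.
    if (day_of_month == "?") == (day_of_week == "?"):
        problems.append("exactly one of day-of-month and day-of-week must be '?'")
    return problems


for candidate in ("cron(0 5 ? * * *)", "cron(0 5 1 * ? *)", "cron(0 5 1 * * *)"):
    print(candidate, "->", check_aws_cron(candidate) or "ok")
```

The first two expressions are the daily and monthly schedules from the plan; the third sets both day fields and is flagged.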
Use cron(0 5 ? * * *) to run at 05:00 UTC daily. The ? in the day-of-month position means "no specific value"; it is required because an AWS cron expression cannot specify both day-of-month and day-of-week. Set the start window large enough (60+ minutes) to accommodate service throttling during peak backup periods.
Vault Lock for Immutability
For compliance requirements (HIPAA, PCI-DSS, SEC Rule 17a-4), enable Vault Lock to make recovery points immutable during the retention period. In compliance mode, once the cooling-off period has ended, even the root account cannot delete them:
# Enable vault lock in compliance mode with a 14-day cooling-off period
# (omit --changeable-for-days to get governance mode instead)
aws backup put-backup-vault-lock-configuration \
--backup-vault-name prod-backup-vault \
--min-retention-days 7 \
--max-retention-days 365 \
--changeable-for-days 14
# After the 14-day cooling-off period the lock can no longer be changed or removed (IRREVERSIBLE)
# In compliance mode even AWS Support cannot delete recovery points
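Once a vault lock is in place, backup and copy jobs whose retention falls outside the lock's minimum and maximum fail, so it is worth comparing each rule's DeleteAfterDays against the bounds before locking. A minimal sketch of that comparison, using the bounds and retentions from the examples in this guide; `retention_fits_vault_lock` and the `hypothetical-yearly` rule are illustrative names, not AWS APIs.

```python
def retention_fits_vault_lock(delete_after_days: int,
                              min_retention_days: int,
                              max_retention_days: int) -> bool:
    """True when a rule's retention falls inside the vault lock's bounds."""
    return min_retention_days <= delete_after_days <= max_retention_days


# Bounds from the put-backup-vault-lock-configuration call above.
lock = {"min_retention_days": 7, "max_retention_days": 365}

# Rule name -> DeleteAfterDays; the last entry is made up to show a failure.
rules = {"daily-backup": 35, "monthly-backup": 365, "hypothetical-yearly": 730}

for name, days in rules.items():
    verdict = "ok" if retention_fits_vault_lock(days, **lock) else "outside lock bounds"
    print(f"{name}: {days} days -> {verdict}")
```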
3. Assigning Resources and Tag-Based Selection
Resource assignments tell AWS Backup which resources a plan protects. The most scalable approach is tag-based assignment — any resource with a matching tag is automatically included, including resources created after the plan was set up.
# Get the backup plan ID from the create output
PLAN_ID="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
# Create a tag-based resource assignment
aws backup create-backup-selection \
--backup-plan-id $PLAN_ID \
--backup-selection '{
"SelectionName": "prod-tagged-resources",
"IamRoleArn": "arn:aws:iam::123456789012:role/AWSBackupDefaultServiceRole",
"ListOfTags": [
{
"ConditionType": "STRINGEQUALS",
"ConditionKey": "Backup",
"ConditionValue": "daily"
}
],
"NotResources": [],
"Conditions": {
"StringEquals": [
{
"ConditionKey": "aws:ResourceTag/Environment",
"ConditionValue": "production"
}
]
}
}'
With this setup, any resource tagged Backup=daily and Environment=production — whether it's an EC2 instance, RDS cluster, or DynamoDB table — is automatically enrolled in the backup plan. Apply these tags in your infrastructure-as-code (Terraform/CloudFormation) to ensure every new resource is backed up on day one.
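Because enrollment depends entirely on tags, a quick local check of an inventory can reveal resources that would be missed. The sketch below models the matching described above (both tags must be present with exactly these values, case-sensitive) without calling AWS. `is_selected` and the inventory entries are hypothetical.

```python
def is_selected(resource_tags: dict, required_tags: dict) -> bool:
    """True when the resource carries every required tag with the exact value."""
    return all(resource_tags.get(key) == value for key, value in required_tags.items())


REQUIRED = {"Backup": "daily", "Environment": "production"}

# Hypothetical inventory, e.g. exported from Terraform state or a tagging report.
inventory = {
    "orders-db": {"Backup": "daily", "Environment": "production"},
    "staging-db": {"Backup": "daily", "Environment": "staging"},
    "scratch-volume": {"Environment": "production"},
}

not_enrolled = [name for name, tags in inventory.items() if not is_selected(tags, REQUIRED)]
print("Not enrolled:", not_enrolled)
```

Running a check like this in CI is one way to notice a missing or misspelled tag before the first backup window passes.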
AWSBackupDefaultServiceRole is created automatically when you first use AWS Backup in the console. For CLI/IaC workflows, attach the managed policies AWSBackupServiceRolePolicyForBackup and AWSBackupServiceRolePolicyForRestores to a custom role.
4. Cross-Region and Cross-Account Backup
A single-region backup does not protect against an AWS region outage. Cross-region copy creates a second recovery point in another region automatically as part of the backup rule. For even stronger isolation, cross-account copy places backups in a separate AWS account where a compromised production account cannot delete them.
Cross-Region Copy
{
"RuleName": "daily-with-cross-region-copy",
"TargetBackupVaultName": "prod-backup-vault",
"ScheduleExpression": "cron(0 5 ? * * *)",
"StartWindowMinutes": 60,
"CompletionWindowMinutes": 480,
"Lifecycle": {
"DeleteAfterDays": 35
},
"CopyActions": [
{
"DestinationBackupVaultArn": "arn:aws:backup:eu-west-1:123456789012:backup-vault:prod-backup-vault-dr",
"Lifecycle": {
"DeleteAfterDays": 35
}
}
]
}
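When plans are generated from code, the copy action can be appended programmatically rather than edited by hand. A small sketch under that assumption: `with_cross_region_copy` is a hypothetical helper that builds the destination vault ARN from its parts and returns a modified copy of the rule, leaving the input untouched.

```python
import copy


def with_cross_region_copy(rule: dict, account_id: str, region: str,
                           vault_name: str, delete_after_days: int) -> dict:
    """Return a copy of `rule` with one extra CopyAction appended."""
    updated = copy.deepcopy(rule)
    destination_arn = f"arn:aws:backup:{region}:{account_id}:backup-vault:{vault_name}"
    updated.setdefault("CopyActions", []).append({
        "DestinationBackupVaultArn": destination_arn,
        "Lifecycle": {"DeleteAfterDays": delete_after_days},
    })
    return updated


daily_rule = {
    "RuleName": "daily-with-cross-region-copy",
    "TargetBackupVaultName": "prod-backup-vault",
    "ScheduleExpression": "cron(0 5 ? * * *)",
    "Lifecycle": {"DeleteAfterDays": 35},
}

dr_rule = with_cross_region_copy(daily_rule, "123456789012", "eu-west-1",
                                 "prod-backup-vault-dr", delete_after_days=35)
print(dr_rule["CopyActions"][0]["DestinationBackupVaultArn"])
```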
Cross-Account Copy with AWS Organizations
Cross-account copy requires both accounts to belong to the same organization in AWS Organizations, the cross-account backup feature to be enabled from the management account, and a vault access policy in the destination account that lets the source account copy into it:
# Step 1: In the Organizations management account, enable cross-account backup
aws backup update-global-settings \
--global-settings isCrossAccountBackupEnabled=true
# Step 2: In the backup (destination) account, allow the production account to copy into the vault
aws backup put-backup-vault-access-policy \
--backup-vault-name backup-account-vault \
--policy '{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::PROD-ACCOUNT-ID:root"
},
"Action": [
"backup:CopyIntoBackupVault"
],
"Resource": "*"
}
]
}'
# Step 3: In the production account's backup plan, reference the cross-account vault ARN in CopyActions
# "DestinationBackupVaultArn": "arn:aws:backup:us-east-1:BACKUP-ACCOUNT-ID:backup-vault:backup-account-vault"
# Step 4: Send job notifications from the source vault to SNS so failed copy jobs are noticed
aws backup put-backup-vault-notifications \
--backup-vault-name prod-backup-vault \
--sns-topic-arn arn:aws:sns:us-east-1:123456789012:backup-notifications \
--backup-vault-events BACKUP_JOB_STARTED BACKUP_JOB_COMPLETED BACKUP_JOB_FAILED \
COPY_JOB_FAILED RESTORE_JOB_FAILED
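If several production accounts copy into the same backup vault, the vault access policy shown earlier in this section can be generated rather than hand-edited. A minimal sketch, assuming the same single-statement policy with one root principal per source account; `vault_copy_policy` and the account IDs are placeholders.

```python
import json


def vault_copy_policy(source_account_ids: list[str]) -> str:
    """Build a vault access policy that lets the given accounts copy into the vault."""
    principals = [f"arn:aws:iam::{account_id}:root" for account_id in source_account_ids]
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": principals},
            "Action": ["backup:CopyIntoBackupVault"],
            "Resource": "*",
        }],
    }
    return json.dumps(policy, indent=2)


# Placeholder account IDs; substitute your own source accounts.
policy_json = vault_copy_policy(["111111111111", "222222222222"])
print(policy_json)
```

The resulting string can be passed to the --policy argument of put-backup-vault-access-policy.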
5. Service-Specific Backup Configurations
While AWS Backup provides a unified interface, each service has nuances worth understanding.
EC2 Instance Backup
EC2 backups create an AMI (Amazon Machine Image) plus EBS snapshots for all attached volumes. By default these backups are crash-consistent. For application consistency, enable VSS in the plan's advanced backup settings on Windows, or quiesce the application yourself on Linux, for example with an SSM document that runs just before the backup window:
# Tag EC2 instances so the tag-based selection picks them up
# (tag keys starting with "aws:" are reserved and cannot be set by users)
aws ec2 create-tags \
--resources i-0abc1234def567890 \
--tags Key=Backup,Value=daily \
Key=Environment,Value=production
# For application-consistent backups on Linux, create an SSM document that quiesces the app.
# AWS Backup does not invoke it for you; schedule it to run shortly before the backup window
aws ssm create-document \
--name "PreBackupHook-AppQuiesce" \
--document-type "Command" \
--content '{
"schemaVersion": "2.2",
"description": "Quiesce application before backup",
"mainSteps": [
{
"action": "aws:runShellScript",
"name": "quiesceApp",
"inputs": {
"runCommand": [
"systemctl stop myapp || true",
"sync && echo 3 > /proc/sys/vm/drop_caches"
]
}
}
]
}'
RDS and Aurora Continuous Backup
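AWS Backup integrates with RDS automated backups and adds centralized management and cross-account copy. For point-in-time recovery (PITR) of RDS and Aurora, set EnableContinuousBackup on the backup rule. DynamoDB has its own PITR switch, shown first below, followed by a check of the recovery points AWS Backup holds for an RDS instance: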
# Enable continuous backup for a DynamoDB table (PITR)
aws dynamodb update-continuous-backups \
--table-name orders-prod \
--point-in-time-recovery-specification PointInTimeRecoveryEnabled=true
# For RDS, verify backup retention via AWS Backup
aws backup list-recovery-points-by-resource \
--resource-arn arn:aws:rds:us-east-1:123456789012:db:myapp-prod \
--query 'RecoveryPoints[*].{Created:CreationDate,Status:Status,Size:BackupSizeInBytes}' \
--output table
EFS Backup
EFS backups use AWS Backup natively (the old EFS backup service was deprecated). Unlike EBS snapshots, EFS recovery points are stored in AWS Backup-managed S3 storage and support incremental backups after the first full backup:
# Verify EFS backup is enabled (automatic backup creates a plan)
aws efs describe-backup-policy \
--file-system-id fs-0abc1234
# Response shows ENABLED or DISABLED
# To enable automatic EFS backup:
aws efs put-backup-policy \
--file-system-id fs-0abc1234 \
--backup-policy Status=ENABLED
S3 Continuous Backup
S3 backup in AWS Backup provides object-level point-in-time recovery — useful when S3 Versioning alone isn't sufficient (for example, when you need to restore a bucket to a specific timestamp across millions of objects):
import boto3
backup = boto3.client('backup')
# Create a backup plan with S3 continuous backup rule
response = backup.create_backup_plan(
BackupPlan={
'BackupPlanName': 's3-continuous-backup',
'Rules': [
{
'RuleName': 's3-pitr-rule',
'TargetBackupVaultName': 'prod-backup-vault',
'ScheduleExpression': 'cron(0 5 ? * * *)',
'EnableContinuousBackup': True, # Enables PITR for S3
'Lifecycle': {
'DeleteAfterDays': 35
}
}
]
}
)
plan_id = response['BackupPlanId']
print(f"S3 backup plan created: {plan_id}")
# Assign specific S3 bucket
backup.create_backup_selection(
BackupPlanId=plan_id,
BackupSelection={
'SelectionName': 's3-prod-buckets',
'IamRoleArn': 'arn:aws:iam::123456789012:role/AWSBackupDefaultServiceRole',
'Resources': [
'arn:aws:s3:::my-prod-data-bucket'
]
}
)
print("S3 bucket assigned to backup plan")
6. AWS Backup Audit Manager
AWS Backup Audit Manager evaluates your backup activity against compliance controls and generates reports. It answers questions like "Do all production resources have a backup plan?" and "Were any backups deleted within their retention period?"
Built-in Controls
Audit Manager provides pre-built controls you can enable without writing custom rules:
| Control | What It Checks |
|---|---|
| BACKUP_PLAN_MIN_FREQUENCY_AND_MIN_RETENTION_CHECK | Resources backed up at least N times per period with Y-day retention |
| BACKUP_RECOVERY_POINT_ENCRYPTED | All recovery points are KMS-encrypted |
| BACKUP_RECOVERY_POINT_MINIMUM_RETENTION_CHECK | Recovery points meet minimum retention requirement |
| BACKUP_RESOURCES_PROTECTED_BY_BACKUP_PLAN | Tagged resources have an active backup plan |
| BACKUP_RECOVERY_POINT_MANUAL_DELETION_DISABLED | Vault access policies prevent manual deletion of recovery points |
| RESTORE_TIME_FOR_RESOURCES_MEET_TARGET | Restore jobs complete within the target restore time |
# Create a backup framework with compliance controls
# (framework and report plan names may contain only letters, numbers, and underscores)
aws backup create-framework \
--framework-name "prod_compliance_framework" \
--framework-controls '[
{
"ControlName": "BACKUP_RESOURCES_PROTECTED_BY_BACKUP_PLAN",
"ControlInputParameters": [],
"ControlScope": {
"ComplianceResourceTypes": ["RDS", "DynamoDB", "EFS", "EC2"],
"Tags": {"Environment": "production"}
}
},
{
"ControlName": "BACKUP_RECOVERY_POINT_ENCRYPTED",
"ControlInputParameters": []
},
{
"ControlName": "BACKUP_RECOVERY_POINT_MINIMUM_RETENTION_CHECK",
"ControlInputParameters": [
{
"ParameterName": "requiredRetentionDays",
"ParameterValue": "7"
}
]
}
]'
# Create a daily compliance report
# (replace the framework ARN below with the FrameworkArn returned by create-framework)
aws backup create-report-plan \
--report-plan-name "daily_backup_compliance" \
--report-delivery-channel '{
"S3BucketName": "techoral-backup-reports",
"S3KeyPrefix": "compliance/",
"Formats": ["CSV", "JSON"]
}' \
--report-setting '{
"ReportTemplate": "RESOURCE_COMPLIANCE_REPORT",
"FrameworkArns": ["arn:aws:backup:us-east-1:123456789012:framework/prod_compliance_framework"]
}' \
--report-plan-tags Environment=production
7. Restore Testing and Validation
A backup that has never been tested is not a backup; it's an assumption. AWS Backup restore testing (launched in late 2023) automates restore validation on a schedule. It restores a recovery point as a temporary resource, waits for an optional validation result that you supply, then deletes the restored resource automatically.
# Create a restore testing plan (plan and selection names may contain only letters, numbers, and underscores)
aws backup create-restore-testing-plan \
--restore-testing-plan '{
"RestoreTestingPlanName": "weekly_restore_validation",
"ScheduleExpression": "cron(0 8 ? * 1 *)",
"StartWindowHours": 2,
"RecoveryPointSelection": {
"Algorithm": "LATEST_WITHIN_WINDOW",
"IncludeVaults": [
"arn:aws:backup:us-east-1:123456789012:backup-vault:prod-backup-vault"
],
"RecoveryPointTypes": ["SNAPSHOT", "CONTINUOUS"],
"SelectionWindowDays": 7
}
}'
# Add an RDS restore testing selection with validation
aws backup create-restore-testing-selection \
--restore-testing-plan-name "weekly_restore_validation" \
--restore-testing-selection '{
"RestoreTestingSelectionName": "rds_restore_test",
"ProtectedResourceType": "RDS",
"IamRoleArn": "arn:aws:iam::123456789012:role/RestoreTestingRole",
"ProtectedResourceArns": [
"arn:aws:rds:us-east-1:123456789012:db:myapp-prod"
],
"RestoreMetadataOverrides": {
"DBInstanceIdentifier": "restore-test-{{random}}",
"MultiAZ": "false",
"DBInstanceClass": "db.t3.small"
},
"ValidationWindowHours": 4
}'
After the restore completes, AWS Backup keeps the restored resource for the validation window and waits for a result. Trigger a Lambda function from the EventBridge "Restore Job State Change" event, perform application-level checks, and report the outcome with PutRestoreValidationResult:
import json

import boto3

backup = boto3.client('backup')
rds = boto3.client('rds')


def lambda_handler(event, context):
    """
    AWS Backup restore testing validation Lambda.
    Triggered by the EventBridge restore job event after the restore completes.
    Reports SUCCESSFUL or FAILED back to AWS Backup.
    """
    detail = event.get('detail', {})
    restore_job_id = detail.get('restoreJobId')
    restored_resource_arn = detail.get('createdResourceArn')
    if not restore_job_id or not restored_resource_arn:
        raise ValueError('event is missing restoreJobId or createdResourceArn')

    # Parse the restored RDS instance identifier from ARN
    # arn:aws:rds:us-east-1:123456789012:db:restore-test-abc123
    db_identifier = restored_resource_arn.split(':')[-1]

    # Wait for the restored instance to be available
    # (set the Lambda timeout with this wait in mind)
    waiter = rds.get_waiter('db_instance_available')
    waiter.wait(DBInstanceIdentifier=db_identifier)

    # Describe instance to verify it came up correctly
    response = rds.describe_db_instances(DBInstanceIdentifier=db_identifier)
    instance = response['DBInstances'][0]
    checks = {
        'status_available': instance['DBInstanceStatus'] == 'available',
        'encrypted': instance['StorageEncrypted'],
        'engine_correct': instance['Engine'] == 'postgres',
        'storage_sufficient': instance['AllocatedStorage'] >= 50
    }
    status = 'SUCCESSFUL' if all(checks.values()) else 'FAILED'
    print(json.dumps({
        'restoreJobId': restore_job_id,
        'restoredArn': restored_resource_arn,
        'checks': checks,
        'result': status
    }))

    # Returning a value is not enough: the result must be sent to AWS Backup
    backup.put_restore_validation_result(
        RestoreJobId=restore_job_id,
        ValidationStatus=status,
        ValidationStatusMessage=json.dumps(checks)
    )
    return {'validationStatus': status}
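The instance checks in the handler are easier to trust if they can be exercised without AWS. One possible arrangement, sketched here, pulls them into a pure function that takes a dictionary shaped like one entry of the describe_db_instances response. `evaluate_instance`, its default thresholds, and the sample instance are illustrative assumptions.

```python
def evaluate_instance(instance: dict, expected_engine: str = "postgres",
                      min_storage_gib: int = 50) -> dict:
    """Run the same four checks as the handler on a plain dictionary."""
    return {
        "status_available": instance.get("DBInstanceStatus") == "available",
        "encrypted": bool(instance.get("StorageEncrypted")),
        "engine_correct": instance.get("Engine") == expected_engine,
        "storage_sufficient": instance.get("AllocatedStorage", 0) >= min_storage_gib,
    }


# Made-up instance description: healthy, but smaller than the 50 GiB floor.
sample = {
    "DBInstanceStatus": "available",
    "StorageEncrypted": True,
    "Engine": "postgres",
    "AllocatedStorage": 20,
}

checks = evaluate_instance(sample)
print(checks)
print("passed" if all(checks.values()) else "failed")
```

A unit test can then feed in failing shapes (unencrypted, wrong engine, missing keys) without restoring anything.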
8. Cost Optimization Tips
AWS Backup costs are primarily driven by storage of recovery points. Here are proven strategies to minimize cost without compromising protection:
1. Use Cold Storage Tiering
Move recovery points to cold storage (up to 50% cheaper than warm) after the hot-restore window has passed. Typical pattern: warm for 7–30 days, cold for remainder:
{
"Lifecycle": {
"MoveToColdStorageAfterDays": 7,
"DeleteAfterDays": 365
}
}
Note: recovery points must stay in cold storage for at least 90 days, so DeleteAfterDays must be at least 90 days greater than MoveToColdStorageAfterDays. If your retention does not leave room for those 90 days, skip the cold tier.
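That constraint is easy to check before a plan is submitted. A minimal sketch, assuming the Lifecycle block is available as a dictionary with the same keys used in the plan JSON; `lifecycle_problems` is a hypothetical helper and covers only this one rule.

```python
MIN_DAYS_IN_COLD = 90  # minimum time a recovery point must spend in cold storage


def lifecycle_problems(lifecycle: dict) -> list[str]:
    """Return problems with the cold-storage timing of a Lifecycle block."""
    move = lifecycle.get("MoveToColdStorageAfterDays")
    delete = lifecycle.get("DeleteAfterDays")
    if move is None or delete is None:
        return []  # no cold tier, or no expiry: nothing to compare
    earliest_delete = move + MIN_DAYS_IN_COLD
    if delete < earliest_delete:
        return [
            f"DeleteAfterDays must be at least {earliest_delete} when "
            f"MoveToColdStorageAfterDays is {move} (got {delete})"
        ]
    return []


print(lifecycle_problems({"MoveToColdStorageAfterDays": 7, "DeleteAfterDays": 365}))
print(lifecycle_problems({"MoveToColdStorageAfterDays": 30, "DeleteAfterDays": 35}))
print(lifecycle_problems({"DeleteAfterDays": 35}))
```

The first and third lifecycles pass; the second leaves only 5 days in cold storage and is flagged.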
2. Deduplicate with Incremental Backups
EFS and S3 backups are automatically incremental: only changed data is stored after the first full backup. EC2 (EBS) snapshots are also incremental at the block level; only the first snapshot is full, and subsequent ones store only changed blocks. DynamoDB on-demand backups are full copies of the table, so their retention period has a direct effect on storage cost.
3. Right-Size Retention by Tier
| Resource Tier | Recommended Retention | Rationale |
|---|---|---|
| Tier 1 (Critical prod DB) | 35 days warm + 1 year cold | Compliance + disaster recovery |
| Tier 2 (Application servers) | 14 days warm | Rollback window for deployments |
| Tier 3 (Dev/test environments) | 7 days warm, no cold | Cost control — dev data is recreatable |
| Tier 4 (Ephemeral) | Exclude from backup | Stateless instances do not need backup |
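The tier table can be kept next to the infrastructure code as a lookup from tier to Lifecycle block. A sketch under stated assumptions: the tier keys are invented labels, and "35 days warm + 1 year cold" is read here as a move to cold storage at day 35 and deletion at day 400 (35 + 365), which is one interpretation of the table.

```python
# Tier label -> Lifecycle block for a backup rule; None means "do not back up".
RETENTION_BY_TIER = {
    "tier1": {"MoveToColdStorageAfterDays": 35, "DeleteAfterDays": 400},
    "tier2": {"DeleteAfterDays": 14},
    "tier3": {"DeleteAfterDays": 7},
    "tier4": None,
}


def lifecycle_for(tier: str):
    """Return a fresh Lifecycle dict for the tier, or None if it is excluded."""
    if tier not in RETENTION_BY_TIER:
        raise ValueError(f"unknown tier: {tier!r}")
    lifecycle = RETENTION_BY_TIER[tier]
    return dict(lifecycle) if lifecycle is not None else None


for tier in ("tier1", "tier2", "tier3", "tier4"):
    print(tier, "->", lifecycle_for(tier))
```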
4. Monitor and Alert on Backup Size Growth
# Create a CloudWatch alarm on completed backup jobs
# (job count is only a proxy for growth; this metric does not measure stored bytes)
aws cloudwatch put-metric-alarm \
--alarm-name "backup-job-count-high" \
--metric-name "NumberOfBackupJobsCompleted" \
--namespace "AWS/Backup" \
--dimensions Name=BackupVaultName,Value=prod-backup-vault \
--statistic Sum \
--period 86400 \
--threshold 1000 \
--comparison-operator GreaterThanThreshold \
--evaluation-periods 1 \
--alarm-actions arn:aws:sns:us-east-1:123456789012:backup-alerts \
--alarm-description "Alert when daily backup job count exceeds 1000"
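# Use Cost Explorer to track AWS Backup costs grouped by the Environment tag
# (Environment must be activated as a cost allocation tag; the End date is exclusive)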
aws ce get-cost-and-usage \
--time-period Start=2026-06-01,End=2026-06-07 \
--granularity DAILY \
--filter '{"Dimensions":{"Key":"SERVICE","Values":["AWS Backup"]}}' \
--metrics BlendedCost \
--group-by Type=TAG,Key=Environment
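The Cost Explorer response nests amounts under ResultsByTime, Groups, and Metrics, with tag groups keyed as "TagKey$TagValue". A short sketch for totalling cost per tag value from a saved response; `cost_by_tag` is a hypothetical helper and the sample figures are invented purely to show the shape.

```python
from collections import defaultdict
from decimal import Decimal


def cost_by_tag(response: dict, metric: str = "BlendedCost") -> dict:
    """Sum the given metric per group key across all time periods."""
    totals = defaultdict(Decimal)
    for period in response.get("ResultsByTime", []):
        for group in period.get("Groups", []):
            tag = group["Keys"][0]  # e.g. "Environment$production"
            totals[tag] += Decimal(group["Metrics"][metric]["Amount"])
    return dict(totals)


# Invented sample in the shape returned by get-cost-and-usage.
sample_response = {
    "ResultsByTime": [
        {"TimePeriod": {"Start": "2026-06-01", "End": "2026-06-02"},
         "Groups": [
             {"Keys": ["Environment$production"],
              "Metrics": {"BlendedCost": {"Amount": "1.50", "Unit": "USD"}}},
             {"Keys": ["Environment$staging"],
              "Metrics": {"BlendedCost": {"Amount": "0.25", "Unit": "USD"}}},
         ]},
        {"TimePeriod": {"Start": "2026-06-02", "End": "2026-06-03"},
         "Groups": [
             {"Keys": ["Environment$production"],
              "Metrics": {"BlendedCost": {"Amount": "1.75", "Unit": "USD"}}},
             {"Keys": ["Environment$staging"],
              "Metrics": {"BlendedCost": {"Amount": "0.25", "Unit": "USD"}}},
         ]},
    ]
}

for tag, total in cost_by_tag(sample_response).items():
    print(f"{tag}: {total} USD")
```

Decimal is used instead of float so that summed currency amounts do not pick up rounding noise.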
5. Delete Orphaned Recovery Points
When you delete a resource (EC2 instance, RDS database), the recovery points remain in the vault and continue to accrue storage costs. A cleanup script, run on a schedule or from an EventBridge rule that fires when resources are deleted, can find and remove them:
import boto3


def cleanup_orphaned_recovery_points(vault_name: str, dry_run: bool = True):
    """Find and delete recovery points whose source resources no longer exist."""
    backup = boto3.client('backup')
    paginator = backup.get_paginator('list_recovery_points_by_backup_vault')
    orphaned = []
    for page in paginator.paginate(BackupVaultName=vault_name):
        for rp in page['RecoveryPoints']:
            resource_arn = rp['ResourceArn']
            rp_arn = rp['RecoveryPointArn']
            # Check if source resource still exists.
            # Caveat: this asks AWS Backup, which may keep reporting a resource
            # while it still has recovery points. If nothing is ever flagged,
            # swap in a service-specific lookup (e.g. ec2.describe_instances).
            try:
                backup.describe_protected_resource(ResourceArn=resource_arn)
            except backup.exceptions.ResourceNotFoundException:
                orphaned.append(rp_arn)
                print(f"Orphaned: {rp_arn} (source: {resource_arn})")
                if not dry_run:
                    backup.delete_recovery_point(
                        BackupVaultName=vault_name,
                        RecoveryPointArn=rp_arn
                    )
                    print(f"  Deleted: {rp_arn}")
    print(f"\nTotal orphaned recovery points: {len(orphaned)}")
    return orphaned


# Run in dry-run mode first
if __name__ == '__main__':
    cleanup_orphaned_recovery_points('prod-backup-vault', dry_run=True)