AWS EFS and FSx: Managed File Systems for Every Workload (2026)
Shared file storage is one of the most misunderstood corners of AWS. Engineers reach for S3 by reflex, then discover their application needs a POSIX filesystem. They create an EBS volume, then realise it can only attach to one EC2 instance at a time. AWS actually offers a rich portfolio of managed file storage services — EFS for elastic NFS, and four FSx variants (Windows File Server, Lustre, NetApp ONTAP, OpenZFS) — each purpose-built for different workload profiles. Choosing the wrong one costs you performance, money, and nights on-call. This guide cuts through the marketing and gives you the decision framework, the configuration details, and the working code to deploy each option correctly.
Table of Contents
- EFS vs FSx vs EBS vs S3 — When to Use Which
- EFS Deep Dive: Performance and Throughput Modes
- EFS Setup: CLI, Terraform, and Mount Configuration
- EFS with ECS and EKS
- FSx for Windows File Server
- FSx for Lustre — HPC and ML Workloads
- FSx for NetApp ONTAP — Multi-Protocol Enterprise Storage
- FSx for OpenZFS — ZFS Snapshots and Clones
- Backup and Disaster Recovery
- Cost Optimization and TCO
EFS vs FSx vs EBS vs S3 — When to Use Which
Before diving into configuration, you need the right mental model. AWS offers four distinct storage paradigms: block (EBS), object (S3), elastic NFS (EFS), and managed file servers (FSx family). Each has fundamentally different consistency models, access patterns, and performance characteristics. Getting this decision right at architecture time prevents painful migrations later.
Here is the definitive comparison across the dimensions that actually matter in production:
| Dimension | EBS | EFS | FSx (Windows) | FSx (Lustre) | FSx (ONTAP) | S3 |
|---|---|---|---|---|---|---|
| Protocol | Block device | NFSv4.1 | SMB 3.0 | Lustre (POSIX) | NFS + SMB + iSCSI | HTTP REST |
| Multi-attach | io1/io2 only (limited) | Yes (thousands) | Yes (SMB shares) | Yes (Lustre clients) | Yes | Yes |
| Latency | <1ms | 1–3ms | 1–3ms | sub-ms | sub-ms | 10–100ms |
| Max throughput | 4 GB/s (io2 Block Express) | 10+ GB/s (elastic) | 2 GB/s | 1000+ GB/s | 4 GB/s | Unlimited |
| POSIX compliant | Yes | Yes | Partial | Yes | Yes | No |
| Windows ACLs | No | No | Yes (AD-integrated) | No | Yes | No |
| Elastic capacity | No (manual resize) | Yes (auto) | Manual | Manual | Manual | Unlimited |
| Best for | OS disk, DB primary | Shared web content, CMS, containers | Windows apps, DFS | HPC, ML training, genomics | Enterprise NAS migration | Backup, static assets, data lake |
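The "Best for" row can be read as a simple lookup. The sketch below is only an illustration of that reading: `pick_storage` and its keywords are hypothetical names, not part of any AWS tool, and real decisions usually weigh several rows of the table at once.

```shell
# Hypothetical helper: map one primary requirement to the service listed
# in the "Best for" row of the comparison table. Keywords are illustrative.
pick_storage() {
  case "$1" in
    boot-disk|database)      echo "EBS" ;;
    shared-nfs|containers)   echo "EFS" ;;
    windows-smb)             echo "FSx for Windows File Server" ;;
    hpc|ml-training)         echo "FSx for Lustre" ;;
    nas-migration)           echo "FSx for NetApp ONTAP" ;;
    object|data-lake|backup) echo "S3" ;;
    *) echo "unknown requirement: $1" >&2; return 1 ;;
  esac
}

pick_storage ml-training    # prints: FSx for Lustre
pick_storage containers     # prints: EFS
```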
EFS Deep Dive: Performance and Throughput Modes
Amazon EFS is a fully managed elastic NFS filesystem. "Elastic" here means you do not provision capacity upfront — the file system grows and shrinks automatically as you add or remove files, and you pay only for what you store. You can mount the same EFS filesystem from thousands of EC2 instances, ECS tasks, and EKS pods simultaneously across multiple Availability Zones.
Performance Modes
EFS offers two performance modes, selected at creation time and not changeable later:
- General Purpose (default) — lowest per-operation latency (1–3ms). Ideal for web serving, content management, home directories, development environments. Supports up to 35,000 read IOPS and 7,000 write IOPS. This is the right choice for 99% of workloads.
- Max I/O — designed for workloads with thousands of concurrent NFS clients (big data analytics, media processing). Sacrifices per-operation latency (slightly higher) in exchange for aggregate throughput that scales to hundreds of thousands of IOPS. Not recommended unless you have a specific benchmark showing General Purpose is the bottleneck — Max I/O adds latency that hurts interactive workloads.
Throughput Modes
Throughput mode determines how much read/write bandwidth your file system can deliver:
- Bursting Throughput (legacy default) — throughput scales with storage size. You earn burst credits at 50 MB/s per TB stored, and can burst to 100 MB/s per TB (minimum 100 MB/s regardless of size). Works well for bursty, low-duty-cycle workloads that are not continuously streaming at full throughput.
- Provisioned Throughput — you manually specify a throughput value (1 MB/s to 3 GB/s) independent of storage size. Use this when your workload needs more throughput than the bursting model provides, but you have predictable requirements. You pay for provisioned throughput above what your storage size earns.
- Elastic Throughput (recommended) — automatically scales throughput up and down based on workload demand. No manual provisioning, no burst credits to manage. Can deliver up to 10 GB/s reads and 3 GB/s writes in General Purpose mode. Pay per GB transferred. This is now the recommended mode for almost all new file systems.
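To see what the Bursting figures mean for a given file system, the arithmetic can be sketched in a few lines of shell. This assumes the numbers quoted above (50 MB/s of baseline per TB, bursts of 100 MB/s per TB, a 100 MB/s burst floor) and a simplified steady-state model in which the sustainable share of time at full burst is baseline divided by burst. `efs_bursting_profile` is a hypothetical helper name.

```shell
# Print "<baseline MB/s> <burst MB/s> <percent of time sustainable at full burst>"
# for a file system holding $1 TB, using the Bursting Throughput figures above.
efs_bursting_profile() {
  awk -v tb="$1" 'BEGIN {
    baseline = 50 * tb               # rate at which burst credits accrue
    burst = 100 * tb                 # rate at which credits are spent while bursting
    if (burst < 100) burst = 100     # small file systems can still burst to 100 MB/s
    duty = 100 * baseline / burst    # simplified steady-state burst duty cycle
    printf "%.0f %.0f %.0f\n", baseline, burst, duty
  }'
}

efs_bursting_profile 0.5   # prints: 25 100 25
efs_bursting_profile 4     # prints: 200 400 50
```

Under this model a 500 GB file system can sustain its 100 MB/s burst only about a quarter of the time, which is the usual reason small, busy file systems are moved to Elastic or Provisioned Throughput.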
Storage Classes
EFS has two storage classes and lifecycle management that automatically moves files between them:
- Standard: frequently accessed data. Replicated across multiple AZs (Standard) or within a single AZ (One Zone). Standard costs ~$0.30/GB-month.
- Infrequent Access (IA): lower storage cost (~$0.025/GB-month for Standard-IA) plus a per-GB fee whenever the data is read or written. EFS lifecycle management moves files to IA after a configurable idle period such as 7, 14, 30, 60, or 90 days. Files return to Standard on first access only if the TransitionToPrimaryStorageClass policy is set to AFTER_1_ACCESS; otherwise they stay in IA and are served from there.
# View current lifecycle configuration
aws efs describe-lifecycle-configuration \
--file-system-id fs-0abc123def456789
# Set 30-day IA transition + immediate move back on access
aws efs put-lifecycle-configuration \
--file-system-id fs-0abc123def456789 \
--lifecycle-policies \
'[{"TransitionToIA":"AFTER_30_DAYS"},{"TransitionToPrimaryStorageClass":"AFTER_1_ACCESS"}]'
EFS Setup: CLI, Terraform, and Mount Configuration
Creating an EFS filesystem is simple, but creating it correctly — with the right throughput mode, encryption, mount targets in all AZs, and security groups — requires attention to several details.
CLI Setup
# 1. Create the file system (Elastic Throughput, encrypted)
aws efs create-file-system \
--performance-mode generalPurpose \
--throughput-mode elastic \
--encrypted \
--kms-key-id arn:aws:kms:us-east-1:123456789:key/mrk-abc123 \
--tags Key=Name,Value=prod-shared-efs Key=Env,Value=production
# Output: {"FileSystemId": "fs-0abc123def456789", ...}
# 2. Create mount targets in each AZ's subnet
for SUBNET in subnet-aaa111 subnet-bbb222 subnet-ccc333; do
aws efs create-mount-target \
--file-system-id fs-0abc123def456789 \
--subnet-id $SUBNET \
--security-groups sg-efs-clients
done
# 3. Create a security group rule — allow NFS (2049) from app SG
aws ec2 authorize-security-group-ingress \
--group-id sg-efs-clients \
--protocol tcp \
--port 2049 \
--source-group sg-app-servers
# 4. Mount on an EC2 instance (install amazon-efs-utils first)
sudo yum install -y amazon-efs-utils
sudo mkdir /mnt/efs
sudo mount -t efs -o tls fs-0abc123def456789:/ /mnt/efs
# 5. Persistent mount via /etc/fstab
echo "fs-0abc123def456789:/ /mnt/efs efs _netdev,tls,iam 0 0" | sudo tee -a /etc/fstab
amazon-efs-utils with TLS. The -o tls option encrypts data in transit using TLS 1.2. Without it, NFS traffic is plaintext on the wire inside your VPC. The iam option enforces IAM-based NFS authorization on top of standard POSIX permissions — combine with EFS resource policies for zero-trust access control.
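A resource policy of the kind mentioned above might look like the following sketch, which writes a file system policy that denies any client not connecting over TLS. The account ID, ARN, and file name are placeholders, and `apply_efs_policy` is a hypothetical wrapper that is defined but deliberately not called.

```shell
# Placeholder ARN built from the example file system ID used in this guide.
FS_ARN="arn:aws:elasticfilesystem:us-east-1:123456789012:file-system/fs-0abc123def456789"

# Write a policy that rejects NFS clients connecting without TLS.
cat > efs-tls-only-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyUnencryptedTransport",
      "Effect": "Deny",
      "Principal": { "AWS": "*" },
      "Action": "*",
      "Resource": "${FS_ARN}",
      "Condition": { "Bool": { "aws:SecureTransport": "false" } }
    }
  ]
}
EOF

# Not invoked here: call it after reviewing the generated JSON.
apply_efs_policy() {
  aws efs put-file-system-policy \
    --file-system-id fs-0abc123def456789 \
    --policy file://efs-tls-only-policy.json
}
```

Because this statement only denies, clients still need an allow from somewhere, typically an identity policy on the role used with the iam mount option. Try it against a test file system before applying it to production.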
Terraform Configuration
resource "aws_efs_file_system" "prod" {
performance_mode = "generalPurpose"
throughput_mode = "elastic"
encrypted = true
kms_key_id = aws_kms_key.efs.arn
lifecycle_policy {
transition_to_ia = "AFTER_30_DAYS"
transition_to_primary_storage_class = "AFTER_1_ACCESS"
}
tags = {
Name = "prod-shared-efs"
Environment = "production"
}
}
resource "aws_efs_mount_target" "az" {
for_each = toset(var.private_subnet_ids)
file_system_id = aws_efs_file_system.prod.id
subnet_id = each.value
security_groups = [aws_security_group.efs_mount.id]
}
resource "aws_efs_access_point" "app" {
file_system_id = aws_efs_file_system.prod.id
posix_user {
uid = 1000
gid = 1000
}
root_directory {
path = "/app-data"
creation_info {
owner_uid = 1000
owner_gid = 1000
permissions = "755"
}
}
tags = { Name = "app-access-point" }
}
resource "aws_security_group" "efs_mount" {
name = "efs-mount-sg"
vpc_id = var.vpc_id
ingress {
from_port = 2049
to_port = 2049
protocol = "tcp"
security_groups = [var.app_security_group_id]
}
}
EFS Access Points
Access Points are named entry points into an EFS filesystem. Each access point enforces a specific POSIX UID/GID, a root directory, and directory permissions. This is critical when multiple applications share one EFS filesystem: each app gets its own isolated subtree with its own permissions, without needing separate filesystems (and a separate set of mount targets to manage for each).
# Create an access point that maps root to /teams/backend
aws efs create-access-point \
--file-system-id fs-0abc123def456789 \
--posix-user Uid=2000,Gid=2000 \
--root-directory "Path=/teams/backend,CreationInfo={OwnerUid=2000,OwnerGid=2000,Permissions=750}" \
--tags Key=App,Value=backend-api
# Mount using the access point ID
sudo mount -t efs -o tls,accesspoint=fsap-0abc123 fs-0abc123def456789:/ /mnt/backend
EFS with ECS and EKS
EFS is the go-to shared persistent storage for containerised workloads. Both ECS and EKS have native integrations that handle mount target selection, TLS encryption, and credential injection automatically.
EFS with ECS (Task Definition)
Add a volume block referencing EFS and an access point to your ECS task definition. The ECS agent handles the NFS mount on the host, injecting TLS and IAM credentials automatically.
{
"family": "api-service",
"taskRoleArn": "arn:aws:iam::123456789:role/ecs-task-role",
"executionRoleArn": "arn:aws:iam::123456789:role/ecs-exec-role",
"networkMode": "awsvpc",
"volumes": [
{
"name": "efs-data",
"efsVolumeConfiguration": {
"fileSystemId": "fs-0abc123def456789",
"rootDirectory": "/",
"transitEncryption": "ENABLED",
"transitEncryptionPort": 2049,
"authorizationConfig": {
"accessPointId": "fsap-0abc123def",
"iam": "ENABLED"
}
}
}
],
"containerDefinitions": [
{
"name": "api",
"image": "123456789.dkr.ecr.us-east-1.amazonaws.com/api:latest",
"mountPoints": [
{
"sourceVolume": "efs-data",
"containerPath": "/app/data",
"readOnly": false
}
],
"portMappings": [{"containerPort": 8080}],
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "/ecs/api-service",
"awslogs-region": "us-east-1",
"awslogs-stream-prefix": "api"
}
}
}
],
"requiresCompatibilities": ["FARGATE"],
"cpu": "512",
"memory": "1024"
}
Grant elasticfilesystem:ClientMount, elasticfilesystem:ClientWrite, and elasticfilesystem:ClientRootAccess (if needed) to the task role. Without this, the mount will be refused even though the security group allows port 2049.
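One way to attach the EFS client actions listed above is an inline policy on the task role, scoped to a single access point. This is a minimal sketch: the account ID, both ARNs, the policy name, and the `attach_efs_policy` wrapper are placeholders, and the role name is taken from the task definition above.

```shell
# Placeholder ARNs; substitute your own account, file system, and access point.
FS_ARN="arn:aws:elasticfilesystem:us-east-1:123456789012:file-system/fs-0abc123def456789"
AP_ARN="arn:aws:elasticfilesystem:us-east-1:123456789012:access-point/fsap-0abc123def"

# Allow mount and write, but only through the named access point.
cat > ecs-task-efs-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "elasticfilesystem:ClientMount",
        "elasticfilesystem:ClientWrite"
      ],
      "Resource": "${FS_ARN}",
      "Condition": {
        "StringEquals": { "elasticfilesystem:AccessPointArn": "${AP_ARN}" }
      }
    }
  ]
}
EOF

# Not invoked here: attaches the document to the task role as an inline policy.
attach_efs_policy() {
  aws iam put-role-policy \
    --role-name ecs-task-role \
    --policy-name efs-client-access \
    --policy-document file://ecs-task-efs-policy.json
}
```

ClientRootAccess is left out on purpose. Add it only if the container really needs root-level access to the file system.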
EFS with EKS (PersistentVolume + StorageClass)
AWS provides the Amazon EFS CSI driver for Kubernetes. Install it via the EKS add-on, then create a StorageClass and PersistentVolumeClaim. The CSI driver creates an EFS Access Point per PVC automatically when using dynamic provisioning.
# Install the EFS CSI driver as an EKS managed add-on
aws eks create-addon \
--cluster-name prod-cluster \
--addon-name aws-efs-csi-driver \
--service-account-role-arn arn:aws:iam::123456789:role/AmazonEKS_EFS_CSI_DriverRole
# StorageClass: dynamic provisioning via EFS access points
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap # creates an access point per PVC
  fileSystemId: fs-0abc123def456789
  directoryPerms: "700"
  gidRangeStart: "1000"
  gidRangeEnd: "2000"
  basePath: "/dynamic-pv" # root dir for auto-created access points
mountOptions:
  - tls
---
# PersistentVolumeClaim
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: efs-claim
  namespace: default
spec:
  accessModes:
    - ReadWriteMany # EFS supports concurrent writes from multiple pods
  storageClassName: efs-sc
  resources:
    requests:
      storage: 5Gi # required by Kubernetes but ignored by EFS, which is elastic
---
# Deployment using the PVC
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
        - name: web
          image: nginx:1.25
          volumeMounts:
            - name: persistent-storage
              mountPath: /usr/share/nginx/html
      volumes:
        - name: persistent-storage
          persistentVolumeClaim:
            claimName: efs-claim
FSx for Windows File Server
FSx for Windows File Server delivers a fully managed Windows-native filesystem backed by SSD storage and built on Windows Server. It supports SMB 3.0/3.1.1, NTFS, Windows ACLs, DFS namespaces, and Active Directory integration — everything Windows applications expect from a file server, without the overhead of running and patching Windows Server VMs.
Key Capabilities
- Active Directory integration: join the filesystem to your AWS Managed Microsoft AD or self-managed on-premises AD. Users authenticate with their existing credentials and ACLs are enforced by AD group membership.
- DFS Namespaces: create a unified namespace across multiple FSx file systems and on-premises shares. Users see \\corp\shares\finance regardless of whether the data lives in AWS or on-premises.
- Multi-AZ deployment: active/standby configuration with automatic failover in under 30 seconds. Single-AZ is cheaper for dev/test.
- Throughput capacity: 8 MB/s to 2 GB/s, configurable independently of storage. Scale throughput without migrating data.
- Shadow copies (VSS): built-in volume shadow copy service for end-user self-service file restore. Users can right-click → Previous Versions.
# Create FSx for Windows file system joined to AWS Managed AD
aws fsx create-file-system \
--file-system-type WINDOWS \
--storage-capacity 300 \
--storage-type SSD \
--subnet-ids subnet-aaa111 subnet-bbb222 \
--windows-configuration '{
"ActiveDirectoryId": "d-9067abcd12",
"ThroughputCapacity": 128,
"DeploymentType": "MULTI_AZ_1",
"PreferredSubnetId": "subnet-aaa111",
"AutomaticBackupRetentionDays": 7,
"DailyAutomaticBackupStartTime": "02:00",
"CopyTagsToBackups": true
}' \
--tags Key=Name,Value=prod-windows-fsx
# Mount from a Windows EC2 instance (PowerShell)
# The DNS name below is a placeholder: FSx for Windows names live in your AD domain
# net use Z: \\amznfsxabcd1234.corp.example.com\share
# Create a DFS namespace root pointing to FSx
# (run on a Windows Server with DFS role installed)
# New-DfsnRoot -TargetPath "\\amznfsxabcd1234.corp.example.com\share" `
# -Type DomainV2 -Path "\\corp.example.com\files"
# Terraform: FSx for Windows
resource "aws_fsx_windows_file_system" "prod" {
active_directory_id = aws_directory_service_directory.corp.id
storage_capacity = 300
subnet_ids = [aws_subnet.primary.id, aws_subnet.secondary.id]
throughput_capacity = 128
deployment_type = "MULTI_AZ_1"
preferred_subnet_id = aws_subnet.primary.id
automatic_backup_retention_days = 7
daily_automatic_backup_start_time = "02:00"
copy_tags_to_backups = true
storage_type = "SSD"
security_group_ids = [aws_security_group.fsx_windows.id]
tags = {
Name = "prod-windows-fsx"
}
}
The VPC must have enableDnsHostnames and enableDnsSupport set to true. For cross-region access, set up Route 53 Resolver forwarding rules.
FSx for Lustre — HPC and ML Workloads
Lustre is the file system behind most of the world's top supercomputers. FSx for Lustre delivers fully managed Lustre with sub-millisecond latencies and aggregate throughput that scales to hundreds of GB/s. It's the right choice for ML training jobs that need to feed GPUs at full bandwidth, genomics pipelines, financial risk simulations, and any workload where storage I/O is the bottleneck.
Deployment Types
- SCRATCH_1 / SCRATCH_2: temporary, non-replicated storage for short-duration HPC jobs. SCRATCH_2 adds in-transit encryption and can burst above its 200 MB/s per TiB baseline. No automatic backups. Cheapest option.
- PERSISTENT_1: highly available, replicated within a single AZ. SSD-backed. For long-running workloads that need data durability. 50, 100, or 200 MB/s per TiB baseline throughput.
- PERSISTENT_2: latest generation. SSD with 125, 250, 500, or 1000 MB/s per TiB. Supports both S3 data repository associations and auto-export. Recommended for all new deployments.
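Because throughput is quoted per TiB, the aggregate baseline follows directly from the storage capacity you provision. A small sketch of that multiplication, with `lustre_baseline_mbps` as a hypothetical helper name and capacity given in GiB as the API expects:

```shell
# Aggregate baseline throughput in MB/s for a file system of $1 GiB
# provisioned at $2 MB/s per TiB.
lustre_baseline_mbps() {
  awk -v gib="$1" -v per_tib="$2" 'BEGIN { printf "%.0f\n", gib / 1024 * per_tib }'
}

lustre_baseline_mbps 1200 250    # prints: 293
lustre_baseline_mbps 2400 1000   # prints: 2344
```

A 1,200 GiB file system at 250 MB/s per TiB therefore delivers a little under 300 MB/s in aggregate. If the GPUs need more, either capacity or the per-TiB tier has to go up.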
S3 Data Repository Association
FSx for Lustre can be linked to an S3 bucket so that your dataset is lazily imported on first access. Files appear in the Lustre namespace immediately (metadata only), and data is transferred on-demand. This eliminates the need to pre-stage TBs of training data — your ML job starts instantly and fetches what it needs.
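# Create FSx for Lustre PERSISTENT_2 (the S3 bucket is linked in a separate step)
aws fsx create-file-system \
  --file-system-type LUSTRE \
  --storage-capacity 1200 \
  --storage-type SSD \
  --subnet-ids subnet-gpu-training \
  --lustre-configuration '{
    "DeploymentType": "PERSISTENT_2",
    "PerUnitStorageThroughput": 250,
    "DataCompressionType": "LZ4",
    "WeeklyMaintenanceStartTime": "1:05:00"
  }' \
  --tags Key=Name,Value=ml-lustre
# PERSISTENT_2 does not accept ImportPath, ExportPath, or AutoImportPolicy at creation.
# Link the bucket with a data repository association once the file system is AVAILABLE.
aws fsx create-data-repository-association \
  --file-system-id fs-0abc123def456789 \
  --file-system-path /datasets \
  --data-repository-path s3://ml-training-data/datasets/ \
  --s3 '{
    "AutoImportPolicy": {"Events": ["NEW", "CHANGED", "DELETED"]},
    "AutoExportPolicy": {"Events": ["NEW", "CHANGED", "DELETED"]}
  }'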
# Mount on a GPU instance (install lustre client first)
sudo amazon-linux-extras install -y lustre   # Amazon Linux 2; PERSISTENT_2 needs a 2.12 or newer client
sudo mkdir /fsx
sudo mount -t lustre \
-o relatime,flock \
fs-0abc.fsx.us-east-1.amazonaws.com@tcp:/xxxxxxxx /fsx
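# Import metadata for a specific S3 prefix (file contents still load lazily on first read)
aws fsx create-data-repository-task \
  --file-system-id fs-0abc123def456789 \
  --type IMPORT_METADATA_FROM_REPOSITORY \
  --paths "s3://ml-training-data/datasets/imagenet-2012/" \
  --report Enabled=false
# To preload file contents too, restore them from a client that has the file system mounted
# (adjust the directory to wherever the prefix appears under your mount point)
find /fsx/datasets/imagenet-2012 -type f -print0 | xargs -0 -n 1 sudo lfs hsm_restore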
Striping Configuration
Lustre stripes files across Object Storage Targets (OSTs). Large files benefit from wider striping; small files from narrow (1 OST). Set stripe configuration before writing data:
# Set stripe count for a directory (data written here stripes across 4 OSTs)
lfs setstripe -c 4 /fsx/large-model-weights/
# Stripe checkpoint files across 8 OSTs with a 4 MB stripe size
lfs setstripe -c 8 -S 4M /fsx/checkpoints/
# Check current stripe on a file
lfs getstripe /fsx/checkpoints/epoch_100.ckpt
# Monitor OST usage balance
lfs df /fsx
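The "wide for large files, narrow for small files" rule can be turned into a rough starting point. The heuristic below (one stripe per GiB of expected file size, capped at the number of OSTs) is an assumption made for illustration, not an AWS or Lustre recommendation, and `suggest_stripe_count` is a hypothetical helper. Benchmark before adopting any value.

```shell
# Suggest a stripe count for a file of $1 bytes on a file system with $2 OSTs.
# Illustrative heuristic: ceil(size / 1 GiB), clamped to the range 1..OST count.
suggest_stripe_count() {
  _size=$1
  _osts=$2
  _gib=1073741824
  _count=$(( (_size + _gib - 1) / _gib ))
  [ "$_count" -lt 1 ] && _count=1
  [ "$_count" -gt "$_osts" ] && _count=$_osts
  echo "$_count"
}

suggest_stripe_count 4096 8            # small file: prints 1
suggest_stripe_count 5368709120 8      # 5 GiB file: prints 5
suggest_stripe_count 107374182400 8    # 100 GiB file: prints 8
```

The result can be passed to `lfs setstripe -c` for the directory that will hold files of that size. The OST count can be read from the `lfs df /fsx` output shown above.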
# Terraform: FSx for Lustre with S3 association
resource "aws_fsx_lustre_file_system" "ml" {
storage_capacity = 1200
subnet_ids = [aws_subnet.gpu.id]
deployment_type = "PERSISTENT_2"
per_unit_storage_throughput = 250
data_compression_type = "LZ4"
storage_type = "SSD"
security_group_ids = [aws_security_group.lustre.id]
weekly_maintenance_start_time = "1:05:00"
tags = { Name = "ml-training-lustre" }
}
resource "aws_fsx_data_repository_association" "training" {
file_system_id = aws_fsx_lustre_file_system.ml.id
data_repository_path = "s3://ml-training-data/datasets/"
file_system_path = "/datasets"
s3 {
auto_import_policy {
events = ["NEW", "CHANGED", "DELETED"]
}
auto_export_policy {
events = ["NEW", "CHANGED", "DELETED"]
}
}
}
FSx for NetApp ONTAP — Multi-Protocol Enterprise Storage
FSx for NetApp ONTAP delivers a managed ONTAP cluster in AWS. If you are migrating an on-premises NetApp NAS or need enterprise storage capabilities — SnapMirror replication, FlexClone instant clones, multi-protocol NFS+SMB access, automatic tiering — ONTAP is in a different league from EFS. It is more expensive and more complex, but it eliminates the need for custom data management scripts.
Key Differentiators vs EFS
- Multi-protocol — same volume served over both NFS and SMB simultaneously. Mixed Linux/Windows environments can access the same data without syncing tools.
- SnapMirror — asynchronous replication to another FSx ONTAP in a different region or to on-premises ONTAP. RPO in minutes, RTO in minutes.
- FlexClone — instant zero-copy clones of volumes or snapshots. Clone a 10 TB volume in seconds for test/dev environments.
- Automatic tiering — inactive data automatically tiered to S3-backed capacity pool storage (3× cheaper than SSD). Transparent to applications.
- iSCSI block storage — FSx ONTAP also provides iSCSI LUNs, making it useful for databases that need SAN-style block access.
# Create FSx for NetApp ONTAP Multi-AZ
aws fsx create-file-system \
--file-system-type ONTAP \
--storage-capacity 1024 \
--subnet-ids subnet-primary subnet-standby \
--ontap-configuration '{
"DeploymentType": "MULTI_AZ_1",
"ThroughputCapacity": 512,
"PreferredSubnetId": "subnet-primary",
"RouteTableIds": ["rtb-main","rtb-private"],
"AutomaticBackupRetentionDays": 7,
"DailyAutomaticBackupStartTime": "03:00",
"FsxAdminPassword": "Secr3t!Admin",
"DiskIopsConfiguration": {
"Mode": "AUTOMATIC"
}
}' \
--tags Key=Name,Value=prod-ontap
# Create a Storage Virtual Machine (SVM) and volume
aws fsx create-storage-virtual-machine \
--file-system-id fs-0abc123def456789 \
--name prod-svm \
--root-volume-security-style MIXED
aws fsx create-volume \
--volume-type ONTAP \
--name app-data-vol \
--ontap-configuration '{
"JunctionPath": "/app-data",
"SecurityStyle": "UNIX",
"SizeInMegabytes": 102400,
"StorageEfficiencyEnabled": true,
"StorageVirtualMachineId": "svm-0abc123",
"TieringPolicy": {
"Name": "AUTO",
"CoolingPeriod": 31
}
}'
# Set up SnapMirror to DR region (run from ONTAP CLI or BlueXP)
# snapmirror create -source-path prod-svm:app-data-vol \
# -destination-path dr-svm:app-data-vol-dr \
# -type XDP -policy MirrorAllSnapshots
Set StorageEfficiencyEnabled to true to also activate deduplication and compression, which often shrinks consumed capacity by another 2–3×.
FSx for OpenZFS — ZFS Snapshots and Clones
FSx for OpenZFS delivers a managed ZFS filesystem. ZFS is beloved by developers and DBAs for its data integrity guarantees, copy-on-write snapshots, and instant cloning. FSx OpenZFS is the right choice when you want ZFS-native features without managing ZFS on EC2 yourself — especially for development databases, content repositories, and anything needing point-in-time clones for testing.
ZFS Capabilities Available in FSx
- Snapshots — point-in-time, space-efficient, near-instantaneous. Snapshots are copy-on-write so they do not double your storage on creation. Available via the AWS console, CLI, or scheduled automatically.
- Clones — writable volumes created from a snapshot in seconds, consuming only the space for changed blocks. Create a production database clone for a load test in <5 seconds.
- Compression — LZ4 or ZSTD compression applied transparently. Typical compression ratios of 1.3–2× for text and log data, 1.1× for already-compressed data.
- Data integrity — ZFS checksums every block and detects/corrects silent data corruption automatically (self-healing via mirroring).
# Create FSx for OpenZFS
aws fsx create-file-system \
--file-system-type OPENZFS \
--storage-capacity 512 \
--storage-type SSD \
--subnet-ids subnet-app \
--open-zfs-configuration '{
"DeploymentType": "SINGLE_AZ_1",
"ThroughputCapacity": 512,
"RootVolumeConfiguration": {
"DataCompressionType": "LZ4",
"RecordSizeKiB": 128,
"NfsExports": [
{
"ClientConfigurations": [
{
"Clients": "10.0.0.0/16",
"Options": ["rw","crossmnt","no_root_squash"]
}
]
}
]
},
"AutomaticBackupRetentionDays": 7,
"DailyAutomaticBackupStartTime": "04:00"
}' \
--tags Key=Name,Value=dev-openzfs
# Create a manual snapshot
aws fsx create-snapshot \
--volume-id fsvol-0abc123 \
--name "before-migration-$(date +%Y%m%d)"
# Create a clone volume from a snapshot
aws fsx create-volume \
--volume-type OPENZFS \
--name load-test-clone \
--open-zfs-configuration '{
"ParentVolumeId": "fsvol-0abc123",
"OriginSnapshot": {
"SnapshotARN": "arn:aws:fsx:us-east-1:123456789:snapshot:fsvolsnap-0abc123",
"CopyStrategy": "CLONE"
},
"DataCompressionType": "LZ4",
"NfsExports": [
{
"ClientConfigurations": [
{"Clients": "10.0.1.0/24", "Options": ["rw","no_root_squash"]}
]
}
]
}'
# Mount on Linux
sudo mount -t nfs \
-o nfsvers=4.1,rsize=1048576,wsize=1048576,timeo=600,retrans=2 \
fs-0abc.fsx.us-east-1.amazonaws.com:/fsx /mnt/openzfs
pg_dump | psql pipelines.
Backup and Disaster Recovery
All AWS managed file systems support AWS Backup, giving you a unified backup management plane across EFS, FSx for Windows, FSx for Lustre (PERSISTENT only), FSx for ONTAP, and FSx for OpenZFS. AWS Backup handles scheduling, retention, cross-region copy, and cross-account copy from a single console and API.
AWS Backup for EFS
EFS backups use the EFS-native backup mechanism, which performs a consistent, incremental backup without impacting performance. The first backup is a full backup; subsequent backups are incremental based on changed blocks tracked at the filesystem level.
# Create a backup plan covering all EFS file systems with tag Backup=true
aws backup create-backup-plan --backup-plan '{
"BackupPlanName": "efs-daily-weekly",
"Rules": [
{
"RuleName": "daily-efs-backup",
"TargetBackupVaultName": "prod-backup-vault",
"ScheduleExpression": "cron(0 3 * * ? *)",
"StartWindowMinutes": 60,
"CompletionWindowMinutes": 180,
"Lifecycle": {
"DeleteAfterDays": 35
},
"CopyActions": [
{
"DestinationBackupVaultArn": "arn:aws:backup:us-west-2:123456789:backup-vault:dr-vault",
"Lifecycle": {"DeleteAfterDays": 90}
}
]
},
{
"RuleName": "weekly-long-term",
"TargetBackupVaultName": "prod-backup-vault",
"ScheduleExpression": "cron(0 4 ? * SUN *)",
"Lifecycle": {
"MoveToColdStorageAfterDays": 30,
"DeleteAfterDays": 365
}
}
]
}'
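# Assign EFS filesystems by tag
# (BACKUP_PLAN_ID is a placeholder for the BackupPlanId returned by create-backup-plan)
aws backup create-backup-selection \
--backup-plan-id "$BACKUP_PLAN_ID" \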
--backup-selection '{
"SelectionName": "tagged-efs",
"IamRoleArn": "arn:aws:iam::123456789:role/AWSBackupDefaultServiceRole",
"ListOfTags": [
{"ConditionType": "STRINGEQUALS", "ConditionKey": "Backup", "ConditionValue": "true"}
]
}'
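One constraint worth checking before submitting a plan like the one above: AWS Backup requires DeleteAfterDays to be at least 90 days greater than MoveToColdStorageAfterDays. A small pre-flight check can catch this, with `valid_backup_lifecycle` as a hypothetical helper name:

```shell
# Succeeds (exit status 0) when a lifecycle with cold-storage transition after $1 days
# and deletion after $2 days keeps backups in cold storage for at least 90 days.
valid_backup_lifecycle() {
  _cold=$1
  _delete=$2
  [ $(( _delete - _cold )) -ge 90 ]
}

valid_backup_lifecycle 30 365 && echo "weekly rule is valid"       # prints the message
valid_backup_lifecycle 30 100 || echo "delete too soon after cold" # prints the message
```

The weekly rule in the plan above (30 days to cold storage, 365 days to deletion) passes this check.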
EFS Cross-Region Replication
For an RPO measured in minutes rather than hours, use EFS Replication rather than AWS Backup. EFS Replication asynchronously replicates all file system data and metadata to a destination EFS file system in another Region, typically with a lag under 15 minutes. The destination is read-only until you fail over.
# Create EFS replication to us-west-2
aws efs create-replication-configuration \
--source-file-system-id fs-0abc123def456789 \
--destinations '[{
"Region": "us-west-2",
"KmsKeyId": "arn:aws:kms:us-west-2:123456789:key/mrk-dr"
}]'
# Check replication status and the time of the last successful sync
# (lag = current time minus LastReplicated)
aws efs describe-replication-configurations \
--file-system-id fs-0abc123def456789 \
--query 'Replications[0].Destinations[0].{Status:Status,LastReplicated:LastReplicatedTimestamp}'
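The API reports LastReplicatedTimestamp rather than a lag figure, so the lag has to be derived. A sketch, assuming GNU date (as used elsewhere in this guide) and an ISO 8601 timestamp. The function names and the sample timestamps are illustrative.

```shell
# Seconds elapsed since the ISO 8601 timestamp in $1.
# $2 optionally overrides "now" (epoch seconds) so the function can be tested.
replication_lag_seconds() {
  _last=$(date -u -d "$1" +%s) || return 1
  _now=${2:-$(date -u +%s)}
  echo $(( _now - _last ))
}

# Succeeds when the lag in $1 exceeds the threshold in $2
# (default 900 s, the roughly 15 minutes quoted above).
rpo_breached() {
  [ "$1" -gt "${2:-900}" ]
}

# Worked example with fixed timestamps 20 minutes apart.
lag=$(replication_lag_seconds "2024-05-01T12:00:00Z" "$(date -u -d '2024-05-01T12:20:00Z' +%s)")
echo "lag=${lag}s"                                                   # prints: lag=1200s
rpo_breached "$lag" && echo "replication is behind the 15-minute target"
```

In practice the first argument would come from the describe-replication-configurations output, and the check would run on a schedule or feed an alarm.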
Cost Optimization and TCO
Managed file storage on AWS can get expensive fast. Here are the most impactful levers to reduce costs without compromising performance or durability.
EFS Cost Optimization
| Optimization | Savings potential | Action |
|---|---|---|
| Enable IA lifecycle policy (30 days) | 50–80% for cold data | aws efs put-lifecycle-configuration |
| Switch to One Zone storage class | ~47% vs Multi-AZ | Set --availability-zone-name at creation |
| Use Elastic Throughput vs Provisioned | Eliminate idle provisioned cost | Update throughput mode |
| Right-size Provisioned Throughput | 10–40% | Reduce to P95 of actual usage |
# Switch an existing EFS from Provisioned to Elastic throughput
aws efs update-file-system \
--file-system-id fs-0abc123def456789 \
--throughput-mode elastic
# Check actual throughput usage (CloudWatch metric)
aws cloudwatch get-metric-statistics \
--namespace AWS/EFS \
--metric-name MeteredIOBytes \
--dimensions Name=FileSystemId,Value=fs-0abc123def456789 \
--start-time $(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
--period 3600 \
--statistics Sum \
--query 'sort_by(Datapoints,&Timestamp)[-1].Sum'
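The query above returns a byte count summed over the period, which is hard to compare with a throughput setting expressed in MiB/s. Converting is a single division; `avg_mibps` is a hypothetical helper name.

```shell
# Average throughput in MiB/s from a MeteredIOBytes Sum ($1, bytes)
# collected over a period of $2 seconds.
avg_mibps() {
  awk -v bytes="$1" -v period="$2" 'BEGIN { printf "%.2f\n", bytes / period / 1048576 }'
}

avg_mibps 3774873600 3600      # prints: 1.00
avg_mibps 188743680000 3600    # prints: 50.00
```

An hourly average hides short spikes, so for right-sizing Provisioned Throughput it is safer to repeat the query with a shorter period and look at the busiest intervals.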
FSx Cost Optimization
- FSx for Lustre: use SCRATCH_2 for jobs under 1 week (no replication cost, ~40% cheaper than PERSISTENT_2). Enable LZ4 compression — it's in-memory and costs nothing but saves 20–30% on SSD capacity for typical ML datasets.
- FSx for Windows: use Single-AZ for dev/test (50% of Multi-AZ cost). For production, right-size throughput capacity, which is the dominant cost driver (not storage). Start at 32 MB/s and scale up only if CloudWatch shows FileServerBusySeconds > 1% of time.
- FSx for ONTAP: enable auto-tiering to capacity pool. In steady state, 60% of data should be in the capacity pool tier. If your tiering ratio is below 40%, reduce the cooling period from 31 to 14 days.
- FSx for OpenZFS: enable LZ4 compression always. For dev workloads, delete clones immediately after test runs to recover space. Use Single-AZ unless you need HA.
Total Cost of Ownership Comparison
For a 10 TB shared filesystem where about 500 GB is accessed regularly and the rest sits idle, mounted by 50 containers in EKS (storage charges only; IA access fees and throughput charges are extra):
| Option | Monthly cost (approx.) | Notes |
|---|---|---|
| EFS Standard, no IA | ~$3,000 | $0.30/GB × 10,000 GB |
| EFS Standard + IA (30d) | ~$390 | 9.5 TB in IA at $0.025, 500 GB in Standard at $0.30 |
| EFS One Zone + IA | ~$205 | One Zone Standard $0.16/GB, IA $0.013/GB |
| Self-managed NFS on EC2 | ~$950 | r6g.xlarge (~$150) + 10 TB gp3 EBS (~$800 at $0.08/GB), plus ops overhead |
| FSx for ONTAP with tiering | ~$1,200 | More features but higher base cost |
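The EFS rows can be rechecked, or rerun with your own split between hot and cold data, using the per-GB prices quoted in the Notes column. This is a storage-only sketch: it ignores IA access fees, throughput charges, and backups, and `efs_storage_cost` is a hypothetical helper name.

```shell
# Monthly storage cost in USD.
# Arguments: GB in the primary class, GB in IA, primary $/GB-month, IA $/GB-month.
efs_storage_cost() {
  awk -v std_gb="$1" -v ia_gb="$2" -v std_price="$3" -v ia_price="$4" \
    'BEGIN { printf "%.2f\n", std_gb * std_price + ia_gb * ia_price }'
}

efs_storage_cost 10000 0 0.30 0.025     # Standard only:       prints 3000.00
efs_storage_cost 500 9500 0.30 0.025    # Standard + IA:       prints 387.50
efs_storage_cost 500 9500 0.16 0.013    # One Zone + IA:       prints 203.50
```

The result is sensitive to the hot/cold split: the more data that stays in the primary class, the smaller the saving from lifecycle policies.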