AWS EFS and FSx: Managed File Systems for Every Workload (2026)

Shared file storage is one of the most misunderstood corners of AWS. Engineers reach for S3 by reflex, then discover their application needs a POSIX filesystem. They create an EBS volume, then realise it can only attach to one EC2 instance at a time. AWS actually offers a rich portfolio of managed file storage services — EFS for elastic NFS, and four FSx variants (Windows File Server, Lustre, NetApp ONTAP, OpenZFS) — each purpose-built for different workload profiles. Choosing the wrong one costs you performance, money, and nights on-call. This guide cuts through the marketing and gives you the decision framework, the configuration details, and the working code to deploy each option correctly.

EFS vs FSx vs EBS vs S3 — When to Use Which
EFS Deep Dive: Performance and Throughput Modes
EFS Setup: CLI, Terraform, and Mount Configuration
EFS with ECS and EKS
FSx for Windows File Server
FSx for Lustre — HPC and ML Workloads
FSx for NetApp ONTAP — Multi-Protocol Enterprise Storage
FSx for OpenZFS — ZFS Snapshots and Clones
Backup and Disaster Recovery
Cost Optimization and TCO

EFS vs FSx vs EBS vs S3 — When to Use Which

Before diving into configuration, you need the right mental model. AWS offers four distinct storage paradigms: block (EBS), object (S3), elastic NFS (EFS), and managed file servers (FSx family). Each has fundamentally different consistency models, access patterns, and performance characteristics. Getting this decision right at architecture time prevents painful migrations later.

Here is the definitive comparison across the dimensions that actually matter in production:

Dimension	EBS	EFS	FSx (Windows)	FSx (Lustre)	FSx (ONTAP)	S3
Protocol	Block (NVMe)	NFSv4.1	SMB 3.0	Lustre (POSIX)	NFS + SMB (+ iSCSI)	HTTP REST
Multi-attach	io1/io2 only (limited)	Yes (thousands)	Yes (SMB shares)	Yes (Lustre clients)	Yes	Yes
Latency	<1ms	1–3ms	1–3ms	sub-ms	sub-ms	10–100ms
Max throughput	4 GB/s per volume (io2 Block Express)	10+ GB/s (elastic)	2 GB/s	1000+ GB/s	4 GB/s	Unlimited
POSIX compliant	Yes	Yes	Partial	Yes	Yes	No
Windows ACLs	No	No	Yes (AD-integrated)	No	Yes	No
Elastic capacity	No (manual resize)	Yes (auto)	Manual	Manual	Manual	Unlimited
Best for	OS disk, DB primary	Shared web content, CMS, containers	Windows apps, DFS	HPC, ML training, genomics	Enterprise NAS migration	Backup, static assets, data lake

Decision shortcut: Use EBS when you need block storage for a single EC2 instance (databases, boot volumes). Use EFS when multiple Linux workloads need shared file access with elastic scaling. Use FSx for Windows when you have Windows-native applications or an Active Directory dependency. Use FSx for Lustre when you need maximum parallel throughput for HPC or ML jobs. Use FSx for ONTAP when migrating an on-premises NetApp NAS. Use FSx for OpenZFS when you want ZFS-native snapshots and clones. Use S3 for everything else that can be stored as objects: it is an object store, not a filesystem.
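The decision shortcut above is essentially a lookup from workload profile to service. As a minimal sketch, it can be written as a Bash `case` statement. The profile keywords and the `recommend_storage` function are hypothetical, and a real decision would also weigh the latency, cost, and protocol rows of the table.

```shell
#!/usr/bin/env bash
# Hypothetical helper: maps a workload-profile keyword to the service the
# decision shortcut recommends. The keywords are invented for this sketch.
recommend_storage() {
  case "$1" in
    single-instance-block) echo "EBS" ;;                          # databases, boot volumes
    shared-linux)          echo "EFS" ;;                          # many Linux clients, elastic capacity
    windows-ad)            echo "FSx for Windows File Server" ;;  # SMB, NTFS ACLs, AD
    hpc|ml-training)       echo "FSx for Lustre" ;;               # parallel throughput
    netapp-migration)      echo "FSx for NetApp ONTAP" ;;         # lift an on-prem NetApp NAS
    zfs-snapshots)         echo "FSx for OpenZFS" ;;              # snapshots and clones
    *)                     echo "S3" ;;                           # object access, not a filesystem
  esac
}

# Example: print a recommendation for a few profiles
for profile in shared-linux ml-training archive-blobs; do
  printf '%s -> %s\n' "$profile" "$(recommend_storage "$profile")"
done
```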

EFS Deep Dive: Performance and Throughput Modes

Amazon EFS is a fully managed elastic NFS filesystem. "Elastic" here means you do not provision capacity upfront — the file system grows and shrinks automatically as you add or remove files, and you pay only for what you store. You can mount the same EFS filesystem from thousands of EC2 instances, ECS tasks, and EKS pods simultaneously across multiple Availability Zones.

Performance Modes

EFS offers two performance modes, selected at creation time and not changeable later:

General Purpose (default) — lowest per-operation latency (1–3ms). Ideal for web serving, content management, home directories, development environments. Supports up to 35,000 read IOPS and 7,000 write IOPS. This is the right choice for 99% of workloads.
Max I/O — designed for workloads with thousands of concurrent NFS clients (big data analytics, media processing). Sacrifices per-operation latency (slightly higher) in exchange for aggregate throughput that scales to hundreds of thousands of IOPS. Not recommended unless you have a specific benchmark showing General Purpose is the bottleneck — Max I/O adds latency that hurts interactive workloads.

Important: Elastic Throughput (launched in November 2022) works only with General Purpose mode, and Max I/O cannot be combined with Elastic Throughput or with One Zone storage. In practice Elastic Throughput removes the need for Max I/O in most cases, and AWS now recommends staying on General Purpose with Elastic Throughput rather than switching to Max I/O.

Throughput Modes

Throughput mode determines how much read/write bandwidth your file system can deliver:

Bursting Throughput (legacy default) — throughput scales with the amount of data in Standard storage. The baseline is 50 MiB/s per TiB stored; whenever you run below baseline you accumulate burst credits, which let you burst to 100 MiB/s per TiB (every file system can burst to at least 100 MiB/s, however small). Works well for spiky, low-duty-cycle workloads that do not stream at full throughput continuously.
Provisioned Throughput — you manually specify a throughput value (1 MB/s to 3 GB/s) independent of storage size. Use this when your workload needs more throughput than the bursting model provides, but you have predictable requirements. You pay for provisioned throughput above what your storage size earns.
Elastic Throughput (recommended) — automatically scales throughput up and down based on workload demand. No manual provisioning, no burst credits to manage. Can deliver up to 10 GB/s reads and 3 GB/s writes in General Purpose mode. Pay per GB transferred. This is now the recommended mode for almost all new file systems.
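To make the Bursting arithmetic concrete, here is a small Bash/awk sketch. It assumes the rates described above, expressed in MiB/s per TiB as AWS documents them (baseline 50, burst 100, burst floor 100 MiB/s). It ignores the burst-credit balance, so it shows what a file system may reach, not how long it can sustain it. `efs_bursting_throughput` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch of the Bursting Throughput arithmetic, assuming:
#   baseline = 50 MiB/s per TiB of Standard storage
#   burst    = 100 MiB/s per TiB, but never less than 100 MiB/s
# Input: Standard-class storage in GiB. Output: MiB/s.
efs_bursting_throughput() {
  local standard_gib="$1"
  awk -v gib="$standard_gib" 'BEGIN {
    tib = gib / 1024
    baseline = 50 * tib
    burst = 100 * tib
    if (burst < 100) burst = 100          # burst floor for small file systems
    printf "baseline=%.1f MiB/s burst=%.1f MiB/s\n", baseline, burst
  }'
}

efs_bursting_throughput 100     # small file system: tiny baseline, 100 MiB/s burst floor
efs_bursting_throughput 2048    # 2 TiB
```

For a 100 GiB file system the baseline is under 5 MiB/s, which is why small Bursting file systems can drain their credits under steady load and are common candidates for Elastic Throughput.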

Storage Classes

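EFS has two main storage classes, Standard and Infrequent Access, plus lifecycle management that moves files between them automatically. (A colder Archive class, added in late 2023, also exists for data read only a few times a year.)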

Standard — frequently accessed data. Replicated across multiple AZs (Standard) or within a single AZ (One Zone). Standard costs ~$0.30/GB-month.
Infrequent Access (IA) — lower storage cost (~$0.025/GB-month Standard-IA) but a per-GB access charge whenever data in IA is read or written. EFS lifecycle management transitions files that have not been accessed for a configurable period (for example 7, 14, 30, 60, or 90 days). If you also set TransitionToPrimaryStorageClass, a file moves back to Standard transparently on its first access.

# View current lifecycle configuration
aws efs describe-lifecycle-configuration \
  --file-system-id fs-0abc123def456789

# Set 30-day IA transition + immediate move back on access
aws efs put-lifecycle-configuration \
  --file-system-id fs-0abc123def456789 \
  --lifecycle-policies \
    '[{"TransitionToIA":"AFTER_30_DAYS"},{"TransitionToPrimaryStorageClass":"AFTER_1_ACCESS"}]'

EFS Setup: CLI, Terraform, and Mount Configuration

Creating an EFS filesystem is simple, but creating it correctly — with the right throughput mode, encryption, mount targets in all AZs, and security groups — requires attention to several details.

CLI Setup

# 1. Create the file system (Elastic Throughput, encrypted)
aws efs create-file-system \
  --performance-mode generalPurpose \
  --throughput-mode elastic \
  --encrypted \
  --kms-key-id arn:aws:kms:us-east-1:123456789:key/mrk-abc123 \
  --tags Key=Name,Value=prod-shared-efs Key=Env,Value=production

# Output: {"FileSystemId": "fs-0abc123def456789", ...}

# 2. Create mount targets in each AZ's subnet
for SUBNET in subnet-aaa111 subnet-bbb222 subnet-ccc333; do
  aws efs create-mount-target \
    --file-system-id fs-0abc123def456789 \
    --subnet-id $SUBNET \
    --security-groups sg-efs-clients
done

# 3. Create a security group rule — allow NFS (2049) from app SG
aws ec2 authorize-security-group-ingress \
  --group-id sg-efs-clients \
  --protocol tcp \
  --port 2049 \
  --source-group sg-app-servers

# 4. Mount on an EC2 instance (install amazon-efs-utils first)
sudo yum install -y amazon-efs-utils
sudo mkdir /mnt/efs
sudo mount -t efs -o tls fs-0abc123def456789:/ /mnt/efs

# 5. Persistent mount via /etc/fstab
echo "fs-0abc123def456789:/ /mnt/efs efs _netdev,tls,iam 0 0" | sudo tee -a /etc/fstab

Always use amazon-efs-utils with TLS. The -o tls option encrypts data in transit using TLS 1.2. Without it, NFS traffic is plaintext on the wire inside your VPC. The iam option makes the mount helper present the instance's IAM credentials, so an EFS file system policy can decide who may mount, write, or act as root; standard POSIX permissions still apply on top. Pair it with a restrictive file system policy for least-privilege access control.
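Because a forgotten tls option silently leaves traffic unencrypted, it can be worth auditing fstab entries. A small Bash/awk sketch follows; `check_efs_fstab` is hypothetical, and it only inspects entries whose filesystem type is `efs`, so filesystems mounted as plain `nfs4` are not checked.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report fstab-style entries of type "efs" whose mount
# options lack "tls". Takes a file path (defaults to /etc/fstab).
# Exit status is non-zero if any entry is missing tls.
check_efs_fstab() {
  local fstab="${1:-/etc/fstab}"
  awk '
    /^[[:space:]]*#/ { next }                  # skip comment lines
    $3 == "efs" {
      n = split($4, opts, ",")
      has_tls = 0
      for (i = 1; i <= n; i++) if (opts[i] == "tls") has_tls = 1
      if (!has_tls) { print "missing tls: " $1 " -> " $2; bad++ }
    }
    END { exit (bad > 0) }
  ' "$fstab"
}

# Example against a temporary file instead of the real /etc/fstab
tmp=$(mktemp)
printf '%s\n' \
  'fs-0abc123def456789:/ /mnt/efs efs _netdev,tls,iam 0 0' \
  'fs-0abc123def456789:/ /mnt/plain efs _netdev 0 0' > "$tmp"
check_efs_fstab "$tmp" || echo "fix the entries above"
rm -f "$tmp"
```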

Terraform Configuration

resource "aws_efs_file_system" "prod" {
  performance_mode = "generalPurpose"
  throughput_mode  = "elastic"
  encrypted        = true
  kms_key_id       = aws_kms_key.efs.arn

  lifecycle_policy {
    transition_to_ia = "AFTER_30_DAYS"
  }

  # The AWS provider accepts only one transition per lifecycle_policy block
  lifecycle_policy {
    transition_to_primary_storage_class = "AFTER_1_ACCESS"
  }

  tags = {
    Name        = "prod-shared-efs"
    Environment = "production"
  }
}

resource "aws_efs_mount_target" "az" {
  for_each = toset(var.private_subnet_ids)

  file_system_id  = aws_efs_file_system.prod.id
  subnet_id       = each.value
  security_groups = [aws_security_group.efs_mount.id]
}

resource "aws_efs_access_point" "app" {
  file_system_id = aws_efs_file_system.prod.id

  posix_user {
    uid = 1000
    gid = 1000
  }

  root_directory {
    path = "/app-data"
    creation_info {
      owner_uid   = 1000
      owner_gid   = 1000
      permissions = "755"
    }
  }

  tags = { Name = "app-access-point" }
}

resource "aws_security_group" "efs_mount" {
  name   = "efs-mount-sg"
  vpc_id = var.vpc_id

  ingress {
    from_port       = 2049
    to_port         = 2049
    protocol        = "tcp"
    security_groups = [var.app_security_group_id]
  }
}

EFS Access Points

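Access Points are named entry points into an EFS filesystem. Each access point enforces a specific POSIX UID/GID, a root directory, and directory permissions. This is critical when multiple applications share one EFS filesystem: each app gets its own subtree with its own identity and permissions, without separate filesystems (and the separate mount targets, security groups, and lifecycle settings each one needs). The isolation is only as strong as your policies, though: unless an IAM or file system policy forces each client to mount through its access point, a client that can reach the mount targets can still mount the filesystem root.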

# Create an access point that maps root to /teams/backend
aws efs create-access-point \
  --file-system-id fs-0abc123def456789 \
  --posix-user Uid=2000,Gid=2000 \
  --root-directory "Path=/teams/backend,CreationInfo={OwnerUid=2000,OwnerGid=2000,Permissions=750}" \
  --tags Key=App,Value=backend-api

# Mount through the access point (the accesspoint option takes the access point ID)
sudo mount -t efs -o tls,accesspoint=fsap-0abc123 fs-0abc123def456789:/ /mnt/backend

EFS with ECS and EKS

EFS is the go-to shared persistent storage for containerised workloads. Both ECS and EKS have native integrations that handle mount target selection, TLS encryption, and credential injection automatically.

EFS with ECS (Task Definition)

Add a volume block referencing EFS and an access point to your ECS task definition. The ECS agent handles the NFS mount on the host, injecting TLS and IAM credentials automatically.

{
  "family": "api-service",
  "taskRoleArn": "arn:aws:iam::123456789:role/ecs-task-role",
  "executionRoleArn": "arn:aws:iam::123456789:role/ecs-exec-role",
  "networkMode": "awsvpc",
  "volumes": [
    {
      "name": "efs-data",
      "efsVolumeConfiguration": {
        "fileSystemId": "fs-0abc123def456789",
        "rootDirectory": "/",
        "transitEncryption": "ENABLED",
        "transitEncryptionPort": 2049,
        "authorizationConfig": {
          "accessPointId": "fsap-0abc123def",
          "iam": "ENABLED"
        }
      }
    }
  ],
  "containerDefinitions": [
    {
      "name": "api",
      "image": "123456789.dkr.ecr.us-east-1.amazonaws.com/api:latest",
      "mountPoints": [
        {
          "sourceVolume": "efs-data",
          "containerPath": "/app/data",
          "readOnly": false
        }
      ],
      "portMappings": [{"containerPort": 8080}],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/api-service",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "api"
        }
      }
    }
  ],
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "512",
  "memory": "1024"
}

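IAM task role must allow EFS access. With "iam": "ENABLED", the task role's credentials are used for authorization, so grant it elasticfilesystem:ClientMount, elasticfilesystem:ClientWrite, and (only if the container needs root) elasticfilesystem:ClientRootAccess. If the file system has a policy that requires IAM authorization, the mount is refused without these permissions, even though the security group allows port 2049.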
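A minimal sketch of such a task-role policy follows, written out with a Bash heredoc so it can be passed to `aws iam put-role-policy`. The ARNs, the output file name, and the `write_efs_task_policy` helper are placeholders. The `elasticfilesystem:AccessPointArn` condition, which limits the grant to the task's access point, is an optional hardening step rather than something ECS requires.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write an identity policy for the ECS task role that
# allows mounting and writing through one EFS access point.
# ClientRootAccess is deliberately not granted.
write_efs_task_policy() {
  local fs_arn="$1" ap_arn="$2" out="$3"
  cat > "$out" <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "elasticfilesystem:ClientMount",
        "elasticfilesystem:ClientWrite"
      ],
      "Resource": "${fs_arn}",
      "Condition": {
        "StringEquals": { "elasticfilesystem:AccessPointArn": "${ap_arn}" }
      }
    }
  ]
}
EOF
}

# Placeholder ARNs (12-digit example account ID)
write_efs_task_policy \
  "arn:aws:elasticfilesystem:us-east-1:123456789012:file-system/fs-0abc123def456789" \
  "arn:aws:elasticfilesystem:us-east-1:123456789012:access-point/fsap-0abc123def" \
  efs-task-policy.json

# Then attach it (requires credentials), for example:
# aws iam put-role-policy --role-name ecs-task-role \
#   --policy-name efs-client-access --policy-document file://efs-task-policy.json
```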

EFS with EKS (PersistentVolume + StorageClass)

AWS provides the Amazon EFS CSI driver for Kubernetes. Install it via the EKS add-on, then create a StorageClass and PersistentVolumeClaim. The CSI driver creates an EFS Access Point per PVC automatically when using dynamic provisioning.

# Install the EFS CSI driver as an EKS managed add-on
aws eks create-addon \
  --cluster-name prod-cluster \
  --addon-name aws-efs-csi-driver \
  --service-account-role-arn arn:aws:iam::123456789:role/AmazonEKS_EFS_CSI_DriverRole

# StorageClass — dynamic provisioning via EFS access points
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap          # creates an access point per PVC
  fileSystemId: fs-0abc123def456789
  directoryPerms: "700"
  gidRangeStart: "1000"
  gidRangeEnd: "2000"
  basePath: "/dynamic-pv"           # root dir for auto-created access points
mountOptions:
  - tls
---
# PersistentVolumeClaim
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: efs-claim
  namespace: default
spec:
  accessModes:
    - ReadWriteMany        # EFS supports concurrent writes from multiple pods
  storageClassName: efs-sc
  resources:
    requests:
      storage: 5Gi         # EFS ignores this value — it is elastic
---
# Deployment using the PVC
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
        - name: web
          image: nginx:1.25
          volumeMounts:
            - name: persistent-storage
              mountPath: /usr/share/nginx/html
      volumes:
        - name: persistent-storage
          persistentVolumeClaim:
            claimName: efs-claim
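With provisioningMode: efs-ap, each new PVC gets its own access point, and the GIDs for those access points come from the gidRangeStart to gidRangeEnd range, so the range bounds how many volumes the StorageClass can provision. A quick sketch of that arithmetic, assuming one GID per access point; `efs_sc_gid_capacity` is hypothetical, and the per-file-system access point quota is a second ceiling worth checking.

```shell
#!/usr/bin/env bash
# Sketch: how many PVCs the StorageClass can provision before running out of
# GIDs, assuming each new access point consumes one GID from the inclusive
# range [gidRangeStart, gidRangeEnd].
efs_sc_gid_capacity() {
  local start="$1" end="$2"
  if [ "$end" -lt "$start" ]; then
    echo "gidRangeEnd must be >= gidRangeStart" >&2
    return 1
  fi
  echo $(( end - start + 1 ))
}

efs_sc_gid_capacity 1000 2000   # the range used in efs-sc
```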

FSx for Windows File Server

FSx for Windows File Server delivers a fully managed, Windows-native filesystem built on Windows Server and backed by SSD or HDD storage. It supports SMB 3.0/3.1.1, NTFS, Windows ACLs, DFS namespaces, and Active Directory integration: everything Windows applications expect from a file server, without the overhead of running and patching Windows Server VMs.

Key Capabilities

Active Directory integration — join the filesystem to your AWS Managed Microsoft AD or self-managed on-premises AD. Users authenticate with their existing credentials and ACLs are enforced by AD group membership.
DFS Namespaces — create a unified namespace across multiple FSx file systems and on-premises shares. Users see \\corp\shares\finance regardless of whether the data lives in AWS or on-premises.
Multi-AZ deployment — active/standby configuration with automatic failover in under 30 seconds. Single-AZ is cheaper for dev/test.
Throughput capacity — 8 MB/s to 2 GB/s, configurable independently of storage. Scale throughput without migrating data.
Shadow copies (VSS) — not enabled by default; once you turn them on, Volume Shadow Copy Service snapshots give end users self-service file restore. Users can right-click → Previous Versions.
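Because throughput capacity is chosen from fixed tiers, a target figure has to be rounded up to the next tier. A minimal sketch, assuming the power-of-two tiers between 8 and 2,048 MB/s described above (AWS may offer additional tiers, so confirm against the current list); `fsx_windows_tier` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: round a target throughput (MB/s) up to the next FSx for Windows
# throughput-capacity tier, assuming power-of-two tiers from 8 to 2048 MB/s.
fsx_windows_tier() {
  local want="$1" tier=8
  while [ "$tier" -lt "$want" ]; do
    tier=$(( tier * 2 ))
  done
  if [ "$tier" -gt 2048 ]; then
    echo "no tier >= ${want} MB/s in the 8-2048 range" >&2
    return 1
  fi
  echo "$tier"
}

fsx_windows_tier 100   # -> 128, the value used in the examples below
```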

# Create FSx for Windows file system joined to AWS Managed AD
aws fsx create-file-system \
  --file-system-type WINDOWS \
  --storage-capacity 300 \
  --storage-type SSD \
  --subnet-ids subnet-aaa111 subnet-bbb222 \
  --windows-configuration '{
    "ActiveDirectoryId": "d-9067abcd12",
    "ThroughputCapacity": 128,
    "DeploymentType": "MULTI_AZ_1",
    "PreferredSubnetId": "subnet-aaa111",
    "AutomaticBackupRetentionDays": 7,
    "DailyAutomaticBackupStartTime": "02:00",
    "CopyTagsToBackups": true
  }' \
  --tags Key=Name,Value=prod-windows-fsx

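# Mount from a Windows EC2 instance (PowerShell). FSx for Windows DNS names
# live in your AD domain, e.g. amznfsxabc123.corp.example.com (placeholder);
# copy the real name from the FSx console or describe-file-systems.
# net use Z: \\amznfsxabc123.corp.example.com\share

# Create a domain-based DFS namespace and add the FSx share as a folder target
# (run on a Windows Server namespace server with the DFS Namespaces role;
# dfs-server01 is a placeholder for that server)
# New-DfsnRoot -Path "\\corp.example.com\files" -TargetPath "\\dfs-server01\files" -Type DomainV2
# New-DfsnFolder -Path "\\corp.example.com\files\finance" -TargetPath "\\amznfsxabc123.corp.example.com\share"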

# Terraform: FSx for Windows
resource "aws_fsx_windows_file_system" "prod" {
  active_directory_id             = aws_directory_service_directory.corp.id
  storage_capacity                = 300
  subnet_ids                      = [aws_subnet.primary.id, aws_subnet.secondary.id]
  throughput_capacity             = 128
  deployment_type                 = "MULTI_AZ_1"
  preferred_subnet_id             = aws_subnet.primary.id
  automatic_backup_retention_days = 7
  daily_automatic_backup_start_time = "02:00"
  copy_tags_to_backups            = true
  storage_type                    = "SSD"
  security_group_ids              = [aws_security_group.fsx_windows.id]

  tags = {
    Name = "prod-windows-fsx"
  }
}

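Networking note: FSx for Windows clients must be able to resolve the file system's DNS name, which is registered in your Active Directory's DNS zone. Either point instances at the AD DNS servers (for example through a DHCP options set), or keep the VPC resolver (with enableDnsSupport and enableDnsHostnames set to true) and add Route 53 Resolver forwarding rules that send the AD domain to your domain controllers. Forwarding rules are also the usual way to give clients in other VPCs or regions access.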

FSx for Lustre — HPC and ML Workloads

Lustre is the file system behind many of the world's top supercomputers. FSx for Lustre delivers fully managed Lustre with sub-millisecond latencies and aggregate throughput that scales to hundreds of GB/s. It's the right choice for ML training jobs that need to feed GPUs at full bandwidth, genomics pipelines, financial risk simulations, and any workload where storage I/O is the bottleneck.

Deployment Types

SCRATCH_1 / SCRATCH_2 — temporary, non-replicated storage for short-duration HPC jobs. SCRATCH_2 is the newer generation, with 200 MB/s per TiB baseline throughput, higher burst throughput, and in-transit encryption on supported instance types. No automatic backups. Cheapest option.
PERSISTENT_1 — HA, replicated within a single AZ. SSD-backed. For long-running workloads that need data durability. 50, 100, or 200 MB/s per TiB baseline throughput.
PERSISTENT_2 — latest generation. SSD with 125, 250, 500, or 1000 MB/s per TiB. Supports both S3 data repository associations and auto-export. Recommended for all new deployments.
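For the SSD-based deployment types above, aggregate baseline throughput is storage capacity in TiB multiplied by the per-unit throughput. The sketch below assumes that SCRATCH_2 and PERSISTENT capacities are 1,200 GiB or a multiple of 2,400 GiB (SCRATCH_1 uses different increments) and that 1,200 GiB counts as 1.2 TiB, matching how the FSx console labels it. `lustre_baseline_mbps` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: validate an FSx for Lustre SSD --storage-capacity value (GiB) and
# estimate baseline aggregate throughput (MB/s) for a per-unit throughput.
lustre_baseline_mbps() {
  local capacity_gib="$1" per_unit="$2"
  # Accept 1200 GiB or any positive multiple of 2400 GiB
  if [ "$capacity_gib" -ne 1200 ] && { [ "$capacity_gib" -le 0 ] || [ $(( capacity_gib % 2400 )) -ne 0 ]; }; then
    echo "capacity must be 1200 or a multiple of 2400 GiB" >&2
    return 1
  fi
  # 1200 GiB is treated as 1.2 TiB, so divide by 1000 rather than 1024
  echo $(( capacity_gib * per_unit / 1000 ))
}

lustre_baseline_mbps 1200 250    # the PERSISTENT_2 example below -> 300
lustre_baseline_mbps 4800 1000   # 4.8 TiB at the top PERSISTENT_2 tier -> 4800
```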

S3 Data Repository Association

FSx for Lustre can be linked to an S3 bucket so that your dataset is lazily imported on first access. Files appear in the Lustre namespace immediately (metadata only), and data is transferred on-demand. This eliminates the need to pre-stage TBs of training data — your ML job starts instantly and fetches what it needs.

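# Create FSx for Lustre PERSISTENT_2. PERSISTENT_2 does not accept
# ImportPath/ExportPath/AutoImportPolicy; S3 links are added afterwards
# as data repository associations.
aws fsx create-file-system \
  --file-system-type LUSTRE \
  --storage-capacity 1200 \
  --storage-type SSD \
  --subnet-ids subnet-gpu-training \
  --lustre-configuration '{
    "DeploymentType": "PERSISTENT_2",
    "PerUnitStorageThroughput": 250,
    "DataCompressionType": "LZ4",
    "WeeklyMaintenanceStartTime": "1:05:00"
  }' \
  --tags Key=Name,Value=ml-lustre

# Link an S3 prefix to /datasets with automatic import and export
aws fsx create-data-repository-association \
  --file-system-id fs-0abc123def456789 \
  --file-system-path /datasets \
  --data-repository-path s3://ml-training-data/datasets/ \
  --s3 '{
    "AutoImportPolicy": {"Events": ["NEW","CHANGED","DELETED"]},
    "AutoExportPolicy": {"Events": ["NEW","CHANGED","DELETED"]}
  }'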

# Mount on a GPU instance (install the Lustre client first; this is the
# Amazon Linux 2 command, on Amazon Linux 2023 use: sudo dnf install -y lustre-client)
sudo amazon-linux-extras install -y lustre
sudo mkdir /fsx
sudo mount -t lustre \
  -o relatime,flock \
  fs-0abc.fsx.us-east-1.amazonaws.com@tcp:/xxxxxxxx /fsx

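# Import S3 metadata changes for one prefix (metadata only; file
# contents still load lazily on first read). Import paths are S3 URIs.
aws fsx create-data-repository-task \
  --file-system-id fs-0abc123def456789 \
  --type IMPORT_METADATA_FROM_REPOSITORY \
  --paths "s3://ml-training-data/datasets/imagenet-2012/" \
  --report Enabled=false

# Preload file contents ahead of a training run (run on a mounted client)
nohup find /fsx/datasets/imagenet-2012 -type f -print0 | xargs -0 -n 50 sudo lfs hsm_restore &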

Striping Configuration

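Lustre stripes files across Object Storage Targets (OSTs). Large files that many clients read in parallel benefit from wider striping; small files are best kept on a single OST. A directory's stripe settings apply to files created in it afterwards (existing files keep their layout), so set them before writing data: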

# Set stripe count for a directory (data written here stripes across 4 OSTs)
lfs setstripe -c 4 /fsx/large-model-weights/

# Stripe checkpoint files across 8 OSTs with a 4 MB stripe size
lfs setstripe -c 8 -S 4M /fsx/checkpoints/

# Check current stripe on a file
lfs getstripe /fsx/checkpoints/epoch_100.ckpt

# Monitor OST usage balance
lfs df /fsx

# Terraform: FSx for Lustre with S3 association
resource "aws_fsx_lustre_file_system" "ml" {
  storage_capacity              = 1200
  subnet_ids                    = [aws_subnet.gpu.id]
  deployment_type               = "PERSISTENT_2"
  per_unit_storage_throughput   = 250
  data_compression_type         = "LZ4"
  storage_type                  = "SSD"
  security_group_ids            = [aws_security_group.lustre.id]
  weekly_maintenance_start_time = "1:05:00"

  tags = { Name = "ml-training-lustre" }
}

resource "aws_fsx_data_repository_association" "training" {
  file_system_id       = aws_fsx_lustre_file_system.ml.id
  data_repository_path = "s3://ml-training-data/datasets/"
  file_system_path     = "/datasets"

  s3 {
    auto_import_policy {
      events = ["NEW", "CHANGED", "DELETED"]
    }
    auto_export_policy {
      events = ["NEW", "CHANGED", "DELETED"]
    }
  }
}

FSx for NetApp ONTAP — Multi-Protocol Enterprise Storage

FSx for NetApp ONTAP delivers a managed ONTAP cluster in AWS. If you are migrating an on-premises NetApp NAS or need enterprise storage capabilities — SnapMirror replication, FlexClone instant clones, multi-protocol NFS+SMB access, automatic tiering — ONTAP is in a different league from EFS. It is more expensive and more complex, but it eliminates the need for custom data management scripts.

Key Differentiators vs EFS

Multi-protocol — same volume served over both NFS and SMB simultaneously. Mixed Linux/Windows environments can access the same data without syncing tools.
SnapMirror — asynchronous replication to another FSx ONTAP in a different region or to on-premises ONTAP. RPO in minutes, RTO in minutes.
FlexClone — instant zero-copy clones of volumes or snapshots. Clone a 10 TB volume in seconds for test/dev environments.
Automatic tiering — inactive data automatically tiered to S3-backed capacity pool storage, which costs several times less per GB than SSD. Transparent to applications.
iSCSI block storage — FSx ONTAP also provides iSCSI LUNs, making it useful for databases that need SAN-style block access.

# Create FSx for NetApp ONTAP Multi-AZ
# (FsxAdminPassword is a placeholder: supply the real value from a secrets store)
aws fsx create-file-system \
  --file-system-type ONTAP \
  --storage-capacity 1024 \
  --subnet-ids subnet-primary subnet-standby \
  --ontap-configuration '{
    "DeploymentType": "MULTI_AZ_1",
    "ThroughputCapacity": 512,
    "PreferredSubnetId": "subnet-primary",
    "RouteTableIds": ["rtb-main","rtb-private"],
    "AutomaticBackupRetentionDays": 7,
    "DailyAutomaticBackupStartTime": "03:00",
    "FsxAdminPassword": "Secr3t!Admin",
    "DiskIopsConfiguration": {
      "Mode": "AUTOMATIC"
    }
  }' \
  --tags Key=Name,Value=prod-ontap

# Create a Storage Virtual Machine (SVM) and volume
aws fsx create-storage-virtual-machine \
  --file-system-id fs-0abc123def456789 \
  --name prod-svm \
  --root-volume-security-style MIXED

aws fsx create-volume \
  --volume-type ONTAP \
  --name app-data-vol \
  --ontap-configuration '{
    "JunctionPath": "/app-data",
    "SecurityStyle": "UNIX",
    "SizeInMegabytes": 102400,
    "StorageEfficiencyEnabled": true,
    "StorageVirtualMachineId": "svm-0abc123",
    "TieringPolicy": {
      "Name": "AUTO",
      "CoolingPeriod": 31
    }
  }'

# Set up SnapMirror to DR region (run from ONTAP CLI or BlueXP)
# snapmirror create -source-path prod-svm:app-data-vol \
#   -destination-path dr-svm:app-data-vol-dr \
#   -type XDP -policy MirrorAllSnapshots

ONTAP tiering can reduce cost dramatically. With the AUTO tiering policy and the 31-day cooling period shown above, data not accessed for 31 days moves to the capacity pool tier automatically. In many enterprise workloads 60–80% of data is cold, so 60–80% of your stored data is billed at capacity pool rates (~$0.065/GB vs ~$0.28/GB for SSD). Setting StorageEfficiencyEnabled also turns on deduplication and compression, which can shrink the physical capacity consumed by a further 2–3×, depending on the data.
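The tiering claim is easiest to sanity-check as a blended price per GB-month. This sketch uses the approximate prices quoted above as defaults (substitute your region's rates) and ignores capacity pool request charges and the extra savings from storage efficiency; `ontap_blended_price` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: blended ONTAP storage price per GB-month for a given percentage of
# cold (tiered) data. Defaults are the approximate prices quoted in the text.
ontap_blended_price() {
  local cold_pct="$1" ssd="${2:-0.28}" pool="${3:-0.065}"
  awk -v c="$cold_pct" -v s="$ssd" -v p="$pool" 'BEGIN {
    f = c / 100                             # fraction of data in the capacity pool
    printf "%.4f\n", f * p + (1 - f) * s
  }'
}

ontap_blended_price 0    # all data on SSD
ontap_blended_price 70   # 70% of data cold and tiered
```

At 70% cold data the blended price comes out a little under half the all-SSD price, before any deduplication or compression savings.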

FSx for OpenZFS — ZFS Snapshots and Clones

FSx for OpenZFS delivers a managed ZFS filesystem. ZFS is beloved by developers and DBAs for its data integrity guarantees, copy-on-write snapshots, and instant cloning. FSx OpenZFS is the right choice when you want ZFS-native features without managing ZFS on EC2 yourself — especially for development databases, content repositories, and anything needing point-in-time clones for testing.

ZFS Capabilities Available in FSx

Snapshots — point-in-time, space-efficient, near-instantaneous. Snapshots are copy-on-write so they do not double your storage on creation. Available via the AWS console, CLI, or scheduled automatically.
Clones — writable volumes created from a snapshot in seconds, consuming only the space for changed blocks. Create a production database clone for a load test in <5 seconds.
Compression — LZ4 or ZSTD compression applied transparently. Typical compression ratios of 1.3–2× for text and log data, 1.1× for already-compressed data.
Data integrity — ZFS checksums every block and detects/corrects silent data corruption automatically (self-healing via mirroring).

# Create FSx for OpenZFS
aws fsx create-file-system \
  --file-system-type OPENZFS \
  --storage-capacity 512 \
  --storage-type SSD \
  --subnet-ids subnet-app \
  --open-zfs-configuration '{
    "DeploymentType": "SINGLE_AZ_1",
    "ThroughputCapacity": 512,
    "RootVolumeConfiguration": {
      "DataCompressionType": "LZ4",
      "RecordSizeKiB": 128,
      "NfsExports": [
        {
          "ClientConfigurations": [
            {
              "Clients": "10.0.0.0/16",
              "Options": ["rw","crossmnt","no_root_squash"]
            }
          ]
        }
      ]
    },
    "AutomaticBackupRetentionDays": 7,
    "DailyAutomaticBackupStartTime": "04:00"
  }' \
  --tags Key=Name,Value=dev-openzfs

# Create a manual snapshot
aws fsx create-snapshot \
  --volume-id fsvol-0abc123 \
  --name "before-migration-$(date +%Y%m%d)"

# Create a clone volume from a snapshot
aws fsx create-volume \
  --volume-type OPENZFS \
  --name load-test-clone \
  --open-zfs-configuration '{
    "ParentVolumeId": "fsvol-0abc123",
    "OriginSnapshot": {
      "SnapshotARN": "arn:aws:fsx:us-east-1:123456789:snapshot:fsvolsnap-0abc123",
      "CopyStrategy": "CLONE"
    },
    "DataCompressionType": "LZ4",
    "NfsExports": [
      {
        "ClientConfigurations": [
          {"Clients": "10.0.1.0/24", "Options": ["rw","no_root_squash"]}
        ]
      }
    ]
  }'

# Mount on Linux
sudo mkdir -p /mnt/openzfs
sudo mount -t nfs \
  -o nfsvers=4.1,rsize=1048576,wsize=1048576,timeo=600,retrans=2 \
  fs-0abc.fsx.us-east-1.amazonaws.com:/fsx /mnt/openzfs

Use OpenZFS for database dev/clone workflows. The pattern is: snapshot production volume → clone it → mount clone on dev/staging → run tests → delete clone. Total time: <60 seconds. Total extra storage cost for clone: only the changed blocks (often <5% of volume size). This replaces hours of pg_dump | psql pipelines.
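The storage saving in that workflow comes down to simple arithmetic: each full copy costs the whole volume, while each clone costs only the blocks it changes. A sketch, assuming a uniform per-clone change rate and ignoring blocks retained by the snapshot itself; `clone_vs_copy_gib` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: storage consumed by N full copies vs N clones of one volume.
# changed_pct is an assumed percentage of each clone's blocks that get rewritten.
clone_vs_copy_gib() {
  local volume_gib="$1" copies="$2" changed_pct="$3"
  awk -v v="$volume_gib" -v n="$copies" -v c="$changed_pct" 'BEGIN {
    printf "full_copies=%d GiB clones=%.1f GiB\n", v * n, v * n * c / 100
  }'
}

clone_vs_copy_gib 500 3 5   # three test environments from a 500 GiB volume
```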

Backup and Disaster Recovery

All AWS managed file systems support AWS Backup, giving you a unified backup management plane across EFS, FSx for Windows, FSx for Lustre (PERSISTENT only), FSx for ONTAP, and FSx for OpenZFS. AWS Backup handles scheduling, retention, cross-region copy, and cross-account copy from a single console and API.

AWS Backup for EFS

EFS backups use the EFS-native backup mechanism, which performs a consistent, incremental backup without impacting performance. The first backup is a full backup; subsequent backups are incremental based on changed blocks tracked at the filesystem level.

# Create a backup plan covering all EFS file systems with tag Backup=true
aws backup create-backup-plan --backup-plan '{
  "BackupPlanName": "efs-daily-weekly",
  "Rules": [
    {
      "RuleName": "daily-efs-backup",
      "TargetBackupVaultName": "prod-backup-vault",
      "ScheduleExpression": "cron(0 3 * * ? *)",
      "StartWindowMinutes": 60,
      "CompletionWindowMinutes": 180,
      "Lifecycle": {
        "DeleteAfterDays": 35
      },
      "CopyActions": [
        {
          "DestinationBackupVaultArn": "arn:aws:backup:us-west-2:123456789:backup-vault:dr-vault",
          "Lifecycle": {"DeleteAfterDays": 90}
        }
      ]
    },
    {
      "RuleName": "weekly-long-term",
      "TargetBackupVaultName": "prod-backup-vault",
      "ScheduleExpression": "cron(0 4 ? * SUN *)",
      "Lifecycle": {
        "MoveToColdStorageAfterDays": 30,
        "DeleteAfterDays": 365
      }
    }
  ]
}'

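# Assign EFS filesystems by tag (look up the ID of the plan created above)
PLAN_ID=$(aws backup list-backup-plans \
  --query "BackupPlansList[?BackupPlanName=='efs-daily-weekly'].BackupPlanId" \
  --output text)
aws backup create-backup-selection \
  --backup-plan-id "$PLAN_ID" \
  --backup-selection '{
    "SelectionName": "tagged-efs",
    "IamRoleArn": "arn:aws:iam::123456789:role/service-role/AWSBackupDefaultServiceRole",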
    "ListOfTags": [
      {"ConditionType": "STRINGEQUALS", "ConditionKey": "Backup", "ConditionValue": "true"}
    ]
  }'

EFS Cross-Region Replication

For RPO in minutes rather than hours, add EFS Replication alongside AWS Backup. EFS Replication asynchronously replicates all file system data and metadata to a destination EFS in another region, typically with a lag under 15 minutes. The destination is read-only until you fail over, which you do by deleting the replication configuration.

# Create EFS replication to us-west-2
aws efs create-replication-configuration \
  --source-file-system-id fs-0abc123def456789 \
  --destinations '[{
    "Region": "us-west-2",
    "KmsKeyId": "arn:aws:kms:us-west-2:123456789:key/mrk-dr"
  }]'

# Check replication status and the time of the last successful sync
# (lag = current time minus LastReplicatedTimestamp)
aws efs describe-replication-configurations \
  --file-system-id fs-0abc123def456789 \
  --query 'Replications[0].Destinations[0].{Status:Status,LastReplicated:LastReplicatedTimestamp}'
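LastReplicatedTimestamp is a point in time, not a lag, so alerting needs one more step. A minimal sketch that converts it into seconds of lag and checks it against a 15-minute RPO target; it assumes GNU `date` and ISO 8601 timestamps (the AWS CLI v2 default), and both helper names are hypothetical. The CLI call is left commented out because it needs credentials.

```shell
#!/usr/bin/env bash
# Sketch: replication lag in seconds from an ISO 8601 timestamp (GNU date).
# The optional second argument overrides "now", which makes testing easy.
replication_lag_seconds() {
  local last="$1" now="${2:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}"
  echo $(( $(date -u -d "$now" +%s) - $(date -u -d "$last" +%s) ))
}

# Compare a lag against an RPO target (default 900 s = 15 minutes)
check_rpo() {
  local lag="$1" rpo_seconds="${2:-900}"
  if [ "$lag" -gt "$rpo_seconds" ]; then
    echo "RPO breached: lag ${lag}s > ${rpo_seconds}s"
    return 1
  fi
  echo "OK: lag ${lag}s"
}

# Usage with the describe command (prints "None" before the first sync):
# last=$(aws efs describe-replication-configurations \
#   --file-system-id fs-0abc123def456789 \
#   --query 'Replications[0].Destinations[0].LastReplicatedTimestamp' --output text)
# check_rpo "$(replication_lag_seconds "$last")"
```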

Backup vs Replication: AWS Backup is for RPO in hours (daily backups) and point-in-time restore. EFS Replication is for RPO in minutes and fast regional failover. Use both: replication for your recovery site, AWS Backup for long-term retention and accidental deletion protection. They serve different failure modes.

Cost Optimization and TCO

Managed file storage on AWS can get expensive fast. Here are the most impactful levers to reduce costs without compromising performance or durability.

EFS Cost Optimization

Optimization	Savings potential	Action
Enable IA lifecycle policy (30 days)	50–80% for cold data	`aws efs put-lifecycle-configuration`
Move to One Zone storage	~47% vs Multi-AZ	Create a new file system with `--availability-zone-name` and migrate (cannot be changed in place)
Use Elastic Throughput vs Provisioned	Eliminate idle provisioned cost	Update throughput mode
Right-size Provisioned Throughput	10–40%	Reduce to P95 of actual usage

# Switch an existing EFS from Provisioned to Elastic throughput
aws efs update-file-system \
  --file-system-id fs-0abc123def456789 \
  --throughput-mode elastic

# Find the busiest hour of the last 7 days (hourly MeteredIOBytes sums);
# divide the result by 3600 for average bytes/s during that hour
aws cloudwatch get-metric-statistics \
  --namespace AWS/EFS \
  --metric-name MeteredIOBytes \
  --dimensions Name=FileSystemId,Value=fs-0abc123def456789 \
  --start-time $(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 3600 \
  --statistics Sum \
  --query 'max_by(Datapoints,&Sum).Sum'
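Right-sizing to the P95 needs a series of rates rather than a single number. This sketch converts per-period byte counts, such as MeteredIOBytes Sum datapoints, into MiB/s and takes the nearest-rank 95th percentile. The helper names are hypothetical, and hourly sums smooth out short bursts, so a shorter period gives a more honest peak.

```shell
#!/usr/bin/env bash
# Sketch for "reduce to P95 of actual usage".

# to_mibps PERIOD_SECONDS: read byte counts (one per line) on stdin,
# print the average MiB/s for each period
to_mibps() {
  awk -v p="$1" '{ printf "%.2f\n", $1 / p / 1048576 }'
}

# p95: read numbers on stdin, print the nearest-rank 95th percentile
p95() {
  LC_ALL=C sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      rank = int((95 * NR + 99) / 100)   # ceil(0.95 * NR) in integer math
      print v[rank]
    }'
}

# Example with three hourly datapoints (bytes per 3600 s)
printf '%s\n' 3774873600 7549747200 1887436800 | to_mibps 3600 | p95

# Real data (needs credentials): call get-metric-statistics with
#   --query 'Datapoints[].Sum' --output text
# then pipe through: tr '\t' '\n' | to_mibps 3600 | p95
```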

FSx Cost Optimization

FSx for Lustre: use SCRATCH_2 for short-lived jobs (roughly a week or less); it has no replication and is typically cheaper per GB than PERSISTENT_2. Enable LZ4 compression: it runs on the file servers at no extra charge and can cut the SSD capacity you need by 20–30% for typical ML datasets.
FSx for Windows: use Single-AZ for dev/test (roughly half the cost of Multi-AZ). For production, right-size throughput capacity, which is often the dominant cost driver rather than storage. Start small (for example 32 MB/s) and scale up only when CloudWatch utilization metrics such as CPUUtilization or NetworkThroughputUtilization stay consistently high.
FSx for ONTAP: enable auto-tiering to capacity pool. In steady state, 60% of data should be in the capacity pool tier. If your tiering ratio is below 40%, reduce the cooling period from 31 to 14 days.
FSx for OpenZFS: enable LZ4 compression always. For dev workloads, delete clones immediately after test runs to recover space. Use Single-AZ unless you need HA.

Total Cost of Ownership Comparison

For a 10 TB shared filesystem in which only about 500 GB is accessed regularly (the remaining 9.5 TB is cold), used by 50 containers in EKS:

Option	Monthly cost (approx.)	Notes
EFS Standard, no IA	~$3,000	$0.30/GB × 10,000 GB
EFS Standard + IA (30d)	~$390 + IA access fees	9.5 TB in IA at $0.025/GB, 500 GB in Standard at $0.30/GB
EFS One Zone + IA	~$205 + IA access fees	One Zone Standard $0.16/GB, One Zone-IA $0.013/GB
Self-managed NFS on EC2	~$950	r6g.xlarge (~$150) + 10 TB gp3 EBS (~$800), plus ops overhead and no cross-AZ redundancy
FSx for ONTAP with tiering	~$1,200	More features but higher base cost
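The EFS rows reduce to one formula: hot data at the primary-class price plus cold data at the IA price. This storage-only sketch uses the per-GB prices listed in the table (they vary by region and change over time) and leaves out IA access charges and throughput charges; `efs_storage_cost` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: monthly EFS storage cost (storage only) when hot_gb stays in the
# primary class and the rest of total_gb sits in IA.
efs_storage_cost() {
  local total_gb="$1" hot_gb="$2" primary_price="$3" ia_price="$4"
  awk -v t="$total_gb" -v h="$hot_gb" -v sp="$primary_price" -v ip="$ia_price" 'BEGIN {
    printf "%.2f\n", h * sp + (t - h) * ip
  }'
}

efs_storage_cost 10000 10000 0.30 0.025   # Standard, no IA  -> 3000.00
efs_storage_cost 10000 500   0.30 0.025   # Standard + IA    -> 387.50
efs_storage_cost 10000 500   0.16 0.013   # One Zone + IA    -> 203.50
```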

EFS One Zone + IA is the cheapest option for dev/staging or any environment that does not need cross-AZ redundancy. One Zone stores data redundantly within a single AZ (with 11 nines of durability there) but cannot survive the loss of that AZ. For production workloads where an AZ outage must not cause downtime or data loss, stick with Multi-AZ Standard + IA.