AWS DataSync: Automated Data Transfer Between On-Premises and AWS
Published: June 9, 2026 • 18 min read
Migrating terabytes of file system data from an on-premises NAS to Amazon S3 using rsync can take weeks and consumes a significant portion of your WAN bandwidth. AWS DataSync changes that equation by delivering transfer speeds up to 10x faster than open-source tools, built-in encryption, automatic retry, checksum verification, and a scheduling engine — all managed from a single console or CLI. This guide covers every component of DataSync from agent deployment to cost optimization, with production-ready CLI commands and Terraform HCL throughout.
- DataSync vs Storage Gateway vs Transfer Family vs S3 sync
- How DataSync Works — Architecture Deep Dive
- Agent Deployment — On-Premises and EC2
- Source Locations — NFS, SMB, HDFS, Object Storage
- Destination Locations — S3, EFS, FSx
- Task Configuration — Filters, Verify, Permissions, Scheduling
- AWS-to-AWS Transfers — No Agent Required
- Monitoring with CloudWatch and CloudTrail
- DataSync with Direct Connect and VPN
- Cost Optimization — Pricing, Compression, Scheduling
1. DataSync vs Storage Gateway vs Transfer Family vs S3 sync — Decision Table
AWS offers several managed services for moving data into and out of the cloud, and choosing the wrong one leads to unnecessary cost, complexity, or performance bottlenecks. Understanding the distinct use case for each service is the most important design decision before you write a single line of infrastructure code.
| Service | Primary Use Case | Protocol | Agent Required | Ongoing vs One-Shot | Choose It When... |
|---|---|---|---|---|---|
| AWS DataSync | Bulk file/object migration & scheduled sync | NFS, SMB, HDFS, S3-compatible, EFS, FSx | Yes (on-prem); No (AWS-to-AWS) | Both | Speed, automation, and verification matter |
| AWS Storage Gateway | Hybrid storage: on-prem apps reach S3 or FSx for Windows File Server over NFS/SMB, or cloud-backed volumes and tapes over iSCSI | NFS, SMB, iSCSI | Yes (VM or hardware appliance) | Ongoing | On-prem apps need cloud storage without code changes |
| AWS Transfer Family | Managed SFTP/FTPS/FTP endpoint in front of S3 or EFS | SFTP, FTPS, FTP, AS2 | No | Ongoing | External partners need SFTP access to S3 |
| aws s3 sync (CLI) | Ad-hoc S3-to-S3 or local-to-S3 sync | HTTPS (S3 API) | No | One-shot / scripted | Small datasets, simple scripts, no scheduling needed |
| AWS Snowball Edge | Offline bulk migration where WAN is unavailable or too slow | Physical device | No | One-shot | Data > 10 TB and WAN cost/time is prohibitive |
DataSync wins when you need to move structured file system data (NFS shares, SMB volumes, HDFS) at high speed with integrity verification, and when you need that process to run repeatedly on a schedule — for example nightly incremental syncs from an on-premises data warehouse landing zone to S3 for downstream EMR/Glue processing. If all you need is to expose a cloud bucket to legacy apps over NFS, Storage Gateway is the correct choice. If external trading partners need SFTP, use Transfer Family.
2. How DataSync Works — Architecture Deep Dive
DataSync is built around four core concepts: agents, locations, tasks, and task executions. Understanding how these relate to each other is essential before you start configuring anything in the console or CLI.
- Agent — a virtual machine you deploy in your on-premises environment (or an Amazon EC2 instance when reading from self-managed storage in AWS or from other clouds). The agent mounts source file systems, reads data, and sends it to AWS over TLS 1.2. For transfers between AWS storage services (S3, EFS, FSx) no agent is required; DataSync operates entirely within the AWS network.
- Location — a pointer to a data store: an NFS export, an SMB share, an HDFS cluster, an S3 bucket, an EFS file system, or an FSx volume. Locations are reusable — one location can be the source for many tasks.
- Task — a pairing of a source location and a destination location, plus configuration: include/exclude filters, verify mode, bandwidth throttle, POSIX permission handling, and schedule. A task defines what to transfer and how.
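- Task Execution — a single run of a task. An execution moves through LAUNCHING → PREPARING (listing and comparing files) → TRANSFERRING → VERIFYING and ends in SUCCESS or ERROR; with task queueing enabled, it may wait in QUEUED first. DataSync reports each status change through the DescribeTaskExecution API and as an Amazon EventBridge event, and CloudWatch metrics track files and bytes as the execution progresses.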
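To follow an execution through these statuses from a script, you can poll the DescribeTaskExecution API, which boto3 exposes as describe_task_execution and whose response carries a Status field. The sketch below takes that describe function as a parameter so it can be exercised offline; the poll interval and timeout are illustrative values, not DataSync defaults.

```python
import time
from typing import Callable, List

# DataSync task execution statuses; SUCCESS and ERROR end an execution.
TERMINAL_STATES = {"SUCCESS", "ERROR"}


def wait_for_execution(
    describe: Callable[..., dict],
    execution_arn: str,
    poll_seconds: float = 30.0,
    timeout_seconds: float = 6 * 3600,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Poll an execution until it reaches a terminal status.

    `describe` should behave like boto3's
    datasync_client.describe_task_execution(TaskExecutionArn=...).
    Returns the distinct statuses observed, in order.
    """
    seen: List[str] = []
    waited = 0.0
    while True:
        status = describe(TaskExecutionArn=execution_arn)["Status"]
        if not seen or seen[-1] != status:
            seen.append(status)  # record each status change once
        if status in TERMINAL_STATES:
            return seen
        if waited >= timeout_seconds:
            raise TimeoutError(f"{execution_arn} still {status} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds


if __name__ == "__main__":
    # Offline demo: a fake describe function that walks through the phases.
    phases = iter(["LAUNCHING", "PREPARING", "PREPARING", "TRANSFERRING", "VERIFYING", "SUCCESS"])

    def fake_describe(TaskExecutionArn: str) -> dict:
        return {"Status": next(phases)}

    print(wait_for_execution(fake_describe, "arn:example", sleep=lambda s: None))
```

Against a real account you would pass boto3.client("datasync").describe_task_execution and the TaskExecutionArn returned by start_task_execution. For unattended alerting, reacting to the EventBridge state-change events avoids polling altogether.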
During the PREPARING phase, DataSync scans the source and destination and compares file metadata such as size and modification time. With the default TransferMode=CHANGED, only files that are new or differ are transferred; checksums are computed during the transfer and in the VERIFYING phase rather than to decide what to copy. This incremental approach means subsequent executions of a nightly sync job transfer only the delta, not the full dataset, which sharply reduces bandwidth consumption and transfer time after the initial load.
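The change detection can be modeled in a few lines. This is an illustration, not DataSync's internal algorithm: it assumes each side is summarized as a mapping from relative path to (size, mtime), copies whatever is new or different (the CHANGED behavior), and, when asked, lists destination-only paths for deletion (the REMOVE behavior described in section 6).

```python
from typing import Dict, List, Tuple

# Relative path -> (size in bytes, modification time as epoch seconds).
# A simplified stand-in for the metadata DataSync compares.
Listing = Dict[str, Tuple[int, int]]


def plan_transfer(
    source: Listing, destination: Listing, remove_deleted: bool = False
) -> Tuple[List[str], List[str]]:
    """Return (paths_to_copy, paths_to_delete) for a metadata-based incremental sync."""
    # Copy anything missing on the destination or whose size/mtime differ.
    to_copy = sorted(path for path, meta in source.items() if destination.get(path) != meta)
    # With REMOVE semantics, delete destination paths that no longer exist on the source.
    to_delete = sorted(set(destination) - set(source)) if remove_deleted else []
    return to_copy, to_delete


if __name__ == "__main__":
    src = {"a.csv": (100, 1000), "b.csv": (200, 2000), "c.csv": (50, 3000)}
    dst = {"a.csv": (100, 1000), "b.csv": (180, 1500), "old.csv": (10, 500)}
    print(plan_transfer(src, dst, remove_deleted=True))
    # (['b.csv', 'c.csv'], ['old.csv'])
```

On a second run with nothing changed, both lists come back empty, which is why nightly executions after the initial load move only the delta.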
Internally, DataSync parallelizes across multiple TCP connections per agent and can use multiple agents for a single task (multi-agent mode) to saturate even a 10 Gbps Direct Connect link. The service handles congestion control, error recovery, and automatic retry transparently, so you never need to write retry logic in a shell script.
3. Agent Deployment — On-Premises VM and EC2
The DataSync agent is distributed as a virtual machine image for VMware ESXi, Microsoft Hyper-V, and Linux KVM, and as an Amazon EC2 AMI for reading from self-managed storage that runs in AWS or another cloud. The agent requires outbound HTTPS (port 443) to the DataSync service endpoints in your target region, inbound HTTP (port 80) from the machine that retrieves its activation key, and access to the source file system protocols (NFS port 2049, SMB port 445, etc.).
3.1 Deploy Agent on VMware ESXi
# 1. Download the DataSync agent OVA from the AWS console:
# Services → DataSync → Agents → Create agent → Download OVA
# 2. Deploy via VMware OVF Tool
ovftool \
--name=datasync-agent-prod \
--datastore=SSD-Datastore \
--network="VM Network" \
--diskMode=thin \
aws-datasync-agent.ova \
vi://administrator@vcenter.example.com/Datacenter/host/Cluster
# 3. Power on the VM and note the IP address from the VMware console
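# The agent uses port 80 only during activation, to hand out an activation key.
# Get the key from the agent's local console, or request it over HTTP from a
# host that can reach the agent on port 80
# 4. Activate the agent via CLI (replace the example with your agent's 5-part
# key, e.g., 12345-ABCDE-67890-FGHIJ-KLMNO)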
aws datasync create-agent \
--activation-key "12345-ABCDE-67890-FGHIJ-KLMNO" \
--agent-name "prod-onprem-agent-01" \
--tags Key=Environment,Value=Production \
Key=Location,Value=MysoreDatacenter \
--region us-east-1
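Activation can also be scripted. Per the curl example in the DataSync documentation, an HTTP GET to port 80 on the agent with gatewayType=SYNC, activationRegion, and no_redirect returns the key in the response body; activating against a VPC endpoint adds privateLinkEndpoint (the endpoint's private IP) and endpointType=PRIVATE_LINK. The sketch below builds that URL and sanity-checks the response. The key-format regular expression is an assumption based on the five-group example above, and the function names are illustrative.

```python
import re
import urllib.request
from typing import Optional

# Five groups of five characters, e.g. 12345-ABCDE-67890-FGHIJ-KLMNO (assumed format).
KEY_PATTERN = re.compile(r"^[A-Z0-9]{5}(-[A-Z0-9]{5}){4}$")


def activation_url(agent_ip: str, region: str, private_link_endpoint: Optional[str] = None) -> str:
    """Build the URL that asks an agent on port 80 for its activation key."""
    params = ["gatewayType=SYNC", f"activationRegion={region}"]
    if private_link_endpoint:
        # Private IP of the DataSync interface VPC endpoint (see section 9)
        params += [f"privateLinkEndpoint={private_link_endpoint}", "endpointType=PRIVATE_LINK"]
    params.append("no_redirect")  # return the key in the body instead of redirecting to the console
    return f"http://{agent_ip}/?" + "&".join(params)


def is_activation_key(text: str) -> bool:
    """True if the text looks like a five-group activation key."""
    return bool(KEY_PATTERN.match(text.strip()))


def fetch_activation_key(agent_ip: str, region: str, private_link_endpoint: Optional[str] = None) -> str:
    """Request a key from the agent; run this from a host that can reach the agent on port 80."""
    url = activation_url(agent_ip, region, private_link_endpoint)
    with urllib.request.urlopen(url, timeout=10) as response:
        key = response.read().decode("utf-8").strip()
    if not is_activation_key(key):
        raise ValueError(f"Unexpected response from agent: {key!r}")
    return key
```

For example, fetch_activation_key("10.0.0.25", "us-east-1"), with a hypothetical agent IP, returns a string you can pass to create-agent --activation-key in the same Region.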
3.2 Deploy Agent on EC2 (self-managed or other-cloud sources)
# Launch the DataSync agent AMI in your VPC
# Find the current AMI ID for your region:
aws ssm get-parameter \
--name "/aws/service/datasync/ami" \
--region us-east-1 \
--query "Parameter.Value" \
--output text
# Launch EC2 agent instance (m5.2xlarge recommended for high throughput)
aws ec2 run-instances \
--image-id ami-0abc123def456789 \
--instance-type m5.2xlarge \
--subnet-id subnet-0abc12345 \
--security-group-ids sg-0abc12345 \
--iam-instance-profile Name=DataSyncAgentRole \
--tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=datasync-ec2-agent}]' \
--count 1
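# Get the activation key from the instance's private IP (add the VPC endpoint's
# private IP to the request when activating against an endpoint), then activate: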
aws datasync create-agent \
--activation-key "AAAAA-BBBBB-CCCCC-DDDDD-EEEEE" \
--agent-name "ec2-agent-us-east-1" \
--vpc-endpoint-id vpce-0abc12345def67890 \
--subnet-arns arn:aws:ec2:us-east-1:123456789012:subnet/subnet-0abc12345 \
--security-group-arns arn:aws:ec2:us-east-1:123456789012:security-group/sg-0abc12345 \
--region us-east-1
3.3 Terraform — Agent Resource
resource "aws_datasync_agent" "onprem_agent" {
name = "prod-onprem-agent-01"
activation_key = var.datasync_activation_key # retrieved from the agent; or set ip_address instead to let Terraform fetch it
tags = {
Environment = "Production"
Location = "MysoreDatacenter"
}
}
4. Source Locations — NFS, SMB, HDFS, Object Storage
A DataSync location is created once and reused across multiple tasks. Source locations point DataSync at the data you want to transfer. The agent must have network access to the source system using the appropriate protocol and credentials.
4.1 NFS Source Location
aws datasync create-location-nfs \
--server-hostname "nas01.corp.example.com" \
--subdirectory "/exports/analytics-data" \
--on-prem-config AgentArns=arn:aws:datasync:us-east-1:123456789012:agent/agent-abc123 \
--mount-options Version=NFS4_1 \
--tags Key=Source,Value=NAS01
4.2 SMB Source Location
aws datasync create-location-smb \
--server-hostname "fileserver01.corp.example.com" \
--subdirectory "/DataShare/Projects" \
--user "datasync-svc" \
--password "$(aws secretsmanager get-secret-value --secret-id datasync/smb-password --query SecretString --output text)" \
--domain "CORP" \
--agent-arns arn:aws:datasync:us-east-1:123456789012:agent/agent-abc123 \
--mount-options Version=SMB3
4.3 HDFS Source Location
aws datasync create-location-hdfs \
--name-nodes "[{\"Hostname\":\"namenode01.corp.example.com\",\"Port\":8020}]" \
--subdirectory "/user/hive/warehouse/transactions" \
--replication-factor 3 \
--authentication-type KERBEROS \
--kerberos-principal "datasync/datasync-agent@CORP.EXAMPLE.COM" \
--kerberos-keytab fileb://datasync.keytab \
--kerberos-krb5-conf fileb:///etc/krb5.conf \
--agent-arns arn:aws:datasync:us-east-1:123456789012:agent/agent-abc123
4.4 Terraform — NFS Location
resource "aws_datasync_location_nfs" "nas_analytics" {
server_hostname = "nas01.corp.example.com"
subdirectory = "/exports/analytics-data"
on_prem_config {
agent_arns = [aws_datasync_agent.onprem_agent.arn]
}
mount_options {
version = "NFS4_1"
}
tags = {
Source = "NAS01"
}
}
Use NFS4_1 when your server supports it. If the NAS firmware does not, fall back to NFS4_0 or NFS3. The default, AUTOMATIC, also selects NFS 4.1.
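5. Destination Locations — S3, EFS, FSx for Windows/Lustre/OpenZFS/ONTAP
DataSync can write to Amazon S3, Amazon EFS, and four FSx file systems (Windows File Server, Lustre, OpenZFS, and NetApp ONTAP). Each has different performance characteristics, use cases, and access requirements: S3 locations use an IAM role that DataSync assumes to write objects, while EFS and FSx locations rely on network access through subnets and security groups plus, for SMB-based FSx, domain credentials.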
5.1 S3 Destination
# Create IAM role for DataSync S3 access
aws iam create-role \
--role-name DataSyncS3Role \
--assume-role-policy-document '{
"Version":"2012-10-17",
"Statement":[{
"Effect":"Allow",
"Principal":{"Service":"datasync.amazonaws.com"},
"Action":"sts:AssumeRole"
}]
}'
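# AmazonS3FullAccess keeps the example short; in production, attach a policy
# scoped to the destination bucket instead
aws iam attach-role-policy \
--role-name DataSyncS3Role \
--policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess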
# Create S3 destination location
aws datasync create-location-s3 \
--s3-bucket-arn arn:aws:s3:::my-analytics-landing-bucket \
--subdirectory /incoming/nas-data \
--s3-config BucketAccessRoleArn=arn:aws:iam::123456789012:role/DataSyncS3Role \
--s3-storage-class STANDARD_IA \
--tags Key=Destination,Value=AnalyticsS3
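AmazonS3FullAccess grants far more than DataSync needs. A tighter option is an inline role policy scoped to one bucket, paired with a trust policy whose aws:SourceAccount condition helps prevent the confused-deputy problem. The sketch below generates both documents. The action list is based on AWS's published example for DataSync S3 locations; confirm it against the current DataSync IAM documentation before use. The account ID and bucket name are the placeholders used elsewhere in this guide.

```python
import json
from typing import Any, Dict

# Bucket-level and object-level actions DataSync uses on an S3 location.
BUCKET_ACTIONS = ["s3:GetBucketLocation", "s3:ListBucket", "s3:ListBucketMultipartUploads"]
OBJECT_ACTIONS = [
    "s3:AbortMultipartUpload", "s3:DeleteObject", "s3:GetObject", "s3:GetObjectTagging",
    "s3:ListMultipartUploadParts", "s3:PutObject", "s3:PutObjectTagging",
]


def trust_policy(account_id: str) -> Dict[str, Any]:
    """Let only DataSync, acting on behalf of this account, assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "datasync.amazonaws.com"},
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"aws:SourceAccount": account_id}},
        }],
    }


def bucket_access_policy(bucket: str) -> Dict[str, Any]:
    """Permissions scoped to a single bucket and its objects."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": BUCKET_ACTIONS, "Resource": f"arn:aws:s3:::{bucket}"},
            {"Effect": "Allow", "Action": OBJECT_ACTIONS, "Resource": f"arn:aws:s3:::{bucket}/*"},
        ],
    }


if __name__ == "__main__":
    print(json.dumps(trust_policy("123456789012"), indent=2))
    print(json.dumps(bucket_access_policy("my-analytics-landing-bucket"), indent=2))
```

Save the two documents to files and pass them with file:// to aws iam create-role --assume-role-policy-document and aws iam put-role-policy --policy-document, in place of the AmazonS3FullAccess attachment.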
5.2 EFS Destination
aws datasync create-location-efs \
--efs-filesystem-arn arn:aws:elasticfilesystem:us-east-1:123456789012:file-system/fs-abc12345 \
--ec2-config '{
"SubnetArn":"arn:aws:ec2:us-east-1:123456789012:subnet/subnet-0abc12345",
"SecurityGroupArns":["arn:aws:ec2:us-east-1:123456789012:security-group/sg-0abc12345"]
}' \
--subdirectory /shared/projects \
--in-transit-encryption TLS1_2
5.3 FSx for Windows File Server Destination
aws datasync create-location-fsx-windows \
--fsx-filesystem-arn arn:aws:fsx:us-east-1:123456789012:file-system/fs-0abc12345 \
--user "fsx-admin" \
--password "$(aws secretsmanager get-secret-value --secret-id datasync/fsx-password --query SecretString --output text)" \
--domain "corp.example.com" \
--security-group-arns arn:aws:ec2:us-east-1:123456789012:security-group/sg-0abc12345 \
--subdirectory /DataShare
5.4 Terraform — S3 Destination
resource "aws_datasync_location_s3" "analytics_landing" {
s3_bucket_arn = aws_s3_bucket.analytics_landing.arn
subdirectory = "/incoming/nas-data"
s3_storage_class = "STANDARD_IA"
s3_config {
bucket_access_role_arn = aws_iam_role.datasync_s3.arn
}
tags = {
Destination = "AnalyticsS3"
}
}
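For rarely read archive data, consider GLACIER_INSTANT_RETRIEVAL. For analytics pipelines that read data within hours, STANDARD_IA lowers storage cost while keeping millisecond access, at the price of per-GB retrieval fees and a 30-day minimum storage duration. DataSync applies the location's storage class to each object at write time, so you don't need a lifecycle rule for the initial placement. Because recurring syncs that overwrite or delete objects in these classes can incur early-deletion charges, reserve them for data that changes rarely.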
6. Task Configuration — Filters, Verify Mode, POSIX Permissions, Scheduling
The DataSync task is where the real power lives. Beyond simply pointing source at destination, you can filter which files are transferred, control how checksums are verified, preserve POSIX metadata, throttle bandwidth, and set a recurring schedule — all without writing a single cron job or shell script.
6.1 Create a Task with Full Configuration
aws datasync create-task \
--name "NAS-to-S3-NightlySync" \
--source-location-arn arn:aws:datasync:us-east-1:123456789012:location/loc-abc123 \
--destination-location-arn arn:aws:datasync:us-east-1:123456789012:location/loc-def456 \
--cloud-watch-log-group-arn arn:aws:logs:us-east-1:123456789012:log-group:/datasync/NAS-to-S3 \
--options '{
"VerifyMode": "ONLY_FILES_TRANSFERRED",
"Atime": "BEST_EFFORT",
"Mtime": "PRESERVE",
"Uid": "INT_VALUE",
"Gid": "INT_VALUE",
"PreserveDeletedFiles": "REMOVE",
"PreserveDevices": "NONE",
"PosixPermissions": "PRESERVE",
"BytesPerSecond": 104857600,
"TaskQueueing": "ENABLED",
"LogLevel": "TRANSFER",
"TransferMode": "CHANGED",
"SecurityDescriptorCopyFlags": "NONE",
"ObjectTags": "PRESERVE"
}' \
--excludes '[
{"FilterType":"SIMPLE_PATTERN","Value":"*.tmp|*.log|.Trash-*"}
]' \
--includes '[
{"FilterType":"SIMPLE_PATTERN","Value":"/analytics|/reports"}
]' \
--schedule ScheduleExpression="cron(0 1 * * ? *)"
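The DataSync API accepts at most one rule in each of the Excludes and Includes lists; multiple patterns go into that rule's Value, separated by a pipe (|). A small helper keeps pattern lists readable in scripts and produces the JSON for --excludes or --includes. The function name is illustrative.

```python
import json
from typing import Dict, List


def simple_pattern_filter(patterns: List[str]) -> List[Dict[str, str]]:
    """Combine patterns into the single-rule list DataSync expects for --excludes/--includes."""
    if not patterns:
        return []
    for pattern in patterns:
        if "|" in pattern:
            # The pipe is the delimiter between patterns, so it cannot appear inside one.
            raise ValueError(f"'|' cannot appear inside pattern {pattern!r}")
    return [{"FilterType": "SIMPLE_PATTERN", "Value": "|".join(patterns)}]


if __name__ == "__main__":
    excludes = simple_pattern_filter(["*.tmp", "*.log", ".Trash-*"])
    includes = simple_pattern_filter(["/analytics", "/reports"])
    print("--excludes '" + json.dumps(excludes) + "'")
    print("--includes '" + json.dumps(includes) + "'")
```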
6.2 Key Option Explanations
| Option | Recommended Value | Notes |
|---|---|---|
| VerifyMode | ONLY_FILES_TRANSFERRED | Verifies checksum only for transferred files — fast. Use POINT_IN_TIME_CONSISTENT for full audits. |
| TransferMode | CHANGED | Only transfer files that differ. Use ALL to force full resync. |
| PreserveDeletedFiles | REMOVE | Delete destination files that were removed from source — true sync semantics. |
| PosixPermissions | PRESERVE | Critical when the destination is EFS and file permission bits must match the source NFS export. |
| BytesPerSecond | 104857600 (100 MiB/s) | -1 = unlimited. Always set a daytime cap when sharing WAN with production. |
| LogLevel | TRANSFER | Logs every transferred and verified file to CloudWatch Logs. Use BASIC for high-volume tasks to control CloudWatch Logs costs. |
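BytesPerSecond is measured in bytes per second, while WAN links are quoted in bits per second, so a cap is easy to misjudge by a factor of 8. The sketch below converts a share of a link's capacity into a BytesPerSecond value; the link speed and 50% share in the demo are arbitrary illustrations.

```python
MIB = 1024 * 1024


def bytes_per_second_cap(link_mbps: float, share: float) -> int:
    """BytesPerSecond value that uses `share` (0 < share <= 1) of a link quoted in Mbit/s."""
    if not 0 < share <= 1:
        raise ValueError("share must be in (0, 1]")
    return int(link_mbps * 1_000_000 * share / 8)  # bits -> bytes


if __name__ == "__main__":
    print(bytes_per_second_cap(1000, 0.5))  # half of a 1 Gbps link -> 62500000
    print(104857600 / MIB)                  # the task's cap in MiB/s -> 100.0
    print(104857600 * 8 / 1_000_000)        # the same cap in Mbit/s -> 838.8608
```

By this arithmetic, the 104857600 cap in the task above is roughly 839 Mbit/s, which would still occupy most of a 1 Gbps link during business hours.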
6.3 Override Options at Execution Time
# Run a task immediately with a bandwidth override (e.g., allow full speed on a weekend)
aws datasync start-task-execution \
--task-arn arn:aws:datasync:us-east-1:123456789012:task/task-abc12345 \
--override-options BytesPerSecond=-1,VerifyMode=POINT_IN_TIME_CONSISTENT
cron(0 1 * * ? *) runs at 01:00 UTC daily. cron(0 2 ? * MON-FRI *) runs at 02:00 UTC on weekdays only. Scheduling is built in — no Lambda or EventBridge rule required.
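DataSync schedule expressions follow the EventBridge cron format: six fields (minutes, hours, day-of-month, month, day-of-week, year), with ? in exactly one of the two day fields. A pre-flight check like the sketch below can catch malformed expressions before create-task or update-task rejects them. It checks only the shape of a cron() expression, not the values allowed in each field.

```python
import re
from typing import List

# cron(minutes hours day-of-month month day-of-week year)
CRON_RE = re.compile(r"^cron\((.+)\)$")


def check_schedule(expression: str) -> List[str]:
    """Return the six cron fields if the expression has a valid shape; raise ValueError otherwise."""
    match = CRON_RE.match(expression.strip())
    if not match:
        raise ValueError("expected an expression of the form cron(...)")
    fields = match.group(1).split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    day_of_month, day_of_week = fields[2], fields[4]
    # One day field must be '?' and the other must not be.
    if (day_of_month == "?") == (day_of_week == "?"):
        raise ValueError("exactly one of day-of-month and day-of-week must be '?'")
    return fields


if __name__ == "__main__":
    for expr in ["cron(0 1 * * ? *)", "cron(0 2 ? * MON-FRI *)", "cron(0 1 * * * *)"]:
        try:
            print(expr, "->", check_schedule(expr))
        except ValueError as err:
            print(expr, "-> rejected:", err)
```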
7. AWS-to-AWS Transfers — No Agent Required
When both source and destination are AWS storage services, DataSync runs entirely within the AWS network: no agent VM is needed, and data never traverses the public internet. That makes DataSync a strong fit for cross-region S3 copies that need filtering, verification, or scheduling, and for EFS-to-S3 or FSx-to-S3 data lake ingestion pipelines. For plain bucket-to-bucket replication without those requirements, S3 Replication is usually simpler.
7.1 S3 Cross-Region Copy with Filtering
# Source: S3 bucket in us-east-1
aws datasync create-location-s3 \
--s3-bucket-arn arn:aws:s3:::prod-data-us-east-1 \
--subdirectory /raw \
--s3-config BucketAccessRoleArn=arn:aws:iam::123456789012:role/DataSyncS3Role \
--region us-east-1
# Destination: S3 bucket in ap-south-1
aws datasync create-location-s3 \
--s3-bucket-arn arn:aws:s3:::dr-data-ap-south-1 \
--subdirectory /raw \
--s3-config BucketAccessRoleArn=arn:aws:iam::123456789012:role/DataSyncS3Role \
--region us-east-1 # Task and locations are managed from source region
# Create cross-region replication task
aws datasync create-task \
--name "S3-CrossRegion-DR-Sync" \
--source-location-arn arn:aws:datasync:us-east-1:123456789012:location/loc-src123 \
--destination-location-arn arn:aws:datasync:us-east-1:123456789012:location/loc-dst456 \
--options TransferMode=CHANGED,VerifyMode=ONLY_FILES_TRANSFERRED \
--excludes '[{"FilterType":"SIMPLE_PATTERN","Value":"*.tmp"}]' \
--schedule ScheduleExpression="cron(0 */6 * * ? *)"
7.2 EFS to S3 Data Lake Ingestion
# Source: EFS file system
aws datasync create-location-efs \
--efs-filesystem-arn arn:aws:elasticfilesystem:us-east-1:123456789012:file-system/fs-app01 \
--ec2-config SubnetArn=arn:aws:ec2:us-east-1:123456789012:subnet/subnet-private,SecurityGroupArns=arn:aws:ec2:us-east-1:123456789012:security-group/sg-datasync \
--subdirectory /app/exports
# Destination: S3 data lake
aws datasync create-location-s3 \
--s3-bucket-arn arn:aws:s3:::data-lake-prod \
--subdirectory /efs-exports/app01 \
--s3-config BucketAccessRoleArn=arn:aws:iam::123456789012:role/DataSyncS3Role \
--s3-storage-class INTELLIGENT_TIERING
7.3 Cross-Account Transfer
# In the destination account, add a bucket policy allowing the source account's DataSync role
aws s3api put-bucket-policy \
--bucket cross-account-destination-bucket \
--policy '{
"Version":"2012-10-17",
"Statement":[{
"Effect":"Allow",
"Principal":{"AWS":"arn:aws:iam::SOURCE_ACCOUNT_ID:role/DataSyncS3Role"},
"Action":["s3:GetBucketLocation","s3:ListBucket","s3:ListBucketMultipartUploads",
"s3:AbortMultipartUpload","s3:DeleteObject","s3:GetObject",
"s3:ListMultipartUploadParts","s3:PutObject",
"s3:GetObjectTagging","s3:PutObjectTagging"],
"Resource":["arn:aws:s3:::cross-account-destination-bucket",
"arn:aws:s3:::cross-account-destination-bucket/*"]
}]
}'
8. Monitoring with CloudWatch Metrics, CloudTrail, and SNS Alerts
DataSync publishes telemetry to CloudWatch automatically; no additional configuration is required to start seeing metrics. The metrics track throughput and volume (how much data was prepared, moved, and verified). Execution outcome (did the task succeed or fail) is reported through the task execution status and Amazon EventBridge events rather than as a CloudWatch metric.
8.1 Key CloudWatch Metrics
| Metric | Namespace | Description |
|---|---|---|
| BytesTransferred | AWS/DataSync | Total bytes moved during a task execution |
| FilesTransferred | AWS/DataSync | Number of files successfully transferred |
| BytesVerifiedSource / BytesVerifiedDestination | AWS/DataSync | Bytes checked on each side during the VERIFYING phase |
| FilesVerifiedSource / FilesVerifiedDestination | AWS/DataSync | Files checked on each side during the VERIFYING phase |
| FilesDeleted | AWS/DataSync | Files removed from destination (PreserveDeletedFiles=REMOVE) |
8.2 EventBridge Rule on Task Failure
# DataSync does not publish a success/failure metric, so alert on the
# execution state-change event instead
aws events put-rule \
--name "DataSync-Task-Failure" \
--description "Alert when a DataSync task execution fails" \
--event-pattern '{
"source":["aws.datasync"],
"detail-type":["DataSync Task Execution State Change"],
"detail":{"State":["ERROR"]}
}'
# Route matching events to the SNS topic (the topic's access policy must
# allow events.amazonaws.com to publish)
aws events put-targets \
--rule "DataSync-Task-Failure" \
--targets Id=datasync-alerts,Arn=arn:aws:sns:us-east-1:123456789012:datasync-alerts
8.3 CloudWatch Logs Insights Query
# Query DataSync task logs in CloudWatch Logs Insights
# Log group: /datasync/NAS-to-S3
fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
| limit 50
8.4 SNS Topic for Alerts
aws sns create-topic --name datasync-alerts
# Subscribe ops email
aws sns subscribe \
--topic-arn arn:aws:sns:us-east-1:123456789012:datasync-alerts \
--protocol email \
--notification-endpoint ops-team@example.com
# Subscribe a Slack notifier Lambda (standard pattern; SNS also needs invoke
# permission on the function, granted with aws lambda add-permission)
aws sns subscribe \
--topic-arn arn:aws:sns:us-east-1:123456789012:datasync-alerts \
--protocol lambda \
--notification-endpoint arn:aws:lambda:us-east-1:123456789012:function:datasync-slack-notifier
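A minimal sketch of the datasync-slack-notifier function subscribed above, in Python. It assumes each SNS message is either a CloudWatch alarm notification or an EventBridge "DataSync Task Execution State Change" event forwarded to the topic, and that the Slack incoming-webhook URL is supplied in a SLACK_WEBHOOK_URL environment variable; that variable name and the message wording are hypothetical.

```python
import json
import os
import urllib.request
from typing import Any, Dict


def format_message(message: Dict[str, Any]) -> str:
    """Turn one SNS-delivered notification into a line of Slack text."""
    if "AlarmName" in message:
        # CloudWatch alarm notification JSON
        return f"{message['AlarmName']} is {message.get('NewStateValue', 'UNKNOWN')}: {message.get('NewStateReason', '')}"
    # Otherwise assume an EventBridge DataSync execution state-change event
    state = message.get("detail", {}).get("State", "UNKNOWN")
    resource = (message.get("resources") or ["unknown execution"])[0]
    return f"DataSync execution {resource} is {state} ({message.get('time', 'no timestamp')})"


def post_to_slack(text: str) -> None:
    """POST the text to a Slack incoming webhook."""
    request = urllib.request.Request(
        os.environ["SLACK_WEBHOOK_URL"],  # hypothetical variable name
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10):
        pass


def lambda_handler(event, context):
    # SNS invokes Lambda with one or more records; each Message holds the JSON notification.
    for record in event["Records"]:
        post_to_slack(format_message(json.loads(record["Sns"]["Message"])))
```

Keeping the formatting in its own function lets you test it without network access or a real webhook.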
8.5 CloudTrail — API-Level Auditing
Every DataSync API call, such as CreateTask, StartTaskExecution, or DeleteAgent, is recorded as a CloudTrail management event. CloudTrail Event history keeps the last 90 days of these events without any configuration; create a trail to retain them longer. This is your audit trail for compliance: who created which task, when the last execution was started, which IAM principal activated an agent. Filter CloudTrail events with:
aws cloudtrail lookup-events \
--lookup-attributes AttributeKey=EventSource,AttributeValue=datasync.amazonaws.com \
--start-time "2026-06-01T00:00:00Z" \
--end-time "2026-06-09T23:59:59Z" \
--query 'Events[*].{Time:EventTime,Name:EventName,User:Username}' \
--output table
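For a quick audit summary, you can rerun the lookup with --output json (and no --query) and tally the results. This sketch assumes the JSON shape lookup-events returns: an Events list whose items carry EventName and Username. The sample data in the demo is made up for illustration.

```python
from collections import Counter
from typing import Any, Dict


def summarize(lookup_result: Dict[str, Any]) -> Counter:
    """Count DataSync API calls per (Username, EventName) pair."""
    return Counter(
        (event.get("Username", "unknown"), event.get("EventName", "unknown"))
        for event in lookup_result.get("Events", [])
    )


if __name__ == "__main__":
    sample = {"Events": [
        {"EventName": "StartTaskExecution", "Username": "ops-bot"},
        {"EventName": "StartTaskExecution", "Username": "ops-bot"},
        {"EventName": "CreateAgent", "Username": "alice"},
    ]}
    for (user, name), count in summarize(sample).most_common():
        print(f"{count:3d}  {user:10s}  {name}")
```

On real data, load the saved CLI output with json.load and pass the result to summarize.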
LogLevel=TRANSFER can generate gigabytes of logs per day for large transfers. Set a CloudWatch log retention policy (14 or 30 days) to avoid paying indefinitely for transfer logs you'll never query again.
9. DataSync with Direct Connect and VPN — Private Connectivity
By default, the DataSync agent sends data over the public internet (encrypted with TLS) to the DataSync public service endpoint. For workloads that require private connectivity, whether for security policy compliance or for more consistent latency and throughput, you can route DataSync traffic through AWS Direct Connect or a Site-to-Site VPN, combined with a VPC interface endpoint.
9.1 Architecture: Agent → Direct Connect → VPC Endpoint → DataSync
# Step 1: Create a VPC interface endpoint for DataSync
aws ec2 create-vpc-endpoint \
--vpc-id vpc-0abc12345 \
--service-name com.amazonaws.us-east-1.datasync \
--vpc-endpoint-type Interface \
--subnet-ids subnet-private-az1 subnet-private-az2 \
--security-group-ids sg-datasync-endpoint \
--private-dns-enabled \
--tag-specifications 'ResourceType=vpc-endpoint,Tags=[{Key=Name,Value=datasync-vpce}]'
# Note the endpoint ID and DNS names
aws ec2 describe-vpc-endpoints \
--filters Name=service-name,Values=com.amazonaws.us-east-1.datasync \
--query 'VpcEndpoints[*].{Id:VpcEndpointId,DNS:DnsEntries[0].DnsName}'
9.2 Activate Agent via VPC Endpoint
# When activating an agent that will use private connectivity, specify the
# VPC endpoint plus the subnet and security group DataSync uses for the task's
# network interfaces (one ARN each)
aws datasync create-agent \
--activation-key "12345-ABCDE-67890-FGHIJ-KLMNO" \
--agent-name "private-agent-via-dx" \
--vpc-endpoint-id vpce-0abc12345def67890 \
--subnet-arns arn:aws:ec2:us-east-1:123456789012:subnet/subnet-private-az1 \
--security-group-arns arn:aws:ec2:us-east-1:123456789012:security-group/sg-datasync-agent
9.3 Required Security Group Rules
| Direction | Port | Protocol | Source/Dest | Purpose |
|---|---|---|---|---|
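| Outbound | 443 | TCP | VPC Endpoint SG | Agent activation and data transfer |
| Outbound | 1024–1064 | TCP | VPC Endpoint SG | Agent control traffic when activated against a VPC endpoint |
| Outbound | 443 | TCP | s3.amazonaws.com | S3 destination |
| Outbound | 2049 | TCP/UDP | NFS server IP | NFS source mount |
| Outbound | 445 | TCP | SMB server IP | SMB source mount |
| Inbound | 80 | TCP | Admin workstation | Activation key retrieval (needed only until the agent is activated) |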
9.4 Terraform — VPC Endpoint for DataSync
resource "aws_vpc_endpoint" "datasync" {
vpc_id = aws_vpc.main.id
service_name = "com.amazonaws.${var.region}.datasync"
vpc_endpoint_type = "Interface"
subnet_ids = [aws_subnet.private_az1.id, aws_subnet.private_az2.id]
security_group_ids = [aws_security_group.datasync_endpoint.id]
private_dns_enabled = true
tags = {
Name = "datasync-vpce"
}
}
10. Cost Optimization — Pricing, Compression, Scheduling, and Comparisons
DataSync pricing is usage-based: you pay per GB of data copied, $0.0125 per GB in us-east-1 for Basic mode tasks (as of 2026; rates vary by Region and task mode). Basic mode has no per-task, per-agent, or per-execution fees. However, the EC2 instance running your agent (if you use one), CloudWatch Logs ingestion, S3 requests, and any inter-Region or outbound data transfer are billed separately. Here is how to minimize total cost.
10.1 Pricing Breakdown
| Component | Cost | Notes |
|---|---|---|
| DataSync data copied | $0.0125/GB | Basic mode in us-east-1; same rate regardless of protocol or direction, but varies by Region |
| EC2 agent (m5.2xlarge) | ~$0.38/hr | Only when running; stop agent EC2 between transfers |
| CloudWatch Logs ingestion | $0.50/GB | Use LogLevel=BASIC for high-volume tasks |
| S3 PUT requests | $0.005 per 1,000 requests (S3 Standard) | Billed per request DataSync makes; higher for classes such as STANDARD_IA, and datasets of many small files drive request costs up |
| Direct Connect data transfer out | ~$0.02/GB (varies by location) | Data into AWS is free over both the internet and Direct Connect; DX beats internet egress (~$0.09/GB) only for data leaving AWS, and port-hour charges apply |
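To combine these line items for a specific job, a back-of-the-envelope estimator helps. The sketch below uses the table's illustrative us-east-1 rates as defaults; they are not authoritative and vary by Region, so substitute current prices. It leaves out S3 storage and request charges, which depend on object count and storage class.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Rates:
    """Illustrative us-east-1 prices from the table above (USD); check current pricing."""
    datasync_per_gb: float = 0.0125   # DataSync, per GB copied
    ec2_agent_per_hour: float = 0.38  # m5.2xlarge on-demand, approximate
    cw_logs_per_gb: float = 0.50      # CloudWatch Logs ingestion


def estimate(
    gb_copied: float, agent_hours: float = 0.0, log_gb: float = 0.0, rates: Optional[Rates] = None
) -> Dict[str, float]:
    """Rough per-job AWS cost by line item (excludes S3 storage and requests)."""
    rates = rates or Rates()
    items = {
        "datasync": gb_copied * rates.datasync_per_gb,
        "ec2_agent": agent_hours * rates.ec2_agent_per_hour,
        "cloudwatch_logs": log_gb * rates.cw_logs_per_gb,
    }
    items["total"] = sum(items.values())
    return items


if __name__ == "__main__":
    for item, usd in estimate(10_000, agent_hours=40, log_gb=2).items():
        print(f"{item:16s} ${usd:,.2f}")
```

For example, copying 10,000 GB with an EC2 agent running 40 hours and logging off comes to about $140: $125 for DataSync and $15.20 for the agent.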
10.2 Schedule Off-Peak Transfers
# Run nightly at 01:00 UTC to avoid business-hours bandwidth contention
aws datasync update-task \
--task-arn arn:aws:datasync:us-east-1:123456789012:task/task-abc12345 \
--schedule ScheduleExpression="cron(0 1 * * ? *)"
# Weekend full resync at full bandwidth: start-task-execution runs immediately,
# so trigger it yourself on Saturday (a task holds only one schedule)
aws datasync start-task-execution \
--task-arn arn:aws:datasync:us-east-1:123456789012:task/task-abc12345 \
--override-options BytesPerSecond=-1,TransferMode=ALL,VerifyMode=POINT_IN_TIME_CONSISTENT
10.3 DataSync vs S3 Transfer Acceleration
# S3 Transfer Acceleration: $0.04-$0.08/GB on top of standard data transfer
# pricing (uploads into S3 are otherwise free)
# DataSync: $0.0125/GB (Basic mode, us-east-1)
# For bulk transfers > 1 TB, DataSync is usually the cheaper option
# Compare: 10 TB (10,000 GB) upload cost, excluding S3 storage (the same either way)
# S3 Transfer Acceleration: 10,000 GB × $0.04 = $400
# DataSync: 10,000 GB × $0.0125 = $125 (+ ~$15 if an EC2 agent runs 40 hrs) = ~$140
# Savings: ~$260, or about 65% cheaper with DataSync for large transfers
10.4 Stop Agent EC2 Between Transfers
# Use EventBridge to start agent before task and stop after
# Start agent 15 minutes before scheduled transfer
aws events put-rule \
--name "start-datasync-agent" \
--schedule-expression "cron(45 0 * * ? *)" \
--state ENABLED
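# Lambda target for the rule (attach with aws events put-targets); pass constant input
# {"action": "start"} here and {"action": "stop"} from a second rule after the transfer
import os
import boto3
ec2 = boto3.client("ec2")
def lambda_handler(event, context):
    ids = [os.environ["AGENT_INSTANCE_ID"]]  # hypothetical env var: the agent's instance ID
    action = event.get("action", "start")
    (ec2.stop_instances if action == "stop" else ec2.start_instances)(InstanceIds=ids)
    return {"action": action, "instances": ids}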
10.5 Avoid Re-Transfer with Correct TransferMode
# Use CHANGED mode (the default) for incremental syncs; it transfers only new or modified files
# Use ALL mode only for a deliberate full resync (e.g., after destination corruption)
# A task left on ALL re-copies, and re-bills, the entire dataset on every run
aws datasync update-task \
--task-arn arn:aws:datasync:us-east-1:123456789012:task/task-abc12345 \
--options TransferMode=CHANGED,VerifyMode=ONLY_FILES_TRANSFERRED
- Use TransferMode=CHANGED for all recurring sync tasks
- Set a BytesPerSecond throttle during business hours; lift it for night runs
- Stop agent EC2 instances when not transferring
- Use Direct Connect for large recurring transfers that need private, predictable bandwidth (inbound data is free either way; DX lowers the rate only for data leaving AWS)
- Set CloudWatch log retention to 14–30 days
- Use LogLevel=BASIC for high-volume tasks; use TRANSFER only for debugging
- Target the STANDARD_IA or INTELLIGENT_TIERING S3 storage class for non-hot data