AWS DataSync: Automated Data Transfer Between On-Premises and AWS

Published: June 9, 2026 • 18 min read

Migrating terabytes of file system data from an on-premises NAS to Amazon S3 using rsync can take weeks and consumes a significant portion of your WAN bandwidth. AWS DataSync changes that equation by delivering transfer speeds up to 10x faster than open-source tools, built-in encryption, automatic retry, checksum verification, and a scheduling engine — all managed from a single console or CLI. This guide covers every component of DataSync from agent deployment to cost optimization, with production-ready CLI commands and Terraform HCL throughout.

1. DataSync vs Storage Gateway vs Transfer Family vs S3 sync — Decision Table

AWS offers several managed services for moving data into and out of the cloud, and choosing the wrong one leads to unnecessary cost, complexity, or performance bottlenecks. Understanding the distinct use case for each service is the most important design decision before you write a single line of infrastructure code.

Service | Primary Use Case | Protocol | Agent Required | Ongoing vs One-Shot | Best Fit When...
AWS DataSync | Bulk file/object migration & scheduled sync | NFS, SMB, HDFS, S3-compatible, EFS, FSx | Yes (on-prem); No (AWS-to-AWS) | Both | Speed, automation, and verification matter
AWS Storage Gateway | Hybrid storage: on-prem apps access S3/EFS via NFS/SMB | NFS, SMB, iSCSI | Yes (VM or hardware) | Ongoing | On-prem apps need cloud storage without code changes
AWS Transfer Family | Managed SFTP/FTPS/FTP endpoint in front of S3 or EFS | SFTP, FTPS, FTP, AS2 | No | Ongoing | External partners need SFTP access to S3
aws s3 sync (CLI) | Ad-hoc S3-to-S3 or local-to-S3 sync | HTTPS (S3 API) | No | One-shot / scripted | Small datasets, simple scripts, no scheduling needed
AWS Snowball Edge | Offline bulk migration where WAN is unavailable or too slow | Physical device | No | One-shot | Data > 10 TB and WAN cost/time is prohibitive

DataSync wins when you need to move structured file system data (NFS shares, SMB volumes, HDFS) at high speed with integrity verification, and when you need that process to run repeatedly on a schedule — for example nightly incremental syncs from an on-premises data warehouse landing zone to S3 for downstream EMR/Glue processing. If all you need is to expose a cloud bucket to legacy apps over NFS, Storage Gateway is the correct choice. If external trading partners need SFTP, use Transfer Family.

DataSync vs rsync: AWS describes DataSync as up to 10x faster than open-source tools, a difference it attributes to a multi-threaded parallel transfer engine, connection pooling, and automatic flow control. Any benchmark is still bounded by the link: 10 Gbps carries at most about 4.5 TB per hour, so 100 TB needs roughly 22 hours at full line rate, and an rsync job that cannot fill the pipe takes correspondingly longer.
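
As a sanity check on any quoted transfer time, the line-rate floor for a dataset can be computed directly. The sketch below is plain arithmetic, not a DataSync API; the function name and the `efficiency` parameter (the fraction of line rate you assume you will actually sustain) are illustrative assumptions.

```python
def min_transfer_hours(dataset_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Lower bound on transfer time in hours.

    dataset_tb uses decimal terabytes (1 TB = 10**12 bytes).
    efficiency is the assumed fraction of line rate achieved, in (0, 1].
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    total_bits = dataset_tb * 1e12 * 8
    seconds = total_bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    print(f"{min_transfer_hours(100, 10):.1f} h at full line rate")
    print(f"{min_transfer_hours(100, 10, 0.8):.1f} h at 80% utilisation")
```

For 100 TB over 10 Gbps this gives about 22.2 hours at full line rate and about 27.8 hours at 80% utilisation, which is a useful floor when planning a cutover window.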

2. How DataSync Works — Architecture Deep Dive

DataSync is built around four core concepts: agents, locations, tasks, and task executions. Understanding how these relate to each other is essential before you start configuring anything in the console or CLI.

  • Agent — a VM you deploy in your on-premises environment, or an EC2 instance when the source is self-managed storage reachable from a VPC. The agent mounts source file systems, reads data, and sends it to AWS over TLS 1.2. For transfers between AWS storage services no agent is required; DataSync operates entirely within the AWS backbone.
  • Location — a pointer to a data store: an NFS export, an SMB share, an HDFS cluster, an S3 bucket, an EFS file system, or an FSx volume. Locations are reusable — one location can be the source for many tasks.
  • Task — a pairing of a source location and a destination location, plus configuration: include/exclude filters, verify mode, bandwidth throttle, POSIX permission handling, and schedule. A task defines what to transfer and how.
  • Task Execution — a single run of a task. Each execution moves through the statuses LAUNCHING → PREPARING (directory listing) → TRANSFERRING → VERIFYING and ends in SUCCESS or ERROR. You can follow the current status with aws datasync describe-task-execution.
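
The execution lifecycle above can be followed with a small polling loop. This is a minimal sketch: `get_status` is a hypothetical callable that you would back with a real status lookup, and the demo uses a simulated feed so the example runs without AWS access.

```python
import time
from typing import Callable, List

# Statuses as listed above; SUCCESS and ERROR end the execution.
TERMINAL_STATUSES = {"SUCCESS", "ERROR"}


def wait_for_execution(
    get_status: Callable[[], str],
    poll_seconds: float = 30.0,
    max_polls: int = 1000,
) -> List[str]:
    """Poll get_status() until a terminal status is seen.

    Returns the distinct statuses observed, in order, with
    consecutive repeats collapsed.
    """
    seen: List[str] = []
    for _ in range(max_polls):
        status = get_status()
        if not seen or seen[-1] != status:
            seen.append(status)
        if status in TERMINAL_STATUSES:
            return seen
        time.sleep(poll_seconds)
    last = seen[-1] if seen else "none"
    raise TimeoutError(f"no terminal status after {max_polls} polls; last was {last}")


if __name__ == "__main__":
    # Simulated status feed standing in for a real API call.
    feed = iter(["LAUNCHING", "PREPARING", "PREPARING", "TRANSFERRING", "VERIFYING", "SUCCESS"])
    print(wait_for_execution(lambda: next(feed), poll_seconds=0))
```

Keeping the status source injectable makes the loop easy to test and lets the same code wrap whichever client you use in production.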

During the PREPARING phase, DataSync builds a tree of all source files and compares modification times, sizes, and checksums against the destination. Only files that differ are transferred. This incremental approach means subsequent executions of a nightly sync job transfer only the delta, not the full dataset — dramatically reducing bandwidth consumption and transfer time after the initial load.
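
The idea behind an incremental pass can be sketched with two metadata listings. This is a simplified illustration, not DataSync's actual comparison logic: it looks only at size and modification time, and the `FileMeta` type and function names are hypothetical.

```python
from typing import Dict, List, NamedTuple


class FileMeta(NamedTuple):
    size: int   # bytes
    mtime: int  # modification time, seconds since epoch


def files_to_transfer(source: Dict[str, FileMeta], dest: Dict[str, FileMeta]) -> List[str]:
    """Paths that are missing at the destination or whose metadata differs."""
    return sorted(path for path, meta in source.items() if dest.get(path) != meta)


def files_to_delete(source: Dict[str, FileMeta], dest: Dict[str, FileMeta]) -> List[str]:
    """Paths present at the destination but no longer at the source."""
    return sorted(path for path in dest if path not in source)


if __name__ == "__main__":
    src = {"a.csv": FileMeta(10, 100), "b.csv": FileMeta(20, 200), "c.csv": FileMeta(5, 50)}
    dst = {"a.csv": FileMeta(10, 100), "b.csv": FileMeta(20, 150), "old.csv": FileMeta(1, 1)}
    print("transfer:", files_to_transfer(src, dst))
    print("delete:  ", files_to_delete(src, dst))
```

Only the changed and new paths are selected, which is why a nightly run after the initial load moves a small fraction of the dataset.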

Internally, DataSync parallelizes across multiple TCP connections per agent and can use multiple agents for a single task (multi-agent mode) to saturate even a 10 Gbps Direct Connect link. The service handles congestion control, error recovery, and automatic retry transparently, so you never need to write retry logic in a shell script.

Bandwidth throttling is configurable per task execution. You can cap DataSync to 100 Mbps during business hours and allow full line rate overnight. This is critical when the agent shares the same WAN link as production traffic.
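
The throttle is expressed in bytes per second, while link capacity is usually quoted in megabits per second, so a quick conversion avoids an 8x mistake. The helper names below are illustrative; the only DataSync-specific fact assumed is that the BytesPerSecond option takes a byte count.

```python
def mbps_to_bytes_per_second(mbps: float) -> int:
    """Convert megabits per second (10**6 bits) to bytes per second."""
    return int(mbps * 1_000_000 / 8)


def bytes_per_second_to_mbps(bytes_per_second: int) -> float:
    """Convert bytes per second back to megabits per second."""
    return bytes_per_second * 8 / 1_000_000


if __name__ == "__main__":
    print(mbps_to_bytes_per_second(100))        # cap for a 100 Mbps daytime limit
    print(bytes_per_second_to_mbps(104857600))  # what 100 MiB/s means on the wire
```

A 100 Mbps cap corresponds to 12,500,000 bytes per second. The value 104857600 used later in this guide is 100 MiB/s, which is about 839 Mbps, so pick the number that matches the limit you actually intend.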

3. Agent Deployment — On-Premises VM and EC2

The DataSync agent is distributed as a virtual machine image for VMware ESXi, Microsoft Hyper-V, and Linux KVM. When the source is self-managed storage inside AWS (for example an NFS server on EC2), you can instead launch the agent as an EC2 instance from the official AMI; transfers between AWS storage services such as S3, EFS, and FSx need no agent at all. The agent requires outbound HTTPS (port 443) to the DataSync service endpoints in your target region, plus access to the source file system protocols (NFS port 2049, SMB port 445, etc.).

3.1 Deploy Agent on VMware ESXi

# 1. Download the DataSync agent OVA from the AWS console:
#    Services → DataSync → Agents → Create agent → Download OVA

# 2. Deploy via VMware OVF Tool
ovftool \
  --name=datasync-agent-prod \
  --datastore=SSD-Datastore \
  --network="VM Network" \
  --diskMode=thin \
  aws-datasync-agent.ova \
  vi://administrator@vcenter.example.com/Datacenter/host/Cluster

# 3. Power on the VM and note the IP address from the VMware console
# The agent exposes a local activation UI on port 80

# 4. Activate the agent via CLI (replace ACTIVATION_KEY with the 5-part key
#    displayed on the agent's local web UI, e.g., 12345-ABCDE-67890-FGHIJ-KLMNO)
aws datasync create-agent \
  --activation-key "12345-ABCDE-67890-FGHIJ-KLMNO" \
  --agent-name "prod-onprem-agent-01" \
  --tags Key=Environment,Value=Production \
         Key=Location,Value=MysoreDatacenter \
  --region us-east-1

3.2 Deploy Agent on EC2 (AWS-to-AWS or co-located transfers)

# Launch the DataSync agent AMI in your VPC
# Look up the current agent AMI ID for your region and keep it in a variable:
AMI_ID=$(aws ssm get-parameter \
  --name "/aws/service/datasync/ami" \
  --region us-east-1 \
  --query "Parameter.Value" \
  --output text)

# Launch EC2 agent instance (m5.2xlarge recommended for high throughput)
aws ec2 run-instances \
  --image-id "$AMI_ID" \
  --instance-type m5.2xlarge \
  --subnet-id subnet-0abc12345 \
  --security-group-ids sg-0abc12345 \
  --iam-instance-profile Name=DataSyncAgentRole \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=datasync-ec2-agent}]' \
  --count 1

# Activate as above using the IP or DNS of the EC2 instance
aws datasync create-agent \
  --activation-key "AAAAA-BBBBB-CCCCC-DDDDD-EEEEE" \
  --agent-name "ec2-agent-us-east-1" \
  --vpc-endpoint-id vpce-0abc12345def67890 \
  --subnet-arns arn:aws:ec2:us-east-1:123456789012:subnet/subnet-0abc12345 \
  --security-group-arns arn:aws:ec2:us-east-1:123456789012:security-group/sg-0abc12345 \
  --region us-east-1

Agent sizing guide: Per-agent throughput depends on source storage, file sizes, and the network path, so measure one agent before adding more. AWS states that a single DataSync task can fully utilize a 10 Gbps link under favorable conditions; if one agent cannot fill your link, add agents to the location or split the dataset across several tasks. Allocate at least 4 vCPUs and 32 GB RAM per agent for maximum performance.

3.3 Terraform — Agent Resource

resource "aws_datasync_agent" "onprem_agent" {
  name           = "prod-onprem-agent-01"
  activation_key = var.datasync_activation_key   # from agent local UI

  tags = {
    Environment = "Production"
    Location    = "MysoreDatacenter"
  }
}

4. Source Locations — NFS, SMB, HDFS, Object Storage

A DataSync location is created once and reused across multiple tasks. Source locations point DataSync at the data you want to transfer. The agent must have network access to the source system using the appropriate protocol and credentials.

4.1 NFS Source Location

aws datasync create-location-nfs \
  --server-hostname "nas01.corp.example.com" \
  --subdirectory "/exports/analytics-data" \
  --on-prem-config AgentArns=arn:aws:datasync:us-east-1:123456789012:agent/agent-abc123 \
  --mount-options Version=NFS4_1 \
  --tags Key=Source,Value=NAS01

4.2 SMB Source Location

aws datasync create-location-smb \
  --server-hostname "fileserver01.corp.example.com" \
  --subdirectory "/DataShare/Projects" \
  --user "datasync-svc" \
  --password "$(aws secretsmanager get-secret-value --secret-id datasync/smb-password --query SecretString --output text)" \
  --domain "CORP" \
  --agent-arns arn:aws:datasync:us-east-1:123456789012:agent/agent-abc123 \
  --mount-options Version=SMB3

4.3 HDFS Source Location

aws datasync create-location-hdfs \
  --name-nodes "[{\"Hostname\":\"namenode01.corp.example.com\",\"Port\":8020}]" \
  --subdirectory "/user/hive/warehouse/transactions" \
  --replication-factor 3 \
  --authentication-type KERBEROS \
  --kerberos-principal "datasync/datasync-agent@CORP.EXAMPLE.COM" \
  --kerberos-keytab fileb://datasync.keytab \
  --kerberos-krb5-conf fileb:///etc/krb5.conf \
  --agent-arns arn:aws:datasync:us-east-1:123456789012:agent/agent-abc123

4.4 Terraform — NFS Location

resource "aws_datasync_location_nfs" "nas_analytics" {
  server_hostname = "nas01.corp.example.com"
  subdirectory    = "/exports/analytics-data"

  on_prem_config {
    agent_arns = [aws_datasync_agent.onprem_agent.arn]
  }

  mount_options {
    version = "NFS4_1"
  }

  tags = {
    Source = "NAS01"
  }
}
NFS mount options: Always prefer NFS4_1 when your server supports it — it provides better parallelism via session trunking. Fall back to NFS4_0 or AUTOMATIC only if your NAS firmware requires it.

5. Destination Locations — S3, EFS, FSx for Windows/Lustre/OpenZFS

DataSync supports several AWS destination storage services, including S3, EFS, and the FSx family covered here. Each has different performance characteristics, use cases, and IAM requirements. The destination location IAM role must grant DataSync permission to write objects or files into the target storage.

5.1 S3 Destination

# Create IAM role for DataSync S3 access
aws iam create-role \
  --role-name DataSyncS3Role \
  --assume-role-policy-document '{
    "Version":"2012-10-17",
    "Statement":[{
      "Effect":"Allow",
      "Principal":{"Service":"datasync.amazonaws.com"},
      "Action":"sts:AssumeRole"
    }]
  }'

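# AmazonS3FullAccess keeps the example short; in production, attach a policy
# scoped to the destination bucket instead.
aws iam attach-role-policy \
  --role-name DataSyncS3Role \
  --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess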

# Create S3 destination location
aws datasync create-location-s3 \
  --s3-bucket-arn arn:aws:s3:::my-analytics-landing-bucket \
  --subdirectory /incoming/nas-data \
  --s3-config BucketAccessRoleArn=arn:aws:iam::123456789012:role/DataSyncS3Role \
  --s3-storage-class STANDARD_IA \
  --tags Key=Destination,Value=AnalyticsS3

5.2 EFS Destination

aws datasync create-location-efs \
  --efs-filesystem-arn arn:aws:elasticfilesystem:us-east-1:123456789012:file-system/fs-abc12345 \
  --ec2-config '{
    "SubnetArn":"arn:aws:ec2:us-east-1:123456789012:subnet/subnet-0abc12345",
    "SecurityGroupArns":["arn:aws:ec2:us-east-1:123456789012:security-group/sg-0abc12345"]
  }' \
  --subdirectory /shared/projects \
  --in-transit-encryption TLS1_2

5.3 FSx for Windows File Server Destination

aws datasync create-location-fsx-windows \
  --fsx-filesystem-arn arn:aws:fsx:us-east-1:123456789012:file-system/fs-0abc12345 \
  --user "fsx-admin" \
  --password "$(aws secretsmanager get-secret-value --secret-id datasync/fsx-password --query SecretString --output text)" \
  --domain "corp.example.com" \
  --security-group-arns arn:aws:ec2:us-east-1:123456789012:security-group/sg-0abc12345 \
  --subdirectory /DataShare

5.4 Terraform — S3 Destination

resource "aws_datasync_location_s3" "analytics_landing" {
  s3_bucket_arn    = aws_s3_bucket.analytics_landing.arn
  subdirectory     = "/incoming/nas-data"
  s3_storage_class = "STANDARD_IA"

  s3_config {
    bucket_access_role_arn = aws_iam_role.datasync_s3.arn
  }

  tags = {
    Destination = "AnalyticsS3"
  }
}

Storage class selection at destination: If you are landing archive data that is rarely read but must be available immediately, use GLACIER_INSTANT_RETRIEVAL. STANDARD_IA suits data that is read infrequently after landing; because it carries per-GB retrieval charges and a 30-day minimum storage duration, data that pipelines re-read often is usually cheaper in STANDARD or INTELLIGENT_TIERING. DataSync sets the storage class per object at write time, so you don't need a lifecycle rule to handle it.

6. Task Configuration — Filters, Verify Mode, POSIX Permissions, Scheduling

The DataSync task is where the real power lives. Beyond simply pointing source at destination, you can filter which files are transferred, control how checksums are verified, preserve POSIX metadata, throttle bandwidth, and set a recurring schedule — all without writing a single cron job or shell script.

6.1 Create a Task with Full Configuration

aws datasync create-task \
  --name "NAS-to-S3-NightlySync" \
  --source-location-arn arn:aws:datasync:us-east-1:123456789012:location/loc-abc123 \
  --destination-location-arn arn:aws:datasync:us-east-1:123456789012:location/loc-def456 \
  --cloud-watch-log-group-arn arn:aws:logs:us-east-1:123456789012:log-group:/datasync/NAS-to-S3 \
  --options '{
    "VerifyMode": "ONLY_FILES_TRANSFERRED",
    "Atime": "BEST_EFFORT",
    "Mtime": "PRESERVE",
    "Uid": "INT_VALUE",
    "Gid": "INT_VALUE",
    "PreserveDeletedFiles": "REMOVE",
    "PreserveDevices": "NONE",
    "PosixPermissions": "PRESERVE",
    "BytesPerSecond": 104857600,
    "TaskQueueing": "ENABLED",
    "LogLevel": "TRANSFER",
    "TransferMode": "CHANGED",
    "SecurityDescriptorCopyFlags": "NONE",
    "ObjectTags": "PRESERVE"
  }' \
  --excludes '[
    {"FilterType":"SIMPLE_PATTERN","Value":"*.tmp|*.log|.Trash-*"}
  ]' \
  --includes '[
    {"FilterType":"SIMPLE_PATTERN","Value":"/analytics|/reports"}
  ]' \
  --schedule ScheduleExpression="cron(0 1 * * ? *)"
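
DataSync filter values can hold several patterns separated by a pipe character, and the API reference describes the include and exclude lists as holding at most one rule each. The helper below is a sketch for building that single rule from a list of patterns; the function name is hypothetical and the output is only the JSON you would pass to --excludes or --includes.

```python
import json
from typing import Dict, Iterable, List


def filter_rules(patterns: Iterable[str]) -> List[Dict[str, str]]:
    """Join patterns into one SIMPLE_PATTERN rule, pipe-delimited.

    Returns an empty list when no patterns are given, so the result
    can be serialised directly for the CLI.
    """
    cleaned = [p.strip() for p in patterns if p.strip()]
    if not cleaned:
        return []
    for pattern in cleaned:
        if "|" in pattern:
            raise ValueError(f"pattern must not contain '|': {pattern!r}")
    return [{"FilterType": "SIMPLE_PATTERN", "Value": "|".join(cleaned)}]


if __name__ == "__main__":
    print(json.dumps(filter_rules(["*.tmp", "*.log", ".Trash-*"])))
    print(json.dumps(filter_rules(["/analytics", "/reports"])))
```

Generating the value from a list keeps the patterns reviewable in version control while still producing the single pipe-delimited string.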

6.2 Key Option Explanations

Option | Recommended Value | Notes
VerifyMode | ONLY_FILES_TRANSFERRED | Verifies checksums only for transferred files, which is fast. Use POINT_IN_TIME_CONSISTENT for full audits.
TransferMode | CHANGED | Only transfer files that differ. Use ALL to force full resync.
PreserveDeletedFiles | REMOVE | Delete destination files that were removed from source, giving true sync semantics.
PosixPermissions | PRESERVE | Critical when destination is EFS and POSIX permissions must match the source NFS export.
BytesPerSecond | 104857600 (100 MiB/s) | -1 = unlimited. Always set a daytime cap when sharing WAN with production.
LogLevel | TRANSFER | Logs every transferred file to CloudWatch. Use BASIC for high-volume tasks to control CloudWatch Logs costs.

6.3 Override Options at Execution Time

# Run a task immediately with a bandwidth override (e.g., allow full speed on a weekend)
aws datasync start-task-execution \
  --task-arn arn:aws:datasync:us-east-1:123456789012:task/task-abc12345 \
  --override-options BytesPerSecond=-1,VerifyMode=POINT_IN_TIME_CONSISTENT
Schedule syntax: DataSync uses standard AWS EventBridge cron format. cron(0 1 * * ? *) runs at 01:00 UTC daily. cron(0 2 ? * MON-FRI *) runs at 02:00 UTC on weekdays only. Scheduling is built in — no Lambda or EventBridge rule required.
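
A schedule expression can be checked before it is sent to the API. The sketch below assumes the six-field EventBridge cron layout (minutes, hours, day-of-month, month, day-of-week, year) and the rule that exactly one of the two day fields is `?`. It validates structure only, not the value ranges of each field, and the function name is hypothetical.

```python
import re
from typing import Dict

_CRON_WRAPPER = re.compile(r"^cron\((.+)\)$")
_FIELD_NAMES = ["minutes", "hours", "day_of_month", "month", "day_of_week", "year"]


def parse_schedule(expression: str) -> Dict[str, str]:
    """Split a cron(...) expression into named fields, checking its shape."""
    match = _CRON_WRAPPER.match(expression.strip())
    if not match:
        raise ValueError("expected an expression of the form cron(...)")
    fields = match.group(1).split()
    if len(fields) != len(_FIELD_NAMES):
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    parsed = dict(zip(_FIELD_NAMES, fields))
    # Exactly one of the two day fields must be '?'.
    if (parsed["day_of_month"] == "?") == (parsed["day_of_week"] == "?"):
        raise ValueError("exactly one of day-of-month and day-of-week must be '?'")
    return parsed


if __name__ == "__main__":
    print(parse_schedule("cron(0 1 * * ? *)"))
    print(parse_schedule("cron(0 2 ? * MON-FRI *)"))
```

Running this in CI against the schedules in your Terraform or scripts catches the common mistake of writing a five-field Unix cron line.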

7. AWS-to-AWS Transfers — No Agent Required

When both source and destination are AWS storage services, DataSync operates entirely within the AWS backbone: no agent VM is needed, and data never traverses the public internet. This makes DataSync a strong option for cross-region S3 copies that require filtering, verification, or non-S3 destinations, and for EFS-to-S3 or FSx-to-S3 data lake ingestion pipelines.

7.1 S3 Cross-Region Copy with Filtering

# Source: S3 bucket in us-east-1
aws datasync create-location-s3 \
  --s3-bucket-arn arn:aws:s3:::prod-data-us-east-1 \
  --subdirectory /raw \
  --s3-config BucketAccessRoleArn=arn:aws:iam::123456789012:role/DataSyncS3Role \
  --region us-east-1

# Destination: S3 bucket in ap-south-1
aws datasync create-location-s3 \
  --s3-bucket-arn arn:aws:s3:::dr-data-ap-south-1 \
  --subdirectory /raw \
  --s3-config BucketAccessRoleArn=arn:aws:iam::123456789012:role/DataSyncS3Role \
  --region us-east-1   # Task and locations are managed from source region

# Create cross-region replication task
aws datasync create-task \
  --name "S3-CrossRegion-DR-Sync" \
  --source-location-arn arn:aws:datasync:us-east-1:123456789012:location/loc-src123 \
  --destination-location-arn arn:aws:datasync:us-east-1:123456789012:location/loc-dst456 \
  --options TransferMode=CHANGED,VerifyMode=ONLY_FILES_TRANSFERRED \
  --schedule ScheduleExpression="cron(0 */6 * * ? *)"

7.2 EFS to S3 Data Lake Ingestion

# Source: EFS file system
aws datasync create-location-efs \
  --efs-filesystem-arn arn:aws:elasticfilesystem:us-east-1:123456789012:file-system/fs-app01 \
  --ec2-config SubnetArn=arn:aws:ec2:us-east-1:123456789012:subnet/subnet-private,SecurityGroupArns=arn:aws:ec2:us-east-1:123456789012:security-group/sg-datasync \
  --subdirectory /app/exports

# Destination: S3 data lake
aws datasync create-location-s3 \
  --s3-bucket-arn arn:aws:s3:::data-lake-prod \
  --subdirectory /efs-exports/app01 \
  --s3-config BucketAccessRoleArn=arn:aws:iam::123456789012:role/DataSyncS3Role \
  --s3-storage-class INTELLIGENT_TIERING

7.3 Cross-Account Transfer

# In the destination account, add a bucket policy allowing the source account's DataSync role
aws s3api put-bucket-policy \
  --bucket cross-account-destination-bucket \
  --policy '{
    "Version":"2012-10-17",
    "Statement":[{
      "Effect":"Allow",
      "Principal":{"AWS":"arn:aws:iam::SOURCE_ACCOUNT_ID:role/DataSyncS3Role"},
      "Action":["s3:GetBucketLocation","s3:ListBucket","s3:ListBucketMultipartUploads",
                "s3:GetObject","s3:PutObject","s3:DeleteObject",
                "s3:AbortMultipartUpload","s3:ListMultipartUploadParts",
                "s3:GetObjectTagging","s3:PutObjectTagging"],
      "Resource":["arn:aws:s3:::cross-account-destination-bucket",
                  "arn:aws:s3:::cross-account-destination-bucket/*"]
    }]
  }'

AWS-to-AWS advantage: No data transfer charges apply for DataSync transfers between AWS services within the same region. Cross-region transfers incur standard inter-region data transfer fees at the same rates as S3 replication; DataSync adds no markup to those fees, though its own per-GB charge still applies.

8. Monitoring with CloudWatch Metrics, CloudTrail, and SNS Alerts

DataSync publishes rich telemetry to CloudWatch automatically — no additional configuration is required to start seeing metrics. The key metrics fall into two categories: throughput/volume metrics (how much data moved) and execution outcome metrics (did the task succeed or fail).

8.1 Key CloudWatch Metrics

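Metric | Namespace | Description
BytesTransferred | AWS/DataSync | Total bytes moved during a task execution
FilesTransferred | AWS/DataSync | Number of files successfully transferred
BytesVerifiedSource, BytesVerifiedDestination | AWS/DataSync | Bytes verified by checksum at the source and destination in the VERIFYING phase
FilesVerifiedSource, FilesVerifiedDestination | AWS/DataSync | Files verified at the source and destination
FilesDeleted | AWS/DataSync | Files removed from destination (PreserveDeletedFiles=REMOVE)
TaskExecutionResultCode | AWS/DataSync | 0 = SUCCESS, non-zero = ERROR (confirm this metric appears for your task in list-metrics before alarming on it)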

8.2 CloudWatch Alarm on Task Failure

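# Assumes TaskExecutionResultCode is published for this task; confirm with
#   aws cloudwatch list-metrics --namespace AWS/DataSync
# before relying on this alarm.
aws cloudwatch put-metric-alarm \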
  --alarm-name "DataSync-Task-Failure" \
  --alarm-description "Alert when DataSync NAS-to-S3 task fails" \
  --namespace "AWS/DataSync" \
  --metric-name "TaskExecutionResultCode" \
  --dimensions Name=TaskId,Value=task-abc12345 \
  --statistic Maximum \
  --period 300 \
  --evaluation-periods 1 \
  --threshold 1 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --treat-missing-data notBreaching \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:datasync-alerts \
  --ok-actions arn:aws:sns:us-east-1:123456789012:datasync-alerts

8.3 CloudWatch Dashboard Query (Insights)

# Query DataSync transfer logs in CloudWatch Logs Insights
# Log group: /datasync/NAS-to-S3

fields @timestamp, Type, Category, ResourceId, ErrorCode, ErrorDetail
| filter Type = "ERROR"
| sort @timestamp desc
| limit 50

8.4 SNS Topic for Alerts

aws sns create-topic --name datasync-alerts

# Subscribe ops email
aws sns subscribe \
  --topic-arn arn:aws:sns:us-east-1:123456789012:datasync-alerts \
  --protocol email \
  --notification-endpoint ops-team@example.com

# Subscribe a Slack webhook via Lambda (standard pattern)
aws sns subscribe \
  --topic-arn arn:aws:sns:us-east-1:123456789012:datasync-alerts \
  --protocol lambda \
  --notification-endpoint arn:aws:lambda:us-east-1:123456789012:function:datasync-slack-notifier

8.5 CloudTrail — API-Level Auditing

Every DataSync API call — CreateTask, StartTaskExecution, DeleteAgent — is logged to CloudTrail automatically if a trail is active in your account. This is your audit trail for compliance: who created which task, when was the last execution started, which IAM principal activated an agent. Filter CloudTrail events with:

aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=EventSource,AttributeValue=datasync.amazonaws.com \
  --start-time "2026-06-01T00:00:00Z" \
  --end-time "2026-06-09T23:59:59Z" \
  --query 'Events[*].{Time:EventTime,Name:EventName,User:Username}' \
  --output table
Cost tip: DataSync CloudWatch log ingestion at LogLevel=TRANSFER can generate gigabytes of logs per day for large transfers. Set a CloudWatch log retention policy (14 or 30 days) to avoid paying indefinitely for transfer logs you'll never query again.
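
A rough estimate of log ingestion cost helps decide between TRANSFER and BASIC logging. The sketch below uses the $0.50/GB ingestion price from this guide's pricing table; the 300 bytes per log line is an assumed average that you should replace with a figure measured from your own log group.

```python
def daily_log_ingestion_cost(
    files_per_day: int,
    bytes_per_log_line: int = 300,  # assumption: measure your own average
    price_per_gb: float = 0.50,     # CloudWatch Logs ingestion, per this guide
) -> float:
    """Estimated USD per day for one log line per transferred file."""
    gigabytes = files_per_day * bytes_per_log_line / 1e9
    return gigabytes * price_per_gb


if __name__ == "__main__":
    for files in (1_000_000, 10_000_000, 100_000_000):
        print(f"{files:>11,} files/day -> ${daily_log_ingestion_cost(files):.2f}/day")
```

Under these assumptions, ten million files per day produce about 3 GB of logs, or $1.50 per day in ingestion before storage, which is where a retention policy starts to matter.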

9. DataSync with Direct Connect and VPN — Private Connectivity

By default, the DataSync agent sends data over the public internet to the DataSync service endpoint. For workloads that require private connectivity, whether for security policy compliance, consistent latency, or to avoid internet data transfer costs, you can route DataSync traffic through AWS Direct Connect or a Site-to-Site VPN, combined with VPC interface endpoints.

9.1 Architecture: Agent → Direct Connect → VPC Endpoint → DataSync

# Step 1: Create a VPC interface endpoint for DataSync
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0abc12345 \
  --service-name com.amazonaws.us-east-1.datasync \
  --vpc-endpoint-type Interface \
  --subnet-ids subnet-private-az1 subnet-private-az2 \
  --security-group-ids sg-datasync-endpoint \
  --private-dns-enabled \
  --tag-specifications 'ResourceType=vpc-endpoint,Tags=[{Key=Name,Value=datasync-vpce}]'

# Note the endpoint ID and DNS names
aws ec2 describe-vpc-endpoints \
  --filters Name=service-name,Values=com.amazonaws.us-east-1.datasync \
  --query 'VpcEndpoints[*].{Id:VpcEndpointId,DNS:DnsEntries[0].DnsName}'

9.2 Activate Agent via VPC Endpoint

# When activating an agent that will use private connectivity,
# specify the VPC endpoint and the subnets/SGs the agent will use
aws datasync create-agent \
  --activation-key "12345-ABCDE-67890-FGHIJ-KLMNO" \
  --agent-name "private-agent-via-dx" \
  --vpc-endpoint-id vpce-0abc12345def67890 \
  --subnet-arns \
    arn:aws:ec2:us-east-1:123456789012:subnet/subnet-private-az1 \
    arn:aws:ec2:us-east-1:123456789012:subnet/subnet-private-az2 \
  --security-group-arns arn:aws:ec2:us-east-1:123456789012:security-group/sg-datasync-agent

9.3 Required Security Group Rules

Direction | Port | Protocol | Source/Dest | Purpose
Outbound | 443 | TCP | VPC Endpoint SG | DataSync control plane
Outbound | 443 | TCP | s3.amazonaws.com | S3 destination
Outbound | 2049 | TCP/UDP | NFS server IP | NFS source mount
Outbound | 445 | TCP | SMB server IP | SMB source mount
Inbound | 80 | TCP | Admin workstation | Agent activation UI

Direct Connect cost note: AWS does not charge for data transferred into AWS, over either the internet or Direct Connect, so an on-premises-to-AWS migration pays no AWS per-GB data transfer fee on either path. The roughly $0.02/GB Direct Connect rate applies to data leaving AWS toward on-premises, compared with about $0.09/GB for internet egress from AWS. Direct Connect also carries port-hour charges. DataSync itself charges $0.0125/GB regardless of the network path, so for uploads the case for Direct Connect rests on predictable throughput and private routing rather than per-GB savings.

9.4 Terraform — VPC Endpoint for DataSync

resource "aws_vpc_endpoint" "datasync" {
  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.${var.region}.datasync"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = [aws_subnet.private_az1.id, aws_subnet.private_az2.id]
  security_group_ids  = [aws_security_group.datasync_endpoint.id]
  private_dns_enabled = true

  tags = {
    Name = "datasync-vpce"
  }
}

10. Cost Optimization — Pricing, Compression, Scheduling, and Comparisons

DataSync pricing is straightforward: you pay $0.0125 per GB of data transferred (as of 2026). There are no per-task, per-agent, or per-execution fees — only the bytes moved. However, the EC2 instance running your agent (if self-managed) and the CloudWatch Logs ingestion do add to the bill. Here is how to minimize total cost.

10.1 Pricing Breakdown

Component | Cost | Notes
DataSync data transfer | $0.0125/GB | All data moved, regardless of protocol or direction
EC2 agent (m5.2xlarge) | ~$0.38/hr | Only when running; stop agent EC2 between transfers
CloudWatch Logs ingestion | $0.50/GB | Use LogLevel=BASIC for high-volume tasks
S3 PUT requests | $0.005/1000 requests | DataSync uses multipart upload, so large files need fewer requests per GB than S3 sync
Direct Connect transfer | $0.02/GB outbound (AWS to on-prem) | vs about $0.09/GB internet egress from AWS, roughly 4.5x cheaper for data leaving AWS; inbound transfer to AWS is free on both paths

10.2 Schedule Off-Peak Transfers

# Run nightly at 01:00 UTC to avoid business-hours bandwidth contention
aws datasync update-task \
  --task-arn arn:aws:datasync:us-east-1:123456789012:task/task-abc12345 \
  --schedule ScheduleExpression="cron(0 1 * * ? *)"

# Weekend full-resync at full bandwidth (Saturday 00:00 UTC)
aws datasync start-task-execution \
  --task-arn arn:aws:datasync:us-east-1:123456789012:task/task-abc12345 \
  --override-options BytesPerSecond=-1,TransferMode=ALL,VerifyMode=POINT_IN_TIME_CONSISTENT

10.3 DataSync vs S3 Transfer Acceleration

# S3 Transfer Acceleration pricing: $0.04-$0.08/GB extra on top of standard request charges
# DataSync pricing: $0.0125/GB flat
# For bulk transfers > 1 TB, DataSync is usually cheaper and often faster

# Compare: 10 TB transfer cost (S3 storage is billed identically on both paths and is excluded)
# S3 Transfer Acceleration: 10,000 GB × $0.04 = $400
# DataSync: 10,000 GB × $0.0125 = $125 (+ ~$15 EC2 agent for 40 hrs) = ~$140
# Savings: ~$260, or about 65% cheaper with DataSync for large transfers
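
The comparison can be reproduced for your own volumes with a few lines of arithmetic. This sketch uses the per-GB and hourly rates quoted in this guide as defaults; treat them as assumptions to be replaced with current pricing for your region. S3 storage charges apply equally to both paths and are left out, so the totals cover transfer-related fees only.

```python
DATASYNC_PER_GB = 0.0125    # DataSync transfer fee quoted in this guide
AGENT_PER_HOUR = 0.38       # approximate m5.2xlarge rate quoted in this guide
ACCELERATION_PER_GB = 0.04  # low end of the Transfer Acceleration surcharge


def datasync_cost(gigabytes: float, agent_hours: float = 0.0) -> float:
    """DataSync fee plus optional EC2 agent runtime, in USD."""
    return gigabytes * DATASYNC_PER_GB + agent_hours * AGENT_PER_HOUR


def acceleration_cost(gigabytes: float, rate: float = ACCELERATION_PER_GB) -> float:
    """S3 Transfer Acceleration surcharge, in USD."""
    return gigabytes * rate


if __name__ == "__main__":
    gb = 10_000
    ds = datasync_cost(gb, agent_hours=40)
    ta = acceleration_cost(gb)
    print(f"DataSync:              ${ds:,.2f}")
    print(f"Transfer Acceleration: ${ta:,.2f}")
    print(f"Difference:            ${ta - ds:,.2f} ({(ta - ds) / ta:.0%})")
```

For 10 TB with a 40-hour agent run this works out to about $140 against $400. Changing `rate` to 0.08 shows the upper end of the acceleration surcharge.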

10.4 Stop Agent EC2 Between Transfers

# Use EventBridge to start agent before task and stop after
# Start agent 15 minutes before scheduled transfer
aws events put-rule \
  --name "start-datasync-agent" \
  --schedule-expression "cron(45 0 * * ? *)" \
  --state ENABLED

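# Lambda function to start or stop the EC2 agent instance.
# Assumes the EventBridge rule passes {"action": "start"} or {"action": "stop"}
# as its input; the instance ID below is a placeholder.
import boto3

ec2 = boto3.client("ec2")
AGENT_INSTANCE_IDS = ["i-datasync-agent-001"]

def lambda_handler(event, context):
    if event.get("action") == "stop":
        ec2.stop_instances(InstanceIds=AGENT_INSTANCE_IDS)
    else:
        ec2.start_instances(InstanceIds=AGENT_INSTANCE_IDS)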

10.5 Avoid Re-Transfer with Correct TransferMode

# Use CHANGED mode (default) for incremental syncs — only transfers new/modified files
# Use ALL mode only for full resync (e.g., after destination corruption)
# An unintended ALL-mode run re-sends the whole dataset and is a common cause of unexpectedly high bills

aws datasync update-task \
  --task-arn arn:aws:datasync:us-east-1:123456789012:task/task-abc12345 \
  --options TransferMode=CHANGED,VerifyMode=ONLY_FILES_TRANSFERRED
Cost optimization checklist:
  • Use TransferMode=CHANGED for all recurring sync tasks
  • Set BytesPerSecond throttle during business hours; remove at night
  • Stop agent EC2 instances when not transferring
  • Consider Direct Connect for large recurring transfers (> 1 TB/month) where predictable throughput and private routing matter
  • Set CloudWatch log retention to 14–30 days
  • Use LogLevel=BASIC for high-volume tasks; TRANSFER only for debugging
  • Target STANDARD_IA or INTELLIGENT_TIERING S3 storage class for non-hot data
