AWS ECS: Container Orchestration with Fargate and EC2 (2026)
Amazon Elastic Container Service (ECS) is AWS's native container orchestrator. Unlike Kubernetes, ECS is a fully proprietary AWS service — which means less operational complexity but also less portability. For teams that live inside the AWS ecosystem and don't need Kubernetes-specific features, ECS is often the faster path to production. This guide covers everything from writing your first task definition to zero-downtime blue/green deployments.
ECS vs EKS: When to Choose Which
This is one of the most common architectural decisions for AWS-hosted container workloads. Neither is universally better — the right answer depends on your team's expertise and your operational goals.
| Factor | ECS | EKS |
|---|---|---|
| Learning curve | Low — AWS-native concepts | High — full Kubernetes API |
| Operational overhead | Low (Fargate: near zero) | Medium (control plane managed, but add-ons, IRSA, upgrades) |
| Ecosystem / tooling | AWS-only | CNCF ecosystem (Helm, Istio, ArgoCD, etc.) |
| Multi-cloud portability | None | High |
| Control plane cost | Free | $0.10/hour (~$73/month) |
| Scheduling flexibility | Good | Excellent (node selectors, affinity, taints) |
| Best for | AWS-first teams, simple microservices | Large platforms, Kubernetes expertise |
Task Definitions
A task definition is a JSON blueprint that describes one or more containers — their image, CPU/memory allocation, environment variables, port mappings, logging config, and IAM roles.
{
"family": "my-api",
"networkMode": "awsvpc",
"requiresCompatibilities": ["FARGATE"],
"cpu": "512",
"memory": "1024",
"executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
"taskRoleArn": "arn:aws:iam::123456789012:role/myAppTaskRole",
"containerDefinitions": [
{
"name": "api",
"image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-api:latest",
"essential": true,
"portMappings": [
{
"containerPort": 8080,
"protocol": "tcp"
}
],
"environment": [
{"name": "ENV", "value": "production"},
{"name": "PORT", "value": "8080"}
],
"secrets": [
{
"name": "DB_PASSWORD",
"valueFrom": "arn:aws:secretsmanager:us-east-1:123456789012:secret:prod/db-password"
}
],
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "/ecs/my-api",
"awslogs-region": "us-east-1",
"awslogs-stream-prefix": "ecs"
}
},
"healthCheck": {
"command": ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"],
"interval": 30,
"timeout": 5,
"retries": 3
}
}
]
}
Register it with the CLI:
aws ecs register-task-definition \
--cli-input-json file://task-def.json
Use the secrets field (backed by Secrets Manager or SSM Parameter Store) rather than putting credentials in environment. The execution role (executionRoleArn) needs secretsmanager:GetSecretValue permission (or ssm:GetParameters for Parameter Store) so that ECS can inject the secrets at startup.
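As a rough illustration of that advice, the sketch below scans a task definition (loaded as a Python dict) for environment entries whose names look like credentials. The function name and the name patterns are assumptions made for this example, not part of any AWS tooling, and a name-based heuristic will miss some cases and flag others wrongly.

```python
import re

# Heuristic patterns for credential-like variable names (an assumption; tune for your codebase).
SUSPICIOUS = re.compile(r"(PASSWORD|SECRET|TOKEN|API_?KEY|PRIVATE_?KEY)", re.IGNORECASE)


def find_plaintext_credentials(task_def: dict) -> list[str]:
    """Return 'container/VAR' for each environment entry whose name looks like a credential."""
    findings = []
    for container in task_def.get("containerDefinitions", []):
        for env in container.get("environment", []):
            if SUSPICIOUS.search(env.get("name", "")):
                findings.append(f"{container.get('name', '?')}/{env['name']}")
    return findings


# Hypothetical task definition fragment used only to exercise the check.
task_def = {
    "containerDefinitions": [
        {
            "name": "api",
            "environment": [
                {"name": "PORT", "value": "8080"},
                {"name": "DB_PASSWORD", "value": "not-a-real-password"},  # belongs in "secrets"
            ],
            "secrets": [
                {"name": "API_TOKEN", "valueFrom": "arn:aws:secretsmanager:..."}
            ],
        }
    ]
}

print(find_plaintext_credentials(task_def))  # ['api/DB_PASSWORD']
```

A check like this could run in CI against task-def.json before register-task-definition is called.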
ECS Services
An ECS Service keeps a specified number of task copies running, integrates with a load balancer, and handles rolling deploys. Think of it as the ECS equivalent of a Kubernetes Deployment + Service.
aws ecs create-service \
--cluster prod-cluster \
--service-name my-api \
--task-definition my-api:3 \
--desired-count 3 \
--launch-type FARGATE \
--network-configuration "awsvpcConfiguration={
subnets=[subnet-abc123,subnet-def456],
securityGroups=[sg-xyz789],
assignPublicIp=DISABLED
}" \
--load-balancers "targetGroupArn=arn:aws:elasticloadbalancing:...,containerName=api,containerPort=8080" \
--deployment-configuration "minimumHealthyPercent=100,maximumPercent=200"
For infrastructure-as-code, use a CloudFormation or CDK definition. See the CDK guide for CDK patterns.
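To make the minimumHealthyPercent and maximumPercent settings in the create-service call concrete, here is a small sketch of the bounds they place on the number of running tasks during a rolling deploy. It assumes the documented rounding (lower bound rounded up, upper bound rounded down); the function name is made up for this example.

```python
def rolling_deploy_bounds(desired_count: int,
                          minimum_healthy_percent: int,
                          maximum_percent: int) -> tuple[int, int]:
    """Return (min_running, max_running) task counts allowed during a rolling deploy."""
    # Lower bound rounds up: ceil(desired * pct / 100) using integer arithmetic.
    min_running = -(-desired_count * minimum_healthy_percent // 100)
    # Upper bound rounds down: floor(desired * pct / 100).
    max_running = desired_count * maximum_percent // 100
    return min_running, max_running


# Values from the create-service example: 3 tasks, 100% / 200%.
print(rolling_deploy_bounds(3, 100, 200))  # (3, 6)
# A tighter budget: old tasks must stop before all new ones can start.
print(rolling_deploy_bounds(3, 50, 150))   # (2, 4)
```

With 100/200, ECS can start a full second set of tasks before stopping any old ones, which is what makes the deploy zero-downtime at the cost of temporarily doubled capacity.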
Fargate vs EC2 Launch Type
The launch type determines who manages the underlying infrastructure for your containers.
| Feature | Fargate | EC2 |
|---|---|---|
| Infrastructure management | AWS manages it | You manage EC2 instances |
| Billing granularity | Per-task (vCPU + GB) | Per EC2 instance |
| Cost at scale | Higher per-unit cost | Lower with reserved instances |
| Startup time | ~30-90 seconds | Faster (instance already running) |
| GPU workloads | Not supported | Supported (p3, g4dn instances) |
| Custom AMI | Not possible | Fully supported |
| DaemonSet equivalent | Not available | Daemon scheduling supported |
Use Fargate for: microservices without GPU needs, variable-load workloads, dev/staging environments. Use EC2 for: GPU workloads, high-throughput services where per-unit Fargate cost is prohibitive, or workloads requiring host-level tuning.
Capacity Providers
Capacity providers decouple where tasks run from the service definition. You define a default capacity provider strategy on the cluster, and services created without an explicit --launch-type or a strategy of their own pick it up.
# Create a capacity provider backed by an Auto Scaling Group
aws ecs create-capacity-provider \
--name my-asg-cp \
--auto-scaling-group-provider "autoScalingGroupArn=arn:aws:autoscaling:...,
managedScaling={status=ENABLED,targetCapacity=80},
managedTerminationProtection=ENABLED"
# Attach to cluster with a strategy: weight Fargate Spot 4:1 over Fargate, with a base of one task on Fargate
aws ecs put-cluster-capacity-providers \
--cluster prod-cluster \
--capacity-providers FARGATE FARGATE_SPOT \
--default-capacity-provider-strategy \
capacityProvider=FARGATE_SPOT,weight=4,base=0 \
capacityProvider=FARGATE,weight=1,base=1
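Once the base is satisfied, the strategy above splits tasks 4:1 by weight, so roughly 80% of the remaining tasks run on Fargate Spot (up to 70% cheaper) and 20% on regular Fargate. The base=1 ensures at least one task always runs on regular Fargate for stability. This is a weighted split rather than a failover: if Spot capacity is unavailable, ECS does not automatically place those tasks on regular Fargate.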
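To see how base and weight interact in the strategy above, the sketch below estimates how a given number of tasks would be divided. The base-first, then weight-proportional split follows the documented behavior; how ECS rounds uneven splits is not modeled exactly, so the largest-remainder rounding here is an assumption, as is the function name.

```python
def distribute_tasks(total: int, strategy: list[dict]) -> dict[str, int]:
    """Estimate task counts per capacity provider for a base/weight strategy."""
    counts = {s["capacityProvider"]: 0 for s in strategy}
    remaining = total

    # 1. Satisfy each provider's base first.
    for s in strategy:
        take = min(s.get("base", 0), remaining)
        counts[s["capacityProvider"]] += take
        remaining -= take

    total_weight = sum(s.get("weight", 0) for s in strategy)
    if remaining == 0 or total_weight == 0:
        return counts

    # 2. Split the rest in proportion to weight (largest-remainder rounding, an assumption).
    assigned = 0
    leftovers = []
    for s in strategy:
        share, rem = divmod(remaining * s.get("weight", 0), total_weight)
        counts[s["capacityProvider"]] += share
        assigned += share
        leftovers.append((rem, s["capacityProvider"]))
    for _, name in sorted(leftovers, reverse=True)[: remaining - assigned]:
        counts[name] += 1
    return counts


strategy = [
    {"capacityProvider": "FARGATE_SPOT", "weight": 4, "base": 0},
    {"capacityProvider": "FARGATE", "weight": 1, "base": 1},
]
print(distribute_tasks(11, strategy))  # {'FARGATE_SPOT': 8, 'FARGATE': 3}
```

With 11 tasks, one goes to Fargate as the base and the other ten split 8:2, so the Spot share of the whole service is a little under 80%.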
Service Discovery with Cloud Map
ECS integrates with AWS Cloud Map for DNS-based service discovery. ECS registers each task with a Cloud Map service as it starts and deregisters it when it stops, so other services can reach it by DNS name instead of hardcoded IPs.
# Create a private DNS namespace
aws servicediscovery create-private-dns-namespace \
--name prod.local \
--vpc vpc-abc123
# Create a service in that namespace
aws servicediscovery create-service \
--name my-api \
--dns-config "NamespaceId=ns-abc123,DnsRecords=[{Type=A,TTL=10}]" \
--health-check-custom-config FailureThreshold=1
Then reference the service discovery service ARN in your ECS service definition. Tasks become reachable at my-api.prod.local from within the VPC. TTL of 10 seconds means DNS updates propagate quickly when tasks restart.
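On the client side, a name such as my-api.prod.local resolves to one A record per healthy task. The sketch below shows one way a caller might resolve the name and pick an address. The helper names are invented for this example, and the resolver is injectable so the demo can run outside the VPC with a fake resolver and made-up IPs.

```python
import random
import socket


def resolve_endpoints(hostname, port, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IPv4 addresses behind a DNS name."""
    infos = resolver(hostname, port, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port).
    return sorted({info[4][0] for info in infos})


def pick_endpoint(hostname, port, resolver=socket.getaddrinfo, rng=random):
    """Pick one task address at random to spread load across tasks."""
    addresses = resolve_endpoints(hostname, port, resolver)
    if not addresses:
        raise LookupError(f"no A records for {hostname}")
    return rng.choice(addresses), port


# Fake resolver standing in for Cloud Map DNS; the IPs are illustrative only.
def fake_resolver(host, port, family, socktype):
    return [
        (family, socktype, 6, "", ("10.0.1.12", port)),
        (family, socktype, 6, "", ("10.0.2.34", port)),
    ]


print(resolve_endpoints("my-api.prod.local", 8080, resolver=fake_resolver))
# ['10.0.1.12', '10.0.2.34']
```

Resolving on each request (or at least no less often than the record TTL) keeps clients from holding on to the addresses of tasks that have been replaced.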
ECS Exec for Debugging
ECS Exec uses AWS Systems Manager Session Manager to open an interactive shell inside a running container — no SSH, no bastion host needed.
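# Enable ECS Exec on the service; only tasks started after this change get it,
# so force a new deployment to replace the running tasks
aws ecs update-service \
--cluster prod-cluster \
--service my-api \
--enable-execute-command \
--force-new-deployment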
# Open a shell in a running task
aws ecs execute-command \
--cluster prod-cluster \
--task arn:aws:ecs:us-east-1:123456789012:task/abc123 \
--container api \
--interactive \
--command "/bin/sh"
The task role needs these permissions:
{
"Effect": "Allow",
"Action": [
"ssmmessages:CreateControlChannel",
"ssmmessages:CreateDataChannel",
"ssmmessages:OpenControlChannel",
"ssmmessages:OpenDataChannel"
],
"Resource": "*"
}
Anyone with the ecs:ExecuteCommand permission can shell into your containers. Restrict it via IAM conditions or enable it only via a break-glass process.
Blue/Green Deployments with CodeDeploy
ECS integrates with CodeDeploy for blue/green deployments: the new version ("green") runs alongside the old version ("blue"), traffic shifts gradually, and rollback is instant if health checks fail.
# appspec.yaml for CodeDeploy ECS blue/green
version: 0.0
Resources:
  - TargetService:
      Type: AWS::ECS::Service
      Properties:
        TaskDefinition: "arn:aws:ecs:us-east-1:123456789012:task-definition/my-api:NEW_REVISION"
        LoadBalancerInfo:
          ContainerName: "api"
          ContainerPort: 8080
        PlatformVersion: "LATEST"
Hooks:
  - BeforeAllowTraffic: "arn:aws:lambda:us-east-1:123456789012:function:PreTrafficHook"
  - AfterAllowTraffic: "arn:aws:lambda:us-east-1:123456789012:function:PostTrafficHook"
Configure the deployment group with a traffic shifting strategy (the ECS service itself must use the CODE_DEPLOY deployment controller):
aws deploy create-deployment-group \
--application-name my-api-app \
--deployment-group-name my-api-dg \
--deployment-config-name CodeDeployDefault.ECSCanary10Percent5Minutes \
--ecs-services clusterName=prod-cluster,serviceName=my-api \
--load-balancer-info "targetGroupPairInfoList=[{
targetGroups=[{name=my-api-tg-blue},{name=my-api-tg-green}],
prodTrafficRoute={listenerArns=[arn:aws:elasticloadbalancing:...]}
}]"
ECSCanary10Percent5Minutes shifts 10% of traffic to green, waits 5 minutes, then shifts the remaining 90% if health checks pass. If the Lambda hooks or health checks fail, CodeDeploy automatically rolls back to blue.
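The predefined ECS deployment configuration names encode their traffic schedule. As a sketch, the function below expands a name into (minute, cumulative percent on green) steps. The parsing is based only on the naming pattern of the predefined configurations (canary, linear, all-at-once); the function itself is hypothetical and not part of CodeDeploy.

```python
import re


def traffic_schedule(config_name: str) -> list[tuple[int, int]]:
    """Expand a predefined ECS deployment config name into (minute, percent_on_green) steps."""
    name = config_name.removeprefix("CodeDeployDefault.")
    if name == "ECSAllAtOnce":
        return [(0, 100)]

    canary = re.fullmatch(r"ECSCanary(\d+)Percent(\d+)Minutes", name)
    if canary:
        percent, minutes = int(canary[1]), int(canary[2])
        # One small shift, a wait, then everything else.
        return [(0, percent), (minutes, 100)]

    linear = re.fullmatch(r"ECSLinear(\d+)PercentEvery(\d+)Minutes", name)
    if linear:
        step, every = int(linear[1]), int(linear[2])
        if step <= 0:
            raise ValueError("step percentage must be positive")
        schedule, percent, minute = [], 0, 0
        while percent < 100:
            percent = min(100, percent + step)
            schedule.append((minute, percent))
            minute += every
        return schedule

    raise ValueError(f"unrecognized deployment config: {config_name}")


print(traffic_schedule("CodeDeployDefault.ECSCanary10Percent5Minutes"))
# [(0, 10), (5, 100)]
```

Comparing the canary output with a linear configuration such as ECSLinear10PercentEvery1Minutes shows the trade-off: linear shifting exposes problems more gradually but takes longer to complete.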
Logging to CloudWatch
The awslogs driver is the simplest way to get container logs into CloudWatch Logs. Use awsfirelens (backed by Fluent Bit) for advanced log routing to multiple destinations.
# Fluent Bit sidecar for multi-destination logging
{
"name": "log-router",
"image": "public.ecr.aws/aws-observability/aws-for-fluent-bit:latest",
"essential": true,
"firelensConfiguration": {
"type": "fluentbit",
"options": {
"enable-ecs-log-metadata": "true"
}
}
},
{
"name": "api",
"image": "...",
"logConfiguration": {
"logDriver": "awsfirelens",
"options": {
"Name": "cloudwatch_logs",
"region": "us-east-1",
"log_group_name": "/ecs/my-api",
"log_stream_prefix": "from-firelens-",
"auto_create_group": "true"
}
}
}
ALB Integration
ECS services register task IPs directly with ALB target groups (IP target type). When a task is replaced during a deploy, ECS deregisters the old task IP and registers the new one — the ALB connection draining period ensures in-flight requests complete.
# Create target group (IP type for Fargate/awsvpc)
aws elbv2 create-target-group \
--name my-api-tg \
--protocol HTTP \
--port 8080 \
--vpc-id vpc-abc123 \
--target-type ip \
--health-check-path /health \
--health-check-interval-seconds 15 \
--healthy-threshold-count 2 \
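--unhealthy-threshold-count 3
# Deregistration delay is a target group attribute, so set it in a separate call
aws elbv2 modify-target-group-attributes \
--target-group-arn arn:aws:elasticloadbalancing:... \
--attributes Key=deregistration_delay.timeout_seconds,Value=30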
Set the deregistration delay (300 seconds by default) to match your application's maximum request duration. For APIs with short requests, 30 seconds is plenty. For file uploads or long-running requests, increase it accordingly.
Frequently Asked Questions
How do I pass secrets to ECS tasks?
Use the secrets field in the task definition, pointing to AWS Secrets Manager ARNs or SSM Parameter Store paths. The ECS agent fetches the secret value at task startup and injects it as an environment variable. The execution role (executionRoleArn) must have secretsmanager:GetSecretValue or ssm:GetParameters permission.
How do ECS tasks communicate with each other?
Tasks in awsvpc network mode each get their own ENI, so tasks in the same VPC can communicate directly by IP if their security groups allow it. For reliable service-to-service communication, use AWS Cloud Map (DNS discovery) or an internal ALB. Avoid hardcoding task IPs since they change with every deployment.
What is the difference between the execution role and the task role?
The execution role is used by the ECS agent to pull images from ECR, fetch secrets, and write logs. The task role is assumed by the application code running inside the container to access AWS services (S3, DynamoDB, etc.). Always use separate roles with least-privilege policies.
Can I run scheduled (cron) tasks on ECS?
Yes. Use an EventBridge (formerly CloudWatch Events) rule with an ECS task as the target. Set a cron or rate expression and the rule runs the task on that schedule. The Scheduled Tasks feature in the ECS console is built on the same mechanism.
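EventBridge cron expressions differ from Unix cron: they have six fields (minutes, hours, day-of-month, month, day-of-week, year), are evaluated in UTC, and require exactly one of day-of-month or day-of-week to be "?". The helper below is a hypothetical convenience for building such an expression and catching that last mistake early. It does not validate the individual field values.

```python
def eventbridge_cron(minute="0", hour="0", day_of_month="*",
                     month="*", day_of_week="?", year="*") -> str:
    """Build an EventBridge schedule expression of the form cron(min hr dom mon dow yr)."""
    # EventBridge rejects expressions that set both day fields (or neither) to '?'.
    if (day_of_month == "?") == (day_of_week == "?"):
        raise ValueError("exactly one of day_of_month / day_of_week must be '?'")
    return f"cron({minute} {hour} {day_of_month} {month} {day_of_week} {year})"


# Every day at 02:30 UTC.
print(eventbridge_cron(minute="30", hour="2"))
# cron(30 2 * * ? *)

# Weekdays at midnight UTC.
print(eventbridge_cron(day_of_month="?", day_of_week="MON-FRI"))
# cron(0 0 ? * MON-FRI *)
```

The resulting string is what goes into the rule's schedule expression; the rule's target then names the cluster and task definition to run.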
How do I auto-scale an ECS service?
Use Application Auto Scaling with the ECS service as the scalable target. Scale on CPU utilization, memory utilization, or custom CloudWatch metrics (e.g., SQS queue depth). Step scaling and target tracking policies are both supported. See the SQS guide for queue-based scaling patterns.
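For queue-based scaling, a common approach is to reason in terms of backlog per task: how many messages one task can have waiting while still meeting a latency goal. The sketch below computes a task count from that idea. The function name, parameters, and example numbers are all assumptions for illustration; in practice you would publish backlog per task as a custom CloudWatch metric and let a target tracking policy act on it rather than computing the count yourself.

```python
import math


def desired_tasks_for_backlog(queue_depth: int,
                              acceptable_latency_s: float,
                              avg_processing_s: float,
                              min_tasks: int = 1,
                              max_tasks: int = 100) -> int:
    """Estimate how many tasks are needed to drain a queue within a latency goal."""
    # Messages one task can have queued and still finish within the latency goal.
    acceptable_backlog_per_task = acceptable_latency_s / avg_processing_s
    needed = math.ceil(queue_depth / acceptable_backlog_per_task)
    # Clamp to the service's configured scaling range.
    return max(min_tasks, min(max_tasks, needed))


# 1500 queued messages, 60 s latency goal, 2 s per message: 30 messages per task.
print(desired_tasks_for_backlog(1500, 60, 2))  # 50
```

Scaling on backlog per task rather than raw queue depth keeps the target meaningful as the number of running tasks changes.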