AWS CodeDeploy: Blue-Green, Canary and Rolling Deployments
Published June 2026 · 18 min read
Shipping code safely to production is the hardest part of software delivery. A bug that slips past staging can take down an entire service in seconds — and rolling it back manually under pressure is painful. AWS CodeDeploy solves this by automating the entire deployment lifecycle with built-in traffic-shifting strategies, lifecycle hooks for health checks, and automatic rollback on alarm breach. Whether you deploy to a fleet of EC2 instances, an ECS service, or a Lambda function, CodeDeploy gives you production-grade deployment semantics without building custom orchestration.
This guide covers every CodeDeploy strategy in depth — in-place, blue/green, canary, and linear — with real appspec.yml files, shell lifecycle hooks, Terraform infrastructure, and CloudWatch rollback configuration. By the end you will be able to design and operate zero-downtime deployments for any AWS compute target.
Table of Contents
- CodeDeploy Concepts — Applications, Deployment Groups, AppSpec
- Deployment Strategies — Comparison Table
- EC2 Deployments — Agent, AppSpec, Lifecycle Hooks
- ECS Blue/Green — ALB Listener Switching with Terraform
- Lambda Canary and Linear — Traffic Hooks and CloudWatch Rollback
- Rollback Configuration — Automatic and Manual
- CodePipeline Integration — Terraform Source→Build→Deploy
- Monitoring — CloudWatch Metrics, SNS, Deployment Events
1. CodeDeploy Concepts — Applications, Deployment Groups, AppSpec
AWS CodeDeploy organises everything under three logical objects: the Application, the Deployment Group, and the Revision. Understanding how they relate is essential before writing a single line of configuration.
Application
An Application is simply a named container that scopes a set of deployments to a compute platform. The platform can be EC2/On-Premises, ECS, or Lambda. You create one Application per deployable unit — typically one per microservice or application stack. The application name appears in every CLI command and CodePipeline action and must be unique within an AWS region.
Deployment Group
A Deployment Group lives inside an Application and defines where and how to deploy. It specifies:
- Target: For EC2, an Auto Scaling Group name or tag filter (e.g., Name=myapp-prod). For ECS, the cluster name and service name. For Lambda, the group itself names no target; the function and alias are given in the AppSpec file.
- Deployment configuration: The traffic-shifting strategy (OneAtATime, HalfAtATime, AllAtOnce, or a custom config).
- Load balancer: Classic ELB, Application Load Balancer (ALB) target group, or Network Load Balancer for health gating.
- Service role: An IAM role that CodeDeploy assumes to interact with EC2, ECS, Lambda, ELB, CloudWatch, and S3.
- Rollback settings: Automatic rollback triggers such as deployment failure or CloudWatch alarm breach.
Revision
A Revision is the combination of your application code and its appspec.yml file, packaged as a ZIP in S3 or referenced as a container image tag or Lambda function version. Every create-deployment call points to a specific revision. CodeDeploy fetches it from S3 (EC2), uses the image URI from the ECS task definition (ECS), or uses the Lambda alias/version pointer (Lambda).
appspec.yml
The appspec.yml (Application Specification file) is CodeDeploy's instruction manifest. Its structure varies by compute platform but always defines:
- The deployment target (EC2 file mappings, ECS task definition, Lambda function version)
- Lifecycle event hooks — shell scripts to run at specific points in the deployment lifecycle
- Permissions (EC2 only) — file ownership and ACL settings
# Minimal EC2 appspec.yml structure
version: 0.0
os: linux
files:
  - source: /                 # copy entire revision root
    destination: /opt/myapp
permissions:
  - object: /opt/myapp
    owner: ec2-user
    group: ec2-user
    mode: "755"
    type:
      - directory
      - file
hooks:
  BeforeInstall:
    - location: scripts/stop_service.sh
      timeout: 60
      runas: root
  AfterInstall:
    - location: scripts/install_dependencies.sh
      timeout: 120
      runas: ec2-user
  ApplicationStart:
    - location: scripts/start_service.sh
      timeout: 60
      runas: root
  ValidateService:
    - location: scripts/health_check.sh
      timeout: 30
      runas: ec2-user
appspec.yml must be at the root of your revision ZIP for EC2 deployments. For ECS and Lambda the file is referenced differently — it lives in S3 and points to the task definition or function version, not to source files.
Deployment Configuration
CodeDeploy ships with several built-in deployment configurations. For EC2/On-Premises: CodeDeployDefault.OneAtATime, CodeDeployDefault.HalfAtATime, CodeDeployDefault.AllAtOnce. For Lambda and ECS: CodeDeployDefault.LambdaCanary10Percent5Minutes, CodeDeployDefault.LambdaLinear10PercentEvery1Minute, CodeDeployDefault.LambdaAllAtOnce, and their ECS equivalents. You can also create custom configurations specifying the exact percentage of instances or traffic weight to shift per interval.
2. Deployment Strategies — In-Place, Blue/Green, Canary, Linear
Choosing the right deployment strategy depends on your tolerance for downtime, your ability to run two environments simultaneously, and how quickly you need to detect regressions in production traffic. Here is a definitive comparison:
| Strategy | Compute | Downtime | Rollback Speed | Cost | Best For |
|---|---|---|---|---|---|
| In-Place (Rolling) | EC2 | Partial (per-batch) | Slow (re-deploy old) | Low | Non-critical services, batch workers |
| Blue/Green (EC2) | EC2 | Zero | Fast (switch ASG) | 2x during deploy | Stateless web services |
| Blue/Green (ECS) | ECS + ALB | Zero | Instant (listener rule) | 2x tasks during shift | Containerised microservices |
| Canary | ECS / Lambda | Zero | Instant (shift back) | Minimal extra | High-risk changes, A/B validation |
| Linear | ECS / Lambda | Zero | Instant (shift back) | Minimal extra | Gradual rollout, SLO-guarded releases |
In-Place (Rolling)
CodeDeploy stops the application on a batch of instances, deploys the new revision, runs lifecycle hooks (including health checks), and only advances to the next batch once the current batch is healthy. If any instance in a batch fails its health check, the deployment stops — but already-deployed instances are not automatically reverted. In-place is the simplest strategy but is not appropriate for services that cannot tolerate reduced capacity during deployment.
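To make the batching concrete, here is a rough Python sketch of how the three built-in EC2 configurations might split a fleet into batches. The function name `plan_batches` and the rule that a batch never drops below one instance are assumptions for illustration; the real scheduler works from minimum-healthy-host rules and may batch differently.

```python
def plan_batches(instance_count: int, config: str) -> list[int]:
    """Return the size of each deployment batch under a simplified model."""
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    if config == "AllAtOnce":
        per_batch = instance_count
    elif config == "HalfAtATime":
        # Half the fleet, rounded down, but never fewer than one instance
        # (the floor of one is an assumption of this sketch).
        per_batch = max(1, instance_count // 2)
    elif config == "OneAtATime":
        per_batch = 1
    else:
        raise ValueError(f"unknown config: {config}")

    batches = []
    remaining = instance_count
    while remaining > 0:
        size = min(per_batch, remaining)
        batches.append(size)
        remaining -= size
    return batches
```

Under this model a nine-instance fleet on HalfAtATime is out of rotation four instances at a time, which is the capacity dip the service has to tolerate.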
Blue/Green
In a blue/green deployment CodeDeploy provisions a brand-new set of resources (green) alongside the existing production set (blue). Once the green environment passes health checks, traffic is shifted in a single cut-over. The blue environment is retained for a configurable termination wait time (default: 1 hour for EC2, 0 for ECS) to allow rollback by a simple traffic re-shift. After the wait period, blue resources are terminated. This strategy completely eliminates deployment downtime and makes rollback trivially fast.
Canary
A canary deployment shifts a small initial percentage of traffic (e.g., 10%) to the new version, waits a defined interval while CloudWatch alarms monitor error rates and latency, then shifts the remaining 90% if no alarms fire. If an alarm fires during the wait interval, CodeDeploy automatically rolls back by shifting all traffic back to the original version. Canary is ideal for high-stakes changes where you want to validate production behaviour on real traffic before committing.
Linear
Linear deployments shift traffic in equal increments on a fixed schedule. For example, Linear10PercentEvery1Minute shifts 10% every minute until 100% is served by the new version — completing in 10 minutes. Each interval is monitored by CloudWatch alarms. Linear is a good middle ground between the speed of canary (two-step) and a fully manual staged rollout.
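The difference between the two traffic-shifting shapes is easy to see as a sequence of weights. The sketch below is only a model of the progression described above; the helper names `canary_weights` and `linear_weights` are hypothetical, and timing between steps is deliberately left out.

```python
def canary_weights(first_percent: int) -> list[int]:
    """Canary: one small step, then everything else in a second step."""
    if not 0 < first_percent < 100:
        raise ValueError("first_percent must be between 1 and 99")
    return [first_percent, 100]


def linear_weights(step_percent: int) -> list[int]:
    """Linear: equal increments until the new version serves 100%."""
    if not 0 < step_percent <= 100:
        raise ValueError("step_percent must be between 1 and 100")
    weights = []
    current = 0
    while current < 100:
        # Cap the last step so the weight never exceeds 100.
        current = min(100, current + step_percent)
        weights.append(current)
    return weights
```

For a 10% canary the new version's weight goes 10 then 100; for a 10% linear rollout it climbs through ten equal steps, each of which is a chance for an alarm to stop the deployment.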
3. EC2 Deployments — Agent, AppSpec, Lifecycle Hooks
Deploying to EC2 requires the CodeDeploy agent running on each instance. The agent polls the CodeDeploy service endpoint, receives deployment instructions, executes lifecycle hooks, and reports results back. Setting this up correctly — including proper IAM instance profile permissions — is the foundation for reliable EC2 deployments.
Installing the CodeDeploy Agent
For Amazon Linux 2 / Amazon Linux 2023:
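#!/bin/bash
# install-codedeploy-agent.sh
# Run as root on Amazon Linux 2 / AL2023
yum update -y
yum install -y ruby wget
cd /home/ec2-user
# Use IMDSv2: fetch a session token first, because instances that require
# IMDSv2 reject token-less metadata requests
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
REGION=$(curl -s -H "X-aws-ec2-metadata-token: ${TOKEN}" \
  http://169.254.169.254/latest/meta-data/placement/region)
wget "https://aws-codedeploy-${REGION}.s3.${REGION}.amazonaws.com/latest/install"
chmod +x ./install
./install auto
systemctl enable codedeploy-agent
systemctl start codedeploy-agent
systemctl status codedeploy-agent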
Add this script to your EC2 launch template or AMI user data so every new instance in your Auto Scaling Group is agent-ready at launch. The instance profile must include AmazonEC2RoleforAWSCodeDeploy (or equivalent inline policy) to allow the agent to pull revisions from S3 and report status.
Complete appspec.yml with Lifecycle Hooks
A production-grade EC2 appspec.yml with all lifecycle events and meaningful shell scripts:
# appspec.yml: production EC2 deployment
version: 0.0
os: linux
files:
  - source: /app
    destination: /opt/myapp
  - source: /conf
    destination: /etc/myapp
permissions:
  - object: /opt/myapp
    owner: myapp
    group: myapp
    mode: "755"
    type: [directory]
  - object: /opt/myapp/bin
    owner: myapp
    group: myapp
    mode: "755"
    type: [file]
hooks:
  BeforeInstall:
    - location: scripts/before_install.sh
      timeout: 120
      runas: root
  AfterInstall:
    - location: scripts/after_install.sh
      timeout: 120
      runas: root
  ApplicationStart:
    - location: scripts/start_app.sh
      timeout: 60
      runas: root
  ApplicationStop:
    - location: scripts/stop_app.sh
      timeout: 60
      runas: root
  ValidateService:
    - location: scripts/validate.sh
      # validate.sh polls for up to 30 x 5s = 150s, so the hook timeout must exceed that
      timeout: 180
      runas: ec2-user
Lifecycle Hook Shell Scripts
#!/bin/bash
# scripts/before_install.sh
# Runs BEFORE files are copied to the instance
# Use this to: remove old application, create directories, install OS packages
set -e
APP_DIR=/opt/myapp
BACKUP_DIR=/opt/myapp-backup
# Back up current installation
if [ -d "$APP_DIR" ]; then
echo "Backing up current app to $BACKUP_DIR"
rm -rf "$BACKUP_DIR"
cp -a "$APP_DIR" "$BACKUP_DIR"
fi
# Create app user if it doesn't exist
id -u myapp &>/dev/null || useradd -r -s /bin/false myapp
# Install required system packages
yum install -y java-21-amazon-corretto-headless
#!/bin/bash
# scripts/after_install.sh
# Runs AFTER files are copied but BEFORE the app starts
# Use this to: configure, set permissions, render config templates
set -e
APP_DIR=/opt/myapp
# Pull secrets from SSM Parameter Store
DB_HOST=$(aws ssm get-parameter \
--name /myapp/prod/db-host \
--query Parameter.Value \
--output text \
--region us-east-1)
DB_PASS=$(aws ssm get-parameter \
--name /myapp/prod/db-password \
--with-decryption \
--query Parameter.Value \
--output text \
--region us-east-1)
# Write application properties
cat > /etc/myapp/application.properties << EOF
spring.datasource.url=jdbc:mysql://${DB_HOST}:3306/mydb
spring.datasource.password=${DB_PASS}
server.port=8080
EOF
chown myapp:myapp /etc/myapp/application.properties
chmod 640 /etc/myapp/application.properties
#!/bin/bash
# scripts/validate.sh
# Runs AFTER ApplicationStart; must exit 0 for deployment to succeed
set -e
MAX_ATTEMPTS=30
ATTEMPT=0
URL="http://localhost:8080/actuator/health"
echo "Waiting for application health check..."
while [ $ATTEMPT -lt $MAX_ATTEMPTS ]; do
  # curl already prints 000 when the connection fails; "|| true" only keeps
  # set -e from aborting the loop on a non-zero curl exit
  HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" "$URL" || true)
  if [ "$HTTP_CODE" = "200" ]; then
    echo "Health check passed (HTTP 200)"
    exit 0
  fi
  echo "Attempt $((ATTEMPT+1))/$MAX_ATTEMPTS: HTTP $HTTP_CODE, waiting 5s..."
  sleep 5
  ATTEMPT=$((ATTEMPT+1))
done
echo "Health check FAILED after $MAX_ATTEMPTS attempts"
exit 1
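CodeDeploy stops a hook script once the `timeout` declared for it in appspec.yml elapses, so a polling script like validate.sh only gets its full retry budget if that timeout is longer than the script's worst case. A small Python sketch of the arithmetic follows; the function names are hypothetical and the optional per-request time is an assumption, since the script above sets no explicit curl timeout.

```python
def worst_case_seconds(max_attempts: int, sleep_seconds: int,
                       request_seconds: int = 0) -> int:
    """Longest time a poll loop can run before giving up."""
    return max_attempts * (sleep_seconds + request_seconds)


def timeout_is_sufficient(hook_timeout: int, max_attempts: int,
                          sleep_seconds: int, request_seconds: int = 0) -> bool:
    """True when the hook timeout leaves room for every retry."""
    return hook_timeout > worst_case_seconds(
        max_attempts, sleep_seconds, request_seconds
    )
```

With 30 attempts and a 5 second sleep the loop can run for 150 seconds, so a 60 second hook timeout would cut it short while 180 seconds would not.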
Creating the CodeDeploy Application and Deployment Group (AWS CLI)
# 1. Create application
aws deploy create-application \
--application-name myapp \
--compute-platform Server
# 2. Create deployment group targeting an Auto Scaling Group
aws deploy create-deployment-group \
--application-name myapp \
--deployment-group-name myapp-production \
--service-role-arn arn:aws:iam::123456789012:role/CodeDeployRole \
--auto-scaling-groups myapp-asg-prod \
--deployment-config-name CodeDeployDefault.HalfAtATime \
--load-balancer-info targetGroupInfoList=[{name=myapp-tg}] \
--auto-rollback-configuration enabled=true,events=DEPLOYMENT_FAILURE,DEPLOYMENT_STOP_ON_ALARM \
--alarm-configuration \
enabled=true,alarms=[{name=myapp-5xx-errors},{name=myapp-high-latency}]
# 3. Deploy a revision from S3
aws deploy create-deployment \
--application-name myapp \
--deployment-group-name myapp-production \
--s3-location bucket=myapp-artifacts,key=releases/myapp-1.2.3.zip,bundleType=zip \
--description "Release 1.2.3"
Always pass --auto-rollback-configuration when creating the deployment group. Tying rollback to both DEPLOYMENT_FAILURE and CloudWatch alarm breach gives you two independent safety nets: scripted health checks plus production signal monitoring.
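If you drive CodeDeploy from Python rather than the CLI, the same deployment group can be described as a keyword-argument dictionary for boto3's `create_deployment_group`. The sketch below only builds that dictionary; the helper name `deployment_group_request` is hypothetical, and the resource names are the placeholder values used in the CLI example above.

```python
def deployment_group_request(app: str, group: str, role_arn: str,
                             asg_name: str, target_group: str,
                             alarm_names: list[str]) -> dict:
    """Build kwargs mirroring the CLI flags used above."""
    return {
        "applicationName": app,
        "deploymentGroupName": group,
        "serviceRoleArn": role_arn,
        "autoScalingGroups": [asg_name],
        "deploymentConfigName": "CodeDeployDefault.HalfAtATime",
        "loadBalancerInfo": {
            "targetGroupInfoList": [{"name": target_group}],
        },
        # Two independent triggers: hook failure and alarm breach.
        "autoRollbackConfiguration": {
            "enabled": True,
            "events": ["DEPLOYMENT_FAILURE", "DEPLOYMENT_STOP_ON_ALARM"],
        },
        "alarmConfiguration": {
            "enabled": True,
            "alarms": [{"name": name} for name in alarm_names],
        },
    }
```

The result would be passed as `boto3.client("codedeploy").create_deployment_group(**request)`; keeping the builder separate from the API call makes the rollback settings easy to unit test.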
4. ECS Blue/Green — ALB Listener Switching with Terraform
ECS blue/green deployments via CodeDeploy are the gold standard for containerised microservices on AWS. CodeDeploy manages the transition from the current task set (blue) to a replacement task set (green) by shifting weight between two ALB target groups. The ECS service, ALB, and CodeDeploy deployment group must all be configured to work together — and Terraform is the cleanest way to express this relationship as code.
ECS Blue/Green Architecture
The key components are:
- ALB with two target groups: myapp-blue-tg (current production) and myapp-green-tg (new version). The ALB production listener forwards 100% to blue; during deployment, CodeDeploy gradually shifts weight to green.
- ECS service: Created with deploymentController: type: CODE_DEPLOY. This disables the standard ECS rolling update and hands control to CodeDeploy.
- CodeDeploy deployment group: References both target groups and the ECS cluster/service. After the green task set is healthy, CodeDeploy shifts the ALB listener.
- appspec.yml in S3: Points to the new ECS task definition ARN and the container/port mapping.
Terraform — ALB + Target Groups + ECS Service
# ── ALB Target Groups ───────────────────────────────────────────
resource "aws_lb_target_group" "blue" {
name = "myapp-blue-tg"
port = 8080
protocol = "HTTP"
vpc_id = var.vpc_id
target_type = "ip" # required for Fargate
health_check {
path = "/actuator/health"
healthy_threshold = 2
unhealthy_threshold = 3
interval = 15
timeout = 5
matcher = "200"
}
}
resource "aws_lb_target_group" "green" {
name = "myapp-green-tg"
port = 8080
protocol = "HTTP"
vpc_id = var.vpc_id
target_type = "ip"
health_check {
path = "/actuator/health"
healthy_threshold = 2
unhealthy_threshold = 3
interval = 15
timeout = 5
matcher = "200"
}
}
# ── ALB Production Listener (port 443) ──────────────────────────
resource "aws_lb_listener" "production" {
load_balancer_arn = aws_lb.myapp.arn
port = 443
protocol = "HTTPS"
ssl_policy = "ELBSecurityPolicy-TLS13-1-2-2021-06"
certificate_arn = var.acm_certificate_arn
default_action {
type = "forward"
target_group_arn = aws_lb_target_group.blue.arn
}
lifecycle {
# Prevent Terraform from reverting ALB rule after CodeDeploy shifts traffic
ignore_changes = [default_action]
}
}
# ── ALB Test Listener (port 8080) — CodeDeploy runs pre-traffic hooks here
resource "aws_lb_listener" "test" {
load_balancer_arn = aws_lb.myapp.arn
port = 8080
protocol = "HTTP"
default_action {
type = "forward"
target_group_arn = aws_lb_target_group.green.arn
}
lifecycle {
ignore_changes = [default_action]
}
}
# ── ECS Service — controlled by CodeDeploy ──────────────────────
resource "aws_ecs_service" "myapp" {
name = "myapp"
cluster = aws_ecs_cluster.main.id
task_definition = aws_ecs_task_definition.myapp.arn
desired_count = 4
launch_type = "FARGATE"
deployment_controller {
type = "CODE_DEPLOY" # hands deployment control to CodeDeploy
}
network_configuration {
subnets = var.private_subnet_ids
security_groups = [aws_security_group.ecs_tasks.id]
}
load_balancer {
target_group_arn = aws_lb_target_group.blue.arn
container_name = "myapp"
container_port = 8080
}
lifecycle {
ignore_changes = [task_definition, load_balancer]
}
}
# ── CodeDeploy Application ───────────────────────────────────────
resource "aws_codedeploy_app" "myapp" {
name = "myapp-ecs"
compute_platform = "ECS"
}
# ── CodeDeploy Deployment Group ──────────────────────────────────
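resource "aws_codedeploy_deployment_group" "production" {
  app_name               = aws_codedeploy_app.myapp.name
  deployment_group_name  = "production"
  service_role_arn       = aws_iam_role.codedeploy.arn
  deployment_config_name = "CodeDeployDefault.ECSCanary10Percent5Minutes"

  # ECS deployment groups must use blue/green with traffic control
  deployment_style {
    deployment_option = "WITH_TRAFFIC_CONTROL"
    deployment_type   = "BLUE_GREEN"
  }

  blue_green_deployment_config {
    deployment_ready_option {
      action_on_timeout = "CONTINUE_DEPLOYMENT"
    }
    terminate_blue_instances_on_deployment_success {
      action                           = "TERMINATE"
      termination_wait_time_in_minutes = 60 # example value: keeps blue tasks around for rollback
    }
  }

  ecs_service {
    cluster_name = aws_ecs_cluster.main.name
    service_name = aws_ecs_service.myapp.name
  }

  load_balancer_info {
    target_group_pair_info {
      prod_traffic_route {
        listener_arns = [aws_lb_listener.production.arn]
      }
      test_traffic_route {
        listener_arns = [aws_lb_listener.test.arn]
      }
      target_group {
        name = aws_lb_target_group.blue.name
      }
      target_group {
        name = aws_lb_target_group.green.name
      }
    }
  }

  auto_rollback_configuration {
    enabled = true
    events  = ["DEPLOYMENT_FAILURE", "DEPLOYMENT_STOP_ON_ALARM"]
  }

  alarm_configuration {
    alarms  = [aws_cloudwatch_metric_alarm.error_rate.name]
    enabled = true
  }
}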
ECS appspec.yml
# appspec.yml: ECS blue/green deployment
# Upload this to S3; reference it in create-deployment call
version: 0.0
Resources:
  - TargetService:
      Type: AWS::ECS::Service
      Properties:
        TaskDefinition: "arn:aws:ecs:us-east-1:123456789012:task-definition/myapp:42"
        LoadBalancerInfo:
          ContainerName: "myapp"
          ContainerPort: 8080
        PlatformVersion: "LATEST"
        NetworkConfiguration:
          AwsvpcConfiguration:
            Subnets:
              - subnet-0abc1234
              - subnet-0def5678
            SecurityGroups:
              - sg-0aabbccdd
            AssignPublicIp: "DISABLED"
Hooks:
  - BeforeAllowTraffic: "arn:aws:lambda:us-east-1:123456789012:function:myapp-pretraffic-check"
  - AfterAllowTraffic: "arn:aws:lambda:us-east-1:123456789012:function:myapp-posttraffic-check"
Without ignore_changes = [task_definition, load_balancer] on the ECS service and ignore_changes = [default_action] on ALB listeners, Terraform will fight CodeDeploy: every terraform apply will revert the ALB back to blue, undoing CodeDeploy's traffic shift.
5. Lambda Canary and Linear — Traffic Hooks and CloudWatch Alarm Rollback
Lambda deployments with CodeDeploy are elegant: CodeDeploy shifts traffic between two Lambda function versions using a weighted alias. No infrastructure to provision, no instances to swap — just version pointers. Combined with pre/post traffic Lambda hooks and CloudWatch alarm rollback, you get production-grade canary deploys in minutes.
How Lambda Traffic Shifting Works
A Lambda alias can point to two versions with weighted routing. For example, alias live can route 90% to version 5 and 10% to version 6. CodeDeploy manages this weight automatically — you publish a new version, tell CodeDeploy, and it handles the alias weight progression according to your chosen configuration (canary or linear). If an alarm fires, it resets the alias to 100% on the old version instantly.
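For intuition, this is roughly what a weighted alias looks like when expressed as arguments to the Lambda `update_alias` API, where the alias keeps pointing at the old version and the new version receives an additional weight. CodeDeploy performs these updates for you; the helper name `alias_routing` is hypothetical and the sketch only builds the argument dictionary.

```python
def alias_routing(old_version: str, new_version: str, new_percent: int) -> dict:
    """Describe an alias that sends new_percent of traffic to new_version."""
    if not 0 <= new_percent <= 100:
        raise ValueError("new_percent must be between 0 and 100")
    if new_percent == 0:
        # Fully rolled back: alias points only at the old version.
        return {"FunctionVersion": old_version,
                "RoutingConfig": {"AdditionalVersionWeights": {}}}
    if new_percent == 100:
        # Fully shifted: alias points only at the new version.
        return {"FunctionVersion": new_version,
                "RoutingConfig": {"AdditionalVersionWeights": {}}}
    return {
        "FunctionVersion": old_version,
        "RoutingConfig": {
            # Weights are fractions between 0 and 1, keyed by version.
            "AdditionalVersionWeights": {new_version: new_percent / 100},
        },
    }
```

The 90/10 split from the example above corresponds to `alias_routing("5", "6", 10)`, and a rollback is simply the zero-percent case.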
Lambda appspec.yml
# appspec.yml: Lambda canary deployment
version: 0.0
Resources:
  - MyLambdaFunction:
      Type: AWS::Lambda::Function
      Properties:
        Name: "myapp-api"
        Alias: "live"
        # CurrentVersion and TargetVersion take version numbers, not ARNs
        CurrentVersion: "5"
        TargetVersion: "6"
Hooks:
  - BeforeAllowTraffic: "arn:aws:lambda:us-east-1:123456789012:function:myapp-pretraffic"
  - AfterAllowTraffic: "arn:aws:lambda:us-east-1:123456789012:function:myapp-posttraffic"
Pre-Traffic Hook — Synthetic Validation
"""
myapp-pretraffic Lambda function.
Runs BEFORE any production traffic hits the new version.
Must call codedeploy.put_lifecycle_event_hook_execution_status()
"""
import json
import os

import boto3

codedeploy = boto3.client("codedeploy")
lambda_client = boto3.client("lambda")


def handler(event, context):
    deployment_id = event["DeploymentId"]
    hook_id = event["LifecycleEventHookExecutionId"]
    status = "Succeeded"
    try:
        # Invoke the new version directly (bypassing alias) for smoke test
        response = lambda_client.invoke(
            FunctionName=os.environ["TARGET_FUNCTION_ARN"],
            InvocationType="RequestResponse",
            Payload=json.dumps({"httpMethod": "GET", "path": "/health"}),
        )
        payload = json.loads(response["Payload"].read())
        if response.get("FunctionError") or payload.get("statusCode") != 200:
            status = "Failed"
            print(f"Pre-traffic smoke test FAILED: {payload}")
        else:
            print(f"Pre-traffic smoke test PASSED: {payload}")
    except Exception as e:
        print(f"Pre-traffic hook exception: {e}")
        status = "Failed"
    finally:
        # Always report back, otherwise the deployment waits until the hook times out
        codedeploy.put_lifecycle_event_hook_execution_status(
            deploymentId=deployment_id,
            lifecycleEventHookExecutionId=hook_id,
            status=status,
        )
Post-Traffic Hook — Metric Validation
"""
myapp-posttraffic Lambda function.
Runs AFTER all production traffic has been shifted to the new version.
Checks CloudWatch metrics for the new version and fails the deployment if the error rate is high.
"""
import datetime
import os

import boto3

codedeploy = boto3.client("codedeploy")
cloudwatch = boto3.client("cloudwatch")


def version_metric_sum(metric_name, start, end):
    """Sum one AWS/Lambda metric for the new version over the window."""
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/Lambda",
        MetricName=metric_name,
        Dimensions=[
            {"Name": "FunctionName", "Value": "myapp-api"},
            {"Name": "Resource", "Value": f"myapp-api:{os.environ['NEW_VERSION']}"},
        ],
        StartTime=start,
        EndTime=end,
        Period=600,
        Statistics=["Sum"],
    )
    return sum(d["Sum"] for d in response["Datapoints"])


def handler(event, context):
    deployment_id = event["DeploymentId"]
    hook_id = event["LifecycleEventHookExecutionId"]
    status = "Succeeded"
    try:
        # Timezone-aware "now"; datetime.utcnow() is deprecated in recent Python versions
        end = datetime.datetime.now(datetime.timezone.utc)
        start = end - datetime.timedelta(minutes=10)
        error_sum = version_metric_sum("Errors", start, end)
        invocation_sum = version_metric_sum("Invocations", start, end)
        error_rate = (error_sum / invocation_sum * 100) if invocation_sum > 0 else 0
        print(f"Error rate for new version: {error_rate:.2f}%")
        if error_rate > 1.0:
            status = "Failed"
            print("Post-traffic check FAILED: error rate exceeds 1%")
    except Exception as e:
        print(f"Post-traffic hook exception: {e}")
        status = "Failed"
    finally:
        codedeploy.put_lifecycle_event_hook_execution_status(
            deploymentId=deployment_id,
            lifecycleEventHookExecutionId=hook_id,
            status=status,
        )
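The pass/fail decision inside the post-traffic hook is plain arithmetic, so it can be pulled into pure functions and unit tested without any AWS calls. The sketch below assumes datapoints shaped like the `Datapoints` list returned by `get_metric_statistics`; the function names and the default 1% threshold (taken from the hook above) are illustrative.

```python
def error_rate_percent(error_datapoints: list[dict],
                       invocation_datapoints: list[dict]) -> float:
    """Percentage of invocations that errored across all datapoints."""
    errors = sum(d["Sum"] for d in error_datapoints)
    invocations = sum(d["Sum"] for d in invocation_datapoints)
    if invocations == 0:
        # No traffic means nothing to judge; treat as healthy, as the hook does.
        return 0.0
    return errors / invocations * 100


def hook_status(rate_percent: float, threshold_percent: float = 1.0) -> str:
    """Map an error rate to the status string CodeDeploy expects."""
    return "Failed" if rate_percent > threshold_percent else "Succeeded"
```

Treating zero invocations as healthy is a design choice worth revisiting: on a low-traffic function you may prefer to fail the hook, or to wait, rather than approve a version nobody has exercised.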
Deployment Configurations for Lambda
# Canary: 10% for 5 minutes then all-at-once
aws deploy create-deployment \
--application-name myapp-lambda \
--deployment-group-name production \
--deployment-config-name CodeDeployDefault.LambdaCanary10Percent5Minutes \
--revision revisionType=S3,s3Location="{bucket=myapp-appspec,key=appspec.yaml,bundleType=YAML}"
# Linear: 10% every minute (10 minutes to full rollout)
aws deploy create-deployment \
--application-name myapp-lambda \
--deployment-group-name production \
--deployment-config-name CodeDeployDefault.LambdaLinear10PercentEvery1Minute \
--revision revisionType=S3,s3Location="{bucket=myapp-appspec,key=appspec.yaml,bundleType=YAML}"
# Custom canary: 25% for 2 minutes, then the remaining 75%
aws deploy create-deployment-config \
--deployment-config-name MyCanary25Percent2Min \
--compute-platform Lambda \
--traffic-routing-config \
type=TimeBasedCanary,timeBasedCanary="{canaryPercentage=25,canaryInterval=2}"
6. Rollback Configuration — Automatic and Manual
Rollback in CodeDeploy means re-deploying the last known good revision to every affected instance, task, or function version. For ECS and Lambda the rollback is effectively instant — just a pointer flip. For EC2 it triggers a full deployment of the previous revision, which takes as long as a normal deployment.
Automatic Rollback Triggers
Configure automatic rollback at the deployment group level. There are three triggers:
- DEPLOYMENT_FAILURE: Rolls back when any lifecycle hook script exits non-zero or when health checks fail.
- DEPLOYMENT_STOP_ON_ALARM: Rolls back when a linked CloudWatch alarm enters the ALARM state during deployment.
- DEPLOYMENT_STOP_ON_REQUEST: Rolls back when a user or pipeline manually stops the deployment.
# Update deployment group to add alarm-based rollback
# (shorthand syntax throughout; JSON objects cannot be mixed into a shorthand value)
aws deploy update-deployment-group \
  --application-name myapp \
  --current-deployment-group-name production \
  --auto-rollback-configuration \
    'enabled=true,events=DEPLOYMENT_FAILURE,DEPLOYMENT_STOP_ON_ALARM' \
  --alarm-configuration \
    'enabled=true,alarms=[{name=myapp-5xx-errors},{name=myapp-p99-latency}]'
CloudWatch Alarm Setup for Rollback
# Terraform — CloudWatch alarm for API error rate
resource "aws_cloudwatch_metric_alarm" "error_rate" {
alarm_name = "myapp-5xx-errors"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = 2
threshold = 5
metric_query {
id = "error_rate"
expression = "errors / invocations * 100"
label = "Error Rate (%)"
return_data = true
}
metric_query {
id = "errors"
metric {
namespace = "AWS/Lambda"
metric_name = "Errors"
dimensions = { FunctionName = "myapp-api" }
period = 60
stat = "Sum"
}
}
metric_query {
id = "invocations"
metric {
namespace = "AWS/Lambda"
metric_name = "Invocations"
dimensions = { FunctionName = "myapp-api" }
period = 60
stat = "Sum"
}
}
alarm_description = "Rolls back CodeDeploy if error rate exceeds 5%"
treat_missing_data = "notBreaching"
}
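It helps to reason about when this alarm would actually fire. The Python sketch below is a simplified model of the configuration above: the metric-math expression `errors / invocations * 100`, a threshold of 5, two evaluation periods, and missing data treated as not breaching. The function name `alarm_breached` is hypothetical, and real CloudWatch evaluation has more nuance around missing and late datapoints.

```python
def alarm_breached(errors: list[float], invocations: list[float],
                   threshold: float = 5.0, evaluation_periods: int = 2) -> bool:
    """True if the most recent periods all exceed the error-rate threshold.

    errors and invocations are per-period sums, oldest first.
    """
    recent = list(zip(errors, invocations))[-evaluation_periods:]
    if len(recent) < evaluation_periods:
        return False
    for error_count, invocation_count in recent:
        if invocation_count == 0:
            # Division by zero yields no datapoint; notBreaching applies.
            return False
        if error_count / invocation_count * 100 <= threshold:
            return False
    return True
```

One consequence of requiring two consecutive breaching periods at a 60 second period is that rollback starts no sooner than about two minutes into a bad release, which should be weighed against the length of the canary interval.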
Manual Rollback via CLI
# Stop an in-progress deployment and roll back
aws deploy stop-deployment \
--deployment-id d-ABC123DEF \
--auto-rollback-enabled
# Manually re-deploy the previous revision (get the ID first)
aws deploy list-deployments \
--application-name myapp \
--deployment-group-name production \
--include-only-statuses Succeeded \
--query 'deployments[0]' \
--output text
# Get the revision from the last successful deployment
LAST_DEPLOYMENT_ID=d-XYZ789GHI
aws deploy get-deployment \
--deployment-id $LAST_DEPLOYMENT_ID \
--query 'deploymentInfo.revision'
# Redeploy that revision
aws deploy create-deployment \
--application-name myapp \
--deployment-group-name production \
--s3-location bucket=myapp-artifacts,key=releases/myapp-1.2.2.zip,bundleType=zip \
--description "Manual rollback to 1.2.2"
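The manual steps above amount to "find the newest successful deployment that predates the bad one and redeploy its revision". Here is a hedged Python sketch of that selection, operating on dictionaries shaped like the `deploymentInfo` objects returned by `get-deployment` (fields `deploymentId`, `status`, `createTime`, `revision`). The function name is hypothetical, and fetching the records is left to the caller.

```python
def previous_good_revision(deployments: list[dict], bad_deployment_id: str):
    """Return the revision of the latest Succeeded deployment before the bad one."""
    by_id = {d["deploymentId"]: d for d in deployments}
    if bad_deployment_id not in by_id:
        raise KeyError(f"unknown deployment: {bad_deployment_id}")
    bad = by_id[bad_deployment_id]
    candidates = [
        d for d in deployments
        if d["status"] == "Succeeded" and d["createTime"] < bad["createTime"]
    ]
    if not candidates:
        return None  # nothing safe to roll back to
    return max(candidates, key=lambda d: d["createTime"])["revision"]
```

The returned revision can be passed unchanged as the `revision` argument of a new deployment, which avoids retyping bucket and key names under pressure.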
7. CodePipeline Integration — Full Terraform Source→Build→Deploy
Integrating CodeDeploy into a CodePipeline gives you end-to-end automation: a commit to main triggers the pipeline, CodeBuild compiles and tests, an optional manual gate awaits approval, then CodeDeploy rolls out the new version to production. The following Terraform creates a complete GitHub → CodeBuild → CodeDeploy pipeline for a Lambda function.
# ── S3 Artifact Bucket ──────────────────────────────────────────
resource "aws_s3_bucket" "artifacts" {
bucket = "myapp-pipeline-artifacts-${data.aws_caller_identity.current.account_id}"
force_destroy = true
}
resource "aws_s3_bucket_versioning" "artifacts" {
bucket = aws_s3_bucket.artifacts.id
versioning_configuration { status = "Enabled" }
}
# ── CodeBuild Project ────────────────────────────────────────────
resource "aws_codebuild_project" "build" {
name = "myapp-build"
service_role = aws_iam_role.codebuild.arn
artifacts { type = "CODEPIPELINE" }
environment {
compute_type = "BUILD_GENERAL1_SMALL"
image = "aws/codebuild/standard:7.0"
type = "LINUX_CONTAINER"
privileged_mode = true # needed for Docker builds
environment_variable {
name = "AWS_ACCOUNT_ID"
value = data.aws_caller_identity.current.account_id
}
environment_variable {
name = "LAMBDA_FUNCTION"
value = aws_lambda_function.myapp.function_name
}
}
source {
type = "CODEPIPELINE"
buildspec = file("${path.module}/buildspec.yml")
}
}
# ── CodePipeline ─────────────────────────────────────────────────
resource "aws_codepipeline" "myapp" {
name = "myapp-pipeline"
role_arn = aws_iam_role.codepipeline.arn
artifact_store {
location = aws_s3_bucket.artifacts.bucket
type = "S3"
}
stage {
name = "Source"
action {
name = "GitHubSource"
category = "Source"
owner = "AWS"
provider = "CodeStarSourceConnection"
version = "1"
output_artifacts = ["SourceArtifact"]
configuration = {
ConnectionArn = var.codestar_connection_arn
FullRepositoryId = "myorg/myapp"
BranchName = "main"
DetectChanges = "true"
}
}
}
stage {
name = "Build"
action {
name = "BuildAndPackage"
category = "Build"
owner = "AWS"
provider = "CodeBuild"
version = "1"
input_artifacts = ["SourceArtifact"]
output_artifacts = ["BuildArtifact"]
configuration = {
ProjectName = aws_codebuild_project.build.name
}
}
}
stage {
name = "Approve"
action {
name = "ApproveProduction"
category = "Approval"
owner = "AWS"
provider = "Manual"
version = "1"
configuration = {
NotificationArn = aws_sns_topic.pipeline.arn
CustomData = "Review build artifacts and approve for production deploy."
}
}
}
stage {
name = "Deploy"
action {
name = "DeployToLambda"
category = "Deploy"
owner = "AWS"
provider = "CodeDeploy"
version = "1"
input_artifacts = ["BuildArtifact"]
configuration = {
ApplicationName = aws_codedeploy_app.myapp.name
DeploymentGroupName = aws_codedeploy_deployment_group.production.deployment_group_name
}
}
}
}
buildspec.yml for Lambda Pipeline
# buildspec.yml: packages the Lambda and generates appspec.yaml
version: 0.2
phases:
  install:
    runtime-versions:
      python: 3.12
    commands:
      - pip install --upgrade pip
  pre_build:
    commands:
      - echo "Running unit tests"
      - pip install -r requirements.txt -t ./package/
      - python -m pytest tests/ -v --tb=short
  build:
    commands:
      - echo "Packaging Lambda"
      - cp -r package/* .
      - zip -r function.zip . -x "tests/*" -x "*.git*" -x "requirements*.txt"
      - |
        NEW_VERSION=$(aws lambda update-function-code \
          --function-name $LAMBDA_FUNCTION \
          --zip-file fileb://function.zip \
          --query Version \
          --output text \
          --publish)
        echo "Published Lambda version: $NEW_VERSION"
        CURRENT_VERSION=$(aws lambda get-alias \
          --function-name $LAMBDA_FUNCTION \
          --name live \
          --query FunctionVersion --output text)
        # Generate appspec.yaml pointing to the new version.
        # CurrentVersion and TargetVersion take version numbers, not ARNs.
        cat > appspec.yaml << EOF
        version: 0.0
        Resources:
          - MyLambdaFunction:
              Type: AWS::Lambda::Function
              Properties:
                Name: "$LAMBDA_FUNCTION"
                Alias: "live"
                CurrentVersion: "${CURRENT_VERSION}"
                TargetVersion: "${NEW_VERSION}"
        Hooks:
          - BeforeAllowTraffic: "arn:aws:lambda:${AWS_DEFAULT_REGION}:${AWS_ACCOUNT_ID}:function:myapp-pretraffic"
          - AfterAllowTraffic: "arn:aws:lambda:${AWS_DEFAULT_REGION}:${AWS_ACCOUNT_ID}:function:myapp-posttraffic"
        EOF
artifacts:
  files:
    - appspec.yaml
8. Monitoring — CloudWatch Metrics, SNS Notifications, Deployment Events
A deployment that merely succeeds teaches you little; the real value is in the data you collect around every deployment event. CodeDeploy emits detailed events to EventBridge and exposes metrics via CloudWatch. Combining these with SNS notifications and a CloudWatch dashboard gives your team full observability over every deployment lifecycle.
CodeDeploy CloudWatch Metrics
CodeDeploy publishes to the AWS/CodeDeploy CloudWatch namespace. The key metrics are:
- DeploymentAttempts: Total deployments started per deployment group.
- DeploymentSuccesses: Successful deployments (all targets healthy).
- DeploymentFailures: Deployments that ended in a FAILED state.
- DeploymentRollbacks: Rollback deployments triggered (automatic or manual).
# Terraform: CloudWatch dashboard for deployment health
resource "aws_cloudwatch_dashboard" "codedeploy" {
  dashboard_name = "CodeDeploy-MyApp"

  dashboard_body = jsonencode({
    widgets = [
      {
        type = "metric"
        properties = {
          title  = "Deployment Success vs Failure"
          region = "us-east-1" # assumption: set to the region you deploy in
          period = 86400       # one datapoint per day
          stat   = "Sum"
          metrics = [
            ["AWS/CodeDeploy", "DeploymentSuccesses", "Application", "myapp", "DeploymentGroup", "production"],
            ["AWS/CodeDeploy", "DeploymentFailures", "Application", "myapp", "DeploymentGroup", "production"],
            ["AWS/CodeDeploy", "DeploymentRollbacks", "Application", "myapp", "DeploymentGroup", "production"]
          ]
        }
      },
      {
        type = "metric"
        properties = {
          title  = "Lambda Errors (canary window)"
          region = "us-east-1" # assumption: set to the region you deploy in
          period = 60
          stat   = "Sum" # Errors is a count; divide by Invocations for a rate
          metrics = [
            ["AWS/Lambda", "Errors", "FunctionName", "myapp-api"]
          ]
        }
      }
    ]
  })
}
EventBridge Rules for Deployment Events
CodeDeploy fires EventBridge events for every state transition. Use EventBridge rules to send notifications to SNS, trigger Lambda, or post to Slack:
resource "aws_cloudwatch_event_rule" "deploy_state" {
  name        = "codedeploy-state-changes"
  description = "Catch CodeDeploy deployment state changes"

  event_pattern = jsonencode({
    source        = ["aws.codedeploy"]
    "detail-type" = ["CodeDeploy Deployment State-change Notification"]
    detail = {
      application     = ["myapp"]
      deploymentGroup = ["production"]
      state           = ["START", "SUCCESS", "FAILURE", "STOP", "ROLLBACK"]
    }
  })
}
resource "aws_cloudwatch_event_target" "notify_sns" {
  rule = aws_cloudwatch_event_rule.deploy_state.name
  arn  = aws_sns_topic.deployments.arn

  input_transformer {
    input_paths = {
      state       = "$.detail.state"
      deployment  = "$.detail.deploymentId"
      application = "$.detail.application"
      group       = "$.detail.deploymentGroup"
    }
    # <name> placeholders are filled from input_paths above
    input_template = "\"[CodeDeploy] <application>/<group> - <deployment> is <state>\""
  }
}
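resource "aws_sns_topic" "deployments" {
  name = "codedeploy-notifications"
}

# EventBridge can publish to the topic only if the topic policy allows it
resource "aws_sns_topic_policy" "deployments" {
  arn = aws_sns_topic.deployments.arn
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid       = "AllowEventBridgePublish"
      Effect    = "Allow"
      Principal = { Service = "events.amazonaws.com" }
      Action    = "sns:Publish"
      Resource  = aws_sns_topic.deployments.arn
    }]
  })
}

resource "aws_sns_topic_subscription" "email" {
  topic_arn = aws_sns_topic.deployments.arn
  protocol  = "email"
  endpoint  = "devops@mycompany.com"
}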
Querying Deployment History with CLI
# List up to 10 deployment IDs for a deployment group
aws deploy list-deployments \
  --application-name myapp \
  --deployment-group-name production \
  --query 'deployments[:10]' \
  --output table

# Get deployment status, rollback info, and start/complete times
# (deploymentInfo has no duration field; derive it from the two timestamps)
aws deploy get-deployment \
  --deployment-id d-ABC123DEF \
  --query 'deploymentInfo.{Status:status,Rollback:rollbackInfo,StartTime:startTime,CompleteTime:completeTime}'

# List instances that failed in this deployment
aws deploy list-deployment-instances \
  --deployment-id d-ABC123DEF \
  --instance-status-filter Failed \
  --output table

# Get lifecycle hook execution details for a failed instance
aws deploy get-deployment-instance \
  --deployment-id d-ABC123DEF \
  --instance-id i-0abc1234def567890 \
  --query 'instanceSummary.lifecycleEvents[?status==`Failed`]'
Deployment Frequency and DORA Metrics
The DeploymentSuccesses metric is your raw deployment frequency signal, a core DORA (DevOps Research and Assessment) metric. To compute mean time to recovery (MTTR), run a CloudWatch Logs Insights query over the CodeDeploy API log (CloudTrail → CloudWatch Logs) and take the average time between a FAILURE event and the next SUCCESS event on the same deployment group. Teams with mature CodeDeploy setups can reach sub-5-minute MTTR on ECS and Lambda deployments, because rollback there is a traffic shift back to the previous version rather than a fresh deployment.
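As a rough illustration of the MTTR arithmetic, the sketch below assumes the state-change events have already been exported as plain text lines of the form "<epoch-seconds> <STATE>", sorted by time, for a single deployment group. That export format and the function name mttr_seconds are assumptions made for this example; CodeDeploy does not produce this format directly.

```shell
#!/usr/bin/env bash
# mttr_seconds: read "<epoch-seconds> <STATE>" lines on stdin (sorted by time,
# one deployment group) and print the mean number of seconds between each
# FAILURE and the next SUCCESS. Prints "n/a" if no recovery was observed.
mttr_seconds() {
  awk '
    # First FAILURE after a healthy period opens an outage window.
    $2 == "FAILURE" && !failed { failed = 1; t0 = $1 }
    # The next SUCCESS closes the window and adds its length to the total.
    $2 == "SUCCESS" && failed  { total += $1 - t0; n++; failed = 0 }
    END { if (n) printf "%d\n", total / n; else print "n/a" }
  '
}
```

For example, piping the two lines "100 FAILURE" and "400 SUCCESS" into mttr_seconds prints 300. Repeated FAILURE events inside one outage window are counted from the first failure, which is a design choice for this sketch rather than a DORA requirement.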
Set treat_missing_data to notBreaching on deployment alarms. During a cold-start deployment, some metrics will have no datapoints for the first minute, and you do not want an alarm to fire simply because the new version has not yet received enough traffic to produce metric data.
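treat_missing_data is the Terraform argument name; with the AWS CLI the same setting is the --treat-missing-data flag of put-metric-alarm. The sketch below shows one way to create such an alarm on Lambda errors. The function name create_error_alarm, the one-minute period, and the threshold of one error are illustrative assumptions, not recommended values.

```shell
#!/usr/bin/env bash
# create_error_alarm ALARM_NAME FUNCTION_NAME
# Creates a CloudWatch alarm on the Lambda Errors metric that treats
# missing datapoints as healthy, so a quiet first minute does not
# trigger a rollback. Threshold and period are placeholder values.
create_error_alarm() {
  local alarm_name="$1" function_name="$2"
  aws cloudwatch put-metric-alarm \
    --alarm-name "$alarm_name" \
    --namespace AWS/Lambda \
    --metric-name Errors \
    --dimensions "Name=FunctionName,Value=${function_name}" \
    --statistic Sum \
    --period 60 \
    --evaluation-periods 1 \
    --threshold 1 \
    --comparison-operator GreaterThanOrEqualToThreshold \
    --treat-missing-data notBreaching
}
```

A call might look like: create_error_alarm myapp-api-errors myapp-api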
Cost Considerations
CodeDeploy itself adds no charge for deployments to EC2, ECS, or Lambda; only updates to on-premises instances are billed by CodeDeploy. Beyond that, you pay for the AWS resources used during deployment: the extra ECS tasks during a blue/green swap, the S3 requests to fetch revision artifacts, and the CloudWatch Logs storage for deployment logs. For ECS blue/green at scale, keep the termination wait time short (15–30 minutes) to avoid running double the task count longer than necessary. For Lambda deployments there is effectively zero extra cost, because version pointers are free.
Frequently Asked Questions
Can I use CodeDeploy with containers not on ECS?
CodeDeploy's native container integration requires an ECS service (EC2 or Fargate launch type) whose deployment controller is set to CODE_DEPLOY. If you run containers directly on EC2 without ECS, use the EC2/On-Premises platform and drive the container lifecycle (docker pull, docker run) from the lifecycle hook scripts referenced in your appspec.yml.
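A minimal sketch of such a hook script follows, assuming Docker is already installed on the instance. The function name restart_container and its image, container name, and port arguments are placeholders for this example; a real hook would likely add a health check before returning.

```shell
#!/usr/bin/env bash
# restart_container IMAGE NAME PORT
# Pulls IMAGE, removes any existing container called NAME, and starts a
# fresh one publishing PORT. Intended to be called from a lifecycle hook
# script such as ApplicationStart.
restart_container() {
  local image="$1" name="$2" port="$3"
  # Fail the hook (and therefore the deployment) if the image cannot be pulled.
  docker pull "$image" || return 1
  # Remove the previous container if present; ignore "no such container".
  docker rm -f "$name" >/dev/null 2>&1 || true
  docker run -d --name "$name" --restart unless-stopped \
    -p "${port}:${port}" "$image"
}
```

A hook script could then end with a call such as: restart_container "$IMAGE_URI" myapp 8080, where IMAGE_URI is a variable you set yourself.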
What is the difference between CodeDeployDefault.ECSCanary10Percent5Minutes and a custom config?
The built-in config shifts 10% of traffic to the new task set, waits 5 minutes, then shifts the remaining 90% all at once. A custom deployment configuration lets you specify any percentage and any interval — for example, 5% every 2 minutes — giving you finer control over the exposure curve. Use built-in configs for standard services; use custom configs for high-traffic services where even 10% canary exposure is too aggressive.
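The "5% every 2 minutes" example corresponds to a time-based linear configuration. One way to create it is with aws deploy create-deployment-config, sketched below. The wrapper function create_linear_config and the configuration name used in the usage line are assumptions for illustration.

```shell
#!/usr/bin/env bash
# create_linear_config NAME PERCENT MINUTES
# Creates an ECS deployment configuration that shifts PERCENT of traffic
# to the new task set every MINUTES minutes until all traffic has moved.
create_linear_config() {
  local name="$1" percent="$2" minutes="$3"
  aws deploy create-deployment-config \
    --deployment-config-name "$name" \
    --compute-platform ECS \
    --traffic-routing-config \
      "type=TimeBasedLinear,timeBasedLinear={linearPercentage=${percent},linearInterval=${minutes}}"
}
```

For instance, create_linear_config myapp-linear-5pct-2min 5 2 would register the configuration, which can then be referenced by name in the deployment group.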
Does CodeDeploy support multi-region deployments?
Not natively. CodeDeploy operates within a single AWS region. For multi-region deployments, create a separate CodeDeploy Application and Deployment Group in each target region and use CodePipeline with cross-region actions to fan out deployments. Each region's pipeline action triggers the corresponding CodeDeploy deployment group independently.
How do I deploy to on-premises servers?
Install the CodeDeploy agent on your on-premises servers, register them as on-premises instances with aws deploy register-on-premises-instance, and tag them so your deployment group's instance tag filter matches. The deployment flow is identical to EC2 — the agent polls, fetches the revision from S3, and executes lifecycle hooks. IAM credentials for the agent are provided via the on-premises instance configuration file.
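The registration and tagging steps can be sketched with the AWS CLI as follows. This assumes the IAM-user registration method; the function name register_onprem and the example instance name, ARN, and tag are hypothetical.

```shell
#!/usr/bin/env bash
# register_onprem NAME IAM_USER_ARN TAG_KEY TAG_VALUE
# Registers an on-premises server with CodeDeploy, then tags it so that a
# deployment group filtering on TAG_KEY=TAG_VALUE will include it.
register_onprem() {
  local name="$1" iam_user_arn="$2" tag_key="$3" tag_value="$4"
  # Stop if registration fails; tagging an unregistered instance is pointless.
  aws deploy register-on-premises-instance \
    --instance-name "$name" \
    --iam-user-arn "$iam_user_arn" || return 1
  aws deploy add-tags-to-on-premises-instances \
    --instance-names "$name" \
    --tags "Key=${tag_key},Value=${tag_value}"
}
```

A call might look like: register_onprem web-01 arn:aws:iam::123456789012:user/codedeploy-onprem Environment production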