AWS CodeDeploy: Blue-Green, Canary and Rolling Deployments

Published June 2026 · 18 min read

Shipping code safely to production is one of the hardest parts of software delivery. A bug that slips past staging can take down an entire service in seconds, and rolling it back manually under pressure is painful. AWS CodeDeploy addresses this by automating the deployment lifecycle with built-in traffic-shifting strategies, lifecycle hooks for health checks, and automatic rollback on alarm breach. Whether you deploy to a fleet of EC2 instances, an ECS service, or a Lambda function, CodeDeploy gives you production-grade deployment semantics without building custom orchestration.

This guide covers every CodeDeploy strategy in depth — in-place, blue/green, canary, and linear — with real appspec.yml files, shell lifecycle hooks, Terraform infrastructure, and CloudWatch rollback configuration. By the end you will be able to design and operate zero-downtime deployments for any AWS compute target.

CodeDeploy Concepts — Applications, Deployment Groups, AppSpec
Deployment Strategies — Comparison Table
EC2 Deployments — Agent, AppSpec, Lifecycle Hooks
ECS Blue/Green — ALB Listener Switching with Terraform
Lambda Canary and Linear — Traffic Hooks and CloudWatch Rollback
Rollback Configuration — Automatic and Manual
CodePipeline Integration — Terraform Source→Build→Deploy
Monitoring — CloudWatch Metrics, SNS, Deployment Events

1. CodeDeploy Concepts — Applications, Deployment Groups, AppSpec

AWS CodeDeploy organises everything under three logical objects: the Application, the Deployment Group, and the Revision. Understanding how they relate is essential before writing a single line of configuration.

Application

An Application is a named container that scopes a set of deployments to a compute platform. The platform can be EC2/On-Premises, ECS, or Lambda. You create one Application per deployable unit, typically one per microservice or application stack. The application name appears in every CLI command and CodePipeline action and must be unique within your AWS account in a given Region.

Deployment Group

A Deployment Group lives inside an Application and defines where and how to deploy. It specifies:

Target — For EC2: Auto Scaling Group name or tag filter (e.g., Name=myapp-prod). For ECS: cluster name + service name. For Lambda: the function name.
Deployment configuration — The traffic-shifting strategy (OneAtATime, HalfAtATime, AllAtOnce, or a custom config).
Load balancer — Classic ELB, Application Load Balancer (ALB) target group, or Network Load Balancer for health gating.
Service role — An IAM role that CodeDeploy assumes to interact with EC2, ECS, Lambda, ELB, CloudWatch, and S3.
Rollback settings — Automatic rollback triggers: deployment failure or CloudWatch alarm breach.

Revision

A Revision is the combination of your application code and its appspec.yml file, packaged as a ZIP in S3 or referenced as a container image tag or Lambda function version. Every create-deployment call points to a specific revision. CodeDeploy fetches it from S3 (EC2), uses the image URI from the ECS task definition (ECS), or uses the Lambda alias/version pointer (Lambda).

appspec.yml

The appspec.yml (Application Specification file) is CodeDeploy's instruction manifest. Its structure varies by compute platform but always defines:

The deployment target (EC2 file mappings, ECS task definition, Lambda function version)
Lifecycle event hooks — shell scripts to run at specific points in the deployment lifecycle
Permissions (EC2 only) — file ownership and ACL settings

# Minimal EC2 appspec.yml structure
version: 0.0
os: linux
files:
  - source: /          # copy entire revision root
    destination: /opt/myapp

permissions:
  - object: /opt/myapp
    owner: ec2-user
    group: ec2-user
    mode: "755"
    type:
      - directory
      - file

hooks:
  BeforeInstall:
    - location: scripts/stop_service.sh
      timeout: 60
      runas: root
  AfterInstall:
    - location: scripts/install_dependencies.sh
      timeout: 120
      runas: ec2-user
  ApplicationStart:
    - location: scripts/start_service.sh
      timeout: 60
      runas: root
  ValidateService:
    - location: scripts/health_check.sh
      timeout: 30
      runas: ec2-user

Key insight: The appspec.yml must be at the root of your revision ZIP for EC2 deployments. For ECS and Lambda the file is referenced differently — it lives in S3 and points to the task definition or function version, not to source files.
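A misplaced appspec.yml is a common cause of failed EC2 deployments, so it can be worth checking the bundle before uploading it. The following is a minimal sketch using only the Python standard library; the helper names appspec_at_root and make_bundle are hypothetical and exist only for this illustration.

```python
import io
import zipfile


def appspec_at_root(bundle: bytes) -> bool:
    """Return True if the ZIP archive contains appspec.yml at its top level."""
    with zipfile.ZipFile(io.BytesIO(bundle)) as archive:
        return "appspec.yml" in archive.namelist()


def make_bundle(files: dict) -> bytes:
    """Build an in-memory ZIP from a {path: content} mapping (demo helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for path, content in files.items():
            archive.writestr(path, content)
    return buffer.getvalue()


# appspec.yml at the archive root: what the EC2 platform expects
good = make_bundle({
    "appspec.yml": "version: 0.0\nos: linux\n",
    "scripts/start_service.sh": "#!/bin/bash\n",
})

# appspec.yml nested one directory down, e.g. after zipping the parent folder
bad = make_bundle({"myapp/appspec.yml": "version: 0.0\nos: linux\n"})

print(appspec_at_root(good))
print(appspec_at_root(bad))
```

A check like this could run as the last step of a build job, before the ZIP is copied to S3.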

Deployment Configuration

CodeDeploy ships with several built-in deployment configurations. For EC2/On-Premises: CodeDeployDefault.OneAtATime, CodeDeployDefault.HalfAtATime, CodeDeployDefault.AllAtOnce. For Lambda: CodeDeployDefault.LambdaCanary10Percent5Minutes, CodeDeployDefault.LambdaLinear10PercentEvery1Minute, and CodeDeployDefault.LambdaAllAtOnce. ECS has equivalents with an ECS prefix, such as CodeDeployDefault.ECSCanary10Percent5Minutes and CodeDeployDefault.ECSAllAtOnce. You can also create custom configurations specifying the exact percentage of instances or traffic weight to shift per interval.

2. Deployment Strategies — In-Place, Blue/Green, Canary, Linear

Choosing the right deployment strategy depends on your tolerance for downtime, your ability to run two environments simultaneously, and how quickly you need to detect regressions in production traffic. The table below compares the options:

Strategy	Compute	Downtime	Rollback Speed	Cost	Best For
In-Place (Rolling)	EC2	Partial (per-batch)	Slow (re-deploy old)	Low	Non-critical services, batch workers
Blue/Green (EC2)	EC2	Zero	Fast (switch ASG)	2x during deploy	Stateless web services
Blue/Green (ECS)	ECS + ALB	Zero	Instant (listener rule)	2x tasks during shift	Containerised microservices
Canary	ECS / Lambda	Zero	Instant (shift back)	Minimal extra	High-risk changes, A/B validation
Linear	ECS / Lambda	Zero	Instant (shift back)	Minimal extra	Gradual rollout, SLO-guarded releases

In-Place (Rolling)

CodeDeploy stops the application on a batch of instances, deploys the new revision, runs lifecycle hooks (including health checks), and only advances to the next batch once the current batch is healthy. If any instance in a batch fails its health check, the deployment stops — but already-deployed instances are not automatically reverted. In-place is the simplest strategy but is not appropriate for services that cannot tolerate reduced capacity during deployment.
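To make the batching concrete, here is a rough sketch of how a fleet might be split under the three built-in EC2 configurations. It is a simplification: CodeDeploy actually enforces a minimum number of healthy hosts rather than fixed batches, and the functions batch_size and plan_batches are hypothetical names. The rounding for HalfAtATime (half the fleet, rounded down, at least one) is an assumption of this sketch.

```python
from typing import List


def batch_size(config_name: str, fleet_size: int) -> int:
    """Approximate how many instances are updated at once for a built-in config."""
    if fleet_size < 1:
        raise ValueError("fleet_size must be at least 1")
    if config_name == "CodeDeployDefault.OneAtATime":
        return 1
    if config_name == "CodeDeployDefault.HalfAtATime":
        # Assumption: half the fleet, rounded down, but never zero
        return max(1, fleet_size // 2)
    if config_name == "CodeDeployDefault.AllAtOnce":
        return fleet_size
    raise ValueError(f"unknown configuration: {config_name}")


def plan_batches(instances: List[str], config_name: str) -> List[List[str]]:
    """Split the fleet into consecutive batches of the computed size."""
    size = batch_size(config_name, len(instances))
    return [instances[i:i + size] for i in range(0, len(instances), size)]


fleet = [f"i-{n}" for n in range(1, 6)]  # five placeholder instance IDs
for name in (
    "CodeDeployDefault.OneAtATime",
    "CodeDeployDefault.HalfAtATime",
    "CodeDeployDefault.AllAtOnce",
):
    print(name, plan_batches(fleet, name))
```

With five instances, HalfAtATime leaves at least three in service during each batch, while AllAtOnce leaves none, which is why it is rarely suitable for production.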

Blue/Green

In a blue/green deployment CodeDeploy provisions a brand-new set of resources (green) alongside the existing production set (blue). Once the green environment passes health checks, traffic is shifted in a single cut-over. The blue environment is retained for a configurable termination wait time (default: 1 hour for EC2, 0 for ECS) to allow rollback by a simple traffic re-shift. After the wait period, blue resources are terminated. This strategy completely eliminates deployment downtime and makes rollback trivially fast.

Canary

A canary deployment shifts a small initial percentage of traffic (e.g., 10%) to the new version, waits a defined interval while CloudWatch alarms monitor error rates and latency, then shifts the remaining 90% if no alarms fire. If an alarm fires during the wait interval, CodeDeploy automatically rolls back by shifting all traffic back to the original version. Canary is ideal for high-stakes changes where you want to validate production behaviour on real traffic before committing.

Linear

Linear deployments shift traffic in equal increments on a fixed schedule. For example, Linear10PercentEvery1Minute shifts 10% every minute until 100% is served by the new version — completing in 10 minutes. Each interval is monitored by CloudWatch alarms. Linear is a good middle ground between the speed of canary (two-step) and a fully manual staged rollout.
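The difference between canary and linear is easiest to see as the sequence of traffic weights the new version receives. The sketch below models only that sequence, with each step followed by the configured wait; the function names are hypothetical, and capping the final linear step at 100% is an assumption of this model rather than documented behaviour.

```python
from typing import List


def canary_weights(canary_percent: int) -> List[int]:
    """Two steps: the canary slice, then everything."""
    if not 0 < canary_percent < 100:
        raise ValueError("canary_percent must be between 1 and 99")
    return [canary_percent, 100]


def linear_weights(step_percent: int) -> List[int]:
    """Equal increments until the new version serves all traffic."""
    if not 0 < step_percent <= 100:
        raise ValueError("step_percent must be between 1 and 100")
    weights = []
    current = 0
    while current < 100:
        current = min(100, current + step_percent)  # assumption: cap the last step
        weights.append(current)
    return weights


print(canary_weights(10))   # shape of a Canary10Percent... configuration
print(linear_weights(10))   # shape of a Linear10PercentEvery... configuration
```

A canary exposes a regression to a small, fixed slice of traffic once; a linear rollout gives the alarms a fresh evaluation window at every increment.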

Traffic shifting for Lambda and ECS only: Canary and Linear are native to the Lambda and ECS platforms. EC2 blue/green does a full traffic cut-over — there is no native percentage-based traffic shifting for EC2 deployments (you would need AWS Global Accelerator or weighted Route 53 records for that at the EC2 layer).

3. EC2 Deployments — Agent, AppSpec, Lifecycle Hooks

Deploying to EC2 requires the CodeDeploy agent running on each instance. The agent polls the CodeDeploy service endpoint, receives deployment instructions, executes lifecycle hooks, and reports results back. Setting this up correctly — including proper IAM instance profile permissions — is the foundation for reliable EC2 deployments.

Installing the CodeDeploy Agent

For Amazon Linux 2 / Amazon Linux 2023:

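#!/bin/bash
# install-codedeploy-agent.sh
# Run as root on Amazon Linux 2 / AL2023
set -euo pipefail

yum update -y
yum install -y ruby wget

cd /home/ec2-user
# Use IMDSv2: request a session token first. Instances that enforce IMDSv2
# (the default on AL2023 AMIs) reject unauthenticated metadata requests.
TOKEN=$(curl -s -X PUT http://169.254.169.254/latest/api/token \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
REGION=$(curl -s -H "X-aws-ec2-metadata-token: ${TOKEN}" \
  http://169.254.169.254/latest/meta-data/placement/region)
wget "https://aws-codedeploy-${REGION}.s3.${REGION}.amazonaws.com/latest/install"

chmod +x ./install
./install auto

systemctl enable codedeploy-agent
systemctl start codedeploy-agent
systemctl status codedeploy-agent --no-pager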

Add this script to your EC2 launch template or AMI user data so every new instance in your Auto Scaling Group is agent-ready at launch. The instance profile must include AmazonEC2RoleforAWSCodeDeploy (or equivalent inline policy) to allow the agent to pull revisions from S3 and report status.

Complete appspec.yml with Lifecycle Hooks

A production-grade EC2 appspec.yml covering the core in-place lifecycle events, each backed by a shell script:

# appspec.yml — production EC2 deployment
version: 0.0
os: linux

files:
  - source: /app
    destination: /opt/myapp
  - source: /conf
    destination: /etc/myapp

permissions:
  - object: /opt/myapp
    owner: myapp
    group: myapp
    mode: "755"
    type: [directory]
  - object: /opt/myapp/bin
    owner: myapp
    group: myapp
    mode: "755"
    type: [file]

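hooks:
  # Listed in the order CodeDeploy runs them during an in-place deployment.
  # ApplicationStop runs the script from the previously deployed revision.
  ApplicationStop:
    - location: scripts/stop_app.sh
      timeout: 60
      runas: root
  BeforeInstall:
    - location: scripts/before_install.sh
      timeout: 120
      runas: root
  AfterInstall:
    - location: scripts/after_install.sh
      timeout: 120
      runas: root
  ApplicationStart:
    - location: scripts/start_app.sh
      timeout: 60
      runas: root
  ValidateService:
    - location: scripts/validate.sh
      timeout: 180   # validate.sh polls for up to 30 x 5 s = 150 s
      runas: ec2-user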

Lifecycle Hook Shell Scripts

#!/bin/bash
# scripts/before_install.sh
# Runs BEFORE files are copied to the instance
# Use this to: remove old application, create directories, install OS packages

set -e

APP_DIR=/opt/myapp
BACKUP_DIR=/opt/myapp-backup

# Back up current installation
if [ -d "$APP_DIR" ]; then
  echo "Backing up current app to $BACKUP_DIR"
  rm -rf "$BACKUP_DIR"
  cp -a "$APP_DIR" "$BACKUP_DIR"
fi

# Create app user if it doesn't exist
id -u myapp &>/dev/null || useradd -r -s /bin/false myapp

# Install required system packages
yum install -y java-21-amazon-corretto-headless

#!/bin/bash
# scripts/after_install.sh
# Runs AFTER files are copied but BEFORE the app starts
# Use this to: configure, set permissions, render config templates

set -e
APP_DIR=/opt/myapp

# Pull secrets from SSM Parameter Store
DB_HOST=$(aws ssm get-parameter \
  --name /myapp/prod/db-host \
  --query Parameter.Value \
  --output text \
  --region us-east-1)

DB_PASS=$(aws ssm get-parameter \
  --name /myapp/prod/db-password \
  --with-decryption \
  --query Parameter.Value \
  --output text \
  --region us-east-1)

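# Write application properties (restrictive umask so the file holding the
# password is never world-readable, even before chmod runs)
umask 077
cat > /etc/myapp/application.properties << EOF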
spring.datasource.url=jdbc:mysql://${DB_HOST}:3306/mydb
spring.datasource.password=${DB_PASS}
server.port=8080
EOF

chown myapp:myapp /etc/myapp/application.properties
chmod 640 /etc/myapp/application.properties

#!/bin/bash
# scripts/validate.sh
# Runs AFTER ApplicationStart — must exit 0 for deployment to succeed

set -e

MAX_ATTEMPTS=30
ATTEMPT=0
URL="http://localhost:8080/actuator/health"

echo "Waiting for application health check..."
while [ $ATTEMPT -lt $MAX_ATTEMPTS ]; do
  # curl already prints 000 when the connection fails; "|| true" only stops set -e from aborting
  HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" "$URL" || true)
  if [ "$HTTP_CODE" = "200" ]; then
    echo "Health check passed (HTTP 200)"
    exit 0
  fi
  echo "Attempt $((ATTEMPT+1))/$MAX_ATTEMPTS — HTTP $HTTP_CODE — waiting 5s..."
  sleep 5
  ATTEMPT=$((ATTEMPT+1))
done

echo "Health check FAILED after $MAX_ATTEMPTS attempts"
exit 1

Creating the CodeDeploy Application and Deployment Group (AWS CLI)

# 1. Create application
aws deploy create-application \
  --application-name myapp \
  --compute-platform Server

# 2. Create deployment group targeting an Auto Scaling Group
aws deploy create-deployment-group \
  --application-name myapp \
  --deployment-group-name myapp-production \
  --service-role-arn arn:aws:iam::123456789012:role/CodeDeployRole \
  --auto-scaling-groups myapp-asg-prod \
  --deployment-config-name CodeDeployDefault.HalfAtATime \
  --deployment-style deploymentType=IN_PLACE,deploymentOption=WITH_TRAFFIC_CONTROL \
  --load-balancer-info 'targetGroupInfoList=[{name=myapp-tg}]' \
  --auto-rollback-configuration enabled=true,events=DEPLOYMENT_FAILURE,DEPLOYMENT_STOP_ON_ALARM \
  --alarm-configuration \
    'enabled=true,alarms=[{name=myapp-5xx-errors},{name=myapp-high-latency}]'

# 3. Deploy a revision from S3
aws deploy create-deployment \
  --application-name myapp \
  --deployment-group-name myapp-production \
  --s3-location bucket=myapp-artifacts,key=releases/myapp-1.2.3.zip,bundleType=zip \
  --description "Release 1.2.3"

Pro tip: Always set --auto-rollback-configuration when creating the deployment group. Tying rollback to both DEPLOYMENT_FAILURE and CloudWatch alarm breach gives you two independent safety nets — scripted health checks plus production signal monitoring.
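The same deployment can be started from Python. The sketch below only builds the request dictionary, so it runs without AWS credentials; the keys follow the boto3 CodeDeploy create_deployment parameters, while the helper name build_deployment_request is hypothetical. Passing autoRollbackConfiguration here is optional, since the deployment group already defines one.

```python
from typing import Any, Dict


def build_deployment_request(
    application: str,
    deployment_group: str,
    bucket: str,
    key: str,
    description: str = "",
) -> Dict[str, Any]:
    """Assemble keyword arguments for codedeploy.create_deployment (S3 ZIP revision)."""
    return {
        "applicationName": application,
        "deploymentGroupName": deployment_group,
        "revision": {
            "revisionType": "S3",
            "s3Location": {"bucket": bucket, "key": key, "bundleType": "zip"},
        },
        "description": description,
        # Mirrors the rollback triggers used on the deployment group above
        "autoRollbackConfiguration": {
            "enabled": True,
            "events": ["DEPLOYMENT_FAILURE", "DEPLOYMENT_STOP_ON_ALARM"],
        },
    }


request = build_deployment_request(
    application="myapp",
    deployment_group="myapp-production",
    bucket="myapp-artifacts",
    key="releases/myapp-1.2.3.zip",
    description="Release 1.2.3",
)
print(request)

# With credentials configured, the request would be sent like this:
#   import boto3
#   response = boto3.client("codedeploy").create_deployment(**request)
#   print(response["deploymentId"])
```

Keeping the request construction separate from the API call makes it easy to unit-test release tooling without touching a real deployment group.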

4. ECS Blue/Green — ALB Listener Switching with Terraform

ECS blue/green deployments via CodeDeploy are the gold standard for containerised microservices on AWS. CodeDeploy manages the transition from the current task set (blue) to a replacement task set (green) by shifting weight between two ALB target groups. The ECS service, ALB, and CodeDeploy deployment group must all be configured to work together — and Terraform is the cleanest way to express this relationship as code.

ECS Blue/Green Architecture

The key components are:

ALB with two target groups — myapp-blue-tg (current production) and myapp-green-tg (new version). The ALB production listener forwards 100% to blue; during deployment, CodeDeploy gradually shifts weight to green.
ECS service — Created with deploymentController: type: CODE_DEPLOY. This disables the standard ECS rolling update and hands control to CodeDeploy.
CodeDeploy deployment group — References both target groups and the ECS cluster/service. After the green task set is healthy, CodeDeploy shifts the ALB listener.
appspec.yml in S3 — Points to the new ECS task definition ARN and the container/port mapping.

Terraform — ALB + Target Groups + ECS Service

# ── ALB Target Groups ───────────────────────────────────────────
resource "aws_lb_target_group" "blue" {
  name        = "myapp-blue-tg"
  port        = 8080
  protocol    = "HTTP"
  vpc_id      = var.vpc_id
  target_type = "ip"   # required for Fargate

  health_check {
    path                = "/actuator/health"
    healthy_threshold   = 2
    unhealthy_threshold = 3
    interval            = 15
    timeout             = 5
    matcher             = "200"
  }
}

resource "aws_lb_target_group" "green" {
  name        = "myapp-green-tg"
  port        = 8080
  protocol    = "HTTP"
  vpc_id      = var.vpc_id
  target_type = "ip"

  health_check {
    path                = "/actuator/health"
    healthy_threshold   = 2
    unhealthy_threshold = 3
    interval            = 15
    timeout             = 5
    matcher             = "200"
  }
}

# ── ALB Production Listener (port 443) ──────────────────────────
resource "aws_lb_listener" "production" {
  load_balancer_arn = aws_lb.myapp.arn
  port              = 443
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-TLS13-1-2-2021-06"
  certificate_arn   = var.acm_certificate_arn

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.blue.arn
  }

  lifecycle {
    # Prevent Terraform from reverting ALB rule after CodeDeploy shifts traffic
    ignore_changes = [default_action]
  }
}

# ── ALB Test Listener (port 8080) — CodeDeploy runs pre-traffic hooks here
resource "aws_lb_listener" "test" {
  load_balancer_arn = aws_lb.myapp.arn
  port              = 8080
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.green.arn
  }

  lifecycle {
    ignore_changes = [default_action]
  }
}

# ── ECS Service — controlled by CodeDeploy ──────────────────────
resource "aws_ecs_service" "myapp" {
  name            = "myapp"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.myapp.arn
  desired_count   = 4
  launch_type     = "FARGATE"

  deployment_controller {
    type = "CODE_DEPLOY"   # hands deployment control to CodeDeploy
  }

  network_configuration {
    subnets         = var.private_subnet_ids
    security_groups = [aws_security_group.ecs_tasks.id]
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.blue.arn
    container_name   = "myapp"
    container_port   = 8080
  }

  lifecycle {
    ignore_changes = [task_definition, load_balancer]
  }
}

# ── CodeDeploy Application ───────────────────────────────────────
resource "aws_codedeploy_app" "myapp" {
  name             = "myapp-ecs"
  compute_platform = "ECS"
}

# ── CodeDeploy Deployment Group ──────────────────────────────────
resource "aws_codedeploy_deployment_group" "production" {
  app_name               = aws_codedeploy_app.myapp.name
  deployment_group_name  = "production"
  service_role_arn       = aws_iam_role.codedeploy.arn
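  deployment_config_name = "CodeDeployDefault.ECSCanary10Percent5Minutes"

  # ECS deployment groups must use blue/green with traffic control
  deployment_style {
    deployment_option = "WITH_TRAFFIC_CONTROL"
    deployment_type   = "BLUE_GREEN"
  }

  blue_green_deployment_config {
    deployment_ready_option {
      action_on_timeout = "CONTINUE_DEPLOYMENT"
    }
    terminate_blue_instances_on_deployment_success {
      action                           = "TERMINATE"
      termination_wait_time_in_minutes = 60   # example value: how long blue stays available for rollback
    }
  }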

  ecs_service {
    cluster_name = aws_ecs_cluster.main.name
    service_name = aws_ecs_service.myapp.name
  }

  load_balancer_info {
    target_group_pair_info {
      prod_traffic_route {
        listener_arns = [aws_lb_listener.production.arn]
      }
      test_traffic_route {
        listener_arns = [aws_lb_listener.test.arn]
      }
      target_group {
        name = aws_lb_target_group.blue.name
      }
      target_group {
        name = aws_lb_target_group.green.name
      }
    }
  }

  auto_rollback_configuration {
    enabled = true
    events  = ["DEPLOYMENT_FAILURE", "DEPLOYMENT_STOP_ON_ALARM"]
  }

  alarm_configuration {
    alarms  = [aws_cloudwatch_metric_alarm.error_rate.name]
    enabled = true
  }
}

ECS appspec.yml

# appspec.yml — ECS blue/green deployment
# Upload this to S3; reference it in create-deployment call
version: 0.0
Resources:
  - TargetService:
      Type: AWS::ECS::Service
      Properties:
        TaskDefinition: "arn:aws:ecs:us-east-1:123456789012:task-definition/myapp:42"
        LoadBalancerInfo:
          ContainerName: "myapp"
          ContainerPort: 8080
        PlatformVersion: "LATEST"
        NetworkConfiguration:
          AwsvpcConfiguration:
            Subnets:
              - subnet-0abc1234
              - subnet-0def5678
            SecurityGroups:
              - sg-0aabbccdd
            AssignPublicIp: "DISABLED"
Hooks:
  - BeforeAllowTraffic: "arn:aws:lambda:us-east-1:123456789012:function:myapp-pretraffic-check"
  - AfterAllowTraffic: "arn:aws:lambda:us-east-1:123456789012:function:myapp-posttraffic-check"
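Because the task definition ARN changes with every release, the appspec is usually generated rather than hand-edited. CodeDeploy accepts the AppSpec in JSON as well as YAML, so a minimal generator needs only the standard library. The function name render_ecs_appspec is hypothetical, and the sketch omits the optional network configuration and hooks shown above.

```python
import json


def render_ecs_appspec(task_definition_arn: str, container_name: str, container_port: int) -> str:
    """Render a minimal ECS AppSpec as a JSON string."""
    appspec = {
        "version": 0.0,
        "Resources": [
            {
                "TargetService": {
                    "Type": "AWS::ECS::Service",
                    "Properties": {
                        "TaskDefinition": task_definition_arn,
                        "LoadBalancerInfo": {
                            "ContainerName": container_name,
                            "ContainerPort": container_port,
                        },
                    },
                }
            }
        ],
    }
    return json.dumps(appspec, indent=2)


# Placeholder ARN for the task definition revision produced by the build
rendered = render_ecs_appspec(
    "arn:aws:ecs:us-east-1:123456789012:task-definition/myapp:43",
    "myapp",
    8080,
)
print(rendered)
```

The container name and port must match the load_balancer block of the ECS service, otherwise CodeDeploy cannot register the green task set with the target group.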

The lifecycle ignore_changes blocks are critical: without ignore_changes = [task_definition, load_balancer] on the ECS service and ignore_changes = [default_action] on the ALB listeners, Terraform will fight CodeDeploy. Every terraform apply would point the listeners back at the target groups declared in code and reset the service to the Terraform-managed task definition, undoing CodeDeploy's traffic shift.

5. Lambda Canary and Linear — Traffic Hooks and CloudWatch Alarm Rollback

Lambda deployments with CodeDeploy are elegant: CodeDeploy shifts traffic between two Lambda function versions using a weighted alias. No infrastructure to provision, no instances to swap — just version pointers. Combined with pre/post traffic Lambda hooks and CloudWatch alarm rollback, you get production-grade canary deploys in minutes.

How Lambda Traffic Shifting Works

A Lambda alias can point to two versions with weighted routing. For example, alias live can route 90% to version 5 and 10% to version 6. CodeDeploy manages this weight automatically — you publish a new version, tell CodeDeploy, and it handles the alias weight progression according to your chosen configuration (canary or linear). If an alarm fires, it resets the alias to 100% on the old version instantly.
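To show what a weighted alias looks like at the API level, the sketch below builds the arguments one would pass to the Lambda update_alias call for a given split. CodeDeploy performs the equivalent updates for you; this is only an illustration, the helper name alias_update_request is hypothetical, and nothing is sent to AWS.

```python
from typing import Any, Dict


def alias_update_request(
    function_name: str,
    alias: str,
    old_version: str,
    new_version: str,
    new_version_percent: int,
) -> Dict[str, Any]:
    """Build lambda.update_alias keyword arguments for a traffic split."""
    if not 0 <= new_version_percent <= 100:
        raise ValueError("new_version_percent must be between 0 and 100")

    if new_version_percent == 100:
        # Shift complete: the alias points at the new version, no extra weights
        primary, weights = new_version, {}
    elif new_version_percent == 0:
        # Rolled back: all traffic on the old version
        primary, weights = old_version, {}
    else:
        # The alias keeps the old version as primary and routes a fraction to the new one
        primary, weights = old_version, {new_version: new_version_percent / 100}

    return {
        "FunctionName": function_name,
        "Name": alias,
        "FunctionVersion": primary,
        "RoutingConfig": {"AdditionalVersionWeights": weights},
    }


canary = alias_update_request("myapp-api", "live", "5", "6", 10)
done = alias_update_request("myapp-api", "live", "5", "6", 100)
print(canary)
print(done)
```

A rollback is the 0 percent case: the alias goes back to the old version with an empty weight map, which is why it takes effect almost immediately.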

Lambda appspec.yml

# appspec.yml — Lambda canary deployment
version: 0.0
Resources:
  - MyLambdaFunction:
      Type: AWS::Lambda::Function
      Properties:
        Name: "myapp-api"
        Alias: "live"
        CurrentVersion: "5"   # version numbers, not ARNs
        TargetVersion:  "6"

Hooks:
  - BeforeAllowTraffic: "arn:aws:lambda:us-east-1:123456789012:function:myapp-pretraffic"
  - AfterAllowTraffic:  "arn:aws:lambda:us-east-1:123456789012:function:myapp-posttraffic"

Pre-Traffic Hook — Synthetic Validation

"""
myapp-pretraffic Lambda function.
Runs BEFORE any production traffic hits the new version.
Must call codedeploy.put_lifecycle_event_hook_execution_status()
"""
import boto3, json, os

codedeploy = boto3.client("codedeploy")

def handler(event, context):
    deployment_id = event["DeploymentId"]
    hook_id       = event["LifecycleEventHookExecutionId"]
    status        = "Succeeded"

    try:
        # Invoke the new version directly (bypassing alias) for smoke test
        lambda_client = boto3.client("lambda")
        response = lambda_client.invoke(
            FunctionName  = os.environ["TARGET_FUNCTION_ARN"],
            InvocationType= "RequestResponse",
            Payload       = json.dumps({"httpMethod": "GET", "path": "/health"})
        )
        payload = json.loads(response["Payload"].read())
        if response.get("FunctionError") or payload.get("statusCode") != 200:
            status = "Failed"
            print(f"Pre-traffic smoke test FAILED: {payload}")
        else:
            print(f"Pre-traffic smoke test PASSED: {payload}")

    except Exception as e:
        print(f"Pre-traffic hook exception: {e}")
        status = "Failed"

    finally:
        codedeploy.put_lifecycle_event_hook_execution_status(
            deploymentId=deployment_id,
            lifecycleEventHookExecutionId=hook_id,
            status=status
        )

Post-Traffic Hook — Metric Validation

"""
myapp-posttraffic Lambda function.
Runs AFTER traffic shifting to the new version has completed (AfterAllowTraffic).
Checks CloudWatch metrics for the new version and fails the deployment if the error rate is high.
"""
import boto3, datetime, os

codedeploy  = boto3.client("codedeploy")
cloudwatch  = boto3.client("cloudwatch")

def handler(event, context):
    deployment_id = event["DeploymentId"]
    hook_id       = event["LifecycleEventHookExecutionId"]
    status        = "Succeeded"

    try:
        end   = datetime.datetime.now(datetime.timezone.utc)  # utcnow() is deprecated since Python 3.12
        start = end - datetime.timedelta(minutes=10)

        # Get error count for the new version
        errors = cloudwatch.get_metric_statistics(
            Namespace  = "AWS/Lambda",
            MetricName = "Errors",
            Dimensions = [
                {"Name": "FunctionName", "Value": "myapp-api"},
                {"Name": "Resource",     "Value": f"myapp-api:{os.environ['NEW_VERSION']}"}
            ],
            StartTime   = start,
            EndTime     = end,
            Period      = 600,
            Statistics  = ["Sum"]
        )
        invocations = cloudwatch.get_metric_statistics(
            Namespace  = "AWS/Lambda",
            MetricName = "Invocations",
            Dimensions = [
                {"Name": "FunctionName", "Value": "myapp-api"},
                {"Name": "Resource",     "Value": f"myapp-api:{os.environ['NEW_VERSION']}"}
            ],
            StartTime   = start,
            EndTime     = end,
            Period      = 600,
            Statistics  = ["Sum"]
        )
        error_sum       = sum(d["Sum"] for d in errors["Datapoints"])
        invocation_sum  = sum(d["Sum"] for d in invocations["Datapoints"])
        error_rate      = (error_sum / invocation_sum * 100) if invocation_sum > 0 else 0

        print(f"Error rate for new version: {error_rate:.2f}%")
        if error_rate > 1.0:
            status = "Failed"
            print("Post-traffic check FAILED — error rate exceeds 1%")

    except Exception as e:
        print(f"Post-traffic hook exception: {e}")
        status = "Failed"

    finally:
        codedeploy.put_lifecycle_event_hook_execution_status(
            deploymentId=deployment_id,
            lifecycleEventHookExecutionId=hook_id,
            status=status
        )

Deployment Configurations for Lambda

# Canary: 10% for 5 minutes then all-at-once
aws deploy create-deployment \
  --application-name myapp-lambda \
  --deployment-group-name production \
  --deployment-config-name CodeDeployDefault.LambdaCanary10Percent5Minutes \
  --revision revisionType=S3,s3Location="{bucket=myapp-appspec,key=appspec.yaml,bundleType=YAML}"

# Linear: 10% every minute (10 minutes to full rollout)
aws deploy create-deployment \
  --application-name myapp-lambda \
  --deployment-group-name production \
  --deployment-config-name CodeDeployDefault.LambdaLinear10PercentEvery1Minute \
  --revision revisionType=S3,s3Location="{bucket=myapp-appspec,key=appspec.yaml,bundleType=YAML}"

# Custom canary: 25% for 2 minutes, then the remaining 75%
aws deploy create-deployment-config \
  --deployment-config-name MyCanary25Percent2Min \
  --compute-platform Lambda \
  --traffic-routing-config \
    type=TimeBasedCanary,timeBasedCanary="{canaryPercentage=25,canaryInterval=2}"

6. Rollback Configuration — Automatic and Manual

Rollback in CodeDeploy means re-deploying the last known good revision to every affected instance, task, or function version. For ECS and Lambda the rollback is effectively instant — just a pointer flip. For EC2 it triggers a full deployment of the previous revision, which takes as long as a normal deployment.

Automatic Rollback Triggers

Configure automatic rollback at the deployment group level. There are three triggers:

DEPLOYMENT_FAILURE — Rolls back when any lifecycle hook script exits non-zero or when health checks fail.
DEPLOYMENT_STOP_ON_ALARM — Rolls back when a linked CloudWatch alarm enters the ALARM state during deployment.
DEPLOYMENT_STOP_ON_REQUEST — Rolls back when a user or pipeline manually stops the deployment.

# Update deployment group to add alarm-based rollback
aws deploy update-deployment-group \
  --application-name myapp \
  --current-deployment-group-name myapp-production \
  --auto-rollback-configuration \
    enabled=true,events=DEPLOYMENT_FAILURE,DEPLOYMENT_STOP_ON_ALARM \
  --alarm-configuration \
    'enabled=true,alarms=[{name=myapp-5xx-errors},{name=myapp-p99-latency}]'

CloudWatch Alarm Setup for Rollback

# Terraform — CloudWatch alarm for API error rate
resource "aws_cloudwatch_metric_alarm" "error_rate" {
  alarm_name          = "myapp-5xx-errors"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  threshold           = 5

  metric_query {
    id          = "error_rate"
    expression  = "errors / invocations * 100"
    label       = "Error Rate (%)"
    return_data = true
  }
  metric_query {
    id = "errors"
    metric {
      namespace   = "AWS/Lambda"
      metric_name = "Errors"
      dimensions  = { FunctionName = "myapp-api" }
      period      = 60
      stat        = "Sum"
    }
  }
  metric_query {
    id = "invocations"
    metric {
      namespace   = "AWS/Lambda"
      metric_name = "Invocations"
      dimensions  = { FunctionName = "myapp-api" }
      period      = 60
      stat        = "Sum"
    }
  }

  alarm_description = "Rolls back CodeDeploy if error rate exceeds 5%"
  treat_missing_data = "notBreaching"
}

Manual Rollback via CLI

# Stop an in-progress deployment and roll back
aws deploy stop-deployment \
  --deployment-id d-ABC123DEF \
  --auto-rollback-enabled

# Manually re-deploy a previous revision: find a successful deployment first.
# Check that the ID returned is the known-good release and not the one you
# are trying to undo.
LAST_DEPLOYMENT_ID=$(aws deploy list-deployments \
  --application-name myapp \
  --deployment-group-name myapp-production \
  --include-only-statuses Succeeded \
  --query 'deployments[0]' \
  --output text)

# Inspect the revision used by that deployment
aws deploy get-deployment \
  --deployment-id "$LAST_DEPLOYMENT_ID" \
  --query 'deploymentInfo.revision'

# Redeploy that revision
aws deploy create-deployment \
  --application-name myapp \
  --deployment-group-name myapp-production \
  --s3-location bucket=myapp-artifacts,key=releases/myapp-1.2.2.zip,bundleType=zip \
  --description "Manual rollback to 1.2.2"

Note: For ECS blue/green, CodeDeploy retains the original (blue) task set for the termination wait time configured on the deployment group. If you catch a problem within that window, you can roll back by stopping the deployment: CodeDeploy shifts all traffic back to blue and drains the green task set. Once the termination wait expires and blue is terminated, the only way back is a new deployment of the previous task definition.
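The rollback window described in the note is simple time arithmetic, which on-call tooling can use to decide between stopping the deployment and starting a new one. A minimal sketch, assuming you record when the traffic shift completed; the function name blue_still_available is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def blue_still_available(
    shift_completed_at: datetime,
    termination_wait_minutes: int,
    now: datetime,
) -> bool:
    """True while the blue task set has not yet reached its termination time."""
    return now < shift_completed_at + timedelta(minutes=termination_wait_minutes)


# Example timestamps (placeholders): traffic shifted at 12:00 UTC, 60-minute wait
shifted = datetime(2026, 6, 1, 12, 0, tzinfo=timezone.utc)

print(blue_still_available(shifted, 60, datetime(2026, 6, 1, 12, 45, tzinfo=timezone.utc)))
print(blue_still_available(shifted, 60, datetime(2026, 6, 1, 13, 5, tzinfo=timezone.utc)))
```

Inside the window, stopping the deployment with rollback enabled is the fast path; outside it, redeploying the previous task definition is the only option.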

7. CodePipeline Integration — Full Terraform Source→Build→Deploy

Integrating CodeDeploy into a CodePipeline gives you end-to-end automation: a commit to main triggers the pipeline, CodeBuild compiles and tests, an optional manual gate awaits approval, then CodeDeploy rolls out the new version to production. The following Terraform creates a complete GitHub → CodeBuild → CodeDeploy pipeline for a Lambda function.

# ── S3 Artifact Bucket ──────────────────────────────────────────
resource "aws_s3_bucket" "artifacts" {
  bucket        = "myapp-pipeline-artifacts-${data.aws_caller_identity.current.account_id}"
  force_destroy = true
}

resource "aws_s3_bucket_versioning" "artifacts" {
  bucket = aws_s3_bucket.artifacts.id
  versioning_configuration { status = "Enabled" }
}

# ── CodeBuild Project ────────────────────────────────────────────
resource "aws_codebuild_project" "build" {
  name         = "myapp-build"
  service_role = aws_iam_role.codebuild.arn

  artifacts { type = "CODEPIPELINE" }

  environment {
    compute_type    = "BUILD_GENERAL1_SMALL"
    image           = "aws/codebuild/standard:7.0"
    type            = "LINUX_CONTAINER"
    privileged_mode = true   # only needed if the build runs Docker; a zip-packaged Lambda build can set this to false

    environment_variable {
      name  = "AWS_ACCOUNT_ID"
      value = data.aws_caller_identity.current.account_id
    }
    environment_variable {
      name  = "LAMBDA_FUNCTION"
      value = aws_lambda_function.myapp.function_name
    }
  }

  source {
    type      = "CODEPIPELINE"
    buildspec = file("${path.module}/buildspec.yml")
  }
}

# ── CodePipeline ─────────────────────────────────────────────────
resource "aws_codepipeline" "myapp" {
  name     = "myapp-pipeline"
  role_arn = aws_iam_role.codepipeline.arn

  artifact_store {
    location = aws_s3_bucket.artifacts.bucket
    type     = "S3"
  }

  stage {
    name = "Source"
    action {
      name             = "GitHubSource"
      category         = "Source"
      owner            = "AWS"
      provider         = "CodeStarSourceConnection"
      version          = "1"
      output_artifacts = ["SourceArtifact"]
      configuration = {
        ConnectionArn        = var.codestar_connection_arn
        FullRepositoryId     = "myorg/myapp"
        BranchName           = "main"
        DetectChanges        = "true"
      }
    }
  }

  stage {
    name = "Build"
    action {
      name             = "BuildAndPackage"
      category         = "Build"
      owner            = "AWS"
      provider         = "CodeBuild"
      version          = "1"
      input_artifacts  = ["SourceArtifact"]
      output_artifacts = ["BuildArtifact"]
      configuration = {
        ProjectName = aws_codebuild_project.build.name
      }
    }
  }

  stage {
    name = "Approve"
    action {
      name     = "ApproveProduction"
      category = "Approval"
      owner    = "AWS"
      provider = "Manual"
      version  = "1"
      configuration = {
        NotificationArn = aws_sns_topic.pipeline.arn
        CustomData      = "Review build artifacts and approve for production deploy."
      }
    }
  }

  stage {
    name = "Deploy"
    action {
      name            = "DeployToLambda"
      category        = "Deploy"
      owner           = "AWS"
      provider        = "CodeDeploy"
      version         = "1"
      input_artifacts = ["BuildArtifact"]
      configuration = {
        ApplicationName     = aws_codedeploy_app.myapp.name
        DeploymentGroupName = aws_codedeploy_deployment_group.production.deployment_group_name
      }
    }
  }
}

buildspec.yml for Lambda Pipeline

# buildspec.yml: packages the Lambda function and generates appspec.yaml
version: 0.2
phases:
  install:
    runtime-versions:
      python: 3.12
    commands:
      - pip install --upgrade pip

  pre_build:
    commands:
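      - echo "Installing dependencies and running unit tests"
      - pip install -r requirements.txt -t ./package/
      - pip install pytest
      # Dependencies are installed into ./package, so expose them to the test run
      - PYTHONPATH=./package python -m pytest tests/ -v --tb=short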

  build:
    commands:
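      - echo "Packaging Lambda"
      - cp -r package/* .
      # package/ was just copied to the root, so exclude it to avoid shipping every dependency twice
      - zip -r function.zip . -x "tests/*" -x "package/*" -x "*.git*" -x "requirements*.txt"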
      - |
        NEW_VERSION=$(aws lambda update-function-code \
          --function-name $LAMBDA_FUNCTION \
          --zip-file fileb://function.zip \
          --query Version \
          --output text \
          --publish)
        echo "Published Lambda version: $NEW_VERSION"
        CURRENT_VERSION=$(aws lambda get-alias \
          --function-name $LAMBDA_FUNCTION \
          --name live \
          --query FunctionVersion --output text)
        # Generate appspec.yaml shifting the alias from the current to the new version.
        # CurrentVersion and TargetVersion take plain version numbers, not ARNs.
        cat > appspec.yaml << EOF
        version: 0.0
        Resources:
          - MyLambdaFunction:
              Type: AWS::Lambda::Function
              Properties:
                Name: "$LAMBDA_FUNCTION"
                Alias: "live"
                CurrentVersion: "${CURRENT_VERSION}"
                TargetVersion: "${NEW_VERSION}"
        Hooks:
          - BeforeAllowTraffic: "arn:aws:lambda:${AWS_DEFAULT_REGION}:${AWS_ACCOUNT_ID}:function:myapp-pretraffic"
          - AfterAllowTraffic:  "arn:aws:lambda:${AWS_DEFAULT_REGION}:${AWS_ACCOUNT_ID}:function:myapp-posttraffic"
        EOF

artifacts:
  files:
    - appspec.yaml
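
Shell heredocs inside a YAML block scalar are easy to break, because indentation and variable expansion interact. As a hedged alternative, the same AppSpec text could be rendered by a small Python helper invoked from the build phase. This is only a sketch: `render_lambda_appspec` and its parameters are illustrative names, not part of any AWS SDK, and the choice to reject identical versions is an assumption of this example rather than documented CodeDeploy behaviour.

```python
def render_lambda_appspec(function_name, alias, current_version, target_version,
                          before_hook=None, after_hook=None):
    """Return the text of a Lambda AppSpec file as a YAML string.

    current_version and target_version are Lambda version numbers
    (for example "3" and "4"), not ARNs.
    """
    if str(current_version) == str(target_version):
        # Assumption of this sketch: with identical versions there is
        # no traffic to shift, so fail early in the build.
        raise ValueError("current and target version are identical")

    lines = [
        "version: 0.0",
        "Resources:",
        "  - MyLambdaFunction:",
        "      Type: AWS::Lambda::Function",
        "      Properties:",
        f'        Name: "{function_name}"',
        f'        Alias: "{alias}"',
        f'        CurrentVersion: "{current_version}"',
        f'        TargetVersion: "{target_version}"',
    ]

    # Hooks are optional; each value is a Lambda function name or ARN.
    hooks = []
    if before_hook:
        hooks.append(f'  - BeforeAllowTraffic: "{before_hook}"')
    if after_hook:
        hooks.append(f'  - AfterAllowTraffic: "{after_hook}"')
    if hooks:
        lines.append("Hooks:")
        lines.extend(hooks)

    return "\n".join(lines) + "\n"
```

In a buildspec, the result could be written to appspec.yaml with a one-line script that reads the function name and versions from environment variables.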

8. Monitoring — CloudWatch Metrics, SNS Notifications, Deployment Events

A successful deployment still has something to teach you, provided you collect data around every deployment event. CodeDeploy emits state-change events to EventBridge, and deployment metrics can be charted in CloudWatch as described below. Combining these with SNS notifications and a CloudWatch dashboard gives your team observability over the full deployment lifecycle.

CodeDeploy CloudWatch Metrics

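The dashboard below reads deployment counts from the AWS/CodeDeploy CloudWatch namespace. Confirm in the CloudWatch console that this namespace carries data in your account before you build alarms on it; the same counts can also be derived from the EventBridge state-change events covered later in this section. The key metrics are: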

DeploymentAttempts — Total deployments started per deployment group.
DeploymentSuccesses — Successful deployments (all targets healthy).
DeploymentFailures — Deployments that ended in a FAILED state.
DeploymentRollbacks — Rollback deployments triggered (automatic or manual).

# Terraform — CloudWatch Dashboard for deployment health
resource "aws_cloudwatch_dashboard" "codedeploy" {
  dashboard_name = "CodeDeploy-MyApp"
  dashboard_body = jsonencode({
    widgets = [
      {
        type = "metric"
        properties = {
          title  = "Deployment Success vs Failure"
          region = "us-east-1" # assumed Region; use the Region you deploy in
          period = 86400
          stat   = "Sum"
          metrics = [
            ["AWS/CodeDeploy", "DeploymentSuccesses", "Application", "myapp", "DeploymentGroup", "production"],
            ["AWS/CodeDeploy", "DeploymentFailures",  "Application", "myapp", "DeploymentGroup", "production"],
            ["AWS/CodeDeploy", "DeploymentRollbacks", "Application", "myapp", "DeploymentGroup", "production"]
          ]
        }
      },
      {
        type = "metric"
        properties = {
          title  = "Lambda Errors per Minute (canary window)"
          region = "us-east-1" # assumed Region; use the Region you deploy in
          period = 60
          stat   = "Sum" # Errors is a count, so Sum is the meaningful statistic
          metrics = [
            ["AWS/Lambda", "Errors", "FunctionName", "myapp-api"]
          ]
        }
      }
    ]
  })
}

EventBridge Rules for Deployment Events

CodeDeploy fires EventBridge events for every state transition. Use EventBridge rules to send notifications to SNS, trigger Lambda, or post to Slack:

resource "aws_cloudwatch_event_rule" "deploy_state" {
  name        = "codedeploy-state-changes"
  description = "Catch CodeDeploy deployment state changes"

  event_pattern = jsonencode({
    source      = ["aws.codedeploy"]
    "detail-type" = ["CodeDeploy Deployment State-change Notification"]
    detail = {
      application       = ["myapp"]
      deploymentGroup   = ["production"]
      state             = ["START", "SUCCESS", "FAILURE", "STOP", "ROLLBACK"]
    }
  })
}

resource "aws_cloudwatch_event_target" "notify_sns" {
  rule = aws_cloudwatch_event_rule.deploy_state.name
  arn  = aws_sns_topic.deployments.arn

  input_transformer {
    input_paths = {
      state       = "$.detail.state"
      deployment  = "$.detail.deploymentId"
      application = "$.detail.application"
      group       = "$.detail.deploymentGroup"
    }
    input_template = "\"[CodeDeploy] <application>/<group>: <deployment> is <state>\""
  }
}

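# The topic also needs an aws_sns_topic_policy that lets events.amazonaws.com
# call sns:Publish; without it EventBridge cannot deliver to the topic.
resource "aws_sns_topic" "deployments" {
  name = "codedeploy-notifications"
}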

resource "aws_sns_topic_subscription" "email" {
  topic_arn = aws_sns_topic.deployments.arn
  protocol  = "email"
  endpoint  = "devops@mycompany.com"
}
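
For the "trigger Lambda" option, the same one-line message can be built in code instead of an input transformer. The handler below is a minimal sketch, assuming the rule targets a Python Lambda function; `format_deployment_message` is a hypothetical helper name, and the field names are those used in the rule's input_paths above. Posting the result to Slack or another channel is left out.

```python
def format_deployment_message(event):
    """Build a one-line notification from a CodeDeploy state-change event."""
    detail = event.get("detail", {})
    return "[CodeDeploy] {app}/{group}: {deployment} is {state}".format(
        app=detail.get("application", "unknown"),
        group=detail.get("deploymentGroup", "unknown"),
        deployment=detail.get("deploymentId", "unknown"),
        state=detail.get("state", "unknown"),
    )


def handler(event, context):
    """Lambda entry point: log the message and return it to the caller."""
    message = format_deployment_message(event)
    print(message)  # appears in the function's CloudWatch Logs
    return {"message": message}
```

Keeping the formatting in a separate pure function makes it testable without deploying anything.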

Querying Deployment History with CLI

# List up to 10 deployment IDs for a deployment group
aws deploy list-deployments \
  --application-name myapp \
  --deployment-group-name production \
  --query 'deployments[:10]' \
  --output table

# Get deployment status, rollback info, and timestamps
# (duration is completeTime minus createTime; there is no duration field)
aws deploy get-deployment \
  --deployment-id d-ABC123DEF \
  --query 'deploymentInfo.{Status:status,Rollback:rollbackInfo,Created:createTime,Started:startTime,Completed:completeTime}'

# List the instances that failed in this deployment (EC2/On-Premises only)
aws deploy list-deployment-instances \
  --deployment-id d-ABC123DEF \
  --instance-status-filter Failed \
  --output table

# Get lifecycle hook execution details for a failed instance
aws deploy get-deployment-instance \
  --deployment-id d-ABC123DEF \
  --instance-id i-0abc1234def567890 \
  --query 'instanceSummary.lifecycleEvents[?status==`Failed`]'

Deployment Frequency and DORA Metrics

The DeploymentSuccesses metric is your raw deployment frequency signal, one of the core DORA (DevOps Research and Assessment) metrics. To compute mean time to recovery (MTTR), run a CloudWatch Logs Insights query over the CodeDeploy API log (delivered via CloudTrail to CloudWatch Logs) and average the time between each FAILURE event and the next SUCCESS event on the same deployment group. On ECS and Lambda, a rollback is a traffic shift back to the previous task set or function version rather than a fresh install, so a well-tuned setup can keep MTTR to a few minutes.
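
The MTTR definition above can be sketched in a few lines of Python. This is an illustration, not a CodeDeploy API: `mean_time_to_recovery` is a hypothetical helper, and it assumes you have already exported (timestamp, state) pairs for one deployment group, for example from the EventBridge events or a Logs Insights query. When several failures occur in a row, this sketch measures from the first one, which is one reasonable reading of "time to recovery".

```python
from datetime import datetime, timedelta


def mean_time_to_recovery(events):
    """Average time from a FAILURE to the next SUCCESS.

    events: iterable of (timestamp: datetime, state: str) pairs for a
    single deployment group. Returns a timedelta, or None when no
    failure was followed by a success.
    """
    failure_started = None
    recoveries = []
    for timestamp, state in sorted(events, key=lambda e: e[0]):
        if state == "FAILURE" and failure_started is None:
            # Start of an outage window; later failures extend it.
            failure_started = timestamp
        elif state == "SUCCESS" and failure_started is not None:
            recoveries.append(timestamp - failure_started)
            failure_started = None
    if not recoveries:
        return None
    return sum(recoveries, timedelta()) / len(recoveries)
```

Other states such as START or STOP are ignored, so the full event stream can be passed in unfiltered.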

Best practice: Set treat_missing_data to notBreaching on the CloudWatch alarms attached to a deployment group. Right after a deployment starts, some metrics will have no datapoints for the first minute because the new version has not yet received enough traffic to produce them, and you do not want an alarm to fire on missing data alone.
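
If you manage alarms from Python rather than Terraform, the equivalent setting is the TreatMissingData parameter of CloudWatch's put_metric_alarm call. The sketch below only builds the parameters; the helper name `deployment_alarm_params`, the alarm naming scheme, and the threshold of one error per minute are assumptions made for illustration. It assumes the alarm watches a Lambda alias, which CloudWatch identifies through the FunctionName and Resource dimensions.

```python
def deployment_alarm_params(function_name, alias="live", threshold=1):
    """Keyword arguments for CloudWatch put_metric_alarm on a Lambda alias."""
    return {
        "AlarmName": f"{function_name}-{alias}-errors",  # hypothetical naming scheme
        "Namespace": "AWS/Lambda",
        "MetricName": "Errors",
        "Dimensions": [
            {"Name": "FunctionName", "Value": function_name},
            {"Name": "Resource", "Value": f"{function_name}:{alias}"},
        ],
        "Statistic": "Sum",
        "Period": 60,
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # No datapoints yet (no traffic) must not count as a breach.
        "TreatMissingData": "notBreaching",
    }


def create_alarm(params):
    """Create or update the alarm; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the helper above has no dependencies

    boto3.client("cloudwatch").put_metric_alarm(**params)
```

Calling `create_alarm(deployment_alarm_params("myapp-api"))` would create the alarm in the Region configured for your credentials.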

Cost Considerations

CodeDeploy itself adds no charge for deployments to EC2, ECS, or Lambda; deployments to on-premises instances are billed per instance update. Beyond that, you pay only for the AWS resources used during a deployment: the extra ECS tasks that run during a blue/green swap, the S3 requests to fetch revision artifacts, and CloudWatch Logs storage for deployment logs. For ECS blue/green at scale, keep the termination wait time short (15 to 30 minutes) to avoid running double the task count longer than necessary. For Lambda deployments the extra cost is effectively zero, because versions and alias pointers are free and only the hook invocations are billed.
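
The effect of the termination wait on ECS blue/green cost is simple arithmetic, sketched below. `extra_blue_green_cost` is a hypothetical helper and the hourly rate in the usage note is a made-up figure; substitute your real per-task price. The estimate covers only the wait after traffic has shifted, not the overlap while the deployment itself is running.

```python
def extra_blue_green_cost(task_count, wait_minutes, hourly_rate_per_task):
    """Cost of keeping the old (blue) task set alive during the termination wait.

    task_count:            tasks in the old task set
    wait_minutes:          termination wait time configured on the deployment group
    hourly_rate_per_task:  your price per task-hour (an input, not looked up)
    """
    if task_count < 0 or wait_minutes < 0 or hourly_rate_per_task < 0:
        raise ValueError("inputs must be non-negative")
    return task_count * (wait_minutes / 60) * hourly_rate_per_task
```

With 40 tasks and an assumed rate of 0.05 per task-hour, a 30-minute wait costs roughly 1.00 per deployment and a 4-hour wait roughly 8.00, which adds up quickly for services that deploy many times a day.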

Frequently Asked Questions

Can I use CodeDeploy with containers not on ECS?

CodeDeploy's native ECS integration requires an ECS service (on the EC2 or Fargate launch type) whose deployment controller is set to CODE_DEPLOY. If you run containers directly on EC2 without ECS, use the EC2/On-Premises platform and run the container lifecycle (docker pull, docker run) from the lifecycle hook scripts referenced in your appspec.yml.

What is the difference between CodeDeployDefault.ECSCanary10Percent5Minutes and a custom config?

The built-in config shifts 10% of traffic to the new task set, waits 5 minutes, then shifts the remaining 90% all at once. A custom deployment configuration lets you choose the percentage and the interval yourself, as either a canary schedule (two steps) or a linear schedule (equal increments), for example 5% every 2 minutes. That gives you finer control over the exposure curve. Use built-in configs for standard services; use custom configs for high-traffic services where even 10% canary exposure is too aggressive.
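
To compare exposure curves before picking a configuration, the two schedule shapes can be modelled in a few lines. This is a simplified sketch with hypothetical function names; it measures time from the first traffic shift and ignores hook execution time and alarm evaluation.

```python
def canary_schedule(first_percent, wait_minutes):
    """Two-step shift: first_percent at minute 0, everything after the wait."""
    if not 0 < first_percent < 100:
        raise ValueError("first_percent must be between 0 and 100")
    return [(0, first_percent), (wait_minutes, 100)]


def linear_schedule(step_percent, interval_minutes):
    """Equal increments of step_percent every interval_minutes until 100%."""
    if step_percent <= 0:
        raise ValueError("step_percent must be positive")
    schedule = []
    minute, shifted = 0, 0
    while shifted < 100:
        shifted = min(100, shifted + step_percent)
        schedule.append((minute, shifted))  # (minutes elapsed, % on new version)
        minute += interval_minutes
    return schedule
```

Under this model, `canary_schedule(10, 5)` describes the built-in ECSCanary10Percent5Minutes behaviour, while `linear_schedule(5, 2)` takes 20 steps and keeps most traffic on the old version for much longer.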

Does CodeDeploy support multi-region deployments?

Not natively. CodeDeploy operates within a single AWS region. For multi-region deployments, create a separate CodeDeploy Application and Deployment Group in each target region and use CodePipeline with cross-region actions to fan out deployments. Each region's pipeline action triggers the corresponding CodeDeploy deployment group independently.

How do I deploy to on-premises servers?

Install the CodeDeploy agent on your on-premises servers, register them as on-premises instances with aws deploy register-on-premises-instance, and tag them (aws deploy add-tags-to-on-premises-instances) so your deployment group's on-premises tag filter matches. The deployment flow is identical to EC2: the agent polls, fetches the revision from S3, and executes lifecycle hooks. IAM credentials for the agent are provided via the on-premises instance configuration file.