AWS Auto Scaling: Policies, Scheduled Scaling, and Predictive Scaling
AWS Auto Scaling is the mechanism that makes cloud economics work — you pay for exactly the compute you need, when you need it, and never overpay for idle capacity during quiet hours. But "just turn on Auto Scaling" is not a strategy. The difference between an ASG that works and one that over-provisions, thrashes, or fails to scale fast enough comes down to understanding which scaling policy to use and how to tune it. This guide covers every major Auto Scaling capability: EC2 Auto Scaling Groups, all three scaling policy types, scheduled scaling, predictive scaling, warm pools, lifecycle hooks, and Application Auto Scaling for ECS, DynamoDB, and Aurora.
Table of Contents
- EC2 Auto Scaling Group Fundamentals
- Launch Templates for ASGs
- Scaling Policies: Target Tracking, Step, and Simple
- Scheduled Scaling
- Predictive Scaling
- Warm Pools for Fast Scale-Out
- Lifecycle Hooks
- ALB Integration and Health Checks
- Application Auto Scaling: ECS, DynamoDB, Aurora
- Best Practices for Cost and Performance
- Frequently Asked Questions
EC2 Auto Scaling Group Fundamentals
An Auto Scaling Group (ASG) is a logical collection of EC2 instances that AWS treats as a fleet. You define the minimum, maximum, and desired capacity. The ASG maintains the desired capacity by launching or terminating instances, replacing unhealthy instances automatically, and distributing instances across multiple Availability Zones for high availability.
Three numbers define the size boundaries of every ASG:
- MinSize — the floor. AWS will never terminate instances below this count, even if all scaling policies say to shrink.
- MaxSize — the ceiling. No matter how much load spikes, AWS will not exceed this count (a guard against runaway cost).
- DesiredCapacity — the current target. Scaling policies and manual updates change this value; the ASG then converges the actual instance count to match it.
ASGs span Availability Zones. When you specify multiple subnets (one per AZ), the ASG uses a balanced distribution strategy: it tries to keep an equal number of instances in each AZ. If an AZ becomes unhealthy or unavailable, the ASG launches replacement capacity in the remaining AZs, then rebalances once the AZ recovers.
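To make the balancing rule concrete, here is a small shell sketch of an even split: every AZ gets the same share, and any remainder is handed out one instance at a time. az_split is a hypothetical helper for illustration, not part of the AWS CLI, and in practice AWS decides which AZ receives the extra instance.

```shell
# az_split TOTAL AZS: hypothetical helper, not an AWS CLI command.
# Splits TOTAL instances as evenly as possible across AZS zones.
az_split() {
  local total=$1 azs=$2
  local base=$(( total / azs ))
  local extra=$(( total % azs ))
  local i=0 out=""
  while [ "$i" -lt "$azs" ]; do
    # The first EXTRA zones absorb the remainder, one instance each
    if [ "$i" -lt "$extra" ]; then
      out="$out$(( base + 1 )) "
    else
      out="$out$base "
    fi
    i=$(( i + 1 ))
  done
  echo "${out% }"   # drop the trailing space
}

az_split 4 3    # prints "2 1 1"
az_split 10 3   # prints "4 3 3"
```

With the three subnets and desired capacity of 4 used below, one AZ carries two instances and the other two carry one each.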
# Create an Auto Scaling Group with the AWS CLI
aws autoscaling create-auto-scaling-group \
--auto-scaling-group-name my-app-asg \
--launch-template LaunchTemplateId=lt-0123456789abcdef0,Version='$Latest' \
--min-size 2 \
--max-size 20 \
--desired-capacity 4 \
--vpc-zone-identifier "subnet-aaa111,subnet-bbb222,subnet-ccc333" \
--health-check-type ELB \
--health-check-grace-period 300 \
--target-group-arns "arn:aws:elasticloadbalancing:us-east-1:123456789:targetgroup/my-tg/abc123" \
--tags "Key=Environment,Value=production,PropagateAtLaunch=true" \
"Key=Name,Value=my-app,PropagateAtLaunch=true"
Use --health-check-type ELB instead of the default EC2. With EC2 health checks, the ASG replaces an instance only when EC2 reports it as impaired (failed status checks) or no longer running. With ELB health checks, the ASG also replaces instances that fail the target group's health check, for example when your application's health endpoint stops returning a success code even though the instance itself is fine.
Launch Templates for ASGs
Every ASG needs a launch template (or the older launch configuration — always prefer launch templates). The launch template specifies the AMI, instance type, key pair, security groups, IAM instance profile, user data, and storage configuration for instances the ASG will launch.
An ASG can also use a mixed instances policy, which references your launch template and runs a blend of On-Demand and Spot Instances across multiple instance types. This is the key to 60–80% cost reduction while maintaining availability:
{
"AutoScalingGroupName": "my-app-asg",
"MixedInstancesPolicy": {
"InstancesDistribution": {
"OnDemandBaseCapacity": 2,
"OnDemandPercentageAboveBaseCapacity": 20,
"SpotAllocationStrategy": "capacity-optimized-prioritized"
},
"LaunchTemplate": {
"LaunchTemplateSpecification": {
"LaunchTemplateId": "lt-0123456789abcdef0",
"Version": "$Latest"
},
"Overrides": [
{"InstanceType": "m6i.large"},
{"InstanceType": "m5.large"},
{"InstanceType": "m5a.large"},
{"InstanceType": "m6a.large"},
{"InstanceType": "m7i.large"}
]
}
},
"MinSize": 2,
"MaxSize": 30,
"DesiredCapacity": 6
}
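A minimal shell sketch of how OnDemandBaseCapacity and OnDemandPercentageAboveBaseCapacity divide the desired capacity between On-Demand and Spot. It assumes every instance type counts as one unit (no instance weights) and that a fractional On-Demand count rounds up in favor of On-Demand; od_spot_split is a hypothetical helper.

```shell
# od_spot_split DESIRED BASE PCT -> prints "<on-demand> <spot>"
# Hypothetical helper; assumes unweighted instance types and round-up
# of fractional On-Demand counts.
od_spot_split() {
  local desired=$1 base=$2 pct=$3
  if [ "$desired" -le "$base" ]; then
    # The whole group fits inside the On-Demand base
    echo "$desired 0"
    return
  fi
  local above=$(( desired - base ))
  # Integer ceiling of above * pct / 100
  local od=$(( base + (above * pct + 99) / 100 ))
  echo "$od $(( desired - od ))"
}

od_spot_split 6 2 20    # prints "3 3"
od_spot_split 30 2 20   # prints "8 22"
```

With the policy above (base 2, 20% above base), the 6-instance group runs 3 On-Demand and 3 Spot instances; at 30 instances the split is 8 and 22.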
capacity-optimized-prioritized launches Spot Instances from the pools with the most available capacity (lowest interruption risk) and honors the order of your instance type overrides on a best-effort basis, optimizing for capacity first. Prefer it over lowest-price in production: saving an extra $0.01/hr is not worth a higher interruption rate.
You can also replace the explicit instance type list with attribute-based instance type selection: specify vCPU and memory ranges instead, and AWS selects every matching instance type automatically. This future-proofs your fleet against new instance generations:
# Attribute-based selection: 2-4 vCPUs, 8-16 GiB RAM, x86_64
aws autoscaling create-auto-scaling-group \
--auto-scaling-group-name attr-based-asg \
--mixed-instances-policy '{
"LaunchTemplate": {
"LaunchTemplateSpecification": {"LaunchTemplateId": "lt-abc", "Version": "$Latest"},
"Overrides": [{
"InstanceRequirements": {
"VCpuCount": {"Min": 2, "Max": 4},
"MemoryMiB": {"Min": 8192, "Max": 16384},
"CpuManufacturers": ["intel", "amd"],
"InstanceGenerations": ["current"]
}
}]
},
"InstancesDistribution": {
"OnDemandBaseCapacity": 2,
"OnDemandPercentageAboveBaseCapacity": 20,
"SpotAllocationStrategy": "capacity-optimized"
}
}' \
--min-size 2 --max-size 20 --desired-capacity 4 \
--vpc-zone-identifier "subnet-aaa111,subnet-bbb222"
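Conceptually, attribute-based selection is a filter over the instance catalog. The shell sketch below keeps every type whose vCPU count and memory fall inside the requested ranges. The catalog is a small hard-coded sample (not the live EC2 catalog), match_types is a hypothetical helper, and real selection also applies the other InstanceRequirements such as CPU manufacturer and instance generation.

```shell
# Sample catalog: "<type> <vCPUs> <memory MiB>" (illustrative subset only)
catalog='m6i.large 2 8192
m6i.xlarge 4 16384
m6i.2xlarge 8 32768
c6i.large 2 4096
r6i.large 2 16384'

# match_types VCPU_MIN VCPU_MAX MEM_MIN MEM_MAX: hypothetical helper that
# prints the matching instance types, one per line, in catalog order.
match_types() {
  printf '%s\n' "$catalog" | awk -v a="$1" -v b="$2" -v c="$3" -v d="$4" \
    '$2 >= a && $2 <= b && $3 >= c && $3 <= d { print $1 }'
}

match_types 2 4 8192 16384   # prints m6i.large, m6i.xlarge and r6i.large
```

c6i.large drops out on memory and m6i.2xlarge on vCPUs; a newly released type with the same shape would match without any change to the group.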
Scaling Policies: Target Tracking, Step, and Simple
AWS provides three types of dynamic scaling policies. Each has a distinct use case and trade-off between simplicity and control.
Target Tracking Scaling (Recommended for Most Cases)
Target tracking is the simplest and most effective policy for the majority of workloads. You declare a target metric value, and the ASG continuously adjusts capacity to maintain that target — similar to a thermostat. AWS automatically creates and manages the CloudWatch alarms behind the scenes.
# Target tracking: maintain average CPU at 60%
aws autoscaling put-scaling-policy \
--auto-scaling-group-name my-app-asg \
--policy-name cpu-target-tracking \
--policy-type TargetTrackingScaling \
--target-tracking-configuration '{
"PredefinedMetricSpecification": {
"PredefinedMetricType": "ASGAverageCPUUtilization"
},
"TargetValue": 60.0,
"DisableScaleIn": false
}'
# Target tracking on ALB RequestCountPerTarget (better for web apps)
aws autoscaling put-scaling-policy \
--auto-scaling-group-name my-app-asg \
--policy-name alb-request-target-tracking \
--policy-type TargetTrackingScaling \
--target-tracking-configuration '{
"PredefinedMetricSpecification": {
"PredefinedMetricType": "ALBRequestCountPerTarget",
"ResourceLabel": "app/my-alb/abc123/targetgroup/my-tg/def456"
},
"TargetValue": 1000.0
}'
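Predefined metrics available for EC2 Auto Scaling target tracking: ASGAverageCPUUtilization, ASGAverageNetworkIn, ASGAverageNetworkOut, and ALBRequestCountPerTarget. For anything else (e.g., SQS backlog per instance), use a CustomizedMetricSpecification. Pick a metric that moves in proportion to capacity, so that doubling the fleet roughly halves it; latency percentiles such as p99 usually don't behave this way and make poor target tracking inputs.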
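The thermostat behavior can be approximated with a proportional rule: if the metric scales inversely with fleet size, the capacity that brings it back to target is ceil(current capacity × current metric / target value). The shell sketch below shows only that simplification; proportional_capacity is a hypothetical helper, it does not reproduce AWS's internal algorithm, and the real ASG also clamps to MinSize/MaxSize and scales in more gradually than it scales out.

```shell
# proportional_capacity CURRENT METRIC TARGET: hypothetical helper.
# Integer-only approximation of the capacity that returns METRIC to TARGET.
proportional_capacity() {
  local current=$1 metric=$2 target=$3
  # Integer ceiling of current * metric / target
  echo $(( (current * metric + target - 1) / target ))
}

proportional_capacity 4 90 60       # prints 6: 4 instances at 90% CPU, target 60%
proportional_capacity 10 1500 1000  # prints 15: 1500 requests/target, target 1000
```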
Step Scaling
Step scaling gives you precise control over how aggressively the ASG responds at different alarm thresholds. You define steps — ranges of metric values — and specify how many instances to add or remove in each step. This is useful when you want a conservative response to mild load but an aggressive response to a spike.
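# Step scaling: scale out 2 at 70% CPU, 4 at 85% CPU, 6 at 95% CPU
# Capture the policy ARN so the CloudWatch alarm below can invoke it
POLICY_ARN=$(aws autoscaling put-scaling-policy \
--auto-scaling-group-name my-app-asg \
--policy-name step-scale-out \
--policy-type StepScaling \
--adjustment-type ChangeInCapacity \
--step-adjustments '[
{
"MetricIntervalLowerBound": 0,
"MetricIntervalUpperBound": 15,
"ScalingAdjustment": 2
},
{
"MetricIntervalLowerBound": 15,
"MetricIntervalUpperBound": 25,
"ScalingAdjustment": 4
},
{
"MetricIntervalLowerBound": 25,
"ScalingAdjustment": 6
}
]' \
--estimated-instance-warmup 120 \
--query PolicyARN --output text)
# A step policy only runs when an alarm invokes it. The step bounds above are
# offsets from this alarm's 70% threshold. The 60-second period assumes
# detailed monitoring is enabled on the instances.
aws cloudwatch put-metric-alarm \
--alarm-name my-app-asg-cpu-high \
--namespace AWS/EC2 \
--metric-name CPUUtilization \
--dimensions Name=AutoScalingGroupName,Value=my-app-asg \
--statistic Average \
--period 60 \
--evaluation-periods 2 \
--threshold 70 \
--comparison-operator GreaterThanOrEqualToThreshold \
--alarm-actions "$POLICY_ARN"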
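The step bounds are offsets from the alarm threshold (70% CPU in the comment above), and for a breach above the threshold each lower bound is inclusive while each upper bound is exclusive. This shell sketch mirrors that lookup so you can check which step a given reading lands in; step_adjustment is a hypothetical helper, not an AWS call, and it only accepts whole-number CPU values.

```shell
THRESHOLD=70   # alarm threshold the step bounds are measured from

# step_adjustment CPU: hypothetical helper mirroring the three steps above.
step_adjustment() {
  local breach=$(( $1 - THRESHOLD ))
  if [ "$breach" -lt 0 ]; then
    echo 0    # below threshold: alarm not in ALARM, policy does not run
  elif [ "$breach" -lt 15 ]; then
    echo 2    # 70% <= CPU < 85%
  elif [ "$breach" -lt 25 ]; then
    echo 4    # 85% <= CPU < 95%
  else
    echo 6    # CPU >= 95%
  fi
}

for cpu in 65 72 85 97; do
  echo "CPU ${cpu}% -> add $(step_adjustment "$cpu") instances"
done
```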
Simple Scaling
Simple scaling is the original policy type. It fires when a CloudWatch alarm breaches, adds or removes a fixed number of instances, and then waits for a cooldown period before it can fire again. It is slower to respond than step scaling because it waits for the full cooldown even if the alarm is still breaching. Use simple scaling only for legacy integrations; prefer step or target tracking for new work.
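ScaleInCooldown and ScaleOutCooldown (in seconds) are Application Auto Scaling settings: they apply to the ECS, DynamoDB, and Aurora policies later in this guide, where they control how long the service waits after a scaling activity before acting again. Set scale-out cooldown low (60–120s) so capacity responds to spikes quickly, and scale-in cooldown high (300–600s) so a brief traffic dip that's about to recover doesn't remove capacity. EC2 Auto Scaling target tracking and step policies do not accept these fields. Instead they use instance warmup (--estimated-instance-warmup on the policy, or the group's --default-instance-warmup), which keeps newly launched instances out of the group's aggregated metrics until they are ready. Simple scaling policies use --cooldown.
Scheduled Scaling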
Scheduled scaling lets you pre-emptively adjust capacity on a time-based schedule. It is the right tool when you have predictable traffic patterns — business hours traffic, nightly batch jobs, weekly marketing email blasts, or known events like product launches.
Unlike dynamic scaling that reacts to current load, scheduled scaling acts before the load arrives. You update the MinSize, MaxSize, or DesiredCapacity on a cron or one-time schedule.
# Scale up before business hours (8 AM Mon-Fri UTC)
aws autoscaling put-scheduled-update-group-action \
--auto-scaling-group-name my-app-asg \
--scheduled-action-name scale-up-business-hours \
--recurrence "0 8 * * 1-5" \
--min-size 6 \
--max-size 30 \
--desired-capacity 10
# Scale down after hours (8 PM Mon-Fri UTC)
aws autoscaling put-scheduled-update-group-action \
--auto-scaling-group-name my-app-asg \
--scheduled-action-name scale-down-after-hours \
--recurrence "0 20 * * 1-5" \
--min-size 2 \
--max-size 10 \
--desired-capacity 3
# One-time scale-up for a product launch on July 4th at midnight UTC
aws autoscaling put-scheduled-update-group-action \
--auto-scaling-group-name my-app-asg \
--scheduled-action-name product-launch-surge \
--start-time "2026-07-04T00:00:00Z" \
--min-size 20 \
--max-size 60 \
--desired-capacity 30
All times are in UTC by default. You can specify a time zone in the --time-zone parameter (e.g., America/New_York) if you prefer wall-clock scheduling.
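Scheduled actions are set-and-hold: each one overwrites MinSize, DesiredCapacity, and MaxSize, and those values stay in effect until the next action fires. The shell sketch below models the two recurring actions above, which is why capacity stays at the after-hours level all weekend. scheduled_bounds is a hypothetical helper; it ignores the one-time launch action and any dynamic scaling inside the bounds.

```shell
# scheduled_bounds DOW HOUR -> prints "min desired max" (hypothetical helper)
# DOW follows cron numbering (0 = Sunday ... 6 = Saturday); HOUR is UTC.
scheduled_bounds() {
  local dow=$1 hour=$2
  if [ "$dow" -ge 1 ] && [ "$dow" -le 5 ] && [ "$hour" -ge 8 ] && [ "$hour" -lt 20 ]; then
    echo "6 10 30"   # last set by scale-up-business-hours at 08:00
  else
    echo "2 3 10"    # last set by scale-down-after-hours at 20:00
  fi
}

scheduled_bounds 1 9    # Monday 09:00  -> prints "6 10 30"
scheduled_bounds 6 12   # Saturday noon -> prints "2 3 10"
```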
Predictive Scaling
Predictive scaling uses machine learning to analyze your historical traffic patterns and proactively scale out capacity before demand arrives. It looks at 14 days of CloudWatch metric history for the ASG and identifies repeating patterns (daily and weekly cycles). It then forecasts the next 48 hours and schedules pre-emptive scaling actions.
Predictive scaling works best when your traffic has regular weekly patterns — the classic Monday-morning surge, the lunch dip, the Friday evening drop. It does not help with irregular spikes (use target tracking for those).
# Enable predictive scaling with ForecastAndScale mode
aws autoscaling put-scaling-policy \
--auto-scaling-group-name my-app-asg \
--policy-name predictive-cpu \
--policy-type PredictiveScaling \
--predictive-scaling-configuration '{
"MetricSpecifications": [{
"TargetValue": 60.0,
"PredefinedMetricPairSpecification": {
"PredefinedMetricType": "ASGCPUUtilization"
}
}],
"Mode": "ForecastAndScale",
"SchedulingBufferTime": 300,
"MaxCapacityBreachBehavior": "IncreaseMaxCapacity",
"MaxCapacityBuffer": 10
}'
The Mode has two options:
- ForecastOnly — AWS generates forecasts and shows you what it would do, but takes no action. Use this first to validate that the predictions match reality before committing to automated scaling.
- ForecastAndScale — AWS both forecasts and automatically scales out ahead of predicted demand.
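SchedulingBufferTime (300 = 5 minutes) tells the ASG to launch instances this many seconds before the forecast time, so they have finished warming up when the predicted traffic arrives. MaxCapacityBreachBehavior controls what happens when the forecast exceeds MaxSize: the default, HonorMaxCapacity, caps capacity at MaxSize, while IncreaseMaxCapacity lets predictive scaling raise the maximum. MaxCapacityBuffer (10 = 10%) sets how far above the forecast capacity that raised maximum may go. It only matters when the forecast approaches or exceeds MaxSize; it is not a general buffer added on top of every forecast.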
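A minimal shell sketch of the MaxCapacityBuffer arithmetic with IncreaseMaxCapacity: the buffer is a percentage of the forecast capacity, and the buffered value replaces MaxSize when it would exceed it. The "only when it exceeds MaxSize" rule is a simplifying assumption (AWS describes the buffer as applying when the forecast is close to or above the maximum), and effective_max is a hypothetical helper.

```shell
# effective_max FORECAST MAX_SIZE BUFFER_PCT: hypothetical helper.
effective_max() {
  local forecast=$1 max_size=$2 buffer_pct=$3
  # Integer ceiling of forecast * (100 + buffer_pct) / 100
  local buffered=$(( (forecast * (100 + buffer_pct) + 99) / 100 ))
  if [ "$buffered" -gt "$max_size" ]; then
    echo "$buffered"   # maximum is raised above the configured MaxSize
  else
    echo "$max_size"   # forecast comfortably below MaxSize: no change
  fi
}

effective_max 50 40 10   # prints 55
effective_max 20 40 10   # prints 40
```

A forecast of 50 against a MaxSize of 40 with a 10% buffer lets the group grow to 55; a forecast of 20 leaves the maximum at 40.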
Start in ForecastOnly mode for 1–2 weeks and compare the forecasts with actual load, either on the group's predictive scaling graphs in the EC2 console or with aws autoscaling get-predictive-scaling-forecast. Once predictions look accurate, switch the policy to ForecastAndScale and keep target tracking enabled alongside it. The two policies complement each other: predictive scaling handles anticipated load, and target tracking handles unexpected spikes.
Warm Pools for Fast Scale-Out
One of the most impactful but underused Auto Scaling features is Warm Pools. A warm pool is a collection of pre-initialized EC2 instances kept in a Stopped, Running, or Hibernated state, outside the group's active capacity. When the ASG needs to scale out, it pulls instances from the warm pool instead of launching and initializing new ones, so scale-out time shrinks from a full cold launch plus bootstrap (often several minutes) to roughly the time it takes to bring an already-initialized instance into service.
Warm pools are most valuable when your instances have a long initialization time — installing packages, loading large model weights, warming up a JVM, hydrating a local cache, or running database migration checks on startup.
# Add a Stopped warm pool: keep at least 2 warm instances, sized so that
# in-service + warm instances total up to 5 (MaxGroupPreparedCapacity)
aws autoscaling put-warm-pool \
--auto-scaling-group-name my-app-asg \
--pool-state Stopped \
--min-size 2 \
--max-group-prepared-capacity 5
# Check warm pool status
aws autoscaling describe-warm-pool \
--auto-scaling-group-name my-app-asg
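The --max-group-prepared-capacity value caps in-service plus warm instances combined; it is not the size of the warm pool itself. This shell sketch shows the resulting pool size: prepared capacity minus desired capacity (the group's MaxSize stands in when no prepared capacity is set), but never fewer than the pool's own --min-size. warm_pool_size is a hypothetical helper.

```shell
# warm_pool_size DESIRED PREPARED POOL_MIN: hypothetical helper.
# PREPARED is MaxGroupPreparedCapacity, or the group's MaxSize if unset.
warm_pool_size() {
  local desired=$1 prepared=$2 pool_min=$3
  local size=$(( prepared - desired ))
  if [ "$size" -lt "$pool_min" ]; then size=$pool_min; fi
  if [ "$size" -lt 0 ]; then size=0; fi
  echo "$size"
}

warm_pool_size 4 5 2    # prints 2: 5 - 4 = 1, raised to the pool min of 2
warm_pool_size 4 20 0   # prints 16: no prepared capacity set, MaxSize 20 - 4
```

With the command above and a desired capacity of 4, the pool holds 2 stopped instances. Leaving --max-group-prepared-capacity unset on a group with MaxSize 20 would keep 16 instances warm, which is rarely what you want.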
Pool state options:
- Stopped: instances are stopped, so you pay only for EBS storage (and any attached Elastic IP addresses), not compute. Starting one adds roughly 30 seconds. Best cost/speed balance for most applications.
- Running: instances are running and fully initialized, so you pay full compute cost. Use this when application startup itself is slow even after OS boot, e.g., a Java service that takes 2 minutes to warm its cache.
- Hibernated: instances are hibernated with RAM contents saved to the root volume, so you pay only for storage. Resuming restores in-memory state, which helps applications with expensive in-memory warm-up, but it requires an instance type that supports hibernation and an encrypted EBS root volume.
Lifecycle Hooks
Lifecycle hooks pause an instance at two critical transitions: when it is being added to the group (launching) and when it is being removed (terminating). While the instance waits, you can run custom logic: install software, register with a service mesh, drain connections, push logs, or deregister from a monitoring system.
Without lifecycle hooks, an instance goes InService, and is registered with the load balancer, as soon as it finishes launching, even if bootstrap work that your health check doesn't cover is still running. On scale-in, the ASG waits only for the target group's deregistration delay before terminating, so there is no step in which to flush logs or clean up external registrations. Lifecycle hooks give you a controlled window for both.
# Lifecycle hook for scale-out: pause for 5 minutes during launch
aws autoscaling put-lifecycle-hook \
--auto-scaling-group-name my-app-asg \
--lifecycle-hook-name launch-warmup-hook \
--lifecycle-transition autoscaling:EC2_INSTANCE_LAUNCHING \
--default-result CONTINUE \
--heartbeat-timeout 300 \
--notification-metadata '{"action":"warmup"}'
# Lifecycle hook for scale-in: drain connections before termination
aws autoscaling put-lifecycle-hook \
--auto-scaling-group-name my-app-asg \
--lifecycle-hook-name termination-drain-hook \
--lifecycle-transition autoscaling:EC2_INSTANCE_TERMINATING \
--default-result CONTINUE \
--heartbeat-timeout 120
When a lifecycle hook fires, the instance enters a wait state (e.g., Pending:Wait for launch hooks). Your code — typically a Lambda function triggered by EventBridge — performs the necessary action and then calls complete-lifecycle-action to signal completion:
# Signal the lifecycle hook to proceed (called from Lambda or the instance itself)
aws autoscaling complete-lifecycle-action \
--auto-scaling-group-name my-app-asg \
--lifecycle-hook-name launch-warmup-hook \
--lifecycle-action-result CONTINUE \
--instance-id i-0123456789abcdef0
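If no one calls complete-lifecycle-action, the hook applies --default-result once --heartbeat-timeout expires. Long-running work can restart that timer with aws autoscaling record-lifecycle-action-heartbeat, but the total time in the wait state is capped at 48 hours or 100 times the heartbeat timeout, whichever is smaller. A small shell sketch of that cap; max_hook_wait is a hypothetical helper.

```shell
# max_hook_wait HEARTBEAT_SECONDS -> longest possible wait, in seconds
# (hypothetical helper).
max_hook_wait() {
  local heartbeat=$1
  local cap=$(( 48 * 3600 ))             # 48-hour hard limit
  local by_timeout=$(( 100 * heartbeat ))
  if [ "$by_timeout" -lt "$cap" ]; then
    echo "$by_timeout"
  else
    echo "$cap"
  fi
}

max_hook_wait 300    # prints 30000 (about 8.3 hours)
max_hook_wait 3600   # prints 172800 (the 48-hour cap)
```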
In Terraform, the equivalent hooks look like this:
resource "aws_autoscaling_lifecycle_hook" "termination_drain" {
name = "termination-drain-hook"
autoscaling_group_name = aws_autoscaling_group.app.name
lifecycle_transition = "autoscaling:EC2_INSTANCE_TERMINATING"
default_result = "CONTINUE"
heartbeat_timeout = 120
notification_target_arn = aws_sns_topic.lifecycle_events.arn
role_arn = aws_iam_role.lifecycle_role.arn
}
resource "aws_autoscaling_lifecycle_hook" "launch_hook" {
name = "launch-warmup-hook"
autoscaling_group_name = aws_autoscaling_group.app.name
lifecycle_transition = "autoscaling:EC2_INSTANCE_LAUNCHING"
default_result = "CONTINUE"
heartbeat_timeout = 300
}
ALB Integration and Health Checks
Auto Scaling Groups integrate tightly with Application Load Balancers. The ASG registers new instances with the ALB target group on launch and deregisters them on termination. Getting this integration right is critical for zero-downtime deployments and reliable health replacement.
# Attach an ALB target group to an existing ASG
aws autoscaling attach-load-balancer-target-groups \
--auto-scaling-group-name my-app-asg \
--target-group-arns "arn:aws:elasticloadbalancing:us-east-1:123:targetgroup/my-tg/abc"
# Update health check settings
aws autoscaling update-auto-scaling-group \
--auto-scaling-group-name my-app-asg \
--health-check-type ELB \
--health-check-grace-period 300
The health check grace period (300 seconds in the example) tells the ASG to ignore health check failures for this many seconds after an instance launches. Without this, the ASG might terminate an instance that is still initializing — its health endpoint isn't ready yet — causing a launch/terminate loop.
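A toy shell model of the launch/terminate loop the grace period prevents. It assumes the health check fails until the application is ready and that the ASG acts on ELB health as soon as the grace period expires; grace_outcome is a hypothetical helper and ignores health check intervals and thresholds.

```shell
# grace_outcome APP_READY_SECONDS GRACE_SECONDS: hypothetical helper.
grace_outcome() {
  local app_ready_s=$1 grace_s=$2
  if [ "$app_ready_s" -le "$grace_s" ]; then
    echo "healthy before grace period ends"
  else
    # The replacement boots just as slowly, so the cycle repeats
    echo "replaced while still booting"
  fi
}

grace_outcome 180 300   # app ready at 3 min with a 5 min grace period
grace_outcome 420 300   # app ready at 7 min: launch/terminate loop
```

The practical rule: set the grace period comfortably above the slowest time your instances take to pass the health check.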
For connection draining (preventing in-flight requests from being cut off during scale-in), configure deregistration delay on the ALB target group:
# Set deregistration delay to 30 seconds (reduce from default 300s)
aws elbv2 modify-target-group-attributes \
--target-group-arn "arn:aws:elasticloadbalancing:us-east-1:123:targetgroup/my-tg/abc" \
--attributes Key=deregistration_delay.timeout_seconds,Value=30
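During scale-in, the sequence is: the ASG moves the instance to Terminating and deregisters it from the target group → the target enters the draining state and the ALB stops sending it new requests → the deregistration delay runs while existing in-flight requests complete → the ASG terminates the instance. If you also use a termination lifecycle hook, the instance additionally waits in Terminating:Wait until the hook completes or times out.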
Application Auto Scaling: ECS, DynamoDB, Aurora
EC2 Auto Scaling handles EC2 fleets. Application Auto Scaling is the unified scaling service for other AWS resources: ECS tasks, DynamoDB read/write capacity units, Aurora read replicas, ElastiCache shards, SageMaker endpoints, and more. The same concepts (target tracking, step scaling, scheduled scaling) apply, but through the Application Auto Scaling API.
ECS Service Auto Scaling
# Register an ECS service as a scalable target
aws application-autoscaling register-scalable-target \
--service-namespace ecs \
--resource-id "service/my-cluster/my-service" \
--scalable-dimension ecs:service:DesiredCount \
--min-capacity 2 \
--max-capacity 50
# Target tracking: keep ECS CPU at 60%
aws application-autoscaling put-scaling-policy \
--service-namespace ecs \
--resource-id "service/my-cluster/my-service" \
--scalable-dimension ecs:service:DesiredCount \
--policy-name ecs-cpu-target-tracking \
--policy-type TargetTrackingScaling \
--target-tracking-scaling-policy-configuration '{
"PredefinedMetricSpecification": {
"PredefinedMetricType": "ECSServiceAverageCPUUtilization"
},
"TargetValue": 60.0,
"ScaleInCooldown": 300,
"ScaleOutCooldown": 60
}'
DynamoDB Auto Scaling
# Register DynamoDB table read capacity as scalable target
aws application-autoscaling register-scalable-target \
--service-namespace dynamodb \
--resource-id "table/my-table" \
--scalable-dimension dynamodb:table:ReadCapacityUnits \
--min-capacity 5 \
--max-capacity 1000
# Target tracking: keep consumed read capacity at 70% of provisioned
aws application-autoscaling put-scaling-policy \
--service-namespace dynamodb \
--resource-id "table/my-table" \
--scalable-dimension dynamodb:table:ReadCapacityUnits \
--policy-name dynamodb-read-target-tracking \
--policy-type TargetTrackingScaling \
--target-tracking-scaling-policy-configuration '{
"PredefinedMetricSpecification": {
"PredefinedMetricType": "DynamoDBReadCapacityUtilization"
},
"TargetValue": 70.0
}'
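DynamoDB auto scaling applies only to tables in provisioned capacity mode. As a rough sketch, a 70% utilization target means provisioned RCUs settle near ceil(consumed / 0.70), clamped to the registered minimum and maximum. target_rcu is a hypothetical helper; real target tracking reacts to CloudWatch alarms over several minutes rather than computing this instantly.

```shell
# target_rcu CONSUMED TARGET_PCT MIN MAX: hypothetical helper.
target_rcu() {
  local consumed=$1 target_pct=$2 min=$3 max=$4
  # Integer ceiling of consumed / (target_pct / 100)
  local rcu=$(( (consumed * 100 + target_pct - 1) / target_pct ))
  if [ "$rcu" -lt "$min" ]; then rcu=$min; fi
  if [ "$rcu" -gt "$max" ]; then rcu=$max; fi
  echo "$rcu"
}

target_rcu 350 70 5 1000    # prints 500
target_rcu 2 70 5 1000      # prints 5 (clamped to the minimum)
target_rcu 1400 70 5 1000   # prints 1000 (clamped to the maximum)
```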
Aurora Auto Scaling Read Replicas
# Auto scale Aurora read replicas (1 to 5 replicas)
aws application-autoscaling register-scalable-target \
--service-namespace rds \
--resource-id "cluster:my-aurora-cluster" \
--scalable-dimension rds:cluster:ReadReplicaCount \
--min-capacity 1 \
--max-capacity 5
aws application-autoscaling put-scaling-policy \
--service-namespace rds \
--resource-id "cluster:my-aurora-cluster" \
--scalable-dimension rds:cluster:ReadReplicaCount \
--policy-name aurora-replica-target-tracking \
--policy-type TargetTrackingScaling \
--target-tracking-scaling-policy-configuration '{
"PredefinedMetricSpecification": {
"PredefinedMetricType": "RDSReaderAverageDatabaseConnections"
},
"TargetValue": 200.0,
"ScaleInCooldown": 600,
"ScaleOutCooldown": 120
}'
Best Practices for Cost and Performance
Choose the Right Policy Combination
- For most web applications: predictive scaling + target tracking on ALB RequestCountPerTarget. Predictive handles anticipated load; target tracking handles unexpected spikes.
- For queue-based workers: target tracking on SQS queue depth (custom metric: messages visible / number of instances).
- For large or fast-growing fleets: step scaling with PercentChangeInCapacity so each step is proportional to the current fleet size, with MaxSize as the hard cost ceiling.
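For the queue-based case, the backlog-per-instance pattern needs two numbers: the current backlog per instance (ApproximateNumberOfMessagesVisible divided by in-service instances) and a target value (acceptable latency divided by average processing time per message). A shell sketch with hypothetical helpers; times are in milliseconds to keep the arithmetic in integers.

```shell
# backlog_per_instance VISIBLE IN_SERVICE: hypothetical helper
backlog_per_instance() {
  echo $(( $1 / $2 ))
}

# backlog_target ACCEPTABLE_LATENCY_MS AVG_PROCESSING_MS: hypothetical helper
backlog_target() {
  echo $(( $1 / $2 ))
}

target=$(backlog_target 10000 100)        # 10 s budget, 100 ms per message
current=$(backlog_per_instance 1500 10)   # 1500 messages across 10 instances
echo "target=$target current=$current"    # prints "target=100 current=150"
# Instances needed to get back to target: ceil(1500 / 100)
echo $(( (1500 + target - 1) / target ))  # prints 15
```

Publish the backlog value as a custom CloudWatch metric and use the target value as TargetValue in a target tracking policy.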
Golden AMIs + Warm Pools = Fast Scale-Out
Every minute your instance spends installing packages is a minute your users experience degraded performance. Bake a golden AMI with all dependencies pre-installed, then add a warm pool so scale-out skips initialization entirely. For Java apps, consider compiling to GraalVM native images or using container-based deployment on ECS for faster startup.
Use Multiple Instance Types
Always configure at least 5 instance types in your mixed instances policy. Spot capacity availability varies by instance type and AZ. More types = lower interruption probability. Use attribute-based selection to automatically include new generation types as AWS releases them.
Termination Policies
Configure termination policies to control which instances the ASG terminates first during scale-in. The Default policy first balances instances across Availability Zones, then prefers instances using the oldest launch template or launch configuration, then those closest to the next billing hour. You can override the order: OldestLaunchTemplate preferentially removes instances running old launch template versions (useful during rolling updates), and OldestInstance or NewestInstance choose by age. AZ balancing is still applied first.
# Set termination policy: prefer oldest launch template, then oldest instance
aws autoscaling update-auto-scaling-group \
--auto-scaling-group-name my-app-asg \
--termination-policies "OldestLaunchTemplate" "OldestInstance"
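A shell sketch of how the policy list above picks a victim: first the AZ with the most instances, then the lowest launch template version (standing in for "oldest"), then the earliest launch time. pick_victim and its input format are hypothetical, and breaking ties between equally sized AZs alphabetically is an assumption, not documented AWS behavior.

```shell
# pick_victim: reads "<instance-id> <az> <lt-version> <launch-epoch>" lines
# on stdin and prints the instance that would be terminated (hypothetical).
pick_victim() {
  local fleet az
  fleet=$(cat)
  # 1. AZ with the most instances (AZ balancing comes first)
  az=$(printf '%s\n' "$fleet" | awk '{ print $2 }' | sort | uniq -c \
    | sort -k1,1nr -k2,2 | awk 'NR == 1 { print $2 }')
  # 2-3. Within that AZ: oldest template version, then oldest launch time
  printf '%s\n' "$fleet" | awk -v az="$az" '$2 == az' \
    | sort -k3,3n -k4,4n | awk 'NR == 1 { print $1 }'
}

printf '%s\n' \
  "i-1 us-east-1a 3 100" \
  "i-2 us-east-1a 2 200" \
  "i-3 us-east-1b 1 50" \
  "i-4 us-east-1a 2 150" | pick_victim   # prints i-4
```

i-3 runs the oldest template version overall, but it survives because us-east-1b is the less populated AZ.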
Monitor with CloudWatch Metrics
Enable detailed monitoring on your instances (Monitoring.Enabled in the launch template) so EC2 metrics such as CPUUtilization arrive at 1-minute instead of 5-minute granularity, which reduces scaling lag. Separately, enable group metrics collection so you can watch GroupInServiceInstances, GroupPendingInstances, GroupTerminatingInstances, and, if you use warm pools, WarmPoolMinSize and WarmPoolWarmedCapacity.
# Enable Auto Scaling group metrics collection (GroupInServiceInstances, etc.) at 1-minute granularity
aws autoscaling enable-metrics-collection \
--auto-scaling-group-name my-app-asg \
--granularity "1Minute"
Frequently Asked Questions
What is the difference between target tracking and predictive scaling?
Target tracking is reactive — it responds to current metric values. If CPU hits 80% right now, it scales out. Predictive scaling is proactive — it analyzes historical patterns and scales out before demand arrives. Use both together: predictive handles the load you can anticipate (morning traffic surge, weekly peak), target tracking handles unexpected spikes on top of that.
How do I prevent scale-in from terminating instances with active sessions?
Use a termination lifecycle hook (autoscaling:EC2_INSTANCE_TERMINATING) to drain connections before termination. Also set an appropriate ALB deregistration delay (30–60 seconds for most APIs). For stateful applications, use instance scale-in protection (aws autoscaling set-instance-protection) on instances that currently hold active session state, and clear the protection when they are ready to terminate.
How many instance types should I include in a mixed instances policy?
At least 5, ideally 8–10. AWS recommends a diverse pool because Spot interruption rates vary significantly by instance type, AZ, and time of day. More diversity makes it far more likely that the ASG can find capacity. Use the EC2 Spot Instance Advisor to identify instance types with low interruption rates in your region.
Can I use Auto Scaling with containers?
Yes, in two layers. EC2 Auto Scaling manages the EC2 instances in your ECS cluster (the underlying hosts). Application Auto Scaling manages the ECS service task count (the containers). For serverless containers, use ECS Fargate — Application Auto Scaling still manages task count, but you have no EC2 instances to manage. EKS (Kubernetes) uses the Cluster Autoscaler (which calls EC2 Auto Scaling) or Karpenter for node scaling, and the Kubernetes Horizontal Pod Autoscaler for pod scaling.
What happens if an instance fails its health check during the grace period?
During the health check grace period, the ASG ignores ELB health check failures. It still responds to EC2-level failures (if the instance is terminated at the hypervisor level). Once the grace period expires, if the ELB health check shows the instance unhealthy, the ASG marks it for replacement, terminates it, and launches a new one.
How do warm pools interact with scheduled scaling?
Warm pools and scheduled scaling work together well. When a scheduled action increases desired capacity, the ASG pulls from the warm pool first (much faster than a cold launch) before launching new instances. The warm pool then replenishes asynchronously. Pre-populate your warm pool before a scheduled scale-up event by increasing its minimum size an hour beforehand.