Kubernetes Cluster Autoscaler: Dynamic Node Provisioning (2026)
Running Kubernetes in production means dealing with unpredictable workload spikes. Pods queue up as Pending because no node has enough CPU or memory, while at night those same nodes sit at 5% utilization burning your cloud budget. The Cluster Autoscaler (CA) solves both problems: it adds nodes when pods cannot be scheduled and removes them when they have been idle long enough to be safe to drain.
This guide covers the complete lifecycle of CA on Amazon EKS in 2026 — from installation and IAM permissions to expander strategies, spot instance node groups, PodDisruptionBudget interactions, and a head-to-head comparison with Karpenter.
1. HPA vs VPA vs Cluster Autoscaler — Which Scales What
Kubernetes ships with three complementary autoscalers. Confusing them is the most common reason teams end up either wasting money or dropping traffic.
| Autoscaler | What It Scales | Metric Source | Best For |
|---|---|---|---|
| HPA (Horizontal Pod Autoscaler) | Replica count of a Deployment / StatefulSet | CPU, memory, custom metrics (KEDA) | Stateless web services, APIs |
| VPA (Vertical Pod Autoscaler) | CPU/memory requests of existing pods | Historical usage via Metrics Server | Batch jobs, ML training pods |
| Cluster Autoscaler | Number of nodes in a node group | Pending pods + node utilization | Node-level capacity management |
See Kubernetes HPA Scaling and Kubernetes Resource Management for the pod-side of this equation.
2. How Cluster Autoscaler Works
CA runs as a Deployment inside your cluster (typically in the kube-system namespace). Every 10 seconds it evaluates two questions:
- Are there Pending pods? If yes, find a node group that could accommodate them and call the cloud provider API to add a node.
- Are any nodes underutilized? If a node has been below the utilization threshold for a configurable window, drain it and remove it.
Scale-Up Trigger
CA watches for pods in the Pending state with the condition reason: Unschedulable. It simulates placing those pods on each node group's hypothetical new node. If the simulation succeeds, it increments the node group's desired count by the minimum number of nodes needed to place all pending pods.
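The "minimum number of nodes needed" can be pictured as a bin-packing problem. The following is a minimal sketch of that idea, assuming only CPU and memory requests matter; the real simulation also evaluates scheduler predicates such as taints, affinity, and pod limits. `PodRequest`, `estimate_nodes_needed`, and the node sizes are hypothetical names and values for illustration, not CA's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PodRequest:
    cpu_millis: int
    memory_mib: int


def estimate_nodes_needed(pods, node_cpu_millis, node_memory_mib):
    """First-fit-decreasing estimate of new nodes needed for pending pods.

    Returns None if any single pod cannot fit on an empty node of this size.
    """
    free = []  # remaining (cpu, memory) on each hypothetical new node
    # Place the largest pods first so smaller pods fill the leftover gaps.
    ordered = sorted(pods, key=lambda p: (p.cpu_millis, p.memory_mib), reverse=True)
    for pod in ordered:
        if pod.cpu_millis > node_cpu_millis or pod.memory_mib > node_memory_mib:
            return None
        for i, (cpu, mem) in enumerate(free):
            if pod.cpu_millis <= cpu and pod.memory_mib <= mem:
                free[i] = (cpu - pod.cpu_millis, mem - pod.memory_mib)
                break
        else:
            # No existing hypothetical node has room: open a new one.
            free.append((node_cpu_millis - pod.cpu_millis,
                         node_memory_mib - pod.memory_mib))
    return len(free)


# Assumed allocatable capacity of one new node (illustrative numbers only).
pending = [PodRequest(1500, 2048)] * 3
nodes = estimate_nodes_needed(pending, node_cpu_millis=3900, node_memory_mib=15000)
```

With these assumed numbers, two 1500m pods fit on one node and the third needs a second node, so the node group's desired count would grow by two.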
Scale-Down Idle Threshold
A node is a candidate for removal when all of the following are true for at least --scale-down-unneeded-time (default 10 minutes):
- The node's CPU and memory requests are each below --scale-down-utilization-threshold (default 50%) of its allocatable capacity
- All pods on the node can be safely evicted (no blocking PodDisruptionBudgets, no safe-to-evict: "false" annotations)
- The node group is above its minimum size
After a scale-up, CA also waits --scale-down-delay-after-add (default 10 minutes) before considering scale-down on any node. This prevents thrashing when new nodes are still receiving pods.
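The removal conditions can be written as a single predicate. This is a simplified sketch, not CA's implementation: `NodeSnapshot` and its fields are hypothetical, and utilization is taken as the larger of the CPU and memory request ratios so that both must be under the threshold.

```python
from dataclasses import dataclass


@dataclass
class NodeSnapshot:
    cpu_requested_millis: int
    cpu_allocatable_millis: int
    memory_requested_mib: int
    memory_allocatable_mib: int
    unneeded_seconds: float   # how long the node has stayed below the threshold
    all_pods_evictable: bool  # no blocking PDB, no safe-to-evict: "false"
    group_size: int
    group_min_size: int


def utilization(node):
    """Requests divided by allocatable, taking the larger of CPU and memory."""
    cpu = node.cpu_requested_millis / node.cpu_allocatable_millis
    mem = node.memory_requested_mib / node.memory_allocatable_mib
    return max(cpu, mem)


def is_scale_down_candidate(node, threshold=0.5, unneeded_time_s=600):
    """Defaults mirror the flag defaults quoted above (50%, 10 minutes)."""
    return (
        utilization(node) < threshold
        and node.unneeded_seconds >= unneeded_time_s
        and node.all_pods_evictable
        and node.group_size > node.group_min_size
    )


idle = NodeSnapshot(1000, 4000, 3000, 15000, 700, True, 4, 2)
memory_heavy = NodeSnapshot(1000, 4000, 9000, 15000, 700, True, 4, 2)
```

Here `idle` qualifies for removal, while `memory_heavy` does not: its CPU requests are low, but memory requests sit at 60% of allocatable.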
3. Installing Cluster Autoscaler on EKS
Step 1 — IAM Policy
CA needs permission to describe and modify Auto Scaling Groups. Create a policy and attach it via IRSA (IAM Roles for Service Accounts).
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribeAutoScalingInstances",
        "autoscaling:DescribeLaunchConfigurations",
        "autoscaling:DescribeScalingActivities",
        "autoscaling:DescribeTags",
        "autoscaling:SetDesiredCapacity",
        "autoscaling:TerminateInstanceInAutoScalingGroup",
        "ec2:DescribeLaunchTemplateVersions",
        "ec2:DescribeInstanceTypes",
        "eks:DescribeNodegroup"
      ],
      "Resource": "*"
    }
  ]
}
Step 2 — Create IRSA Role
eksctl create iamserviceaccount \
--cluster=my-cluster \
--namespace=kube-system \
--name=cluster-autoscaler \
--attach-policy-arn=arn:aws:iam::ACCOUNT_ID:policy/ClusterAutoscalerPolicy \
--approve \
--override-existing-serviceaccounts
Step 3 — Install via Helm
helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm repo update
helm install cluster-autoscaler autoscaler/cluster-autoscaler \
--namespace kube-system \
--set autoDiscovery.clusterName=my-cluster \
--set awsRegion=us-east-1 \
--set rbac.serviceAccount.create=false \
--set rbac.serviceAccount.name=cluster-autoscaler \
--set extraArgs.balance-similar-node-groups=true \
--set extraArgs.skip-nodes-with-system-pods=false \
--set extraArgs.scale-down-utilization-threshold=0.5 \
--set extraArgs.scale-down-unneeded-time=10m \
--set extraArgs.scale-down-delay-after-add=10m
Auto-discovery only finds Auto Scaling Groups that carry both of these tags:
k8s.io/cluster-autoscaler/my-cluster = owned
k8s.io/cluster-autoscaler/enabled = true
Deployment YAML (alternative to Helm)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    app: cluster-autoscaler
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler
      tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/control-plane
      containers:
        - image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.30.0
          name: cluster-autoscaler
          command:
            - ./cluster-autoscaler
            - --v=4
            - --stderrthreshold=info
            - --cloud-provider=aws
            - --skip-nodes-with-local-storage=false
            - --expander=least-waste
            - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
            - --balance-similar-node-groups
            - --scale-down-utilization-threshold=0.5
          env:
            - name: AWS_REGION
              value: us-east-1
          resources:
            limits:
              cpu: 100m
              memory: 600Mi
            requests:
              cpu: 100m
              memory: 600Mi
4. Node Group Configuration
Each EKS managed node group exposes min/max/desired capacity to CA via ASG tags. Proper labeling lets you create specialized node groups (e.g., GPU nodes, high-memory nodes) and use node selectors / affinity rules to route workloads correctly.
# eksctl cluster config snippet
managedNodeGroups:
  - name: general-workers
    instanceType: m5.xlarge
    minSize: 2
    maxSize: 20
    desiredCapacity: 4
    labels:
      role: worker
      workload-type: general
    tags:
      k8s.io/cluster-autoscaler/enabled: "true"
      k8s.io/cluster-autoscaler/my-cluster: "owned"
  - name: gpu-workers
    instanceType: g4dn.xlarge
    minSize: 0
    maxSize: 5
    desiredCapacity: 0
    labels:
      role: gpu-worker
      nvidia.com/gpu: "true"
    taints:
      - key: nvidia.com/gpu
        value: "true"
        effect: NoSchedule
    tags:
      k8s.io/cluster-autoscaler/enabled: "true"
      k8s.io/cluster-autoscaler/my-cluster: "owned"
Setting minSize: 0 on the GPU node group allows CA to scale it all the way to zero during off-peak hours — a significant cost saving for batch ML workloads. See Kubernetes Taints and Tolerations for how to route only GPU pods to that node group.
5. Scale-Up: Triggers, Node Group Selection, and Timing
If a requested node has not registered within --max-node-provision-time (default 15 minutes), CA treats that scale-up as failed and may try a different node group if the pods are still Pending. Here is the normal scale-up flow:
- Scheduler marks pod as Unschedulable: no current node fits the pod's resource requests.
- CA's main loop (every 10 seconds) detects the Pending pod.
- CA simulates placing the pod on a new node from each eligible node group using the node group's instance type, labels, and taints.
- CA chooses the best node group according to the configured expander (see Section 8).
- CA calls the AWS Auto Scaling API to increment the ASG's desired count.
- The new EC2 instance registers with EKS, kubelet starts, node becomes Ready (typically 60–180 seconds).
- The Scheduler places the pending pod on the new node.
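CA sizes scale-ups from resource requests, not actual usage, so it cannot plan capacity correctly for pods that leave resources.requests unset. Always set CPU and memory requests on every container. See Kubernetes Resource Management.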
6. Scale-Down: Utilization Threshold, Delays, and Annotations
Scale-down is more conservative than scale-up because evicting pods carries risk. CA uses a multi-stage check:
- Utilization check: sum of all pod requests on the node divided by node allocatable. If below --scale-down-utilization-threshold (default 0.5 = 50%), the node is "unneeded".
- Unneeded timer: the node must remain unneeded for --scale-down-unneeded-time (default 10m) before it is eligible for removal.
- Post-add delay: after any scale-up event, CA waits --scale-down-delay-after-add (default 10m) before starting scale-down evaluation across the whole cluster.
- Eviction check: CA dry-runs an eviction of every pod on the candidate node. If any pod would violate a PDB or has the annotation below, the node is skipped.
To prevent a pod from being evicted during scale-down:
metadata:
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
Pods annotated with safe-to-evict: "false" block the entire node from being scaled down, not just themselves. Use this annotation sparingly, typically only for pods running critical local state (e.g., a node-local cache that cannot be rebuilt quickly).
7. PodDisruptionBudgets and Scale-Down
A PodDisruptionBudget (PDB) tells Kubernetes the minimum number of replicas that must remain available during voluntary disruptions — including CA-initiated evictions.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: frontend-pdb
  namespace: production
spec:
  minAvailable: 2 # at least 2 replicas must stay up
  selector:
    matchLabels:
      app: frontend
During scale-down, CA will attempt to evict all pods on the candidate node. If evicting a pod would violate any PDB, CA cancels the eviction for that node and moves on to the next candidate. The node remains in the cluster until the PDB allows the eviction.
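The arithmetic behind a minAvailable budget is simple enough to sketch. The example below is an illustration under simplifying assumptions (one PDB per app, minAvailable given as an absolute number); the function names and dictionaries are hypothetical, not part of Kubernetes or CA.

```python
def allowed_disruptions(healthy_replicas, min_available):
    """Pods that may be evicted right now under a minAvailable PDB."""
    return max(0, healthy_replicas - min_available)


def node_drainable(pods_on_node_by_app, healthy_by_app, min_available_by_app):
    """True if evicting every listed pod keeps each app at or above minAvailable.

    Apps without an entry in min_available_by_app are treated as having no PDB.
    """
    for app, count_on_node in pods_on_node_by_app.items():
        budget = allowed_disruptions(healthy_by_app[app],
                                     min_available_by_app.get(app, 0))
        if count_on_node > budget:
            return False
    return True


# frontend-pdb from above: minAvailable 2, with 3 healthy replicas cluster-wide.
pdb = {"frontend": 2}
one_replica_here = node_drainable({"frontend": 1}, {"frontend": 3}, pdb)
two_replicas_here = node_drainable({"frontend": 2}, {"frontend": 3}, pdb)
```

With three healthy replicas and minAvailable: 2, a node holding one frontend pod can be drained, but a node holding two cannot. This is also why a PDB whose minAvailable equals the replica count blocks scale-down of every node that runs one of those pods.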
See Kubernetes Deployments for rolling update strategies that work alongside PDBs.
8. Expanders: Choosing the Right Node Group
When multiple node groups could satisfy a pending pod, CA uses an expander to pick one. Set it with --expander=<name>.
| Expander | Selection Strategy | Best For |
|---|---|---|
| least-waste | Pick the node group that wastes the least CPU/memory after placing pods | General production clusters; minimizes cost |
| most-pods | Pick the node group that can schedule the most pods after scaling | Batch workloads with many small pods |
| random | Randomly pick an eligible node group | Testing, simple homogeneous clusters |
| priority | Rank node groups by a user-defined ConfigMap priority list | Spot-first strategies: prefer spot, fall back to on-demand |
| grpc | Delegate the decision to an external gRPC service | Custom business logic (e.g., compliance, geographic placement) |
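To make the least-waste row concrete, here is a rough sketch of how such a score could be computed. It assumes the score is the unused CPU fraction plus the unused memory fraction of the nodes that would be added; treat the exact formula, the function names, and the node sizes as assumptions for illustration rather than CA's definitive behavior.

```python
def waste_score(pod_cpu_millis, pod_memory_mib,
                node_cpu_millis, node_memory_mib, node_count):
    """Unused CPU fraction plus unused memory fraction after placing the pods."""
    cpu_total = node_cpu_millis * node_count
    mem_total = node_memory_mib * node_count
    wasted_cpu = (cpu_total - pod_cpu_millis) / cpu_total
    wasted_mem = (mem_total - pod_memory_mib) / mem_total
    return wasted_cpu + wasted_mem


def pick_least_waste(options, pod_cpu_millis, pod_memory_mib):
    """options maps group name -> (node_cpu_millis, node_memory_mib, node_count)."""
    return min(
        options,
        key=lambda name: waste_score(pod_cpu_millis, pod_memory_mib, *options[name]),
    )


# Hypothetical scale-up options for pending pods totalling 3000m CPU / 6000Mi.
options = {
    "general": (4000, 16000, 1),
    "large": (16000, 64000, 1),
}
choice = pick_least_waste(options, pod_cpu_millis=3000, pod_memory_mib=6000)
```

The smaller node leaves 25% of its CPU and 62.5% of its memory idle, far less than the larger node would, so `general` wins in this example.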
Priority Expander Example
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander
  namespace: kube-system
data:
  priorities: |-
    100:
      - .*spot.* # prefer any node group with "spot" in the name
    50:
      - .*on-demand.* # fall back to on-demand if no spot capacity
    10:
      - .* # catch-all
With this ConfigMap and --expander=priority, CA will always try to scale the spot node group first and only provision on-demand nodes if the spot group is at max capacity or unavailable.
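The selection logic implied by that ConfigMap can be sketched as follows. This is an illustration, not CA's code: it assumes the highest-numbered tier with at least one matching candidate wins and that patterns are matched unanchored; the node group names are hypothetical.

```python
import re

# Same tiers as the ConfigMap above: higher number means higher priority.
PRIORITIES = {
    100: [r".*spot.*"],
    50: [r".*on-demand.*"],
    10: [r".*"],
}


def pick_by_priority(candidate_groups, priorities=PRIORITIES):
    """Return the candidates matched by the highest-priority tier that matches any."""
    for priority in sorted(priorities, reverse=True):
        patterns = [re.compile(p) for p in priorities[priority]]
        matched = [g for g in candidate_groups
                   if any(p.search(g) for p in patterns)]
        if matched:
            return matched
    return []


# Both groups can take the pending pods: the spot tier wins.
both = pick_by_priority(["eks-spot-mixed-abc", "eks-on-demand-base-xyz"])
# The spot group is at max size, so it is not a candidate: fall back to on-demand.
fallback = pick_by_priority(["eks-on-demand-base-xyz"])
```

Note that the fallback only works if the node group names (as seen by CA, typically the ASG names) actually contain the substrings the patterns look for.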
9. Spot Instance Node Groups and Mixed Instance Policy
Spot instances offer up to 90% discount over on-demand pricing but can be interrupted with a 2-minute warning. The recommended pattern on EKS is a mixed instance policy with an on-demand base capacity.
# eksctl managed node group with spot + on-demand mix
managedNodeGroups:
  - name: spot-mixed
    instanceTypes:
      - m5.xlarge
      - m5a.xlarge
      - m4.xlarge
      - m5d.xlarge
    spot: true
    minSize: 0
    maxSize: 30
    desiredCapacity: 5
    labels:
      lifecycle: spot
    tags:
      k8s.io/cluster-autoscaler/enabled: "true"
      k8s.io/cluster-autoscaler/my-cluster: "owned"
  - name: on-demand-base
    instanceType: m5.xlarge
    minSize: 2 # always keep 2 on-demand nodes as baseline
    maxSize: 10
    desiredCapacity: 2
    labels:
      lifecycle: on-demand
    tags:
      k8s.io/cluster-autoscaler/enabled: "true"
      k8s.io/cluster-autoscaler/my-cluster: "owned"
Use --balance-similar-node-groups=true to have CA spread nodes evenly across multiple similar node groups (e.g., different AZs), which improves availability for stateful workloads using EBS volumes.
10. Karpenter vs Cluster Autoscaler
Karpenter is an open-source node provisioner from AWS that operates at a lower level than CA — it provisions EC2 instances directly without managing Auto Scaling Groups. In 2026, both are production-ready; the choice depends on your requirements.
| Feature | Cluster Autoscaler | Karpenter |
|---|---|---|
| Provisioning model | Pre-defined node groups (ASGs) | Dynamic, per-pod instance selection |
| Instance flexibility | Limited to node group instance types | Any EC2 instance type matching pod requirements |
| Scale-up latency | ~90–180 seconds | ~45–90 seconds (direct EC2 API) |
| Bin packing | Via expander heuristics | Native — picks right-sized instance for pods |
| Spot consolidation | Manual via multiple node groups | Built-in disruption controller |
| Cloud support | AWS, GCP, Azure, and more | AWS, Azure (preview), GCP (community) |
| Configuration complexity | Low — familiar Deployment + flags | Medium — NodePool + NodeClass CRDs |
| Migration effort | Baseline | Medium (need to replace node groups with NodePools) |
Stick with CA if: you already have well-tuned node groups, use a multi-cloud setup, or your team is not ready to adopt new CRDs.
Migrate to Karpenter if: you want faster scale-up, better spot diversity, or automatic node consolidation (Karpenter's disruption controller can replace two under-utilized nodes with one larger node without any manual configuration).
11. Overprovisioning with Placeholder Pause Pods
CA only adds nodes after pods go Pending, which introduces a delay (the time to provision and boot a new EC2 instance). For latency-sensitive workloads, you can pre-warm spare capacity using placeholder pause pods.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning
  namespace: kube-system
spec:
  replicas: 3 # 3 placeholder pods = ~3 spare node slots
  selector:
    matchLabels:
      app: overprovisioning
  template:
    metadata:
      labels:
        app: overprovisioning
    spec:
      priorityClassName: overprovisioning # low priority: real pods evict these
      terminationGracePeriodSeconds: 0
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          resources:
            requests:
              cpu: "1500m" # size to match one typical real pod
              memory: "2Gi"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -1 # below default (0): real pods always win
globalDefault: false
description: "Placeholder pods for node overprovisioning"
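When a real pod arrives and the cluster is full, the scheduler preempts a lower-priority placeholder pod to make room. The placeholder goes Pending, CA sees it and provisions a new node, and that node restores the spare capacity for future real pods. As long as a burst fits within the placeholder headroom, real pods wait only for the preemption, not for a node to boot.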
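Choosing the replica count is a small sizing exercise. The helper below is a hypothetical sketch: it assumes each placeholder requests 1500m CPU and 2Gi (2048Mi) of memory as in the Deployment above, and that you know how much burst headroom you want to keep warm.

```python
import math


def placeholder_replicas(headroom_cpu_millis, headroom_memory_mib,
                         placeholder_cpu_millis=1500, placeholder_memory_mib=2048):
    """Replicas needed so the placeholders reserve at least the requested headroom."""
    by_cpu = math.ceil(headroom_cpu_millis / placeholder_cpu_millis)
    by_mem = math.ceil(headroom_memory_mib / placeholder_memory_mib)
    # Whichever resource needs more placeholders determines the replica count.
    return max(by_cpu, by_mem)


# Headroom for three typical pods (3 x 1500m CPU, 3 x 2Gi memory).
replicas = placeholder_replicas(4500, 6144)
# A memory-heavy burst needs more placeholders than its CPU alone suggests.
memory_bound = placeholder_replicas(4000, 10000)
```

Keep in mind that headroom has a cost: the placeholders hold real nodes, so size them to the burst you actually need to absorb within a node boot time.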
12. Monitoring Cluster Autoscaler
CA Logs
# Stream CA logs in real time
kubectl logs -n kube-system \
-l app=cluster-autoscaler \
--follow --tail=100
# Filter for scale-up events (matches "scale up" and "scale-up" in any letter case)
kubectl logs -n kube-system \
-l app=cluster-autoscaler \
--tail=500 | grep -iE "scale[- ]up"
# Filter for scale-down
kubectl logs -n kube-system \
-l app=cluster-autoscaler \
--tail=500 | grep "removing node"
Status ConfigMap
# CA writes its current status here every 60 seconds
kubectl get configmap cluster-autoscaler-status \
-n kube-system \
-o yaml
The status ConfigMap includes node group status (healthy/unhealthy), last scale-up time, nodes that are candidates for removal, and any errors CA encountered.
Key Prometheus Metrics
CA exposes metrics on port 8085. Scrape them with your Prometheus instance and alert on these key signals:
| Metric | Description |
|---|---|
| cluster_autoscaler_nodes_count | Current node count per state (for example ready, unready, notStarted) |
| cluster_autoscaler_unschedulable_pods_count | Number of pods CA is trying to schedule; should trend to 0 |
| cluster_autoscaler_scale_down_in_cooldown | 1 if scale-down is currently blocked by a cooldown period |
| cluster_autoscaler_skipped_scale_events_count | Events skipped due to max node group size or other limits |
| cluster_autoscaler_last_activity | Timestamp of last CA decision loop; alert if stale |
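As an illustration of the "alert if stale" idea, the sketch below parses Prometheus text-format output and flags old last-activity timestamps. The sample payload, the `activity` label values, and the 300-second limit are assumptions for the example; in practice you would express this as a Prometheus alerting rule rather than a script.

```python
# Illustrative sample of what a scrape of the CA metrics endpoint might return.
SAMPLE = """\
# HELP cluster_autoscaler_last_activity Last time certain part of CA logic executed.
# TYPE cluster_autoscaler_last_activity gauge
cluster_autoscaler_last_activity{activity="main"} 1.7e+09
cluster_autoscaler_last_activity{activity="scaleUp"} 1.6999e+09
cluster_autoscaler_unschedulable_pods_count 0
"""


def parse_metrics(text):
    """Parse Prometheus text exposition into {name_with_labels: value}.

    Assumes samples carry no trailing timestamp field.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


def stale_activities(samples, now, max_age_s=300):
    """Return last_activity series whose timestamp is older than max_age_s."""
    prefix = "cluster_autoscaler_last_activity"
    return sorted(
        key for key, ts in samples.items()
        if key.startswith(prefix) and now - ts > max_age_s
    )


samples = parse_metrics(SAMPLE)
stale = stale_activities(samples, now=1_700_000_100)
```

In this sample the main loop ran 100 seconds before `now`, which is fine, while the last scale-up activity is far older than the limit and is reported as stale.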
For dashboards and alerting setup, see Kubernetes Monitoring with Prometheus.
13. Common Issues and Troubleshooting
Issue 1: Scale-Down Blocked by Pod Annotations
Symptom: Nodes are consistently at low utilization but CA never removes them. CA logs show: pod ... has ClusterAutoscaler annotation preventing scale-down
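Fix: Audit your pods for safe-to-evict: "false" annotations. DaemonSet pods do not block scale-down. By default, pods with local storage, kube-system pods without a PDB, and pods not managed by a controller do block it; the first two are relaxed by the --skip-nodes-with-local-storage=false and --skip-nodes-with-system-pods=false settings used in the install examples above. Pods with the annotation set by hand need review.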
# Find all pods blocking scale-down
kubectl get pods --all-namespaces -o json | \
jq '.items[] | select(.metadata.annotations["cluster-autoscaler.kubernetes.io/safe-to-evict"]=="false") | .metadata.namespace + "/" + .metadata.name'
Issue 2: Unschedulable Pods Not Triggering Scale-Up
Symptom: Pods are Pending but CA is not adding nodes.
Common causes and checks:
- Pod requests exceed any single node type's capacity: check kubectl describe pod <pod> for the exact Insufficient cpu/memory message.
- Node group is already at maxSize: check kubectl get configmap cluster-autoscaler-status -n kube-system -o yaml.
- Pod has node affinity or a taint toleration that no node group can satisfy: CA only scales a node group whose new node would actually fit the pod.
- Missing resource requests on the pod: CA reasons about requests, not actual usage, so a pod without requests rarely becomes Unschedulable for lack of resources and does not trigger scale-up even when nodes are overloaded.
Issue 3: Nodes Cycling (Scale-Up Immediately Followed by Scale-Down)
Symptom: A new node appears, gets a few pods, then gets removed a few minutes later.
Fix: Increase --scale-down-delay-after-add to give pods time to stabilize (try 15–20 minutes). Also ensure pods have proper readiness probes so they report Ready before CA evaluates utilization.
Issue 4: CA Cannot Assume IAM Role
Symptom: CA logs show NoCredentialProviders: no valid providers in chain
Fix: Verify the Service Account annotation points to the correct IAM role ARN:
kubectl get serviceaccount cluster-autoscaler \
-n kube-system \
-o jsonpath='{.metadata.annotations}'
Expected output: {"eks.amazonaws.com/role-arn":"arn:aws:iam::ACCOUNT:role/ClusterAutoscalerRole"}
Summary
Kubernetes Cluster Autoscaler is essential infrastructure for any production EKS cluster. To recap the key points:
- CA scales nodes; HPA scales pods. They are complementary, so deploy both.
- CA triggers scale-up when pods are Unschedulable, and scale-down when node utilization stays below 50% for 10 minutes.
- Install via IRSA + Helm for proper IAM scoping and GitOps compatibility.
- Use the priority expander with spot-first node groups to cut compute costs (spot discounts can reach up to 90% over on-demand pricing).
- Define PodDisruptionBudgets for all production Deployments to ensure safe eviction during scale-down.
- Use placeholder pause pods at low priority to pre-warm capacity and hide most scale-up latency for burst traffic.
- Monitor via the cluster-autoscaler-status ConfigMap and Prometheus metrics.
- Consider migrating to Karpenter when you need faster provisioning, right-sizing, or automatic consolidation.
For a broader Kubernetes foundation, start with Kubernetes Complete Guide and Kubernetes Pods Guide. For cluster security hardening alongside autoscaling, see Kubernetes Security Best Practices. To manage CA installation as a Helm chart in your GitOps pipeline, see Kubernetes Helm Guide.