Kubernetes Cluster Autoscaler: Dynamic Node Provisioning (2026)

Running Kubernetes in production means dealing with unpredictable workload spikes. Pods queue up as Pending because no node has enough CPU or memory, while at night those same nodes sit at 5% utilization burning your cloud budget. The Cluster Autoscaler (CA) solves both problems: it adds nodes when pods cannot be scheduled and removes them when they have been idle long enough to be safe to drain.

This guide covers the complete lifecycle of CA on Amazon EKS in 2026 — from installation and IAM permissions to expander strategies, spot instance node groups, PodDisruptionBudget interactions, and a head-to-head comparison with Karpenter.

1. HPA vs VPA vs Cluster Autoscaler — Which Scales What

Kubernetes ships with three complementary autoscalers. Confusing them is the most common reason teams end up either wasting money or dropping traffic.

| Autoscaler | What It Scales | Metric Source | Best For |
| --- | --- | --- | --- |
| HPA (Horizontal Pod Autoscaler) | Replica count of a Deployment / StatefulSet | CPU, memory, custom metrics (KEDA) | Stateless web services, APIs |
| VPA (Vertical Pod Autoscaler) | CPU/memory requests of existing pods | Historical usage via Metrics Server | Batch jobs, ML training pods |
| Cluster Autoscaler | Number of nodes in a node group | Pending pods + node utilization | Node-level capacity management |

Key rule: HPA and CA work together. HPA scales pods up → pods go Pending → CA adds a node. HPA scales pods down → node utilization drops below threshold → CA removes the node.

See Kubernetes HPA Scaling and Kubernetes Resource Management for the pod-side of this equation.

2. How Cluster Autoscaler Works

CA runs as a Deployment inside your cluster (typically in the kube-system namespace). Every 10 seconds it evaluates two questions:

  1. Are there Pending pods? If yes, find a node group that could accommodate them and call the cloud provider API to add a node.
  2. Are any nodes underutilized? If a node has been below the utilization threshold for a configurable window, drain it and remove it.

Scale-Up Trigger

CA watches for pods in the Pending state with the condition reason: Unschedulable. It simulates placing those pods on each node group's hypothetical new node. If the simulation succeeds, it increments the node group's desired count by the minimum number of nodes needed to place all pending pods.
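
The simulation step can be pictured as a bin-packing problem. The sketch below is a simplified first-fit-decreasing estimate, not CA's actual estimator: `Pod` and `nodes_needed` are hypothetical names, the node capacity figures are assumed, and only CPU and memory requests are considered (affinity, taints, and DaemonSet overhead are ignored).

```python
from typing import NamedTuple


class Pod(NamedTuple):
    cpu_m: int   # CPU request in millicores
    mem_mi: int  # memory request in MiB


def nodes_needed(pending: list[Pod], node_cpu_m: int, node_mem_mi: int) -> int:
    """Estimate how many new nodes of one shape are needed (first-fit decreasing)."""
    free: list[list[int]] = []  # remaining [cpu, mem] on each hypothetical new node
    for pod in sorted(pending, key=lambda p: (p.cpu_m, p.mem_mi), reverse=True):
        if pod.cpu_m > node_cpu_m or pod.mem_mi > node_mem_mi:
            raise ValueError(f"{pod} can never fit on this node shape")
        for slot in free:
            if slot[0] >= pod.cpu_m and slot[1] >= pod.mem_mi:
                slot[0] -= pod.cpu_m
                slot[1] -= pod.mem_mi
                break
        else:
            # No hypothetical node has room left, so open a new one.
            free.append([node_cpu_m - pod.cpu_m, node_mem_mi - pod.mem_mi])
    return len(free)


# Assumed allocatable capacity of roughly a 4 vCPU / 16 GiB node.
pending = [Pod(1500, 2048), Pod(1500, 2048), Pod(1500, 2048), Pod(500, 512)]
print(nodes_needed(pending, node_cpu_m=3900, node_mem_mi=15000))  # prints 2
```

Two 1500m pods fill most of the first node, the third opens a second node, and the small pod fits into the leftover space on the first.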

Scale-Down Idle Threshold

A node is a candidate for removal when all of the following are true for at least --scale-down-unneeded-time (default 10 minutes):

  • The node's utilization, computed from pod requests relative to allocatable capacity (the higher of the CPU and memory ratios), is below --scale-down-utilization-threshold (default 50%)
  • All pods on the node can be safely evicted (no blocking PodDisruptionBudgets, no safe-to-evict: "false" annotations)
  • The node group is above its minimum size

Timing nuance: After a scale-up event, CA waits --scale-down-delay-after-add (default 10 minutes) before considering scale-down on any node. This prevents thrashing when new nodes are still receiving pods.
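
The scale-down conditions above can be combined into one predicate. This is a minimal sketch under assumptions: the function names are hypothetical, the eviction check (PDBs and annotations) is left out, and the defaults mirror the flag defaults quoted in this section.

```python
from datetime import timedelta


def node_utilization(cpu_req_m: int, cpu_alloc_m: int,
                     mem_req_mi: int, mem_alloc_mi: int) -> float:
    """Utilization from requests (not live usage); the higher of CPU and memory counts."""
    return max(cpu_req_m / cpu_alloc_m, mem_req_mi / mem_alloc_mi)


def is_scale_down_candidate(
    utilization: float,
    unneeded_for: timedelta,
    since_last_scale_up: timedelta,
    group_size: int,
    group_min: int,
    threshold: float = 0.5,                          # --scale-down-utilization-threshold
    unneeded_time: timedelta = timedelta(minutes=10),    # --scale-down-unneeded-time
    delay_after_add: timedelta = timedelta(minutes=10),  # --scale-down-delay-after-add
) -> bool:
    return (
        utilization < threshold
        and unneeded_for >= unneeded_time
        and since_last_scale_up >= delay_after_add
        and group_size > group_min
    )


# CPU is only ~31% requested, but memory is 60% requested, so the node stays.
util = node_utilization(1200, 3920, 9000, 15000)
print(is_scale_down_candidate(util, timedelta(minutes=30), timedelta(hours=1), 4, 2))  # prints False
```

The example shows why a node with idle CPU can still be kept: memory requests alone push it over the threshold.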

3. Installing Cluster Autoscaler on EKS

Step 1 — IAM Policy

CA needs permission to describe and modify Auto Scaling Groups. Create a policy and attach it via IRSA (IAM Roles for Service Accounts).

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribeAutoScalingInstances",
        "autoscaling:DescribeLaunchConfigurations",
        "autoscaling:DescribeScalingActivities",
        "autoscaling:DescribeTags",
        "autoscaling:SetDesiredCapacity",
        "autoscaling:TerminateInstanceInAutoScalingGroup",
        "ec2:DescribeLaunchTemplateVersions",
        "ec2:DescribeInstanceTypes",
        "eks:DescribeNodegroup"
      ],
      "Resource": "*"
    }
  ]
}

Step 2 — Create IRSA Role

eksctl create iamserviceaccount \
  --cluster=my-cluster \
  --namespace=kube-system \
  --name=cluster-autoscaler \
  --attach-policy-arn=arn:aws:iam::ACCOUNT_ID:policy/ClusterAutoscalerPolicy \
  --approve \
  --override-existing-serviceaccounts

Step 3 — Install via Helm

helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm repo update

helm install cluster-autoscaler autoscaler/cluster-autoscaler \
  --namespace kube-system \
  --set autoDiscovery.clusterName=my-cluster \
  --set awsRegion=us-east-1 \
  --set rbac.serviceAccount.create=false \
  --set rbac.serviceAccount.name=cluster-autoscaler \
  --set extraArgs.balance-similar-node-groups=true \
  --set extraArgs.skip-nodes-with-system-pods=false \
  --set extraArgs.scale-down-utilization-threshold=0.5 \
  --set extraArgs.scale-down-unneeded-time=10m \
  --set extraArgs.scale-down-delay-after-add=10m

Auto-discovery mode requires your node group ASGs to carry these tags (CA matches on the tag keys):

k8s.io/cluster-autoscaler/my-cluster = owned
k8s.io/cluster-autoscaler/enabled = true
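
A quick way to reason about the tag requirement is a small check over an ASG's tags. The helper below is a hypothetical illustration, not part of CA or the AWS SDK; it assumes you have already fetched the tags into a plain dictionary and that discovery depends on the tag keys being present.

```python
def missing_discovery_tags(asg_tags: dict[str, str], cluster_name: str) -> list[str]:
    """Return the auto-discovery tag keys that are absent from an ASG's tags."""
    required = [
        "k8s.io/cluster-autoscaler/enabled",
        f"k8s.io/cluster-autoscaler/{cluster_name}",
    ]
    return [key for key in required if key not in asg_tags]


# Hypothetical ASG that was tagged for a different cluster name.
tags = {
    "k8s.io/cluster-autoscaler/enabled": "true",
    "k8s.io/cluster-autoscaler/other-cluster": "owned",
}
print(missing_discovery_tags(tags, "my-cluster"))
# prints ['k8s.io/cluster-autoscaler/my-cluster']
```

An ASG with a missing key is simply invisible to CA, which usually shows up as pods staying Pending with no scale-up attempt in the logs.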

Deployment YAML (alternative to Helm)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    app: cluster-autoscaler
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler
      tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/control-plane
      containers:
        - image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.30.0
          name: cluster-autoscaler
          command:
            - ./cluster-autoscaler
            - --v=4
            - --stderrthreshold=info
            - --cloud-provider=aws
            - --skip-nodes-with-local-storage=false
            - --expander=least-waste
            - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
            - --balance-similar-node-groups
            - --scale-down-utilization-threshold=0.5
          env:
            - name: AWS_REGION
              value: us-east-1
          resources:
            limits:
              cpu: 100m
              memory: 600Mi
            requests:
              cpu: 100m
              memory: 600Mi

4. Node Group Configuration

CA reads each node group's minimum and maximum size from the underlying Auto Scaling Group and finds that ASG through the auto-discovery tags. Proper labeling lets you create specialized node groups (e.g., GPU nodes, high-memory nodes) and use node selectors / affinity rules to route workloads correctly.

# eksctl cluster config snippet
managedNodeGroups:
  - name: general-workers
    instanceType: m5.xlarge
    minSize: 2
    maxSize: 20
    desiredCapacity: 4
    labels:
      role: worker
      workload-type: general
    tags:
      k8s.io/cluster-autoscaler/enabled: "true"
      k8s.io/cluster-autoscaler/my-cluster: "owned"

  - name: gpu-workers
    instanceType: g4dn.xlarge
    minSize: 0
    maxSize: 5
    desiredCapacity: 0
    labels:
      role: gpu-worker
      nvidia.com/gpu: "true"
    taints:
      - key: nvidia.com/gpu
        value: "true"
        effect: NoSchedule
    tags:
      k8s.io/cluster-autoscaler/enabled: "true"
      k8s.io/cluster-autoscaler/my-cluster: "owned"

Setting minSize: 0 on the GPU node group allows CA to scale it all the way to zero during off-peak hours — a significant cost saving for batch ML workloads. See Kubernetes Taints and Tolerations for how to route only GPU pods to that node group.
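
When a node group sits at zero, CA has to decide from the group's labels and taints alone whether a pending pod could run there. The sketch below approximates that decision with plain dictionaries. It is a simplified model, not CA's scheduler simulation: the function names are hypothetical, only nodeSelector and tolerations are checked, and every taint is treated as blocking.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Simplified toleration matching for a single taint."""
    if toleration.get("effect") not in (None, taint["effect"]):
        return False
    if toleration.get("operator", "Equal") == "Exists":
        # Exists without a key tolerates every taint; with a key, only that key.
        return toleration.get("key") in (None, taint["key"])
    return (toleration.get("key") == taint["key"]
            and toleration.get("value") == taint.get("value"))


def pod_can_use_group(pod: dict, group: dict) -> bool:
    selector_ok = all(group.get("labels", {}).get(k) == v
                      for k, v in pod.get("nodeSelector", {}).items())
    taints_ok = all(any(tolerates(t, taint) for t in pod.get("tolerations", []))
                    for taint in group.get("taints", []))
    return selector_ok and taints_ok


gpu_group = {
    "labels": {"role": "gpu-worker", "nvidia.com/gpu": "true"},
    "taints": [{"key": "nvidia.com/gpu", "value": "true", "effect": "NoSchedule"}],
}
training_pod = {
    "nodeSelector": {"role": "gpu-worker"},
    "tolerations": [{"key": "nvidia.com/gpu", "operator": "Exists", "effect": "NoSchedule"}],
}
web_pod = {"nodeSelector": {}, "tolerations": []}

print(pod_can_use_group(training_pod, gpu_group))  # prints True
print(pod_can_use_group(web_pod, gpu_group))       # prints False
```

Only the training pod can wake the GPU group from zero; the web pod lacks the toleration, so CA would not scale that group for it.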

5. Scale-Up: Triggers, Node Group Selection, and Timing

If a requested node has not joined the cluster within --max-node-provision-time (default 15 minutes), CA treats that scale-up as failed, temporarily backs off the node group, and can try another one. Here is the normal scale-up flow:

  1. Scheduler marks pod as Unschedulable — no current node fits the pod's resource requests.
  2. CA's main loop (every 10 seconds) detects the Pending pod.
  3. CA simulates placing the pod on a new node from each eligible node group using the node group's instance type, labels, and taints.
  4. CA chooses the best node group according to the configured expander (see Section 8).
  5. CA calls the AWS Auto Scaling API to increment the ASG's desired count.
  6. The new EC2 instance registers with EKS, kubelet starts, node becomes Ready — typically 60–180 seconds.
  7. The Scheduler places the pending pod on the new node.
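
Important: CA decides from resource requests, not from live usage. A pod without resources.requests looks like it fits on any node, so it never becomes Unschedulable for lack of CPU or memory and never triggers a scale-up, even when the nodes are saturated. Always set CPU and memory requests on every container. See Kubernetes Resource Management.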

6. Scale-Down: Utilization Threshold, Delays, and Annotations

Scale-down is more conservative than scale-up because evicting pods carries risk. CA uses a multi-stage check:

  1. Utilization check: for CPU and for memory, CA divides the sum of pod requests on the node by the node's allocatable capacity and takes the higher of the two ratios. If that value is below --scale-down-utilization-threshold (default 0.5 = 50%), the node is "unneeded".
  2. Unneeded timer: the node must remain unneeded for --scale-down-unneeded-time (default 10m) before it is eligible for removal.
  3. Post-add delay: after any scale-up event, CA waits --scale-down-delay-after-add (default 10m) before starting scale-down evaluation across the whole cluster.
  4. Eviction check: CA dry-runs an eviction of every pod on the candidate node. If any pod would violate a PDB or has the annotation below, the node is skipped.

To prevent a pod from being evicted during scale-down:

metadata:
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"

Warning: Pods with safe-to-evict: "false" block the entire node from being scaled down, not just themselves. Use this annotation sparingly, typically only for pods running critical local state (e.g., a node-local cache that cannot be rebuilt quickly).

7. PodDisruptionBudgets and Scale-Down

A PodDisruptionBudget (PDB) tells Kubernetes the minimum number of replicas that must remain available during voluntary disruptions — including CA-initiated evictions.

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: frontend-pdb
  namespace: production
spec:
  minAvailable: 2          # at least 2 replicas must stay up
  selector:
    matchLabels:
      app: frontend

During scale-down, CA will attempt to evict all pods on the candidate node. If evicting a pod would violate any PDB, CA cancels the eviction for that node and moves on to the next candidate. The node remains in the cluster until the PDB allows the eviction.
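
The PDB arithmetic behind that decision is small enough to write down. This is a sketch with hypothetical function names; it models only a minAvailable budget given as an absolute number and assumes every pod on the node covered by the PDB must be evictable.

```python
def disruptions_allowed(healthy_replicas: int, min_available: int) -> int:
    """How many pods may be evicted right now without violating the PDB."""
    return max(0, healthy_replicas - min_available)


def node_blocked_by_pdb(pods_on_node: int, healthy_replicas: int, min_available: int) -> bool:
    """A node is skipped if draining it needs more evictions than the PDB allows."""
    return pods_on_node > disruptions_allowed(healthy_replicas, min_available)


# frontend-pdb from above: minAvailable = 2
print(node_blocked_by_pdb(pods_on_node=1, healthy_replicas=3, min_available=2))  # prints False
print(node_blocked_by_pdb(pods_on_node=1, healthy_replicas=2, min_available=2))  # prints True
print(node_blocked_by_pdb(pods_on_node=2, healthy_replicas=3, min_available=2))  # prints True
```

The second case is the classic trap: a Deployment running exactly minAvailable replicas can never be drained, so its nodes are never removed.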

Best practice: Always define a PDB for any Deployment with 2+ replicas in production. Without one, CA and manual node drains can evict several or all replicas at the same time. Rolling updates are a separate matter: they are governed by the Deployment's maxUnavailable and maxSurge settings, not by PDBs. Combine PDBs with HPA to ensure you always have spare replicas available.

See Kubernetes Deployments for rolling update strategies that work alongside PDBs.

8. Expanders: Choosing the Right Node Group

When multiple node groups could satisfy a pending pod, CA uses an expander to pick one. Set it with --expander=<name>.

| Expander | Selection Strategy | Best For |
| --- | --- | --- |
| least-waste | Pick the node group that wastes the least CPU/memory after placing pods | General production clusters (minimizes cost) |
| most-pods | Pick the node group that can schedule the most pods after scaling | Batch workloads with many small pods |
| random | Randomly pick an eligible node group | Testing, simple homogeneous clusters |
| priority | Rank node groups by a user-defined ConfigMap priority list | Spot-first strategies: prefer spot, fall back to on-demand |
| grpc | Delegate the decision to an external gRPC service | Custom business logic (e.g., compliance, geographic placement) |
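
To make the least-waste idea concrete, here is a rough scoring sketch. It is an approximation under assumptions, not CA's implementation: the group names and capacities are made up, and the score is simply the unused CPU fraction plus the unused memory fraction after placing the pending requests.

```python
import math


def waste_score(need_cpu_m: int, need_mem_mi: int, node_cpu_m: int, node_mem_mi: int) -> float:
    """Unused CPU fraction plus unused memory fraction; lower is better."""
    nodes = math.ceil(max(need_cpu_m / node_cpu_m, need_mem_mi / node_mem_mi))
    cpu_waste = 1 - need_cpu_m / (nodes * node_cpu_m)
    mem_waste = 1 - need_mem_mi / (nodes * node_mem_mi)
    return cpu_waste + mem_waste


def least_waste(groups: dict[str, tuple[int, int]], need_cpu_m: int, need_mem_mi: int) -> str:
    """Pick the group whose new nodes would be left least empty."""
    return min(groups, key=lambda name: waste_score(need_cpu_m, need_mem_mi, *groups[name]))


# Hypothetical node groups: name -> (CPU millicores, memory MiB) per node.
groups = {
    "general": (4000, 16384),
    "compute": (4000, 8192),
    "large": (16000, 65536),
}
print(least_waste(groups, need_cpu_m=3000, need_mem_mi=4096))  # prints compute
```

For a CPU-heavy batch of pods, the compute-shaped group wins because less memory would sit idle on the new node.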

Priority Expander Example

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander
  namespace: kube-system
data:
  priorities: |-
    100:
      - .*spot.*          # prefer any node group with "spot" in the name
    50:
      - .*on-demand.*     # fall back to on-demand if no spot capacity
    10:
      - .*               # catch-all

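With this ConfigMap and --expander=priority, CA tries the spot node group first and provisions on-demand nodes only when the spot group cannot scale, for example because it is at max size or a recent scale-up failed. The patterns are matched against node group names, which on AWS are the ASG names, so verify that your ASG names actually contain "spot" and "on-demand".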
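
The selection logic can be sketched in a few lines. This is an illustrative model, not CA's code: `pick_groups` is a hypothetical name, the ASG names are invented, and the candidate list is assumed to contain only node groups that are currently able to scale up.

```python
import re

# Same tiers as the ConfigMap above: higher number = higher priority.
PRIORITIES = {
    100: [r".*spot.*"],
    50: [r".*on-demand.*"],
    10: [r".*"],
}


def pick_groups(candidates: list[str], priorities: dict[int, list[str]]) -> list[str]:
    """Return the candidates matching the highest priority tier that matches anything."""
    for priority in sorted(priorities, reverse=True):
        matched = [name for name in candidates
                   if any(re.search(pattern, name) for pattern in priorities[priority])]
        if matched:
            return matched
    return candidates  # nothing matched any tier


both = ["eks-spot-mixed-1a2b", "eks-on-demand-base-3c4d"]
print(pick_groups(both, PRIORITIES))      # prints ['eks-spot-mixed-1a2b']

# Spot group is at max size, so it is no longer a candidate.
print(pick_groups(both[1:], PRIORITIES))  # prints ['eks-on-demand-base-3c4d']
```

If several groups share the winning tier, a real expander still has to break the tie; this sketch just returns all of them.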

9. Spot Instance Node Groups and Mixed Instance Policy

Spot instances offer up to 90% discount over on-demand pricing but can be interrupted with a 2-minute warning. A common pattern on EKS pairs a spot node group that spans several similarly sized instance types with a small on-demand node group that provides baseline capacity.

# eksctl managed node group with spot + on-demand mix
managedNodeGroups:
  - name: spot-mixed
    instanceTypes:
      - m5.xlarge
      - m5a.xlarge
      - m4.xlarge
      - m5d.xlarge
    spot: true
    minSize: 0
    maxSize: 30
    desiredCapacity: 5
    labels:
      lifecycle: spot
    tags:
      k8s.io/cluster-autoscaler/enabled: "true"
      k8s.io/cluster-autoscaler/my-cluster: "owned"

  - name: on-demand-base
    instanceType: m5.xlarge
    minSize: 2            # always keep 2 on-demand nodes as baseline
    maxSize: 10
    desiredCapacity: 2
    labels:
      lifecycle: on-demand
    tags:
      k8s.io/cluster-autoscaler/enabled: "true"
      k8s.io/cluster-autoscaler/my-cluster: "owned"

Spot interruption handling: EKS managed node groups, as used above, cordon and drain a spot node automatically when an interruption notice arrives. On self-managed node groups, install the AWS Node Termination Handler, which watches for spot interruption notices via EC2 metadata and cordons/drains the node before AWS reclaims it, giving pods time to reschedule gracefully.

Use --balance-similar-node-groups=true to have CA spread nodes evenly across similar node groups (for example, one node group per AZ). This matters for stateful workloads on EBS volumes, because a volume can only attach to nodes in its own AZ.
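
The effect of balancing can be illustrated with a toy allocation loop. This is a sketch of the idea only, under assumptions: the group names are hypothetical, max sizes are ignored, and each new node simply goes to the currently smallest group.

```python
def balance(sizes: dict[str, int], new_nodes: int) -> dict[str, int]:
    """Distribute new nodes so that similar node groups end up as even as possible."""
    result = dict(sizes)
    for _ in range(new_nodes):
        # Ties are broken by name so the outcome is deterministic.
        smallest = min(result, key=lambda group: (result[group], group))
        result[smallest] += 1
    return result


# Hypothetical per-AZ node groups before a scale-up of 3 nodes.
print(balance({"az-a": 3, "az-b": 1, "az-c": 2}, new_nodes=3))
# prints {'az-a': 3, 'az-b': 3, 'az-c': 3}
```

Without balancing, all three nodes could land in a single group, leaving one AZ thin when a zonal failure hits.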

10. Karpenter vs Cluster Autoscaler

Karpenter is an open-source node provisioner from AWS that operates at a lower level than CA — it provisions EC2 instances directly without managing Auto Scaling Groups. In 2026, both are production-ready; the choice depends on your requirements.

| Feature | Cluster Autoscaler | Karpenter |
| --- | --- | --- |
| Provisioning model | Pre-defined node groups (ASGs) | Dynamic, per-pod instance selection |
| Instance flexibility | Limited to node group instance types | Any EC2 instance type matching pod requirements |
| Scale-up latency | ~90–180 seconds | ~45–90 seconds (direct EC2 API) |
| Bin packing | Via expander heuristics | Native: picks right-sized instance for pods |
| Spot consolidation | Manual via multiple node groups | Built-in disruption controller |
| Cloud support | AWS, GCP, Azure, and more | AWS, Azure (preview), GCP (community) |
| Configuration complexity | Low: familiar Deployment + flags | Medium: NodePool + NodeClass CRDs |
| Migration effort | Baseline | Medium (need to replace node groups with NodePools) |

Stick with CA if: you already have well-tuned node groups, use a multi-cloud setup, or your team is not ready to adopt new CRDs.

Migrate to Karpenter if: you want faster scale-up, better spot diversity, or automatic node consolidation (Karpenter's disruption controller can replace two under-utilized nodes with one larger node without any manual configuration).

11. Overprovisioning with Placeholder Pause Pods

CA only adds nodes after pods go Pending, which introduces a delay (the time to provision and boot a new EC2 instance). For latency-sensitive workloads, you can pre-warm spare capacity using placeholder pause pods.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning
  namespace: kube-system
spec:
  replicas: 3                # 3 placeholders, each reserving one pod-sized slot of headroom
  selector:
    matchLabels:
      app: overprovisioning
  template:
    metadata:
      labels:
        app: overprovisioning
    spec:
      priorityClassName: overprovisioning   # low priority — real pods evict these
      terminationGracePeriodSeconds: 0
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          resources:
            requests:
              cpu: "1500m"        # size to match one typical real pod
              memory: "2Gi"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -1                    # below default (0) — real pods always win
globalDefault: false
description: "Placeholder pods for node overprovisioning"

When a real pod arrives and the cluster is full, the scheduler preempts a placeholder pod (lower priority) to make room. The placeholder goes Pending, CA sees it and provisions a new node, and the placeholder lands there, restoring the headroom so the next burst of real pods can start without waiting for a node to boot.

Cost trade-off: Each placeholder reserves only what it requests (1.5 vCPU and 2 GiB in the example above), and you pay for that idle capacity around the clock. Size placeholder requests to match your typical burst pod, and limit the replica count to 1–3 unless you have very aggressive burst requirements.
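
The amount of capacity held back is simple arithmetic. The sketch below uses hypothetical helper names and an assumed allocatable capacity for a 4 vCPU / 16 GiB node; it gives an upper bound on the nodes that placeholders alone could keep alive.

```python
import math


def reserved_headroom(replicas: int, cpu_m: int, mem_mi: int) -> tuple[int, int]:
    """Total CPU (millicores) and memory (MiB) held back by the placeholder pods."""
    return replicas * cpu_m, replicas * mem_mi


def max_nodes_kept_warm(replicas: int, cpu_m: int, mem_mi: int,
                        node_cpu_m: int, node_mem_mi: int) -> int:
    """Worst case: nodes occupied if placeholders shared nodes only with each other."""
    per_node = min(node_cpu_m // cpu_m, node_mem_mi // mem_mi)
    if per_node == 0:
        raise ValueError("placeholder is larger than one node")
    return math.ceil(replicas / per_node)


# Deployment above: 3 replicas of 1500m CPU / 2Gi memory.
print(reserved_headroom(3, 1500, 2048))                 # prints (4500, 6144)
print(max_nodes_kept_warm(3, 1500, 2048, 3920, 15000))  # prints 2
```

In practice placeholders share nodes with real pods, so the extra cost is usually closer to the reserved CPU and memory than to whole nodes.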

12. Monitoring Cluster Autoscaler

CA Logs

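# Stream CA logs in real time.
# The app=cluster-autoscaler label matches the Deployment YAML above. The Helm
# chart labels its pods differently; check with:
#   kubectl get pods -n kube-system --show-labels
kubectl logs -n kube-system \
  -l app=cluster-autoscaler \
  --follow --tail=100

# Filter for scale-up events (case-insensitive; matches "Scale-up" and "scale up")
kubectl logs -n kube-system \
  -l app=cluster-autoscaler \
  --tail=500 | grep -iE "scale[-_ ]?up"

# Filter for scale-down events, including node removals
kubectl logs -n kube-system \
  -l app=cluster-autoscaler \
  --tail=500 | grep -iE "scale[-_ ]?down"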

Status ConfigMap

# CA keeps its current status in this ConfigMap and refreshes it as its main loop runs
kubectl get configmap cluster-autoscaler-status \
  -n kube-system \
  -o yaml

The status ConfigMap includes node group status (healthy/unhealthy), last scale-up time, nodes that are candidates for removal, and any errors CA encountered.

Key Prometheus Metrics

CA exposes metrics on port 8085. Scrape them with your Prometheus instance and alert on these key signals:

| Metric | Description |
| --- | --- |
| cluster_autoscaler_nodes_count | Current node count per state (for example ready, unready, notStarted) |
| cluster_autoscaler_unschedulable_pods_count | Number of unschedulable pods CA currently sees; should trend to 0 |
| cluster_autoscaler_scale_down_in_cooldown | 1 if scale-down is currently blocked by a cooldown period |
| cluster_autoscaler_skipped_scale_events_count | Scale events skipped due to max node group size or other limits |
| cluster_autoscaler_last_activity | Timestamp of the last CA activity of each type; alert if stale |
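
If you want to sanity-check the endpoint before wiring up Prometheus, the text exposition format is easy to read by hand. The sketch below parses a made-up sample rather than real CA output; `parse_metrics` is a hypothetical helper, and the label sets on a live cluster may differ from the ones shown.

```python
# Hypothetical sample in Prometheus text format; values are invented.
SAMPLE = """\
# TYPE cluster_autoscaler_unschedulable_pods_count gauge
cluster_autoscaler_unschedulable_pods_count 3
# TYPE cluster_autoscaler_nodes_count gauge
cluster_autoscaler_nodes_count{state="ready"} 7
cluster_autoscaler_nodes_count{state="unready"} 1
"""


def parse_metrics(text: str) -> dict[str, float]:
    """Map each 'name{labels}' series to its value, skipping comments and blanks."""
    metrics: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        series, _, value = line.rpartition(" ")
        metrics[series] = float(value)
    return metrics


metrics = parse_metrics(SAMPLE)
if metrics["cluster_autoscaler_unschedulable_pods_count"] > 0:
    print("pods are waiting for capacity")
```

A proper alert would also require the condition to hold for several minutes, since a brief spike during a normal scale-up is expected.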

For dashboards and alerting setup, see Kubernetes Monitoring with Prometheus.

13. Common Issues and Troubleshooting

Issue 1: Scale-Down Blocked by Pod Annotations

Symptom: Nodes are consistently at low utilization but CA never removes them. CA logs show: pod ... has ClusterAutoscaler annotation preventing scale-down

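Fix: Audit your pods for safe-to-evict: "false" annotations and review any that were set by hand. DaemonSet pods do not block scale-down. By default, CA also refuses to remove nodes running pods with local storage (--skip-nodes-with-local-storage), kube-system pods without a PDB (--skip-nodes-with-system-pods), or pods not managed by a controller; that behavior is expected.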

# Find all pods blocking scale-down
kubectl get pods --all-namespaces -o json | \
  jq '.items[] | select(.metadata.annotations["cluster-autoscaler.kubernetes.io/safe-to-evict"]=="false") | .metadata.namespace + "/" + .metadata.name'
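
If jq is not available, the same filter is straightforward in Python. This sketch assumes the input is the JSON produced by kubectl get pods --all-namespaces -o json; the function name and the sample pods are hypothetical.

```python
import json

ANNOTATION = "cluster-autoscaler.kubernetes.io/safe-to-evict"


def blocking_pods(pod_list_json: str) -> list[str]:
    """Return namespace/name for every pod annotated safe-to-evict: "false"."""
    blockers = []
    for item in json.loads(pod_list_json)["items"]:
        meta = item["metadata"]
        # Pods without any annotations have no "annotations" key at all.
        if (meta.get("annotations") or {}).get(ANNOTATION) == "false":
            blockers.append(f'{meta["namespace"]}/{meta["name"]}')
    return blockers


# Minimal stand-in for kubectl output.
sample = json.dumps({"items": [
    {"metadata": {"namespace": "prod", "name": "cache-0",
                  "annotations": {ANNOTATION: "false"}}},
    {"metadata": {"namespace": "prod", "name": "web-1"}},
    {"metadata": {"namespace": "batch", "name": "job-7",
                  "annotations": {ANNOTATION: "true"}}},
]})
print(blocking_pods(sample))  # prints ['prod/cache-0']
```

To run it against a live cluster, read the kubectl output from standard input instead of the sample string.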

Issue 2: Unschedulable Pods Not Triggering Scale-Up

Symptom: Pods are Pending but CA is not adding nodes.

Common causes and checks:

  • Pod requests exceed any single node type's capacity — check kubectl describe pod <pod> for the exact Insufficient cpu/memory message.
  • Node group is already at maxSize — check kubectl get configmap cluster-autoscaler-status -n kube-system -o yaml.
  • Pod has node affinity or a taint toleration that no node group can satisfy — CA cannot schedule across incompatible node groups.
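  • Missing resource requests: pods without requests are never Pending for lack of CPU or memory, so a cluster overloaded by such pods produces no scale-up signal. Set requests so that CA can see the demand.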

Issue 3: Nodes Cycling (Scale-Up Immediately Followed by Scale-Down)

Symptom: A new node appears, gets a few pods, then gets removed a few minutes later.

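Fix: Increase --scale-down-delay-after-add to give pods time to stabilize (try 15–20 minutes). Because CA measures utilization from pod requests rather than live usage, also check that the pods landing on the new node request enough CPU or memory to keep it above --scale-down-utilization-threshold; if they do not, lower the threshold or raise --scale-down-unneeded-time.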

Issue 4: CA Cannot Assume IAM Role

Symptom: CA logs show NoCredentialProviders: no valid providers in chain

Fix: Verify the Service Account annotation points to the correct IAM role ARN:

kubectl get serviceaccount cluster-autoscaler \
  -n kube-system \
  -o jsonpath='{.metadata.annotations}'

Expected output: {"eks.amazonaws.com/role-arn":"arn:aws:iam::ACCOUNT:role/ClusterAutoscalerRole"}

Summary

Kubernetes Cluster Autoscaler is essential infrastructure for any production EKS cluster. To recap the key points:

  • CA scales nodes; HPA scales pods. They are complementary — deploy both.
  • CA triggers scale-up when pods are Unschedulable, and scale-down when node utilization stays below 50% for 10 minutes.
  • Install via IRSA + Helm for proper IAM scoping and GitOps compatibility.
  • Use the priority expander with spot-first node groups to cut compute costs; spot discounts reach up to 90% over on-demand, and actual savings depend on your workload mix.
  • Define PodDisruptionBudgets for all production Deployments to ensure safe eviction during scale-down.
  • Use placeholder pause pods at low priority to pre-warm capacity and eliminate scale-up latency for burst traffic.
  • Monitor via the cluster-autoscaler-status ConfigMap and Prometheus metrics.
  • Consider migrating to Karpenter when you need faster provisioning, right-sizing, or automatic consolidation.

For a broader Kubernetes foundation, start with Kubernetes Complete Guide and Kubernetes Pods Guide. For cluster security hardening alongside autoscaling, see Kubernetes Security Best Practices. To manage CA installation as a Helm chart in your GitOps pipeline, see Kubernetes Helm Guide.