Kubernetes Cost Optimization: Right-Sizing and Spot Nodes

Kubernetes makes it easy to run workloads at scale, but without deliberate cost management it makes it just as easy to waste as much as 50-70% of your cloud compute budget on over-provisioned pods and idle nodes. A systematic cost optimisation program, combining right-sized resource requests, spot/preemptible instances, cluster autoscaling, and waste detection tooling, can reduce Kubernetes infrastructure spend by 40-70% without sacrificing reliability or performance.

Cost Visibility with Kubecost
Right-Sizing with VPA Recommendations
Spot and Preemptible Nodes
Cluster Autoscaler Configuration
Karpenter: Next-Generation Node Provisioning
Eliminating Idle and Zombie Workloads
Cost-Aware Scheduling Strategies
Reserved Instances and Savings Plans

Cost Visibility with Kubecost

You cannot optimise what you cannot measure. Kubecost allocates cloud spend down to the namespace, deployment, and pod level, giving teams visibility into exactly what each workload costs per day.

# Install Kubecost with Helm
helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm repo update

helm upgrade --install kubecost kubecost/cost-analyzer \
  --namespace kubecost \
  --create-namespace \
  --set kubecostToken="your-token" \
  --set global.prometheus.enabled=true

# Access the Kubecost UI
kubectl port-forward -n kubecost svc/kubecost-cost-analyzer 9090:9090

Key Kubecost reports to review weekly:

Namespace allocation: Which teams are spending the most? Are costs trending up unexpectedly?
Efficiency score: Ratio of actual usage to requested resources. Scores below 50% mean workloads use less than half of what they request, a sign of significant over-provisioning.
Savings opportunities: Kubecost identifies specific deployments and namespaces where right-sizing would reduce cost.
Idle cost: Node capacity you pay for that no workload has requested or used. In many clusters this is the largest single source of Kubernetes waste.
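To make the efficiency and idle-cost ideas concrete, here is a minimal Python sketch for CPU only. It computes efficiency as actual usage divided by request, and it assumes, as a simplifying convention, that a workload's allocation is the larger of its request and its usage, with idle cost being the share of a node's price not covered by any allocation. The `Workload` class, the helper names, and the prices are hypothetical; Kubecost's own model is more detailed (memory, GPU, storage, shared costs).

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    cpu_request: float  # cores requested in the pod spec
    cpu_usage: float    # average cores actually used (e.g. from Prometheus)


def efficiency(w: Workload) -> float:
    """Actual usage divided by request: 1.0 means the request matches real usage."""
    if w.cpu_request <= 0:
        raise ValueError(f"{w.name}: CPU request must be positive")
    return w.cpu_usage / w.cpu_request


def over_provisioned(workloads, threshold: float = 0.5) -> list:
    """Names of workloads using less than `threshold` of what they request."""
    return [w.name for w in workloads if efficiency(w) < threshold]


def idle_cost(node_cpu_capacity: float, node_hourly_cost: float, workloads) -> float:
    """Hourly cost of node CPU capacity that no workload has been allocated."""
    # Assumption: allocation = max(request, usage), so a pod bursting above its
    # request is charged for what it actually consumed.
    allocated = sum(max(w.cpu_request, w.cpu_usage) for w in workloads)
    idle_fraction = max(0.0, 1 - allocated / node_cpu_capacity)
    return node_hourly_cost * idle_fraction


api = Workload("api", cpu_request=2.0, cpu_usage=0.5)
worker = Workload("worker", cpu_request=1.0, cpu_usage=0.9)
print(over_provisioned([api, worker]))      # ['api']
print(idle_cost(8.0, 0.40, [api, worker]))  # cost of the 5 unallocated cores out of 8
```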

OpenCost: Kubecost open-sourced its core cost allocation engine as OpenCost, now a CNCF project. For basic cost visibility without the Kubecost UI, OpenCost can be deployed standalone and queried via Prometheus metrics.

Right-Sizing with VPA Recommendations

Vertical Pod Autoscaler (VPA) in recommendation mode analyses actual CPU and memory usage over time and suggests right-sized resource requests. Workloads are commonly over-provisioned by 3-5x because developers set "safe" requests without usage data to base them on.

# Install VPA
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh

# Deploy VPA in recommendation-only mode (does not modify pods)
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Off"    # Recommendation only — do not auto-update pods

# Check VPA recommendations
kubectl describe vpa api-server-vpa -n production

# Output includes:
# Recommendation:
#   Container Recommendations:
#     Container Name: api-server
#     Lower Bound:
#       Cpu:     50m
#       Memory:  128Mi
#     Target:
#       Cpu:     230m      # VPA recommended request
#       Memory:  380Mi
#     Upper Bound:
#       Cpu:     1500m
#       Memory:  2Gi

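If the current deployment requests 2 CPU and 2Gi of memory but VPA recommends 230m and 380Mi, you are paying for roughly 8.7x the CPU and 5.4x the memory the workload actually needs. Lower requests only save money once the freed capacity lets the cluster run fewer nodes, which is why right-sizing goes hand in hand with the autoscaling covered below; applied across a whole cluster, VPA-driven right-sizing can reduce compute costs by 40-60%.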
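A small Python sketch of that arithmetic, parsing Kubernetes resource quantities as they appear in the VPA output above. It only understands millicores, plain numbers, and the binary memory suffixes (Ki, Mi, Gi, Ti); decimal suffixes such as M or G and exponent forms are deliberately left out to keep it short. The function names are hypothetical helpers, not part of any Kubernetes client library.

```python
# Binary suffixes only; decimal ones (k, M, G) are not handled in this sketch.
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_cpu(quantity: str) -> float:
    """Convert a CPU quantity to cores: '2' -> 2.0, '230m' -> 0.23."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000  # millicores
    return float(quantity)


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity with a binary suffix (or plain bytes) to bytes."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def over_provisioning(request_cpu, request_mem, target_cpu, target_mem):
    """How many times larger the current requests are than VPA's Target."""
    return (
        parse_cpu(request_cpu) / parse_cpu(target_cpu),
        parse_memory(request_mem) / parse_memory(target_mem),
    )


cpu_x, mem_x = over_provisioning("2", "2Gi", "230m", "380Mi")
print(f"CPU {cpu_x:.1f}x, memory {mem_x:.1f}x")  # CPU 8.7x, memory 5.4x
```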

Spot and Preemptible Nodes

Spot instances (AWS) and Spot/preemptible VMs (GCP) offer discounts of 60-90% compared to on-demand pricing, with the trade-off that the cloud provider can reclaim them at short notice: two minutes on AWS, about 30 seconds on GCP. For fault-tolerant workloads, such as stateless services with several replicas or batch jobs that can retry, this trade-off is almost always worth taking.

# AWS EKS — Spot node group via eksctl
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: production
  region: us-east-1
nodeGroups:
  # On-demand node group for critical and system workloads (CoreDNS, controllers).
  # Left untainted: if every node group were tainted, pods without tolerations,
  # such as CoreDNS, would have nowhere to schedule.
  - name: on-demand-critical
    instanceType: m5.xlarge
    desiredCapacity: 3
    minSize: 3
    maxSize: 5
    labels:
      node-type: on-demand

  # Spot node group for fault-tolerant workloads
  - name: spot-workers
    instancesDistribution:
      maxPrice: 0.20
      instanceTypes: ["m5.xlarge", "m5a.xlarge", "m4.xlarge", "m5d.xlarge"]
      onDemandBaseCapacity: 0
      onDemandPercentageAboveBaseCapacity: 0
      spotAllocationStrategy: capacity-optimized
    desiredCapacity: 5
    minSize: 0
    maxSize: 20
    labels:
      node-type: spot
    taints:
      - key: spot
        value: "true"
        effect: NoSchedule

# Tolerate spot nodes in your deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-processor
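spec:
  replicas: 10
  selector:
    matchLabels:
      app: batch-processor
  template:
    metadata:
      labels:
        app: batch-processor    # must match the selector and the spread constraint below
    spec:
      tolerations:
        - key: spot
          operator: Equal
          value: "true"
          effect: NoSchedule
      nodeSelector:
        node-type: spot
      # Spread across AZs to reduce spot interruption impact
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: batch-processor
      containers:
        - name: batch-processor
          image: example.com/batch-processor:latest    # placeholder image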

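Spot interruption handling: Deploy the AWS Node Termination Handler (or your cloud's equivalent) to cordon and drain a spot node as soon as the two-minute interruption notice arrives. Draining evicts pods through the Kubernetes eviction API, so each pod receives SIGTERM and its termination grace period instead of dying abruptly with the instance, which greatly reduces the number of in-flight requests dropped on spot reclamation.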
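On the workload side, a drain only helps if the process reacts to SIGTERM. Below is a minimal Python sketch, assuming a hypothetical worker that processes jobs one at a time: the signal handler only sets a flag, and the loop finishes the job in hand before stopping. Keep `terminationGracePeriodSeconds` (30 seconds by default) comfortably inside the interruption notice so the kubelet does not fall back to SIGKILL before the node disappears.

```python
import signal
import threading


class GracefulWorker:
    """Finishes the job in hand when SIGTERM arrives, then stops taking new work."""

    def __init__(self) -> None:
        self._stop = threading.Event()
        self.completed = []

    def install_handler(self) -> None:
        # Must be called from the main thread. When a pod is evicted (e.g. during a
        # node drain), the container's main process receives SIGTERM, followed by
        # SIGKILL once terminationGracePeriodSeconds has elapsed.
        signal.signal(signal.SIGTERM, self._on_sigterm)

    def _on_sigterm(self, signum, frame) -> None:
        self._stop.set()  # only set a flag; the run loop decides when to stop

    def run(self, jobs) -> None:
        for job in jobs:
            if self._stop.is_set():
                # Stop pulling work. With a real queue (an assumption here),
                # unacknowledged jobs would be redelivered to another replica.
                break
            self.process(job)

    def process(self, job) -> None:
        self.completed.append(job)  # stand-in for real work
```

In a real service you would call `install_handler()` once at start-up and then `run()` over whatever job source the service consumes.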

Cluster Autoscaler Configuration

Cluster Autoscaler adds nodes when pods are pending due to insufficient resources, and removes underutilised nodes after a configurable cooldown period. Proper configuration prevents both under-provisioning (pod scheduling failures) and over-provisioning (idle nodes).

# Cluster Autoscaler deployment (key flags)
containers:
  - command:
    - ./cluster-autoscaler
    - --cloud-provider=aws
    - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/production
    - --balance-similar-node-groups=true    # spread evenly across AZs
    - --skip-nodes-with-system-pods=false
    - --skip-nodes-with-local-storage=false
    - --scale-down-enabled=true
    - --scale-down-utilization-threshold=0.5    # scale-down candidate when pod CPU and memory requests are both below 50% of allocatable
    - --scale-down-delay-after-add=5m
    - --scale-down-unneeded-time=10m
    - --max-graceful-termination-sec=300
    - --expander=least-waste    # pick the node group that wastes fewest resources
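The `--scale-down-utilization-threshold` check works on requests, not live usage: a node only becomes a scale-down candidate when both the summed CPU requests and the summed memory requests of its pods are below the threshold fraction of its allocatable capacity. A minimal Python sketch of just that first check follows. The `Node` fields are hypothetical, and the real Cluster Autoscaler additionally verifies that every pod can be rescheduled elsewhere and that the node has stayed unneeded for `--scale-down-unneeded-time`.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    allocatable_cpu_m: int   # millicores
    allocatable_mem_mi: int  # MiB
    requested_cpu_m: int     # sum of CPU requests of pods on the node
    requested_mem_mi: int    # sum of memory requests of pods on the node


def request_utilisation(node: Node) -> float:
    """Higher of the CPU and memory request ratios; live usage is ignored."""
    cpu = node.requested_cpu_m / node.allocatable_cpu_m
    mem = node.requested_mem_mi / node.allocatable_mem_mi
    return max(cpu, mem)


def scale_down_candidates(nodes, threshold: float = 0.5) -> list:
    """Nodes whose CPU and memory requests are both under the threshold."""
    return [n.name for n in nodes if request_utilisation(n) < threshold]


nodes = [
    Node("a", 4000, 16000, 1500, 4000),   # 37.5% CPU, 25% memory
    Node("b", 4000, 16000, 1500, 10000),  # 37.5% CPU, 62.5% memory
]
print(scale_down_candidates(nodes))  # ['a']
```

Node `b` is kept even though its CPU requests are low, because its memory requests exceed the threshold.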

Karpenter: Next-Generation Node Provisioning

Karpenter (built and open-sourced by AWS, and since donated to the Kubernetes project under SIG Autoscaling) is a more flexible alternative to Cluster Autoscaler. Rather than scaling predefined node groups, Karpenter reads the requirements of pending pods and provisions a well-fitting EC2 instance directly, choosing instance family, size, and Spot vs on-demand capacity dynamically.
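As a rough illustration of that idea, the Python sketch below picks the cheapest offering that satisfies an aggregate CPU and memory requirement and an allowed set of capacity types. It is not Karpenter's algorithm, which also bin-packs many pending pods, honours every NodePool requirement, and relies on EC2 allocation strategies for Spot. The instance names and prices are made up purely for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Offering:
    instance_type: str  # hypothetical name
    capacity_type: str  # "spot" or "on-demand"
    cpu_m: int          # millicores
    mem_mi: int         # MiB
    hourly_usd: float   # made-up price, for illustration only


def cheapest_fit(offerings, cpu_m: int, mem_mi: int, capacity_types) -> Optional[Offering]:
    """Cheapest offering that fits the requested CPU/memory and an allowed capacity type."""
    fits = [
        o for o in offerings
        if o.capacity_type in capacity_types and o.cpu_m >= cpu_m and o.mem_mi >= mem_mi
    ]
    return min(fits, key=lambda o: o.hourly_usd, default=None)


catalog = [
    Offering("small", "spot", 2000, 8192, 0.03),
    Offering("medium", "spot", 4000, 16384, 0.06),
    Offering("medium", "on-demand", 4000, 16384, 0.19),
    Offering("large", "on-demand", 8000, 32768, 0.38),
]
print(cheapest_fit(catalog, 3000, 8000, {"spot", "on-demand"}))
```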

# Karpenter NodePool — provision mixed Spot and on-demand nodes
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]    # arm64 (e.g. Graviton) requires multi-arch container images
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]    # compute-optimised, general purpose, memory-optimised families
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["2"]              # only use 3rd gen+ instances
      nodeClassRef:
        group: karpenter.k8s.aws    # karpenter.sh/v1 uses group, not apiVersion
        kind: EC2NodeClass
        name: default
  limits:
    cpu: 1000          # max total CPU across all Karpenter nodes
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized   # v1 name for the former WhenUnderutilized
    consolidateAfter: 30s   # very aggressive consolidation for cost savings

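Karpenter's consolidation feature continuously looks for nodes it can delete, or replace with a cheaper instance, once their pods fit elsewhere as workloads scale down. Cluster Autoscaler can also remove underutilised nodes, but only within existing node groups; it never swaps a node for a cheaper instance type. In clusters with variable load, consolidation alone can reduce node count by 20-30%.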
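The core question behind deleting a node during consolidation is whether all of its pods fit into the spare capacity of the remaining nodes. Here is a minimal Python sketch using a greedy first-fit-decreasing heuristic over CPU and memory requests. This heuristic is a simplification chosen for illustration, not Karpenter's implementation, which also simulates scheduling constraints (affinity, topology spread, taints), respects PodDisruptionBudgets, and considers replacing a node with a cheaper one.

```python
def can_drain(node_pods, free_capacity) -> bool:
    """True if every pod on the candidate node fits into spare capacity elsewhere.

    node_pods: list of (cpu_m, mem_mi) requests for pods on the candidate node.
    free_capacity: list of (cpu_m, mem_mi) spare capacity on each remaining node.
    """
    free = [list(slot) for slot in free_capacity]  # copy: the caller's data is not mutated
    for cpu, mem in sorted(node_pods, reverse=True):  # largest CPU request first
        for slot in free:
            if slot[0] >= cpu and slot[1] >= mem:
                slot[0] -= cpu
                slot[1] -= mem
                break
        else:
            return False  # no remaining node can host this pod
    return True


pods = [(500, 512), (1000, 1024)]
print(can_drain(pods, [(1200, 2048), (600, 600)]))  # True
print(can_drain(pods, [(800, 2048), (600, 600)]))   # False
```

A greedy heuristic can miss a valid packing, but that errs on the safe side: the node is simply kept.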

Eliminating Idle and Zombie Workloads

In mature clusters, a significant share of running pods, sometimes 15-30%, serve zero or negligible traffic: forgotten development environments, stale preview deployments, or abandoned experiments. Identifying and removing these is effectively free cost savings.

# Find deployments scaled to 0 replicas: they run no pods, but often mark abandoned
# apps whose Services, load balancers, or persistent volumes still cost money
kubectl get deployments -A -o json | \
  jq '.items[] | select(.spec.replicas == 0) | {ns: .metadata.namespace, name: .metadata.name}'

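# Find pods with near-zero CPU usage (Prometheus query)
# kubectl port-forward -n monitoring svc/prometheus 9090:9090   (service name varies by installation)
# Then query pods averaging under 1 millicore of CPU over the past 7 days:
# sum by (namespace, pod) (avg_over_time(rate(container_cpu_usage_seconds_total{container!=""}[5m])[7d:5m])) < 0.001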

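# Scale down non-production deployments outside business hours
# Use a CronJob or a tool like Kube-Downscaler.
# Assumes a Helm repository serving a kube-downscaler chart was added under the alias "codeberg".
# Only an uptime window is set: workloads are scaled down whenever it is not in effect.
# Caution: the default uptime applies to every namespace the downscaler manages, so exclude
# production workloads (for example with the downscaler/exclude: "true" annotation).
helm upgrade --install kube-downscaler codeberg/kube-downscaler \
  --set args[0]="--interval=60" \
  --set args[1]="--default-uptime=Mon-Fri 08:00-20:00 Europe/London" \
  --namespace kube-system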
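To show how an uptime window such as `Mon-Fri 08:00-20:00 Europe/London` is evaluated, here is a hypothetical Python re-implementation of the simple `<days> <HH:MM>-<HH:MM> <timezone>` form. It is not kube-downscaler's code: it assumes a single day range that does not wrap past Sunday, an end time before 24:00, and Python 3.9+ with a system timezone database for `zoneinfo`.

```python
from datetime import datetime, time
from zoneinfo import ZoneInfo

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def in_uptime(spec: str, now: datetime) -> bool:
    """True if the timezone-aware `now` falls inside a window like 'Mon-Fri 08:00-20:00 Europe/London'."""
    day_range, time_range, tz_name = spec.split()
    first_day, last_day = (DAYS.index(day) for day in day_range.split("-"))
    start, end = (time.fromisoformat(t) for t in time_range.split("-"))
    local = now.astimezone(ZoneInfo(tz_name))  # evaluate the window in its own timezone
    return first_day <= local.weekday() <= last_day and start <= local.time() < end


window = "Mon-Fri 08:00-20:00 Europe/London"
utc = ZoneInfo("UTC")
print(in_uptime(window, datetime(2024, 1, 15, 9, 0, tzinfo=utc)))   # True: Monday 09:00 GMT
print(in_uptime(window, datetime(2024, 1, 13, 10, 0, tzinfo=utc)))  # False: Saturday
```

Converting to the window's own timezone matters: 07:30 UTC on a summer Monday is 08:30 in London and therefore inside the window.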

Cost-Aware Scheduling Strategies

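Bin-packing pods onto fewer nodes improves utilisation and, once the autoscaler removes the nodes left empty, reduces node count. By default, kube-scheduler's NodeResourcesFit plugin scores nodes with the LeastAllocated strategy, which favours the emptiest node and so spreads pods out: better for resilience but worse for cost. The MostAllocated strategy favours the fullest node that still fits the pod, packing pods densely.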

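# KubeSchedulerConfiguration: enable MostAllocated for cost savings.
# Managed control planes (EKS, GKE, AKS) generally do not let you change the default
# scheduler's configuration; there, run this as a second scheduler with its own schedulerName.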
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: default-scheduler
    pluginConfig:
      - name: NodeResourcesFit
        args:
          scoringStrategy:
            type: MostAllocated    # pack densely rather than spread
            resources:
              - name: cpu
                weight: 1
              - name: memory
                weight: 1

For workloads that don't need dedicated nodes, pair dense packing with pod topology spread constraints (for example `topologyKey: topology.kubernetes.io/zone`) so that replicas still spread across availability zones as nodes fill up.

Reserved Instances and Savings Plans

After right-sizing and Spot adoption, the remaining on-demand baseline should be covered with Reserved Instances or Savings Plans for an additional 30-60% discount on that portion of compute.

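Savings Plans (AWS): Compute Savings Plans commit to a dollar/hour spend level that applies across EC2 instance families, sizes, and regions (as well as Fargate and Lambda). They are more flexible than Reserved Instances and usually the better fit for Kubernetes, because the instance types your nodes run on change as pods are rescheduled and autoscalers pick different instances.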
Committed Use Discounts (GCP): 1-year or 3-year commitments for vCPUs and memory in a specific region.
Reserved VM Instances (Azure): 1 or 3-year commitment for specific VM sizes.

Target 70-80% of your steady-state on-demand compute with Reserved Instances or Savings Plans, leaving 20-30% uncovered so that further right-sizing, more Spot adoption, or a drop in load does not leave you paying for commitments you no longer use. Review commitments quarterly as cluster size changes.
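A minimal Python sketch of that sizing rule, assuming you can export hourly on-demand spend for a recent lookback window (for example from billing data) and treating its 10th percentile as the steady-state baseline. Both the percentile choice and the sample data are illustrative assumptions, not a provider recommendation; cloud billing consoles also offer their own commitment recommendations.

```python
import statistics


def suggested_commitment(hourly_on_demand_usd, coverage: float = 0.75) -> float:
    """Hourly commitment covering `coverage` of the steady-state on-demand baseline.

    Assumption: the steady-state baseline is the 10th percentile of hourly
    on-demand spend over the lookback window, so short-lived peaks are excluded.
    """
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    deciles = statistics.quantiles(hourly_on_demand_usd, n=10, method="inclusive")
    baseline = deciles[0]  # 10th percentile
    return baseline * coverage


# A week of hypothetical hourly spend: a $100/h floor with $60/h daytime peaks.
week = ([100.0] * 12 + [160.0] * 12) * 7
print(f"${suggested_commitment(week):.2f}/h")  # $75.00/h
```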

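Combined strategy target: A mature Kubernetes cost optimisation program can achieve, for example, a 40% reduction from right-sizing, a further 30% off the remaining bill from Spot adoption, and a further 25% off what is left from reserved coverage of the on-demand baseline. Because each step applies to the already-reduced bill, the savings compound rather than add: 0.60 × 0.70 × 0.75 = 0.315 of the original spend remains, a combined reduction of about 68.5%, in line with a 60-70% reduction of the original cloud compute bill.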
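The compounding is easy to check: each percentage applies to the bill left over by the previous step, so the remaining share is the product of the (1 - reduction) terms. A short Python sketch using the illustrative 40/30/25% figures; naively adding them would wrongly suggest a 95% saving. The function name is a hypothetical helper.

```python
from functools import reduce


def combined_reduction(stage_reductions) -> float:
    """Total saving when each stage cuts a fraction of the spend left by earlier stages."""
    remaining = reduce(lambda share, cut: share * (1 - cut), stage_reductions, 1.0)
    return 1 - remaining


stages = [0.40, 0.30, 0.25]  # right-sizing, Spot, reserved coverage
total = combined_reduction(stages)
print(f"{total:.1%}")                          # 68.5%
print(f"${100_000 * (1 - total):,.0f}/month")  # $31,500/month left from a $100,000/month bill
```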