Kubernetes Cost Optimization: Right-Sizing and Spot Nodes
Kubernetes makes it easy to run workloads at scale, but without deliberate cost management it makes it just as easy to waste as much as 50-70% of your cloud compute budget on over-provisioned pods and idle nodes. A systematic cost optimisation program, combining right-sized resource requests, spot/preemptible instances, cluster autoscaling, and waste detection tooling, can reduce Kubernetes infrastructure spend by 40-70% without sacrificing reliability or performance.
Cost Visibility with Kubecost
You cannot optimise what you cannot measure. Kubecost allocates cloud spend down to the namespace, deployment, and pod level, giving teams visibility into exactly what each workload costs per day.
# Install Kubecost with Helm
helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm repo update
helm upgrade --install kubecost kubecost/cost-analyzer \
--namespace kubecost \
--create-namespace \
--set kubecostToken="your-token" \
--set global.prometheus.enabled=true
# Access the Kubecost UI
kubectl port-forward -n kubecost svc/kubecost-cost-analyzer 9090:9090
Key Kubecost reports to review weekly:
- Namespace allocation: Which teams are spending the most? Are costs trending up unexpectedly?
- Efficiency score: Ratio of actual usage to requested resources. Scores below 50% mean workloads use less than half of what they reserve, a sign of significant over-provisioning.
- Savings opportunities: Kubecost identifies specific deployments and namespaces where right-sizing would reduce cost.
- Idle cost: Node capacity that no workload has requested or used. In many clusters this is the single largest source of Kubernetes waste.
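To make the efficiency score concrete, here is a minimal Python sketch of a cost-weighted efficiency calculation. It assumes efficiency is the cost of the resources a workload actually uses divided by the cost of the resources it requests, which is close to how Kubecost describes the metric but may not match its exact formula. The `Workload` type and the per-core and per-GiB prices are hypothetical placeholders.

```python
"""Sketch: cost-weighted efficiency (used cost / requested cost) for one workload."""
from dataclasses import dataclass


@dataclass
class Workload:
    cpu_request_cores: float
    cpu_usage_cores: float
    mem_request_gib: float
    mem_usage_gib: float


def efficiency(w: Workload, cpu_price: float, mem_price: float) -> float:
    """Return used cost / requested cost as a fraction between 0 and 1."""
    requested_cost = w.cpu_request_cores * cpu_price + w.mem_request_gib * mem_price
    if requested_cost == 0:
        return 0.0
    # This sketch caps usage at the request, so bursting above it cannot exceed 100%.
    used_cost = (min(w.cpu_usage_cores, w.cpu_request_cores) * cpu_price
                 + min(w.mem_usage_gib, w.mem_request_gib) * mem_price)
    return used_cost / requested_cost


# Hypothetical workload requesting 2 cores / 2 GiB but using 0.23 cores / 0.37 GiB.
api = Workload(cpu_request_cores=2.0, cpu_usage_cores=0.23,
               mem_request_gib=2.0, mem_usage_gib=0.37)
print(f"{efficiency(api, cpu_price=0.03, mem_price=0.004):.0%}")  # 12%
```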
Right-Sizing with VPA Recommendations
Vertical Pod Autoscaler (VPA) in recommendation mode analyses actual CPU and memory usage over time and suggests right-sized resource requests. Workloads are often over-provisioned by 3-5x because developers set "safe" requests and limits without usage data.
# Install VPA
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh
# Deploy VPA in recommendation-only mode (does not modify pods)
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Off"  # Recommendation only: do not auto-update pods
# Check VPA recommendations
kubectl describe vpa api-server-vpa -n production
# Output includes:
#   Recommendation:
#     Container Recommendations:
#       Container Name:  api-server
#       Lower Bound:
#         Cpu:     50m
#         Memory:  128Mi
#       Target:
#         Cpu:     230m   # VPA recommended request
#         Memory:  380Mi
#       Upper Bound:
#         Cpu:     1500m
#         Memory:  2Gi
If the current deployment requests 2 CPU and 2Gi but VPA recommends 230m and 380Mi, you are paying for roughly 8.7x more CPU and 5.4x more memory than the workload needs. Applying VPA recommendations across a cluster can reduce compute costs by 40-60%.
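The arithmetic behind that comparison is easy to script when reviewing many deployments. This Python sketch parses only the quantity forms used in this article (whole cores, `m` millicores, and the binary `Ki`/`Mi`/`Gi` suffixes), not the full Kubernetes quantity grammar; the helper names are illustrative.

```python
"""Sketch: how many times over a container's requests exceed VPA's Target."""
from typing import Tuple

_MEMORY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def parse_cpu(q: str) -> float:
    """Return CPU in cores: "230m" -> 0.23, "2" -> 2.0."""
    if q.endswith("m"):
        return int(q[:-1]) / 1000
    return float(q)


def parse_memory(q: str) -> int:
    """Return bytes for binary-suffixed quantities such as "380Mi"."""
    for suffix, factor in _MEMORY_SUFFIXES.items():
        if q.endswith(suffix):
            return int(q[: -len(suffix)]) * factor
    return int(q)  # plain bytes


def overprovisioning(cpu_request: str, cpu_target: str,
                     mem_request: str, mem_target: str) -> Tuple[float, float]:
    """Return (cpu_ratio, memory_ratio) of current request to VPA Target."""
    return (parse_cpu(cpu_request) / parse_cpu(cpu_target),
            parse_memory(mem_request) / parse_memory(mem_target))


cpu_ratio, mem_ratio = overprovisioning("2", "230m", "2Gi", "380Mi")
print(f"CPU {cpu_ratio:.1f}x, memory {mem_ratio:.1f}x")  # CPU 8.7x, memory 5.4x
```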
Spot and Preemptible Nodes
Spot instances (AWS) and Preemptible or Spot VMs (GCP) offer 60-90% discounts compared to on-demand pricing, with the trade-off that the cloud provider can reclaim them at short notice: two minutes on AWS, 30 seconds on GCP. For fault-tolerant workloads, this trade-off is almost always worth taking.
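On AWS, the two-minute notice is exposed through the instance metadata service at `/latest/meta-data/spot/instance-action`, which returns HTTP 404 until an interruption is scheduled and then a small JSON document with an `action` and a UTC `time`. In practice a DaemonSet such as AWS Node Termination Handler watches for this and cordons and drains the node for you; the Python sketch below only illustrates the mechanics using IMDSv2 token authentication, and leaves out polling, retries, and the drain itself.

```python
"""Sketch: read the AWS Spot interruption notice from instance metadata (IMDSv2)."""
import json
import urllib.error
import urllib.request
from datetime import datetime, timezone
from typing import Optional, Tuple

IMDS = "http://169.254.169.254/latest"


def parse_instance_action(body: str, now: datetime) -> Tuple[str, float]:
    """Return (action, seconds until the instance is reclaimed) from the notice JSON."""
    notice = json.loads(body)  # e.g. {"action": "terminate", "time": "2024-01-01T00:02:00Z"}
    deadline = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ")
    deadline = deadline.replace(tzinfo=timezone.utc)
    return notice["action"], (deadline - now).total_seconds()


def fetch_instance_action(timeout: float = 2.0) -> Optional[str]:
    """Return the raw notice JSON, or None while no interruption is scheduled."""
    # IMDSv2: obtain a short-lived session token first.
    token_req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()
    req = urllib.request.Request(
        f"{IMDS}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read().decode()
    except urllib.error.HTTPError as err:
        if err.code == 404:  # no interruption scheduled yet
            return None
        raise
```

On a Spot node, a watcher would call `fetch_instance_action()` every few seconds and, once it returns a body, pass it to `parse_instance_action(body, datetime.now(timezone.utc))` to learn how long it has left to evacuate pods.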
# AWS EKS: Spot node group via eksctl
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: production
  region: us-east-1
nodeGroups:
  # On-demand node group for critical workloads
  - name: on-demand-critical
    instanceType: m5.xlarge
    desiredCapacity: 3
    minSize: 3
    maxSize: 5
    labels:
      node-type: on-demand
    # Only pods that tolerate this taint can run here. If every node group is
    # tainted, add-ons without a matching toleration (CoreDNS, for example)
    # stay Pending, so keep at least one untainted group or drop this taint.
    taints:
      - key: node-type
        value: on-demand
        effect: NoSchedule
  # Spot node group for fault-tolerant workloads
  - name: spot-workers
    instancesDistribution:
      maxPrice: 0.20  # USD per instance-hour
      instanceTypes: ["m5.xlarge", "m5a.xlarge", "m4.xlarge", "m5d.xlarge"]
      onDemandBaseCapacity: 0
      onDemandPercentageAboveBaseCapacity: 0
      spotAllocationStrategy: capacity-optimized
    desiredCapacity: 5
    minSize: 0
    maxSize: 20
    labels:
      node-type: spot
    taints:
      - key: spot
        value: "true"
        effect: NoSchedule
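# Tolerate spot nodes in your deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-processor
spec:
  replicas: 10
  selector:  # required in apps/v1; must match the pod template labels
    matchLabels:
      app: batch-processor
  template:
    metadata:
      labels:
        app: batch-processor
    spec:
      tolerations:
        - key: spot
          operator: Equal
          value: "true"
          effect: NoSchedule
      nodeSelector:
        node-type: spot
      # Spread across AZs to reduce spot interruption impact
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: batch-processor
      containers:
        - name: batch-processor
          image: example.com/batch-processor:latest  # placeholder image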
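Whether a pod can land on one of the tainted node groups above comes down to taint/toleration matching. This Python sketch models the core rules in simplified form: a toleration matches a taint when the effects match (an empty effect matches any), the keys match (an empty key with `Exists` matches any key), and either the operator is `Exists` or the values are equal. Only the hard effects `NoSchedule` and `NoExecute` block scheduling. It ignores `nodeSelector`, affinity, and `tolerationSeconds`, and the dataclasses are illustrative, not a Kubernetes client API.

```python
"""Sketch: can a pod with these tolerations schedule onto a node with these taints?"""
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str


@dataclass(frozen=True)
class Toleration:
    key: str = ""
    operator: str = "Equal"
    value: str = ""
    effect: str = ""


def tolerates(tol: Toleration, taint: Taint) -> bool:
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.key == "" and tol.operator == "Exists":
        return True  # wildcard toleration
    if tol.key != taint.key:
        return False
    return tol.operator == "Exists" or tol.value == taint.value


def can_schedule(tolerations: List[Toleration], taints: List[Taint]) -> bool:
    """Every hard taint on the node must be tolerated by at least one toleration."""
    hard = [t for t in taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tolerations) for t in hard)


spot_taint = Taint("spot", "true", "NoSchedule")
on_demand_taint = Taint("node-type", "on-demand", "NoSchedule")
batch = [Toleration("spot", "Equal", "true", "NoSchedule")]

print(can_schedule(batch, [spot_taint]))       # True
print(can_schedule(batch, [on_demand_taint]))  # False
print(can_schedule([], [on_demand_taint]))     # False: a pod with no tolerations
```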
Cluster Autoscaler Configuration
Cluster Autoscaler adds nodes when pods are pending due to insufficient resources, and removes underutilised nodes after a configurable cooldown period. Proper configuration prevents both under-provisioning (pod scheduling failures) and over-provisioning (idle nodes).
# Cluster Autoscaler deployment (key flags)
containers:
  - command:
      - ./cluster-autoscaler
      - --cloud-provider=aws
      - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/production
      - --balance-similar-node-groups=true  # keep similar node groups (e.g. one per AZ) at similar sizes
      - --skip-nodes-with-system-pods=false
      - --skip-nodes-with-local-storage=false
      - --scale-down-enabled=true
      - --scale-down-utilization-threshold=0.5  # candidate if requested resources < 50% of allocatable
      - --scale-down-delay-after-add=5m
      - --scale-down-unneeded-time=10m
      - --max-graceful-termination-sec=300
      - --expander=least-waste  # pick the node group that leaves the fewest idle resources
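The scale-down threshold is easy to misread as live CPU usage. This Python sketch shows the idea under a stated assumption: a node's utilisation is the larger of its CPU and memory ratios of *requested* resources to allocatable capacity, compared against `--scale-down-utilization-threshold`. The real autoscaler also checks that every pod on the node can be rescheduled elsewhere and waits `--scale-down-unneeded-time` before acting; the `Node` type and numbers are hypothetical.

```python
"""Sketch: which nodes fall below a Cluster Autoscaler-style utilisation threshold?"""
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    cpu_allocatable_m: int
    mem_allocatable_mi: int
    cpu_requested_m: int
    mem_requested_mi: int


def utilisation(node: Node) -> float:
    """Larger of the CPU and memory request ratios (requests, not live usage)."""
    cpu = node.cpu_requested_m / node.cpu_allocatable_m
    mem = node.mem_requested_mi / node.mem_allocatable_mi
    return max(cpu, mem)


def scale_down_candidates(nodes: List[Node], threshold: float = 0.5) -> List[str]:
    return [n.name for n in nodes if utilisation(n) < threshold]


nodes = [
    Node("node-a", 3900, 15000, 3000, 9000),   # ~77% CPU requested
    Node("node-b", 3900, 15000, 800, 4000),    # ~21% CPU, ~27% memory
    Node("node-c", 3900, 15000, 500, 12000),   # low CPU but 80% memory
]
print(scale_down_candidates(nodes))  # ['node-b']
```

Taking the maximum means a node that is nearly idle on CPU but full on memory (`node-c`) is not removed, which matches the intent of avoiding evictions that could not be rescheduled.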
Karpenter: Next-Generation Node Provisioning
Karpenter (open-sourced by AWS, now a Kubernetes SIG Autoscaling subproject) is a more flexible alternative to Cluster Autoscaler. Rather than scaling predefined node groups, Karpenter reads the requirements of pending pods and provisions a suitable EC2 instance directly, choosing instance families, sizes, and Spot vs on-demand capacity dynamically.
# Karpenter NodePool: provision mixed Spot and on-demand nodes (karpenter.sh/v1 API)
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]  # arm64 requires multi-arch container images
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]  # compute-optimised, general purpose, memory-optimised
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["2"]  # only use 3rd gen+ instances
      nodeClassRef:
        group: karpenter.k8s.aws  # v1 uses group/kind/name rather than apiVersion
        kind: EC2NodeClass
        name: default
  limits:
    cpu: 1000  # max total CPU across all nodes provisioned by this NodePool
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized  # v1 name for v1beta1's WhenUnderutilized
    consolidateAfter: 30s  # very aggressive consolidation for cost savings
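Karpenter's consolidation feature continuously looks for nodes whose pods could run elsewhere, then deletes those nodes or replaces them with cheaper instances as workloads scale down. Cluster Autoscaler can remove underutilised nodes, but it only resizes existing node groups and cannot swap a node for a smaller or cheaper instance type. This alone can reduce node count by 20-30% in clusters with variable load.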
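Why re-packing frees whole nodes can be seen with a small bin-packing exercise. This is *not* Karpenter's algorithm (Karpenter simulates scheduling and also weighs instance prices); it is a first-fit-decreasing sketch over CPU requests only, with a hypothetical node size and hypothetical pod requests.

```python
"""Sketch: first-fit-decreasing packing of pod CPU requests onto equal-sized nodes."""
from typing import List


def first_fit_decreasing(pod_requests_m: List[int], node_capacity_m: int) -> List[List[int]]:
    """Place each pod (largest first) on the first node with room; open a node if none fits."""
    nodes: List[List[int]] = []
    free: List[int] = []
    for req in sorted(pod_requests_m, reverse=True):
        if req > node_capacity_m:
            raise ValueError(f"pod request {req}m exceeds node capacity")
        for i, room in enumerate(free):
            if req <= room:
                nodes[i].append(req)
                free[i] -= req
                break
        else:  # no existing node had room
            nodes.append([req])
            free.append(node_capacity_m - req)
    return nodes


# After a scale-down, suppose these six pods are left scattered across six nodes.
pods = [1500, 1200, 900, 700, 500, 300]
packed = first_fit_decreasing(pods, node_capacity_m=3500)
print(len(packed), packed)  # 2 [[1500, 1200, 700], [900, 500, 300]]
```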
Eliminating Idle and Zombie Workloads
In mature clusters, 15-30% of running pods can be serving zero or negligible traffic: forgotten development environments, stale preview deployments, or abandoned experiments. Identifying and removing these is close to free cost savings.
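# Find deployments scaled to 0 replicas. They use no compute themselves, but often mark
# abandoned workloads whose Services, load balancers, or PersistentVolumes still cost money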
kubectl get deployments -A -o json | \
jq '.items[] | select(.spec.replicas == 0) | {ns: .metadata.namespace, name: .metadata.name}'
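# Find pods with near-zero CPU usage (Prometheus query)
# kubectl port-forward -n monitoring svc/prometheus 9090:9090  (service name depends on your Prometheus install)
# Then query for pods averaging under 1 millicore of CPU over 7 days:
# avg_over_time(sum by (namespace, pod) (rate(container_cpu_usage_seconds_total{container!=""}[5m]))[7d:]) < 0.001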
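# Scale down non-production deployments outside business hours
# Use a CronJob or a tool like kube-downscaler (the chart reference assumes its Helm repo was added as "codeberg").
# Setting only --default-uptime scales workloads down outside that window; adding a Mon-Sun
# 00:00-24:00 --default-downtime would keep them down around the clock. kube-downscaler acts on
# every namespace that is not excluded, so exclude production explicitly.
helm upgrade --install kube-downscaler codeberg/kube-downscaler \
  --set args[0]="--interval=60" \
  --set args[1]="--default-uptime=Mon-Fri 08:00-20:00 Europe/London" \
  --set args[2]="--exclude-namespaces=kube-system\,production" \
  --namespace kube-system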
Cost-Aware Scheduling Strategies
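Bin-pack pods onto fewer nodes to improve utilisation and reduce node count. By default, the kube-scheduler's NodeResourcesFit plugin scores nodes with the LeastAllocated strategy, which favours nodes with the most free capacity and so spreads pods out: better for resilience but worse for cost. The MostAllocated strategy favours the fullest node that still fits and packs pods densely. Managed control planes such as EKS do not expose the default scheduler's configuration, so there this profile is typically run as a second scheduler that pods select via spec.schedulerName.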
# KubeSchedulerConfiguration: enable MostAllocated for cost savings
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: default-scheduler
    pluginConfig:
      - name: NodeResourcesFit
        args:
          scoringStrategy:
            type: MostAllocated  # pack densely rather than spread
            resources:
              - name: cpu
                weight: 1
              - name: memory
                weight: 1
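The difference between the two strategies shows up clearly when both score the same pair of nodes. This Python sketch is a simplified, floating-point version of the NodeResourcesFit scoring idea: per resource, LeastAllocated scores the free fraction and MostAllocated scores the requested fraction, each scaled to 0-100 and averaged with the configured weights, where "requested" includes the incoming pod. The node figures are hypothetical.

```python
"""Sketch: LeastAllocated vs MostAllocated node scoring with resource weights."""
from typing import Dict

MAX_NODE_SCORE = 100


def score(requested: Dict[str, float], allocatable: Dict[str, float],
          weights: Dict[str, int], strategy: str) -> float:
    if strategy not in ("LeastAllocated", "MostAllocated"):
        raise ValueError(f"unknown strategy: {strategy}")
    total = 0.0
    for resource, weight in weights.items():
        cap = allocatable[resource]
        req = min(requested[resource], cap)  # cap over-commitment at capacity
        if cap == 0:
            fraction = 0.0
        elif strategy == "MostAllocated":
            fraction = req / cap             # fuller node scores higher
        else:
            fraction = (cap - req) / cap     # emptier node scores higher
        total += weight * fraction * MAX_NODE_SCORE
    return total / sum(weights.values())


weights = {"cpu": 1, "memory": 1}
allocatable = {"cpu": 4000, "memory": 16384}  # millicores, MiB
# Requests on each node *including* the pod being placed.
candidates = {
    "busy": {"cpu": 3500, "memory": 12288},
    "quiet": {"cpu": 1000, "memory": 3072},
}
winners = {
    strategy: max(candidates, key=lambda n: score(candidates[n], allocatable, weights, strategy))
    for strategy in ("LeastAllocated", "MostAllocated")
}
print(winners)  # {'LeastAllocated': 'quiet', 'MostAllocated': 'busy'}
```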
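For workloads that don't need dedicated nodes, use pod topology spread constraints keyed on topology.kubernetes.io/zone: they keep replicas balanced across availability zones while leaving the scheduler free to pack pods densely onto nodes within each zone.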
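A zone-keyed constraint such as the one in the batch-processor Deployment counts matching pods per zone, not per node. This Python sketch shows the `DoNotSchedule` check in simplified form: placing one more pod in a zone is allowed when that zone's count after placement, minus the minimum count across eligible zones, stays within `maxSkew`. It assumes every listed zone is eligible and ignores `minDomains` and the node affinity/taint policies; the pod counts are hypothetical.

```python
"""Sketch: which zones satisfy a maxSkew topology spread constraint for one more pod?"""
from typing import Dict, List


def allowed_zones(pods_per_zone: Dict[str, int], max_skew: int) -> List[str]:
    """Zones where adding one matching pod keeps skew <= max_skew."""
    global_min = min(pods_per_zone.values())
    return [zone for zone, count in pods_per_zone.items()
            if count + 1 - global_min <= max_skew]


zones = {"us-east-1a": 4, "us-east-1b": 3, "us-east-1c": 3}
print(allowed_zones(zones, max_skew=1))  # ['us-east-1b', 'us-east-1c']
```

Nothing in the check says which node inside `us-east-1b` or `us-east-1c` receives the pod, so a MostAllocated scoring profile can still pack it onto the fullest node in that zone.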
Reserved Instances and Savings Plans
After right-sizing and Spot adoption, the remaining on-demand baseline should be covered with Reserved Instances or Savings Plans for an additional 30-60% discount on that portion of compute.
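- Savings Plans (AWS): Commit to a dollar-per-hour spend level for a one- or three-year term. Compute Savings Plans apply across any EC2 instance family, size, and region, which makes them more flexible than Reserved Instances and well suited to Kubernetes, where pod scheduling changes instance requirements dynamically. EC2 Instance Savings Plans offer a larger discount but are tied to one instance family in one region.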
- Committed Use Discounts (GCP): 1-year or 3-year commitments for vCPUs and memory in a specific region.
- Reserved VM Instances (Azure): 1 or 3-year commitment for specific VM sizes.
Target 70-80% of your steady-state on-demand compute with Reserved Instances or Savings Plans, leaving 20-30% uncovered to handle growth without commitment waste. Review commitments quarterly as cluster size changes.
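The case for leaving part of the baseline uncovered rests on a commitment being billed whether or not it is used. This Python sketch models a Savings Plan-style commitment with a single flat discount (real Savings Plans rates vary by instance type, region, and term); the hourly usage profile and the 40% discount are hypothetical.

```python
"""Sketch: total cost of an hourly usage profile under different commitment levels."""


def hourly_cost(usage_on_demand: float, commitment: float, discount: float) -> float:
    """Cost of one hour given on-demand-equivalent usage and a $/hour commitment.

    The commitment is paid in full every hour; it covers usage worth
    commitment / (1 - discount) at on-demand rates, and any excess is
    billed at on-demand rates.
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    covered_on_demand_value = commitment / (1 - discount)
    overflow = max(0.0, usage_on_demand - covered_on_demand_value)
    return commitment + overflow


usage = [100.0, 100.0, 60.0, 60.0]  # on-demand $/hour over four hours
for coverage in (0.0, 0.75, 1.0):
    commitment = 100.0 * coverage * (1 - 0.4)  # commit to cover part of a $100/h baseline
    total = sum(hourly_cost(u, commitment, discount=0.4) for u in usage)
    print(f"cover {coverage:.0%} of baseline: ${total:.2f}")
# cover 0% of baseline: $320.00
# cover 75% of baseline: $230.00
# cover 100% of baseline: $240.00
```

When usage dips below the baseline for part of the period, the 75% commitment comes out cheaper than full coverage, because the fully committed plan keeps paying for capacity it does not use.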