Kubernetes VPA: Vertical Pod Autoscaler Guide
The Kubernetes Vertical Pod Autoscaler (VPA) automatically adjusts CPU and memory requests and limits for your pods based on observed usage, eliminating the guesswork of manually right-sizing resources. Unlike the Horizontal Pod Autoscaler, which changes the number of pod replicas, VPA changes the resource profile of individual pods. That makes it a good fit for workloads that cannot easily be scaled horizontally, such as stateful applications, singletons, or batch jobs with variable resource requirements.
VPA Architecture and Components
The Vertical Pod Autoscaler consists of three separate controllers that work together to observe, recommend, and apply resource changes:
- VPA Recommender — reads current resource usage from the Kubernetes Metrics API (normally served by metrics-server) and builds a usage history, which it checkpoints in VerticalPodAutoscalerCheckpoint objects and can optionally bootstrap from Prometheus. For each container it computes target, lower bound, upper bound, and uncapped target recommendations and writes them to the VPA object status.
- VPA Updater — checks running pods against current VPA recommendations. When a pod's actual requests deviate too far from the recommended values, the Updater evicts the pod so it can be recreated with updated requests by the VPA Admission Controller.
- VPA Admission Controller — a mutating webhook that intercepts pod creation requests. Whenever a pod matching a VPA selector is created (whether from a fresh deploy or after an Updater eviction), the Admission Controller patches the pod spec with the recommended CPU and memory requests before the pod is scheduled.
This three-component model means VPA can operate in a completely passive advisory mode (Recommender only), in a mode where it only sets resources at pod creation (Initial mode), or in a fully automatic mode where it evicts and recreates pods to apply updated recommendations.
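To make the advisory mode concrete, here is a minimal sketch of a VPA that only produces recommendations. The Deployment name my-app is a placeholder. With updateMode set to "Off", the Updater never evicts the target's pods and the Admission Controller leaves their requests untouched, while the Recommender still fills in the status.

```yaml
# Advisory-only VPA (sketch): recommendations appear in .status,
# but no pod is evicted or mutated.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa        # hypothetical name
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app          # hypothetical Deployment
  updatePolicy:
    updateMode: "Off"     # Recommender only
```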
Installing the Vertical Pod Autoscaler
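VPA is not bundled with Kubernetes and must be installed separately. It also depends on the Metrics API, so metrics-server (or another Metrics API provider) must already be running in the cluster. The official repository provides a shell script that installs the CRDs and all three VPA components. For production use, Helm charts from the community are also available.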
```shell
# Clone the autoscaler repository
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler

# Install VPA (installs CRDs, RBAC, and all three components)
./hack/vpa-up.sh

# Verify the components are running
kubectl get pods -n kube-system | grep vpa
# Expected output:
# vpa-admission-controller-xxx 1/1 Running
# vpa-recommender-xxx 1/1 Running
# vpa-updater-xxx 1/1 Running

# Check that VPA CRDs are registered
kubectl get crd | grep verticalpodautoscaler
```
For Helm-based installations, the fairwinds-stable chart is widely used in production:
```shell
helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm repo update
helm upgrade --install vpa fairwinds-stable/vpa \
  --namespace kube-system \
  --set recommender.enabled=true \
  --set updater.enabled=true \
  --set admissionController.enabled=true
```
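If you prefer to keep chart settings in version control, the same three --set flags can be expressed as a values file. This sketch mirrors only the keys used in the command above; the file name vpa-values.yaml is arbitrary, and every other chart option stays at its default.

```yaml
# vpa-values.yaml: equivalent of the --set flags above
recommender:
  enabled: true
updater:
  enabled: true
admissionController:
  enabled: true
```

Pass the file to helm upgrade --install with -f vpa-values.yaml in place of the individual --set flags.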
VPA Update Modes: Off, Initial, Recreate, Auto
VPA supports four update modes that control how aggressively it applies recommendations. Choosing the right mode depends on your workload's tolerance for pod restarts and how confident you are in the recommendations.
- Off — The Recommender computes recommendations and writes them to the VPA status, but no pods are modified. Use this to evaluate what VPA would recommend before granting it any control.
- Initial — Recommendations are applied only when pods are first created. Running pods are never evicted. This is safe for stateful workloads where pod restarts are costly.
- Recreate — The Updater evicts pods whose resources deviate from recommendations. The Admission Controller then sets the recommended values when the pods are recreated. This gives VPA full control but causes pod restarts when recommendations change significantly.
- Auto — Currently behaves identically to Recreate. Future VPA releases may use Kubernetes in-place pod resource resizing (KEP-1287) to apply recommendations without evicting pods.
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto" # Off | Initial | Recreate | Auto
```
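The Initial mode is the one suggested above for stateful workloads. As a sketch, a VPA for a hypothetical StatefulSet named postgres could look like this: new or recreated pods receive the recommended requests, but VPA never evicts a running replica to apply a newer recommendation.

```yaml
# Initial-mode VPA (sketch): recommendations are applied only at pod creation.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: postgres-vpa      # hypothetical name
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: postgres        # hypothetical StatefulSet
  updatePolicy:
    updateMode: "Initial" # running pods are left alone
```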
Configuring a VerticalPodAutoscaler Object
A complete VPA object specifies the target workload, the update mode, container-level resource policies, and optionally a minReplicas threshold that stops the Updater from evicting pods when too few replicas are running.
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Auto"
    minReplicas: 2 # Only evict if deployment has >= 2 replicas
  resourcePolicy:
    containerPolicies:
      - containerName: api-server
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 4
          memory: 4Gi
        controlledResources: ["cpu", "memory"]
        controlledValues: RequestsAndLimits
      - containerName: sidecar-proxy
        mode: "Off" # Exclude this container from VPA control
```
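The minAllowed and maxAllowed fields bound the recommendation: VPA never recommends requests below minAllowed or above maxAllowed, regardless of observed usage. The controlledValues field determines whether VPA sets only requests (RequestsOnly) or both requests and limits (RequestsAndLimits, the default). With RequestsAndLimits, limits are scaled proportionally so that the original limit-to-request ratio is preserved.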
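A small worked example may help with controlledValues. Assume, hypothetically, a container whose limits are twice its requests and a VPA target of cpu: 300m and memory: 384Mi. The comments show what each setting produces under the proportional scaling just described.

```yaml
# Hypothetical container resources before VPA (limit = 2 x request):
resources:
  requests:
    cpu: 250m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi
# Assumed VPA target: cpu 300m, memory 384Mi.
#
# controlledValues: RequestsAndLimits keeps the 2:1 ratio:
#   requests: cpu 300m, memory 384Mi
#   limits:   cpu 600m, memory 768Mi
#
# controlledValues: RequestsOnly changes only the requests:
#   requests: cpu 300m, memory 384Mi
#   limits:   cpu 500m, memory 512Mi (unchanged)
```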
Reading VPA Recommendations
After VPA has collected enough metrics (typically 24 hours of data for accurate recommendations), you can read the recommendations from the VPA object status.
```shell
# Get VPA recommendations
kubectl describe vpa api-server-vpa -n production

# Or use kubectl get with output format
kubectl get vpa api-server-vpa -n production -o yaml
```

```yaml
# Example VPA status output
status:
  recommendation:
    containerRecommendations:
      - containerName: api-server
        lowerBound:
          cpu: 150m
          memory: 200Mi
        target:
          cpu: 300m
          memory: 512Mi
        uncappedTarget:
          cpu: 300m
          memory: 512Mi
        upperBound:
          cpu: 800m
          memory: 1200Mi
```
The four recommendation values have distinct meanings: lowerBound is the minimum safe value, target is what VPA will apply, upperBound is an estimate above which resources are likely wasted, and uncappedTarget is the raw recommendation before applying your minAllowed/maxAllowed bounds.
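To see how uncappedTarget differs from target, suppose the same api-server container were given a tighter, hypothetical maxAllowed.memory of 256Mi while its observed usage stayed as in the status example above. The excerpt below is illustrative, not real output: target is clipped to the policy while uncappedTarget keeps the raw 512Mi estimate. A persistent gap between the two is a useful hint that a bound may be too tight.

```yaml
# Illustrative VPA excerpt (hypothetical policy, not captured output)
spec:
  resourcePolicy:
    containerPolicies:
      - containerName: api-server
        maxAllowed:
          memory: 256Mi        # assumed, tighter than the earlier 4Gi
status:
  recommendation:
    containerRecommendations:
      - containerName: api-server
        target:
          cpu: 300m
          memory: 256Mi        # clipped to maxAllowed.memory
        uncappedTarget:
          cpu: 300m
          memory: 512Mi        # raw recommendation, ignores the policy
```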
Resource Policies and Boundaries
Without explicit boundaries, VPA may recommend values that are unsafe for your cluster — either too low (causing OOMKills) or too high (consuming excessive node resources). Always set minAllowed and maxAllowed for production workloads.
```yaml
resourcePolicy:
  containerPolicies:
    - containerName: "*" # Apply policy to all containers
      minAllowed:
        cpu: 50m
        memory: 64Mi
      maxAllowed:
        cpu: 2
        memory: 2Gi
      controlledResources: ["cpu", "memory"]
```
For Java and JVM-based workloads, set a meaningful minAllowed.memory value. A JVM needs memory beyond its heap (metaspace, thread stacks, code cache), so a workable floor is often 256Mi or more, and VPA without a lower bound may recommend values below the heap size configured with -Xms/-Xmx, which risks OOMKills once the heap grows. You should also consider setting controlledResources: ["cpu"] only and managing JVM memory manually: the heap is sized when the JVM starts, so memory changes made by VPA do not translate into a matching heap size unless the JVM flags are kept in sync.
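Following that advice, a CPU-only policy for a JVM container might look like the sketch below. The container name jvm-app and the CPU bounds are placeholders. Memory requests and limits are taken from the pod template as written, so they can be sized together with the JVM's heap settings.

```yaml
# Sketch: VPA manages CPU only for a hypothetical JVM container.
resourcePolicy:
  containerPolicies:
    - containerName: jvm-app          # hypothetical container name
      controlledResources: ["cpu"]    # memory stays as set in the pod template
      minAllowed:
        cpu: 250m                     # placeholder bound
      maxAllowed:
        cpu: 2                        # placeholder bound
```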
Using VPA with HPA
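Running VPA and HPA simultaneously on the same workload can cause conflicts when both controllers act on the same resource. The rule is straightforward: never let VPA control CPU or memory while HPA scales on that same resource. HPA computes utilization relative to the pod's requests, so when VPA raises a request, the utilization HPA sees drops and it may scale in; the remaining pods then use more, VPA raises their requests again, and the two controllers fight each other in a feedback loop.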
The safe combinations are:
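- VPA managing CPU+Memory (updateMode: Auto/Recreate) + HPA scaling on custom or external metrics (not CPU or memory)
- VPA in updateMode: Off (recommendations only) + HPA on any metric. Initial mode is not advisory: it still rewrites requests for every new pod, including pods the HPA adds.
- VPA managing only memory (controlledResources: ["memory"]) + HPA scaling on CPU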
```yaml
# Safe: HPA on custom metric + VPA on cpu+memory
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
  namespace: production # must match the Deployment's namespace
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: External
      external:
        metric:
          name: queue_depth # requires an external metrics adapter serving this metric
        target:
          type: AverageValue
          averageValue: "100"
```
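The third safe combination, VPA on memory and HPA on CPU, can be sketched as a pair of objects for a hypothetical Deployment named worker. The 70% utilization target and the replica counts are illustrative values; the key detail is controlledResources: ["memory"], which keeps VPA away from the CPU requests that HPA uses to compute utilization.

```yaml
# Sketch: VPA owns memory, HPA scales replicas on CPU utilization.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: worker-vpa                      # hypothetical name
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: worker                        # hypothetical Deployment
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        controlledResources: ["memory"] # leave CPU requests to the pod template
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: worker-hpa                      # hypothetical name
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: worker
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70        # illustrative target
```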
Production Best Practices
VPA is powerful but requires careful configuration to avoid disruptions in production environments:
- Start with Off mode — Run VPA in Off mode for 1–2 weeks to collect recommendations before enabling automatic updates. This lets you validate recommendations against your knowledge of the workload.
- Set minReplicas — Keep the minReplicas field in updatePolicy at 2 or higher (the Updater's default is 2) to prevent VPA from evicting pods when only one replica is running. Single-replica workloads become unavailable during eviction.
- Use PodDisruptionBudgets — VPA's Updater evicts pods through the Eviction API, so it respects PodDisruptionBudgets. Set a PDB with minAvailable: 1 to ensure at least one pod stays running during VPA-initiated evictions.
- Monitor OOMKills — Track the kube_pod_container_status_last_terminated_reason metric (exported by kube-state-metrics) in Prometheus. A spike in OOMKilled reasons after VPA changes indicates the recommendations are too aggressive.
- Exclude init containers — VPA can manage init containers, but their resource patterns differ significantly from main containers. Use containerName selectors to target only the containers where VPA adds value.
```yaml
# PodDisruptionBudget protecting against VPA evictions
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-server-pdb
  namespace: production
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: api-server
```
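For the OOMKill monitoring practice, a Prometheus alerting rule along these lines is one option. It is a sketch that assumes kube-state-metrics is installed and that rules are loaded from a standard Prometheus rules file (with the Prometheus Operator, the same groups would sit inside a PrometheusRule object). The group name, alert name, 15-minute window, and severity label are hypothetical choices to tune for your environment.

```yaml
# Sketch of a Prometheus rules file (names and window are hypothetical).
groups:
  - name: vpa-oomkills
    rules:
      - alert: ContainerOOMKilled
        # A container restarted in the last 15 minutes AND its last
        # termination reason was OOMKilled.
        expr: |
          increase(kube_pod_container_status_restarts_total[15m]) > 0
          and on (namespace, pod, container)
          kube_pod_container_status_last_terminated_reason{reason="OOMKilled"} == 1
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.namespace }}/{{ $labels.pod }} ({{ $labels.container }}) was OOMKilled"
```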