Kubernetes VPA: Vertical Pod Autoscaler Guide

The Kubernetes Vertical Pod Autoscaler (VPA) automatically adjusts the CPU and memory requests (and, optionally, limits) of your pods based on observed usage, which removes much of the guesswork from right-sizing resources by hand. Unlike the Horizontal Pod Autoscaler, which adds or removes pod replicas, VPA changes the resource profile of individual pods. That makes it a good fit for workloads that cannot easily be scaled horizontally, such as stateful applications, singletons, or batch jobs with variable resource requirements.

VPA Architecture and Components
Installing the Vertical Pod Autoscaler
VPA Update Modes: Off, Initial, Recreate, Auto
Configuring a VerticalPodAutoscaler Object
Reading VPA Recommendations
Resource Policies and Boundaries
Using VPA with HPA
Production Best Practices

VPA Architecture and Components

The Vertical Pod Autoscaler consists of three separate controllers that work together to observe, recommend, and apply resource changes:

VPA Recommender: reads current pod resource usage from the Kubernetes Metrics API (normally served by metrics-server) and can optionally load historical usage from Prometheus. It computes target, lower bound, upper bound, and uncapped target recommendations for each container and writes them to the VPA object status.
VPA Updater: checks running pods against current VPA recommendations. When a pod's actual requests deviate too far from the recommended values, the Updater evicts the pod so that it is recreated and the VPA Admission Controller can apply updated requests.
VPA Admission Controller: a mutating webhook that intercepts pod creation requests. Whenever a pod covered by a VPA's targetRef is created (whether from a fresh deploy or after an Updater eviction), the Admission Controller patches the pod spec with the recommended CPU and memory requests before the pod is scheduled.

This three-component model means VPA can operate in a completely passive advisory mode (Recommender only), in a mode where it only sets resources at pod creation (Initial mode), or in a fully automatic mode where it evicts and recreates pods to apply updated recommendations.

Note: VPA requires the Kubernetes Metrics Server to be installed, or a compatible Prometheus adapter. Without metrics, the Recommender cannot compute any recommendations and VPA objects will show no suggested values.

Installing the Vertical Pod Autoscaler

VPA is not bundled with Kubernetes and must be installed separately. The official repository provides a shell script that installs the CRDs and all three VPA components. For production use, Helm charts from the community are also available.

# Clone the autoscaler repository
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler

# Install VPA (installs CRDs, RBAC, and all three components)
./hack/vpa-up.sh

# Verify the components are running
kubectl get pods -n kube-system | grep vpa
# Expected output:
# vpa-admission-controller-xxx   1/1   Running
# vpa-recommender-xxx            1/1   Running
# vpa-updater-xxx                1/1   Running

# Check that VPA CRDs are registered
kubectl get crd | grep verticalpodautoscaler

For Helm-based installations, the fairwinds-stable chart is widely used in production:

helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm repo update

helm upgrade --install vpa fairwinds-stable/vpa \
  --namespace kube-system \
  --set recommender.enabled=true \
  --set updater.enabled=true \
  --set admissionController.enabled=true
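
If you only want recommendations at first, the same chart can in principle be installed with just the Recommender. The values file below is a minimal sketch that reuses the three enabled flags from the command above. The file name vpa-values.yaml is an assumption, and the keys should be confirmed against the values.yaml of the chart version you install.

```yaml
# vpa-values.yaml (hypothetical file name)
# Recommender-only install: recommendations are computed,
# but nothing is evicted or mutated.
recommender:
  enabled: true
updater:
  enabled: false
admissionController:
  enabled: false
```

Pass the file with -f vpa-values.yaml instead of the --set flags, and switch the other two components on once you are ready to let VPA apply changes.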

Tip: Some managed Kubernetes services offer VPA as a built-in feature or add-on, so check your provider's documentation before installing it yourself. GKE's managed VPA is GA and is the recommended installation path for GKE clusters.

VPA Update Modes: Off, Initial, Recreate, Auto

VPA supports four update modes that control how aggressively it applies recommendations. Choosing the right mode depends on your workload's tolerance for pod restarts and how confident you are in the recommendations.

Off: The Recommender computes recommendations and writes them to the VPA status, but no pods are modified. Use this to evaluate what VPA would recommend before granting it any control.
Initial: Recommendations are applied only when pods are first created. Running pods are never evicted. This is safe for stateful workloads where pod restarts are costly.
Recreate: The Updater evicts pods whose resources deviate from recommendations. The Admission Controller then sets the recommended values when the pods are recreated. This gives VPA full control but causes pod restarts when recommendations change significantly.
Auto: Currently behaves identically to Recreate. In-place pod resource updates (KEP-1287) are intended to let VPA resize pods without eviction; whether and how they are used depends on the VPA and Kubernetes versions you run, so check the release notes for your installation.

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"   # Off | Initial | Recreate | Auto

Configuring a VerticalPodAutoscaler Object

A complete VPA object specifies the target workload, the update mode, container-level resource policies, and optionally a minimum replica count below which the Updater will not evict pods.

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Auto"
    minReplicas: 2       # Only evict if deployment has >= 2 replicas
  resourcePolicy:
    containerPolicies:
      - containerName: api-server
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 4
          memory: 4Gi
        controlledResources: ["cpu", "memory"]
        controlledValues: RequestsAndLimits
      - containerName: sidecar-proxy
        mode: "Off"      # Exclude this container from VPA control

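The minAllowed and maxAllowed fields bound the recommendations VPA applies to a container's requests, regardless of usage patterns. The controlledValues field determines whether VPA sets only requests (RequestsOnly) or both requests and limits (RequestsAndLimits). With RequestsAndLimits, VPA keeps the original limit-to-request ratio, so a limit can end up above maxAllowed even though the request stays within bounds.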
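
If you would rather keep the limits defined in your Deployment and let VPA adjust requests only, the container policy can be switched to RequestsOnly. The fragment below is a sketch of that variant of the api-server policy above; the bounds are simply copied from that example and are not tuned values.

```yaml
resourcePolicy:
  containerPolicies:
    - containerName: api-server
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 4
        memory: 4Gi
      controlledResources: ["cpu", "memory"]
      # VPA rewrites requests; limits stay as written in the pod template.
      controlledValues: RequestsOnly
```

With this setting, make sure the limits in the pod template are at least as high as maxAllowed, since a request cannot exceed its limit.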

Reading VPA Recommendations

VPA publishes a first recommendation soon after it starts seeing metrics, but early values are based on very little data. Allow at least 24 hours of usage before relying on them, then read the recommendations from the VPA object status.

# Get VPA recommendations
kubectl describe vpa api-server-vpa -n production

# Or use kubectl get with output format
kubectl get vpa api-server-vpa -n production -o yaml

# Example VPA status output
status:
  recommendation:
    containerRecommendations:
    - containerName: api-server
      lowerBound:
        cpu: 150m
        memory: 200Mi
      target:
        cpu: 300m
        memory: 512Mi
      uncappedTarget:
        cpu: 300m
        memory: 512Mi
      upperBound:
        cpu: 800m
        memory: 1200Mi

The four recommendation values have distinct meanings: lowerBound is the minimum safe value, target is what VPA will apply, upperBound is an estimate above which resources are likely wasted, and uncappedTarget is the raw recommendation before applying your minAllowed/maxAllowed bounds.
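
The difference between target and uncappedTarget only becomes visible when a bound is hit. The status fragment below is a hand-written illustration, not output from a real cluster. It assumes the maxAllowed.memory: 4Gi policy from the api-server-vpa example and a workload whose usage alone would justify more memory than that.

```yaml
# Hand-written illustration, not captured from a cluster.
# Assumes maxAllowed.memory: 4Gi in the container policy.
status:
  recommendation:
    containerRecommendations:
    - containerName: api-server
      target:
        cpu: 300m
        memory: 4Gi      # capped at maxAllowed.memory
      uncappedTarget:
        cpu: 300m
        memory: 6Gi      # what observed usage alone would suggest
```

A persistent gap like this is a signal to revisit either the bound or the workload's memory behavior.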

Note: By default the Recommender's usage history spans roughly 8 days, so recommendations for workloads with weekly patterns keep shifting until about that much data has been collected. Short-lived workloads or workloads with very spiky traffic may receive less accurate suggestions. Always validate recommendations against your workload's actual peak usage before setting updateMode to Auto.

Resource Policies and Boundaries

Without explicit boundaries, VPA may recommend values that are unsafe for your cluster — either too low (causing OOMKills) or too high (consuming excessive node resources). Always set minAllowed and maxAllowed for production workloads.

resourcePolicy:
  containerPolicies:
    - containerName: "*"          # Apply policy to all containers
      minAllowed:
        cpu: 50m
        memory: 64Mi
      maxAllowed:
        cpu: 2
        memory: 2Gi
      controlledResources: ["cpu", "memory"]

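For Java and other JVM-based workloads, set a meaningful minAllowed.memory value. A JVM needs memory for metaspace, thread stacks, and other non-heap areas on top of its heap (often a few hundred MiB in total), and without a lower bound VPA may recommend less than the JVM needs to start with its configured minimum heap. You should also consider setting controlledResources: ["cpu"] and managing JVM memory manually: the JVM usually sizes its heap from the container memory limit at startup and rarely returns memory to the operating system, so observed usage reflects the heap settings more than the application's real needs.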
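
As a sketch of the CPU-only approach for a JVM workload, the policy below lets VPA manage CPU while memory requests and limits stay exactly as written in the pod template. The container name java-app and the CPU bounds are hypothetical placeholders.

```yaml
resourcePolicy:
  containerPolicies:
    - containerName: java-app        # hypothetical container name
      minAllowed:
        cpu: 250m
      maxAllowed:
        cpu: 2
      # Memory is omitted on purpose: VPA leaves it untouched,
      # so heap flags and the memory limit can be tuned together by hand.
      controlledResources: ["cpu"]
```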

Using VPA with HPA

Running VPA and HPA on the same Deployment causes conflicts when both controllers act on the same resource metric. HPA's CPU utilization target is measured relative to the pod's CPU request, so when VPA changes that request, the utilization HPA sees changes too, and the two controllers react to each other in a feedback loop. The rule is simple: never let VPA control CPU (or memory) while HPA scales on that same resource.

The safe combinations are:

VPA managing CPU+Memory (mode: Auto/Recreate) + HPA scaling on custom metrics (not CPU)
VPA in mode: Off (advisory only) or Initial (applied only at pod creation) + HPA on any metric
VPA managing only memory + HPA scaling on CPU

# Safe: HPA on custom metric + VPA on cpu+memory
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
  namespace: production   # Same namespace as the Deployment and its VPA
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: External
      external:
        metric:
          name: queue_depth
        target:
          type: AverageValue
          averageValue: "100"
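
The third safe combination, VPA on memory only with HPA on CPU, can be sketched as the pair of objects below. The object names and the 70% utilization target are illustrative assumptions, not recommended values.

```yaml
# VPA controls memory only, so it never changes the CPU request
# that the HPA's utilization target is measured against.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-memory-vpa      # hypothetical name
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        controlledResources: ["memory"]
---
# HPA scales replicas on CPU utilization.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-cpu-hpa         # hypothetical name
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # illustrative target
```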

KEDA integration: KEDA (Kubernetes Event-Driven Autoscaling) works well alongside VPA as long as its triggers are external event sources, such as queue depth, rather than CPU or memory. KEDA drives an HPA under the hood, so the same conflict rules apply. With that caveat, VPA right-sizing pods while KEDA scales the replica count on queue depth is an effective combination.
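
A minimal sketch of the KEDA side of that pairing is shown below, assuming queue depth is already exported to Prometheus. The Prometheus address, the query, and the object name are hypothetical; adapt them to your own metrics setup, or swap in the KEDA scaler that matches your queue system.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: api-server-scaler          # hypothetical name
  namespace: production
spec:
  scaleTargetRef:
    name: api-server               # Deployment that VPA also targets
  minReplicaCount: 2
  maxReplicaCount: 20
  triggers:
    - type: prometheus
      metadata:
        # Hypothetical in-cluster Prometheus endpoint and metric.
        serverAddress: http://prometheus.monitoring.svc:9090
        query: sum(queue_depth)
        threshold: "100"
```

Because KEDA creates and manages the HPA for this Deployment, do not define a separate HPA for the same target.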

Production Best Practices

VPA is powerful but requires careful configuration to avoid disruptions in production environments:

Start with Off mode: Run VPA in Off mode for 1–2 weeks to collect recommendations before enabling automatic updates. This lets you validate recommendations against your knowledge of the workload.
Set minReplicas: Use the minReplicas field in updatePolicy to prevent VPA from evicting pods when only one replica is running. Single-replica workloads become unavailable during eviction.
Use PodDisruptionBudgets: VPA's Updater respects PodDisruptionBudgets. Set a PDB with minAvailable: 1 to ensure at least one pod stays running during VPA-initiated evictions. On a single-replica workload, such a PDB blocks evictions entirely.
Monitor OOMKills: Track the kube_pod_container_status_last_terminated_reason metric (exposed by kube-state-metrics) in Prometheus. A rise in reason="OOMKilled" after VPA changes suggests the memory recommendations are too low for the workload's peaks.
Scope VPA to the right containers: Init containers and short-lived helpers run briefly, so their usage patterns differ significantly from those of long-running containers. Use containerName entries in containerPolicies, with mode: "Off" where needed, to target only the containers where VPA adds value.

# PodDisruptionBudget protecting against VPA evictions
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-server-pdb
  namespace: production
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: api-server
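
The "Start with Off mode" practice above can be sketched as the manifest below. It is the api-server-vpa object from earlier with the update mode switched to Off, so recommendations appear in the status while no pod is evicted or mutated. The bounds are copied from that example rather than tuned.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Off"      # Advisory only: recommendations are written to status
  resourcePolicy:
    containerPolicies:
      - containerName: api-server
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 4
          memory: 4Gi
```

Once the recommendations look sensible against observed peaks, changing updateMode is the only edit needed to hand control to VPA.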