Kubernetes Resource Management: Requests, Limits and QoS (2026)

Misconfigured resource requests and limits are responsible for a large share of production Kubernetes incidents — from pods being OOMKilled mid-request to CPU-throttled services with mysterious latency spikes. This guide explains exactly how requests, limits, and QoS classes work, how to debug resource-related failures, and how to govern resource usage at the namespace level with ResourceQuota and LimitRange.

Resource Units: Millicores and Mi/Gi

Kubernetes uses specific units for CPU and memory that trip up newcomers.

CPU Units

CPU is measured in cores or millicores (m). One core = 1000m. Fractions of a core are expressed in millicores:

  • 250m = 0.25 cores (one quarter of a CPU)
  • 500m = 0.5 cores (half a CPU)
  • 1 or 1000m = 1 full core
  • 2.5 = 2500m = two and a half cores

On a 4-core node, you can schedule pods whose CPU requests sum to at most 4 cores (minus system overhead). CPU is a compressible resource — a container that exceeds its CPU limit is throttled, not killed.
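The unit arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not Kubernetes' own quantity parser: the function names `cpu_to_millicores` and `requests_fit` are hypothetical, and the sketch ignores system overhead and any suffix other than `m`.

```python
from decimal import Decimal
from typing import List


def cpu_to_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as "250m", "1" or "2.5" to millicores."""
    quantity = quantity.strip()
    if quantity.endswith("m"):
        return int(Decimal(quantity[:-1]))
    return int(Decimal(quantity) * 1000)


def requests_fit(requests: List[str], allocatable: str) -> bool:
    """True if the summed CPU requests do not exceed the allocatable CPU."""
    total = sum(cpu_to_millicores(r) for r in requests)
    return total <= cpu_to_millicores(allocatable)


print(cpu_to_millicores("2.5"))                   # 2500
print(requests_fit(["2", "1500m", "500m"], "4"))  # True
print(requests_fit(["2", "1500m", "750m"], "4"))  # False
```

Decimal arithmetic is used so that values like 2.5 convert exactly rather than through binary floating point.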

Memory Units

Memory uses binary prefixes (powers of 1024) or SI prefixes (powers of 1000):

  • 128Mi = 128 mebibytes = 134,217,728 bytes (use this)
  • 1Gi = 1 gibibyte = 1,073,741,824 bytes (use this)
  • 128M = 128 megabytes = 128,000,000 bytes (SI, avoid — confusing)

Memory is an incompressible resource — a container that exceeds its memory limit is killed with OOMKilled, not throttled.
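To see why mixing up Mi and M matters, here is a rough Python sketch that converts the suffixes listed above to bytes. The helper name `memory_to_bytes` is an assumption for illustration, and it only handles the suffixes in its table, not every notation Kubernetes accepts.

```python
from decimal import Decimal

# Binary (power-of-1024) and SI (power-of-1000) suffixes covered by this sketch.
_SUFFIXES = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,
}


def memory_to_bytes(quantity: str) -> int:
    """Convert a memory quantity such as "128Mi" or "128M" to bytes."""
    quantity = quantity.strip()
    # Try two-character binary suffixes before one-character SI suffixes.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return int(Decimal(quantity[: -len(suffix)]) * _SUFFIXES[suffix])
    return int(Decimal(quantity))  # no suffix: plain bytes


print(memory_to_bytes("128Mi"))                            # 134217728
print(memory_to_bytes("128M"))                             # 128000000
print(memory_to_bytes("128Mi") - memory_to_bytes("128M"))  # 6217728
```

The last line shows the gap: a container sized with 128M gets roughly 6 MB less than one sized with 128Mi.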

Requests vs Limits: The Core Difference

These two fields serve entirely different purposes and are enforced at different times:

Aspect               | Request                                 | Limit
Purpose              | Scheduling guarantee: minimum reserved  | Runtime cap: maximum allowed
Enforced by          | Kubernetes scheduler (at pod placement) | Linux cgroups (at runtime)
If exceeded (CPU)    | N/A: can use more if available          | Container is CPU-throttled
If exceeded (memory) | N/A: can use more if available          | Container is OOMKilled
HPA uses             | Yes (utilization % is usage/request)    | No
Node allocation      | Subtracted from allocatable capacity    | Not used for scheduling

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  selector:
    matchLabels:
      app: my-app       # Required by apps/v1; must match the template labels
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: app
        image: my-app:1.4.2
        resources:
          requests:
            cpu: "250m"       # Scheduler reserves 0.25 cores on the node
            memory: "256Mi"   # Scheduler reserves 256Mi on the node
          limits:
            cpu: "1000m"      # Container can burst up to 1 core
            memory: "512Mi"   # Container is killed if it exceeds 512Mi

Note: Setting requests == limits for both CPU and memory on every container creates a Guaranteed QoS pod, the class that is evicted last under node pressure. This is the right choice for latency-sensitive production workloads. Because the scheduler places pods by their requests, the pod's full limit is reserved on the node, so it never depends on burst capacity that other pods might also claim.
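Because HPA utilization is measured against the request rather than the limit, a Burstable pod can report more than 100%. The sketch below illustrates that arithmetic with the 250m request from the manifest above. The function names are hypothetical, and `desired_replicas` is the basic HPA scaling ratio only; it ignores the controller's tolerance and stabilization behavior.

```python
import math


def utilization_percent(usage_millicores: int, request_millicores: int) -> float:
    """CPU utilization as the HPA sees it: usage divided by the request."""
    return 100.0 * usage_millicores / request_millicores


def desired_replicas(current_replicas: int, current_util: float, target_util: float) -> int:
    """Basic HPA ratio: ceil(current replicas * current / target)."""
    return math.ceil(current_replicas * current_util / target_util)


# A container requesting 250m that bursts to 500m reports 200% utilization.
print(utilization_percent(500, 250))    # 200.0
# Four replicas at 200% with an 80% target scale out to ten.
print(desired_replicas(4, 200.0, 80.0)) # 10
```

This is why an undersized request makes an autoscaler scale out aggressively even when the pods are nowhere near their limits.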

QoS Classes Explained

Kubernetes assigns one of three Quality of Service classes to every pod based on its resource configuration. This class determines eviction priority when a node runs low on memory.

QoS Class  | Eviction Priority   | Condition
Guaranteed | Last to be evicted  | Every container has requests == limits for both CPU and memory
Burstable  | Middle priority     | Does not meet the Guaranteed criteria, and at least one container has a CPU or memory request or limit
BestEffort | First to be evicted | No container has any requests or limits set
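The classification rules in the table can be expressed as a short function. This is a simplified sketch, not the kubelet's implementation: `qos_class` is a hypothetical name, each container is represented as a plain dict, and quantities are compared as literal strings (so "500m" and "0.5" would wrongly count as different).

```python
from typing import Dict, List

RESOURCES = ("cpu", "memory")


def qos_class(containers: List[Dict]) -> str:
    """Classify a pod from its containers' 'requests' and 'limits' dicts."""
    any_set = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        if requests or limits:
            any_set = True
        for resource in RESOURCES:
            limit = limits.get(resource)
            # An omitted request defaults to the limit, so only the limit must exist.
            request = requests.get(resource, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


print(qos_class([{"requests": {"cpu": "500m", "memory": "512Mi"},
                  "limits": {"cpu": "500m", "memory": "512Mi"}}]))  # Guaranteed
print(qos_class([{"requests": {"cpu": "250m", "memory": "256Mi"},
                  "limits": {"cpu": "1000m", "memory": "1Gi"}}]))   # Burstable
print(qos_class([{}]))                                              # BestEffort
```

Note that a single container without limits is enough to drop a multi-container pod from Guaranteed to Burstable.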

Guaranteed QoS

apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-pod
spec:
  containers:
  - name: app
    image: my-app:1.4.2
    resources:
      requests:
        cpu: "500m"
        memory: "512Mi"
      limits:
        cpu: "500m"      # Must equal request
        memory: "512Mi"  # Must equal request
# kubectl get pod guaranteed-pod -o jsonpath='{.status.qosClass}'
# Output: Guaranteed

Burstable QoS

apiVersion: v1
kind: Pod
metadata:
  name: burstable-pod
spec:
  containers:
  - name: app
    image: my-app:1.4.2
    resources:
      requests:
        cpu: "250m"
        memory: "256Mi"
      limits:
        cpu: "1000m"     # Limit > request — Burstable
        memory: "1Gi"
# QoS class: Burstable
# Pod can use up to 1 CPU core when available, but scheduler
# only guarantees 250m. If node is under memory pressure,
# this pod is evicted before Guaranteed pods.

BestEffort QoS

apiVersion: v1
kind: Pod
metadata:
  name: besteffort-pod
spec:
  containers:
  - name: batch-job
    image: batch-processor:latest
    # No resources block at all — BestEffort
    # Uses whatever CPU/memory is available
    # First to be evicted under memory pressure
    # Appropriate only for non-critical batch jobs

Pro Tip: Check the QoS class of any running pod with: kubectl get pod <name> -o jsonpath='{.status.qosClass}'. For production deployments, aim for Guaranteed or at minimum Burstable. Never run stateful services (databases, caches) as BestEffort.

Debugging OOMKilled Pods

OOMKilled means the Linux kernel's Out-Of-Memory killer terminated the container because it exceeded its memory limit. Here is the systematic debug process.

# Step 1: Identify the OOMKilled container
kubectl get pods -n production
# Look for pods with high RESTARTS count

# Step 2: Describe the pod for termination reason
kubectl describe pod my-app-7d9f4b8c6-xk2pq -n production
# Look for:
# Last State: Terminated
#   Reason: OOMKilled
#   Exit Code: 137

# Step 3: Check recent events
kubectl get events -n production --sort-by='.lastTimestamp' | grep -i oom

# Step 4: See memory usage before the kill (if Prometheus is available)
# container_memory_working_set_bytes{pod="my-app-7d9f4b8c6-xk2pq"}

# Step 5: Check current memory usage across all pods
kubectl top pods -n production --sort-by=memory

# Step 6: Look at the kernel log on the node for OOM details
NODE=$(kubectl get pod my-app-7d9f4b8c6-xk2pq -n production \
  -o jsonpath='{.spec.nodeName}')
kubectl get node "$NODE"
# SSH to the node and check: journalctl -k | grep -i "oom\|killed process"

# Step 7: Check if the limit is set too low vs actual usage
kubectl get pod my-app-7d9f4b8c6-xk2pq -n production \
  -o jsonpath='{.spec.containers[0].resources}'
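Steps 1 and 2 can be automated by scanning the JSON that kubectl get pod -o json returns. The sketch below assumes the standard status.containerStatuses[].lastState.terminated fields; the function name `oom_killed_containers` and the trimmed sample pod are made up for illustration.

```python
from typing import List, Tuple


def oom_killed_containers(pod: dict) -> List[Tuple[str, int, int]]:
    """Return (container name, restart count, exit code) for OOMKilled containers."""
    hits = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        terminated = status.get("lastState", {}).get("terminated") or {}
        if terminated.get("reason") == "OOMKilled":
            hits.append((status["name"],
                         status.get("restartCount", 0),
                         terminated.get("exitCode")))
    return hits


# Hypothetical, heavily trimmed pod object for demonstration only.
SAMPLE_POD = {
    "status": {
        "containerStatuses": [
            {"name": "app", "restartCount": 4,
             "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}},
            {"name": "sidecar", "restartCount": 0, "lastState": {}},
        ]
    }
}

print(oom_killed_containers(SAMPLE_POD))  # [('app', 4, 137)]
```

In practice you would load the dict with json.load from the kubectl output instead of using the inline sample.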

Common root causes and fixes:

Root Cause           | Symptom                               | Fix
Memory limit too low | Restarts under normal load            | Increase limit; use VPA recommendations
Memory leak in app   | Memory grows over hours then OOMKill  | Fix the leak; add heap dump on OOM
JVM heap not bounded | Java app grows to node memory         | Set -XX:MaxRAMPercentage=75 or explicit -Xmx
Sudden traffic spike | OOMKill only during peak load         | Increase limit; add HPA; use memory-aware autoscaling

# Fix: JVM container with memory-aware settings
containers:
- name: java-app
  image: my-java-app:1.4.2
  env:
  # JAVA_OPTS only takes effect if the image's entrypoint passes it to java;
  # JAVA_TOOL_OPTIONS is read by the JVM itself.
  - name: JAVA_OPTS
    value: "-XX:+UseContainerSupport -XX:MaxRAMPercentage=75.0 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/heapdump.hprof"
  resources:
    requests:
      memory: "512Mi"
    limits:
      memory: "1Gi"   # Max heap is 768Mi (75% of 1Gi); metaspace, thread stacks and native memory need the rest
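The heap sizing behind MaxRAMPercentage is simple arithmetic, sketched here in Python. The function names are hypothetical, and the real JVM may round or align the resulting heap size, so treat the numbers as approximate.

```python
def max_heap_mib(limit_mib: float, max_ram_percentage: float) -> float:
    """Approximate maximum heap for a given container memory limit."""
    return limit_mib * max_ram_percentage / 100.0


def non_heap_headroom_mib(limit_mib: float, max_ram_percentage: float) -> float:
    """Memory left for metaspace, thread stacks, and native allocations."""
    return limit_mib - max_heap_mib(limit_mib, max_ram_percentage)


# A 1Gi limit is 1024Mi.
print(max_heap_mib(1024, 75.0))           # 768.0
print(non_heap_headroom_mib(1024, 75.0))  # 256.0
```

If the application's non-heap usage exceeds that headroom, the container can still be OOMKilled even though the heap never fills up, so a lower percentage or a higher limit may be needed.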

ResourceQuota per Namespace

ResourceQuota enforces aggregate resource caps on a namespace. It prevents any single team or application from monopolizing cluster resources.

apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    # Compute resources
    requests.cpu: "20"          # Total CPU requests across all pods
    requests.memory: "40Gi"     # Total memory requests across all pods
    limits.cpu: "40"            # Total CPU limits
    limits.memory: "80Gi"       # Total memory limits

    # Object count limits
    pods: "100"                 # Maximum number of pods
    services: "20"
    persistentvolumeclaims: "30"
    secrets: "50"
    configmaps: "50"
    count/deployments.apps: "20"
    count/statefulsets.apps: "5"

    # Storage
    requests.storage: "500Gi"   # Total PVC storage requested

# Check quota usage in a namespace
kubectl describe resourcequota production-quota -n production

# Output example:
# Name:                    production-quota
# Namespace:               production
# Resource                 Used    Hard
# --------                 ----    ----
# limits.cpu               12500m  40
# limits.memory            24Gi    80Gi
# pods                     47      100
# requests.cpu             6250m   20
# requests.memory          12Gi    40Gi
Note: Once a ResourceQuota constrains a compute resource (for example requests.cpu or limits.memory), every new pod in the namespace must set that request or limit on each container, or the API server rejects it. Use LimitRange defaults (see below) to set sensible defaults so pods without explicit resources are still accepted.
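The admission check behind a quota is an aggregate comparison: current usage plus the new pod must stay within each hard cap. The sketch below illustrates it with the numbers from the example output above. `quota_violations` is a hypothetical helper, and it assumes all quantities have already been normalized to integers (millicores, Mi, counts).

```python
def quota_violations(used: dict, hard: dict, new_pod: dict) -> list:
    """Return the quota entries that admitting new_pod would push past the cap."""
    violations = []
    for name, cap in hard.items():
        if used.get(name, 0) + new_pod.get(name, 0) > cap:
            violations.append(name)
    return violations


# CPU in millicores, memory in Mi, pods as a count.
hard = {"requests.cpu": 20000, "requests.memory": 40 * 1024, "pods": 100}
used = {"requests.cpu": 6250, "requests.memory": 12 * 1024, "pods": 47}

small_pod = {"requests.cpu": 250, "requests.memory": 256, "pods": 1}
huge_pod = {"requests.cpu": 14000, "requests.memory": 256, "pods": 1}

print(quota_violations(used, hard, small_pod))  # []
print(quota_violations(used, hard, huge_pod))   # ['requests.cpu']
```

A non-empty result corresponds to the "exceeded quota" error you see when the API server rejects a pod.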

LimitRange Defaults

LimitRange sets default requests and limits for containers that don't specify them, and enforces minimum/maximum bounds per container, pod, or PVC.

apiVersion: v1
kind: LimitRange
metadata:
  name: production-limits
  namespace: production
spec:
  limits:
  # Container-level defaults and bounds
  - type: Container
    default:                  # Applied when no limit is set
      cpu: "500m"
      memory: "512Mi"
    defaultRequest:           # Applied when no request is set
      cpu: "100m"
      memory: "128Mi"
    max:                      # No container can exceed these
      cpu: "4"
      memory: "8Gi"
    min:                      # No container can go below these
      cpu: "50m"
      memory: "64Mi"

  # Pod-level bounds (sum of all containers)
  - type: Pod
    max:
      cpu: "8"
      memory: "16Gi"

  # PVC storage bounds
  - type: PersistentVolumeClaim
    max:
      storage: "100Gi"
    min:
      storage: "1Gi"

# Describe the LimitRange to see what defaults are applied
kubectl describe limitrange production-limits -n production

# Test: create a pod without resources — LimitRange defaults apply
kubectl run test-pod --image=nginx -n production
kubectl get pod test-pod -n production -o jsonpath='{.spec.containers[0].resources}'
# Output: {"limits":{"cpu":"500m","memory":"512Mi"},"requests":{"cpu":"100m","memory":"128Mi"}}
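The defaulting behavior can be approximated as follows. This is a simplified sketch of how Container-type defaults are filled in, with one documented subtlety: when a container sets a limit but no request, the request is copied from that limit rather than taken from defaultRequest. The function name `apply_container_defaults` is hypothetical, and min/max validation is left out.

```python
def apply_container_defaults(resources: dict, default: dict, default_request: dict) -> dict:
    """Fill in missing CPU/memory requests and limits for one container."""
    limits = dict(resources.get("limits", {}))
    requests = dict(resources.get("requests", {}))
    for resource in ("cpu", "memory"):
        had_limit = resource in limits
        if not had_limit and resource in default:
            limits[resource] = default[resource]
        if resource not in requests:
            if had_limit:
                # Explicit limit without a request: the request matches the limit.
                requests[resource] = limits[resource]
            elif resource in default_request:
                requests[resource] = default_request[resource]
    return {"limits": limits, "requests": requests}


DEFAULT = {"cpu": "500m", "memory": "512Mi"}
DEFAULT_REQUEST = {"cpu": "100m", "memory": "128Mi"}

# No resources at all: both defaults apply, matching the test-pod output above.
print(apply_container_defaults({}, DEFAULT, DEFAULT_REQUEST))
# Only a memory limit: the memory request becomes 1Gi, not 128Mi.
print(apply_container_defaults({"limits": {"memory": "1Gi"}}, DEFAULT, DEFAULT_REQUEST))
```

The second case is a common surprise: setting only a large limit silently produces an equally large request.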

VPA Recommendations for Right-Sizing

Setting accurate requests requires observing actual usage over time. VPA in Off mode provides recommendations without making any changes — ideal for right-sizing analysis.

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"      # Recommendations only — no changes applied
  resourcePolicy:
    containerPolicies:
    - containerName: app
      minAllowed:
        cpu: 50m
        memory: 64Mi
      maxAllowed:
        cpu: 8
        memory: 16Gi

# Read VPA recommendations after ~24h of data collection
kubectl get vpa my-app-vpa -n production -o yaml

# The recommendation section shows:
# status:
#   recommendation:
#     containerRecommendations:
#     - containerName: app
#       lowerBound:          # Conservative minimum
#         cpu: 87m
#         memory: 262144k
#       target:              # Recommended value to set
#         cpu: 587m
#         memory: 524288k
#       upperBound:          # Upper bound for limit
#         cpu: 2347m
#         memory: 2097152k

Pro Tip: Run VPA in Off mode for one full week (including weekend traffic patterns) before applying recommendations. Set your new requests to the VPA target value and your limits to the upperBound value. Then switch to VPA Initial mode, which applies recommendations only when a pod is created and never evicts running pods.
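Turning a recommendation into a resources block is mechanical, as this sketch shows. It follows the tip above (requests from target, limits from upperBound) and converts the k-suffixed memory values, where k means 1000 bytes, into whole Mi. Both function names are hypothetical.

```python
def si_kilo_to_mi(quantity: str) -> str:
    """Convert a memory value like "524288k" (k = 1000 bytes) to whole Mi, rounding up."""
    if not quantity.endswith("k"):
        return quantity  # leave other notations untouched
    total_bytes = int(quantity[:-1]) * 1000
    mib = -(-total_bytes // 2**20)  # ceiling division
    return f"{mib}Mi"


def resources_from_recommendation(rec: dict) -> dict:
    """Build a resources block: requests from target, limits from upperBound."""
    return {
        "requests": {"cpu": rec["target"]["cpu"],
                     "memory": si_kilo_to_mi(rec["target"]["memory"])},
        "limits": {"cpu": rec["upperBound"]["cpu"],
                   "memory": si_kilo_to_mi(rec["upperBound"]["memory"])},
    }


# Values copied from the example recommendation above.
recommendation = {
    "target": {"cpu": "587m", "memory": "524288k"},
    "upperBound": {"cpu": "2347m", "memory": "2097152k"},
}
print(resources_from_recommendation(recommendation))
```

For the example values this yields a 500Mi memory request and a 2000Mi memory limit, which you may prefer to round to friendlier numbers.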

Common Mistakes and How to Avoid Them

Mistake 1: Setting CPU Limits Too Low

CPU throttling is silent — pods don't crash, they just run slowly. A Java app with a 200m CPU limit may take 5x longer to complete startup than without a limit because JIT compilation is CPU-intensive.

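# Detect CPU throttling (cgroup v1 path; on cgroup v2 nodes read /sys/fs/cgroup/cpu.stat)
kubectl exec -n production my-app-pod -- cat /sys/fs/cgroup/cpu/cpu.stat
# Look for: nr_throttled relative to nr_periods, and throttled_time
# (nanoseconds spent throttled; cgroup v2 reports throttled_usec in microseconds)

# Or via Prometheus (fraction of CFS periods in which the container was throttled):
# rate(container_cpu_cfs_throttled_periods_total[5m])
#   / rate(container_cpu_cfs_periods_total[5m]) > 0.1
# If more than 10% of periods are throttled, your CPU limit is too low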
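If you have the raw cpu.stat contents, the throttling ratio is easy to compute yourself. The sketch below parses the key/value lines and divides nr_throttled by nr_periods; the function names and the sample numbers are invented for illustration.

```python
def parse_cpu_stat(text: str) -> dict:
    """Parse "key value" lines from a cgroup cpu.stat file into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def throttled_fraction(stats: dict) -> float:
    """Fraction of CFS enforcement periods in which the cgroup was throttled."""
    periods = stats.get("nr_periods", 0)
    return stats.get("nr_throttled", 0) / periods if periods else 0.0


# Hypothetical cgroup v1 sample.
SAMPLE = """nr_periods 1200
nr_throttled 180
throttled_time 9500000000
"""

stats = parse_cpu_stat(SAMPLE)
print(round(throttled_fraction(stats), 2))  # 0.15
```

A result of 0.15 means the container hit its CPU limit in 15% of scheduling periods, above the 10% rule of thumb.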

Mistake 2: Not Setting Memory Limits

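Without a memory limit, a leaking or misbehaving container can consume all node memory and trigger evictions or node-level OOM kills that hit other pods. The kernel picks its victim by OOM score, which reflects memory usage and the oom_score_adj value the kubelet assigns per QoS class, so the culprit is not always the process that gets killed. Always set memory limits in production.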

Mistake 3: Setting Requests Equal to Limits for CPU on Batch Jobs

This wastes node capacity. Batch jobs that run occasionally don't need guaranteed CPU. Set a low CPU request (so the scheduler places them efficiently) and a higher limit (so they finish quickly when they run). Use Guaranteed QoS only for latency-sensitive services.

Mistake 4: No Requests Set at All

Without requests, the scheduler has no resource information and may place many pods on a single node, causing contention. The pod gets BestEffort QoS and is evicted first under memory pressure. Always set at minimum a memory request.

Frequently Asked Questions

What happens when a node runs out of memory?

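The kubelet tries to act before the kernel has to. When memory.available drops below an eviction threshold, it evicts pods whose usage exceeds their requests first, ordered by pod priority and then by how far they exceed the request. BestEffort pods have no requests, so they are usually first; Burstable pods over their requests follow; Guaranteed pods go last. Configure thresholds in the kubelet config with evictionHard (e.g., memory.available: "200Mi"). If memory runs out faster than the kubelet can react, the Linux OOM killer terminates processes directly. Evicted pods are not moved: their controller creates replacements, which may be scheduled on other nodes. OOMKilled containers are restarted in place within the same pod.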
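The eviction ranking can be modeled as a sort. This sketch follows the ordering the Kubernetes documentation describes for node-pressure eviction (usage above request, then pod priority, then size of the excess). The `PodMem` type, the function name, and the sample pods are hypothetical, and real kubelet behavior has more inputs than this.

```python
from typing import List, NamedTuple


class PodMem(NamedTuple):
    name: str
    priority: int
    request_mib: int  # 0 for BestEffort pods
    usage_mib: int


def eviction_order(pods: List[PodMem]) -> List[str]:
    """Names of pods in the order they would be considered for eviction."""
    def rank(pod: PodMem):
        exceeds = pod.usage_mib > pod.request_mib
        # Pods over their request first, then lower priority, then larger excess.
        return (not exceeds, pod.priority, -(pod.usage_mib - pod.request_mib))
    return [pod.name for pod in sorted(pods, key=rank)]


pods = [
    PodMem("db", 0, 1024, 900),    # Guaranteed, under its request
    PodMem("api", 0, 256, 700),    # Burstable, 444Mi over its request
    PodMem("batch", 0, 0, 500),    # BestEffort, all usage counts as excess
    PodMem("cache", 0, 512, 400),  # Burstable, under its request
]
print(eviction_order(pods))  # ['batch', 'api', 'cache', 'db']
```

Keeping usage at or below the request is what protects a pod, which is another reason to size requests from observed usage.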

Should I set CPU limits in production?

This is genuinely debated. The argument against CPU limits: they cause throttling even when the node has spare CPU, increasing tail latency for no benefit. The argument for: without limits, a noisy neighbor can consume all node CPU, degrading every other pod. The pragmatic approach: set CPU limits at 2–4x your CPU request for most services, monitor throttling, and remove limits only for latency-critical paths where you've confirmed the node is never saturated. Always set memory limits — the case against memory limits is much weaker.

How do ResourceQuota and LimitRange interact?

They work together. LimitRange provides defaults so pods without explicit resources are auto-populated (required when ResourceQuota is present). ResourceQuota enforces aggregate caps across all pods in the namespace. A pod can pass LimitRange validation (its individual limits are within bounds) but still be rejected if creating it would exceed the namespace's ResourceQuota. Check both when debugging pod creation failures.

Why does my pod show OOMKilled with Exit Code 137?

Exit code 137 = 128 + 9 (SIGKILL). The container was killed with signal 9, which in Kubernetes almost always means the cgroup memory limit was exceeded (OOMKill) or a liveness probe failed and the kubelet sent SIGKILL after the grace period. Check kubectl describe pod for Reason: OOMKilled in the Last State section to confirm it's memory-related. Exit code 137 from a liveness probe failure shows Reason: Error instead.
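The 128 + signal convention is easy to decode programmatically. This small sketch uses Python's standard signal module; `describe_exit_code` is a hypothetical helper, and the signal names assume a Linux or other POSIX host.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Translate a container exit code into a human-readable description."""
    if code > 128:
        signum = code - 128
        try:
            return f"killed by {signal.Signals(signum).name}"
        except ValueError:
            return f"killed by signal {signum}"
    return "exited normally" if code == 0 else f"exited with status {code}"


print(describe_exit_code(137))  # killed by SIGKILL
print(describe_exit_code(143))  # killed by SIGTERM
```

Exit code 143 (SIGTERM) typically indicates a graceful shutdown request, while 137 means the process was forcibly killed, so you still need the Reason field to tell an OOM kill from a probe-driven restart.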

How do I find which pods are consuming the most resources cluster-wide?

Use kubectl top pods --all-namespaces --sort-by=memory for a quick view. For deeper analysis, use PromQL: topk(10, sum by(pod, namespace)(container_memory_working_set_bytes{container!=""})) gives the top 10 memory consumers across the entire cluster. For CPU: topk(10, sum by(pod, namespace)(rate(container_cpu_usage_seconds_total{container!=""}[5m]))). The Grafana "Kubernetes / Compute Resources / Cluster" dashboard shows this visually.