Kubernetes Resource Quotas and LimitRanges Guide

Without resource controls, a single misbehaving application can consume all available CPU and memory on a cluster, starving every other workload. Kubernetes provides two complementary admission-level controls to prevent this: ResourceQuota sets hard ceilings on the total resources a namespace can consume, while LimitRange enforces per-container and per-pod constraints and injects sensible defaults when developers forget to set them. Together, they are the foundation of fair resource sharing in multi-tenant Kubernetes clusters.

Requests vs Limits: The Foundation

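Before understanding quotas, you need to understand the difference between resource requests and limits. The scheduler and the quota system treat them differently: the scheduler places pods using requests alone, while ResourceQuota can cap the namespace-wide totals of both.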

  • Requests: The amount of CPU and memory reserved for the container. The scheduler uses requests, and only requests, to decide which node has enough unreserved capacity for the pod. A ResourceQuota with requests.* keys caps the total requests across all pods in the namespace.
  • Limits: The maximum amount of a resource a container is allowed to use. Exceeding the CPU limit causes throttling. Exceeding the memory limit causes the container to be OOM-killed. A ResourceQuota with limits.* keys caps the total limits in the namespace.

apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  containers:
    - name: app
      image: nginx
      resources:
        requests:
          cpu: 250m       # 0.25 CPU core guaranteed
          memory: 256Mi   # 256 MiB RAM guaranteed
        limits:
          cpu: 1000m      # max 1 CPU core (throttled above this)
          memory: 512Mi   # max 512 MiB RAM (OOM-killed if exceeded)
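
The scheduler's use of requests can be sketched in a few lines of Python. This is a simplified model, not scheduler code. `Resources` and `fits_on_node` are hypothetical names, CPU is expressed in millicores, and memory is expressed in MiB. Every other scheduling filter (taints, affinity, ports) is ignored. A node fits only if the requests already placed on it plus the new pod's requests stay within its allocatable capacity. Limits never enter the calculation.

```python
from dataclasses import dataclass


@dataclass
class Resources:
    cpu_m: int   # CPU in millicores (250 == "250m")
    mem_mi: int  # memory in MiB (256 == "256Mi")


def fits_on_node(allocatable: Resources, placed: list, pod: Resources) -> bool:
    """Return True if the pod's requests fit in the node's unreserved capacity.

    `placed` holds the requests of pods already bound to the node. Only
    requests are summed; limits play no part in the fit decision.
    """
    used_cpu = sum(r.cpu_m for r in placed)
    used_mem = sum(r.mem_mi for r in placed)
    return (used_cpu + pod.cpu_m <= allocatable.cpu_m
            and used_mem + pod.mem_mi <= allocatable.mem_mi)


example_pod = Resources(cpu_m=250, mem_mi=256)  # requests of the pod above
node = Resources(cpu_m=2000, mem_mi=4096)       # hypothetical node allocatable

print(fits_on_node(node, [Resources(1500, 2048)], example_pod))  # True
print(fits_on_node(node, [Resources(1500, 2048), Resources(400, 512)],
                   example_pod))  # False: 1900m + 250m > 2000m
```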
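
QoS classes: Kubernetes assigns each pod a Quality of Service class based on how its containers set requests and limits.
  • Guaranteed: every container has CPU and memory requests and limits, with each request equal to its limit. These pods are generally the last to be evicted under node memory pressure.
  • Burstable: at least one request or limit is set, but the pod does not meet the Guaranteed criteria (for example, requests lower than limits). These pods are evicted next.
  • BestEffort: no requests or limits are set at all. These pods are evicted first.
Set requests and limits on all production containers so that they run as Guaranteed or Burstable.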
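
The classification rules above can be expressed as a short Python function. It is a sketch under stated assumptions:
  • `qos_class` is a hypothetical helper.
  • Each container is a plain dict, with CPU in millicores and memory in MiB.
  • Init containers are left out, although Kubernetes also considers them.
The function applies the API-server default that a container with a limit but no request gets a request equal to the limit. That default is why a pod that sets only limits ends up Guaranteed.

```python
RESOURCES = ("cpu", "memory")  # only CPU and memory affect the QoS class


def qos_class(containers: list) -> str:
    """Classify a pod from its containers' requests and limits.

    Each container is a dict such as
    {"requests": {"cpu": 250, "memory": 256}, "limits": {"cpu": 1000, "memory": 512}}
    with CPU in millicores and memory in MiB.
    """
    any_set = False
    guaranteed = True
    for c in containers:
        limits = {k: v for k, v in c.get("limits", {}).items() if k in RESOURCES}
        requests = {k: v for k, v in c.get("requests", {}).items() if k in RESOURCES}
        # API defaulting: a limit without a matching request implies request == limit.
        for res, lim in limits.items():
            requests.setdefault(res, lim)
        if requests or limits:
            any_set = True
        # Guaranteed needs both resources limited, with requests equal to limits.
        for res in RESOURCES:
            if res not in limits or requests.get(res) != limits[res]:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


print(qos_class([{"requests": {"cpu": 250, "memory": 256},
                  "limits": {"cpu": 1000, "memory": 512}}]))  # Burstable
print(qos_class([{"limits": {"cpu": 500, "memory": 512}}]))  # Guaranteed
print(qos_class([{}]))  # BestEffort
```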

ResourceQuota: Namespace Ceilings

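An admission controller enforces ResourceQuota when objects are created or updated. If admitting a pod would push any tracked total past its hard value, the API server rejects the request with a Forbidden error. The error names the quota and the exceeded resource. For pods created by a controller such as a Deployment, the rejection appears as events on the ReplicaSet rather than as an error from kubectl apply. Quota is not retroactive: lowering a hard value does not evict pods that are already running.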

apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: team-payments
spec:
  hard:
    # CPU: sum of all container requests/limits in the namespace
    requests.cpu: "10"          # 10 CPU cores total requests
    limits.cpu: "20"            # 20 CPU cores total limits
    # Memory
    requests.memory: 20Gi
    limits.memory: 40Gi
    # Pods
    pods: "100"
    # Services
    services: "30"
    services.loadbalancers: "3"
    services.nodeports: "0"     # block NodePort services
    # Storage
    requests.storage: 500Gi
    persistentvolumeclaims: "20"
    # Extended resources (e.g., GPUs)
    requests.nvidia.com/gpu: "4"

# Check quota status (used vs hard)
kubectl describe resourcequota production-quota -n team-payments

# Example output (abbreviated; the real output lists every key in spec.hard):
# Name:            production-quota
# Namespace:       team-payments
# Resource         Used    Hard
# --------         ----    ----
# limits.cpu       6500m   20
# limits.memory    12Gi    40Gi
# pods             23      100
# requests.cpu     3200m   10
# requests.memory  6Gi     20Gi
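
The admission decision behind this output can be sketched as a simple inequality. For each resource the quota tracks, used plus the new pod's contribution must stay at or below hard. The Python below is a simplified model, not the ResourceQuota admission plugin itself:
  • `check_quota` and its error strings are hypothetical.
  • CPU is in millicores and memory is in MiB.
  • The used and hard values come from the sample output above.
The sketch also models the rule that a pod omitting a resource the quota tracks is rejected outright.

```python
# Quota keys for compute resources -> (pod spec section, resource name)
COMPUTE_KEYS = {
    "requests.cpu": ("requests", "cpu"),
    "requests.memory": ("requests", "memory"),
    "limits.cpu": ("limits", "cpu"),
    "limits.memory": ("limits", "memory"),
}


def check_quota(hard: dict, used: dict, containers: list) -> list:
    """Return rejection reasons for a new pod; an empty list means admitted.

    CPU values are millicores, memory values are MiB, and "pods" is a count.
    """
    errors = []
    delta = {"pods": 1}  # every pod consumes one unit of the "pods" quota
    for key, (section, res) in COMPUTE_KEYS.items():
        if key not in hard:
            continue  # resources the quota does not track are unconstrained
        values = [c.get(section, {}).get(res) for c in containers]
        if any(v is None for v in values):
            # a tracked resource left unset by any container cannot be counted
            errors.append(f"must specify {key}")
            continue
        delta[key] = sum(values)
    for key, amount in delta.items():
        if key in hard and used.get(key, 0) + amount > hard[key]:
            errors.append(f"exceeded quota: {key}")
    return errors


# Hard and used values from the sample output (10 CPU = 10000m, 20Gi = 20480 MiB).
hard = {"requests.cpu": 10000, "limits.cpu": 20000,
        "requests.memory": 20480, "limits.memory": 40960, "pods": 100}
used = {"requests.cpu": 3200, "limits.cpu": 6500,
        "requests.memory": 6144, "limits.memory": 12288, "pods": 23}

example_pod = [{"requests": {"cpu": 250, "memory": 256},
                "limits": {"cpu": 1000, "memory": 512}}]
no_specs_pod = [{}]
big_pod = [{"requests": {"cpu": 7000, "memory": 1024},
            "limits": {"cpu": 8000, "memory": 2048}}]

print(check_quota(hard, used, example_pod))   # []
print(check_quota(hard, used, no_specs_pod))  # four "must specify ..." reasons
print(check_quota(hard, used, big_pod))       # ['exceeded quota: requests.cpu']
```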

Quota Scopes and Priority Classes

Quota scopes let you apply different quota rules to different subsets of pods within the same namespace. This is particularly useful when combined with PriorityClasses to grant production-critical pods more resources than batch jobs.

# Quota for BestEffort pods only (no requests/limits)
apiVersion: v1
kind: ResourceQuota
metadata:
  name: besteffort-quota
  namespace: team-payments
spec:
  hard:
    pods: "10"
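  scopeSelector:
    matchExpressions:
      # BestEffort is a boolean scope: it takes operator Exists and no values.
      # A BestEffort-scoped quota can only track the pods count.
      - operator: Exists
        scopeName: BestEffort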

---
# Separate quota for high-priority production pods
apiVersion: v1
kind: ResourceQuota
metadata:
  name: high-priority-quota
  namespace: team-payments
spec:
  hard:
    pods: "20"
    requests.cpu: "8"
    requests.memory: 16Gi
  scopeSelector:
    matchExpressions:
      - operator: In
        scopeName: PriorityClass
        values: ["high-priority"]
---
# Define the PriorityClass (cluster-scoped, so no namespace)
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
description: "For critical production workloads"
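
When a namespace has several quotas, every quota whose scope matches the pod must have room for it, and the pod is charged against all of them. The sketch below models only two scope expressions:
  • `BestEffort` with `Exists`
  • `PriorityClass` with `In`
`quota_applies` is a hypothetical helper, and each pod is reduced to its QoS class and priority class name. An unscoped quota, such as production-quota, matches every pod.

```python
def quota_applies(scope_exprs: list, pod: dict) -> bool:
    """Return True if every scope expression of a quota matches the pod.

    `pod` is reduced to {"qosClass": ..., "priorityClassName": ...}.
    Only the two expression forms used in this guide are modeled.
    """
    for expr in scope_exprs:
        name, op = expr["scopeName"], expr["operator"]
        if name == "BestEffort" and op == "Exists":
            matched = pod["qosClass"] == "BestEffort"
        elif name == "PriorityClass" and op == "In":
            matched = pod.get("priorityClassName") in expr["values"]
        else:
            raise NotImplementedError(f"{name}/{op} is not modeled in this sketch")
        if not matched:
            return False
    return True  # an empty selector (unscoped quota) matches every pod


quotas = {
    "production-quota": [],
    "besteffort-quota": [{"scopeName": "BestEffort", "operator": "Exists"}],
    "high-priority-quota": [{"scopeName": "PriorityClass", "operator": "In",
                             "values": ["high-priority"]}],
}

pod = {"qosClass": "Burstable", "priorityClassName": "high-priority"}
print([name for name, exprs in quotas.items() if quota_applies(exprs, pod)])
# ['production-quota', 'high-priority-quota']
```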

LimitRange: Per-Object Constraints

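LimitRange operates at the level of individual containers, pods, and PVCs. Its most important function is injecting default resource requests and limits into containers that don't specify them. Without those defaults, a container with no resource specs meets one of two outcomes:
  • In a namespace without a compute quota, it runs with no limit at all.
  • In a namespace whose quota tracks requests.cpu, limits.memory, or similar keys, it is rejected outright, because quota can only count resources that are actually declared.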

apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: team-payments
spec:
  limits:
    # Container-level defaults and bounds
    - type: Container
      default:
        cpu: 400m       # default limit; 400m / 100m default request = 4, within maxLimitRequestRatio
        memory: 256Mi
      defaultRequest:
        cpu: 100m
        memory: 128Mi
      max:
        cpu: "4"
        memory: 4Gi
      min:
        cpu: 50m
        memory: 64Mi
      # maxLimitRequestRatio limits the burst factor
      # (limit / request <= ratio)
      maxLimitRequestRatio:
        cpu: "4"        # limit can be at most 4x the request
        memory: "2"

    # Pod-level max: caps the sum of the containers' limits
    - type: Pod
      max:
        cpu: "8"
        memory: 8Gi

    # PVC size bounds
    - type: PersistentVolumeClaim
      max:
        storage: 100Gi
      min:
        storage: 1Gi
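
The order in which LimitRange acts matters. Defaults are filled in first, and the resulting values are then validated against min, max, and maxLimitRequestRatio like any other values. The Python sketch below models that order for the Container type only, under these assumptions:
  • `apply_limit_range` and `validate` are hypothetical helpers.
  • CPU is in millicores and memory is in MiB.
  • Pod-type totals and the defaulting rules inside the LimitRange object itself are skipped.
The usage shows why defaults must satisfy the ratio. A default CPU limit of 500m over a default request of 100m is a ratio of 5. That fails a cpu maxLimitRequestRatio of 4 for every container that relies on both defaults.

```python
RESOURCES = ("cpu", "memory")


def apply_limit_range(container: dict, lr: dict) -> dict:
    """Fill in missing requests/limits from a Container-type LimitRange item."""
    requests = dict(container.get("requests", {}))
    limits = dict(container.get("limits", {}))
    for res in RESOURCES:
        # API defaulting runs before admission: a lone limit implies request == limit.
        if res in limits and res not in requests:
            requests[res] = limits[res]
        if res not in limits and res in lr.get("default", {}):
            limits[res] = lr["default"][res]
        if res not in requests and res in lr.get("defaultRequest", {}):
            requests[res] = lr["defaultRequest"][res]
    return {"requests": requests, "limits": limits}


def validate(container: dict, lr: dict) -> list:
    """Check the (already defaulted) container against min, max and ratio."""
    errors = []
    for res in RESOURCES:
        req = container["requests"].get(res)
        lim = container["limits"].get(res)
        if res in lr.get("min", {}) and (req is None or req < lr["min"][res]):
            errors.append(f"{res} request below min")
        if res in lr.get("max", {}) and (lim is None or lim > lr["max"][res]):
            errors.append(f"{res} limit above max")
        ratio = lr.get("maxLimitRequestRatio", {}).get(res)
        if ratio is not None and req and lim and lim / req > ratio:
            errors.append(f"{res} limit/request ratio {lim / req:g} exceeds {ratio}")
    return errors


# Default requests and bounds from the LimitRange above (CPU in m, memory in MiB).
base = {
    "defaultRequest": {"cpu": 100, "memory": 128},
    "min": {"cpu": 50, "memory": 64},
    "max": {"cpu": 4000, "memory": 4096},
    "maxLimitRequestRatio": {"cpu": 4, "memory": 2},
}
for default_cpu in (500, 400):
    lr = {**base, "default": {"cpu": default_cpu, "memory": 256}}
    container = apply_limit_range({}, lr)  # a container that sets no resources
    print(default_cpu, validate(container, lr))
# 500 ['cpu limit/request ratio 5 exceeds 4']
# 400 []
```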

Interaction with quota: Values injected by a LimitRange count toward the namespace ResourceQuota exactly as if the developer had written them, because LimitRange defaulting runs during admission before the quota check. Conversely, a namespace with a compute quota but no LimitRange rejects every pod that omits the resources the quota tracks. For example, a quota on requests.cpu requires every container to declare a CPU request. Always deploy LimitRange alongside ResourceQuota.

Storage Quotas and StorageClass Limits

Storage quotas prevent teams from provisioning more persistent storage than they have been allocated. You can cap storage across all StorageClasses in the namespace or per StorageClass. Per-class caps are useful when different StorageClasses have different costs (e.g., SSD vs HDD).

apiVersion: v1
kind: ResourceQuota
metadata:
  name: storage-quota
  namespace: team-payments
spec:
  hard:
    # Total storage across all PVCs
    requests.storage: 500Gi
    # Total PVC count
    persistentvolumeclaims: "20"
    # Per-StorageClass limits (<storage-class-name>.storageclass.storage.k8s.io/requests.storage)
    gold.storageclass.storage.k8s.io/requests.storage: 100Gi
    silver.storageclass.storage.k8s.io/requests.storage: 400Gi
    # Limit PVCs on premium storage class
    gold.storageclass.storage.k8s.io/persistentvolumeclaims: "5"
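
Per-StorageClass keys follow the pattern `<storage-class-name>.storageclass.storage.k8s.io/<resource>`. Each PVC counts toward both the namespace-wide keys and the keys for its own class. The sketch below aggregates a hypothetical list of PVCs into those keys and reports which hard values are exceeded. Sizes are in GiB, the class names gold and silver match the quota above, and `storage_usage` and `violations` are illustrative helpers. The sketch assumes every PVC has its storageClassName set.

```python
from collections import defaultdict

SC_SUFFIX = ".storageclass.storage.k8s.io"


def storage_usage(pvcs: list) -> dict:
    """Aggregate (storageClassName, size_in_GiB) pairs into quota keys."""
    usage = defaultdict(int)
    for storage_class, size_gi in pvcs:
        usage["requests.storage"] += size_gi
        usage["persistentvolumeclaims"] += 1
        usage[f"{storage_class}{SC_SUFFIX}/requests.storage"] += size_gi
        usage[f"{storage_class}{SC_SUFFIX}/persistentvolumeclaims"] += 1
    return dict(usage)


def violations(usage: dict, hard: dict) -> list:
    """Return the quota keys whose usage exceeds the hard value, sorted."""
    return sorted(k for k, v in usage.items() if k in hard and v > hard[k])


# Hard values from storage-quota above (storage in GiB).
hard = {
    "requests.storage": 500,
    "persistentvolumeclaims": 20,
    f"gold{SC_SUFFIX}/requests.storage": 100,
    f"silver{SC_SUFFIX}/requests.storage": 400,
    f"gold{SC_SUFFIX}/persistentvolumeclaims": 5,
}
pvcs = [("gold", 50), ("gold", 60), ("silver", 200)]  # hypothetical claims
print(violations(storage_usage(pvcs), hard))
# ['gold.storageclass.storage.k8s.io/requests.storage']
```

Total usage (310 GiB) is well under the 500Gi namespace cap. Even so, the two gold claims add up to 110 GiB, over the 100Gi gold cap. In a real cluster, the second gold claim would be rejected when it is created.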

Object Count Quotas

Some Kubernetes objects consume control-plane resources even when they consume no compute. Secrets deserve particular attention. Each Secret, up to 1 MiB in size, is stored in etcd and cached by the components that watch it. The Kubernetes documentation notes that too many Secrets in a cluster can prevent servers and controllers from starting.

apiVersion: v1
kind: ResourceQuota
metadata:
  name: object-count-quota
  namespace: team-payments
spec:
  hard:
    count/pods: "100"
    count/services: "30"
    count/secrets: "100"
    count/configmaps: "50"
    count/persistentvolumeclaims: "20"
    count/deployments.apps: "30"
    count/statefulsets.apps: "10"
    count/jobs.batch: "20"
    count/cronjobs.batch: "10"
    # CRD object counts
    count/ingressroutes.traefik.io: "20"
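
Object-count keys follow a fixed naming scheme:
  • `count/<resource>.<group>` for resources in a named API group, including CRDs such as Traefik's IngressRoute.
  • `count/<resource>` for the core group.
The small sketch below builds those keys and computes the remaining headroom. `count_key` and `remaining` are hypothetical helpers, and the used counts are made-up example values.

```python
def count_key(resource: str, group: str = "") -> str:
    """Object-count quota key: count/<resource>.<group>, or count/<resource> for core."""
    return f"count/{resource}.{group}" if group else f"count/{resource}"


def remaining(hard: dict, used: dict) -> dict:
    """Headroom left under each hard limit (keys with no usage count as zero)."""
    return {key: limit - used.get(key, 0) for key, limit in hard.items()}


hard = {
    count_key("secrets"): 100,
    count_key("deployments", "apps"): 30,
    count_key("ingressroutes", "traefik.io"): 20,
}
used = {"count/secrets": 97, "count/deployments.apps": 12}  # made-up usage
print(remaining(hard, used))
# {'count/secrets': 3, 'count/deployments.apps': 18, 'count/ingressroutes.traefik.io': 20}
```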

Monitoring Quota Usage

Proactively alert when a namespace approaches its quota ceiling to give teams time to request increases before pods start being rejected.

# Prometheus alerting rules for quota usage
groups:
  - name: kubernetes-quota
    rules:
      - alert: NamespaceCPUQuotaUsageHigh
        expr: >
          (kube_resourcequota{resource="requests.cpu", type="used"}
          / ignoring(type) kube_resourcequota{resource="requests.cpu", type="hard"}) > 0.85
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Namespace {{ $labels.namespace }} is using {{ $value | humanizePercentage }} of CPU quota"

      - alert: NamespaceMemoryQuotaUsageHigh
        expr: >
          (kube_resourcequota{resource="requests.memory", type="used"}
          / ignoring(type) kube_resourcequota{resource="requests.memory", type="hard"}) > 0.85
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Namespace {{ $labels.namespace }} is using {{ $value | humanizePercentage }} of memory quota"
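
These alert expressions divide two series of the same metric. kube-state-metrics exports the used and hard values of kube_resourcequota as separate series that differ only in their `type` label. PromQL's one-to-one matching pairs series only when all labels other than the metric name agree. The division therefore returns no results unless it excludes `type` with `ignoring(type)`. The Python below is a simplified model of that matching rule, not Prometheus code. `divide` is a hypothetical helper, and the sample values are invented.

```python
def divide(lhs: list, rhs: list, ignoring: tuple = ()) -> list:
    """PromQL-style one-to-one division of two instant vectors.

    Each vector is a list of (labels, value). Series are paired when their
    labels match after dropping __name__ and any label listed in `ignoring`.
    """
    def match_key(labels: dict) -> frozenset:
        return frozenset((k, v) for k, v in labels.items()
                         if k != "__name__" and k not in ignoring)

    rhs_by_key = {match_key(labels): value for labels, value in rhs}
    result = []
    for labels, value in lhs:
        key = match_key(labels)
        if key in rhs_by_key:
            result.append((dict(key), value / rhs_by_key[key]))
    return result


common = {"__name__": "kube_resourcequota", "namespace": "team-payments",
          "resourcequota": "production-quota", "resource": "requests.cpu"}
used_series = [({**common, "type": "used"}, 9.0)]   # invented sample values
hard_series = [({**common, "type": "hard"}, 10.0)]

print([v for _, v in divide(used_series, hard_series)])  # []: type never matches
print([v for _, v in divide(used_series, hard_series, ignoring=("type",))])  # [0.9]
```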

Design Patterns and Best Practices

Key recommendations for implementing quotas and LimitRanges in production clusters:

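  • Start permissive, tighten over time: Collect actual resource usage data for 2-4 weeks before setting quotas. For baselines, compare cAdvisor usage metrics such as container_cpu_usage_seconds_total and container_memory_working_set_bytes against the declared requests exported by kube-state-metrics (kube_pod_container_resource_requests).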
  • Always pair LimitRange with ResourceQuota: Quota without LimitRange breaks pods that omit resource specs. LimitRange ensures every pod contributes to quota tracking.
  • Use a quota approval workflow: Treat quota increases like infra capacity requests. Require a Jira ticket or PR approval before bumping namespace quotas, to maintain visibility into cluster capacity trends.
  • Reserve headroom: Set namespace quotas to about 80% of the node-pool allocatable capacity (node capacity minus system and kubelet reservations) that you want to dedicate to that team. Leave the remaining 20% for bursting and node maintenance (pod rescheduling during drains).
  • Separate quotas for different environments: Production namespaces get generous quotas; dev namespaces get tight ones to encourage efficiency and catch resource leaks early.
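
The 80% rule is simple arithmetic over the pool's allocatable capacity. The sketch below assumes a hypothetical pool of five nodes, each with 7800m allocatable CPU and 28 GiB allocatable memory. `team_quota` is an illustrative helper that works in integer units, and it yields the values to put in requests.cpu and requests.memory.

```python
def team_quota(nodes: int, allocatable_per_node: int, headroom_pct: int = 20) -> int:
    """Quota = (100 - headroom_pct)% of the pool's total allocatable, rounded down."""
    total = nodes * allocatable_per_node
    return total * (100 - headroom_pct) // 100


# Hypothetical pool: 5 nodes, 7800m CPU and 28 GiB memory allocatable per node.
cpu_m = team_quota(5, 7800)
mem_gi = team_quota(5, 28)
print(f'requests.cpu: "{cpu_m}m"')    # requests.cpu: "31200m"
print(f"requests.memory: {mem_gi}Gi")  # requests.memory: 112Gi
```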

Vertical Pod Autoscaler (VPA): For teams that struggle to set accurate resource requests, deploy VPA in recommendation-only mode (updateMode: "Off"). In that mode, VPA observes actual usage and publishes recommended requests in the VerticalPodAutoscaler object's status without restarting pods. Review the recommendations weekly and update deployments accordingly.