Kubernetes Resource Quotas and LimitRanges Guide
Without resource controls, a single misbehaving application can consume all available CPU and memory on a cluster, starving every other workload. Kubernetes provides two complementary admission-level controls to prevent this: ResourceQuota sets hard ceilings on the total resources a namespace can consume, while LimitRange enforces per-container and per-pod constraints and injects sensible defaults when developers forget to set them. Together, they are the foundation of fair resource sharing in multi-tenant Kubernetes clusters.
Requests vs Limits: The Foundation
Before working with quotas, you need to understand the difference between resource requests and limits, because the scheduler and the quota system track them separately.
- Requests: The amount of a resource the scheduler guarantees will be available to the container. The scheduler uses requests to decide which node has enough free capacity for the pod. ResourceQuota can cap the total requests (requests.cpu, requests.memory) across all pods in a namespace.
- Limits: The maximum amount of a resource a container is allowed to use. Exceeding the CPU limit causes throttling; exceeding the memory limit causes the container to be OOM-killed. ResourceQuota can also cap the total limits (limits.cpu, limits.memory).
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        cpu: 250m       # 0.25 CPU cores guaranteed
        memory: 256Mi   # 256 MiB RAM guaranteed
      limits:
        cpu: 1000m      # max 1 CPU core (throttled above this)
        memory: 512Mi   # max 512 MiB RAM (OOM-killed if exceeded)
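To make the units concrete: CPU is measured in cores or millicores (1000m = 1 core), and memory suffixes such as Mi are binary (1Mi = 2^20 bytes) while M is decimal. The Python sketch below uses hypothetical helper names (cpu_to_millicores, memory_to_bytes), not a Kubernetes client API, and handles only the common suffixes rather than the full quantity grammar. It computes how far example-pod's limits sit above its requests.

```python
from decimal import Decimal

# Common memory suffixes only; this is NOT the full Kubernetes quantity grammar
# (no exponents such as "1e3", no "m" suffix for memory).
_MEMORY_SUFFIXES = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,  # binary
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,      # decimal
}


def cpu_to_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as "250m", "1" or "0.5" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    # Whole or fractional cores: "1" -> 1000, "0.5" -> 500
    return int(Decimal(quantity) * 1000)


def memory_to_bytes(quantity: str) -> int:
    """Convert a memory quantity such as "256Mi" or "1G" to bytes."""
    # Try two-character suffixes before one-character ones ("Mi" before "M").
    for suffix in sorted(_MEMORY_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            number = Decimal(quantity[: -len(suffix)])
            return int(number * _MEMORY_SUFFIXES[suffix])
    return int(quantity)  # plain bytes


# The example-pod container: burst headroom of limits over requests.
requests = {"cpu": "250m", "memory": "256Mi"}
limits = {"cpu": "1000m", "memory": "512Mi"}

print(cpu_to_millicores(limits["cpu"]) / cpu_to_millicores(requests["cpu"]))   # 4.0
print(memory_to_bytes(limits["memory"]) // memory_to_bytes(requests["memory"]))  # 2
```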
ResourceQuota: Namespace Ceilings
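ResourceQuota is enforced at admission time, when an object such as a pod is created or updated. If admitting the pod would push the namespace's usage above any hard limit, the ResourceQuota admission controller rejects the request with an error naming the exceeded quota. When a quota tracks compute resources such as requests.cpu, every new pod must specify those resources or it is rejected. Quota changes never affect objects that already exist; quota only gates new requests.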
apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: team-payments
spec:
  hard:
    # CPU: sum of all container requests/limits in the namespace
    requests.cpu: "10"            # 10 CPU cores total requests
    limits.cpu: "20"              # 20 CPU cores total limits
    # Memory
    requests.memory: 20Gi
    limits.memory: 40Gi
    # Pods
    pods: "100"
    # Services
    services: "30"
    services.loadbalancers: "3"
    services.nodeports: "0"       # block NodePort services
    # Storage
    requests.storage: 500Gi
    persistentvolumeclaims: "20"
    # Extended resources (e.g., GPUs)
    requests.nvidia.com/gpu: "4"
# Check quota status: used vs hard limits
kubectl describe resourcequota production-quota -n team-payments
# Output (abridged; other tracked resources omitted):
# Name:            production-quota
# Namespace:       team-payments
# Resource         Used   Hard
# --------         ----   ----
# limits.cpu       6500m  20
# limits.memory    12Gi   40Gi
# pods             23     100
# requests.cpu     3200m  10
# requests.memory  6Gi    20Gi
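The admission decision can be modelled as a per-resource check of used + requested against hard. The Python sketch below is a simplified, hypothetical model (admit is not a Kubernetes API) that reuses a few rows of the describe output above, with CPU in millicores and memory in bytes. It also reflects the rule that a pod omitting a tracked compute resource is rejected. The rejection messages are illustrative, not the API server's exact wording.

```python
# Hypothetical model of ResourceQuota admission for a single new pod.
# Quantities are pre-converted integers: CPU in millicores, memory in bytes.
GI = 1024**3
MI = 1024**2

# Mirrors part of the "Used / Hard" table from kubectl describe.
quota_hard = {"requests.cpu": 10_000, "requests.memory": 20 * GI, "pods": 100}
quota_used = {"requests.cpu": 3_200, "requests.memory": 6 * GI, "pods": 23}


def admit(pod_requests: dict) -> tuple[bool, list[str]]:
    """Return (allowed, reasons) for a pod with the given summed requests."""
    reasons = []
    for resource, hard in quota_hard.items():
        if resource == "pods":
            requested = 1  # every new pod counts once against the pod quota
        elif resource in pod_requests:
            requested = pod_requests[resource]
        else:
            # Pods that omit a compute resource the quota tracks are rejected.
            reasons.append(f"must specify {resource}")
            continue
        if quota_used[resource] + requested > hard:
            reasons.append(
                f"exceeded quota: {resource} "
                f"(used {quota_used[resource]} + requested {requested} > hard {hard})"
            )
    return (not reasons, reasons)


print(admit({"requests.cpu": 250, "requests.memory": 256 * MI}))  # (True, [])
print(admit({"requests.cpu": 8_000, "requests.memory": 1 * GI}))  # rejected on requests.cpu
print(admit({"requests.cpu": 100}))  # rejected: must specify requests.memory
```

In this model, the 8-core pod fails only on requests.cpu (3200m + 8000m exceeds 10000m), while memory and pod count still have room.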
Quota Scopes and Priority Classes
Quota scopes let you apply different quota rules to different subsets of pods within the same namespace. This is particularly useful when combined with PriorityClasses to grant production-critical pods more resources than batch jobs.
# Quota for BestEffort pods only (no requests/limits)
apiVersion: v1
kind: ResourceQuota
metadata:
  name: besteffort-quota
  namespace: team-payments
spec:
  hard:
    pods: "10"
  scopeSelector:
    matchExpressions:
    # BestEffort is a scope name in its own right; it uses Exists and takes no values
    - operator: Exists
      scopeName: BestEffort
---
# Separate quota for high-priority production pods
apiVersion: v1
kind: ResourceQuota
metadata:
  name: high-priority-quota
  namespace: team-payments
spec:
  hard:
    pods: "20"
    requests.cpu: "8"
    requests.memory: 16Gi
  scopeSelector:
    matchExpressions:
    - operator: In
      scopeName: PriorityClass
      values: ["high-priority"]
---
# Define the PriorityClass (cluster-scoped, so no namespace)
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
description: "For critical production workloads"
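Scoped quotas only count pods that match their scope, so it helps to see how a pod is classified. The following Python sketch is a simplified, hypothetical classifier: it applies the QoS rules (BestEffort when no container sets CPU or memory requests or limits, Guaranteed when every container's requests equal its limits for both, Burstable otherwise) and then maps the result onto the two example quotas. The function names and the pod dict layout are assumptions for illustration; init containers are ignored and quantities are compared as written strings.

```python
# Hypothetical sketch: classify a pod's QoS class and list which of the scoped
# example quotas it counts against. Pods are plain dicts, not API objects.
def qos_class(pod: dict) -> str:
    containers = pod["containers"]
    # BestEffort: no container sets any request or limit.
    if all(not c.get("requests") and not c.get("limits") for c in containers):
        return "BestEffort"
    # Guaranteed: every container sets CPU and memory limits, and its requests
    # equal its limits (a missing request defaults to the limit).
    for c in containers:
        limits = c.get("limits", {})
        requests = {**limits, **c.get("requests", {})}
        for resource in ("cpu", "memory"):
            if resource not in limits or requests[resource] != limits[resource]:
                return "Burstable"
    return "Guaranteed"


def matching_quotas(pod: dict) -> list[str]:
    """Names of the scoped example quotas this pod counts against."""
    names = []
    if qos_class(pod) == "BestEffort":
        names.append("besteffort-quota")
    if pod.get("priorityClassName") == "high-priority":
        names.append("high-priority-quota")
    return names


best = {"containers": [{"name": "app", "image": "nginx"}]}
burst = {"containers": [{"requests": {"cpu": "250m", "memory": "256Mi"},
                         "limits": {"cpu": "1000m", "memory": "512Mi"}}]}
guar = {"priorityClassName": "high-priority",
        "containers": [{"limits": {"cpu": "1", "memory": "1Gi"}}]}

print(qos_class(best), matching_quotas(best))    # BestEffort ['besteffort-quota']
print(qos_class(burst), matching_quotas(burst))  # Burstable []
print(qos_class(guar), matching_quotas(guar))    # Guaranteed ['high-priority-quota']
```

A pod that matches neither scope is still counted by any unscoped quota in the namespace, such as production-quota.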
LimitRange: Per-Object Constraints
LimitRange operates at the level of individual containers, pods, and PVCs. Its most important function is injecting default resource requests and limits into pods that don't specify them. Without these defaults, a pod that omits resource specs is either rejected outright (when a ResourceQuota in the namespace tracks CPU or memory) or, if no compute quota applies, runs as a BestEffort pod with no ceiling on its usage. LimitRange also rejects objects that fall outside its min/max bounds.
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: team-payments
spec:
  limits:
  # Container-level defaults and bounds
  - type: Container
    default:              # default limits
      cpu: 400m           # 4x defaultRequest, so defaults satisfy maxLimitRequestRatio
      memory: 256Mi
    defaultRequest:       # default requests
      cpu: 100m
      memory: 128Mi
    max:
      cpu: "4"
      memory: 4Gi
    min:
      cpu: 50m
      memory: 64Mi
    # maxLimitRequestRatio limits the burst factor
    # (limit / request <= ratio)
    maxLimitRequestRatio:
      cpu: "4"            # limit can be at most 4x the request
      memory: "2"
  # Pod-level max (sum of all containers)
  - type: Pod
    max:
      cpu: "8"
      memory: 8Gi
  # PVC size bounds
  - type: PersistentVolumeClaim
    max:
      storage: 100Gi
    min:
      storage: 1Gi
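The order of operations matters: LimitRange first fills in missing values, then validates the result. The Python sketch below is a hypothetical, memory-only model of that sequence, using the memory values from the Container rule above expressed in MiB; the real admission plugin checks more combinations and handles CPU the same way. It also shows why the default and defaultRequest pair must itself satisfy maxLimitRequestRatio: otherwise every container that relies on the defaults would be rejected.

```python
# Hypothetical sketch of LimitRange defaulting + validation for one container.
# Memory only, in MiB; values mirror the memory settings of default-limits.
RULES = {
    "default": 256,          # default limit
    "defaultRequest": 128,   # default request
    "min": 64,
    "max": 4096,             # 4Gi
    "maxLimitRequestRatio": 2,
}


def apply_limit_range(request=None, limit=None, rules=RULES):
    """Return (request, limit, errors) after defaulting and validation."""
    # Defaulting: a container that sets only a limit gets request = limit
    # (API server defaulting); one that sets neither gets defaultRequest.
    if request is None:
        request = limit if limit is not None else rules["defaultRequest"]
    if limit is None:
        limit = rules["default"]

    # Validation of the defaulted values.
    errors = []
    if request < rules["min"]:
        errors.append(f"request {request}Mi is below min {rules['min']}Mi")
    if limit > rules["max"]:
        errors.append(f"limit {limit}Mi is above max {rules['max']}Mi")
    if request > limit:
        errors.append(f"request {request}Mi exceeds limit {limit}Mi")
    elif request > 0 and limit / request > rules["maxLimitRequestRatio"]:
        errors.append(
            f"limit/request ratio {limit / request:g} is above "
            f"{rules['maxLimitRequestRatio']}"
        )
    return request, limit, errors


print(apply_limit_range())                        # (128, 256, []): defaults injected
print(apply_limit_range(request=128, limit=512))  # ratio 4 > 2: rejected
print(apply_limit_range(limit=8192))              # 8Gi > 4Gi max: rejected
```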
Storage Quotas and StorageClass Limits
Storage quotas prevent teams from provisioning more persistent storage than allocated. You can set quotas globally or per StorageClass, which is useful when different StorageClasses have different costs (e.g., SSD vs HDD).
apiVersion: v1
kind: ResourceQuota
metadata:
  name: storage-quota
  namespace: team-payments
spec:
  hard:
    # Total storage across all PVCs
    requests.storage: 500Gi
    # Total PVC count
    persistentvolumeclaims: "20"
    # Per-StorageClass limits (<storage-class-name>.storageclass.storage.k8s.io/requests.storage)
    gold.storageclass.storage.k8s.io/requests.storage: 100Gi
    silver.storageclass.storage.k8s.io/requests.storage: 400Gi
    # Limit PVCs on premium storage class
    gold.storageclass.storage.k8s.io/persistentvolumeclaims: "5"
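Per-StorageClass keys are counted in addition to the namespace-wide totals, so a single PVC counts against several entries at once. The Python sketch below is a hypothetical model of that bookkeeping (storage_usage and over_quota are illustrative names) using the storage-quota values in GiB. It checks whether a set of claims fits rather than simulating admission one request at a time, and it does not model PVCs that rely on the default StorageClass.

```python
from collections import defaultdict


# Hypothetical sketch: sum PVC storage requests using the
# <storage-class-name>.storageclass.storage.k8s.io/... quota key pattern.
def storage_usage(pvcs):
    """pvcs: iterable of (storage_class_name, size_gib) tuples."""
    usage = defaultdict(int)
    for storage_class, size_gib in pvcs:
        prefix = f"{storage_class}.storageclass.storage.k8s.io"
        usage["requests.storage"] += size_gib  # namespace-wide total
        usage["persistentvolumeclaims"] += 1
        usage[f"{prefix}/requests.storage"] += size_gib
        usage[f"{prefix}/persistentvolumeclaims"] += 1
    return dict(usage)


def over_quota(usage, hard):
    """Quota keys whose usage exceeds the hard limit."""
    return [key for key, limit in hard.items() if usage.get(key, 0) > limit]


hard = {  # values in GiB / counts, mirroring storage-quota
    "requests.storage": 500,
    "persistentvolumeclaims": 20,
    "gold.storageclass.storage.k8s.io/requests.storage": 100,
    "silver.storageclass.storage.k8s.io/requests.storage": 400,
    "gold.storageclass.storage.k8s.io/persistentvolumeclaims": 5,
}
pvcs = [("gold", 50), ("gold", 60), ("silver", 200)]  # hypothetical claims
print(over_quota(storage_usage(pvcs), hard))
# ['gold.storageclass.storage.k8s.io/requests.storage']
```

Here the namespace total (310 GiB) is well under 500 GiB, but the two gold claims together (110 GiB) exceed the 100 GiB gold allowance.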
Object Count Quotas
Some Kubernetes objects consume control plane resources even when they consume no compute. Secrets are particularly important to limit: each Secret is stored in etcd, and the kubelet must fetch every Secret referenced by the pods on its node. Large numbers of Secrets therefore increase etcd storage and API server load.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: object-count-quota
  namespace: team-payments
spec:
  hard:
    count/pods: "100"
    count/services: "30"
    count/secrets: "100"
    count/configmaps: "50"
    count/persistentvolumeclaims: "20"
    count/deployments.apps: "30"
    count/statefulsets.apps: "10"
    count/jobs.batch: "20"
    count/cronjobs.batch: "10"
    # CRD object counts
    count/ingressroutes.traefik.io: "20"
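Object count keys follow the pattern count/<resource> for core API resources and count/<resource>.<group> for everything else, including CRDs, where resource is the lowercase plural name. The Python sketch below is a hypothetical helper that builds those keys and checks whether one more object would fit; the current counts are made up for illustration.

```python
# Hypothetical helper for object-count quota keys.
def count_quota_key(resource: str, group: str = "") -> str:
    """count/<resource> for the core API group, count/<resource>.<group> otherwise."""
    return f"count/{resource}.{group}" if group else f"count/{resource}"


hard = {
    count_quota_key("secrets"): 100,
    count_quota_key("deployments", "apps"): 30,
    count_quota_key("ingressroutes", "traefik.io"): 20,
}
# Hypothetical current object counts in the namespace.
current = {"count/secrets": 100, "count/deployments.apps": 12}


def can_create(key: str) -> bool:
    """True if creating one more object keeps the count within the hard limit."""
    return current.get(key, 0) + 1 <= hard[key]


print(can_create("count/secrets"))                   # False: already at 100
print(can_create("count/deployments.apps"))          # True
print(can_create("count/ingressroutes.traefik.io"))  # True
```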
Monitoring Quota Usage
Proactively alert when a namespace approaches its quota ceiling to give teams time to request increases before pods start being rejected.
# Prometheus alerting rules for quota usage
groups:
- name: kubernetes-quota
  rules:
  - alert: NamespaceCPUQuotaUsageHigh
    # "used" and "hard" series differ only in the type label,
    # so the division must ignore it for the series to match
    expr: >
      (kube_resourcequota{resource="requests.cpu", type="used"}
      / ignoring(type) kube_resourcequota{resource="requests.cpu", type="hard"}) > 0.85
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "Namespace {{ $labels.namespace }} is using {{ $value | humanizePercentage }} of CPU quota"
  - alert: NamespaceMemoryQuotaUsageHigh
    expr: >
      (kube_resourcequota{resource="requests.memory", type="used"}
      / ignoring(type) kube_resourcequota{resource="requests.memory", type="hard"}) > 0.85
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "Namespace {{ $labels.namespace }} is using {{ $value | humanizePercentage }} of memory quota"
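For a quota-usage division to match anything, the used and hard series must be joined while ignoring the type label, because PromQL only pairs series whose labels are otherwise identical and the two series differ in type. The Python sketch below mimics that join on hypothetical sample data to show what such an expression computes; it is not a Prometheus client, the for: 10m window is not modelled, and the team-search namespace is invented.

```python
# Hypothetical sketch of the alert's arithmetic. Each kube_resourcequota sample
# is modelled as (labels, value). "used" and "hard" samples share every label
# except "type", so they are matched on all other labels, like ignoring(type).
def quota_ratios(samples, resource):
    used, hard = {}, {}
    for labels, value in samples:
        if labels["resource"] != resource:
            continue
        key = tuple(sorted((k, v) for k, v in labels.items() if k != "type"))
        (used if labels["type"] == "used" else hard)[key] = value
    # Skip missing or zero hard values here; PromQL would return +Inf for zero.
    return {key: used[key] / hard[key] for key in used if hard.get(key)}


def sample(namespace, kind, value):
    labels = {"namespace": namespace, "resourcequota": "production-quota",
              "resource": "requests.cpu", "type": kind}
    return labels, value


samples = [
    sample("team-payments", "used", 3.2), sample("team-payments", "hard", 10),
    sample("team-search", "used", 9.0), sample("team-search", "hard", 10),
]
ratios = quota_ratios(samples, "requests.cpu")
firing = [dict(key)["namespace"] for key, ratio in ratios.items() if ratio > 0.85]
print(firing)  # ['team-search']
```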
Design Patterns and Best Practices
Key recommendations for implementing quotas and LimitRanges in production clusters:
- Start permissive, tighten over time: Collect actual resource usage data for 2-4 weeks before setting quotas. Use Prometheus data for accurate baselines: container usage metrics from cAdvisor, aggregated per workload with the namespace_workload_* recording rules derived from kube-state-metrics.
- Always pair LimitRange with ResourceQuota: A compute quota without a LimitRange causes pods that omit resource specs to be rejected. LimitRange fills in defaults so every pod is admitted and counted in quota tracking.
- Use a quota approval workflow: Treat quota increases like infra capacity requests. Require a Jira ticket or PR approval before bumping namespace quotas, to maintain visibility into cluster capacity trends.
- Reserve headroom: Set namespace quotas to 80% of the node pool capacity you want to dedicate to that team. Leave 20% for bursting and node maintenance (pod rescheduling during drains).
- Separate quotas for different environments: Production namespaces get generous quotas; dev namespaces get tight ones to encourage efficiency and catch resource leaks early.
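The 80% headroom guideline is simple arithmetic, sketched below in Python under stated assumptions: quota_from_pool is a hypothetical helper, the inputs are per-node allocatable values (which already exclude resources reserved for the kubelet and system daemons) for a pool dedicated to one team, and the node sizes in the example are made up.

```python
# Hypothetical arithmetic for the 80% headroom guideline.
def quota_from_pool(nodes, allocatable_cpu_m, allocatable_mem_gi, percent=80):
    """Return ResourceQuota 'hard' values sized at `percent` of the pool."""
    cpu_m = nodes * allocatable_cpu_m * percent // 100
    mem_gi = nodes * allocatable_mem_gi * percent // 100
    return {"requests.cpu": f"{cpu_m}m", "requests.memory": f"{mem_gi}Gi"}


# A hypothetical dedicated pool of 10 nodes, each with 8 CPUs and 32Gi allocatable.
print(quota_from_pool(10, 8000, 32))
# {'requests.cpu': '64000m', 'requests.memory': '256Gi'}
```

The remaining 16 CPUs and 64Gi in this example are the 20% left for bursting and for rescheduling pods during node drains.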