Kubernetes Taints and Tolerations: Node Scheduling Guide (2026)
1. Scheduling Concepts Recap: How the Scheduler Selects Nodes
Before diving into taints and tolerations, it helps to understand how the Kubernetes scheduler decides where to place a pod. When you submit a pod spec, the scheduler runs a two-phase pipeline:
- Filtering — eliminates nodes that cannot run the pod (insufficient CPU/memory, node not ready, node selector mismatch, etc.).
- Scoring — ranks remaining nodes by how well they fit the pod (least-loaded, image locality, spread constraints, affinity weights, etc.).
The scheduler then places the pod on the highest-scoring node. Taints and tolerations plug mainly into the filtering phase: a NoSchedule or NoExecute taint on a node causes the scheduler to filter out that node for pods that do not carry a matching toleration, while a PreferNoSchedule taint only lowers the node's score. This gives cluster operators a powerful repulsion mechanism, the opposite of node affinity's attraction model.
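As a rough illustration of the two phases described above, the following Python sketch models filtering and scoring on toy data. It is not the scheduler's actual code: `Node`, `Pod`, `filter_nodes`, `score_node`, and `schedule` are hypothetical names, taints are reduced to a set of keys, and the scoring rule (most CPU left over) is an assumption chosen for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_cpu: float                              # cores still unallocated
    taints: set = field(default_factory=set)     # hard taint keys (simplified)


@dataclass
class Pod:
    name: str
    cpu_request: float
    tolerated: set = field(default_factory=set)  # taint keys the pod tolerates


def filter_nodes(pod, nodes):
    """Filtering phase: drop nodes that cannot run the pod at all."""
    return [
        n for n in nodes
        if n.free_cpu >= pod.cpu_request   # enough resources
        and n.taints <= pod.tolerated      # every taint is tolerated
    ]


def score_node(pod, node):
    """Scoring phase (toy rule): prefer the node with the most CPU left over."""
    return node.free_cpu - pod.cpu_request


def schedule(pod, nodes):
    """Return the chosen node name, or None if the pod would stay Pending."""
    feasible = filter_nodes(pod, nodes)
    if not feasible:
        return None
    return max(feasible, key=lambda n: score_node(pod, n)).name


nodes = [
    Node("general-1", free_cpu=2.0),
    Node("gpu-1", free_cpu=8.0, taints={"dedicated"}),
]
print(schedule(Pod("web", cpu_request=1.0), nodes))
print(schedule(Pod("trainer", cpu_request=4.0, tolerated={"dedicated"}), nodes))
print(schedule(Pod("big", cpu_request=4.0), nodes))
```

In this toy cluster the untolerating `web` pod can only land on `general-1`, the tolerating `trainer` pod lands on `gpu-1`, and `big` has no feasible node, which is the situation that leaves a real pod Pending.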
For a broader overview of workload primitives, see the Kubernetes Complete Guide and the Pods Guide. Resource requests and limits — which feed directly into the scheduling filter — are covered in Kubernetes Resource Management.
2. What Are Taints? The key=value:effect Syntax
A taint is a label-like marker stored in a node's spec (not an annotation in the metadata sense) that signals "pods should not be placed here unless they explicitly opt in." A taint has three parts:
key=value:effect
- key: an arbitrary string, e.g. gpu, team, node.kubernetes.io/not-ready.
- value: optional; can be empty, e.g. gpu=true or just gpu:NoSchedule.
- effect: one of NoSchedule, PreferNoSchedule, or NoExecute.
A matching toleration lives in the pod spec and declares that the pod can tolerate the specified taint. The scheduler only places the pod on a tainted node when every taint on that node has a matching toleration on the pod (with some nuance for PreferNoSchedule).
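To make the key=value:effect syntax concrete, here is a minimal Python sketch that splits a taint string into its three parts. `Taint` and `parse_taint` are hypothetical helpers written for this guide, not part of kubectl or any client library, and the validation is deliberately looser than what the API server enforces.

```python
from typing import NamedTuple

EFFECTS = {"NoSchedule", "PreferNoSchedule", "NoExecute"}


class Taint(NamedTuple):
    key: str
    value: str   # empty string when the taint has no value
    effect: str


def parse_taint(spec: str) -> Taint:
    """Parse 'key=value:effect' or 'key:effect' into a Taint."""
    # The effect is everything after the last colon.
    left, sep, effect = spec.rpartition(":")
    if not sep or effect not in EFFECTS:
        raise ValueError(f"invalid or missing effect in {spec!r}")
    # The value is optional, so the '=' may be absent.
    key, _, value = left.partition("=")
    if not key:
        raise ValueError(f"missing key in {spec!r}")
    return Taint(key, value, effect)


print(parse_taint("dedicated=ml-workloads:NoSchedule"))
print(parse_taint("gpu:NoSchedule"))
```

The second call shows the value-less form: the key is `gpu`, the value is empty, and the effect is still required.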
3. NoSchedule: Hard Block
NoSchedule is a hard scheduling constraint. The scheduler will never place a new pod on a tainted node unless the pod has a matching toleration. Pods already running on the node are not evicted.
Adding a NoSchedule taint via kubectl
# Add a taint
kubectl taint node worker-node-1 dedicated=ml-workloads:NoSchedule
# Verify
kubectl describe node worker-node-1 | grep -A5 Taints
Pod with a matching toleration
apiVersion: v1
kind: Pod
metadata:
  name: ml-training-job
  labels:
    app: ml-training
spec:
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "ml-workloads"
      effect: "NoSchedule"
  containers:
    - name: trainer
      image: tensorflow/tensorflow:2.16.0-gpu
      resources:
        limits:
          nvidia.com/gpu: "1"
Without this toleration the pod would stay in Pending state because the only GPU node is tainted. With it, the scheduler treats the node as available again. Note that you still need adequate CPU and memory — the toleration only lifts the taint barrier.
Taints are cumulative: a node can carry several at once, and a pod must tolerate all of them to be scheduled onto that node.
4. PreferNoSchedule: Soft Preference
PreferNoSchedule tells the scheduler "try not to place pods here, but do so if there is no better option." It is a best-effort hint, not a hard constraint. The scheduler will still place pods without a matching toleration onto the node if all other nodes have been filtered out or have lower scores.
# Add a soft taint
kubectl taint node worker-node-2 environment=staging:PreferNoSchedule
apiVersion: v1
kind: Pod
metadata:
  name: staging-api
spec:
  tolerations:
    - key: "environment"
      operator: "Equal"
      value: "staging"
      effect: "PreferNoSchedule"
  containers:
    - name: api
      image: my-api:latest
Common use cases for PreferNoSchedule:
- Marking a node as degraded but still usable during an incident.
- Discouraging general workloads from landing on nodes reserved for batch jobs without hard-blocking them during peak load.
- Gradual node drains where you want to reduce load before cordoning.
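The soft-preference behavior of PreferNoSchedule can be pictured as a ranking rule rather than a filter. The Python sketch below is a simplified model under stated assumptions: `untolerated_soft_taints` and `pick_node` are hypothetical names, tolerations are matched by key only, and the real scheduler combines this signal with several other scoring plugins.

```python
def untolerated_soft_taints(node_taints, pod_tolerated_keys):
    """Count PreferNoSchedule taints the pod does not tolerate (key-only match)."""
    return sum(
        1 for key, effect in node_taints
        if effect == "PreferNoSchedule" and key not in pod_tolerated_keys
    )


def pick_node(nodes, pod_tolerated_keys=frozenset()):
    """Pick the candidate node with the fewest untolerated soft taints.

    `nodes` maps node name -> list of (key, effect) taints. Hard effects are
    assumed to have been handled by filtering already, so every node here is
    still a candidate; soft taints only change the ranking.
    """
    if not nodes:
        return None

    def penalty(name):
        return untolerated_soft_taints(nodes[name], pod_tolerated_keys)

    # sorted() makes tie-breaking deterministic (alphabetical) in this sketch.
    return min(sorted(nodes), key=penalty)


cluster = {
    "worker-node-1": [],
    "worker-node-2": [("environment", "PreferNoSchedule")],
}
print(pick_node(cluster))                                   # avoids the soft taint
print(pick_node({"worker-node-2": cluster["worker-node-2"]}))  # last resort
```

When an untainted node is available it wins; when the softly tainted node is the only candidate, the pod still lands there instead of staying Pending.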
5. NoExecute: Evicting Running Pods
NoExecute is the most powerful effect. It does everything NoSchedule does, plus it evicts pods already running on the node that do not have a matching toleration. This is the effect Kubernetes itself uses when a node becomes unhealthy.
# Mark a node for maintenance — evict all non-tolerated pods
kubectl taint node worker-node-3 maintenance=true:NoExecute
Graceful eviction with tolerationSeconds
You can give pods a grace window before eviction using tolerationSeconds. The pod is allowed to keep running for the specified number of seconds after the taint is applied, then evicted.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: long-running-job
spec:
  replicas: 3
  selector:
    matchLabels:
      app: long-running-job
  template:
    metadata:
      labels:
        app: long-running-job
    spec:
      tolerations:
        # Allow the pod to keep running for 5 minutes after a NoExecute taint appears
        - key: "maintenance"
          operator: "Equal"
          value: "true"
          effect: "NoExecute"
          tolerationSeconds: 300
      containers:
        - name: worker
          image: my-worker:1.0.0
The node lifecycle controller automatically adds node.kubernetes.io/not-ready:NoExecute and node.kubernetes.io/unreachable:NoExecute taints to unhealthy nodes. By default, pods have an implicit toleration of 300 seconds for these, which is why pods are not immediately evicted when a node briefly loses connectivity.
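The three possible outcomes of a NoExecute taint (immediate eviction, delayed eviction, no eviction) can be summarized in a small Python sketch. `eviction_delay` is a hypothetical helper, matching is simplified to key plus effect, and taking the shortest window when several tolerations match is an assumption of this sketch, not a quote from the controller's source.

```python
import math


def eviction_delay(taint_key, tolerations):
    """Seconds a pod may keep running after a NoExecute taint with `taint_key` appears.

    `tolerations` is a list of dicts shaped like pod spec entries.
    Returns 0 for immediate eviction and math.inf when the pod is never evicted.
    """
    matching = [
        t for t in tolerations
        if t.get("key") == taint_key and t.get("effect") == "NoExecute"
    ]
    if not matching:
        return 0            # no toleration: evicted right away
    bounded = [t["tolerationSeconds"] for t in matching if "tolerationSeconds" in t]
    if not bounded:
        return math.inf     # tolerated with no time limit
    return min(bounded)     # tolerated, but only for a limited window


grace = {"key": "maintenance", "effect": "NoExecute", "tolerationSeconds": 300}
forever = {"key": "maintenance", "effect": "NoExecute"}

print(eviction_delay("maintenance", []))
print(eviction_delay("maintenance", [grace]))
print(eviction_delay("maintenance", [forever]))
```

A toleration without tolerationSeconds keeps the pod on the node indefinitely, so omit the field only for workloads that really should survive the taint.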
6. Adding and Removing Taints
All taint management goes through kubectl taint. The syntax mirrors the label syntax closely.
# Add a taint
kubectl taint node <node-name> <key>=<value>:<effect>
# Remove a taint (note the trailing minus sign)
kubectl taint node <node-name> <key>=<value>:<effect>-
# Remove all taints with a given key regardless of value/effect
kubectl taint node <node-name> <key>-
# Practical examples
kubectl taint node gpu-node-1 nvidia.com/gpu=present:NoSchedule
kubectl taint node gpu-node-1 nvidia.com/gpu=present:NoSchedule- # remove
# Apply same taint to all nodes matching a label selector
kubectl taint nodes -l cloud.google.com/gke-spot=true \
cloud.google.com/gke-spot=true:NoExecute
Viewing taints on nodes
# Describe a single node
kubectl describe node worker-node-1 | grep -A10 Taints
# List all nodes with their taints in JSON
kubectl get nodes -o json \
| jq '.items[] | {name: .metadata.name, taints: .spec.taints}'
# Quick tabular view with custom columns
kubectl get nodes \
-o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
7. Toleration Operators: Equal vs Exists
The operator field in a toleration controls how the key and value are matched:
- Equal (default): both the key and value must match the taint exactly.
- Exists: only the key must match; the value is ignored. Omit the value field when using Exists.
spec:
  tolerations:
    # Equal: match key=dedicated, value=gpu-team exactly
    - key: "dedicated"
      operator: "Equal"
      value: "gpu-team"
      effect: "NoSchedule"
    # Exists: match any taint with key=dedicated, regardless of value
    - key: "dedicated"
      operator: "Exists"
      effect: "NoSchedule"
    # Wildcard: tolerate ALL taints on any node
    # (omit key, operator=Exists, omit effect)
    - operator: "Exists"
The wildcard toleration (operator: Exists with no key or effect) makes a pod schedulable on any node, including control-plane nodes and nodes under maintenance. Only use it for system-critical DaemonSet pods such as log collectors or CNI plugins.
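The matching rules for Equal, Exists, and the wildcard can be written down as a short Python sketch. `tolerates` and `untolerated` are hypothetical helper names; the logic follows the rules described in this section (an omitted effect or key on the toleration matches anything) and is meant as a reading aid, not as the scheduler's implementation.

```python
def tolerates(toleration, taint):
    """Return True if one toleration matches one taint (both are dicts)."""
    # An empty or missing effect on the toleration matches every effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    # An empty or missing key (only meaningful with Exists) matches every key.
    if toleration.get("key") and toleration["key"] != taint["key"]:
        return False
    operator = toleration.get("operator", "Equal")  # Equal is the default
    if operator == "Exists":
        return True                                 # value is ignored
    return toleration.get("value", "") == taint.get("value", "")


def untolerated(taints, tolerations):
    """Taints on a node that no toleration on the pod matches."""
    return [t for t in taints if not any(tolerates(tol, t) for tol in tolerations)]


taint = {"key": "dedicated", "value": "gpu-team", "effect": "NoSchedule"}
equal = {"key": "dedicated", "operator": "Equal", "value": "gpu-team", "effect": "NoSchedule"}
exists = {"key": "dedicated", "operator": "Exists", "effect": "NoSchedule"}
wildcard = {"operator": "Exists"}

for name, tol in [("equal", equal), ("exists", exists), ("wildcard", wildcard)]:
    print(name, tolerates(tol, taint))
```

A pod is only schedulable onto a node when `untolerated` returns an empty list for the hard taints on that node, which is why a single missing toleration is enough to block placement.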
8. Use Case 1: Dedicated GPU Nodes
GPU nodes are expensive. The classic pattern is to taint them so only GPU workloads land there, while also using nodeSelector or node affinity to attract those workloads to GPU nodes specifically.
# 1. Label the GPU node
kubectl label node gpu-node-1 accelerator=nvidia-tesla-v100
# 2. Taint the GPU node
kubectl taint node gpu-node-1 nvidia.com/gpu=present:NoSchedule
# 3. GPU workload pod spec
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-inference
  namespace: ml-platform
spec:
  replicas: 2
  selector:
    matchLabels:
      app: model-inference
  template:
    metadata:
      labels:
        app: model-inference
    spec:
      # Toleration lifts the taint barrier
      tolerations:
        - key: "nvidia.com/gpu"
          operator: "Equal"
          value: "present"
          effect: "NoSchedule"
      # nodeSelector (or affinity) provides positive attraction
      nodeSelector:
        accelerator: nvidia-tesla-v100
      containers:
        - name: inference-server
          image: tritonserver:24.04-py3
          resources:
            limits:
              nvidia.com/gpu: "1"
              memory: "16Gi"
            requests:
              cpu: "4"
              memory: "8Gi"
This two-pronged approach — taint (repulsion) + nodeSelector (attraction) — ensures GPU nodes are used only by GPU workloads and GPU workloads are always placed on GPU nodes. For autoscaling these workloads see Kubernetes HPA Scaling.
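Why both halves are needed is easiest to see by enumerating the combinations. The Python sketch below is a simplified model: `can_schedule` is a hypothetical helper, nodes and pods are plain dicts, tolerations are reduced to a set of tolerated keys under the made-up field `toleratedKeys`, and resources are ignored.

```python
def can_schedule(pod, node):
    """Simplified feasibility check using only nodeSelector and hard taints."""
    selector_ok = all(
        node["labels"].get(k) == v
        for k, v in pod.get("nodeSelector", {}).items()
    )
    taints_ok = all(
        key in pod.get("toleratedKeys", set())
        for key in node.get("taints", [])
    )
    return selector_ok and taints_ok


gpu_node = {"labels": {"accelerator": "nvidia-tesla-v100"}, "taints": ["nvidia.com/gpu"]}
cpu_node = {"labels": {}, "taints": []}

web_pod = {}                                             # no toleration, no selector
tolerating_only = {"toleratedKeys": {"nvidia.com/gpu"}}  # toleration without attraction
gpu_pod = {
    "toleratedKeys": {"nvidia.com/gpu"},
    "nodeSelector": {"accelerator": "nvidia-tesla-v100"},
}

for name, pod in [("web", web_pod), ("tolerating-only", tolerating_only), ("gpu", gpu_pod)]:
    print(name, "gpu-node:", can_schedule(pod, gpu_node), "cpu-node:", can_schedule(pod, cpu_node))
```

The `tolerating-only` pod is feasible on both nodes, so a toleration alone does not guarantee it reaches the GPU node; only the pod that also carries the nodeSelector is restricted to it.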
9. Use Case 2: Spot / Preemptible Instances
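Node pools for spot/preemptible instances on AWS, GCP, and Azure are often tainted, either automatically by the provider or by the operator when the pool is created. The exact taint key varies by provider and provisioning tool, so check your own nodes with kubectl describe node before copying a toleration. The examples in this section assume the following taint: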
node.kubernetes.io/lifecycle=spot:NoSchedule
Workloads that can tolerate spot interruptions (batch jobs, stateless microservices) opt in via a toleration; stateful or latency-sensitive workloads that should never land on spot nodes simply omit the toleration.
apiVersion: batch/v1
kind: Job
metadata:
  name: nightly-etl
spec:
  template:
    spec:
      restartPolicy: OnFailure
      tolerations:
        # Accept spot interruption
        - key: "node.kubernetes.io/lifecycle"
          operator: "Equal"
          value: "spot"
          effect: "NoSchedule"
        # Graceful 2-minute window if the node is reclaimed
        - key: "node.kubernetes.io/lifecycle"
          operator: "Equal"
          value: "spot"
          effect: "NoExecute"
          tolerationSeconds: 120
      # Prefer spot nodes to save cost, fall back to on-demand
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 80
              preference:
                matchExpressions:
                  - key: node.kubernetes.io/lifecycle
                    operator: In
                    values: ["spot"]
      containers:
        - name: etl
          image: my-etl-pipeline:2026.1
Set a NoExecute tolerationSeconds so your pod has time to checkpoint or drain before it is evicted from a node that is being reclaimed. 120 seconds is a reasonable starting point; AWS gives a 2-minute interruption notice.
10. Use Case 3: Dedicated Nodes per Team
Large multi-tenant clusters often need to give each team exclusive access to a node pool for compliance, performance isolation, or cost allocation. Taints are the enforcement mechanism; Kubernetes namespaces provide the organizational boundary.
# Label and taint a pool for the payments team
kubectl label node payments-node-{1..3} team=payments
kubectl taint node payments-node-{1..3} team=payments:NoSchedule
# Every pod in the payments namespace must carry this toleration.
# Enforce it via a MutatingAdmissionWebhook or OPA/Gatekeeper policy,
# or simply add it to every Deployment in the namespace manually.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-processor
  namespace: payments
spec:
  replicas: 4
  selector:
    matchLabels:
      app: payment-processor
  template:
    metadata:
      labels:
        app: payment-processor
    spec:
      tolerations:
        - key: "team"
          operator: "Equal"
          value: "payments"
          effect: "NoSchedule"
      # Required affinity: schedule only onto payments nodes
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: team
                    operator: In
                    values: ["payments"]
      containers:
        - name: processor
          image: payment-processor:3.1.0
For network-level isolation between teams in the same cluster, combine this pattern with Kubernetes Network Policies. For access control, see RBAC Security.
11. Use Case 4: Control-Plane Taint
By default, Kubernetes taints control-plane nodes (formerly called "master" nodes) to prevent user workloads from running there. You will see this taint on any kubeadm-provisioned cluster:
node-role.kubernetes.io/control-plane:NoSchedule
System components that must run on control-plane nodes — such as kube-proxy, coredns, and CNI DaemonSets — carry matching tolerations:
# From the kube-proxy DaemonSet (kubectl -n kube-system get ds kube-proxy -o yaml)
tolerations:
  - key: CriticalAddonsOnly
    operator: Exists
  - operator: Exists   # wildcard: runs everywhere including control-plane
If you want to run user workloads on control-plane nodes in a small lab cluster (not recommended in production), remove the taint:
# Single-node cluster / lab only — do NOT do this in production
kubectl taint node control-plane-node \
node-role.kubernetes.io/control-plane:NoSchedule-
12. Taints vs nodeAffinity vs nodeSelector — Decision Guide
Kubernetes provides three mechanisms for controlling pod placement. Understanding which to use — and when to combine them — is essential for building predictable scheduling behavior.
| Mechanism | Direction | Hard or Soft | Lives On | Best For |
|---|---|---|---|---|
| nodeSelector | Attraction | Hard | Pod spec | Simple key=value node matching |
| nodeAffinity required | Attraction | Hard | Pod spec | Complex expressions (In, NotIn, Gt, Lt) |
| nodeAffinity preferred | Attraction | Soft | Pod spec | Weighted preference (e.g., prefer spot, fall back to on-demand) |
| Taint NoSchedule | Repulsion | Hard | Node | Blocking untolerated pods from a node |
| Taint PreferNoSchedule | Repulsion | Soft | Node | Discouraging workloads without a hard block |
| Taint NoExecute | Repulsion + Eviction | Hard | Node | Node maintenance, health-based eviction |
Decision guide
- I want pods to go to specific nodes → use nodeSelector or nodeAffinity.
- I want to keep untolerated pods off specific nodes → use taints.
- I want both (dedicated nodes with guaranteed placement) → combine taints with nodeAffinity.
- I want to evict pods during maintenance → use NoExecute taints, or kubectl drain, which cordons the node and evicts its pods through the Eviction API instead of adding a NoExecute taint.
13. Built-in Taints: How Kubernetes Uses Them Internally
The node lifecycle controller automatically applies several well-known taints to reflect node health. Understanding these helps you debug unexpected pod evictions.
| Taint Key | Effect | Trigger |
|---|---|---|
| node.kubernetes.io/not-ready | NoExecute | Node's Ready condition is False |
| node.kubernetes.io/unreachable | NoExecute | Node's Ready condition is Unknown (kubelet lost contact) |
| node.kubernetes.io/memory-pressure | NoSchedule | Node reports MemoryPressure=True |
| node.kubernetes.io/disk-pressure | NoSchedule | Node reports DiskPressure=True |
| node.kubernetes.io/pid-pressure | NoSchedule | Node reports PIDPressure=True |
| node.kubernetes.io/network-unavailable | NoSchedule | Node network not configured (CNI not ready) |
| node.kubernetes.io/unschedulable | NoSchedule | Node is cordoned (kubectl cordon) |
| node.cloudprovider.kubernetes.io/uninitialized | NoSchedule | Node not yet initialized by cloud provider controller |
Pods that do not declare their own tolerations for not-ready and unreachable automatically receive them with tolerationSeconds: 300 (added by the DefaultTolerationSeconds admission controller). This gives pods five minutes before eviction when a node temporarily loses connectivity. You can override this per workload:
spec:
  tolerations:
    # Evict much sooner — useful for stateless, fast-restart services
    - key: "node.kubernetes.io/not-ready"
      operator: "Exists"
      effect: "NoExecute"
      tolerationSeconds: 30
    - key: "node.kubernetes.io/unreachable"
      operator: "Exists"
      effect: "NoExecute"
      tolerationSeconds: 30
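How the implicit 300-second tolerations interact with an explicit override can be modeled in a few lines of Python. This is a loose sketch of the defaulting behavior described above, not the admission controller's code: `with_default_tolerations` and `DEFAULT_KEYS` are hypothetical names, and treating an empty key or empty effect as already covering the taint is an assumption of the sketch.

```python
DEFAULT_KEYS = ("node.kubernetes.io/not-ready", "node.kubernetes.io/unreachable")


def with_default_tolerations(tolerations, default_seconds=300):
    """Return a new list with the default NoExecute tolerations added where missing."""
    result = list(tolerations)
    for key in DEFAULT_KEYS:
        already_set = any(
            t.get("key") in (key, None, "")
            and t.get("effect") in ("NoExecute", None, "")
            for t in tolerations
        )
        if not already_set:
            # The pod said nothing about this taint, so the default applies.
            result.append({
                "key": key,
                "operator": "Exists",
                "effect": "NoExecute",
                "tolerationSeconds": default_seconds,
            })
    return result


override = [{
    "key": "node.kubernetes.io/not-ready",
    "operator": "Exists",
    "effect": "NoExecute",
    "tolerationSeconds": 30,
}]
for t in with_default_tolerations(override):
    print(t["key"], t["tolerationSeconds"])
```

The explicit 30-second toleration for not-ready is kept as written, while unreachable still falls back to the 300-second default, so override both keys if you want both to evict quickly.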
For cluster-wide monitoring of node conditions that trigger these taints, see Kubernetes Monitoring with Prometheus.
14. Troubleshooting: Pod Stuck in Pending Due to Taints
The most common taint-related issue is a pod stuck in Pending with no obvious error in the pod's Status. The diagnosis workflow is straightforward.
Step 1: Check pod events
kubectl describe pod <pod-name> -n <namespace>
# Look for lines like:
# Warning FailedScheduling ... 0/5 nodes are available:
# 5 node(s) had untolerated taint {dedicated: ml-workloads}.
# preemption: 0/5 nodes are available: 5 Preemption is not helpful for scheduling.
Step 2: Inspect node taints
# Check all nodes for taints
kubectl get nodes -o json \
| jq '.items[] | select(.spec.taints != null) | {name: .metadata.name, taints: .spec.taints}'
# Or describe each candidate node
kubectl describe node <node-name> | grep -A20 Taints
Step 3: Check the pod's tolerations
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.tolerations}'
Common mismatches to look for
- Wrong operator: using Equal when the taint has no value (use Exists instead).
- Wrong effect: toleration specifies NoSchedule but the taint uses NoExecute.
- Typo in key or value: nvidia.com/gpu vs Nvidia.com/GPU; keys are case-sensitive.
- Missing toleration for a second taint: node has two taints but the pod only tolerates one.
- Namespace quota or resource limit: pod passes taint filtering but fails resource filtering; check kubectl describe pod for resource-related events too.
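The mismatches listed above can be checked mechanically once you have copied a taint and a toleration out of the kubectl output. The Python sketch below is a hypothetical diagnostic helper (`explain_mismatch` is not a kubectl feature); it compares one toleration against one taint, both given as dicts, and reports the first difference it finds.

```python
def explain_mismatch(toleration, taint):
    """Return a short reason why `toleration` does not match `taint`, or None if it matches."""
    tol_key, taint_key = toleration.get("key", ""), taint["key"]
    if tol_key and tol_key != taint_key:
        if tol_key.lower() == taint_key.lower():
            return "key differs only by case (keys are case-sensitive)"
        return "key mismatch"
    tol_effect = toleration.get("effect", "")
    if tol_effect and tol_effect != taint["effect"]:
        return f"effect mismatch: toleration has {tol_effect}, taint has {taint['effect']}"
    if toleration.get("operator", "Equal") == "Equal":
        if toleration.get("value", "") != taint.get("value", ""):
            return "value mismatch (consider operator: Exists to ignore the value)"
    return None


taint = {"key": "nvidia.com/gpu", "value": "present", "effect": "NoSchedule"}
candidates = [
    {"key": "Nvidia.com/GPU", "operator": "Exists"},
    {"key": "nvidia.com/gpu", "operator": "Exists", "effect": "NoExecute"},
    {"key": "nvidia.com/gpu", "operator": "Equal", "value": "true"},
    {"key": "nvidia.com/gpu", "operator": "Equal", "value": "present", "effect": "NoSchedule"},
]
for tol in candidates:
    print(explain_mismatch(tol, taint))
```

The last candidate returns None, meaning it tolerates the taint; if the pod is still Pending at that point, look at the remaining taints on the node and at resource-related events.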
Quick fix: remove a taint for debugging
# Temporarily remove taint to confirm it is the cause
kubectl taint node <node-name> dedicated=ml-workloads:NoSchedule-
# After confirming the pod schedules, re-add the taint and fix the toleration
kubectl taint node <node-name> dedicated=ml-workloads:NoSchedule
Run kubectl get events --sort-by='.lastTimestamp' -n <namespace> to see a timeline of scheduling failures across all pods in the namespace at once, which is much faster than describing each pod individually.
For broader scheduling debugging, the Kubernetes Deployments guide covers rollout troubleshooting, and Security Best Practices explains how RBAC can inadvertently affect scheduling by blocking access to node resources.
Summary
Kubernetes taints and tolerations give cluster operators a clean, declarative mechanism for node-level workload isolation:
- NoSchedule: hard block; new pods without a matching toleration will not be placed on the node.
- PreferNoSchedule: soft preference; the scheduler avoids the node but can use it as a last resort.
- NoExecute: evicts running pods that do not tolerate the taint, with optional tolerationSeconds for graceful shutdown.
- Combine taints with nodeAffinity for dedicated node pools (repulsion + attraction).
- Use operator: Exists to match any value for a key, or the full wildcard to tolerate all taints.
- Kubernetes applies built-in taints automatically for node health conditions; understanding them prevents surprise evictions.
- Debug pending pods with kubectl describe pod, check node taints, and verify toleration key/value/effect match exactly.