Kubernetes Taints and Tolerations: Node Scheduling Guide (2026)

1. Scheduling Concepts Recap: How the Scheduler Selects Nodes

Before diving into taints and tolerations, it helps to understand how the Kubernetes scheduler decides where to place a pod. When you submit a pod spec, the scheduler runs a two-phase pipeline:

  • Filtering — eliminates nodes that cannot run the pod (insufficient CPU/memory, node not ready, node selector mismatch, etc.).
  • Scoring — ranks remaining nodes by how well they fit the pod (least-loaded, image locality, spread constraints, affinity weights, etc.).

The scheduler then places the pod on the highest-scoring node. Taints and tolerations plug into both phases: a NoSchedule or NoExecute taint causes the scheduler to filter out the node for pods that do not carry a matching toleration, while a PreferNoSchedule taint only lowers the node's score. This gives cluster operators a repulsion mechanism, the opposite of node affinity's attraction model.
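To make the two phases concrete, here is a minimal Python sketch of a filter-then-score pipeline. It is a toy model, not the scheduler's real code: the `Node` class, the `schedule` function, and the "most free CPU wins" score are assumptions chosen for illustration, and tolerations are simplified to exact `(key, value, effect)` tuples.

```python
from dataclasses import dataclass, field

HARD_EFFECTS = {"NoSchedule", "NoExecute"}

@dataclass
class Node:
    name: str
    free_cpu: float                            # cores still unallocated
    taints: set = field(default_factory=set)   # {(key, value, effect), ...}

def schedule(nodes, cpu_request, tolerated=frozenset()):
    """Return the name of the chosen node, or None if the pod stays Pending."""
    # Phase 1, filtering: drop nodes that cannot run the pod at all.
    feasible = [
        n for n in nodes
        if n.free_cpu >= cpu_request
        and all(t in tolerated for t in n.taints if t[2] in HARD_EFFECTS)
    ]
    if not feasible:
        return None
    # Phase 2, scoring: toy score that simply prefers the most free CPU.
    return max(feasible, key=lambda n: n.free_cpu).name

nodes = [
    Node("general-1", free_cpu=2.0),
    Node("gpu-1", free_cpu=8.0, taints={("dedicated", "ml", "NoSchedule")}),
]
print(schedule(nodes, cpu_request=1.0))  # general-1
print(schedule(nodes, cpu_request=1.0,
               tolerated={("dedicated", "ml", "NoSchedule")}))  # gpu-1
```

In the first call the tainted node never reaches the scoring phase; in the second, the toleration lets it compete and it wins on free CPU.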

For a broader overview of workload primitives, see the Kubernetes Complete Guide and the Pods Guide. Resource requests and limits — which feed directly into the scheduling filter — are covered in Kubernetes Resource Management.

2. What Are Taints? The key=value:effect Syntax

A taint is a property set on a node (stored in the node's spec.taints field, not as a label or annotation) that signals "pods should not be placed here unless they explicitly opt in." A taint has three parts:

key=value:effect
  • key — an arbitrary string, e.g. gpu, team, node.kubernetes.io/not-ready.
  • value — optional; can be empty, e.g. gpu=true or just gpu:NoSchedule.
  • effect — one of NoSchedule, PreferNoSchedule, or NoExecute.
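The three-part syntax can be illustrated with a small parser. This is a sketch for explanation only; `parse_taint` and the `Taint` tuple are hypothetical names, and kubectl's own validation is stricter than what is shown here.

```python
from typing import NamedTuple

EFFECTS = {"NoSchedule", "PreferNoSchedule", "NoExecute"}

class Taint(NamedTuple):
    key: str
    value: str
    effect: str

def parse_taint(spec: str) -> Taint:
    """Split 'key=value:effect' (or 'key:effect') into its three parts."""
    # The effect is whatever follows the last colon.
    key_value, sep, effect = spec.rpartition(":")
    if not sep or effect not in EFFECTS:
        raise ValueError(f"missing or unknown effect in {spec!r}")
    # The value is optional; without '=' it is the empty string.
    key, _, value = key_value.partition("=")
    if not key:
        raise ValueError(f"empty key in {spec!r}")
    return Taint(key, value, effect)

print(parse_taint("gpu=true:NoSchedule"))
print(parse_taint("gpu:NoSchedule"))
```

Both forms from the list above parse successfully; the second simply yields an empty value.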

A matching toleration lives in the pod spec and declares that the pod can tolerate the specified taint. The scheduler only places the pod on a tainted node when every taint on that node has a matching toleration on the pod (with some nuance for PreferNoSchedule).

Key insight: Taints repel; tolerations permit. A toleration does not attract a pod to a node — it merely removes the barrier. Use node affinity when you need positive attraction.

3. NoSchedule: Hard Block

NoSchedule is the strictest effect. The scheduler will never place a new pod on a tainted node unless the pod has a matching toleration. Pods already running on the node are not evicted.

Adding a NoSchedule taint via kubectl

# Add a taint
kubectl taint node worker-node-1 dedicated=ml-workloads:NoSchedule

# Verify
kubectl describe node worker-node-1 | grep -A5 Taints

Pod with a matching toleration

apiVersion: v1
kind: Pod
metadata:
  name: ml-training-job
  labels:
    app: ml-training
spec:
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "ml-workloads"
      effect: "NoSchedule"
  containers:
    - name: trainer
      image: tensorflow/tensorflow:2.16.0-gpu
      resources:
        limits:
          nvidia.com/gpu: "1"

Without this toleration the pod would stay in Pending state because the only GPU node is tainted. With it, the scheduler treats the node as available again. Note that you still need adequate CPU and memory — the toleration only lifts the taint barrier.

Tip: Taints are additive: a node can carry multiple taints, and a pod must tolerate every NoSchedule and NoExecute taint on the node to be scheduled there. Untolerated PreferNoSchedule taints only lower the node's score.
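The "tolerate all of them" rule can be sketched as a set difference. The function name `blocking_taints` is hypothetical, and the model is deliberately simplified: tolerations are exact `(key, value, effect)` tuples, which corresponds to operator: Equal only.

```python
HARD_EFFECTS = {"NoSchedule", "NoExecute"}

def blocking_taints(node_taints, tolerated):
    """Return the hard taints on a node that the pod does not tolerate.

    Both arguments are sets of (key, value, effect) tuples. A non-empty
    result means the node is filtered out for this pod.
    """
    return {t for t in node_taints
            if t[2] in HARD_EFFECTS and t not in tolerated}

node = {
    ("dedicated", "ml-workloads", "NoSchedule"),
    ("maintenance", "true", "NoExecute"),
    ("environment", "staging", "PreferNoSchedule"),
}
pod = {("dedicated", "ml-workloads", "NoSchedule")}
print(blocking_taints(node, pod))  # {('maintenance', 'true', 'NoExecute')}
```

The pod tolerates one of the two hard taints, so the node is still rejected. The PreferNoSchedule taint never appears in the result because it does not block scheduling.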

4. PreferNoSchedule: Soft Preference

PreferNoSchedule tells the scheduler "try not to place pods here, but do so if there is no better option." It is a best-effort hint, not a hard constraint. The scheduler will still place pods without a matching toleration onto the node if all other nodes have been filtered out or have lower scores.
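One way to picture the soft preference is as a penalty applied at scoring time. The sketch below is a simplified model, not the scheduler's actual scoring code: `soft_taint_penalty` and `rank_nodes` are hypothetical helpers, and tolerations are exact `(key, value, effect)` tuples.

```python
def soft_taint_penalty(node_taints, tolerated):
    """Count PreferNoSchedule taints the pod does not tolerate."""
    return sum(1 for t in node_taints
               if t[2] == "PreferNoSchedule" and t not in tolerated)

def rank_nodes(nodes, tolerated=frozenset()):
    """Order node names from most to least preferred (fewest penalties first).

    `nodes` maps node name -> set of taints. No node is ever removed:
    a soft taint only pushes the node down the ranking.
    """
    return sorted(nodes, key=lambda name: soft_taint_penalty(nodes[name], tolerated))

staging_taint = ("environment", "staging", "PreferNoSchedule")
cluster = {"worker-node-2": {staging_taint}, "worker-node-1": set()}

print(rank_nodes(cluster))                       # ['worker-node-1', 'worker-node-2']
print(rank_nodes({"worker-node-2": {staging_taint}}))  # ['worker-node-2']
```

The second call shows the "last resort" behavior: when the tainted node is the only candidate, it is still selected.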

# Add a soft taint
kubectl taint node worker-node-2 environment=staging:PreferNoSchedule

apiVersion: v1
kind: Pod
metadata:
  name: staging-api
spec:
  tolerations:
    - key: "environment"
      operator: "Equal"
      value: "staging"
      effect: "PreferNoSchedule"
  containers:
    - name: api
      image: my-api:latest

Common use cases for PreferNoSchedule:

  • Marking a node as degraded but still usable during an incident.
  • Discouraging general workloads from landing on nodes reserved for batch jobs without hard-blocking them during peak load.
  • Gradual node drains where you want to reduce load before cordoning.

5. NoExecute: Evicting Running Pods

NoExecute is the most powerful effect. It does everything NoSchedule does, plus it evicts pods already running on the node that do not have a matching toleration. This is the effect Kubernetes itself uses when a node becomes unhealthy.

# Mark a node for maintenance — evict all non-tolerated pods
kubectl taint node worker-node-3 maintenance=true:NoExecute

Graceful eviction with tolerationSeconds

You can give pods a grace window before eviction using tolerationSeconds. The pod keeps running for the specified number of seconds after the taint is applied and is then evicted. A NoExecute toleration without tolerationSeconds lets the pod stay on the node indefinitely.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: long-running-job
spec:
  replicas: 3
  selector:
    matchLabels:
      app: long-running-job
  template:
    metadata:
      labels:
        app: long-running-job
    spec:
      tolerations:
        # Allow the pod to keep running for 5 minutes after a NoExecute taint appears
        - key: "maintenance"
          operator: "Equal"
          value: "true"
          effect: "NoExecute"
          tolerationSeconds: 300
      containers:
        - name: worker
          image: my-worker:1.0.0

Built-in behavior: Kubernetes automatically adds node.kubernetes.io/not-ready:NoExecute and node.kubernetes.io/unreachable:NoExecute taints to unhealthy nodes. By default, pods have an implicit toleration of 300 seconds for these, which is why pods are not immediately evicted when a node briefly loses connectivity.
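The tolerationSeconds timing rule can be sketched as a small function. This is a model under stated assumptions rather than the controller's code: `eviction_delay` and the `UNTOLERATED` sentinel are hypothetical names, and when several NoExecute taints are present it assumes the shortest tolerationSeconds wins.

```python
UNTOLERATED = object()  # sentinel: the pod has no toleration for this taint

def eviction_delay(per_taint):
    """Seconds a pod may stay once NoExecute taints are on its node.

    `per_taint` has one entry per NoExecute taint on the node:
      UNTOLERATED  -> no matching toleration
      None         -> matching toleration without tolerationSeconds
      int          -> matching toleration with that tolerationSeconds
    Returns 0 for immediate eviction, None for "never evicted".
    """
    if any(entry is UNTOLERATED for entry in per_taint):
        return 0
    bounded = [max(0, entry) for entry in per_taint if entry is not None]
    return min(bounded) if bounded else None

print(eviction_delay([UNTOLERATED]))  # 0
print(eviction_delay([300]))          # 300
print(eviction_delay([None]))         # None
```

With the Deployment above, a pod on a node that receives maintenance=true:NoExecute corresponds to `eviction_delay([300])`.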

6. Adding and Removing Taints

The most direct way to manage taints is kubectl taint, whose syntax closely mirrors that of kubectl label.

# Add a taint
kubectl taint node <node-name> <key>=<value>:<effect>

# Remove a taint (note the trailing minus sign)
kubectl taint node <node-name> <key>=<value>:<effect>-

# Remove all taints with a given key regardless of value/effect
kubectl taint node <node-name> <key>-

# Practical examples
kubectl taint node gpu-node-1 nvidia.com/gpu=present:NoSchedule
kubectl taint node gpu-node-1 nvidia.com/gpu=present:NoSchedule-   # remove

# Apply same taint to all nodes matching a label selector
kubectl taint nodes -l cloud.google.com/gke-spot=true \
  cloud.google.com/gke-spot=true:NoExecute

Viewing taints on nodes

# Describe a single node
kubectl describe node worker-node-1 | grep -A10 Taints

# List all nodes with their taints in JSON
kubectl get nodes -o json \
  | jq '.items[] | {name: .metadata.name, taints: .spec.taints}'

# Quick tabular view with custom columns
kubectl get nodes \
  -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
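If jq is not available, the JSON output can be summarized with a few lines of Python. The sketch below assumes only the standard `items[].metadata.name` and `items[].spec.taints` fields of `kubectl get nodes -o json`; the function names and the sample document are made up for illustration.

```python
import json

def format_taint(taint: dict) -> str:
    """Render a taint object in the key=value:effect notation."""
    value = taint.get("value")
    if value:
        return f"{taint['key']}={value}:{taint['effect']}"
    return f"{taint['key']}:{taint['effect']}"

def taints_by_node(nodes_json: str) -> dict:
    """Map each node name to a list of its taints (empty list if untainted)."""
    summary = {}
    for item in json.loads(nodes_json)["items"]:
        taints = item.get("spec", {}).get("taints") or []
        summary[item["metadata"]["name"]] = [format_taint(t) for t in taints]
    return summary

# Hypothetical, heavily trimmed sample of `kubectl get nodes -o json` output.
sample = json.dumps({"items": [
    {"metadata": {"name": "worker-node-1"},
     "spec": {"taints": [{"key": "dedicated", "value": "ml-workloads",
                          "effect": "NoSchedule"}]}},
    {"metadata": {"name": "worker-node-2"}, "spec": {}},
]})
print(taints_by_node(sample))
```

In practice the input would come from the command's standard output instead of the inline sample.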

7. Toleration Operators: Equal vs Exists

The operator field in a toleration controls how the key and value are matched:

  • Equal (default) — both the key and value must match the taint exactly.
  • Exists — only the key must match; the value is ignored. Omit the value field when using Exists.

spec:
  tolerations:
    # Equal: match key=dedicated, value=gpu-team exactly
    - key: "dedicated"
      operator: "Equal"
      value: "gpu-team"
      effect: "NoSchedule"

    # Exists: match any taint with key=dedicated, regardless of value
    - key: "dedicated"
      operator: "Exists"
      effect: "NoSchedule"

    # Wildcard: tolerate ALL taints on any node
    # (omit key, operator=Exists, omit effect)
    - operator: "Exists"

Warning: The wildcard toleration (operator: Exists with no key or effect) makes a pod schedulable on any node, including control-plane nodes and nodes under maintenance. Only use it for system-critical DaemonSet pods such as log collectors or CNI plugins.
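The matching rules for Equal, Exists, and the wildcard can be written down as one short function. This is a sketch of the rules as described in this section, with tolerations and taints represented as plain dicts using the same field names as the YAML; `tolerates` is a hypothetical helper, not a Kubernetes API.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Return True if the toleration matches the taint."""
    # An empty or missing effect on the toleration matches every effect.
    effect = toleration.get("effect", "")
    if effect and effect != taint["effect"]:
        return False
    # An empty or missing key (only meaningful with Exists) matches every key.
    key = toleration.get("key", "")
    if key and key != taint["key"]:
        return False
    operator = toleration.get("operator", "Equal")  # Equal is the default
    if operator == "Exists":
        return True                                  # value is ignored
    if operator == "Equal":
        return toleration.get("value", "") == taint.get("value", "")
    return False                                     # unknown operator

taint = {"key": "dedicated", "value": "gpu-team", "effect": "NoSchedule"}

print(tolerates({"key": "dedicated", "operator": "Equal",
                 "value": "gpu-team", "effect": "NoSchedule"}, taint))  # True
print(tolerates({"key": "dedicated", "operator": "Exists",
                 "effect": "NoSchedule"}, taint))                       # True
print(tolerates({"operator": "Exists"}, taint))                         # True
print(tolerates({"key": "dedicated", "operator": "Equal",
                 "value": "other-team", "effect": "NoSchedule"}, taint))  # False
```

The third call is the wildcard from the YAML above: with no key and no effect, every check is skipped and any taint is tolerated.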

8. Use Case 1: Dedicated GPU Nodes

GPU nodes are expensive. The classic pattern is to taint them so only GPU workloads land there, while also using nodeSelector or node affinity to attract those workloads to GPU nodes specifically.

# 1. Label the GPU node
kubectl label node gpu-node-1 accelerator=nvidia-tesla-v100

# 2. Taint the GPU node
kubectl taint node gpu-node-1 nvidia.com/gpu=present:NoSchedule

# 3. GPU workload pod spec
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-inference
  namespace: ml-platform
spec:
  replicas: 2
  selector:
    matchLabels:
      app: model-inference
  template:
    metadata:
      labels:
        app: model-inference
    spec:
      # Toleration lifts the taint barrier
      tolerations:
        - key: "nvidia.com/gpu"
          operator: "Equal"
          value: "present"
          effect: "NoSchedule"
      # nodeSelector (or affinity) provides positive attraction
      nodeSelector:
        accelerator: nvidia-tesla-v100
      containers:
        - name: inference-server
          image: tritonserver:24.04-py3
          resources:
            limits:
              nvidia.com/gpu: "1"
              memory: "16Gi"
            requests:
              cpu: "4"
              memory: "8Gi"

This two-pronged approach — taint (repulsion) + nodeSelector (attraction) — ensures GPU nodes are used only by GPU workloads and GPU workloads are always placed on GPU nodes. For autoscaling these workloads see Kubernetes HPA Scaling.
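To see why both halves are needed, the sketch below enumerates which nodes are eligible for the four combinations of toleration and nodeSelector. It is a toy model: `general-node-1` is a hypothetical untainted node added for contrast, and `eligible_nodes` is not a real API.

```python
GPU_TAINT = ("nvidia.com/gpu", "present", "NoSchedule")

NODES = {
    "gpu-node-1": {"labels": {"accelerator": "nvidia-tesla-v100"},
                   "taints": {GPU_TAINT}},
    "general-node-1": {"labels": {}, "taints": set()},  # hypothetical node
}

def eligible_nodes(tolerated=frozenset(), node_selector=None):
    """Return the sorted names of nodes that pass taint and label filtering."""
    selector = node_selector or {}
    return sorted(
        name for name, node in NODES.items()
        # Every (hard) taint on the node must be tolerated...
        if node["taints"] <= set(tolerated)
        # ...and every nodeSelector entry must match a node label.
        and all(node["labels"].get(k) == v for k, v in selector.items())
    )

gpu_selector = {"accelerator": "nvidia-tesla-v100"}
print(eligible_nodes())                                   # ['general-node-1']
print(eligible_nodes(tolerated={GPU_TAINT}))              # ['general-node-1', 'gpu-node-1']
print(eligible_nodes(node_selector=gpu_selector))         # []
print(eligible_nodes({GPU_TAINT}, gpu_selector))          # ['gpu-node-1']
```

A toleration alone leaves the general node eligible, and a selector alone leaves the pod Pending; only the combination pins the workload to the GPU node.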

9. Use Case 2: Spot / Preemptible Instances

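Spot/preemptible node pools on AWS, GCP, and Azure are commonly tainted so that only interruption-tolerant workloads land on them. Whether the taint is applied for you depends on the platform: AKS, for example, taints spot node pools with kubernetes.azure.com/scalesetpriority=spot:NoSchedule, while on other platforms you typically add a taint through the node pool or node group configuration. The key and value vary, so check your own nodes; the examples below assume this taint: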

node.kubernetes.io/lifecycle=spot:NoSchedule

Workloads that can tolerate spot interruptions (batch jobs, stateless microservices) opt in via a toleration; stateful or latency-sensitive workloads that should never land on spot nodes simply omit the toleration.

apiVersion: batch/v1
kind: Job
metadata:
  name: nightly-etl
spec:
  template:
    spec:
      restartPolicy: OnFailure
      tolerations:
        # Accept spot interruption
        - key: "node.kubernetes.io/lifecycle"
          operator: "Equal"
          value: "spot"
          effect: "NoSchedule"
        # Stay for up to 2 minutes if a matching NoExecute taint is applied
        - key: "node.kubernetes.io/lifecycle"
          operator: "Equal"
          value: "spot"
          effect: "NoExecute"
          tolerationSeconds: 120
      # Prefer spot nodes to save cost, fall back to on-demand
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 80
              preference:
                matchExpressions:
                  - key: node.kubernetes.io/lifecycle
                    operator: In
                    values: ["spot"]
      containers:
        - name: etl
          image: my-etl-pipeline:2026.1
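
Best practice: The NoExecute toleration only matters if something (for example, a custom controller or termination handler) applies a matching NoExecute taint when the node is about to be reclaimed; tolerationSeconds then bounds how long the pod stays on the node. The time a container gets to checkpoint or drain after receiving SIGTERM is governed by terminationGracePeriodSeconds. 120 seconds is a reasonable starting point for both; AWS gives a 2-minute interruption notice.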

10. Use Case 3: Dedicated Nodes per Team

Large multi-tenant clusters often need to give each team exclusive access to a node pool for compliance, performance isolation, or cost allocation. Taints are the enforcement mechanism; Kubernetes namespaces provide the organizational boundary.

# Label and taint a pool for the payments team
kubectl label node payments-node-{1..3} team=payments
kubectl taint node payments-node-{1..3} team=payments:NoSchedule

# Every pod in the payments namespace must carry this toleration.
# Enforce it via a MutatingAdmissionWebhook or OPA/Gatekeeper policy,
# or simply add it to every Deployment in the namespace manually.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-processor
  namespace: payments
spec:
  replicas: 4
  selector:
    matchLabels:
      app: payment-processor
  template:
    metadata:
      labels:
        app: payment-processor
    spec:
      tolerations:
        - key: "team"
          operator: "Equal"
          value: "payments"
          effect: "NoSchedule"
      # Positive attraction: require payments nodes
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: team
                    operator: In
                    values: ["payments"]
      containers:
        - name: processor
          image: payment-processor:3.1.0

For network-level isolation between teams in the same cluster, combine this pattern with Kubernetes Network Policies. For access control, see RBAC Security.
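The comment in the manifest above mentions enforcing the toleration with a mutating admission webhook. The core of such a webhook is a function that turns a pod object into JSON Patch operations; a minimal sketch follows. It covers only the patch logic (no HTTP server or AdmissionReview handling), `toleration_patch` is a hypothetical name, and it assumes the namespace is populated on the pod's metadata, whereas a real webhook would usually read it from the admission request.

```python
PAYMENTS_TOLERATION = {
    "key": "team",
    "operator": "Equal",
    "value": "payments",
    "effect": "NoSchedule",
}

def toleration_patch(pod: dict, namespace: str = "payments") -> list:
    """Return JSON Patch operations that add the team toleration, if needed."""
    if pod.get("metadata", {}).get("namespace") != namespace:
        return []                      # other namespaces are left untouched
    tolerations = pod.get("spec", {}).get("tolerations")
    if tolerations is None:
        # No tolerations array yet: create it with our single entry.
        return [{"op": "add", "path": "/spec/tolerations",
                 "value": [PAYMENTS_TOLERATION]}]
    if PAYMENTS_TOLERATION in tolerations:
        return []                      # already present, nothing to do
    # "-" appends to the end of an existing array in JSON Patch.
    return [{"op": "add", "path": "/spec/tolerations/-",
             "value": PAYMENTS_TOLERATION}]

pod = {"metadata": {"name": "payment-processor-abc", "namespace": "payments"},
       "spec": {"containers": [{"name": "processor"}]}}
print(toleration_patch(pod))
```

Making the function idempotent (returning no operations when the toleration already exists) keeps repeated admission calls from piling up duplicate entries.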

11. Use Case 4: Control-Plane Taint

By default, kubeadm and most other installers taint control-plane nodes (formerly called "master" nodes) to prevent user workloads from running there. You will see this taint on any kubeadm-provisioned cluster:

node-role.kubernetes.io/control-plane:NoSchedule

System components that need to be able to run on control-plane nodes, such as kube-proxy, coredns, and CNI DaemonSets, carry matching tolerations:

# From the kube-proxy DaemonSet (kubectl -n kube-system get ds kube-proxy -o yaml)
tolerations:
  - key: CriticalAddonsOnly
    operator: Exists
  - operator: Exists          # wildcard: runs everywhere including control-plane

If you want to run user workloads on control-plane nodes in a small lab cluster (not recommended in production), remove the taint:

# Single-node cluster / lab only — do NOT do this in production
kubectl taint node control-plane-node \
  node-role.kubernetes.io/control-plane:NoSchedule-

Security note: Removing the control-plane taint in production allows arbitrary user workloads to run next to etcd and the API server, which is a significant security risk. Review Kubernetes Security Best Practices for hardening guidance.

12. Taints vs nodeAffinity vs nodeSelector — Decision Guide

Kubernetes provides three mechanisms for controlling pod placement. Understanding which to use — and when to combine them — is essential for building predictable scheduling behavior.

| Mechanism | Direction | Hard or Soft | Lives On | Best For |
|---|---|---|---|---|
| nodeSelector | Attraction | Hard | Pod spec | Simple key=value node matching |
| nodeAffinity required | Attraction | Hard | Pod spec | Complex expressions (In, NotIn, Gt, Lt) |
| nodeAffinity preferred | Attraction | Soft | Pod spec | Weighted preference (e.g., prefer spot, fall back to on-demand) |
| Taint NoSchedule | Repulsion | Hard | Node | Blocking untolerated pods from a node |
| Taint PreferNoSchedule | Repulsion | Soft | Node | Discouraging workloads without a hard block |
| Taint NoExecute | Repulsion + Eviction | Hard | Node | Node maintenance, health-based eviction |

Decision guide

  • I want pods to go to specific nodes → use nodeSelector or nodeAffinity.
  • I want to keep untolerated pods off specific nodes → use taints.
  • I want both (dedicated nodes with guaranteed placement) → combine taints with nodeAffinity.
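  • I want to evict pods during maintenance → use NoExecute taints, or kubectl drain, which cordons the node and evicts its pods through the Eviction API (honoring PodDisruptionBudgets) rather than by adding a NoExecute taint.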

13. Built-in Taints: How Kubernetes Uses Them Internally

Kubernetes components (mainly the node lifecycle controller, together with the kubelet and the cloud controller manager) automatically apply several well-known taints to reflect node state. Understanding these helps you debug unexpected pod evictions.

| Taint Key | Effect | Trigger |
|---|---|---|
| node.kubernetes.io/not-ready | NoExecute | Node's Ready condition is False |
| node.kubernetes.io/unreachable | NoExecute | Node's Ready condition is Unknown (the control plane lost contact with the kubelet) |
| node.kubernetes.io/memory-pressure | NoSchedule | Node reports MemoryPressure=True |
| node.kubernetes.io/disk-pressure | NoSchedule | Node reports DiskPressure=True |
| node.kubernetes.io/pid-pressure | NoSchedule | Node reports PIDPressure=True |
| node.kubernetes.io/network-unavailable | NoSchedule | Node network not configured (CNI not ready) |
| node.kubernetes.io/unschedulable | NoSchedule | Node is cordoned (kubectl cordon) |
| node.cloudprovider.kubernetes.io/uninitialized | NoSchedule | Node not yet initialized by cloud provider controller |

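Pods that do not declare their own tolerations for not-ready and unreachable automatically receive them with tolerationSeconds: 300 (added by the DefaultTolerationSeconds admission controller). This gives pods five minutes before eviction when a node temporarily loses connectivity. DaemonSet pods are the exception: they tolerate both taints without tolerationSeconds, so they are never evicted for these conditions. You can override the default per workload: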

spec:
  tolerations:
    # Evict much sooner — useful for stateless, fast-restart services
    - key: "node.kubernetes.io/not-ready"
      operator: "Exists"
      effect: "NoExecute"
      tolerationSeconds: 30
    - key: "node.kubernetes.io/unreachable"
      operator: "Exists"
      effect: "NoExecute"
      tolerationSeconds: 30
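The defaulting behavior described above can be approximated as follows. This is a sketch of the idea, not the admission controller's source: `with_default_tolerations` is a hypothetical function, and the rule it uses for "the pod already covers this taint" (same key or no key, and effect NoExecute or no effect) is an assumption about how the check works.

```python
DEFAULT_KEYS = (
    "node.kubernetes.io/not-ready",
    "node.kubernetes.io/unreachable",
)

def with_default_tolerations(tolerations, seconds=300):
    """Return the pod's tolerations plus any missing default ones."""
    result = list(tolerations)
    for key in DEFAULT_KEYS:
        already_covered = any(
            t.get("key", "") in ("", key)
            and t.get("effect", "") in ("", "NoExecute")
            for t in tolerations
        )
        if not already_covered:
            result.append({"key": key, "operator": "Exists",
                           "effect": "NoExecute",
                           "tolerationSeconds": seconds})
    return result

# A pod that overrides only not-ready keeps its 30s and gains the
# 300s default for unreachable.
override = [{"key": "node.kubernetes.io/not-ready", "operator": "Exists",
             "effect": "NoExecute", "tolerationSeconds": 30}]
for toleration in with_default_tolerations(override):
    print(toleration["key"], toleration["tolerationSeconds"])
```

This is why the YAML above sets both keys explicitly: overriding only one of them leaves the other at the five-minute default.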

For cluster-wide monitoring of node conditions that trigger these taints, see Kubernetes Monitoring with Prometheus.

14. Troubleshooting: Pod Stuck in Pending Due to Taints

The most common taint-related issue is a pod stuck in Pending with no obvious error in the pod's Status. The diagnosis workflow is straightforward.

Step 1: Check pod events

kubectl describe pod <pod-name> -n <namespace>

# Look for lines like:
# Warning  FailedScheduling  ... 0/5 nodes are available:
#   5 node(s) had untolerated taint {dedicated: ml-workloads}.
#   preemption: 0/5 nodes are available: 5 Preemption is not helpful for scheduling.

Step 2: Inspect node taints

# Check all nodes for taints
kubectl get nodes -o json \
  | jq '.items[] | select(.spec.taints != null) | {name: .metadata.name, taints: .spec.taints}'

# Or describe each candidate node
kubectl describe node <node-name> | grep -A20 Taints

Step 3: Check the pod's tolerations

kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.tolerations}'

Common mismatches to look for

  • Wrong operator: using Equal with a value when the taint has none (use Exists, or leave the value empty).
  • Wrong effect: toleration specifies NoSchedule but the taint uses NoExecute.
  • Typo in key or value: nvidia.com/gpu vs Nvidia.com/GPU — keys are case-sensitive.
  • Missing toleration for a second taint: node has two taints but the pod only tolerates one.
  • Not a taint problem at all: the pod passes taint filtering but fails resource filtering (insufficient CPU or memory on the remaining nodes). Check kubectl describe pod for resource-related events too.
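The first three mismatches in the list above can be checked mechanically. The helper below is a hypothetical diagnostic sketch, not part of kubectl: it compares one toleration with one taint (both as dicts with the YAML field names) and reports every reason they fail to match.

```python
def explain_mismatch(toleration: dict, taint: dict) -> list:
    """List the reasons a toleration fails to match a taint (empty = match)."""
    reasons = []
    t_key = toleration.get("key", "")
    t_effect = toleration.get("effect", "")
    operator = toleration.get("operator", "Equal")

    if t_key and t_key != taint["key"]:
        # Keys are case-sensitive, so flag near-misses explicitly.
        hint = " (differs only by case)" if t_key.lower() == taint["key"].lower() else ""
        reasons.append(f"key {t_key!r} != {taint['key']!r}{hint}")
    if t_effect and t_effect != taint["effect"]:
        reasons.append(f"effect {t_effect!r} != {taint['effect']!r}")
    if operator == "Equal":
        t_value, taint_value = toleration.get("value", ""), taint.get("value", "")
        if t_value != taint_value:
            reasons.append(f"value {t_value!r} != {taint_value!r}; "
                           "use operator Exists to ignore values")
    return reasons

taint = {"key": "nvidia.com/gpu", "value": "present", "effect": "NoSchedule"}
bad = {"key": "Nvidia.com/GPU", "operator": "Equal",
       "value": "present", "effect": "NoExecute"}
for reason in explain_mismatch(bad, taint):
    print(reason)
```

Running it over every taint reported in Step 2 and every toleration from Step 3 narrows a Pending pod down to the specific field that is wrong.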

Quick fix: remove a taint for debugging

# Temporarily remove taint to confirm it is the cause
kubectl taint node <node-name> dedicated=ml-workloads:NoSchedule-

# After confirming the pod schedules, re-add the taint and fix the toleration
kubectl taint node <node-name> dedicated=ml-workloads:NoSchedule

Pro tip: Use kubectl get events --sort-by='.lastTimestamp' -n <namespace> to see a timeline of scheduling failures across all pods in the namespace at once, which is much faster than describing each pod individually.

For broader scheduling debugging, the Kubernetes Deployments guide covers rollout troubleshooting, and Security Best Practices explains how RBAC can inadvertently affect scheduling by blocking access to node resources.

Summary

Kubernetes taints and tolerations give cluster operators a clean, declarative mechanism for node-level workload isolation:

  • NoSchedule — hard block; new pods without a matching toleration will not be placed on the node.
  • PreferNoSchedule — soft preference; the scheduler avoids the node but can use it as a last resort.
  • NoExecute — evicts running pods that do not tolerate the taint, with optional tolerationSeconds for graceful shutdown.
  • Combine taints with nodeAffinity for dedicated node pools (repulsion + attraction).
  • Use operator: Exists to match any value for a key, or the full wildcard to tolerate all taints.
  • Kubernetes applies built-in taints automatically for node health conditions — understanding them prevents surprise evictions.
  • Debug pending pods with kubectl describe pod, check node taints, and verify toleration key/value/effect match exactly.