Kubernetes DaemonSets: Node-Level Workloads and Log Agents (2026)

What is a DaemonSet?

A DaemonSet is a Kubernetes workload controller that ensures a copy of a specific Pod runs on every node in a cluster (or a targeted subset of nodes). Unlike a Deployment, which spreads replicas for availability and scale, a DaemonSet is fundamentally about per-node presence. When a new node joins the cluster, Kubernetes automatically schedules the DaemonSet Pod onto it. When a node is removed, the Pod is garbage collected — no manual intervention required.
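
The per-node guarantee described above boils down to comparing two sets: the nodes that should run the agent and the nodes that already do. The following is a minimal Python sketch of that comparison, not the actual controller code; the function name and the sample node and Pod names are made up for illustration.

```python
def reconcile(eligible_nodes, pods_by_node):
    """Compare eligible nodes with existing Pods, DaemonSet-style.

    eligible_nodes: set of node names the DaemonSet should cover.
    pods_by_node:   dict mapping node name -> name of the agent Pod on it.
    Returns (nodes that need a Pod, Pods to remove), both sorted.
    """
    covered = set(pods_by_node)
    to_create = sorted(eligible_nodes - covered)
    to_delete = sorted(pods_by_node[n] for n in covered - eligible_nodes)
    return to_create, to_delete


# node-c has just joined; node-d has been removed from the cluster.
nodes = {"node-a", "node-b", "node-c"}
pods = {"node-a": "agent-1", "node-b": "agent-2", "node-d": "agent-3"}
print(reconcile(nodes, pods))  # (['node-c'], ['agent-3'])
```

The real controller runs this kind of comparison continuously and also accounts for node selectors, taints, and Pod health.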

This makes DaemonSets the natural choice for infrastructure-level agents that need to interact with every machine: log shippers, metrics collectors, security sensors, network plugins, and storage daemons. In 2026, with multi-zone clusters spanning hundreds of nodes, DaemonSets remain the cornerstone of observability and networking on Kubernetes.

DaemonSet vs Deployment vs StatefulSet

Understanding when to reach for each controller is critical. The following table clarifies the key distinctions:

Attribute | DaemonSet | Deployment | StatefulSet
Scheduling unit | One Pod per node | N replicas across nodes | N ordered replicas
Primary use | Node-level agents | Stateless services | Stateful databases/queues
Pod identity | One per eligible node (generated name suffix) | Interchangeable | Stable (pod-0, pod-1...)
Scaling | Scales with nodes | Manual/HPA | Manual/HPA
Persistent volume | HostPath or none | PVC (shared OK) | Per-Pod PVC
Rolling update | RollingUpdate or OnDelete | RollingUpdate or Recreate | RollingUpdate (ordered) or OnDelete
Examples | Fluentd, Node Exporter, Falco, CNI | Nginx, API servers | MySQL, Kafka, ZooKeeper

See also: Kubernetes Deployments and Kubernetes StatefulSets for a deeper dive into those controllers.

Basic DaemonSet Spec

A DaemonSet manifest looks similar to a Deployment, but it declares kind: DaemonSet and has no replicas field. Below is a complete template with resource limits and a node selector; adjust the labels, image, and limits to your environment before using it in production:

# daemonset-basic.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-agent
  namespace: monitoring
  labels:
    app: node-agent
    tier: infrastructure
spec:
  selector:
    matchLabels:
      app: node-agent
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1         # update one node at a time
  template:
    metadata:
      labels:
        app: node-agent
    spec:
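      # Run on worker nodes only (exclude control-plane).
      # Not every distribution sets the worker role label; apply it to
      # your nodes yourself or drop that line.
      nodeSelector:
        kubernetes.io/os: linux
        node-role.kubernetes.io/worker: "true"
      # The DaemonSet controller adds these two tolerations automatically;
      # they are listed here only to make the behaviour explicit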
      tolerations:
        - key: node.kubernetes.io/not-ready
          operator: Exists
          effect: NoExecute
        - key: node.kubernetes.io/unreachable
          operator: Exists
          effect: NoExecute
      # Prioritise so the agent survives resource pressure
      priorityClassName: system-node-critical
      containers:
        - name: node-agent
          image: my-registry/node-agent:2.4.1
          imagePullPolicy: IfNotPresent
          resources:
            requests:
              cpu: 50m
              memory: 64Mi
            limits:
              cpu: 200m
              memory: 256Mi
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          volumeMounts:
            - name: varlog
              mountPath: /var/log
              readOnly: true
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
            type: Directory
      serviceAccountName: node-agent-sa
      terminationGracePeriodSeconds: 30

Key points in this spec:

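  • No replicas field: Kubernetes manages exactly one Pod per matching node.
  • NODE_NAME env var: injected via the Downward API, essential for agents that tag metrics and logs by node name.
  • priorityClassName (system-node-critical): gives the Pod the highest scheduling priority, so the scheduler can preempt lower-priority Pods to place it and the kubelet treats it as critical when choosing eviction victims under node pressure. It does not stop the kernel from OOM-killing a container that exceeds its own memory limit.
  • hostPath volume: mounts the node's /var/log read-only into the container for log collection.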
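
To illustrate the NODE_NAME point, here is a small Python sketch of how an agent might attach the node name to each record it emits. The function name, the record shape, and the "unknown" fallback are assumptions for the example, not part of any particular agent.

```python
import json
import os


def tag_record(record, environ=os.environ):
    """Return a copy of record with the node name attached.

    NODE_NAME is expected to be injected through the Downward API
    (fieldRef: spec.nodeName), as in the manifest above.
    """
    node = environ.get("NODE_NAME", "unknown")
    return {**record, "node": node}


if __name__ == "__main__":
    print(json.dumps(tag_record({"msg": "disk check ok"})))
```

Passing the environment in as a parameter keeps the function easy to test outside a cluster.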

For more on resource management, see Kubernetes Resource Management.

Use Case 1: Fluentd Log Collection DaemonSet

Fluentd (and its lighter-weight sibling Fluent Bit) is one of the most common DaemonSet workloads. On every node, container logs are written under /var/log/pods/ and exposed through symlinks in /var/log/containers/; a Fluentd agent on each node tails those files and ships them to Elasticsearch, Loki, or a cloud logging service.

The configuration is typically stored in a ConfigMap and mounted into the container:

# fluentd-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-config
  namespace: logging
data:
  fluent.conf: |
    <source>
      @type tail
      path /var/log/containers/*.log
      pos_file /var/log/fluentd-containers.log.pos
      tag kubernetes.*
      read_from_head true
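      <parse>
        # json matches Docker's json-file log driver. On containerd or CRI-O
        # nodes the files use the CRI text format and need a CRI-aware parser.
        @type json
        time_format %Y-%m-%dT%H:%M:%S.%NZ
      </parse>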
    </source>

    <filter kubernetes.**>
      @type kubernetes_metadata
    </filter>

    <match kubernetes.**>
      @type elasticsearch
      host elasticsearch.logging.svc.cluster.local
      port 9200
      logstash_format true
      logstash_prefix k8s-logs
      <buffer>
        @type file
        path /var/log/fluentd-buffers/kubernetes.system.buffer
        flush_mode interval
        retry_type exponential_backoff
        flush_interval 5s
        retry_max_interval 30s
        chunk_limit_size 2M
        total_limit_size 500M
        overflow_action block
      </buffer>
    </match>
---
# fluentd-daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: logging
  labels:
    app: fluentd
    component: logging
spec:
  selector:
    matchLabels:
      app: fluentd
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      serviceAccountName: fluentd
      tolerations:
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
        - key: node-role.kubernetes.io/master
          operator: Exists
          effect: NoSchedule
      priorityClassName: system-node-critical
      containers:
        - name: fluentd
          image: fluent/fluentd-kubernetes-daemonset:v1.16-debian-elasticsearch8-1
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: FLUENT_ELASTICSEARCH_HOST
              value: "elasticsearch.logging.svc.cluster.local"
            - name: FLUENT_ELASTICSEARCH_PORT
              value: "9200"
          resources:
            requests:
              cpu: 100m
              memory: 200Mi
            limits:
              cpu: 500m
              memory: 500Mi
          volumeMounts:
            - name: varlog
              mountPath: /var/log
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
            - name: fluentd-config
              mountPath: /fluentd/etc
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
        - name: fluentd-config
          configMap:
            name: fluentd-config

Tip: For high-throughput clusters, consider Fluent Bit instead of Fluentd. Its memory footprint is much smaller (figures of roughly 450 KB versus about 40 MB for Fluentd are commonly quoted, though real usage depends on plugins and buffering), which matters for a DaemonSet because every node carries the overhead.
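
The tail source in the ConfigMap above parses each line as JSON, which is what Docker's json-file driver produces. Nodes running containerd or CRI-O write lines in the CRI format instead: a timestamp, the stream name, a P (partial) or F (full) tag, then the message. The Python sketch below shows how such a line splits apart; the type and function names are illustrative, and a real agent would use its own CRI parser.

```python
from typing import NamedTuple, Optional


class CriLine(NamedTuple):
    timestamp: str
    stream: str      # "stdout" or "stderr"
    partial: bool    # True when the runtime split a long line ("P" tag)
    message: str


def parse_cri_line(line: str) -> Optional[CriLine]:
    """Parse one CRI-format log line, or return None if it does not match."""
    parts = line.rstrip("\n").split(" ", 3)
    if len(parts) < 3:
        return None
    if parts[1] not in ("stdout", "stderr") or parts[2] not in ("P", "F"):
        return None
    message = parts[3] if len(parts) == 4 else ""
    return CriLine(parts[0], parts[1], parts[2] == "P", message)


sample = "2026-01-05T10:15:30.123456789Z stdout F hello world"
print(parse_cri_line(sample))
```

A JSON line such as {"log":"..."} returns None here, which is one way to tell which format a node is writing.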

Use Case 2: Prometheus Node Exporter DaemonSet

The Prometheus Node Exporter exposes hardware and OS-level metrics (CPU, memory, disk, network) for each node. It must run on every node, making it a perfect DaemonSet candidate. Here is the manual YAML (the Helm chart generates something nearly identical):

# node-exporter-daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: prometheus-node-exporter
  namespace: monitoring
  labels:
    app: node-exporter
    release: prometheus
spec:
  selector:
    matchLabels:
      app: node-exporter
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  template:
    metadata:
      labels:
        app: node-exporter
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9100"
    spec:
      hostPID: true
      hostIPC: true                  # likely unnecessary for node_exporter; consider removing
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      tolerations:
        - operator: Exists             # run on ALL nodes including taints
      priorityClassName: system-node-critical
      containers:
        - name: node-exporter
          image: prom/node-exporter:v1.8.0
          args:
            - --path.rootfs=/host/root
            - --path.procfs=/host/proc
            - --path.sysfs=/host/sys
            - --collector.filesystem.mount-points-exclude=^/(dev|proc|sys|run|var/lib/docker/.+|var/lib/kubelet/pods/.+)($|/)
          ports:
            - containerPort: 9100
              protocol: TCP
              name: metrics
          resources:
            requests:
              cpu: 50m
              memory: 30Mi
            limits:
              cpu: 250m
              memory: 180Mi
          securityContext:
            readOnlyRootFilesystem: true
            runAsNonRoot: true
            runAsUser: 65534
          volumeMounts:
            - name: root
              mountPath: /host/root
              readOnly: true
              mountPropagation: HostToContainer
            - name: proc
              mountPath: /host/proc
              readOnly: true
            - name: sys
              mountPath: /host/sys
              readOnly: true
      volumes:
        - name: root
          hostPath:
            path: /
        - name: proc
          hostPath:
            path: /proc
        - name: sys
          hostPath:
            path: /sys

When using Helm, the equivalent values override looks like:

# node-exporter-helm-values.yaml
# helm install node-exporter prometheus-community/prometheus-node-exporter \
#   -n monitoring --create-namespace -f node-exporter-helm-values.yaml

tolerations:
  - operator: Exists

priorityClassName: system-node-critical

resources:
  requests:
    cpu: 50m
    memory: 30Mi
  limits:
    cpu: 250m
    memory: 180Mi

hostRootFsMount:
  enabled: true
  mountPropagation: HostToContainer

prometheus:
  monitor:
    enabled: true
    interval: 30s
    scrapeTimeout: 10s

See the full monitoring stack setup at Kubernetes Monitoring with Prometheus.

Use Case 3: Falco Security Agent DaemonSet

Falco is a CNCF runtime security tool that detects anomalous behavior by hooking into the Linux kernel syscall stream. Because it needs access to every node's kernel, it must run as a DaemonSet with elevated privileges.

# falco-daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: falco
  namespace: falco
  labels:
    app: falco
spec:
  selector:
    matchLabels:
      app: falco
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  template:
    metadata:
      labels:
        app: falco
    spec:
      serviceAccountName: falco
      tolerations:
        - operator: Exists           # run on all nodes
      priorityClassName: system-node-critical
      hostPID: true                  # inspect host process tree
      hostNetwork: true              # see host network traffic
      containers:
        - name: falco
          image: falcosecurity/falco-no-driver:0.38.0
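          # Verify these flags against your Falco version: recent releases
          # changed how Kubernetes metadata enrichment is configured.
          args: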
            - /usr/bin/falco
            - --cri=/run/containerd/containerd.sock
            - -K=/var/run/secrets/kubernetes.io/serviceaccount/token
            - -k=https://$(KUBERNETES_SERVICE_HOST)
            - --k8s-api-cert=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
            - -pk
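          # KUBERNETES_SERVICE_HOST is injected automatically and points at the
          # API server Service. Do not override it with status.hostIP, which is
          # the node's own address and only serves the API on control-plane nodes.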
          securityContext:
            privileged: true         # required for eBPF/kernel module
          resources:
            requests:
              cpu: 100m
              memory: 512Mi
            limits:
              cpu: 1000m
              memory: 1024Mi
          volumeMounts:
            - name: dev
              mountPath: /host/dev
            - name: proc
              mountPath: /host/proc
              readOnly: true
            - name: boot
              mountPath: /host/boot
              readOnly: true
            - name: lib-modules
              mountPath: /host/lib/modules
              readOnly: true
            - name: usr
              mountPath: /host/usr
              readOnly: true
            - name: etc
              mountPath: /host/etc
              readOnly: true
            - name: containerd-sock
              mountPath: /run/containerd/containerd.sock
      volumes:
        - name: dev
          hostPath:
            path: /dev
        - name: proc
          hostPath:
            path: /proc
        - name: boot
          hostPath:
            path: /boot
        - name: lib-modules
          hostPath:
            path: /lib/modules
        - name: usr
          hostPath:
            path: /usr
        - name: etc
          hostPath:
            path: /etc
        - name: containerd-sock
          hostPath:
            path: /run/containerd/containerd.sock
            type: Socket

Security Warning: Setting privileged: true grants the container almost all capabilities of the host. Only do this for trusted infrastructure agents like Falco or CNI plugins. Never run application containers as privileged. Review your security posture at Kubernetes Security Best Practices.

Use Case 4: CNI Plugin DaemonSet (Calico / Cilium)

Container Network Interface (CNI) plugins such as Calico, Cilium, Flannel, and Weave typically ship their node agents as DaemonSets. Full node coverage is not optional here: every node must run the network agent to program iptables or eBPF rules, manage IP allocation, and enforce Network Policies.

Cilium's agent spec (simplified) illustrates why this workload demands special privileges:

# cilium-agent-daemonset.yaml (excerpt)
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: cilium
  namespace: kube-system
  labels:
    k8s-app: cilium
spec:
  selector:
    matchLabels:
      k8s-app: cilium
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 2
  template:
    metadata:
      labels:
        k8s-app: cilium
    spec:
      priorityClassName: system-node-critical
      serviceAccountName: cilium
      hostNetwork: true            # programs host network stack
      tolerations:
        - operator: Exists         # MUST run everywhere, including control-plane
      initContainers:
        # Copies the CNI binary onto the host before the agent container starts
        - name: install-cni-binaries
          image: quay.io/cilium/cilium:v1.15.5
          command: ["/install-plugin.sh"]
          securityContext:
            privileged: true
          volumeMounts:
            - name: cni-path
              mountPath: /host/opt/cni/bin
      containers:
        - name: cilium-agent
          image: quay.io/cilium/cilium:v1.15.5
          securityContext:
            privileged: true
            capabilities:
              add:
                - NET_ADMIN
                - SYS_MODULE
                - SYS_ADMIN
          resources:
            requests:
              cpu: 100m
              memory: 512Mi
          volumeMounts:
            - name: bpf-maps
              mountPath: /sys/fs/bpf
              mountPropagation: Bidirectional
            - name: cilium-run
              mountPath: /var/run/cilium
            - name: cni-path
              mountPath: /host/opt/cni/bin
            - name: etc-cni-netd
              mountPath: /host/etc/cni/net.d
      volumes:
        - name: bpf-maps
          hostPath:
            path: /sys/fs/bpf
            type: DirectoryOrCreate
        - name: cilium-run
          hostPath:
            path: /var/run/cilium
            type: DirectoryOrCreate
        - name: cni-path
          hostPath:
            path: /opt/cni/bin
            type: DirectoryOrCreate
        - name: etc-cni-netd
          hostPath:
            path: /etc/cni/net.d
            type: DirectoryOrCreate

Why initContainers? CNI plugins use an initContainer to copy the CNI binary (and, for some plugins, the network config file) onto the host before the main agent starts. Once the agent is running and the CNI configuration is in place, the kubelet can set up networking for any new Pod created on the node.

Node Selection: nodeSelector and nodeAffinity

By default, a DaemonSet schedules a Pod on every node whose taints it tolerates. You can restrict this further using nodeSelector (simple key-value matching) or the more expressive nodeAffinity.

nodeSelector

The simplest approach — only schedule on nodes with a specific label:

spec:
  template:
    spec:
      # Only deploy on GPU-labelled nodes in one zone (all entries must match)
      nodeSelector:
        accelerator: nvidia-gpu
        topology.kubernetes.io/zone: us-east-1a

nodeAffinity

For multi-rule targeting, use requiredDuringSchedulingIgnoredDuringExecution:

spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  # Run on worker nodes in production or staging pools only
                  - key: node-role.kubernetes.io/worker
                    operator: Exists
                  - key: environment
                    operator: In
                    values:
                      - production
                      - staging
                  # Exclude spot/preemptible instances
                  - key: cloud.google.com/gke-spot
                    operator: DoesNotExist

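Only the required rules (together with nodeSelector) decide where a DaemonSet runs. The controller creates one Pod for each eligible node and pins it to that node, so preferredDuringSchedulingIgnoredDuringExecution has no practical effect: there is no choice of node left to express a preference about.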
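
The matching rules behind nodeSelectorTerms can be sketched in a few lines of Python: terms are ORed together, while the expressions inside one term are ANDed. This is an illustrative model only; it covers the In, NotIn, Exists, and DoesNotExist operators (Gt and Lt are left out), and the function names are made up for the example.

```python
def matches(expr, labels):
    """Evaluate one matchExpressions entry against a node's labels."""
    key, op = expr["key"], expr["operator"]
    values = expr.get("values", [])
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    if op == "In":
        return labels.get(key) in values
    if op == "NotIn":
        # A node without the label also satisfies NotIn.
        return labels.get(key) not in values
    raise ValueError(f"unsupported operator: {op}")


def node_selected(terms, labels):
    """Terms are ORed; expressions within a term are ANDed."""
    return any(
        all(matches(e, labels) for e in term["matchExpressions"])
        for term in terms
    )


terms = [{"matchExpressions": [
    {"key": "node-role.kubernetes.io/worker", "operator": "Exists"},
    {"key": "environment", "operator": "In", "values": ["production", "staging"]},
    {"key": "cloud.google.com/gke-spot", "operator": "DoesNotExist"},
]}]
node = {"node-role.kubernetes.io/worker": "", "environment": "production"}
print(node_selected(terms, node))  # True
```

Adding the cloud.google.com/gke-spot label to the sample node makes the result False, because the DoesNotExist expression no longer holds.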

Tolerations: Running on Control-Plane and Tainted Nodes

Kubernetes applies taints to nodes to repel general workloads. Control-plane nodes carry node-role.kubernetes.io/control-plane:NoSchedule. GPU nodes might carry nvidia.com/gpu:NoSchedule. Without tolerations, your DaemonSet Pods won't land on tainted nodes.

spec:
  template:
    spec:
      tolerations:
        # Tolerate control-plane taint (both old and new key names)
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
        - key: node-role.kubernetes.io/master
          operator: Exists
          effect: NoSchedule

        # Tolerate GPU nodes to deploy GPU metrics exporters
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule

        # Tolerate spot/preemptible nodes
        - key: cloud.google.com/gke-spot
          operator: Equal
          value: "true"
          effect: NoSchedule

        # not-ready and unreachable (NoExecute) are tolerated automatically:
        # the DaemonSet controller adds both without tolerationSeconds, so
        # the agent is not evicted for these taints. Listing them is optional.
        - key: node.kubernetes.io/not-ready
          operator: Exists
          effect: NoExecute
        - key: node.kubernetes.io/unreachable
          operator: Exists
          effect: NoExecute

        # Catch-all: tolerate ANY taint on the node
        # Use only for critical infrastructure like CNI/CSI
        # - operator: Exists

Best Practice: Avoid the catch-all operator: Exists toleration for most DaemonSets. It means the Pod will run on nodes intentionally isolated for debugging or decommissioning. Only CNI plugins and similar bootstrap-critical components should use the catch-all.
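
How a toleration is compared with a taint can be modelled in Python as follows. This is a simplified sketch of the matching rules (key, operator, value, effect), with illustrative function names; it ignores tolerationSeconds and does not treat PreferNoSchedule specially.

```python
def tolerates(toleration, taint):
    """Return True if a single toleration matches a single taint."""
    # An empty effect on the toleration matches every effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    op = toleration.get("operator", "Equal")
    key = toleration.get("key")
    if not key:
        # An empty key is only valid with Exists and matches every taint key.
        return op == "Exists"
    if key != taint["key"]:
        return False
    if op == "Exists":
        return True
    return toleration.get("value", "") == taint.get("value", "")


def untolerated(taints, tolerations):
    """List the taints that none of the tolerations cover."""
    return [t for t in taints if not any(tolerates(tol, t) for tol in tolerations)]


taints = [
    {"key": "node-role.kubernetes.io/control-plane", "effect": "NoSchedule"},
    {"key": "dedicated", "value": "debug", "effect": "NoSchedule"},
]
tolerations = [
    {"key": "node-role.kubernetes.io/control-plane", "operator": "Exists", "effect": "NoSchedule"},
]
print(untolerated(taints, tolerations))
```

With the catch-all toleration ({"operator": "Exists"}) the list comes back empty, which is exactly why it also lands Pods on nodes that were tainted on purpose. The "dedicated" taint is a hypothetical example.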

Update Strategy: RollingUpdate vs OnDelete

DaemonSets support two update strategies, configured under spec.updateStrategy:

RollingUpdate (Default)

When you update the DaemonSet Pod template, Kubernetes automatically deletes the old Pod on a node, creates the replacement, and waits for it to become Ready before moving to the next node. The maxUnavailable field (default 1) controls how many nodes can be updated simultaneously:

spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      # Can be an integer (number of nodes) or a percentage (rounded up)
      maxUnavailable: 1     # safest: one node at a time
      # maxUnavailable: 10% # faster: 10% of nodes at once
      # maxUnavailable: 0   # only valid together with maxSurge > 0
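
To see what a percentage means in practice, the following Python sketch resolves a maxUnavailable value to a node count. It assumes the documented behaviour that percentages are rounded up; the function name is illustrative.

```python
def resolve_max_unavailable(value, desired_pods):
    """Resolve an int or "N%" string to a number of Pods.

    Percentages are taken of the desired Pod count and rounded up,
    using integer arithmetic to avoid floating-point surprises.
    """
    if isinstance(value, int):
        return value
    percent = int(value.rstrip("%"))
    return (desired_pods * percent + 99) // 100


print(resolve_max_unavailable("10%", 200))  # 20
print(resolve_max_unavailable("10%", 25))   # 3
print(resolve_max_unavailable(1, 200))      # 1
```

On a 25-node cluster, 10% therefore updates three nodes at a time rather than two.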

OnDelete

With OnDelete, Kubernetes does NOT automatically update Pods. The new version only starts on a node after you manually delete the old Pod on that node. This is useful when you need to test the new version on a single node before rolling out cluster-wide:

spec:
  updateStrategy:
    type: OnDelete
    # No rollingUpdate block needed

To manually trigger an update on a specific node with OnDelete:

# Find the Pod on the target node
kubectl get pods -n monitoring -l app=node-agent \
  -o wide --field-selector spec.nodeName=worker-node-03

# Delete it (use the Pod name from the previous command); the controller recreates it from the new template
kubectl delete pod -n monitoring node-agent-xk9vp

Checking rollout status: Use kubectl rollout status daemonset/fluentd -n logging to watch a RollingUpdate in progress. Use kubectl rollout history daemonset/fluentd -n logging to see revision history and roll back with kubectl rollout undo.

HostPath Volumes and Host Namespaces

DaemonSets frequently need access to host resources not normally available to containers. Kubernetes exposes these through host namespace flags and HostPath volumes.

Host Namespace Flags

Flag | Default | Effect | Common Use Case
hostPID: true | false | Share host process ID namespace | Falco, eBPF tools, process monitors
hostNetwork: true | false | Use host network stack (no overlay) | Node Exporter, CNI agents, network probes
hostIPC: true | false | Share host IPC namespace | Shared memory inspection tools

HostPath Volume Types

HostPath mounts a file or directory from the host node's filesystem:

volumes:
  - name: varlog
    hostPath:
      path: /var/log
      type: Directory           # must already exist

  - name: cni-bin-dir
    hostPath:
      path: /opt/cni/bin
      type: DirectoryOrCreate   # created if missing

  - name: containerd-sock
    hostPath:
      path: /run/containerd/containerd.sock
      type: Socket              # must be a Unix socket

  - name: etcd-data
    hostPath:
      path: /var/lib/etcd
      type: DirectoryOrCreate

Security Implications: hostPID, hostNetwork, and HostPath volumes significantly expand the blast radius of a compromised container. Always pair them with minimal RBAC, read-only mounts where possible, and Pod Security Admission (PSA) policies that enforce restricted or baseline standards on your application namespaces. See Kubernetes RBAC Security for details.
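
A quick way to review a DaemonSet for the settings discussed in this section is to walk its Pod spec and list every form of host access it requests. The Python sketch below works on a spec already loaded as a dictionary (for example from kubectl get -o json); the function name and the wording of the findings are assumptions for illustration, and it is not a substitute for Pod Security Admission.

```python
def host_access_findings(pod_spec):
    """Return human-readable notes about host access in a Pod spec dict."""
    findings = []
    for flag in ("hostPID", "hostNetwork", "hostIPC"):
        if pod_spec.get(flag):
            findings.append(f"{flag} is enabled")
    host_volumes = {
        v["name"] for v in pod_spec.get("volumes", []) if "hostPath" in v
    }
    for c in pod_spec.get("containers", []):
        if c.get("securityContext", {}).get("privileged"):
            findings.append(f"container {c['name']} is privileged")
        for m in c.get("volumeMounts", []):
            if m["name"] in host_volumes and not m.get("readOnly", False):
                findings.append(
                    f"container {c['name']} mounts hostPath volume {m['name']} read-write"
                )
    return findings


spec = {
    "hostPID": True,
    "volumes": [
        {"name": "varlog", "hostPath": {"path": "/var/log"}},
        {"name": "config", "configMap": {"name": "agent-config"}},
    ],
    "containers": [{
        "name": "agent",
        "securityContext": {"privileged": True},
        "volumeMounts": [
            {"name": "varlog", "mountPath": "/var/log"},
            {"name": "config", "mountPath": "/etc/agent"},
        ],
    }],
}
for line in host_access_findings(spec):
    print(line)
```

Each finding is a prompt to ask whether the agent really needs that access, and whether a read-only mount would do.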

PriorityClasses for DaemonSets

Kubernetes can evict Pods from nodes under memory pressure. Infrastructure DaemonSet Pods — log agents, network plugins, monitoring exporters — should never be evicted because that would create gaps in observability or break networking entirely.

Assign a PriorityClass to protect your DaemonSet Pods:

# priority-classes reference
# Built-in classes (do not create these — they already exist):
#   system-cluster-critical  (value: 2000000000) — cluster-wide components
#   system-node-critical     (value: 2000001000) — per-node components

# Custom class for your own infrastructure agents:
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: infra-node-critical
value: 1000000          # below system-node-critical but above default (0)
globalDefault: false
preemptionPolicy: PreemptLowerPriority
description: "Used for per-node infrastructure DaemonSets."
---
# Reference in your DaemonSet:
spec:
  template:
    spec:
      # Use the built-in class for CNI, CSI, kube-proxy
      priorityClassName: system-node-critical

      # Use the custom class for log agents, metrics exporters
      # priorityClassName: infra-node-critical

PriorityClass | Value | Recommended For
system-node-critical | 2,000,001,000 | CNI, CSI, kube-proxy, kubelet-adjacent
system-cluster-critical | 2,000,000,000 | CoreDNS, metrics-server, cluster autoscaler
infra-node-critical (custom) | 1,000,000 | Fluentd, Node Exporter, Falco
(default) | 0 | All application Pods

For a comprehensive look at resource management including QoS classes and LimitRanges, see Kubernetes Resource Management.

Troubleshooting DaemonSets

Here are the most common DaemonSet issues and how to diagnose them.

Problem 1: DaemonSet Pod Not Scheduled on a Node

Check whether a Pod exists for the node at all, and whether it is stuck in Pending:

# List all DaemonSet Pods and their nodes
kubectl get pods -n logging -l app=fluentd -o wide

# Describe a pending Pod and look at the Events section
kubectl describe pod fluentd-<hash> -n logging

# Common causes:
# 1. Insufficient resources: the Pod exists but stays Pending, with an event like
#    "0/5 nodes are available: 1 Insufficient memory, ..."
# 2. Untolerated taint: the DaemonSet controller creates no Pod for that node
# 3. nodeSelector/affinity mismatch: likewise, no Pod is created
# For causes 2 and 3, compare the DESIRED count with your node count:
kubectl get daemonset fluentd -n logging

# Check node taints
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints

# Check node labels
kubectl get nodes --show-labels

Problem 2: OOM Kills on Nodes

When a log agent's memory limit is too low for a busy node:

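# Find Pods with a container whose last termination was an OOM kill
# (any(...) avoids duplicate names and tolerates Pods without container statuses)
kubectl get pods -n logging -o json | \
  jq -r '.items[] | select(any(.status.containerStatuses[]?; .lastState.terminated.reason == "OOMKilled")) | .metadata.name'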

# Check actual memory usage before setting limits
kubectl top pods -n logging --containers

# Inspect the last termination (an OOM kill shows reason OOMKilled, exit code 137)
kubectl describe pod fluentd-abc123 -n logging | grep -A5 "Last State"

Rule of thumb: Set memory requests to the p50 memory usage and limits to 3-5x the p99 usage observed during peak log volume. Use Prometheus container_memory_working_set_bytes to measure this over time.
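
The rule of thumb can be turned into a small calculation. The Python sketch below takes memory samples in Mi (for example exported from container_memory_working_set_bytes) and suggests a request and a limit; the nearest-rank percentile method, the function names, and the default factor of 3 are choices made for this example.

```python
def percentile(samples, p):
    """Nearest-rank percentile of a non-empty list; p is an integer 0..100."""
    ordered = sorted(samples)
    rank = max(1, -(-p * len(ordered) // 100))  # ceiling division
    return ordered[rank - 1]


def suggest_memory(samples_mib, limit_factor=3):
    """Return (request, limit) in Mi: p50 for the request, factor x p99 for the limit."""
    request = percentile(samples_mib, 50)
    limit = limit_factor * percentile(samples_mib, 99)
    return request, limit


# Synthetic samples from 1 Mi to 100 Mi, purely to show the arithmetic.
samples = list(range(1, 101))
print(suggest_memory(samples))  # (50, 297)
```

Treat the result as a starting point and re-check it after changes in log volume or agent configuration.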

Problem 3: DaemonSet Pod Stuck in Terminating

# Force delete a stuck Pod (use only if node is confirmed unreachable)
kubectl delete pod fluentd-abc123 -n logging --grace-period=0 --force

# If the node itself is gone, remove its entry from the cluster
kubectl delete node dead-worker-node-07

Problem 4: Resource Contention Between DaemonSets

On nodes with many DaemonSets, the cumulative overhead can crowd out application Pods. Audit total DaemonSet overhead:

# Show total requests and limits for all Pods on a node (DaemonSet Pods included)
kubectl describe node worker-node-01 | grep -A 40 "Allocated resources"

# List all DaemonSets with their desired and ready Pod counts
kubectl get daemonset --all-namespaces

# List the infrastructure Pods running on one node
kubectl get pods --all-namespaces -o wide \
  --field-selector spec.nodeName=worker-node-01 | grep -E "logging|monitoring|falco|kube-system"
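
Because every node carries the same set of agents, the cluster-wide cost is simply the per-node sum multiplied by the node count. The Python sketch below runs that arithmetic with the memory requests used in this guide's example manifests; the 200-node cluster size and the function name are assumptions for illustration.

```python
def daemonset_overhead(requests_mib, node_count):
    """Return (Mi reserved per node, Mi reserved across the cluster)."""
    per_node = sum(requests_mib.values())
    return per_node, per_node * node_count


# Memory requests (Mi) taken from the example manifests in this guide.
requests_mib = {
    "node-agent": 64,
    "fluentd": 200,
    "node-exporter": 30,
    "falco": 512,
    "cilium": 512,
}
per_node, total = daemonset_overhead(requests_mib, 200)
print(f"{per_node} Mi per node, {total / 1024:.1f} Gi across 200 nodes")
```

With these figures the agents reserve 1318 Mi on every node before any application Pod is scheduled.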

Problem 5: New Node Not Getting DaemonSet Pod

If a node was added with a taint and the DaemonSet tolerations don't cover it, no Pod is created. Also check if the DaemonSet's nodeSelector requires a label that the new node does not have:

# Label the new node to match the DaemonSet nodeSelector
# (kubernetes.io/os is set by the kubelet and needs no manual label)
kubectl label node new-worker-node-12 \
  node-role.kubernetes.io/worker=true

# Verify the DaemonSet schedules a Pod within ~30 seconds
kubectl get pods -n monitoring -l app=node-agent -o wide -w

For a full guide on Kubernetes internals that affect scheduling, see the Kubernetes Complete Guide.

Summary

Kubernetes DaemonSets are a foundational primitive for running per-node infrastructure. Key takeaways from this guide:

  • DaemonSet vs Deployment: Use DaemonSets when you need exactly one agent per node, not N replicas for load distribution.
  • Tolerations are critical: Without the right tolerations, your DaemonSet won't run on control-plane, GPU, or other tainted nodes, which causes invisible monitoring gaps.
  • nodeSelector / nodeAffinity: Scope DaemonSets to specific node pools (e.g., GPU nodes for GPU metrics, worker nodes only for app-level log agents).
  • RollingUpdate is safe: Set maxUnavailable: 1 for a cautious rollout; use OnDelete when you want fully manual control.
  • PriorityClass protects agents: Assign system-node-critical to CNI/CSI and a custom high-priority class to your observability agents to prevent eviction under pressure.
  • Limit host access: Only enable hostPID, hostNetwork, and privileged when genuinely required. Pair with RBAC and Pod Security Admission.
  • Resource right-sizing: DaemonSet overhead is multiplied by node count. A 50 Mi request on 200 nodes reserves 10,000 Mi (roughly 10 Gi) of cluster memory. Profile before setting limits.

With these patterns, you can build a robust observability and security foundation that automatically extends to every node in your cluster — today and as your infrastructure scales.