Kubernetes Pods: Complete Guide with Examples (2026)
The Pod is the atomic unit of deployment in Kubernetes — every container you run exists inside a pod. Despite their simplicity on the surface, pods have a rich specification covering init containers, multi-container patterns, health probes, resource management, and disruption budgets. Mastering pods means mastering the building block on which every higher-level abstraction (Deployment, StatefulSet, DaemonSet) rests.
Pod Anatomy and Spec
A pod spec describes one or more containers that are co-scheduled on the same node, share one network namespace (they reach each other over localhost), and can share storage volumes. Here is a single-container pod spec that exercises the most common production fields, with comments on the less obvious ones:
apiVersion: v1
kind: Pod
metadata:
  name: api-server
  namespace: production
  labels:
    app: api-server
    version: "2.1.0"
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"
spec:
  # Which service account this pod runs as
  serviceAccountName: api-server-sa
  # Security context for the entire pod
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    fsGroup: 2000
  containers:
    - name: api
      image: myrepo/api-server:2.1.0
      imagePullPolicy: IfNotPresent
      ports:
        - containerPort: 8080
          name: http
        - containerPort: 8443
          name: https
      env:
        - name: APP_ENV
          value: production
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: db-credentials
              key: password
      envFrom:
        - configMapRef:
            name: api-server-config
      resources:
        requests:
          cpu: "250m"
          memory: "256Mi"
        limits:
          cpu: "1"
          memory: "512Mi"
      volumeMounts:
        - name: config-volume
          mountPath: /etc/app/config
          readOnly: true
        - name: tmp
          mountPath: /tmp
  volumes:
    - name: config-volume
      configMap:
        name: api-server-config
    - name: tmp
      emptyDir: {}
  # Prefer nodes with label zone=us-east-1a
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          preference:
            matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values: [us-east-1a]
    # Do not schedule two pods of this app on the same node
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: api-server
          topologyKey: kubernetes.io/hostname
  terminationGracePeriodSeconds: 60
Init Containers
Init containers run one at a time, in the order listed, and each must exit successfully before the next one starts; application containers start only after the last init container succeeds. They can mount the same volumes as the app containers, but each has its own image and resource settings. Common uses: waiting for a database to be ready, seeding config files, running schema migrations.
spec:
  initContainers:
    - name: wait-for-db
      image: busybox:1.36
      command: ['sh', '-c',
        'until nc -z postgres-service 5432; do echo waiting for postgres; sleep 2; done']
    - name: run-migrations
      image: myrepo/api-server:2.1.0
      command: ['python', 'manage.py', 'migrate', '--noinput']
      env:
        - name: DB_URL
          valueFrom:
            secretKeyRef:
              name: db-credentials
              key: url
  containers:
    - name: api
      image: myrepo/api-server:2.1.0
      # starts only after both init containers succeed
Multi-Container Pod Patterns
Running multiple containers in a pod is sometimes the right design. The three canonical patterns are:
Sidecar Pattern
A helper container extends or enhances the main container. The most common example is a log-shipping sidecar that tails log files written by the main app and forwards them to a centralised system (Fluentd, Vector, Datadog Agent).
containers:
  - name: app
    image: myrepo/app:latest
    volumeMounts:
      - name: log-volume
        mountPath: /var/log/app
  - name: log-shipper
    image: fluent/fluent-bit:2.2
    volumeMounts:
      - name: log-volume
        mountPath: /var/log/app
        readOnly: true
      - name: fluent-bit-config
        mountPath: /fluent-bit/etc
volumes:
  - name: log-volume
    emptyDir: {}
  - name: fluent-bit-config
    configMap:
      name: fluent-bit-config
Ambassador Pattern
The ambassador container proxies network connections on behalf of the main container. The app always connects to localhost and the ambassador handles external routing, retries, or TLS termination. Envoy or Nginx make good ambassadors.
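A minimal sketch of how an ambassador layout might look, assuming Envoy as the proxy. The image tag, the listener port 9000, the UPSTREAM_URL variable, and the ConfigMap name ambassador-envoy-config are all illustrative assumptions; the Envoy configuration itself (listeners, clusters, retries, TLS) is not shown.

```yaml
containers:
  - name: app
    image: myrepo/app:latest
    env:
      # Hypothetical variable: the app only ever talks to the local proxy
      - name: UPSTREAM_URL
        value: http://localhost:9000
  - name: ambassador
    image: envoyproxy/envoy:v1.29.0   # assumed tag; pin the version you actually run
    command: ["envoy", "-c", "/etc/envoy/envoy.yaml"]
    volumeMounts:
      - name: envoy-config
        mountPath: /etc/envoy
        readOnly: true
volumes:
  - name: envoy-config
    configMap:
      name: ambassador-envoy-config   # hypothetical; must contain envoy.yaml
```

Because both containers share the pod's network namespace, the app needs no knowledge of where the real upstream lives; changing the routing only means changing the proxy's configuration.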
Adapter Pattern
The adapter container transforms the main container's output to match an external interface. For example, a container that exposes a proprietary metrics endpoint can have an adapter that converts those metrics to the Prometheus exposition format, without changing the main app.
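The adapter layout could be sketched as follows. Everything here is hypothetical: the adapter image, the SOURCE_URL variable, the /stats path, and both port numbers stand in for whatever your application and exporter actually use.

```yaml
containers:
  - name: app
    image: myrepo/app:latest
    ports:
      - containerPort: 9000
        name: legacy-metrics     # hypothetical proprietary metrics endpoint
  - name: metrics-adapter
    image: myrepo/metrics-adapter:1.0.0   # hypothetical adapter image
    env:
      # The adapter scrapes the app over localhost and re-exposes the data
      - name: SOURCE_URL
        value: http://localhost:9000/stats
    ports:
      - containerPort: 9102
        name: prom-metrics       # Prometheus scrapes this port instead
```

In this arrangement Prometheus would be pointed at port 9102, and the main container's image stays untouched.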
Pod Lifecycle and Restart Policies
A pod progresses through these phases:
- Pending — pod accepted but containers not yet running (image pull, scheduling)
- Running — at least one container is running
- Succeeded — all containers exited 0 (final state for Jobs)
- Failed — all containers exited, at least one with non-zero exit code
- Unknown — pod state cannot be determined (node communication lost)
# Watch pod status changes in real time
kubectl get pods -w -n production
# Get detailed status including conditions
kubectl get pod api-server -n production -o json | jq '.status.conditions'
# Check container exit codes
kubectl describe pod api-server -n production | grep -A5 "Last State"
Restart policies (spec.restartPolicy) control what happens when a container exits:
- Always (default) — always restart. Use for long-running services managed by Deployments.
- OnFailure — restart only on non-zero exit. Use for Jobs that must complete successfully.
- Never — never restart. Use for one-shot tasks where you want to inspect the result.
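As an illustration of a non-default policy, here is a sketch of a Job whose pod template uses OnFailure. The Job name, image, script path, and backoffLimit value are assumptions chosen for the example.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: db-backup                # hypothetical name
  namespace: production
spec:
  backoffLimit: 4                # give up after 4 failed retries (assumed value)
  template:
    spec:
      restartPolicy: OnFailure   # Jobs accept only OnFailure or Never
      containers:
        - name: backup
          image: myrepo/db-backup:1.0.0   # hypothetical image
          command: ["sh", "-c", "/scripts/backup.sh"]
```

With OnFailure the kubelet restarts the failed container inside the same pod; with Never the Job controller creates a fresh pod for each retry, which leaves the failed pods around for inspection.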
Liveness, Readiness and Startup Probes
Probes are periodic health checks that Kubernetes runs against your containers. Getting them right is critical for zero-downtime deployments.
containers:
  - name: api
    image: myrepo/api-server:2.1.0
    # startupProbe: gives the app time to initialise before liveness kicks in
    # Useful for slow-starting apps (e.g., JVM warm-up)
    startupProbe:
      httpGet:
        path: /actuator/health/liveness
        port: 8080
      failureThreshold: 30   # 30 * 10s = 5 minutes to start
      periodSeconds: 10
    # livenessProbe: if this fails, container is killed and restarted
    livenessProbe:
      httpGet:
        path: /actuator/health/liveness
        port: 8080
      initialDelaySeconds: 0   # startupProbe handles delay
      periodSeconds: 10
      failureThreshold: 3
      timeoutSeconds: 5
    # readinessProbe: if this fails, pod is removed from Service endpoints
    # Traffic stops going to this pod until it recovers
    readinessProbe:
      httpGet:
        path: /actuator/health/readiness
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 5
      failureThreshold: 3
      successThreshold: 1
Probe types:
- httpGet — HTTP GET request; 2xx/3xx = success
- tcpSocket — TCP connection to a port; connection established = success
- exec — run a command in the container; exit 0 = success
- grpc — gRPC health check (requires gRPC health protocol)
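The non-HTTP probe types might be written like this. The Redis container and the gRPC service are assumptions made for the example; the grpc-api image is hypothetical and would need to implement the standard gRPC health checking service on the probed port.

```yaml
containers:
  - name: cache
    image: redis:7.2
    ports:
      - containerPort: 6379
    # tcpSocket: succeeds when a TCP connection to the port can be opened
    readinessProbe:
      tcpSocket:
        port: 6379
      periodSeconds: 5
    # exec: succeeds when the command exits with status 0
    livenessProbe:
      exec:
        command: ["redis-cli", "ping"]
      periodSeconds: 10
      timeoutSeconds: 3
  - name: grpc-api
    image: myrepo/grpc-api:1.0.0   # hypothetical image
    ports:
      - containerPort: 50051
    # grpc: calls the gRPC health service; the port must be numeric
    livenessProbe:
      grpc:
        port: 50051
      periodSeconds: 10
```

An exec probe starts a new process inside the container on every check, so it is usually worth keeping the command cheap and the period moderate.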
Resource Requests and Limits
Resource requests and limits are per-container settings that control scheduling and enforcement:
- requests — the amount the scheduler reserves on the node. A pod is only placed on nodes with enough unreserved capacity.
- limits — the maximum the container may consume. CPU is throttled; memory exceeding the limit triggers an OOMKill (container restart).
resources:
  requests:
    cpu: "250m"      # 0.25 vCPU
    memory: "256Mi"
  limits:
    cpu: "1000m"     # 1 vCPU
    memory: "512Mi"
QoS classes assigned by Kubernetes based on your settings:
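- Guaranteed — every container sets both CPU and memory requests and limits, and each request equals its limit. Best protection against eviction.
- Burstable — at least one container sets a request or limit, but the pod does not meet the Guaranteed criteria (for example, limits are higher than requests or are not set). Evicted under memory pressure after BestEffort pods.
- BestEffort — no container sets any requests or limits. Evicted first under pressure.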
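For comparison with the Burstable settings shown earlier, a container in a Guaranteed pod would carry a resources block along these lines. The specific values are assumptions; what matters is that each request equals its limit.

```yaml
resources:
  requests:
    cpu: "500m"
    memory: "512Mi"
  limits:
    cpu: "500m"       # equal to the CPU request
    memory: "512Mi"   # equal to the memory request
```

Every container in the pod, including init containers, needs a block like this for the pod to be classed as Guaranteed. Raising a limit above its request, or omitting the limits, moves the pod to Burstable.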
# Check actual resource usage
kubectl top pod api-server -n production --containers
# Check QoS class of a pod
kubectl get pod api-server -n production -o jsonpath='{.status.qosClass}'
Pod Disruption Budgets
A PodDisruptionBudget (PDB) limits how many pods of a Deployment or StatefulSet can be down at the same time during voluntary disruptions (node drains, cluster upgrades). Without a PDB, draining nodes could evict every replica of your service at once.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-server-pdb
  namespace: production
spec:
  # At least 2 pods must be available at all times
  minAvailable: 2
  selector:
    matchLabels:
      app: api-server

# Alternative to the manifest above (a PDB takes minAvailable or maxUnavailable, not both)
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-server-pdb
  namespace: production
spec:
  # Alternative: no more than 1 pod may be unavailable at a time
  maxUnavailable: 1
  selector:
    matchLabels:
      app: api-server
# View PDB status
kubectl get pdb -n production
# NAME             MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
# api-server-pdb   2               N/A               1                     5d
Frequently Asked Questions
Can multiple containers in a pod communicate with each other?
Yes. Containers in the same pod share the same network namespace, so they communicate via localhost. For example, if container A listens on port 8080 and container B needs to call it, B simply connects to localhost:8080. They also share the pod's IP address as seen from outside.
What is the difference between a liveness probe and a readiness probe?
A failing liveness probe causes Kubernetes to kill and restart the container. A failing readiness probe removes the pod from the Service's endpoint list so it stops receiving traffic, but the container is not restarted. Use liveness for "is this process stuck?" and readiness for "is this process ready to serve requests?"
What happens to a pod when its node is deleted?
If the pod is managed by a Deployment or StatefulSet, the controller detects that the pod is gone and creates a replacement on another available node. DaemonSet pods are tied to their node, so they are removed with it rather than moved elsewhere. Standalone pods (not managed by a controller) are not rescheduled. This is why you should almost never run standalone pods in production.
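As a sketch, wrapping the api-server pod in a Deployment could look like this. The replica count of 3 is an assumption, and the pod template is trimmed to the minimum; in practice it would carry the full spec shown earlier in this guide.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
  namespace: production
spec:
  replicas: 3                 # assumed replica count
  selector:
    matchLabels:
      app: api-server         # must match the template labels below
  template:
    metadata:
      labels:
        app: api-server
    spec:
      containers:
        - name: api
          image: myrepo/api-server:2.1.0
```

If a node disappears, the Deployment's ReplicaSet notices that fewer than 3 pods exist and schedules replacements onto the remaining nodes.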
How do I set a CPU limit without causing unnecessary throttling?
Set your CPU request accurately (matching your typical usage), and consider either leaving the CPU limit unset or setting it generously (2-4x the request). CPU throttling can severely impact latency-sensitive services even when the node has spare capacity. Memory limits matter more and should be set tightly: memory cannot be throttled, so an OOMKill of one container is preferable to memory pressure (or swapping, where swap is enabled) degrading the entire node.
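One way to express that advice in a container spec, using illustrative values:

```yaml
resources:
  requests:
    cpu: "250m"       # sized to typical usage
    memory: "512Mi"
  limits:
    # No cpu limit: the container may use idle CPU on the node without being throttled
    memory: "512Mi"   # equal to the request, so memory use stays predictable
```

A pod configured this way is classed as Burstable rather than Guaranteed, because its CPU request has no matching limit. That trade-off is worth weighing if eviction order matters for the workload.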
What is a pod's termination grace period?
When a pod is deleted, Kubernetes first runs any preStop hooks, then sends SIGTERM to the main process of each container, and waits up to terminationGracePeriodSeconds (default 30 seconds) in total for the containers to exit cleanly. After the grace period, it sends SIGKILL. For servers handling long-running requests, increase this to match your longest acceptable request duration. For Spring Boot apps draining connections, 60-90 seconds is common.
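A sketch of a graceful-shutdown setup, assuming the image contains a shell. The 10-second sleep is an assumed value that gives Service endpoint removal time to propagate before the process receives SIGTERM; it counts against the grace period.

```yaml
spec:
  terminationGracePeriodSeconds: 60
  containers:
    - name: api
      image: myrepo/api-server:2.1.0
      lifecycle:
        preStop:
          exec:
            # Runs before SIGTERM is sent; requires sh in the image
            command: ["sh", "-c", "sleep 10"]
```

With these values the application has roughly 50 seconds after SIGTERM to finish in-flight requests before SIGKILL.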