Kubernetes Interview Questions 2026

Top 55 Questions & Answers — Architecture, Workloads, Networking, Security, Helm & Troubleshooting

This guide covers the most commonly asked Kubernetes interview questions in 2026 — from core concepts to production-grade architecture, security, and debugging. Equally useful for CKA certification preparation and senior DevOps / platform engineering interviews.

Easy = Core concepts, basic usage  |  Medium = Production patterns, configuration  |  Hard = Architecture, debugging, advanced design
Architecture & Core Concepts
1
What is Kubernetes and what problem does it solve? (Easy)

Kubernetes is an open-source container orchestration platform that automates deploying, scaling, and managing containerised applications.

Problems it solves:

  • Placement: decides which node to run containers on based on resource availability and constraints
  • Self-healing: restarts failed containers, replaces Pods that were running on failed nodes, and restarts containers that fail liveness checks
  • Scaling: horizontally scales Pods up/down based on CPU, memory, or custom metrics
  • Service discovery & load balancing: exposes containers via stable DNS names; load balances across instances
  • Rolling updates / rollbacks: replaces Pods incrementally so a new version can ship without downtime; supports rolling back to an earlier revision
  • Secret & config management: injects credentials and configuration without baking them into images
Kubernetes doesn't run containers itself — it delegates to a container runtime (containerd, CRI-O) via the Container Runtime Interface (CRI).
2
Explain the Kubernetes control plane components. (Medium)
  • kube-apiserver: front end of the control plane. All communication (kubectl, controllers, kubelets) goes through the API server. Validates and persists resources to etcd. Horizontally scalable.
  • etcd: distributed key-value store. Single source of truth for all cluster state. Run an odd number of members (3 or 5) for HA so a majority quorum survives a failure. Back up etcd regularly; losing it means losing the cluster state.
  • kube-scheduler: watches for unscheduled Pods and selects a node based on resource requirements, affinity rules, and taints/tolerations. Does NOT start the Pod; it binds the Pod to a node by setting spec.nodeName.
  • kube-controller-manager: runs controller loops: Node Controller (handles node failures), ReplicaSet Controller (maintains desired Pod count), Endpoints Controller, Service Account Controller, etc.
  • cloud-controller-manager: cloud-provider-specific logic: creates load balancers for Services, keeps Node objects in sync with cloud instances, and adds routes to the cloud VPC. Separates cloud provider code from core K8s. Volume provisioning is handled by CSI drivers rather than by this component.
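The "3 or 5 nodes" guidance for etcd follows from majority-quorum arithmetic. A minimal Python sketch of that arithmetic (the function names are illustrative and not part of any etcd API):

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with `members` voting members."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def failures_tolerated(members: int) -> int:
    """How many members can be lost while a majority still remains."""
    return members - quorum(members)


for n in (1, 2, 3, 4, 5):
    print(f"{n} members: quorum={quorum(n)}, tolerates {failures_tolerated(n)} failure(s)")
```

Going from 3 to 4 members raises the quorum without raising the number of failures tolerated, which is why even cluster sizes are avoided.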
3
What runs on each worker node? (Easy)
  • kubelet: primary agent on every node. Watches the API server for Pods assigned to this node, instructs the container runtime to start/stop containers, runs liveness/readiness probes, reports node and Pod status back to the control plane.
  • kube-proxy: maintains network rules (iptables/IPVS) so traffic to a Service ClusterIP is forwarded to the correct Pod endpoints. Also implements load balancing across Pods behind a Service.
  • Container runtime: actually runs containers. Typically containerd; dockershim, the kubelet's built-in Docker Engine integration, was removed in Kubernetes 1.24. Implements CRI.
kubectl talks to the API server; it never talks to kubelets directly for normal operations.
4
What is a Pod and why is it the basic unit, not a container? (Easy)

A Pod is the smallest deployable unit in Kubernetes. It wraps one or more containers that share:

  • Network namespace — same IP address and port space. Containers in the same Pod communicate via localhost.
  • Storage volumes — Pods can declare volumes; all containers in the Pod mount them.
  • Lifecycle — all containers start and stop together.

Why not just a container? Real-world applications often need tightly coupled helpers alongside the main process:

  • Sidecar — log shipper, service mesh proxy (Envoy), metrics exporter running alongside the app
  • Init container — runs to completion before main containers start (schema migration, config download)
  • Ambassador — proxy local connections to the outside world

Pods are ephemeral — don't store state in them. Use Deployments to manage Pods declaratively.

5
What is a Namespace and when do you use multiple namespaces? (Easy)

Namespaces partition a single physical cluster into virtual clusters by scoping resource names. They are not a security boundary on their own: what a user can see across namespaces is governed by RBAC, and Pods in different namespaces can still reach each other over the network unless NetworkPolicies restrict it.

kubectl get pods -n kube-system        # control plane pods
kubectl get pods -n default            # user workloads (default)
kubectl create namespace staging       # create namespace
kubectl config set-context --current --namespace=staging  # set default

When to use multiple namespaces:

  • Multi-tenant clusters — team-A in their namespace, team-B in theirs, with RBAC limiting access
  • Environment separation — dev, staging, prod within one cluster (better: use separate clusters for prod)
  • Resource quotas — limit CPU/memory per namespace so one team can't starve another

Cluster-scoped resources (Nodes, PersistentVolumes, ClusterRoles) are not namespaced.

6
What is the difference between Labels, Selectors, and Annotations? (Easy)
  • Labels — key/value pairs attached to any K8s object. Used for grouping and selection. E.g. app: frontend, env: prod, version: v2. Can be queried.
  • Selectors — queries that match objects by label. Services use label selectors to find the Pods they should route traffic to. ReplicaSets use them to track owned Pods.
  • Annotations — also key/value but not used for selection. Store non-identifying metadata: deployment tool name, last deploy timestamp, URLs, team contact, config hashes. Ingress controllers read annotations for configuration (rewrite rules, SSL redirect).
labels:
  app: api
  tier: backend
  version: "3.1"
annotations:
  deployed-by: "github-actions"
  last-commit: "abc1234"
  docs: "https://wiki.company.com/api"
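Equality-based selection is simple enough to model directly. The Python sketch below is an illustration of the matching rule only (the Pod names and the `select` helper are hypothetical); it shows how a selector picks objects by label and why annotations play no part in it:

```python
def matches(selector: dict, labels: dict) -> bool:
    """Equality-based selection: every selector pair must appear in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def select(selector: dict, objects: dict) -> list:
    """Return the sorted names of all objects whose labels satisfy the selector."""
    return sorted(name for name, labels in objects.items() if matches(selector, labels))


# Hypothetical Pods keyed by name, each with its label set.
pods = {
    "api-1": {"app": "api", "tier": "backend", "version": "3.1"},
    "api-2": {"app": "api", "tier": "backend", "version": "3.0"},
    "web-1": {"app": "web", "tier": "frontend"},
}

print(select({"app": "api"}, pods))
print(select({"app": "api", "version": "3.1"}, pods))
```

A Service with selector app: api would route to both api Pods; adding version to the selector narrows it to one.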
7
What happens when you run kubectl apply -f deployment.yaml? (Medium)
  1. kubectl reads the YAML, sends an HTTP PATCH/POST to the API server
  2. API server authenticates + authorizes the request (RBAC)
  3. Mutating admission controllers run (e.g. a mutating webhook injects sidecars, LimitRanger sets defaults)
  4. API server validates the object schema, then validating admission controllers run (e.g. PodSecurity)
  5. API server persists the Deployment object to etcd
  6. Deployment controller (in kube-controller-manager) notices the Deployment and creates/updates a ReplicaSet
  7. ReplicaSet controller creates the required number of Pod objects (status: Pending)
  8. Scheduler watches for Pending Pods, selects a node, sets spec.nodeName
  9. kubelet on that node watches for Pods assigned to it, tells containerd to pull the image and start the container
  10. kubelet runs probes; when ready, Endpoints controller adds Pod IP to the Service's endpoint list
8
What are taints and tolerations? (Medium)

Taints mark a node to repel Pods. Tolerations on a Pod allow it to be scheduled onto a tainted node.

# Taint a node so only GPU workloads run on it:
kubectl taint nodes gpu-node-1 type=gpu:NoSchedule

# Pod toleration to allow scheduling on gpu nodes:
tolerations:
- key: "type"
  operator: "Equal"
  value: "gpu"
  effect: "NoSchedule"

Taint effects:

  • NoSchedule — don't schedule new Pods without the toleration (existing Pods stay)
  • PreferNoSchedule — try to avoid scheduling, but not guaranteed
  • NoExecute — evict existing Pods without toleration immediately

Common uses: dedicated nodes for GPU workloads, spot instance pools, system Pods on control plane nodes (which are tainted node-role.kubernetes.io/control-plane:NoSchedule).
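The matching rule between a toleration and a taint can be sketched in a few lines of Python. This is a simplified model of the documented semantics (it ignores tolerationSeconds, and the dict shapes are an assumption chosen to mirror the YAML above):

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """True if a single toleration matches a single taint."""
    operator = toleration.get("operator", "Equal")  # Equal is the default operator
    key = toleration.get("key", "")
    # An empty effect on the toleration matches every taint effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    if operator == "Exists":
        # An empty key with Exists tolerates every taint.
        return key == "" or key == taint["key"]
    # Equal: key and value must both match.
    return key == taint["key"] and toleration.get("value", "") == taint.get("value", "")


def schedulable(pod_tolerations: list, node_taints: list) -> bool:
    """True if every hard taint on the node is tolerated (PreferNoSchedule is soft)."""
    return all(
        any(tolerates(tol, taint) for tol in pod_tolerations)
        for taint in node_taints
        if taint["effect"] in ("NoSchedule", "NoExecute")
    )


gpu_taint = {"key": "type", "value": "gpu", "effect": "NoSchedule"}
gpu_toleration = {"key": "type", "operator": "Equal", "value": "gpu", "effect": "NoSchedule"}

print(schedulable([gpu_toleration], [gpu_taint]))
print(schedulable([], [gpu_taint]))
```

Note that a toleration only permits scheduling onto the tainted node; it does not attract the Pod there. Pair it with node affinity when the workload must land on those nodes.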

9
What is Node Affinity vs Pod Affinity? (Medium)
  • Node Affinity — constrains which nodes a Pod can be scheduled on based on node labels. requiredDuringSchedulingIgnoredDuringExecution (hard) or preferredDuringSchedulingIgnoredDuringExecution (soft).
  • Pod Affinity — schedule a Pod near (podAffinity) or away from (podAntiAffinity) other Pods based on their labels. Used for co-location (cache next to app) or anti-co-location (spread replicas across nodes/AZs).
# Anti-affinity: spread web Pod replicas across nodes
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values: [web]
      topologyKey: kubernetes.io/hostname

For HA: use topologyKey: topology.kubernetes.io/zone to spread across AZs. Also consider topologySpreadConstraints — simpler and more powerful for even distribution.

10
What are resource requests and limits and why do they matter? (Medium)
  • Requests — guaranteed resources. The scheduler uses this to find a node with enough available capacity. The container is guaranteed at least this amount.
  • Limits — maximum resources the container can use. Exceeding CPU limit throttles the container. Exceeding memory limit OOMKills the container.
resources:
  requests:
    cpu: "250m"     # 0.25 cores guaranteed
    memory: "256Mi"
  limits:
    cpu: "1"        # throttled if it tries to use more
    memory: "512Mi" # OOMKilled if exceeded

QoS classes determined by request/limit:

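  • Guaranteed: every container sets CPU and memory requests equal to its limits. Highest priority, last to be evicted.
  • Burstable: at least one container sets a request or limit, but the Pod does not meet the Guaranteed criteria (typically requests < limits). Middle priority.
  • BestEffort: no container sets any requests or limits. Evicted first under memory pressure.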
Always set requests. Limits are optional for CPU but strongly recommended for memory to prevent OOM cascades.
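The QoS rules can be written down as a small classifier. The Python sketch below is a simplified model, not the kubelet's implementation: it looks only at cpu and memory, and it compares quantities as plain strings (so "1" and "1000m" would count as different, which real Kubernetes does not do):

```python
def qos_class(containers: list) -> str:
    """Classify a Pod from its containers' resources blocks."""
    any_set = False
    all_guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # An omitted request defaults to the limit when a limit is set.
            request = requests.get(resource, limit)
            if request is not None or limit is not None:
                any_set = True
            if limit is None or request != limit:
                all_guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if all_guaranteed else "Burstable"


example = {
    "requests": {"cpu": "250m", "memory": "256Mi"},
    "limits": {"cpu": "1", "memory": "512Mi"},
}
print(qos_class([example]))
print(qos_class([{}]))
```

The resources block from the YAML example above lands in Burstable because its requests are lower than its limits.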
11
What are liveness, readiness, and startup probes? (Medium)
  • Liveness probe — "is the container alive?" If it fails, kubelet restarts the container. Use for detecting deadlocks or corrupted state that the app can't recover from alone.
  • Readiness probe — "is the container ready to serve traffic?" If it fails, the Pod is removed from Service endpoints (no traffic routed to it). Use for: app warming up, temporarily overloaded, waiting for DB connection. Does NOT restart the container.
  • Startup probe — gives slow-starting apps time to boot before liveness kicks in. While startup probe is failing, liveness is disabled. Use for: legacy Java apps with 60s startup time.
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  failureThreshold: 3

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  periodSeconds: 5

startupProbe:
  httpGet:
    path: /health
    port: 8080
  failureThreshold: 30   # 30 * 10s = 5 min to start
  periodSeconds: 10
12
What is the Kubernetes scheduler and how does the scheduling cycle work? (Hard)

The scheduler selects a node for each unscheduled Pod in two phases:

1. Filtering (Predicates) — eliminates nodes that don't meet requirements:

  • Node has enough CPU/memory (based on requests)
  • Node has required labels (nodeAffinity)
  • Pod tolerates the node's taints
  • Node has required volumes available
  • Pod's port isn't already used on this node (hostPort)

2. Scoring (Priorities) — ranks remaining nodes:

  • LeastAllocated: prefer nodes with more free resources
  • PodTopologySpread (successor to the legacy SelectorSpreadPriority): spread Pods of the same workload across nodes and zones
  • NodeAffinity score: prefer nodes matching soft affinity rules
  • ImageLocality: prefer nodes that already have the container image

The highest-scoring node wins. The scheduler writes spec.nodeName to the Pod object. Custom schedulers can be deployed alongside the default scheduler.
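The two phases can be sketched as a filter followed by a max over scores. This Python model is a deliberately small approximation: it filters on free CPU/memory and node labels only, and scores with a LeastAllocated-style average. The Node fields and node names are hypothetical, not scheduler API types:

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    cpu_free_m: int      # allocatable CPU minus existing requests, in millicores
    mem_free_mi: int     # allocatable memory minus existing requests, in Mi
    cpu_total_m: int
    mem_total_mi: int
    labels: dict = field(default_factory=dict)


def feasible(node, cpu_m, mem_mi, node_selector):
    """Filtering phase: the Pod's requests must fit and required labels must match."""
    fits = node.cpu_free_m >= cpu_m and node.mem_free_mi >= mem_mi
    labelled = all(node.labels.get(k) == v for k, v in node_selector.items())
    return fits and labelled


def least_allocated_score(node, cpu_m, mem_mi):
    """Scoring phase: average free fraction after placing the Pod, scaled to 0-100."""
    cpu = (node.cpu_free_m - cpu_m) / node.cpu_total_m
    mem = (node.mem_free_mi - mem_mi) / node.mem_total_mi
    return 100 * (cpu + mem) / 2


def schedule(nodes, cpu_m, mem_mi, node_selector=None):
    """Return the chosen node name, or None if the Pod would stay Pending."""
    node_selector = node_selector or {}
    candidates = [n for n in nodes if feasible(n, cpu_m, mem_mi, node_selector)]
    if not candidates:
        return None
    return max(candidates, key=lambda n: least_allocated_score(n, cpu_m, mem_mi)).name


nodes = [
    Node("node-a", 500, 1024, 4000, 8192, {"disk": "ssd"}),
    Node("node-b", 3000, 6144, 4000, 8192),
    Node("node-c", 2000, 4096, 4000, 8192, {"disk": "ssd"}),
]

print(schedule(nodes, 250, 256))
print(schedule(nodes, 250, 256, {"disk": "ssd"}))
```

The real scheduler runs many more filter and score plugins and combines their weighted results, but the shape is the same: eliminate, rank, bind.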

Workloads: Pods, Deployments, StatefulSets
13
What is a Deployment and how does rolling update work? (Easy)

A Deployment manages a ReplicaSet which manages Pods. It provides declarative updates with rolling update and rollback.

strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1         # create 1 extra Pod above desired count
    maxUnavailable: 0   # never reduce below desired count

Rolling update process:

  1. New ReplicaSet created with new image
  2. Scale new RS up by maxSurge (e.g. 1 new Pod)
  3. Wait for new Pod to be Ready
  4. Scale old RS down by maxUnavailable (if 0: only after new one is ready)
  5. Repeat until old RS has 0 replicas
kubectl rollout status deployment/api
kubectl rollout history deployment/api
kubectl rollout undo deployment/api          # rollback
kubectl rollout undo deployment/api --to-revision=2
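The interaction between maxSurge and maxUnavailable is easier to see in a simulation. The Python sketch below is a simplified model of the steps above, assuming every new Pod becomes Ready as soon as it is created (real rollouts wait on readiness probes) and that both settings are absolute counts rather than percentages:

```python
def rolling_update(desired: int, max_surge: int, max_unavailable: int):
    """Yield (old_replicas, new_replicas) after each reconcile step."""
    if max_surge == 0 and max_unavailable == 0:
        raise ValueError("maxSurge and maxUnavailable cannot both be 0")
    old, new = desired, 0
    yield old, new
    while old > 0 or new < desired:
        # Scale up: total Pods may not exceed desired + maxSurge.
        scale_up = min(desired - new, desired + max_surge - (old + new))
        new += scale_up
        # Scale down: available Pods may not drop below desired - maxUnavailable.
        scale_down = min(old, (old + new) - (desired - max_unavailable))
        old -= scale_down
        yield old, new


for old, new in rolling_update(desired=3, max_surge=1, max_unavailable=0):
    print(f"old={old} new={new}")
```

With maxSurge: 1 and maxUnavailable: 0, capacity never drops below the desired count, at the cost of one extra Pod's worth of resources during the rollout.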
14
What is a StatefulSet and how does it differ from a Deployment? (Medium)
Feature | Deployment | StatefulSet
Pod identity | Random names (web-5x9kq) | Stable ordinal names (db-0, db-1, db-2)
DNS hostname | None per Pod | Stable: db-0.my-svc.ns.svc.cluster.local
Storage | Shared or ephemeral | Per-Pod PVC (volumeClaimTemplates)
Scaling | Parallel (random order) | Sequential (0→1→2 up, 2→1→0 down)
Use for | Stateless apps | Databases, Kafka, Zookeeper, Elasticsearch

StatefulSets require a Headless Service (clusterIP: None) to create the stable DNS records per Pod.

15
What is a DaemonSet and when do you use it? (Easy)

A DaemonSet ensures one Pod runs on every node (or a selected subset of nodes). When a new node joins the cluster, the DaemonSet automatically adds a Pod to it.

Use cases:

  • Log collection — Fluentd/Filebeat on every node
  • Monitoring — Prometheus Node Exporter, Datadog agent
  • Networking — CNI plugins (Calico, Flannel), kube-proxy itself is a DaemonSet
  • Security — Falco runtime threat detection
  • Storage — Ceph node agents
kubectl get daemonsets -n kube-system
# DESIRED  CURRENT  READY  UP-TO-DATE  AVAILABLE
# 3        3        3      3           3
16
What is a Job and a CronJob? (Easy)
  • Job — runs one or more Pods to completion. When the Pod succeeds, the Job is done. Use for: database migrations, batch processing, report generation, one-off tasks.
  • CronJob — creates Jobs on a schedule (cron syntax). Use for: nightly backups, daily email reports, scheduled cleanup tasks.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-backup
spec:
  schedule: "0 2 * * *"     # 2 AM every day
  concurrencyPolicy: Forbid  # don't start if previous is still running
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      backoffLimit: 2        # retry up to 2 times on failure
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: backup
            image: backup-tool:latest
17
What is a PodDisruptionBudget (PDB)? (Medium)

A PDB limits the number of Pods of a workload that can be voluntarily disrupted simultaneously (node drain, cluster upgrade). It prevents downtime during maintenance.

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 2         # at least 2 pods must always be available
  # OR: maxUnavailable: 1 # at most 1 pod can be down at once
  selector:
    matchLabels:
      app: api

When kubectl drain is run on a node, Kubernetes respects PDBs. If draining would violate the PDB, the drain waits. PDBs only apply to voluntary disruptions (drain, node upgrade) — node failures are involuntary and bypass PDB.

Always create a PDB for production workloads with 2+ replicas. Without it, a cluster upgrade can take down all your pods simultaneously.
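Whether a drain can evict a Pod comes down to simple arithmetic on the budget. A Python sketch of that calculation, assuming integer values only (percentages are also allowed in a real PDB but are not modelled here); the function name is illustrative:

```python
from typing import Optional


def disruptions_allowed(healthy: int, desired: int,
                        min_available: Optional[int] = None,
                        max_unavailable: Optional[int] = None) -> int:
    """How many Pods may be voluntarily evicted right now."""
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available / max_unavailable")
    if min_available is not None:
        required_healthy = min_available
    else:
        required_healthy = desired - max_unavailable
    return max(0, healthy - required_healthy)


print(disruptions_allowed(healthy=3, desired=3, min_available=2))
print(disruptions_allowed(healthy=2, desired=3, min_available=2))
```

A common pitfall falls out of the same formula: minAvailable equal to the replica count leaves zero allowed disruptions, so node drains block until someone intervenes.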
18
What are init containers and when do you use them? (Medium)

Init containers run to completion before any app container starts. They run in order, one at a time. If an init container fails, the kubelet restarts it until it succeeds; with restartPolicy: Never, the whole Pod is marked Failed instead.

initContainers:
- name: wait-for-db
  image: busybox
  command: ['sh', '-c', 'until nc -z postgres-svc 5432; do sleep 2; done']

- name: run-migrations
  image: my-app:latest
  command: ['./migrate.sh']
  env:
  - name: DB_URL
    valueFrom:
      secretKeyRef:
        name: db-secret
        key: url

Common init container patterns:

  • Wait for a dependency (database, Kafka) to be ready
  • Run database migrations before the app starts
  • Clone a git repo or download config files into a shared volume
  • Register with a service mesh control plane
19
What is the difference between HPA and VPA? (Medium)
  • HPA (Horizontal Pod Autoscaler) — adds/removes Pod replicas based on CPU, memory, or custom metrics (Prometheus via KEDA). Fast response, no Pod restart. Best for stateless apps with variable traffic.
  • VPA (Vertical Pod Autoscaler) — adjusts CPU/memory requests/limits on existing Pods. Requires Pod restart to apply (evict and reschedule). Best for workloads where you need right-sized resources but can't easily add more replicas (databases, batch jobs).
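# HPA: scale between 2 and 10 pods based on CPU
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa            # illustrative name
spec:
  scaleTargetRef:          # required: the workload to scale (name assumed here)
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # percent of the CPU request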
Don't use HPA and VPA together on the same resource metric — they'll fight. Use VPA in Off mode for recommendations only, HPA for actual scaling.
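Interviewers often ask how the HPA arrives at a replica count. The core formula is desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric), skipped when the ratio is within a tolerance (0.1 by default). A minimal Python sketch of that calculation; it leaves out stabilization windows, scaling policies, and the handling of not-yet-ready Pods:

```python
import math


def desired_replicas(current: int, current_util: float, target_util: float,
                     min_replicas: int, max_replicas: int,
                     tolerance: float = 0.1) -> int:
    """Replica count the HPA would aim for, clamped to [min_replicas, max_replicas]."""
    ratio = current_util / target_util
    if abs(ratio - 1.0) <= tolerance:
        return current  # close enough to the target: no scaling
    return max(min_replicas, min(max_replicas, math.ceil(current * ratio)))


# 4 replicas averaging 105% of requested CPU against a 70% target.
print(desired_replicas(4, 105, 70, min_replicas=2, max_replicas=10))
```

Utilization is measured against the container's CPU request, which is one more reason to always set requests.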
20
What is Cluster Autoscaler? (Medium)

Cluster Autoscaler automatically adjusts the number of nodes in a cluster when:

  • Scale up — Pods are Pending because no node has enough resources. CA provisions a new node from the cloud provider's node group.
  • Scale down — a node is underutilised (all Pods could fit on other nodes) for a sustained period. CA drains the node and terminates it.

CA works with the cloud provider's autoscaling groups (AWS ASG, GCP MIG). On EKS: managed node groups + CA, or use Karpenter (AWS-native, faster and more flexible than CA — provisions the exact instance type needed, not just the configured type).

# CA respects PDBs during scale-down (won't drain a node if it violates a PDB)
# CA does NOT scale down nodes with:
# - Pods with local storage
# - Pods not managed by a controller
# - Pods with restrictive PDBs
21
What is a sidecar container and what are common patterns? (Medium)

A sidecar is a helper container running in the same Pod as the main app, sharing its network and volumes. The main app doesn't know or care about the sidecar — it adds cross-cutting behaviour transparently.

Common sidecar patterns:

  • Log forwarding — app writes logs to a shared volume; Fluentd sidecar tails and ships to Elasticsearch
  • Service mesh proxy — Envoy proxy injected by Istio intercepts all inbound/outbound traffic for mTLS, observability, retry logic
  • Config sync — periodic git-pull to shared volume, app reads config files
  • Metrics adapter — translates app-specific metrics to Prometheus format (when you can't modify the app)
  • TLS termination — nginx/Envoy sidecar handles TLS; main app only handles HTTP on localhost
22
What is the difference between kubectl delete pod and a Pod being evicted? (Medium)
  • kubectl delete pod — sends SIGTERM to the container, waits terminationGracePeriodSeconds (default 30s), then SIGKILL. The Pod's controller (Deployment/RS) immediately creates a replacement Pod.
  • Eviction — kubelet evicts Pods when the node is under pressure (memory, disk). BestEffort Pods evicted first, then Burstable, then Guaranteed. QoS class determines eviction order. The controller replaces evicted Pods on another node.
  • Preemption — scheduler evicts lower-priority Pods to make room for a higher-priority Pod that can't be scheduled.

Pre-stop hook lets you run cleanup before SIGTERM — useful for graceful connection draining in load balancers or draining from service mesh before shutdown.

Networking & Services
23
What are the Kubernetes Service types? (Easy)
  • ClusterIP (default): stable internal IP and DNS name. Only reachable from within the cluster. Use for internal service-to-service communication.
  • NodePort: exposes the service on a high port (30000–32767 by default) on every node's IP. Accessible from outside using NodeIP:NodePort. Mainly for dev/test or when you control the load balancer yourself.
  • LoadBalancer: provisions a cloud load balancer (e.g. an AWS NLB or a GCP network load balancer; ALBs are provisioned for Ingress, not for Services). The cloud assigns an external IP/DNS. Use for: production external traffic entry for individual services.
  • ExternalName: maps the Service to an external DNS name (CNAME). No proxy. Use for: routing to an external database by a K8s-internal name so you can change the external endpoint without updating apps.
For most production setups: internal services use ClusterIP, external traffic enters via one Ingress (not multiple LoadBalancer services — expensive and harder to manage).
24
What is a Headless Service? (Medium)

A Headless Service has clusterIP: None. Instead of returning a single virtual IP, DNS returns the individual Pod IPs directly.

apiVersion: v1
kind: Service
metadata:
  name: cassandra
spec:
  clusterIP: None       # headless
  selector:
    app: cassandra
  ports:
  - port: 9042

DNS for headless service:

# Normal Service:   cassandra.ns.svc.cluster.local → 10.96.0.100 (single VIP)
# Headless Service: cassandra.ns.svc.cluster.local → 10.244.1.5, 10.244.2.7, 10.244.3.2

# StatefulSet Pods also get:
# cassandra-0.cassandra.ns.svc.cluster.local → 10.244.1.5  (stable, individual)

Required for StatefulSets so each Pod gets a stable, individually addressable hostname. Cassandra, Kafka, etcd clusters need to directly address specific peers.

25
What is an Ingress and how does it differ from a LoadBalancer service? (Medium)
  • LoadBalancer Service: one cloud load balancer per service. With 20 services, you get 20 load balancers and 20 external IPs. Expensive. No HTTP-level routing.
  • Ingress: single entry point (one LoadBalancer) with routing rules. Routes HTTP/HTTPS traffic to multiple services based on hostname and path; header-based routing depends on controller-specific annotations. Managed by an Ingress Controller (nginx, Traefik, HAProxy, AWS ALB controller).
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress        # illustrative name; metadata.name is required
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  ingressClassName: nginx
  tls:
  - hosts: [api.myapp.com]
    secretName: api-tls
  rules:
  - host: api.myapp.com
    http:
      paths:
      - path: /v1
        pathType: Prefix
        backend:
          service:
            name: api-v1-svc
            port: {number: 80}
26
What is a NetworkPolicy? (Medium)

NetworkPolicies are firewall rules for Pods — they restrict inbound (ingress) and outbound (egress) traffic at the Pod level. Implemented by the CNI plugin (Calico, Cilium, Weave).

# Allow only frontend to talk to backend; deny everything else:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-allow-frontend
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes: [Ingress]
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080

Default: no NetworkPolicies = all traffic allowed between all Pods. Best practice: start with a default-deny policy in each namespace, then explicitly allow needed communication. This limits blast radius if one Pod is compromised.

27
How does Kubernetes DNS (CoreDNS) work? (Medium)

CoreDNS runs as a Deployment in kube-system. Every Pod's /etc/resolv.conf points to the CoreDNS ClusterIP.

# DNS resolution patterns:
service-name                          # same namespace
service-name.namespace                # cross-namespace
service-name.namespace.svc            # explicit
service-name.namespace.svc.cluster.local  # fully qualified

# StatefulSet Pod DNS:
pod-0.service-name.namespace.svc.cluster.local

# Examples:
curl http://user-service          # resolves user-service.default.svc.cluster.local
curl http://auth-svc.auth-ns      # cross-namespace

CoreDNS also supports custom DNS entries via the Corefile, forwarding external domains to upstream resolvers, and plugins for service discovery integration.
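Short names such as user-service resolve because the Pod's /etc/resolv.conf carries a search list and an ndots option. The Python sketch below models the order in which a resolver would try candidates, assuming the typical Pod defaults (cluster domain cluster.local, ndots:5, and no extra search domains inherited from the node):

```python
def candidate_names(name: str, namespace: str,
                    cluster_domain: str = "cluster.local", ndots: int = 5) -> list:
    """Fully qualified names a resolver would try, in order."""
    if name.endswith("."):
        return [name]  # already fully qualified: the search list is not applied
    search = [f"{namespace}.svc.{cluster_domain}", f"svc.{cluster_domain}", cluster_domain]
    expanded = [f"{name}.{suffix}." for suffix in search]
    absolute = f"{name}."
    if name.count(".") < ndots:
        # Fewer dots than ndots: walk the search list first, then try the name as-is.
        return expanded + [absolute]
    return [absolute] + expanded


for candidate in candidate_names("auth-svc.auth-ns", "default"):
    print(candidate)
```

The same mechanism explains a common latency complaint: an external name like example.com has fewer than five dots, so it is tried against every search suffix before the real lookup. Appending a trailing dot avoids the extra queries.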

28
What is a CNI plugin and what does it do? (Medium)

CNI (Container Network Interface) plugins implement Pod networking. When a Pod is created, the CNI plugin:

  1. Assigns an IP address from the Pod CIDR to the Pod
  2. Creates a veth pair connecting the Pod's network namespace to the node's network
  3. Programs routing so other Pods and nodes can reach this Pod

Popular CNI plugins:

  • Calico — BGP-based routing, excellent NetworkPolicy support, high performance. Most popular in production.
  • Cilium — eBPF-based, replaces kube-proxy, advanced observability (Hubble), L7 NetworkPolicies, service mesh capabilities. Growing rapidly.
  • Flannel — simple VXLAN overlay. Minimal features. Good for bare metal dev clusters.
  • AWS VPC CNI — uses real VPC IPs for Pods (no overlay). Native AWS routing, good performance, EC2 ENI secondary IP limits apply.
29
What is a Service Mesh (Istio) and when do you need it? (Hard)

A service mesh is a dedicated infrastructure layer for service-to-service communication, implemented via sidecar proxies (Envoy) injected into each Pod.

What it provides (without changing app code):

  • mTLS — mutual TLS between all services automatically; zero-trust networking
  • Traffic management — fine-grained routing (canary: 5% to v2), retries, timeouts, circuit breaking
  • Observability — distributed traces, per-service metrics (latency, error rate, RPS) without instrumentation
  • Auth policies — allow Service A to call Service B, deny everything else — enforced at the proxy level

When you need it: 5+ microservices in production, compliance requires encryption in transit for internal traffic, you need canary deployments with traffic splitting, or you need distributed tracing without modifying every service.

Service meshes add latency (sidecar hop) and operational complexity. Don't add Istio to a 3-service app. Consider Cilium as a lighter alternative (eBPF, no sidecars).
30
How does kube-proxy implement Service routing? (Hard)

kube-proxy implements Services by programming network rules on every node:

iptables mode (default):

  • For each Service, creates iptables DNAT rules that redirect traffic to ClusterIP:Port to a random Pod IP
  • Rules are evaluated linearly — poor performance with 10,000+ services

IPVS mode (production at scale):

  • Uses Linux kernel IPVS (IP Virtual Server) — a proper load balancer in the kernel
  • Hash table lookup: O(1) regardless of number of services. Use this for clusters with hundreds of services.
  • Supports more LB algorithms: round-robin, least-connections, destination hash

eBPF mode (Cilium without kube-proxy):

  • Replaces kube-proxy entirely. Bypasses iptables stack. Best performance.
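In iptables mode the "random Pod IP" comes from a chain of rules that each match with a fixed probability: 1/n for the first endpoint, 1/(n-1) for the second, and so on, with the last rule always matching. The Python sketch below checks that this chain gives every endpoint an equal share; the function names are illustrative:

```python
from fractions import Fraction


def rule_probabilities(endpoints: int) -> list:
    """Per-rule match probability: 1/n, 1/(n-1), ..., 1."""
    return [Fraction(1, endpoints - i) for i in range(endpoints)]


def effective_share(endpoints: int) -> list:
    """Overall chance that a new connection lands on each endpoint."""
    shares = []
    reach = Fraction(1)  # probability that evaluation reaches the current rule
    for probability in rule_probabilities(endpoints):
        shares.append(reach * probability)
        reach *= 1 - probability
    return shares


print(rule_probabilities(3))
print(effective_share(3))
```

The sequential evaluation is also the source of the linear cost noted above: a packet for the last endpoint has passed every earlier rule first.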
31
What is DNS-based service discovery in Kubernetes? (Easy)

When a Service is created, Kubernetes automatically creates a DNS record. Any Pod can resolve the service by name — no hardcoded IPs needed.

# Automatic DNS records created for:
# Service "payment-svc" in namespace "default":
payment-svc.default.svc.cluster.local → 10.96.15.200 (ClusterIP)

# Environment variables also injected into every Pod:
PAYMENT_SVC_SERVICE_HOST=10.96.15.200
PAYMENT_SVC_SERVICE_PORT=8080

# Best practice: use DNS names, not env vars
# (env vars are only set for services that existed before the Pod started)
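The environment variable names follow a fixed rule: the Service name is upper-cased and dashes become underscores. A small Python sketch of that naming rule (the helper is illustrative and covers only the two variables shown above):

```python
def service_env_vars(name: str, cluster_ip: str, port: int) -> dict:
    """Build the HOST/PORT variables injected for a Service."""
    prefix = name.upper().replace("-", "_")
    return {
        f"{prefix}_SERVICE_HOST": cluster_ip,
        f"{prefix}_SERVICE_PORT": str(port),
    }


print(service_env_vars("payment-svc", "10.96.15.200", 8080))
```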
32
What is a Gateway API and how does it improve on Ingress? (Medium)

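Gateway API is the next-generation K8s networking API and the successor to Ingress. It is versioned separately from Kubernetes and installed as CRDs; its core resources reached GA in Gateway API v1.0 (October 2023). Ingress remains supported but is feature-frozen. Gateway API separates concerns into roles: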

  • GatewayClass — managed by infrastructure providers (which load balancer implementation to use)
  • Gateway — cluster operators define the entry point (ports, protocols, TLS)
  • HTTPRoute — developers define routing rules (path, header-based routing to services)

Improvements over Ingress:

  • Role separation: platform team controls Gateway, app teams control Routes
  • Native support for traffic splitting (canary), header matching, method matching — no annotations
  • Supports TCP, UDP, gRPC routes (Ingress is HTTP/HTTPS only)
  • Cross-namespace routing

Supported by Envoy Gateway, Istio, nginx, Contour, Kong, and AWS ALB.

Storage & Configuration
33
What is the difference between a PersistentVolume (PV), PersistentVolumeClaim (PVC), and StorageClass? (Medium)
  • PersistentVolume (PV) — a piece of storage provisioned by an admin or dynamically by a StorageClass. Has a lifecycle independent of Pods. Backed by: EBS, NFS, Ceph, local disk.
  • PersistentVolumeClaim (PVC) — a request for storage by a Pod. Specifies size and access mode. K8s binds a matching PV to the PVC.
  • StorageClass — defines how to dynamically provision PVs on demand. When a PVC requests a StorageClass, K8s automatically creates a PV.
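# PVC: request 20Gi of fast SSD storage
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc             # illustrative name; metadata.name is required
spec:
  storageClassName: gp3      # assumes a StorageClass named gp3 (AWS EBS gp3) exists
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 20Gi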

# Access modes: ReadWriteOnce (single node), ReadOnlyMany, ReadWriteMany (NFS/EFS)
34
What is the difference between a ConfigMap and a Secret? (Easy)
  • ConfigMap — stores non-sensitive config: feature flags, database hostnames, log levels, configuration files. Stored in plaintext in etcd.
  • Secret — stores sensitive data. Base64-encoded (NOT encrypted) in etcd by default. Encrypt etcd at rest with a KMS provider key for real security. Mounted as env vars or files.
# Secret: inject credentials as env vars
env:
- name: DB_PASSWORD
  valueFrom:
    secretKeyRef:
      name: db-credentials
      key: password

# ConfigMap: inject config file as a volume
volumes:
- name: app-config
  configMap:
    name: app-configuration
volumeMounts:
- name: app-config
  mountPath: /etc/app/config
Base64 is NOT encryption. Always enable etcd encryption at rest. Better: use external secrets (Vault, AWS Secrets Manager) via External Secrets Operator.
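To see why base64 offers no protection, it helps to decode a value by hand. A short Python sketch using only the standard library (the sample password is made up):

```python
import base64


def encode_secret_value(plaintext: str) -> str:
    """Encode a value the way it appears under data: in a Secret manifest."""
    return base64.b64encode(plaintext.encode("utf-8")).decode("ascii")


def decode_secret_value(encoded: str) -> str:
    """Reverse the encoding; no key or password is involved."""
    return base64.b64decode(encoded).decode("utf-8")


stored = encode_secret_value("s3cr3t-password")  # hypothetical credential
print(stored)
print(decode_secret_value(stored))
```

Anyone who can read the Secret object, or the etcd data behind it, can do the same, which is why RBAC on Secrets and encryption at rest both matter.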
35
What is the Container Storage Interface (CSI)? (Medium)

CSI is a standard interface that allows storage vendors to develop drivers that work with any container orchestrator (K8s, Mesos, etc.). Storage vendors implement the CSI API; K8s calls it to create/delete/mount volumes.

  • Dynamic provisioning — CSI driver creates the actual cloud disk (EBS, Filestore) when a PVC is created
  • Volume snapshots — CSI snapshotting API lets K8s take backups of PVCs consistently
  • Volume expansion — resize PVCs live without downtime (supported by EBS, Filestore)
  • Raw block volumes — expose block device directly to Pod (databases that manage their own FS)

Popular CSI drivers: aws-ebs-csi-driver, efs-csi-driver, gce-pd-csi-driver, ceph-csi, longhorn.

36
How do you share files between containers in the same Pod? (Easy)
# emptyDir volume: created when Pod starts, deleted when Pod is removed
# Perfect for sharing data between sidecar and main container

spec:
  volumes:
  - name: shared-logs
    emptyDir: {}             # in memory: emptyDir: {medium: Memory}

  containers:
  - name: app
    image: my-app
    volumeMounts:
    - name: shared-logs
      mountPath: /var/log/app

  - name: log-shipper
    image: fluentd
    volumeMounts:
    - name: shared-logs
      mountPath: /var/log/input
      readOnly: true

emptyDir is also used for large in-memory caches or scratch space for batch jobs. emptyDir.medium: Memory uses tmpfs (RAM-backed) — faster but counts toward memory limits.

37
What is External Secrets Operator and why is it preferred over native Secrets? (Medium)

External Secrets Operator (ESO) syncs secrets from external systems (AWS Secrets Manager, HashiCorp Vault, Azure Key Vault, GCP Secret Manager) into Kubernetes Secrets automatically.

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-secret
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: ClusterSecretStore
  target:
    name: db-credentials    # creates this K8s Secret
  data:
  - secretKey: password
    remoteRef:
      key: prod/db/password  # AWS Secrets Manager key

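Why preferred: secret values never need to be committed to Git (GitOps safe), the synced Secret is refreshed on each refreshInterval so rotations in the external system propagate automatically, there is a single source of truth across clouds, and the external system keeps the audit trail. The synced copy is still a regular Kubernetes Secret stored in etcd, so etcd encryption at rest and RBAC on Secrets remain necessary.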

38
What is a projected volume? (Medium)

A projected volume maps several volume sources (ServiceAccount token, Secrets, ConfigMaps, downward API) into a single directory in a Pod. Useful for combining multiple config sources into one mount point.

volumes:
- name: all-configs
  projected:
    sources:
    - secret:
        name: db-credentials
    - configMap:
        name: app-config
    - serviceAccountToken:
        path: token
        expirationSeconds: 3600
    - downwardAPI:
        items:
        - path: namespace
          fieldRef:
            fieldPath: metadata.namespace

The Downward API exposes Pod metadata (name, namespace, labels, resource limits) to containers as files or env vars — without calling the K8s API.
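
The projected volume above shows the file form. The env-var form can be sketched as follows, assuming a hypothetical Pod named `downward-demo` and a busybox image chosen only for illustration:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: downward-demo          # hypothetical name
spec:
  containers:
  - name: app
    image: busybox:1.36        # illustrative image
    command: ["sh", "-c", "echo $POD_NAME in $POD_NAMESPACE; sleep 3600"]
    resources:
      limits:
        memory: 128Mi
    env:
    - name: POD_NAME
      valueFrom:
        fieldRef:
          fieldPath: metadata.name
    - name: POD_NAMESPACE
      valueFrom:
        fieldRef:
          fieldPath: metadata.namespace
    - name: MEMORY_LIMIT       # exposed in bytes by default
      valueFrom:
        resourceFieldRef:
          containerName: app
          resource: limits.memory
```

Env vars are fixed at container start, whereas Downward API files are refreshed when labels or annotations change.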

Security & RBAC
39
Explain Kubernetes RBAC — Roles, ClusterRoles, RoleBindings.Medium
  • Role — grants permissions within a namespace (get/list/watch Pods in "default")
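  • ClusterRole — defines permissions on cluster-scoped resources (nodes, PersistentVolumes) or on namespaced resources across all namespaces; it can also be reused inside a single namespace via a RoleBinding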
  • RoleBinding — binds a Role/ClusterRole to a User, Group, or ServiceAccount within a namespace
  • ClusterRoleBinding — binds a ClusterRole to a subject cluster-wide
# Role: allow reading pods in "monitoring" namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: monitoring
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods", "pods/log"]
  verbs: ["get", "list", "watch"]

---
# Bind to a ServiceAccount:
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-reader-binding
  namespace: monitoring
subjects:
- kind: ServiceAccount
  name: metrics-collector
  namespace: monitoring
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
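
A ClusterRole can also be bound with a RoleBinding, which limits the grant to that binding's namespace. A minimal sketch, assuming hypothetical names `pod-reader-shared` and `read-pods` and a `staging` namespace:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: pod-reader-shared      # hypothetical; ClusterRoles have no namespace
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods              # hypothetical
  namespace: staging           # permissions apply only in this namespace
subjects:
- kind: ServiceAccount
  name: metrics-collector
  namespace: monitoring
roleRef:
  kind: ClusterRole
  name: pod-reader-shared
  apiGroup: rbac.authorization.k8s.io
```

This lets one permission set be defined once and granted namespace by namespace, instead of copying an identical Role into each namespace.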
40
What is a ServiceAccount and why should Pods have their own?Medium

A ServiceAccount provides an identity to processes running in a Pod, allowing them to authenticate to the K8s API and to external services (AWS via IRSA, GCP via Workload Identity).

# Bad: use default service account (same for all Pods in namespace)
# Good: dedicated service account per app with minimum permissions

apiVersion: v1
kind: ServiceAccount
metadata:
  name: payment-service
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456:role/payment-role  # IRSA

---
# In the Deployment, set it on the Pod template:
spec:
  template:
    spec:
      serviceAccountName: payment-service
      automountServiceAccountToken: false  # don't mount if app doesn't call K8s API

IRSA (IAM Roles for Service Accounts) on EKS: the K8s ServiceAccount is mapped to an IAM role. The Pod gets temporary AWS credentials via the projected token — no long-lived access keys in Pods.

41
What is Pod Security Admission (PSA) and what replaced PodSecurityPolicy?Medium

PodSecurityPolicy (PSP) was deprecated in K8s 1.21 and removed in 1.25. Replaced by Pod Security Admission (built-in) and external admission webhooks (OPA/Gatekeeper, Kyverno).

PSA enforces three pre-defined security profiles at namespace level:

  • Privileged — unrestricted. Only for trusted system namespaces.
  • Baseline — prevents known privilege escalations. Minimum restrictions for general workloads.
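  • Restricted — heavily restricted. Follows pod hardening best practices (must run as non-root, no privilege escalation, drop all capabilities, seccomp profile set, limited volume types). It does not require a read-only root filesystem; set that separately.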
# Apply restricted profile to a namespace:
kubectl label namespace production \
  pod-security.kubernetes.io/enforce=restricted \
  pod-security.kubernetes.io/warn=restricted \
  pod-security.kubernetes.io/audit=restricted
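
The same labels can be kept declaratively in the Namespace manifest, which suits GitOps workflows. A sketch of a gradual rollout, assuming you want to enforce Baseline first while only warning and auditing against Restricted:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: baseline          # violations are rejected
    pod-security.kubernetes.io/enforce-version: latest    # or pin to a specific minor version
    pod-security.kubernetes.io/warn: restricted           # violations return a client warning
    pod-security.kubernetes.io/audit: restricted          # violations are recorded in the audit log
```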
42
What are admission controllers and admission webhooks?Hard

Admission controllers intercept API requests after authentication/authorisation but before the object is persisted to etcd. They can mutate or validate objects.

Built-in admission controllers:

  • LimitRanger — sets default resource requests/limits if not specified
  • ResourceQuota — enforces namespace resource quotas
  • PodSecurity — enforces PSA policies
  • MutatingAdmissionWebhook — calls external webhook to mutate objects (inject sidecars, add labels)
  • ValidatingAdmissionWebhook — calls external webhook to validate objects (reject if policy violated)

Popular tools built on admission webhooks:

  • OPA Gatekeeper — defines policies in Rego language; validates objects. E.g. "all Pods must have resource limits."
  • Kyverno — K8s-native policy engine using YAML. Easier to write than Rego. Can mutate and validate.
  • Istio injection webhook — automatically injects Envoy sidecar into Pods in labelled namespaces.
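
The "all Pods must have resource limits" example could be written as a Kyverno policy roughly as follows. This is a sketch, not a drop-in policy: the policy and rule names are hypothetical, and newer Kyverno releases move `validationFailureAction` to a per-rule `failureAction` field, so check the version you run.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-resource-limits      # hypothetical name
spec:
  validationFailureAction: Audit     # report only; switch to Enforce to reject
  background: true
  rules:
  - name: check-container-limits     # hypothetical name
    match:
      any:
      - resources:
          kinds:
          - Pod
    validate:
      message: "CPU and memory limits are required for every container."
      pattern:
        spec:
          containers:
          - resources:
              limits:
                cpu: "?*"            # any non-empty value
                memory: "?*"
```

Starting in Audit mode shows which existing workloads would fail before anything is blocked.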
43
What are container security best practices in Kubernetes?Medium
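# Container-level securityContext (spec.containers[].securityContext):
securityContext:
  runAsNonRoot: true          # don't run as root
  runAsUser: 1000             # specific non-root UID
  runAsGroup: 3000
  readOnlyRootFilesystem: true  # root filesystem is read-only; mount an emptyDir for writable paths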
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]             # drop all Linux capabilities
    add: ["NET_BIND_SERVICE"] # add only what's needed

  seccompProfile:
    type: RuntimeDefault       # restrict syscalls via seccomp

Additional hardening:

  • Use minimal base images (distroless, scratch, Alpine)
  • Scan images with Trivy, Grype, Snyk in CI
  • Never use hostPID: true, hostNetwork: true, or privileged: true unless absolutely required
  • Use OCI image signing (Cosign) and verify at admission
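
A read-only root filesystem often breaks apps that write to /tmp. One common workaround is sketched below, assuming a hypothetical Pod name and a placeholder image reference:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hardened-app                            # hypothetical name
spec:
  securityContext:                              # Pod-level defaults for all containers
    runAsNonRoot: true
    runAsUser: 1000
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: app
    image: registry.example.com/my-app:1.4.2    # placeholder image
    securityContext:                            # these fields exist only at container level
      readOnlyRootFilesystem: true
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]
    volumeMounts:
    - name: tmp
      mountPath: /tmp                           # writable scratch space
  volumes:
  - name: tmp
    emptyDir: {}
```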
44
What is etcd and how do you secure and back it up?Hard

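etcd is the distributed key-value store that holds all cluster state. The API server is the only component that reads from and writes to it directly; every other component goes through the API server. Losing etcd without a backup means losing the cluster.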

Security:

  • TLS client certificates for all access (the API server uses --etcd-cafile, --etcd-certfile, and --etcd-keyfile)
  • Encryption at rest: configure EncryptionConfiguration with a KMS provider to encrypt Secrets in etcd
  • Firewall: etcd listens on port 2379 (client traffic) and 2380 (peer traffic). Block all access except from control plane nodes.
  • Run etcd on dedicated nodes, not shared with workloads
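
An EncryptionConfiguration for the encryption-at-rest item might look like the sketch below. The plugin name and socket path are assumptions that depend on the KMS plugin you deploy; the file is passed to the API server via --encryption-provider-config.

```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
- resources:
  - secrets
  providers:
  - kms:
      apiVersion: v2
      name: my-kms-plugin                          # assumed plugin name
      endpoint: unix:///var/run/kmsplugin/socket.sock  # assumed socket path
      timeout: 3s
  - identity: {}    # fallback so Secrets written before encryption was enabled stay readable
```

The first provider in the list is used for writes, so the KMS entry must come before `identity`.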

Backup:

# Snapshot etcd to a file:
SNAPSHOT=/backup/etcd-$(date +%Y%m%d).db
ETCDCTL_API=3 etcdctl snapshot save "$SNAPSHOT" \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key

# Verify snapshot (etcdutl replaces the deprecated "etcdctl snapshot status" from etcd 3.5):
etcdutl snapshot status "$SNAPSHOT" --write-out=table

Automate snapshots every 30 minutes and ship to S3. Test restore quarterly.

45
What is Falco and how does runtime security work in K8s?Medium

Falco is a runtime security tool that detects anomalous behaviour in containers by monitoring system calls (via eBPF/kernel module).

What Falco detects:

  • Shell opened inside a container (exec /bin/bash)
  • Sensitive file read (/etc/shadow, /etc/kubernetes/admin.conf)
  • Network connection to unexpected destination
  • Privilege escalation attempt
  • Container running as root
  • Write to unexpected filesystem path

Rules are defined in YAML. Alerts go to stdout, syslog, a file, or an HTTP endpoint; Falcosidekick can forward them to Slack, PagerDuty, and many other targets. Falco runs as a DaemonSet, one Pod per node, privileged.
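
A rule for the first detection in the list (a shell opened inside a container) might look like this sketch. It assumes the `spawned_process` and `container` macros from Falco's default ruleset are loaded; the rule name and shell list are illustrative.

```yaml
- rule: Interactive shell in container     # hypothetical rule name
  desc: Detect a shell process started inside any container
  condition: >
    spawned_process and container
    and proc.name in (bash, sh, zsh, ash)
  output: >
    Shell started in container
    (user=%user.name container=%container.name
    image=%container.image.repository cmd=%proc.cmdline)
  priority: WARNING
  tags: [container, shell]
```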

46
What is image pull policy and how do you manage private registries?Easy
  • Always — always pull, even if image exists on node. Default when the tag is :latest or omitted.
  • IfNotPresent — pull only if not present on node. Default for versioned tags. Faster startup.
  • Never — never pull; must be pre-loaded on node. For air-gapped environments.
# Private registry credentials via imagePullSecret:
kubectl create secret docker-registry ecr-creds \
  --docker-server=123456.dkr.ecr.us-east-1.amazonaws.com \
  --docker-username=AWS \
  --docker-password=$(aws ecr get-login-password)

# Reference in Pod:
spec:
  imagePullSecrets:
  - name: ecr-creds

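On EKS with ECR: attach an ECR read-only policy (such as AmazonEC2ContainerRegistryReadOnly) to the node IAM role. The kubelet's ECR credential provider then fetches and refreshes registry tokens automatically, so no imagePullSecrets are needed. ECR tokens expire after 12 hours, which means a static secret like the one above goes stale unless something refreshes it on a schedule.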

Helm, Scaling & Troubleshooting
47
What is Helm and what problem does it solve?Easy

Helm is the package manager for Kubernetes. A chart is a package of K8s YAML templates with parameterised values. Helm renders the templates and applies them as a release.

Problems Helm solves:

  • Templating — don't repeat app-name, image-tag, replicas in 10 YAML files
  • Release management — track what's deployed (helm list), upgrade, rollback
  • Sharing — charts in public repositories (ArtifactHub) for nginx, cert-manager, Prometheus, etc.
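helm install my-nginx nginx-stable/nginx-ingress \
  --set controller.replicaCount=2 \
  --namespace ingress-nginx --create-namespace

# Later commands must target the same namespace as the release:
helm upgrade my-nginx nginx-stable/nginx-ingress \
  --set controller.replicaCount=3 --namespace ingress-nginx
helm rollback my-nginx 1 --namespace ingress-nginx   # rollback to revision 1
helm uninstall my-nginx --namespace ingress-nginx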
48
What is Kustomize and how does it differ from Helm?Medium
  • Helm — Go template engine. Chart = templates + values. Powerful but YAML becomes complex (conditionals, loops in templates). Release state tracked in K8s secrets. Better for distributing to external users.
  • Kustomize — overlay system. Start with base K8s YAML (no templating), apply patches per environment. Native in kubectl (kubectl apply -k). No release tracking. Pure YAML, easier to review diffs.
# Kustomize structure:
base/
  kustomization.yaml     # lists deployment.yaml and service.yaml as resources
  deployment.yaml        # base with common config
  service.yaml
overlays/
  dev/
    kustomization.yaml   # patches: replicas: 1, image: dev
  prod/
    kustomization.yaml   # patches: replicas: 5, image: prod

kubectl apply -k overlays/prod/

Many teams use both: Helm for third-party charts (Prometheus, cert-manager), Kustomize for their own app manifests.
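
For illustration, the prod overlay's kustomization.yaml could look like the sketch below. It assumes the base Deployment and its container image are both named `my-app`; the tag is a placeholder.

```yaml
# overlays/prod/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ../../base            # pulls in base/kustomization.yaml
replicas:
- name: my-app          # assumed Deployment name in the base
  count: 5
images:
- name: my-app          # assumed image name used in the base
  newTag: "1.4.2"       # placeholder tag
```

Running `kubectl kustomize overlays/prod/` prints the rendered manifests without applying them, which is handy for reviewing the diff.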

49
What is GitOps and how does Argo CD implement it?Hard

GitOps uses a Git repository as the single source of truth for the desired cluster state. A GitOps operator continuously reconciles the live cluster state with Git.

Argo CD:

  • Watches a Git repo (or specific paths/branches)
  • Compares live K8s state with desired state in Git
  • Auto-syncs (or prompts for manual sync) when drift is detected
  • Supports Helm, Kustomize, raw YAML, Jsonnet
  • Web UI shows sync status, app health, deployment history
# Deploy: git push → Argo CD detects change → applies to cluster
# Rollback: git revert → Argo CD reverts cluster to previous state
# Audit: git log shows who changed what and when
# Disaster recovery: git clone → apply to new cluster → live in minutes

GitOps benefits: all changes audited in Git, PRs = change management, rollback = git revert, cluster can be fully recreated from Git. Flux CD is the main alternative (also a CNCF graduated project; lighter-weight and CLI-driven, with no built-in web UI).
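
An Argo CD Application ties a Git path to a target cluster and namespace. A minimal sketch, assuming a hypothetical repository URL, an `overlays/prod` path, and Argo CD installed in the `argocd` namespace:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payment-service              # hypothetical app name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/k8s-manifests.git   # placeholder repo
    targetRevision: main
    path: overlays/prod
  destination:
    server: https://kubernetes.default.svc   # the cluster Argo CD runs in
    namespace: production
  syncPolicy:
    automated:
      prune: true        # delete resources removed from Git
      selfHeal: true     # revert manual changes made in the cluster
    syncOptions:
    - CreateNamespace=true
```

Leaving out `syncPolicy.automated` gives the manual-sync behaviour, where Argo CD only reports drift until someone triggers a sync.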

50
What is KEDA and how does it extend Kubernetes autoscaling?Medium

KEDA (Kubernetes Event-driven Autoscaling) scales Pods based on external event sources — not just CPU/memory. It extends HPA to use metrics from external systems.

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-processor-scaler
spec:
  scaleTargetRef:
    name: order-processor     # Deployment to scale
  minReplicaCount: 0          # scale to zero!
  maxReplicaCount: 20
  triggers:
  - type: aws-sqs-queue
    metadata:
      queueURL: https://sqs.us-east-1.amazonaws.com/xxx/orders
      awsRegion: us-east-1    # required by the SQS scaler
      queueLength: "10"       # target of 10 messages per pod

Supported scalers (80+): SQS, Kafka, RabbitMQ, Redis, Prometheus metrics, Datadog, cron schedule, Azure Service Bus, HTTP request rate, and many more.

Scale-to-zero: when queue is empty, KEDA scales Pods to 0 (no cost). When a message arrives, KEDA scales from 0 to 1 and then HPA takes over for further scaling. Great for batch workloads and cost optimisation.
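
The cron scaler from the list above covers predictable load. A sketch assuming a hypothetical `report-worker` Deployment that should only run during UK business hours:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: report-worker-business-hours   # hypothetical name
spec:
  scaleTargetRef:
    name: report-worker                # assumed Deployment
  minReplicaCount: 0                   # nothing runs outside the window
  maxReplicaCount: 5
  triggers:
  - type: cron
    metadata:
      timezone: Europe/London
      start: 0 8 * * 1-5               # scale up at 08:00, Monday to Friday
      end: 0 18 * * 1-5                # scale back down at 18:00
      desiredReplicas: "5"
```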

51
A Pod is stuck in CrashLoopBackOff. How do you diagnose it?Medium
# Step 1: describe the Pod for events
kubectl describe pod <pod-name>
# Look for: OOMKilled (memory limit), image errors, probe failures

# Step 2: check logs (previous container)
kubectl logs <pod-name> --previous
# If multi-container: kubectl logs <pod-name> -c <container-name> --previous

# Step 3: check events in namespace
kubectl get events --sort-by='.lastTimestamp' -n <namespace>

# Step 4: if the container stays up briefly, exec into it
kubectl exec -it <pod-name> -- /bin/sh
# OR: temporarily override the container command in the Pod spec to keep it running:
#   command: ["sleep", "3600"]

# Common causes:
# - Missing env var or secret
# - Dependency (e.g. DB) not reachable at startup, so the app exits
# - Liveness probe kills the container before it finishes starting
# - OOMKilled → increase memory limit or fix the leak
# - Image missing entrypoint
# - Permission denied on volume mount
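
When the restarts come from a liveness probe firing during a slow startup, a startupProbe is the usual fix. A sketch of the container section, assuming hypothetical `/healthz` and `/ready` endpoints on port 8080 and a placeholder image:

```yaml
containers:
- name: app
  image: registry.example.com/my-app:1.4.2   # placeholder image
  ports:
  - containerPort: 8080
  startupProbe:              # liveness and readiness are paused until this succeeds
    httpGet:
      path: /healthz
      port: 8080
    periodSeconds: 10
    failureThreshold: 30     # allows up to 300s for startup
  livenessProbe:
    httpGet:
      path: /healthz
      port: 8080
    periodSeconds: 10
    failureThreshold: 3
  readinessProbe:            # failing this removes the Pod from Service endpoints, no restart
    httpGet:
      path: /ready
      port: 8080
    periodSeconds: 5
```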
52
A Pod is stuck in Pending. What do you check?Medium
kubectl describe pod <pod-name>
# Check Events section at the bottom

# Common causes and fixes:

# 1. Insufficient resources:
# "0/3 nodes are available: 3 Insufficient cpu"
# → Add nodes or reduce CPU request

# 2. No nodes match nodeSelector / nodeAffinity:
# "0/3 nodes are available: 3 node(s) didn't match Pod's node affinity"
# → Check kubectl get nodes --show-labels

# 3. Taint not tolerated:
# "0/3 nodes are available: 3 node(s) had taint {key:NoSchedule}"
# → Add toleration to Pod spec

# 4. PVC not bound:
# "persistentvolumeclaim "my-pvc" not found"
# → Check kubectl get pvc and kubectl describe pvc

# 5. Image pull failing (STATUS shows ErrImagePull / ImagePullBackOff; the Pod phase is still Pending):
# → Check imagePullSecrets, image name, registry access
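
Fixes for causes 1 to 3 all land in the Pod spec. A sketch assuming a hypothetical node label `workload-type=batch` and taint `dedicated=batch:NoSchedule`; match these to what `kubectl describe node` reports:

```yaml
spec:
  nodeSelector:
    workload-type: batch       # must match a label that exists on some node
  tolerations:
  - key: dedicated             # assumed taint key
    operator: Equal
    value: batch
    effect: NoSchedule
  containers:
  - name: worker
    image: registry.example.com/worker:2.0.1   # placeholder image
    resources:
      requests:
        cpu: 250m              # the scheduler fits Pods by requests, not limits
        memory: 256Mi
```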
53
How do you debug networking issues between Pods?Hard
# 1. Test direct Pod-to-Pod connectivity:
kubectl run debug --image=nicolaka/netshoot -it --rm -- bash
# Inside debug pod:
curl http://<pod-ip>:8080
nslookup payment-svc.default.svc.cluster.local
dig payment-svc.default.svc.cluster.local

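# 2. Check Service endpoints:
kubectl get endpoints payment-svc
# On newer clusters, prefer EndpointSlices:
kubectl get endpointslices -l kubernetes.io/service-name=payment-svc
# If empty: selector doesn't match any Ready Pods
# If has IPs: test connectivity to Pod IP directly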

# 3. Check Service and selector match:
kubectl get svc payment-svc -o yaml    # check selector
kubectl get pods --show-labels          # check Pod labels

# 4. Check NetworkPolicy blocking:
kubectl get networkpolicies -A
# A NetworkPolicy in target namespace might be blocking ingress

# 5. Check kube-proxy iptables rules (run on the node; applies to iptables mode only):
sudo iptables -t nat -L KUBE-SERVICES -n | grep payment-svc

# 6. CoreDNS issues:
kubectl logs -n kube-system -l k8s-app=kube-dns
kubectl exec -it debug -- nslookup kubernetes.default
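
If step 4 shows a default-deny policy in the target namespace, traffic has to be allowed explicitly. A sketch assuming the payment Pods carry `app: payment`, the callers carry `app: frontend`, and the service listens on 8080:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-payment   # hypothetical name
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: payment                  # Pods this policy protects
  policyTypes: ["Ingress"]
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend             # same-namespace callers only
    ports:
    - protocol: TCP
      port: 8080
```

A bare `podSelector` under `from` matches Pods in the policy's own namespace only; cross-namespace callers need a `namespaceSelector` as well. NetworkPolicies are enforced only if the CNI plugin supports them.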
54
What is Prometheus + Grafana in Kubernetes and how does monitoring work?Medium

The standard Kubernetes monitoring stack:

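  • Prometheus — time-series metrics DB. Scrapes metrics from Pods via their /metrics endpoint and discovers targets through the K8s API. With the Prometheus Operator, targets are declared as ServiceMonitor or PodMonitor objects; the prometheus.io/scrape: "true" annotation is a convention that only works when the scrape config contains a matching relabel rule.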
  • kube-state-metrics — exports K8s object state (Deployment replica count, Pod phase, PVC status) to Prometheus.
  • Prometheus Node Exporter — node-level metrics (CPU, memory, disk, network) as a DaemonSet.
  • Grafana — dashboards querying Prometheus. Pre-built dashboards for K8s available on grafana.com.
  • Alertmanager — routes Prometheus alerts to PagerDuty, Slack, email.

Deploy via the kube-prometheus-stack Helm chart (bundles all of the above). For managed Prometheus: Amazon Managed Service for Prometheus, Google Cloud Managed Service for Prometheus.
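
With kube-prometheus-stack, scrape targets are normally declared as ServiceMonitor objects. A sketch assuming a Service labelled `app: payment` that exposes a port named `metrics`, and a Helm release called `kube-prometheus-stack` (the `release` label must match whatever selector your Prometheus instance uses):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: payment-svc                   # hypothetical name
  labels:
    release: kube-prometheus-stack    # assumed; must match the Prometheus serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: payment                    # assumed Service label
  endpoints:
  - port: metrics                     # name of the Service port, not a number
    path: /metrics
    interval: 30s
```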

55
What is a Custom Resource Definition (CRD) and what is an Operator?Hard

A CRD extends the Kubernetes API with custom resources — you can define your own object types alongside built-in ones like Pod and Deployment.

# After defining a CRD for "Database":
kubectl get databases
kubectl describe database my-postgres

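# Example: a custom resource managed by the Zalando Postgres Operator
apiVersion: acid.zalan.do/v1
kind: postgresql
metadata:
  name: myteam-my-cluster   # the operator expects the name to start with the teamId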
spec:
  teamId: myteam
  volume:
    size: 100Gi
  numberOfInstances: 3
  postgresql:
    version: "15"

Operator pattern: A custom controller watches the custom resources defined by a CRD and reconciles the actual state to the desired state, running the operational logic a human DBA would do:

  • Create a PostgreSQL cluster with primary + replicas
  • Handle failover when the primary fails
  • Resize storage when requested
  • Take backups on schedule
  • Roll out minor version upgrades safely

Operators encode domain-specific operational knowledge in code. Popular operators: Prometheus Operator, cert-manager, Zalando Postgres Operator, Kafka (Strimzi), Elasticsearch (ECK).
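
The "Database" type used with kubectl above would be registered by a CRD along these lines. This is a sketch: the group `example.com` and the spec fields are hypothetical, and a real operator ships its own schema.

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databases.example.com     # must be <plural>.<group>
spec:
  group: example.com              # hypothetical API group
  scope: Namespaced
  names:
    plural: databases
    singular: database
    kind: Database
    shortNames: ["db"]
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            required: ["engine", "replicas"]
            properties:
              engine:
                type: string      # e.g. "postgres"
              replicas:
                type: integer
                minimum: 1
              storage:
                type: string      # e.g. "100Gi"
```

The CRD alone only makes the API server store and validate Database objects; nothing happens in the cluster until a controller watches them and acts.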

What to Study Next