This guide covers the most commonly asked Kubernetes interview questions in 2026 — from core concepts to production-grade architecture, security, and debugging. Equally useful for CKA certification preparation and senior DevOps / platform engineering interviews.
Kubernetes is an open-source container orchestration platform that automates deploying, scaling, and managing containerised applications.
Problems it solves:
A Pod is the smallest deployable unit in Kubernetes. It wraps one or more containers that share a network namespace (one IP address, so containers reach each other over localhost) and, optionally, storage volumes.
Why not just a container? Real-world applications often need tightly coupled helpers alongside the main process:
Pods are ephemeral — don't store state in them. Use Deployments to manage Pods declaratively.
Namespaces provide virtual cluster isolation within a physical cluster. Resources in one namespace are hidden from other namespaces by default (but network access still works unless restricted by NetworkPolicies).
kubectl get pods -n kube-system # control plane pods
kubectl get pods -n default # user workloads (default)
kubectl create namespace staging # create namespace
kubectl config set-context --current --namespace=staging # set default
When to use multiple namespaces:
- dev, staging, prod within one cluster (better: use separate clusters for prod)
Cluster-scoped resources (Nodes, PersistentVolumes, ClusterRoles) are not namespaced.
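Labels are key/value pairs that identify and select objects, e.g. app: frontend, env: prod, version: v2. They can be queried with label selectors. Annotations hold non-identifying metadata (build info, links, tool configuration) and cannot be used in selectors.
labels:
  app: api
  tier: backend
  version: "3.1"
annotations:
  deployed-by: "github-actions"
  last-commit: "abc1234"
  docs: "https://wiki.company.com/api"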
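What happens when you run kubectl apply -f deployment.yaml?
kubectl reads the YAML and sends an HTTP PATCH/POST request to the API server. Once the resulting Pods exist, the scheduler binds each one to a node by setting spec.nodeName.
Taints mark a node to repel Pods. Tolerations on a Pod allow it to be scheduled onto a tainted node.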
# Taint a node so only GPU workloads run on it:
kubectl taint nodes gpu-node-1 type=gpu:NoSchedule
# Pod toleration to allow scheduling on gpu nodes:
tolerations:
- key: "type"
  operator: "Equal"
  value: "gpu"
  effect: "NoSchedule"
Taint effects:
- NoSchedule: don't schedule new Pods without the toleration (existing Pods stay)
- PreferNoSchedule: try to avoid scheduling, but not guaranteed
- NoExecute: evict existing Pods without the toleration immediately
Common uses: dedicated nodes for GPU workloads, spot instance pools, system Pods on control plane nodes (which are tainted node-role.kubernetes.io/control-plane:NoSchedule).
- Node affinity: constrains which nodes a Pod can be scheduled on, based on node labels. Rules are either requiredDuringSchedulingIgnoredDuringExecution (hard) or preferredDuringSchedulingIgnoredDuringExecution (soft).
- Pod affinity / anti-affinity: schedules Pods close to (podAffinity) or away from (podAntiAffinity) other Pods based on their labels. Used for co-location (cache next to app) or anti-co-location (spread replicas across nodes/AZs).
# Anti-affinity: spread web Pod replicas across nodes
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values: [web]
      topologyKey: kubernetes.io/hostname
For HA: use topologyKey: topology.kubernetes.io/zone to spread across AZs. Also consider topologySpreadConstraints — simpler and more powerful for even distribution.
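A minimal sketch of the topologySpreadConstraints alternative mentioned above, assuming the same app: web label as the anti-affinity example. The maxSkew value and the choice of DoNotSchedule are illustrative, not recommendations.

```yaml
# Goes under the Pod template's spec, next to (or instead of) affinity
topologySpreadConstraints:
- maxSkew: 1                                  # zones may differ by at most 1 matching Pod
  topologyKey: topology.kubernetes.io/zone    # spread across availability zones
  whenUnsatisfiable: DoNotSchedule            # hard rule; ScheduleAnyway makes it a preference
  labelSelector:
    matchLabels:
      app: web                                # assumed label, as in the example above
```

Unlike required anti-affinity on kubernetes.io/hostname, this still lets several replicas share a node or zone once the spread is even, so it keeps working when replicas outnumber nodes.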
resources:
  requests:
    cpu: "250m"       # 0.25 cores guaranteed
    memory: "256Mi"
  limits:
    cpu: "1"          # throttled if it tries to use more
    memory: "512Mi"   # OOMKilled if exceeded
QoS classes determined by request/limit:
- Guaranteed: every container sets requests == limits for both CPU and memory. Highest priority, last to be evicted.
- Burstable: at least one container sets a request or limit, but the Pod does not meet the Guaranteed criteria (typically requests < limits). Middle priority.
- BestEffort: no requests or limits set on any container. Evicted first under memory pressure.
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  periodSeconds: 5
startupProbe:
  httpGet:
    path: /health
    port: 8080
  failureThreshold: 30   # 30 * 10s = 5 min to start
  periodSeconds: 10
The scheduler selects a node for each unscheduled Pod in two phases:
1. Filtering (Predicates) — eliminates nodes that don't meet requirements:
2. Scoring (Priorities) — ranks remaining nodes:
The highest-scoring node wins. The scheduler writes spec.nodeName to the Pod object. Custom schedulers can be deployed alongside the default scheduler.
A Deployment manages a ReplicaSet which manages Pods. It provides declarative updates with rolling update and rollback.
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1         # create 1 extra Pod above desired count
    maxUnavailable: 0   # never reduce below desired count
Rolling update process:
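1. Create new Pods, up to maxSurge (e.g. 1 new Pod)
2. Terminate old Pods, up to maxUnavailable (if 0: only after a new one is ready)
3. Repeat until every Pod runs the new version
kubectl rollout status deployment/api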
kubectl rollout history deployment/api
kubectl rollout undo deployment/api # rollback
kubectl rollout undo deployment/api --to-revision=2
| Feature | Deployment | StatefulSet |
|---|---|---|
| Pod identity | Random names (web-5x9kq) | Stable ordinal names (db-0, db-1, db-2) |
| DNS hostname | None per Pod | Stable: db-0.my-svc.ns.svc.cluster.local |
| Storage | Shared or ephemeral | Per-Pod PVC (volumeClaimTemplates) |
| Scaling | Parallel (random order) | Sequential (0→1→2 up, 2→1→0 down) |
| Use for | Stateless apps | Databases, Kafka, Zookeeper, Elasticsearch |
StatefulSets require a Headless Service (clusterIP: None) to create the stable DNS records per Pod.
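To make the table concrete, here is a hedged sketch of a StatefulSet that would produce the db-0, db-1, db-2 Pods and per-Pod PVCs described above. The image, labels, and storage size are assumptions, and the headless Service my-svc is assumed to exist already.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: my-svc          # the headless Service that provides per-Pod DNS
  replicas: 3                  # creates db-0, db-1, db-2 in order
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
      - name: postgres
        image: postgres:16     # illustrative; a real setup also needs credentials and config
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:        # one PVC per Pod: data-db-0, data-db-1, data-db-2
  - metadata:
      name: data
    spec:
      accessModes: [ReadWriteOnce]
      resources:
        requests:
          storage: 20Gi
```

Deleting or scaling down the StatefulSet leaves these PVCs in place by default, so a Pod that comes back with the same ordinal reattaches to its own data.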
A DaemonSet ensures one Pod runs on every node (or a selected subset of nodes). When a new node joins the cluster, the DaemonSet automatically adds a Pod to it.
Use cases:
kubectl get daemonsets -n kube-system
# DESIRED CURRENT READY UP-TO-DATE AVAILABLE
# 3 3 3 3 3
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-backup
spec:
  schedule: "0 2 * * *"           # 2 AM every day
  concurrencyPolicy: Forbid       # don't start if previous is still running
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      backoffLimit: 2             # retry up to 2 times on failure
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: backup
            image: backup-tool:latest
A PDB limits the number of Pods of a workload that can be voluntarily disrupted simultaneously (node drain, cluster upgrade). It prevents downtime during maintenance.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 2        # at least 2 pods must always be available
  # OR: maxUnavailable: 1 # at most 1 pod can be down at once
  selector:
    matchLabels:
      app: api
When kubectl drain is run on a node, Kubernetes respects PDBs. If draining would violate the PDB, the drain waits. PDBs only apply to voluntary disruptions (drain, node upgrade) — node failures are involuntary and bypass PDB.
Init containers run to completion before any app container starts. They run in order, one at a time. If an init container fails, the kubelet restarts it until it succeeds; with restartPolicy: Never, the Pod is marked Failed instead.
initContainers:
- name: wait-for-db
  image: busybox
  command: ['sh', '-c', 'until nc -z postgres-svc 5432; do sleep 2; done']
- name: run-migrations
  image: my-app:latest
  command: ['./migrate.sh']
  env:
  - name: DB_URL
    valueFrom:
      secretKeyRef:
        name: db-secret
        key: url
Common init container patterns:
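# HPA: scale between 2 and 10 pods based on CPU
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa              # hypothetical name
spec:
  scaleTargetRef:            # required: the workload to scale (assumed here: the "api" Deployment)
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization    # percentage of the Pods' CPU requests
        averageUtilization: 70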
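As a rough worked example of how the 70% target becomes a replica count, the HPA controller applies the ratio rule below. The starting point (4 replicas averaging 90% CPU utilisation) is a made-up figure for illustration.

```latex
\[
\text{desiredReplicas}
  = \left\lceil \text{currentReplicas} \times
      \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil
  = \left\lceil 4 \times \frac{90}{70} \right\rceil
  = \lceil 5.14 \rceil
  = 6
\]
```

The result is then clamped to the minReplicas/maxReplicas range, and the controller skips scaling when the ratio is within a small tolerance of 1.0 (10% by default).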
Cluster Autoscaler automatically adjusts the number of nodes in a cluster when:
CA works with the cloud provider's autoscaling groups (AWS ASG, GCP MIG). On EKS: managed node groups + CA, or use Karpenter (AWS-native, faster and more flexible than CA — provisions the exact instance type needed, not just the configured type).
# CA respects PDBs during scale-down (won't drain a node if it violates a PDB)
# CA does NOT scale down nodes with:
# - Pods with local storage
# - Pods not managed by a controller
# - Pods with restrictive PDBs
A sidecar is a helper container running in the same Pod as the main app, sharing its network and volumes. The main app doesn't know or care about the sidecar — it adds cross-cutting behaviour transparently.
Common sidecar patterns:
What is the difference between kubectl delete pod and a Pod being evicted?
- kubectl delete pod: sends SIGTERM to the container, waits terminationGracePeriodSeconds (default 30s), then sends SIGKILL. The Pod's controller (Deployment/ReplicaSet) immediately creates a replacement Pod.
A preStop hook lets you run cleanup before SIGTERM is sent, which is useful for graceful connection draining behind load balancers or for leaving a service mesh before shutdown.
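A small sketch of a preStop hook, assuming a container named api and an image that ships a shell. The sleep length and grace period are placeholders to tune per workload.

```yaml
# Pod template fragment; names, image, and timings are illustrative assumptions
spec:
  terminationGracePeriodSeconds: 45   # must cover the preStop time plus the app's own shutdown
  containers:
  - name: api
    image: my-app:1.0
    lifecycle:
      preStop:
        exec:
          # Keep serving briefly so endpoints and load balancers can stop routing to this Pod
          command: ["sh", "-c", "sleep 10"]
```

The grace period countdown starts when termination begins, so the preStop hook's runtime eats into it; SIGTERM is sent only after the hook returns.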
- NodePort: exposes the Service on a static port on every node, reachable at NodeIP:NodePort. Mainly for dev/test or when you control the load balancer yourself.
A Headless Service has clusterIP: None. Instead of returning a single virtual IP, DNS returns the individual Pod IPs directly.
apiVersion: v1
kind: Service
metadata:
  name: cassandra
spec:
  clusterIP: None   # headless
  selector:
    app: cassandra
  ports:
  - port: 9042
DNS for headless service:
# Normal Service: cassandra.ns.svc.cluster.local → 10.96.0.100 (single VIP)
# Headless Service: cassandra.ns.svc.cluster.local → 10.244.1.5, 10.244.2.7, 10.244.3.2
# StatefulSet Pods also get:
# cassandra-0.cassandra.ns.svc.cluster.local → 10.244.1.5 (stable, individual)
Required for StatefulSets so each Pod gets a stable, individually addressable hostname. Cassandra, Kafka, etcd clusters need to directly address specific peers.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress   # hypothetical name (metadata.name is required)
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  ingressClassName: nginx
  tls:
  - hosts: [api.myapp.com]
    secretName: api-tls
  rules:
  - host: api.myapp.com
    http:
      paths:
      - path: /v1
        pathType: Prefix
        backend:
          service:
            name: api-v1-svc
            port: {number: 80}
NetworkPolicies are firewall rules for Pods — they restrict inbound (ingress) and outbound (egress) traffic at the Pod level. Implemented by the CNI plugin (Calico, Cilium, Weave).
# Allow only frontend to talk to backend; deny everything else:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-allow-frontend
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes: [Ingress]
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080
Default: no NetworkPolicies = all traffic allowed between all Pods. Best practice: start with a default-deny policy in each namespace, then explicitly allow needed communication. This limits blast radius if one Pod is compromised.
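The default-deny starting point described above could look roughly like this. The namespace name production is an assumption; apply one such policy per namespace.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production        # assumed namespace
spec:
  podSelector: {}              # empty selector = every Pod in the namespace
  policyTypes: [Ingress, Egress]
  # No ingress or egress rules are listed, so all traffic in both directions is denied
```

Because this also blocks egress, Pods lose DNS resolution until a follow-up policy allows traffic to CoreDNS on port 53 (UDP and TCP), which is usually the first allow rule to add.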
CoreDNS runs as a Deployment in kube-system. Every Pod's /etc/resolv.conf points to the CoreDNS ClusterIP.
# DNS resolution patterns:
service-name # same namespace
service-name.namespace # cross-namespace
service-name.namespace.svc # explicit
service-name.namespace.svc.cluster.local # fully qualified
# StatefulSet Pod DNS:
pod-0.service-name.namespace.svc.cluster.local
# Examples:
curl http://user-service # resolves user-service.default.svc.cluster.local
curl http://auth-svc.auth-ns # cross-namespace
CoreDNS also supports custom DNS entries via the Corefile, forwarding external domains to upstream resolvers, and plugins for service discovery integration.
CNI (Container Network Interface) plugins implement Pod networking. When a Pod is created, the CNI plugin:
Popular CNI plugins:
A service mesh is a dedicated infrastructure layer for service-to-service communication, implemented via sidecar proxies (Envoy) injected into each Pod.
What it provides (without changing app code):
When you need it: 5+ microservices in production, compliance requires encryption in transit for internal traffic, you need canary deployments with traffic splitting, or you need distributed tracing without modifying every service.
kube-proxy implements Services by programming network rules on every node:
iptables mode (default):
- Rewrites traffic addressed to ClusterIP:Port to a random Pod IP
IPVS mode (production at scale):
eBPF mode (Cilium without kube-proxy):
When a Service is created, Kubernetes automatically creates a DNS record. Any Pod can resolve the service by name — no hardcoded IPs needed.
# Automatic DNS records created for:
# Service "payment-svc" in namespace "default":
payment-svc.default.svc.cluster.local → 10.96.15.200 (ClusterIP)
# Environment variables also injected into every Pod:
PAYMENT_SVC_SERVICE_HOST=10.96.15.200
PAYMENT_SVC_SERVICE_PORT=8080
# Best practice: use DNS names, not env vars
# (env vars are only set for services that existed before the Pod started)
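Gateway API is the next-generation K8s networking API and the successor to Ingress. Its core resources reached GA with Gateway API v1.0 in late 2023; it is installed as CRDs and versioned independently of Kubernetes releases. Ingress is feature-frozen but remains supported. Gateway API separates concerns into roles: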
Improvements over Ingress:
Supported by Envoy Gateway, Istio, nginx, Contour, Kong, and AWS ALB.
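As an illustration of what Gateway API adds over Ingress, here is a hedged HTTPRoute sketch with weighted traffic splitting. The Gateway name shared-gateway and the api-v2-svc Service are assumptions; the host and api-v1-svc are reused from the Ingress example.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: api-route              # hypothetical name
spec:
  parentRefs:
  - name: shared-gateway       # assumed Gateway owned by the platform team
  hostnames: ["api.myapp.com"]
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /v1
    backendRefs:               # weights split traffic, e.g. for a canary
    - name: api-v1-svc
      port: 80
      weight: 90
    - name: api-v2-svc         # assumed canary Service
      port: 80
      weight: 10
```

The split is part of the core API here, whereas with Ingress the same behaviour typically depends on controller-specific annotations.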
# PVC: request 20Gi of fast SSD storage
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc           # hypothetical name (metadata.name is required)
spec:
  storageClassName: gp3    # AWS EBS gp3
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 20Gi
# Access modes: ReadWriteOnce (single node), ReadOnlyMany, ReadWriteMany (NFS/EFS)
# Secret: inject credentials as env vars
env:
- name: DB_PASSWORD
  valueFrom:
    secretKeyRef:
      name: db-credentials
      key: password
# ConfigMap: inject config file as a volume
volumes:
- name: app-config
  configMap:
    name: app-configuration
volumeMounts:
- name: app-config
  mountPath: /etc/app/config
CSI is a standard interface that allows storage vendors to develop drivers that work with any container orchestrator (K8s, Mesos, etc.). Storage vendors implement the CSI API; K8s calls it to create/delete/mount volumes.
Popular CSI drivers: aws-ebs-csi-driver, efs-csi-driver, gce-pd-csi-driver, ceph-csi, longhorn.
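A sketch of how a CSI driver is wired in through a StorageClass, assuming the aws-ebs-csi-driver is installed. The class name gp3 matches the PVC example; the remaining settings are common choices rather than requirements.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
provisioner: ebs.csi.aws.com              # the CSI driver that will create the volumes
parameters:
  type: gp3                               # EBS volume type
volumeBindingMode: WaitForFirstConsumer   # provision in the AZ where the Pod is scheduled
reclaimPolicy: Delete                     # delete the EBS volume when the PVC is deleted
allowVolumeExpansion: true
```

A PVC that names this class triggers dynamic provisioning: Kubernetes calls the driver to create the volume, then attaches and mounts it on the node running the Pod.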
# emptyDir volume: created when Pod starts, deleted when Pod is removed
# Perfect for sharing data between sidecar and main container
spec:
  volumes:
  - name: shared-logs
    emptyDir: {}   # in memory: emptyDir: {medium: Memory}
  containers:
  - name: app
    image: my-app
    volumeMounts:
    - name: shared-logs
      mountPath: /var/log/app
  - name: log-shipper
    image: fluentd
    volumeMounts:
    - name: shared-logs
      mountPath: /var/log/input
      readOnly: true
emptyDir is also used for large in-memory caches or scratch space for batch jobs. emptyDir.medium: Memory uses tmpfs (RAM-backed) — faster but counts toward memory limits.
External Secrets Operator (ESO) syncs secrets from external systems (AWS Secrets Manager, HashiCorp Vault, Azure Key Vault, GCP Secret Manager) into Kubernetes Secrets automatically.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-secret
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: ClusterSecretStore
  target:
    name: db-credentials        # creates this K8s Secret
  data:
  - secretKey: password
    remoteRef:
      key: prod/db/password     # AWS Secrets Manager key
Why preferred: secrets never stored in Git (GitOps safe), auto-rotates when the external secret changes, single source of truth across clouds, audit trail in the external system, no etcd encryption complexity.
A projected volume maps several volume sources (ServiceAccount token, Secrets, ConfigMaps, downward API) into a single directory in a Pod. Useful for combining multiple config sources into one mount point.
volumes:
- name: all-configs
  projected:
    sources:
    - secret:
        name: db-credentials
    - configMap:
        name: app-config
    - serviceAccountToken:
        path: token
        expirationSeconds: 3600
    - downwardAPI:
        items:
        - path: namespace
          fieldRef:
            fieldPath: metadata.namespace
The Downward API exposes Pod metadata (name, namespace, labels, resource limits) to containers as files or env vars — without calling the K8s API.
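The env-var form of the Downward API can be sketched as follows. The variable names and the container name app are illustrative assumptions.

```yaml
# Container-level fragment: expose Pod metadata and limits as environment variables
env:
- name: POD_NAME
  valueFrom:
    fieldRef:
      fieldPath: metadata.name
- name: POD_NAMESPACE
  valueFrom:
    fieldRef:
      fieldPath: metadata.namespace
- name: MEMORY_LIMIT_BYTES
  valueFrom:
    resourceFieldRef:
      containerName: app          # assumed container name
      resource: limits.memory
```

Env vars are fixed at container start, so values that can change while the Pod runs (labels, annotations) are better consumed through the volume form.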
# Role: allow reading pods in "monitoring" namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: monitoring
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods", "pods/log"]
  verbs: ["get", "list", "watch"]
---
# Bind to a ServiceAccount:
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-reader-binding   # hypothetical name
  namespace: monitoring
subjects:
- kind: ServiceAccount
  name: metrics-collector
  namespace: monitoring
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
A ServiceAccount provides an identity to processes running in a Pod, allowing them to authenticate to the K8s API and to external services (AWS via IRSA, GCP via Workload Identity).
# Bad: use default service account (same for all Pods in namespace)
# Good: dedicated service account per app with minimum permissions
apiVersion: v1
kind: ServiceAccount
metadata:
  name: payment-service
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456:role/payment-role   # IRSA
# In the Deployment's Pod template:
spec:
  serviceAccountName: payment-service
  automountServiceAccountToken: false   # don't mount if app doesn't call K8s API
IRSA (IAM Roles for Service Accounts) on EKS: the K8s ServiceAccount is mapped to an IAM role. The Pod gets temporary AWS credentials via the projected token — no long-lived access keys in Pods.
PodSecurityPolicy (PSP) was deprecated in K8s 1.21 and removed in 1.25. Replaced by Pod Security Admission (built-in) and external admission webhooks (OPA/Gatekeeper, Kyverno).
PSA enforces three pre-defined security profiles (privileged, baseline, restricted) at namespace level:
# Apply restricted profile to a namespace:
kubectl label namespace production \
pod-security.kubernetes.io/enforce=restricted \
pod-security.kubernetes.io/warn=restricted \
pod-security.kubernetes.io/audit=restricted
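For GitOps-managed clusters, the same labels can be declared on the Namespace manifest instead of applied imperatively. A minimal sketch, with the namespace name taken from the command above:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted   # reject Pods that violate the profile
    pod-security.kubernetes.io/warn: restricted      # return a warning to the client
    pod-security.kubernetes.io/audit: restricted     # record violations in the audit log
```

A common rollout approach is to set only warn and audit first, fix the workloads that show up, and add enforce afterwards.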
Admission controllers intercept API requests after authentication/authorisation but before the object is persisted to etcd. They can mutate or validate objects.
Built-in admission controllers:
- LimitRanger: sets default resource requests/limits if not specified
- ResourceQuota: enforces namespace resource quotas
- PodSecurity: enforces PSA policies
- MutatingAdmissionWebhook: calls an external webhook to mutate objects (inject sidecars, add labels)
- ValidatingAdmissionWebhook: calls an external webhook to validate objects (reject if policy violated)
Admission webhooks:
securityContext:
  runAsNonRoot: true                # don't run as root
  runAsUser: 1000                   # specific non-root UID
  runAsGroup: 3000
  readOnlyRootFilesystem: true      # can't write to filesystem
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]                   # drop all Linux capabilities
    add: ["NET_BIND_SERVICE"]       # add only what's needed
  seccompProfile:
    type: RuntimeDefault            # restrict syscalls via seccomp
Additional hardening:
- Avoid hostPID: true, hostNetwork: true, or privileged: true unless absolutely required
etcd is the distributed key-value store that holds all cluster state. Losing etcd without a backup means losing the cluster. All API server reads and writes of cluster state ultimately go through etcd.
Security:
- Use TLS client certificates between the API server and etcd (--etcd-certfile)
- Use an EncryptionConfiguration with a KMS provider to encrypt Secrets in etcd
Backup:
# Snapshot etcd to a file:
ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-$(date +%Y%m%d).db \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
--key=/etc/kubernetes/pki/etcd/healthcheck-client.key
# Verify snapshot (since etcd 3.5, etcdutl replaces the deprecated etcdctl snapshot status):
etcdutl snapshot status /backup/etcd-20260623.db --write-out=table
Automate snapshots every 30 minutes and ship to S3. Test restore quarterly.
Falco is a runtime security tool that detects anomalous behaviour in containers by monitoring system calls (via eBPF/kernel module).
What Falco detects:
- A shell spawned inside a container (exec /bin/bash)
- Reads of sensitive files (/etc/shadow, /etc/kubernetes/admin.conf)
Rules are defined in YAML. Alerts go to stdout, syslog, or webhook (Slack, PagerDuty, Falcosidekick). Falco runs as a privileged DaemonSet, one Pod per node.
- Always: always pull, even if the image exists on the node. Use for latest tags.
- IfNotPresent: pull only if not present on the node. Default for versioned tags. Faster startup.
- Never: never pull; the image must be pre-loaded on the node. For air-gapped environments.
# Private registry credentials via imagePullSecret:
kubectl create secret docker-registry ecr-creds \
--docker-server=123456.dkr.ecr.us-east-1.amazonaws.com \
--docker-username=AWS \
--docker-password=$(aws ecr get-login-password)
# Reference in Pod:
spec:
  imagePullSecrets:
  - name: ecr-creds
On EKS with ECR: use amazon-ecr-credential-helper or attach an ECR pull policy to the node IAM role — no imagePullSecrets needed. ECR tokens expire every 12 hours; the helper refreshes automatically.
Helm is the package manager for Kubernetes. A chart is a package of K8s YAML templates with parameterised values. Helm renders the templates and applies them as a release.
Problems Helm solves:
- Templating: no copy-pasting of app-name, image-tag, replicas across 10 YAML files
- Release management: track what is installed (helm list), upgrade, rollback
helm install my-nginx nginx-stable/nginx-ingress \
  --set controller.replicaCount=2 \
  --namespace ingress-nginx
helm upgrade my-nginx nginx-stable/nginx-ingress --set controller.replicaCount=3
helm rollback my-nginx 1 # rollback to revision 1
helm uninstall my-nginx
- Kustomize: template-free overlays and patches on plain YAML, built into kubectl (kubectl apply -k). No release tracking. Pure YAML, easier to review diffs.
# Kustomize structure:
base/
  kustomization.yaml     # lists the base resources
  deployment.yaml        # base with common config
  service.yaml
overlays/
  dev/
    kustomization.yaml   # patches: replicas: 1, image: dev
  prod/
    kustomization.yaml   # patches: replicas: 5, image: prod
kubectl apply -k overlays/prod/
Many teams use both: Helm for third-party charts (Prometheus, cert-manager), Kustomize for their own app manifests.
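For reference, a prod overlay in the layout above might look roughly like this. The Deployment name api, the image name my-app, and the tag are assumptions, not taken from a real repository.

```yaml
# overlays/prod/kustomization.yaml (sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ../../base            # pull in the shared manifests
replicas:
- name: api             # assumed Deployment name in base/deployment.yaml
  count: 5
images:
- name: my-app          # assumed image name used in the base
  newTag: "1.4.2"       # hypothetical prod tag
```

Running kubectl kustomize overlays/prod/ prints the rendered manifests without applying them, which makes the overlay easy to review in a PR.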
GitOps uses a Git repository as the single source of truth for the desired cluster state. A GitOps operator continuously reconciles the live cluster state with Git.
Argo CD:
# Deploy: git push → Argo CD detects change → applies to cluster
# Rollback: git revert → Argo CD reverts cluster to previous state
# Audit: git log shows who changed what and when
# Disaster recovery: git clone → apply to new cluster → live in minutes
GitOps benefits: all changes audited in Git, PRs = change management, rollback = git revert, cluster can be fully recreated from Git. Flux CD is the main alternative (also a CNCF graduated project, more lightweight).
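A hedged sketch of how Argo CD is pointed at a Git path: one Application resource per deployable unit. The repository URL, path, and namespaces are placeholders.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: api
  namespace: argocd                  # namespace where Argo CD is installed (assumed)
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/k8s-manifests.git   # placeholder repo
    targetRevision: main
    path: overlays/prod              # Kustomize overlay or plain manifests
  destination:
    server: https://kubernetes.default.svc   # the cluster Argo CD itself runs in
    namespace: production
  syncPolicy:
    automated:
      prune: true                    # delete resources that were removed from Git
      selfHeal: true                 # revert manual changes made in the cluster
```

Leaving out syncPolicy.automated keeps syncs manual, which some teams prefer for production so a human approves each rollout.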
KEDA (Kubernetes Event-driven Autoscaling) scales Pods based on external event sources — not just CPU/memory. It extends HPA to use metrics from external systems.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-processor-scaler   # hypothetical name
spec:
  scaleTargetRef:
    name: order-processor
  minReplicaCount: 0             # scale to zero!
  maxReplicaCount: 20
  triggers:
  - type: aws-sqs-queue
    metadata:
      queueURL: https://sqs.us-east-1.amazonaws.com/xxx/orders
      queueLength: "10"          # 1 pod per 10 messages in queue
      awsRegion: us-east-1       # required by the SQS scaler; matches the queue URL
Supported scalers (80+): SQS, Kafka, RabbitMQ, Redis, Prometheus metrics, Datadog, cron schedule, Azure Service Bus, HTTP request rate, and many more.
Scale-to-zero: when queue is empty, KEDA scales Pods to 0 (no cost). When a message arrives, KEDA scales from 0 to 1 and then HPA takes over for further scaling. Great for batch workloads and cost optimisation.
A Pod is in CrashLoopBackOff. How do you diagnose it?
# Step 1: describe the Pod for events
kubectl describe pod <pod-name>
# Look for: OOMKilled (memory limit), image errors, probe failures
# Step 2: check logs (previous container)
kubectl logs <pod-name> --previous
# If multi-container: kubectl logs <pod-name> -c <container-name> --previous
# Step 3: check events in namespace
kubectl get events --sort-by='.lastTimestamp' -n <namespace>
# Step 4: if app starts briefly, exec into it
kubectl exec -it <pod-name> -- /bin/sh
# OR: override command temporarily to keep it running:
command: ["sleep", "3600"]
# Common causes:
# - Missing env var or secret
# - DB not reachable (liveness probe fails before ready)
# - OOMKilled → increase memory limit
# - Image missing entrypoint
# - Permission denied on volume mount
A Pod is stuck in Pending. What do you check?
kubectl describe pod <pod-name>
# Check Events section at the bottom
# Common causes and fixes:
# 1. Insufficient resources:
# "0/3 nodes are available: 3 Insufficient cpu"
# → Add nodes or reduce CPU request
# 2. No nodes match nodeSelector / nodeAffinity:
# "0/3 nodes are available: 3 node(s) didn't match Pod's node affinity"
# → Check kubectl get nodes --show-labels
# 3. Taint not tolerated:
# "0/3 nodes are available: 3 node(s) had taint {key:NoSchedule}"
# → Add toleration to Pod spec
# 4. PVC not bound:
# "persistentvolumeclaim "my-pvc" not found"
# → Check kubectl get pvc and kubectl describe pvc
# 5. Image pull failing (shows in Events):
# → Check imagePullSecrets, image name, registry access
# 1. Test direct Pod-to-Pod connectivity:
kubectl run debug --image=nicolaka/netshoot -it --rm -- bash
# Inside debug pod:
curl http://<pod-ip>:8080
nslookup payment-svc.default.svc.cluster.local
dig payment-svc.default.svc.cluster.local
# 2. Check Service endpoints:
kubectl get endpoints payment-svc
# If empty: selector doesn't match any Pods
# If has IPs: test connectivity to Pod IP directly
# 3. Check Service and selector match:
kubectl get svc payment-svc -o yaml # check selector
kubectl get pods --show-labels # check Pod labels
# 4. Check NetworkPolicy blocking:
kubectl get networkpolicies -A
# A NetworkPolicy in target namespace might be blocking ingress
# 5. Check kube-proxy iptables rules:
iptables -t nat -L KUBE-SERVICES | grep payment-svc
# 6. CoreDNS issues:
kubectl logs -n kube-system -l k8s-app=kube-dns
kubectl exec -it debug -- nslookup kubernetes.default
The standard Kubernetes monitoring stack:
- Prometheus: scrapes metrics from each target's /metrics endpoint. Service discovery via the K8s API: with a matching scrape config, it automatically discovers all Pods carrying the prometheus.io/scrape: "true" annotation.
Deploy via the kube-prometheus-stack Helm chart (bundles all of the above). For managed Prometheus: AWS Managed Prometheus, Google Managed Prometheus.
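With kube-prometheus-stack, scrape targets are usually declared through the Prometheus Operator's ServiceMonitor (or PodMonitor) resources rather than annotations. A minimal sketch, assuming a Service labelled app: api with a port named metrics, and a Helm release called kube-prometheus-stack:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: api
  labels:
    release: kube-prometheus-stack   # assumed; must match the Prometheus serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: api                       # selects the Service, not the Pods
  endpoints:
  - port: metrics                    # name of the Service port to scrape
    path: /metrics
    interval: 30s
```

If targets do not appear in Prometheus, a label mismatch between this resource and the operator's selector is a frequent cause.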
A CRD extends the Kubernetes API with custom resources — you can define your own object types alongside built-in ones like Pod and Deployment.
# After defining a CRD for "Database":
kubectl get databases
kubectl describe database my-postgres
# Example: Postgres Operator CRD
apiVersion: acid.zalan.do/v1
kind: postgresql
metadata:
  name: myteam-cluster   # the Zalando operator expects the name to start with the teamId
spec:
  teamId: myteam
  volume:
    size: 100Gi
  numberOfInstances: 3
  postgresql:
    version: "15"
Operator pattern: A custom controller watches CRD objects and reconciles the actual state to the desired state — running the operational logic a human DBA would do:
Operators encode domain-specific operational knowledge in code. Popular operators: Prometheus Operator, Cert-Manager, Zalando Postgres Operator, Kafka (Strimzi), Elasticsearch (ECK).
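To show what "defining a CRD" involves, here is a hedged sketch of a CustomResourceDefinition for the Database kind used in the kubectl examples above. The group example.com and the spec fields are invented for illustration.

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databases.example.com        # must be <plural>.<group>
spec:
  group: example.com                 # hypothetical API group
  scope: Namespaced
  names:
    plural: databases                # enables: kubectl get databases
    singular: database
    kind: Database
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:               # the API server validates objects against this schema
        type: object
        properties:
          spec:
            type: object
            properties:
              engine:
                type: string
              replicas:
                type: integer
                minimum: 1
```

The CRD on its own only stores and validates objects. Nothing happens in the cluster until a controller (the operator) watches Database resources and reconciles them.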