This guide covers the most frequently asked Docker interview questions in 2026 — from container fundamentals and Dockerfile best practices to multi-stage builds, networking, Docker Compose, security hardening, and production patterns.
| Aspect | Container | Virtual Machine |
|---|---|---|
| Isolation | Kernel namespaces + cgroups | Hypervisor + guest OS |
| Size | MBs (share host kernel) | GBs (includes full OS) |
| Startup | Milliseconds | Minutes |
| Performance | Near-native (no hypervisor overhead) | 5-15% overhead |
| OS | Shares host kernel | Full guest OS per VM |
| Security isolation | Weaker (shared kernel) | Stronger (separate kernels) |
Containers isolate at the process level using Linux kernel features, chiefly namespaces (what a process can see) and cgroups (how much it can use).
# Image → Container (multiple containers from one image):
docker image pull nginx:1.26-alpine # download image
docker run -d --name web1 nginx:1.26-alpine # container 1
docker run -d --name web2 nginx:1.26-alpine # container 2 (same image)
docker image ls # list images
docker container ls # list running containers
docker container ls -a # including stopped containers
Multiple containers can run from the same image simultaneously. Each has its own writable layer; changes in one don't affect others or the image.
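The per-container writable layer can be pictured with a small Python model. This is only an analogy, not how OverlayFS is implemented: the read-only mapping stands in for the image layers, each `ChainMap` stands in for a container, and the file path and contents are made up for illustration.

```python
from collections import ChainMap
from types import MappingProxyType

INDEX = "/usr/share/nginx/html/index.html"  # hypothetical file in the image

# Read-only image layers, shared by every container started from the image.
image = MappingProxyType({INDEX: "welcome"})

# Each container gets its own empty writable layer stacked on top of the image.
web1 = ChainMap({}, image)
web2 = ChainMap({}, image)

# A write lands in web1's writable layer only (copy-on-write).
web1[INDEX] = "patched in web1"

print(web1[INDEX])   # patched in web1
print(web2[INDEX])   # welcome
print(image[INDEX])  # welcome
```

Reads fall through to the image until a container writes its own copy, which is why removing a container discards its changes but never alters the image.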
A registry stores and distributes Docker images. Docker Hub is the default public registry. Private registries: AWS ECR, GCP Artifact Registry, GitHub Packages, Harbor (self-hosted).
# Pull from Docker Hub (default registry):
docker pull nginx:1.26-alpine
docker pull eclipse-temurin:21-jre-jammy
# Tag and push to Docker Hub:
docker build -t myapp:1.0 .
docker tag myapp:1.0 myusername/myapp:1.0
docker push myusername/myapp:1.0
# Push to AWS ECR:
aws ecr get-login-password --region us-east-1 | docker login \
  --username AWS \
  --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com
docker tag myapp:1.0 123456789012.dkr.ecr.us-east-1.amazonaws.com/myapp:1.0
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/myapp:1.0
A Docker image is a stack of read-only layers. Instructions that change the filesystem (RUN, COPY, ADD) create a new layer; the others (EXPOSE, ENV, ENTRYPOINT and so on) only add metadata. Layers are cached and shared across images.
# Layer 1: base OS + JRE (cached)
FROM eclipse-temurin:21-jre
# Layer 2: your app jar
COPY target/app.jar /app/
# EXPOSE and ENTRYPOINT only add metadata (no new layer data)
EXPOSE 8080
ENTRYPOINT ["java","-jar","/app/app.jar"]
# docker image history shows the layers:
docker image history myapp:1.0
Union filesystem (OverlayFS):
Docker caches each layer after building it. On subsequent builds, if a layer's inputs (instruction + files) haven't changed, Docker reuses the cached layer instead of rebuilding it.
# Cache-unfriendly (copies source first):
# the COPY layer is invalidated every time ANY file changes,
# so every build must re-download ALL dependencies
COPY . .
RUN mvn dependency:go-offline
# Cache-friendly (copy pom.xml first):
# each COPY is invalidated only when its own inputs change,
# so the dependency layer stays cached until pom.xml changes
COPY pom.xml .
RUN mvn dependency:go-offline
COPY src ./src
RUN mvn package
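Why instruction order matters can be sketched with a toy model in Python. This is an illustrative assumption rather than Docker's actual cache implementation: each layer's key is taken to be a hash of the parent key, the instruction text, and the contents of any copied files. The helper names and file contents are hypothetical.

```python
import hashlib

def layer_keys(steps, files):
    """Return one cache key per step.

    steps: list of (instruction, [paths copied by that instruction])
    files: dict mapping path -> file content
    """
    keys, parent = [], ""
    for instruction, copied in steps:
        h = hashlib.sha256()
        h.update(parent.encode())        # a changed parent invalidates all children
        h.update(instruction.encode())
        for path in sorted(copied):
            h.update(path.encode())
            h.update(files[path].encode())
        parent = h.hexdigest()
        keys.append(parent)
    return keys

def reused_layers(steps, before, after):
    """Count the leading layers whose keys match across two builds."""
    count = 0
    for old, new in zip(layer_keys(steps, before), layer_keys(steps, after)):
        if old != new:
            break
        count += 1
    return count

unfriendly = [
    ("COPY . .", ["pom.xml", "src/App.java"]),
    ("RUN mvn dependency:go-offline", []),
    ("RUN mvn package", []),
]
friendly = [
    ("COPY pom.xml .", ["pom.xml"]),
    ("RUN mvn dependency:go-offline", []),
    ("COPY src ./src", ["src/App.java"]),
    ("RUN mvn package", []),
]

v1 = {"pom.xml": "<deps/>", "src/App.java": "class App {}"}
v2 = {"pom.xml": "<deps/>", "src/App.java": "class App { int x; }"}  # source edit only

print(reused_layers(unfriendly, v1, v2))  # 0
print(reused_layers(friendly, v1, v2))    # 2
```

In the model, a source-only edit leaves the dependency download cached in the friendly ordering and invalidates everything in the unfriendly one.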
Cache invalidation rules:
- COPY/ADD: invalidated if any copied file changes (Docker checksums files)
- RUN: invalidated only if the instruction text changes, not if the network resources it fetches change, so a cached apt update can leave you with stale package lists
- --no-cache: forces a rebuild of all layers
What is the difference between docker run, docker start, and docker exec? (Easy)
- docker run: creates AND starts a new container from an image. Most common command.
- docker start: starts an existing stopped container (previously created with run).
- docker exec: runs a command inside an already-running container. Does not start or stop the container.
# Common docker run flags:
# -d: detached (background), --name: container name,
# -p: host:container port mapping, -e: env variable,
# -v: host:container bind mount, --memory / --cpus: resource limits,
# --restart: restart policy
docker run \
  -d \
  --name my-api \
  -p 8080:8080 \
  -e SPRING_PROFILES_ACTIVE=dev \
  -v /data:/app/data \
  --memory=512m \
  --cpus=0.5 \
  --restart=unless-stopped \
  myapp:1.0
# Execute command in running container:
docker exec -it my-api bash # interactive shell
docker exec my-api env # list env vars
docker exec my-api kill -HUP 1 # send signal to process
What is the difference between CMD and ENTRYPOINT? (Medium)
- ENTRYPOINT: defines the executable that always runs. It cannot be overridden by docker run arguments (only by the --entrypoint flag).
- CMD: defines default arguments to ENTRYPOINT (or the command itself if there is no ENTRYPOINT). It can be overridden by docker run arguments.
# Pattern 1: ENTRYPOINT + CMD (recommended for single-purpose images)
ENTRYPOINT ["java", "-jar", "/app/app.jar"]
CMD ["--spring.profiles.active=prod"]
# Override CMD: docker run myapp --spring.profiles.active=dev
# Override both: docker run --entrypoint /bin/sh myapp
# Pattern 2: CMD only (flexible — any command can replace it)
CMD ["python", "app.py"]
# Override: docker run myimage python tests.py
# Shell form vs Exec form:
ENTRYPOINT java -jar app.jar # shell form: spawns /bin/sh -c, PID 1 is sh (bad)
ENTRYPOINT ["java","-jar","app.jar"] # exec form: PID 1 is java (correct — receives SIGTERM)
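How the exec-form ENTRYPOINT, CMD, and `docker run` arguments combine can be modelled in a few lines of Python. This is a simplified sketch of the override rules described above, not Docker's code; `container_argv` and its parameters are hypothetical names.

```python
def container_argv(entrypoint, cmd, run_args=None, entrypoint_override=None):
    """Return the argv of PID 1 for exec-form ENTRYPOINT and CMD.

    entrypoint, cmd: lists from the image (empty list if not set)
    run_args: arguments given after the image name in `docker run`
    entrypoint_override: value of the --entrypoint flag, if any
    """
    if entrypoint_override is not None:
        # --entrypoint replaces ENTRYPOINT and clears the image's default CMD.
        entrypoint, cmd = [entrypoint_override], []
    # Arguments on the command line replace CMD; otherwise CMD is the default.
    args = run_args if run_args else cmd
    return list(entrypoint) + list(args)

image_entrypoint = ["java", "-jar", "/app/app.jar"]
image_cmd = ["--spring.profiles.active=prod"]

# docker run myapp
print(container_argv(image_entrypoint, image_cmd))
# docker run myapp --spring.profiles.active=dev
print(container_argv(image_entrypoint, image_cmd, ["--spring.profiles.active=dev"]))
# docker run --entrypoint /bin/sh myapp
print(container_argv(image_entrypoint, image_cmd, entrypoint_override="/bin/sh"))
# CMD-only image: docker run myimage python tests.py
print(container_argv([], ["python", "app.py"], ["python", "tests.py"]))
```

The model covers exec form only; in shell form the command is wrapped in `/bin/sh -c`, which is the reason signals do not reach the application directly.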
What is the difference between COPY and ADD in a Dockerfile? (Easy)
- COPY: copies files/directories from the build context to the image. Simple and explicit. Preferred.
- ADD: does everything COPY does, plus it auto-extracts local tar archives (.tar.gz → directory) and can fetch remote URLs (though curl in a RUN command is preferred for URLs since ADD doesn't cache URL content well).
# Use COPY for regular file copies:
COPY target/app.jar /app/app.jar
COPY src/main/resources/application.yml /app/config/
# Use ADD only when you need auto-extraction:
ADD myarchive.tar.gz /opt/myapp/ # extracted automatically
# Avoid ADD for URLs (no cache, no integrity check):
# Bad: ADD https://example.com/file.jar /app/
# Good:
RUN curl -fsSL https://example.com/file.jar -o /app/file.jar
What is .dockerignore and why is it important? (Easy)
.dockerignore excludes files from the build context sent to the Docker daemon. The build context is everything in the directory (recursively) that you pass to docker build.
# .dockerignore (comments must be on their own line)
# No git history in the image:
.git/
.gitignore
# Maven build output (not needed when the jar is built inside the image):
**/target/
**/*.class
**/__pycache__/
# Huge, and rebuilt inside the container:
node_modules/
# Never put secrets in the image:
.env
*.log
Dockerfile
docker-compose*.yml
README.md
.github/
Why important:
- A large node_modules folder sent on every build is extremely slow.
- It keeps secrets (.env), credentials, and private keys from being accidentally baked into images.
What is containerd and how does it relate to Docker? (Medium)
containerd is the industry-standard container runtime: it manages the container lifecycle (create, start, stop, delete), pulls images, and manages storage and networking. It is a CNCF graduated project used by Docker, Kubernetes, and major cloud providers.
Docker CLI (docker)
↓ Docker API
Docker Engine (dockerd)
↓ gRPC
containerd
↓ containerd-shim → OCI runtime (runc)
Linux kernel (namespaces, cgroups)
Docker is a developer-friendly toolset on top of containerd. Kubernetes 1.24+ removed the Docker shim and talks directly to containerd (or CRI-O). Container images built with Docker are OCI-compliant and run anywhere (Kubernetes, Podman, containerd directly).
Multi-stage builds use multiple FROM instructions in one Dockerfile. Each stage can copy artifacts from previous stages. The final image contains only what you explicitly copy into it.
# Node.js multi-stage build:
# Stage 1: Build
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
# Install all dependencies: the build step needs devDependencies too
RUN npm ci
COPY . .
RUN npm run build
# Stage 2: Runtime (only built assets, no node_modules for dev)
FROM nginx:1.27-alpine
COPY --from=builder /app/dist /usr/share/nginx/html
COPY nginx.conf /etc/nginx/nginx.conf
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]
Benefits:
# syntax=docker/dockerfile:1
# Stage 1: Dependency resolution (cached separately)
FROM eclipse-temurin:21-jdk-jammy AS deps
WORKDIR /app
COPY pom.xml .
RUN --mount=type=cache,target=/root/.m2 \
mvn dependency:go-offline -q
# Stage 2: Build
FROM deps AS builder
COPY src ./src
RUN --mount=type=cache,target=/root/.m2 \
mvn clean package -DskipTests -q
# Stage 3: Layer extraction (Spring Boot layertools)
FROM builder AS extractor
RUN java -Djarmode=layertools -jar target/*.jar extract
# Stage 4: Runtime (JRE only, non-root user)
FROM eclipse-temurin:21-jre-jammy
RUN groupadd --gid 1001 appgroup && \
useradd --uid 1001 --gid appgroup --no-create-home appuser
WORKDIR /app
COPY --from=extractor --chown=appuser:appgroup /app/dependencies/ ./
COPY --from=extractor --chown=appuser:appgroup /app/spring-boot-loader/ ./
COPY --from=extractor --chown=appuser:appgroup /app/snapshot-dependencies/ ./
COPY --from=extractor --chown=appuser:appgroup /app/application/ ./
USER appuser
EXPOSE 8080
ENTRYPOINT ["java","org.springframework.boot.loader.launch.JarLauncher"]
Layer ordering: dependencies (rare change) → Spring Boot loader → app code (frequent change). Docker only rebuilds the changed layer and everything after it.
BuildKit is the modern build engine for Docker (default since Docker 23.0). Significant improvements over the classic builder:
- --mount=type=cache persists build caches (Maven, npm, pip) between builds without including them in the image
- --mount=type=secret passes secrets to build steps without storing them in any layer
- --mount=type=ssh forwards the SSH agent to the build (for private git repos)
# Enable BuildKit (already default in Docker 23+):
DOCKER_BUILDKIT=1 docker build .
# Secret mount (never stored in layer):
# syntax=docker/dockerfile:1
RUN --mount=type=secret,id=npmrc,target=/root/.npmrc \
npm install
docker build --secret id=npmrc,src=.npmrc .
# Cache mount (persists Maven cache between builds):
RUN --mount=type=cache,target=/root/.m2 mvn package
What are docker buildx and multi-platform builds? (Medium)
docker buildx exposes BuildKit's extended build capabilities through the Docker CLI. Key use: building images for multiple CPU architectures from a single machine.
# Create a builder that supports multi-platform:
docker buildx create --name multiplatform --use
docker buildx inspect --bootstrap
# Build for AMD64 (x86_64) and ARM64 (Apple M1/M2, AWS Graviton).
# --push sends the result directly to the registry (required for multi-platform),
# so the tag must include the registry namespace.
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  --tag myusername/myapp:1.0 \
  --push \
  .
# Verify multi-arch manifest:
docker buildx imagetools inspect myusername/myapp:1.0
Why it matters:
Distroless images (from Google) contain only the application and its runtime dependencies — no shell, no package manager, no system utilities, no cron, no init system.
# Standard JRE image: ~200MB, hundreds of packages, shell included
FROM eclipse-temurin:21-jre-jammy
# Distroless Java 21: ~70MB, only JRE, no shell
FROM gcr.io/distroless/java21-debian12
COPY --from=builder /app/app.jar /app/app.jar
CMD ["/app/app.jar"]
# Distroless static (for statically compiled binaries — Go, Rust):
FROM gcr.io/distroless/static-debian12
Benefits:
Trade-off: debugging is harder (no shell for docker exec). Use debug variants (:debug tag has busybox shell) in dev, distroless in prod. Or use kubectl debug ephemeral containers in K8s.
What is the difference between build arguments (ARG) and environment variables (ENV)? (Medium)
- ARG: build-time variable. Available only during the build process (RUN steps). Not present in the running container. Passed with --build-arg.
- ENV: environment variable baked into the image. Available both during build and in the running container. Visible in docker inspect.
# ARG: parameterise the build (e.g. version, base image tag)
ARG JAVA_VERSION=21
FROM eclipse-temurin:${JAVA_VERSION}-jre-jammy
ARG APP_VERSION=1.0.0
LABEL version="${APP_VERSION}"
# ENV: runtime configuration
ENV JAVA_OPTS="-Xmx512m -Xms256m" \
SPRING_PROFILES_ACTIVE=prod \
SERVER_PORT=8080
# Build with custom args:
docker build --build-arg JAVA_VERSION=17 --build-arg APP_VERSION=2.1.0 .
# NEVER use ARG for secrets (visible in docker history):
ARG DB_PASSWORD=secret # BAD — visible in image layers!
# Use --mount=type=secret instead (BuildKit)
What is the HEALTHCHECK instruction? (Medium)
HEALTHCHECK tells Docker how to test whether a container is healthy. Docker runs the command periodically; the container status changes to unhealthy if it fails repeatedly.
# interval: check every 30 seconds; timeout: fail if no response in 10s;
# start-period: grace period after start; retries: unhealthy after 3 consecutive failures
HEALTHCHECK \
  --interval=30s \
  --timeout=10s \
  --start-period=30s \
  --retries=3 \
  CMD curl -f http://localhost:8080/actuator/health || exit 1
# Check container health:
docker inspect --format='{{.State.Health.Status}}' my-container
# output: healthy / unhealthy / starting
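In Swarm mode, tasks that become unhealthy are replaced and traffic is routed only to healthy ones. In Docker Compose, health status is what depends_on with condition: service_healthy waits for. For standalone containers, health status is informational: Docker does not restart an unhealthy container by itself, and --restart policies react only when the container exits, so use an orchestrator (Swarm, Kubernetes probes) or an external watcher.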
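The starting / healthy / unhealthy transitions can be sketched as a small state machine in Python. This is a simplified model under stated assumptions, not Docker's implementation: time is measured in probe counts rather than seconds, and `start_period_checks` is a hypothetical stand-in for `--start-period`.

```python
def health_status(results, retries=3, start_period_checks=0):
    """Return the health status after each probe.

    results: iterable of booleans, True when the probe command exited 0
    retries: consecutive failures needed to become unhealthy
    start_period_checks: number of initial probes whose failures are ignored
    """
    status, failures, statuses = "starting", 0, []
    for i, ok in enumerate(results):
        if ok:
            # Any success marks the container healthy and resets the streak.
            status, failures = "healthy", 0
        elif status == "starting" and i < start_period_checks:
            pass  # failures during the grace period are not counted
        else:
            failures += 1
            if failures >= retries:
                status = "unhealthy"
        statuses.append(status)
    return statuses

probes = [False, False, True, False, False, False]
print(health_status(probes, retries=3, start_period_checks=2))
# ['starting', 'starting', 'healthy', 'healthy', 'healthy', 'unhealthy']
```

A single failed probe never flips the status by itself; only a run of `retries` consecutive failures does, and one later success returns the container to healthy.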
- Each RUN is a layer; chain commands so that deleted files are not stored in an intermediate layer:
RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*
- Use .dockerignore aggressively.
- Pin base image tags: node:20-alpine3.19, not node:20. Alpine variants are smaller, and pinned tags are reproducible.
# Audit image size with dive:
dive myapp:latest # shows layer-by-layer waste
# Anti-pattern: using :latest
docker push myapp:latest # BAD — not reproducible, no audit trail
# Production tagging strategy:
# 1. Always tag with git SHA (immutable, traceable):
docker tag myapp:build myapp:${GIT_SHA}
# 2. Semantic version on release:
docker tag myapp:${GIT_SHA} myapp:2.1.0
docker tag myapp:${GIT_SHA} myapp:2.1 # minor version
docker tag myapp:${GIT_SHA} myapp:2 # major version
# 3. Environment tags (mutable, for CD systems):
docker tag myapp:${GIT_SHA} myapp:latest-staging
docker tag myapp:${GIT_SHA} myapp:latest-prod
# In Kubernetes: always use immutable digest or SHA tag:
image: myapp@sha256:abc123... # most secure — references specific bytes
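The tagging scheme above is mechanical enough to derive in code. The following Python sketch assumes plain MAJOR.MINOR.PATCH versions; `release_tags` is a hypothetical helper, and the SHA value is a placeholder.

```python
import re

def release_tags(version, git_sha):
    """Return the tags for one release build, most specific first.

    The git SHA and the full version are treated as immutable;
    the MAJOR.MINOR and MAJOR tags float to the newest release.
    """
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, _patch = match.groups()
    return [git_sha, version, f"{major}.{minor}", major]

for tag in release_tags("2.1.0", "3f9c2ab"):
    print(f"docker tag myapp:build myapp:{tag}")
```

Rejecting anything that is not a full semantic version keeps values such as `latest` from slipping into the release path.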
Image scanning checks a Docker image for known CVEs in OS packages and application dependencies.
# Trivy (most popular, free, fast):
trivy image myapp:latest
trivy image --severity CRITICAL,HIGH myapp:latest # filter severity
trivy image --exit-code 1 --severity CRITICAL myapp:latest # fail CI on critical
# Example output:
# 2024-01-15T10:00:00Z INFO Vulnerability found
# ┌────────────────┬───────────────┬──────────┬────────┬───────┐
# │ Library │ Vulnerability │ Severity │ Status │ Fixed │
# ├────────────────┼───────────────┼──────────┼────────┼───────┤
# │ libssl3 │ CVE-2024-0727 │ HIGH │ fixed │ 3.0.13│
# └────────────────┴───────────────┴──────────┴────────┴───────┘
# Grype:
grype myapp:latest
# In CI (GitHub Actions):
- uses: aquasecurity/trivy-action@master
with:
image-ref: myapp:${{ github.sha }}
format: table
exit-code: 1
severity: CRITICAL,HIGH
Also scan at: ECR (continuous scanning on push), Harbor (private registry with built-in scanning), Snyk Container.
Cosign (Sigstore project) cryptographically signs container images, enabling verification that an image was built by a trusted CI pipeline and hasn't been tampered with.
# Sign image in CI (keyless using OIDC — no key management):
cosign sign --yes myapp:${GIT_SHA}
# Signature stored in OCI registry alongside image
# Verify image before deploying:
cosign verify \
--certificate-identity https://github.com/myorg/myrepo/.github/workflows/ci.yml@refs/heads/main \
--certificate-oidc-issuer https://token.actions.githubusercontent.com \
myapp:${GIT_SHA}
# Kubernetes: enforce signature verification at admission (Kyverno policy):
# Only allow images that are signed by our CI pipeline
Why it matters — SolarWinds/Log4Shell lessons: An attacker who can push a malicious image to your registry (via compromised credentials) would silently deploy their image. With signing + admission verification, even a registry-push attacker can't get unsigned images into the cluster.
What is docker system prune and how do you manage disk space? (Easy)
# Check Docker disk usage:
docker system df
# Remove stopped containers, unused networks, dangling images, build cache:
docker system prune
# Remove EVERYTHING including volumes:
docker system prune -a --volumes
# Specific cleanup:
docker container prune # remove all stopped containers
docker image prune # remove dangling images (no tag)
docker image prune -a # remove all unused images
docker volume prune # remove unused anonymous volumes (add -a for named ones, Docker 23+)
docker network prune # remove unused networks
docker builder prune # remove build cache
# Production: automate with a daily cron:
# 0 2 * * * docker system prune -f --filter "until=48h"
CI build servers accumulate images rapidly. A build server without cleanup can fill a 500GB disk in days. Set up automated pruning or use a CI runner that recreates a clean environment per job.
docker network create --driver bridge my-network
docker run --network my-network --name api myapp
docker run --network my-network --name db postgres
# api can connect to db at hostname "db"
On user-defined bridge networks (not the default bridge network), Docker provides automatic DNS resolution — containers can reach each other by container name.
# Default bridge network: NO DNS resolution by name
docker run --name api myapp # can't reach "db" by name
# User-defined bridge: DNS resolution works
docker network create app-net
docker run -d --network app-net --name api myapp
docker run -d --network app-net --name db postgres:16
# Inside api container:
getent hosts db # resolves to db container's IP
psql -h db -U myuser mydb # connects using the hostname
# Network aliases:
docker run --network app-net --network-alias database postgres:16
# Can be reached as "database" even with a different container name
Volumes are managed by Docker and stored under /var/lib/docker/volumes/. Best for: persisting data. They are portable, can be backed up, can be shared between containers, and do not depend on the host's directory structure.
# Volume:
docker run -v mydata:/app/data myapp # named volume
docker run --mount type=volume,source=mydata,target=/app/data myapp
# Bind mount (dev hot-reload):
docker run -v $(pwd)/src:/app/src myapp # bind mount
docker run --mount type=bind,source=$(pwd)/src,target=/app/src myapp
# tmpfs:
docker run --tmpfs /tmp:rw,size=100m myapp
docker run --mount type=tmpfs,destination=/tmp myapp
# Method 1: Named volume (both containers mount the same volume)
docker volume create shared-data
docker run -v shared-data:/data --name producer myapp-producer
docker run -v shared-data:/data --name consumer myapp-consumer
# Method 2: --volumes-from (mount all volumes from another container)
docker run --name data-source -v /app/logs myapp
docker run --volumes-from data-source log-shipper
# Method 3: Bind mount to same host path (explicit host directory)
docker run -v /host/shared:/shared container-a
docker run -v /host/shared:/shared container-b
# In Docker Compose (recommended for dev):
services:
app:
volumes: [shared-logs:/app/logs]
log-shipper:
volumes: [shared-logs:/logs:ro] # read-only for shipper
volumes:
shared-logs:
# Map host port to container port:
docker run -p 8080:80 nginx # access at localhost:8080 → container port 80
docker run -p 443:443 nginx # same port number
docker run -p 127.0.0.1:8080:80 nginx # bind to localhost only (not 0.0.0.0)
# Dynamic host port (Docker assigns random host port):
docker run -d --name web -p 80 nginx # Docker assigns e.g. host port 32768
docker port web # see the mapping
# Multiple ports:
docker run -p 8080:8080 -p 8443:8443 myapp
# EXPOSE in Dockerfile (documentation + -P flag):
EXPOSE 8080 # documents the port, doesn't publish it
docker run -P myapp # publishes all EXPOSE'd ports to random host ports
Port mapping is handled by iptables rules (Docker modifies iptables). For production at scale, use a reverse proxy (nginx, Traefik) or load balancer in front rather than direct port mappings per container.
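The three `-p` forms shown above can be read as follows. This Python sketch is a hypothetical parser covering only those forms (no port ranges, protocols, or IPv6), with `0.0.0.0` standing for "all host interfaces".

```python
def parse_publish(spec):
    """Parse a -p value into (host_ip, host_port, container_port).

    host_port is None when Docker is left to pick a free host port.
    """
    parts = spec.split(":")
    if len(parts) == 1:                      # "80"
        return ("0.0.0.0", None, int(parts[0]))
    if len(parts) == 2:                      # "8080:80"
        return ("0.0.0.0", int(parts[0]), int(parts[1]))
    if len(parts) == 3:                      # "127.0.0.1:8080:80"
        return (parts[0], int(parts[1]), int(parts[2]))
    raise ValueError(f"unsupported publish spec: {spec!r}")

for spec in ["8080:80", "127.0.0.1:8080:80", "80"]:
    print(spec, "->", parse_publish(spec))
```

The middle form is the one to remember for security questions: without an explicit host IP, the published port is reachable from every interface of the host.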
What are docker logs and logging drivers? (Medium)
# Default logging: json-file driver (stdout/stderr → JSON files on host)
docker logs my-container # print all logs
docker logs -f my-container # follow
docker logs --since 10m my-container # last 10 minutes
docker logs --tail 100 my-container # last 100 lines
Logging drivers route container logs to different destinations:
- json-file (default): writes to host disk. Set max-size and max-file to prevent disk fill.
- syslog: sends to the system syslog daemon.
- journald: sends to the systemd journal (Linux).
- awslogs: sends directly to CloudWatch Logs.
- fluentd: sends to a Fluentd-compatible collector (Fluentd or Fluent Bit) for aggregation.
- gelf: Graylog Extended Log Format (Graylog, Logstash).
- splunk: sends to the Splunk HTTP Event Collector.
docker run --log-driver awslogs \
--log-opt awslogs-group=/myapp/prod \
--log-opt awslogs-region=us-east-1 \
myapp:latest
Don't rely on docker logs: logs are lost when the container is removed. Use a logging driver or a sidecar (Fluentd) to ship logs to a centralised system.
# Single env var:
docker run -e DB_HOST=postgres -e DB_PORT=5432 myapp
# From an env file (keeps values out of the command line and shell history):
docker run --env-file .env myapp
# .env file contents:
# DB_HOST=postgres
# DB_PASSWORD=mysecretpassword
# From host environment:
export DB_HOST=postgres
docker run -e DB_HOST myapp # passes current host value
# In Docker Compose:
services:
app:
environment:
- DB_HOST=db
- SPRING_PROFILES_ACTIVE=prod
env_file:
- .env.prod # loaded from file
Security: env vars are visible in docker inspect and /proc/1/environ inside the container. For sensitive values (passwords, keys), prefer mounting secrets as files (Docker secrets, K8s Secrets, Vault) — file access is easier to audit and restrict.
Docker Secrets (available in Swarm mode) store sensitive data encrypted at rest and in transit. Secrets are mounted as files in /run/secrets/ inside the container — never as env vars or image layers.
# Create a secret:
echo "mysupersecretpassword" | docker secret create db_password -
# Use in a Swarm service:
docker service create \
--secret db_password \
--env DB_PASSWORD_FILE=/run/secrets/db_password \
myapp:latest
# Application reads secret from file:
// In app code (Java):
String password = Files.readString(Path.of("/run/secrets/db_password")).trim();
Secrets are stored encrypted in the Swarm Raft log, which is replicated across manager nodes. Only services that are explicitly granted a secret can access it. Inside the container, the secret file lives on an in-memory filesystem (tmpfs), so it is never written to disk on the worker node.
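The `DB_PASSWORD_FILE` convention used above is easy to support in application code. A minimal Python sketch, assuming the application accepts either `NAME_FILE` (path to a secret file) or a plain `NAME` variable as a fallback; `read_secret` is a hypothetical helper, and the temporary file only stands in for `/run/secrets/db_password`.

```python
import os
import tempfile
from pathlib import Path

def read_secret(name, env=None):
    """Resolve a secret from NAME_FILE (preferred) or NAME."""
    env = os.environ if env is None else env
    file_path = env.get(f"{name}_FILE")
    if file_path:
        # Secret files usually end with a newline; strip it.
        return Path(file_path).read_text(encoding="utf-8").strip()
    if name in env:
        return env[name]
    raise KeyError(f"neither {name}_FILE nor {name} is set")

# Demonstration with a temporary file in place of /run/secrets/db_password.
with tempfile.TemporaryDirectory() as tmp:
    secret_file = Path(tmp) / "db_password"
    secret_file.write_text("mysupersecretpassword\n", encoding="utf-8")
    demo_env = {"DB_PASSWORD_FILE": str(secret_file)}
    print(read_secret("DB_PASSWORD", demo_env) == "mysupersecretpassword")  # True
```

Preferring the file over the plain variable means the same image works with Swarm secrets in production and with a simple environment variable in local development.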
# Memory limits:
docker run --memory=512m myapp # hard limit: OOMKilled if exceeded
docker run --memory=512m --memory-reservation=256m myapp # soft limit: not a guarantee; applied when the host is under memory pressure
docker run --memory=512m --memory-swap=512m myapp # total memory+swap limit (equal to --memory = swap disabled)
# CPU limits:
docker run --cpus=0.5 myapp # use at most 0.5 CPU cores
docker run --cpus=2 myapp # use at most 2 cores
docker run --cpu-shares=512 myapp # relative weight (default 1024); only matters under CPU contention
docker run --cpuset-cpus=0,2 myapp # pin to specific CPU cores 0 and 2
# Check resource usage:
docker stats # live resource usage for all containers
docker stats my-container # single container
Resource limits are implemented by Linux cgroups. Without limits, a single runaway container can starve all others on the host. Always set limits in production — especially memory, as OOM without limits can kill the Docker daemon itself or cause unpredictable host behaviour.
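Two of these flags are often misread: `--memory-swap` is a total (memory plus swap), and `--cpu-shares` is a ratio rather than a cap. The arithmetic can be sketched in Python; the function names are hypothetical, values are in MiB, and the rules follow the documented flag behaviour as I understand it.

```python
def swap_allowance(memory_mb, memory_swap_mb=None):
    """Return how much swap (MiB) a container may use, or None for unlimited.

    memory_mb: value of --memory
    memory_swap_mb: value of --memory-swap (None or 0 = unset, -1 = unlimited)
    """
    if memory_swap_mb in (None, 0):
        return memory_mb            # unset: swap defaults to the same size as --memory
    if memory_swap_mb == -1:
        return None                 # unlimited swap
    if memory_swap_mb < memory_mb:
        raise ValueError("--memory-swap must be at least --memory")
    return memory_swap_mb - memory_mb

def cpu_fractions(shares):
    """Split CPU time by --cpu-shares when every container is busy."""
    total = sum(shares.values())
    return {name: weight / total for name, weight in shares.items()}

print(swap_allowance(512, 512))    # 0
print(swap_allowance(512, 1024))   # 512
print(swap_allowance(512))         # 512
print(cpu_fractions({"api": 1024, "batch": 512}))
```

With the shares above, `api` gets about two thirds of the CPU only while `batch` is also competing for it; an idle neighbour leaves the whole CPU available.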
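Docker Swarm is Docker's built-in orchestrator, enabled with a single command (docker swarm init). It uses the same Docker CLI and has built-in load balancing, but is less feature-rich than Kubernetes.
| Feature | Swarm | Kubernetes |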
|---|---|---|
| Setup complexity | Minutes | Hours/days |
| Auto-scaling | Manual | HPA, VPA, KEDA |
| Rolling updates | Basic | Advanced (Argo Rollouts) |
| Storage | Volumes | PV/PVC/StorageClass/CSI |
| Industry adoption | Declining | Dominant |
Swarm is suitable for simple small-scale deployments where you want Docker primitives without K8s complexity. Kubernetes is the choice for production at scale.
Docker Compose defines and runs multi-container applications using a YAML file. One command starts all services with correct networking and dependencies.
docker compose up -d # start all services in background
docker compose down # stop and remove containers/networks
docker compose down -v # also remove volumes
docker compose logs -f api # follow logs for a service
docker compose ps # status of all services
docker compose exec api bash # shell into running container
docker compose restart api # restart one service
When to use:
docker stack deploy)
Docker Compose V2 is a Go rewrite that replaced the Python-based V1 (which reached end of life in 2023) and is much faster. The command changed from docker-compose to docker compose (no hyphen).
services:
api:
image: myapp:${APP_VERSION:-latest}
build:
context: .
dockerfile: Dockerfile
target: runtime
ports: ["8080:8080"]
environment:
SPRING_DATASOURCE_URL: jdbc:postgresql://db:5432/myapp
SPRING_DATASOURCE_USERNAME: ${DB_USER}
SPRING_DATASOURCE_PASSWORD: ${DB_PASSWORD}
SPRING_REDIS_HOST: redis
depends_on:
db:
condition: service_healthy
redis:
condition: service_healthy
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8080/actuator/health"]
interval: 30s
timeout: 10s
retries: 3
start_period: 60s
restart: unless-stopped
deploy:
resources:
limits:
cpus: '1.0'
memory: 512M
db:
image: postgres:16-alpine
volumes: [pg-data:/var/lib/postgresql/data]
environment:
POSTGRES_DB: myapp
POSTGRES_USER: ${DB_USER}
POSTGRES_PASSWORD: ${DB_PASSWORD}
healthcheck:
test: ["CMD-SHELL", "pg_isready -U ${DB_USER} -d myapp"]
interval: 10s
timeout: 5s
retries: 5
restart: unless-stopped
redis:
image: redis:7-alpine
command: redis-server --appendonly yes --maxmemory 256mb
volumes: [redis-data:/data]
healthcheck:
test: ["CMD", "redis-cli", "ping"]
interval: 10s
restart: unless-stopped
volumes:
pg-data:
redis-data:
What is depends_on and what are its limitations? (Medium)
services:
api:
depends_on:
db:
condition: service_started # default: db container started (not ready!)
db:
image: postgres:16
Limitation: service_started only waits for the container to start, not for the service to be ready. Postgres takes ~5s to accept connections after starting.
# Fix: use service_healthy (requires healthcheck on the dependency):
depends_on:
db:
condition: service_healthy # waits until postgres healthcheck passes
redis:
condition: service_healthy
# OR: implement retry logic in your app (recommended for robustness):
# Spring Boot: spring.datasource.hikari.connection-timeout=30000
# Spring Retry on @Repository startup, etc.
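The retry idea is language-independent. As a minimal sketch in Python, the function below waits for a TCP port with exponential backoff; `wait_for_port` and its defaults are hypothetical, and an open port is only a proxy for readiness, since a server can accept connections before it can serve queries.

```python
import socket
import time

def wait_for_port(host, port, attempts=10, delay=0.5, backoff=2.0, timeout=1.0):
    """Return True once a TCP connection succeeds, False if every attempt fails.

    delay: seconds to sleep after the first failure
    backoff: multiplier applied to the delay after each failure
    """
    for attempt in range(attempts):
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            if attempt < attempts - 1:   # no sleep after the final attempt
                time.sleep(delay)
                delay *= backoff
    return False
```

An application entrypoint could call `wait_for_port("db", 5432)` before opening its connection pool and exit non-zero on `False`, so that a restart policy or orchestrator retries the whole container.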
# Base: docker-compose.yml (shared config)
services:
api:
image: myapp:${VERSION}
environment:
SPRING_PROFILES_ACTIVE: dev
# Override: docker-compose.override.yml (auto-merged for `docker compose up`)
services:
api:
build: . # build locally in dev
volumes:
- ./src:/app/src # hot-reload source
# Production: docker-compose.prod.yml
services:
api:
restart: always
deploy:
replicas: 3
# Explicit merge order:
docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d
# Test/CI:
docker compose -f docker-compose.yml -f docker-compose.test.yml up -d
docker compose exec api mvn test
docker compose down
# Scale a service to N replicas:
docker compose up -d --scale api=3
# Define scale in compose file:
services:
api:
image: myapp:latest
deploy:
replicas: 3
# Check scaled instances:
docker compose ps
# NAME IMAGE STATUS PORTS
# myapp-api-1 myapp Up
# myapp-api-2 myapp Up
# myapp-api-3 myapp Up
Load balancing when scaling: When multiple replicas run, Docker's internal DNS load-balances between them. However, for externally exposed ports (ports: mapping), you can't scale services that map to a fixed host port (port conflict). Use:
# docker-compose.test.yml
services:
app:
build: .
depends_on:
db:
condition: service_healthy
kafka:
condition: service_healthy
command: ["mvn", "verify", "-Pintegration"]
db:
image: postgres:16-alpine
environment:
POSTGRES_DB: testdb
POSTGRES_USER: test
POSTGRES_PASSWORD: test
healthcheck:
test: ["CMD-SHELL", "pg_isready -U test"]
interval: 5s
retries: 10
kafka:
image: confluentinc/cp-kafka:7.6.1
environment:
KAFKA_PROCESS_ROLES: broker,controller
KAFKA_NODE_ID: 1
healthcheck:
test: kafka-topics --list --bootstrap-server localhost:9092
interval: 10s
retries: 5
# In CI (GitHub Actions):
- name: Run integration tests
run: |
docker compose -f docker-compose.test.yml up \
--abort-on-container-exit \
--exit-code-from app
docker compose -f docker-compose.test.yml down -v
--abort-on-container-exit: stops all containers when any exits. --exit-code-from app: CI fails if the app container exits with non-zero.
By default, containers run as root (UID 0) inside the container. If the container escapes (via a kernel vulnerability or misconfigured mount), the attacker has root on the host.
Risks:
# Fix 1: Create and use a non-root user in Dockerfile:
RUN groupadd --gid 1001 appgroup && \
useradd --uid 1001 --gid appgroup --no-create-home appuser
USER appuser
# Fix 2: Run as non-root at docker run:
docker run --user 1001:1001 myapp
# Fix 3: Enable user namespaces (remap root in container to non-root on host):
# In /etc/docker/daemon.json:
{"userns-remap": "default"}
# Fix 4: Kubernetes securityContext:
securityContext:
runAsNonRoot: true
runAsUser: 1001
A privileged container (--privileged) has almost all host capabilities — it can access all devices, load kernel modules, modify host network settings. It effectively has root on the host.
# Never do this unless absolutely required:
docker run --privileged myapp # DANGEROUS
# Legitimate uses (rare):
# - Docker-in-Docker (DinD) for CI runners that build Docker images
# - Security tools that need direct hardware/kernel access (Falco with kernel module)
# - CNI plugin setup during K8s node initialisation
# Safer alternatives:
# If you need specific capabilities, add only those:
docker run --cap-add NET_ADMIN --cap-drop ALL mynetwork-tool
# If you need device access, mount only specific device:
docker run --device /dev/sda:/dev/sda myapp
In Kubernetes: reject privileged containers using Pod Security Admission (Restricted or Baseline policy). Policy-as-Code tools (OPA, Kyverno) can audit and block privileged container requests.
Linux capabilities divide root privileges into discrete units. Docker containers start with a limited set of capabilities. You can add or remove specific capabilities rather than granting full root.
# Default capabilities Docker grants (partial list):
# CHOWN, DAC_OVERRIDE, SETGID, SETUID, NET_BIND_SERVICE, KILL, MKNOD
# Production security: drop ALL capabilities, add only what's needed:
# NET_BIND_SERVICE is needed to bind to port 80:
docker run \
  --cap-drop ALL \
  --cap-add NET_BIND_SERVICE \
  mywebserver
# Kubernetes equivalent (container securityContext):
securityContext:
capabilities:
drop: ["ALL"]
add: ["NET_BIND_SERVICE"]
# Common capabilities you might need:
# NET_BIND_SERVICE: bind to ports < 1024 (just run on 8080 instead)
# SYS_PTRACE: needed for debugging tools (not in prod)
# NET_ADMIN: network configuration (only for network tools)
Seccomp (Secure Computing Mode) filters which Linux system calls a container can make. Docker applies a default seccomp profile that blocks ~44 syscalls rarely needed by containers but dangerous if exploited.
# Docker applies the default seccomp profile automatically.
# To run without seccomp (NOT recommended):
docker run --security-opt seccomp=unconfined myapp
# Apply a custom seccomp profile:
docker run --security-opt seccomp=/path/to/my-profile.json myapp
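# Example: block ptrace (prevents process inspection).
# Allow-by-default is used here for brevity; Docker's default profile is deny-by-default.
{
  "defaultAction": "SCMP_ACT_ALLOW",
  "syscalls": [
    {"names": ["ptrace", "process_vm_readv"], "action": "SCMP_ACT_ERRNO"}
  ]
}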
# In Kubernetes (seccompProfile is GA since 1.19; RuntimeDefault applies only when set explicitly or when the kubelet enables --seccomp-default):
securityContext:
seccompProfile:
type: RuntimeDefault # use container runtime's default seccomp profile
Combined with AppArmor or SELinux: defence-in-depth for container security. If an attacker exploits a vulnerability in your app, seccomp limits what they can do with that code execution.
Docker Content Trust uses Notary (TUF-based) to sign and verify images. When enabled, Docker only pulls/runs signed images.
# Enable DCT for the current shell:
export DOCKER_CONTENT_TRUST=1
# Now pull will verify the image is signed:
docker pull myapp:1.0 # fails if not signed
# Sign an image on push:
docker trust sign myapp:1.0 # prompts for signing key passphrase
# Inspect trust data:
docker trust inspect --pretty myapp:1.0
# Revoke a compromised image version:
docker trust revoke myapp:1.0
DCT (Notary v1) is being superseded by Cosign (Sigstore) for new deployments. Cosign doesn't require a separate Notary server — signatures are stored in the OCI registry itself alongside the image.
# Run container with read-only root filesystem:
docker run --read-only myapp
# App needs to write somewhere? Mount specific writable directories
# (--tmpfs for in-memory temp files, -v for persistent logs).
# Comments cannot follow a trailing backslash, so they are kept up here:
docker run --read-only \
  --tmpfs /tmp:rw,size=50m \
  -v /app/logs:/app/logs \
  myapp
# Kubernetes equivalent (container-level fields):
securityContext:
  readOnlyRootFilesystem: true
volumeMounts:
  - name: tmp
    mountPath: /tmp
# Pod-level volume backing the mount:
volumes:
  - name: tmp
    emptyDir: {} # or emptyDir: {medium: Memory}
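A quick way to confirm the setup from inside the container is to probe which directories accept writes. This is a sketch only: can_write is a hypothetical helper, and the paths are the ones used in the docker run example above.

```shell
#!/bin/sh
# can_write DIR: prints "writable" if a file can be created in DIR,
# otherwise "not writable" (read-only mount, missing directory, or no permission).
can_write() {
  probe="$1/.write-probe.$$"
  # The subshell keeps a failed redirection from aborting the calling shell.
  if ( : > "$probe" ) 2>/dev/null; then
    rm -f "$probe"
    echo "writable"
  else
    echo "not writable"
  fi
}

for dir in / /tmp /app/logs; do
  echo "$dir: $(can_write "$dir")"
done
```

With --read-only in effect, the root directory should report not writable while /tmp and /app/logs stay writable.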
Security benefit:
# Step 1: check restart count and last exit code
docker inspect --format 'restarts={{.RestartCount}} exit={{.State.ExitCode}} oom={{.State.OOMKilled}}' my-container
# Look for: exit=137 (SIGKILL, often OOMKilled), 1 (app error), 0 (unexpected clean exit)
# Step 2: check logs (Docker keeps a container's logs across restarts)
docker logs --tail 200 my-container
# Note: --previous is a kubectl logs flag; docker logs has no such option
# Step 3: check events
docker events --filter container=my-container
# Common exit codes:
# 0 → container exited cleanly (bad if restart=always intended continuous service)
# 1 → application error
# 127 → command not found (wrong ENTRYPOINT path or missing binary)
# 130 → SIGINT (Ctrl+C)
# 137 → SIGKILL (OOMKilled — memory limit exceeded, or docker kill)
# 139 → Segfault
# 143 → SIGTERM (graceful termination)
# Run with no restart policy to debug interactively:
docker run --rm -it myapp bash # replaces CMD only; if ENTRYPOINT is set, bash is passed to it as an argument
docker run --rm -it --entrypoint /bin/sh myapp # overrides ENTRYPOINT
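Exit codes above 128 follow the convention 128 + signal number, which is where 137 (128 + 9, SIGKILL) and 143 (128 + 15, SIGTERM) come from. The sketch below decodes that convention. The helper name explain_exit_code is hypothetical.

```shell
#!/bin/sh
# explain_exit_code CODE: prints a short interpretation of a container exit code.
# (hypothetical helper)
explain_exit_code() {
  code=$1
  case "$code" in
    0)   echo "clean exit" ;;
    1)   echo "application error" ;;
    127) echo "command not found" ;;
    *)
      if [ "$code" -gt 128 ]; then
        # Codes above 128 mean the process was terminated by a signal.
        echo "terminated by signal $((code - 128))"
      else
        echo "application-defined exit code"
      fi
      ;;
  esac
}

for code in 0 127 130 137 139 143; do
  echo "$code: $(explain_exit_code "$code")"
done
```

The helper can be fed directly from Docker, for example with the output of docker inspect --format '{{.State.ExitCode}}' my-container.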
The Docker daemon socket (/var/run/docker.sock) is effectively root on the host. Anyone with access can run privileged containers, mount the host filesystem, and escape to the host.
# NEVER mount the Docker socket in a container unless absolutely necessary:
docker run -v /var/run/docker.sock:/var/run/docker.sock myapp # DANGEROUS
# If you must (e.g. CI runners building Docker images), alternatives:
# 1. Docker-in-Docker (DinD): dedicated container with --privileged
# - Still privileged, but isolated
# 2. Kaniko: builds images in userspace without a daemon or --privileged
#    (the executor itself runs as root inside its own container)
#    docker run gcr.io/kaniko-project/executor:latest
# 3. Buildah: rootless builds
# 4. img (Rootless Docker build)
# Protect the daemon:
# - Enable TLS for remote Docker API access
# - Use --tlsverify, --tlscacert, --tlscert, --tlskey
# - Restrict membership of the "docker" group (members are effectively root on the host)
# - Enable rootless mode (dockerd runs as non-root user)
# - Use user namespaces (userns-remap)
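As a simple audit step, a container can check whether the daemon socket has been mounted into it. This is a minimal sketch that assumes the default socket path. The helper name check_docker_socket is hypothetical.

```shell
#!/bin/sh
# check_docker_socket [PATH]: reports whether a Docker daemon socket is present
# at PATH (default /var/run/docker.sock) and whether this user can write to it.
# (hypothetical helper)
check_docker_socket() {
  sock=${1:-/var/run/docker.sock}
  if [ -S "$sock" ]; then
    if [ -w "$sock" ]; then
      echo "mounted and writable"
    else
      echo "mounted, not writable"
    fi
  else
    echo "not mounted"
  fi
}

echo "docker.sock: $(check_docker_socket)"
```

A result of mounted and writable means any process in that container can drive the daemon, which is equivalent to root on the host.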
Container escape allows an attacker to break out of container isolation and access the host or other containers. Containers share the host kernel — a kernel vulnerability can be exploited from inside a container.
Known escape vectors:
Prevention layers:
docker save vs docker export?
docker save: exports an image (with all layers and metadata) to a tar file. Can be reloaded with docker load. Preserves the full image history, tags, and layers.
docker export: exports a running or stopped container's filesystem as a flat tar (no layers, no history). Reloaded with docker import. Loses image history and metadata.
# Save an image for air-gapped transfer:
docker save myapp:1.0 | gzip > myapp-1.0.tar.gz
# Transfer to air-gapped machine:
docker load < myapp-1.0.tar.gz
# Export a container's current filesystem state:
docker export my-container > container-snapshot.tar
cat container-snapshot.tar | docker import - mynewimage:latest
# Use case for export: snapshot a modified container that you want to
# turn into a new image (rare — prefer building from Dockerfile)
Graceful shutdown: the container finishes in-flight requests before terminating. Docker sends SIGTERM → waits stopTimeout → sends SIGKILL.
# Requirements:
# 1. Process must be PID 1 (to receive SIGTERM)
# 2. App must handle SIGTERM
# Exec form ensures PID 1 is your app (not shell):
ENTRYPOINT ["java", "-jar", "app.jar"] # java is PID 1
# Shell form means /bin/sh is PID 1 (shell doesn't forward SIGTERM!):
ENTRYPOINT java -jar app.jar # BAD — java never receives SIGTERM
# Spring Boot: set graceful shutdown in application.properties:
server.shutdown=graceful
spring.lifecycle.timeout-per-shutdown-phase=30s
# Docker stop timeout (default 10s):
docker stop -t 60 my-container # give the app 60s to shut down
# Or in compose:
services:
  api:
    stop_grace_period: 60s
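# Use tini as a minimal init for signal forwarding and zombie reaping
# (multi-process containers); docker run --init injects it without image changes:
FROM mybase
RUN apt-get update && apt-get install -y --no-install-recommends tini \
    && rm -rf /var/lib/apt/lists/*
ENTRYPOINT ["/usr/bin/tini", "--", "java", "-jar", "app.jar"]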
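The same PID 1 concern applies when an image uses a wrapper entrypoint script: unless the script ends with exec, the shell stays in front of the app and receives the signals. The sketch below illustrates that exec keeps the process ID, so the app inherits the wrapper's PID (which is 1 in a container). The helper name pid_after_exec is hypothetical.

```shell
#!/bin/sh
# Starts a wrapper shell that records its own PID, then replaces itself with a
# second shell via exec. The second shell reports whether it has that same PID.
# (hypothetical helper for illustration)
pid_after_exec() {
  sh -c 'wrapper=$$; exec sh -c "[ \$\$ -eq $wrapper ] && echo same || echo different"'
}

echo "PID after exec: $(pid_after_exec)"
```

For this reason a typical entrypoint script does its setup and then ends with exec "$@", so that the CMD arguments become the signal-receiving process.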
Docker CLI → dockerd / Kubernetes kubelet
↓ gRPC (containerd API for dockerd; CRI - Container Runtime Interface for the kubelet)
containerd ← high-level runtime (image pull, snapshot management)
↓ OCI runtime spec
runc ← low-level runtime (actually creates namespaces/cgroups, starts process)
↓
Linux kernel (namespaces, cgroups, seccomp)
Alternative OCI runtimes (replace runc):