Kubernetes Deployments: Rolling Updates, Rollbacks and Scaling (2026)

Deployments are the workhorse of Kubernetes for running stateless applications. They wrap ReplicaSets with rich update semantics — declarative rolling updates, one-command rollbacks, revision history, and horizontal scaling. Understanding every knob on the Deployment spec separates engineers who can confidently ship to production from those who cause incidents during deploys.

Deployment Spec Deep Dive
Rolling Update Strategy
Rollbacks and History
Manual and Automatic Scaling
Recreate vs Rolling Update
Canary Deployment Pattern
Blue/Green Deployments
Frequently Asked Questions

Deployment Spec Deep Dive

A Deployment manages a desired number of identical pod replicas. Under the hood it creates and manages ReplicaSets — when you update the pod template, a new ReplicaSet is created and the old one is scaled down.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
  namespace: production
  labels:
    app: api-server
spec:
  replicas: 4

  # Pods are selected by this label — must match pod template labels
  selector:
    matchLabels:
      app: api-server

  # Retain last 10 ReplicaSets for rollback history
  revisionHistoryLimit: 10

  # Seconds the rollout may go without making progress before it is marked as failed (default 600)
  progressDeadlineSeconds: 600

  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # Allow 1 extra pod above replicas during update
      maxUnavailable: 0  # Never reduce below desired replica count

  template:
    metadata:
      labels:
        app: api-server
      annotations:
        # Trigger a rolling restart when a ConfigMap changes by updating this annotation
        checksum/config: "abc123"
    spec:
      containers:
      - name: api
        image: myrepo/api-server:2.1.0
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: "250m"
            memory: "256Mi"
          limits:
            cpu: "1"
            memory: "512Mi"
        readinessProbe:
          httpGet:
            path: /health/ready
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
          failureThreshold: 3
        livenessProbe:
          httpGet:
            path: /health/live
            port: 8080
          periodSeconds: 10
          failureThreshold: 3

Rolling Update Strategy

The rolling update strategy replaces old pods with new ones gradually, keeping the application available throughout. The two key parameters are:

maxSurge — how many extra pods above replicas can exist during the update. Can be an integer or percentage (e.g., 25%).
maxUnavailable — how many pods below replicas are acceptable during the update. Can also be a percentage.

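Production recommendation: Set maxUnavailable: 0 and maxSurge: 1 (or 25%). This ensures the number of available pods never drops below the desired replica count during a deploy. The trade-off is that the cluster needs spare capacity for the surge pods while the update runs. For cost-sensitive clusters where capacity is tight, use maxSurge: 0, maxUnavailable: 1 instead. This terminates an old pod before creating its replacement, so no extra capacity is needed, but the service runs one pod short while each replacement starts.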
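
To make the two parameters concrete, here is a small sketch of the arithmetic. It computes the lowest number of available pods and the highest total number of pods allowed during a rollout. Percentages are resolved against replicas, with maxSurge rounded up and maxUnavailable rounded down. The function names are illustrative only, and the error handling is a simplification of what the API server does.

```python
def resolve(value, replicas, round_up):
    """Turn an int or a percentage string such as "25%" into a pod count."""
    if isinstance(value, str) and value.endswith("%"):
        product = replicas * int(value[:-1])
        # Integer arithmetic avoids floating-point surprises.
        return -(-product // 100) if round_up else product // 100
    return int(value)


def rollout_bounds(replicas, max_surge, max_unavailable):
    """Return (min_available, max_total) pod counts allowed during a rollout."""
    surge = resolve(max_surge, replicas, round_up=True)  # percentages round up
    unavailable = resolve(max_unavailable, replicas, round_up=False)  # round down
    if surge == 0 and unavailable == 0:
        # Simplification: with both at 0 the rollout could never make progress.
        raise ValueError("maxSurge and maxUnavailable cannot both be 0")
    return replicas - unavailable, replicas + surge


print(rollout_bounds(4, 1, 0))           # (4, 5)
print(rollout_bounds(4, 0, 1))           # (3, 4)
print(rollout_bounds(10, "25%", "25%"))  # (8, 13)
```

The first call is the production recommendation: never fewer than 4 available pods, never more than 5 in total. The second is the tight-capacity variant: never more than 4 pods, but as few as 3 available.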

# Trigger a rolling update by changing the image
kubectl set image deployment/api-server api=myrepo/api-server:2.2.0 -n production

# Watch the rollout progress
kubectl rollout status deployment/api-server -n production
# Waiting for deployment "api-server" rollout to finish: 1 out of 4 new replicas have been updated...
# Waiting for deployment "api-server" rollout to finish: 2 out of 4 new replicas have been updated...
# Waiting for deployment "api-server" rollout to finish: 3 out of 4 new replicas have been updated...
# Waiting for deployment "api-server" rollout to finish: 1 old replicas are pending termination...
# deployment "api-server" successfully rolled out

# See which ReplicaSets exist (old ones are scaled to 0)
kubectl get rs -n production -l app=api-server
# NAME                     DESIRED   CURRENT   READY   AGE
# api-server-7d9f8b6c4d   4         4         4       2m   <-- new
# api-server-6c8b7a5d3e   0         0         0       3d   <-- old

Rollbacks and History

Kubernetes keeps a configurable number of old ReplicaSets to enable instant rollbacks. No re-building of images required — the old ReplicaSet already has the correct pod template.

# View rollout history
kubectl rollout history deployment/api-server -n production
# REVISION  CHANGE-CAUSE
# 1         Initial deployment
# 2         Update to v2.1.0
# 3         Update to v2.2.0

# Annotate the reason for a change (shows up in history)
kubectl annotate deployment/api-server kubernetes.io/change-cause="Update to v2.2.0" -n production

# Inspect a specific revision
kubectl rollout history deployment/api-server --revision=2 -n production

# Rollback to previous revision
kubectl rollout undo deployment/api-server -n production

# Rollback to a specific revision
kubectl rollout undo deployment/api-server --to-revision=1 -n production

# Watch the rollback
kubectl rollout status deployment/api-server -n production

Note: revisionHistoryLimit defaults to 10. Each retained revision keeps a ReplicaSet object in etcd. For clusters with many deployments and frequent deploys, reduce this to 3 to 5 to keep etcd lean. Revisions that have been pruned can no longer be targeted by kubectl rollout undo.

Manual and Automatic Scaling

Manual Scaling

# Scale a deployment
kubectl scale deployment/api-server --replicas=8 -n production

# Conditional scale (only if current replicas match)
kubectl scale deployment/api-server --replicas=8 --current-replicas=4 -n production

Horizontal Pod Autoscaler (HPA)

HPA automatically adjusts replicas based on observed metrics. The most common trigger is CPU utilisation, but custom metrics (from Prometheus) and external metrics are also supported.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 4
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60   # target 60% CPU across all pods
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # wait 5 min before scaling down
      policies:
      - type: Pods
        value: 2
        periodSeconds: 60   # remove at most 2 pods per minute
    scaleUp:
      stabilizationWindowSeconds: 0    # scale up immediately
      policies:
      - type: Pods
        value: 4
        periodSeconds: 60   # add at most 4 pods per minute

# Check HPA status
kubectl get hpa -n production
# NAME             REFERENCE              TARGETS          MINPODS  MAXPODS  REPLICAS
# api-server-hpa   Deployment/api-server  45%/60%, 30%/70% 4        20       4
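
The replica count in that output follows from the scaling rule HPA documents: for each metric it proposes ceil(currentReplicas * currentValue / targetValue), takes the largest proposal, and clamps it to minReplicas and maxReplicas. The sketch below reproduces that arithmetic. It is a simplified model, not the controller's code: it ignores the behavior policies and stabilization windows shown above, as well as pods that are not ready or have missing metrics. The 10% tolerance is the documented default.

```python
import math


def desired_replicas(current_replicas, metrics, min_replicas, max_replicas, tolerance=0.1):
    """metrics is a list of (current_value, target_value) pairs, e.g. utilisation percentages."""
    proposals = []
    for current_value, target_value in metrics:
        ratio = current_value / target_value
        if abs(ratio - 1.0) <= tolerance:
            # Close enough to target: this metric proposes no change.
            proposals.append(current_replicas)
        else:
            proposals.append(math.ceil(current_replicas * ratio))
    wanted = max(proposals)  # the metric asking for the most replicas wins
    return max(min_replicas, min(max_replicas, wanted))


# The status output above: CPU 45%/60%, memory 30%/70%, 4 replicas, min 4, max 20.
print(desired_replicas(4, [(45, 60), (30, 70)], 4, 20))  # 4
# CPU climbs to 90% against the 60% target.
print(desired_replicas(4, [(90, 60), (30, 70)], 4, 20))  # 6
```

In the first case the raw proposal is 3 replicas, but minReplicas: 4 keeps the Deployment at 4. In the second case the scaleUp policy above (at most 4 pods per minute) would still allow the jump from 4 to 6 in one step.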

Recreate vs Rolling Update

The Recreate strategy terminates all existing pods before creating new ones. This causes downtime but is necessary when your new version is incompatible with the old one (e.g., incompatible database schema changes that cannot run with the old code simultaneously).

strategy:
  type: Recreate
# No rollingUpdate block needed

Use Recreate for:

Stateful apps that cannot run two versions concurrently
Apps that grab an exclusive lock on startup
Batch jobs where a clean slate is required

Canary Deployment Pattern

A canary release routes a small percentage of traffic to the new version while the majority still goes to the stable version. In Kubernetes without a service mesh, you approximate this with two Deployments whose pods carry a common label that a single Service selects.

# stable deployment — 9 replicas
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server-stable
  namespace: production
spec:
  replicas: 9
  selector:
    matchLabels:
      app: api-server
      track: stable   # keep Deployment selectors disjoint; only the Service matches both
  template:
    metadata:
      labels:
        app: api-server
        track: stable
    spec:
      containers:
      - name: api
        image: myrepo/api-server:2.1.0
---
# canary deployment — 1 replica (~10% traffic)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server-canary
  namespace: production
spec:
  replicas: 1
  selector:
    matchLabels:
      app: api-server
      track: canary   # keep Deployment selectors disjoint; only the Service matches both
  template:
    metadata:
      labels:
        app: api-server
        track: canary
    spec:
      containers:
      - name: api
        image: myrepo/api-server:2.2.0
---
# Single Service selects pods from BOTH deployments
apiVersion: v1
kind: Service
metadata:
  name: api-server
  namespace: production
spec:
  selector:
    app: api-server   # matches both stable and canary pods
  ports:
  - port: 80
    targetPort: 8080

Tip: For fine-grained traffic splitting (e.g., exactly 5% to canary), use a service mesh like Istio or Linkerd with VirtualService/TrafficSplit resources, or use NGINX Ingress with canary annotations. The multi-deployment approach is only approximate (based on pod count ratio).
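
The pod count ratio is easy to work out by hand, but a short sketch shows why small percentages get expensive with this approach. It assumes the Service spreads requests evenly across ready pods, which long-lived connections can skew in practice. The function names are illustrative only.

```python
import math
from fractions import Fraction


def canary_share(stable_replicas, canary_replicas):
    """Approximate fraction of traffic reaching the canary pods."""
    return Fraction(canary_replicas, stable_replicas + canary_replicas)


def stable_replicas_for(share, canary_replicas=1):
    """Stable replicas needed so the canary pods receive roughly `share` of traffic.

    Pass share as a string ("0.05") or a Fraction so the value stays exact.
    """
    share = Fraction(share)
    if not 0 < share <= 1:
        raise ValueError("share must be in (0, 1]")
    total_pods = math.ceil(Fraction(canary_replicas) / share)
    return total_pods - canary_replicas


print(float(canary_share(9, 1)))    # 0.1
print(stable_replicas_for("0.05"))  # 19
print(stable_replicas_for("0.01"))  # 99
```

A 5% canary needs 19 stable pods for every canary pod, and a 1% canary needs 99, which is usually the point where weighted routing in a mesh or ingress controller becomes the more practical option.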

Blue/Green Deployments

In a blue/green deployment, both versions run simultaneously but only one receives traffic. Switching over is instant — you just update the Service selector. If the new version has issues, switching back is equally instant.

# Blue deployment (currently live)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server-blue
  namespace: production
spec:
  replicas: 4
  selector:
    matchLabels:
      app: api-server
      slot: blue
  template:
    metadata:
      labels:
        app: api-server
        slot: blue
    spec:
      containers:
      - name: api
        image: myrepo/api-server:2.1.0
---
# Green deployment (new version, warming up)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server-green
  namespace: production
spec:
  replicas: 4
  selector:
    matchLabels:
      app: api-server
      slot: green
  template:
    metadata:
      labels:
        app: api-server
        slot: green
    spec:
      containers:
      - name: api
        image: myrepo/api-server:2.2.0
---
# Service — switch by changing slot: blue to slot: green
apiVersion: v1
kind: Service
metadata:
  name: api-server
  namespace: production
spec:
  selector:
    app: api-server
    slot: blue    # change to 'green' to cut over
  ports:
  - port: 80
    targetPort: 8080

# Instant cutover: patch the Service selector
kubectl patch service api-server -n production \
  -p '{"spec":{"selector":{"app":"api-server","slot":"green"}}}'

# Instant rollback if issues found
kubectl patch service api-server -n production \
  -p '{"spec":{"selector":{"app":"api-server","slot":"blue"}}}'
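
If the cutover runs from a script rather than by hand, generating the patch body avoids typos in hand-written JSON. The sketch below builds the same payload that the kubectl patch commands above pass to -p. The helper name cutover_patch is hypothetical, and the slot names are the ones used in this example.

```python
import json


def cutover_patch(target_slot, app="api-server"):
    """Return the JSON patch body that points the Service at target_slot."""
    if target_slot not in ("blue", "green"):
        raise ValueError(f"unknown slot: {target_slot!r}")
    patch = {"spec": {"selector": {"app": app, "slot": target_slot}}}
    # Compact separators give the same one-line form used on the command line.
    return json.dumps(patch, separators=(",", ":"))


print(cutover_patch("green"))
# {"spec":{"selector":{"app":"api-server","slot":"green"}}}
```

Rejecting unknown slot names matters here: a Service whose selector matches no pods has no endpoints, so a misspelled slot would take the application offline.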

Frequently Asked Questions

What is the difference between a Deployment and a ReplicaSet?

A ReplicaSet ensures a specified number of pod replicas are running at all times, but it has no concept of update history or rolling updates. A Deployment wraps and manages ReplicaSets, adding declarative update semantics, rollback capability, and revision history. You should almost always use Deployments rather than creating ReplicaSets directly.

How do I force a rolling restart without changing the image?

Use kubectl rollout restart deployment/api-server -n production. This triggers a rolling restart by adding a kubectl.kubernetes.io/restartedAt annotation to the pod template, which counts as a template change and triggers a new rollout.

My deployment is stuck with some old pods still running. What do I check?

First run kubectl rollout status deployment/api-server to see if it's progressing. Then check kubectl describe deployment/api-server for conditions. Common causes: new pods failing readiness probes (so the rollout doesn't proceed), insufficient cluster resources to schedule new pods, or an image pull error. Check new pod logs with kubectl logs and events with kubectl get events.

Can HPA and manual scaling conflict?

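Yes. If you manually set replicas with kubectl scale and HPA is active, HPA will override your manual setting on its next evaluation cycle (every 15 seconds by default). Treat HPA as the authority for replica count when it's enabled. HPA has no built-in pause switch, so to pin the replica count temporarily, set minReplicas and maxReplicas to the same value or remove the HPA for the duration. Scaling to zero is a special case: once the target's replica count is set to 0, HPA stops adjusting it until you scale it back up manually.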

What is progressDeadlineSeconds and why does it matter?

progressDeadlineSeconds (default 600 seconds) is the maximum time Kubernetes will wait for a Deployment rollout to make progress before marking it as failed with a ProgressDeadlineExceeded condition. This is important for CI/CD pipelines — you can use kubectl rollout status --timeout=10m to block the pipeline until the rollout succeeds or fails, rather than polling indefinitely.
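
As a rough illustration of that pipeline step, the sketch below wraps kubectl rollout status in a gate that relies on its exit code, which is non-zero when the rollout times out or exceeds its progress deadline. Rolling back automatically on failure is a policy choice assumed here, not something Kubernetes does for you, and the function names are hypothetical. The run parameter exists so the logic can be exercised without a cluster.

```python
import subprocess


def rollout_status_cmd(name, namespace, timeout="10m"):
    """Build the kubectl command that blocks until the rollout finishes or fails."""
    return ["kubectl", "rollout", "status", f"deployment/{name}",
            "-n", namespace, f"--timeout={timeout}"]


def rollout_undo_cmd(name, namespace):
    """Build the kubectl command that reverts to the previous revision."""
    return ["kubectl", "rollout", "undo", f"deployment/{name}", "-n", namespace]


def gate(name, namespace, run=subprocess.run):
    """Return True if the rollout completed; otherwise roll back and return False."""
    result = run(rollout_status_cmd(name, namespace))
    if result.returncode == 0:
        return True
    # Assumed policy: revert on any failure so the pipeline leaves a working version.
    run(rollout_undo_cmd(name, namespace))
    return False
```

A pipeline step would call gate("api-server", "production") after applying the new manifest and fail the job when it returns False.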
