Kubernetes Rolling vs Blue-Green vs Canary Deployments

Choosing a deployment strategy directly affects production reliability. Kubernetes supports rolling updates natively, and with some additional tooling you can implement blue-green switches and canary releases with fine-grained traffic control. Each strategy has a different risk profile, resource cost, and rollback speed, so understanding the trade-offs helps you pick the right approach for each workload.

Strategy Comparison Overview

| Strategy | Downtime | Resource Cost | Rollback Speed | Risk Exposure |
| --- | --- | --- | --- | --- |
| Rolling Update | Zero (with probes) | Low (+25% peak) | Slow (reverse roll) | Gradual: all users get the new version progressively |
| Blue-Green | Zero | High (2x resources) | Instant (flip Service) | All-or-nothing: instant full traffic switch |
| Canary | Zero | Low (+canary pods) | Fast (remove canary) | Controlled: only a percentage of traffic hits the new version |
| Recreate | Brief downtime | None | Medium | All users down during replacement |

The right choice depends on three factors: how quickly you need to roll back if something goes wrong, whether you can tolerate temporary dual-version traffic, and how much extra infrastructure cost you can absorb during the deploy window.

Rolling Updates: Native Kubernetes

The rolling update is Kubernetes' default deployment strategy. It replaces old pods with new ones a few at a time, so the service stays up throughout. The key configuration fields are maxUnavailable (how many pods may be unavailable, relative to the desired replica count, at any point in the rollout) and maxSurge (how many pods may run above the desired replica count). Both accept an absolute number or a percentage, and both default to 25%.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
  namespace: production
spec:
  replicas: 10
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0     # Never take a pod down until a new one is Ready
      maxSurge: 3           # Run up to 3 extra pods during the rollout
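  selector:
    matchLabels:
      app: api-server       # Required in apps/v1; must match the template labels
  template:
    metadata:
      labels:
        app: api-server
    spec: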
      containers:
        - name: api-server
          image: myrepo/api-server:1.5.0
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
            failureThreshold: 3
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10

With maxUnavailable: 0 and maxSurge: 3, Kubernetes starts up to 3 new pods first. As each new pod passes its readiness probe, the controller terminates an old pod, which frees room for another new one, and this repeats until all pods are updated. Available capacity never drops below 10 pods. Requests are not dropped as long as the readiness probe accurately reflects the pod's ability to serve traffic and the old pods finish in-flight requests when they receive SIGTERM.
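
To make the numbers concrete, the sketch below computes the bounds that apply during a rollout. The function names are illustrative and not part of any Kubernetes client. The rounding rule (percentages round up for maxSurge and down for maxUnavailable) follows the Deployment documentation. The wave count is only a rough model that treats the rollout as discrete batches, which the real controller does not strictly do.

```python
import math


def resolve(value, replicas, round_up):
    """Turn an absolute count or a percentage string like "25%" into a pod count."""
    if isinstance(value, str) and value.endswith("%"):
        # Integer arithmetic avoids floating-point rounding surprises.
        scaled = int(value[:-1]) * replicas
        return -(-scaled // 100) if round_up else scaled // 100
    return int(value)


def rollout_bounds(replicas, max_surge, max_unavailable):
    """Return peak pod count, minimum available pods, and an approximate wave count."""
    surge = resolve(max_surge, replicas, round_up=True)
    unavailable = resolve(max_unavailable, replicas, round_up=False)
    if surge == 0 and unavailable == 0:
        # Kubernetes rejects this combination: the rollout could never progress.
        raise ValueError("maxSurge and maxUnavailable cannot both be 0")
    return {
        "peak_pods": replicas + surge,
        "min_available": replicas - unavailable,
        # Assumption: each wave replaces at most surge + unavailable pods.
        "approx_waves": math.ceil(replicas / (surge + unavailable)),
    }


print(rollout_bounds(10, 3, 0))
# {'peak_pods': 13, 'min_available': 10, 'approx_waves': 4}
```

With the default 25%/25% settings and 10 replicas, the same function reports a peak of 13 pods and a floor of 8 available pods, which is where a "+25% peak" resource cost comes from.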

Readiness probes are essential for zero-downtime rolling updates. Without one, Kubernetes treats a pod as ready as soon as its containers have started and adds it to the Service endpoints before the app has finished initializing. This can cause a burst of 502/503 errors during every rolling deploy.

Tuning Rolling Update Behaviour

Three additional fields fine-tune how the rolling update behaves under failure conditions and how much history is kept for rollback:

spec:
  progressDeadlineSeconds: 300   # Mark as failed if rollout doesn't progress in 5 min
  minReadySeconds: 15            # Pod must be Ready for 15s before counting as available
  revisionHistoryLimit: 5        # Keep last 5 ReplicaSets for rollback

# Monitor a rolling update in real time
kubectl rollout status deployment/api-server -n production --timeout=10m

# Pause a rollout (to investigate mid-rollout issues)
kubectl rollout pause deployment/api-server -n production

# Resume after investigation
kubectl rollout resume deployment/api-server -n production

# Roll back to the previous revision (runs as another rolling update, so it is not instant)
kubectl rollout undo deployment/api-server -n production

# List the recorded revisions, then roll back to a specific one
kubectl rollout history deployment/api-server -n production
kubectl rollout undo deployment/api-server --to-revision=4 -n production

Blue-Green Deployments

In a blue-green deployment, you run two complete copies of the application simultaneously — the current "blue" version and the new "green" version. Traffic switches from blue to green by updating the Service selector. Rollback is equally instant: switch the selector back. The cost is running 2x your normal pod count during the deploy window.

# Blue deployment (current production version)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server-blue
  namespace: production
spec:
  replicas: 10
  selector:
    matchLabels:
      app: api-server
      slot: blue
  template:
    metadata:
      labels:
        app: api-server
        slot: blue
    spec:
      containers:
        - name: api-server
          image: myrepo/api-server:1.4.0

---
# Green deployment (new version — not receiving traffic yet)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server-green
  namespace: production
spec:
  replicas: 10
  selector:
    matchLabels:
      app: api-server
      slot: green
  template:
    metadata:
      labels:
        app: api-server
        slot: green
    spec:
      containers:
        - name: api-server
          image: myrepo/api-server:1.5.0

---
# Service points at blue initially
apiVersion: v1
kind: Service
metadata:
  name: api-server
  namespace: production
spec:
  selector:
    app: api-server
    slot: blue       # Change to "green" to switch traffic
  ports:
    - port: 80
      targetPort: 8080

# Switch traffic from blue to green (instant, no pod restarts)
kubectl patch service api-server -n production \
  -p '{"spec":{"selector":{"slot":"green"}}}'

# Verify traffic is flowing to green pods
kubectl get endpoints api-server -n production

# Rollback: switch back to blue instantly
kubectl patch service api-server -n production \
  -p '{"spec":{"selector":{"slot":"blue"}}}'

# After confirming green is stable, scale down blue
kubectl scale deployment api-server-blue --replicas=0 -n production
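
If the switch is driven from a pipeline script rather than typed by hand, the patch body can be generated instead of edited manually. The following is a minimal sketch under that assumption; the helper names `other_slot` and `selector_patch` are hypothetical, and the slot names mirror the labels used in the manifests above.

```python
import json

SLOTS = ("blue", "green")


def other_slot(current):
    """Return the slot that is not currently receiving traffic."""
    if current not in SLOTS:
        raise ValueError(f"unknown slot: {current!r}")
    return "green" if current == "blue" else "blue"


def selector_patch(slot):
    """Build the JSON body passed to `kubectl patch service ... -p`."""
    if slot not in SLOTS:
        raise ValueError(f"unknown slot: {slot!r}")
    # Compact separators keep the output identical to the hand-written patch.
    return json.dumps({"spec": {"selector": {"slot": slot}}}, separators=(",", ":"))


print(selector_patch(other_slot("blue")))
# {"spec":{"selector":{"slot":"green"}}}
```

Validating the slot name before building the patch guards against a typo that would leave the Service selecting no pods at all.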

Canary Deployments with Traffic Splitting

A canary deployment routes a small percentage of traffic to the new version while most traffic continues hitting the stable version. You observe error rates, latency, and business metrics on the canary before promoting it to 100%.

The simplest Kubernetes-native canary uses a single Service with two Deployments at different replica counts. Because the Service spreads connections roughly evenly across its ready pods, the traffic split is approximately proportional to the pod count ratio.

# Stable: 9 replicas = 90% of traffic
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server-stable
  namespace: production
spec:
  replicas: 9
  selector:
    matchLabels:
      app: api-server
      track: stable     # Keeps this selector from overlapping the canary Deployment
  template:
    metadata:
      labels:
        app: api-server
        track: stable
    spec:
      containers:
        - name: api-server
          image: myrepo/api-server:1.4.0

---
# Canary: 1 replica = 10% of traffic
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server-canary
  namespace: production
spec:
  replicas: 1
  selector:
    matchLabels:
      app: api-server
      track: canary     # Keeps this selector from overlapping the stable Deployment
  template:
    metadata:
      labels:
        app: api-server
        track: canary
    spec:
      containers:
        - name: api-server
          image: myrepo/api-server:1.5.0

---
# Single service selects BOTH stable and canary pods
apiVersion: v1
kind: Service
metadata:
  name: api-server
  namespace: production
spec:
  selector:
    app: api-server   # Selects pods from BOTH deployments
  ports:
    - port: 80
      targetPort: 8080

# Promote canary: increase canary replicas, decrease stable
kubectl scale deployment api-server-canary --replicas=5 -n production
kubectl scale deployment api-server-stable --replicas=5 -n production
# Now 50/50 split

# Full promotion
kubectl scale deployment api-server-canary --replicas=10 -n production
kubectl scale deployment api-server-stable --replicas=0 -n production
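
The replica-ratio arithmetic behind this approach can be sketched in a few lines. The function names are illustrative, and the model assumes every ready pod receives an equal share of requests, which long-lived connections can skew in practice.

```python
def canary_share(stable_replicas, canary_replicas):
    """Expected percentage of traffic reaching the canary pods."""
    total = stable_replicas + canary_replicas
    if total == 0:
        raise ValueError("no pods behind the Service")
    return 100 * canary_replicas / total


def split_for_target(total_replicas, target_percent):
    """Return (stable, canary) replica counts closest to the target share.

    At least one canary pod is always kept, so very small targets are not
    reachable with a small total replica count.
    """
    canary = max(1, round(total_replicas * target_percent / 100))
    canary = min(canary, total_replicas)
    return total_replicas - canary, canary


print(canary_share(9, 1))         # 10.0
print(split_for_target(10, 50))   # (5, 5)
print(split_for_target(10, 1))    # (9, 1)
```

The last call shows the main limitation of this method: with 10 pods the smallest possible canary share is 10%, so a 1% canary would need roughly 100 pods or a traffic-splitting layer.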

Advanced Strategies with Argo Rollouts

Argo Rollouts extends Kubernetes with a Rollout CRD that provides automated canary analysis, blue-green with preview services, and integration with traffic management tools like Istio, Traefik, and NGINX for precise percentage-based traffic splitting without relying on replica counts.

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: api-server
  namespace: production
spec:
  replicas: 10
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server     # Must match spec.selector
    spec:
      containers:
        - name: api-server
          image: myrepo/api-server:1.5.0
  strategy:
    canary:
      stableService: api-server-stable
      canaryService: api-server-canary
      trafficRouting:
        nginx:
          stableIngress: api-server-ingress
      steps:
        - setWeight: 5          # 5% traffic to canary
        - pause: {duration: 5m}
        - setWeight: 20
        - pause: {duration: 10m}
        - analysis:
            templates:
              - templateName: success-rate-check
        - setWeight: 50
        - pause: {duration: 5m}
        - setWeight: 100

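The analysis step runs the success-rate-check AnalysisTemplate (not shown here), which can query a metrics provider such as Prometheus during the canary phase. If error rates or latency exceed the thresholds defined in the template, the Rollout aborts and shifts traffic back to the stable version, automating the "is this canary safe to promote?" decision.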
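
As a rough mental model of what such an analysis decides, the sketch below evaluates a series of success-rate samples against a threshold. It is a simplification and not Argo Rollouts code: the function name, the 0.95 threshold, and the failure limit of 1 are assumptions, loosely modeled on the successCondition and failureLimit fields of an AnalysisTemplate.

```python
def evaluate_canary(success_rates, threshold=0.95, failure_limit=1):
    """Return "promote" or "rollback" for a series of success-rate samples.

    A sample fails when it is below `threshold`. The canary is rolled back
    once the number of failed samples exceeds `failure_limit`.
    """
    failures = 0
    for rate in success_rates:
        if rate < threshold:
            failures += 1
            if failures > failure_limit:
                return "rollback"
    return "promote"


print(evaluate_canary([0.99, 0.98, 0.97]))   # promote
print(evaluate_canary([0.99, 0.90, 0.99]))   # promote (one failure is tolerated)
print(evaluate_canary([0.90, 0.99, 0.80]))   # rollback
```

Tolerating a small number of failed samples avoids rolling back on a single noisy measurement, at the cost of reacting slightly later to a real regression.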

Recreate Strategy for Stateful Workloads

The Recreate strategy terminates all existing pods before starting new ones. This causes a brief downtime window but is necessary for workloads where two versions cannot run simultaneously — typically single-instance stateful applications that acquire exclusive locks on shared resources.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: db-migration-runner
spec:
  replicas: 1
  strategy:
    type: Recreate    # Terminate old pod before starting new one
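  selector:
    matchLabels:
      app: db-migration-runner   # Required in apps/v1; must match the template labels
  template:
    metadata:
      labels:
        app: db-migration-runner
    spec: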
      containers:
        - name: migration-runner
          image: myrepo/migrations:2.0.0

Choosing the Right Strategy

Decision guide for production deployments:

  • Use Rolling Update when your service is stateless, multiple versions can run simultaneously, and you want the simplest approach. This covers most stateless microservices.
  • Use Blue-Green when you need instant rollback capability and can afford 2x resource cost. Ideal for infrequent, high-risk releases (major versions, database schema changes accompanied by app changes).
  • Use Canary when you deploy frequently and want to validate new versions against real production traffic before full rollout. Pair with Argo Rollouts for automated analysis.
  • Use Recreate when your application cannot run two versions simultaneously: single-instance stateful apps, database migration jobs, or apps that acquire exclusive file/socket locks.

Combine strategies: Many teams use rolling updates for day-to-day deploys and switch to blue-green or canary for major releases. Argo Rollouts makes it easy to define different strategies per workload without rebuilding your deployment pipeline.
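
The decision guide above can be condensed into a small function. This is one possible encoding, not a definitive rule: the parameter names are hypothetical, and the order in which the checks are applied is an assumption about priorities that a team may reasonably change.

```python
def choose_strategy(two_versions_ok,
                    needs_instant_rollback=False,
                    can_afford_double=False,
                    wants_live_validation=False):
    """Map workload constraints to one of the four strategies in this guide."""
    if not two_versions_ok:
        # Exclusive locks or single-instance state rule out any overlap.
        return "Recreate"
    if needs_instant_rollback and can_afford_double:
        return "Blue-Green"
    if wants_live_validation:
        return "Canary"
    # Simplest option for stateless services.
    return "Rolling Update"


print(choose_strategy(two_versions_ok=True))                                   # Rolling Update
print(choose_strategy(True, needs_instant_rollback=True, can_afford_double=True))  # Blue-Green
print(choose_strategy(True, wants_live_validation=True))                       # Canary
print(choose_strategy(two_versions_ok=False))                                  # Recreate
```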