Kubernetes KEDA: Event-Driven Autoscaling Guide

KEDA (Kubernetes Event-Driven Autoscaling) extends Kubernetes with the ability to scale workloads based on the depth of external event queues, message brokers, and monitoring metrics — far beyond what the native HPA's CPU and memory metrics can offer. With KEDA, you can scale a Kafka consumer deployment to zero when no messages are waiting, then burst to dozens of replicas as the queue depth grows, all without writing a single custom metric adapter. KEDA is a CNCF graduated project and integrates with over 60 scalers including Kafka, AWS SQS, Azure Service Bus, Redis Streams, Prometheus, and more.

KEDA Architecture and How It Works

KEDA installs into your cluster as a set of controllers that sit alongside and extend the standard Kubernetes HPA. The architecture consists of three key components:

  • KEDA Operator: watches for ScaledObject and ScaledJob resources. For each ScaledObject it creates a corresponding HPA whose metrics are of type External, so the HPA reads them through the KEDA Metrics Adapter. The operator also handles activation, that is, scaling between zero and one replica.
  • KEDA Metrics Adapter: implements the Kubernetes External Metrics API. The HPA queries this adapter for the current metric value (e.g., Kafka lag), which is collected from the external event source (Kafka, SQS, Redis). In KEDA 2.9 and later the adapter fetches these values from the operator, which holds the connections to the event sources.
  • KEDA Admission Webhooks: a validating webhook that checks ScaledObject resources at admission time, for example rejecting two ScaledObjects that target the same workload.

Because KEDA creates a real HPA under the hood, HPA features such as scale-down stabilization windows, scaling policies, and min/max replica bounds apply automatically. (The cooldownPeriod setting is KEDA's own and governs only the final step down to zero.) KEDA simply provides a rich external metric feed that the HPA acts on, so it remains compatible with existing HPA tooling and dashboards.

Scale to zero: By default, the standard Kubernetes HPA cannot scale a workload to zero replicas. KEDA adds this capability: when every trigger reports no activity (e.g., an empty queue) for the configured cooldownPeriod, the operator scales the workload to zero and frees its resources. When new events arrive, the operator scales the workload back to one replica and the HPA takes over from there.
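
The zero-to-one decision described above can be pictured with a small shell function. This is a simplified model for a single trigger, not KEDA's actual code: the function name keda_phase and its arguments are made up for illustration, and it assumes any metric value above zero counts as "active".

```shell
# Simplified model of KEDA's scale-to-zero decision for one trigger.
# Illustrative only; the function name and arguments are hypothetical.
#   $1 current metric value (integer)
#   $2 seconds since the trigger was last active
#   $3 cooldownPeriod in seconds
#   $4 minReplicaCount
keda_phase() (
  metric=$1 idle_seconds=$2 cooldown=$3 min_replicas=$4
  if [ "$metric" -gt 0 ]; then
    # Work is waiting: at least one replica runs and the HPA sizes the rest.
    echo "active"
  elif [ "$min_replicas" -eq 0 ] && [ "$idle_seconds" -ge "$cooldown" ]; then
    # Idle for the whole cooldown and zero is allowed.
    echo "scale-to-zero"
  else
    # Idle, but either the cooldown has not elapsed or zero is not allowed.
    echo "hold"
  fi
)
```

For example, keda_phase 0 45 30 0 prints scale-to-zero, while keda_phase 0 10 30 0 prints hold because the 30-second cooldown has not yet elapsed.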

Installing KEDA with Helm

The official KEDA Helm chart is the recommended installation method for production clusters. KEDA installs into its own namespace and requires no changes to existing workloads.

# Add the KEDA Helm repository
helm repo add kedacore https://kedacore.github.io/charts
helm repo update

# Install KEDA
helm upgrade --install keda kedacore/keda \
  --namespace keda \
  --create-namespace \
  --version 2.14.0

# Verify installation
kubectl get pods -n keda
# Expected:
# keda-operator-xxx                      1/1   Running
# keda-operator-metrics-apiserver-xxx    1/1   Running
# keda-admission-webhooks-xxx            1/1   Running

# Check KEDA CRDs
kubectl get crd | grep keda.sh
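
For scripted installs it can be handy to block until every KEDA component has rolled out. The helper below is a sketch: the function name wait_for_keda is hypothetical, and the three Deployment names are assumed to be the chart defaults for a release named keda, so adjust them if your release differs.

```shell
# Wait for each KEDA Deployment to finish rolling out.
# Deployment names are ASSUMED chart defaults for a release named "keda".
#   $1 namespace (default: keda)
#   $2 timeout per Deployment (default: 120s)
wait_for_keda() (
  namespace=${1:-keda}
  timeout=${2:-120s}
  for deploy in keda-operator keda-operator-metrics-apiserver keda-admission-webhooks; do
    kubectl rollout status "deployment/$deploy" -n "$namespace" --timeout="$timeout" || exit 1
  done
)
```

Calling wait_for_keda keda 180s returns non-zero as soon as one rollout fails or times out, which makes it usable as a gate in CI pipelines.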

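KEDA supports running multiple replicas of the operator, metrics server, and admission webhooks. The operator uses leader election, so an extra operator replica acts as a standby rather than adding capacity. For production, run two replicas of each and spread them across nodes with pod anti-affinity rules. The command below sets only the replica counts; anti-affinity has to be configured separately through the chart's values: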

helm upgrade --install keda kedacore/keda \
  --namespace keda \
  --set operator.replicaCount=2 \
  --set metricsServer.replicaCount=2 \
  --set webhooks.replicaCount=2

ScaledObject: The Core KEDA Resource

A ScaledObject links a Kubernetes workload (Deployment, StatefulSet, or any custom resource that implements the scale subresource) to one or more external scalers. KEDA creates an HPA based on the ScaledObject configuration.

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-processor-scaler
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-processor
  pollingInterval: 15      # Check metric every 15 seconds
  cooldownPeriod: 30       # Wait 30s before scaling down to zero
  minReplicaCount: 0       # Allow scale to zero
  maxReplicaCount: 50
  advanced:
    restoreToOriginalReplicaCount: true   # Restore replicas when ScaledObject is deleted
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 120
          policies:
            - type: Percent
              value: 25
              periodSeconds: 60
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka.production.svc:9092
        consumerGroup: order-processor-group
        topic: orders
        lagThreshold: "50"    # Scale up when lag per replica exceeds 50
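
To get a feel for the scaleDown policy in this manifest, the sketch below prints the fastest possible step-down sequence under a Percent policy. It rests on two assumptions that are not stated in this guide: that the HPA rounds the permitted replica count up after each period, and that the stabilization window has already passed. The function name scale_down_steps is hypothetical.

```shell
# Fastest possible shrink sequence under an HPA "Percent" scale-down policy.
# ASSUMPTION: the HPA rounds the permitted replica count UP each period.
#   $1 starting replicas
#   $2 replica count the metric calls for (the floor)
#   $3 policy value in percent
scale_down_steps() (
  replicas=$1 floor=$2 percent=$3
  out=$replicas
  while [ "$replicas" -gt "$floor" ]; do
    # ceil(replicas * (100 - percent) / 100) using integer arithmetic
    next=$(( (replicas * (100 - percent) + 99) / 100 ))
    [ "$next" -lt "$floor" ] && next=$floor
    # With few replicas the rounded value may not shrink at all; stop then.
    [ "$next" -ge "$replicas" ] && break
    replicas=$next
    out="$out $replicas"
  done
  echo "$out"
)
```

Under those assumptions, scale_down_steps 50 10 25 prints 50 38 29 22 17 13 10, i.e. roughly six 60-second periods to drain from 50 replicas to 10. The early break also shows why a pure Percent policy can stall at small replica counts, which is a reason some teams pair it with a Pods policy.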

Scaling on Kafka Consumer Lag

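The Kafka scaler is one of KEDA's most popular integrations. It queries Kafka consumer group lag and scales the consumer deployment to maintain a target lag per replica. When the total lag is 500 messages and your lagThreshold is 50, KEDA targets 10 replicas (500 / 50), bounded by minReplicaCount and maxReplicaCount. By default the scaler also does not go above the topic's partition count, because additional consumers in the same group would sit idle.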
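
The arithmetic can be sketched as a shell function. This is an approximation for capacity planning, not the scaler's implementation; the function name kafka_desired_replicas and the partition count passed in are illustrative.

```shell
# Approximate replica target for the Kafka scaler.
#   $1 total consumer-group lag
#   $2 lagThreshold
#   $3 minReplicaCount
#   $4 maxReplicaCount
#   $5 number of partitions in the topic
kafka_desired_replicas() (
  lag=$1 threshold=$2 min=$3 max=$4 partitions=$5
  desired=$(( (lag + threshold - 1) / threshold ))   # ceil(lag / threshold)
  # Default behavior: never more consumers than partitions.
  [ "$desired" -gt "$partitions" ] && desired=$partitions
  [ "$desired" -gt "$max" ] && desired=$max
  [ "$desired" -lt "$min" ] && desired=$min
  echo "$desired"
)
```

With a hypothetical 12-partition topic, kafka_desired_replicas 500 50 1 30 12 prints 10, while kafka_desired_replicas 5000 100 1 30 12 prints 12 because the partition count caps the result.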

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-consumer-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: kafka-consumer
  minReplicaCount: 1
  maxReplicaCount: 30
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka-headless.kafka.svc.cluster.local:9092
        consumerGroup: my-consumer-group
        topic: events
        lagThreshold: "100"
        offsetResetPolicy: latest
      authenticationRef:
        name: kafka-trigger-auth

---
# TriggerAuthentication for SASL/SSL Kafka
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: kafka-trigger-auth
  namespace: production
spec:
  secretTargetRef:
    - parameter: sasl
      name: kafka-secrets
      key: sasl-mechanism
    - parameter: username
      name: kafka-secrets
      key: username
    - parameter: password
      name: kafka-secrets
      key: password
    - parameter: tls
      name: kafka-secrets
      key: tls

Scaling on AWS SQS Queue Depth

The AWS SQS scaler reads the ApproximateNumberOfMessages attribute from an SQS queue and scales your deployment to hold a target number of messages per replica (queueLength). With scaleOnInFlight enabled it also counts in-flight messages (ApproximateNumberOfMessagesNotVisible). KEDA requires AWS credentials via environment variables, IRSA (IAM Roles for Service Accounts), or a TriggerAuthentication object.
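
When a queue does not scale as expected, it can help to read the same attributes yourself with the AWS CLI. The helper below is a sketch: sqs_backlog is a hypothetical name, and it assumes the CLI's text output puts the two counts on one whitespace-separated line.

```shell
# Print the backlog the SQS scaler would act on: visible messages plus,
# optionally, messages that are currently in flight.
#   $1 queue URL
#   $2 AWS region
#   $3 "true" to include in-flight messages (default: true)
sqs_backlog() (
  queue_url=$1 region=$2 include_in_flight=${3:-true}
  counts=$(aws sqs get-queue-attributes \
    --queue-url "$queue_url" \
    --region "$region" \
    --attribute-names ApproximateNumberOfMessages ApproximateNumberOfMessagesNotVisible \
    --query 'Attributes.[ApproximateNumberOfMessages, ApproximateNumberOfMessagesNotVisible]' \
    --output text) || exit 1
  # ASSUMPTION: text output looks like "<visible> <in-flight>".
  set -- $counts
  visible=${1:-0} in_flight=${2:-0}
  if [ "$include_in_flight" = "true" ]; then
    echo $(( visible + in_flight ))
  else
    echo "$visible"
  fi
)
```

Dividing the printed number by queueLength and rounding up gives a rough idea of the replica count to expect.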

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: sqs-worker-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: sqs-worker
  minReplicaCount: 0
  maxReplicaCount: 20
  triggers:
    - type: aws-sqs-queue
      authenticationRef:
        name: aws-trigger-auth
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789/my-queue
        queueLength: "10"      # Messages per replica
        awsRegion: us-east-1
        scaleOnInFlight: "true"   # Include in-flight messages in count

---
# TriggerAuthentication using IRSA (recommended for EKS)
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: aws-trigger-auth
  namespace: production
spec:
  podIdentity:
    provider: aws-eks

IRSA setup: For EKS, annotate your worker's ServiceAccount with eks.amazonaws.com/role-arn pointing to an IAM role that has sqs:GetQueueAttributes and sqs:ReceiveMessage permissions. Using IRSA avoids storing AWS credentials as Kubernetes Secrets.
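
The annotation can be applied with kubectl. The wrapper below is a minimal sketch; the function name annotate_irsa_role is hypothetical, and the ServiceAccount name and role ARN you pass in depend on your own setup.

```shell
# Attach an IAM role to a ServiceAccount through the IRSA annotation.
#   $1 ServiceAccount name
#   $2 namespace
#   $3 IAM role ARN
annotate_irsa_role() {
  kubectl annotate serviceaccount "$1" -n "$2" \
    "eks.amazonaws.com/role-arn=$3" --overwrite
}
```

A call might look like annotate_irsa_role sqs-worker production arn:aws:iam::<account-id>:role/<role-name>, assuming the worker's ServiceAccount is named sqs-worker. Pods usually need a restart before they pick up credentials for the new role.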

Scaling on Prometheus Metrics

The Prometheus scaler evaluates a PromQL query and scales based on the numeric result. This is the most flexible KEDA scaler because Prometheus aggregates metrics from virtually any source — HTTP request rate, database connection pool usage, GPU utilization, and more.

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: api-request-rate-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: api-server
  minReplicaCount: 2
  maxReplicaCount: 40
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090
        metricName: http_requests_per_second   # Deprecated since KEDA 2.10; can be omitted
        threshold: "200"    # Scale up when metric exceeds 200 per replica
        query: |
          sum(rate(http_requests_total{namespace="production",app="api-server"}[2m]))
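
Before relying on a trigger like this, it is worth running the query by hand to confirm that it returns a single numeric value. The sketch below calls the Prometheus HTTP API with curl; the function name promql_instant is hypothetical, and the base URL you pass must be reachable from where you run it.

```shell
# Run an instant PromQL query against the Prometheus HTTP API
# and print the raw JSON response.
#   $1 Prometheus base URL
#   $2 PromQL expression
promql_instant() {
  curl -fsS -G "$1/api/v1/query" --data-urlencode "query=$2"
}
```

From a workstation you would typically port-forward first (for example kubectl port-forward -n monitoring svc/prometheus 9090:9090, assuming the Service is named prometheus) and then pipe the result through jq -r '.data.result[0].value[1]' to extract the number. An empty result array means the query matches no series, in which case the scaler has nothing to act on.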

Scale to Zero and ScaledJob

One of KEDA's standout features is the ability to scale a Deployment to exactly zero replicas when there is no work, then scale back up when events arrive. This is ideal for batch processors, webhook handlers, and nightly ETL jobs that don't need to run continuously.

For truly batch workloads, ScaledJob is a better fit than ScaledObject. Instead of maintaining a long-running Deployment, ScaledJob creates Kubernetes Jobs on demand — one Job per batch of events:

apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: report-generator
  namespace: production
spec:
  jobTargetRef:
    parallelism: 1
    completions: 1
    template:
      spec:
        containers:
          - name: report-generator
            image: myrepo/report-generator:latest
            env:
              - name: QUEUE_URL
                value: https://sqs.us-east-1.amazonaws.com/123456789/reports
        restartPolicy: Never
  pollingInterval: 30
  maxReplicaCount: 10
  triggers:
    - type: aws-sqs-queue
      authenticationRef:
        name: aws-trigger-auth
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789/reports
        queueLength: "1"
        awsRegion: us-east-1
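
How many Jobs a ScaledJob starts per polling cycle can be estimated with the sketch below. It reflects my reading of KEDA's default scaling strategy (the capped target minus the Jobs already running) and should be treated as an approximation; the function name scaledjob_new_jobs is hypothetical.

```shell
# Jobs a ScaledJob would start in one polling cycle (default strategy, approximated).
#   $1 current queue length
#   $2 target messages per Job (queueLength in the trigger)
#   $3 maxReplicaCount
#   $4 Jobs already running
scaledjob_new_jobs() (
  queue=$1 target=$2 max=$3 running=$4
  max_scale=$(( (queue + target - 1) / target ))     # ceil(queue / target)
  [ "$max_scale" -gt "$max" ] && max_scale=$max
  new_jobs=$(( max_scale - running ))
  [ "$new_jobs" -lt 0 ] && new_jobs=0
  echo "$new_jobs"
)
```

With the manifest above, 25 queued reports and no running Jobs give scaledjob_new_jobs 25 1 10 0, which prints 10: the maxReplicaCount cap applies and the remaining messages wait for later cycles.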

Production Tips and Troubleshooting

Common issues and how to resolve them when running KEDA in production:

  • ScaledObject not scaling: Check the KEDA operator logs with kubectl logs -n keda -l app=keda-operator. Common causes are authentication failures to the event source, incorrect metric names, or a target deployment that does not exist.
  • Slow scale-up: Reduce pollingInterval. The default is 30 seconds; setting it to 5–10 seconds makes KEDA react faster but increases load on the event source API.
  • Flapping replicas: Tune scaleDown.stabilizationWindowSeconds in the HPA behavior config (advanced.horizontalPodAutoscalerConfig.behavior). The HPA default for scale-down is 300 seconds; if the window is set too low, the replica count may oscillate when the metric hovers near the threshold.
  • Scale-to-zero latency: When scaling from zero, there is a cold-start delay while pods are scheduled and initialized. For latency-sensitive workloads, consider minReplicaCount: 1 instead of 0 to keep at least one pod warm.

# Check ScaledObject status
kubectl describe scaledobject order-processor-scaler -n production

# View the HPA KEDA created
kubectl get hpa -n production

# Check KEDA metrics adapter
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .
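
The checks above can be wrapped in two small helpers for repeated use. These are sketches with hypothetical function names; they rely on KEDA naming the generated HPA keda-hpa-<scaledobject-name> (the default when no custom HPA name is configured) and on the ScaledObject exposing status conditions such as Ready and Active.

```shell
# Print each status condition of a ScaledObject as "Type=Status (Reason)".
#   $1 ScaledObject name
#   $2 namespace
scaledobject_conditions() {
  kubectl get scaledobject "$1" -n "$2" \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status} ({.reason}){"\n"}{end}'
}

# Show the HPA that KEDA generated for a ScaledObject.
# ASSUMPTION: the default name pattern keda-hpa-<scaledobject-name> is in use.
#   $1 ScaledObject name
#   $2 namespace
scaledobject_hpa() {
  kubectl get hpa "keda-hpa-$1" -n "$2"
}
```

For the first example in this guide, scaledobject_conditions order-processor-scaler production shows whether the trigger is Ready and Active, and scaledobject_hpa order-processor-scaler production shows the current and target metric values the HPA is acting on.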