Kubernetes KEDA: Event-Driven Autoscaling Guide
KEDA (Kubernetes Event-Driven Autoscaling) extends Kubernetes with the ability to scale workloads based on the depth of external event queues, message brokers, and monitoring metrics — far beyond what the native HPA's CPU and memory metrics can offer. With KEDA, you can scale a Kafka consumer deployment to zero when no messages are waiting, then burst to dozens of replicas as the queue depth grows, all without writing a single custom metric adapter. KEDA is a CNCF graduated project and integrates with over 60 scalers including Kafka, AWS SQS, Azure Service Bus, Redis Streams, Prometheus, and more.
KEDA Architecture and How It Works
KEDA installs into your cluster as a set of controllers that sit alongside and extend the standard Kubernetes HPA. The architecture consists of three key components:
- KEDA Operator: watches for ScaledObject and ScaledJob resources. When it finds a ScaledObject, it creates a corresponding HPA whose metrics are of type External and are served by the KEDA Metrics Adapter. For a ScaledJob, it creates Kubernetes Jobs directly instead of an HPA.
- KEDA Metrics Adapter: implements the Kubernetes External Metrics API. The HPA queries this adapter for the current metric value (e.g., Kafka lag). The adapter polls the external event source (Kafka, SQS, Redis) and returns the current value.
- KEDA Webhooks: admission webhooks that validate ScaledObject resources at admission time.
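Because KEDA creates a real HPA under the hood, HPA features such as scale-down stabilization windows, scaling policies, and min/max replica bounds apply automatically. KEDA provides the external metric feed that the HPA acts on, and it handles the one transition the HPA normally does not: activating a workload from zero replicas and scaling it back to zero once cooldownPeriod has elapsed with no activity. This also means KEDA is compatible with existing HPA tooling and dashboards.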
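To make that relationship concrete, here is a rough sketch of the kind of HPA KEDA generates for a ScaledObject named order-processor-scaler with a Kafka trigger on an orders topic. The keda-hpa- name prefix, the metric name s0-kafka-orders, and the selector label reflect common KEDA behavior but are assumptions here; inspect the real object with kubectl get hpa -o yaml on your own cluster.

```yaml
# Illustrative only: KEDA generates and owns this object, you do not write it.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: keda-hpa-order-processor-scaler   # assumed naming: prefix + ScaledObject name
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-processor
  minReplicas: 1        # the HPA covers 1..N; KEDA itself handles 0 <-> 1
  maxReplicas: 50
  metrics:
    - type: External
      external:
        metric:
          name: s0-kafka-orders            # assumed generated metric name
          selector:
            matchLabels:
              scaledobject.keda.sh/name: order-processor-scaler
        target:
          type: AverageValue
          averageValue: "50"               # mirrors lagThreshold on the trigger
```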
Installing KEDA with Helm
The official KEDA Helm chart is the recommended installation method for production clusters. KEDA installs into its own namespace and requires no changes to existing workloads.
# Add the KEDA Helm repository
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
# Install KEDA
helm upgrade --install keda kedacore/keda \
  --namespace keda \
  --create-namespace \
  --version 2.14.0
# Verify installation
kubectl get pods -n keda
# Expected:
# keda-operator-xxx 1/1 Running
# keda-operator-metrics-apiserver-xxx 1/1 Running
# keda-webhooks-xxx 1/1 Running
# Check KEDA CRDs
kubectl get crd | grep keda.sh
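KEDA supports running multiple replicas of the operator, metrics server, and webhooks. For production, run two replicas of each and spread them across nodes with pod anti-affinity rules. Note that only one operator replica is active at a time; the other acts as a standby through leader election. The command below sets only the replica counts and pins the same chart version as the initial install:
helm upgrade --install keda kedacore/keda \
  --namespace keda \
  --version 2.14.0 \
  --set operator.replicaCount=2 \
  --set metricsServer.replicaCount=2 \
  --set webhooks.replicaCount=2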
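The same replica settings can also be kept in a values file, which is easier to review and version than a chain of --set flags. This is a minimal sketch that mirrors only the three keys used in the command above; the file name keda-values.yaml is hypothetical, and any affinity settings should be taken from the chart's own values reference rather than from this example.

```yaml
# keda-values.yaml (hypothetical file name)
# Mirrors the --set flags shown above; nothing else is overridden.
operator:
  replicaCount: 2
metricsServer:
  replicaCount: 2
webhooks:
  replicaCount: 2
```

It would be applied by passing -f keda-values.yaml to the same helm upgrade --install command in place of the --set flags.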
ScaledObject: The Core KEDA Resource
A ScaledObject links a Kubernetes workload (Deployment, StatefulSet, or any custom resource that implements the scale subresource) to one or more external scalers. KEDA creates an HPA based on the ScaledObject configuration.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-processor-scaler
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-processor
  pollingInterval: 15   # Check metric every 15 seconds
  cooldownPeriod: 30    # Wait 30s before scaling down to zero
  minReplicaCount: 0    # Allow scale to zero
  maxReplicaCount: 50
  advanced:
    restoreToOriginalReplicaCount: true   # Restore replicas when ScaledObject is deleted
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 120
          policies:
            - type: Percent
              value: 25
              periodSeconds: 60
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka.production.svc:9092
        consumerGroup: order-processor-group
        topic: orders
        lagThreshold: "50"   # Scale up when lag per replica exceeds 50
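Since a ScaledObject accepts one or more triggers, a second scaler can be listed alongside the first. The fragment below is a sketch that pairs the Kafka trigger with KEDA's cpu trigger; the 70 percent target is an assumed value, and the cpu trigger only works if the target pods declare CPU requests.

```yaml
# Fragment of a ScaledObject spec: two triggers on the same workload.
# The HPA evaluates each metric and uses the highest resulting replica count.
triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka.production.svc:9092
      consumerGroup: order-processor-group
      topic: orders
      lagThreshold: "50"
  - type: cpu
    metricType: Utilization   # target is a percentage of the pods' CPU requests
    metadata:
      value: "70"             # assumed target; tune for your workload
```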
Scaling on Kafka Consumer Lag
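The Kafka scaler is one of KEDA's most popular integrations. It queries Kafka consumer group lag and scales the consumer deployment to maintain a target lag per replica: the desired replica count is roughly ceil(total lag / lagThreshold), bounded by minReplicaCount and maxReplicaCount. When the total lag is 500 messages and your lagThreshold is 50, KEDA targets ceil(500 / 50) = 10 replicas. By default the scaler also does not scale beyond the number of partitions on the topic, because extra consumers in the group would sit idle.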
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-consumer-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: kafka-consumer
  minReplicaCount: 1
  maxReplicaCount: 30
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka-headless.kafka.svc.cluster.local:9092
        consumerGroup: my-consumer-group
        topic: events
        lagThreshold: "100"
        offsetResetPolicy: latest
      authenticationRef:
        name: kafka-trigger-auth
---
# TriggerAuthentication for SASL/SSL Kafka
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: kafka-trigger-auth
  namespace: production
spec:
  secretTargetRef:
    - parameter: sasl
      name: kafka-secrets
      key: sasl-mechanism
    - parameter: username
      name: kafka-secrets
      key: username
    - parameter: password
      name: kafka-secrets
      key: password
    - parameter: tls
      name: kafka-secrets
      key: tls
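The TriggerAuthentication above reads four keys from a Secret named kafka-secrets. A minimal sketch of that Secret might look like the following; the key names come from the example above, while every value is a placeholder, and the SASL mechanism shown is an assumption that must match what your brokers actually use.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: kafka-secrets
  namespace: production
type: Opaque
stringData:
  sasl-mechanism: scram_sha512   # assumed; use the mechanism your brokers are configured for
  username: keda-consumer        # placeholder
  password: change-me            # placeholder; never commit real credentials
  tls: enable                    # "enable" or "disable"
```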
Scaling on AWS SQS Queue Depth
The AWS SQS scaler reads the ApproximateNumberOfMessages attribute from an SQS queue and scales your deployment so that each replica has roughly queueLength messages to work through. KEDA needs AWS credentials to read the queue, supplied via environment variables, IRSA (IAM Roles for Service Accounts), or a TriggerAuthentication object.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: sqs-worker-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: sqs-worker
  minReplicaCount: 0
  maxReplicaCount: 20
  triggers:
    - type: aws-sqs-queue
      authenticationRef:
        name: aws-trigger-auth
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789/my-queue
        queueLength: "10"         # Messages per replica
        awsRegion: us-east-1
        scaleOnInFlight: "true"   # Include in-flight messages in count
---
# TriggerAuthentication using IRSA (recommended for EKS)
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: aws-trigger-auth
  namespace: production
spec:
  podIdentity:
    provider: aws-eks
For IRSA to work, the service account used for authentication needs the annotation eks.amazonaws.com/role-arn pointing to an IAM role that has sqs:GetQueueAttributes and sqs:ReceiveMessage permissions. Using IRSA avoids storing AWS credentials as Kubernetes Secrets.
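As an illustration, the annotation sits on a ServiceAccount like the one sketched here. Which service account KEDA consults depends on the KEDA version and identity provider in use; this sketch assumes it is the one the sqs-worker pods run under. The account name, the 12-digit account ID, and the role name are all placeholders.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sqs-worker            # hypothetical; assumed to be the worker pods' service account
  namespace: production
  annotations:
    # Placeholder ARN: substitute your own account ID and role name.
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/keda-sqs-reader
```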
Scaling on Prometheus Metrics
The Prometheus scaler evaluates a PromQL query and scales based on the numeric result. This is the most flexible KEDA scaler because Prometheus aggregates metrics from virtually any source — HTTP request rate, database connection pool usage, GPU utilization, and more.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: api-request-rate-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: api-server
  minReplicaCount: 2
  maxReplicaCount: 40
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090
        metricName: http_requests_per_second
        threshold: "200"   # Scale up when metric exceeds 200 per replica
        query: |
          sum(rate(http_requests_total{namespace="production",app="api-server"}[2m]))
Scale to Zero and ScaledJob
One of KEDA's standout features is the ability to scale a Deployment to exactly zero replicas when there is no work, then scale back up when events arrive. This is ideal for batch processors, webhook handlers, and nightly ETL jobs that don't need to run continuously.
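A scale-to-zero setup comes down to minReplicaCount: 0 plus a trigger that tells KEDA when work exists. The following is a sketch for a hypothetical etl-worker Deployment reading from a placeholder SQS queue; it reuses the aws-trigger-auth object from the SQS example, and the cooldown and activation values are assumptions to tune.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: etl-worker-scaler        # hypothetical name
  namespace: production
spec:
  scaleTargetRef:
    name: etl-worker             # hypothetical Deployment
  minReplicaCount: 0             # no pods while the queue is empty
  maxReplicaCount: 10
  cooldownPeriod: 300            # wait 5 minutes of inactivity before returning to zero
  triggers:
    - type: aws-sqs-queue
      authenticationRef:
        name: aws-trigger-auth
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789/etl-queue   # placeholder
        queueLength: "10"
        activationQueueLength: "0"   # activate from zero once the count rises above 0
        awsRegion: us-east-1
```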
For truly batch workloads, ScaledJob is a better fit than ScaledObject. Instead of maintaining a long-running Deployment, ScaledJob creates Kubernetes Jobs on demand — one Job per batch of events:
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: report-generator
  namespace: production
spec:
  jobTargetRef:
    parallelism: 1
    completions: 1
    template:
      spec:
        containers:
          - name: report-generator
            image: myrepo/report-generator:latest
            env:
              - name: QUEUE_URL
                value: https://sqs.us-east-1.amazonaws.com/123456789/reports
        restartPolicy: Never
  pollingInterval: 30
  maxReplicaCount: 10
  triggers:
    - type: aws-sqs-queue
      authenticationRef:
        name: aws-trigger-auth
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789/reports
        queueLength: "1"
        awsRegion: us-east-1
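Because a ScaledJob keeps creating Jobs, it usually helps to bound how many finished Jobs are retained and how long a single Job may run. The fragment below sketches fields that could be merged into the spec above; all of the numbers are assumed values rather than recommendations.

```yaml
# Fragment to merge into the ScaledJob spec above (values are assumptions).
spec:
  successfulJobsHistoryLimit: 5   # keep only the last 5 completed Jobs
  failedJobsHistoryLimit: 5       # keep only the last 5 failed Jobs for debugging
  scalingStrategy:
    strategy: default             # how queue length maps to the number of new Jobs
  jobTargetRef:
    backoffLimit: 2               # retries before a Job is marked failed
    activeDeadlineSeconds: 600    # terminate a Job that runs longer than 10 minutes
```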
Production Tips and Troubleshooting
Common issues and how to resolve them when running KEDA in production:
- ScaledObject not scaling: Check the KEDA operator logs with kubectl logs -n keda -l app=keda-operator. Common causes are authentication failures to the event source, incorrect metric names, or the target deployment not existing.
- Slow scale-up: Reduce pollingInterval. The default is 30 seconds; setting it to 5–10 seconds makes KEDA react faster but increases load on the event source API.
- Flapping replicas: Tune scaleDown.stabilizationWindowSeconds in the HPA behavior config (under advanced.horizontalPodAutoscalerConfig.behavior). With too short a window, the replica count may oscillate when the metric hovers near a threshold.
- Scale-to-zero latency: When scaling from zero, there is a cold-start delay while pods are scheduled and initialized. For latency-sensitive workloads, consider minReplicaCount: 1 instead of 0 to keep at least one pod warm.
# Check ScaledObject status
kubectl describe scaledobject order-processor-scaler -n production
# View the HPA KEDA created
kubectl get hpa -n production
# Check KEDA metrics adapter
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .
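The tuning tips above translate into a handful of ScaledObject fields. This fragment is a sketch of how they might be combined for a latency-sensitive workload; the specific numbers are assumptions and should be adjusted against the load your event source can tolerate.

```yaml
# Fragment of a ScaledObject spec combining the tuning tips (assumed values).
spec:
  pollingInterval: 10        # faster reaction than the 30s default, more load on the source
  minReplicaCount: 1         # keep one pod warm to avoid cold starts
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 300   # damp oscillation near the threshold
```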