Kubernetes Jaeger: Distributed Tracing Setup (2026)
Jaeger is an open-source distributed tracing platform originally developed at Uber and now a CNCF graduated project. In a Kubernetes microservices environment, a single user request can fan out across dozens of services — Jaeger collects, correlates, and visualises those traces so you can pinpoint latency bottlenecks, understand service dependencies, and debug cascading failures. This guide walks through deploying Jaeger on Kubernetes with the Jaeger Operator, instrumenting services with OpenTelemetry, and integrating traces with Grafana Tempo as a backend.
Distributed Tracing Concepts
Before deploying Jaeger, it helps to understand the core data model shared by all distributed tracing systems:
- Trace: the complete end-to-end journey of a request through your system. Identified by a globally unique trace ID.
- Span: a single unit of work within a trace (e.g., an HTTP call, a database query). Each span has a start time, duration, operation name, and key-value tags.
- Parent-child relationship: spans form a directed acyclic graph. The root span is created by the first service that receives the request; child spans are created by downstream calls.
- Context propagation: trace and span IDs are forwarded in HTTP headers (W3C TraceContext: traceparent) or gRPC metadata so downstream services can attach their spans to the same trace.
- Baggage: arbitrary key-value data attached to a trace context and propagated with every downstream call (useful for tenant ID, experiment flags, etc.).
Jaeger uses the OpenTracing data model and is fully compatible with the OpenTelemetry SDK via the OTLP protocol, which is the modern recommended instrumentation approach in 2026.
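To make context propagation concrete, here is a minimal Node.js sketch of how a service might read an incoming traceparent header and derive the header it forwards downstream. The helper names parseTraceparent and childTraceparent are hypothetical and not part of any SDK, and the parser is deliberately strict (it only accepts the four-field layout), so treat it as an illustration rather than a full W3C Trace Context implementation.

```javascript
// Sketch of W3C Trace Context `traceparent` handling.
// parseTraceparent/childTraceparent are hypothetical helpers, not an SDK API.
const crypto = require('crypto');

// Layout: version "-" trace-id (32 hex) "-" parent-id (16 hex) "-" flags (2 hex)
const TRACEPARENT_RE = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceparent(header) {
  const match = TRACEPARENT_RE.exec(String(header).trim());
  if (!match) return null;
  const [, version, traceId, parentSpanId, flags] = match;
  // Version ff and all-zero IDs are treated as invalid.
  if (version === 'ff' || /^0+$/.test(traceId) || /^0+$/.test(parentSpanId)) {
    return null;
  }
  // The lowest flag bit carries the upstream sampling decision.
  return { version, traceId, parentSpanId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

// Header for the downstream call: same trace ID, freshly generated span ID.
function childTraceparent(parent) {
  const spanId = crypto.randomBytes(8).toString('hex');
  const flags = parent.sampled ? '01' : '00';
  return `00-${parent.traceId}-${spanId}-${flags}`;
}

const incoming = '00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01';
const ctx = parseTraceparent(incoming);
const outgoing = childTraceparent(ctx);
console.log('trace ID:', ctx.traceId);
console.log('forwarded header:', outgoing);
```

In practice the OpenTelemetry SDK performs this step for you; the sketch only shows why every service in the call chain ends up reporting spans under the same trace ID.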
Installing the Jaeger Operator
The Jaeger Operator is a Kubernetes operator that manages the lifecycle of Jaeger deployments. It provides a Jaeger custom resource that encapsulates the full stack configuration.
# Install cert-manager (required by the Jaeger Operator webhook)
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.14.0/cert-manager.yaml
kubectl wait --for=condition=available deployment/cert-manager -n cert-manager --timeout=120s
# Create the observability namespace
kubectl create namespace observability
# Install the Jaeger Operator
kubectl apply -n observability \
-f https://github.com/jaegertracing/jaeger-operator/releases/latest/download/jaeger-operator.yaml
# Verify the operator is running
kubectl get deployment jaeger-operator -n observability
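The operator watches for Jaeger custom resources and creates the required Deployments, Services, ConfigMaps, and Ingress rules automatically. To watch resources across all namespaces, the operator needs cluster-level RBAC. The binding below is one example; a ClusterRole named metrics-reader conventionally grants only read access to the /metrics endpoint, so run kubectl get clusterroles to confirm which role in your installed manifest carries the permissions you need: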
kubectl create clusterrolebinding jaeger-operator-cluster \
--clusterrole=jaeger-operator-metrics-reader \
--serviceaccount=observability:jaeger-operator
Creating a Jaeger Instance
With the operator running, create a Jaeger custom resource. For development, use the allInOne strategy (single pod, in-memory storage, traces lost on restart). For production, use the production strategy with a separate Elasticsearch or Cassandra backend.
# jaeger-allinone.yaml (development/testing only)
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: jaeger-dev
  namespace: observability
spec:
  strategy: allInOne
  allInOne:
    image: jaegertracing/all-in-one:latest
    options:
      log-level: info
  storage:
    type: memory
    options:
      memory:
        max-traces: 100000
  ingress:
    enabled: true
    annotations:
      kubernetes.io/ingress.class: nginx
# jaeger-production.yaml (production with Elasticsearch)
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: jaeger-production
  namespace: observability
spec:
  strategy: production
  collector:
    replicas: 3
    resources:
      requests:
        cpu: 500m
        memory: 512Mi
      limits:
        cpu: "2"
        memory: 2Gi
    options:
      collector.queue-size: 10000
      collector.num-workers: 50
  query:
    replicas: 2
    options:
      query.max-clock-skew-adjustment: 1s
  storage:
    type: elasticsearch
    options:
      es.server-urls: https://elasticsearch:9200
      es.index-prefix: jaeger
      es.num-shards: 5
      es.num-replicas: 1
    secretName: jaeger-elasticsearch-secret
  sampling:
    options:
      default_strategy:
        type: probabilistic
        param: 0.1
# Apply and verify
kubectl apply -f jaeger-production.yaml
kubectl get pods -n observability -l app.kubernetes.io/instance=jaeger-production
# Access the Jaeger UI
kubectl port-forward svc/jaeger-production-query 16686:16686 -n observability
OpenTelemetry Instrumentation
The recommended approach in 2026 is to use the OpenTelemetry SDK in your applications and send traces via OTLP to the Jaeger collector. The Jaeger Operator can also inject a Jaeger agent sidecar automatically, which is mainly useful for services that still report spans through the legacy Jaeger client libraries.
# Jaeger agent sidecar injection
# Add this annotation to your Deployment:
metadata:
  annotations:
    sidecar.jaegertracing.io/inject: "true"
For Java Spring Boot applications, add the OpenTelemetry Java agent:
# Deployment with OTel Java agent
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
spec:
  template:
    spec:
      initContainers:
        # Copies the agent JAR into a shared volume before the app starts
        - name: otel-agent-init
          image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:latest
          command: ["cp", "/javaagent.jar", "/otel/javaagent.jar"]
          volumeMounts:
            - mountPath: /otel
              name: otel-agent
      containers:
        - name: payment-service
          image: payment-service:latest
          env:
            - name: JAVA_TOOL_OPTIONS
              value: "-javaagent:/otel/javaagent.jar"
            - name: OTEL_SERVICE_NAME
              value: "payment-service"
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: "http://jaeger-production-collector:4317"
            - name: OTEL_EXPORTER_OTLP_PROTOCOL
              value: "grpc"
            - name: OTEL_TRACES_SAMPLER
              value: "parentbased_traceidratio"
            - name: OTEL_TRACES_SAMPLER_ARG
              value: "0.1"
          volumeMounts:
            - mountPath: /otel
              name: otel-agent
      volumes:
        - name: otel-agent
          emptyDir: {}
For Node.js services, use the @opentelemetry/sdk-node package with auto-instrumentation:
// tracing.js: load before application code
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-grpc');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');

const sdk = new NodeSDK({
  serviceName: process.env.OTEL_SERVICE_NAME || 'my-service',
  traceExporter: new OTLPTraceExporter({
    url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT || 'http://jaeger-collector:4317',
  }),
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();
Sampling Strategies
Tracing every request in a high-throughput production system is impractical — a service handling 10,000 RPS would generate millions of spans per minute. Sampling controls which traces are recorded and stored.
- Probabilistic sampling — sample a fixed percentage of traces (e.g., 10%). Simple but may miss rare errors.
- Rate limiting sampling — sample up to N traces per second per service. Good for low-traffic services.
- Remote/adaptive sampling — Jaeger collector dynamically adjusts per-operation sample rates based on traffic. Recommended for production.
- Parent-based sampling — respect the sampling decision made by the upstream service. Ensures a complete trace is collected or none at all.
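The strategies above can be sketched in a few lines of Node.js. This is an illustrative model only: the function and class names are hypothetical, and the ratio check compares the low 64 bits of the trace ID against a threshold, which is a simplification and not necessarily the hashing scheme any particular SDK or the Jaeger collector uses.

```javascript
// Illustrative samplers; names are hypothetical and the logic is simplified.

// Probabilistic: deterministic per trace ID, so every service agrees on the decision.
function ratioSampled(traceId, ratio) {
  if (ratio >= 1) return true;
  if (ratio <= 0) return false;
  // Interpret the last 16 hex characters as an unsigned 64-bit integer.
  const low64 = BigInt('0x' + traceId.slice(-16));
  // Scale the ratio to the 2^64 range (32 bits of precision is enough here).
  const threshold = BigInt(Math.floor(ratio * 2 ** 32)) << 32n;
  return low64 < threshold;
}

// Parent-based: reuse the upstream decision, fall back to the ratio for root spans.
function parentBasedSampled(parentSampled, traceId, ratio) {
  if (typeof parentSampled === 'boolean') return parentSampled;
  return ratioSampled(traceId, ratio);
}

// Rate limiting: a token bucket that admits at most `tracesPerSecond` on average.
class RateLimitingSampler {
  constructor(tracesPerSecond, nowMs = Date.now()) {
    this.rate = tracesPerSecond;
    this.tokens = tracesPerSecond;
    this.lastMs = nowMs;
  }

  shouldSample(nowMs = Date.now()) {
    const elapsedSeconds = (nowMs - this.lastMs) / 1000;
    this.lastMs = nowMs;
    // Refill proportionally to elapsed time, capped at one second's worth.
    this.tokens = Math.min(this.rate, this.tokens + elapsedSeconds * this.rate);
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

const lowId = '00000000000000000000000000000001';
const highId = 'ffffffffffffffffffffffffffffffff';
console.log('low ID at 10%:', ratioSampled(lowId, 0.1));
console.log('high ID at 10%:', ratioSampled(highId, 0.1));
console.log('child of unsampled parent:', parentBasedSampled(false, lowId, 1.0));
```

The parent-based wrapper is what keeps traces complete: once the root service decides, downstream services never override that decision, so a trace is either recorded end to end or not at all.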
# Jaeger remote sampling configuration (collector side)
sampling:
  options:
    strategies-file: /etc/jaeger/sampling_strategies.json
# sampling_strategies.json
{
  "default_strategy": {
    "type": "probabilistic",
    "param": 0.05
  },
  "service_strategies": [
    {
      "service": "payment-service",
      "type": "probabilistic",
      "param": 1.0,
      "operation_strategies": [
        {
          "operation": "POST /charge",
          "type": "probabilistic",
          "param": 1.0
        }
      ]
    },
    {
      "service": "recommendation-service",
      "type": "ratelimiting",
      "param": 10
    }
  ]
}
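To see how a strategies file like this one is meant to be read, the following Node.js sketch resolves the effective strategy for a given service and operation. The resolveStrategy function is a hypothetical helper written for this guide, and it assumes a simple precedence (operation, then service, then default); the real collector applies additional merging rules that are not modelled here.

```javascript
// Hypothetical lookup over the sampling_strategies.json structure shown above.
// Assumed precedence: operation-level > service-level > default.
const strategies = {
  default_strategy: { type: 'probabilistic', param: 0.05 },
  service_strategies: [
    {
      service: 'payment-service',
      type: 'probabilistic',
      param: 1.0,
      operation_strategies: [
        { operation: 'POST /charge', type: 'probabilistic', param: 1.0 },
      ],
    },
    { service: 'recommendation-service', type: 'ratelimiting', param: 10 },
  ],
};

function resolveStrategy(config, service, operation) {
  const serviceEntry = (config.service_strategies || []).find(
    (s) => s.service === service
  );
  // Services without an entry fall back to the default strategy.
  if (!serviceEntry) {
    return { type: config.default_strategy.type, param: config.default_strategy.param };
  }
  const operationEntry = (serviceEntry.operation_strategies || []).find(
    (o) => o.operation === operation
  );
  const chosen = operationEntry || serviceEntry;
  return { type: chosen.type, param: chosen.param };
}

console.log(resolveStrategy(strategies, 'payment-service', 'POST /charge'));
console.log(resolveStrategy(strategies, 'recommendation-service', 'GET /items'));
console.log(resolveStrategy(strategies, 'inventory-service', 'GET /stock'));
```

With this file, every payment-service trace is kept, recommendation-service is capped at 10 traces per second, and any service not listed is sampled at 5%.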
Storage Backends
Jaeger supports multiple storage backends. In-memory is only suitable for development. For production, Elasticsearch is the most common choice, and Cassandra is preferred for very high write throughput. Grafana Tempo is a separate tracing backend rather than a Jaeger storage option, but it accepts the same Jaeger and OTLP protocols and is a modern, cost-efficient alternative.
# Create Elasticsearch secret for Jaeger
kubectl create secret generic jaeger-elasticsearch-secret \
--from-literal=ES_PASSWORD=changeme \
--from-literal=ES_USERNAME=elastic \
-n observability
# Check Elasticsearch index management
# Jaeger creates daily rolling indices: jaeger-span-YYYY-MM-DD, jaeger-service-YYYY-MM-DD
# Clean up old indices with the jaeger-es-index-cleaner job or ES ILM policies
# Spark dependencies job for service map
kubectl apply -f - <
Grafana Tempo Integration
Grafana Tempo is a cost-efficient distributed tracing backend that stores traces in object storage (S3/GCS) and integrates tightly with Grafana and Loki. You can use Tempo as the trace storage backend and still use the Jaeger UI (or Grafana's Explore view) for querying.
# tempo-values.yaml
tempo:
  storage:
    trace:
      backend: s3
      s3:
        bucket: my-tempo-traces
        endpoint: s3.amazonaws.com
        region: us-east-1
  receivers:
    jaeger:
      protocols:
        thrift_compact:
          endpoint: "0.0.0.0:6831"
        thrift_http:
          endpoint: "0.0.0.0:14268"
        grpc:
          endpoint: "0.0.0.0:14250"
    otlp:
      protocols:
        grpc:
          endpoint: "0.0.0.0:4317"
        http:
          endpoint: "0.0.0.0:4318"
helm upgrade --install tempo grafana/tempo-distributed \
--namespace observability \
--values tempo-values.yaml
In Grafana, add Tempo as a data source and enable TraceQL search. You can then correlate Loki logs with traces using the Derived Fields feature: configure the Loki data source in Grafana to extract trace IDs from log lines and link them directly to the corresponding Jaeger or Tempo trace.
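A derived field is essentially a regular expression with one capture group applied to each log line. The Node.js sketch below shows what such a pattern would capture. The log format, the traceID= key, and the extractTraceId helper are assumptions made for illustration; adjust the pattern to whatever your services actually log.

```javascript
// Hypothetical log format: logfmt-style lines that include traceID=<hex>.
// The same pattern could be pasted into a derived field's regex setting.
const TRACE_ID_RE = /traceID=(\w+)/;

function extractTraceId(logLine) {
  const match = TRACE_ID_RE.exec(logLine);
  // The first capture group becomes the value of the derived field.
  return match ? match[1] : null;
}

const line =
  'level=info service=payment-service msg="charge accepted" traceID=4bf92f3577b34da6a3ce929d0e0e4736';

console.log(extractTraceId(line));
console.log(extractTraceId('level=info msg="no trace here"'));
```

Testing the pattern against a handful of real log lines before saving the data source avoids silent mismatches, since a line that does not match simply gets no trace link.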
Frequently Asked Questions
What is the difference between Jaeger and Zipkin?
Both are distributed tracing systems with similar data models. Jaeger is a CNCF graduated project with an active community, Kubernetes integration via the operator, adaptive sampling, and native OTLP support. Zipkin is older and simpler but lacks some of Jaeger's advanced features. For new deployments in 2026, Jaeger and Grafana Tempo are the recommended choices.
How much storage does Jaeger need per trace?
A typical trace with 10 spans and basic metadata consumes roughly 2–5 KB compressed in Elasticsearch. At 10,000 RPS with 10% sampling, that is 1,000 traces/second × 3.5 KB = ~3.5 MB/s or approximately 300 GB/day before replication. Index rollover and ILM policies are essential for managing Elasticsearch storage in production.
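The same estimate can be reproduced for your own traffic with a few lines of Node.js. This is a back-of-the-envelope sketch: estimateStorage is a hypothetical helper, it uses decimal units (1 MB = 1,000 KB) as the figures above do, and the replicas parameter simply multiplies the primary data by the number of extra copies.

```javascript
// Rough storage estimate; inputs mirror the worked figures in the answer above.
function estimateStorage({ requestsPerSecond, sampleRate, avgTraceKB, replicas = 0 }) {
  const tracesPerSecond = requestsPerSecond * sampleRate;
  const mbPerSecond = (tracesPerSecond * avgTraceKB) / 1000;
  const gbPerDay = (mbPerSecond * 86400) / 1000; // 86,400 seconds per day
  return {
    tracesPerSecond,
    mbPerSecond,
    gbPerDay,
    // Each Elasticsearch replica stores one more full copy of the data.
    gbPerDayWithReplicas: gbPerDay * (1 + replicas),
  };
}

const estimate = estimateStorage({
  requestsPerSecond: 10000,
  sampleRate: 0.1,
  avgTraceKB: 3.5,
  replicas: 1,
});

console.log('MB/s:', estimate.mbPerSecond.toFixed(1));
console.log('GB/day before replication:', estimate.gbPerDay.toFixed(1));
console.log('GB/day with one replica:', estimate.gbPerDayWithReplicas.toFixed(1));
```

With the inputs from the text this comes to about 3.5 MB/s and roughly 302 GB/day, doubling to about 605 GB/day with one replica per shard.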
Can Jaeger work with Istio service mesh?
Yes, with one caveat. Istio's Envoy sidecars generate spans and the B3 or W3C TraceContext headers for each request, so configuring Jaeger as Istio's tracing backend produces mesh-level spans without SDK instrumentation. However, each application must still copy the incoming trace headers onto its outbound requests; otherwise the sidecar spans cannot be joined into a single trace. Application-level spans (database calls, external APIs) also require SDK instrumentation.
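For the mesh-generated spans to line up into one trace, a service generally needs to copy the trace headers it received onto the requests it makes. The Node.js sketch below shows one way to do that without any tracing SDK. The pickTraceHeaders helper is hypothetical, and the header list reflects the commonly documented B3 and W3C names; check the documentation for your Istio version before relying on it.

```javascript
// Headers commonly forwarded for mesh tracing (assumed list; verify for your mesh).
const TRACE_HEADERS = [
  'traceparent',
  'tracestate',
  'b3',
  'x-b3-traceid',
  'x-b3-spanid',
  'x-b3-parentspanid',
  'x-b3-sampled',
  'x-b3-flags',
  'x-request-id',
];

// Returns only the trace-related headers, with lower-cased names.
function pickTraceHeaders(incomingHeaders) {
  const lowerCased = {};
  for (const [name, value] of Object.entries(incomingHeaders)) {
    lowerCased[name.toLowerCase()] = value;
  }
  const forwarded = {};
  for (const name of TRACE_HEADERS) {
    if (lowerCased[name] !== undefined) forwarded[name] = lowerCased[name];
  }
  return forwarded;
}

const inbound = {
  'Content-Type': 'application/json',
  'X-Request-Id': 'req-123',
  traceparent: '00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01',
  Authorization: 'Bearer example-token',
};

// Merge the result into the headers of any outbound HTTP call.
console.log(pickTraceHeaders(inbound));
```

An allow-list is used deliberately so that unrelated headers such as Authorization are not leaked to downstream services by accident.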