Kubernetes Jaeger: Distributed Tracing Setup (2026)

Jaeger is an open-source distributed tracing platform originally developed at Uber and now a CNCF graduated project. In a Kubernetes microservices environment, a single user request can fan out across dozens of services — Jaeger collects, correlates, and visualises those traces so you can pinpoint latency bottlenecks, understand service dependencies, and debug cascading failures. This guide walks through deploying Jaeger on Kubernetes with the Jaeger Operator, instrumenting services with OpenTelemetry, and integrating traces with Grafana Tempo as a backend.

Distributed Tracing Concepts

Before deploying Jaeger, it helps to understand the core data model shared by all distributed tracing systems:

  • Trace — the complete end-to-end journey of a request through your system. Identified by a globally unique trace ID.
  • Span — a single unit of work within a trace (e.g., an HTTP call, a database query). Each span has a start time, duration, operation name, and key-value tags.
  • Parent-child relationship — spans form a directed acyclic graph. The root span is created by the first service that receives the request; child spans are created by downstream calls.
  • Context propagation — trace and span IDs are forwarded in HTTP headers (W3C TraceContext: traceparent) or gRPC metadata so downstream services can attach their spans to the same trace.
  • Baggage — arbitrary key-value data attached to a trace context and propagated with every downstream call (useful for tenant ID, experiment flags, etc.).
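
To make context propagation concrete, here is a minimal sketch of reading and re-emitting a W3C traceparent header. The helper names (parseTraceparent, childTraceparent) are illustrative and not part of any SDK, and the parser only handles the version 00 layout. In practice the OpenTelemetry SDK does this for you.

```javascript
// Minimal W3C traceparent handling (version 00 layout only).
// Format: version-traceid-parentid-flags, all lowercase hex.
const TRACEPARENT_RE = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceparent(header) {
  const match = TRACEPARENT_RE.exec(String(header).trim());
  if (!match) return null;
  const [, version, traceId, parentId, flags] = match;
  // Version ff and all-zero IDs are invalid.
  if (version === 'ff' || /^0+$/.test(traceId) || /^0+$/.test(parentId)) return null;
  // Bit 0 of the flags byte is the "sampled" flag.
  return { version, traceId, parentId, sampled: (parseInt(flags, 16) & 0x01) === 1 };
}

function childTraceparent(parent, newSpanId) {
  // Same trace ID, new span ID, sampling decision carried over.
  return `00-${parent.traceId}-${newSpanId}-${parent.sampled ? '01' : '00'}`;
}

const incoming = '00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01';
const ctx = parseTraceparent(incoming);
const outgoing = childTraceparent(ctx, 'b7ad6b7169203331');
console.log(ctx, outgoing);
```

The outgoing header keeps the trace ID and replaces only the span ID, which is what lets a downstream service attach its spans to the same trace.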

Jaeger's data model derives from OpenTracing, and Jaeger is compatible with the OpenTelemetry SDK via OTLP, which is the recommended instrumentation path in 2026.
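
The trace, span, and parent-child concepts can be sketched as a small tree-building routine. The field names (spanId, parentSpanId, operation, durationMs) and the sample spans are assumptions made for illustration; they are not Jaeger's storage schema.

```javascript
// Build a span tree from a flat list and render it with indentation.
// Sample data: one request that fans out to two downstream services.
const spans = [
  { spanId: 'a', parentSpanId: null, operation: 'GET /checkout', durationMs: 120 },
  { spanId: 'b', parentSpanId: 'a', operation: 'payment-service charge', durationMs: 80 },
  { spanId: 'c', parentSpanId: 'b', operation: 'SELECT orders', durationMs: 30 },
  { spanId: 'd', parentSpanId: 'a', operation: 'inventory-service reserve', durationMs: 25 },
];

function buildTree(flatSpans) {
  const byId = new Map(flatSpans.map((s) => [s.spanId, { ...s, children: [] }]));
  let root = null;
  for (const node of byId.values()) {
    const parent = byId.get(node.parentSpanId);
    if (parent) parent.children.push(node);
    else root = node; // no known parent: treat as the root span
  }
  return root;
}

function render(node, depth = 0, lines = []) {
  lines.push(`${'  '.repeat(depth)}${node.operation} (${node.durationMs} ms)`);
  node.children.forEach((child) => render(child, depth + 1, lines));
  return lines;
}

const tree = buildTree(spans);
console.log(render(tree).join('\n'));
```

This is roughly the structure the Jaeger UI draws as a timeline: the root span at the top and each downstream call nested under the span that triggered it.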

OpenTelemetry vs OpenTracing: OpenTracing is now archived. All new instrumentation should use the OpenTelemetry SDK, which supports traces, metrics, and logs in a single unified API. Jaeger has accepted OTLP natively since version 1.35.

Installing the Jaeger Operator

The Jaeger Operator is a Kubernetes operator that manages the lifecycle of Jaeger deployments. It provides a Jaeger custom resource that encapsulates the full stack configuration.

# Install cert-manager (required by the Jaeger Operator webhook)
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.14.0/cert-manager.yaml
# Wait for all cert-manager components, including the webhook and cainjector
kubectl wait --for=condition=available deployment --all -n cert-manager --timeout=120s

# Create the observability namespace
kubectl create namespace observability

# Install the Jaeger Operator
kubectl apply -n observability \
  -f https://github.com/jaegertracing/jaeger-operator/releases/latest/download/jaeger-operator.yaml

# Verify the operator is running
kubectl get deployment jaeger-operator -n observability

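The operator watches for Jaeger custom resources and creates the required Deployments, Services, ConfigMaps, and Ingress rules automatically. For cluster-wide tracing, the operator needs cluster-level RBAC. The ClusterRole name below is taken from the operator's release manifest, so confirm it exists in your installed version (kubectl get clusterroles) before creating the binding: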

kubectl create clusterrolebinding jaeger-operator-cluster \
  --clusterrole=jaeger-operator-metrics-reader \
  --serviceaccount=observability:jaeger-operator

Creating a Jaeger Instance

With the operator running, create a Jaeger custom resource. For development, use the allInOne strategy (single pod, in-memory storage). For production, use the production strategy with a separate Elasticsearch or Cassandra backend.

# jaeger-allinone.yaml — development/testing only
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: jaeger-dev
  namespace: observability
spec:
  strategy: allInOne
  allInOne:
    image: jaegertracing/all-in-one:latest
    options:
      log-level: info
  storage:
    type: memory
    options:
      memory:
        max-traces: 100000
  ingress:
    enabled: true
    annotations:
      kubernetes.io/ingress.class: nginx

# jaeger-production.yaml: production with Elasticsearch
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: jaeger-production
  namespace: observability
spec:
  strategy: production
  collector:
    replicas: 3
    resources:
      requests:
        cpu: 500m
        memory: 512Mi
      limits:
        cpu: "2"
        memory: 2Gi
    options:
      collector.queue-size: 10000
      collector.num-workers: 50
  query:
    replicas: 2
    options:
      query.max-clock-skew-adjustment: 1s
  storage:
    type: elasticsearch
    options:
      es.server-urls: https://elasticsearch:9200
      es.index-prefix: jaeger
      es.num-shards: 5
      es.num-replicas: 1
    secretName: jaeger-elasticsearch-secret
  sampling:
    options:
      default_strategy:
        type: probabilistic
        param: 0.1

# Apply and verify
kubectl apply -f jaeger-production.yaml
# The operator labels every pod of an instance with app.kubernetes.io/instance
kubectl get pods -n observability -l app.kubernetes.io/instance=jaeger-production

# Access the Jaeger UI
kubectl port-forward svc/jaeger-production-query 16686:16686 -n observability

OpenTelemetry Instrumentation

The recommended approach in 2026 is to use the OpenTelemetry SDK in your applications and send traces via OTLP to the Jaeger collector. The Jaeger Operator can also inject a Jaeger agent sidecar automatically, a legacy path for workloads that still use Jaeger client libraries.

# Jaeger agent sidecar injection
# Add this annotation to your Deployment:
metadata:
  annotations:
    sidecar.jaegertracing.io/inject: "true"

For Java Spring Boot applications, add the OpenTelemetry Java agent:

# Deployment with OTel Java agent
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
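spec:
  # selector and matching pod labels are required for a valid Deployment
  selector:
    matchLabels:
      app: payment-service
  template:
    metadata:
      labels:
        app: payment-service
    spec: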
      initContainers:
        - name: otel-agent-init
          image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:latest
          command: ["cp", "/javaagent.jar", "/otel/javaagent.jar"]
          volumeMounts:
            - mountPath: /otel
              name: otel-agent
      containers:
        - name: payment-service
          image: payment-service:latest
          env:
            - name: JAVA_TOOL_OPTIONS
              value: "-javaagent:/otel/javaagent.jar"
            - name: OTEL_SERVICE_NAME
              value: "payment-service"
            # Namespace-qualified so it resolves from outside the observability namespace
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: "http://jaeger-production-collector.observability:4317"
            - name: OTEL_EXPORTER_OTLP_PROTOCOL
              value: "grpc"
            - name: OTEL_TRACES_SAMPLER
              value: "parentbased_traceidratio"
            - name: OTEL_TRACES_SAMPLER_ARG
              value: "0.1"
          volumeMounts:
            - mountPath: /otel
              name: otel-agent
      volumes:
        - name: otel-agent
          emptyDir: {}

For Node.js services, use the @opentelemetry/sdk-node package with auto-instrumentation:

// tracing.js — load before application code
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-grpc');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');

const sdk = new NodeSDK({
  serviceName: process.env.OTEL_SERVICE_NAME || 'my-service',
  traceExporter: new OTLPTraceExporter({
    url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT || 'http://jaeger-production-collector.observability:4317',
  }),
  instrumentations: [getNodeAutoInstrumentations()],
});

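sdk.start();

// Flush buffered spans when Kubernetes sends SIGTERM during pod shutdown
process.on('SIGTERM', () => {
  sdk.shutdown().finally(() => process.exit(0));
});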

Sampling Strategies

Tracing every request in a high-throughput production system is impractical: a service handling 10,000 RPS produces 600,000 traces per minute, or about 6 million spans per minute at 10 spans per trace. Sampling controls which traces are recorded and stored.

  • Probabilistic sampling — sample a fixed percentage of traces (e.g., 10%). Simple but may miss rare errors.
  • Rate limiting sampling — sample up to N traces per second per service. Good for low-traffic services.
  • Remote/adaptive sampling — Jaeger collector dynamically adjusts per-operation sample rates based on traffic. Recommended for production.
  • Parent-based sampling — respect the sampling decision made by the upstream service. Ensures a complete trace is collected or none at all.
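
The strategies above can be sketched in a few lines each. These are illustrative implementations written for this guide, not the Jaeger or OpenTelemetry samplers; the function names and the choice of the low 32 bits of the trace ID are assumptions.

```javascript
// Probabilistic: derive the decision from the trace ID so every service
// that sees the same trace ID reaches the same answer.
function probabilisticSample(traceId, ratio) {
  // Treat the low 32 bits of the hex trace ID as a uniform value in [0, 1).
  const value = parseInt(traceId.slice(-8), 16) / 0x100000000;
  return value < ratio;
}

// Rate limiting: a token bucket refilled at `perSecond` tokens per second.
// `now` is injectable so the limiter can be driven by a fake clock.
function createRateLimiter(perSecond, now = () => Date.now()) {
  let tokens = perSecond;
  let last = now();
  return function shouldSample() {
    const current = now();
    tokens = Math.min(perSecond, tokens + ((current - last) / 1000) * perSecond);
    last = current;
    if (tokens >= 1) {
      tokens -= 1;
      return true;
    }
    return false;
  };
}

// Parent-based: reuse the upstream decision when there is one,
// otherwise fall back to the probabilistic sampler.
function parentBasedSample(parentSampled, traceId, ratio) {
  return parentSampled === undefined ? probabilisticSample(traceId, ratio) : parentSampled;
}
```

Deriving the probabilistic decision from the trace ID rather than a fresh random number matters: it keeps independent services consistent even when no sampling flag is propagated.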
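
# Jaeger remote sampling configuration (collector side)
# When running the collector directly, point it at the strategies file:
#   --sampling.strategies-file=/etc/jaeger/sampling_strategies.json
# With the Jaeger Operator, put the same JSON inline under spec.sampling.options,
# as in jaeger-production.yaml above.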

# sampling_strategies.json
{
  "default_strategy": {
    "type": "probabilistic",
    "param": 0.05
  },
  "service_strategies": [
    {
      "service": "payment-service",
      "type": "probabilistic",
      "param": 1.0,
      "operation_strategies": [
        {
          "operation": "POST /charge",
          "type": "probabilistic",
          "param": 1.0
        }
      ]
    },
    {
      "service": "recommendation-service",
      "type": "ratelimiting",
      "param": 10
    }
  ]
}

Tail-based sampling: The OpenTelemetry Collector supports tail-based sampling, which defers the sampling decision until the full trace has arrived. This lets you always capture slow traces (duration > 1s) or error traces, regardless of your base sample rate.
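
A tail-based decision can be sketched as a function over a completed trace. This is an illustration of the idea only, not the Collector's tail_sampling processor; the span fields (durationMs, error), the default thresholds, and the use of the longest span as the trace duration are all assumptions.

```javascript
// Decide whether to keep a trace once all of its spans have arrived.
function keepTrace(spans, { slowMs = 1000, baseRatio = 0.05, random = Math.random } = {}) {
  // Always keep traces that contain an error.
  if (spans.some((s) => s.error)) return { keep: true, reason: 'error' };
  // Approximate the trace duration by its longest span (normally the root).
  const traceDurationMs = Math.max(...spans.map((s) => s.durationMs));
  if (traceDurationMs > slowMs) return { keep: true, reason: 'slow' };
  // Everything else falls back to the base sample rate.
  return random() < baseRatio
    ? { keep: true, reason: 'baseline' }
    : { keep: false, reason: 'dropped' };
}

console.log(keepTrace([{ durationMs: 1500 }, { durationMs: 200 }]));
console.log(keepTrace([{ durationMs: 40, error: true }]));
```

The trade-off is memory: the component making the decision has to buffer every span of a trace until it judges the trace complete.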

Storage Backends

Jaeger supports multiple storage backends. In-memory storage is suitable only for development. For production, Elasticsearch is the most common choice, and Cassandra is preferred for very high write throughput. Grafana Tempo is not a Jaeger storage plugin but a separate tracing backend that serves as a cost-efficient alternative.

# Create Elasticsearch secret for Jaeger
kubectl create secret generic jaeger-elasticsearch-secret \
  --from-literal=ES_PASSWORD=changeme \
  --from-literal=ES_USERNAME=elastic \
  -n observability

# Check Elasticsearch index management
# Jaeger creates daily rolling indices: jaeger-span-YYYY-MM-DD, jaeger-service-YYYY-MM-DD
# (prepended with the es.index-prefix value when one is set)
# Clean up old indices with the jaeger-es-index-cleaner job or ES ILM policies

# Spark dependencies job for service map
kubectl apply -f - <

Grafana Tempo Integration

Grafana Tempo is a cost-efficient distributed tracing backend that stores traces in object storage (S3/GCS) and integrates tightly with Grafana and Loki. You can use Tempo as the trace storage backend and still use the Jaeger UI (or Grafana's Explore view) for querying.

# tempo-values.yaml
tempo:
  storage:
    trace:
      backend: s3
      s3:
        bucket: my-tempo-traces
        endpoint: s3.amazonaws.com
        region: us-east-1
  receivers:
    jaeger:
      protocols:
        thrift_compact:
          endpoint: "0.0.0.0:6831"
        thrift_http:
          endpoint: "0.0.0.0:14268"
        grpc:
          endpoint: "0.0.0.0:14250"
    otlp:
      protocols:
        grpc:
          endpoint: "0.0.0.0:4317"
        http:
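          endpoint: "0.0.0.0:4318"

# The tempo.* keys above follow the layout of the single-binary grafana/tempo chart
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm upgrade --install tempo grafana/tempo \
  --namespace observability \
  --values tempo-values.yaml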

In Grafana, add Tempo as a datasource and enable TraceQL search. You can then correlate Loki logs with traces using the Derived Fields feature of the Loki datasource: add a regex that extracts the trace ID from each log line and link it to the Tempo (or Jaeger) datasource so the ID opens the corresponding trace.
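
A derived field is essentially a regex with one capture group applied to each log line. The sketch below tests such a pattern before it goes into the datasource settings; the log line format and the trace_id= key are hypothetical, so adjust the pattern to whatever your services actually log.

```javascript
// Hypothetical logfmt-style line emitted by an instrumented service.
const logLine =
  'level=info msg="charge accepted" trace_id=4bf92f3577b34da6a3ce929d0e0e4736 span_id=00f067aa0ba902b7';

// One capture group containing the trace ID, as a derived field expects.
const TRACE_ID_RE = /trace_id=(\w+)/;

function extractTraceId(line) {
  const match = TRACE_ID_RE.exec(line);
  return match ? match[1] : null;
}

console.log(extractTraceId(logLine));
```

Lines without a match return null, which mirrors how Grafana simply shows no trace link for log lines that carry no trace ID.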

Frequently Asked Questions

What is the difference between Jaeger and Zipkin?

Both are distributed tracing systems with similar data models. Jaeger has more active CNCF community support, better Kubernetes integration via the operator, adaptive sampling, and native OTLP support. Zipkin is older and simpler but lacks some of Jaeger's advanced features. For new deployments in 2026, Jaeger or Grafana Tempo are the recommended choices.

How much storage does Jaeger need per trace?

A typical trace with 10 spans and basic metadata consumes roughly 2–5 KB compressed in Elasticsearch. At 10,000 RPS with 10% sampling, that is 1,000 traces/second × 3.5 KB = ~3.5 MB/s or approximately 300 GB/day before replication. Index rollover and ILM policies are essential for managing Elasticsearch storage in production.
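
The arithmetic above is easy to rerun for your own traffic. The helper below is a rough sizing sketch; its name and parameters are illustrative, it uses decimal units (1 MB = 1000 KB), and the replicas parameter assumes each Elasticsearch replica stores one full extra copy.

```javascript
// Rough daily storage estimate for sampled traces.
function estimateDailyStorageGB({ requestsPerSecond, samplingRatio, kbPerTrace, replicas = 0 }) {
  const tracesPerSecond = requestsPerSecond * samplingRatio;
  const mbPerSecond = (tracesPerSecond * kbPerTrace) / 1000;
  const primaryGBPerDay = (mbPerSecond * 86400) / 1000; // 86,400 seconds per day
  return {
    tracesPerSecond,
    mbPerSecond,
    primaryGBPerDay,
    totalGBPerDay: primaryGBPerDay * (1 + replicas),
  };
}

// The figures from the answer above, with one replica as in es.num-replicas: 1.
const estimate = estimateDailyStorageGB({
  requestsPerSecond: 10000,
  samplingRatio: 0.1,
  kbPerTrace: 3.5,
  replicas: 1,
});
console.log(estimate);
```

With one replica the on-disk footprint is roughly double the primary figure, which is why the sample rate is usually the first thing to tune when storage grows.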

Can Jaeger work with Istio service mesh?

Yes, with one caveat. Istio's Envoy sidecars generate spans and trace headers (B3 and W3C TraceContext) for each request they proxy, so configuring Jaeger as Istio's tracing backend yields mesh-level traces without SDK instrumentation. However, each application must still copy the incoming trace headers onto its outbound requests, otherwise the spans cannot be stitched into a single trace. Application-level spans (database calls, external APIs) also require SDK instrumentation.
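
In a mesh, Envoy can only join spans into one trace if the application copies the trace headers from each inbound request onto the outbound calls it makes. A minimal sketch of that forwarding step follows; the helper name is illustrative, and the header list should be checked against the Istio documentation for your version.

```javascript
// Headers to copy from the inbound request onto every outbound call.
// Node's http module exposes incoming header names in lowercase.
const TRACE_HEADERS = [
  'traceparent',
  'tracestate',
  'x-request-id',
  'x-b3-traceid',
  'x-b3-spanid',
  'x-b3-parentspanid',
  'x-b3-sampled',
  'x-b3-flags',
  'b3',
];

function forwardTraceHeaders(inboundHeaders) {
  const outbound = {};
  for (const name of TRACE_HEADERS) {
    const value = inboundHeaders[name];
    if (value !== undefined) outbound[name] = value;
  }
  return outbound;
}

// Example: only the trace-related headers survive.
const forwarded = forwardTraceHeaders({
  traceparent: '00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01',
  'content-type': 'application/json',
  'x-request-id': 'req-123',
});
console.log(forwarded);
```

In a request handler this would typically be passed as the headers option of the outbound HTTP client call. Services instrumented with the OpenTelemetry SDK get equivalent propagation without this manual step.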