Kubernetes Loki: Log Aggregation and Querying (2026)

Grafana Loki is a horizontally scalable, highly available log aggregation system designed specifically for cloud-native environments. Unlike traditional logging solutions that index the full content of log lines, Loki indexes only metadata labels — making it dramatically cheaper to operate at scale. Combined with Promtail for log shipping and Grafana for visualization, the Loki stack gives Kubernetes clusters a powerful, cost-effective observability pillar that complements Prometheus metrics and Jaeger traces.

Loki Architecture Overview
Installing Loki with Helm
Configuring Promtail Log Shipping
Writing LogQL Queries
Grafana Dashboards for Kubernetes Logs
Multi-Tenancy and Label Strategy
Production Configuration and Retention
Frequently Asked Questions

Loki Architecture Overview

Loki follows a microservices-inspired architecture with several distinct components that can be deployed together (single binary) or independently (microservices mode) depending on scale requirements.

Distributor — receives log streams from agents (Promtail, Fluent Bit, etc.) and fans out writes to multiple ingesters
Ingester — buffers incoming log chunks in memory and periodically flushes them to object storage (S3, GCS, Azure Blob)
Querier — executes LogQL queries against both the ingester in-memory data and the long-term object store
Query Frontend — queues and splits large queries for parallelism, caches results
Compactor — merges index shards and enforces retention policies
Ruler — evaluates alerting rules written in LogQL
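
The write path through the distributor and ingesters can be sketched in a few lines of Python. This is a simplified model, not Loki's implementation: Loki places streams on a token-based hash ring, while the sketch hashes the stream key and walks a plain list. The function names and ingester names are hypothetical; the quorum formula floor(replication_factor / 2) + 1 is the one Loki documents.

```python
import hashlib
from typing import Dict, List


def pick_ingesters(tenant: str, labels: Dict[str, str],
                   ingesters: List[str], replication_factor: int = 3) -> List[str]:
    """Choose replication_factor distinct ingesters for one log stream."""
    if replication_factor > len(ingesters):
        raise ValueError("replication_factor exceeds the number of ingesters")
    # Canonical stream key: tenant plus sorted labels, so label order is irrelevant.
    selector = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    key = tenant + "{" + selector + "}"
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Stand-in for the ring lookup: hash to a start index, then take the next neighbours.
    start = int.from_bytes(digest[:8], "big") % len(ingesters)
    return [ingesters[(start + i) % len(ingesters)] for i in range(replication_factor)]


def write_quorum(replication_factor: int) -> int:
    """Number of ingester acknowledgements needed for a write to succeed."""
    return replication_factor // 2 + 1


ingesters = [f"ingester-{i}" for i in range(5)]  # hypothetical pod names
targets = pick_ingesters("team-platform",
                         {"namespace": "production", "app": "api-server"},
                         ingesters)
print(targets, "quorum:", write_quorum(3))
```

With a replication factor of 3, two of the three selected ingesters must acknowledge a write, which is why a single ingester restart does not interrupt ingestion.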

The critical design choice is Loki's label model. Every log stream is identified by a set of key-value labels (e.g., namespace="production", app="api-server"). These labels are indexed; the log content itself is stored compressed and only scanned at query time. This keeps index size small but means you should avoid high-cardinality labels such as pod names that change frequently.

Key insight: Loki's "like Prometheus, but for logs" philosophy means you query logs much the way you query metrics: select streams by label first, then filter the log content. Labels should be low-cardinality (namespace, app, environment), not high-cardinality (pod UID, request ID).
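
To see why cardinality matters, it helps to estimate how many streams a label set can produce. The sketch below multiplies the number of distinct values per label, which gives an upper bound because not every combination occurs in practice. The label counts are made-up illustration values, and estimate_streams is a hypothetical helper.

```python
from math import prod
from typing import Dict


def estimate_streams(label_cardinalities: Dict[str, int]) -> int:
    """Upper bound on stream count: product of distinct values per label."""
    return prod(label_cardinalities.values())


# Hypothetical cluster: 5 namespaces, 20 apps, 2 environments.
low = estimate_streams({"namespace": 5, "app": 20, "environment": 2})

# Same cluster, but with a pod label that sees 500 distinct pod names over time.
high = estimate_streams({"namespace": 5, "app": 20, "environment": 2, "pod": 500})

print(low, high)
```

Adding one churn-heavy label multiplies the bound by its number of values, which is how a single label can push a tenant past its stream limit.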

Installing Loki with Helm

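A quick way to try the full Loki stack on Kubernetes is the loki-stack chart from the Grafana Helm repository. It bundles Loki, Promtail, and optionally Grafana in a single release. Treat it as an evaluation setup; the production-oriented grafana/loki chart is covered later in this section.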

# Add the Grafana Helm repository
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

# Create a dedicated namespace
kubectl create namespace monitoring

# Install loki-stack (Loki + Promtail + Grafana)
helm upgrade --install loki-stack grafana/loki-stack \
  --namespace monitoring \
  --set loki.persistence.enabled=true \
  --set loki.persistence.size=50Gi \
  --set loki.persistence.storageClassName=standard \
  --set grafana.enabled=true \
  --set grafana.sidecar.datasources.enabled=true \
  --set promtail.enabled=true

For production, you should use Loki Distributed or Loki Simple Scalable mode backed by object storage. Here is a values file for simple scalable mode with S3:

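# loki-values.yaml
loki:
  auth_enabled: false
  commonConfig:
    replication_factor: 3
  storage:
    type: s3
    # The chart uses camelCase keys here and renders them into Loki's own config
    bucketNames:
      chunks: my-loki-chunks
      ruler: my-loki-ruler   # hypothetical bucket name; the chart expects a ruler bucket too
    s3:
      endpoint: s3.amazonaws.com
      region: us-east-1
      # ${VAR} references are only expanded if Loki runs with -config.expand-env=true
      # and the variables are present in the pod environment
      accessKeyId: ${AWS_ACCESS_KEY_ID}
      secretAccessKey: ${AWS_SECRET_ACCESS_KEY}
      insecure: false
  schemaConfig:
    configs:
      - from: "2024-01-01"
        store: tsdb
        object_store: s3
        schema: v13
        index:
          prefix: loki_index_
          period: 24h
  limits_config:
    # Only enforced when the compactor runs with retention_enabled: true
    retention_period: 30d
    max_query_series: 10000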

write:
  replicas: 3
read:
  replicas: 3
backend:
  replicas: 3

helm upgrade --install loki grafana/loki \
  --namespace monitoring \
  --values loki-values.yaml

Storage note: For production always use object storage (S3/GCS/Azure Blob) rather than local PersistentVolumes. Object storage scales independently of compute and does not require pod affinity rules for consistent data access.

Configuring Promtail Log Shipping

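Promtail is a log shipping agent deployed as a DaemonSet (one pod per node) that reads log files from the node's /var/log/pods/ directory and pushes them to Loki. It discovers pods through the Kubernetes API and attaches metadata labels. Grafana has since moved Promtail to long-term support in favor of Grafana Alloy, so check its current support status before adopting it for a new cluster; the configuration concepts shown here largely carry over.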

# promtail-config.yaml (ConfigMap)
apiVersion: v1
kind: ConfigMap
metadata:
  name: promtail-config
  namespace: monitoring
data:
  promtail.yaml: |
    server:
      http_listen_port: 9080
      grpc_listen_port: 0

    positions:
      # Read offsets are stored here; back this path with a hostPath volume so they survive pod restarts
      filename: /tmp/positions.yaml

    clients:
      - url: http://loki-stack:3100/loki/api/v1/push

    scrape_configs:
      - job_name: kubernetes-pods
        kubernetes_sd_configs:
          - role: pod
        pipeline_stages:
          - cri: {}
          - multiline:
              firstline: '^\d{4}-\d{2}-\d{2}'
              max_wait_time: 3s
          - labeldrop:
              - filename
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_node_name]
            target_label: node
          - source_labels: [__meta_kubernetes_namespace]
            target_label: namespace
          # pod is a high-cardinality label; remove this rule if stream counts grow too large
          - source_labels: [__meta_kubernetes_pod_name]
            target_label: pod
          - source_labels: [__meta_kubernetes_pod_container_name]
            target_label: container
          - source_labels: [__meta_kubernetes_pod_label_app]
            target_label: app
          - replacement: /var/log/pods/*$1/*.log
            separator: /
            source_labels:
              - __meta_kubernetes_pod_uid
              - __meta_kubernetes_pod_container_name
            target_label: __path__
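
What the agent ultimately sends to the clients URL is a push request containing labelled streams. The sketch below builds the JSON form of that request body as accepted by /loki/api/v1/push: each stream carries a label set and a list of [timestamp in nanoseconds as a string, log line] pairs. The helper name build_push_payload and the sample log lines are hypothetical; Promtail itself normally sends a compressed protobuf encoding of the same structure.

```python
import json
import time
from typing import Dict, List, Optional


def build_push_payload(labels: Dict[str, str], lines: List[str],
                       now_ns: Optional[int] = None) -> str:
    """Return a JSON body for POST /loki/api/v1/push with one stream."""
    # Loki expects Unix epoch nanoseconds, encoded as a string.
    ts = str(time.time_ns() if now_ns is None else now_ns)
    body = {
        "streams": [
            {
                "stream": labels,                          # indexed label set
                "values": [[ts, line] for line in lines],  # unindexed log content
            }
        ]
    }
    return json.dumps(body)


payload = build_push_payload(
    {"namespace": "production", "app": "api-server"},
    ["GET /healthz 200", "GET /orders 500"],  # made-up sample lines
)
print(payload)
```

The split between "stream" and "values" mirrors the storage model: only the label set is indexed, while the lines are stored as compressed chunks.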

The pipeline_stages section is powerful — it lets you parse, transform, and filter log lines before they reach Loki. Common stages include:

cri — parse CRI-O/containerd log format (timestamp, stream, flags, message)
json — parse structured JSON log lines and extract fields into Promtail's extracted map (a later labels stage can promote them to labels)
regex — extract values from unstructured log lines into the extracted map using named capture groups
multiline — collapse stack traces into a single log entry
drop — discard noisy log lines before they consume storage

    pipeline_stages:
      - cri: {}
      # Parse JSON application logs
      - json:
          expressions:
            level: level
            request_id: requestId
            duration_ms: durationMs
      # Promote log level to a Loki label
      - labels:
          level:
      # Drop debug logs in production
      - drop:
          source: level
          expression: "debug"
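
The effect of this pipeline on a single line can be mimicked in plain Python. The sketch below is an approximation for illustration, not Promtail's code: it splits a CRI-formatted line, tries to read a level field from the JSON content, drops the entry when the level matches "debug", and otherwise returns the label and the line that would be shipped. The process function and the sample lines are hypothetical, and unlike Promtail the sketch simply skips lines that are not in CRI format.

```python
import json
import re
from typing import Dict, Optional, Tuple

# CRI log format: <timestamp> <stream> <flags> <content>
CRI_LINE = re.compile(
    r"^(?P<time>\S+) (?P<stream>stdout|stderr) (?P<flags>\S+) (?P<content>.*)$"
)


def process(raw: str) -> Optional[Tuple[Dict[str, str], str]]:
    """Return (labels, line) for a kept entry, or None if it is dropped or skipped."""
    match = CRI_LINE.match(raw)
    if match is None:
        return None  # not a CRI line; this sketch skips it
    content = match.group("content")

    # json stage: extract fields; non-JSON content yields no fields.
    try:
        fields = json.loads(content)
    except json.JSONDecodeError:
        fields = {}
    level = fields.get("level") if isinstance(fields, dict) else None

    labels: Dict[str, str] = {}
    if isinstance(level, str):
        # drop stage: discard the entry when the expression matches the source value.
        if re.search("debug", level):
            return None
        # labels stage: promote the extracted level to a label.
        labels["level"] = level
    return labels, content


sample = '2024-05-01T12:00:00.123456789Z stdout F {"level": "error", "requestId": "abc", "durationMs": 120}'
print(process(sample))
```

Note that request_id and duration_ms stay out of the label set in the configuration above; they are extracted but never promoted, which keeps stream cardinality low.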

Writing LogQL Queries

LogQL is Loki's query language, modelled after PromQL. Every query starts with a log stream selector (label filter) in curly braces, optionally followed by a filter expression and a metric query.

# Show all logs from the production namespace
{namespace="production"}

# Filter to error logs only
{namespace="production", app="api-server"} |= "ERROR"

# Regex filter: match access-log lines with an HTTP 5xx status
# (backticks avoid having to escape the quote and the dot)
{namespace="production"} |~ `HTTP/[12]\.[01]" 5[0-9]{2}`

# Parse JSON and filter by field
{app="payment-service"} | json | level="error" | duration_ms > 500

# Per-second error rate by app, averaged over a 5-minute window (metric query)
sum by (app) (
  rate({namespace="production"} |= "ERROR" [5m])
)

# 99th percentile latency from structured logs
quantile_over_time(0.99,
  {app="api-server"} | json | unwrap duration_ms [5m]
) by (app)

LogQL metric queries turn log streams into time-series data that can be graphed in Grafana or used in alerting rules — bridging the gap between logs and metrics.
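
What a range aggregation such as rate(... [5m]) computes can be shown with ordinary arithmetic: count the matching entries inside the window, then divide by the window length in seconds. The sketch below is a conceptual model with hypothetical function names and made-up timestamps; Loki's exact handling of window boundaries may differ.

```python
from typing import List


def count_over_time(timestamps: List[float], now: float, window_s: float) -> int:
    """Number of entries whose timestamp falls within the last window_s seconds."""
    return sum(1 for t in timestamps if now - window_s < t <= now)


def rate(timestamps: List[float], now: float, window_s: float) -> float:
    """Entries per second over the window, analogous to rate(... [window])."""
    return count_over_time(timestamps, now, window_s) / window_s


# Hypothetical data: one matching "ERROR" line every 10 seconds for 5 minutes.
now = 1300.0
error_timestamps = [1010.0 + 10 * i for i in range(30)]

per_second = rate(error_timestamps, now, window_s=300)
print(per_second, "errors/s =", per_second * 60, "errors/min")
```

Because the result is per second, multiply by 60 when a dashboard or alert threshold is expressed per minute.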

Performance tip: Always start LogQL queries with the most selective label selector possible. Loki evaluates label selectors first (using the index) before scanning log content. A broad selector like {job="varlogs"} will scan far more data than {namespace="production", app="checkout"}.
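
The same selective query can be run outside Grafana through Loki's HTTP API. The sketch below only builds the URL for the /loki/api/v1/query_range endpoint and does not send it. The host http://loki:3100 is taken from the Promtail examples in this article, while the helper name and the timestamps are assumptions for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def query_range_url(base_url: str, query: str, start_ns: int, end_ns: int,
                    limit: int = 100) -> str:
    """Build a query_range URL; start and end are Unix epoch nanoseconds."""
    params = {"query": query, "start": start_ns, "end": end_ns, "limit": limit}
    # urlencode takes care of the braces, quotes, and pipes in LogQL.
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{urlencode(params)}"


logql = '{namespace="production", app="checkout"} |= "ERROR"'
url = query_range_url(
    "http://loki:3100",
    logql,
    start_ns=1_700_000_000_000_000_000,  # made-up 5-minute window
    end_ns=1_700_000_300_000_000_000,
)
params = parse_qs(urlsplit(url).query)
print(url)
```

In a multi-tenant installation the request would additionally need the tenant header described in the multi-tenancy section.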

Grafana Dashboards for Kubernetes Logs

Loki integrates directly with Grafana as a datasource. The Explore view lets you run ad-hoc LogQL queries; the Logs panel embeds live log streams into dashboards alongside metric panels.

# Get the Grafana admin password from the secret
kubectl get secret --namespace monitoring loki-stack-grafana \
  -o jsonpath="{.data.admin-password}" | base64 --decode; echo

# Port-forward to access the Grafana UI
kubectl port-forward --namespace monitoring svc/loki-stack-grafana 3000:80

Useful Grafana dashboard panels for Kubernetes log monitoring:

Log volume by namespace — sum by (namespace) (rate({job=~".+"} [1m]))
Error rate by app — sum by (app) (rate({namespace="production"} |= "ERROR" [5m]))
Live log stream — Logs panel with {namespace="$namespace", app="$app"} using template variables
Top error messages — topk(10, sum by (message) (count_over_time({namespace="production"} |= "ERROR" | json [1h])))

Import Grafana dashboard ID 15141 (Loki Kubernetes Logs) for a pre-built Kubernetes log overview. The community also publishes dashboards for specific stacks like Spring Boot (14852) and NGINX ingress logs (12559).

Multi-Tenancy and Label Strategy

Loki supports multi-tenancy through an X-Scope-OrgID HTTP header. When auth_enabled: true, each request must include a tenant ID and data is fully isolated per tenant. This is useful for shared Loki installations serving multiple teams or environments.

# Promtail config for multi-tenant setup
clients:
  - url: http://loki:3100/loki/api/v1/push
    tenant_id: team-platform

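# Per-namespace tenant routing using the tenant pipeline stage
# (internal __-prefixed labels set in relabel_configs are dropped before entries are sent)
scrape_configs:
  - job_name: kubernetes-pods
    ...
    pipeline_stages:
      # Use the value of the namespace label as the tenant ID
      - tenant:
          label: namespace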
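
On the wire, the tenant is just an HTTP header. The sketch below constructs (but does not send) a push request carrying X-Scope-OrgID, using only the standard library. The helper name tenant_push_request and the sample payload are hypothetical; the URL and the tenant ID team-platform come from the configuration above.

```python
import json
import urllib.request
from typing import Any, Dict


def tenant_push_request(base_url: str, tenant_id: str,
                        payload: Dict[str, Any]) -> urllib.request.Request:
    """Build a POST request to the push API scoped to one tenant."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/loki/api/v1/push",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Scope-OrgID": tenant_id,  # required when auth_enabled: true
        },
        method="POST",
    )


request = tenant_push_request(
    "http://loki:3100",
    "team-platform",
    {"streams": [{"stream": {"namespace": "production"},
                  "values": [["1700000000000000000", "hello"]]}]},
)
# Header names are case-insensitive; urllib normalizes their capitalization.
headers = {name.lower(): value for name, value in request.header_items()}
print(request.get_method(), request.full_url, headers["x-scope-orgid"])
```

Queries work the same way: a request with a different X-Scope-OrgID value cannot see this tenant's data.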

Label strategy best practices for Kubernetes Loki deployments:

Use static labels: namespace, app, environment (staging/production), cluster
Avoid high-cardinality labels: pod name, node name, request ID — these create too many streams
Promote important structured fields to labels with the labels pipeline stage, but keep the label count under 10 per stream
Use parsed fields (json/regex stages) for filtering rather than labels for high-cardinality data

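Stream limit: Loki caps the number of active streams per tenant (max_global_streams_per_user, 5,000 by default in recent releases). Pushes that would create streams beyond the cap are rejected with HTTP 429 and a "Maximum active stream limit exceeded" message. Monitor stream count with the loki_ingester_streams_created_total and loki_ingester_memory_streams metrics.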

Production Configuration and Retention

Several configuration areas need attention before running Loki in production on Kubernetes:

# Production Loki limits_config
limits_config:
  # Global retention (overridden per tenant)
  retention_period: 30d
  # Rate limit ingestion to prevent runaway log producers
  ingestion_rate_mb: 16
  ingestion_burst_size_mb: 32
  # Limit the total time range a single query may cover
  max_query_length: 721h   # Loki's default, 30 days plus one hour
  # Limit the [range] inside range aggregations such as rate(... [5m])
  max_query_range: 168h    # 7 days
  max_entries_limit_per_query: 5000
  # Parallel query execution
  max_query_parallelism: 32

# Compactor handles retention enforcement
compactor:
  working_directory: /loki/compactor
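  # Loki 3.x removed shared_store; delete_request_store is required when retention is enabled
  delete_request_store: s3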
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h
  retention_delete_worker_count: 150

# Per-tenant limits via overrides (placed in the runtime config file referenced by runtime_config.file)
overrides:
  tenant-dev:
    retention_period: 7d
    ingestion_rate_mb: 4
  tenant-production:
    retention_period: 90d
    ingestion_rate_mb: 32
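
How the global limits and the per-tenant overrides combine can be sketched as a simple dictionary merge: a tenant's override wins for the keys it sets, and every other key falls back to limits_config. The values are copied from the configuration above; effective_limits is a hypothetical helper and not part of Loki.

```python
from typing import Any, Dict

# Global defaults, mirroring limits_config above.
DEFAULT_LIMITS: Dict[str, Any] = {
    "retention_period": "30d",
    "ingestion_rate_mb": 16,
    "ingestion_burst_size_mb": 32,
}

# Per-tenant overrides, mirroring the overrides block above.
OVERRIDES: Dict[str, Dict[str, Any]] = {
    "tenant-dev": {"retention_period": "7d", "ingestion_rate_mb": 4},
    "tenant-production": {"retention_period": "90d", "ingestion_rate_mb": 32},
}


def effective_limits(tenant: str) -> Dict[str, Any]:
    """Merge tenant overrides on top of the global limits."""
    return {**DEFAULT_LIMITS, **OVERRIDES.get(tenant, {})}


for tenant in ("tenant-dev", "tenant-production", "tenant-unknown"):
    print(tenant, effective_limits(tenant))
```

One consequence worth noticing: tenant-production raises its rate to 32 MB/s but inherits the global burst size of 32 MB, so the burst allowance equals the sustained rate unless it is overridden as well.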

Resource sizing guidelines for production Loki on Kubernetes:

Distributor: 0.5 CPU / 512Mi per 10 MB/s ingestion throughput
Ingester: 2 CPU / 4Gi per instance; use 3+ replicas with WAL (write-ahead log) enabled
Querier: 1 CPU / 2Gi per instance; scale based on query concurrency
PodDisruptionBudget: set minAvailable: 2 for ingesters to prevent data loss during node drains
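
The distributor guideline above translates into a small calculation. The sketch below scales the 0.5 CPU / 512Mi figure by ingestion throughput; rounding up to whole 10 MB/s units is an assumption of this sketch, and the result is a starting point for load testing rather than a guarantee.

```python
import math
from typing import Dict


def distributor_resources(ingest_mb_per_s: float) -> Dict[str, float]:
    """Estimate total distributor CPU and memory for a given ingestion rate."""
    # One unit = 10 MB/s of ingestion; always budget for at least one unit.
    units = max(1, math.ceil(ingest_mb_per_s / 10))
    return {"cpu": 0.5 * units, "memory_mi": 512 * units}


estimate = distributor_resources(25)  # hypothetical cluster ingesting 25 MB/s
print(estimate)
```

The totals can then be split across however many distributor replicas the deployment runs.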

Frequently Asked Questions

What is the difference between Loki and Elasticsearch for Kubernetes logging?

Elasticsearch indexes the full content of every log line, enabling fast full-text search but requiring significant CPU and RAM (typically 8–32GB per node). Loki only indexes metadata labels, which can make it roughly 10–100x cheaper to store and operate depending on the workload, but queries must scan compressed log chunks rather than using an inverted index. Loki wins on cost and simplicity; Elasticsearch wins on ad-hoc full-text search performance across billions of lines.

Can I use Fluent Bit instead of Promtail?

Yes. Grafana publishes an official Fluent Bit output plugin for Loki. Fluent Bit is preferred when you need advanced filtering, parsing of many log formats, or already run Fluent Bit for Elasticsearch. Promtail is simpler and integrates more tightly with Loki's label model and service discovery.

How do I alert on log patterns using Loki?

Enable the Loki Ruler component and define alerting rules using LogQL metric queries. The rules are evaluated on the Loki server and can fire alerts to Alertmanager exactly like Prometheus rules. Example: alert when error rate exceeds 10 per minute for any production app.

What object storage does Loki support?

Loki supports AWS S3 (and S3-compatible stores like MinIO), Google Cloud Storage, Azure Blob Storage, and filesystem storage. For production, S3 or GCS is recommended. MinIO is popular for on-premises deployments.