Kubernetes Loki: Log Aggregation and Querying (2026)
Grafana Loki is a horizontally scalable, highly available log aggregation system designed specifically for cloud-native environments. Unlike traditional logging solutions that index the full content of log lines, Loki indexes only metadata labels — making it dramatically cheaper to operate at scale. Combined with Promtail for log shipping and Grafana for visualization, the Loki stack gives Kubernetes clusters a powerful, cost-effective observability pillar that complements Prometheus metrics and Jaeger traces.
Loki Architecture Overview
Loki follows a microservices-inspired architecture with several distinct components that can be deployed together (single binary) or independently (microservices mode) depending on scale requirements.
- Distributor: receives log streams from agents (Promtail, Fluent Bit, etc.) and fans out writes to multiple ingesters
- Ingester: buffers incoming log chunks in memory and periodically flushes them to object storage (S3, GCS, Azure Blob)
- Querier: executes LogQL queries against both the ingesters' in-memory data and the long-term object store
- Query Frontend: queues and splits large queries for parallelism, and caches results
- Compactor: merges index shards and enforces retention policies
- Ruler: evaluates alerting rules written in LogQL
The critical design choice is Loki's label model. Every log stream is identified by a set of key-value labels (e.g., namespace="production", app="api-server"). These labels are indexed; the log content itself is stored compressed and only scanned at query time. This keeps index size small but means you should avoid high-cardinality labels such as pod names that change frequently.
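A rough way to see why high-cardinality labels hurt: the number of streams is bounded above by the product of the distinct values each label can take. The sketch below is illustrative only. The helper `estimate_max_streams` and the cardinalities are hypothetical, and a real cluster will usually have fewer streams because not every label combination occurs.

```python
from math import prod
from typing import Mapping


def estimate_max_streams(label_cardinalities: Mapping[str, int]) -> int:
    """Upper bound on stream count: product of distinct values per label."""
    return prod(label_cardinalities.values())


# Hypothetical cardinalities for a mid-sized cluster.
static_labels = {"namespace": 5, "app": 20}
with_pod_label = {**static_labels, "pod": 300}

print(estimate_max_streams(static_labels))   # 100
print(estimate_max_streams(with_pod_label))  # 30000
```

Adding a single churn-prone label multiplies the bound, which is why the index stays small only when labels are few and stable.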
Installing Loki with Helm
The easiest way to deploy the full Loki stack on Kubernetes is via the official Grafana Helm chart. The loki-stack chart bundles Loki, Promtail, and optionally Grafana in a single release.
# Add the Grafana Helm repository
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
# Create a dedicated namespace
kubectl create namespace monitoring
# Install loki-stack (Loki + Promtail + Grafana)
helm upgrade --install loki-stack grafana/loki-stack \
--namespace monitoring \
--set loki.persistence.enabled=true \
--set loki.persistence.size=50Gi \
--set loki.persistence.storageClassName=standard \
--set grafana.enabled=true \
--set grafana.sidecar.datasources.enabled=true \
--set promtail.enabled=true
For production, you should use Loki Distributed or Loki Simple Scalable mode backed by object storage. Here is a values file for simple scalable mode with S3:
# loki-values.yaml
loki:
  auth_enabled: false
  commonConfig:
    replication_factor: 3
  storage:
    type: s3
    s3:
      endpoint: s3.amazonaws.com
      region: us-east-1
      bucketnames: my-loki-chunks
      access_key_id: ${AWS_ACCESS_KEY_ID}
      secret_access_key: ${AWS_SECRET_ACCESS_KEY}
      insecure: false
  schemaConfig:
    configs:
      - from: "2024-01-01"
        store: tsdb
        object_store: s3
        schema: v13
        index:
          prefix: loki_index_
          period: 24h
  limits_config:
    retention_period: 30d
    max_query_series: 10000
write:
  replicas: 3
read:
  replicas: 3
backend:
  replicas: 3
helm upgrade --install loki grafana/loki \
--namespace monitoring \
--values loki-values.yaml
Configuring Promtail Log Shipping
Promtail is a log shipping agent deployed as a DaemonSet — one pod per node — that reads log files from the node's /var/log/pods/ directory and pushes them to Loki. It automatically discovers pods via the Kubernetes API and attaches metadata labels.
# promtail-config.yaml (ConfigMap)
apiVersion: v1
kind: ConfigMap
metadata:
  name: promtail-config
  namespace: monitoring
data:
  promtail.yaml: |
    server:
      http_listen_port: 9080
      grpc_listen_port: 0
    positions:
      filename: /tmp/positions.yaml
    clients:
      - url: http://loki-stack:3100/loki/api/v1/push
    scrape_configs:
      - job_name: kubernetes-pods
        kubernetes_sd_configs:
          - role: pod
        pipeline_stages:
          - cri: {}
          - multiline:
              firstline: '^\d{4}-\d{2}-\d{2}'
              max_wait_time: 3s
          - labeldrop:
              - filename
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_node_name]
            target_label: node
          - source_labels: [__meta_kubernetes_namespace]
            target_label: namespace
          - source_labels: [__meta_kubernetes_pod_name]
            target_label: pod
          - source_labels: [__meta_kubernetes_pod_container_name]
            target_label: container
          - source_labels: [__meta_kubernetes_pod_label_app]
            target_label: app
          - replacement: /var/log/pods/*$1/*.log
            separator: /
            source_labels:
              - __meta_kubernetes_pod_uid
              - __meta_kubernetes_pod_container_name
            target_label: __path__
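For orientation, the push endpoint named in the clients section accepts a body in which each stream carries a label set plus a list of [timestamp, line] pairs, with the timestamp given as a string of nanoseconds since the Unix epoch. The following is a minimal sketch of the JSON form of that body. Promtail itself sends snappy-compressed protobuf, and the helper name `build_push_payload`, the sample labels, and the sample log lines are all hypothetical.

```python
import json
import time
from typing import Dict, List, Optional


def build_push_payload(
    labels: Dict[str, str],
    lines: List[str],
    start_ns: Optional[int] = None,
) -> dict:
    """Build a JSON body for POST /loki/api/v1/push with a single stream."""
    base = time.time_ns() if start_ns is None else start_ns
    # Give each line its own nanosecond timestamp so ordering is preserved.
    values = [[str(base + i), line] for i, line in enumerate(lines)]
    return {"streams": [{"stream": dict(labels), "values": values}]}


payload = build_push_payload(
    {"namespace": "production", "app": "api-server"},
    ["GET /healthz 200", "GET /orders 500"],
    start_ns=1_700_000_000_000_000_000,
)
print(json.dumps(payload, indent=2))
```

All lines that share the same label set land in the same stream, which is why the relabeling rules above effectively decide how many streams a node produces.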
The pipeline_stages section is powerful — it lets you parse, transform, and filter log lines before they reach Loki. Common stages include:
- cri: parse the CRI log format used by containerd and CRI-O (timestamp, stream, flags, message)
- json: parse structured JSON log lines and copy selected fields into the pipeline's extracted data, from which a labels stage can promote them to labels
- regex: extract values from unstructured log lines using named capture groups
- multiline: collapse stack traces into a single log entry
- drop: discard noisy log lines before they consume storage
pipeline_stages:
  - cri: {}
  # Parse JSON application logs
  - json:
      expressions:
        level: level
        request_id: requestId
        duration_ms: durationMs
  # Promote log level to a Loki label
  - labels:
      level:
  # Drop debug logs in production
  - drop:
      source: level
      expression: "debug"
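To make the flow through these stages concrete, here is a small simulation of the json, labels, and drop stages in plain Python. It is not Promtail code: `run_pipeline` is a hypothetical helper, the cri stage is omitted, non-JSON lines are simply passed through with no extracted fields, and the drop expression is treated as an unanchored regex search, which is an assumption about matching semantics.

```python
import json
import re
from typing import Dict, Optional, Tuple

# Mirrors `expression: "debug"` from the drop stage (assumed unanchored).
DROP_EXPRESSION = re.compile("debug")


def run_pipeline(line: str) -> Optional[Tuple[Dict[str, str], str]]:
    """Return (labels, line) for a kept entry, or None if it is dropped."""
    # json stage: copy selected fields into the extracted map.
    try:
        doc = json.loads(line)
    except json.JSONDecodeError:
        doc = {}
    if not isinstance(doc, dict):
        doc = {}
    extracted = {
        "level": doc.get("level"),
        "request_id": doc.get("requestId"),
        "duration_ms": doc.get("durationMs"),
    }

    # labels stage: only `level` is promoted to a stream label.
    labels: Dict[str, str] = {}
    level = extracted["level"]
    if level is not None:
        labels["level"] = str(level)

    # drop stage: discard entries whose `level` matches the expression.
    if level is not None and DROP_EXPRESSION.search(str(level)):
        return None
    return labels, line


print(run_pipeline('{"level": "error", "requestId": "abc", "durationMs": 742}'))
print(run_pipeline('{"level": "debug", "requestId": "def", "durationMs": 3}'))
```

Note that `request_id` and `duration_ms` stay in the extracted map and never become labels, so they add no streams.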
Writing LogQL Queries
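LogQL is Loki's query language, modelled after PromQL. Every query starts with a log stream selector (label matchers in curly braces), optionally followed by a pipeline of line filters and parsers. Wrapping such a log query in a range aggregation like rate() or count_over_time() turns it into a metric query.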
# Show all logs from the production namespace
{namespace="production"}
# Filter to error logs only
{namespace="production", app="api-server"} |= "ERROR"
# Regex filter: match lines containing an HTTP 5xx status
# (backticks avoid double escaping inside the regex)
{namespace="production"} |~ `HTTP/[12]\.[0-9]" 5[0-9]{2}`
# Parse JSON and filter by field
{app="payment-service"} | json | level="error" | duration_ms > 500
# Per-second error rate per app, computed over 5-minute windows (metric query)
sum by (app) (
  rate({namespace="production"} |= "ERROR" [5m])
)
# 99th percentile latency from structured logs
quantile_over_time(0.99,
  {app="api-server"} | json | unwrap duration_ms [5m]
) by (app)
LogQL metric queries turn log streams into time-series data that can be graphed in Grafana or used in alerting rules — bridging the gap between logs and metrics.
Keep stream selectors as specific as possible: {job="varlogs"} will scan far more data than {namespace="production", app="checkout"}.
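As a mental model for the rate() query shown earlier, a log-range rate is the number of matching entries in the window divided by the window length in seconds. The sketch below reproduces that arithmetic over an in-memory list. The function name `error_rate`, the sample entries, and the exact window boundaries are assumptions for illustration, not Loki's implementation.

```python
from typing import Iterable, Tuple


def error_rate(
    entries: Iterable[Tuple[float, str]],
    needle: str,
    now: float,
    window_s: float,
) -> float:
    """Per-second rate of lines containing `needle` in (now - window_s, now]."""
    matches = sum(
        1
        for ts, line in entries
        if now - window_s < ts <= now and needle in line
    )
    return matches / window_s


# (timestamp in seconds, log line); hypothetical sample data.
entries = [
    (100.0, "ERROR db timeout"),
    (160.0, "INFO request served"),
    (220.0, "ERROR upstream 502"),
    (290.0, "ERROR db timeout"),
]

print(error_rate(entries, "ERROR", now=300.0, window_s=300.0))  # 0.01
```

Three matching lines in a 300-second window give 0.01 per second; multiply by 60 to express the same value per minute.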
Grafana Dashboards for Kubernetes Logs
Loki integrates directly with Grafana as a datasource. The Explore view lets you run ad-hoc LogQL queries; the Logs panel embeds live log streams into dashboards alongside metric panels.
# Get the Grafana admin password from the secret
kubectl get secret --namespace monitoring loki-stack-grafana \
-o jsonpath="{.data.admin-password}" | base64 --decode; echo
# Port-forward to access the Grafana UI
kubectl port-forward --namespace monitoring svc/loki-stack-grafana 3000:80
Useful Grafana dashboard panels for Kubernetes log monitoring:
- Log volume by namespace: sum by (namespace) (rate({job=~".+"} [1m]))
- Error rate by app: sum by (app) (rate({namespace="production"} |= "ERROR" [5m]))
- Live log stream: Logs panel with {namespace="$namespace", app="$app"} using template variables
- Top error messages: topk(10, sum by (message) (count_over_time({namespace="production"} |= "ERROR" | json [1h])))
Import Grafana dashboard ID 15141 (Loki Kubernetes Logs) for a pre-built Kubernetes log overview. The community also publishes dashboards for specific stacks like Spring Boot (14852) and NGINX ingress logs (12559).
Multi-Tenancy and Label Strategy
Loki supports multi-tenancy through an X-Scope-OrgID HTTP header. When auth_enabled: true, each request must include a tenant ID and data is fully isolated per tenant. This is useful for shared Loki installations serving multiple teams or environments.
# Promtail config for multi-tenant setup
clients:
  - url: http://loki:3100/loki/api/v1/push
    tenant_id: team-platform
# Per-namespace tenant routing using relabeling.
# Verify this against your Promtail version: the documented mechanism for
# choosing a tenant per log entry is the tenant pipeline stage.
scrape_configs:
  - job_name: kubernetes-pods
    ...
    relabel_configs:
      - source_labels: [__meta_kubernetes_namespace]
        target_label: __tenant_id__
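On the wire, the tenant is nothing more than the X-Scope-OrgID header on each request. The sketch below shows how a client might pick that header per namespace. The mapping `NAMESPACE_TENANTS` and the helper `push_target` are hypothetical, the fallback tenant simply reuses `team-platform` from the config above, and no request is actually sent.

```python
from typing import Dict, Tuple

LOKI_PUSH_URL = "http://loki:3100/loki/api/v1/push"

# Hypothetical mapping from Kubernetes namespace to Loki tenant ID.
NAMESPACE_TENANTS = {
    "production": "tenant-production",
    "dev": "tenant-dev",
}


def push_target(
    namespace: str, default_tenant: str = "team-platform"
) -> Tuple[str, Dict[str, str]]:
    """Return the push URL and headers for logs from `namespace`."""
    tenant = NAMESPACE_TENANTS.get(namespace, default_tenant)
    headers = {
        "Content-Type": "application/json",
        "X-Scope-OrgID": tenant,  # tenant ID expected when auth_enabled: true
    }
    return LOKI_PUSH_URL, headers


url, headers = push_target("production")
print(url, headers["X-Scope-OrgID"])
```

Queries need the same header, so a Grafana datasource for a tenant has to send the matching X-Scope-OrgID value.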
Label strategy best practices for Kubernetes Loki deployments:
- Use static labels: namespace, app, environment (staging/production), cluster
- Avoid high-cardinality labels: pod name, node name, request ID; these create too many streams
- Promote important structured fields to labels with the labels: pipeline stage, but keep the count under 10 per stream
- Use parsed fields (json/regex stages) for filtering rather than labels for high-cardinality data
Excessive stream counts put pressure on ingesters and queriers and can surface as too many outstanding requests errors. Monitor stream count with the loki_ingester_streams_created_total and loki_ingester_memory_streams metrics.
Production Configuration and Retention
Several configuration areas need attention before running Loki in production on Kubernetes:
# Production Loki limits_config
limits_config:
  # Global retention (overridden per tenant)
  retention_period: 30d
  # Rate limit ingestion to prevent runaway log producers
  ingestion_rate_mb: 16
  ingestion_burst_size_mb: 32
  # Limit query time range to prevent expensive scans
  max_query_length: 721h  # 30 days plus 1h
  max_query_range: 168h   # 7 days; caps the [range] used inside a query
  max_entries_limit_per_query: 5000
  # Parallel query execution
  max_query_parallelism: 32
# Compactor handles retention enforcement
compactor:
  working_directory: /loki/compactor
  shared_store: s3  # Loki 3.x replaces this key with delete_request_store
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h
  retention_delete_worker_count: 150
# Per-tenant retention via overrides (loaded from the runtime config file)
overrides:
  tenant-dev:
    retention_period: 7d
    ingestion_rate_mb: 4
  tenant-production:
    retention_period: 90d
    ingestion_rate_mb: 32
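The relationship between the global limits and the per-tenant overrides is a simple fallback: a tenant's own value wins, otherwise the global value applies. A minimal sketch of that lookup, using the values from the configuration above; the helper `effective_limit` and the tenant `tenant-staging` are hypothetical.

```python
from typing import Dict, Union

Limit = Union[str, int]

GLOBAL_LIMITS: Dict[str, Limit] = {
    "retention_period": "30d",
    "ingestion_rate_mb": 16,
}

OVERRIDES: Dict[str, Dict[str, Limit]] = {
    "tenant-dev": {"retention_period": "7d", "ingestion_rate_mb": 4},
    "tenant-production": {"retention_period": "90d", "ingestion_rate_mb": 32},
}


def effective_limit(tenant: str, key: str) -> Limit:
    """Per-tenant override if present, otherwise the global limit."""
    return OVERRIDES.get(tenant, {}).get(key, GLOBAL_LIMITS[key])


print(effective_limit("tenant-dev", "retention_period"))      # 7d
print(effective_limit("tenant-staging", "retention_period"))  # 30d
```

A tenant that is not listed in the overrides silently inherits the global 30-day retention, which is worth keeping in mind when onboarding new teams.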
Resource sizing guidelines for production Loki on Kubernetes:
- Distributor: 0.5 CPU / 512Mi per 10 MB/s ingestion throughput
- Ingester: 2 CPU / 4Gi per instance; use 3+ replicas with WAL (write-ahead log) enabled
- Querier: 1 CPU / 2Gi per instance; scale based on query concurrency
- PodDisruptionBudget: set minAvailable: 2 for ingesters to prevent data loss during node drains
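The distributor guideline above scales linearly with throughput, so it can be turned into a quick estimate. The sketch below assumes the allocation is rounded up to whole 10 MB/s units with a minimum of one unit. That rounding rule and the helper name `distributor_resources` are assumptions, not part of the guideline.

```python
import math
from typing import Dict


def distributor_resources(ingest_mb_per_s: float) -> Dict[str, float]:
    """Estimate total distributor CPU and memory for a given ingest rate.

    Applies 0.5 CPU / 512Mi per 10 MB/s, rounded up to whole units.
    """
    units = max(1, math.ceil(ingest_mb_per_s / 10))
    return {"cpu": 0.5 * units, "memory_mi": 512 * units}


print(distributor_resources(25))  # {'cpu': 1.5, 'memory_mi': 1536}
```

Treat the result as a starting point for resource requests and adjust it against observed CPU and memory usage.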
Frequently Asked Questions
What is the difference between Loki and Elasticsearch for Kubernetes logging?
Elasticsearch indexes the full content of every log line, enabling fast full-text search but requiring significant CPU and RAM (typically 8–32GB per node). Loki only indexes metadata labels, making it 10–100x cheaper to store and operate, but queries must scan compressed log chunks rather than using an inverted index. Loki wins on cost and simplicity; Elasticsearch wins on ad-hoc full-text search performance across billions of lines.
Can I use Fluent Bit instead of Promtail?
Yes. Grafana publishes an official Fluent Bit output plugin for Loki. Fluent Bit is preferred when you need advanced filtering, parsing of many log formats, or already run Fluent Bit for Elasticsearch. Promtail is simpler and integrates more tightly with Loki's label model and service discovery.
How do I alert on log patterns using Loki?
Enable the Loki Ruler component and define alerting rules using LogQL metric queries. The rules are evaluated on the Loki server and can fire alerts to Alertmanager exactly like Prometheus rules. Example: alert when error rate exceeds 10 per minute for any production app.
What object storage does Loki support?
Loki supports AWS S3 (and S3-compatible stores like MinIO), Google Cloud Storage, Azure Blob Storage, and filesystem storage. For production, S3 or GCS is recommended. MinIO is popular for on-premises deployments.