Kubernetes Monitoring with Prometheus and Grafana (2026)

Prometheus and Grafana have become the de facto observability stack for Kubernetes. The kube-prometheus-stack Helm chart bundles everything you need: Prometheus Operator, Grafana, AlertManager, Node Exporter, kube-state-metrics, and a full set of pre-built dashboards and alerts. This guide walks through installation, real PromQL queries, alert routing, log aggregation with Loki, and custom application metrics.

kube-prometheus-stack Installation

The kube-prometheus-stack chart (formerly published as the prometheus-operator chart) is the fastest path to a production-grade Kubernetes monitoring setup. It installs Prometheus, Grafana, AlertManager, and all supporting exporters in a single Helm release.

# Add the Prometheus Community chart repo
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Install into a dedicated namespace
helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --set grafana.adminPassword='YourSecurePassword' \
  --set prometheus.prometheusSpec.retention=30d \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.storageClassName=gp3 \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi \
  --set alertmanager.alertmanagerSpec.storage.volumeClaimTemplate.spec.storageClassName=gp3 \
  --set alertmanager.alertmanagerSpec.storage.volumeClaimTemplate.spec.resources.requests.storage=10Gi

# Verify all components are running
kubectl get pods -n monitoring

# Access Grafana (port-forward for testing)
kubectl port-forward -n monitoring svc/kube-prometheus-stack-grafana 3000:80
# Open http://localhost:3000 — admin / YourSecurePassword

# Access Prometheus UI
kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9090:9090

# Access AlertManager UI
kubectl port-forward -n monitoring svc/kube-prometheus-stack-alertmanager 9093:9093

Pro Tip: Use a values.yaml file instead of --set flags for production installs. Store this file in Git alongside your other infrastructure code. Run helm show values prometheus-community/kube-prometheus-stack > values.yaml to get the full defaults as a starting point.
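When moving from --set flags to a values file, it helps to see how a dotted key maps onto nested structure. The following is a minimal Python sketch of that mapping, not Helm's own parser: the function name set_flags_to_values is made up for illustration, and it ignores Helm's list indexing, escaping, and type coercion.

```python
import json


def set_flags_to_values(flags):
    """Turn Helm-style 'a.b.c=value' strings into a nested dict (simplified)."""
    values = {}
    for flag in flags:
        path, _, value = flag.partition("=")
        keys = path.split(".")
        node = values
        # Walk (and create) every intermediate level of the dotted path.
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = value
    return values


flags = [
    "prometheus.prometheusSpec.retention=30d",
    "prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.storageClassName=gp3",
    "prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi",
]
values = set_flags_to_values(flags)
# JSON is also valid YAML, so this output shows the shape a values file would take.
print(json.dumps(values, indent=2))
```

The printed structure shows why a values file is easier to review in Git: related settings such as retention and storage end up side by side under prometheus.prometheusSpec instead of being spread across long flag strings.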

Essential PromQL Queries

PromQL (Prometheus Query Language) is a functional query language for time series data. Here are the most useful queries for Kubernetes cluster monitoring.

CPU Queries

# CPU usage per node (percentage); node-exporter series identify the node via the instance label
100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# CPU usage per namespace
sum by(namespace) (
  rate(container_cpu_usage_seconds_total{container!=""}[5m])
)

# Top 5 CPU-consuming pods
topk(5,
  sum by(pod, namespace) (
    rate(container_cpu_usage_seconds_total{container!=""}[5m])
  )
)

# CPU throttling ratio per container (high = limits too low)
# Throttled CFS periods divided by total CFS periods
sum by(pod, container) (
  rate(container_cpu_cfs_throttled_periods_total[5m])
) /
sum by(pod, container) (
  rate(container_cpu_cfs_periods_total[5m])
)
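Most of the queries above wrap a counter in rate(). As a rough illustration of what that function computes, here is a simplified Python sketch: it sums the increases between consecutive samples, treats a drop as a counter reset, and divides by the elapsed time. The name simple_rate and the sample data are invented for the example, and the real rate() additionally extrapolates to the edges of the range window, so its results differ slightly.

```python
def simple_rate(samples):
    """Per-second increase of a counter across the given samples.

    samples: list of (timestamp_seconds, counter_value), oldest first.
    A decrease between two samples is treated as a counter reset.
    """
    if len(samples) < 2:
        return None
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        # After a reset the counter restarts from zero, so the whole new value counts.
        increase += curr - prev if curr >= prev else curr
    return increase / elapsed


# Counter scraped every 30s; the container restarts between t=30 and t=60.
cpu_seconds = [(0, 100.0), (30, 130.0), (60, 10.0), (90, 40.0)]
print(simple_rate(cpu_seconds))
```

The reset handling is why counters such as container_cpu_usage_seconds_total should be queried through rate() rather than by subtracting raw values.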

Memory Queries

# Working set memory per pod (what the kernel won't reclaim)
sum by(pod, namespace) (
  container_memory_working_set_bytes{container!=""}
)

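# Memory usage as percentage of limit
# Group by namespace as well, so pods with the same name in different namespaces are not merged
sum by(namespace, pod) (container_memory_working_set_bytes{container!=""})
  /
sum by(namespace, pod) (kube_pod_container_resource_limits{resource="memory"})
  * 100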

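# Containers whose last termination was OOMKilled and that restarted in the last hour
(kube_pod_container_status_last_terminated_reason{reason="OOMKilled"} == 1)
  and on(namespace, pod, container)
(increase(kube_pod_container_status_restarts_total[1h]) > 0)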

HTTP Request Rate and Latency

# Request rate per service (requires app instrumentation)
sum by(service) (
  rate(http_requests_total[5m])
)

# 95th percentile latency using histogram_quantile
histogram_quantile(0.95,
  sum by(le, service) (
    rate(http_request_duration_seconds_bucket[5m])
  )
)

# Error rate per service
sum by(service) (rate(http_requests_total{status=~"5.."}[5m]))
  /
sum by(service) (rate(http_requests_total[5m]))
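The 95th percentile returned by histogram_quantile is an estimate, not an exact value: it is interpolated linearly inside the bucket that contains the target rank. A simplified Python sketch of that idea follows. The function name estimate_quantile and the bucket counts are made up for illustration, and the sketch skips edge cases that the real function handles.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate a quantile from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count) sorted by upper_bound,
             ending with the math.inf bucket.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The rank falls beyond the last finite bucket: report its bound.
                return prev_bound
            # Linear interpolation between the bucket's lower and upper bound.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count


# Hypothetical request durations: 100 requests across five buckets (seconds).
buckets = [(0.1, 50), (0.25, 80), (0.5, 95), (1.0, 100), (math.inf, 100)]
print(estimate_quantile(0.95, buckets))
print(estimate_quantile(0.99, buckets))
```

Because the result can never be more precise than the bucket boundaries, choose bucket bounds that bracket your latency objective closely.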

Cluster-Level Health

# Nodes not ready
kube_node_status_condition{condition="Ready",status="true"} == 0

# Pods in non-running state
kube_pod_status_phase{phase!~"Running|Succeeded"} == 1

# PersistentVolumes with low disk space
(
  kubelet_volume_stats_available_bytes
  / kubelet_volume_stats_capacity_bytes
) * 100 < 15

Grafana Dashboards

kube-prometheus-stack ships with pre-built dashboards. Here are the most important ones and what to look for in each.

Node Exporter / Nodes Dashboard (ID: 1860)

Shows per-node CPU, memory, disk I/O, and network. Key panels to watch: CPU steal (indicates noisy neighbor on shared VMs), iowait spikes (storage bottleneck), and memory available dropping toward zero (risk of OOM evictions). Set alerts when available memory falls below 10%.

Kubernetes / Compute Resources / Cluster (ID: 15757)

The most useful cluster-wide dashboard. Shows CPU and memory requests vs limits vs actual usage across all namespaces. Use the "CPU Throttling" panel to identify containers with limits set too low — throttling above 25% degrades latency significantly.

Kubernetes / Pods Dashboard

Drill down to individual pod CPU/memory over time. Essential for debugging OOMKill events (look for memory climbing steadily to limit then resetting) and CPU starvation (flat CPU despite queued requests, high throttling).

Adding a Custom Dashboard

# Provision a dashboard via ConfigMap (Grafana sidecar auto-loads it)
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-app-dashboard
  namespace: monitoring
  labels:
    grafana_dashboard: "1"   # Triggers the sidecar to load this
data:
  my-app-dashboard.json: |
    {
      "title": "My App Metrics",
      "panels": [
        {
          "title": "Request Rate",
          "type": "timeseries",
          "targets": [{
            "expr": "sum(rate(http_requests_total{job=\"my-app\"}[5m]))",
            "legendFormat": "req/s"
          }]
        }
      ]
    }

AlertManager Configuration

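AlertManager handles deduplication, grouping, silencing, and routing of alerts from Prometheus. kube-prometheus-stack keeps the base configuration in a Secret that it manages; the Prometheus Operator can also merge in namespaced routing rules defined with the AlertmanagerConfig CRD, which is what the example below uses.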

apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: production-routing
  namespace: production
spec:
  route:
    groupBy: ['alertname', 'namespace']
    groupWait: 30s         # Wait before sending first alert in a group
    groupInterval: 5m      # Wait before sending updates to a group
    repeatInterval: 12h    # Resend if still firing after this long
    receiver: 'slack-critical'
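    routes:
    # Drop notifications for a noisy namespace during maintenance.
    # Listed first because the first matching route wins. Note that, by default,
    # the operator scopes an AlertmanagerConfig to alerts from its own namespace
    # (production here), so this route only matches if that scoping is disabled
    # on the Alertmanager resource.
    - matchers:
      - name: namespace
        value: staging
      receiver: 'null'
    # High priority: page on-call
    - matchers:
      - name: severity
        value: critical
      receiver: pagerduty-oncall
      groupWait: 0s
    # Warning: Slack only
    - matchers:
      - name: severity
        value: warning
      receiver: slack-warnings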
  receivers:
  - name: 'null'
  - name: slack-critical
    slackConfigs:
    - apiURL:
        name: alertmanager-slack-secret
        key: webhookUrl
      channel: '#alerts-critical'
      title: '{{ template "slack.default.title" . }}'
      text: >-
        {{ range .Alerts }}
        *Alert:* {{ .Annotations.summary }}
        *Namespace:* {{ .Labels.namespace }}
        *Pod:* {{ .Labels.pod }}
        {{ end }}
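      sendResolved: true
  # Receiver referenced by the warning route; the channel name is an example
  - name: slack-warnings
    slackConfigs:
    - apiURL:
        name: alertmanager-slack-secret
        key: webhookUrl
      channel: '#alerts-warning'
      sendResolved: true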
  - name: pagerduty-oncall
    pagerdutyConfigs:
    - routingKey:
        name: alertmanager-pagerduty-secret
        key: routingKey
      description: '{{ .CommonAnnotations.summary }}'
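Route order matters because an alert is handed to the first child route whose matchers all match. The Python sketch below models only that behavior, with equality matchers and no continue flag or operator namespace scoping; the function name pick_receiver and both route trees are simplified stand-ins for illustration, not the AlertManager implementation.

```python
def pick_receiver(route, labels):
    """Return the receiver chosen for an alert with the given labels."""
    for child in route.get("routes", []):
        if all(labels.get(name) == value for name, value in child["matchers"].items()):
            # First matching child wins; descend into it.
            return pick_receiver(child, labels)
    # No child matched: fall back to this route's own receiver.
    return route["receiver"]


critical = {"matchers": {"severity": "critical"}, "receiver": "pagerduty-oncall"}
warning = {"matchers": {"severity": "warning"}, "receiver": "slack-warnings"}
staging = {"matchers": {"namespace": "staging"}, "receiver": "null"}

staging_last = {"receiver": "slack-critical", "routes": [critical, warning, staging]}
staging_first = {"receiver": "slack-critical", "routes": [staging, critical, warning]}

alert = {"alertname": "HighErrorRate", "severity": "critical", "namespace": "staging"}
print(pick_receiver(staging_last, alert))   # the severity route matches first
print(pick_receiver(staging_first, alert))  # the staging route matches first
```

With the staging route listed last, a critical staging alert still pages on-call; listing it first sends the alert to the null receiver instead.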

ServiceMonitor and PodMonitor CRDs

ServiceMonitor and PodMonitor tell the Prometheus Operator which services/pods to scrape. This is the preferred way to add new scrape targets — no need to edit Prometheus configuration files.

# ServiceMonitor — scrape a Service's metrics endpoint
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app-monitor
  namespace: production
  labels:
    release: kube-prometheus-stack   # Must match Prometheus's serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: my-app                    # Matches the Service labels
  endpoints:
  - port: metrics                    # Port name on the Service
    path: /actuator/prometheus       # Spring Boot Actuator path
    interval: 30s
    scheme: http                     # tlsConfig only applies when scheme is https
  namespaceSelector:
    matchNames:
    - production
---
# PodMonitor: scrape pods directly (when no Service exists)
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: my-worker-monitor
  namespace: production
  labels:
    release: kube-prometheus-stack
spec:
  selector:
    matchLabels:
      app: my-worker
  podMetricsEndpoints:
  - port: metrics
    path: /metrics
    interval: 15s

# Verify ServiceMonitor is being picked up by Prometheus
kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9090:9090
# In Prometheus UI: Status > Targets, then look for your service

Note: The ServiceMonitor's release: kube-prometheus-stack label must match the serviceMonitorSelector configured on your Prometheus resource. Check with: kubectl get prometheus -n monitoring -o yaml | grep serviceMonitorSelector -A5
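The label check described in the note can be pictured as a simple subset test. This Python sketch covers only matchLabels-style selectors (not matchExpressions); the function name matches and the label sets are illustrative assumptions.

```python
def matches(selector, labels):
    """True when every key/value pair in the selector is present on the object."""
    return all(labels.get(key) == value for key, value in selector.items())


# Selector as it might appear on the Prometheus resource.
service_monitor_selector = {"release": "kube-prometheus-stack"}

labelled = {"release": "kube-prometheus-stack", "team": "payments"}
unlabelled = {"team": "payments"}

print(matches(service_monitor_selector, labelled))    # selected
print(matches(service_monitor_selector, unlabelled))  # ignored
print(matches({}, unlabelled))                        # an empty selector matches everything
```

Extra labels on the ServiceMonitor are harmless; a missing or misspelled selector label is what makes the operator skip it.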

PrometheusRule for Custom Alerts

Define alerting and recording rules as PrometheusRule CRDs. The Prometheus Operator automatically loads them into Prometheus.

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-app-alerts
  namespace: production
  labels:
    release: kube-prometheus-stack
    app: kube-prometheus-stack
spec:
  groups:
  - name: my-app.rules
    interval: 30s
    rules:
    # Recording rule: pre-compute expensive query
    - record: job:http_requests:rate5m
      expr: sum by(job) (rate(http_requests_total[5m]))

    # Alert: high error rate
    - alert: HighErrorRate
      expr: |
        sum by(service) (rate(http_requests_total{status=~"5.."}[5m]))
          /
        sum by(service) (rate(http_requests_total[5m]))
          > 0.05
      for: 5m
      labels:
        severity: critical
        namespace: production
      annotations:
        summary: "High error rate on {{ $labels.service }}"
        description: "Error rate is {{ $value | humanizePercentage }} on {{ $labels.service }}"
        runbook_url: "https://wiki.example.com/runbooks/high-error-rate"

    # Alert: pod restart loop (more than 3 restarts within 15 minutes)
    - alert: PodCrashLooping
      expr: |
        increase(kube_pod_container_status_restarts_total[15m]) > 3
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Pod {{ $labels.pod }} is crash looping"
        description: "Container {{ $labels.container }} in pod {{ $labels.pod }} restarted {{ $value | humanize }} times in the last 15 minutes"

    # Alert: deployment not progressing
    - alert: DeploymentReplicasMismatch
      expr: |
        kube_deployment_spec_replicas != kube_deployment_status_replicas_available
      for: 15m
      labels:
        severity: warning
      annotations:
        summary: "Deployment {{ $labels.deployment }} has unavailable replicas"
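Each alert above uses a for: clause, which keeps the alert in a pending state until its expression has been true continuously for that long. A small Python sketch of that state machine, under the assumption of evenly spaced evaluations, is shown here; alert_states and its inputs are hypothetical names for illustration.

```python
def alert_states(condition_by_eval, eval_interval_s, for_s):
    """Return 'inactive', 'pending', or 'firing' for each rule evaluation."""
    states = []
    active_since = None
    for i, is_true in enumerate(condition_by_eval):
        now = i * eval_interval_s
        if not is_true:
            # Any false evaluation resets the timer.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        states.append("firing" if now - active_since >= for_s else "pending")
    return states


# 30s evaluation interval with for: 5m, as in the HighErrorRate rule.
steady = alert_states([True] * 12, eval_interval_s=30, for_s=300)
blip = alert_states([True] * 5 + [False] + [True] * 5, eval_interval_s=30, for_s=300)
print(steady.count("pending"), steady.count("firing"))
print("firing" in blip)
```

A short dip back below the threshold restarts the wait, which is how for: filters out brief spikes at the cost of slower notification.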

Grafana Loki for Log Aggregation

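Loki is Grafana's log aggregation system designed for Kubernetes. It indexes only labels (not log content), making it much cheaper than Elasticsearch at scale. Promtail is the log shipper that runs as a DaemonSet. Grafana has since deprecated Promtail in favor of Grafana Alloy, so check the current recommendation before starting a new deployment.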

# Install Loki stack (Loki + Promtail)
helm repo add grafana https://grafana.github.io/helm-charts
helm install loki grafana/loki-stack \
  --namespace monitoring \
  --set loki.persistence.enabled=true \
  --set loki.persistence.size=50Gi \
  --set loki.persistence.storageClassName=gp3 \
  --set promtail.enabled=true \
  --set grafana.enabled=false   # Already installed via kube-prometheus-stack

# Add Loki as a data source in Grafana
# In Grafana UI: Connections > Data Sources > Add > Loki
# URL: http://loki.monitoring.svc:3100

# Or configure it via Helm values for kube-prometheus-stack:
# grafana.additionalDataSources:
# - name: Loki
#   type: loki
#   url: http://loki.monitoring.svc:3100

Query logs in Grafana with LogQL:

# All logs from a specific pod
{namespace="production", pod=~"my-app-.*"}

# Filter for ERROR logs (the time range comes from the Grafana time picker, not the query)
{namespace="production", app="my-app"} |= "ERROR"

# Parse structured JSON logs and filter on a field
{namespace="production"} | json | level="error" | duration > 1000

# Count error rate over time (for Grafana panel)
sum(rate({namespace="production"} |= "ERROR" [5m])) by (app)

# Find specific exception
{namespace="production", container="app"} |~ "NullPointerException"
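Every LogQL query above has two stages: a label selector in braces that picks streams from the index, then line filters that scan the selected log lines. The toy Python model below illustrates that split. The data, the query function, and its parameters are invented for the example, and only equality label matchers are supported.

```python
import re


def query(streams, selector, contains=None, regex=None):
    """Select streams by labels, then filter their lines (toy model of LogQL)."""
    pattern = re.compile(regex) if regex else None
    results = []
    for stream in streams:
        # Stage 1: label selection. Non-matching streams are skipped entirely.
        if any(stream["labels"].get(k) != v for k, v in selector.items()):
            continue
        # Stage 2: line filters, similar to |= "text" and |~ "regex".
        for line in stream["lines"]:
            if contains is not None and contains not in line:
                continue
            if pattern is not None and not pattern.search(line):
                continue
            results.append(line)
    return results


streams = [
    {"labels": {"namespace": "production", "app": "my-app"},
     "lines": ["INFO started", "ERROR db timeout", "ERROR NullPointerException at Foo"]},
    {"labels": {"namespace": "staging", "app": "my-app"},
     "lines": ["ERROR ignored in staging"]},
]

print(query(streams, {"namespace": "production", "app": "my-app"}, contains="ERROR"))
print(query(streams, {"namespace": "production"}, regex=r"NullPointer\w+"))
```

Narrow label selectors keep the second stage cheap, since only the selected streams have their log content read.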

Custom Application Metrics

Exposing application-specific metrics lets you alert on business logic, not just infrastructure. Here's how to do it in Java (Micrometer) and Python.

Java with Micrometer (Spring Boot)

# pom.xml dependencies
# spring-boot-starter-actuator
# micrometer-registry-prometheus

# application.properties
management.endpoints.web.exposure.include=health,prometheus,info
management.endpoint.prometheus.enabled=true
# Spring Boot 3.x property name; on 2.x use management.metrics.export.prometheus.enabled
management.prometheus.metrics.export.enabled=true
# Metrics available at /actuator/prometheus

// Custom counter in a Spring service
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import org.springframework.stereotype.Service;

@Service
public class OrderService {
    private final Counter ordersCreated;
    private final Counter ordersFailed;
    private final Timer orderProcessingTime;

    public OrderService(MeterRegistry registry) {
        this.ordersCreated = Counter.builder("orders.created.total")
            .description("Total orders successfully created")
            .tag("region", "us-east")
            .register(registry);
        this.ordersFailed = Counter.builder("orders.failed.total")
            .description("Total orders that failed")
            .register(registry);
        this.orderProcessingTime = Timer.builder("order.processing.duration")
            .description("Time to process an order")
            .publishPercentiles(0.5, 0.95, 0.99)
            .register(registry);
    }

    public Order createOrder(OrderRequest req) {
        return orderProcessingTime.record(() -> {
            try {
                Order order = processOrder(req);
                ordersCreated.increment();
                return order;
            } catch (Exception e) {
                ordersFailed.increment();
                throw e;
            }
        });
    }
}

Python with prometheus_client

pip install prometheus-client

from prometheus_client import Counter, Histogram, Gauge, start_http_server
import time

# Define metrics at module level
REQUEST_COUNT = Counter(
    'http_requests_total',
    'Total HTTP requests',
    ['method', 'endpoint', 'status']
)
REQUEST_LATENCY = Histogram(
    'http_request_duration_seconds',
    'HTTP request latency',
    ['endpoint'],
    buckets=[0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0]
)
ACTIVE_CONNECTIONS = Gauge(
    'active_connections',
    'Number of active connections'
)

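# Placeholder for the real business logic (hypothetical; replace with your own handler)
def handle(method, endpoint):
    return "ok"

# Use in your Flask/FastAPI handler
def process_request(method, endpoint):
    ACTIVE_CONNECTIONS.inc()
    start = time.time()
    try:
        result = handle(method, endpoint)
        REQUEST_COUNT.labels(method=method, endpoint=endpoint, status='200').inc()
        return result
    except Exception:
        # Count the failure, then let the caller see the original exception
        REQUEST_COUNT.labels(method=method, endpoint=endpoint, status='500').inc()
        raise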
    finally:
        REQUEST_LATENCY.labels(endpoint=endpoint).observe(time.time() - start)
        ACTIVE_CONNECTIONS.dec()

# Expose metrics on port 8080
start_http_server(8080)

Frequently Asked Questions

How much storage does Prometheus need for a 50-node cluster?

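A rough estimate: Prometheus uses about 1–2 bytes per sample on disk. With 50 nodes and typical kube-state-metrics + node-exporter scraping every 30s, expect roughly 500K–1M active time series. At that scrape interval, 15-day retention works out to roughly 22–43 GB for 500K series and 43–86 GB for 1M series; 30-day retention doubles those figures. Use the prometheus_tsdb_storage_blocks_bytes metric on your existing cluster to measure actual usage. WAL compression (--storage.tsdb.wal-compression) has been enabled by default since Prometheus 2.20 and only shrinks the write-ahead log, not the persisted blocks, so don't count on it to reduce long-term storage.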
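The arithmetic behind an estimate like this is samples per second times retention time times bytes per sample. Here is a short Python sketch using the series counts, scrape interval, and bytes-per-sample range quoted above; the function name estimate_tsdb_bytes is made up, and the result ignores WAL and index overhead.

```python
SECONDS_PER_DAY = 86_400


def estimate_tsdb_bytes(active_series, scrape_interval_s, retention_days, bytes_per_sample):
    """Rough on-disk size of Prometheus blocks for a steady series count."""
    samples_per_second = active_series / scrape_interval_s
    return samples_per_second * retention_days * SECONDS_PER_DAY * bytes_per_sample


for series in (500_000, 1_000_000):
    for days in (15, 30):
        low = estimate_tsdb_bytes(series, 30, days, 1) / 1e9
        high = estimate_tsdb_bytes(series, 30, days, 2) / 1e9
        print(f"{series:>9} series, {days}d retention: {low:.1f}-{high:.1f} GB")
```

Treat the output as a starting point for sizing the volume, then compare it against measured usage once the cluster has been running for a few days.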

Why are my ServiceMonitors not being picked up by Prometheus?

The most common cause is label mismatch. Check: (1) the ServiceMonitor has a label matching prometheus.prometheusSpec.serviceMonitorSelector (often release: kube-prometheus-stack), (2) the ServiceMonitor's namespace matches serviceMonitorNamespaceSelector (or it's set to match all namespaces), (3) the port name in the ServiceMonitor matches the port name defined in the Service spec. Check Prometheus UI under Status > Targets and Status > Configuration for confirmation.

How do I silence a noisy alert during a maintenance window?

Use the AlertManager UI or API to create a silence: in the AlertManager UI, click "New Silence", set the matcher labels (e.g., alertname=PodCrashLooping, namespace=staging), set start/end time, and add a comment. Via CLI: amtool silence add --alertmanager.url=http://localhost:9093 alertname=PodCrashLooping namespace=staging --comment="Planned maintenance" --duration=2h. Silences don't delete alerts — they suppress notifications.
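For automation, the same silence can be expressed as a JSON body for AlertManager's v2 silences endpoint (POST /api/v2/silences). The Python sketch below only builds the payload and sends nothing; the helper name build_silence and the created_by value are assumptions, so check the field names against the API of your AlertManager version before relying on it.

```python
import json
from datetime import datetime, timedelta, timezone


def build_silence(matchers, duration, comment, created_by, now=None):
    """Build a silence payload with equality matchers (nothing is sent)."""
    start = now or datetime.now(timezone.utc)
    return {
        "matchers": [
            {"name": name, "value": value, "isRegex": False, "isEqual": True}
            for name, value in matchers.items()
        ],
        "startsAt": start.isoformat(),
        "endsAt": (start + duration).isoformat(),
        "createdBy": created_by,
        "comment": comment,
    }


payload = build_silence(
    {"alertname": "PodCrashLooping", "namespace": "staging"},
    duration=timedelta(hours=2),
    comment="Planned maintenance",
    created_by="ops-bot",  # hypothetical author name
)
print(json.dumps(payload, indent=2))
```

Generating the payload from code makes it easy to open a silence at the start of a deployment pipeline and let it expire on its own.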

What's the difference between Loki and Elasticsearch for Kubernetes logs?

Loki indexes only labels (namespace, pod, container, app), not the full log content. This makes it 10–100x cheaper to operate at scale. The tradeoff: full-text search is slower since Loki must stream and grep log chunks. For Kubernetes operational logs where you filter by pod/namespace and grep for patterns, Loki is the right choice. Use Elasticsearch when you need complex full-text search, log parsing, or integration with the broader ELK ecosystem.

How do I reduce Prometheus cardinality issues?

High cardinality (too many unique label combinations) is the #1 Prometheus performance issue. Avoid using high-cardinality values as labels: user IDs, request IDs, IP addresses, and URLs with path parameters all blow up cardinality. Use metricRelabelings on the ServiceMonitor endpoint (the operator's equivalent of metric_relabel_configs) to drop or relabel high-cardinality labels before ingestion. Monitor cardinality with prometheus_tsdb_head_series; above 1M series on a single Prometheus instance, consider sharding or Thanos/Mimir.
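To see why a single label can cause trouble, note that the worst-case number of series for one metric is the product of the distinct values of each label. A minimal Python sketch with made-up label counts:

```python
from math import prod


def max_series(distinct_values_per_label):
    """Upper bound on series for one metric: product of per-label value counts."""
    return prod(distinct_values_per_label.values())


http_requests = {"method": 5, "endpoint": 20, "status": 6}
print(max_series(http_requests))

# Adding a per-user label multiplies the bound by the number of users.
with_user_id = {**http_requests, "user_id": 10_000}
print(max_series(with_user_id))
```

In this hypothetical case the bound grows from 600 to 6,000,000 series for a single metric, which is well past the 1M-series guideline on its own.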