Prometheus and Grafana: Application Monitoring Setup (2026)

Prometheus and Grafana are the de facto monitoring stack for cloud-native applications. Prometheus scrapes metrics from your services on a pull basis, stores them in a time-series database, and evaluates alerting rules. Grafana queries Prometheus and renders rich dashboards. Together they give you request rates, error rates, latency percentiles, and resource utilization — the four signals you need to operate production systems confidently.

Prometheus Architecture

Prometheus is a pull-based system: it periodically scrapes HTTP endpoints (/metrics) on your services rather than receiving pushed data. This design makes it easy to detect when a service goes down (scrape fails) and keeps your application code decoupled from the monitoring backend.
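
Because Prometheus pulls, a failed scrape is itself a signal. For every target it records a synthetic up series, 1 when the scrape succeeds and 0 when it fails, and the up == 0 alert rule later in this section relies on it. The plain-Python sketch below models only that idea. scrape_target and the injected fetch callable are hypothetical names, and the real server also handles timeouts, labels, timestamps, and storage.

```python
from typing import Callable, Dict, Tuple

# Hypothetical fetch function: returns the /metrics response body, raises on failure.
Fetch = Callable[[str], str]


def scrape_target(address: str, fetch: Fetch) -> Tuple[int, Dict[str, float]]:
    """Pull one target and return (up, samples), in the spirit of Prometheus' up series."""
    try:
        body = fetch(f"http://{address}/metrics")
    except Exception:
        # The scrape itself failed: up=0, which an alert such as `up == 0` can catch.
        return 0, {}
    samples: Dict[str, float] = {}
    for line in body.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE metadata
        # Simplified parsing: assumes "name{labels} value" lines without timestamps.
        series, value = line.rsplit(" ", 1)
        samples[series] = float(value)
    return 1, samples
```

The service never has to report that it is down; the monitoring side infers it from the failed pull.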

Core components:

  • Prometheus server: scrapes targets, stores time-series data in its embedded TSDB (persisted on disk in two-hour blocks), and evaluates recording and alerting rules.
  • Exporters: adapters that expose metrics for systems that do not natively expose a /metrics endpoint (node_exporter for OS metrics, mysqld_exporter, redis_exporter, etc.).
  • Pushgateway: for short-lived batch jobs that cannot be scraped — they push metrics before exiting.
  • AlertManager: receives alerts from Prometheus, groups them, deduplicates, and routes to receivers (Slack, PagerDuty, email).
  • Grafana: visualization layer — queries Prometheus via its HTTP API and renders dashboards.

Scrape Configuration

The main Prometheus configuration file defines scrape intervals and target discovery:

# prometheus.yml
global:
  scrape_interval: 15s       # how often to scrape targets
  evaluation_interval: 15s   # how often to evaluate recording and alerting rules
  scrape_timeout: 10s

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']

rule_files:
  - "rules/recording_rules.yml"
  - "rules/alerting_rules.yml"

scrape_configs:
  # Prometheus scrapes itself
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # Spring Boot / Java apps (Micrometer exposes /actuator/prometheus)
  - job_name: 'java-apps'
    metrics_path: '/actuator/prometheus'
    scrape_interval: 10s
    static_configs:
      - targets:
          - 'api-service:8080'
          - 'order-service:8080'
          - 'payment-service:8080'
    relabel_configs:
      - source_labels: [__address__]
        target_label: instance
      - target_label: environment
        replacement: production

  # Node exporter (OS-level metrics)
  - job_name: 'node-exporter'
    static_configs:
      - targets:
          - 'web01:9100'
          - 'web02:9100'
          - 'db01:9100'

  # Kubernetes pods found through kubernetes_sd_configs (role: pod)
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Only scrape pods with annotation prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Use the prometheus.io/path annotation as the metrics path, if present
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      # Rewrite the scrape address to <pod IP>:<prometheus.io/port annotation>
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_port,
                         __meta_kubernetes_pod_ip]
        action: replace
        regex: (\d+);(.*)
        replacement: $2:$1
        target_label: __address__
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
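
The address-rewriting rule above is the least obvious one. Prometheus joins the source_labels values with a semicolon (the default separator), matches the result against regex anchored at both ends, and writes the expanded replacement into target_label; if the regex does not match, the target is left unchanged. The simplified Python model of the replace and keep actions below is illustrative only: relabel is a hypothetical helper, and Prometheus itself uses Go's RE2 syntax and supports more actions.

```python
import re
from typing import Dict, List, Optional


def relabel(labels: Dict[str, str], source_labels: List[str], action: str = "replace",
            regex: str = "(.*)", target_label: Optional[str] = None,
            replacement: str = "$1", separator: str = ";") -> Optional[Dict[str, str]]:
    """Apply one relabel rule; returns None when a keep rule drops the target."""
    # Missing labels count as empty strings.
    value = separator.join(labels.get(name, "") for name in source_labels)
    match = re.fullmatch(regex, value)  # regex must match the whole joined value
    if action == "keep":
        return dict(labels) if match else None
    if action == "replace":
        if not match or target_label is None:
            return dict(labels)  # no match: target left unchanged
        # Translate Prometheus-style $1/$2 references into Python's \g<1>/\g<2>.
        template = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
        out = dict(labels)
        out[target_label] = match.expand(template)
        return out
    raise ValueError(f"action {action!r} is not modeled in this sketch")


pod = {
    "__address__": "10.1.2.3:80",
    "__meta_kubernetes_pod_ip": "10.1.2.3",
    "__meta_kubernetes_pod_annotation_prometheus_io_port": "8080",
}
src = ["__meta_kubernetes_pod_annotation_prometheus_io_port", "__meta_kubernetes_pod_ip"]
print(relabel(pod, src, regex=r"(\d+);(.*)", replacement="$2:$1",
              target_label="__address__")["__address__"])  # 10.1.2.3:8080
```

A pod without the port annotation produces the joined value ";10.1.2.3", which (\d+);(.*) cannot match, so its default address is kept.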

Essential PromQL Queries

PromQL (Prometheus Query Language) is the query language for selecting and aggregating time-series data. These are the queries you will use most in production:

# --- Request rate (per-second over 5 min window) ---
rate(http_server_requests_seconds_count[5m])

# Rate per service and HTTP status code
sum by (job, status) (rate(http_server_requests_seconds_count[5m]))

# --- Error rate (ratio of 5xx to total) ---
sum(rate(http_server_requests_seconds_count{status=~"5.."}[5m]))
/
sum(rate(http_server_requests_seconds_count[5m]))

# --- p99 latency using histogram_quantile ---
histogram_quantile(
  0.99,
  sum by (le, job) (rate(http_server_requests_seconds_bucket[5m]))
)

# p50, p95, p99: histogram_quantile() output carries no quantile label, so add these
# as three queries in one Grafana panel with legends p50, p95, and p99
histogram_quantile(0.50, sum by (le) (rate(http_server_requests_seconds_bucket[5m])))
histogram_quantile(0.95, sum by (le) (rate(http_server_requests_seconds_bucket[5m])))
histogram_quantile(0.99, sum by (le) (rate(http_server_requests_seconds_bucket[5m])))

# --- JVM memory ---
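# Heap used per instance (Micrometer reports one series per memory pool, so sum them)
sum by (instance) (jvm_memory_used_bytes{area="heap"})
# Heap used as percentage of max
sum by (instance) (jvm_memory_used_bytes{area="heap"})
/ sum by (instance) (jvm_memory_max_bytes{area="heap"}) * 100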

# --- CPU utilization (node exporter) ---
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# --- Top 5 slowest endpoints ---
topk(5,
  histogram_quantile(0.99,
    sum by (le, uri) (rate(http_server_requests_seconds_bucket[5m]))
  )
)

# --- increase(): total increase over a period, not a per-second rate (extrapolated, so it may be fractional) ---
increase(http_server_requests_seconds_count{status="500"}[1h])

# --- Disk usage percentage ---
(node_filesystem_size_bytes - node_filesystem_free_bytes)
/ node_filesystem_size_bytes * 100
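
To make the arithmetic behind these queries concrete, here is a stripped-down Python model of three of them. It is a sketch under simplifying assumptions, with hypothetical function names: rate_per_second handles counter resets but skips the extrapolation to the window edges that rate() performs, error_ratio shows why the query sums before dividing, and quantile_from_buckets follows the linear interpolation histogram_quantile applies inside a bucket, but only for 0 < q <= 1.

```python
import math
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (unix_seconds, counter_value), oldest first


def rate_per_second(samples: List[Sample]) -> Optional[float]:
    """Per-second increase of a counter, treating any drop as a counter reset."""
    if len(samples) < 2:
        return None  # at least two samples are needed in the window
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # After a restart the counter starts again from zero, so count `cur` itself.
        increase += cur - prev if cur >= prev else cur
    return increase / (samples[-1][0] - samples[0][0])


def error_ratio(rates: List[Tuple[float, float]]) -> float:
    """rates: (errors_per_sec, requests_per_sec) per series. Sum first, then divide."""
    total = sum(req for _, req in rates)
    return sum(err for err, _ in rates) / total if total else math.nan


def quantile_from_buckets(q: float, buckets: List[Tuple[float, float]]) -> float:
    """buckets: (le, cumulative_count) sorted by le, ending with (math.inf, total)."""
    if not 0 < q <= 1:
        raise ValueError("this sketch only handles 0 < q <= 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_le, lower_count = 0.0, 0.0
    for le, count in buckets:
        if count >= rank:
            if math.isinf(le):
                return lower_le  # rank falls in +Inf: report the largest finite bound
            # Linear interpolation, assuming observations are spread evenly in the bucket.
            return lower_le + (le - lower_le) * (rank - lower_count) / (count - lower_count)
        lower_le, lower_count = le, count
    return math.nan
```

For example, one instance serving 10 req/s with 1 error/s and another serving 90 req/s with none give error_ratio([(1.0, 10.0), (0.0, 90.0)]) of 0.01, whereas averaging the two per-instance ratios (0.1 and 0.0) would report 0.05. With buckets [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)], the 0.95 quantile lands in the 0.5 to 1.0 bucket and interpolates to 0.75.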

Recording and Alerting Rules

Recording rules pre-compute expensive queries and store the result as a new time-series — essential for dashboard performance and alert evaluation speed.

# rules/recording_rules.yml
groups:
  - name: http_request_rates
    interval: 30s
    rules:
      # Pre-compute per-job request rate
      - record: job:http_requests:rate5m
        expr: sum by (job) (rate(http_server_requests_seconds_count[5m]))

      # Pre-compute per-job error rate
      - record: job:http_errors:rate5m
        expr: >
          sum by (job) (rate(http_server_requests_seconds_count{status=~"5.."}[5m]))
          /
          sum by (job) (rate(http_server_requests_seconds_count[5m]))

      # Pre-compute p99 latency
      - record: job:http_latency_p99:5m
        expr: >
          histogram_quantile(0.99,
            sum by (le, job) (rate(http_server_requests_seconds_bucket[5m]))
          )

# rules/alerting_rules.yml
groups:
  - name: application_alerts
    rules:
      - alert: HighErrorRate
        expr: job:http_errors:rate5m > 0.05
        for: 2m
        labels:
          severity: critical
          team: backend
        annotations:
          summary: "High error rate on {{ $labels.job }}"
          description: >
            Error rate is {{ $value | humanizePercentage }} on {{ $labels.job }}
            (threshold: 5%). Check logs immediately.
          runbook_url: "https://wiki.example.com/runbooks/high-error-rate"

      - alert: HighLatencyP99
        expr: job:http_latency_p99:5m > 1.0
        for: 5m
        labels:
          severity: warning
          team: backend
        annotations:
          summary: "p99 latency above 1s on {{ $labels.job }}"
          description: "p99 latency is {{ $value }}s on {{ $labels.job }}."

      - alert: ServiceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Service {{ $labels.job }} / {{ $labels.instance }} is DOWN"
          description: "Prometheus cannot scrape {{ $labels.instance }}."

      - alert: HighJvmHeapUsage
        expr: >
          sum by (job, instance) (jvm_memory_used_bytes{area="heap"})
          / sum by (job, instance) (jvm_memory_max_bytes{area="heap"}) > 0.85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "JVM heap usage above 85% on {{ $labels.instance }}"
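
The for: clause is what keeps these rules from firing on one bad evaluation. When an expression first returns a result the alert becomes pending; it turns firing only once it has stayed true for the full for duration, and it resets if the expression stops matching in between. Below is a simplified model of that lifecycle, using a hypothetical AlertState class and ignoring restarts and per-series tracking.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertState:
    for_seconds: float
    active_since: Optional[float] = None  # when the expression first became true

    def evaluate(self, now: float, expr_true: bool) -> str:
        """Return 'inactive', 'pending', or 'firing' after one rule evaluation."""
        if not expr_true:
            self.active_since = None  # condition cleared: start over
            return "inactive"
        if self.active_since is None:
            self.active_since = now
        if now - self.active_since >= self.for_seconds:
            return "firing"
        return "pending"


# HighErrorRate-style rule: for: 2m, evaluated every 15 s, expression always true.
state = AlertState(for_seconds=120)
print([state.evaluate(t, True) for t in range(0, 135, 15)].index("firing"))  # 8
```

With evaluation_interval: 15s and for: 2m, the rule therefore reports pending on eight consecutive evaluations (t = 0 to 105 s) and fires on the ninth, at t = 120 s.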

AlertManager Configuration

AlertManager receives firing alerts from Prometheus, groups related alerts, deduplicates, applies inhibition rules, and routes to the correct receiver:

# alertmanager.yml
global:
  resolve_timeout: 5m
  slack_api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'
  smtp_smarthost: 'smtp.gmail.com:587'
  smtp_from: 'alerts@example.com'
  smtp_auth_username: 'alerts@example.com'
  smtp_auth_password: 'app-password-here'

# Route tree: top-level route, then children matched by labels
route:
  receiver: 'default-slack'
  group_by: ['alertname', 'job']
  group_wait: 30s        # wait before sending first notification in a group
  group_interval: 5m     # wait before sending new notifications for a group
  repeat_interval: 4h    # wait before resending if still firing

  # Child routes are tried in order; the first match wins unless it sets continue: true
  routes:
    # Critical alerts → PagerDuty immediately
    - matchers:
        - severity="critical"
      receiver: pagerduty
      repeat_interval: 1h

    # Database alerts → DBA team Slack channel
    - matchers:
        - job=~".*db.*|.*postgres.*|.*mysql.*"
      receiver: dba-slack

    # Business hours only for warnings
    - matchers:
        - severity="warning"
      receiver: warning-email
      active_time_intervals:
        - business_hours

receivers:
  - name: 'default-slack'
    slack_configs:
      - channel: '#alerts'
        title: '[{{ .Status | toUpper }}] {{ .CommonLabels.alertname }}'
        text: |
          {{ range .Alerts }}
          *Alert:* {{ .Annotations.summary }}
          *Description:* {{ .Annotations.description }}
          *Severity:* {{ .Labels.severity }}
          {{ end }}
        send_resolved: true

  - name: 'pagerduty'
    pagerduty_configs:
      - routing_key: 'YOUR_PAGERDUTY_INTEGRATION_KEY'
        description: '{{ .CommonAnnotations.summary }}'

  - name: 'dba-slack'
    slack_configs:
      - channel: '#dba-alerts'
        send_resolved: true

  - name: 'warning-email'
    email_configs:
      - to: 'oncall@example.com'
        send_resolved: true

# Inhibition: suppress warnings for a job while a critical alert is firing for the same job
inhibit_rules:
  - source_matchers:
      - severity="critical"
    target_matchers:
      - severity="warning"
    equal: ['job']

# Times are interpreted as UTC unless the interval sets a location
time_intervals:
  - name: business_hours
    time_intervals:
      - weekdays: ['monday:friday']
        times:
          - start_time: '09:00'
            end_time: '17:00'
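
Two behaviors in this file are easy to misread. Child routes are tried in order and, unless a route sets continue: true, the first match wins, so a critical alert from a database job goes to PagerDuty rather than #dba-alerts; an alert matching no child falls back to the top-level receiver. Inhibition drops a target alert whenever a matching source alert is firing with identical values for every label listed in equal. The Python below is a simplified model of both, with hypothetical helper names, anchored-regex matchers only, and no grouping or timing; the route list mirrors the three child routes above.

```python
import re
from typing import Dict, List, Tuple

Labels = Dict[str, str]

# (receiver, {label: anchored regex}) in the same order as the child routes.
# A plain value such as "critical" behaves like an equality matcher.
ROUTES: List[Tuple[str, Dict[str, str]]] = [
    ("pagerduty", {"severity": "critical"}),
    ("dba-slack", {"job": ".*db.*|.*postgres.*|.*mysql.*"}),
    ("warning-email", {"severity": "warning"}),
]
DEFAULT_RECEIVER = "default-slack"


def route(alert: Labels) -> str:
    """Return the receiver of the first child route whose matchers all hold."""
    for receiver, matchers in ROUTES:
        if all(re.fullmatch(rx, alert.get(name, "")) for name, rx in matchers.items()):
            return receiver  # no continue: true, so stop at the first match
    return DEFAULT_RECEIVER


def inhibited(target: Labels, firing: List[Labels], equal: List[str]) -> bool:
    """True if a firing critical alert shares every `equal` label with a warning target."""
    if target.get("severity") != "warning":
        return False
    return any(
        src.get("severity") == "critical"
        and all(src.get(name, "") == target.get(name, "") for name in equal)
        for src in firing
    )
```

Note that with equal: ['alertname', 'job'], a firing HighErrorRate (critical) would not suppress a HighLatencyP99 (warning) on the same job, because the two alerts have different names.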

Grafana Datasource and Dashboards

Add Prometheus as a Grafana datasource via the UI (Connections → Data sources → Add data source → Prometheus in current Grafana versions, or Configuration → Data Sources in older ones; URL: http://prometheus:9090) or via provisioning YAML:

# grafana/provisioning/datasources/prometheus.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    jsonData:
      timeInterval: '15s'
      httpMethod: POST

Community dashboards: import these dashboard IDs from grafana.com/dashboards to get a working starting point:

  • 1860: Node Exporter Full (OS metrics)
  • 4701: JVM (Micrometer)
  • 9614: NGINX Ingress Controller
  • 6417: Kubernetes Cluster

Import via Dashboards → Import (Dashboards → New → Import in recent Grafana versions), enter the ID, and click Load.

For custom dashboards, use Grafana variables to make a single dashboard work for all services:

# Grafana variable: job (dropdown listing all scraped jobs)
# Variable name: job
# Type: Query
# Query: label_values(up, job)

# Use $job variable in panel queries:
rate(http_server_requests_seconds_count{job="$job"}[5m])

histogram_quantile(0.99,
  sum by (le) (
    rate(http_server_requests_seconds_bucket{job="$job"}[5m])
  )
)

Custom Metrics: Java with Micrometer

Spring Boot with Micrometer exposes a /actuator/prometheus endpoint automatically. Add custom business metrics:

// pom.xml dependencies
// spring-boot-starter-actuator
// micrometer-registry-prometheus

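# application.yml
management:
  endpoints:
    web:
      exposure:
        include: health,info,prometheus,metrics
  metrics:
    tags:
      application: ${spring.application.name}
      environment: ${spring.profiles.active:default}
    distribution:
      percentiles-histogram:
        # Publish _bucket series so histogram_quantile() works on HTTP latency
        http.server.requests: true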

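// Custom metrics in a Spring service
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import java.util.concurrent.atomic.AtomicInteger;
import org.springframework.stereotype.Service;

// Order, CreateOrderRequest and doProcessOrder() are application-specific (not shown).
@Service
public class OrderService {
    private final MeterRegistry registry;
    private final Counter ordersCreated;
    private final Timer orderProcessingTime;
    private final Gauge activeOrders;
    private final AtomicInteger activeOrderCount = new AtomicInteger(0);

    public OrderService(MeterRegistry registry) {
        this.registry = registry;

        // Exported as orders_created_total (the Prometheus registry appends _total)
        this.ordersCreated = Counter.builder("orders.created")
            .description("Total orders created")
            .tag("service", "order-service")
            .register(registry);

        this.orderProcessingTime = Timer.builder("orders.processing.duration")
            .description("Time to process an order end-to-end")
            // Client-side percentiles: cheap, but cannot be aggregated across instances
            .publishPercentiles(0.5, 0.95, 0.99)
            // _bucket series for server-side histogram_quantile() across instances
            .publishPercentileHistogram()
            .register(registry);

        // The gauge reads activeOrderCount whenever Prometheus scrapes
        this.activeOrders = Gauge.builder("orders.active", activeOrderCount, AtomicInteger::get)
            .description("Number of orders currently being processed")
            .register(registry);
    }

    public Order processOrder(CreateOrderRequest req) {
        activeOrderCount.incrementAndGet();
        return orderProcessingTime.record(() -> {
            try {
                Order order = doProcessOrder(req);
                ordersCreated.increment();
                return order;
            } catch (RuntimeException e) {
                // Counter tags are fixed at registration, so fetch (or create) the counter
                // for this failure reason; register() returns the existing meter if present
                Counter.builder("orders.failed")
                    .description("Total orders that failed processing")
                    .tag("reason", e.getClass().getSimpleName())
                    .register(registry)
                    .increment();
                throw e;
            } finally {
                activeOrderCount.decrementAndGet();
            }
        });
    }
}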
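
When writing PromQL against these meters, it helps to know how Micrometer's Prometheus registry renames them: dots become underscores, counters gain a _total suffix, and timers are exported in seconds with _seconds appended (plus _count, _sum, and _max series, and _bucket series when a percentile histogram is enabled). That is why the queries earlier in this section use http_server_requests_seconds_count for Spring Boot's http.server.requests timer. The function below only approximates that convention for simple names; it is an illustration, not Micrometer's own naming code, and edge cases such as other base units may differ.

```python
import re
from typing import List

# Simplified suffix rules per meter type (assumption: approximates Micrometer).
SUFFIXES = {"counter": "_total", "timer": "_seconds", "gauge": ""}


def prometheus_name(micrometer_name: str, meter_type: str) -> str:
    """Approximate the Prometheus base name exported for a Micrometer meter."""
    # Dots (and any other character not allowed in Prometheus names) become underscores.
    name = re.sub(r"[^a-zA-Z0-9_:]", "_", micrometer_name)
    suffix = SUFFIXES[meter_type]
    if suffix and not name.endswith(suffix):
        name += suffix
    return name


def timer_series(micrometer_name: str) -> List[str]:
    """Series a Timer typically exposes; _bucket appears only with a percentile histogram."""
    base = prometheus_name(micrometer_name, "timer")
    return [base + s for s in ("_count", "_sum", "_max", "_bucket")]


print(prometheus_name("http.server.requests", "timer"))  # http_server_requests_seconds
print(prometheus_name("orders.created", "counter"))      # orders_created_total
```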

Custom Metrics: Python with prometheus_client

# Install
pip install prometheus-client

from prometheus_client import Counter, Histogram, Gauge, start_http_server
import time
import random

# Define metrics at module level
REQUEST_COUNT = Counter(
    'http_requests_total',
    'Total HTTP requests',
    ['method', 'endpoint', 'status_code']
)

REQUEST_LATENCY = Histogram(
    'http_request_duration_seconds',
    'HTTP request duration in seconds',
    ['method', 'endpoint'],
    buckets=[0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0]
)

ACTIVE_REQUESTS = Gauge(
    'http_requests_in_progress',
    'Number of HTTP requests in progress'
)

QUEUE_SIZE = Gauge(
    'job_queue_size',
    'Current number of jobs in the processing queue'
)

# Usage in your application
def handle_request(method, endpoint):
    ACTIVE_REQUESTS.inc()
    start = time.perf_counter()  # monotonic clock, unaffected by wall-clock changes
    status = "500"  # assume failure until the work completes
    try:
        # ... process request ...
        time.sleep(random.uniform(0.01, 0.5))  # simulate work
        status = "200"
    finally:
        # Runs on success and on exceptions, so every request is counted and timed
        duration = time.perf_counter() - start
        REQUEST_COUNT.labels(method=method, endpoint=endpoint, status_code=status).inc()
        REQUEST_LATENCY.labels(method=method, endpoint=endpoint).observe(duration)
        ACTIVE_REQUESTS.dec()

if __name__ == '__main__':
    # Expose /metrics on port 8000
    start_http_server(8000)
    print("Metrics server started on port 8000")
    while True:
        handle_request("GET", "/api/orders")
        time.sleep(0.1)
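
Histogram buckets in the Prometheus exposition format are cumulative: an observation counts toward every bucket whose le bound is greater than or equal to it, plus +Inf, _count, and _sum. The stdlib-only sketch below (a hypothetical CumulativeHistogram class, not prometheus_client internals) reproduces that bookkeeping with the same bounds as REQUEST_LATENCY; these cumulative counts are what histogram_quantile reads back.

```python
import bisect
import math
from typing import Dict, List


class CumulativeHistogram:
    """Cumulative-bucket bookkeeping in the style of the Prometheus text format."""

    def __init__(self, bounds: List[float]):
        self.bounds = sorted(bounds) + [math.inf]  # +Inf always closes the list
        self.counts = [0] * len(self.bounds)       # per-bucket, not yet cumulative
        self.sum = 0.0

    def observe(self, value: float) -> None:
        # First bound >= value, i.e. the smallest bucket that contains the observation.
        self.counts[bisect.bisect_left(self.bounds, value)] += 1
        self.sum += value

    def exposition(self) -> Dict[str, float]:
        """Return {le: cumulative_count} plus _count and _sum, as a scrape would see them."""
        out: Dict[str, float] = {}
        running = 0
        for bound, n in zip(self.bounds, self.counts):
            running += n
            out["+Inf" if math.isinf(bound) else repr(bound)] = running
        out["_count"] = running
        out["_sum"] = self.sum
        return out


h = CumulativeHistogram([0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0])
for seconds in (0.03, 0.2, 0.2, 7.0):
    h.observe(seconds)
print(h.exposition()["0.25"])  # 3: every observation <= 0.25 s
```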

Kubernetes Monitoring with PrometheusRule CRD

When using the Prometheus Operator (for example via the kube-prometheus-stack Helm chart), alerting rules and scrape targets are managed as Kubernetes custom resources:

# PrometheusRule CRD — managed by Prometheus Operator
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: myapp-alerts
  namespace: monitoring
  labels:
    release: kube-prometheus-stack  # must match the Prometheus resource's ruleSelector (chart default: the Helm release name)
spec:
  groups:
    - name: myapp.rules
      rules:
        - alert: MyAppHighErrorRate
          expr: |
            sum(rate(http_server_requests_seconds_count{
              job="myapp", status=~"5.."}[5m]))
            /
            sum(rate(http_server_requests_seconds_count{job="myapp"}[5m]))
            > 0.01
          for: 2m
          labels:
            severity: critical
            app: myapp
          annotations:
            summary: "myapp error rate above 1%"
            description: "Current error rate: {{ $value | humanizePercentage }}"

---
# ServiceMonitor — tells Prometheus Operator which services to scrape
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
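metadata:
  name: myapp-monitor
  namespace: monitoring
  labels:
    release: kube-prometheus-stack  # must match the Prometheus resource's serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: myapp          # labels on the Service (not the pods) to scrape
  namespaceSelector:
    matchNames:
      - production
  endpoints:
    - port: http          # the Service port's name, not its number
      path: /actuator/prometheus
      interval: 15s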

FAQ

How long does Prometheus retain data by default?
15 days by default. Adjust with --storage.tsdb.retention.time=30d or by size: --storage.tsdb.retention.size=50GB. For long-term retention (months/years), use Thanos or Cortex to ship data to object storage (S3, GCS) while keeping Prometheus as a short-term local cache.
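To size local retention, the Prometheus storage documentation gives a rule of thumb: needed disk space ≈ retention time in seconds × ingested samples per second × bytes per sample, with roughly 1 to 2 bytes per sample after compression. A quick calculator follows; the ingestion rate is an assumed example value you would replace with your own, for instance from rate(prometheus_tsdb_head_samples_appended_total[5m]).
```python
def tsdb_disk_bytes(retention_days: float, samples_per_second: float,
                    bytes_per_sample: float = 2.0) -> float:
    """Rule-of-thumb local TSDB size, using the upper end of 1-2 bytes per sample."""
    retention_seconds = retention_days * 24 * 3600
    return retention_seconds * samples_per_second * bytes_per_sample


# Example: 30 days of retention at an assumed 100,000 samples/s.
estimate = tsdb_disk_bytes(30, 100_000)
print(f"{estimate / 1e9:.0f} GB")  # 518 GB (30 * 86400 * 100000 * 2 bytes)
```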
What is the difference between rate() and irate() in PromQL?
rate() computes the average per-second increase over the whole range window: it takes the increase between the first and last samples in the window, corrects for counter resets, and extrapolates to the window boundaries, so the result is smooth and suitable for alerts and dashboards. irate() uses only the last two samples in the window, so it reacts immediately to sudden spikes but is much noisier. Use rate() in alerting and recording rules, where you want stable values, and reach for irate() only in exploratory queries where you want to see momentary spikes.
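A small numeric comparison makes the difference visible. The sketch models a 5-minute window of a counter scraped every 15 seconds, with a burst in the final interval. rate_simple and irate_simple are hypothetical helpers that skip counter-reset handling and rate()'s extrapolation, so only the shape of the result carries over to real PromQL.
```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (unix_seconds, counter_value)


def rate_simple(window: List[Sample]) -> float:
    """Average per-second increase between the first and last sample in the window."""
    (t0, v0), (t1, v1) = window[0], window[-1]
    return (v1 - v0) / (t1 - t0)


def irate_simple(window: List[Sample]) -> float:
    """Per-second increase between only the last two samples."""
    (t0, v0), (t1, v1) = window[-2], window[-1]
    return (v1 - v0) / (t1 - t0)


# 1 req/s for the first 285 s, then 10 req/s during the final 15 s scrape interval.
window = [(float(t), float(t)) for t in range(0, 300, 15)]
window.append((300.0, window[-1][1] + 150.0))

print(rate_simple(window))   # 1.45: the burst is averaged over the whole window
print(irate_simple(window))  # 10.0: the burst dominates
```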
How do I avoid alert fatigue?
Give every alerting rule a meaningful for: duration (at least 2 to 5 minutes) so transient spikes do not page anyone. Use AlertManager's group_by, group_wait, and group_interval to batch related alerts into a single notification. Add inhibition rules so that a critical alert suppresses related warning alerts. Review your alerts monthly and silence or delete any that never require action.
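Grouping is what turns a burst of related alerts into one message: AlertManager builds a group key from the group_by labels, and all alerts sharing that key are batched into one notification, with group_wait and group_interval controlling when it is sent. The sketch below models only the keying step with group_by: ['alertname', 'job'], as in the route configuration above; group_alerts is a hypothetical helper and the label values are illustrative.
```python
from collections import defaultdict
from typing import Dict, List, Tuple

Labels = Dict[str, str]


def group_alerts(alerts: List[Labels], group_by: List[str]) -> Dict[Tuple[str, ...], List[Labels]]:
    """Bucket alerts by their group_by label values: one notification per bucket."""
    groups: Dict[Tuple[str, ...], List[Labels]] = defaultdict(list)
    for alert in alerts:
        key = tuple(alert.get(label, "") for label in group_by)
        groups[key].append(alert)
    return dict(groups)


# Three ServiceDown alerts from two jobs (illustrative label values).
alerts = [
    {"alertname": "ServiceDown", "job": "java-apps", "instance": "api-service:8080"},
    {"alertname": "ServiceDown", "job": "java-apps", "instance": "order-service:8080"},
    {"alertname": "ServiceDown", "job": "node-exporter", "instance": "web01:9100"},
]
groups = group_alerts(alerts, ["alertname", "job"])
print(len(groups))  # 2: both java-apps instances share one notification
```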
What is the Pushgateway and when should I use it?
The Pushgateway is for batch jobs that run and exit before Prometheus can scrape them. The job pushes its metrics to the Pushgateway, which keeps the last pushed values (until they are overwritten or deleted through its API) and exposes them for Prometheus to scrape. Because pushed series never go stale on their own, do not use the Pushgateway for long-running services; reserve it for cron jobs, one-off scripts, or CI pipeline steps that need to report metrics.
How do I monitor multiple Kubernetes clusters with one Grafana?
Add each cluster's Prometheus as a separate datasource in Grafana (with a distinct name like prometheus-prod, prometheus-staging). Create a Grafana variable datasource of type "Data source" filtered to Prometheus — dashboards that use ${datasource} in their queries will work across all clusters with a single dropdown switch.