Prometheus and Grafana: Application Monitoring Setup (2026)
Prometheus and Grafana are the de facto monitoring stack for cloud-native applications. Prometheus scrapes metrics from your services on a pull basis, stores them in a time-series database, and evaluates alerting rules. Grafana queries Prometheus and renders rich dashboards. Together they give you request rates, error rates, latency percentiles, and resource utilization — the four signals you need to operate production systems confidently.
Prometheus Architecture
Prometheus is a pull-based system: it periodically scrapes HTTP endpoints (/metrics) on your services rather than receiving pushed data. This design makes it easy to detect when a service goes down (scrape fails) and keeps your application code decoupled from the monitoring backend.
Core components:
- Prometheus server: scrapes targets, stores time-series data in its embedded TSDB (two-hour blocks on disk), evaluates alerting rules.
- Exporters: adapters that expose metrics for systems that do not natively expose a /metrics endpoint (node_exporter for OS metrics, mysqld_exporter, redis_exporter, etc.).
- Pushgateway: for short-lived batch jobs that cannot be scraped; they push their metrics before exiting.
- AlertManager: receives alerts from Prometheus, groups them, deduplicates, and routes to receivers (Slack, PagerDuty, email).
- Grafana: visualization layer — queries Prometheus via its HTTP API and renders dashboards.
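To make the pull model concrete, here is a minimal sketch of what a scraper has to do with the body of a /metrics response. It uses only the standard library and handles just the common `name{labels} value` form of the text exposition format; the `parse_exposition` helper and the sample payload are illustrative assumptions, not part of Prometheus itself.

```python
import re

# One sample line of the text exposition format: name{label="value",...} 123.4
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_exposition(text):
    """Parse exposition text into a list of (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore anything this simplified parser does not understand
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Hypothetical payload, shaped like what a service would return on /metrics.
EXAMPLE = """\
# HELP http_requests_total Total HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",status_code="200"} 1027
http_requests_total{method="POST",status_code="500"} 3
process_cpu_seconds_total 12.5
"""

samples = parse_exposition(EXAMPLE)
for name, labels, value in samples:
    print(name, labels, value)
```

Each parsed sample becomes one point in a time series identified by the metric name plus its label set; the server adds the job and instance labels of the target it scraped.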
Scrape Configuration
The main Prometheus configuration file defines scrape intervals and target discovery:
# prometheus.yml
global:
  scrape_interval: 15s      # how often to scrape targets
  evaluation_interval: 15s  # how often to evaluate alerting rules
  scrape_timeout: 10s

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']

rule_files:
  - "rules/recording_rules.yml"
  - "rules/alerting_rules.yml"

scrape_configs:
  # Prometheus scrapes itself
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # Spring Boot / Java apps (Micrometer exposes /actuator/prometheus)
  - job_name: 'java-apps'
    metrics_path: '/actuator/prometheus'
    scrape_interval: 10s
    static_configs:
      - targets:
          - 'api-service:8080'
          - 'order-service:8080'
          - 'payment-service:8080'
    relabel_configs:
      - source_labels: [__address__]
        target_label: instance
      - target_label: environment
        replacement: production

  # Node exporter (OS-level metrics)
  - job_name: 'node-exporter'
    static_configs:
      - targets:
          - 'web01:9100'
          - 'web02:9100'
          - 'db01:9100'

  # Kubernetes pod discovery via service discovery
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Only scrape pods with annotation prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Use the prometheus.io/path annotation as the metrics path, if set
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      # Build the scrape address from the pod IP and the prometheus.io/port annotation
      - source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_port
          - __meta_kubernetes_pod_ip
        action: replace
        regex: (\d+);(.*)
        replacement: $2:$1
        target_label: __address__
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
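The `replace` relabel action used for `__address__` above can be hard to read at first. The following is a simplified Python sketch of its semantics: the source label values are joined with `;`, the regex must match the whole joined string, and `$1`/`$2` in the replacement refer to capture groups. The function name `relabel_replace` and the example pod labels are assumptions for illustration, and only the `$N` reference form is handled.

```python
import re


def relabel_replace(labels, source_labels, regex, replacement, target_label, separator=";"):
    """Apply one 'replace' relabel rule and return the updated label set."""
    joined = separator.join(labels.get(name, "") for name in source_labels)
    match = re.fullmatch(regex, joined)  # the regex is anchored at both ends
    if match is None:
        return dict(labels)  # no match: the labels are left unchanged
    # Translate $1-style references into Python's \g<1> form before expanding.
    template = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    updated = dict(labels)
    updated[target_label] = match.expand(template)
    return updated


# Hypothetical discovered pod: annotation prometheus.io/port=8080, pod IP 10.0.0.5.
pod = {
    "__meta_kubernetes_pod_annotation_prometheus_io_port": "8080",
    "__meta_kubernetes_pod_ip": "10.0.0.5",
}
result = relabel_replace(
    pod,
    source_labels=[
        "__meta_kubernetes_pod_annotation_prometheus_io_port",
        "__meta_kubernetes_pod_ip",
    ],
    regex=r"(\d+);(.*)",
    replacement="$2:$1",
    target_label="__address__",
)
print(result["__address__"])  # 10.0.0.5:8080
```

A pod without the port annotation produces the joined string `;10.0.0.5`, which does not match `(\d+);(.*)`, so its address is left as discovered.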
Essential PromQL Queries
PromQL (Prometheus Query Language) is the query language for selecting and aggregating time-series data. These are the queries you will use most in production:
# --- Request rate (per-second over 5 min window) ---
rate(http_server_requests_seconds_count[5m])
# Rate per service and HTTP status code
sum by (job, status) (rate(http_server_requests_seconds_count[5m]))
# --- Error rate (ratio of 5xx to total) ---
sum(rate(http_server_requests_seconds_count{status=~"5.."}[5m]))
/
sum(rate(http_server_requests_seconds_count[5m]))
# --- p99 latency using histogram_quantile ---
histogram_quantile(
  0.99,
  sum by (le, job) (rate(http_server_requests_seconds_bucket[5m]))
)
# p50, p95, p99: three separate queries. In Grafana, add one query per quantile
# to the panel and set each legend by hand (the result carries no quantile label).
histogram_quantile(0.50, sum by (le) (rate(http_server_requests_seconds_bucket[5m])))
histogram_quantile(0.95, sum by (le) (rate(http_server_requests_seconds_bucket[5m])))
histogram_quantile(0.99, sum by (le) (rate(http_server_requests_seconds_bucket[5m])))
# --- JVM memory ---
# Heap used
jvm_memory_used_bytes{area="heap"}
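# Heap used as percentage of max (summed across memory pools, which are exposed as separate series)
sum by (job, instance) (jvm_memory_used_bytes{area="heap"})
  / sum by (job, instance) (jvm_memory_max_bytes{area="heap"}) * 100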
# --- CPU utilization (node exporter) ---
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
# --- Top 5 slowest endpoints ---
topk(5,
  histogram_quantile(0.99,
    sum by (le, uri) (rate(http_server_requests_seconds_bucket[5m]))
  )
)
# --- increase() — total count over a period (not per-second rate) ---
increase(http_server_requests_seconds_count{status="500"}[1h])
# --- Disk usage percentage ---
(node_filesystem_size_bytes - node_filesystem_free_bytes)
  / node_filesystem_size_bytes * 100
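The histogram_quantile() queries above estimate a quantile from cumulative bucket counts rather than from raw observations. Here is a simplified sketch of that estimate, assuming classic histograms with an `le="+Inf"` bucket and linear interpolation inside the bucket that contains the target rank. The function and the bucket values are illustrative; the real implementation covers more edge cases.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q from cumulative buckets [(upper_bound, count), ...]."""
    buckets = sorted(buckets)
    total = buckets[-1][1]  # the +Inf bucket holds the total observation count
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # The rank falls in the +Inf bucket: report the highest finite bound.
                return lower_bound
            width = count - lower_count
            fraction = (rank - lower_count) / width if width else 0.0
            # Linear interpolation inside the bucket that contains the rank.
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Hypothetical per-second bucket rates: (le, cumulative count).
BUCKETS = [(0.1, 50), (0.5, 90), (1.0, 99), (math.inf, 100)]

p50 = histogram_quantile(0.50, BUCKETS)
p95 = histogram_quantile(0.95, BUCKETS)
print(p50, p95)
```

The estimate can never be more precise than the bucket layout: with these boundaries, every p95 between 0.5 s and 1.0 s is an interpolation, which is why bucket boundaries should bracket the latency thresholds you alert on.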
Recording and Alerting Rules
Recording rules pre-compute expensive queries and store the result as a new time-series — essential for dashboard performance and alert evaluation speed.
# rules/recording_rules.yml
groups:
  - name: http_request_rates
    interval: 30s
    rules:
      # Pre-compute per-job request rate
      - record: job:http_requests:rate5m
        expr: sum by (job) (rate(http_server_requests_seconds_count[5m]))

      # Pre-compute per-job error rate
      - record: job:http_errors:rate5m
        expr: >
          sum by (job) (rate(http_server_requests_seconds_count{status=~"5.."}[5m]))
          /
          sum by (job) (rate(http_server_requests_seconds_count[5m]))

      # Pre-compute p99 latency
      - record: job:http_latency_p99:5m
        expr: >
          histogram_quantile(0.99,
            sum by (le, job) (rate(http_server_requests_seconds_bucket[5m]))
          )
# rules/alerting_rules.yml
groups:
  - name: application_alerts
    rules:
      - alert: HighErrorRate
        expr: job:http_errors:rate5m > 0.05
        for: 2m
        labels:
          severity: critical
          team: backend
        annotations:
          summary: "High error rate on {{ $labels.job }}"
          description: >
            Error rate is {{ $value | humanizePercentage }} on {{ $labels.job }}
            (threshold: 5%). Check logs immediately.
          runbook_url: "https://wiki.example.com/runbooks/high-error-rate"

      - alert: HighLatencyP99
        expr: job:http_latency_p99:5m > 1.0
        for: 5m
        labels:
          severity: warning
          team: backend
        annotations:
          summary: "p99 latency above 1s on {{ $labels.job }}"
          description: "p99 latency is {{ $value }}s on {{ $labels.job }}."

      - alert: ServiceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Service {{ $labels.job }} / {{ $labels.instance }} is DOWN"
          description: "Prometheus cannot scrape {{ $labels.instance }}."

      - alert: HighJvmHeapUsage
        # Sum across memory pools so the ratio describes the whole heap
        expr: >
          sum by (job, instance) (jvm_memory_used_bytes{area="heap"})
          / sum by (job, instance) (jvm_memory_max_bytes{area="heap"}) > 0.85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "JVM heap usage above 85% on {{ $labels.instance }}"
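The `for:` clause in these rules means the expression must stay true across consecutive evaluations before the alert fires; until then the alert is pending. A rough model of that state machine, assuming a fixed evaluation interval (the function name and inputs are illustrative, not Prometheus code):

```python
def alert_states(condition_results, for_seconds, evaluation_interval=15):
    """Return the alert state after each rule evaluation.

    condition_results: one boolean per evaluation (did the expression return a result?).
    """
    states = []
    active_since = None  # time at which the condition first became true
    for step, is_true in enumerate(condition_results):
        now = step * evaluation_interval
        if not is_true:
            active_since = None  # a single false evaluation resets the timer
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        held_for = now - active_since
        states.append("firing" if held_for >= for_seconds else "pending")
    return states


# A one-evaluation recovery restarts the 60-second countdown.
timeline = alert_states(
    [True, True, False, True, True, True, True, True],
    for_seconds=60,
    evaluation_interval=15,
)
print(timeline)
```

This is why a short `for:` combined with a noisy expression still pages on blips, while a long `for:` delays every notification by at least that duration.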
AlertManager Configuration
AlertManager receives firing alerts from Prometheus, groups related alerts, deduplicates, applies inhibition rules, and routes to the correct receiver:
# alertmanager.yml
global:
  resolve_timeout: 5m
  slack_api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'
  smtp_smarthost: 'smtp.gmail.com:587'
  smtp_from: 'alerts@example.com'
  smtp_auth_username: 'alerts@example.com'
  smtp_auth_password: 'app-password-here'

# Route tree: top-level route, then children matched by labels
# (matchers is the current syntax; match and match_re are deprecated)
route:
  receiver: 'default-slack'
  group_by: ['alertname', 'job']
  group_wait: 30s       # wait before sending first notification in a group
  group_interval: 5m    # wait before sending new notifications for a group
  repeat_interval: 4h   # wait before resending if still firing
  routes:
    # Critical alerts → PagerDuty immediately
    - matchers:
        - severity="critical"
      receiver: pagerduty
      repeat_interval: 1h
    # Database alerts → DBA team Slack channel
    - matchers:
        - job=~".*db.*|.*postgres.*|.*mysql.*"
      receiver: dba-slack
    # Business hours only for warnings
    - matchers:
        - severity="warning"
      receiver: warning-email
      active_time_intervals:
        - business_hours

receivers:
  - name: 'default-slack'
    slack_configs:
      - channel: '#alerts'
        title: '[{{ .Status | toUpper }}] {{ .CommonLabels.alertname }}'
        text: >
          {{ range .Alerts }}
          *Alert:* {{ .Annotations.summary }}
          *Description:* {{ .Annotations.description }}
          *Severity:* {{ .Labels.severity }}
          {{ end }}
        send_resolved: true

  - name: 'pagerduty'
    pagerduty_configs:
      - routing_key: 'YOUR_PAGERDUTY_INTEGRATION_KEY'
        description: '{{ .CommonAnnotations.summary }}'

  - name: 'dba-slack'
    slack_configs:
      - channel: '#dba-alerts'
        send_resolved: true

  - name: 'warning-email'
    email_configs:
      - to: 'oncall@example.com'
        send_resolved: true
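# Inhibition: suppress warnings if a critical alert is already firing for the same job.
# Only the job label has to match; also requiring alertname would limit inhibition
# to alerts that share a name across severities.
inhibit_rules:
  - source_matchers:
      - severity="critical"
    target_matchers:
      - severity="warning"
    equal: ['job']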
time_intervals:
  - name: business_hours
    time_intervals:
      - weekdays: ['monday:friday']
        times:
          - start_time: '09:00'
            end_time: '17:00'
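Route order matters in the tree above: an alert enters at the top-level route, the first child route whose conditions match takes it, and the top-level receiver is the fallback. The sketch below models that first-match walk in Python. The dictionary keys `equal` and `regex` are hypothetical stand-ins for the label conditions in the config, and `continue`, grouping and time intervals are deliberately left out.

```python
import re


def route_matches(route, labels):
    """A route matches when all of its equality and regex conditions hold."""
    for name, value in route.get("equal", {}).items():
        if labels.get(name) != value:
            return False
    for name, pattern in route.get("regex", {}).items():
        if re.fullmatch(pattern, labels.get(name, "")) is None:  # regexes are anchored
            return False
    return True


def find_receiver(route, labels):
    """Walk the tree depth-first; the first matching child wins, else use this route."""
    for child in route.get("routes", []):
        if route_matches(child, labels):
            return find_receiver(child, labels)
    return route["receiver"]


# Mirrors the routing rules of the alertmanager.yml example.
ROUTE_TREE = {
    "receiver": "default-slack",
    "routes": [
        {"equal": {"severity": "critical"}, "receiver": "pagerduty"},
        {"regex": {"job": ".*db.*|.*postgres.*|.*mysql.*"}, "receiver": "dba-slack"},
        {"equal": {"severity": "warning"}, "receiver": "warning-email"},
    ],
}

print(find_receiver(ROUTE_TREE, {"severity": "critical", "job": "mysql-exporter"}))
print(find_receiver(ROUTE_TREE, {"severity": "warning", "job": "postgres-exporter"}))
print(find_receiver(ROUTE_TREE, {"severity": "warning", "job": "api-service"}))
print(find_receiver(ROUTE_TREE, {"severity": "info", "job": "api-service"}))
```

Note the consequence for this config: a critical database alert goes to PagerDuty and never reaches the DBA channel, because the severity route is listed first.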
Grafana Datasource and Dashboards
Add Prometheus as a Grafana datasource via the UI (Connections → Data sources → Add new data source → Prometheus → URL: http://prometheus:9090; older Grafana releases list this under Configuration → Data Sources) or via provisioning YAML:
# grafana/provisioning/datasources/prometheus.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    jsonData:
      timeInterval: '15s'
      httpMethod: POST
Community dashboards worth importing, by Grafana.com dashboard ID:
- 1860: Node Exporter Full (OS metrics)
- 4701: JVM Micrometer dashboard
- 9614: NGINX Ingress Controller
- 6417: Kubernetes Cluster
Import via Dashboards → Import → enter ID → Load.
For custom dashboards, use Grafana variables to make a single dashboard work for all services:
# Grafana variable: job (dropdown listing all scraped jobs)
# Variable name: job
# Type: Query
# Query: label_values(up, job)
# Use $job variable in panel queries:
rate(http_server_requests_seconds_count{job="$job"}[5m])
histogram_quantile(0.99,
  sum by (le) (
    rate(http_server_requests_seconds_bucket{job="$job"}[5m])
  )
)
Custom Metrics: Java with Micrometer
Spring Boot with Micrometer exposes a /actuator/prometheus endpoint automatically. Add custom business metrics:
// pom.xml dependencies
// spring-boot-starter-actuator
// micrometer-registry-prometheus
# application.yml
management:
  endpoints:
    web:
      exposure:
        include: health,info,prometheus,metrics
  metrics:
    tags:
      application: ${spring.application.name}
      environment: ${spring.profiles.active:default}
// Custom metrics in a Spring service
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import java.util.concurrent.atomic.AtomicInteger;
import org.springframework.stereotype.Service;

@Service
public class OrderService {
    private final MeterRegistry registry;
    private final Counter ordersCreated;
    private final Timer orderProcessingTime;
    private final Gauge activeOrders;
    private final AtomicInteger activeOrderCount = new AtomicInteger(0);

    public OrderService(MeterRegistry registry) {
        this.registry = registry;
        this.ordersCreated = Counter.builder("orders.created.total")
                .description("Total orders created")
                .tag("service", "order-service")
                .register(registry);
        this.orderProcessingTime = Timer.builder("orders.processing.duration")
                .description("Time to process an order end-to-end")
                .publishPercentiles(0.5, 0.95, 0.99)
                .register(registry);
        this.activeOrders = Gauge.builder("orders.active", activeOrderCount, AtomicInteger::get)
                .description("Number of orders currently being processed")
                .register(registry);
    }

    public Order processOrder(CreateOrderRequest req) {
        activeOrderCount.incrementAndGet();
        return orderProcessingTime.record(() -> {
            try {
                Order order = doProcessOrder(req);
                ordersCreated.increment();
                return order;
            } catch (Exception e) {
                // Counter.increment() takes no tags, so the failure counter is registered
                // per reason; register() returns the existing meter on repeat calls.
                Counter.builder("orders.failed.total")
                        .description("Total orders that failed processing")
                        .tag("reason", e.getClass().getSimpleName())
                        .register(registry)
                        .increment();
                throw e;
            } finally {
                activeOrderCount.decrementAndGet();
            }
        });
    }
}
Custom Metrics: Python with prometheus_client
# Install
pip install prometheus-client

from prometheus_client import Counter, Histogram, Gauge, start_http_server
import time
import random

# Define metrics at module level
REQUEST_COUNT = Counter(
    'http_requests_total',
    'Total HTTP requests',
    ['method', 'endpoint', 'status_code']
)
REQUEST_LATENCY = Histogram(
    'http_request_duration_seconds',
    'HTTP request duration in seconds',
    ['method', 'endpoint'],
    buckets=[0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0]
)
ACTIVE_REQUESTS = Gauge(
    'http_requests_in_progress',
    'Number of HTTP requests in progress'
)
QUEUE_SIZE = Gauge(
    'job_queue_size',
    'Current number of jobs in the processing queue'
)


# Usage in your application
def handle_request(method, endpoint):
    ACTIVE_REQUESTS.inc()
    start = time.time()
    try:
        # ... process request ...
        status = "200"
        time.sleep(random.uniform(0.01, 0.5))  # simulate work
    except Exception:
        status = "500"
        raise
    finally:
        # Record count and latency whether the request succeeded or failed
        duration = time.time() - start
        REQUEST_COUNT.labels(method=method, endpoint=endpoint, status_code=status).inc()
        REQUEST_LATENCY.labels(method=method, endpoint=endpoint).observe(duration)
        ACTIVE_REQUESTS.dec()


if __name__ == '__main__':
    # Expose /metrics on port 8000
    start_http_server(8000)
    print("Metrics server started on port 8000")
    while True:
        handle_request("GET", "/api/orders")
        time.sleep(0.1)
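The `buckets` argument on the Histogram above determines which `_bucket{le="..."}` series are exposed. As a rough illustration of what an observation does, here is a standard-library sketch of a cumulative histogram; the `SketchHistogram` class is hypothetical and is not how prometheus_client is implemented internally.

```python
import math


class SketchHistogram:
    """Toy histogram that reports cumulative bucket counts like a _bucket series."""

    def __init__(self, buckets):
        self.bounds = sorted(buckets) + [math.inf]  # an implicit +Inf bucket catches everything
        self.counts = [0] * len(self.bounds)        # per-bucket, non-cumulative counts
        self.total = 0.0                            # running sum, like the _sum series

    def observe(self, value):
        self.total += value
        for index, bound in enumerate(self.bounds):
            if value <= bound:  # le means "less than or equal to"
                self.counts[index] += 1
                break

    def samples(self):
        """Return (le, cumulative_count) pairs, smallest bound first."""
        running = 0
        result = []
        for bound, count in zip(self.bounds, self.counts):
            running += count
            result.append((bound, running))
        return result


hist = SketchHistogram([0.1, 0.5, 1.0])
for seconds in (0.05, 0.3, 0.3, 2.0):
    hist.observe(seconds)
print(hist.samples())
```

Because the counts are cumulative, the `+Inf` bucket always equals the total number of observations, and a request slower than the largest finite bucket is only visible there.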
Kubernetes Monitoring with PrometheusRule CRD
When using the Prometheus Operator (kube-prometheus-stack Helm chart), alerting rules are managed as Kubernetes CRDs:
# PrometheusRule custom resource, managed by the Prometheus Operator
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: myapp-alerts
  namespace: monitoring
  labels:
    release: kube-prometheus-stack  # must match Prometheus operator selector
spec:
  groups:
    - name: myapp.rules
      rules:
        - alert: MyAppHighErrorRate
          expr: |
            sum(rate(http_server_requests_seconds_count{job="myapp", status=~"5.."}[5m]))
            /
            sum(rate(http_server_requests_seconds_count{job="myapp"}[5m]))
            > 0.01
          for: 2m
          labels:
            severity: critical
            app: myapp
          annotations:
            summary: "myapp error rate above 1%"
            description: "Current error rate: {{ $value | humanizePercentage }}"
---
# ServiceMonitor: tells the Prometheus Operator which services to scrape
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp-monitor
  namespace: monitoring
  labels:
    release: kube-prometheus-stack  # needed when the operator selects ServiceMonitors by release label (the chart default)
spec:
  selector:
    matchLabels:
      app: myapp
  namespaceSelector:
    matchNames:
      - production
  endpoints:
    - port: http
      path: /actuator/prometheus
      interval: 15s
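A ServiceMonitor that silently selects nothing is a common stumbling block, and it usually comes down to label matching. The check behind `matchLabels` is simple: every key/value pair in the selector must be present on the object, and extra labels on the object are ignored. A small sketch, with hypothetical Service labels:

```python
def selector_matches(match_labels, object_labels):
    """True when every key/value pair in match_labels is present on the object."""
    return all(object_labels.get(key) == value for key, value in match_labels.items())


selector = {"app": "myapp"}  # spec.selector.matchLabels from the ServiceMonitor

# Hypothetical labels on two Services in the production namespace.
myapp_service = {"app": "myapp", "tier": "backend"}
other_service = {"app": "checkout", "tier": "backend"}

print(selector_matches(selector, myapp_service))  # True
print(selector_matches(selector, other_service))  # False
```

The same rule applies one level up, where the Prometheus resource selects PrometheusRule and ServiceMonitor objects by their own labels, which is why the `release` label matters.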
FAQ
- How long does Prometheus retain data by default?
- 15 days by default. Adjust with --storage.tsdb.retention.time=30d or by size: --storage.tsdb.retention.size=50GB. For long-term retention (months/years), use Thanos or Cortex to ship data to object storage (S3, GCS) while keeping Prometheus as a short-term local cache.
- What is the difference between rate() and irate() in PromQL?
- rate() computes the per-second rate over the entire range window, averaging over all data points; it is smooth and suitable for alerts and dashboards. irate() uses only the last two data points; it is more responsive to sudden spikes but noisier. Use rate() for alerting rules (you want smooth, stable values) and consider irate() only for exploratory queries where you want to see momentary spikes.
- How do I avoid alert fatigue?
- Set meaningful for: durations (at least 2–5 minutes) so transient spikes do not page anyone. Use AlertManager's group_wait and group_interval to batch related alerts. Implement inhibition rules so a critical alert suppresses related warning alerts. Review your alert backlog monthly and silence or delete alerts that never require action.
- What is the Pushgateway and when should I use it?
- The Pushgateway is for batch jobs that run and exit before Prometheus can scrape them. The job pushes its metrics to the Pushgateway, which holds them until Prometheus scrapes it. Do not use Pushgateway for long-running services; use it only for cron jobs, one-off scripts, or CI pipeline steps that need to report metrics.
- How do I monitor multiple Kubernetes clusters with one Grafana?
- Add each cluster's Prometheus as a separate datasource in Grafana (with a distinct name like prometheus-prod, prometheus-staging). Create a Grafana variable named datasource of type "Data source" filtered to Prometheus; dashboards that use ${datasource} in their queries will work across all clusters with a single dropdown switch.
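To see why rate() is smoother than irate(), it helps to compute both by hand on the same counter samples. The sketch below is a simplification under stated assumptions: it treats any drop in value as a counter reset and skips the extrapolation to the window boundaries that the real rate() performs. The sample data is made up.

```python
def rate(samples):
    """Average per-second increase across all (timestamp, value) samples in the window."""
    increase = 0.0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        # A drop in value means the counter was reset and restarted from zero.
        increase += (current - previous) if current >= previous else current
    return increase / (samples[-1][0] - samples[0][0])


def irate(samples):
    """Per-second increase between the last two samples only."""
    (t1, v1), (t2, v2) = samples[-2], samples[-1]
    delta = (v2 - v1) if v2 >= v1 else v2
    return delta / (t2 - t1)


# Hypothetical counter scraped every 15 s: steady 1 req/s, then a burst in the last interval.
WINDOW = [(0, 100), (15, 115), (30, 130), (45, 145), (60, 445)]

print(rate(WINDOW))   # 5.75 requests/s averaged over the minute
print(irate(WINDOW))  # 20.0 requests/s, reflecting only the burst
```

An alert threshold of 10 requests/s would fire on the irate() value and stay quiet on the rate() value, which illustrates the stability argument for using rate() in alerting rules.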