Kubernetes Thanos: Long-Term Prometheus Metrics Storage
Prometheus is the de facto monitoring standard for Kubernetes, but it was designed for short-term local storage: its local TSDB is not meant to retain months or years of data, and running multiple Prometheus instances across clusters gives you fragmented views that are hard to query together. Thanos addresses both problems. A sidecar in each Prometheus pod ships TSDB blocks to object storage (S3, GCS, Azure Blob), and a global Querier merges results from all Prometheus instances at query time. The result is long-term retention bounded only by object storage cost, a single query endpoint across all clusters, and Prometheus HA in which a gap in one replica is filled from the other.
Thanos Architecture Overview
Thanos is a set of loosely coupled components that each handle one aspect of the long-term metrics problem. Understanding their roles before deployment prevents configuration mistakes:
- Sidecar — runs next to each Prometheus pod. It exposes Prometheus data over gRPC (for real-time queries) and uploads completed 2-hour TSDB blocks to object storage.
- Store Gateway — reads historical blocks from object storage and exposes them over the same gRPC API as the Sidecar. The Querier cannot tell the difference between a Sidecar and a Store Gateway endpoint.
- Querier — receives PromQL queries, fans them out to all registered Store endpoints (Sidecars + Store Gateways), deduplicates overlapping results (for HA Prometheus pairs), and returns a merged response.
- Compactor — a single-instance job that compacts small blocks in object storage into larger ones and produces 5-minute and 1-hour downsampled versions for faster long-range queries.
- Ruler — evaluates recording rules and alerting rules against the global Querier view, enabling cross-cluster alerts that Prometheus cannot express alone.
- Receive — accepts Prometheus remote-write traffic, stores it in a local TSDB, and uploads the resulting blocks to object storage. It is useful when sidecars are not feasible (e.g., Prometheus runs in a network the Querier cannot reach).
Deploying the Thanos Sidecar
The Thanos Sidecar runs as an additional container in your Prometheus pod. It mounts the same data directory as Prometheus and communicates with Prometheus via its HTTP API to read TSDB metadata.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
  namespace: monitoring
spec:
  serviceName: prometheus-headless  # headless Service the Querier resolves via DNS SRV
  replicas: 2
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus  # must match spec.selector
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus:v2.51.0
          args:
            - --config.file=/etc/prometheus/prometheus.yml
            - --storage.tsdb.path=/prometheus
            # Equal min and max block durations disable local compaction (required for Thanos)
            - --storage.tsdb.min-block-duration=2h
            - --storage.tsdb.max-block-duration=2h
            - --web.enable-lifecycle
          volumeMounts:
            - name: prometheus-data
              mountPath: /prometheus
        - name: thanos-sidecar
          image: quay.io/thanos/thanos:v0.35.0
          args:
            - sidecar
            - --prometheus.url=http://localhost:9090
            - --tsdb.path=/prometheus
            - --objstore.config-file=/etc/thanos/objstore.yml
            - --grpc-address=0.0.0.0:10901
            - --http-address=0.0.0.0:10902
          ports:
            - name: grpc
              containerPort: 10901
            - name: http
              containerPort: 10902
          volumeMounts:
            - name: prometheus-data
              mountPath: /prometheus
            - name: thanos-objstore
              mountPath: /etc/thanos
      volumes:
        # Secret created in the Object Store Configuration section
        - name: thanos-objstore
          secret:
            secretName: thanos-objstore
  volumeClaimTemplates:
    - metadata:
        name: prometheus-data
      spec:
        accessModes: [ReadWriteOnce]
        resources:
          requests:
            storage: 50Gi
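The Querier configuration in this guide discovers sidecars through DNS SRV records under the name prometheus-headless, so a headless Service is presumably needed alongside the StatefulSet. The following is a minimal sketch, not a definitive manifest: the Service name, the app: prometheus selector, and the grpc port name are taken from the manifests in this guide, and everything else is an assumption.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: prometheus-headless
  namespace: monitoring
spec:
  clusterIP: None        # headless: DNS returns one record per pod instead of a virtual IP
  selector:
    app: prometheus      # must match the pod labels of the Prometheus StatefulSet
  ports:
    - name: grpc         # the port name becomes the _grpc part of the SRV record
      port: 10901
      targetPort: grpc   # the sidecar container port named grpc
```

Because the port is named, Kubernetes DNS publishes SRV records of the form _grpc._tcp.prometheus-headless.monitoring.svc, one per ready pod, which is the name the Querier looks up.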
Object Store Configuration
Thanos uses a YAML configuration file to specify the object storage backend. The same config file is shared by the Sidecar, Store Gateway, and Compactor — store it as a Kubernetes Secret.
# objstore.yml: store as a Kubernetes Secret
type: S3
config:
  bucket: my-thanos-metrics
  endpoint: s3.us-east-1.amazonaws.com
  region: us-east-1
  # For EKS with IRSA, omit access_key/secret_key
  # and annotate the ServiceAccount with the IAM role ARN

# Create the Secret
kubectl create secret generic thanos-objstore \
  --from-file=objstore.yml=./objstore.yml \
  -n monitoring

# For GCS
# type: GCS
# config:
#   bucket: my-thanos-metrics
#   service_account: |
#     { ...GCP service account JSON... }
For AWS, attach an IAM policy that grants s3:ListBucket on the bucket and s3:GetObject, s3:PutObject, and s3:DeleteObject on the objects in it. Use IRSA on EKS to avoid embedding credentials in the Secret.
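With IRSA, the IAM role is attached through an annotation on the ServiceAccount that the Thanos pods run under. A minimal sketch follows; the ServiceAccount name, the account ID, and the role name are placeholders, not values from this guide.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: thanos            # hypothetical name
  namespace: monitoring
  annotations:
    # Placeholder ARN: substitute the role that carries the S3 policy described above
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/thanos-objstore
```

Each pod that talks to the bucket (Sidecar, Store Gateway, Compactor) would then set serviceAccountName: thanos in its pod spec so that the AWS SDK picks up the role credentials.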
Thanos Store Gateway
The Store Gateway serves historical data from object storage. It maintains a small local cache of block indexes to avoid full S3 scans on every query. For large deployments with terabytes of metrics, use multiple Store Gateway replicas with sharding.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: thanos-store
  namespace: monitoring
spec:
  serviceName: thanos-store  # headless Service the Querier resolves via DNS SRV
  replicas: 2
  selector:
    matchLabels:
      app: thanos-store
  template:
    metadata:
      labels:
        app: thanos-store  # must match spec.selector
    spec:
      containers:
        - name: thanos-store
          image: quay.io/thanos/thanos:v0.35.0
          args:
            - store
            - --data-dir=/var/thanos/store
            - --objstore.config-file=/etc/thanos/objstore.yml
            - --grpc-address=0.0.0.0:10901
            - --http-address=0.0.0.0:10902
            - --index-cache-size=500MB
            - --chunk-pool-size=2GB
          ports:
            - name: grpc
              containerPort: 10901
          volumeMounts:
            - name: store-data
              mountPath: /var/thanos/store
            - name: thanos-objstore
              mountPath: /etc/thanos
          resources:
            requests:
              cpu: 500m
              memory: 1Gi
            limits:
              cpu: "2"
              memory: 4Gi
      volumes:
        - name: thanos-objstore
          secret:
            secretName: thanos-objstore
  volumeClaimTemplates:
    - metadata:
        name: store-data
      spec:
        accessModes: [ReadWriteOnce]
        resources:
          requests:
            storage: 10Gi
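Note that two replicas of the manifest above each serve the full bucket, which gives redundancy but not sharding. One way to shard, sketched here under assumptions, is a hashmod relabel rule on the block ID, passed to each Store Gateway through its --selector.relabel-config-file flag. The file name and the choice of two shards are hypothetical, and each shard would run as its own StatefulSet with its own copy of this file.

```yaml
# store-shard-0.yml (hypothetical file name), loaded with
#   --selector.relabel-config-file=/etc/thanos/store-shard-0.yml
- action: hashmod
  source_labels: ["__block_id"]   # hash every block ID ...
  target_label: shard
  modulus: 2                      # ... into one of 2 shards
- action: keep
  source_labels: ["shard"]
  regex: "0"                      # this instance serves shard 0; the second uses "1"
```

Blocks are assigned by hash, so every block lands in exactly one shard and the Querier sees the complete data set once all shards are registered as endpoints.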
Thanos Querier: Global Query View
The Querier is the entry point for all PromQL queries. It discovers Store endpoints via static configuration, DNS service discovery, or file-based service discovery, then fans queries out and merges the results.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: thanos-querier
  namespace: monitoring
spec:
  replicas: 2
  selector:
    matchLabels:
      app: thanos-querier
  template:
    metadata:
      labels:
        app: thanos-querier  # must match spec.selector
    spec:
      containers:
        - name: thanos-querier
          image: quay.io/thanos/thanos:v0.35.0
          args:
            - query
            - --http-address=0.0.0.0:9090
            - --grpc-address=0.0.0.0:10901
            - --query.replica-label=prometheus_replica
            # --endpoint replaces the deprecated --store flag
            # Sidecar endpoints (one per Prometheus replica)
            - --endpoint=dnssrv+_grpc._tcp.prometheus-headless.monitoring.svc
            # Store Gateway endpoint
            - --endpoint=dnssrv+_grpc._tcp.thanos-store.monitoring.svc
            - --query.auto-downsampling
          ports:
            - name: http
              containerPort: 9090
Point your Grafana data source at the Querier's HTTP service. The Querier exposes the same Prometheus HTTP API, so Grafana dashboards work without any changes. The --query.replica-label flag tells the Querier to deduplicate series that differ only in the specified label — preventing duplicate data points from HA Prometheus pairs.
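Deduplication only works if each Prometheus replica actually attaches the label named in --query.replica-label as an external label, with a different value per replica. A minimal sketch of the relevant prometheus.yml fragment follows. The cluster label and its value are hypothetical. The per-pod value assumes a POD_NAME environment variable injected through the Kubernetes downward API and, on Prometheus 2.x, the --enable-feature=expand-external-labels flag; neither appears in the manifests in this guide.

```yaml
# prometheus.yml (fragment)
global:
  external_labels:
    cluster: prod-us-east-1              # hypothetical; should be unique per cluster
    prometheus_replica: "${POD_NAME}"    # expands to e.g. prometheus-0 or prometheus-1
```

With this in place, the two replicas produce series that differ only in prometheus_replica, and the Querier collapses them into one series at query time.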
Compactor and Downsampling
The Compactor reduces object storage costs and speeds up long-range queries by merging small 2-hour blocks into larger blocks and creating 5-minute and 1-hour downsampled versions. Run it as a single-instance Deployment with a persistent volume for temporary work.
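apiVersion: apps/v1
kind: Deployment
metadata:
  name: thanos-compactor
  namespace: monitoring
spec:
  replicas: 1  # Must be 1: concurrent compactors on the same bucket can corrupt blocks
  strategy:
    type: Recreate  # stop the old pod before starting a new one during rollouts
  selector:
    matchLabels:
      app: thanos-compactor
  template:
    metadata:
      labels:
        app: thanos-compactor  # must match spec.selector
    spec:
      containers:
        - name: thanos-compactor
          image: quay.io/thanos/thanos:v0.35.0
          args:
            - compact
            - --wait
            - --objstore.config-file=/etc/thanos/objstore.yml
            - --data-dir=/var/thanos/compact
            - --retention.resolution-raw=30d
            - --retention.resolution-5m=90d
            - --retention.resolution-1h=365d
            - --http-address=0.0.0.0:10902
          volumeMounts:
            - name: compact-data
              mountPath: /var/thanos/compact
            - name: thanos-objstore
              mountPath: /etc/thanos
      volumes:
        - name: compact-data
          persistentVolumeClaim:
            claimName: thanos-compact-data  # hypothetical PVC name; the claim must already exist
        - name: thanos-objstore
          secret:
            secretName: thanos-objstore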
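Because a Deployment has no volumeClaimTemplates, the persistent volume for the compact-data mount has to come from a separately created claim. A minimal sketch, assuming a claim named thanos-compact-data and a size of 100Gi; both are illustrative, and the real size depends on the largest group of blocks the Compactor has to download and merge at once.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: thanos-compact-data   # hypothetical name
  namespace: monitoring
spec:
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 100Gi          # assumption; size for your largest compaction group
```

The pod spec then references the claim through a persistentVolumeClaim volume named compact-data.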
Thanos Ruler for Global Alerting
Prometheus alerting rules only see the metrics Prometheus itself scraped. Thanos Ruler evaluates rules against the global Querier, enabling cross-cluster alerts such as "alert if the total error rate across all clusters exceeds 1%."
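apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: thanos-ruler
  namespace: monitoring
spec:
  serviceName: thanos-ruler  # assumed headless Service name
  replicas: 1
  selector:
    matchLabels:
      app: thanos-ruler
  template:
    metadata:
      labels:
        app: thanos-ruler  # must match spec.selector
    spec:
      containers:
        - name: thanos-ruler
          image: quay.io/thanos/thanos:v0.35.0
          args:
            - rule
            - --data-dir=/var/thanos/ruler
            - --eval-interval=1m
            - --rule-file=/etc/thanos/rules/*.yml
            - --alertmanagers.url=http://alertmanager.monitoring.svc:9093
            - --query=http://thanos-querier.monitoring.svc:9090
            - --objstore.config-file=/etc/thanos/objstore.yml
            - --label=ruler_cluster="production"
          volumeMounts:
            - name: thanos-objstore
              mountPath: /etc/thanos
            - name: rules
              mountPath: /etc/thanos/rules
      volumes:
        - name: thanos-objstore
          secret:
            secretName: thanos-objstore
        - name: rules
          configMap:
            name: thanos-ruler-rules  # hypothetical ConfigMap holding the rule files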
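The cross-cluster error-rate alert described above could be expressed in a rule file such as the following sketch. The metric http_requests_total, its code label, the alert name, and the severity label are all hypothetical stand-ins for whatever your services export. The partial_response_strategy field is a Thanos-specific extension to the Prometheus rule group format.

```yaml
# /etc/thanos/rules/global.yml (hypothetical file name)
groups:
  - name: global-availability
    partial_response_strategy: abort   # fail the evaluation if any store is unreachable
    rules:
      - alert: GlobalErrorRateHigh
        # Ratio of 5xx responses to all responses, summed over every cluster
        expr: |
          sum(rate(http_requests_total{code=~"5.."}[5m]))
            /
          sum(rate(http_requests_total[5m]))
            > 0.01
        for: 10m
        labels:
          severity: page
        annotations:
          summary: "Error rate across all clusters is above 1%"
```

Using abort rather than warn trades availability for correctness: a partial view could understate the global error rate, so the rule fails loudly instead of evaluating on incomplete data.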
Production Sizing and Best Practices
Key considerations for running Thanos reliably at scale:
- Store Gateway memory — allocate at least 1–2 GB per 100 million active series stored in object storage. The Store Gateway's index cache dramatically reduces S3 API calls; size it generously.
- Querier timeout — set --query.timeout=5m for long-range queries. Default is 2 minutes which may be too short for year-long PromQL aggregations.
- Block upload verification — monitor the thanos_sidecar_prometheus_up metric. If it drops to 0, the Sidecar lost connection to Prometheus and block uploads will fall behind.
- Object storage costs — enable S3 Intelligent-Tiering on the metrics bucket. Older, infrequently accessed blocks will automatically move to cheaper storage tiers.
- Grafana data source — configure the Querier as a Prometheus-compatible data source. Set the scrape interval to match your Prometheus scrape interval to avoid gaps in panels.
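The Grafana item above can be captured in a data source provisioning file. This is a sketch under assumptions: the data source name and the 30s interval are illustrative, and the URL assumes a Service named thanos-querier in the monitoring namespace, as used elsewhere in this guide.

```yaml
# grafana/provisioning/datasources/thanos.yml (hypothetical path)
apiVersion: 1
datasources:
  - name: Thanos
    type: prometheus          # the Querier speaks the Prometheus HTTP API
    access: proxy
    url: http://thanos-querier.monitoring.svc:9090
    isDefault: true
    jsonData:
      timeInterval: 30s       # the "Scrape interval" setting; match your Prometheus scrape_interval
```

Grafana uses timeInterval as the lower bound for the query step, so a value smaller than the real scrape interval can produce panels with gaps.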
# Useful Thanos debugging commands
# Check what blocks are in object storage
thanos tools bucket ls --objstore.config-file=objstore.yml
# Verify block integrity
thanos tools bucket verify --objstore.config-file=objstore.yml
# Check Querier health
curl http://thanos-querier:9090/-/healthy
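Alongside these ad hoc checks, the thanos_sidecar_prometheus_up signal mentioned in the sizing notes can be turned into a standing alert. The rule below is a minimal sketch: the alert name, the 5m hold time, and the severity label are assumptions, and the summary assumes your scrape configuration attaches an instance label.

```yaml
groups:
  - name: thanos-sidecar
    rules:
      - alert: ThanosSidecarPrometheusDown   # hypothetical alert name
        expr: thanos_sidecar_prometheus_up == 0
        for: 5m                              # tolerate brief restarts
        labels:
          severity: critical
        annotations:
          summary: "Thanos Sidecar {{ $labels.instance }} cannot reach Prometheus"
```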