Kubernetes Thanos: Long-Term Prometheus Metrics Storage

Prometheus is the de facto monitoring standard for Kubernetes, but it was designed for short-term local storage — its local TSDB is not meant to retain months or years of data, and running multiple Prometheus instances across clusters gives you fragmented views that are hard to query together. Thanos solves both problems by adding a sidecar to each Prometheus pod that ships blocks to object storage (S3, GCS, Azure Blob), and providing a global Querier that merges results from all Prometheus instances in real time. The result is unlimited long-term retention, a single query endpoint across all clusters, and Prometheus HA without data gaps.

Thanos Architecture Overview
Deploying the Thanos Sidecar
Object Store Configuration
Thanos Store Gateway
Thanos Querier: Global Query View
Compactor and Downsampling
Thanos Ruler for Global Alerting
Production Sizing and Best Practices

Thanos Architecture Overview

Thanos is a set of loosely coupled components that each handle one aspect of the long-term metrics problem. Understanding their roles before deployment prevents configuration mistakes:

Sidecar — runs next to each Prometheus pod. It exposes Prometheus data over gRPC (for real-time queries) and uploads completed 2-hour TSDB blocks to object storage.
Store Gateway — reads historical blocks from object storage and exposes them over the same gRPC API as the Sidecar. The Querier cannot tell the difference between a Sidecar and a Store Gateway endpoint.
Querier — receives PromQL queries, fans them out to all registered Store endpoints (Sidecars + Store Gateways), deduplicates overlapping results (for HA Prometheus pairs), and returns a merged response.
Compactor — a single-instance job that compacts small blocks in object storage into larger ones and produces 5-minute and 1-hour downsampled versions for faster long-range queries.
Ruler — evaluates recording rules and alerting rules against the global Querier view, enabling cross-cluster alerts that Prometheus cannot express alone.
Receive — accepts Prometheus remote-write traffic and writes to object storage directly, useful when sidecars are not feasible (e.g., Prometheus in different networks).

Note: Only one Compactor should run per object storage bucket. The Compactor is not concurrency-safe, so multiple instances working on the same bucket can produce overlapping blocks and corrupt data. Run it as a Deployment or StatefulSet with exactly one replica, or as a CronJob that forbids concurrent runs.

Deploying the Thanos Sidecar

The Thanos Sidecar runs as an additional container in your Prometheus pod. It mounts the same data directory as Prometheus so it can upload completed blocks, and it talks to Prometheus over its HTTP API to read external labels and serve real-time queries.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
  namespace: monitoring
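spec:
  serviceName: prometheus-headless   # headless Service that the Querier resolves via DNS SRV
  replicas: 2
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus              # must match spec.selector
    spec: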
      containers:
        - name: prometheus
          image: prom/prometheus:v2.51.0
          args:
            - --config.file=/etc/prometheus/prometheus.yml
            - --storage.tsdb.path=/prometheus
            - --storage.tsdb.min-block-duration=2h
            - --storage.tsdb.max-block-duration=2h   # Required for Thanos
            - --web.enable-lifecycle
          volumeMounts:
            - name: prometheus-data
              mountPath: /prometheus

        - name: thanos-sidecar
          image: quay.io/thanos/thanos:v0.35.0
          args:
            - sidecar
            - --prometheus.url=http://localhost:9090
            - --tsdb.path=/prometheus
            - --objstore.config-file=/etc/thanos/objstore.yml
            - --grpc-address=0.0.0.0:10901
            - --http-address=0.0.0.0:10902
          ports:
            - name: grpc
              containerPort: 10901
            - name: http
              containerPort: 10902
          volumeMounts:
            - name: prometheus-data
              mountPath: /prometheus
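            - name: thanos-objstore
              mountPath: /etc/thanos
      volumes:
        # Secret holding objstore.yml (see Object Store Configuration)
        - name: thanos-objstore
          secret:
            secretName: thanos-objstore
  volumeClaimTemplates: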
    - metadata:
        name: prometheus-data
      spec:
        accessModes: [ReadWriteOnce]
        resources:
          requests:
            storage: 50Gi
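
The Sidecar will not upload blocks unless Prometheus defines external labels, and deduplication across the two replicas needs a label whose value differs per replica. A minimal sketch of the matching prometheus.yml fragment follows. The cluster name and the POD_NAME variable are assumptions for illustration: the sketch assumes POD_NAME is injected from metadata.name through the Downward API and that Prometheus is started with --enable-feature=expand-external-labels so the variable is expanded.

```yaml
# prometheus.yml (fragment): external labels that identify this Prometheus.
# Assumption: POD_NAME comes from the Downward API (metadata.name) and
# Prometheus runs with --enable-feature=expand-external-labels.
global:
  scrape_interval: 30s                # illustrative value
  external_labels:
    cluster: production               # hypothetical cluster name
    prometheus_replica: ${POD_NAME}   # unique per replica, e.g. prometheus-0
```

With a StatefulSet named prometheus, the two replicas would then report prometheus_replica values of prometheus-0 and prometheus-1.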

Object Store Configuration

Thanos uses a YAML configuration file to specify the object storage backend. The same config file is shared by the Sidecar, Store Gateway, and Compactor — store it as a Kubernetes Secret.

# objstore.yml (S3)
type: S3
config:
  bucket: my-thanos-metrics
  endpoint: s3.us-east-1.amazonaws.com
  region: us-east-1
  # For EKS with IRSA, omit access_key/secret_key
  # and annotate the ServiceAccount with the IAM role ARN

# objstore.yml (GCS alternative)
# type: GCS
# config:
#   bucket: my-thanos-metrics
#   service_account: |
#     { ...GCP service account JSON... }

# Create the Secret from whichever objstore.yml you use
kubectl create secret generic thanos-objstore \
  --from-file=objstore.yml=./objstore.yml \
  -n monitoring

For AWS, attach an IAM policy that grants s3:ListBucket on the bucket and s3:GetObject, s3:PutObject, and s3:DeleteObject on the objects in it. Use IRSA on EKS to avoid embedding credentials in the Secret.
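
As a sketch of the IRSA approach, the ServiceAccount below carries the role annotation that EKS uses to hand the pod temporary credentials. The ServiceAccount name, account ID, and role name are placeholders, not values from this setup.

```yaml
# ServiceAccount for pods that read or write the metrics bucket.
# Assumption: the name, account ID, and role name are placeholders.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: thanos
  namespace: monitoring
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/thanos-objstore
```

Each pod that talks to the bucket would then set serviceAccountName: thanos in its pod spec.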

Thanos Store Gateway

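The Store Gateway serves historical data from object storage. It keeps a small local cache of block index data so that queries do not have to scan the bucket each time. For large deployments with terabytes of metrics, run multiple Store Gateway instances and shard the blocks between them, for example by time range (--min-time and --max-time) or by relabeling on block labels (--selector.relabel-config).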

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: thanos-store
  namespace: monitoring
spec:
  serviceName: thanos-store
  replicas: 2
  selector:
    matchLabels:
      app: thanos-store
  template:
    metadata:
      labels:
        app: thanos-store   # must match spec.selector
    spec:
      containers:
        - name: thanos-store
          image: quay.io/thanos/thanos:v0.35.0
          args:
            - store
            - --data-dir=/var/thanos/store
            - --objstore.config-file=/etc/thanos/objstore.yml
            - --grpc-address=0.0.0.0:10901
            - --http-address=0.0.0.0:10902
            - --index-cache-size=500MB
            - --chunk-pool-size=2GB
          ports:
            - name: grpc
              containerPort: 10901
          volumeMounts:
            - name: store-data
              mountPath: /var/thanos/store
            - name: thanos-objstore
              mountPath: /etc/thanos
          resources:
            requests:
              cpu: 500m
              memory: 1Gi
            limits:
              cpu: 2
              memory: 4Gi
      volumes:
        # Secret holding objstore.yml
        - name: thanos-objstore
          secret:
            secretName: thanos-objstore
  volumeClaimTemplates:
    - metadata:
        name: store-data
      spec:
        accessModes: [ReadWriteOnce]
        resources:
          requests:
            storage: 10Gi
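
One way to shard Store Gateways is by time range, so that each instance loads only part of the bucket. The sketch below shows the args for two separate StatefulSets. The names thanos-store-recent and thanos-store-archive and the 8-week boundary are assumptions chosen for illustration.

```yaml
# Args fragment for a hypothetical "thanos-store-recent" StatefulSet:
# serves only blocks from the last 8 weeks.
args:
  - store
  - --data-dir=/var/thanos/store
  - --objstore.config-file=/etc/thanos/objstore.yml
  - --min-time=-8w
---
# Args fragment for a hypothetical "thanos-store-archive" StatefulSet:
# serves only blocks older than 8 weeks.
args:
  - store
  - --data-dir=/var/thanos/store
  - --objstore.config-file=/etc/thanos/objstore.yml
  - --max-time=-8w
```

The Querier has to be pointed at both sets of Store Gateways so that a query spanning the boundary still sees all of the data.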

Thanos Querier: Global Query View

The Querier is the entry point for all PromQL queries. It discovers Store endpoints via static configuration, DNS service discovery, or file-based service discovery, then fans queries out and merges the results.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: thanos-querier
  namespace: monitoring
spec:
  replicas: 2
  selector:
    matchLabels:
      app: thanos-querier
  template:
    metadata:
      labels:
        app: thanos-querier   # must match spec.selector
    spec:
      containers:
        - name: thanos-querier
          image: quay.io/thanos/thanos:v0.35.0
          args:
            - query
            - --http-address=0.0.0.0:9090
            - --grpc-address=0.0.0.0:10901
            - --query.replica-label=prometheus_replica
            # --endpoint replaces the deprecated --store flag
            # Sidecar endpoints (one per Prometheus replica)
            - --endpoint=dnssrv+_grpc._tcp.prometheus-headless.monitoring.svc
            # Store Gateway endpoint
            - --endpoint=dnssrv+_grpc._tcp.thanos-store.monitoring.svc
            - --query.auto-downsampling
          ports:
            - name: http
              containerPort: 9090

Point your Grafana data source at the Querier's HTTP service. The Querier exposes the same HTTP API as Prometheus, so existing Grafana dashboards work unchanged. The --query.replica-label flag tells the Querier to deduplicate series that differ only in the specified label, which prevents duplicate data points from HA Prometheus pairs. For this to work, each Prometheus replica must set a distinct prometheus_replica external label.
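
The dnssrv+ addresses in the Querier args only resolve if matching headless Services exist with a port named grpc. A minimal sketch for the Sidecar side follows, assuming the Prometheus pods carry the label app: prometheus; an equivalent Service named thanos-store would be needed for the Store Gateway.

```yaml
# Headless Service that publishes one SRV record per Prometheus pod.
apiVersion: v1
kind: Service
metadata:
  name: prometheus-headless
  namespace: monitoring
spec:
  clusterIP: None          # headless: DNS returns records for each pod
  selector:
    app: prometheus        # assumed pod label
  ports:
    - name: grpc           # provides the _grpc part of the SRV name
      port: 10901
      targetPort: grpc     # the Sidecar's named container port
```

Because the Service is headless, the Querier sees every Sidecar individually instead of a single load-balanced address, which it needs in order to query and deduplicate both replicas.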

Compactor and Downsampling

The Compactor reduces object storage costs and speeds up long-range queries by merging small 2-hour blocks into larger blocks and creating 5-minute and 1-hour downsampled versions. Run it as a single-instance Deployment with a persistent volume for temporary work.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: thanos-compactor
  namespace: monitoring
spec:
  replicas: 1   # MUST be 1 — multiple compactors corrupt data
  selector:
    matchLabels:
      app: thanos-compactor
  template:
    metadata:
      labels:
        app: thanos-compactor   # must match spec.selector
    spec:
      containers:
        - name: thanos-compactor
          image: quay.io/thanos/thanos:v0.35.0
          args:
            - compact
            - --wait
            - --objstore.config-file=/etc/thanos/objstore.yml
            - --data-dir=/var/thanos/compact
            - --retention.resolution-raw=30d
            - --retention.resolution-5m=90d
            - --retention.resolution-1h=365d
            - --http-address=0.0.0.0:10902
          volumeMounts:
            - name: compact-data
              mountPath: /var/thanos/compact
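            - name: thanos-objstore
              mountPath: /etc/thanos
      volumes:
        - name: compact-data
          persistentVolumeClaim:
            claimName: thanos-compact-data   # assumed PVC name; create it separately
        - name: thanos-objstore
          secret:
            secretName: thanos-objstore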

Retention tiers: Raw resolution is kept for 30 days (for precise short-term queries), 5-minute downsamples for 90 days (good enough for weekly trends), and 1-hour downsamples for 1 year (for annual capacity planning). Downsampling does not shrink the raw blocks; the savings come from expiring raw and 5-minute data early, which can reduce storage costs by as much as 80–90% compared with keeping raw data indefinitely.
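
A Deployment's default RollingUpdate strategy can briefly run the old and new pod side by side during a rollout, which would put two Compactors on the same bucket. One way to avoid that, sketched here as a fragment of the Compactor Deployment, is the Recreate strategy.

```yaml
# Deployment fragment: stop the old Compactor pod before starting the new one.
spec:
  replicas: 1
  strategy:
    type: Recreate
```

The trade-off is a short gap with no Compactor running during each rollout, which is harmless because compaction simply resumes afterwards.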

Thanos Ruler for Global Alerting

Prometheus alerting rules only see the metrics Prometheus itself scraped. Thanos Ruler evaluates rules against the global Querier, enabling cross-cluster alerts such as "alert if the total error rate across all clusters exceeds 1%."

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: thanos-ruler
  namespace: monitoring
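spec:
  serviceName: thanos-ruler
  replicas: 1
  selector:
    matchLabels:
      app: thanos-ruler
  template:
    metadata:
      labels:
        app: thanos-ruler   # must match spec.selector
    spec: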
      containers:
        - name: thanos-ruler
          image: quay.io/thanos/thanos:v0.35.0
          args:
            - rule
            - --data-dir=/var/thanos/ruler
            - --eval-interval=1m
            - --rule-file=/etc/thanos/rules/*.yml
            - --alertmanagers.url=http://alertmanager.monitoring.svc:9093
            - --query=http://thanos-querier.monitoring.svc:9090
            - --objstore.config-file=/etc/thanos/objstore.yml
            - --label=ruler_cluster="production"
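
The cross-cluster alert described above could look like the rule file below, placed where --rule-file can find it (for example, mounted from a ConfigMap). This is a sketch only: the metric http_requests_total, its status label, the 10-minute hold time, and the severity value are assumptions, not part of this setup.

```yaml
# /etc/thanos/rules/global.yml
# Assumption: services expose http_requests_total with a "status" label.
groups:
  - name: global-error-rate
    rules:
      - alert: GlobalErrorRateHigh
        # Ratio of 5xx requests to all requests, summed over every cluster
        # the Querier can see.
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m]))
            /
          sum(rate(http_requests_total[5m]))
            > 0.01
        for: 10m
        labels:
          severity: page
        annotations:
          summary: Error rate across all clusters is above 1%
```

Because both sums drop the cluster label, the expression only works when evaluated against the global view; a single Prometheus would compute the ratio for its own cluster alone.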

Production Sizing and Best Practices

Key considerations for running Thanos reliably at scale:

Store Gateway memory — as a rough starting point, allocate 1–2 GB per 100 million series stored in object storage, then adjust based on observed usage. The index cache (--index-cache-size) sharply reduces S3 API calls, so size it generously.
Querier timeout — set --query.timeout=5m for long-range queries. The default is 2 minutes, which may be too short for year-long PromQL aggregations.
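Block upload verification — monitor the thanos_sidecar_prometheus_up metric. If it drops to 0, the Sidecar has lost its connection to Prometheus and block uploads may fall behind. Also watch thanos_shipper_upload_failures_total, which increases when uploads to object storage fail.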
Object storage costs — enable S3 Intelligent-Tiering on the metrics bucket. Older, infrequently accessed blocks will automatically move to cheaper storage tiers.
Grafana data source — configure the Querier as a Prometheus-compatible data source. Set the scrape interval to match your Prometheus scrape interval to avoid gaps in panels.
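
The monitoring advice above can be turned into alerts. The rule group below is a minimal sketch: the alert names, hold times, and severities are assumptions, and the thanos_compact_halted check is an addition that flags a Compactor that has stopped after an unrecoverable error.

```yaml
groups:
  - name: thanos-health
    rules:
      - alert: ThanosSidecarPrometheusDown
        # The Sidecar can no longer reach its Prometheus.
        expr: thanos_sidecar_prometheus_up == 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: Thanos Sidecar has lost its connection to Prometheus
      - alert: ThanosCompactHalted
        # The Compactor has halted and is no longer compacting or downsampling.
        expr: thanos_compact_halted == 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: Thanos Compactor has halted
```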

# Useful Thanos debugging commands
# Check what blocks are in object storage
thanos tools bucket ls --objstore.config-file=objstore.yml

# Verify block integrity
thanos tools bucket verify --objstore.config-file=objstore.yml

# Check Querier health
curl http://thanos-querier:9090/-/healthy