Kubernetes Multi-Cluster Management with Rancher

Rancher is an open-source multi-cluster Kubernetes management platform from SUSE that provides a unified control plane for provisioning, securing, and operating Kubernetes clusters across on-premises data centres, public clouds, and edge locations. Rather than SSH-ing into each cluster or juggling multiple kubeconfig files, Rancher gives platform teams a single dashboard and API to manage RBAC, deploy workloads with Fleet GitOps, enforce policies, and collect metrics — all from one place. This guide covers installing Rancher, importing and provisioning clusters, and using its key features in production.

Rancher Architecture

Rancher runs as a set of pods on a dedicated management Kubernetes cluster (the "local" cluster). It communicates with downstream clusters via an authenticated WebSocket tunnel established by the Rancher agent — meaning downstream clusters do not need to expose their API servers to the internet.

Key components:

  • Rancher Server — the central API server and UI. Runs on the local cluster. By default, UI actions, Fleet operations, and kubectl commands issued with Rancher-generated kubeconfigs flow through it.
  • Cattle Cluster Agent — deployed in each downstream cluster. Establishes the tunnel back to Rancher Server and executes instructions from it.
  • Fleet Manager — built-in GitOps engine. Watches Git repositories and applies manifests, Helm charts, or Kustomize configs to target clusters based on label selectors.
  • Rancher Monitoring — installs kube-prometheus-stack in each cluster and provides pre-built Grafana dashboards for cluster health, node metrics, and workload status.

Note: Rancher Server itself must run on a highly available Kubernetes cluster (minimum 3 nodes with an etcd quorum). Running Rancher on Docker (rancher/rancher in a single container) is only suitable for evaluation: it does not support HA, and moving its data to a proper cluster installation later requires a backup-and-restore migration rather than an in-place upgrade.

Installing Rancher with Helm

Rancher is installed on a dedicated Kubernetes cluster — typically a small 3-node RKE2 or k3s cluster. You need cert-manager installed first for TLS certificate management.

# Install cert-manager
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm upgrade --install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --set crds.enabled=true

# Add Rancher Helm repo (stable channel)
helm repo add rancher-stable https://releases.rancher.com/server-charts/stable
helm repo update

# Install Rancher ("admin" below is a placeholder; set a strong bootstrap password)
helm upgrade --install rancher rancher-stable/rancher \
  --namespace cattle-system \
  --create-namespace \
  --set hostname=rancher.example.com \
  --set bootstrapPassword=admin \
  --set ingress.tls.source=letsEncrypt \
  --set letsEncrypt.email=ops@example.com \
  --set letsEncrypt.ingress.class=nginx \
  --set replicas=3

# Wait for rollout
kubectl rollout status deployment/rancher -n cattle-system
# Verify Rancher pods
kubectl get pods -n cattle-system
# rancher-xxx   1/1   Running  (x3)

# Retrieve the bootstrap password (Rancher generates one if bootstrapPassword is not set)
kubectl get secret --namespace cattle-system bootstrap-secret \
  -o jsonpath='{.data.bootstrapPassword}' | base64 -d
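
Before logging in for the first time, it can help to confirm that the ingress and certificate are actually serving traffic. The sketch below polls Rancher's /ping endpoint until it answers "pong". The function name wait_for_rancher, the retry count, and the delay are illustrative assumptions, not part of Rancher.

```shell
# Poll the Rancher /ping endpoint until it responds with "pong".
# wait_for_rancher is a hypothetical helper; attempts and delay are arbitrary defaults.
wait_for_rancher() {
  local host="$1"
  local attempts="${2:-30}"
  local delay="${3:-10}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: fail on HTTP errors, -sS: quiet but report errors, --max-time: bound each try
    if [ "$(curl -fsS --max-time 5 "https://${host}/ping" 2>/dev/null)" = "pong" ]; then
      echo "Rancher at ${host} is reachable"
      return 0
    fi
    echo "attempt ${i}/${attempts}: not ready, retrying in ${delay}s" >&2
    sleep "$delay"
    i=$((i + 1))
  done
  echo "Rancher at ${host} did not become ready" >&2
  return 1
}
```

A typical call would be wait_for_rancher rancher.example.com, placed between the rollout check and the first login.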

Importing Existing Clusters

Rancher can manage any CNCF-conformant Kubernetes cluster — EKS, AKS, GKE, or self-managed — by importing it. Importing deploys the Cattle Cluster Agent into the downstream cluster via a kubectl command that Rancher generates.

# In the Rancher UI: Cluster Management > Import Existing
# Rancher generates a command like this:

kubectl apply -f https://rancher.example.com/v3/import/xyz-cluster-token.yaml

# Or using the Rancher CLI
rancher login https://rancher.example.com --token my-api-token
rancher cluster import my-eks-cluster

After the agent connects, the cluster typically appears in the Rancher UI within 30–60 seconds. You can then manage namespaces, deploy workloads, and configure RBAC from the central dashboard without maintaining a separate kubeconfig file per cluster. When a user does download a kubeconfig, Rancher generates one whose token is scoped to what that user is allowed to access; how long the token lives depends on Rancher's token TTL settings.
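
If the cluster does not show up, the first thing to check is whether the agent rolled out in the downstream cluster. A minimal sketch, assuming the kubeconfig context used for the import command is still available; check_cluster_agent and the context name are hypothetical:

```shell
# Check that the Cattle Cluster Agent deployment is running in a downstream cluster.
# check_cluster_agent is a hypothetical helper; the first argument is a kubeconfig context name.
check_cluster_agent() {
  local context="$1"
  local timeout="${2:-120s}"
  kubectl --context "$context" --namespace cattle-system \
    rollout status deployment/cattle-cluster-agent --timeout="$timeout"
}
```

For example, check_cluster_agent my-eks-context waits up to two minutes and returns non-zero if the agent never becomes available.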

Provisioning New Clusters

Rancher can provision new clusters on AWS (EKS), Azure (AKS), GCP (GKE), or on VMs using RKE2. Cloud credentials are stored as Secrets on the local cluster (in the cattle-global-data namespace in the example below) and referenced by name in cluster definitions.

# Rancher cluster provisioning via the Provisioning API (v2)
apiVersion: provisioning.cattle.io/v1
kind: Cluster
metadata:
  name: prod-eks-cluster
  namespace: fleet-default
spec:
  cloudCredentialSecretName: cattle-global-data:aws-credentials
  kubernetesVersion: v1.30.0
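  rkeConfig: {}   # RKE2/K3s settings go here; left empty in this skeleton
  # For hosted EKS clusters (illustrative; in current Rancher releases the
  # eksConfig block belongs to the management.cattle.io/v3 Cluster spec):
  # eksConfig: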
  #   region: us-east-1
  #   nodeGroups:
  #     - nodegroupName: workers
  #       instanceType: m5.xlarge
  #       desiredSize: 3
  #       minSize: 2
  #       maxSize: 10

Multi-Cluster RBAC and Projects

Rancher adds two abstraction layers above Kubernetes RBAC: Clusters and Projects. A Project is a group of namespaces within a single cluster. You assign users to clusters or projects with Rancher roles that map to underlying Kubernetes ClusterRole and Role bindings.

# Rancher ClusterRoleTemplateBinding — grant a user cluster-owner on a cluster
apiVersion: management.cattle.io/v3
kind: ClusterRoleTemplateBinding
metadata:
  name: dev-team-binding
  namespace: c-m-xxxxxxx   # cluster namespace in Rancher
spec:
  clusterName: c-m-xxxxxxx
  roleTemplateName: cluster-owner
  userPrincipalName: local://user-abc123

---
# ProjectRoleTemplateBinding — grant read-only access to a project
apiVersion: management.cattle.io/v3
kind: ProjectRoleTemplateBinding
metadata:
  name: qa-readonly
  namespace: p-xxxxxxx
spec:
  projectName: c-m-xxxxxxx:p-xxxxxxx
  roleTemplateName: read-only
  userPrincipalName: local://user-def456

Rancher ships with built-in role templates: cluster-owner, cluster-member, project-owner, project-member, and read-only. You can also create custom role templates that grant specific Kubernetes RBAC permissions across all managed clusters simultaneously — a major operational advantage over managing RBAC in each cluster independently.
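
Because these bindings are ordinary resources on the local cluster, they are easy to template. The sketch below prints a ProjectRoleTemplateBinding using the same fields as the manifest above; render_prtb is a hypothetical helper, and the IDs passed to it are placeholders.

```shell
# Print a ProjectRoleTemplateBinding manifest for one user and one project.
# render_prtb is a hypothetical helper; field layout mirrors the example manifest.
render_prtb() {
  local name="$1" cluster_id="$2" project_id="$3" role="$4" principal="$5"
  cat <<EOF
apiVersion: management.cattle.io/v3
kind: ProjectRoleTemplateBinding
metadata:
  name: ${name}
  namespace: ${project_id}
spec:
  projectName: ${cluster_id}:${project_id}
  roleTemplateName: ${role}
  userPrincipalName: ${principal}
EOF
}
```

The output can be piped to kubectl apply -f - against the local cluster, for instance inside a loop over several projects that should share the same read-only group.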

Fleet: GitOps for Multiple Clusters

Fleet is Rancher's built-in GitOps engine. You create GitRepo resources that point to a Git repository, and Fleet applies the manifests to clusters matching label selectors. This makes it trivial to push the same application to dozens of clusters simultaneously.

apiVersion: fleet.cattle.io/v1alpha1
kind: GitRepo
metadata:
  name: platform-apps
  namespace: fleet-default
spec:
  repo: https://github.com/myorg/platform-configs
  branch: main
  paths:
    - /monitoring
    - /security-policies
  targets:
    - name: all-prod-clusters
      clusterSelector:
        matchLabels:
          env: production
    - name: staging
      clusterSelector:
        matchLabels:
          env: staging
      # Per-target Helm value overrides are not part of the GitRepo target;
      # they go in a fleet.yaml inside each path, under targetCustomizations.

Fleet supports plain Kubernetes YAML, Helm charts, Kustomize overlays, and combinations of all three within a single repository. Each cluster gets its own status tracked in the Fleet UI — you can see at a glance which clusters are synced, drifted, or errored.
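
Per-cluster differences are usually expressed in a fleet.yaml file that sits next to the manifests in each path. The sketch below writes one with a default Helm value and a staging override. The helper name write_fleet_yaml, the monitoring namespace, and the replicas value key are assumptions for illustration; the real keys depend on the chart being deployed.

```shell
# Write a fleet.yaml with a default Helm value and a staging-only override.
# write_fleet_yaml is a hypothetical helper; "monitoring" and "replicas" are placeholders.
write_fleet_yaml() {
  local dir="$1"
  mkdir -p "$dir"
  cat > "${dir}/fleet.yaml" <<'EOF'
defaultNamespace: monitoring
helm:
  values:
    replicas: 3
targetCustomizations:
  - name: staging
    clusterSelector:
      matchLabels:
        env: staging
    helm:
      values:
        replicas: 1
EOF
}
```

Running write_fleet_yaml monitoring inside the repository and committing the result lets clusters labelled env=staging receive the override while all other targets keep the default.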

Rancher Monitoring and Alerting

The Rancher Monitoring app installs kube-prometheus-stack (Prometheus, Alertmanager, Grafana, and exporters) into a downstream cluster with one click from the Rancher App Catalog. It comes with pre-built dashboards for node health, pod resource usage, etcd health, and network traffic.

# Install via Rancher CLI (or use the UI App Catalog)
rancher app install \
  --namespace cattle-monitoring-system \
  --values monitoring-values.yaml \
  rancher-monitoring

# Custom values to increase Prometheus retention
# monitoring-values.yaml:
# prometheus:
#   prometheusSpec:
#     retention: 15d
#     storageSpec:
#       volumeClaimTemplate:
#         spec:
#           storageClassName: gp3
#           resources:
#             requests:
#               storage: 100Gi

Multi-cluster metrics: Each cluster gets its own Prometheus instance managed by Rancher. For a unified view across all clusters, deploy Thanos Sidecar alongside each Prometheus instance and point a central Thanos Querier at all of them. Rancher does not federate metrics automatically.
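
To make the Querier side concrete, the sketch below builds the argument list for thanos query from a set of sidecar gRPC addresses. thanos_query_args is a hypothetical helper and the addresses are placeholders; 10901 is the sidecar's default gRPC port, and older Thanos releases use --store in place of --endpoint.

```shell
# Build "thanos query" arguments from a list of Thanos Sidecar gRPC addresses.
# thanos_query_args is a hypothetical helper; each argument is host:port of one sidecar.
thanos_query_args() {
  local addr
  printf '%s' 'query'
  for addr in "$@"; do
    printf ' --endpoint=%s' "$addr"
  done
  printf '\n'
}
```

For example, thanos_query_args thanos-prod-1.example.com:10901 thanos-prod-2.example.com:10901 prints the arguments for a Querier that fans out to two clusters. How the sidecar ports are exposed between clusters is left out here.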

Production Hardening

Checklist for running Rancher in production:

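  • HA Rancher Server — run 3 Rancher replicas behind a load balancer. Rancher keeps its state in the local cluster's Kubernetes datastore rather than in a database of its own, so that datastore must be highly available as well: a three-member etcd on RKE2, or on K3s either embedded etcd or an external datastore such as PostgreSQL or MySQL.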
  • Rancher backup — install the Rancher Backup Operator and schedule daily snapshots of Rancher configuration to S3. Without backups, a Rancher Server failure means re-importing all clusters and re-creating all RBAC bindings manually.
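  • Audit logging — enable Rancher API audit logging to record API calls for compliance and forensics. With a Helm install this is the chart value auditLog.level (for example --set auditLog.level=1), which corresponds to the server's --audit-level flag; level 1 records request metadata, and higher levels add request and response bodies.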
  • Network policy — restrict access to cattle-system namespace. Only the ingress controller and Rancher agents should reach the Rancher API.
  • Version pinning — pin the Helm chart version and Rancher image tag. Do not use latest tags in production. Rancher releases are tested against specific Kubernetes version ranges.
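
# Install Rancher Backup Operator from the rancher-charts repo (CRD chart first)
helm repo add rancher-charts https://charts.rancher.io
helm upgrade --install rancher-backup-crd rancher-charts/rancher-backup-crd \
  --namespace cattle-resources-system --create-namespace
helm upgrade --install rancher-backup rancher-charts/rancher-backup \
  --namespace cattle-resources-system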

# Create a daily backup to S3
kubectl apply -f - <<EOF
apiVersion: resources.cattle.io/v1
kind: Backup
metadata:
  name: daily-backup
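spec:
  # ResourceSet installed by the operator; the name can differ between operator versions
  resourceSetName: rancher-resource-set
  storageLocation:
    s3:
      bucketName: rancher-backups
      region: us-east-1
      endpoint: s3.us-east-1.amazonaws.com
      credentialSecretName: s3-creds
      credentialSecretNamespace: cattle-resources-system
  schedule: "0 2 * * *"
  retentionCount: 7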
EOF
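
The Backup resource refers to a Secret named s3-creds, which has to exist before the first scheduled run. A minimal sketch of creating it, assuming the operator reads the keys accessKey and secretKey from that Secret and that the credentials are available in the standard AWS environment variables; create_backup_s3_secret is a hypothetical helper:

```shell
# Create the Secret referenced by credentialSecretName in the Backup resource.
# create_backup_s3_secret is a hypothetical helper; credentials come from the
# environment so they are not hard-coded in the script.
create_backup_s3_secret() {
  local name="${1:-s3-creds}"
  local namespace="${2:-cattle-resources-system}"
  : "${AWS_ACCESS_KEY_ID:?AWS_ACCESS_KEY_ID must be set}"
  : "${AWS_SECRET_ACCESS_KEY:?AWS_SECRET_ACCESS_KEY must be set}"
  kubectl create secret generic "$name" \
    --namespace "$namespace" \
    --from-literal=accessKey="$AWS_ACCESS_KEY_ID" \
    --from-literal=secretKey="$AWS_SECRET_ACCESS_KEY"
}
```

Calling create_backup_s3_secret with no arguments uses the secret name and namespace from the Backup manifest above.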