Kubernetes Multi-Cluster Management with Rancher
Rancher is an open-source multi-cluster Kubernetes management platform from SUSE that provides a unified control plane for provisioning, securing, and operating Kubernetes clusters across on-premises data centres, public clouds, and edge locations. Rather than SSH-ing into each cluster or juggling multiple kubeconfig files, Rancher gives platform teams a single dashboard and API to manage RBAC, deploy workloads with Fleet GitOps, enforce policies, and collect metrics — all from one place. This guide covers installing Rancher, importing and provisioning clusters, and using its key features in production.
Rancher Architecture
Rancher runs as a set of pods on a dedicated management Kubernetes cluster (the "local" cluster). It communicates with downstream clusters via an authenticated WebSocket tunnel established by the Rancher agent — meaning downstream clusters do not need to expose their API servers to the internet.
Key components:
- Rancher Server — the central API server and UI. Runs on the local cluster. All kubectl commands, UI actions, and Fleet operations flow through it.
- Cattle Cluster Agent — deployed in each downstream cluster. Establishes the tunnel back to Rancher Server and executes instructions from it.
- Fleet Manager — built-in GitOps engine. Watches Git repositories and applies manifests, Helm charts, or Kustomize configs to target clusters based on label selectors.
- Rancher Monitoring — installs kube-prometheus-stack in each cluster and provides pre-built Grafana dashboards for cluster health, node metrics, and workload status.
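Because the agent dials out to Rancher, the main network requirement on the downstream side is outbound HTTPS to the Rancher hostname. The sketch below checks that from a downstream node or pod. It assumes Rancher answers `pong` on its `/ping` path; the function name `rancher_reachable` and the URL are illustrative placeholders.

```shell
#!/usr/bin/env bash
# Sketch: confirm outbound HTTPS from a downstream cluster to Rancher.
# Assumption: Rancher serves a /ping endpoint whose body is "pong".

rancher_reachable() {
  local url="${1:?usage: rancher_reachable https://rancher.example.com}"
  local body
  # --fail turns HTTP errors into a non-zero exit; --max-time bounds the wait.
  body="$(curl --silent --fail --max-time 10 "${url%/}/ping")" || return 1
  [ "$body" = "pong" ]
}

# Example (run from a node or pod inside the downstream cluster):
#   rancher_reachable https://rancher.example.com && echo "Rancher reachable"
```

If this check fails, the agent's tunnel is unlikely to come up either, so it is a cheap first step before debugging the agent itself.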
Installing Rancher with Helm
Rancher is installed on a dedicated Kubernetes cluster — typically a small 3-node RKE2 or k3s cluster. You need cert-manager installed first for TLS certificate management.
# Install cert-manager
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm upgrade --install cert-manager jetstack/cert-manager \
--namespace cert-manager \
--create-namespace \
--set crds.enabled=true
# Add Rancher Helm repo (stable channel)
helm repo add rancher-stable https://releases.rancher.com/server-charts/stable
helm repo update
# Install Rancher (replace the placeholder bootstrap password before running this)
helm upgrade --install rancher rancher-stable/rancher \
--namespace cattle-system \
--create-namespace \
--set hostname=rancher.example.com \
--set bootstrapPassword=admin \
--set ingress.tls.source=letsEncrypt \
--set letsEncrypt.email=ops@example.com \
--set letsEncrypt.ingress.class=nginx \
--set replicas=3
# Wait for rollout
kubectl rollout status deployment/rancher -n cattle-system
# Verify Rancher pods
kubectl get pods -n cattle-system
# rancher-xxx 1/1 Running (x3)
# Get the bootstrap password
kubectl get secret --namespace cattle-system bootstrap-secret \
-o jsonpath='{.data.bootstrapPassword}' | base64 -d
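One way to make the install repeatable is to keep the settings in a values file and pin the chart version. The sketch below reuses the Helm values from the command above. The function names and the version argument are hypothetical, and `bootstrapPassword` is deliberately left out so it does not end up in shell history (this assumes the chart then generates one, which the `bootstrap-secret` lookup retrieves).

```shell
#!/usr/bin/env bash
# Sketch: values-file based Rancher install with a pinned chart version.

write_rancher_values() {
  local file="${1:?values file path required}"
  local hostname="${2:?hostname required}"
  local email="${3:?contact email required}"
  cat > "$file" <<EOF
hostname: ${hostname}
replicas: 3
ingress:
  tls:
    source: letsEncrypt
letsEncrypt:
  email: ${email}
  ingress:
    class: nginx
EOF
}

install_rancher() {
  local values="${1:?values file required}"
  local chart_version="${2:?chart version required}"
  # Pinning --version keeps upgrades explicit instead of following the channel.
  helm upgrade --install rancher rancher-stable/rancher \
    --namespace cattle-system \
    --create-namespace \
    --version "$chart_version" \
    --values "$values"
}

# Example:
#   helm search repo rancher-stable/rancher --versions   # pick a version
#   write_rancher_values rancher-values.yaml rancher.example.com ops@example.com
#   install_rancher rancher-values.yaml <chart-version>
```

Keeping the values file in version control gives a reviewable record of what changed between upgrades.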
Importing Existing Clusters
Rancher can manage any CNCF-conformant Kubernetes cluster — EKS, AKS, GKE, or self-managed — by importing it. Importing deploys the Cattle Cluster Agent into the downstream cluster via a kubectl command that Rancher generates.
# In the Rancher UI: Cluster Management > Import Existing
# Rancher generates a command like this:
kubectl apply -f https://rancher.example.com/v3/import/xyz-cluster-token.yaml
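# Or using the Rancher CLI: register the cluster, then print its import command
rancher login https://rancher.example.com --token my-api-token
rancher cluster create --import my-eks-cluster
rancher cluster import my-eks-cluster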
After the agent connects, the cluster typically appears in the Rancher UI within about a minute. You can then manage namespaces, deploy workloads, and configure RBAC from the central dashboard without handling each cluster's own kubeconfig files. Rancher issues kubeconfig files backed by API tokens that carry only the permissions of the requesting user, and administrators can cap the lifetime of those tokens.
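To confirm from the downstream side that the agent actually rolled out, something like the following can be run against the imported cluster. This is a minimal sketch: the function name and the kubeconfig context `my-eks` are placeholders, while `cattle-cluster-agent` in `cattle-system` is the Deployment the import manifest creates.

```shell
#!/usr/bin/env bash
# Sketch: wait for the Cattle Cluster Agent rollout on a downstream cluster.

wait_for_cluster_agent() {
  local context="${1:?kubeconfig context of the downstream cluster required}"
  local timeout="${2:-180s}"
  kubectl --context "$context" --namespace cattle-system \
    rollout status deployment/cattle-cluster-agent --timeout="$timeout"
}

# Example:
#   wait_for_cluster_agent my-eks 300s
```

A rollout that never completes usually points at image pull or outbound connectivity problems rather than at Rancher itself.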
Provisioning New Clusters
Rancher can provision new clusters on AWS (EKS), Azure (AKS), GCP (GKE), or on VMs using RKE2. Cloud credentials are stored as Rancher Secrets and referenced in cluster definitions.
# Rancher cluster provisioning via the provisioning.cattle.io/v1 API (often called "Provisioning v2")
apiVersion: provisioning.cattle.io/v1
kind: Cluster
metadata:
  name: prod-eks-cluster
  namespace: fleet-default
spec:
  cloudCredentialSecretName: cattle-global-data:aws-credentials
  kubernetesVersion: v1.30.0
  rkeConfig: {}  # RKE2/K3s settings such as machine pools go here; empty for imported clusters
  # For EKS provisioning (confirm where eksConfig lives in the schema of your Rancher version):
  # eksConfig:
  #   region: us-east-1
  #   nodeGroups:
  #     - nodegroupName: workers
  #       instanceType: m5.xlarge
  #       desiredSize: 3
  #       minSize: 2
  #       maxSize: 10
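Provisioning is asynchronous, so automation usually needs to poll the cluster object on the local (Rancher) cluster before continuing. The sketch below assumes the provisioning Cluster resource reports a boolean `status.ready` field; the function name is hypothetical, and the field is worth confirming with `kubectl get ... -o yaml` on your Rancher version.

```shell
#!/usr/bin/env bash
# Sketch: check whether a provisioned cluster reports ready.
# Assumption: clusters.provisioning.cattle.io exposes .status.ready.

cluster_is_ready() {
  local name="${1:?cluster name required}"
  local namespace="${2:-fleet-default}"
  local ready
  ready="$(kubectl get clusters.provisioning.cattle.io "$name" \
    --namespace "$namespace" -o jsonpath='{.status.ready}')" || return 1
  [ "$ready" = "true" ]
}

# Example: poll every 30 seconds until the cluster is ready.
#   until cluster_is_ready prod-eks-cluster; do sleep 30; done
```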
Multi-Cluster RBAC and Projects
Rancher adds two abstraction layers above Kubernetes RBAC: Clusters and Projects. A Project is a group of namespaces within a single cluster. You assign users to clusters or projects with Rancher roles that map to underlying Kubernetes ClusterRole and Role bindings.
# Rancher ClusterRoleTemplateBinding: grant a user cluster-owner on a cluster
# (binding fields sit at the top level of the object; there is no spec block)
apiVersion: management.cattle.io/v3
kind: ClusterRoleTemplateBinding
metadata:
  name: dev-team-binding
  namespace: c-m-xxxxxxx # cluster namespace in Rancher
clusterName: c-m-xxxxxxx
roleTemplateName: cluster-owner
userPrincipalName: local://user-abc123
---
# ProjectRoleTemplateBinding: grant read-only access to a project
apiVersion: management.cattle.io/v3
kind: ProjectRoleTemplateBinding
metadata:
  name: qa-readonly
  namespace: p-xxxxxxx
projectName: c-m-xxxxxxx:p-xxxxxxx
roleTemplateName: read-only
userPrincipalName: local://user-def456
Rancher ships with built-in role templates: cluster-owner, cluster-member, project-owner, project-member, and read-only. You can also create custom role templates that grant specific Kubernetes RBAC permissions across all managed clusters simultaneously — a major operational advantage over managing RBAC in each cluster independently.
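A custom role template might look like the sketch below, which renders a project-scoped RoleTemplate that can view and patch Deployments. The template name, display name, and rule set are hypothetical examples; the manifest would be applied to the local (Rancher) cluster.

```shell
#!/usr/bin/env bash
# Sketch: render a custom Rancher RoleTemplate manifest to stdout.
# The name, display name, and rules below are illustrative only.

render_role_template() {
  local name="${1:?role template name required}"
  local display_name="${2:?display name required}"
  cat <<EOF
apiVersion: management.cattle.io/v3
kind: RoleTemplate
metadata:
  name: ${name}
displayName: ${display_name}
context: project
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "watch", "patch"]
EOF
}

# Example:
#   render_role_template deployment-restarter "Deployment Restarter" | kubectl apply -f -
```

Once the template exists, it can be referenced by `roleTemplateName` in a ProjectRoleTemplateBinding in the same way as the built-in templates.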
Fleet: GitOps for Multiple Clusters
Fleet is Rancher's built-in GitOps engine. You create GitRepo resources that point to a Git repository, and Fleet applies the manifests to clusters matching label selectors. This makes it trivial to push the same application to dozens of clusters simultaneously.
apiVersion: fleet.cattle.io/v1alpha1
kind: GitRepo
metadata:
  name: platform-apps
  namespace: fleet-default
spec:
  repo: https://github.com/myorg/platform-configs
  branch: main
  paths:
    - /monitoring
    - /security-policies
  targets:
    - name: all-prod-clusters
      clusterSelector:
        matchLabels:
          env: production
    - name: staging
      clusterSelector:
        matchLabels:
          env: staging
  # Per-target overrides (for example, replicas: 1 for staging) are not set here;
  # they go in each path's fleet.yaml under targetCustomizations.
Fleet supports plain Kubernetes YAML, Helm charts, Kustomize overlays, and combinations of all three within a single repository. Each cluster gets its own status tracked in the Fleet UI — you can see at a glance which clusters are synced, drifted, or errored.
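Per-cluster differences can be kept in the repository itself, in a `fleet.yaml` next to the manifests or chart. The sketch below writes one such file with a `targetCustomizations` entry that lowers `replicas` for clusters labelled `env: staging`. The directory, namespace, and default replica count are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: write a fleet.yaml with a staging override into a bundle directory.
# defaultNamespace and the replica counts are illustrative values.

write_fleet_yaml() {
  local dir="${1:?bundle directory required}"
  mkdir -p "$dir"
  cat > "$dir/fleet.yaml" <<'EOF'
defaultNamespace: monitoring
helm:
  values:
    replicas: 3
targetCustomizations:
  - name: staging
    clusterSelector:
      matchLabels:
        env: staging
    helm:
      values:
        replicas: 1
EOF
}

# List GitRepos and the bundles Fleet generated from them.
fleet_status() {
  kubectl --namespace fleet-default get gitrepos.fleet.cattle.io,bundles.fleet.cattle.io
}

# Example:
#   write_fleet_yaml monitoring && git add monitoring/fleet.yaml
#   fleet_status
```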
Rancher Monitoring and Alerting
The Rancher Monitoring app installs kube-prometheus-stack (Prometheus, Alertmanager, Grafana, and exporters) into a downstream cluster with one click from the Rancher App Catalog. It comes with pre-built dashboards for node health, pod resource usage, etcd health, and network traffic.
# Install via Rancher CLI (or use the UI App Catalog)
rancher app install \
--namespace cattle-monitoring-system \
--values monitoring-values.yaml \
rancher-monitoring
# Custom values to increase Prometheus retention
# monitoring-values.yaml:
# prometheus:
#   prometheusSpec:
#     retention: 15d
#     storageSpec:
#       volumeClaimTemplate:
#         spec:
#           storageClassName: gp3
#           resources:
#             requests:
#               storage: 100Gi
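The same installation can also be scripted with Helm, which may be preferable where the Rancher CLI is not available. The sketch below generates the values file and installs the monitoring charts. It assumes the `rancher-charts` repository (https://charts.rancher.io) has been added with `helm repo add` and that the CRDs ship as a separate `rancher-monitoring-crd` chart; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: generate monitoring values and install Rancher Monitoring with Helm.
# Assumption: `helm repo add rancher-charts https://charts.rancher.io` was run.

write_monitoring_values() {
  local file="${1:?values file path required}"
  local retention="${2:-15d}"
  local storage="${3:-100Gi}"
  local storage_class="${4:-gp3}"
  cat > "$file" <<EOF
prometheus:
  prometheusSpec:
    retention: ${retention}
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: ${storage_class}
          resources:
            requests:
              storage: ${storage}
EOF
}

install_monitoring() {
  local values="${1:?values file required}"
  # CRDs first, then the main chart, both in cattle-monitoring-system.
  helm upgrade --install rancher-monitoring-crd rancher-charts/rancher-monitoring-crd \
    --namespace cattle-monitoring-system --create-namespace || return 1
  helm upgrade --install rancher-monitoring rancher-charts/rancher-monitoring \
    --namespace cattle-monitoring-system --values "$values"
}

# Example:
#   write_monitoring_values monitoring-values.yaml 30d 200Gi
#   install_monitoring monitoring-values.yaml
```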
Production Hardening
Checklist for running Rancher in production:
- HA Rancher Server: run 3 Rancher replicas behind a load balancer on a highly available local cluster. Rancher keeps its state in that cluster's Kubernetes datastore, so the datastore itself must be HA: three etcd members with RKE2, or, with K3s, either embedded etcd or an external database (PostgreSQL or MySQL).
- Rancher backup: install the Rancher Backup Operator and schedule daily snapshots of Rancher configuration to S3. Without backups, a Rancher Server failure means re-importing all clusters and re-creating all RBAC bindings manually.
- Audit logging: enable Rancher audit logging (--audit-log-path, --audit-level=1) to record all API calls for compliance and forensics.
- Network policy: restrict access to the cattle-system namespace. Only the ingress controller and Rancher agents should reach the Rancher API.
- Version pinning: pin the Helm chart version and Rancher image tag. Do not use latest tags in production. Rancher releases are tested against specific Kubernetes version ranges.
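The network policy item could start from something like the sketch below, which admits traffic to the Rancher pods only from the ingress controller's namespace. Downstream agents connect through the ingress, so they are covered by the same rule. Several things here are assumptions to verify before use: that the Rancher pods carry the label `app: rancher`, that the ingress controller runs in a namespace named `ingress-nginx`, and that the cluster's CNI enforces NetworkPolicy.

```shell
#!/usr/bin/env bash
# Sketch: render a NetworkPolicy limiting ingress to the Rancher pods.
# Assumptions: pods are labelled app=rancher; the ingress controller's
# namespace carries the standard kubernetes.io/metadata.name label.

render_rancher_network_policy() {
  local ingress_namespace="${1:-ingress-nginx}"
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: rancher-allow-ingress-controller
  namespace: cattle-system
spec:
  podSelector:
    matchLabels:
      app: rancher
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ${ingress_namespace}
EOF
}

# Example (try in a staging environment first):
#   render_rancher_network_policy ingress-nginx | kubectl apply -f -
```

Other in-cluster clients that talk to Rancher directly, such as a metrics scraper, would need their own `from` entries.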
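# Install Rancher Backup Operator (published in the rancher-charts repo; CRDs are a separate chart)
helm repo add rancher-charts https://charts.rancher.io
helm repo update
helm upgrade --install rancher-backup-crd rancher-charts/rancher-backup-crd \
  --namespace cattle-resources-system \
  --create-namespace
helm upgrade --install rancher-backup rancher-charts/rancher-backup \
  --namespace cattle-resources-system
# Create a daily backup to S3
kubectl apply -f - <<EOF
apiVersion: resources.cattle.io/v1
kind: Backup
metadata:
  name: daily-backup
spec:
  resourceSetName: rancher-resource-set  # default resource set installed by the chart
  storageLocation:
    s3:
      bucketName: rancher-backups
      region: us-east-1
      endpoint: s3.us-east-1.amazonaws.com
      credentialSecretName: s3-creds
      credentialSecretNamespace: cattle-resources-system
  schedule: "0 2 * * *"
  retentionCount: 7
EOF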
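A backup is only useful if the restore path has been exercised. The sketch below renders a Restore object that points at the same bucket and credentials as the backup above. The resource name `restore-from-s3`, the `endpoint` value, and the backup file name passed in are placeholders; real file names can be read from the bucket or from the Backup object's status.

```shell
#!/usr/bin/env bash
# Sketch: render a Rancher Backup Operator Restore manifest to stdout.
# Bucket, region, and secret names mirror the Backup example; adjust as needed.

render_restore() {
  local backup_file="${1:?backup file name required}"
  cat <<EOF
apiVersion: resources.cattle.io/v1
kind: Restore
metadata:
  name: restore-from-s3
spec:
  backupFilename: ${backup_file}
  storageLocation:
    s3:
      bucketName: rancher-backups
      region: us-east-1
      endpoint: s3.us-east-1.amazonaws.com
      credentialSecretName: s3-creds
      credentialSecretNamespace: cattle-resources-system
EOF
}

# List Backup objects to confirm the schedule is producing snapshots.
list_backups() {
  kubectl get backups.resources.cattle.io
}

# Example:
#   list_backups
#   render_restore <backup-file>.tar.gz | kubectl apply -f -
```

Running a restore periodically against a scratch cluster is a reasonable way to confirm that the snapshots are complete.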