Governance & Power

Multi-Cluster Governance and the Problem of Sovereignty

Multiple clusters create political boundaries: ownership, identity, policy, and observability become governance problems, not tooling problems.

Authored as doctrine; evaluated as systems craft.

Doctrine

Multi-cluster is rarely adopted for beauty. It is adopted for failure domains, tenancy boundaries, latency, regulatory constraints, and organizational reality. Each reason implies a governance model.

Kubblai doctrine: do not create sovereign clusters without defining who can change them, how policy is enforced, and how incidents are coordinated.

  • Define the unit of governance: cluster, namespace, or workload class.
  • Standardize identity and audit across the fleet.
  • Make policy distribution observable and reversible.
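One way to read "observable and reversible" is as a ledger of which baseline version each cluster runs, with enough history to step back. The sketch below is a minimal in-memory illustration, not a real distribution system; `PolicyLedger`, its methods, and the version strings are hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PolicyLedger:
    """Hypothetical record of the baseline versions applied to each cluster."""
    history: Dict[str, List[str]] = field(default_factory=dict)

    def apply(self, cluster: str, version: str) -> None:
        # Append rather than overwrite so earlier versions stay recoverable.
        self.history.setdefault(cluster, []).append(version)

    def current(self, cluster: str) -> Optional[str]:
        versions = self.history.get(cluster, [])
        return versions[-1] if versions else None

    def rollback(self, cluster: str) -> Optional[str]:
        """Drop the latest version if an earlier one exists; return what is now in effect."""
        versions = self.history.get(cluster, [])
        if len(versions) > 1:
            versions.pop()
        return self.current(cluster)

    def drift(self, target: str) -> List[str]:
        """Observable part: known clusters that are not on the target version."""
        return sorted(c for c in self.history if self.current(c) != target)


ledger = PolicyLedger()
ledger.apply("eu-1", "v1")
ledger.apply("eu-1", "v2")
ledger.apply("us-1", "v1")
print(ledger.drift("v2"))       # clusters still behind the target baseline
print(ledger.rollback("eu-1"))  # version in effect after reverting eu-1
```

In practice the ledger would be derived from what clusters report, not from what the distributor intended, so that drift reflects reality.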

The first question: why multiple clusters?

If the reason is ‘scale,’ define which dimension: API QPS, etcd limits, node count, or human coordination. If the reason is ‘isolation,’ define the threat model and the expected blast radius.

Without a crisp reason, you build a fleet that is expensive and fragile.

  • Scale driver: API server and etcd capacity; controller churn.
  • Isolation driver: tenant boundaries, policy variance, compliance zones.
  • Latency driver: regional service-to-service latency and data locality.

Fleet policy: one doctrine, many jurisdictions

A fleet needs baselines: RBAC patterns, Pod Security Admission (PSA) posture, network policy stance, and admission standards. But it also needs sanctioned variance: per-cluster exceptions for regulated or legacy workloads.

Central policy must not become a single point of outage.

  • Avoid coupling policy enforcement to a central admission webhook whose outage would block changes in every cluster.
  • Prefer policy that degrades safely: audit/warn, then enforce with staged rollout.
  • Track exceptions as objects with owners and expirations.
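The last two points can be sketched together: an enforcement mode that moves from audit to warn to enforce, and exceptions modeled as records that carry an owner and an expiry date. This is an illustrative sketch under stated assumptions, not the behavior of any particular policy engine; `Mode`, `PolicyException`, `decide`, and the policy and cluster names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import List


class Mode(Enum):
    AUDIT = "audit"      # record violations only
    WARN = "warn"        # tell the requester, still admit
    ENFORCE = "enforce"  # reject violating requests


@dataclass(frozen=True)
class PolicyException:
    policy: str    # hypothetical policy name
    cluster: str   # cluster the exception applies to
    owner: str     # team accountable for the exception
    expires: date  # last day the exception is honored

    def active(self, today: date) -> bool:
        return today <= self.expires


def decide(mode: Mode, violates: bool, policy: str, cluster: str,
           exceptions: List[PolicyException], today: date) -> str:
    """Return 'allow', 'warn', or 'deny' for a single request."""
    if not violates:
        return "allow"
    excepted = any(
        e.policy == policy and e.cluster == cluster and e.active(today)
        for e in exceptions
    )
    if excepted or mode is Mode.AUDIT:
        return "allow"
    return "warn" if mode is Mode.WARN else "deny"


exceptions = [PolicyException("no-privileged", "eu-legacy-1", "payments", date(2030, 1, 1))]
print(decide(Mode.ENFORCE, True, "no-privileged", "eu-legacy-1", exceptions, date(2029, 6, 1)))
print(decide(Mode.ENFORCE, True, "no-privileged", "eu-legacy-1", exceptions, date(2030, 6, 1)))
```

Because the expiry is part of the record, a lapsed exception stops applying without anyone having to remember to delete it, and the owner field says whom to ask before renewing it.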

Identity and trust boundaries

Multi-cluster multiplies identity questions: workload identity, human access, and service-to-service authentication. If identity is inconsistent, audit is meaningless and incident response becomes guesswork.

The Order treats identity as a fleet-wide contract.

  • Standardize service account patterns and least privilege expectations.
  • Centralize audit log retention and correlation across clusters.
  • Maintain a break-glass process that is fleet-aware and reviewed.
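Cross-cluster audit correlation only works if the same identity appears under the same name everywhere. Assuming that holds, and assuming each cluster's audit stream is already sorted by time, a merged per-identity timeline can be sketched as follows; `AuditEvent` and `merged_timeline` are hypothetical names, and the fields are a simplification rather than the real audit log schema.

```python
import heapq
from typing import Iterable, List, NamedTuple


class AuditEvent(NamedTuple):
    timestamp: float  # seconds since epoch
    cluster: str      # fleet-wide cluster identifier
    user: str         # identity, assumed consistent across clusters
    verb: str
    resource: str


def merged_timeline(streams: Iterable[Iterable[AuditEvent]], user: str) -> List[AuditEvent]:
    """Merge time-sorted per-cluster streams and keep events for one identity."""
    merged = heapq.merge(*streams, key=lambda e: e.timestamp)
    return [e for e in merged if e.user == user]


eu = [AuditEvent(10.0, "eu-1", "alice", "delete", "pods"),
      AuditEvent(30.0, "eu-1", "bob", "get", "secrets")]
us = [AuditEvent(20.0, "us-1", "alice", "create", "rolebindings")]

for event in merged_timeline([eu, us], "alice"):
    print(event.timestamp, event.cluster, event.verb, event.resource)
```

If identities differ per cluster, the filter on `user` silently misses events, which is the concrete form of "audit is meaningless."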

Observability across the fleet

A fleet without unified observability is a fleet that cannot be governed. At minimum: consistent labels, cluster identifiers, and correlation between deploy events and telemetry.

The fleet must answer: which cluster is failing, which version it runs, which policy baseline applies, and which incident runbook to follow.

  • Normalize cluster labels and tenant identifiers in metrics/logs.
  • Correlate deploy events, policy changes, and incidents.
  • Treat signal routing as critical infrastructure with SLOs.
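Label normalization is the most mechanical of these points. A minimal sketch, assuming telemetry arrives as flat key/value labels and that the fleet has agreed on canonical keys: the alias table and the `normalize` function below are hypothetical, chosen only to illustrate the idea.

```python
from typing import Dict

# Hypothetical canonical keys and the aliases they replace.
# The canonical key is listed first so it wins when several are present.
ALIASES = {
    "cluster": ("cluster", "cluster_name", "k8s_cluster"),
    "tenant": ("tenant", "team", "tenant_id"),
}


def normalize(labels: Dict[str, str]) -> Dict[str, str]:
    """Rewrite known aliases to canonical keys; reject records with no cluster."""
    out = dict(labels)
    for canonical, aliases in ALIASES.items():
        for alias in aliases:
            if alias in labels:
                value = out.pop(alias)
                # Keep the first value found; later aliases do not override it.
                out.setdefault(canonical, value.strip().lower())
    if "cluster" not in out:
        raise ValueError("telemetry record has no cluster identifier")
    return out


print(normalize({"k8s_cluster": " EU-West-1 ", "team": "Payments", "job": "api"}))
```

Rejecting records without a cluster identifier is a deliberate choice in this sketch: an unattributed signal cannot answer "which cluster is failing."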