Governance & Power

The Cost of Tenant Illusions in Shared Clusters

Shared clusters promise efficiency. Without real isolation, they deliver shared outages: quota fights, RBAC mistakes, policy coupling, and security ambiguity.

Authored as doctrine; evaluated as systems craft.

Doctrine

Namespaces are not isolation; they are a naming boundary. Multi-tenancy is a portfolio of controls: identity, policy, quotas, network boundaries, and operational procedures that prevent one tenant’s mistake from becoming everyone’s incident.

Kubblai doctrine: do not sell ‘isolation’ when you have only names.

  • Define the tenant model: who owns namespaces, quotas, policy, and budgets.
  • Treat quota as economics: resource usage is a cost, and every cost needs an owner and a governance process.
  • Instrument tenant-level blast radius: errors, latency, and capacity consumption.

Noisy neighbors are governance failures

When a tenant floods the API server with churn, creates too many objects, or triggers eviction storms, that is a governance failure, not a moral failing. The platform must prevent it by design.

In shared clusters, fairness is policy, not goodwill.

  • Use ResourceQuota and LimitRange to encode fairness.
  • Use priority classes and preemption consciously; document which tenants may be preempted when capacity runs short.
  • Use admission control to reject pathological objects (huge env vars, giant ConfigMaps).
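
As an illustration of encoding fairness as policy, here is a minimal sketch that builds a ResourceQuota and a LimitRange for one tenant namespace. The function name, the object names (`tenant-quota`, `tenant-defaults`), and every number are assumptions chosen for the example, not recommended values.

```python
import json


def tenant_fairness_manifests(namespace, cpu, memory, max_pods, max_configmaps=50):
    """Build a ResourceQuota and a LimitRange for one tenant namespace.

    All names and default values here are illustrative assumptions.
    """
    quota = {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "tenant-quota", "namespace": namespace},
        "spec": {
            "hard": {
                # Caps on the sum of container requests in the namespace.
                "requests.cpu": cpu,
                "requests.memory": memory,
                # Object-count caps limit API churn and object sprawl.
                "pods": str(max_pods),
                "configmaps": str(max_configmaps),
            }
        },
    }
    limit_range = {
        "apiVersion": "v1",
        "kind": "LimitRange",
        "metadata": {"name": "tenant-defaults", "namespace": namespace},
        "spec": {
            "limits": [
                {
                    "type": "Container",
                    # Applied to containers that declare no requests/limits,
                    # so that every pod is counted against the quota.
                    "defaultRequest": {"cpu": "100m", "memory": "128Mi"},
                    "default": {"cpu": "500m", "memory": "512Mi"},
                }
            ]
        },
    }
    return [quota, limit_range]


if __name__ == "__main__":
    items = tenant_fairness_manifests("team-a", cpu="8", memory="16Gi", max_pods=40)
    # Wrap both objects in a v1 List so they can be applied in one step.
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

The JSON output can be piped to `kubectl apply -f -`. Pairing the quota with a LimitRange matters: once a quota constrains `requests.cpu` or `requests.memory`, pods that omit those requests are rejected unless defaults fill them in.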

Identity boundaries and RBAC reality

RBAC is easy to misconfigure at scale. ‘View’ roles quietly gain write verbs through ClusterRole aggregation. Service accounts become human proxies. Break-glass becomes permanent access.

Tenant isolation requires identity posture: least privilege, audit, and structured exceptions.

  • Avoid cluster-admin by default. Make break-glass explicit and time-bound.
  • Standardize role templates; avoid bespoke RBAC per team.
  • Audit role bindings continuously; treat drift as an incident precursor.
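
Continuous auditing can start small. The sketch below scans ClusterRoleBinding objects, such as the `items` array from `kubectl get clusterrolebindings -o json`, and flags grants of a risky role to any subject outside an allowlist. The function name, the allowlist format, and the sample bindings are assumptions made for illustration.

```python
def find_risky_bindings(bindings, allowed_subjects, risky_roles=("cluster-admin",)):
    """Return (binding name, role, subject) for risky grants outside the allowlist.

    `bindings` is a list of ClusterRoleBinding dicts; `allowed_subjects` is a
    set of (kind, name) tuples. Both shapes are assumptions of this sketch.
    """
    findings = []
    for binding in bindings:
        role = binding.get("roleRef", {}).get("name")
        if role not in risky_roles:
            continue
        # A binding may legitimately have no subjects at all.
        for subject in binding.get("subjects") or []:
            key = (subject.get("kind"), subject.get("name"))
            if key not in allowed_subjects:
                findings.append((binding["metadata"]["name"], role, key))
    return findings


# Hypothetical sample data shaped like ClusterRoleBinding objects.
sample_bindings = [
    {
        "metadata": {"name": "platform-admins"},
        "roleRef": {"kind": "ClusterRole", "name": "cluster-admin"},
        "subjects": [{"kind": "Group", "name": "platform-oncall"}],
    },
    {
        "metadata": {"name": "team-a-temp"},
        "roleRef": {"kind": "ClusterRole", "name": "cluster-admin"},
        "subjects": [
            {"kind": "ServiceAccount", "name": "ci-bot", "namespace": "team-a"}
        ],
    },
    {
        "metadata": {"name": "team-b-view"},
        "roleRef": {"kind": "ClusterRole", "name": "view"},
        "subjects": [{"kind": "Group", "name": "team-b"}],
    },
]

if __name__ == "__main__":
    allowed = {("Group", "platform-oncall")}
    for finding in find_risky_bindings(sample_bindings, allowed):
        print(finding)
```

Run on a schedule and compared against the previous result, a check like this turns drift into a reviewable diff. It does not inspect namespaced RoleBindings or aggregated rules, which a fuller audit would need to cover.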

Network policy is not a checkbox

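Network policy enforces communication boundaries only when the cluster’s CNI plugin actually implements it and when services are designed with explicit trust boundaries. On a CNI without policy support, NetworkPolicy objects are accepted by the API and silently do nothing.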

In shared clusters, ambiguous connectivity becomes a security incident waiting for a timestamp.

  • Define a default-deny posture for sensitive namespaces.
  • Document ingress/egress paths; treat exceptions as artifacts with owners.
  • Test policy with real traffic flows, not assumptions.
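
A default-deny posture is a single small object per namespace. The following sketch builds one; the function name and the policy name `default-deny-all` are assumptions, while the empty `podSelector` and the `policyTypes` list are the standard way to select every pod and deny both directions.

```python
import json


def default_deny_policy(namespace):
    """Build a NetworkPolicy that denies all ingress and egress in a namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            # An empty selector matches every pod in the namespace.
            "podSelector": {},
            # Listing both types with no allow rules denies both directions.
            "policyTypes": ["Ingress", "Egress"],
        },
    }


if __name__ == "__main__":
    print(json.dumps(default_deny_policy("payments"), indent=2))
```

Denying egress also blocks DNS lookups, so a policy like this is normally paired with explicit allow rules, each one an exception with an owner. Whether any of it is enforced still depends on the CNI, which is why testing with real traffic is the only proof.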

Operational posture

Shared clusters multiply incident complexity. You need runbooks that include tenant communication, blast radius estimation, and safe throttling mechanisms.

If you cannot pause one tenant’s chaos without pausing the cluster, you do not have multi-tenancy.

  • Maintain tenant-level dashboards and budgets.
  • Provide ‘pause’ mechanisms for pathological controllers or workloads.
  • Treat tenancy as a product: documentation, expectations, and enforcement.
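
One possible ‘pause’ mechanism, sketched here under stated assumptions, is a freeze quota: a second ResourceQuota in the tenant’s namespace with `pods` set to zero. New pod creation in that namespace is then rejected at admission, running pods are left alone, and other tenants are unaffected. The function name, the object name `tenant-freeze`, and the annotation key are hypothetical.

```python
import json


def freeze_quota(namespace, reason):
    """Build a ResourceQuota that blocks new pods in one tenant namespace.

    The quota name and annotation key are hypothetical conventions.
    """
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {
            "name": "tenant-freeze",
            "namespace": namespace,
            # Record why the freeze exists so that it is not forgotten.
            "annotations": {"example.com/freeze-reason": reason},
        },
        # Zero allowed pods: existing pods keep running, new ones are denied.
        "spec": {"hard": {"pods": "0"}},
    }


if __name__ == "__main__":
    manifest = freeze_quota("team-a", "runaway controller, incident in progress")
    print(json.dumps(manifest, indent=2))
```

Deleting the quota lifts the freeze. This only stops new pods; a controller that is churning other object types would need a different brake, such as object-count quotas or scaling the controller itself to zero.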