Governance & Power
Platform Cost Doctrine: Waste, Density, and the Economics of the Cluster
Cost is a signal. When ignored, it reappears as fragility: overloaded nodes, under-provisioned control planes, and rushed change driven by budget panic.
Authored as doctrine; evaluated as systems craft.
Doctrine
Platforms are economic systems. Every choice—requests, placement constraints, isolation posture—allocates scarce resources. If you refuse to govern cost, you govern outage instead.
Kubblai doctrine: cost must be legible to those who make decisions, and the incentives must align with reliability.
- Make resource usage visible per tenant and workload class.
- Treat request discipline as a reliability control, not merely a budget issue.
- Publish a cost posture: where you overpay intentionally for resilience.
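As a minimal sketch of what "visible per tenant and workload class" could look like, the snippet below rolls per-pod CPU samples up into a requested-versus-used report. The `PodSample` type, its field names, and the sample values are hypothetical; a real platform would feed this from its own metrics pipeline.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PodSample:
    """Hypothetical per-pod record; field names are illustrative only."""
    tenant: str
    workload_class: str
    cpu_requested: float  # cores requested
    cpu_used: float       # cores actually used, averaged over some window


def usage_by_tenant(samples):
    """Sum requested and used CPU per (tenant, workload class) pair."""
    totals = defaultdict(lambda: {"requested": 0.0, "used": 0.0})
    for s in samples:
        key = (s.tenant, s.workload_class)
        totals[key]["requested"] += s.cpu_requested
        totals[key]["used"] += s.cpu_used
    return dict(totals)


# Made-up sample data for illustration.
samples = [
    PodSample("payments", "critical", 4.0, 1.0),
    PodSample("payments", "critical", 4.0, 2.0),
    PodSample("search", "batch", 2.0, 1.5),
]

report = usage_by_tenant(samples)
for (tenant, wclass), t in sorted(report.items()):
    utilization = t["used"] / t["requested"] if t["requested"] else 0.0
    print(f"{tenant}/{wclass}: requested={t['requested']:.1f} "
          f"used={t['used']:.1f} utilization={utilization:.0%}")
```

Grouping by both tenant and workload class keeps intentional overpayment (a critical tier with deliberate slack) distinguishable from accidental waste.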
Waste is rarely malicious
Most waste is structural: default limits, cargo-cult requests, and fear-driven overprovisioning. Teams pad requests to avoid incident blame; the scheduler then packs nodes less efficiently, and autoscalers add nodes for capacity that is reserved but never used.
The Order addresses waste by making the system safe enough to be honest.
- Provide safe rollback and incident processes so teams don’t hoard capacity.
- Use VPA recommendations judiciously; treat them as inputs, not authority.
- Audit the top sources of waste quarterly and tie each one to remediation work.
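One way to treat a recommendation as an input rather than an authority is to bound how far a single adjustment may move a request. The sketch below assumes a hypothetical `next_request` helper with an illustrative 25% step limit and a minimum floor; it does not call any VPA API, and the numbers are not tuned guidance.

```python
def next_request(current: float, recommended: float,
                 max_step: float = 0.25, floor: float = 0.1) -> float:
    """Move a CPU request toward a recommendation, but only part of the way.

    current     -- current request in cores
    recommended -- recommended request in cores (from VPA or any other source)
    max_step    -- largest relative change allowed per adjustment (assumed value)
    floor       -- smallest request ever applied, in cores (assumed value)
    """
    lower = current * (1 - max_step)
    upper = current * (1 + max_step)
    # Clamp the recommendation into the allowed band around the current value.
    bounded = min(max(recommended, lower), upper)
    return max(bounded, floor)


# A padded 4-core request with a 1-core recommendation shrinks one step at a time.
print(next_request(4.0, 1.0))
# A recommendation already inside the band is accepted as-is.
print(next_request(4.0, 4.5))
```

Stepping gradually gives the owning team a chance to observe contention between adjustments, in keeping with the aim of making the system safe enough to be honest.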
Density vs isolation: the governance tradeoff
Higher density reduces cost but increases blast radius. Isolation increases cost but reduces coupling and incident spread. There is no free posture; there is only an explicit trade.
Mature platforms choose isolation boundaries based on threat model and failure domains, not aesthetics.
- Use multi-cluster or node pool separation for high-risk tenants/workloads.
- Use quotas, policy baselines, and network boundaries in shared clusters.
- Avoid false isolation: names without enforcement.
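"Names without enforcement" can be checked mechanically. The following sketch flags tenant namespaces that exist by name but lack a quota or a network boundary. The `TenantNamespace` record and its boolean fields are hypothetical stand-ins for whatever inventory the platform already keeps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantNamespace:
    """Hypothetical inventory record; fields are illustrative only."""
    name: str
    has_resource_quota: bool
    has_default_deny_network_policy: bool


def false_isolation(namespaces):
    """Return {namespace name: [missing controls]} for unenforced namespaces."""
    findings = {}
    for ns in namespaces:
        missing = []
        if not ns.has_resource_quota:
            missing.append("resource quota")
        if not ns.has_default_deny_network_policy:
            missing.append("network boundary")
        if missing:
            findings[ns.name] = missing
    return findings


# Made-up inventory for illustration.
inventory = [
    TenantNamespace("payments", True, True),
    TenantNamespace("search", True, False),
    TenantNamespace("sandbox", False, False),
]

findings = false_isolation(inventory)
for name, missing in findings.items():
    print(f"{name}: missing {', '.join(missing)}")
```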
Autoscaling economics
Autoscalers convert demand into nodes. Their economics are shaped by provisioning latency, fragmentation, and your willingness to pay for headroom. Underprovisioning causes latency and scheduling failures; overprovisioning burns budget silently.
The Order defines headroom targets per failure domain and workload tier.
- Set headroom targets for critical tiers; measure them continuously.
- Plan for provisioning time; scarcity is often a matter of timing rather than total capacity, because nodes that arrive after the spike do not help.
- Treat node pool churn as an operational cost; minimize thrash.
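The relationship between provisioning time and headroom can be made concrete with a rough linear model: the headroom needed is the capacity that demand can consume while a new node is still being provisioned. The sketch below assumes demand grows at a constant rate during that window, which is a simplification, and all function names and figures are hypothetical.

```python
def required_headroom(growth_cores_per_min: float,
                      provisioning_minutes: float,
                      safety_factor: float = 1.0) -> float:
    """Cores that must already be free to absorb growth during provisioning.

    Assumes linear demand growth for the whole provisioning window.
    """
    return growth_cores_per_min * provisioning_minutes * safety_factor


def headroom_shortfall(allocatable_cores: float,
                       requested_cores: float,
                       required: float) -> float:
    """How many cores short of the target a failure domain currently is."""
    free = allocatable_cores - requested_cores
    return max(0.0, required - free)


# Illustrative numbers: demand grows 2 cores/min and nodes take 5 minutes to join.
need = required_headroom(growth_cores_per_min=2.0, provisioning_minutes=5.0)
shortfall = headroom_shortfall(allocatable_cores=100.0,
                               requested_cores=92.0,
                               required=need)
print(f"required headroom: {need} cores, shortfall: {shortfall} cores")
```

In this example the cluster has 8 free cores in total, yet it is still 2 cores short of what the provisioning window demands, which is the time-based scarcity described above.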
A practical cost protocol
Cost discipline that survives real orgs is a cadence, not a crusade.
- Monthly: rank the top 20 workloads by the gap between requested and used CPU/memory; remediate the worst.
- Quarterly: review priority classes, quotas, and tenant budgets; adjust governance.
- After incidents: update requests and rollout safety based on observed contention.
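The monthly step can be sketched as a simple ranking by the gap between requested and used resources. The dictionary keys and workload names below are hypothetical, and only CPU is shown; memory could be ranked the same way.

```python
def top_waste(workloads, n=20):
    """Return the n workloads with the largest requested-minus-used CPU gap."""
    ranked = sorted(
        workloads,
        key=lambda w: w["cpu_requested"] - w["cpu_used"],
        reverse=True,
    )
    return ranked[:n]


# Made-up monthly snapshot for illustration (values in cores).
snapshot = [
    {"name": "checkout-api", "cpu_requested": 8.0, "cpu_used": 1.0},
    {"name": "indexer", "cpu_requested": 4.0, "cpu_used": 3.5},
    {"name": "report-cron", "cpu_requested": 2.0, "cpu_used": 0.5},
]

worst = top_waste(snapshot, n=2)
for w in worst:
    gap = w["cpu_requested"] - w["cpu_used"]
    print(f"{w['name']}: {gap:.1f} cores requested but unused")
```

Ranking by absolute gap rather than utilization percentage keeps attention on the workloads that cost the most, not the ones with the worst ratio on a tiny request.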
Canonical Link
Canonical URL: /library/platform-cost-doctrine-waste-density-and-the-economics-of-the-cluster
Related Readings
Advanced Disciplines
Capacity, Bin Packing, and the Lies We Tell the Scheduler
The scheduler is not a magician. It places pods based on the numbers you give it. When those numbers are lies, placement becomes a slow-motion incident.
Advanced Disciplines
Cluster Autoscaling and the Economics of Expansion
Adding nodes is not ‘scale.’ It is a controlled expansion of failure domains, cost, and operational surface area.
Governance & Power
The Cost of Tenant Illusions in Shared Clusters
Shared clusters promise efficiency. Without real isolation, they deliver shared outages: quota fights, RBAC mistakes, policy coupling, and security ambiguity.
Governance & Power
Policy as Doctrine, Not Suggestion
Policy is what makes a platform institutional. Without it, every incident is negotiated from scratch.
Governance & Power
Multi-Cluster Governance and the Problem of Sovereignty
Multiple clusters create political boundaries: ownership, identity, policy, and observability become governance problems, not tooling problems.