Advanced Disciplines
Resource Requests, Limits, and Scheduling Tradeoffs
Requests are promises. Limits are constraints. Misusing either creates clusters that lie about capacity and workloads that fail when load arrives.
Authored as doctrine; evaluated as systems craft.
Doctrine
The scheduler places pods based on requests. If requests are dishonest, scheduling is dishonest, and every downstream system inherits the lie: autoscaling, capacity planning, and reliability posture.
Kubblai doctrine: treat requests as economic inputs. Treat limits as risk constraints. Do not set them as rituals.
- Requests drive placement; keep them aligned to observed steady-state.
- Limits shape runtime failure modes (throttling, OOM) and must be tested under load.
- Headroom is a policy decision; scarcity becomes governance.
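The doctrine above can be sketched as a workload spec. This is illustrative only: the workload name, image, and numbers are hypothetical, and the right values come from your own measurements.

```yaml
# Sketch only; names and numbers are illustrative, not prescriptive.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout            # hypothetical workload
spec:
  replicas: 3
  selector:
    matchLabels:
      app: checkout
  template:
    metadata:
      labels:
        app: checkout
    spec:
      containers:
        - name: checkout
          image: registry.example.com/checkout:1.4.2  # placeholder image
          resources:
            requests:
              cpu: "500m"       # observed steady-state, not peak
              memory: "512Mi"   # observed working set
            limits:
              memory: "1Gi"     # tested peak plus headroom
              # No CPU limit here: a deliberate risk decision, not an omission.
```

Note the asymmetry: requests are the scheduler's economic input, so they track steady-state; the memory limit is a risk constraint, so it tracks tested peaks.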
QoS classes and eviction reality
Kubernetes assigns QoS classes based on requests and limits. Under node pressure, QoS influences eviction order. This is not a theoretical detail; it is how outages propagate.
If you operate multi-tenant clusters, QoS and quotas are part of your containment posture.
- BestEffort is fragile under pressure; it should be intentional.
- Burstable is common; tune with awareness of eviction behavior.
- Guaranteed workloads are expensive; reserve them for real commitments.
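The mapping from resource stanzas to QoS classes is mechanical. These fragments are illustrative container `resources` blocks, one per class:

```yaml
# Guaranteed: every container has CPU and memory requests equal to limits.
resources:
  requests:
    cpu: "1"
    memory: "1Gi"
  limits:
    cpu: "1"
    memory: "1Gi"
---
# Burstable: at least one request or limit set, but not Guaranteed.
# Usage above requests makes the pod an earlier eviction candidate
# under node pressure.
resources:
  requests:
    cpu: "250m"
    memory: "256Mi"
  limits:
    memory: "512Mi"
---
# BestEffort: no requests or limits anywhere in the pod.
# Fragile under pressure; evicted early. Make it a choice, not a default.
resources: {}
```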
CPU throttling and tail collapse
CPU limits can cause throttling. Under bursty load, throttling increases latency, which can cause readiness flapping, which can cause retry storms, which can cause incidents. This chain is common.
If you set CPU limits, test under the load profile you claim to support.
- Watch p95/p99 latency when CPU is constrained.
- Avoid probe endpoints that share the same constrained thread pools as heavy work.
- Tune limits alongside concurrency and queueing posture.
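Throttling is observable before it becomes an incident. A sketch of a Prometheus alerting rule using the standard cAdvisor CFS throttling counters; the group name, alert name, and thresholds are assumptions to adjust for your environment:

```yaml
# Illustrative Prometheus rule; the 25% threshold and 10m window are
# starting points, not doctrine.
groups:
  - name: cpu-throttling
    rules:
      - alert: HighCPUThrottling
        expr: |
          sum by (namespace, pod) (rate(container_cpu_cfs_throttled_periods_total[5m]))
            /
          sum by (namespace, pod) (rate(container_cpu_cfs_periods_total[5m]))
            > 0.25
        for: 10m
        annotations:
          summary: "Pod throttled in >25% of CFS periods; check p95/p99 latency."
```

Correlate this signal with tail latency and readiness-probe failures before you conclude a limit is safe.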
Memory limits and OOM behavior
Memory limits kill containers. If you set a limit below real peaks, you get OOMKilled during warm-up, cache fill, or rare load spikes. If you set no limit, you risk node-level pressure and eviction cascades.
Right-sizing requires measurement and humility: you are not guessing; you are budgeting.
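Budgeting in practice looks like this: a memory stanza whose numbers trace back to observed metrics, not habit. The figures below are hypothetical placeholders for measurements:

```yaml
# A budgeting sketch: each number should trace to a measurement.
resources:
  requests:
    memory: "768Mi"   # observed steady-state working set
  limits:
    memory: "1Gi"     # measured warm-up/cache-fill peak plus headroom
```

If you cannot name the measurement behind a number, you are guessing, and the OOM killer will eventually grade your guess.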
Field notes
The fastest way to ruin an otherwise healthy cluster is to ‘improve efficiency’ by padding requests or squeezing limits without measurement. The system will obey you. The incident will be yours.
Cost doctrine is not simply density. It is stable density under SLO constraints.
Canonical Link
Canonical URL: /library/resource-requests-limits-and-scheduling-tradeoffs
Related Readings
Advanced Disciplines
Capacity, Bin Packing, and the Lies We Tell the Scheduler
The scheduler is not a magician. It places pods based on the numbers you give it. When those numbers are lies, placement becomes a slow-motion incident.
Advanced Disciplines
The Scheduler Under Scarcity: Priority, Preemption, and Hard Choices
When capacity is insufficient, the scheduler becomes governance. Priority and preemption encode institutional values: who runs, who waits, and who is displaced.
Advanced Disciplines
Cluster Autoscaling and the Economics of Expansion
Adding nodes is not ‘scale.’ It is a controlled expansion of failure domains, cost, and operational surface area.
Canonical Texts
Observability for People Who Actually Carry the Pager
If observability does not change decisions during an incident, it is decoration. Signal must be tied to failure modes and owned by the people who respond.
Rites & Trials
Incident Doctrine for Platform Teams
Platform incidents are governance incidents. The doctrine must define authority, evidence, safe mitigations, and how memory becomes guardrail.