Canonical Texts
Principal Lessons on Kubernetes API Design and CRD Discipline
A CRD is more than a schema. It is a contract between humans, controllers, and the control plane. The strongest designs assume failure and make drift legible.
Authored as doctrine; evaluated as systems craft.
Doctrine
Custom resources become institutional truth. When they are poorly designed, you create permanent operational debt: unclear status, breaking version changes, and controllers that thrash.
Kubblai doctrine: you earn new scripture by maintaining it through incidents, upgrades, and human turnover.
- Design for upgrade and rollback; versioning is not optional.
- Treat status as an API, not a dumping ground.
- Controllers must be idempotent and backpressure-aware.
Spec vs status: contracts and ownership
Spec is intent. Status is testimony. Do not let humans write status. Do not let controllers rewrite spec without a paper trail.
Use conditions to communicate progress and failure modes in a stable shape.
- Define conditions with type, status, reason, message, lastTransitionTime, and observedGeneration.
- Keep status small; large status objects amplify etcd write load.
- Document field ownership under server-side apply (SSA) to prevent tug-of-war between writers.
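The stable condition shape can be sketched as a small upsert helper. This is a minimal illustration in Python over plain dictionaries, not a client library API: the function name `set_condition` and the injectable `now` parameter are assumptions made for the example. It keeps one entry per condition type, moves lastTransitionTime only when the status value flips, and reports whether anything changed so the caller can skip a no-op write.

```python
from datetime import datetime, timezone


def set_condition(conditions, cond_type, status, reason, message,
                  observed_generation, now=None):
    """Upsert one condition in place; return True if anything changed.

    `conditions` is a list of dicts shaped like status.conditions.
    `now` is injectable so the behavior is deterministic in tests.
    """
    now = now or datetime.now(timezone.utc).isoformat()
    new = {
        "type": cond_type,
        "status": status,  # "True", "False", or "Unknown"
        "reason": reason,
        "message": message,
        "observedGeneration": observed_generation,
    }
    for existing in conditions:
        if existing["type"] != cond_type:
            continue
        # Only a status flip moves the transition timestamp.
        if existing["status"] == status:
            new["lastTransitionTime"] = existing["lastTransitionTime"]
        else:
            new["lastTransitionTime"] = now
        if existing == new:
            return False  # nothing to write
        existing.clear()
        existing.update(new)
        return True
    # First time this condition type is reported.
    new["lastTransitionTime"] = now
    conditions.append(new)
    return True
```

A controller built this way can call the helper on every sync and write status only when it returns True, which also keeps the condition list bounded to one entry per type.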
Validation and admission
Validation is governance. OpenAPI schema validation catches shape errors; admission control enforces policy. The more logic you put in a webhook, the more tightly you couple every write to your API to that webhook's availability.
Prefer simple, deterministic validation. Avoid remote dependencies in admission.
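- Use CEL validation rules (where supported) for local validation instead of remote webhook calls.
- Budget webhook latency; failurePolicy (Fail or Ignore) is an incident posture choice: block writes during a webhook outage, or admit them unvalidated.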
- Provide clear user-facing errors; avoid opaque rejections.
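As a rough illustration of simple, deterministic validation with clear errors, the sketch below checks a spec using only the object itself: no network calls, no clock, no cluster lookups. The resource and its fields (`minReplicas`, `maxReplicas`) are hypothetical, as is the function name. In a real CRD the cross-field check would more likely be expressed as a CEL rule such as `self.minReplicas <= self.maxReplicas` under `x-kubernetes-validations`; Python is used here only to show the shape of the logic and of the error messages.

```python
def validate_widget_spec(spec):
    """Return a list of error strings; an empty list means the spec is valid.

    Every message starts with the field path so the user knows what to fix.
    """
    errors = []

    def is_int(value):
        # bool is a subclass of int in Python; reject it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)

    min_r = spec.get("minReplicas")
    max_r = spec.get("maxReplicas")

    # Shape checks: the kind of error schema validation would catch.
    for name, value in (("minReplicas", min_r), ("maxReplicas", max_r)):
        if not is_int(value):
            errors.append(f"spec.{name}: required integer, got {value!r}")
        elif value < 0:
            errors.append(f"spec.{name}: must be >= 0, got {value}")

    # Cross-field check: the kind of rule CEL can express locally.
    if is_int(min_r) and is_int(max_r) and min_r > max_r:
        errors.append(
            f"spec.minReplicas: must be <= spec.maxReplicas ({min_r} > {max_r})"
        )
    return errors
```

Because the function depends only on its input, the same object always produces the same verdict, which makes rejections reproducible and easy to explain to the user.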
Versioning discipline
CRD versioning is where teams collapse. You must support old versions long enough to migrate safely. You must write conversion logic or constrain change.
Breaking changes without migration plans are governance failures.
- Define deprecation windows; publish migration guides.
- Keep conversion deterministic and test it with real objects.
- Avoid renaming fields casually; prefer additive evolution.
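Deterministic conversion and its test can be sketched as a pair of pure functions plus a round-trip check. Everything named here is an assumption for illustration: the group `example.com`, the `Widget` kind, and a field move from `spec.replicas` in v1alpha1 to `spec.scaling.replicas` in v1. The sketch also assumes `scaling` holds only `replicas`; a real conversion would need to preserve any field that exists in one version but not the other.

```python
import copy

GROUP = "example.com"  # hypothetical API group


def v1alpha1_to_v1(obj):
    """Convert a v1alpha1 object to v1 without mutating the input."""
    out = copy.deepcopy(obj)
    out["apiVersion"] = f"{GROUP}/v1"
    spec = out.get("spec", {})
    if "replicas" in spec:
        # v1 nests the replica count under spec.scaling.
        spec["scaling"] = {"replicas": spec.pop("replicas")}
    return out


def v1_to_v1alpha1(obj):
    """Convert a v1 object back to v1alpha1 without mutating the input."""
    out = copy.deepcopy(obj)
    out["apiVersion"] = f"{GROUP}/v1alpha1"
    spec = out.get("spec", {})
    scaling = spec.pop("scaling", None)
    if scaling is not None and "replicas" in scaling:
        spec["replicas"] = scaling["replicas"]
    return out


def round_trips(obj):
    """True if converting to v1 and back reproduces the original object."""
    return v1_to_v1alpha1(v1alpha1_to_v1(obj)) == obj
```

Running `round_trips` over objects exported from a real cluster, rather than hand-written fixtures, is one way to act on the advice to test conversion with real objects.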
Operational backpressure
CRDs can overload the API server: too many objects, too many watches, too many writes. Controllers can amplify load with status spam and retry storms.
Design your APIs with quotas, rate limits, and bounded reconcile behavior.
- Limit object fanout; avoid per-pod custom resources unless necessary.
- Instrument reconcile duration and error rates; enforce backoff.
- Avoid tight loops that rewrite status on every sync.
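Two of the points above, enforced backoff and avoiding status rewrites on every sync, can be sketched in a few lines. The function names, the 1 second base, and the 300 second cap are assumptions chosen for the example, not values from any particular controller framework. Jitter is left out to keep the sketch deterministic; production controllers usually add it so retries do not synchronize.

```python
def backoff_delay(failures, base=1.0, cap=300.0):
    """Seconds to wait after `failures` consecutive failed reconciles.

    Doubles on each failure and never exceeds `cap`.
    """
    if failures <= 0:
        return 0.0
    # Clamp the exponent so very large failure counts cannot overflow.
    exponent = min(failures - 1, 62)
    return min(cap, base * 2 ** exponent)


def reconcile_status(current_status, observed, write):
    """Call `write` only when the observed state changes the status.

    `write` stands in for the API call that updates the status subresource.
    Returns True if a write was issued.
    """
    desired = dict(current_status, **observed)
    if desired == current_status:
        return False  # no-op sync: skip the write entirely
    write(desired)
    return True
```

Comparing before writing means a steady-state object costs no writes, and the capped delay bounds how hard a failing object can push on the API server.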
Canonical Link
Canonical URL: /library/principal-lessons-on-kubernetes-api-design-and-crd-discipline