Principal Lessons on Kubernetes API Design and CRD Discipline

A CRD is not a schema. It is a contract between humans, controllers, and the control plane. The strongest designs assume failure and make drift legible.

Authored as doctrine; evaluated as systems craft.

Doctrine

Custom resources become institutional truth. When they are poorly designed, you create permanent operational debt: unclear status, breaking version changes, and controllers that thrash.

Kubblai doctrine: you earn new scripture by maintaining it through incidents, upgrades, and human turnover.

  • Design for upgrade and rollback; versioning is not optional.
  • Treat status as an API, not a dumping ground.
  • Controllers must be idempotent and backpressure-aware.

Spec vs status: contracts and ownership

Spec is intent. Status is testimony. Do not let humans write status. Do not let controllers rewrite spec without a paper trail.

Use conditions to communicate progress and failure modes in a stable shape.

  • Define conditions with type, status, reason, message, lastTransitionTime, and observedGeneration.
  • Keep status small; large status objects amplify etcd write load.
  • Document field ownership under SSA to prevent tug-of-war.
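The condition discipline above can be sketched in a few lines of Go. This is a minimal illustration, not the upstream helper: the Condition struct is a hand-rolled stand-in for the shape described in the bullets, and SetCondition is a hypothetical name. The sketch assumes a LastTransitionTime field that moves only when the status value flips, so a reconcile that reaches the same conclusion produces no write.

```go
package main

import (
	"fmt"
	"time"
)

// Condition is a hypothetical stand-in for the condition shape described
// in the text. It is not the upstream Kubernetes type.
type Condition struct {
	Type               string
	Status             string // "True", "False", or "Unknown"
	Reason             string
	Message            string
	ObservedGeneration int64
	LastTransitionTime time.Time // assumption: tracked alongside the other fields
}

// SetCondition upserts c into conds and reports whether anything changed.
// LastTransitionTime moves only when Status flips, so repeating the same
// outcome is a no-op and needs no status write.
func SetCondition(conds []Condition, c Condition, now time.Time) ([]Condition, bool) {
	for i := range conds {
		cur := &conds[i]
		if cur.Type != c.Type {
			continue
		}
		if cur.Status == c.Status && cur.Reason == c.Reason &&
			cur.Message == c.Message && cur.ObservedGeneration == c.ObservedGeneration {
			return conds, false
		}
		if cur.Status != c.Status {
			cur.LastTransitionTime = now
		}
		cur.Status, cur.Reason, cur.Message = c.Status, c.Reason, c.Message
		cur.ObservedGeneration = c.ObservedGeneration
		return conds, true
	}
	c.LastTransitionTime = now
	return append(conds, c), true
}

func main() {
	now := time.Now()
	ready := Condition{
		Type:               "Ready",
		Status:             "False",
		Reason:             "Provisioning",
		Message:            "waiting for backend",
		ObservedGeneration: 3,
	}
	conds, changed := SetCondition(nil, ready, now)
	fmt.Println("conditions:", len(conds), "changed:", changed)

	// A second reconcile with the same result changes nothing.
	_, changed = SetCondition(conds, ready, now.Add(time.Minute))
	fmt.Println("changed on repeat:", changed)
}
```

The boolean return is the useful part: a controller can skip the status update entirely when it is false, which keeps status small in write volume as well as in size.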

Validation and admission

Validation is governance. OpenAPI schema validation catches shape errors; admission catches policy. The more logic you put in a webhook, the more tightly you couple your API's availability to the webhook's availability.

Prefer simple, deterministic validation. Avoid remote dependencies in admission.

  • Prefer CEL validation rules (where supported) over remote webhook calls; they are evaluated locally in the API server.
  • Budget webhook latency; failurePolicy is an incident posture choice.
  • Provide clear user-facing errors; avoid opaque rejections.
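As a rough sketch of what "simple, deterministic validation" with clear errors can look like, the following Go function checks a spec using only local data and reports every problem with its field path. WidgetSpec, its fields, and the allowed tier values are all hypothetical and exist only for illustration; the same rules could equally be expressed as CEL where that is available.

```go
package main

import (
	"fmt"
	"sort"
)

// WidgetSpec is a hypothetical spec used only for this example.
type WidgetSpec struct {
	Replicas int
	Tier     string
}

// allowedTiers is local, static data: no network call is needed to validate.
var allowedTiers = map[string]bool{"standard": true, "premium": true}

// ValidateWidgetSpec returns one message per violated rule, each prefixed
// with the field path so the user knows what to fix. The output depends
// only on the input, so the same object always yields the same errors.
func ValidateWidgetSpec(s WidgetSpec) []string {
	var errs []string
	if s.Replicas < 1 || s.Replicas > 10 {
		errs = append(errs, fmt.Sprintf(
			"spec.replicas: must be between 1 and 10, got %d", s.Replicas))
	}
	if !allowedTiers[s.Tier] {
		tiers := make([]string, 0, len(allowedTiers))
		for t := range allowedTiers {
			tiers = append(tiers, t)
		}
		sort.Strings(tiers) // map order is random; sort for a stable message
		errs = append(errs, fmt.Sprintf(
			"spec.tier: unsupported value %q, expected one of %v", s.Tier, tiers))
	}
	return errs
}

func main() {
	for _, e := range ValidateWidgetSpec(WidgetSpec{Replicas: 0, Tier: "gold"}) {
		fmt.Println(e)
	}
}
```

Reporting all violations at once, rather than stopping at the first, saves the user a round trip per mistake.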

Versioning discipline

CRD versioning is where teams collapse. You must support old versions long enough to migrate safely. You must write conversion logic or constrain change.

Breaking changes without migration plans are governance failures.

  • Define deprecation windows; publish migration guides.
  • Keep conversion deterministic and test it with real objects.
  • Avoid renaming fields casually; prefer additive evolution.
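One way to keep conversion deterministic and testable is a round-trip check: convert an object down to the old version and back, and confirm nothing was lost. The sketch below assumes two hypothetical versions of one resource, where the newer one adds a PullPolicy field. The older version has nowhere to store it, so the down-conversion stashes the value in an annotation. The type names, the annotation key, and the "IfNotPresent" default are all illustrative assumptions.

```go
package main

import (
	"fmt"
	"reflect"
)

// ObjectV1alpha1 and ObjectV1 are hypothetical versions of one resource.
// v1 adds PullPolicy; everything else is unchanged (additive evolution).
type ObjectV1alpha1 struct {
	Annotations map[string]string
	Replicas    int32
	Image       string
}

type ObjectV1 struct {
	Annotations map[string]string
	Replicas    int32
	Image       string
	PullPolicy  string
}

// pullPolicyAnnotation is a made-up key used to carry the v1-only field
// through the older version.
const pullPolicyAnnotation = "example.invalid/pull-policy"

// ToV1 upgrades an object, restoring PullPolicy from the annotation if
// present and otherwise applying an assumed default.
func ToV1(in ObjectV1alpha1) ObjectV1 {
	out := ObjectV1{
		Annotations: map[string]string{},
		Replicas:    in.Replicas,
		Image:       in.Image,
		PullPolicy:  "IfNotPresent",
	}
	for k, v := range in.Annotations {
		if k == pullPolicyAnnotation {
			out.PullPolicy = v
			continue
		}
		out.Annotations[k] = v
	}
	return out
}

// ToV1alpha1 downgrades an object, preserving PullPolicy in an annotation
// so that a later upgrade does not lose it.
func ToV1alpha1(in ObjectV1) ObjectV1alpha1 {
	out := ObjectV1alpha1{
		Annotations: map[string]string{},
		Replicas:    in.Replicas,
		Image:       in.Image,
	}
	for k, v := range in.Annotations {
		out.Annotations[k] = v
	}
	out.Annotations[pullPolicyAnnotation] = in.PullPolicy
	return out
}

func main() {
	orig := ObjectV1{
		Annotations: map[string]string{"team": "storage"},
		Replicas:    2,
		Image:       "registry.example/app:1.4",
		PullPolicy:  "Always",
	}
	back := ToV1(ToV1alpha1(orig))
	fmt.Println("round trip lossless:", reflect.DeepEqual(orig, back))
}
```

Running the same check over objects exported from a live cluster, rather than hand-written fixtures, is what "test it with real objects" amounts to in practice.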

Operational backpressure

CRDs can overload the API server: too many objects, too many watches, too many writes. Controllers can amplify load with status spam and retry storms.

Design your APIs with quotas, rate limits, and bounded reconcile behavior.

  • Limit object fanout; avoid per-pod custom resources unless necessary.
  • Instrument reconcile duration and error rates; enforce backoff.
  • Avoid tight loops that rewrite status on every sync.
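The last two bullets can be illustrated with a pair of small helpers. This is a sketch under stated assumptions: NextBackoff and NeedsStatusWrite are hypothetical names, the Status struct is invented for the example, and jitter is omitted for brevity. A real controller would more likely lean on the rate limiting built into its work queue library than hand-roll this.

```go
package main

import (
	"fmt"
	"time"
)

// NextBackoff returns a capped exponential delay: base doubled once per
// consecutive failure, never exceeding max. Jitter is left out here.
func NextBackoff(failures int, base, max time.Duration) time.Duration {
	if base > max {
		return max
	}
	d := base
	for i := 0; i < failures; i++ {
		if d >= max/2 {
			return max // doubling again would reach or pass the cap
		}
		d *= 2
	}
	return d
}

// Status is a hypothetical, deliberately small status block.
type Status struct {
	ObservedGeneration int64
	ReadyReplicas      int32
	Phase              string
}

// NeedsStatusWrite reports whether the computed status differs from what
// is already stored. Skipping identical writes avoids status spam.
func NeedsStatusWrite(current, desired Status) bool {
	return current != desired
}

func main() {
	for n := 0; n <= 7; n++ {
		fmt.Printf("failures=%d delay=%v\n", n, NextBackoff(n, time.Second, time.Minute))
	}
	stored := Status{ObservedGeneration: 4, ReadyReplicas: 3, Phase: "Ready"}
	fmt.Println("write needed:", NeedsStatusWrite(stored, stored))
}
```

Comparing before writing matters because every status update is an etcd write and a watch event for every client observing the resource.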