The Hidden Burdens of etcd

Doctrine

You can operate Kubernetes without understanding etcd, until you can’t. At scale, etcd becomes the moral center of the platform: it demands restraint, hygiene, and precise discipline.

Kubblai doctrine treats etcd as an archive: what you write must be worth storing.

Operational realities

Large objects (bloated ConfigMaps, Secrets, CRDs, and status fields) increase storage and watch traffic, because every update stores a full new revision and is delivered to every watcher. High-churn workloads turn etcd into a write amplifier. Compaction and defragmentation are not optional ceremonies; they are routine maintenance.
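To make the cost concrete, here is a back-of-envelope sketch. It assumes, for simplicity, that every update writes the whole object as a new revision and that every watcher receives the full object on each change. It ignores Raft replication, the API server's watch cache, and encoding overhead. The object size, update rate, and watcher count are hypothetical.

```python
# Back-of-envelope model; all inputs are hypothetical, not measurements.


def write_bytes_per_sec(object_bytes: int, updates_per_sec: float) -> float:
    """Bytes stored per second, assuming each update writes the full object as a new revision."""
    return object_bytes * updates_per_sec


def watch_bytes_per_sec(object_bytes: int, updates_per_sec: float, watchers: int) -> float:
    """Bytes delivered per second, assuming every watcher receives the full object on each update."""
    return write_bytes_per_sec(object_bytes, updates_per_sec) * watchers


if __name__ == "__main__":
    size = 100 * 1024  # a hypothetical 100 KiB ConfigMap
    stored = write_bytes_per_sec(size, 2.0)  # updated twice per second
    fanned = watch_bytes_per_sec(size, 2.0, 50)  # watched by 50 clients
    print(f"stored: {stored / 1024:.0f} KiB/s, watch fan-out: {fanned / 2**20:.2f} MiB/s")
    # prints: stored: 200 KiB/s, watch fan-out: 9.77 MiB/s
```

The point is the multiplication: halving object size or update rate halves both numbers, while every additional watcher multiplies the fan-out.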

If your platform is slow, it is often etcd telling you the truth.

Watch pressure grows with object count and churn.
Status updates are writes; avoid noisy status spam.
CRDs with large schemas and frequent updates are expensive.
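One way to avoid status spam is to compare the status a controller wants to write with the status already stored, and skip the update when nothing meaningful changed. The sketch below is a hypothetical helper, not part of any Kubernetes client library. The `VOLATILE_KEYS` entry `lastReconcileTime` stands in for a field, such as a per-reconcile timestamp, that changes on every pass and would otherwise turn every reconcile into a write.

```python
from typing import Any, Callable, Dict, Iterable

# Hypothetical fields that change on every reconcile without carrying new information.
VOLATILE_KEYS = ("lastReconcileTime",)


def _meaningful(status: Dict[str, Any], volatile: Iterable[str]) -> Dict[str, Any]:
    """Return a copy of status without the volatile top-level keys."""
    skip = set(volatile)
    return {k: v for k, v in status.items() if k not in skip}


def maybe_update_status(
    current: Dict[str, Any],
    desired: Dict[str, Any],
    write: Callable[[Dict[str, Any]], None],
    volatile: Iterable[str] = VOLATILE_KEYS,
) -> bool:
    """Call write(desired) only when the meaningful part of the status differs.

    Returns True if a write was issued, False if the update was skipped.
    """
    volatile = tuple(volatile)  # allow one-shot iterables to be used twice
    if _meaningful(current, volatile) == _meaningful(desired, volatile):
        return False  # nothing worth storing changed: no API call, no etcd write
    write(desired)
    return True


if __name__ == "__main__":
    sent = []
    current = {"phase": "Ready", "lastReconcileTime": "10:00"}
    print(maybe_update_status(current, {"phase": "Ready", "lastReconcileTime": "10:05"}, sent.append))  # False
    print(maybe_update_status(current, {"phase": "Degraded", "lastReconcileTime": "10:05"}, sent.append))  # True
```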

Compaction, defrag, and the cost of neglect

Compaction removes historical revisions; defragmentation reclaims space. Both have performance consequences. The operator’s task is to schedule them with awareness of peak traffic and failure tolerance.
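The difference between the two operations can be shown with a toy model. `ToyMVCC` is hypothetical and far simpler than etcd's real backend. It keeps every revision of every key. Compaction drops revisions at or below a given revision while keeping the newest version of each key. A `file_bytes` high-water mark stands in for allocated space that is reused after compaction but only shrinks on defrag.

```python
from typing import Dict, List, Tuple


class ToyMVCC:
    """Toy multi-version store (hypothetical, not etcd) contrasting compaction with defrag."""

    def __init__(self) -> None:
        self.rev = 0
        self.history: List[Tuple[int, str, int]] = []  # (revision, key, value size in bytes)
        self.file_bytes = 0  # allocated space: freed space is reused, but only defrag shrinks it

    def live_bytes(self) -> int:
        """Bytes held by revisions that are still stored."""
        return sum(size for _, _, size in self.history)

    def put(self, key: str, size: int) -> int:
        """Store a new revision of key; older revisions are kept until compaction."""
        self.rev += 1
        self.history.append((self.rev, key, size))
        self.file_bytes = max(self.file_bytes, self.live_bytes())
        return self.rev

    def compact(self, rev: int) -> None:
        """Drop revisions <= rev, keeping the newest of those for each key."""
        newest: Dict[str, int] = {}
        for r, key, _ in self.history:  # history is in ascending revision order
            if r <= rev:
                newest[key] = r
        self.history = [
            (r, key, size) for r, key, size in self.history if r > rev or newest.get(key) == r
        ]

    def defrag(self) -> None:
        """Shrink allocated space to what the remaining revisions use."""
        self.file_bytes = self.live_bytes()


if __name__ == "__main__":
    db = ToyMVCC()
    for _ in range(10):
        db.put("configmaps/app", 1000)  # ten revisions of one 1000-byte object
    print(db.live_bytes(), db.file_bytes)  # 10000 10000
    db.compact(db.rev)
    print(db.live_bytes(), db.file_bytes)  # 1000 10000: history gone, space still allocated
    db.defrag()
    print(db.live_bytes(), db.file_bytes)  # 1000 1000
```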

Do not treat etcd maintenance as an afterthought. It is the health of the platform’s core.
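Scheduling with awareness of peak traffic and failure tolerance can start as simply as picking the quietest hour and defragmenting members one at a time. Defragmenting a member blocks its reads and writes while it rebuilds, so in a cluster of three or more members the others keep quorum. The planner below is a hypothetical sketch: the traffic figures and member names are made up, and ordering followers first with the leader last is a common convention assumed here, not something the text above prescribes. The per-member work itself could be done with `etcdctl defrag` against one endpoint at a time.

```python
from typing import Dict, List


def pick_quiet_hour(requests_per_hour: Dict[int, float]) -> int:
    """Return the hour of day (0-23) with the lowest observed request rate."""
    return min(requests_per_hour, key=lambda hour: requests_per_hour[hour])


def rolling_defrag_order(members: List[str], leader: str) -> List[str]:
    """Order members for one-at-a-time defrag: followers first, leader last."""
    if leader not in members:
        raise ValueError(f"leader {leader!r} is not a cluster member")
    return [m for m in members if m != leader] + [leader]


if __name__ == "__main__":
    traffic = {hour: 100.0 for hour in range(24)}  # hypothetical requests/s per hour
    traffic[3] = 12.0
    print(pick_quiet_hour(traffic))  # 3
    print(rolling_defrag_order(["etcd-a", "etcd-b", "etcd-c"], leader="etcd-b"))
    # ['etcd-a', 'etcd-c', 'etcd-b']
```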

Symptoms that matter

When etcd is strained, the platform speaks through latency and timeouts.

API server latency spikes and 429/5xx responses during write bursts.
Controller reconcile durations increasing across unrelated resources.
Watch disconnects and resync storms.
Leader election flapping.
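Flapping is easier to act on when it is detected mechanically. A minimal sketch, assuming you already collect the times at which leadership changed, whether etcd's own (for example by noticing `etcd_server_leader_changes_seen_total` increase) or a controller's lease-based election. It reports flapping when more than `max_changes` changes fall inside any window of `window_s` seconds. Both thresholds are hypothetical and would need tuning per cluster.

```python
from typing import Iterable


def leader_flapping(change_times: Iterable[float], window_s: float, max_changes: int) -> bool:
    """Return True if more than max_changes leader changes fall within any window_s-second span."""
    times = sorted(change_times)
    start = 0
    for end, t in enumerate(times):
        # Advance the window start until it is no more than window_s seconds before t.
        while t - times[start] > window_s:
            start += 1
        if end - start + 1 > max_changes:
            return True
    return False


if __name__ == "__main__":
    # Hypothetical timestamps (seconds) at which leadership changed.
    print(leader_flapping([0, 30, 60, 90, 3600], window_s=300, max_changes=3))  # True
    print(leader_flapping([0, 600, 1200], window_s=300, max_changes=3))  # False
```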

Design discipline

The best etcd optimization is architecture: reduce object churn, reduce write frequency, and avoid storing what you do not need.

Kubblai doctrine: store intent, not noise.
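Reducing write frequency often means coalescing: keep only the latest value per key and write on an interval rather than on every event. The `Coalescer` class below is a hypothetical sketch. The caller is assumed to invoke `flush()` from a timer, and `write` stands in for whatever actually persists the value.

```python
from typing import Any, Callable, Dict


class Coalescer:
    """Keeps only the latest pending value per key and writes them in one flush."""

    def __init__(self, write: Callable[[str, Any], None]) -> None:
        self._write = write  # hypothetical persistence call, e.g. a status update
        self._pending: Dict[str, Any] = {}

    def submit(self, key: str, value: Any) -> None:
        """Record a new value; it replaces any unflushed value for the same key."""
        self._pending[key] = value

    def flush(self) -> int:
        """Write every pending key once and return how many writes were issued."""
        pending, self._pending = self._pending, {}
        for key, value in pending.items():
            self._write(key, value)
        return len(pending)


if __name__ == "__main__":
    writes = []
    c = Coalescer(lambda key, value: writes.append((key, value)))
    for i in range(100):
        c.submit("status/worker-1", {"processed": i})
    c.submit("status/worker-2", {"processed": 7})
    print(c.flush(), writes[0])  # 2 ('status/worker-1', {'processed': 99})
```

The trade-off is latency and durability: anything submitted but not yet flushed is lost if the process dies, so this suits state that can be recomputed, such as progress counters, rather than intent that must never be dropped.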
