How Reconciliation Loops Fail in Practice

Controllers fail by thrashing, stalling, or lying. Mature operators read the shape of convergence: queue depth, reconcile duration, and conflict rates.

Authored as doctrine; evaluated as systems craft.

Doctrine

Controllers are the living interpreters of intent. When they fail, they rarely crash cleanly. They churn. They retry. They saturate the API. They make the platform unstable while appearing ‘healthy.’

Kubblai doctrine: a controller’s first duty is restraint. Backpressure is part of correctness.

  • Idempotence is mandatory; retries are routine.
  • Backoff must be deliberate; requeue storms are self-inflicted outages.
  • Status is testimony; do not write it casually.
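
A minimal sketch of what idempotence means for a reconcile step, assuming desired and actual state can be modeled as sets of child names. The function name `reconcile` and the object names are hypothetical, and nothing here touches a real API server. The point is that a retry after success performs zero writes.

```python
def reconcile(desired, actual):
    """Converge `actual` toward `desired` in place; return the number of writes."""
    writes = 0
    # Create what is missing.
    for name in sorted(desired - actual):
        actual.add(name)
        writes += 1
    # Delete what should not exist.
    for name in sorted(actual - desired):
        actual.remove(name)
        writes += 1
    return writes


desired = {"web-0", "web-1"}          # hypothetical child objects
actual = {"web-1", "stale-7"}

first = reconcile(desired, actual)    # creates web-0, deletes stale-7
second = reconcile(desired, actual)   # a retry: nothing left to do
print(first, second)
```

Because the function acts only on the difference between desired and actual state, running it again after a partial failure or a duplicate event is safe.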

Hot keys and infinite attention

A hot key is an object that consumes disproportionate reconcile cycles: a broken finalizer, a flapping dependency, or a conflict loop. Hot keys starve the rest of the system.

Find them early. Reduce their churn. Fix the root cause with minimal writes.

  • Instrument reconcile by key; rank by frequency and duration.
  • Avoid immediate requeues on transient errors; use exponential backoff.
  • Use predicates to avoid reconciling on irrelevant updates.
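
One way to rank hot keys, sketched under the assumption that each reconcile call can report its key and duration. `ReconcileStats`, `observe`, and `hottest` are illustrative names, not part of any controller framework, and the sample keys are invented.

```python
from collections import defaultdict


class ReconcileStats:
    """Per-key reconcile counts and cumulative duration."""

    def __init__(self):
        self.count = defaultdict(int)
        self.seconds = defaultdict(float)

    def observe(self, key, duration_s):
        self.count[key] += 1
        self.seconds[key] += duration_s

    def hottest(self, n=3):
        """Return the n keys that consumed the most reconcile time, hottest first."""
        ranked = sorted(
            self.seconds,
            key=lambda k: (-self.seconds[k], -self.count[k], k),
        )
        return [(k, self.count[k], self.seconds[k]) for k in ranked[:n]]


stats = ReconcileStats()
for _ in range(50):
    stats.observe("prod/broken-finalizer", 0.2)   # hypothetical hot key
stats.observe("prod/healthy-a", 0.05)
stats.observe("prod/healthy-b", 0.5)

top = stats.hottest(2)
for key, count, seconds in top:
    print(f"{key}: {count} reconciles, {seconds:.2f}s total")
```

In a real controller the duration would come from timing the reconcile function itself (for example with `time.monotonic()`), and the ranking would normally be exported as metrics rather than printed.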

Conflict storms and ownership ambiguity

HTTP 409 Conflict responses are normal under concurrency. They become storms when multiple writers patch the same fields repeatedly or when controllers fight over field ownership.

Server-side apply (SSA) helps only when ownership is explicit; otherwise it becomes another layer of confusion.

  • Define which controller owns which fields; document it.
  • Avoid writing status on every loop; write only on state change.
  • Prefer patch patterns that minimize conflict surface.
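
The "write only on state change" rule, combined with a bounded retry on conflict, can be sketched as follows. `FakeStore` is a hypothetical in-memory stand-in for the API server that uses a version counter for optimistic concurrency; `Conflict` stands in for a 409. None of these names come from a real client library.

```python
class Conflict(Exception):
    """Stands in for an HTTP 409 from the API server."""


class FakeStore:
    """In-memory object with version-based optimistic concurrency."""

    def __init__(self, status):
        self.status = dict(status)
        self.version = 1
        self.writes = 0

    def get(self):
        return dict(self.status), self.version

    def update_status(self, status, version):
        if version != self.version:
            raise Conflict("stale version")
        self.status = dict(status)
        self.version += 1
        self.writes += 1


def write_status_if_changed(store, desired, max_attempts=3):
    """Return True if a write happened, False if status already matched."""
    for _ in range(max_attempts):
        current, version = store.get()    # always re-read before retrying
        if current == desired:
            return False                  # no state change: skip the write
        try:
            store.update_status(desired, version)
            return True
        except Conflict:
            continue                      # someone else wrote; re-read and retry
    raise Conflict(f"gave up after {max_attempts} attempts")


store = FakeStore({"phase": "Pending"})
changed = write_status_if_changed(store, {"phase": "Ready"})
repeat = write_status_if_changed(store, {"phase": "Ready"})
print(changed, repeat, store.writes)
```

The retry is deliberately bounded: after `max_attempts` the error surfaces so the key can go back through normal backoff instead of spinning against another writer.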

Leader flapping and authority churn

Leader election is a governance mechanism. When leases are tuned too aggressively or the API is slow, leadership can flap. Flapping leadership produces duplicated work and inconsistent behavior.

Tune leases for reality: set lease duration, renew deadline, and retry period against API latency measured under load, not under ideal conditions.

  • Instrument leader changes and correlate with API latency.
  • Avoid controllers that require strict global ordering across unrelated keys.
  • Ensure a single leader writes status for a given class of objects.
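
A rough sketch of the instrumentation side, assuming the lease holder identity can be sampled periodically. Both function names are hypothetical, the holder names are invented, and the `safety_factor` margin is an arbitrary illustration rather than a recommended value.

```python
def count_leader_changes(holders):
    """Count transitions between consecutive, non-empty lease holder identities."""
    changes = 0
    previous = None
    for holder in holders:
        if not holder:
            continue  # lease momentarily unheld; not a transition by itself
        if previous is not None and holder != previous:
            changes += 1
        previous = holder
    return changes


def lease_has_headroom(renew_deadline_s, p99_api_latency_s, safety_factor=3.0):
    """True if the renew deadline leaves room for several slow API round trips."""
    return renew_deadline_s >= safety_factor * p99_api_latency_s


samples = ["ctrl-a", "ctrl-a", "ctrl-b", "", "ctrl-a", "ctrl-a", "ctrl-b"]
flaps = count_leader_changes(samples)
print(flaps)
print(lease_has_headroom(renew_deadline_s=10.0, p99_api_latency_s=4.0))
```

Plotting the change count per window next to API latency percentiles is usually enough to tell whether flapping follows slow renewals or has some other cause.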

Backpressure patterns that work

Backpressure is encoded in queues, concurrency, and early exits. It is also encoded in refusing to act when prerequisites are not met.

In doctrine: speed without stability is vanity.

  • Bound concurrency; use work queues with rate limiting.
  • Debounce rapid churn on the same key.
  • Prefer ‘wait and observe’ over repeated writes during systemic events.
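
Per-key exponential backoff with a cap is one common way to encode this restraint. The sketch below is illustrative only: `KeyBackoff` and its parameters are hypothetical names, and the delays are computed but not slept, so a caller would schedule the requeue itself.

```python
class KeyBackoff:
    """Per-key exponential backoff: base * 2**failures, capped at max_delay_s."""

    MAX_EXPONENT = 62  # keeps the power of two within float range

    def __init__(self, base_s=0.005, max_delay_s=60.0):
        self.base_s = base_s
        self.max_delay_s = max_delay_s
        self.failures = {}

    def when(self, key):
        """Record a failure for key and return the delay before its next retry."""
        n = self.failures.get(key, 0)
        self.failures[key] = n + 1
        exponent = min(n, self.MAX_EXPONENT)
        return min(self.base_s * (2 ** exponent), self.max_delay_s)

    def forget(self, key):
        """Reset the key after a successful reconcile."""
        self.failures.pop(key, None)


backoff = KeyBackoff(base_s=1.0, max_delay_s=10.0)
delays = [backoff.when("prod/flapping") for _ in range(6)]
print(delays)

backoff.forget("prod/flapping")          # success resets the schedule
print(backoff.when("prod/flapping"))
```

Failures are tracked per key, so one broken object backs off on its own schedule without slowing healthy keys. Calling `forget` on success matters: without it, a key that failed once long ago would keep paying the penalty.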