On Drift, Entropy, and the Burden of Configuration

Drift is not merely difference; it is accumulated uncertainty. Entropy grows wherever intent is not recorded and enforced.

Authored as doctrine; evaluated as operations.

Entropy is the default

In production, drift is constant: autoscalers change replica counts, nodes churn, pods restart, certificates rotate, and humans make emergency changes.

Entropy is not the enemy. Unobserved entropy is.

Sources of drift

Drift has distinct causes. Treating them as one category makes you solve the wrong problem.

  • Human drift: imperative fixes not captured back into the declarative source of truth.
  • Controller drift: loops fighting each other, or loops with incomplete ownership.
  • Policy drift: changing admission rules without retrofitting existing objects.
  • Infrastructure drift: node images, CNI changes, storage class behavior shifts.
  • Supply chain drift: mutable tags, overwritten artifacts, and ambiguous provenance.
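
Supply chain drift is the easiest of these to check mechanically. The sketch below assumes that an image reference counts as immutable only when it is pinned by a sha256 digest; the helper names (`is_pinned`, `unpinned_images`) and the sample registry are hypothetical, chosen for illustration rather than taken from any particular tool.

```python
import re

# Assumption: a reference is "pinned" only if it ends in a sha256 digest,
# e.g. "registry.example.com/app@sha256:<64 hex chars>".
_DIGEST = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_pinned(image: str) -> bool:
    """Return True if the image reference ends with a sha256 digest."""
    return bool(_DIGEST.search(image))


def unpinned_images(images: list[str]) -> list[str]:
    """Return the references that rely on a tag (or no tag) instead of a digest."""
    return [img for img in images if not is_pinned(img)]


# Hypothetical sample references, for illustration only.
sample = [
    "registry.example.com/app:latest",
    "registry.example.com/app:1.4.2",
    "registry.example.com/app@sha256:" + "a" * 64,
]
for img in unpinned_images(sample):
    print("mutable reference:", img)
```

A check like this fits naturally at admission time or in CI. It says nothing about provenance; it only flags references whose content can change underneath you.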

The burden of configuration

Configuration is power. It is also debt. Every knob you expose adds operational surface area: it must be versioned, validated, and supported under stress.

Kubblai doctrine favors smaller surfaces with stronger guarantees.

  • Prefer opinionated platforms with enforced standards.
  • Centralize policy and validate at admission time.
  • Measure configuration churn and tie it to incident volume.
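
One hedged way to tie churn to incident volume is a plain Pearson correlation over two per-period series, for example config changes merged per week against incidents opened per week. The function below is a minimal sketch; the choice of weekly buckets and the function name `pearson` are assumptions, and a correlation on its own does not show that churn causes incidents.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant series has no defined correlation.
        raise ValueError("one of the series is constant")
    return cov / sqrt(var_x * var_y)
```

Feed it the same time buckets for both series. A value near 1 suggests churn and incidents rise together and is a prompt to investigate, not a verdict.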

GitOps as entropy control

GitOps is not a religion. It is a discipline for controlling drift: a single source of intent, a repeatable reconciliation agent, and an auditable change history.

If GitOps makes your incident response slower, the fault lies in your workflow, not in the concept.

  • Define emergency change procedures that reconcile back into Git.
  • Use progressive delivery; don’t gate everything on a monolithic pipeline.
  • Keep rollbacks fast and boring.
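
The core of a reconciliation agent is a comparison between declared intent and live state. The sketch below is an illustration under stated assumptions, not how any specific GitOps tool works: both states are plain nested dictionaries, and only fields present in the desired state are checked, on the assumption that everything else is owned by some other controller. The name `find_drift` is hypothetical.

```python
def find_drift(desired: dict, live: dict, path: str = "") -> list:
    """Return (field_path, desired_value, live_value) for each declared field
    whose live value differs. Fields that exist only in `live` are ignored."""
    drift = []
    for key, want in desired.items():
        here = f"{path}.{key}" if path else key
        have = live.get(key)  # None when the field is missing from live state
        if isinstance(want, dict) and isinstance(have, dict):
            # Recurse into nested objects so the report names the exact field.
            drift.extend(find_drift(want, have, here))
        elif want != have:
            drift.append((here, want, have))
    return drift


# Hypothetical states: Git declares 3 replicas, the cluster is running 5.
desired = {"spec": {"replicas": 3, "image": "app:1"}}
live = {"spec": {"replicas": 5, "image": "app:1"}, "status": {"ready": 5}}
for field, want, have in find_drift(desired, live):
    print(f"{field}: desired={want!r} live={have!r}")
```

Restricting the check to declared fields is a deliberate choice here: it keeps a field managed by an autoscaler out of the report as long as Git does not also declare it, which is one way to avoid loops fighting each other.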

Operator practice: drift budgets

Mature platforms treat drift like latency: it has budgets. A small amount is acceptable. Prolonged divergence is a breach.

Define what ‘out of policy’ means, how you detect it, and how you remediate it without destabilizing the cluster.
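
A drift budget can be sketched as a time limit on divergence. The example below assumes that some detector records when a resource first diverged from its declared state; `DriftRecord`, `budget_status`, and the three status strings are hypothetical names for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class DriftRecord:
    resource: str
    diverged_since: Optional[datetime]  # None means the resource is in sync


def budget_status(record: DriftRecord, budget: timedelta, now: datetime) -> str:
    """Classify a resource as 'in_sync', 'within_budget', or 'breach'."""
    if record.diverged_since is None:
        return "in_sync"
    elapsed = now - record.diverged_since
    return "within_budget" if elapsed <= budget else "breach"


# Hypothetical usage: a 15-minute budget, checked against a fixed clock.
now = datetime(2024, 1, 1, 12, 0)
record = DriftRecord("deployment/api", now - timedelta(minutes=30))
print(record.resource, budget_status(record, timedelta(minutes=15), now))
```

Passing `now` in explicitly keeps the check deterministic and easy to test. What happens on a breach (alert, auto-revert, or ticket) is a separate policy decision and is left out here.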