Incident Response as a Trial of Faith

Incidents reveal the true governance of your platform: who can act, what can be changed, and whether your system can recover with discipline.

Text

Authored as doctrine; evaluated as operations.

Doctrine

A trial is not a spectacle. It is a procedure. In incidents, your goal is not heroism; it is restoration and evidence preservation.

Kubblai doctrine: the incident is where doctrine is tested, not recited.

Evidence-first protocol

Start with observation: symptoms, scope, recent changes. Confirm control plane health. Read events. Only then act.

The worst incidents are accelerated by unverified assumptions.

  • Confirm namespace and context before any write.
  • Prefer read-only diagnosis until a clear intervention emerges.
  • Make one change at a time when possible; measure impact.
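The checklist above can be sketched as a small guard plus a read-only triage plan. This is a minimal sketch, not a prescribed tool: the names `WrongTargetError`, `assert_target`, and `read_only_triage` are hypothetical, as are the example context and namespace values. The kubectl invocations are ordinary read-only commands, and nothing is executed here, so the plan can be reviewed before anyone runs it.

```python
from typing import List


class WrongTargetError(RuntimeError):
    """Raised when the active context or namespace is not the intended one."""


def assert_target(expected_context: str, expected_namespace: str,
                  current_context: str, current_namespace: str) -> None:
    """Refuse to proceed to any write unless both values match exactly."""
    if current_context != expected_context:
        raise WrongTargetError(
            f"context is {current_context!r}, expected {expected_context!r}")
    if current_namespace != expected_namespace:
        raise WrongTargetError(
            f"namespace is {current_namespace!r}, expected {expected_namespace!r}")


def read_only_triage(namespace: str) -> List[List[str]]:
    """Return read-only kubectl commands as argv lists, in the order to run them."""
    return [
        # Which cluster am I actually talking to?
        ["kubectl", "config", "current-context"],
        # Control plane health, per check.
        ["kubectl", "get", "--raw", "/readyz?verbose"],
        ["kubectl", "get", "nodes"],
        # Recent events in the affected namespace, oldest first.
        ["kubectl", "get", "events", "-n", namespace, "--sort-by=.lastTimestamp"],
        ["kubectl", "get", "pods", "-n", namespace, "-o", "wide"],
    ]
```

A caller would feed `assert_target` the values reported by the cluster tooling (for example, the output of `kubectl config current-context`) and only move on to a write once it returns without raising.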

Blast radius and reversibility

Prefer changes that are reversible and scoped: scale down, cordon and drain nodes with care, roll back deployments. Disable admission control only with explicit compensating controls and monitoring.

If you cannot roll back, you are not doing operations; you are gambling.
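One way to make reversibility concrete is to record the undo command before applying any change. The sketch below pairs each intervention with its inverse. The `Change` type, the helper names, and the example resource names are hypothetical, and the commands are only built as argument lists, never executed. Draining is deliberately left out: uncordoning a node makes it schedulable again but does not move evicted pods back.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Change:
    description: str
    apply: List[str]     # argv of the intervention
    rollback: List[str]  # argv that undoes it


def scale_deployment(name: str, namespace: str, current: int, target: int) -> Change:
    """Scale a deployment, remembering the replica count to restore."""
    base = ["kubectl", "scale", f"deployment/{name}", "-n", namespace]
    return Change(
        description=f"scale {name} from {current} to {target} replicas",
        apply=base + [f"--replicas={target}"],
        rollback=base + [f"--replicas={current}"],
    )


def cordon_node(node: str) -> Change:
    """Stop new pods from landing on a node; existing pods keep running."""
    return Change(
        description=f"cordon {node}",
        apply=["kubectl", "cordon", node],
        rollback=["kubectl", "uncordon", node],
    )


def rollback_plan(applied: List[Change]) -> List[List[str]]:
    """Undo applied changes in reverse order, most recent first."""
    return [change.rollback for change in reversed(applied)]
```

Keeping the applied changes in a list gives the incident channel a running record of what was done and, via `rollback_plan`, the exact sequence that would restore the prior state.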

After: memorialize

Postmortems are how the order becomes institutional. Turn lessons into guardrails: policies, alerts, runbooks, and training.

A platform that repeats the same incident is refusing to learn.