The Dark Order’s Guide to Observability in Kubernetes

Observability is not dashboards. It is the discipline of evidence: the ability to prove what happened, what changed, and why the system behaved as it did.

Authored as doctrine; evaluated as systems craft.

Doctrine: evidence before narrative

During incidents, teams tell stories. Observability is how you replace stories with evidence.

The Order’s rule: if you cannot answer ‘what changed’ and ‘where is the bottleneck’, you do not yet have an observability system.

Events for object-level narrative.
Metrics for saturation and error budgets.
Traces for causality across boundaries.
Audit for governance and attribution.

Control plane visibility is mandatory

Clusters often fail through the control plane: API server latency, admission failures, scheduler stalls, etcd pressure. If you observe only workloads, you will misdiagnose platform incidents as application incidents.
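One way to keep watch on the control plane is to read the API server's verbose readiness report and surface only the checks that fail. The sketch below assumes the report reaches stdin as plain text in the usual `[+]name ok` / `[-]name failed: ...` form; the function name `failing_checks` is hypothetical, and how a failing report is delivered may differ by client.

```shell
# failing_checks: read verbose health-check output on stdin and print the
# name of every check that did not pass (lines beginning with "[-]").
# Hypothetical helper; it parses text only and never contacts a cluster.
failing_checks() {
  awk '/^\[-\]/ { sub(/^\[-\]/, ""); print $1 }'
}

# Possible usage against a live cluster (not executed here):
#   kubectl get --raw '/readyz?verbose' | failing_checks
```

An empty result means every listed check passed; any name printed is a place to start looking before blaming the workload.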

A serious shrine keeps watch over the gate of truth.

Signal quality: the quiet art

Too many logs is not observability; it is storage debt. Too many alerts is not readiness; it is learned helplessness.

Align signals with failure modes: API server request rate and latency, controller reconcile duration, work queue depths, node pressure, and rollout health.
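As a small illustration of tying a signal to a failure mode, the sketch below totals API server requests by HTTP response code from Prometheus-format metrics text. It assumes the input contains `apiserver_request_total` series with a `code` label, as the API server's `/metrics` endpoint exposes; the function name `requests_by_code` is hypothetical.

```shell
# requests_by_code: read Prometheus text-format metrics on stdin and print
# "<http code> <request count>" for each response code, sorted by code.
# Hypothetical helper; only apiserver_request_total series are counted.
requests_by_code() {
  awk '
    index($0, "apiserver_request_total{") == 1 {
      # Extract the digits between code=" and the closing quote.
      if (match($0, /code="[0-9]+"/)) {
        code = substr($0, RSTART + 6, RLENGTH - 7)
        total[code] += $NF   # the last field is the sample value
      }
    }
    END { for (c in total) printf "%s %.0f\n", c, total[c] }
  ' | sort
}

# Possible usage against a live cluster (not executed here):
#   kubectl get --raw /metrics | requests_by_code
```

These are cumulative counters, so a single reading shows totals since the API server started; comparing two readings taken a known interval apart gives a rate.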

Runbooks as canonical texts

Runbooks are doctrine translated into action. They must be executable under stress: concrete commands, expected outputs, and decision points.
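A runbook step can carry all three elements in one place: the command, the output it should produce, and what to do when it does not. The following is a minimal sketch; the function `runbook_step` and its argument order are assumptions made for illustration, not an established tool.

```shell
# runbook_step DESCRIPTION EXPECTED_REGEX NEXT_STEP_IF_FAIL COMMAND [ARGS...]
# Runs COMMAND, compares its output with EXPECTED_REGEX (extended regex),
# and prints either OK or the decision to take next. Hypothetical helper.
runbook_step() {
  desc=$1; expected=$2; on_fail=$3
  shift 3
  # Capture stdout and stderr; a failing command is handled by the check below.
  output=$("$@" 2>&1) || true
  if printf '%s\n' "$output" | grep -Eq -- "$expected"; then
    printf 'OK   %s\n' "$desc"
  else
    printf 'FAIL %s\n  expected: %s\n  next: %s\n' "$desc" "$expected" "$on_fail"
    return 1
  fi
}

# Possible usage against a live cluster (not executed here):
#   runbook_step "API server reports ready" '^ok$' \
#     "Escalate to platform on-call" kubectl get --raw /readyz
```

Writing the expected output as a pattern forces the author to decide in advance what healthy looks like, so the responder under stress does not have to.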

Postmortems are not documents. They are how you update doctrine.

kubectl

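# The 50 most recent events across all namespaces (sorted oldest to newest)
kubectl get events -A --sort-by=.lastTimestamp | tail -n 50
# Node and pod resource usage (requires metrics-server)
kubectl top nodes
kubectl top pods -A | head
# First lines of the API server's own Prometheus metrics
kubectl get --raw /metrics | head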

Common failure modes (and how to avoid them)

The Order sees the same patterns repeat across organizations.

No control plane telemetry → platform incidents misdiagnosed.
Alert storms without ownership → alerts ignored during real events.
Missing change correlation → ‘nothing changed’ becomes the default lie.
Metrics without SLOs → data without decisions.
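To make the last pattern concrete, here is a minimal sketch of turning raw counts into an error-budget figure that can drive a decision. The function name `error_budget`, its arguments, and the output format are assumptions made for illustration; the request and failure counts would come from whatever metrics the team already collects.

```shell
# error_budget SLO_PERCENT TOTAL_REQUESTS FAILED_REQUESTS
# Prints how many failures the SLO tolerates over the window and what share
# of that budget the observed failures have consumed. Hypothetical helper.
error_budget() {
  awk -v slo="$1" -v total="$2" -v failed="$3" 'BEGIN {
    allowed = total * (100 - slo) / 100   # failures the SLO tolerates
    if (allowed <= 0) { print "no error budget defined"; exit 1 }
    used = 100 * failed / allowed         # percent of the budget consumed
    printf "allowed=%.0f failed=%.0f budget_used=%.1f%%\n", allowed, failed, used
  }'
}

# Example: a 99.9% SLO over 1,000,000 requests with 250 failures.
error_budget 99.9 1000000 250
```

With these inputs the budget is 1,000 failed requests and a quarter of it is spent. The same failure count against 100,000 requests would be 250% of budget, which is the kind of difference a bare error counter never shows.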

Canonical Link

Canonical URL: /library/the-dark-orders-guide-to-observability-in-kubernetes