Advanced Disciplines
The Dark Order’s Guide to Observability in Kubernetes
Observability is not dashboards. It is the discipline of evidence: the ability to prove what happened, what changed, and why the system behaved as it did.
Authored as doctrine; evaluated as systems craft.
Doctrine: evidence before narrative
During incidents, teams tell stories. Observability is how you replace stories with evidence.
The Order’s rule: if you cannot answer ‘what changed’ and ‘where is the bottleneck’, you do not yet have an observability system.
- Events for object-level narrative.
- Metrics for saturation and error budgets.
- Traces for causality across boundaries.
- Audit for governance and attribution.
Control plane visibility is mandatory
Clusters fail through the control plane: API latency, admission failures, scheduler stalls, etcd pressure. If you only observe workloads, you will misdiagnose platform incidents as application incidents.
A serious shrine keeps watch over the gate of truth.
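One way to keep that watch, sketched under assumptions: the API server exposes its own metrics in Prometheus text format, and a small filter can narrow them to request, etcd, and admission latency. The function name `control_plane_signals` is hypothetical, and the metric names are the ones commonly exposed by recent Kubernetes releases; verify them against your own cluster's `/metrics` output before relying on them.

```shell
# Hypothetical helper: reads Prometheus text-format metrics on stdin and keeps
# only the _sum/_count series for API request, etcd, and admission latency.
# Metric names are assumptions; check them against your cluster version.
control_plane_signals() {
  grep -E '^(apiserver_request_duration_seconds|etcd_request_duration_seconds|apiserver_admission_controller_admission_duration_seconds)_(sum|count)[{ ]'
}

# Usage against a live cluster (requires permission to read /metrics):
#   kubectl get --raw /metrics | control_plane_signals | head -n 20
#   kubectl get --raw '/readyz?verbose'
```

This only reads what the API server reports about itself. Scheduler and etcd expose their own endpoints, which are not covered by this sketch.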
Signal quality: the quiet art
Too many logs is not observability; it is storage debt. Too many alerts is not readiness; it is learned helplessness.
Align signals with failure modes: control plane QPS/latency, reconcile duration, queue depths, node pressure, and rollout health.
Runbooks as canonical texts
Runbooks are doctrine translated into action. They must be executable under stress: concrete commands, expected outputs, and decision points.
Postmortems are not documents. They are how you update doctrine.
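A runbook step that is executable under stress can be sketched as a command paired with its expected output and a decision when the two disagree. The helper name `runbook_step` and its argument order are hypothetical, chosen for illustration; this is a minimal sketch, not a prescribed format.

```shell
# Hypothetical helper: runbook_step NAME EXPECTED_REGEX COMMAND [ARGS...]
# Runs the command, compares its output to the expected pattern, and
# reports PASS or FAIL so the operator knows which branch to take.
runbook_step() {
  _name=$1
  _expected=$2
  shift 2
  # Capture stdout and stderr; the decision is based on the output alone,
  # so a non-zero exit status from the command is tolerated here.
  _output=$("$@" 2>&1) || true
  if printf '%s\n' "$_output" | grep -Eq -- "$_expected"; then
    echo "PASS  $_name"
    return 0
  fi
  echo "FAIL  $_name (expected output matching: $_expected)"
  # Show the last lines of what was actually observed.
  printf '%s\n' "$_output" | tail -n 5
  return 1
}

# Usage against a live cluster, with the decision point made explicit:
#   runbook_step "API server ready" 'readyz check passed' \
#     kubectl get --raw '/readyz?verbose' \
#     || echo "Decision: treat as a platform incident, not an application incident"
```

Keeping the expected output next to the command means the step records evidence either way: a PASS line, or the observed output that contradicted the expectation.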
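kubectl
```shell
# Most recent events across all namespaces
kubectl get events -A --sort-by=.lastTimestamp | tail -n 50
# Node and pod resource usage (requires the Metrics API, e.g. metrics-server)
kubectl top nodes
kubectl top pods -A | head
# Raw API server metrics in Prometheus text format
kubectl get --raw /metrics | head
```
Common failure modes (and how to avoid them)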
The Order sees the same patterns repeat across organizations.
- No control plane telemetry → platform incidents misdiagnosed.
- Alert storms without ownership → alerts ignored during real events.
- Missing change correlation → ‘nothing changed’ becomes the default lie.
- Metrics without SLOs → data without decisions.
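To make the last pattern concrete, an SLO turns a raw failure count into a decision by stating how many failures are allowed. The following is a minimal sketch of that arithmetic; the function name `error_budget_remaining` and the request-count inputs are assumptions for illustration, and real SLOs are usually evaluated over a rolling window by the monitoring system.

```shell
# Hypothetical helper: error_budget_remaining SLO_PERCENT TOTAL_REQUESTS FAILED_REQUESTS
# Prints the percentage of the error budget still unspent for the period.
error_budget_remaining() {
  awk -v slo="$1" -v total="$2" -v failed="$3" 'BEGIN {
    # Failures the SLO permits for this many requests.
    allowed = total * (100 - slo) / 100
    if (allowed <= 0) { print "no error budget defined"; exit 1 }
    printf "%.1f\n", (1 - failed / allowed) * 100
  }'
}

# Example: a 99% SLO over 10000 requests allows 100 failures.
# With 25 failures observed, 75% of the budget remains.
#   error_budget_remaining 99 10000 25    # prints 75.0
```

A negative result means the budget is overspent, which is the point at which the metric stops being data and becomes a decision: pause risky changes, or accept the risk explicitly.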
Canonical Link
Canonical URL: /library/the-dark-orders-guide-to-observability-in-kubernetes