Operations
Operations is a protocol. Under pressure you need sequences that preserve attribution, so that every effect can be traced back to the action that caused it: observe → narrow → act → confirm → memorialize.
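The five-step sequence can be read as a small state machine. The Python below is an illustrative sketch, not a prescribed tool. The `Step` names, the `IncidentLog` class, and the two loop-backs (narrowing may send you back to observation, and a failed confirmation restarts the loop) are hypothetical choices. It shows the attribution property: every action sits in an append-only record, after the evidence that prompted it and before the check that confirmed it.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Step(Enum):
    OBSERVE = "observe"
    NARROW = "narrow"
    ACT = "act"
    CONFIRM = "confirm"
    MEMORIALIZE = "memorialize"


# Hypothetical transition table: forward by default, with two loop-backs.
ALLOWED = {
    None: {Step.OBSERVE},                            # every incident starts with evidence
    Step.OBSERVE: {Step.NARROW},
    Step.NARROW: {Step.ACT, Step.OBSERVE},           # not enough signal yet: observe again
    Step.ACT: {Step.CONFIRM},                        # every action must be verified
    Step.CONFIRM: {Step.MEMORIALIZE, Step.OBSERVE},  # fix did not hold: restart the loop
    Step.MEMORIALIZE: set(),                         # terminal: the record is closed
}


@dataclass
class IncidentLog:
    entries: list = field(default_factory=list)
    last: Optional[Step] = None

    def record(self, step: Step, note: str) -> None:
        """Append a step, refusing any step the protocol does not allow next."""
        if step not in ALLOWED[self.last]:
            prev = self.last.value if self.last else "start"
            raise ValueError(f"{step.value} cannot follow {prev}")
        self.entries.append((len(self.entries), step, note))
        self.last = step


# Example run with hypothetical incident notes.
log = IncidentLog()
log.record(Step.OBSERVE, "5xx rate up on checkout")
log.record(Step.NARROW, "errors only from pods in the new ReplicaSet")
log.record(Step.ACT, "roll back to the previous revision")
log.record(Step.CONFIRM, "5xx rate back to baseline")
log.record(Step.MEMORIALIZE, "timeline and cause written up")
```

Jumping from observe straight to act raises `ValueError`. That is the point: an action with no narrowing step in front of it has nothing it can be attributed to.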
What this module covers
Rites that survive incidents.
- Evidence-first workflows: describe, events, and logs first, then targeted diffs and rollouts.
- Change discipline: smallest safe fixes, reversible rollouts, and explicit verification.
- Containment: reducing blast radius while you regain signal.
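As a sketch of the evidence-first ordering, the Python below only assembles `kubectl` command lines and never runs them. It assumes a Kubernetes cluster. The pod, Deployment, namespace, and manifest names are hypothetical placeholders, as are the helper names `evidence` and `change`. Read-only commands come first. The change is previewed with `kubectl diff` before it is applied, and verification is an explicit `kubectl rollout status` step. The rollback command is computed up front, so reversibility is planned rather than improvised.

```python
import shlex
from typing import List, Tuple

Cmd = List[str]


def evidence(pod: str, ns: str) -> List[Cmd]:
    """Read-only commands: gather signal before touching anything."""
    return [
        ["kubectl", "describe", "pod", pod, "-n", ns],
        ["kubectl", "get", "events", "-n", ns,
         "--field-selector", f"involvedObject.name={pod}",
         "--sort-by=.lastTimestamp"],
        # Logs from the previous container instance, useful after a crash.
        ["kubectl", "logs", pod, "-n", ns, "--previous"],
    ]


def change(deploy: str, ns: str, manifest: str) -> Tuple[List[Cmd], Cmd]:
    """Preview, apply, verify; return the rollback alongside the plan."""
    plan = [
        ["kubectl", "diff", "-f", manifest, "-n", ns],   # inspect the exact delta first
        ["kubectl", "apply", "-f", manifest, "-n", ns],
        ["kubectl", "rollout", "status", f"deployment/{deploy}",
         "-n", ns, "--timeout=120s"],                   # explicit verification
    ]
    rollback = ["kubectl", "rollout", "undo", f"deployment/{deploy}", "-n", ns]
    return plan, rollback


if __name__ == "__main__":
    # Hypothetical names; print the runbook instead of executing it.
    steps, undo = change("checkout", "shop", "checkout.yaml")
    for cmd in evidence("checkout-7d9f-abcde", "shop") + steps:
        print(shlex.join(cmd))
    print("# if verification fails:", shlex.join(undo))
```

Keeping the rollback next to the plan is the design choice worth copying: if `rollout status` times out, the reversal is already written down.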
Readings
Operator-grade and precise.
Chapter
Chapter 16: Operations Handbook
Debugging pods, rollouts, logs, events, namespaces, and failure modes.
Text
Codex Gigas: Incident Doctrine for Platform Teams
Containment, communication, reversibility, and the discipline of truth.
Text
Codex Gigas: Debugging the Control Plane Under Pressure
API saturation, admission failures, and controller churn, and how not to amplify the outage.
Practice
Exercises that build calm execution.
Next
Continue with security and reliability posture.