Chapter 16 · Initiate Tutorials
Operations Handbook
A calm set of rites for when things break: observe, narrow, act, confirm, memorialize.
The operator protocol
A sequence that prevents panic from becoming damage.
- Observe: what is the symptom? what changed? what scope is affected?
- Narrow: namespace, workload, node, ingress, or dependency?
- Act: the smallest safe change that increases signal or restores service.
- Confirm: don’t assume; verify rollout and user impact.
- Memorialize: capture the lesson as guardrails, alerts, or runbooks.
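The Confirm step can be sketched as a bounded wait rather than an open-ended one. This is a minimal illustration, not a prescribed procedure: the function name `confirm_rollout` and the 120s default are assumptions, while `kubectl rollout status` and its `--timeout` flag are standard kubectl.

```shell
# confirm_rollout: wait for a Deployment rollout, but only for a bounded time.
# Returns 0 if the rollout completed, non-zero otherwise.
# The function name and the 120s default are assumptions; tune to your rollout.
confirm_rollout() {
  name=$1 ns=$2 timeout=${3:-120s}
  if kubectl rollout status "deployment/$name" -n "$ns" --timeout="$timeout"; then
    echo "confirmed: deployment/$name rolled out"
  else
    echo "not confirmed: deployment/$name did not finish within $timeout" >&2
    return 1
  fi
}
```

A completed rollout only shows that the new Pods became ready. User impact still needs its own check, such as a request against the service or a look at error-rate dashboards.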
Debugging a failing Pod
Start with describe + events, then logs.
Describe and events
shell
kubectl get pod -n <ns>
kubectl describe pod/<pod-name> -n <ns>
kubectl get events -n <ns> --sort-by=.lastTimestamp | tail -n 30
Logs
shell
kubectl logs -n <ns> <pod-name>
# If multi-container:
kubectl logs -n <ns> <pod-name> -c <container-name>
# Previous crash:
kubectl logs -n <ns> <pod-name> --previous
Rollouts
Verify progress; don’t assume.
Rollout status + history
shell
kubectl rollout status deployment/<name> -n <ns>
kubectl rollout history deployment/<name> -n <ns>
# Roll back if needed:
kubectl rollout undo deployment/<name> -n <ns>
Namespaces
Scope is safety.
Set a default namespace for a context
shell
kubectl config set-context --current --namespace=<ns>
Use namespaces to reduce accidental cross-talk and to structure policies, quotas, and access. Namespaces are not hard security boundaries by themselves, but they create essential operational containment.
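Because a default namespace is invisible until something goes wrong, it can help to print it before acting. The sketch below wraps the documented `kubectl config view --minify` query in a helper. The name `current_ns` is an assumption, and the fallback to `default` reflects kubectl's behavior when the context sets no namespace.

```shell
# current_ns: print the namespace of the current context.
# Falls back to "default" when the context does not set one.
# The helper name is hypothetical; the kubectl query is the standard one.
current_ns() {
  ns=$(kubectl config view --minify --output 'jsonpath={..namespace}')
  echo "${ns:-default}"
}
```

Calling `current_ns` before a destructive command, or passing `-n` explicitly, keeps the scope of a change visible in the command history.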
Common failure modes
Name the pattern; shorten the incident.
- ImagePullBackOff: wrong image name, missing registry creds, blocked egress.
- CrashLoopBackOff: app failing at startup, config missing, probe too aggressive.
- Pending: scheduling constraints unmet, insufficient CPU/memory, taints not tolerated.
- 503 at ingress: endpoints missing, readiness failing, service selector mismatch.
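The patterns above can be paired with a first command to run. The mapping below is an illustrative assumption built from this chapter's own commands, not a complete diagnosis; the helper name `first_check` is hypothetical, and it only prints a suggestion rather than running anything.

```shell
# first_check: print a suggested first command for a named failure pattern.
# Arguments: pattern, namespace, object name (a Pod, or a Service for 503).
# The mapping is an assumption for illustration, not an exhaustive runbook.
first_check() {
  pattern=$1 ns=$2 name=$3
  case "$pattern" in
    ImagePullBackOff|ErrImagePull)
      # Pull errors (bad name, auth, network) show up in the Pod's events.
      echo "kubectl describe pod/$name -n $ns" ;;
    CrashLoopBackOff)
      # The crashed container's last output is in the previous log.
      echo "kubectl logs -n $ns $name --previous" ;;
    Pending)
      # Scheduler reasons (resources, taints) appear as Pod events.
      echo "kubectl describe pod/$name -n $ns" ;;
    503)
      # Here the name is the Service: an empty endpoint list means
      # no ready Pods matched its selector.
      echo "kubectl get endpoints $name -n $ns" ;;
    *)
      # Unknown pattern: fall back to recent events in the namespace.
      echo "kubectl get events -n $ns --sort-by=.lastTimestamp" ;;
  esac
}
```

For example, `first_check Pending prod web-0` prints `kubectl describe pod/web-0 -n prod`.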