
Chapter 16 · Initiate Tutorials

Operations Handbook

A calm set of rites for when things break: observe, narrow, act, confirm, memorialize.

The operator protocol

A sequence that prevents panic from becoming damage.

  1. Observe: What is the symptom? What changed? What scope is affected?
  2. Narrow: is the fault in a namespace, workload, node, ingress, or dependency?
  3. Act: the smallest safe change that increases signal or restores service.
  4. Confirm: don’t assume; verify rollout and user impact.
  5. Memorialize: capture the lesson as guardrails, alerts, or runbooks.
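The Observe step is easier to keep calm when the first look is recorded before anything changes, and that record also feeds Memorialize. Below is a minimal sketch, assuming a POSIX shell and a kubectl context that already points at the right cluster. The function name observe_snapshot and the incident-<timestamp> directory layout are hypothetical choices, not part of kubectl.

```shell
# Hypothetical helper: record a read-only first look at a namespace
# (the "Observe" step) so the evidence survives for "Memorialize".
# Usage: observe_snapshot <ns> [output-dir]
observe_snapshot() {
  ns="$1"
  dir="${2:-incident-$(date +%Y%m%dT%H%M%S)}"
  mkdir -p "$dir" || return 1
  # What is the symptom? Pod phase, restarts, and node placement.
  kubectl get pods -n "$ns" -o wide > "$dir/pods.txt" 2>&1
  # What changed? Recent events, oldest first.
  kubectl get events -n "$ns" --sort-by=.lastTimestamp > "$dir/events.txt" 2>&1
  # What scope? Desired vs. ready replicas per Deployment.
  kubectl get deployments -n "$ns" -o wide > "$dir/deployments.txt" 2>&1
  echo "$dir"
}
```

Every command in the helper is a read (get), so running it cannot make the incident worse.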

Debugging a failing Pod

Start with describe + events, then logs.

Describe and events

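```shell
# Status, restart count, and age of every pod in the namespace
kubectl get pod -n <ns>
# Spec, container states, probe settings, and recent events for one pod
kubectl describe pod/<pod-name> -n <ns>
# The 30 most recent events in the namespace, newest last
kubectl get events -n <ns> --sort-by=.lastTimestamp | tail -n 30
```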
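In a busy namespace, the event list is mostly noise from other workloads. The sketch below narrows to one pod. The function names pod_events and container_states are hypothetical; the --field-selector involvedObject.name=... filter and the -o jsonpath output format are standard kubectl options.

```shell
# Hypothetical helper: only the events whose subject is this pod.
# Usage: pod_events <ns> <pod-name>
pod_events() {
  kubectl get events -n "$1" \
    --field-selector "involvedObject.name=$2" \
    --sort-by=.lastTimestamp
}

# Hypothetical helper: one line per container with its restart count
# and current state (for example, waiting with reason CrashLoopBackOff).
# Usage: container_states <ns> <pod-name>
container_states() {
  kubectl get pod "$2" -n "$1" -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.restartCount}{"\t"}{.state}{"\n"}{end}'
}
```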

Logs

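```shell
# Logs from the current container instance
kubectl logs -n <ns> <pod-name>
# Multi-container pod: name the container
kubectl logs -n <ns> <pod-name> -c <container-name>
# Logs from the previous instance (the one that crashed before the restart)
kubectl logs -n <ns> <pod-name> --previous
```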
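When a Deployment runs several replicas, reading one pod at a time is slow. The sketch below assumes the pods share a label such as app=<name>. The label, the function name recent_logs, and the 15-minute default window are illustrative assumptions. The flags -l, --all-containers, --since, --tail, and --prefix are standard kubectl logs options.

```shell
# Hypothetical helper: recent logs from every container of every pod
# matching a label selector, each line prefixed with pod/container.
# Usage: recent_logs <ns> <label-selector> [since]
recent_logs() {
  kubectl logs -n "$1" -l "$2" \
    --all-containers=true \
    --since="${3:-15m}" \
    --tail=200 \
    --prefix
}
```

With a selector, kubectl logs otherwise shows only the last 10 lines per container. That is why --tail is set explicitly here.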

Rollouts

Verify progress; don’t assume.

Rollout status + history

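```shell
# Block until the rollout completes or fails
kubectl rollout status deployment/<name> -n <ns>
# List the recorded revisions
kubectl rollout history deployment/<name> -n <ns>
# Roll back to the previous revision if needed:
kubectl rollout undo deployment/<name> -n <ns>
```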
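Act and Confirm can be joined into one guarded step: wait a bounded time for the rollout, and roll back only if it does not complete. The sketch below relies on the fact that rollout status exits non-zero when --timeout elapses or the rollout fails. The name rollout_or_undo and the 120s default are hypothetical choices.

```shell
# Hypothetical helper: wait a bounded time for a rollout; if it does not
# complete, roll back to the previous revision and wait on that too.
# Returns non-zero whenever the new version did not roll out.
# Usage: rollout_or_undo <ns> <deployment> [timeout]
rollout_or_undo() {
  ns="$1"; deploy="$2"; timeout="${3:-120s}"
  if kubectl rollout status "deployment/$deploy" -n "$ns" --timeout="$timeout"; then
    echo "rollout of $deploy complete"
    return 0
  fi
  echo "rollout of $deploy not complete after $timeout; rolling back" >&2
  kubectl rollout undo "deployment/$deploy" -n "$ns" || return 1
  # Confirm the rollback itself rather than assuming it worked.
  kubectl rollout status "deployment/$deploy" -n "$ns" --timeout="$timeout"
  return 1
}
```

A rollback is still a change. The user-impact half of Confirm remains a human check.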

Namespaces

Scope is safety.

Set a default namespace for a context

```shell
kubectl config set-context --current --namespace=<ns>
```
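Before running anything destructive, confirm which namespace the current context actually uses. The command kubectl config view --minify limits the output to the current context. The current_ns wrapper is a hypothetical convenience that prints default when no namespace is set, which matches kubectl's fallback.

```shell
# Hypothetical helper: print the namespace the current context uses.
# An unset namespace means kubectl falls back to "default".
current_ns() {
  ns="$(kubectl config view --minify -o jsonpath='{..namespace}')"
  echo "${ns:-default}"
}
```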

Use namespaces to reduce accidental cross-talk and to structure policies, quotas, and access. Namespaces are not hard security boundaries on their own, because isolation also depends on RBAC, NetworkPolicy, and node-level controls. They do provide essential operational containment.
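A namespace becomes a unit of quota and access once it carries a ResourceQuota and namespaced RoleBindings. The sketch below uses the standard kubectl create quota and kubectl create rolebinding subcommands. The function name, the quota figures, and the group argument are placeholders to adapt, not recommendations.

```shell
# Hypothetical helper: a namespace with a resource ceiling and
# read-only access for one group. Quota values are illustrative.
# Usage: create_team_namespace <ns> <group>
create_team_namespace() {
  kubectl create namespace "$1" || return 1
  # Caps the pod count and the aggregate CPU/memory requests and limits.
  kubectl create quota "$1-quota" -n "$1" \
    --hard=pods=20,requests.cpu=4,requests.memory=8Gi,limits.cpu=8,limits.memory=16Gi
  # Binds the built-in "view" ClusterRole inside this namespace only.
  kubectl create rolebinding "$1-viewers" -n "$1" \
    --clusterrole=view --group="$2"
}
```

Once a quota constrains compute resources, the API server rejects pods that do not declare those requests and limits, unless a LimitRange supplies defaults. For a Deployment this shows up as FailedCreate events and a rollout that never progresses.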

Common failure modes

Name the pattern; shorten the incident.

  • ImagePullBackOff: wrong image name or tag, missing registry credentials, blocked egress to the registry.
  • CrashLoopBackOff: app failing at startup, missing config, liveness probe too aggressive.
  • Pending: scheduling constraints unmet, insufficient CPU/memory, taints not tolerated.
  • 503 at ingress: endpoints missing, readiness failing, service selector mismatch.
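Naming the pattern can be made mechanical by mapping the observed reason to the first read-only command worth running. Below is a sketch. first_check is a hypothetical helper, and the <ns>, <pod-name>, and <service> placeholders are left for the operator to fill in, as in the commands earlier in this chapter.

```shell
# Hypothetical helper: map an observed symptom to a first read-only check.
# Usage: first_check <ImagePullBackOff|CrashLoopBackOff|Pending|503>
first_check() {
  case "$1" in
    ImagePullBackOff|ErrImagePull)
      # Events show the exact image reference and the pull error.
      echo 'kubectl describe pod/<pod-name> -n <ns>' ;;
    CrashLoopBackOff)
      # Output of the instance that just died, not the restarting one.
      echo 'kubectl logs -n <ns> <pod-name> --previous' ;;
    Pending)
      # The FailedScheduling event says which constraint no node satisfies.
      echo 'kubectl describe pod/<pod-name> -n <ns>' ;;
    503)
      # An empty ENDPOINTS column means no ready pod matches the selector.
      echo 'kubectl get endpoints <service> -n <ns>' ;;
    *)
      echo 'kubectl get events -n <ns> --sort-by=.lastTimestamp' ;;
  esac
}
```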