Atlas: Pods in CrashLoopBackOff
Symptom → evidence → resolution.
Symptom
Pods repeatedly restart; status shows CrashLoopBackOff.
Workloads · Operations · Reliability
What this usually means
The container starts and then stops. Kubernetes retries it. Your job is to identify whether the stop is deliberate (process exits) or imposed (probe, OOM, eviction).
Likely causes
Most crash loops fall into a small set of classes. Classify before you change anything.
- Process exits immediately (bad args, missing files, missing env).
- Probe-driven restarts (liveness too aggressive, startup too slow).
- OOMKilled or eviction (memory pressure, node pressure).
- Dependency failure (DNS, service routing, auth, secret/config).
What to inspect first
Collect evidence in the order that preserves attribution.
- Exit code and reason in container state (Error vs OOMKilled).
- Events for probe failures, mounts, permission denials.
- Restart cadence: fast loops correlate with immediate exit or probe misconfig.
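As a triage aid, the exit code from the container's last termination state maps to a likely class. A minimal sketch, assuming the common convention that codes above 128 encode a fatal signal (128 + signal number); the `classify_exit` helper name and its messages are illustrative, not a standard tool:

```shell
# classify_exit: map a container exit code (from `kubectl describe pod`,
# under Last State: Terminated, or .state.terminated.exitCode) to the
# most likely crash-loop class. Mapping reflects common conventions.
classify_exit() {
  case "$1" in
    0)   echo "clean exit: process finished; check command and restartPolicy" ;;
    1)   echo "application error: read logs with --previous" ;;
    126) echo "command not executable: check entrypoint and file permissions" ;;
    127) echo "command not found: check args, PATH, and mounted files" ;;
    137) echo "SIGKILL (128+9): often OOMKilled or a liveness-probe kill" ;;
    143) echo "SIGTERM (128+15): shutdown requested, e.g. eviction or rollout" ;;
    *)   echo "exit $1: if greater than 128, subtract 128 to get the signal" ;;
  esac
}

classify_exit 137
```

Note that 137 is ambiguous on its own: cross-check the Reason field (OOMKilled vs Error) and the events before acting on it.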
kubectl
kubectl get pod <pod> -n <ns> -o wide
kubectl describe pod <pod> -n <ns>
kubectl logs <pod> -n <ns> --previous --all-containers=true
Resolution guidance
Apply the smallest safe fix, one change at a time. Verify that the pod converges to Running, and record which change fixed it.
- Probe-driven: adjust initialDelaySeconds / timeoutSeconds / failureThreshold; confirm startup behavior.
- Config-driven: verify ConfigMap/Secret keys and references; avoid silent defaults.
- OOM: increase requests/limits or reduce memory usage; confirm node pressure and eviction signals.
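The knobs named above live on the container spec. An illustrative Deployment fragment, with placeholder names, image, port, and values; tune the numbers to your service's measured startup time and memory footprint:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app              # hypothetical name
spec:
  template:
    spec:
      containers:
        - name: app
          image: example/app:1.0 # placeholder image
          resources:
            requests:
              memory: "256Mi"
            limits:
              memory: "512Mi"    # raise if OOMKilled at the old limit
          startupProbe:          # gates liveness until startup completes
            httpGet: {path: /healthz, port: 8080}
            periodSeconds: 10
            failureThreshold: 30 # 30 x 10s = up to 5 min to start
          livenessProbe:
            httpGet: {path: /healthz, port: 8080}
            initialDelaySeconds: 10
            timeoutSeconds: 2
            failureThreshold: 3  # avoid killing on a single slow response
```

A startup probe is usually the cleaner fix for slow starters than inflating initialDelaySeconds, because liveness checks stay tight once the app is up.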
Related
Canonical link
Canonical URL: /atlas/pods-crashloopbackoff