Atlas: Pods in CrashLoopBackOff
Symptom → evidence → resolution.
Symptom
Pods repeatedly restart; status shows CrashLoopBackOff.
Workloads · Operations · Reliability
What this usually means
The container starts and then stops. Kubernetes retries it. Your job is to identify whether the stop is deliberate (process exits) or imposed (probe, OOM, eviction).
Likely causes
Most crash loops fall into a small set of classes. Classify before you change anything.
- Process exits immediately (bad args, missing files, missing env).
- Probe-driven restarts (liveness too aggressive, startup too slow).
- OOMKilled or eviction (memory pressure, node pressure).
- Dependency failure (DNS, service routing, auth, secret/config).
What to inspect first
Collect evidence in the order that preserves attribution.
- Exit code and reason in container state (Error vs OOMKilled).
- Events for probe failures, mounts, permission denials.
- Restart cadence: fast loops correlate with immediate exit or probe misconfig.
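As a triage aid, the exit code from the container's last termination state maps to a likely class. A minimal sketch, assuming the common convention that codes above 128 encode a fatal signal (128 + signal number); the `classify_exit` helper name and its messages are illustrative, not a standard tool:

```shell
# classify_exit: map a container exit code (from `kubectl describe pod`,
# under Last State: Terminated, or .state.terminated.exitCode) to the
# most likely crash-loop class. Mapping reflects common conventions.
classify_exit() {
  case "$1" in
    0)   echo "clean exit: process finished; check command and restartPolicy" ;;
    1)   echo "application error: read logs with --previous" ;;
    126) echo "command not executable: check entrypoint and file permissions" ;;
    127) echo "command not found: check args, PATH, and mounted files" ;;
    137) echo "SIGKILL (128+9): often OOMKilled or a liveness-probe kill" ;;
    143) echo "SIGTERM (128+15): shutdown requested, e.g. eviction or rollout" ;;
    *)   echo "exit $1: if greater than 128, subtract 128 to get the signal" ;;
  esac
}

classify_exit 137
```

Note that 137 is ambiguous on its own: cross-check the Reason field (OOMKilled vs Error) and the events before acting on it.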
kubectl
kubectl get pod <pod> -n <ns> -o wide
kubectl describe pod <pod> -n <ns>
kubectl logs <pod> -n <ns> --previous --all-containers=true
Resolution guidance
Apply the smallest safe fix, one change at a time. Verify that the pod converges to Running, and record which change fixed it.
- Probe-driven: adjust initialDelaySeconds / timeoutSeconds / failureThreshold; confirm startup behavior.
- Config-driven: verify ConfigMap/Secret keys and references; avoid silent defaults.
- OOM: increase requests/limits or reduce memory usage; confirm node pressure and eviction signals.
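The knobs named above live on the container spec. An illustrative Deployment fragment, with placeholder names, image, port, and values; tune the numbers to your service's measured startup time and memory footprint:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app              # hypothetical name
spec:
  template:
    spec:
      containers:
        - name: app
          image: example/app:1.0 # placeholder image
          resources:
            requests:
              memory: "256Mi"
            limits:
              memory: "512Mi"    # raise if OOMKilled at the old limit
          startupProbe:          # gates liveness until startup completes
            httpGet: {path: /healthz, port: 8080}
            periodSeconds: 10
            failureThreshold: 30 # 30 x 10s = up to 5 min to start
          livenessProbe:
            httpGet: {path: /healthz, port: 8080}
            initialDelaySeconds: 10
            timeoutSeconds: 2
            failureThreshold: 3  # avoid killing on a single slow response
```

A startup probe is usually the cleaner fix for slow starters than inflating initialDelaySeconds, because liveness checks stay tight once the app is up.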
Related
Canonical link
Canonical URL: /atlas/pods-crashloopbackoff