Atlas: OOMKilled and Evictions
Symptom → evidence → resolution.
Symptom
Containers terminate with OOMKilled, or pods are evicted under memory/disk pressure.
Workloads · Reliability · Operations
What this usually means
Either the container exceeded its memory limit and the kernel's cgroup OOM killer terminated it, or the node came under memory/disk pressure and the kubelet evicted pods to reclaim resources. The fix depends on which mechanism fired.
What to inspect first
Read container termination reason, then check node pressure.
- Distinguish OOMKilled vs Evicted.
- Check requests/limits; starvation can look like instability.
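A quick way to tell the two mechanisms apart is to read the pod's JSON status: evictions carry the reason at the top of `.status`, while cgroup OOM kills appear on a container's `lastState`. A minimal sketch (the `classify_pod` helper and its plain-`grep` parsing are illustrative, not a standard tool):

```shell
# Sketch: classify a pod's failure mode from `kubectl get pod <pod> -n <ns> -o json`.
# Evicted pods report reason=Evicted in .status; cgroup OOM kills show up as
# lastState.terminated.reason=OOMKilled on a container status.
classify_pod() {
  status_json=$(cat)   # read the pod JSON from stdin
  if printf '%s' "$status_json" | grep -q '"reason": *"Evicted"'; then
    echo "evicted (node pressure)"
  elif printf '%s' "$status_json" | grep -q '"reason": *"OOMKilled"'; then
    echo "oomkilled (cgroup memory limit)"
  else
    echo "no terminal memory condition found"
  fi
}
```

Usage: `kubectl get pod <pod> -n <ns> -o json | classify_pod`.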
```shell
kubectl describe pod <pod> -n <ns>
# Look for lastState.terminated.reason=OOMKilled
kubectl describe node <node>
# Optional if metrics-server is installed:
kubectl top pod -n <ns>
kubectl top node
```
Likely causes
Memory issues are often a blend of application behavior and scheduling economics.
- Limits too low for real peak usage; memory spikes during warm-up or GC.
- Requests too low, causing oversubscription and eviction pressure.
- Node-level pressure from many co-located workloads or system daemons.
- Large page cache or ephemeral storage pressure presenting as memory pressure.
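The oversubscription point can be made concrete with a little arithmetic; the numbers below are invented for illustration:

```shell
# Sketch with invented numbers: four pods, each requesting 512Mi but
# limited to 2Gi, on a node with ~7.5Gi allocatable memory.
allocatable_mib=7500
requests_mib=$((512 * 4))   # what the scheduler accounts for when placing pods
limits_mib=$((2048 * 4))    # what the pods are actually allowed to use
echo "requests: ${requests_mib} MiB, limits: ${limits_mib} MiB, allocatable: ${allocatable_mib} MiB"
if [ "$limits_mib" -gt "$allocatable_mib" ]; then
  echo "overcommitted: simultaneous peaks can push the node into memory pressure"
fi
```

The scheduler happily places all four pods (2048 MiB of requests fits), yet their combined limits exceed the node, so simultaneous peaks end in pressure and evictions even though no single pod broke its limit.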
Resolution guidance
Right-size intentionally. Treat requests as promises.
- Increase limits only after confirming true usage; otherwise you mask leaks.
- Set realistic requests to avoid noisy-neighbor eviction cascades.
- If node pressure is systemic, add capacity or separate workload classes with taints/affinity.
- Use HPA/VPA only with a clear posture; automation without constraints is chaos.
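Right-sizing usually ends in an explicit resources stanza. A minimal sketch with illustrative values only; size from observed peak usage, not guesses:

```yaml
# Illustrative numbers; measure real peaks before committing.
resources:
  requests:
    memory: "512Mi"   # the promise: what the scheduler reserves on the node
  limits:
    memory: "1Gi"     # the cap: exceeding it gets the container OOMKilled
```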
Canonical URL: /atlas/oomkilled-and-evictions