Atlas: OOMKilled and Evictions
Symptom → evidence → resolution.
Symptom
Containers terminate with OOMKilled, or pods are evicted under memory/disk pressure.
Workloads · Reliability · Operations
What this usually means
Either the container exceeded its memory limit and the kernel's cgroup OOM killer terminated it, or the node came under memory/disk pressure and the kubelet evicted pods to reclaim resources. The fix depends on which mechanism fired.
What to inspect first
Read container termination reason, then check node pressure.
- Distinguish OOMKilled vs Evicted.
- Check requests/limits; starvation can look like instability.
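A quick way to tell the two mechanisms apart is to read the pod's JSON status: evictions carry the reason at the top of `.status`, while cgroup OOM kills appear on a container's `lastState`. A minimal sketch (the `classify_pod` helper and its plain-`grep` parsing are illustrative, not a standard tool):

```shell
# Sketch: classify a pod's failure mode from `kubectl get pod <pod> -n <ns> -o json`.
# Evicted pods report reason=Evicted in .status; cgroup OOM kills show up as
# lastState.terminated.reason=OOMKilled on a container status.
classify_pod() {
  status_json=$(cat)   # read the pod JSON from stdin
  if printf '%s' "$status_json" | grep -q '"reason": *"Evicted"'; then
    echo "evicted (node pressure)"
  elif printf '%s' "$status_json" | grep -q '"reason": *"OOMKilled"'; then
    echo "oomkilled (cgroup memory limit)"
  else
    echo "no terminal memory condition found"
  fi
}
```

Usage: `kubectl get pod <pod> -n <ns> -o json | classify_pod`.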
```shell
kubectl describe pod <pod> -n <ns>
# Look for lastState.terminated.reason=OOMKilled
kubectl describe node <node>
# Optional if metrics-server is installed:
kubectl top pod -n <ns>
kubectl top node
```
Likely causes
Memory issues are often a blend of application behavior and scheduling economics.
- Limits too low for real peak usage; memory spikes during warm-up or GC.
- Requests too low, causing oversubscription and eviction pressure.
- Node-level pressure from many co-located workloads or system daemons.
- Large page cache or ephemeral storage pressure presenting as memory pressure.
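The oversubscription point can be made concrete with a little arithmetic; the numbers below are invented for illustration:

```shell
# Sketch with invented numbers: four pods, each requesting 512Mi but
# limited to 2Gi, on a node with ~7.5Gi allocatable memory.
allocatable_mib=7500
requests_mib=$((512 * 4))   # what the scheduler accounts for when placing pods
limits_mib=$((2048 * 4))    # what the pods are actually allowed to use
echo "requests: ${requests_mib} MiB, limits: ${limits_mib} MiB, allocatable: ${allocatable_mib} MiB"
if [ "$limits_mib" -gt "$allocatable_mib" ]; then
  echo "overcommitted: simultaneous peaks can push the node into memory pressure"
fi
```

The scheduler happily places all four pods (2048 MiB of requests fits), yet their combined limits exceed the node, so simultaneous peaks end in pressure and evictions even though no single pod broke its limit.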
Resolution guidance
Right-size intentionally. Treat requests as promises.
- Increase limits only after confirming true usage; otherwise you mask leaks.
- Set realistic requests to avoid noisy-neighbor eviction cascades.
- If node pressure is systemic, add capacity or separate workload classes with taints/affinity.
- Use HPA/VPA only with a clear posture; automation without constraints is chaos.
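Right-sizing usually ends in an explicit resources stanza. A minimal sketch with illustrative values only; size from observed peak usage, not guesses:

```yaml
# Illustrative numbers; measure real peaks before committing.
resources:
  requests:
    memory: "512Mi"   # the promise: what the scheduler reserves on the node
  limits:
    memory: "1Gi"     # the cap: exceeding it gets the container OOMKilled
```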
Canonical URL: /atlas/oomkilled-and-evictions