Atlas: Node NotReady

Symptom

A node reports NotReady; pods may be evicted, stuck, or unreachable depending on the failure.

Operations, Reliability, Security

What this usually means

The control plane has stopped receiving healthy heartbeats from the node, or the node is reporting a condition that makes it unsafe to schedule work onto. Treat it as a partial partition until proven otherwise.
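
One way to look at the heartbeat directly is sketched below. It assumes node leases are in use (the default on current Kubernetes versions), where the kubelet renews a Lease named after the node in the kube-node-lease namespace. The function name node_last_heartbeat is illustrative, not part of kubectl.

```shell
# Sketch: print the time the node last renewed its heartbeat lease.
# Assumption: node leases are enabled, so a Lease with the node's name
# exists in the kube-node-lease namespace.
node_last_heartbeat() {
  kubectl get lease "$1" -n kube-node-lease \
    -o jsonpath='{.spec.renewTime}{"\n"}'
}
```

Call it as `node_last_heartbeat worker-1`, where worker-1 is a hypothetical node name. A renew time that keeps advancing while the node still shows NotReady points away from a network partition and toward a condition reported by the node itself.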

What to inspect first

Contain the impact. Then read conditions.

Check conditions: Ready, MemoryPressure, DiskPressure, NetworkUnavailable.
Check which workloads are on the node.

kubectl

kubectl get nodes -o wide
kubectl describe node <node>
# Optional containment:
kubectl cordon <node>
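
The two checks listed above (conditions, and workloads on the node) can be narrowed to compact output. This is a minimal sketch; the function names node_conditions and pods_on_node are illustrative helpers, not kubectl subcommands.

```shell
# Print one "Type=Status (Reason)" line per node condition,
# e.g. Ready, MemoryPressure, DiskPressure, NetworkUnavailable.
node_conditions() {
  kubectl get node "$1" \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status} ({.reason}){"\n"}{end}'
}

# List every pod scheduled on the node, across all namespaces.
pods_on_node() {
  kubectl get pods --all-namespaces -o wide \
    --field-selector "spec.nodeName=$1"
}
```

Run `node_conditions worker-1` and then `pods_on_node worker-1` (worker-1 is a hypothetical node name) to see what the node reports and what is at risk before deciding whether to cordon.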

Likely causes

NotReady can stem from network, kubelet, disk, or container runtime problems.

Network partition between node and API server.
Disk pressure (node filesystem full; image garbage collection failing).
CNI failure causing NetworkUnavailable.
Container runtime issues; kubelet cannot manage pods.
Host maintenance or kernel-level instability.
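
As a rough triage aid, a reported condition can be mapped to a first hypothesis from the list above. The sketch below is illustrative only: condition_hint is a hypothetical helper, the hint wording is an assumption, and a hint is a starting point rather than a diagnosis.

```shell
# Map one "Type=Status" condition string to a first hypothesis.
# Ready=Unknown means the control plane has stopped hearing from the node;
# Ready=False means the kubelet is reporting but considers itself unhealthy.
condition_hint() {
  case "$1" in
    Ready=Unknown)           echo "no heartbeat: suspect a network partition or a stopped kubelet" ;;
    Ready=False)             echo "kubelet unhealthy: check the container runtime and kubelet logs" ;;
    DiskPressure=True)       echo "disk pressure: free space and check image garbage collection" ;;
    MemoryPressure=True)     echo "memory pressure: node is low on available memory" ;;
    NetworkUnavailable=True) echo "network unavailable: check the CNI plugin on this node" ;;
    *)                       echo "no hint" ;;
  esac
}
```

For example, `condition_hint DiskPressure=True` prints the disk pressure hint, and any condition in its healthy state falls through to "no hint".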

Resolution guidance

Prefer containment and controlled evacuation over heroic intervention.

If the node is unhealthy, drain: `kubectl drain <node> --ignore-daemonsets --delete-emptydir-data` (validate flags for your posture).
Fix disk pressure by freeing space, then verify that the kubelet clears the DiskPressure condition.
If CNI is failing, restore the node networking plane before expecting pods to be reachable.
Replace compromised nodes rather than patching them indefinitely; treat nodes as cattle, not pets.
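
A controlled evacuation can be sketched as cordon, then drain, then confirm what is left. The function name evacuate_node and the 300s timeout are assumptions for illustration; the drain flags are the ones given above and should still be validated for your posture.

```shell
# Sketch: cordon, drain, then list what is still on the node.
# Stops at the first failing step so a failed drain is not masked.
evacuate_node() {
  # Stop new pods from being scheduled onto the node.
  kubectl cordon "$1" || return 1
  # Evict workloads; the timeout value is an assumed example, tune it.
  kubectl drain "$1" --ignore-daemonsets --delete-emptydir-data \
    --timeout=300s || return 1
  # DaemonSet-managed pods are expected to remain after a drain.
  kubectl get pods --all-namespaces -o wide \
    --field-selector "spec.nodeName=$1"
}
```

Call it as `evacuate_node worker-1` (worker-1 is a hypothetical node name). If the drain step fails, for example because a PodDisruptionBudget blocks an eviction, the function returns non-zero and leaves the node cordoned for a human to review.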
