Evidence First
Troubleshooting Atlas
A searchable index of common Kubernetes failures. Each entry follows the same doctrine: symptom → likely causes → what to inspect → commands → resolution → related reading.
How to use the atlas
A small protocol that prevents thrash.
- Confirm the symptom precisely (don’t generalize).
- Run the inspection commands and collect evidence.
- Choose the smallest safe fix and verify convergence.
- Follow related readings to strengthen the underlying model.
Entries
13 diagnostic texts · built for search and speed
TroubleshootAtlas: Pods in CrashLoopBackOff
CrashLoopBackOff is a symptom. This entry provides a canonical triage sequence and safe resolutions.
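CrashLoopBackOff names the wait between restarts, not the failure itself. The sketch below models that wait. It assumes the kubelet defaults described in the Kubernetes pod lifecycle documentation: a 10-second initial delay that doubles after each crash and is capped at 300 seconds. Newer releases may tune these values. `restart_delays` is a hypothetical helper for illustration:

```python
def restart_delays(restarts: int, base: float = 10.0, cap: float = 300.0) -> list:
    """Seconds the kubelet waits before each successive restart.

    Assumes the documented default back-off: start at `base`, double after
    every crash, never exceed `cap`. (The kubelet also resets this timer
    after a period of clean running; that is not modeled here.)
    """
    delays = []
    delay = base
    for _ in range(restarts):
        delays.append(min(delay, cap))  # each wait is capped
        delay *= 2                      # exponential growth between crashes
    return delays


if __name__ == "__main__":
    # Seven consecutive crashes: the wait saturates at the 5-minute cap.
    print(restart_delays(7))  # [10.0, 20.0, 40.0, 80.0, 160.0, 300.0, 300.0]
```

Because the delay only grows, waiting longer rarely reveals more. The evidence is in the previous container's exit: `kubectl logs <pod> --previous` and the `lastState` shown by `kubectl describe pod <pod>`.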
TroubleshootAtlas: ImagePullBackOff / ErrImagePull
Pull failures are usually naming, auth, or network. This entry gives the shortest path to truth.
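Many naming failures come from references that resolve somewhere unexpected. The sketch below is a simplified version of Docker-style reference normalization. The registry defaults to `docker.io`, a single-name image gets a `library/` prefix, and a missing tag means `latest`. It approximates common runtime behavior rather than implementing the full reference grammar, and runtimes may be configured with mirrors. `normalize_image` is hypothetical:

```python
from typing import Tuple


def normalize_image(ref: str) -> Tuple[str, str, str]:
    """Split an image reference into (registry, repository, tag_or_digest).

    Simplified sketch of Docker-style defaulting; not a full grammar.
    """
    digest = ""
    if "@" in ref:
        ref, digest = ref.split("@", 1)  # a digest pins the image exactly
    first, sep, rest = ref.partition("/")
    # The first component is a registry host only if it looks like one.
    if sep and ("." in first or ":" in first or first == "localhost"):
        registry, path = first, rest
    else:
        registry, path = "docker.io", ref
        if "/" not in path:
            path = "library/" + path  # official Docker Hub images
    tag = ""
    if ":" in path.rsplit("/", 1)[-1]:  # a tag lives in the last component
        path, tag = path.rsplit(":", 1)
    if digest:
        return registry, path, "@" + digest
    return registry, path, tag or "latest"


if __name__ == "__main__":
    print(normalize_image("nginx"))                  # ('docker.io', 'library/nginx', 'latest')
    print(normalize_image("localhost:5000/app:v1"))  # ('localhost:5000', 'app', 'v1')
```

A typo such as `ngnix` normalizes to `docker.io/library/ngnix` and fails as a naming problem. A correct name that still fails points to auth (`imagePullSecrets`) or the network path to the registry.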
TroubleshootAtlas: Service Has No Endpoints
If endpoints are empty, traffic cannot route. This entry teaches the endpoint-first diagnostic sequence.
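The endpoint-first sequence rests on one rule: a pod backs a Service only if its labels match the selector and it is Ready. The sketch below is a simplified model of that rule. It ignores namespaces, ports, and `publishNotReadyAddresses`, and `Pod` and `ready_endpoints` are hypothetical names:

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Pod:
    name: str
    ip: str
    ready: bool
    labels: Dict[str, str] = field(default_factory=dict)


def ready_endpoints(selector: Dict[str, str], pods: List[Pod]) -> List[str]:
    """Simplified model of how a Service's ready endpoints are derived.

    A pod contributes its IP only if (1) every selector key/value appears
    in its labels and (2) it is Ready.
    """
    if not selector:
        # A Service without a selector gets no automatically managed endpoints.
        return []
    return [
        pod.ip
        for pod in pods
        if pod.ready and all(pod.labels.get(k) == v for k, v in selector.items())
    ]


if __name__ == "__main__":
    pods = [
        Pod("web-1", "10.0.0.5", True, {"app": "web"}),
        Pod("web-2", "10.0.0.6", False, {"app": "web"}),  # matches but not Ready
        Pod("api-1", "10.0.0.7", True, {"app": "api"}),
    ]
    print(ready_endpoints({"app": "web"}, pods))  # ['10.0.0.5']
    print(ready_endpoints({"app": "Web"}, pods))  # [] (selector typo)
```

Empty endpoints therefore reduce to two questions: do any pods match the selector, and are those pods Ready? You can answer both with `kubectl get endpointslices -l kubernetes.io/service-name=<svc>` and `kubectl get pods -l <selector>`.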
TroubleshootAtlas: Pods Pending (Scheduling)
Pending pods are placement failures. This entry teaches you to read scheduler testimony and fix the governing constraint.
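The most common governing constraint is resource fit. The scheduler compares a pod's requests against node allocatable minus the requests of pods already placed there. It does not use limits or live usage. The sketch below is a simplified version of that check that ignores taints, affinity, and volume topology. `insufficient` is a hypothetical helper, with CPU in millicores and memory in bytes:

```python
from typing import Dict, List


def insufficient(requests: Dict[str, int], allocatable: Dict[str, int],
                 already_requested: Dict[str, int]) -> List[str]:
    """Resources for which a pod's requests exceed what a node has left.

    Free capacity = allocatable - sum of requests already on the node.
    """
    short = []
    for resource, wanted in requests.items():
        free = allocatable.get(resource, 0) - already_requested.get(resource, 0)
        if wanted > free:
            short.append(resource)
    return short


if __name__ == "__main__":
    GiB = 1024 ** 3
    print(insufficient(
        {"cpu": 500, "memory": 1 * GiB},    # pod requests
        {"cpu": 4000, "memory": 8 * GiB},   # node allocatable
        {"cpu": 3800, "memory": 2 * GiB},   # requests already scheduled there
    ))  # ['cpu']
```

This explains why a node can look idle in monitoring and still appear as `Insufficient cpu` in the pod's FailedScheduling events. Reservations govern placement; usage does not.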
TroubleshootAtlas: Readiness Probe Failing
Readiness is the traffic gate. This entry teaches probe semantics that prevent silent outages and restart storms.
TroubleshootAtlas: Admission Webhook Timeouts
When admission fails, deploys stop. This entry shows the shortest path to identifying the failing webhook and restoring admission safely.
TroubleshootAtlas: Deployment Rollout Stalled
A rollout is a control loop with gates. This entry teaches how to read the gates and restore forward motion safely.
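The two gates of a RollingUpdate are `maxSurge` and `maxUnavailable`, and both default to 25%. The sketch below shows how they resolve to pod counts. It assumes the rounding documented for Deployments (surge rounds up, unavailable rounds down) and the controller's rule of allowing one unavailable pod when both resolve to zero. The helper names are hypothetical:

```python
from typing import Tuple, Union


def _resolve(value: Union[int, str], replicas: int, round_up: bool) -> int:
    """Turn an absolute count or a percentage like "25%" into a pod count."""
    if isinstance(value, str) and value.endswith("%"):
        scaled = replicas * int(value[:-1])
        # Integer ceil/floor of scaled / 100, avoiding float rounding.
        return -(-scaled // 100) if round_up else scaled // 100
    return int(value)


def rollout_bounds(replicas: int, max_surge: Union[int, str] = "25%",
                   max_unavailable: Union[int, str] = "25%") -> Tuple[int, int]:
    """Return (max total pods, min available pods) during a RollingUpdate."""
    surge = _resolve(max_surge, replicas, round_up=True)              # rounds up
    unavailable = _resolve(max_unavailable, replicas, round_up=False)  # rounds down
    if surge == 0 and unavailable == 0:
        # Both gates closed would deadlock; allow one pod to be unavailable.
        unavailable = 1
    return replicas + surge, replicas - unavailable


if __name__ == "__main__":
    print(rollout_bounds(4))  # (5, 3)
    print(rollout_bounds(3))  # (4, 3): 25% of 3 rounds to 1 surge, 0 unavailable
```

If new pods never become Ready, the controller cannot retire old pods below the minimum-available bound, so the rollout waits. After `progressDeadlineSeconds` (600 by default), the Deployment reports `Progressing=False` with reason `ProgressDeadlineExceeded`. It does not roll back on its own.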
TroubleshootAtlas: Liveness Probe Restarts
Liveness is the kill switch. When it is wrong, it creates outages that look like instability.
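A common liveness mistake is a probe budget shorter than the application's startup time. The container is then restarted before it can ever become healthy. The back-of-the-envelope check below assumes the probe defaults (`periodSeconds: 10`, `failureThreshold: 3`, `initialDelaySeconds: 0`). It also uses the `failureThreshold × periodSeconds` approximation that the Kubernetes docs apply to startup probes. The function names are hypothetical:

```python
def probe_budget(initial_delay: int = 0, period: int = 10,
                 failure_threshold: int = 3) -> int:
    """Approximate seconds before consecutive probe failures trigger a restart."""
    return initial_delay + failure_threshold * period


def killed_during_startup(startup_seconds: int, **probe: int) -> bool:
    """True if an app that needs `startup_seconds` before it answers the
    liveness probe is restarted before it gets there (and so loops)."""
    return startup_seconds > probe_budget(**probe)


if __name__ == "__main__":
    print(probe_budget())                               # 30
    print(killed_during_startup(45))                    # True: restart storm
    print(killed_during_startup(45, initial_delay=30))  # False: budget is 60s
```

Padding `initialDelaySeconds` works. A `startupProbe`, however, is the documented way to give slow starters time without weakening liveness for the rest of the container's life.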
TroubleshootAtlas: Ingress Returns 502/503
When ingress returns 502/503, the edge is telling you upstream is missing, unhealthy, or too slow.
TroubleshootAtlas: PVC Pending (Storage)
PVC Pending is a binding failure. This entry teaches how to read storage events and unblock provisioning safely.
TroubleshootAtlas: Node NotReady
Node NotReady is a failure domain boundary. This entry teaches containment first, then root cause.
TroubleshootAtlas: OOMKilled and Evictions
Memory failures are accounting failures. This entry shows how to prove the killer and right-size with restraint.
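Proving the killer means reading the right status field rather than the exit code alone. The rough decision sketch below assumes the standard pod status fields listed in the docstring. It also relies on the Unix convention that exit codes above 128 encode a signal. `who_killed` and its return strings are hypothetical:

```python
from typing import Optional

# Exit codes above 128 conventionally mean "terminated by signal (code - 128)".
SIGNAL_NAMES = {9: "SIGKILL", 15: "SIGTERM"}


def who_killed(pod_reason: str = "", container_reason: str = "",
               exit_code: Optional[int] = None) -> str:
    """Rough attribution of a dead container from pod status fields.

    pod_reason        -> status.reason (e.g. "Evicted")
    container_reason  -> lastState.terminated.reason (e.g. "OOMKilled")
    exit_code         -> lastState.terminated.exitCode
    """
    if pod_reason == "Evicted":
        return "kubelet eviction under node pressure"
    if container_reason == "OOMKilled":
        return "kernel OOM killer"
    if exit_code is not None and exit_code > 128:
        signum = exit_code - 128
        name = SIGNAL_NAMES.get(signum, f"signal {signum}")
        # 137 (SIGKILL) alone does not prove OOM: a liveness kill that outlasts
        # the grace period, or a manual kill, produces the same code.
        return f"killed by {name}, cause unproven"
    return "no kill signal recorded; check application logs"


if __name__ == "__main__":
    print(who_killed(container_reason="OOMKilled", exit_code=137))  # kernel OOM killer
    print(who_killed(exit_code=137))  # killed by SIGKILL, cause unproven
```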
TroubleshootAtlas: HPA Not Scaling
When the HPA does nothing, either its metrics are missing or the signal it scales on is wrong. This entry teaches the proof path.
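The HPA's core rule is documented as `desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue)`. No change is made when the ratio falls within a tolerance of 1.0 (0.1 by default). The minimal single-metric sketch below follows those assumptions. The `min_replicas`/`max_replicas` defaults are illustrative, and stabilization windows and scaling policies are not modeled:

```python
import math


def desired_replicas(current_replicas: int, current_value: float,
                     target_value: float, tolerance: float = 0.1,
                     min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Simplified HPA scaling rule for one metric.

    desired = ceil(current_replicas * current_value / target_value),
    unchanged when the ratio is within `tolerance` of 1.0, then clamped
    to [min_replicas, max_replicas].
    """
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: no change
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    print(desired_replicas(4, 90, 50))  # 8: ratio 1.8, ceil(7.2)
    print(desired_replicas(4, 52, 50))  # 4: ratio 1.04 is inside the tolerance
```

Both failure modes appear here. If `current_value` cannot be computed (no metrics pipeline, or CPU utilization on pods without CPU requests), the HPA takes no action. If the metric barely moves under load, the ratio stays inside the tolerance and the replica count never changes. `kubectl describe hpa <name>` shows the conditions and events that distinguish the two cases.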