Reference

Lexicon

Definitions that hold in production. Each term includes: what it is, how it fails, what to inspect, and where to read next.

Terms

15 entries · concise and operational

Pod: The smallest schedulable unit in Kubernetes.
Deployment: A rollout controller for stateless workloads.
Service: A stable network identity backed by endpoints.
Ingress: External HTTP routing into the cluster.
Control Plane: The governing system that stores intent and drives convergence.
Namespace: A logical boundary for names and policy.
Node: A machine that runs pods under kubelet control.
kubelet: The node agent that makes pods real.
Scheduler: The control-plane component that decides placement.
Reconciliation: The control-loop discipline that closes drift.
ConfigMap: Configuration data injected into workloads.
Secret: Sensitive values distributed to workloads.
ServiceAccount: A workload identity for in-cluster API access.
RBAC: Authorization rules for the Kubernetes API.
Probe: Health checks that gate traffic and restarts.

Entries

Readable, copyable, and linked into the rest of the shrine.

Pod

The smallest schedulable unit in Kubernetes.

Definition

A Pod is a wrapper around one or more containers that share networking and (optionally) storage. The scheduler places pods onto nodes; kubelet makes them real. Most operational symptoms begin at the pod layer: probes, resource pressure, image pulls, and container exits.

In practice

A pod is ephemeral by design; treat it as cattle, not a pet.
Readiness gates traffic; liveness controls restart behavior.
Pod status is testimony: look at conditions, container states, and events.

What to inspect

kubectl get pod -n <ns> -o wide
kubectl describe pod <pod> -n <ns>
kubectl logs <pod> -n <ns> --previous

Common mistakes

Using liveness probes to model dependency readiness (causes restart storms).
Ignoring events and reading only the ‘STATUS’ column.
Treating a pod restart as a fix instead of a symptom.

Deployment

A rollout controller for stateless workloads.

Definition

A Deployment manages ReplicaSets and performs rolling updates toward a declared pod template. It is the default mechanism for safe, staged change—if probes and surge math reflect reality.

In practice

Deployments converge via ReplicaSet churn; watch conditions and rollout status.
maxSurge/maxUnavailable are capacity and risk decisions; do not accept the 25% defaults without checking headroom.
Rollback is a bounded operation; external state and migrations can be one-way doors.
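
The capacity side of maxSurge/maxUnavailable can be worked out before a rollout starts. The following is a minimal sketch, assuming both settings are given as percentages and using the documented rounding (maxSurge rounds up, maxUnavailable rounds down). The function name `rollout_bounds` and the sample numbers are illustrative and not part of any Kubernetes API.

```python
def rollout_bounds(replicas: int, max_surge_pct: int, max_unavailable_pct: int) -> tuple[int, int]:
    """Return (peak_pods, min_available) for a rolling update.

    Assumption: both settings are percentages of `replicas`.
    """
    surge = (replicas * max_surge_pct + 99) // 100        # integer ceiling
    unavailable = (replicas * max_unavailable_pct) // 100  # integer floor
    return replicas + surge, replicas - unavailable


peak, min_available = rollout_bounds(replicas=10, max_surge_pct=25, max_unavailable_pct=25)
print(peak, min_available)
```

With 10 replicas at 25%/25%, the rollout may briefly need room for 13 pods while keeping at least 8 available. If node headroom or namespace quota cannot hold the surge, the extra pods will not start.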

What to inspect

kubectl rollout status deploy/<name> -n <ns>
kubectl describe deploy/<name> -n <ns>
kubectl get rs -n <ns> --sort-by=.metadata.creationTimestamp

Common mistakes

Shipping without readiness probes and calling it ‘safe’.
Over-surge rollouts that exceed headroom and create Pending storms.
Treating success as ‘pods are Running’ instead of verifying serving health.

Service

A stable network identity backed by endpoints.

Definition

A Service selects pods (endpoints) and provides stable DNS and virtual IP routing. If endpoints are empty, routing cannot work—regardless of DNS, ingress, or client code.

In practice

Selectors must match labels exactly; one mismatch yields zero endpoints.
Readiness gates whether a pod becomes an endpoint.
Ports must align: port/targetPort/container listener.
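
Selector matching is a subset test: every key/value pair in the Service selector must appear in the pod's labels. Below is a minimal sketch of that rule for equality-based selectors. The function `selects` and the sample labels are hypothetical, and nothing here calls the Kubernetes API.

```python
def selects(selector: dict[str, str], pod_labels: dict[str, str]) -> bool:
    """True if every selector pair is present in the pod's labels.

    Assumes a non-empty selector; a Service with no selector gets no
    automatically managed endpoints.
    """
    return all(pod_labels.get(key) == value for key, value in selector.items())


selector = {"app": "web", "tier": "frontend"}
pods = {
    "web-1": {"app": "web", "tier": "frontend", "version": "v2"},  # extra labels are fine
    "web-2": {"app": "web", "tier": "front-end"},                  # value differs: not selected
    "api-1": {"app": "api", "tier": "frontend"},                   # different app: not selected
}
endpoints = [name for name, labels in pods.items() if selects(selector, labels)]
print(endpoints)
```

Only `web-1` qualifies. A single differing character in one label value is enough to drop a pod from the endpoint set.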

What to inspect

kubectl get svc,ep,endpointslices -n <ns>
kubectl describe svc <svc> -n <ns>
kubectl get pods -n <ns> --show-labels

Common mistakes

Assuming DNS is broken when endpoints are empty.
Misaligned targetPort causing silent connection failures.
Using overly broad selectors that accidentally pick the wrong pods.

Ingress

External HTTP routing into the cluster.

Definition

Ingress defines HTTP routing rules, but it is only as real as the ingress controller that implements it. Many ‘ingress problems’ are actually service endpoint problems, DNS problems, or controller health issues.

In practice

Ingress is a contract + an implementation (controller).
Debug from the edge inward: controller → service → endpoints → pods.
TLS and DNS failures often masquerade as routing failures.

What to inspect

kubectl get ingress -A
kubectl describe ingress <ing> -n <ns>
kubectl logs -n <ingress-ns> deploy/<controller> --tail=200

Common mistakes

Changing ingress YAML repeatedly without checking controller logs.
Ignoring service endpoints (routing cannot work without them).
Mistaking DNS resolution failures for ingress routing problems.

Control Plane

The governing system that stores intent and drives convergence.

Definition

The control plane is the API server, persistence (etcd), controllers, and scheduler. If it is slow or failing admission, everything else becomes unreliable: rollouts stall, controllers thrash, and recovery actions fail.

In practice

Treat API latency and admission health as first-order signals.
Backpressure and rate limits are real failure modes.
Many incidents are control-plane incidents wearing workload masks.

What to inspect

kubectl get --raw '/readyz?verbose'
kubectl get events -A --sort-by=.lastTimestamp | tail -n 50

Common mistakes

Assuming workload failures are app-only during API latency spikes.
Ignoring webhook timeouts until deploys stop.
Saturating the API with list/watch loops during an incident.

Namespace

A logical boundary for names and policy.

Definition

A namespace scopes names (most objects), RBAC bindings, quotas, and many policies. Namespaces are a governance tool, not a security boundary by themselves; isolation requires policy, identity discipline, and network posture.

In practice

Use namespaces to express ownership and blast radius.
Apply quotas and policy baselines per namespace.
Keep exceptions explicit and reviewed.

What to inspect

kubectl get ns
kubectl get resourcequota,limitrange -n <ns>
kubectl auth can-i --list -n <ns>

Common mistakes

Treating namespaces as tenant isolation without enforcement.
Allowing unowned namespaces to persist indefinitely.
Mixing unrelated workloads that should not share failure domains.

Node

A machine that runs pods under kubelet control.

Definition

A node provides compute, memory, and local runtime state. Scheduling is placement onto nodes; reliability depends on node health, pressure signals, and correct capacity reporting.

In practice

Treat nodes as failure domains.
Watch pressure conditions and eviction behavior.
Separate node pools by workload class and risk.

What to inspect

kubectl get nodes -o wide
kubectl describe node <node>
kubectl top nodes

Common mistakes

Ignoring node pressure until evictions cascade.
Assuming all nodes are interchangeable when topology differs.
Mixing sensitive and noisy workloads on the same pool without governance.

kubelet

The node agent that makes pods real.

Definition

kubelet watches the API for pod assignments, pulls images, sets up volumes, and reports status. Many incidents present as ‘pods failing’ while the root cause is kubelet health, container runtime issues, or node pressure.

In practice

Know where kubelet logs live in your environment.
Correlate pod failures with node pressure and runtime errors.
Treat kubelet as critical infrastructure.

What to inspect

kubectl describe node <node>
kubectl get events -A --sort-by=.lastTimestamp | tail -n 50
kubectl describe pod <pod> -n <ns>

Common mistakes

Restarting workloads repeatedly instead of fixing node/runtime.
Ignoring image pull and volume mount errors at the node layer.
Assuming node NotReady is always a network issue.

Scheduler

The control-plane component that decides placement.

Definition

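The scheduler chooses a node for each pod based on resource requests and constraints: taints/tolerations, affinity, topology, and priorities. Quotas are enforced earlier, at admission, so a pod that exceeds its namespace quota is rejected before scheduling rather than left Pending. Under scarcity, scheduling becomes governance.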

In practice

Requests must be honest; they drive placement.
Constraints must be satisfiable; avoid accidental impossibility.
Define priority/preemption posture before scarcity arrives.
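
Requests drive placement because the resource-fit check compares a pod's requests against node allocatable minus the requests already placed there. Actual usage does not enter into it. Below is a simplified sketch of that one check. The real scheduler applies many more filters, and the function name, units (CPU in millicores, memory in MiB), and numbers here are illustrative.

```python
def fits(pod_requests: dict[str, int],
         node_allocatable: dict[str, int],
         placed_requests: list[dict[str, int]]) -> bool:
    """True if the pod's requests fit in what remains after already-placed requests."""
    for resource, amount in pod_requests.items():
        already_requested = sum(r.get(resource, 0) for r in placed_requests)
        if already_requested + amount > node_allocatable.get(resource, 0):
            return False
    return True


node = {"cpu": 4000, "memory": 8192}
placed = [{"cpu": 1500, "memory": 2048}, {"cpu": 2000, "memory": 4096}]

print(fits({"cpu": 1000, "memory": 1024}, node, placed))  # CPU requests would total 4500 > 4000
print(fits({"cpu": 500, "memory": 1024}, node, placed))   # CPU requests would total exactly 4000
```

The first pod does not fit even if the node is idle in practice, because inflated requests from other pods have already claimed the capacity.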

What to inspect

kubectl describe pod <pod> -n <ns>
kubectl get events -n <ns> --sort-by=.lastTimestamp | tail -n 40
kubectl get nodes

Common mistakes

Using anti-affinity defaults that waste capacity.
Over-constraining placement until pods can never schedule.
Treating Pending as ‘the cluster is broken’ instead of reading scheduler testimony.

Reconciliation

The control-loop discipline that closes drift.

Definition

Reconciliation is the continuous process of comparing declared intent to observed state and taking repeatable steps toward convergence. It is the core operational model of Kubernetes controllers.

In practice

Measure time-to-converge, not just current state.
Design controllers to be idempotent and backpressure-aware.
Treat drift as an incident precursor.
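
A reconcile pass compares declared intent with observed state and acts only on the difference, so running it again on a converged system does nothing. The following is a minimal in-memory sketch of that idea. The names `reconcile` and `run_until_converged`, the replica-count model, and the `batch` limit are all illustrative and do not use any controller library.

```python
def reconcile(desired: int, observed: int) -> int:
    """One reconcile pass: the replica change still needed (0 when converged)."""
    return desired - observed


def run_until_converged(desired: int, observed: int,
                        batch: int = 2, max_steps: int = 10) -> tuple[int, int]:
    """Apply bounded reconcile steps; return (final_count, steps_taken)."""
    steps = 0
    while steps < max_steps:
        delta = reconcile(desired, observed)
        if delta == 0:
            break  # converged: further passes are no-ops
        # Clamp each step to `batch` so the loop never acts in one large burst.
        observed += max(-batch, min(batch, delta))
        steps += 1
    return observed, steps


print(run_until_converged(desired=5, observed=0))
print(run_until_converged(desired=5, observed=5))
```

The step count is a crude stand-in for time-to-converge: starting from zero takes three bounded passes, while an already-converged state takes none.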

What to inspect

kubectl get <kind> -o yaml
kubectl describe <kind> <name>
kubectl get events -A --sort-by=.lastTimestamp | tail -n 50

Common mistakes

Assuming ‘eventual consistency’ excuses persistent drift.
Building reconcile loops that thrash the API with status spam.
Not instrumenting controllers and then guessing under pressure.

ConfigMap

Configuration data injected into workloads.

Definition

ConfigMaps store non-secret configuration and can be mounted as files or injected as env vars. Operationally, the important questions are: how config changes are rolled out, how defaults are controlled, and how misconfigurations are detected quickly.

In practice

Version config changes like code.
Prefer explicit keys and validation over implicit defaults.
Treat config rollout as a change with a rollback story.

What to inspect

kubectl get configmap -n <ns>
kubectl describe configmap <cm> -n <ns>
kubectl get deploy/<name> -n <ns> -o yaml | rg -n "configMap"

Common mistakes

Embedding huge config blobs (creates API/storage debt).
Changing config without correlating to rollout behavior.
Mixing secrets into ConfigMaps.

Secret

Sensitive values distributed to workloads.

Definition

Secrets are API objects used to distribute sensitive values. Their safety depends on encryption at rest, RBAC, audit, node compromise assumptions, and how workloads consume and rotate credentials.

In practice

Minimize secret material in-cluster; prefer workload identity when possible.
Make rotation a designed workflow.
Audit access continuously.

What to inspect

kubectl auth can-i get secrets -n <ns>
kubectl get secret -n <ns>
kubectl describe sa <sa> -n <ns>

Common mistakes

Treating base64 as encryption.
Overbroad RBAC that makes secrets effectively public.
No rotation plan; credentials become permanent liabilities.
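
The base64 point is easy to demonstrate: values in a Secret's `data` field are encoded, not encrypted, so anyone who can read the object can recover the plaintext. A minimal sketch with a made-up value:

```python
import base64

plaintext = "s3cr3t-password"  # made-up value for illustration

encoded = base64.b64encode(plaintext.encode()).decode()  # the form a Secret's data field holds
decoded = base64.b64decode(encoded).decode()             # reversing it needs no key

print(encoded)
print(decoded)
```

Protection therefore has to come from encryption at rest and from RBAC on who may read Secrets, not from the encoding.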

ServiceAccount

A workload identity for in-cluster API access.

Definition

Service accounts represent workload identity within a namespace. Their permissions come from RBAC bindings. Mis-scoped bindings are a common source of both outages (forbidden) and security incidents (overgrant).

In practice

Bind permissions to the smallest scope possible.
Separate build/deploy identities from runtime identities.
Audit bindings regularly.

What to inspect

kubectl get sa -n <ns>
kubectl get rolebinding,clusterrolebinding -A -o wide | rg -n "<sa-name>" || true

Common mistakes

Using the default service account unintentionally.
Granting cluster-wide privileges for a namespace-only need.
Embedding long-lived credentials when workload identity exists.

RBAC

Authorization rules for the Kubernetes API.

Definition

RBAC defines who can do what to which resources, in which namespaces. RBAC errors are deterministic and should be diagnosed with exact subject/verb/resource tests.

In practice

Treat RBAC as governance, not convenience.
Use templates and review to avoid bespoke sprawl.
Make break-glass explicit and audited.

What to inspect

kubectl auth whoami
kubectl auth can-i <verb> <resource> -n <ns> --as=<subject>
kubectl get rolebinding,clusterrolebinding -A

Common mistakes

Fixing forbidden by granting cluster-admin.
Ignoring aggregated roles and wildcard rules.
Letting break-glass become permanent access.

Probe

Health checks that gate traffic and restarts.

Definition

Probes are contracts. Readiness controls whether a pod receives traffic; liveness controls restarts. Incorrect probes are a frequent cause of self-inflicted incidents.

In practice

Use readiness to represent serving ability.
Use liveness only for irrecoverable deadlock, not transient slowness.
Use startupProbe for slow initialization.
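
Whether a probe is too aggressive comes down to arithmetic on its settings. As a rough estimate, a container that never passes has about initialDelaySeconds + periodSeconds × failureThreshold before the probe gives up. The sketch below applies that approximation. The helper names and sample values are illustrative, and real timing also depends on timeouts and jitter.

```python
def probe_budget_seconds(initial_delay: int, period: int, failure_threshold: int) -> int:
    """Approximate time a container has to become healthy before the probe gives up."""
    return initial_delay + period * failure_threshold


def too_aggressive(worst_case_startup: int, initial_delay: int,
                   period: int, failure_threshold: int) -> bool:
    """True if the probe would give up before a worst-case startup finishes."""
    return probe_budget_seconds(initial_delay, period, failure_threshold) < worst_case_startup


# Liveness alone: 10s delay, 5s period, 3 failures -> about 25s of budget.
print(too_aggressive(worst_case_startup=60, initial_delay=10, period=5, failure_threshold=3))

# startupProbe: 10s period, 30 failures -> about 300s of budget.
print(too_aggressive(worst_case_startup=60, initial_delay=0, period=10, failure_threshold=30))
```

The first configuration would restart a container that needs 60 seconds to start. The second gives it about five minutes, which is the job startupProbe exists to do.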

What to inspect

kubectl describe pod <pod> -n <ns>
kubectl get pod <pod> -n <ns> -o yaml | rg -n "readinessProbe|livenessProbe|startupProbe"

Common mistakes

Aggressive liveness that kills slow startups under load.
Readiness that depends on fragile external systems without justification.
Probes with timeouts too small for real-world latency.
