Lab: Probe Semantics Under Load

Goal

You will learn probe discipline: make readiness represent serving ability, make liveness conservative, and use startupProbe to protect warm-up.

Separate readiness from liveness.
Avoid dependency checks in liveness.
Tune timeouts using measured latency.
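
As a rough sketch of the last point, the shell function below turns a list of measured probe-endpoint latencies into a candidate timeoutSeconds. The function name suggest_timeout_seconds, the millisecond input format, and the 1.5x headroom factor are assumptions made for illustration, not part of the lab's tooling.

```shell
# Suggest a probe timeoutSeconds from measured latencies.
# Input: one latency in milliseconds per line on stdin.
# The 1.5x headroom factor is an illustrative assumption; pick your own.
suggest_timeout_seconds() {
  sort -n | awk -v headroom=1.5 '
    { v[NR] = $1 }
    END {
      if (NR == 0) { print "no samples" > "/dev/stderr"; exit 1 }
      rank = int((NR * 99 + 99) / 100)        # nearest-rank p99: ceil(0.99 * N)
      if (rank < 1) rank = 1
      ms = v[rank] * headroom
      secs = int(ms / 1000)
      if (secs * 1000 < ms) secs++            # round up to whole seconds
      if (secs < 1) secs = 1                  # timeoutSeconds cannot be below 1
      printf "p99_ms=%d timeoutSeconds=%d\n", v[rank], secs
    }
  '
}
```

Feed it samples collected while the workload is under expected load, for example `suggest_timeout_seconds < latencies_ms.txt`, where latencies_ms.txt is a hypothetical file of measurements. Samples taken at idle would understate the timeout the probe needs.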

Scenario

A workload behaves correctly in quiet conditions, but under load the liveness probe fails and the kubelet restarts the container. Each restart removes capacity and pushes more load onto the remaining pods, so stability collapses when you need it most.

Your job is to redesign probes so the system degrades safely.

Investigate the current probe configuration

Read the spec and correlate with events.

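Probe failures appear as Warning events with reason Unhealthy, listed in the Events section of kubectl describe pod.
Distinguish timeouts from explicit status codes: a timeout means the handler did not answer within timeoutSeconds, while a status code means it answered and reported failure.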

kubectl

kubectl get pod <pod> -n <ns> -o yaml | rg -n "readinessProbe|livenessProbe|startupProbe|timeoutSeconds|failureThreshold|periodSeconds"
kubectl describe pod <pod> -n <ns>
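
To separate timeouts from explicit status codes at a glance, event messages can be tallied by kind. The sketch below assumes the kubelet's usual message wording ("statuscode: NNN" for HTTP failures, "context deadline exceeded" or "Client.Timeout" for timeouts). The function name classify_probe_failures is hypothetical, and the patterns may need adjusting for your cluster version.

```shell
# Count probe failures by kind from event messages read on stdin.
# The patterns assume the kubelet's usual wording; adjust them if your events differ.
classify_probe_failures() {
  awk '
    /probe failed/ {
      if ($0 ~ /statuscode: [0-9]+/)
        status++                                   # handler answered with a failing code
      else if ($0 ~ /context deadline exceeded|Client\.Timeout|timed out/)
        timeout++                                  # handler did not answer in time
      else
        other++                                    # e.g. connection refused
    }
    END { printf "timeout=%d status=%d other=%d\n", timeout, status, other }
  '
}

# Live usage (placeholders as in the commands above):
#   kubectl get events -n <ns> \
#     --field-selector involvedObject.name=<pod>,reason=Unhealthy \
#     -o custom-columns=MESSAGE:.message --no-headers | classify_probe_failures
```

A tally dominated by timeouts under load points at a slow or starved handler. A tally dominated by status codes points at the check itself reporting failure, often because it calls a dependency.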

Resolution patterns

Fix the semantics before tuning numbers.

Move dependency checks into readiness (traffic gate).
Add startupProbe if warm-up is slow or bursty.
Make liveness check only what must be true to keep running (avoid deep calls).
Tune timeouts to observed p99 under expected load, not under idle conditions.
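
The patterns above can be sketched as a single patch. The snippet writes a strategic-merge patch file that gives each probe one job. The container name app, port 8080, the paths /healthz and /ready, and every number are assumptions for illustration; take real values from your own measurements.

```shell
# Write a strategic-merge patch that separates the three probes.
# Container name, port, paths, and all numbers are illustrative assumptions.
cat > probe-patch.yaml <<'EOF'
spec:
  template:
    spec:
      containers:
      - name: app
        startupProbe:            # protects warm-up; other probes wait until it succeeds
          httpGet: { path: /healthz, port: 8080 }
          periodSeconds: 5
          failureThreshold: 24   # allows up to 5s * 24 = 120s to start
        readinessProbe:          # traffic gate; dependency checks belong here
          httpGet: { path: /ready, port: 8080 }
          periodSeconds: 5
          timeoutSeconds: 2
          failureThreshold: 3
        livenessProbe:           # shallow, in-process check only
          httpGet: { path: /healthz, port: 8080 }
          periodSeconds: 10
          timeoutSeconds: 2
          failureThreshold: 6
EOF

# Apply to a Deployment (placeholders as elsewhere in this lab):
#   kubectl patch deployment <deployment> -n <ns> --patch-file probe-patch.yaml
```

In this sketch, a failing dependency takes the pod out of rotation through readiness without triggering a restart. Liveness tolerates about a minute of consecutive failures (10s * 6) before the container is restarted. Because the patch merges into existing probe fields, review the resulting spec afterwards, especially if the current probes use a different handler type such as exec.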
