Lab · Intermediate

Lab: Probe Semantics Under Load

Probes are not decoration. Practice designing probes that prevent silent outages and avoid restart storms.

Prerequisites

What you should have before you begin.

Workloads · Reliability · Operations
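  • A Kubernetes cluster and a namespace you can deploy into
  • kubectl installed and configured for that cluster
  • Basic knowledge of readiness, liveness, and startup probes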

Lab text

Follow the sequence. Change one thing at a time.

Goal

You will learn probe discipline: make readiness represent serving ability, make liveness conservative, and use startupProbe to protect warm-up.

  • Separate readiness from liveness.
  • Avoid dependency checks in liveness.
  • Tune timeouts using measured latency.
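
As a rough illustration of the three roles, the sketch below writes a container-level probe fragment to a local file. The endpoint paths (`/healthz/live`, `/healthz/ready`), port 8080, the file name, and every numeric value are assumptions made for illustration; they are not taken from this lab's workload.

```shell
# Write a hypothetical container-level probe fragment to a local file.
# Paths, port, and numbers are placeholders; replace them with measured values.
cat > probes-sketch.yaml <<'EOF'
startupProbe:            # protects warm-up; other probes wait until it succeeds
  httpGet:
    path: /healthz/live
    port: 8080
  periodSeconds: 5
  failureThreshold: 24   # up to 24 * 5s = 120s of warm-up before a restart
livenessProbe:           # conservative: local process health only, no dependency calls
  httpGet:
    path: /healthz/live
    port: 8080
  periodSeconds: 10
  timeoutSeconds: 3
  failureThreshold: 6
readinessProbe:          # traffic gate: this is where dependency checks may live
  httpGet:
    path: /healthz/ready
    port: 8080
  periodSeconds: 5
  timeoutSeconds: 2
  failureThreshold: 3
EOF
```

The fragment is meant to be merged by hand into the container spec of the workload under test. Keeping liveness and readiness on separate endpoints is one way to make the separation explicit.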

Scenario

A workload behaves correctly in quiet conditions, but under load the liveness probe fails and the kubelet restarts the container. Stability collapses when you need it most.

Your job is to redesign probes so the system degrades safely.

Investigate the current probe configuration

Read the spec and correlate with events.

  • Probe failures should be visible in events.
  • Distinguish timeouts from explicit status codes: a timeout points at latency under load, while a status code is the handler's own verdict.

kubectl

shell

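# Show the probe blocks and their timing fields from the live pod spec.
kubectl get pod <pod> -n <ns> -o yaml | grep -nE "readinessProbe|livenessProbe|startupProbe|timeoutSeconds|failureThreshold|periodSeconds"
# Show recent events, including probe failures, and the restart count.
kubectl describe pod <pod> -n <ns>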
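
To narrow the output to probe failures, one option is to filter events by the `Unhealthy` reason, which the kubelet records for failed probes, and to read the restart count alongside. The function names `probe_events` and `restart_counts` are hypothetical helpers introduced here; the namespace and pod name are passed as arguments.

```shell
# Hypothetical helper: list probe-failure events for one pod, oldest first.
# Usage: probe_events <ns> <pod>
probe_events() {
  kubectl get events -n "$1" \
    --field-selector "involvedObject.name=$2,reason=Unhealthy" \
    --sort-by=.lastTimestamp
}

# Hypothetical helper: print the restart count of each container in the pod.
# Usage: restart_counts <ns> <pod>
restart_counts() {
  kubectl get pod "$2" -n "$1" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.restartCount}{"\n"}{end}'
}
```

In the event messages, timeouts typically mention an exceeded deadline, while explicit failures mention the returned status code. A rising restart count next to liveness failures is the pattern described in the scenario.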

Resolution patterns

Fix the semantics before tuning numbers.

  • Move dependency checks into readiness (traffic gate).
  • Add startupProbe if warm-up is slow or bursty.
  • Make liveness check only what must be true to keep running (avoid deep calls).
  • Tune timeouts to observed p99 under expected load, not under idle conditions.
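
The tuning bullet can be made concrete with a small calculation. The sketch below assumes a hypothetical file containing one measured probe-endpoint latency in milliseconds per line, collected under expected load, and computes the nearest-rank p99. It also computes roughly how long a container can keep failing before a liveness restart, which is about periodSeconds times failureThreshold. The function names and the input file are illustrative.

```shell
# Nearest-rank p99 of a file with one latency (ms) per line.
# Usage: p99_ms <file>
p99_ms() {
  sort -n "$1" | awk '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      i = int((NR * 99 + 99) / 100)   # ceil(0.99 * NR)
      print v[i]
    }'
}

# Approximate seconds of consecutive failures before a liveness restart.
# Usage: restart_budget_s <periodSeconds> <failureThreshold>
restart_budget_s() {
  echo $(( $1 * $2 ))
}
```

For example, `restart_budget_s 10 6` prints 60. Because timeoutSeconds is a whole number of seconds, a reasonable approach is to round the measured p99 up and leave some headroom rather than match it exactly.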

Canonical link

Canonical URL: /labs/probe-semantics-under-load