Lab: Rollout Stalled (ProgressDeadlineExceeded)

Goal

You will learn to debug a stalled rollout without guesswork: read Deployment conditions, identify which gate is holding, and choose between a safe rollback and the smallest fix-forward.

The Order’s posture: availability first, attribution always.

Read Deployment conditions.
Inspect the new ReplicaSet pods.
Decide rollback vs fix-forward with restraint.

Scenario

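A Deployment update is applied. New pods come up, but they never become Ready. Once progressDeadlineSeconds (600 seconds by default) passes without progress, the Deployment's Progressing condition turns False with reason ProgressDeadlineExceeded.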

This lab is about proving why the rollout cannot advance.
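
To confirm you are in this scenario, a minimal sketch like the following can print the configured deadline next to the Progressing condition. The function name `progress_state` is hypothetical, and the sketch assumes an apps/v1 Deployment and a working kubectl context.

```shell
# Hypothetical helper: print the progress deadline and the Progressing condition.
# Usage: progress_state <deployment> <namespace>
progress_state() {
  # progressDeadlineSeconds defaults to 600 when the manifest does not set it.
  kubectl get deploy "$1" -n "$2" \
    -o jsonpath='deadline={.spec.progressDeadlineSeconds}s{"\n"}'
  # A stalled rollout shows status=False with reason=ProgressDeadlineExceeded.
  kubectl get deploy "$1" -n "$2" \
    -o jsonpath='{range .status.conditions[?(@.type=="Progressing")]}status={.status} reason={.reason}{"\n"}{end}'
}
```

For example, `progress_state <name> <ns>`. Both calls only read state, so the helper is safe to run during an incident.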

Inspect the rollout gates

Start with the controller’s own testimony. Don’t start by editing YAML.

Find the condition whose status is False (Available or Progressing) and read its reason and message.
Pick one new pod and describe it to read events.

kubectl

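# Watch the rollout; exits non-zero once the progress deadline is exceeded.
kubectl rollout status deploy/<name> -n <ns>
# Read the Conditions and Events sections for the Deployment.
kubectl describe deploy/<name> -n <ns>
# Compare the old and new ReplicaSets: DESIRED vs READY.
kubectl get rs -n <ns> -l app=<label> -o wide
# List the pods with their node and READY state.
kubectl get pods -n <ns> -l app=<label> -o wide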
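
The "pick one new pod and describe it" step could be sketched as below. The helper name `newest_pod_events` is hypothetical. It treats the most recently created pod as belonging to the new ReplicaSet, which is a heuristic and not a guarantee, so confirm the owner in the describe output.

```shell
# Hypothetical helper: describe the newest pod matching a label selector.
# Usage: newest_pod_events <namespace> <label-selector>
newest_pod_events() {
  ns="$1"; selector="$2"
  # Sorting by creation time puts the most recently created pod last.
  pod="$(kubectl get pods -n "$ns" -l "$selector" \
    --sort-by=.metadata.creationTimestamp -o name | tail -n 1)"
  [ -n "$pod" ] || { echo "no pods match $selector in $ns" >&2; return 1; }
  # describe ends with the Events section: probe failures, scheduling errors, image pulls.
  kubectl describe "$pod" -n "$ns"
}
```

For example, `newest_pod_events <ns> app=<label>`. Read the Events section at the bottom of the output first.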

Common blockers to prove

Your investigation should end in one explicit sentence: “the rollout cannot progress because …”

Readiness failing (probe wrong port/path, dependency down, timeout too strict).
Scheduling failure (insufficient resources, taints/affinity).
Strategy constraints (maxUnavailable/maxSurge leave no headroom to create or remove pods).
Policy/admission rejects pods for the new ReplicaSet (look for FailedCreate events on the ReplicaSet).
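
Each blocker above can be checked with a read-only query. The following is a sketch, not a complete diagnostic: the function names are hypothetical, and the event reasons assume the default scheduler and ReplicaSet controller.

```shell
# Hypothetical helpers, one per blocker; each only reads cluster state.

# Readiness: print each pod with the status of its Ready condition.
pod_readiness() {  # usage: pod_readiness <namespace> <label-selector>
  kubectl get pods -n "$1" -l "$2" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}'
}

# Scheduling: list FailedScheduling events in the namespace.
scheduling_failures() {  # usage: scheduling_failures <namespace>
  kubectl get events -n "$1" --field-selector reason=FailedScheduling
}

# Strategy: print replicas alongside the rolling-update bounds.
strategy_headroom() {  # usage: strategy_headroom <deployment> <namespace>
  kubectl get deploy "$1" -n "$2" \
    -o jsonpath='replicas={.spec.replicas} maxSurge={.spec.strategy.rollingUpdate.maxSurge} maxUnavailable={.spec.strategy.rollingUpdate.maxUnavailable}{"\n"}'
}

# Admission: pods rejected at creation surface as FailedCreate events.
admission_rejections() {  # usage: admission_rejections <namespace>
  kubectl get events -n "$1" --field-selector reason=FailedCreate
}
```

Run the check that matches the symptom you saw in the pod events, then write the one-sentence cause from its output.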

Safe responses

Choose the smallest safe change that restores stability and preserves attribution.

If impact is growing, rollback: `kubectl rollout undo deploy/<name> -n <ns>`.
If the issue is readiness semantics, fix the probe (or dependency) and redeploy deliberately.
If the issue is capacity, add nodes or reduce requests before increasing replicas.
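
A rollback that keeps attribution might look like the sketch below. The helper name `rollback_with_reason` and the 5-minute timeout are assumptions. Recording the reason in the `kubernetes.io/change-cause` annotation is one common convention; your team may track attribution elsewhere.

```shell
# Hypothetical helper: roll back to the previous revision and record why.
# Usage: rollback_with_reason <deployment> <namespace> <reason text>
rollback_with_reason() {
  # Review the recorded revisions before undoing anything.
  kubectl rollout history "deploy/$1" -n "$2" || return 1
  # Revert to the previous revision.
  kubectl rollout undo "deploy/$1" -n "$2" || return 1
  # change-cause is the annotation that `kubectl rollout history` displays.
  kubectl annotate "deploy/$1" -n "$2" --overwrite \
    "kubernetes.io/change-cause=$3" || return 1
  # Block until the rollback converges or the timeout expires.
  kubectl rollout status "deploy/$1" -n "$2" --timeout=5m
}
```

For example, `rollback_with_reason <name> <ns> "rollback: readiness probe targets wrong port"`. Each step stops the sequence on failure, so a failed undo is never recorded as a completed rollback.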

Verification

Your work is complete only when the rollout converges and stays converged.

New pods become Ready.
The Deployment is Available.
Events stop repeating.
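
The three checks above can be scripted as a sketch. The helper name `verify_converged`, the timeouts, and the default 120-second settle window are assumptions; pick a window longer than your probe periods.

```shell
# Hypothetical helper: confirm the rollout converges and is still converged after a pause.
# Usage: verify_converged <deployment> <namespace> [settle-seconds]
verify_converged() {
  settle="${3:-120}"   # assumed observation window
  # New pods become Ready: rollout status succeeds only when the rollout completes.
  kubectl rollout status "deploy/$1" -n "$2" --timeout=5m || return 1
  # The Deployment is Available.
  kubectl wait --for=condition=Available "deploy/$1" -n "$2" --timeout=60s || return 1
  sleep "$settle"
  # Re-check after the pause to catch a Deployment that loses availability again.
  kubectl wait --for=condition=Available "deploy/$1" -n "$2" --timeout=60s || return 1
  # Remaining Warning events, oldest first; the list should stop growing.
  kubectl get events -n "$2" --field-selector type=Warning --sort-by=.lastTimestamp
}
```

For example, `verify_converged <name> <ns> 300`. Run the final events query a second time a few minutes later; if new Warning entries keep appearing, the rollout has not stayed converged.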