Lab · Intermediate

Lab: DNS Triage Inside the Cluster

DNS failures masquerade as everything. This lab teaches a minimal sequence to prove whether the problem is DNS, routing, policy, or the application.

Prerequisites

What you should have before you begin.

DNS · Networking · Operations · Reliability
  • A cluster with CoreDNS/kube-dns
  • kubectl installed
  • Ability to run a debug pod

Lab text

Follow the sequence. Change one thing at a time.

Goal

You will learn to prove DNS behavior from inside the cluster: service names, FQDNs, search domains, and failure modes caused by policy or egress restrictions.

  • Confirm DNS service and CoreDNS pods are healthy.
  • Run queries from a debug pod.
  • Distinguish NXDOMAIN from timeouts.

Check the DNS system

Start with the system components. If they are unhealthy, stop there.

  • If CoreDNS is CrashLooping, fix that first.
  • If the DNS service has no endpoints, its selector matches no Ready pods: check pod labels and readiness.

kubectl

shell

# grep is used here because ripgrep (rg) is not listed in the prerequisites.
kubectl get svc -n kube-system | grep -i "dns"
kubectl get pods -n kube-system | grep -Ei "coredns|dns"
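
The "no endpoints" check above can be made explicit. This is a minimal sketch, assuming the DNS Service is named kube-dns in kube-system (common, but not guaranteed on every distribution); the function names dns_endpoint_count and dns_endpoint_verdict are hypothetical helpers, not part of kubectl.

```shell
# Count ready endpoint IPs behind the DNS Service.
# Assumption: the Service is "kube-dns" in "kube-system"; pass other names as arguments.
dns_endpoint_count() {
  svc="${1:-kube-dns}"
  ns="${2:-kube-system}"
  kubectl get endpoints "$svc" -n "$ns" \
    -o jsonpath='{.subsets[*].addresses[*].ip}' | wc -w | tr -d ' '
}

# Print a one-line verdict and return non-zero when nothing backs the Service.
dns_endpoint_verdict() {
  count="$(dns_endpoint_count "$@")"
  if [ "$count" -gt 0 ]; then
    echo "ok: $count ready DNS endpoint(s)"
  else
    echo "fail: DNS Service has no ready endpoints"
    return 1
  fi
}
```

Run dns_endpoint_verdict with no arguments for the default names, or pass a Service name and namespace if your cluster uses different ones.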

Query from inside the cluster

Run queries from a pod in the affected namespace. NetworkPolicy is scoped per namespace, so a lookup that works from another namespace proves little.

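  • NXDOMAIN means a DNS server answered and the name does not exist: suspect a wrong service name, wrong namespace, or search-domain expansion.
  • A timeout means no answer arrived: suspect routing, policy, egress restrictions, or CoreDNS overload.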

kubectl

shell

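kubectl run -n <ns> dns-debug --image=busybox:1.36 --restart=Never --command -- sh -c "sleep 3600"
# Wait for the pod to be Ready; exec fails while the container is still starting.
kubectl wait -n <ns> --for=condition=Ready pod/dns-debug --timeout=60s
kubectl exec -n <ns> dns-debug -- nslookup kubernetes.default.svc.cluster.local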
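
To make the NXDOMAIN-versus-timeout distinction mechanical, the lookup output can be classified by text. This is a sketch under assumptions: classify_dns_output is a hypothetical helper, and the matched strings ("Name:", "NXDOMAIN", "timed out") reflect typical nslookup output; exact wording varies by image and version, so check it against what your debug pod prints.

```shell
# Classify nslookup output read from stdin as resolved, nxdomain, timeout, or unknown.
# Assumption: the matched strings are typical nslookup wording, not a stable interface.
classify_dns_output() {
  out="$(cat)"
  if printf '%s\n' "$out" | grep -q '^Name:'; then
    # At least one answer section was printed.
    echo resolved
  elif printf '%s\n' "$out" | grep -qi 'NXDOMAIN'; then
    # A server replied and said the name does not exist.
    echo nxdomain
  elif printf '%s\n' "$out" | grep -qi 'timed out'; then
    # No server replied at all.
    echo timeout
  else
    echo unknown
  fi
}

# Example (replace the namespace placeholder first):
#   kubectl exec -n NAMESPACE dns-debug -- nslookup kubernetes.default.svc.cluster.local 2>&1 | classify_dns_output
```

Checking for an answer first is deliberate: some nslookup builds query both A and AAAA records and print an error for the record type that is missing even when the name resolves.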

Common causes

DNS symptoms often come from non-DNS sources.

  • A default-deny NetworkPolicy blocks egress on UDP/TCP 53 to CoreDNS.
  • Node-level DNS or CNI issues prevent pods from reaching the kube-dns service VIP.
  • CoreDNS overload or upstream resolution failures.
  • Search domain/ndots behavior causing surprising query patterns.
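
The search-domain/ndots point is easier to reason about with the candidate list written out. The sketch below models the order a glibc-style resolver tries names; expand_search is a hypothetical helper, and other resolvers (musl, for example) behave differently, so treat it as an illustration and not a description of every image. Kubernetes pods using the default DNS policy typically get ndots:5 and a search list of the form NAMESPACE.svc.cluster.local svc.cluster.local cluster.local; read /etc/resolv.conf in the debug pod for the real values.

```shell
# Print the order in which a glibc-style resolver would try candidate names.
# Usage: expand_search NAME NDOTS "SEARCH DOMAINS..."
# Assumption: simplified model of glibc behaviour; other resolvers differ.
expand_search() {
  name="$1"; ndots="$2"; search="$3"
  # A trailing dot marks the name as absolute: no search list is applied.
  case "$name" in
    *.) echo "$name"; return 0 ;;
  esac
  # Count the dots in the name.
  only_dots="$(printf '%s' "$name" | tr -cd '.')"
  dots="${#only_dots}"
  # At or above the ndots threshold, the literal name is tried first.
  if [ "$dots" -ge "$ndots" ]; then
    echo "$name."
  fi
  # Each search domain is appended in turn.
  for domain in $search; do
    echo "$name.$domain."
  done
  # Below the threshold, the literal name is tried last.
  if [ "$dots" -lt "$ndots" ]; then
    echo "$name."
  fi
}

# Example:
#   expand_search example.com 5 "default.svc.cluster.local svc.cluster.local cluster.local"
# prints:
#   example.com.default.svc.cluster.local.
#   example.com.svc.cluster.local.
#   example.com.cluster.local.
#   example.com.
```

With ndots:5, an external name like example.com costs three failed lookups before the real one, which is one source of the surprising query patterns and extra CoreDNS load. A trailing dot skips the search list entirely.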

Verify and record

Write down what changed the outcome. DNS issues recur.

  • Service FQDN resolves consistently.
  • Timeouts stop.
  • Applications stop retrying in storms, which reduces load on CoreDNS.
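
"Resolves consistently" is best checked with repeated lookups, not a single one. A minimal sketch: count_failures is a hypothetical helper that runs any command a fixed number of times and counts non-zero exit codes. It assumes the lookup command sets its exit status reliably on failure; if your nslookup build does not, inspect the output text instead.

```shell
# Run a command COUNT times and report how many attempts failed.
# Usage: count_failures COUNT COMMAND [ARGS...]
# Assumption: the command exits non-zero when the lookup fails.
count_failures() {
  total="$1"; shift
  failed=0
  i=0
  while [ "$i" -lt "$total" ]; do
    # Discard output; only the exit status is counted.
    "$@" >/dev/null 2>&1 || failed=$((failed + 1))
    i=$((i + 1))
  done
  echo "$failed/$total lookups failed"
  # Succeed only if every attempt succeeded.
  [ "$failed" -eq 0 ]
}

# Example (replace the namespace placeholder first):
#   count_failures 20 kubectl exec -n NAMESPACE dns-debug -- nslookup kubernetes.default.svc.cluster.local
```

Record the failure ratio before and after each change so the note you write down shows which change moved the number.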

Canonical link

Canonical URL: /labs/dns-triage-inside-the-cluster