Sacred Systems

CSI and the Persistence of State

Storage is where orchestration meets physics. CSI is the treaty between the cluster and the reality of disks.

Authored as doctrine; evaluated as operations.

Doctrine

Stateful systems demand humility. Storage is slower than compute, less forgiving than networking, and more expensive to recover.

Kubblai doctrine: treat CSI as critical infrastructure, not as a feature.

Topology and attachment realities

Volumes are not universal. Each one lives in a zone, counts against a node's attachment limit, and belongs to a failure domain. Scheduling and storage must agree on placement, or you will create workloads that cannot be scheduled.

Operators who ignore topology inevitably debug ‘Pending’ at 3 a.m.

  • Understand storage class topology constraints.
  • Know attachment limits per node type.
  • Test failover and reattach time.
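
The agreement between scheduling and storage can be sketched as a simple feasibility check. This is a minimal illustration, not how the Kubernetes scheduler or any CSI driver is implemented; the `Node` type, the zone names, and the attachment limits are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    zone: str
    attach_limit: int  # max volumes this node type can attach (hypothetical value)
    attached: int      # volumes currently attached


def feasible_nodes(nodes, volume_zone):
    """Return names of nodes in the volume's zone that still have a free attachment slot."""
    return [
        n.name
        for n in nodes
        if n.zone == volume_zone and n.attached < n.attach_limit
    ]


nodes = [
    Node("node-a", "zone-1", attach_limit=2, attached=2),  # right zone, but full
    Node("node-b", "zone-2", attach_limit=2, attached=0),  # free, but wrong zone
    Node("node-c", "zone-1", attach_limit=2, attached=1),  # right zone, has room
]

print(feasible_nodes(nodes, "zone-1"))  # ['node-c']
print(feasible_nodes(nodes, "zone-3"))  # [] -> a pod needing this volume stays Pending
```

An empty result is the ‘Pending’ case: there is capacity in the cluster, but none of it is where the volume can attach.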

Expansion, snapshots, and recovery

Volume expansion and snapshots are operational tools. They must be tested before you need them. The first time you attempt restoration should not be during incident response.

Treat backup/restore as a production workflow with SLOs.
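
One way to give a restore workflow an SLO is to run it as a timed drill and record whether it met a target. The sketch below assumes a restore-time objective expressed in seconds and a hypothetical `restore` callable that returns `True` on success; what the restore actually does (snapshot restore, backup replay) is outside its scope.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class DrillResult:
    seconds: float
    succeeded: bool
    within_slo: bool


def run_restore_drill(
    restore: Callable[[], bool],
    slo_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> DrillResult:
    """Time a restore attempt and compare it to a restore-time objective."""
    start = clock()
    try:
        ok = bool(restore())
    except Exception:
        # A restore that raises counts as a failed drill, not a crashed one.
        ok = False
    elapsed = clock() - start
    return DrillResult(elapsed, ok, ok and elapsed <= slo_seconds)


# Usage with a stand-in restore function (hypothetical; replace with a real procedure).
result = run_restore_drill(lambda: True, slo_seconds=900)
print(result.succeeded, result.within_slo)
```

The clock is injectable so the drill logic itself can be tested without waiting on a real restore.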

Failure signatures

CSI failures are often slow to surface: attach timeouts, mount hangs, and node-level issues. Your observability must include node logs, CSI controller logs, and cluster events.

Do not debug stateful incidents with only application logs.
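
As a small illustration of looking at events rather than application logs, the sketch below counts storage-related failure events per node. `FailedAttachVolume` and `FailedMount` are event reasons Kubernetes reports for attach and mount failures; the flat dictionary shape with a `node` key is a simplification assumed for the example, not the real Event object schema.

```python
from collections import Counter

# Event reasons reported for volume attach and mount failures.
STORAGE_REASONS = {"FailedAttachVolume", "FailedMount"}


def storage_failures_by_node(events):
    """Count storage-related failure events per node."""
    return Counter(
        e["node"] for e in events if e.get("reason") in STORAGE_REASONS
    )


# Illustrative sample data (hypothetical, simplified event records).
events = [
    {"reason": "FailedMount", "node": "node-a"},
    {"reason": "FailedAttachVolume", "node": "node-a"},
    {"reason": "BackOff", "node": "node-b"},
    {"reason": "FailedMount", "node": "node-c"},
]

hot = storage_failures_by_node(events)
print(hot.most_common(1))  # [('node-a', 2)]
```

A node that accumulates these events is a reasonable place to start reading node and CSI logs.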

Canonical Link

Canonical URL: /library/csi-and-the-persistence-of-state