Sacred Systems
CSI and the Persistence of State
Storage is where orchestration meets physics. CSI is the treaty between the cluster and the reality of disks.
Text
Authored as doctrine; evaluated as operations.
Doctrine
Stateful systems demand humility. Storage is slower than compute, less forgiving than networking, and more expensive to recover.
Kubblai doctrine: treat CSI as critical infrastructure, not as a feature.
Topology and attachment realities
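Volumes are not universally attachable. Each one lives in a zone, counts against a per-node attachment limit, and belongs to a failure domain. Scheduling and storage must agree on placement, or you will create unschedulable workloads: a pod bound to a zonal volume cannot run on a node in a different zone.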
Operators who ignore topology inevitably debug ‘Pending’ at 3 a.m.
- Understand storage class topology constraints.
- Know attachment limits per node type.
- Test failover and reattach time.
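The agreement between scheduling and storage can be sketched as a small feasibility check. This is an illustrative model, not the Kubernetes scheduler's logic: the `Node` and `Volume` types, the zone names, and the `max_attachments` values are all hypothetical, and real limits come from your provider and CSI driver.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    zone: str
    max_attachments: int  # hypothetical per-node-type attachment limit
    attached: int = 0     # volumes currently attached to this node


@dataclass
class Volume:
    name: str
    zone: Optional[str]   # None models a volume that is not yet provisioned


def feasible_nodes(volume: Volume, nodes: List[Node]) -> List[str]:
    """Return the names of nodes that could host a pod using `volume`."""
    result = []
    for node in nodes:
        zone_ok = volume.zone is None or volume.zone == node.zone
        capacity_ok = node.attached < node.max_attachments
        if zone_ok and capacity_ok:
            result.append(node.name)
    return result


nodes = [
    Node("node-a", zone="zone-1", max_attachments=2, attached=2),  # at its limit
    Node("node-b", zone="zone-2", max_attachments=2, attached=0),  # wrong zone
]
vol = Volume("data-0", zone="zone-1")

# No node is both in zone-1 and below its attachment limit,
# so a pod that needs this volume would stay Pending.
print(feasible_nodes(vol, nodes))  # prints []
```

An empty result is the model's version of a ‘Pending’ pod: each constraint is satisfiable alone, but no node satisfies both.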
Expansion, snapshots, and recovery
Volume expansion and snapshots are operational tools. They must be tested before you need them. The first time you attempt restoration should not be during incident response.
Treat backup/restore as a production workflow with SLOs.
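One way to make that concrete is a scheduled restore drill that is timed against a recovery-time objective. The sketch below is a minimal harness under stated assumptions: `restore` and `verify` are hypothetical callables you would supply (for example, restoring a snapshot into a scratch volume and checking a checksum), and `rto_seconds` is whatever target your team has agreed on.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class DrillResult:
    duration_s: float  # wall-clock time for restore plus verification
    verified: bool     # whether the restored data passed the check
    within_slo: bool   # verified AND finished inside the objective


def run_restore_drill(
    restore: Callable[[], None],
    verify: Callable[[], bool],
    rto_seconds: float,
) -> DrillResult:
    """Time a restore, verify the result, and compare against the objective."""
    start = time.monotonic()
    restore()            # hypothetical: restore a snapshot into a scratch volume
    verified = verify()  # hypothetical: checksum or row-count comparison
    duration = time.monotonic() - start
    return DrillResult(duration, verified, verified and duration <= rto_seconds)


# Stub callables stand in for a real restore and a real integrity check.
result = run_restore_drill(lambda: None, lambda: True, rto_seconds=60.0)
print(result.verified, result.within_slo)  # prints: True True
```

A restore that finishes on time but fails verification is reported as outside the SLO, since a fast restore of bad data is not a recovery.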
Failure signatures
CSI failures are often slow rather than loud: attach timeouts, mount hangs, and node-level issues. Your observability must include node logs (the kubelet and the CSI node plugin), CSI controller logs, and Kubernetes events.
Do not debug stateful incidents with only application logs.
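As a starting point for looking beyond application logs, the sketch below filters a Kubernetes event list for storage-related reasons. It assumes the input has the shape produced by `kubectl get events -o json` (a dict with an `items` list whose entries carry `reason`, `message`, and `involvedObject`). The `STORAGE_REASONS` set is a small, non-exhaustive assumption, and the sample events and their messages are invented for illustration.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple

# Event reasons commonly associated with storage problems (not exhaustive;
# extend this set for your CSI driver).
STORAGE_REASONS = {"FailedAttachVolume", "FailedMount"}


def storage_events(event_list: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    """Return (reason, kind/name, message) for each storage-related event."""
    found = []
    for item in event_list.get("items", []):
        if item.get("reason") in STORAGE_REASONS:
            obj = item.get("involvedObject", {})
            ref = f'{obj.get("kind", "?")}/{obj.get("name", "?")}'
            found.append((item["reason"], ref, item.get("message", "")))
    return found


def summarize(event_list: Dict[str, Any]) -> Counter:
    """Count storage-related events by reason."""
    return Counter(reason for reason, _, _ in storage_events(event_list))


# Invented sample data in the assumed shape; messages are placeholders.
sample = {
    "items": [
        {"reason": "Scheduled", "message": "(example)",
         "involvedObject": {"kind": "Pod", "name": "db-0"}},
        {"reason": "FailedAttachVolume", "message": "(example)",
         "involvedObject": {"kind": "Pod", "name": "db-0"}},
        {"reason": "FailedMount", "message": "(example)",
         "involvedObject": {"kind": "Pod", "name": "db-0"}},
        {"reason": "FailedMount", "message": "(example)",
         "involvedObject": {"kind": "Pod", "name": "db-1"}},
    ]
}

print(sorted(summarize(sample).items()))
# prints [('FailedAttachVolume', 1), ('FailedMount', 2)]
```

Events only point at the symptom; the node and CSI controller logs for the same time window are still needed to find the cause.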
Canonical Link
Canonical URL: /library/csi-and-the-persistence-of-state
Related Readings
Advanced Disciplines
StatefulSets and the Burden of Memory
StatefulSets are not Deployments with disks. They encode identity and order—and therefore encode risk.
Advanced Disciplines
Cluster Autoscaling and the Economics of Expansion
Adding nodes is not ‘scale.’ It is a controlled expansion of failure domains, cost, and operational surface area.
Canonical Texts
Incident Response as a Trial of Faith
Incidents reveal the true governance of your platform: who can act, what can be changed, and whether your system can recover with discipline.