T The Triage ManualTechnical Guides for IT Emergencies
P2 · Virtualisation & Storage

Storage performance has collapsed (latency spike)

Disk I/O latency has risen to the point that VMs, databases, or file services are unusable. Find the bottleneck before throwing hardware at it.

Indicators

Likely causes

Diagnostic steps

  1. Identify the layer — guest OS perfmon, hypervisor (esxtop / Hyper-V perf counters), array dashboard. Latency rises as you go down the stack
  2. Check for stuck snapshots — Get-VMSnapshot, vSphere snapshot manager, Veeam orphaned snapshots
  3. Verify controller cache state — vendor utility (perccli, ssacli) for battery/cache module health
  4. Identify per-VM IOPS distribution — find the noisy neighbour
  5. Confirm no rebuild / scrub / consistency check active
  6. Check free space — heavily-used SSD pools degrade severely above ~85% usage

Resolution path

Prevention

Tools

References

storageperformanceiopslatencysnapshotsan