Hyper-V host crashed — recovering virtual machines
The host has crashed or restarted unexpectedly, or VMs are stuck in a Saved-Critical or Paused-Critical state. Goal: stabilise the host first, then recover guests in dependency order.
Indicators
- Hyper-V host bug-checked or rebooted unexpectedly
- VMs showing 'Paused-Critical' (storage exhaustion or path loss)
- VMs in 'Saved-Critical' after host restart
- Cluster Shared Volume (CSV) offline or in redirected I/O
- VHDX file errors in event log (Hyper-V-VMMS / Hyper-V-Worker)
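The VM-state and CSV indicators above can be confirmed from an elevated PowerShell session on the host. The following is a minimal, read-only sketch. It assumes the Hyper-V module is installed and, for the CSV part, that the FailoverClusters module is present; on a standalone host the cluster section is skipped.

```powershell
# Read-only check; run in an elevated session on the Hyper-V host.
# VMs whose state ends in "Critical" (for example PausedCritical, SavedCritical).
Get-VM |
    Where-Object { $_.State -like '*Critical' } |
    Sort-Object State, Name |
    Format-Table Name, State, Status -AutoSize

# CSV checks apply only to clustered hosts, so skip them if the cluster cmdlets are absent.
if (Get-Command Get-ClusterSharedVolumeState -ErrorAction SilentlyContinue) {
    # CSVs that are not online.
    Get-ClusterSharedVolume |
        Where-Object { $_.State -ne 'Online' } |
        Format-Table Name, State, OwnerNode -AutoSize

    # CSVs that a node is accessing in redirected rather than direct I/O.
    Get-ClusterSharedVolumeState |
        Where-Object { $_.StateInfo -ne 'Direct' } |
        Format-Table Name, Node, StateInfo, FileSystemRedirectedIOReason, BlockRedirectedIOReason -AutoSize
}
```

Empty output from all three queries suggests that none of these indicators currently applies on this node.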
Likely causes
- Underlying storage path loss (iSCSI, FC, SMB3) or CSV offline
- Volume full on the VHDX-hosting LUN — dynamic disks expanded beyond capacity
- VHDX corruption following ungraceful shutdown
- Driver or firmware fault on the storage controller or in the NIC team
- Cluster heartbeat/witness loss leading to split-brain
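To judge whether the "volume full" cause is plausible, it can help to see how far each growable virtual disk could still expand. The sketch below is one possible read-only check, assuming the Hyper-V module is available and the VHDX storage is reachable; the column names FileGB, MaxGB and CanGrowByGB are labels invented for this example.

```powershell
# Read-only; requires the Hyper-V module and reachable virtual disk files.
# Pass-through disks have no .vhd/.vhdx/.avhdx path and are filtered out.
# Dynamic and differencing disks both grow on demand, so only fixed disks are excluded.
Get-VM |
    Get-VMHardDiskDrive |
    Where-Object { $_.Path -match '\.a?vhdx?$' } |
    ForEach-Object { Get-VHD -Path $_.Path -ErrorAction SilentlyContinue } |
    Where-Object { $_.VhdType -ne 'Fixed' } |
    Select-Object Path, VhdType,
        @{ Name = 'FileGB';      Expression = { [math]::Round([double]$_.FileSize / 1GB, 1) } },
        @{ Name = 'MaxGB';       Expression = { [math]::Round([double]$_.Size / 1GB, 1) } },
        @{ Name = 'CanGrowByGB'; Expression = { [math]::Round([math]::Max(0, [double]$_.Size - [double]$_.FileSize) / 1GB, 1) } } |
    Sort-Object CanGrowByGB -Descending |
    Format-Table -AutoSize
```

If the total of CanGrowByGB for the disks on one volume exceeds that volume's free space, the volume is overcommitted and a Paused-Critical event from exhaustion is possible.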
Diagnostic steps
- Confirm the host is stable: check the Event Viewer System, Application, Hyper-V-VMMS, Hyper-V-Worker and FailoverClustering channels for the precipitating event
- Verify storage health before touching any VM: Get-ClusterSharedVolume, Get-Disk, Get-Volume, controller / SAN dashboard
- List VM states with Get-VM | Sort State; identify Saved-Critical, Paused-Critical, Off, Running
- If storage is healthy and free space exists, attempt to start VMs in dependency order (DCs first, then DBs, then app servers)
- If a VHDX is suspect, copy it aside before any repair attempt; never run chkdsk or fix tools against the live VHDX without a copy
- For cluster split-brain, identify which node holds the authoritative copy via the cluster log (Get-ClusterLog) before forcing quorum
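The first two diagnostic steps and the cluster-log step can be sketched as a short triage script. This is an illustration under stated assumptions rather than a definitive procedure: the 24-hour window, the 120-minute log span and the C:\Temp\ClusterLogs folder are hypothetical choices, and channel names may differ between Windows Server versions.

```powershell
# Triage sketch; run elevated on the host. Apart from creating the log folder, it only reads.
$since = (Get-Date).AddHours(-24)   # assumption: the crash happened within the last day

# 1. Crash markers in the System log: 41 (Kernel-Power), 1001 (bug check), 6008 (unexpected shutdown).
Get-WinEvent -FilterHashtable @{ LogName = 'System'; Id = 41, 1001, 6008; StartTime = $since } -ErrorAction SilentlyContinue |
    Format-Table TimeCreated, Id, ProviderName, Message -Wrap

# 2. Critical, error and warning events from the Hyper-V and cluster channels.
$channels = 'Microsoft-Windows-Hyper-V-VMMS-Admin',
            'Microsoft-Windows-Hyper-V-Worker-Admin',
            'Microsoft-Windows-FailoverClustering/Operational'
foreach ($channel in $channels) {
    Get-WinEvent -FilterHashtable @{ LogName = $channel; Level = 1, 2, 3; StartTime = $since } -ErrorAction SilentlyContinue |
        Select-Object -First 20 |
        Format-Table TimeCreated, Id, LevelDisplayName, LogName, Message -Wrap
}

# 3. Storage health as seen by the host, before any VM is touched.
Get-Disk   | Format-Table Number, FriendlyName, HealthStatus, OperationalStatus -AutoSize
Get-Volume | Format-Table DriveLetter, FileSystemLabel, HealthStatus, SizeRemaining, Size -AutoSize

# 4. Clustered hosts only: node states, plus cluster logs to review before forcing quorum.
if (Get-Command Get-ClusterLog -ErrorAction SilentlyContinue) {
    Get-ClusterNode | Format-Table Name, State -AutoSize

    $logDir = 'C:\Temp\ClusterLogs'   # hypothetical destination folder
    New-Item -ItemType Directory -Path $logDir -Force | Out-Null
    Get-ClusterLog -Destination $logDir -TimeSpan 120 -UseLocalTime
}
```

The SAN or controller dashboard still needs to be checked separately, since the host only reports what it can see through its own storage paths.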
Resolution path
- Stabilise host (storage, networking, time)
- Bring CSVs / volumes online cleanly
- Start domain controllers first — authentication must be available
- Start DBs / file services / app servers in dependency order
- Validate VSS writers and backup chain before declaring complete
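The dependency-ordered start in the resolution path can be scripted once storage is confirmed healthy. The following is a minimal sketch: the tier names, the VM names (DC01, SQL01, APP01 and so on) and the 300-second timeout are hypothetical placeholders, and the heartbeat wait assumes integration services are running in the guests.

```powershell
# Hypothetical tiers and VM names; replace with the documented start order.
$startOrder = @(
    @{ Tier = 'Domain controllers';  VMs = @('DC01', 'DC02') }
    @{ Tier = 'Databases';           VMs = @('SQL01') }
    @{ Tier = 'Application servers'; VMs = @('APP01', 'APP02') }
)
$timeoutSeconds = 300   # assumption: maximum wait per VM before moving on

foreach ($tier in $startOrder) {
    Write-Host "Starting tier: $($tier.Tier)"
    foreach ($name in $tier.VMs) {
        $vm = Get-VM -Name $name -ErrorAction SilentlyContinue
        if (-not $vm) {
            Write-Warning "$name was not found on this host"
            continue
        }

        $state = "$($vm.State)"
        if ($state -like '*Critical') {
            # A critical state points at storage; do not start the VM until that is resolved.
            Write-Warning "$name is $state; resolve the storage fault first"
            continue
        }
        if ($state -eq 'Paused') {
            Resume-VM -VM $vm
        }
        elseif ($state -ne 'Running') {
            Start-VM -VM $vm
        }

        # Wait until the guest reports a heartbeat or the timeout expires.
        $deadline = (Get-Date).AddSeconds($timeoutSeconds)
        while ((Get-VM -Name $name).Heartbeat -notlike 'Ok*' -and (Get-Date) -lt $deadline) {
            Start-Sleep -Seconds 5
        }
        $current = Get-VM -Name $name
        Write-Host ("{0}: state {1}, heartbeat {2}" -f $name, $current.State, $current.Heartbeat)
    }
}
```

A heartbeat only shows that the guest OS is up. Whether authentication or a database is actually serving requests still has to be checked inside the guest before the next tier is relied upon. VSS writer state can then be reviewed with `vssadmin list writers` on the host and in each guest.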
Prevention
- Monitor VHDX-host volume free space with alerting at ≥80% used
- Avoid placing dynamic VHDXs on volumes without headroom
- Patch host storage drivers and firmware in maintenance windows
- Configure a quorum witness (Cloud Witness, or a File Share Witness hosted outside the cluster)
- Keep a tested DR runbook with the VM start order documented
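The 80%-used alert threshold can be expressed as a small check that a scheduled task or monitoring agent might run. This is a sketch only: Test-VolumeOverThreshold is a hypothetical helper written for this example, not a built-in cmdlet, and it assumes the VHDX-hosting volumes are visible to Get-Volume on the node where it runs.

```powershell
# Hypothetical helper: returns $true when used space is at or above the threshold percentage.
function Test-VolumeOverThreshold {
    param(
        [Parameter(Mandatory)] [double] $Size,            # total bytes
        [Parameter(Mandatory)] [double] $SizeRemaining,   # free bytes
        [double] $ThresholdPct = 80
    )
    if ($Size -le 0) { return $false }
    $usedPct = 100 * ($Size - $SizeRemaining) / $Size
    return ($usedPct -ge $ThresholdPct)
}

# Usage on the host: warn for every volume at or above 80% used.
if (Get-Command Get-Volume -ErrorAction SilentlyContinue) {
    Get-Volume |
        Where-Object { $_.Size -gt 0 -and (Test-VolumeOverThreshold -Size $_.Size -SizeRemaining $_.SizeRemaining) } |
        ForEach-Object {
            Write-Warning ("Volume {0} '{1}': {2:N1} GB free of {3:N1} GB" -f $_.DriveLetter, $_.FileSystemLabel, ($_.SizeRemaining / 1GB), ($_.Size / 1GB))
        }
}
```

Keeping the comparison in its own function makes the threshold easy to change per volume, for example a lower value for volumes that hold many dynamic VHDXs.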
Tools
- Hyper-V Manager / Failover Cluster Manager
- PowerShell: Get-VM, Start-VM, Get-ClusterSharedVolume, Get-ClusterLog
- Event Viewer (Hyper-V-VMMS, Hyper-V-Worker, FailoverClustering)
- Storage vendor management (Dell OME, HPE iLO/OneView, Synology DSM, etc.)
- Robocopy, or Hyper-V storage migration (Move-VMStorage) / Live Migration, to evacuate guests
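As an illustration of the evacuation tools, the sketch below shows a copy-aside of virtual disk files with Robocopy and, commented out, a storage move for a healthy VM. The VM name and all paths are hypothetical and must be adjusted; the copy should be taken while the VM is off or otherwise not writing to the disk.

```powershell
# Hypothetical VM name and paths; adjust before use.
$vmName     = 'APP01'
$sourceDir  = 'C:\ClusterStorage\Volume1\APP01\Virtual Hard Disks'
$copyDir    = 'E:\Recovery\APP01'
$newStorage = 'D:\Hyper-V\APP01'

# Option 1: the VHDX is suspect. Copy the disk files aside before any repair attempt.
# /J uses unbuffered I/O, which suits very large files; /R and /W limit retries and wait time.
robocopy $sourceDir $copyDir *.vhdx *.avhdx /J /R:2 /W:5
if ($LASTEXITCODE -ge 8) {
    # Robocopy exit codes of 8 or higher mean at least one file failed to copy.
    Write-Warning "robocopy reported failures (exit code $LASTEXITCODE)"
}

# Option 2: the VM is healthy and the host is stable. Move all of its storage to another volume.
# Move-VMStorage -VMName $vmName -DestinationStoragePath $newStorage
```

Any repair tooling should then be run against the copy in $copyDir, never against the original files.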
References
- Microsoft Learn — Recover from saved/paused-critical state
- Microsoft Learn — VHDX file format & repair
- Microsoft Learn — Failover cluster troubleshooting
- Vendor SAN/HCI documentation (Dell, HPE, Synology, StarWind)