VMware ESXi host disconnected or PSOD
An ESXi host has disconnected from vCenter, become unresponsive, or hit a Purple Screen of Death (PSOD). Restore management access first, then guest VMs.
Indicators
- Host shown 'Not responding' or 'Disconnected' in vCenter
- PSOD with stack trace visible on console / IPMI
- VMs marked 'inaccessible' or 'invalid'
- vmkernel.log shows storage APD/PDL or NIC link flapping
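The log indicators above can be checked quickly from the ESXi shell or against a log extracted from a support bundle. This is a minimal sketch, assuming the default path /var/log/vmkernel.log. The function name scan_vmkernel_log is hypothetical, and the grep patterns are illustrative guesses, since exact message text varies between ESXi builds and NIC drivers.

```shell
#!/bin/sh
# Count indicator lines in a vmkernel log.
# Usage: scan_vmkernel_log [path-to-log]
# Assumption: the patterns below only approximate the real APD, PDL, and
# link-state messages; adjust them to what your build actually logs.
scan_vmkernel_log() {
    log="${1:-/var/log/vmkernel.log}"
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    apd=$(grep -ciE 'all paths down|APD' "$log" || true)
    pdl=$(grep -ciE 'permanent device loss|permanently inaccessible|PDL' "$log" || true)
    nic=$(grep -ciE 'vmnic[0-9]+.*link.*(up|down)' "$log" || true)
    echo "apd=$apd pdl=$pdl nic_link=$nic"
}
```

A non-zero apd or pdl count points towards the storage causes listed below; a high nic_link count suggests link flapping on an uplink.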
Likely causes
- Storage all-paths-down (APD) or permanent device loss (PDL)
- Management network NIC failure or vSwitch misconfiguration
- ESXi memory leak / known bug — check VMware KB for the build
- Hardware fault (CPU, DIMM) — check IPMI/iDRAC system event log
- Hostd / vpxa daemon hang — sometimes recoverable without reboot
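To test the management NIC cause, the uplink state can be read from the output of esxcli network nic list. This is a rough sketch: the helper name down_nics is hypothetical, and it assumes the fifth column of that output is Link Status (after Name, PCI Device, Driver, Admin Status), which should be confirmed against the header line on the build in question.

```shell
#!/bin/sh
# Print the names of uplinks whose Link Status column reads "Down".
# Reads `esxcli network nic list` output on stdin.
# Assumption: column 5 is Link Status; check the header row on your build.
down_nics() {
    awk '$1 ~ /^vmnic[0-9]+$/ && $5 == "Down" { print $1 }'
}
```

On the host this would be run as: esxcli network nic list | down_nics. An empty result means no uplink reports link down, which shifts suspicion towards vSwitch or VLAN configuration.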
Diagnostic steps
- Connect via DCUI / IPMI console; confirm whether the host is alive or at a PSOD
- If hostd is hung but ESXi is alive, restart the management agents: services.sh restart
- Pull vmkernel.log, hostd.log, vmkwarning.log via vm-support bundle
- Check storage paths: esxcli storage core path list; identify APD/PDL events
- Cross-reference build number against VMware KB for known PSOD bugs at this patch level
- If the host is unrecoverable, vSphere HA should have restarted its VMs on surviving hosts; verify and reconcile
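Two of the steps above lend themselves to small shell helpers. This is a sketch only, with hypothetical function names: summarize_path_states counts paths per state from the State: field of esxcli storage core path list, and restart_mgmt_agents restarts hostd and vpxa through their init scripts as a narrower alternative to restarting every service. Neither function runs until it is called.

```shell
#!/bin/sh
# Count paths per state from `esxcli storage core path list` output on stdin.
# Prints one "state=count" line per state, sorted by state name.
summarize_path_states() {
    awk -F': *' '/^ *State:/ { n[$2]++ } END { for (s in n) print s "=" n[s] }' | sort
}

# Restart only the two management agents rather than all services.
# Defined here but not invoked; call it deliberately from the ESXi shell.
restart_mgmt_agents() {
    /etc/init.d/hostd restart
    /etc/init.d/vpxa restart
}
```

On the host: esxcli storage core path list | summarize_path_states. Any paths reported as dead are candidates to correlate with APD/PDL entries in vmkernel.log.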
Resolution path
- Restore management plane (vCenter visibility)
- Address root cause (storage path, NIC, firmware, build)
- Validate HA recovered VMs cleanly
- Patch host to fixed build during maintenance window
- Review cluster health and apply DRS recommendations afterwards
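After patching, the running build can be compared with the fixed build named in the relevant KB article. This is a minimal sketch with hypothetical helper names, assuming the Build: line of esxcli system version get ends in the numeric build (for example a Releasebuild- prefix followed by digits). The fixed build number must come from the KB, and the comparison is only meaningful within the same ESXi release line.

```shell
#!/bin/sh
# Extract the numeric build from `esxcli system version get` output on stdin.
parse_build() {
    sed -n 's/^ *Build: *[^0-9]*\([0-9][0-9]*\).*$/\1/p'
}

# is_patched CURRENT_BUILD FIXED_BUILD
# Succeeds when the current build is at or above the fixed build.
is_patched() {
    [ "$1" -ge "$2" ]
}
```

On the host: is_patched "$(esxcli system version get | parse_build)" FIXED_BUILD && echo "at or above fixed build", where FIXED_BUILD is a placeholder for the number taken from the KB article.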
Prevention
- Match HCL strictly — drivers, firmware, ESXi build
- Keep at least N+1 host capacity for HA
- Storage multipathing tested — pull a path quarterly in lab
- Skyline / proactive support enabled where licensed
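The N+1 rule above reduces to simple arithmetic: the workload must fit on the hosts that remain after one fails. This is a rough sketch, assuming identically sized hosts and looking at memory only; actual HA admission control also accounts for CPU, reservations, and overhead. The function name and the figures in the usage line are hypothetical.

```shell
#!/bin/sh
# n_plus_one_ok HOSTS CAPACITY_GB_PER_HOST USED_GB_TOTAL
# Succeeds when the total used memory fits on HOSTS-1 hosts.
n_plus_one_ok() {
    hosts=$1
    cap=$2
    used=$3
    surviving=$(( (hosts - 1) * cap ))
    [ "$used" -le "$surviving" ]
}
```

For example, n_plus_one_ok 4 512 1400 succeeds because three surviving hosts provide 1536 GB, while n_plus_one_ok 4 512 1600 fails.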
Tools
- vCenter / vSphere Client
- ESXi shell + esxcli (storage, network, system commands)
- DCUI via IPMI / iDRAC / iLO
- vm-support log bundle
- VMware Skyline / VMware KB
References
- VMware KB — Diagnosing PSOD
- VMware Docs — Storage APD / PDL handling
- Build number to release mapping (Broadcom support portal)