Azure VM Unresponsive or Failed to Start — Allocation or Boot Failure
An Azure virtual machine is unreachable via RDP/SSH, stuck in a failed or deallocated state, or refusing to boot due to OS-level errors. Resolution path depends on whether the failure is an Azure allocation constraint, OS corruption, or network/NSG misconfiguration.
Indicators
- VM shows 'Stopped (deallocated)' or 'Failed' state in Azure portal
- RDP or SSH unreachable; Azure Serial Console shows boot errors or blank screen
- Activity Log shows 'Allocation failed' — Microsoft.Compute/allocation-failed
- Boot Diagnostics screenshot shows Windows BSOD or grub rescue prompt
Likely causes
- Azure region or zone capacity constraint causing allocation failure
- OS-level boot failure — Windows BSOD, corrupted BCD, or Linux grub misconfiguration
- Disk corruption from ungraceful shutdown or snapshot inconsistency
- NSG rule change blocking inbound RDP (3389) or SSH (22)
- Disk encryption key access issue — Key Vault unavailable or access policy changed
Diagnostic steps
-
Azure portal > VM > Activity log — identify exact error code, timestamp and initiating principal
-
Enable and check Boot Diagnostics: VM > Diagnostics > Boot diagnostics > view screenshot and serial console output
-
If allocation failure: fully stop/deallocate the VM (not just restart), then start again — forces reallocation to an available host
-
If OS boot failure: use Azure VM Repair: az vm repair create -g <rg> -n <vm> --repair-username admin — attaches OS disk to a recovery VM for offline repair
-
Verify NSG inbound rules allow port 3389 (RDP) or 22 (SSH) on both the VM NIC and subnet NSG; check Effective Security Rules
-
Check Azure Service Health for region-specific incidents: portal.azure.com > Service Health > Health alerts
Resolution path
- Check Azure Service Health first to rule out platform incident
- Deallocate and reallocate if capacity constraint
- Use Boot Diagnostics to identify OS-level failure
- Attach OS disk to a recovery VM for offline startup repair or chkdsk
- Restore from Azure Backup recovery point if disk corruption is unrecoverable
Prevention
- Enable Azure Backup for all production VMs with daily recovery points
- Use Availability Zones or Availability Sets for critical workloads
- Set Azure Monitor alerts on VM availability and heartbeat
- Take disk snapshots before major OS changes or patch cycles
Tools
- Azure Portal (Boot Diagnostics, Activity Log, Service Health)
- Azure CLI — az vm start / az vm repair
- Azure Serial Console
- Azure Backup (restore from recovery point)
- Network Watcher — Effective Security Rules