Azure Backup Vault — Restore Failed or RPO Breach
An Azure Backup restore job fails, backup jobs stop completing, or the Recovery Point Objective (RPO) is breached because no recent recovery point exists. Affects Azure VMs, SQL Server in Azure VMs, Azure Files, and on-premises workloads protected by the MARS agent or Microsoft Azure Backup Server (MABS).
Indicators
- Restore job shows 'Failed' in Recovery Services vault > Backup jobs
- Last successful backup older than expected RPO window
- Backup job stuck in 'In Progress' for more than 4 hours
- Alert: 'Azure Backup pre-check warning' or 'Backup health degraded'
- MARS agent shows 'The process cannot access the file because it is being used by another process'
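The first three indicators can be checked from job history in one pass. The Python sketch below assumes the job history was exported with `az backup job list ... --output json` and that each job carries `properties.entityFriendlyName`, `operation`, `status`, `startTime`, and `endTime`. Those field names, and the `"Backup"`/`"Restore"` and `"Completed"`/`"InProgress"`/`"Failed"` values, are assumptions about the CLI output, so confirm them against your own export. The 4-hour threshold comes from the list above; the RPO is whatever your SLA specifies.

```python
from datetime import datetime, timedelta, timezone

SUCCESS = {"Completed", "CompletedWithWarnings"}  # assumed status values
STUCK_AFTER = timedelta(hours=4)  # threshold from the Indicators list


def _ts(value):
    # Assumes ISO-8601 timestamps that datetime.fromisoformat() can parse;
    # a trailing "Z" is mapped to "+00:00" for Python versions before 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def find_backup_issues(jobs, rpo, now=None):
    """Flag failed restores, stuck jobs, and RPO breaches.

    `jobs` is parsed JSON shaped like `az backup job list --output json`
    (property names are assumptions); `rpo` is a timedelta.
    """
    now = now or datetime.now(timezone.utc)
    findings = []
    backed_up = set()
    last_success = {}
    for job in jobs:
        p = job["properties"]
        name, op, status = p["entityFriendlyName"], p["operation"], p["status"]
        if op == "Restore" and status == "Failed":
            findings.append(f"{name}: restore failed")
        if status == "InProgress" and now - _ts(p["startTime"]) > STUCK_AFTER:
            findings.append(f"{name}: {op} in progress for more than 4 hours")
        if op == "Backup":
            backed_up.add(name)
            if status in SUCCESS and p.get("endTime"):
                end = _ts(p["endTime"])
                # Keep the most recent successful backup per protected item.
                last_success[name] = max(end, last_success.get(name, end))
    for name in sorted(backed_up):
        end = last_success.get(name)
        if end is None:
            findings.append(f"{name}: no successful backup in the listed jobs")
        elif now - end > rpo:
            findings.append(
                f"{name}: last successful backup at {end.isoformat()} breaches the RPO"
            )
    return findings
```

Only jobs present in the export are considered, so an item whose last success falls outside the queried date range is reported as having no successful backup rather than as a dated breach.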
Likely causes
- VM snapshot failure — Azure VM Agent not responsive or OS disk full
- Insufficient permissions — Backup MSI lacks Contributor on VM or Key Vault access for encrypted VMs
- Restore target storage account or VM size unavailable in the region
- MARS/MABS agent version outdated and incompatible with current Azure APIs
- Soft delete — the backup item or its recovery points appear deleted but are in the soft-deleted state and must be undeleted before they can be used
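To tell a genuinely missing recovery point from a soft-deleted one, a sketch like the following can scan backup items exported with `az backup item list --vault-name <v> --resource-group <rg> --output json`. It assumes soft-deleted items expose `properties.isScheduledForDeferredDelete` (and `deferredDeleteTimeRemaining`) in that output; treat those property names as assumptions and verify them against your own export before relying on the result.

```python
def soft_deleted_items(items):
    """Return (name, time remaining) for backup items in the soft-deleted state.

    `items` is assumed to be parsed JSON from `az backup item list`; the
    property names read here are assumptions about that output.
    """
    found = []
    for item in items:
        props = item.get("properties", {})
        if props.get("isScheduledForDeferredDelete"):
            # Prefer the friendly name; fall back to the item's resource name.
            name = props.get("friendlyName") or item.get("name")
            found.append((name, props.get("deferredDeleteTimeRemaining")))
    return found
```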
Diagnostic steps
- Recovery Services vault > Backup jobs: open the failed job, expand 'Error details', and note the error code (e.g., UserErrorVmNotInDesirableState, ExtensionFailedInVMAccessExtension)
- Check VM agent status: Azure portal > VM > Settings > Extensions; verify that the backup extension ('VMSnapshot' on Windows, 'VMSnapshotLinux' on Linux, published by Microsoft.Azure.RecoveryServices) is healthy
- For encrypted VM restores: verify that the vault's managed identity has the 'Key Vault Administrator' role or Get/List permissions on the Key Vault's keys and secrets
- Check the restore target: ensure the target VM size is available in the target region, the storage account is in the same region as the vault, and no resource locks block resource creation
- For the MARS agent: check the agent version under vault > Backup infrastructure, compare it with the minimum supported version, and update if outdated
- Verify backup policy RPO settings: vault > Backup policies; confirm that the schedule and retention match SLA expectations
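The restore-target and MARS-version checks above lend themselves to a scripted pre-flight. The sketch below is illustrative only: `RestoreTarget` and its fields are hypothetical, to be filled from your own inventory or CLI queries, and the minimum MARS version must come from current Microsoft guidance rather than from this example.

```python
from dataclasses import dataclass, field


# Hypothetical snapshot of a restore target; none of these field names
# come from an Azure API.
@dataclass
class RestoreTarget:
    vault_region: str
    target_region: str
    storage_account_region: str
    vm_size: str
    sizes_in_target_region: set = field(default_factory=set)
    resource_locks: list = field(default_factory=list)


def _region(name):
    # Treat "East US" and "eastus" as the same region.
    return name.replace(" ", "").lower()


def restore_preflight(t):
    """Return a list of problems that would likely block the restore."""
    problems = []
    if t.vm_size not in t.sizes_in_target_region:
        problems.append(f"VM size {t.vm_size} not offered in {t.target_region}")
    if _region(t.storage_account_region) != _region(t.vault_region):
        problems.append("storage account is not in the vault's region")
    for lock in t.resource_locks:
        problems.append(f"resource lock '{lock}' may block resource creation")
    return problems


def _version(v):
    # "2.0.9200.0" -> (2, 0, 9200, 0)
    return tuple(int(part) for part in v.split("."))


def mars_agent_outdated(installed, minimum):
    """True if the installed MARS agent is older than the minimum version."""
    return _version(installed) < _version(minimum)
```

Versions are compared as integer tuples because plain string comparison misorders them: as strings, `"2.0.10000.0"` sorts before `"2.0.9200.0"`.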
Resolution path
- Resolve agent/extension error first — reinstall VMSnapshot extension if corrupted
- Fix permissions gap — assign Backup MSI correct RBAC role on VM, RG, or Key Vault
- For stuck jobs: stop and re-trigger backup from the vault
- For restore failures: if the primary restore target has capacity issues, try Cross Region Restore to the paired secondary region (requires a GRS vault with Cross Region Restore enabled)
- Update the MARS agent or MABS to the latest version if a compatibility error is reported
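Stopping and re-triggering a stuck job can also be scripted against the Azure CLI. This sketch only builds the argument lists for `az backup job stop` and `az backup protection backup-now` and prints them unless `dry_run=False`. The vault, resource group, container, and item names are placeholders, the 30-day `keep_days` default is an arbitrary example, and the dd-mm-yyyy format assumed for `--retain-until` should be confirmed with `az backup protection backup-now --help`.

```python
import subprocess
from datetime import date, timedelta


def stop_job_cmd(job_name, vault, rg):
    # `job_name` is the job ID from the "name" field of `az backup job list`.
    return ["az", "backup", "job", "stop", "--name", job_name,
            "--vault-name", vault, "--resource-group", rg]


def backup_now_cmd(vault, rg, container, item, keep_days=30, today=None):
    # keep_days is a hypothetical default; --retain-until is assumed to
    # take a dd-mm-yyyy date.
    today = today or date.today()
    retain_until = (today + timedelta(days=keep_days)).strftime("%d-%m-%Y")
    return ["az", "backup", "protection", "backup-now",
            "--vault-name", vault, "--resource-group", rg,
            "--container-name", container, "--item-name", item,
            "--retain-until", retain_until]


def run(cmd, dry_run=True):
    """Print the command, or execute it when dry_run is False."""
    if dry_run:
        print(" ".join(cmd))
        return None
    return subprocess.run(cmd, check=True, capture_output=True, text=True)
```

Running with the default `dry_run=True` first makes it easy to review the exact commands before touching the vault.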
Prevention
- Enable Azure Backup alerts and configure Action Groups to page on-call when backup fails
- Test restores quarterly — a backup never tested is not a backup
- Enable soft-delete on all Recovery Services vaults (14-day retention minimum)
Tools
- Azure Recovery Services Vault — Backup jobs, Backup alerts
- Azure Monitor — Backup diagnostics workbook
- Azure CLI: az backup job list --vault-name <v> --resource-group <rg>
- Azure Backup Explorer (workbook in Azure Monitor)
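For ad hoc triage, the `az backup job list` call above can be wrapped so that its JSON output feeds other scripts. A minimal sketch, assuming the Azure CLI is installed and signed in; the optional `--status` filter (for example `Failed`) is assumed to be available, so check `az backup job list --help` for your CLI version. The `runner` parameter exists only so the function can be exercised without calling Azure.

```python
import json
import subprocess


def list_backup_jobs(vault, rg, status=None, runner=subprocess.run):
    """Run `az backup job list` and return its parsed JSON output.

    `status` optionally filters by job status (assumed CLI flag);
    `runner` is injectable for testing without the Azure CLI.
    """
    cmd = ["az", "backup", "job", "list", "--vault-name", vault,
           "--resource-group", rg, "--output", "json"]
    if status:
        cmd += ["--status", status]
    result = runner(cmd, check=True, capture_output=True, text=True)
    return json.loads(result.stdout)
```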