Restore verification gap — backups exist, but do they actually restore?
Backup jobs report 'success', but no one has verified them by restoring. The gap is then discovered during a disaster, when it is too late. This entry gives a methodology to close the gap.
Indicators
- No documented restore test in the last 6 months
- Backup job reports 'success' but VSS writers were in a warning state at backup time
- Recovery time objective (RTO) and recovery point objective (RPO) undefined or unmet
- Change of platform (e.g. cloud migration) without revisiting DR plan
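The indicators above can be checked mechanically once the facts are recorded per system. A minimal sketch, assuming a hand-maintained inventory: the SystemRecord fields and the 183-day cut-off for "6 months" are illustrative assumptions, not taken from any backup product.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Hypothetical inventory record; field names are assumptions for illustration.
@dataclass
class SystemRecord:
    name: str
    last_restore_test: Optional[date]  # None means never tested
    rto_hours: Optional[float]         # None means undefined
    rpo_hours: Optional[float]         # None means undefined
    vss_warning_at_backup: bool = False
    platform_changed_since_dr_review: bool = False

MAX_TEST_AGE = timedelta(days=183)  # assumed approximation of 6 months

def indicators(rec: SystemRecord, today: date) -> list[str]:
    """Return the indicators from the list above that apply to one system."""
    found = []
    if rec.last_restore_test is None or today - rec.last_restore_test > MAX_TEST_AGE:
        found.append("no documented restore test in the last 6 months")
    if rec.vss_warning_at_backup:
        found.append("VSS writers in warning state at backup time")
    if rec.rto_hours is None or rec.rpo_hours is None:
        found.append("RTO/RPO undefined")
    if rec.platform_changed_since_dr_review:
        found.append("platform changed without revisiting DR plan")
    return found
```

A system that returns an empty list is not proven restorable; it only shows none of the listed warning signs.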
Likely causes
- Cultural: 'the backups have always worked' assumption
- Resourcing: full restore tests deferred
- Tool gaps: SureBackup or equivalent restore virtualisation not licensed or not configured
- Documentation gaps: no defined RTO/RPO per system
Diagnostic steps
- Inventory each tier-1 system; for each, record the stated RTO, the stated RPO, and the date of the last successful restore test
- Run a sandbox restore of one tier-1 system and measure time, completeness, and application integrity
- Identify gaps: missing application-aware components (DB transaction logs, AD system state, M365 tenant)
- Document the observed RTO/RPO against the target and present the delta to leadership
- Add automated restore verification (Veeam SureBackup / Datto local virtualisation / BackupAssist) so the gap stays closed on a recurring basis
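The observed-versus-target comparison from the steps above is simple arithmetic, shown here as a sketch. RestoreResult and its field names are hypothetical; the observed values are assumed to come from a timed sandbox restore, entered by hand.

```python
from dataclasses import dataclass

# Hypothetical record of one sandbox restore; names are assumptions.
@dataclass
class RestoreResult:
    system: str
    target_rto_h: float
    target_rpo_h: float
    observed_rto_h: float  # restore start until the application is verified working
    observed_rpo_h: float  # age of the newest data recoverable from the restore point

def delta_report(results):
    """Return one row per system with the RTO/RPO delta, worst RTO overrun first."""
    rows = []
    for r in results:
        rto_delta = r.observed_rto_h - r.target_rto_h
        rpo_delta = r.observed_rpo_h - r.target_rpo_h
        rows.append({
            "system": r.system,
            "rto_delta_h": rto_delta,   # positive means slower than target
            "rpo_delta_h": rpo_delta,   # positive means more data lost than target
            "meets_target": rto_delta <= 0 and rpo_delta <= 0,
        })
    return sorted(rows, key=lambda row: row["rto_delta_h"], reverse=True)

if __name__ == "__main__":
    sample = [
        RestoreResult("sql01", 4.0, 1.0, 9.5, 0.5),
        RestoreResult("fs01", 8.0, 24.0, 6.0, 24.0),
    ]
    for row in delta_report(sample):
        print(row)
```

Sorting by the RTO overrun puts the systems furthest from target at the top of the leadership report.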
Resolution path
- Move from 'backup success rate' KPI to 'tested restore' KPI
- Schedule recurring restore drills (quarterly tier-1, half-yearly tier-2)
- Update the DR runbook with observed restore times, not assumed ones
- Report to leadership as a risk metric
Prevention
- Automated restore verification (SureBackup / equivalents)
- Defined RTO/RPO with sign-off per system
- DR runbook reviewed and exercised annually
- Immutable backup retention to protect tested points
Tools
- Veeam SureBackup
- Datto local virtualisation
- Hyper-V / VMware sandbox network for safe restore
- Backup vendor PowerShell modules for scheduled tests
References
- ISO 22301 — Business continuity management
- NIST SP 800-184 — Cybersecurity event recovery
- NCSC — Backup strategies