A Veeam backup job has been failing for days. Or weeks. Or — and this is the call we get most — the customer has just discovered the backups have been red on the dashboard for two months, and now the server they wanted to restore from is the one that died.

Backup failures are unique in that they are the disaster you don't notice until the disaster you actually feared has already happened. The cure is not to wait for a restore — it's to read backup failures the same day they appear. This guide walks through how a senior engineer diagnoses Veeam backup failures across Veeam Backup & Replication 12.x, the common error families, the meaningful repair steps, and the verification routine that turns "we have backups" into "we have backups that actually restore". Written for IT managers and backup admins running Veeam in Hyper-V or VMware environments.

If you have a backup failure now and a recovery scenario looming, call 01923 372471 — senior engineer answers directly, on-site within 2 hours.

Step 1: Read the actual error, not the colour

The Veeam dashboard shows job status as Success / Warning / Failed. Don't stop at the colour — open the job report and read the error text. Veeam's errors are precise; the cure follows directly from the wording.

In the Veeam console: Home > Jobs > [job] > right-click > Statistics > [latest session] > Errors tab.

Or from PowerShell:

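# Veeam Backup & Replication 11 and later ship a PowerShell module; the old snap-in no longer exists.
Import-Module Veeam.Backup.PowerShell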
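# Filter sessions by job name, then expand the five most recent into per-VM task sessions.
$jobName = "<job name>"
Get-VBRBackupSession | Where-Object { $_.JobName -eq $jobName } |
    Sort-Object EndTime -Descending | Select-Object -First 5 | Get-VBRTaskSession |
    Select-Object Name, Status, @{n="Reason";e={$_.Result.Reason}}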

The Reason field is the answer to "why did this fail?". Capture it; it tells you which class of issue you're in:

| Error category | Common Reason text | Section |
| --- | --- | --- |
| VSS / guest processing | "Failed to create snapshot", "VSS provider failed", "Cannot freeze guest" | Step 2 |
| Network / repository | "Cannot connect to host", "Failed to write block", "I/O error" | Step 3 |
| Storage / capacity | "Not enough disk space", "Backup repository is full" | Step 4 |
| Retention / chain | "Cannot find required backup file", "Backup chain is corrupted" | Step 5 |
| Authentication / agent | "Logon failure", "Cannot establish connection" | Step 6 |
| Hypervisor | "VSphere snapshot failed", "Hyper-V change tracking failed" | Step 7 |
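
As a rough illustration of the triage in the table, the sketch below maps a captured Reason string to a category and step. The keyword fragments come straight from the table; the function name `classify_reason` and the "Unclassified" fallback are assumptions made for this example, and real Reason strings will often need more patterns than these.

```python
from typing import Tuple

# Keyword fragments copied from the table; matching is case-insensitive.
CATEGORIES = [
    ("VSS / guest processing", "Step 2",
     ("failed to create snapshot", "vss provider failed", "cannot freeze guest")),
    ("Network / repository", "Step 3",
     ("cannot connect to host", "failed to write block", "i/o error")),
    ("Storage / capacity", "Step 4",
     ("not enough disk space", "backup repository is full")),
    ("Retention / chain", "Step 5",
     ("cannot find required backup file", "backup chain is corrupted")),
    ("Authentication / agent", "Step 6",
     ("logon failure", "cannot establish connection")),
    ("Hypervisor", "Step 7",
     ("vsphere snapshot failed", "hyper-v change tracking failed")),
]


def classify_reason(reason: str) -> Tuple[str, str]:
    """Return (category, step) for a Reason string, in table order."""
    text = reason.lower()
    for category, step, fragments in CATEGORIES:
        if any(fragment in text for fragment in fragments):
            return category, step
    # Hypothetical fallback: anything unmatched needs a human to read the log.
    return "Unclassified", "read the full session log"


if __name__ == "__main__":
    print(classify_reason("Error: VSS provider failed on guest"))
```

Anything that lands in "Unclassified" is a prompt to read the session log in full, not a sign that the failure is harmless.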

Step 2: VSS / guest processing failures

The Volume Shadow Copy Service is responsible for taking application-consistent snapshots of running services (SQL, Exchange, AD). When VSS fails, the backup either falls back to crash-consistent (worse for databases) or fails outright.

Common errors

Diagnose

On the affected VM (or physical server):

vssadmin list writers

Every writer should show State: [1] Stable and Last error: No error. Any writer in Failed state, or with a non-zero last error, is the cause.
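
The check described above can be scripted. This is a minimal sketch that parses the text of vssadmin list writers and returns every writer that is not Stable with No error. It assumes English-locale output; the `SAMPLE` text is illustrative rather than captured from a real server, and `unhealthy_writers` is a name chosen for this example.

```python
import re
from typing import Dict, List

# Illustrative sample only; real output also lists writer IDs and more writers.
SAMPLE = """
Writer name: 'System Writer'
   State: [1] Stable
   Last error: No error

Writer name: 'SqlServerWriter'
   State: [8] Failed
   Last error: Non-retryable error
"""


def unhealthy_writers(vssadmin_output: str) -> List[Dict[str, str]]:
    """Return writers whose state is not '[1] Stable' or whose last error is set."""
    writers: List[Dict[str, str]] = []
    current = None
    for raw in vssadmin_output.splitlines():
        line = raw.strip()
        match = re.match(r"Writer name: '(.+)'", line)
        if match:
            current = {"name": match.group(1), "state": "", "last_error": ""}
            writers.append(current)
        elif current is not None and line.startswith("State:"):
            current["state"] = line.split(":", 1)[1].strip()
        elif current is not None and line.startswith("Last error:"):
            current["last_error"] = line.split(":", 1)[1].strip()
    return [w for w in writers
            if w["state"] != "[1] Stable" or w["last_error"] != "No error"]


if __name__ == "__main__":
    for writer in unhealthy_writers(SAMPLE):
        print(writer["name"], writer["state"], writer["last_error"])
```

On a live server the input could come from `subprocess.run(["vssadmin", "list", "writers"], capture_output=True, text=True).stdout`, run from an elevated session.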

# Restart the writers without rebooting the server:
# - For most application writers, restart the service that hosts them.
# - SQL writer: restart the "SQL Server VSS Writer" service.
# - Exchange writer: restart the "Microsoft Exchange Replication" service.
# - System / shadow copy writers: restart "Volume Shadow Copy" (VSS) and "COM+ System Application" (COMSysApp).

# Use the short service names rather than the display names:
Restart-Service -Name VSS, COMSysApp

For persistent VSS failures:

rem Re-register VSS components (run from an elevated Command Prompt):
rem Some of these DLLs (for example msxml.dll, msxml4.dll) may not exist on current
rem Windows builds; with /s, regsvr32 fails quietly for those and the script carries on.
cd /d %windir%\system32
net stop vss
net stop swprv
regsvr32 /s ole32.dll
regsvr32 /s oleaut32.dll
regsvr32 /s vss_ps.dll
vssvc /register
regsvr32 /s /i swprv.dll
regsvr32 /s /i eventcls.dll
regsvr32 /s es.dll
regsvr32 /s stdprov.dll
regsvr32 /s vssui.dll
regsvr32 /s msxml.dll
regsvr32 /s msxml3.dll
regsvr32 /s msxml4.dll
net start swprv
net start vss
vssadmin list writers

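Sometimes a reboot is the only way to clear stuck snapshot operations; schedule it rather than waiting for the next failure. Before rebooting, check vssadmin list shadows for stuck shadow copies and remove them with vssadmin delete shadows /all, but only when no backup is running. Note that /all deletes every shadow copy on the machine, including any used for Previous Versions.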

VMware-specific VSS

For VMware-hosted VMs, application-aware processing relies on VMware Tools' VSS provider. If that's missing or broken:

Hyper-V-specific VSS

Hyper-V uses the Hyper-V Volume Shadow Copy Requestor integration component on the guest. Check:

Step 3: Repository connectivity / I/O failures

Common errors

Diagnose

For SMB repositories:

# Test from the Veeam server / proxy:
Test-NetConnection -ComputerName <repo-host> -Port 445
net use \\<repo-host>\<share> /user:<account> *

The account Veeam uses to connect to the repository must have the rights — for SMB repositories on a Windows file server, the simplest reliable model is a dedicated veeam-repo service account with full control on the share and on the underlying NTFS folder. If that account password has rotated and Veeam is using the old one, the job fails; recreate the credential in Veeam under Credentials Manager.
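
Before chasing credentials, it helps to confirm the port is reachable at all. The sketch below is a plain TCP probe that can run from any proxy with Python installed, including Linux proxies where Test-NetConnection is not available. The function name and the 3-second timeout are assumptions for this example; a successful connect only shows that the port answers, not that the account has rights on the share.

```python
import socket


def tcp_port_open(host: str, port: int = 445, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


if __name__ == "__main__":
    # "repo-host" is a placeholder; substitute the repository's hostname.
    print(tcp_port_open("repo-host", 445))
```

If the probe returns True but the job still fails, the problem is more likely permissions or a stale stored password than the network path.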

For dedicated Linux hardened repositories (the recommended modern Veeam pattern):

# On the Linux repo:
df -h /mnt/veeam-repo                      # free space
ls -la /mnt/veeam-repo                     # ownership
sudo journalctl -u veeamtransport --since "1 hour ago"   # transport service log

Veeam Transport runs as the user veeamtransport (or whatever was specified at config). If the underlying disk has filled, gone read-only (filesystem hit a fault and remounted RO), or the filesystem has degraded, transport fails immediately.
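
The two conditions named above, a full disk and a read-only remount, can be checked together. This is a minimal sketch for a Linux repository host: it uses `shutil.disk_usage` for free space and the `ST_RDONLY` mount flag from `os.statvfs` for the read-only state. The 10% free-space threshold and the function name `repo_status` are assumptions for illustration, not Veeam guidance.

```python
import os
import shutil
from typing import Dict, Union


def repo_status(path: str, min_free_fraction: float = 0.10) -> Dict[str, Union[int, float, bool]]:
    """Report free space and read-only state for the filesystem holding path (Unix only)."""
    usage = shutil.disk_usage(path)
    # ST_RDONLY is set when the filesystem is mounted (or was remounted) read-only.
    read_only = bool(os.statvfs(path).f_flag & os.ST_RDONLY)
    free_fraction = usage.free / usage.total if usage.total else 0.0
    return {
        "free_bytes": usage.free,
        "free_fraction": free_fraction,
        "read_only": read_only,
        "low_space": free_fraction < min_free_fraction,  # assumed threshold
    }


if __name__ == "__main__":
    # /mnt/veeam-repo matches the example path used in the commands above.
    print(repo_status("/mnt/veeam-repo"))
```

A `read_only` value of True on a repository that should be writable usually means the kernel hit a filesystem or disk fault; check dmesg before remounting.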

Object-storage repositories

Capacity Tier (S3, Wasabi, Azure Blob, Backblaze B2) failures usually trace to:

Test the credentials independently with aws s3 ls (S3-compatible) or equivalent before assuming Veeam is broken.

Step 4: Repository capacity / chain length

Common errors

What's actually happening

A Veeam backup chain consists of a full backup file (.vbk) and a sequence of incrementals (.vib for forward incremental, .vrb for reverse). The repository must hold enough space for the whole chain plus the next incremental, plus headroom for Veeam's working space.

A common pattern: the customer set retention to 30 days, never sized the repository to support 30 days of data growth, and one busy quarter the repository fills.
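
The sizing rule above can be put into numbers. The sketch below is a simplification for a single forward-incremental chain with one full backup: the chain, plus one more incremental, plus a headroom fraction for working space. The 20% headroom default and the function name are assumptions for illustration; periodic fulls, compression changes and data growth all push the real figure higher.

```python
def required_repo_gb(full_gb: float, incr_gb: float, restore_points: int,
                     headroom_fraction: float = 0.2) -> float:
    """Estimate space for one chain: full + incrementals + next incremental + headroom."""
    if restore_points < 1:
        raise ValueError("restore_points must be at least 1")
    # One restore point is the full; the remainder are incrementals.
    chain_gb = full_gb + incr_gb * (restore_points - 1)
    # Room for the incremental that the next run will write.
    with_next_gb = chain_gb + incr_gb
    # Headroom for working space (assumed fraction, not a vendor figure).
    return with_next_gb * (1 + headroom_fraction)


if __name__ == "__main__":
    print(round(required_repo_gb(full_gb=2000, incr_gb=50, restore_points=30)))
```

With a 2,000 GB full, 50 GB incrementals and 30 restore points, the sketch gives about 4,200 GB: 3,450 GB for the chain, 50 GB for the next incremental, and 20% on top.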

# From Veeam PowerShell:
Get-VBRBackup -Name "<job name>" | Select-Object Name, FullSize, IncrementSize, Files

To free space without breaking chains:

Step 5: Backup chain corruption

The error you don't want: Cannot find required backup file or Backup chain is corrupted. The job knows it needs a specific .vbk or .vib file; the file is missing, damaged, or unreadable.
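
A quick filesystem inventory can show whether the files are physically present before you reach for repair options. The sketch below lists chain files by extension in one repository folder and flags zero-byte files. It does not read Veeam's metadata and cannot judge whether a chain is consistent; the folder layout and the function name `inventory_chain` are assumptions for this example.

```python
from pathlib import Path
from typing import Dict, List

# Extensions for full, forward-incremental and reverse-incremental files.
CHAIN_EXTENSIONS = (".vbk", ".vib", ".vrb")


def inventory_chain(folder: str) -> Dict[str, List[str]]:
    """List full and incremental backup files in folder and flag empty ones."""
    report: Dict[str, List[str]] = {"fulls": [], "increments": [], "empty_files": []}
    for path in sorted(Path(folder).iterdir()):
        suffix = path.suffix.lower()
        if not path.is_file() or suffix not in CHAIN_EXTENSIONS:
            continue
        if path.stat().st_size == 0:
            # A zero-byte backup file is an obvious sign of a failed write.
            report["empty_files"].append(path.name)
        if suffix == ".vbk":
            report["fulls"].append(path.name)
        else:
            report["increments"].append(path.name)
    return report


if __name__ == "__main__":
    # Placeholder path; point this at the job's folder in the repository.
    print(inventory_chain("/mnt/veeam-repo/<job folder>"))
```

An inventory with incrementals but no full, or with any empty files, supports the error message; a clean inventory means the damage is inside the files and needs Veeam's own health check to find.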

Causes

Recovery options

Option A — Health check / repair

In the Veeam console, right-click the chain → Backup files > Health Check. Veeam scans every block, identifies corrupt blocks, and (if forward-incremental) writes good blocks into the next backup run. This rebuilds the chain over the next few backup cycles without losing all history.

Option B — Active full

Force the job's next run to be a new full backup, starting a fresh chain. The old chain remains available for restore until retention removes it. This is the cleanest path forward if the corruption can't be repaired.

Option C — Restore from copy

If you have a Backup Copy Job to a secondary repository (or to capacity tier in the cloud), you can restore from that — even if the primary chain is destroyed. This is the value proposition of 3-2-1.

Option D — Rebuild the configuration

If the Veeam configuration database itself is the problem (vbm files exist on disk but Veeam doesn't see them), use Import Backup to rebuild the chain entry in the database from the on-disk files.

What you don't have

If the chain is corrupt and you have no copy: that range of historical backups is gone. The future chain (from the next active full forward) will be intact, but you have lost the ability to restore from any point during the corrupt period. This is precisely why 3-2-1 exists.

Step 6: Authentication / agent failures

Common errors

The Veeam server pushes a runtime agent to each protected machine over administrative shares (\\server\admin$) using an account with local admin rights. If the account password has changed, the local admin shares are disabled, or a firewall rule is blocking SMB, agent push fails.

Fix

Step 7: Hypervisor-side failures

VMware

Hyper-V

Verifying backups actually restore

A backup that hasn't been verified is hope, not insurance.

Manual: scheduled test restore

Once a quarter, restore one VM (or one critical file set) to an isolated network. Boot it. Log in. Confirm services start. Document the test in your ticketing system. If a restore fails, you've found a problem before the disaster does.

Automated: Veeam SureBackup

SureBackup runs scheduled test restores in a sandboxed virtual lab, boots the VMs, runs heartbeat tests, and reports pass/fail. Available in Veeam Enterprise and higher editions. The single most valuable feature in Veeam for businesses without the time for manual quarterly tests.

Automated: Veeam Backup Validator

Free utility (separate executable on the Veeam server) that checksums every backup file against its stored hash. Doesn't test restorability of the application, but catches silent file corruption before it becomes catastrophic.

"C:\Program Files\Veeam\Backup and Replication\Backup\Veeam.Backup.Validator.exe" /backup:"<job name>"

Run weekly via Task Scheduler. Alert on any failure.

What NOT to do

Prevention

When to call us

Call us if:

Engineerdirect.co.uk has senior engineers across Veeam Backup & Replication, Datto BCDR, Microsoft Azure Backup, and the underlying VMware / Hyper-V / NAS / object-storage layers.

Call 01923 372471 — senior engineer answers directly. Same-day on-site across London and the South East.

FAQ

My job has been showing "Success with warnings" for weeks. Is that OK?
No. A warning indicates the backup ran but in a degraded state, most often falling back from application-aware to crash-consistent. For SQL/Exchange/AD this means the backup is taken but not in a properly recoverable state. Investigate the warning, fix the root cause, and get back to a clean Success.

Should I delete old backup files manually if the repo is full?
No. Veeam tracks files in its database, and manual deletion creates orphan entries. Reduce retention in the job, run an Active Full to start a new chain, or move data to a larger repository.

Can I run backups during business hours?
Yes for most workloads. VSS snapshots are quick (typically under a minute) and the backup itself runs against the snapshot, not the live data. SQL/Exchange-heavy environments benefit from out-of-hours runs to avoid log truncation timing issues. Check vssadmin list writers during a backup window to spot any writer that's struggling.

My backup repository is on the same domain as my production environment. Is that a problem?
For ransomware resilience, yes: a domain admin compromise reaches your backups. The modern Veeam pattern is production data on the domain and backups on a hardened Linux repository in a workgroup, with credentials known only to Veeam and immutability enabled. This survives a domain-wide compromise that an in-domain repository would not.

Do I need both Veeam Replication and Veeam Backup?
Replication gives near-real-time copies of running VMs to a secondary host (low RPO/RTO for infrastructure failure). Backup gives point-in-time copies with retention (recovery from corruption, deletion, ransomware). Most businesses need both: replication for "the host died" and backup for "someone deleted important files three weeks ago".

How big should my Veeam repository be?
Rule of thumb: roughly the size of your protected production data, multiplied by 1.5 for a year of forward-incremental retention with monthly active fulls and 4 weekly retention points. So 5 TB of production = roughly 7-8 TB of repository. Tune by watching real growth over a few months.


If backups are failing with a restore looming, our emergency backup & server recovery can step in before it becomes a data-loss event.

Part of a series of disaster-recovery references. If your backups are broken and a restore is approaching: 01923 372471.
