Diagnostic playbooks for IT engineers, written like an engineer briefing a colleague at 2am.
Every entry is sourced from real incidents, grounded in verbatim source spans, and reviewed by a senior engineer before publication. No filler, no AI fluff, no untraceable claims.
- Verbatim-grounded — every command, registry key, and error code traces back to its source.
- Reviewed by humans — drafts only land here after a senior engineer signs off.
- Dated and versioned — every entry carries a captured-at date and a review-by date.
Domains
Active Directory
Domain controller recovery, FSMO seizure, Kerberos and secure-channel failures, replication faults, GPO, SYSVOL/DFSR — the backbone of Windows identity infrast…
Exchange & Mail Flow
Exchange Online and on-premises mail flow failures, mailbox database recovery, federation trust breaks, hybrid mail routing, and outbound delivery disruptions.
Microsoft 365 & Collaboration
Entra ID / Conditional Access lockouts, Azure AD Connect (now Microsoft Entra Connect) sync failures, Teams connectivity, Intune enrollment, and Microsoft 365 Backup restore operations.
Virtualisation & Storage
Hyper-V host crashes and VM recovery, VMware ESXi PSOD and host disconnection, RAID array degradation, and storage subsystem performance collapse.
Windows Server
Windows Server boot failures, OS performance degradation, NTFS permissions, RDS CAL exhaustion, licensing and activation, cumulative update failures, in-place…
Network Infrastructure
Switching, VLAN misconfiguration, STP storms, DNS failures, Wi-Fi client drops, router/gateway loss, WAN circuit outages, and firewall policy issues.
Remote Access & VPN
Site-to-site and remote-access VPN failures: IPsec SA negotiation, split tunnelling, routing conflicts, overlapping subnets, OpenVPN throughput, and Cisco ASA…
Cyber Incident Response
Ransomware containment, breach triage, BEC, credential compromise, forensic preservation, and nation-state TTPs — the first hours decide outcomes and notificat…
Backup & Recovery
Veeam / Datto / BCDR appliance failures, restore verification, bare-metal recovery, item-level restores, and backup job certificate mismatches.
Endpoint & Device Management
Intune and MECM policy failures, Windows Update and WSUS patch deployment, Autopilot provisioning, BitLocker/encryption issues, and device compliance remediati…
Cloud & Hybrid Infrastructure
Azure IaaS VM failures, Site Recovery health, hybrid VPN Gateway and ExpressRoute outages, Azure Backup, AWS EC2 connectivity, and Kubernetes control-plane cer…
PKI & Certificate Management
Expired TLS/SSL certificates, ADCS enrollment failures, certificate chain trust, OCSP/CRL unreachability, NDES/SCEP for mobile, code-signing blocks, and Let's…
Severity legend
- P1: Business stopped (total outage, authentication down, ransomware in progress)
- P2: Major impairment (large group blocked, mail flow broken, VPN down for remote staff)
- P3: Functional but degraded (a single subsystem affected, workarounds available)
- P4: Hardening / planned (root-cause follow-up, audit, prevention work)