The Triage Manual: Technical Guides for IT Emergencies
P1 · Windows Server

Resource Exhaustion: Disk Full and OOM Kills Causing System/Application Failures

Resource exhaustion occurs when a host runs out of disk space or memory (physical or virtual). When memory runs out, the Linux kernel invokes the Out-of-Memory (OOM) killer to terminate processes; when a disk fills, application writes fail with ENOSPC errors. Either condition can lead to service crashes, database commit failures, log truncation, and cascading failures. Remediation requires identifying the exhausted resource immediately, recovering space or memory as an emergency measure, and then addressing root causes such as unbounded log growth or memory leaks.

Indicators

Likely causes

Diagnostic steps

  1. Check disk usage across all mounted filesystems: `df -h`
    Pinpoints the exhausted volume so remediation is targeted correctly
  2. Identify the largest directories on the full partition (the command below scans `/`; substitute the affected mount point as needed): `du -sh /* 2>/dev/null | sort -rh | head -20`
    Reveals whether logs, databases, core dumps, or another artifact is the culprit
  3. Check kernel and system logs for OOM kill events: `dmesg -T | grep -i 'oom\|killed process\|out of memory'`
    Confirms whether OOM kills occurred, identifies the killed process, and timestamps the event for correlation
  4. Check current memory usage and identify top memory consumers: `free -h && ps aux --sort=-%mem | head -20`
    Determines whether memory pressure is still present and which process is consuming the most RAM
  5. On Linux, check journald for OOM events: `journalctl -k --since '1 hour ago' | grep -i 'oom\|memory'`; on Windows, check System Event Log for source 'Microsoft-Windows-Resource-Exhaustion-Detector'
    Provides additional context on frequency and pattern of OOM events
  6. Identify open but deleted files holding disk space (Linux only): `lsof +L1 | grep deleted`
    On Linux, a file deleted while a process still holds it open does not free its disk space until the process closes the file descriptor or exits. This is a common cause of a persistent 'disk full' condition after an apparent cleanup
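
The disk and OOM checks in steps 1 and 3 can be wrapped in two small filters so that their output is easier to scan during an incident. This is a minimal sketch, not part of the original guide: the function names `full_filesystems` and `oom_events` and the 90% default threshold are illustrative assumptions. It assumes a POSIX shell with `awk` and `grep -E` available, and it will misreport mount points whose paths contain spaces.

```shell
# Hypothetical triage helpers (names and threshold are assumptions).

# Read `df -P` output on stdin and print "mountpoint use%" for every
# filesystem at or above the given usage threshold (default 90).
full_filesystems() {
    awk -v t="${1:-90}" 'NR > 1 {
        use = $5                 # Capacity column, e.g. "95%"
        sub(/%$/, "", use)       # strip the trailing percent sign
        if (use + 0 >= t + 0) print $6, use "%"
    }'
}

# Read kernel log text on stdin and print only lines that look like
# OOM kill events. `|| true` keeps a "no match" result from being
# treated as a failure in scripts running under `set -e`.
oom_events() {
    grep -iE 'out of memory|oom-kill|killed process' || true
}
```

On a live host the filters would be fed from the commands already listed above, for example `df -P | full_filesystems 90` and `dmesg -T | oom_events`. Keeping the parsing separate from the data source also lets the same filters run against saved `df` or `dmesg` output collected from another machine.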

Resolution path

Prevention

Tools

disk-full, oom-killer, resource-exhaustion, memory, storage, linux, windows-server, incident-response, P1, log-management, containers, database