HTTP 503 Service Unavailable — Web Server Process Down or Upstream Backend Unreachable (IIS / Nginx / Apache / Load Balancer)
HTTP 503 Service Unavailable is returned when the web server or reverse proxy cannot reach a healthy backend to fulfil a request. The failure originates at the web server process, application pool, or load balancer layer — not the client. Common causes include a crashed or stopped application pool (IIS), a failed Linux service, resource exhaustion (CPU, memory, thread pool, file descriptors), a bad deployment causing startup failure, or an unavailable downstream dependency (database, cache, message queue). Resolution follows a layered approach: confirm the serving process state, review logs for crash or startup errors, test backend dependency reachability, address resource exhaustion, then restart or roll back as appropriate.
Indicators
- HTTP 503 status code returned to all clients attempting to reach the service endpoint
- Browser or API client displays 'Service Unavailable' or a custom 503 error page
- Monitoring alert fires on HTTP 5xx error rate spike across the affected endpoint
- Application pool found in Stopped state in IIS Manager or via PowerShell Get-WebConfiguration
- Load balancer health checks report all or some backend nodes as unhealthy
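- Windows Event Log shows crash or startup failure entries: sources such as 'ASP.NET', '.NET Runtime', or 'Application Error' in the Application log, and 'WAS' or 'W3SVC' in the System log
- Windows Event ID 5002 (source WAS, System log) logged, indicating that IIS has automatically disabled the application pool after repeated worker process failures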
Likely causes
- IIS application pool worker process has crashed repeatedly (for example from an unhandled exception) and reached the rapid-fail protection threshold (default: 5 failures within 5 minutes), so IIS has disabled the pool and left it in Stopped state
- Recent deployment introduced a startup exception or misconfigured connection string causing the application to fail immediately on every request
- Backend dependency (SQL Server, Redis, external API) is unavailable, causing application health checks to fail and the process to reject new connections
- Resource exhaustion (CPU saturation, memory pressure, file descriptor limit, IIS request queue length exceeded, .NET thread pool starvation) preventing the service from accepting new requests
- Linux service process has exited or been OOM-killed and systemd has not restarted it (or restart limit exceeded)
- Load balancer has marked all backend pool members as unhealthy due to failed health check probes
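For the Linux causes above, systemd records why a unit last stopped, which helps separate an OOM kill from an exceeded restart limit. The following is a minimal sketch rather than part of the original runbook: the function name `explain_unit_failure` is hypothetical, and it assumes a systemd version that supports `systemctl show --value` and reports `oom-kill` and `start-limit-hit` in the unit's `Result` property.

```shell
# Explain why a systemd unit is not serving, based on `systemctl show`.
# Usage: explain_unit_failure <unit>     (e.g. the unit behind <service-name>)
explain_unit_failure() {
  unit="$1"
  state=$(systemctl show "$unit" -p ActiveState --value)
  result=$(systemctl show "$unit" -p Result --value)
  case "$state/$result" in
    active/*)
      echo "$unit is running; look at upstreams or resource limits" ;;
    */oom-kill)
      echo "$unit was killed by the OOM killer" ;;
    */start-limit-hit)
      echo "$unit hit the systemd restart limit" ;;
    */exit-code|*/signal|*/core-dump)
      echo "$unit exited abnormally ($result); check journalctl" ;;
    *)
      echo "$unit is $state (result: $result)" ;;
  esac
}
```

Calling `explain_unit_failure <service-name>` prints one line; the journal remains the place to find the underlying error.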
Diagnostic steps
- Check the status of the web server process and application pool. On IIS (PowerShell): `Get-WebConfiguration system.applicationHost/applicationPools/add | Select-Object name, state`, or `Get-WebAppPoolState -Name '<AppPoolName>'` for a single pool, or open IIS Manager → Application Pools and look for Stopped state. On Linux: `systemctl status <service-name>` and `ps aux | grep <process-name>`. For containers: `docker ps -a` or `kubectl get pods -n <namespace>`. This establishes immediately whether the serving process is running at all or has crashed or stopped, which is the most common cause of 503.
- Review web server error logs for 503 entries and upstream/backend error messages from around the time of onset. IIS: `%SystemDrive%\inetpub\logs\LogFiles\W3SVC<SiteID>\`, filtering for sc-status=503; when the application pool itself is stopped, HTTP.sys answers with 503 before the request reaches the site, so also check `%SystemRoot%\System32\LogFiles\HTTPERR\` for reason phrases such as AppOffline or QueueFull. Nginx: `/var/log/nginx/error.log`, filtering for 'connect() failed' or 'upstream'. Apache: `/var/log/apache2/error.log` or `/var/log/httpd/error_log`. Use: `Get-Content <logpath> | Select-String ' 503 '` (Windows) or `grep ' 503 ' /var/log/nginx/access.log` (Linux); the surrounding spaces avoid matching 503 inside byte counts or timings. This determines whether the 503 is generated locally (process down) or proxied from a failing upstream backend, and provides exact timestamps and error context for narrowing down the root cause.
- Check application-level and system event logs for startup exceptions, unhandled errors, or crash events coinciding with 503 onset. On Windows: open Event Viewer → Windows Logs → Application and filter for sources 'ASP.NET', '.NET Runtime', or 'Application Error'; then check Windows Logs → System for sources 'WAS' and 'W3SVC', looking specifically for Event ID 5002 (application pool automatically disabled after repeated failures). On Linux: `journalctl -u <service-name> --since '30 minutes ago'` and application-specific log files. This pinpoints whether an application crash, bad deployment artifact, missing dependency, or configuration error is preventing successful process startup.
- Test direct connectivity to backend dependencies from the application server. From Windows (PowerShell), for SQL Server: `Test-NetConnection -ComputerName <db-host> -Port 1433`; for Redis: `Test-NetConnection -ComputerName <cache-host> -Port 6379` (Memcached listens on 11211 by default). From Linux, `nc -zv <host> <port>` performs the equivalent TCP check. For HTTP upstream: `curl -o /dev/null -s -w "%{http_code} %{time_total}\n" http://<upstream-host>:<port>/health`, which prints the status code and the total response time in seconds. Confirm response time is within acceptable bounds. This isolates whether the 503 root cause is on this host or in a downstream dependency, and so whether the fix is here or elsewhere in the stack.
- Check system resource utilisation on the application host. Windows: `Get-Process | Sort-Object CPU -Descending | Select-Object -First 10 Name, CPU, WorkingSet` (CPU here is cumulative processor seconds, not current load) and Task Manager Performance tab. Linux: `top` or `htop`, `free -m`, `ulimit -n` (open file descriptor limit of the current shell; use `cat /proc/<pid>/limits` for the running service), `ss -s` (socket summary). For IIS specifically, check queueing in Performance Monitor: 'HTTP Service Request Queues\CurrentQueueSize' for the HTTP.sys queue of each application pool and 'ASP.NET Applications\Requests In Application Queue' for the ASP.NET queue, alongside 'Web Service\Current Connections' for overall load. This determines whether CPU saturation, memory exhaustion, file descriptor limits, or request queue overflow is causing the service to refuse new connections even though the process is running.
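The log review and dependency checks above can be partly scripted on a Linux host. The sketch below is illustrative only: the function names are hypothetical, and the log parser assumes the default Nginx/Apache "combined" access log layout, where the fourth whitespace-separated field is the timestamp and the ninth is the status code. A custom `log_format` would need different field numbers.

```shell
# Count 503 responses per minute in an access log, in order of first appearance.
# Usage: count_503_per_minute <access-log-path>
count_503_per_minute() {
  awk '$9 == 503 {
         minute = substr($4, 2, 17)            # dd/Mon/yyyy:HH:MM
         if (!(minute in count)) order[++n] = minute
         count[minute]++
       }
       END { for (i = 1; i <= n; i++) print order[i], count[order[i]] }' "$1"
}

# Print the HTTP status returned by an upstream health endpoint.
# curl prints 000 when no HTTP response arrived (refused, timeout, DNS failure).
# Usage: upstream_status <url>
upstream_status() {
  curl -o /dev/null -s -m 5 -w '%{http_code}' "$1"
}
```

For example, `count_503_per_minute /var/log/nginx/access.log` shows when the errors began and whether they are constant or bursty, and `upstream_status http://<upstream-host>:<port>/health` helps separate a local failure from an unreachable backend.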
Resolution path
- If the IIS application pool is in Stopped state: run `Start-WebAppPool -Name '<AppPoolName>'` or right-click → Start in IIS Manager. Monitor for immediate recurrence — if it stops again within seconds, rapid-fail protection has re-triggered; check Event ID 5002 and application logs for the underlying crash before attempting further restarts.
- If a Linux service has stopped: run `systemctl restart <service-name>`. If systemd reports the restart limit has been exceeded, run `systemctl reset-failed <service-name>` first, then `systemctl restart <service-name>`. Monitor with `journalctl -u <service-name> -f`.
- If a recent deployment caused the failure: roll back to the last known-good artifact — previous container image tag, application package version, or code revision — redeploy, then verify the 503 clears with `curl -o /dev/null -s -w "%{http_code}" https://<endpoint>`. Ensure configuration files and connection strings are also rolled back, not just binaries.
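- If resource exhaustion is confirmed: scale out additional instances, or raise the limit that is actually being hit, then recycle the affected process. For IIS, the request queue is governed by the application pool's Queue Length setting (default 1000; Advanced Settings in IIS Manager, or `%windir%\system32\inetsrv\appcmd.exe set apppool "<AppPoolName>" /queueLength:<value>`), while the concurrent request limit is raised with `%windir%\system32\inetsrv\appcmd.exe set config -section:system.webServer/serverRuntime /appConcurrentRequestLimit:<value>`. For .NET thread pool starvation, increase the thread pool size.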
- If a backend dependency (database, cache) is unavailable: restore that dependency first using its own runbook, then recycle the application pool or restart the service so it can reconnect cleanly rather than inheriting a failed connection state.
- If the 503 is returned by a load balancer due to failed health checks: investigate each backend node individually using the steps above, bring nodes back into service one at a time, and confirm health check probes return 200 before re-enabling each node in the pool.
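The Linux restart and the post-fix verification above can be combined into one helper. This is a sketch under stated assumptions, not a prescribed procedure: the function names, the default of 10 attempts, and the 3-second delay are hypothetical choices, and a response of 000 (no HTTP response at all) is treated the same as 503.

```shell
# Poll an endpoint until it stops returning 503 or the attempts run out.
# Usage: wait_until_not_503 <url> [attempts] [delay-seconds]
wait_until_not_503() {
  url="$1"; attempts="${2:-10}"; delay="${3:-3}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    code=$(curl -o /dev/null -s -m 5 -w '%{http_code}' "$url")
    if [ "$code" != "503" ] && [ "$code" != "000" ]; then
      echo "attempt $i: HTTP $code"
      return 0
    fi
    echo "attempt $i: still failing (HTTP $code)"
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Clear any tripped restart limit, restart the unit, then verify from outside.
# Usage: restart_and_verify <service-name> <url>
restart_and_verify() {
  systemctl reset-failed "$1"
  systemctl restart "$1" && wait_until_not_503 "$2"
}
```

A non-zero exit from `restart_and_verify <service-name> https://<endpoint>` means the service restarted but the endpoint is still failing, which points back to the application logs or a dependency rather than the process state.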
Prevention
- Tune IIS rapid-fail protection thresholds (failure count and interval) to suit the application, and pair them with automated recovery so that a pool disabled by rapid-fail protection does not remain in Stopped state until someone notices; alert on Event ID 5002 so on-call engineers are notified before users report widespread 503s.
- Configure load balancer health checks with appropriate failure thresholds and draining periods so that a single failing backend node is removed from the pool before clients experience 503s, enabling zero-downtime single-node failure handling.
- Implement pre-deployment smoke tests and staged rollout (canary or blue/green deployment) to catch application startup failures in a limited traffic slice before they affect 100% of production traffic.
- On Linux, configure systemd service units with `Restart=on-failure` and `RestartSec=5` to ensure the process is automatically restarted after a crash without manual intervention.
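The systemd settings above can be applied as a drop-in file rather than by editing the packaged unit. The sketch below is one possible way to do it: the function name and the file name `restart.conf` are hypothetical, and the `StartLimitIntervalSec=60` / `StartLimitBurst=5` values are assumptions added here to bound a crash loop; only `Restart=on-failure` and `RestartSec=5` come from the guidance above.

```shell
# Write a systemd drop-in that restarts a unit after a crash.
# Usage: write_restart_dropin <unit> [base-dir]
#   base-dir defaults to /etc/systemd/system; pass another path to dry-run.
write_restart_dropin() {
  unit="$1"; base="${2:-/etc/systemd/system}"
  dir="$base/$unit.d"
  mkdir -p "$dir"
  cat > "$dir/restart.conf" <<'EOF'
[Unit]
# Assumed values: give up after 5 start attempts within 60 seconds.
StartLimitIntervalSec=60
StartLimitBurst=5

[Service]
Restart=on-failure
RestartSec=5
EOF
  echo "$dir/restart.conf"
}
```

After writing the file for a real unit, `systemctl daemon-reload` is needed before the new settings take effect. Once the start limit is reached the unit stays failed, which is the point at which `systemctl reset-failed <service-name>` becomes necessary.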
Tools
- IIS Manager — inspect and restart application pools and sites on Windows/IIS
- PowerShell Get-WebConfiguration / Start-WebAppPool — manage IIS application pools from CLI
- systemctl — start, stop, restart, and inspect Linux service units
- curl — test HTTP response codes and latency from command line
- Windows Event Viewer — review Application and System logs for process crash events (Event ID 5002)
- top / htop — Linux real-time resource utilisation
- Task Manager / Get-Process — Windows process and resource monitoring
- Test-NetConnection — PowerShell TCP connectivity test to backend dependencies
- journalctl — Linux systemd service log inspection