P2 · Network Infrastructure

HTTP 502 Bad Gateway — Upstream Server Connectivity Failure (Nginx / HAProxy / ALB / IIS ARR)

An HTTP 502 Bad Gateway error is returned when a reverse proxy, load balancer, or API gateway receives an invalid response, or no response at all, from an upstream backend server. Root causes include a stopped or crashed backend process, a network or firewall rule blocking gateway-to-backend traffic, a misconfigured upstream address or port in the proxy configuration, and an overloaded backend that drops connections before sending a valid HTTP response. Resolution requires identifying whether the upstream is down, unreachable, or misconfigured, then restoring the backend service or correcting the proxy configuration. A 502 is distinct from HTTP 503 (the responding server is itself unable to handle the request) and HTTP 504 (the upstream did not answer within the gateway's timeout), and it demands upstream-side investigation, not client-side remediation.

Indicators

HTTP 502 status code returned to clients by the gateway or proxy layer
Upstream server not responding or returning malformed HTTP responses
Gateway/proxy error logs showing 'connect() failed', 'upstream prematurely closed connection', or 'no live upstreams'
Clients unable to reach the application despite the proxy/gateway itself being reachable and returning a response

Likely causes

Upstream/backend server process is crashed, stopped, or unresponsive (no listener on the expected port)
Backend application returning malformed or incomplete HTTP responses (missing status line, premature connection close) that the proxy cannot relay
Network connectivity failure between the proxy/gateway and the upstream backend — firewall rules or routing issues blocking the backend port
Backend server overloaded and dropping connections before sending a response (worker/thread pool, file descriptor, or other resource exhaustion that leaves it unable to accept new connections)
Misconfigured upstream address or port in the proxy/gateway configuration (e.g. wrong hostname, wrong port after a migration or rename)

Diagnostic steps

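Check the gateway/proxy error log around the time of the 502 to identify which upstream host and port is failing and the exact error message. For Nginx, the 502 status normally appears in the access log, while the error log records the cause without the status code, so search the error log for upstream messages rather than for "502": `grep upstream /var/log/nginx/error.log | tail -n 100`. Look for messages such as 'connect() failed', 'upstream prematurely closed connection', or 'no live upstreams while connecting to upstream'.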

Pinpoints which backend target the proxy is blaming and the nature of the failure (connection refused, connection timeout, bad response), narrowing the investigation before touching the backend.
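As a rough illustration of the log check above, the following sketch tallies the three message types named in this step. The function name `summarize_upstream_errors` and the output labels are made up for this example, and the patterns assume the message wording quoted above; adjust them if your Nginx version phrases the errors differently.

```shell
# Tally upstream-related failures in Nginx error-log lines read from stdin.
# The labels printed here are illustrative, not Nginx terminology.
summarize_upstream_errors() {
  awk '
    /connect\(\) failed/                     { connect_failed++ }
    /upstream prematurely closed connection/ { closed_early++ }
    /no live upstreams/                      { no_live++ }
    END {
      # Uninitialised awk counters print as 0 with %d.
      printf "connect_failed=%d\n", connect_failed
      printf "closed_early=%d\n", closed_early
      printf "no_live_upstreams=%d\n", no_live
    }
  '
}
```

A possible invocation is `tail -n 1000 /var/log/nginx/error.log | summarize_upstream_errors`. A high `connect_failed` count suggests starting with the reachability and listener checks, while `closed_early` suggests the backend accepted the connection and then dropped it.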
From the gateway server, test connectivity to the upstream backend on the expected port: `nc -zv <backend-host> <port>` or `telnet <backend-host> <port>` for raw TCP, then `curl -v http://<backend-host>:<port>/healthcheck` for an HTTP-level check.

Determines whether the gateway can reach the backend at the network and TCP level, ruling in or out firewall/routing issues independently of the application layer.
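Where `nc` or `telnet` is not installed on the gateway, a probe along the following lines may serve instead. It is a minimal sketch that assumes bash (for the `/dev/tcp` pseudo-device) and the coreutils `timeout` command; the function name `tcp_probe` and its three result labels are invented for this example.

```shell
# Probe a TCP port from the gateway and report how the attempt ended.
# Usage: tcp_probe <host> <port> [timeout-seconds]
tcp_probe() {
  local host="$1" port="$2" wait="${3:-3}" rc=0
  # Open (and immediately drop) a TCP connection via bash's /dev/tcp.
  timeout "$wait" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null || rc=$?
  case "$rc" in
    0)   echo "open" ;;
    124) echo "timeout" ;;           # `timeout` killed the attempt: no answer at all
    *)   echo "refused_or_error" ;;  # actively refused, or the host name did not resolve
  esac
}
```

As a rule of thumb, `timeout` tends to point at a firewall silently dropping packets or a routing problem, while `refused_or_error` more often means the host is reachable but nothing is listening on that port. Treat both readings as hints, not proof.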
On the upstream backend server, verify that the application process is running and listening: Linux: `systemctl status <service-name>` and `ss -tlnp | grep <port>`. Windows: `Get-Service -Name <ServiceName>` and `netstat -ano | findstr <port>`. Review application logs for crashes, OOM events, or startup errors.

Confirms whether the backend service is running and bound to the expected port, or has crashed/stopped — the most common cause of 502 errors.
Review the proxy/gateway configuration to verify the upstream address, port, and protocol are correctly specified and match the actual backend listener. For Nginx: `cat /etc/nginx/nginx.conf` (or the relevant server block), or `nginx -T` to print the full effective configuration including included files. For HAProxy: `cat /etc/haproxy/haproxy.cfg`. Validate config syntax before reloading: `nginx -t` or `haproxy -c -f /etc/haproxy/haproxy.cfg`.

Rules out configuration drift or deployment errors as the cause of the 502, especially after a recent infrastructure change, server rename, or migration.
Check backend server resource utilisation for signs of exhaustion: `top` or `htop` (CPU/memory), `vmstat 1 5` (memory pressure, swap), `netstat -an | grep <port> | wc -l` or `ss -s` (connection counts), `cat /proc/<pid>/limits` and `ls /proc/<pid>/fd | wc -l` (file descriptor limit and current usage for the backend process; note that `ulimit -n` reports only the current shell's limit, not the running process's). On Windows: Task Manager or `Get-Process | Sort-Object CPU -Descending | Select-Object -First 10`.

Identifies if the backend process is alive but overwhelmed — saturated CPU, exhausted worker threads, or file descriptor limits — causing connections to be dropped before a valid HTTP response is formed.
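The file descriptor part of the check above can be wrapped in a small helper. This is a sketch for Linux only, since it reads `/proc`; the function name `fd_usage` and the output format are illustrative. Inspecting a process owned by another user generally requires root.

```shell
# Print the number of open file descriptors and the soft limit for a PID.
# Usage: fd_usage <pid>
fd_usage() {
  local pid="$1" open limit
  # Each entry under /proc/<pid>/fd is one open descriptor.
  open=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l)
  # Line format: "Max open files  <soft>  <hard>  files"; field 4 is the soft limit.
  limit=$(awk '/^Max open files/ { print $4 }' "/proc/$pid/limits")
  echo "pid=$pid open_fds=$open soft_limit=$limit"
}
```

If `open_fds` sits close to `soft_limit`, the process is likely to start refusing new connections, which matches the overload pattern described in this step.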

Resolution path

If the backend process is stopped or crashed: restart the upstream service (`systemctl restart <service-name>` on Linux, or restart via Services MMC / `Restart-Service <ServiceName>` on Windows) and confirm it is listening on the expected port (`ss -tlnp | grep <port>` or `netstat -ano | findstr <port>`) before expecting 502s to clear.
If a firewall or security group rule is blocking gateway-to-backend traffic: update the firewall ACL or cloud security group to permit TCP traffic on the required port between the proxy host and the backend hosts. Verify with `nc -zv <backend-host> <port>` from the gateway after the rule change.
If the proxy upstream configuration specifies an incorrect address or port: update the upstream `server` directive (Nginx) or `server` line in the backend block (HAProxy) to match the actual backend listener, validate the configuration (`nginx -t` or `haproxy -c -f /etc/haproxy/haproxy.cfg`), then reload (`systemctl reload nginx` or `systemctl reload haproxy`) — do not restart, which drops active connections.
If the backend is overloaded and dropping connections: scale horizontally by adding backend instances to the upstream pool, increase the worker/thread pool limits in the backend application configuration, and/or temporarily apply rate limiting at the gateway layer to reduce upstream load while scaling completes.
If the backend is returning malformed HTTP responses (missing status line, premature close): review the application error logs for the root cause (dependency failure, uncaught exception, response buffer overflow), fix the application-layer bug, redeploy, and confirm the backend returns valid HTTP 200 responses before removing any rate limits or temporary mitigations.
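The validate-then-reload sequence from the configuration fix above can be sketched as a guard function so that a reload never runs against a configuration that fails validation. The function name `safe_reload` and the `VALIDATE_CMD` / `RELOAD_CMD` override variables are hypothetical conveniences for dry runs; the default commands are the ones given in this section.

```shell
# Reload a proxy only if its configuration validates.
# Usage: safe_reload nginx | safe_reload haproxy
# VALIDATE_CMD and RELOAD_CMD are hypothetical overrides, useful for dry runs.
safe_reload() {
  local proxy="$1" validate reload
  case "$proxy" in
    nginx)
      validate="nginx -t"
      reload="systemctl reload nginx" ;;
    haproxy)
      validate="haproxy -c -f /etc/haproxy/haproxy.cfg"
      reload="systemctl reload haproxy" ;;
    *)
      echo "unknown proxy: $proxy" >&2
      return 2 ;;
  esac
  validate="${VALIDATE_CMD:-$validate}"
  reload="${RELOAD_CMD:-$reload}"
  # Intentionally unquoted so the command string splits into words.
  if $validate; then
    $reload
  else
    echo "validation failed; not reloading $proxy" >&2
    return 1
  fi
}
```

For a dry run that touches nothing, something like `VALIDATE_CMD=true RELOAD_CMD="echo would reload" safe_reload nginx` exercises the control flow without calling `nginx` or `systemctl`.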

Prevention

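Implement backend health checks in the proxy/load balancer configuration so that unhealthy upstream instances are automatically removed from rotation before clients experience 502 errors (Nginx: passive checks via `max_fails` and `fail_timeout` on the upstream `server` directive, or the active `health_check` directive, which requires NGINX Plus and is set in a `location` block; HAProxy: `option httpchk` together with `check` on each `server` line; ALB: configure target group health checks).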
Configure application-level and infrastructure-level monitoring and alerting on backend service availability, HTTP error rates, and resource utilisation, enabling proactive response before 502s become widespread.
Use connection pooling and graceful shutdown procedures in the backend application to prevent abrupt connection drops that produce 502s during rolling deployments or planned restarts.
Maintain infrastructure-as-code (e.g. Terraform, Ansible) for proxy configuration to prevent manual misconfiguration, enforce configuration review before applying upstream address changes, and enable rapid rollback.

Tools

curl (test HTTP connectivity and response codes from gateway to backend — `curl -v http://<backend-host>:<port>/healthcheck`)
telnet / nc (test raw TCP connectivity to backend port — `nc -zv <host> <port>`)
systemctl / Get-Service (check and control backend service status on Linux and Windows)
nginx -t (validate Nginx configuration syntax before reload)
haproxy -c -f /etc/haproxy/haproxy.cfg (validate HAProxy configuration syntax before reload)
netstat / ss (inspect active connections and listening ports on backend)
top / vmstat / htop (assess backend resource exhaustion — CPU, memory, swap)
Proxy/gateway error log (primary source of 502 cause detail — Nginx: /var/log/nginx/error.log)
