The Triage Manual: Technical Guides for IT Emergencies
P2 · Network Infrastructure

HTTP 504 Gateway Timeout — Reverse Proxy or API Gateway Cannot Reach Upstream Server

An HTTP 504 Gateway Timeout is returned when a reverse proxy, API gateway, or load balancer (e.g., Nginx, HAProxy, AWS ALB, IIS ARR) fails to receive a timely response from the upstream application server or backend service. The upstream may be down, overloaded, firewalled off, or suffering a network path failure between the proxy and the backend. Resolution requires isolating whether the failure is at the application layer (crashed service, resource exhaustion), the network layer (routing loss, firewall block), or the configuration layer (proxy timeout set too short for legitimate upstream response times), then applying the corresponding fix — service restart, capacity scaling, timeout tuning, or firewall rule correction.
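The isolation logic described above can be sketched as a small shell helper. This is an illustrative sketch, not a definitive tool: the function name `classify_probe` and its output strings are hypothetical, and it assumes the probe is a `curl` request sent from the proxy host to the upstream. The only external facts it relies on are curl's documented exit codes (6 = could not resolve host, 7 = failed to connect, 28 = operation timed out).

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a curl probe result to the layer to investigate first.
# Arguments: curl exit code, total request time (seconds), proxy timeout (seconds).
classify_probe() {
  local curl_exit=$1 elapsed=$2 proxy_timeout=$3
  case "$curl_exit" in
    6)  echo "network: upstream hostname did not resolve" ;;
    7)  echo "network-or-application: connection failed (service down or port blocked)" ;;
    28) echo "network-or-application: probe timed out (upstream hung or packets dropped)" ;;
    0)
      # Floating-point comparison is delegated to awk; bash only compares integers.
      if awk -v a="$elapsed" -v b="$proxy_timeout" 'BEGIN { exit !(a + 0 >= b + 0) }'; then
        echo "configuration: upstream responds, but slower than the proxy timeout"
      else
        echo "ok: upstream responded within the proxy timeout"
      fi
      ;;
    *)  echo "unknown: curl exit code $curl_exit" ;;
  esac
}

# Example use from the proxy host (the /health path and the 60 s timeout are assumptions):
#   elapsed=$(curl -s -o /dev/null -w '%{time_total}' --max-time 120 "http://<upstream-host>:<port>/health")
#   classify_probe "$?" "$elapsed" 60
```

For the "configuration" branch to be reachable, the probe's `--max-time` has to be longer than the proxy timeout being tested; otherwise curl gives up first and reports exit code 28. The two "network-or-application" results are deliberately ambiguous: a refused or timed-out connection looks the same from the proxy whether the service has crashed or a firewall is in the way, so the service and path checks in the diagnostic steps are still needed.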

Indicators

Likely causes

Diagnostic steps

  1. Review proxy or gateway access and error logs for the exact upstream address, port, and timeout duration at the time of the 504 errors. Filter first, then take the most recent matches, so that a busy log does not push the relevant entries out of view. On Nginx: `grep upstream /var/log/nginx/error.log | tail -n 100`. On Apache: `grep proxy /var/log/apache2/error.log | tail -n 100`. On AWS ALB: filter access logs in S3 for records with elb_status_code=504 and note the target_status_code and target_processing_time fields.
    Identifies which upstream host and port are failing to respond, and whether the timeout consistently hits a specific threshold. This distinguishes a single-backend failure from a systemic issue.
  2. From the proxy/gateway server, test connectivity to the upstream host and port. Check the TCP layer with `nc -zv <upstream-host> <port>` (or `telnet <upstream-host> <port>`), then send an HTTP request: `curl -v --max-time 10 http://<upstream-host>:<port>/health`. If the upstream uses HTTPS internally: `curl -vk --max-time 10 https://<upstream-host>:<port>/health`. Replace `/health` with the service's actual health-check path if it differs.
    Determines whether the upstream is reachable at the TCP layer from the proxy's perspective — isolates network-layer failure (connection refused, timeout) from application-layer failure (connection succeeds but response is slow or malformed).
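  3. Check upstream application server health: `systemctl status <service-name>` to confirm the process is running. Check resource utilisation: `top` or `vmstat 1 5` for CPU/memory pressure. Check open file descriptors against the limit that applies to the running process: `grep 'open files' /proc/<pid>/limits` and `ls /proc/<pid>/fd | wc -l` (note that `ulimit -n` reports the limit of the current shell, which may differ from the service's). Verify the upstream is listening on the expected port: `ss -tlnp | grep <port>`.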
    Identifies whether the upstream is running but overwhelmed (resource exhaustion, connection queue full) or has crashed and is not listening — drives the choice between restart vs. scale-out.
  4. Run a traceroute or MTR from the proxy server to the upstream server to identify network path degradation: `traceroute <upstream-host>` or `mtr --report --report-cycles 20 <upstream-host>`. Look for hops with >5% packet loss or latency spikes inconsistent with expected network topology.
    Pinpoints any intermediate network hop introducing latency or packet loss between the proxy and upstream — required before escalating to the network team or cloud provider.
  5. Review the proxy timeout configuration and compare against upstream response time metrics. On Nginx: `grep -rE 'proxy_(read|connect|send)_timeout' /etc/nginx/`. On Apache: `grep -r ProxyTimeout /etc/apache2/` (a recursive search also covers virtual host files under sites-enabled). On AWS ALB: check the load balancer's idle timeout, which is a load balancer attribute (`idle_timeout.timeout_seconds`) rather than a target group attribute, in the AWS Console or via CLI: `aws elbv2 describe-load-balancer-attributes --load-balancer-arn <arn>`. Compare timeout values against the upstream's actual 99th-percentile response time from APM or application logs.
    Determines whether the 504s are caused by a legitimate slow upstream being cut off by an under-configured timeout, rather than a true upstream failure — prevents misdiagnosis and unnecessary service restarts.
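Steps 1 and 5 both come down to summarising data, and a rough sketch of that summarising follows. Both function names are hypothetical. `count_upstream_timeouts` assumes the default Nginx error-log wording, in which a timeout entry contains `upstream timed out` and a field of the form `upstream: "http://host:port/path"`. `p99` assumes one response time in seconds per line on standard input, however those values were extracted (for example from an APM export), and uses the nearest-rank method.

```shell
#!/usr/bin/env bash
# Hypothetical helper for step 1: count Nginx "upstream timed out" entries per
# upstream host:port. Reads error-log lines on stdin, most affected upstream first.
count_upstream_timeouts() {
  grep 'upstream timed out' \
    | grep -oE 'upstream: "[^"]+"' \
    | sed -E 's#upstream: "https?://([^/"]+).*#\1#' \
    | sort | uniq -c | sort -rn
}

# Hypothetical helper for step 5: nearest-rank 99th percentile of the numbers on
# stdin (one per line). Exits non-zero if there is no input.
p99() {
  sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      idx = int((NR * 99 + 99) / 100)   # ceil(0.99 * NR) using integer arithmetic
      print v[idx]
    }'
}

# Example use (paths and file names are assumptions):
#   tail -n 5000 /var/log/nginx/error.log | count_upstream_timeouts
#   p99 < response_times.txt
```

If one upstream dominates the first output, the problem is likely a single backend rather than a systemic one. If the `p99` value sits close to or above the configured proxy timeout, the timeout itself is a plausible cause of the 504s and is worth reviewing before restarting anything.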

Resolution path

Prevention

Tools

References

http-504 · gateway-timeout · proxy · reverse-proxy · load-balancer · upstream · nginx · haproxy · aws-alb · networking · timeout · availability · incident-response · api-gateway · connectivity