P2 · Network Infrastructure

Connection Refused / Connection Timeout — TCP Network Connectivity Failure to Remote Service or Host

Connection refused and connection timeout errors both indicate that a TCP connection to a target service could not be established. Connection refused means the attempt was actively rejected (usually with a TCP reset): the host is reachable, but typically no process is listening on the port, although a firewall REJECT rule produces the same result. Connection timeout means no reply arrived at all: the host is unreachable or packets are being silently dropped. Resolution requires systematic diagnosis of service health, port binding, firewall rules, DNS resolution, and network path before applying targeted fixes.
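
The refused/timeout distinction can also be checked programmatically. The following is a minimal Python sketch, not a definitive tool: `probe_tcp` and its return labels are hypothetical names introduced here for illustration, and the 3-second default timeout is an assumption.

```python
import errno
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify one TCP connection attempt to host:port.

    Returns "open", "refused", "timeout", "dns_failure", or "error:<errno name>".
    The labels are illustrative, not a standard.
    """
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The attempt was actively rejected: the host answered, but nothing
        # accepted the connection on that port.
        return "refused"
    except (socket.timeout, TimeoutError):
        # No reply within the deadline: packets dropped or host unreachable.
        return "timeout"
    except socket.gaierror:
        # The hostname could not be resolved, so no connection was attempted.
        return "dns_failure"
    except OSError as exc:
        # Anything else (e.g. EHOSTUNREACH, ENETUNREACH) is reported by name.
        return "error:" + errno.errorcode.get(exc.errno, "unknown")
```

Calling `probe_tcp("<target-host>", <port>)` from the client host gives the same signal as `nc -zv`, in a form that can be logged or asserted on in a script.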

Indicators

Client application logs show 'Connection refused' when attempting to reach a remote host on a specific port
Client application logs show 'Connection timed out' or 'i/o timeout' with no response from the target
curl or telnet to the target host:port returns 'Connection refused' immediately
curl or telnet to the target host:port hangs until the OS-level timeout is reached
Health checks or load balancer probes begin failing for a backend service
Downstream services begin returning errors cascading from an upstream connectivity failure

Likely causes

Target service process has crashed or is not running — nothing is bound to the expected port
Target service is listening on a different port or interface (e.g. localhost only instead of 0.0.0.0)
Firewall rule (host-based or network) is blocking inbound or outbound traffic on the required port
Security group or ACL change has revoked access to the target port
DNS resolution failure causing the client to connect to a wrong or non-existent host
Network routing issue or BGP change making the target host unreachable
Service is overloaded and has exhausted its connection queue (backlog full), causing new connections to be refused or, by default on Linux, silently dropped so that clients time out instead
Container or pod has been restarted and is not yet ready, but traffic is already being routed to it
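An idle timeout on an intermediate device (firewall, NAT gateway, load balancer) has silently dropped an established connection, typically because TCP keepalives are disabled or sent less often than the device's timeout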
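
For the "restarted but not yet ready" cause above, a caller or deployment script can wait until the port accepts connections before sending traffic. This is a minimal Python sketch under stated assumptions: `wait_for_port` is a hypothetical helper, the 30-second deadline and 0.5-second interval are illustrative defaults, and a successful TCP handshake only shows that the port is bound, not that the application is fully ready.

```python
import socket
import time


def wait_for_port(host: str, port: int, deadline_s: float = 30.0,
                  interval_s: float = 0.5) -> bool:
    """Poll host:port until a TCP connection succeeds or the deadline passes."""
    give_up_at = time.monotonic() + deadline_s
    while True:
        try:
            # Each attempt is bounded by interval_s so one hung attempt
            # cannot consume the whole deadline.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Refused, timed out, or unreachable: treat as "not ready yet".
            pass
        if time.monotonic() >= give_up_at:
            return False
        time.sleep(interval_s)
```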

Diagnostic steps

Check whether the target service process is running and listening on the expected port. Linux: `ss -tlnp | grep <port>`; Windows: `netstat -ano | findstr <port>`
Determines whether the failure is caused by the service not running or not binding to the correct interface/port

Attempt a raw TCP connection from the client host to the target host and port: `nc -zv <target-host> <port>` or `telnet <target-host> <port>`
Distinguishes between 'connection refused' (port closed) and 'connection timeout' (packets dropped or host unreachable)

Verify DNS resolution of the target hostname from the client host: `dig <target-hostname>` (Linux) or `nslookup <target-hostname>` (Windows)
Confirms the client is resolving the target hostname to the correct IP address; a stale DNS record can cause connections to the wrong host

Trace the network path from the client to the target host: `traceroute <target-host>` (Linux/macOS) or `tracert <target-host>` (Windows)
Identifies network hops, firewalls, or routers that are dropping traffic before it reaches the target. Default probes use UDP or ICMP, which may be filtered differently from the service's TCP port; on Linux, `traceroute -T -p <port> <target-host>` (requires root) probes with TCP SYN instead

Check host-based firewall rules on the target host. Linux: `iptables -L -n -v | grep <port>` (or `nft list ruleset` / `ufw status verbose` where those front ends are in use); Windows: `Get-NetFirewallRule | Where-Object {$_.Enabled -eq 'True'} | Get-NetFirewallPortFilter | Where-Object {$_.LocalPort -eq '<port>'}`
Determines whether a local firewall (iptables, nftables, Windows Firewall, ufw) is blocking the connection at the target

Review recent logs from the target service for crash messages, bind errors, or listen failures: `journalctl -u <service-name> --since '30 minutes ago'` (Linux systemd) or check application log files directly
Reveals whether the service failed to start, crashed, or explicitly refused connections due to an application-level error
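
The DNS verification step can be scripted so that the resolved addresses are compared against the ones the service is expected to have. The sketch below is illustrative only: `resolve_ips` and `dns_matches` are hypothetical helper names, and the expected address set is something the operator must supply. It uses `socket.getaddrinfo`, which follows the system resolver path (including the hosts file), so its answer may differ from `dig`, which queries DNS servers directly.

```python
import socket
from typing import Set


def resolve_ips(hostname: str) -> Set[str]:
    """Return the IP addresses the system resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return {info[4][0] for info in infos}


def dns_matches(hostname: str, expected_ips: Set[str]) -> bool:
    """True if the name resolves and every address is an expected one."""
    try:
        resolved = resolve_ips(hostname)
    except socket.gaierror:
        return False  # the name does not resolve at all
    return bool(resolved) and resolved <= expected_ips
```

A `False` result with a name that does resolve points at a stale or incorrect record rather than a resolver outage.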

Resolution path

If the service is not running: start or restart the service using the appropriate service manager (`systemctl restart <service>` on Linux or `Restart-Service <ServiceName>` on Windows) and confirm it binds to the expected interface and port
If the service is running but listening on the wrong interface (e.g. 127.0.0.1 instead of 0.0.0.0): update the service's bind address configuration to listen on 0.0.0.0 or the correct network interface IP, restart the service, and verify with `ss -tlnp | grep <port>`
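If a host-based firewall is blocking the port: add a rule to permit TCP traffic. Linux: `iptables -I INPUT -p tcp --dport <port> -j ACCEPT` (use `-I` rather than `-A` so the rule is evaluated before any existing DROP or REJECT rule, and persist it using the distribution's own mechanism, since rules added this way do not survive a reboot); Windows: `New-NetFirewallRule -DisplayName 'Allow Port <port>' -Direction Inbound -Protocol TCP -LocalPort <port> -Action Allow`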
If a cloud security group or network ACL is blocking the port: update the security group inbound rules to allow TCP traffic on the required port from the client's source IP or CIDR range
If DNS is resolving to the wrong IP: update the DNS record to point to the correct target IP, flush the DNS cache on the client (`resolvectl flush-caches` on Linux with systemd-resolved, or `systemd-resolve --flush-caches` on older systemd releases; `ipconfig /flushdns` on Windows), and retry the connection
If the service is overloaded and refusing connections due to backlog exhaustion: increase the listen backlog the process requests (the `backlog` argument to `listen()`), scale out additional instances or replicas, and raise the OS limit that caps that backlog (`net.core.somaxconn` on Linux) if appropriate
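
The interaction between the process's listen backlog and `net.core.somaxconn` is easy to miss: on Linux the kernel silently caps the value passed to `listen()` at `somaxconn`, so raising only one of the two may have no effect. The following Python sketch illustrates this under stated assumptions; the helper names (`effective_backlog`, `read_somaxconn`, `open_listener`) are hypothetical, and the `/proc` path exists only on Linux.

```python
import socket
from pathlib import Path

SOMAXCONN_PATH = Path("/proc/sys/net/core/somaxconn")  # Linux only


def effective_backlog(requested: int, somaxconn: int) -> int:
    """Backlog actually applied: Linux caps listen() at net.core.somaxconn."""
    return min(requested, somaxconn)


def read_somaxconn(default: int = socket.SOMAXCONN) -> int:
    """Read the kernel limit, falling back to Python's SOMAXCONN constant."""
    try:
        return int(SOMAXCONN_PATH.read_text().strip())
    except (OSError, ValueError):
        return default


def open_listener(host: str, port: int, backlog: int) -> socket.socket:
    """Bind a TCP listener and request the given accept-queue length."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(backlog)  # the kernel may apply a smaller value
    return srv
```

Comparing `effective_backlog(requested, read_somaxconn())` with the requested value shows whether the sysctl also needs to be raised.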

Prevention

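Implement health checks and automated service restart policies (e.g. systemd `Restart=on-failure`, Kubernetes liveness probes) to detect and restart crashed services before timeouts cascade, and use readiness checks (e.g. Kubernetes readiness probes) so traffic is not routed to instances that have restarted but are not yet ready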
Use infrastructure-as-code (Terraform, CloudFormation, Ansible) to manage firewall rules and security groups with change review processes to prevent accidental port blockages
Set up proactive port monitoring (e.g. Prometheus blackbox exporter TCP probe, Nagios check_tcp) to alert before clients begin experiencing failures
Configure appropriate TCP keepalive settings on long-lived connections to detect silently dropped connections early rather than waiting for application-level timeouts
Document expected service ports and binding configurations in runbooks to enable rapid verification during incidents
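
For the keepalive recommendation above, the settings can be applied per socket by the client application. This is a minimal Python sketch, not a tuned configuration: `enable_keepalive` is a hypothetical helper, the timer values are illustrative assumptions, and the three `TCP_KEEP*` options are platform-specific (available on Linux), so each is guarded with `hasattr`.

```python
import socket


def enable_keepalive(sock: socket.socket, idle_s: int = 60,
                     interval_s: int = 10, probes: int = 5) -> None:
    """Turn on TCP keepalive and, where supported, tune its timers."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Seconds of idleness before the first keepalive probe is sent.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    # Seconds between successive probes.
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    # Unanswered probes before the connection is declared dead.
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
```

With these example values a dead peer is detected after roughly `idle_s + interval_s * probes` seconds (about 110 seconds). The idle time should be set shorter than the idle timeout of any intermediate firewall, NAT gateway, or load balancer on the path.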

Tools

nc / netcat (test TCP connectivity to a host and port)
telnet (basic TCP port reachability test)
ss / netstat (list listening sockets and bound ports)
traceroute / tracert (trace network path to identify where packets are dropped)
dig / nslookup (DNS resolution verification)
curl (HTTP-level connectivity and health check testing)
iptables / nftables / ufw (Linux host-based firewall inspection and management)
journalctl (Linux systemd service log inspection)
Get-NetFirewallRule / Get-NetFirewallPortFilter (Windows firewall rule inspection)

References

The Triage Manual — Batch Entry: Connection refused / connection timeout

networking, tcp, connection-refused, connection-timeout, firewall, dns, service-availability, incident-response, linux, windows, microservices, connectivity, port-reachability, load-balancer, health-check