Fix TCP Retransmission Issues in NAT Network

1. Background

Both the server and the client are deployed behind NAT, a setup that can occasionally cause TCP retransmission problems.

2. Environment

Client operating system: Windows 10 Professional

Client browser: Google Chrome

3. Symptoms

The client occasionally shows the error "This site can't be reached. XXX.XXX took too long to respond." with error code ERR_CONNECTION_TIMED_OUT.
While the client cannot reach the site, the same site opens normally on a mobile phone at the same moment, and the target host still answers ping.

4. Troubleshooting process

a. While the site was unreachable, we checked the egress bandwidth on both the server side and the client side. Traffic was normal and neither link was saturated.
b. A continuous ping from the client succeeded with low latency.
c. From other network environments, the site could be accessed normally.
d. A packet capture on the client showed that the very first packet sent was already marked as a TCP Retransmission.


e. A retransmission of the first packet means the server never replied to it within the retransmission timeout.
f. After searching online, I learned of a known cause: when a Linux server has both fast TIME-WAIT recycling (net.ipv4.tcp_tw_recycle = 1) and TCP timestamps (net.ipv4.tcp_timestamps = 1) enabled, the kernel remembers the latest TCP timestamp it has seen from each client IP address and drops new connection requests (SYNs) that carry an older timestamp. When several clients share one external IP address through NAT, each host keeps its own timestamp clock, so timestamps arriving from that single address are out of order. The server treats packets from the hosts with "older" clocks as stale and silently discards them.
g. Checking the WAF server and the backend server showed that both had these kernel parameters set:
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_timestamps = 1
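The retransmission flag from step d can be illustrated with a deliberately simplified detector: a segment is treated as a retransmission when the same flow resends a sequence number it already sent. This is a sketch under stated assumptions, much simpler than Wireshark's real TCP analysis; the `Segment` record, its fields, and the addresses are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List

@dataclass(frozen=True)
class Segment:
    """One captured TCP segment (hypothetical, simplified capture record)."""
    time: float       # capture time in seconds
    src: str          # source IP address
    sport: int        # source port
    dst: str          # destination IP address
    dport: int        # destination port
    seq: int          # TCP sequence number
    syn: bool         # True if the SYN flag is set
    payload_len: int  # TCP payload length in bytes

def find_retransmissions(segments: Iterable[Segment]) -> List[Segment]:
    """Return segments that reuse a (flow, sequence number) pair already sent.

    Only segments that consume sequence space (SYN or payload) are checked,
    so pure ACKs are never flagged.
    """
    seen = set()
    retransmitted = []
    for seg in sorted(segments, key=lambda s: s.time):
        if not seg.syn and seg.payload_len == 0:
            continue  # pure ACK: nothing to retransmit
        key = (seg.src, seg.sport, seg.dst, seg.dport, seg.seq)
        if key in seen:
            retransmitted.append(seg)
        else:
            seen.add(key)
    return retransmitted

# Example: one SYN sent three times with no reply in between (hypothetical times).
if __name__ == "__main__":
    base = dict(src="192.0.2.10", sport=50000, dst="203.0.113.20",
                dport=443, seq=1000, syn=True, payload_len=0)
    capture = [Segment(time=t, **base) for t in (0.0, 1.0, 3.0)]
    for seg in find_retransmissions(capture):
        print(f"{seg.time:.1f}s SYN retransmission to {seg.dst}:{seg.dport}")
```

A connection whose first SYN keeps being flagged this way, with no SYN-ACK from the server, matches what the client capture showed.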
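The mechanism described in step f can be modelled in a few lines. This is a simplified sketch, not the kernel code: it assumes the pre-4.12 Linux behaviour in which the server records a peer IP's latest timestamp when a connection from it closes, then, for 60 seconds (`TCP_PAWS_MSL`), rejects SYNs from that IP whose timestamp is more than 1 (`TCP_PAWS_WINDOW`) behind it. 32-bit wraparound is ignored, and `RecycleServer` plus the addresses and timestamp values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

PAWS_MSL = 60     # seconds a recorded peer timestamp stays authoritative
PAWS_WINDOW = 1   # tolerated backwards step, in timestamp units

@dataclass
class RecycleServer:
    """Simplified model of a server with tcp_tw_recycle=1 and tcp_timestamps=1."""
    # peer IP -> (last timestamp value recorded, wall-clock time it was recorded)
    peers: Dict[str, Tuple[int, float]] = field(default_factory=dict)

    def record_close(self, peer_ip: str, tsval: int, now: float) -> None:
        """Remember the peer's latest timestamp when a connection from it ends."""
        self.peers[peer_ip] = (tsval, now)

    def accept_syn(self, peer_ip: str, tsval: int, now: float) -> bool:
        """Return False if the SYN would be silently dropped."""
        if peer_ip in self.peers:
            last_ts, recorded_at = self.peers[peer_ip]
            if now < recorded_at + PAWS_MSL and last_ts - tsval > PAWS_WINDOW:
                return False
        return True

if __name__ == "__main__":
    nat_ip = "198.51.100.7"  # one public address shared by hosts A and B
    srv = RecycleServer()
    srv.record_close(nat_ip, tsval=5_000_100, now=10.0)          # host A (fast clock) closes
    print(srv.accept_syn(nat_ip, tsval=5_000_300, now=12.0))     # host A reconnects
    print(srv.accept_syn(nat_ip, tsval=1_200, now=12.0))         # host B (slow clock): dropped
    print(srv.accept_syn(nat_ip, tsval=1_200, now=80.0))         # after the 60 s window
```

Because the SYN is dropped without any reply, the client only sees its own SYN retransmissions until it times out, while ICMP ping and clients on other public IPs (such as the phone) are unaffected. That fits the symptoms in section 3.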

5. Solution

Disable fast TIME-WAIT recycling and TCP timestamps on both the WAF server and the backend server:
a. Edit the sysctl.conf file:
vim /etc/sysctl.conf

b. Change the two parameters to:

net.ipv4.tcp_tw_recycle = 0
net.ipv4.tcp_timestamps = 0

c. Apply the changes:

sysctl -p
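Steps a to c can also be scripted when several servers need the same change. The sketch below is an assumption-laden helper, not part of the original procedure: `set_sysctl_values` rewrites or appends `key = value` lines in the text of /etc/sysctl.conf, and `read_live_value` reads the running value from /proc/sys after `sysctl -p` to confirm it took effect. Both function names are hypothetical. Note that `net.ipv4.tcp_tw_recycle` was removed in Linux 4.12, so on newer kernels the live lookup returns `None` and this particular problem cannot occur.

```python
from pathlib import Path
from typing import Dict, Optional

# Target values from step b.
TARGET = {"net.ipv4.tcp_tw_recycle": "0", "net.ipv4.tcp_timestamps": "0"}

def set_sysctl_values(conf_text: str, values: Dict[str, str]) -> str:
    """Return conf_text with each key set to its value.

    Existing 'key = value' lines are rewritten in place; keys that are
    missing are appended. Comment lines (# or ;) are left untouched.
    """
    remaining = dict(values)
    out = []
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(("#", ";")) and "=" in stripped:
            key = stripped.split("=", 1)[0].strip()
            if key in values:
                out.append(f"{key} = {values[key]}")
                remaining.pop(key, None)
                continue
        out.append(line)
    for key, value in remaining.items():
        out.append(f"{key} = {value}")
    return "\n".join(out) + "\n"

def read_live_value(key: str, proc_root: Path = Path("/proc/sys")) -> Optional[str]:
    """Read the running kernel's value for key; None if the key does not exist."""
    path = proc_root / key.replace(".", "/")
    try:
        return path.read_text().strip()
    except FileNotFoundError:
        return None

if __name__ == "__main__":
    conf = Path("/etc/sysctl.conf")
    print(set_sysctl_values(conf.read_text() if conf.exists() else "", TARGET))
    for key in TARGET:
        print(key, "->", read_live_value(key))
```

The script only prints the proposed file content; writing it back and running `sysctl -p` are left as explicit manual steps, matching the procedure above.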

6. Verification

Multiple clients accessed the website heavily, and JMeter was used to run a 30-minute stress test. The site no longer became inaccessible because of TCP retransmissions, and the problem was resolved.
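JMeter was the tool used here. For a quick repeat check from a single client behind the NAT, a lightweight alternative is to open many short TCP connections concurrently and count failures and slow handshakes. This is a hypothetical sketch: `probe`, `connect_once`, and their defaults are illustrative, and host and port must be replaced with the real site. The 1-second "slow" threshold is an assumption based on the 1 s initial retransmission timeout recommended in RFC 6298; the actual SYN retry timing depends on the client OS.

```python
import socket
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional

def connect_once(host: str, port: int, timeout: float) -> Optional[float]:
    """Open and close one TCP connection; return setup time in seconds, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:  # covers timeouts and refused connections
        return None

def probe(host: str, port: int, attempts: int = 200, workers: int = 20,
          timeout: float = 3.0) -> dict:
    """Run many short connections concurrently and summarize the outcome."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results: List[Optional[float]] = list(
            pool.map(lambda _: connect_once(host, port, timeout), range(attempts)))
    ok = [r for r in results if r is not None]
    return {
        "attempts": attempts,
        "failed": attempts - len(ok),
        # handshakes this slow suggest the first SYN went unanswered and was resent
        "slow_over_1s": sum(1 for r in ok if r > 1.0),
    }

if __name__ == "__main__":
    print(probe("example.com", 443))  # placeholder target
```

Before the fix, runs from several hosts sharing one public IP would be expected to show non-zero `failed` or `slow_over_1s` counts; after the fix both should stay at zero.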