Recently, I discovered through alarms that the application often experiences Debugging Slow Requests exceeding 1 second. After examining the logs and utilizing tools like tcpdump and Wireshark, I eventually identified the issue as a network problem. Throughout the debugging process, I consulted numerous online articles; however, none provided a comprehensive solution. Now, with some spare time, I want to share my experience in troubleshooting these slow requests, hoping it will assist programmers facing similar challenges. The specific troubleshooting steps are as follows:
Check the program logs. There are log outputs for requests that take longer than 100ms. By checking the logs, we found that there are a large number of database and redis timeouts.
Check the slow request of redis. Redis does not find the slow request log. For the method of checking the slow request of redis, please refer to the redis document: http://doc.redisfans.com/server/slowlog.html
Check the slow query log of the database, and there is no corresponding slow request in the database
Finally, I began to suspect that the time was spent on the network connection between the database and redis. I used the tcpdump command to dump the TCP requests on the system for a certain period of time. The specific command was:
tcpdump -i eth0 -s 0 -w tcp_245.pcap host 192.168.1.245
The -i parameter specifies which network card’s TCP request is to be captured, the -w parameter specifies the dump file directory, and the host parameter specifies the requests sent and received by the IP address to be captured. -s 0: The default length of the captured data packet is 68 bytes. Adding -s 0 will capture the complete data packet. For other specific command parameters, you can view them through tcpdump –help
To analyze the dumped TCP requests through Wireshark, you first need to find the desired request in Wireshark. At this time, you need to filter out all redis requests by IP and port number. The specific screenshot is as follows:
Among them, ip.dst == 10.10.1.1 specifies the server IP where redis is located, and tcp.dstport == 7522 specifies the port number of redis. After filtering, it is obvious that there is TCP Retransmission, that is, network retransmission
In order to confirm that the time loss is caused by retransmission, you need to filter the complete record of the request. The specific steps are: select the row of RetransmissionI – right click – select Track Stream – TCP Stream. The screenshot is as follows:
It can be seen from the figure that the seq and ack in the retransmitted TCP packet are exactly the same as the seq and ack in the above TCP packet, and the size of the data packet is exactly the same. It can be confirmed that the same TCP request has been retried, but it is delayed by 200ms in time. This is just one of the TCP requests. If there are about 10 database and redis requests in an HTTP request, the delay time will reach more than 2s.
Summary
This overview provides insights into diagnosing and resolving the issue of Debugging Slow Requests in application performance. The author highlights a recurring problem of requests exceeding one second, traced ultimately to network issues. By checking program logs, they observed numerous database and Redis timeouts, though no specific slow requests were logged by Redis or the database. Suspicion fell on the network connection between these elements. Using tcpdump, they captured TCP requests, revealing network retransmissions as the cause of delays. Analysis with Wireshark confirmed TCP retransmission issues, suggesting significant time loss due to multiple retries. Strategies included using tcpdump to gather data and Wireshark for in-depth analysis. The goal of the author is to assist programmers facing similar challenges by sharing practical troubleshooting steps for identifying the root causes of slow request issues.