1. Understanding Nagle’s Algorithm
- When tuning nginx, one directive that often needs to be set is tcp_nodelay.
- This directive controls whether Nagle’s algorithm is applied to a connection. Nagle’s algorithm combines small packets into larger ones to improve bandwidth utilization; tcp_nodelay on disables it, and tcp_nodelay off leaves it enabled.
- TCP has a well-known inefficiency: the application-layer payload may be tiny (e.g., 1 byte), while the header overhead is 40 bytes (20 bytes for the IP header + 20 bytes for the TCP header). In that case almost everything on the wire is header rather than data, which wastes bandwidth.
- Nagle’s algorithm is designed to solve this problem. While previously sent data is still unacknowledged, new small packets may not be sent. They are held back until enough data accumulates to fill an MSS (maximum segment size), until the ACK from the other end arrives, or until a timeout expires.
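The overhead figures and the send rule above can be sketched in a few lines of Python. This is a simplified illustration, not how any particular TCP stack implements it: the function names are hypothetical, the headers are assumed to carry no options, and the timeout case is ignored.

```python
IP_HEADER = 20   # bytes, IPv4 header without options
TCP_HEADER = 20  # bytes, TCP header without options


def payload_efficiency(payload_bytes: int) -> float:
    """Fraction of the bytes on the wire that are application data."""
    return payload_bytes / (payload_bytes + IP_HEADER + TCP_HEADER)


def nagle_may_send(pending_bytes: int, mss: int, unacked_bytes: int) -> bool:
    """Simplified Nagle decision for data waiting in the send buffer.

    Send right away if a full segment is ready or nothing is in flight;
    otherwise hold the small segment back until an ACK arrives.
    """
    if pending_bytes >= mss:
        return True
    return unacked_bytes == 0


print(f"{payload_efficiency(1):.1%}")     # 2.4%  (1 byte of data in a 41-byte packet)
print(f"{payload_efficiency(1460):.1%}")  # 97.3% (a full 1460-byte segment)
```

With 100 bytes still unacknowledged, `nagle_may_send(1, 1460, 100)` returns False, so the 1-byte write waits; once the ACK arrives and `unacked_bytes` drops to 0, the same call returns True.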
2. Environment Preparation
Make sure the following software is installed. Nagle’s algorithm improves TCP efficiency by reducing the number of packets sent over the network, but for latency-sensitive or real-time applications it can be worth disabling.
Component | Version |
---|---|
OS | Ubuntu 18.04.1 LTS |
docker | 18.06.0-ce |
Client: 192.168.17.171
Server: 192.168.17.173
3. Enabling Nagle’s Algorithm
On 192.168.17.173, first prepare an nginx configuration file and enable Nagle’s algorithm by setting tcp_nodelay off;
Then start the container.
First, use tcpdump to capture traffic on the local machine’s port 80:
On 192.168.17.171, use the ab benchmarking tool to stress test the port.
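Note: You must pass the -k parameter so that ab uses HTTP keep-alive. nginx applies tcp_nodelay once a connection enters the keep-alive state, so without -k the setting would not come into play.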
Setting aside most of the output, the metric that matters here is Time per request. However the test is run, the mean latency stays at around 40 ms.
Let’s look at the packet capture information using Wireshark.
The capture contains a large number of packets, so pick any SYN packet and follow the TCP stream it belongs to.
Then select one request/response exchange to analyze:
● In Linux, delayed acknowledgment is enabled by default. Delayed acknowledgment means that the receiver does not send an ACK for every packet it receives but waits for a short while. If it has data of its own to send during that time, the ACK “hitches a ride” on that packet; otherwise the ACK is sent on its own after a timeout. This is why the client waits 40 ms before sending its ACK.
● Since nginx has Nagle’s algorithm enabled, it holds back its next packet until the ACK for the previous one arrives. The exchange therefore looks like this:
(1) 192.168.17.171 sends an HTTP GET request (packet 677)
(2) 192.168.17.173 sends PSH, ACK (packet 999)
(3) Because Linux enables delayed acknowledgment by default, 192.168.17.171 waits 40 ms for a chance to “hitch a ride”; meanwhile nginx on 192.168.17.173, with tcp_nodelay off, waits for that ACK before sending the rest of the response.
(4) After 40 ms, 192.168.17.171 has had nothing to piggyback the ACK on and sends it on its own (packet 1109)
(5) 192.168.17.173 sends HTTP 200 after receiving the ACK (packet 1118)
(6) 192.168.17.171 acknowledges the data with an ACK (packet 1127)
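The six steps above can be condensed into a toy timing model. It is only a sketch under stated assumptions: the server answers with two small segments, the client delays its ACK by 40 ms, and the one-way network delay of 0.5 ms is a made-up illustrative value, as is the function name.

```python
def response_complete_ms(nagle_enabled: bool,
                         one_way_ms: float = 0.5,
                         delayed_ack_ms: float = 40.0) -> float:
    """Time from the client sending the request until it holds the full response.

    Model: the server replies with two small segments. The client delays the
    ACK for the first segment by delayed_ack_ms unless more data arrives first.
    """
    first_segment_arrives = 2 * one_way_ms  # request out, first segment back
    if not nagle_enabled:
        # The second segment is sent right behind the first one.
        return first_segment_arrives
    # Nagle holds the second segment until the first one is acknowledged.
    ack_sent = first_segment_arrives + delayed_ack_ms
    ack_arrives_at_server = ack_sent + one_way_ms
    return ack_arrives_at_server + one_way_ms


print(response_complete_ms(nagle_enabled=True))   # 42.0
print(response_complete_ms(nagle_enabled=False))  # 1.0
```

In this model the delayed ACK timer dominates the total, which matches the roughly 40 ms per request reported by ab regardless of how fast the network itself is.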
4. Disabling Nagle’s Algorithm
Simply set tcp_nodelay on;
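At the socket level, nginx’s tcp_nodelay directive corresponds to the TCP_NODELAY socket option. For an application that manages its own sockets, a minimal Python sketch of the same switch might look like this (the helper name is hypothetical):

```python
import socket


def make_nodelay_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # A non-zero value turns TCP_NODELAY on, i.e. small writes are sent
    # immediately instead of waiting for outstanding data to be ACKed.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


sock = make_nodelay_socket()
# Read the option back to confirm that it took effect.
print(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)
sock.close()
```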
Test again with ab:
Observe the packet capture results again:
● The client still has delayed acknowledgment enabled, so 192.168.17.171 still does not acknowledge a data packet immediately after receiving it.
● However, with tcp_nodelay on, nginx on 192.168.17.173 responds with PSH, ACK as soon as it receives the request and sends the next packet right behind it, without waiting for an ACK.
● Once 192.168.17.171 has received both, there are two unacknowledged data packets, so it sends an ACK immediately:
(1) 192.168.17.171 sends an HTTP GET request (packet 447)
(2) 192.168.17.173 immediately responds with PSH, ACK (packet 740)
(3) 192.168.17.173 sends HTTP 200 (packet 741)
(4) 192.168.17.171 responds with ACK (packet 742)
5. Summary
● This article reproduces the classic 40 ms problem.
● Two terms appear in this article: Nagle’s algorithm and delayed acknowledgment. They seem similar but are not the same. Nagle’s algorithm applies to data: a small packet is sent only once the other end’s ACK has arrived or an MSS has been filled. Delayed acknowledgment applies to ACKs: an ACK waits for a chance to “hitch a ride” on outgoing data and is sent on its own after a timeout if no such chance comes.
● Delayed acknowledgment is enabled by default in Linux, so in the experiment the client always delays its ACKs. To disable delayed acknowledgment on the client, set the TCP_QUICKACK socket option with setsockopt.
● This article mainly discusses Nagle’s algorithm in nginx. The behavior of Nagle’s algorithm is determined entirely by TCP’s ACK mechanism. If the other end acknowledges quickly, Nagle’s algorithm does not actually coalesce many packets. It helps avoid network congestion, but overall network utilization remains low.
● When Nagle’s algorithm interacts with delayed acknowledgment, it can cause severe delays, which is something to watch out for.
● Whether to enable Nagle’s algorithm in nginx depends on the business scenario. The experiment showed the following:
(1) tcp_nodelay off increases communication delay but improves bandwidth utilization. It should work well for high-latency, large-data communication.
(2) tcp_nodelay on increases the number of small packets but improves response speed. It should work well where timeliness matters.
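For the TCP_QUICKACK option mentioned in the summary, a minimal Python sketch could look like the following. The helper name is hypothetical, and the option is looked up defensively because Python only exposes it on platforms that define it, such as Linux.

```python
import socket


def request_quickack(sock: socket.socket) -> bool:
    """Ask the kernel to send ACKs immediately; return False where unsupported."""
    opt = getattr(socket, "TCP_QUICKACK", None)  # not defined on every platform
    if opt is None:
        return False
    sock.setsockopt(socket.IPPROTO_TCP, opt, 1)
    return True


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
print(request_quickack(sock))
sock.close()
```

According to the Linux tcp(7) manual page, this flag is not permanent: the kernel may switch back to delayed acknowledgment later, so applications that depend on it typically set it again after each receive.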