Understanding the TCP Wave Process: A Deep Dive into the Four-Way Wave and Connection Closure

If you’re not familiar with the handshake and TCP wave process, please first read this blog: “What You Need to Know About the Three-Way Handshake and Four-Way Wave”.

TCP Wave Process

Four-way wave process:

First wave: Host A (which can be a client or server) sets the Sequence Number and Acknowledgment Number, and sends a FIN segment to Host B. At this point, Host A enters the FIN_WAIT_1 state, indicating that Host A has no more data to send to Host B.

Second wave: Host B receives the FIN segment from Host A and sends back an ACK segment whose Acknowledgment Number is Host A's Sequence Number plus 1. Host B enters the CLOSE_WAIT state, and Host A enters the FIN_WAIT_2 state once it receives the ACK. At this point only the direction from Host A to Host B is closed: Host B can still send any remaining data to Host A.

Third wave: Once Host B has no more data to send, it sends a FIN segment to Host A to request closing the connection and enters the LAST_ACK state.

Fourth wave: Host A receives the FIN segment from Host B, sends an ACK segment to Host B, and then enters the TIME_WAIT state. After Host B receives the ACK segment from Host A, it closes the connection. Host A waits for 2MSL; if it does not receive a retransmitted FIN from Host B during that time, it concludes that Host B has closed normally and closes the connection too.
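To make the sequence easier to follow, here is a minimal sketch of the four waves as a state table. It only tracks state names from the standard TCP state diagram and sends no packets. The event names ("send FIN", "recv ACK", "2MSL timeout") and the run helper are made up for this illustration, and the rarer simultaneous-close path (the CLOSING state) is left out.

```python
# Simplified model of the four-way wave: state names only, no real packets.
# (current state, event) -> next state
TRANSITIONS = {
    ("ESTABLISHED", "send FIN"): "FIN_WAIT_1",   # first wave, sent by the active closer
    ("ESTABLISHED", "recv FIN"): "CLOSE_WAIT",   # passive side replies with ACK (second wave)
    ("FIN_WAIT_1", "recv ACK"): "FIN_WAIT_2",    # active closer sees the second wave
    ("CLOSE_WAIT", "send FIN"): "LAST_ACK",      # third wave, sent by the passive side
    ("FIN_WAIT_2", "recv FIN"): "TIME_WAIT",     # active closer replies with ACK (fourth wave)
    ("LAST_ACK", "recv ACK"): "CLOSED",          # passive side sees the fourth wave
    ("TIME_WAIT", "2MSL timeout"): "CLOSED",     # active closer finishes after 2MSL
}


def run(events, state="ESTABLISHED"):
    """Apply events in order and return the list of states visited."""
    visited = [state]
    for event in events:
        state = TRANSITIONS[(state, event)]
        visited.append(state)
    return visited


# Host A closes first; Host B is the passive side.
host_a = run(["send FIN", "recv ACK", "recv FIN", "2MSL timeout"])
host_b = run(["recv FIN", "send FIN", "recv ACK"])

print("Host A:", " -> ".join(host_a))
print("Host B:", " -> ".join(host_b))
```

Reading the two printed lines side by side shows why TIME_WAIT piles up on the side that closes first, while CLOSE_WAIT piles up on the side that has received a FIN but has not yet sent its own.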

High TIME_WAIT

Cause of the Issue

The significance of waiting for 2MSL when releasing the connection during the four-way wave is explained in “What You Need to Know About the Three-Way Handshake and Four-Way Wave”. Because of the 2MSL wait, a large number of connections in the TIME_WAIT state may accumulate on the side that closes first, affecting server performance and even causing the number of sockets to reach the server limit.

In fact, the impact of TIME_WAIT on system resource consumption is relatively small, and the actual concerns due to a high number of TIME_WAIT involve the following factors:

  1. Number of source ports (net.ipv4.ip_local_port_range)
  2. TIME_WAIT bucket count (net.ipv4.tcp_max_tw_buckets)
  3. Number of file descriptors (max open files)
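Before changing anything, it can help to see what these three limits currently are. The following is a rough sketch for Linux that reads the two sysctls through /proc/sys and the file descriptor limit through Python's resource module. The helper names (read_sysctl, parse_port_range, usable_ports, report) are made up for this example, and the script simply prints None for a sysctl that is not present on the machine.

```python
import resource
from pathlib import Path


def read_sysctl(name):
    """Return the raw text of a sysctl from /proc/sys, or None if unavailable."""
    path = Path("/proc/sys") / name.replace(".", "/")
    try:
        return path.read_text().strip()
    except OSError:
        return None


def parse_port_range(text):
    """Parse the 'low high' pair stored in ip_local_port_range."""
    low, high = (int(part) for part in text.split())
    return low, high


def usable_ports(low, high):
    """Number of ports in the inclusive range low..high."""
    return high - low + 1


def report():
    raw_range = read_sysctl("net.ipv4.ip_local_port_range")
    if raw_range is not None:
        low, high = parse_port_range(raw_range)
        print(f"source ports: {low}-{high} ({usable_ports(low, high)} in range)")
    print("tcp_max_tw_buckets:", read_sysctl("net.ipv4.tcp_max_tw_buckets"))
    # Per-process limit on open file descriptors (what `ulimit -n` shows).
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"max open files: soft={soft} hard={hard}")


report()
```

Comparing the number of connections in TIME_WAIT with these values gives a first idea of which limit, if any, is close to being hit.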

Solutions

Simply optimize the network configuration and connection configuration of the server system, use socket reuse, or promptly release resources. (Due to ongoing system iterations, specific parameter modifications are not provided here.)
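As one possible reading of "socket reuse", the sketch below sets SO_REUSEADDR on a listening socket, which lets a restarted server bind its port even while old connections on that port are still in TIME_WAIT. This is only an illustration of the idea and not a tuning recommendation. The function names (new_reusable_socket, listen_on) and the address in the usage comment are placeholders invented for this example.

```python
import socket


def new_reusable_socket():
    """Create a TCP socket with SO_REUSEADDR enabled (must be set before bind)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    return sock


def listen_on(host, port, backlog=128):
    """Bind and listen on (host, port) using a reusable socket."""
    sock = new_reusable_socket()
    try:
        sock.bind((host, port))
        sock.listen(backlog)
    except OSError:
        sock.close()  # release the descriptor promptly if setup fails
        raise
    return sock


# Usage (placeholder address):
#     with listen_on("127.0.0.1", 8080) as server:
#         conn, peer = server.accept()
```

On the client side, the equivalent idea is usually to reuse established connections (keep-alive or a connection pool) rather than opening a new one per request, so that fewer connections go through the four-way wave at all.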

High CLOSE_WAIT

Cause of the Issue

If Host B never initiates the third wave, connections accumulate in the CLOSE_WAIT state on Host B. A large number of them will affect server performance and may also cause the number of sockets to reach the server limit.

A connection stays in CLOSE_WAIT until the application on Host B closes its socket, so connections that are not released promptly usually mean that the server does not close the connection after an exception occurs, or that an application-level timeout is set too long. With a MySQL database, a possible cause is a transaction that was started but never correctly committed or rolled back.

In short, it is most likely a server-side code or configuration issue.
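A common code-level fix is to make sure the socket is closed on every path, including the exception path. Here is a minimal sketch of that pattern. The handle function and its uppercase echo logic are invented for this example, and socket.socketpair() stands in for a real TCP connection so the snippet runs without a network. A socket pair has no TCP states, but the end-of-stream behaviour (recv returning b"") is the same.

```python
import socket


def handle(conn):
    """Serve one connection and always close our side when done."""
    try:
        while True:
            data = conn.recv(4096)
            if not data:
                # b"" means the peer has finished sending (its FIN arrived).
                break
            conn.sendall(data.upper())
    finally:
        # Runs on normal exit and on exceptions. On a TCP socket this close
        # is what sends our FIN and moves the connection out of CLOSE_WAIT.
        conn.close()


# Demo with a local socket pair standing in for client and server ends.
client, server = socket.socketpair()
client.sendall(b"ping")
client.shutdown(socket.SHUT_WR)   # client is done sending
handle(server)
reply = client.recv(4096)
client.close()
print(reply)
```

If the close in the finally block were missing and the handler raised, the server side would keep the descriptor open, which on a TCP connection is exactly the situation that leaves it stuck in CLOSE_WAIT.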

Solutions

The following methods do not have to be in order, and it is not necessary to use all of them when identifying the problem.

  • Use the top command to check CPU utilization and load status (a high CLOSE_WAIT count often goes together with threads blocked on I/O, so the load can be much higher than CPU utilization).
  • Use netstat to observe the change in the number of CLOSE_WAIT states.
  • Use Wireshark to assist in viewing the sending status of network packets.
  • Use perf or flame graphs to locate hotspot functions.
  • In Java, you can dump server thread stacks to see where a large number of threads are blocked.
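For watching how the number of CLOSE_WAIT connections changes over time, a small script can be handy alongside netstat. The sketch below counts connections per state by parsing the text of /proc/net/tcp on Linux, where the fourth column is the state as a hexadecimal code. The function names (count_states, count_live_states) are made up for this example, and only IPv4 is covered (/proc/net/tcp6 has the same layout for IPv6).

```python
from collections import Counter
from pathlib import Path

# State codes used by the Linux kernel in /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_states(proc_net_tcp_text):
    """Count connections per TCP state from the text of /proc/net/tcp."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:   # skip the header row
        fields = line.split()
        if len(fields) > 3:
            code = fields[3].upper()
            counts[TCP_STATES.get(code, code)] += 1
    return counts


def count_live_states(path="/proc/net/tcp"):
    """Count states on this machine, or return None if the file is missing."""
    try:
        return count_states(Path(path).read_text())
    except OSError:
        return None


live = count_live_states()
if live is not None:
    print("CLOSE_WAIT:", live["CLOSE_WAIT"], "TIME_WAIT:", live["TIME_WAIT"])
```

Running it every few seconds shows whether the CLOSE_WAIT count keeps growing, which points to sockets that are never closed, or rises and falls with traffic.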