Troubleshooting TCP Sliding Window Issues with Wireshark

1. Introduction to TCP Sliding Window Mechanism

In the previous article, we explored how to use Wireshark for troubleshooting TCP duplicate ACKs, particularly addressing the fast retransmission problem caused by these duplicate ACKs.

Practical Network Troubleshooting (V) – Using Wireshark to Troubleshoot TCP Fast Retransmit Issues

Among TCP’s flow control mechanisms, however, the sliding window protocol is the most important.

In this article, we will use Wireshark to troubleshoot and locate issues related to the TCP sliding window protocol.

2. Common Wireshark Sliding Window Issues

I have previously written an article that details the sliding window protocol, which you can refer to:

Nagle algorithm and sliding window protocol

Simply put, TCP’s sliding window mechanism is as follows:

  • In TCP, each side of a connection keeps a buffer for the data it has sent but that has not yet been acknowledged. This buffer is called the “window”.
  • Bytes 15 and 16 of the TCP header (the Window field) carry the remaining window size of the host that sent the segment, that is, how much more data it can currently accept.
  • When the receiver sends an ACK, the sender removes all data with a sequence number lower than the ACK number from its window.
  • The cycle then repeats: the sender fills the window with data, and the receiver’s ACKs clear data out of the sender’s window. This is the TCP sliding window mechanism.
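The cycle described above can be sketched as a toy model of the sender side. This is only an illustration under simplifying assumptions (byte-granular, no retransmission, no sequence number wraparound); the class name `SendWindow` and its fields are hypothetical and do not come from any real TCP stack.

```python
class SendWindow:
    """Toy model of a TCP sender's view of the sliding window (hypothetical)."""

    def __init__(self, peer_window: int, initial_seq: int = 0) -> None:
        self.peer_window = peer_window  # last window size advertised by the receiver
        self.snd_una = initial_seq      # oldest sequence number not yet acknowledged
        self.snd_nxt = initial_seq      # next sequence number to be sent

    def usable(self) -> int:
        """Bytes that may still be sent before the window is full."""
        return max(0, self.snd_una + self.peer_window - self.snd_nxt)

    def send(self, nbytes: int) -> int:
        """Send up to nbytes; return how many bytes actually fit in the window."""
        sent = min(nbytes, self.usable())
        self.snd_nxt += sent
        return sent

    def on_ack(self, ack: int, window: int) -> None:
        """Process an ACK: drop acknowledged bytes and adopt the advertised window."""
        if self.snd_una < ack <= self.snd_nxt:
            self.snd_una = ack
        self.peer_window = window


w = SendWindow(peer_window=1000)
print(w.send(700))   # 700 bytes fit
print(w.send(700))   # only 300 more fit, the window is now full
w.on_ack(ack=500, window=1000)
print(w.usable())    # 500 bytes freed by the ACK
```

Once `usable()` reaches zero the sender has to wait; each ACK slides the left edge of the window forward and frees room again.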

3. Effective Troubleshooting Techniques for Sliding Window Errors

3.1 Zero Window

In the analysis results of Wireshark, the following zero window situations may appear:

  1. TCP ZeroWindow
  2. TCP ZeroWindowProbe
  3. TCP ZeroWindowViolation

Let’s analyze them one by one.

3.1.1 TCP ZeroWindow

A TCP ZeroWindow message tells the peer to stop sending data because the receive buffer of the host that sent the message is full.

This usually means that the application on that host is not reading data from its buffer fast enough, or that the host is short of memory.
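As a rough illustration of what this flag is based on, the sketch below reads the Window field (bytes 15 and 16, i.e. offset 14 when counting from zero) out of a raw TCP header and applies a simplified version of the check: a window of zero on a segment without SYN, FIN, or RST set. The function names are hypothetical, and the code assumes the input starts at the first byte of a TCP header of at least 20 bytes.

```python
import struct

FIN, SYN, RST = 0x01, 0x02, 0x04  # TCP flag bits in header byte 14 (offset 13)


def advertised_window(tcp_header: bytes) -> int:
    """Return the raw (unscaled) 16-bit Window field of a TCP header."""
    (window,) = struct.unpack_from("!H", tcp_header, 14)
    return window


def is_zero_window(tcp_header: bytes) -> bool:
    """Simplified check: window is zero and none of SYN, FIN, RST is set."""
    flags = tcp_header[13]
    return advertised_window(tcp_header) == 0 and not (flags & (FIN | SYN | RST))
```

Window scaling does not matter for this check, because a raw value of zero stays zero at any scale factor.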

3.1.2 TCP ZeroWindowProbe

When one party receives a TCP ZeroWindow message, it periodically sends a TCP ZeroWindowProbe message to check whether the window has reopened.

The probe carries the next byte of data waiting to be sent. The receiver’s response shows whether its window is still zero. If it is, the sender doubles its probe timer before trying again.
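The doubling of the probe timer can be sketched as follows. The starting interval and the upper bound are assumptions chosen for illustration; real values depend on the operating system’s TCP implementation, and `probe_intervals` is a hypothetical helper.

```python
def probe_intervals(initial: float, maximum: float, probes: int) -> list:
    """Return the waiting time before each zero window probe.

    The interval doubles after every probe that still sees a zero window
    and is capped at `maximum` (both values are illustrative assumptions).
    """
    intervals = []
    interval = initial
    for _ in range(probes):
        intervals.append(interval)
        interval = min(interval * 2, maximum)
    return intervals


print(probe_intervals(initial=1.0, maximum=60.0, probes=8))
# [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```

In a capture, this shows up as ZeroWindowProbe packets spaced further and further apart.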

3.1.3 TCP ZeroWindowViolation

Wireshark marks a message as TCP ZeroWindowViolation when it carries data even though the receiver has already advertised a zero window to the other end. This means that the other end has violated the TCP sliding window protocol.

In this case, check the TCP implementation of the host that sent the message.

3.2 TCP WindowUpdate

TCP allows the window size to change at any time. A message that notifies the peer of such a change is marked by Wireshark as TCP WindowUpdate.

There are two situations that may result in receiving this message:

  1. The TCP receiver has recovered from a zero window and tells the sender to resume sending data. No further action is needed here, other than finding out what caused the zero window in the first place.
  2. The TCP receiver changes its window size frequently. In this case, find out why the receiver is unstable. The cause may be the application, memory, or another problem on the end device.
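To tell the two situations apart in a long capture, one option is to export the window sizes advertised by the receiver and count the changes. The sketch below assumes such a list of values is already available; the function name and the sample data are hypothetical.

```python
def classify_window_changes(windows):
    """Return (recoveries, other_changes) for a sequence of advertised windows.

    A recovery is a change from zero to a non-zero window (situation 1).
    Any other change between two non-zero values counts as situation 2.
    Drops to zero are zero window events and are not counted here.
    """
    recoveries = 0
    other_changes = 0
    for prev, cur in zip(windows, windows[1:]):
        if prev == 0 and cur > 0:
            recoveries += 1
        elif cur != prev and cur != 0:
            other_changes += 1
    return recoveries, other_changes


# Hypothetical window sizes advertised by one receiver over time
print(classify_window_changes([8192, 0, 0, 4096, 4096, 2048, 8192]))  # (1, 2)
```

A high count of other changes relative to the length of the capture would suggest the second situation.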

3.3 TCP WindowFull

When Wireshark detects that a message will completely fill the receiver’s window once it is sent, it marks that message as TCP WindowFull.

The receiver will then usually send a TCP ZeroWindow message to make the sender pause.
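The condition behind this flag can be approximated in a few lines: the last byte of the segment lands exactly on the right edge of the window that the other side last advertised. This is a simplified sketch, not Wireshark’s actual code; the function and parameter names are hypothetical, and `peer_window` is assumed to be already scaled.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers wrap around at 32 bits


def fills_window(seq: int, length: int, peer_ack: int, peer_window: int) -> bool:
    """True if a data segment ends exactly at the peer's advertised window edge.

    seq, length: sequence number and payload length of the segment being sent
    peer_ack, peer_window: last ACK number and window seen from the other side
    """
    if length == 0:
        return False
    return (seq + length) % SEQ_SPACE == (peer_ack + peer_window) % SEQ_SPACE


print(fills_window(seq=1000, length=500, peer_ack=500, peer_window=1000))  # True
print(fills_window(seq=1000, length=400, peer_ack=500, peer_window=1000))  # False
```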

4. Troubleshooting

As shown in the figure below, this is a typical example of the zero window problem:

[Figure: TCP Sliding Window Issues]

  1. Message 183816 is the last piece of data from 192.168.2.138 before the window of 192.168.1.58 is full, so it is marked as TCP WindowFull.
  2. Then, 192.168.1.58 sends a message to 192.168.2.138, telling it to stop sending data. This is a zero window signal.
  3. 192.168.2.138 repeatedly sends TCP ZeroWindowProbe packets to detect the window of 192.168.1.58.
  4. Once the connection has carried no data for longer than the threshold, 192.168.2.138 sends a RST message to disconnect.

Besides checking memory allocation, consider that the receiving end may simply lack the processing power to keep up. This can be investigated further based on the actual application workload.

In addition, you can open the TCP throughput chart of Wireshark to view the throughput:

In the chart, the upper line shows the window size, and its distance from the lower line shows how much of the window remains. If the two lines overlap, a zero window has occurred. A steady gap between the two lines indicates that the receiver is keeping up.
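The same reading of the chart can be done numerically. The sketch below assumes a list of samples, each holding a timestamp, the number of bytes in flight, and the advertised window; the sample values and function names are hypothetical and only illustrate how the gap between the two lines is computed.

```python
def remaining_window(samples):
    """Map (time, bytes_in_flight, advertised_window) samples to (time, gap)."""
    return [(t, max(0, window - in_flight)) for t, in_flight, window in samples]


def zero_window_times(samples):
    """Return the timestamps at which the two lines of the chart would overlap."""
    return [t for t, gap in remaining_window(samples) if gap == 0]


# Hypothetical samples: (seconds, bytes in flight, advertised window)
samples = [(0.0, 1000, 4000), (0.1, 4000, 4000), (0.2, 2000, 4000)]
print(remaining_window(samples))   # [(0.0, 3000), (0.1, 0), (0.2, 2000)]
print(zero_window_times(samples))  # [0.1]
```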

5. RST forced disconnection

We know that a TCP connection is normally closed with a four-way handshake:

  1. The party that actively disconnects sends a FIN message;
  2. The passive disconnecting party sends an ACK message;
  3. After the passive disconnecting party completes the final processing, it sends a FIN message;
  4. The party that actively disconnects sends an ACK message to complete the disconnection.
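From an application’s point of view, the four steps above correspond to a half-close followed by a read until end of stream. The following is a minimal sketch using Python’s standard `socket` module; `graceful_close` is a hypothetical helper name, and the FIN and ACK packets themselves are produced by the operating system, not by this code.

```python
import socket


def graceful_close(sock: socket.socket) -> bytes:
    """Close a connection in an orderly way and return any data still pending.

    shutdown(SHUT_WR) makes the OS send our FIN on a TCP socket while still
    letting us receive. An empty read means the peer has sent its FIN as well.
    """
    sock.shutdown(socket.SHUT_WR)
    leftover = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            break
        leftover += chunk
    sock.close()
    return leftover
```

Reading until the empty result before calling `close()` avoids discarding data that the peer sent during its final processing.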

This is standard practice, but when you open a web page, there may be dozens of connections open at the same time (home page, news, ads, regularly updated pictures, etc.), and closing all of them sometimes requires hundreds of FIN and FIN-ACK messages.

To avoid this overhead, a web server will often force a disconnect by sending a RST message once it has sent the requested data. More often, however, a RST message indicates that a fault has occurred, as in the cases below.
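One common way for an application to request such a forced disconnect is the `SO_LINGER` socket option with a linger time of zero, which on many systems makes `close()` send a RST instead of a FIN. The sketch below assumes a Linux or Unix-like system, where the option value is a pair of C integers; `enable_abortive_close` is a hypothetical helper name, and whether a given web server actually uses this technique is not something the capture alone can tell you.

```python
import socket
import struct


def enable_abortive_close(sock: socket.socket) -> None:
    """Ask the OS to reset the connection on close() instead of sending a FIN.

    Assumes the Unix layout of struct linger: two ints (l_onoff, l_linger).
    l_onoff=1 with l_linger=0 requests an immediate, abortive close.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
```

After this call, closing a connected TCP socket typically shows up in Wireshark as a single RST rather than the FIN and ACK exchange.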

5.1 Reset sent by the firewall

As covered in previous articles, if every SYN packet sent is answered only by a RST packet, this is a typical case of a firewall forcibly resetting the connection.

5.2 RST caused by fault

Many faults can cause RST messages. The most common ones are:

  1. When the sender has retransmitted a message five times in a row without receiving an ACK, it sends a RST to force the connection closed. The exact retry limit depends on the TCP implementation.
  2. If there is no data on the connection for several minutes, the party that opened the connection will usually send a RST. The specific time threshold and behavior depend on the specific system implementation.
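The first case can be sketched as a small simulation. The retry limit of five follows the text above, while the initial timeout of one second and its doubling are assumptions for illustration; `send_with_retransmit` and `try_send` are hypothetical names.

```python
from typing import Callable, List, Tuple


def send_with_retransmit(try_send: Callable[[], bool],
                         max_retransmissions: int = 5,
                         initial_rto: float = 1.0) -> Tuple[str, List[float]]:
    """Simulate sending one segment until it is acknowledged or given up on.

    try_send() returns True if the attempt was acknowledged.
    Returns the outcome ("acked" or "reset") and the timeouts that expired.
    """
    expired = []
    rto = initial_rto
    for _ in range(1 + max_retransmissions):  # original send plus retransmissions
        if try_send():
            return "acked", expired
        expired.append(rto)  # timer that ran out while waiting for this ACK
        rto *= 2             # assumed exponential backoff
    return "reset", expired


print(send_with_retransmit(lambda: False))
# ('reset', [1.0, 2.0, 4.0, 8.0, 16.0, 32.0])
```

In a capture, the growing gaps between retransmissions followed by a RST from the same host point to this case.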

6. Summary

The article provides an in-depth exploration of troubleshooting TCP Sliding Window Issues using Wireshark. It begins by highlighting the significance of the sliding window protocol within TCP’s flow control mechanisms and its role in ensuring effective data transmission between communicating parties. The article outlines common issues detected by Wireshark like TCP ZeroWindow, TCP WindowUpdate, and TCP WindowFull, offering detailed explanations and guidance on troubleshooting these problems. It illustrates scenarios like zero window state, where the receiver’s buffer is full, causing data transmission to pause, and strategies to diagnose these errors. Additionally, it covers TCP forced disconnections using RST messages, often used by web servers to manage multiple simultaneous connections or issues caused by firewalls. With practical troubleshooting techniques, the article aids network professionals in maintaining efficient network communication by identifying and resolving TCP sliding window complications.