Understanding Network Jitter: Diagnosing and Resolving TCP Issues with Wireshark


1. Introduction to Network Jitter

In the previous article, we provided a detailed introduction to the usage of Wireshark, focusing on how it can be used to analyze Network Jitter.

Network Troubleshooting in Practice (3) — Detailed Explanation of Wireshark Usage

The most important usage of Wireshark is, of course, diagnosing network issues.

In this article, we’ll use Wireshark to see how to approach these types of network problems.

This article draws mainly on the first four sections of Chapter 9 of “Network Analysis Using Wireshark Cookbook”.

2. Viewing TCP Connection Information and Network Jitter with Wireshark

The process of TCP connection establishment and communication should already be familiar to you, and you can also refer to an article I wrote previously:

Transmission Control Protocol — TCP

2.1 Connection Establishment with Network Jitter

As shown in the diagram, these three lines represent the process of TCP’s three-way handshake:

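First, the client’s TCP process sends a SYN packet with an initial sequence number (Seq) of 0. Wireshark displays relative sequence numbers by default, so the 0 stands in for the real, randomly chosen initial sequence number. Wireshark also shows further details carried in the SYN, such as the MSS and Selective ACK options: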

Among this information, you might be more interested in:

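  • Maximum Segment Size (MSS): the largest TCP payload, excluding the TCP and IP headers, that the sender of the option is willing to receive in a single segment.
  • Window Scale (WSopt): a shift count applied to the 16-bit window field, which allows receive windows larger than 64 KB.
  • SACK, or Selective ACK: lets the receiver report exactly which blocks of data arrived, so the sender retransmits only the lost segments. It is enabled only if both ends support it.
  • Timestamps option (TSopt): timestamps carried in each segment, used to measure the round-trip time between the client and server.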
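The effect of the window scale option can be illustrated with a little arithmetic. The following is a minimal sketch, not taken from the original article: the function name and the sample values (a window field of 502 and a shift count of 7) are made up for illustration. Note that the window field in a SYN segment itself is never scaled; the shift applies to the segments that follow the handshake.

```python
def effective_window(window_field: int, scale_shift: int) -> int:
    """Return the receive window in bytes after applying the window scale shift."""
    if not 0 <= window_field <= 0xFFFF:
        raise ValueError("the TCP window field is a 16-bit value")
    if not 0 <= scale_shift <= 14:
        raise ValueError("the window scale shift count is limited to 14")
    # A shift count of n multiplies the advertised window by 2**n.
    return window_field << scale_shift


# Hypothetical values: window field 502, shift count 7 (multiplier 128).
print(effective_window(502, 7))
```

Wireshark performs the same calculation and shows the result as the calculated window size, provided the capture includes the handshake in which the shift count was negotiated.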

The second line of the message is the server’s ACK to the client’s SYN packet, and it also contains the server’s SYN information.

The packet contains the server’s initial sequence number and the server’s window size information.

In the third line, apart from the sequence number of the client’s packet, the client’s ACK packet specifies the client’s window size again.
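The sequence and acknowledgment numbers of the three packets above follow a fixed relationship, which can be checked mechanically. The sketch below is an illustration under assumptions: the `Segment` type and `is_valid_handshake` function are hypothetical names, and the packets are represented by hand rather than read from a capture.

```python
from typing import NamedTuple

SEQ_SPACE = 2 ** 32  # TCP sequence numbers wrap around at 32 bits


class Segment(NamedTuple):
    flags: frozenset  # for example frozenset({"SYN", "ACK"})
    seq: int
    ack: int


def next_seq(seq: int) -> int:
    """A SYN consumes one sequence number, so the peer acknowledges seq + 1."""
    return (seq + 1) % SEQ_SPACE


def is_valid_handshake(syn: Segment, syn_ack: Segment, ack: Segment) -> bool:
    """Check the flag and number relationships of a three-way handshake."""
    return (
        syn.flags == {"SYN"}
        and syn_ack.flags == {"SYN", "ACK"}
        and syn_ack.ack == next_seq(syn.seq)
        and ack.flags == {"ACK"}
        and ack.seq == next_seq(syn.seq)
        and ack.ack == next_seq(syn_ack.seq)
    )


# Relative numbers as Wireshark displays them by default.
syn = Segment(frozenset({"SYN"}), seq=0, ack=0)
syn_ack = Segment(frozenset({"SYN", "ACK"}), seq=0, ack=1)
ack = Segment(frozenset({"ACK"}), seq=1, ack=1)
print(is_valid_handshake(syn, syn_ack, ack))
```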

2.2 Troubleshooting Issues

Quite simply, if the capture shows that the client sends a SYN packet and the server either does not reply or replies with an RST packet, the connection is never established. No reply usually means the SYN or its answer is being dropped, typically by a firewall, or the host is unreachable. An RST usually means nothing is listening on that port or the server is actively rejecting the connection.

After confirming that both the client and the server are running properly, check the firewall configuration and confirm that the IP address and port you’re trying to access are correct. If the connection is established but the session still fails, verify that the username and password you transmitted are correct.

You might use the ping command to check whether the server is online. In many cases, however, a firewall blocks ICMP packets, so a failed ping does not mean the server is down.
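Because ping may be blocked, a TCP connection attempt to the port in question is often a more telling check. The following is a rough sketch using Python’s standard `socket` module; the function names and the result labels are hypothetical, and the mapping from errors to causes is a simplification of the cases discussed above.

```python
import socket
from typing import Optional


def classify_connect_error(exc: Optional[BaseException]) -> str:
    """Map the outcome of a connection attempt to a coarse diagnosis."""
    if exc is None:
        return "open"      # handshake completed
    if isinstance(exc, ConnectionRefusedError):
        return "refused"   # peer answered the SYN with an RST
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "no reply"  # SYN unanswered: filtered or host unreachable
    return "other"         # DNS failure, no route, and so on


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt a TCP handshake and report how it ended."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return classify_connect_error(None)
    except OSError as exc:
        return classify_connect_error(exc)
```

Calling `probe("192.0.2.10", 443)` (a placeholder address and port) would return one of the four labels. A "refused" result points at the listening service, while "no reply" points at filtering or reachability.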

3. TCP Retransmission

One of the most common issues during TCP communication is TCP retransmission.

TCP retransmission is an important mechanism used by TCP to recover from damaged, lost, duplicated, or out-of-order packets. If the sender does not receive an acknowledgment of the sent packet within a certain time, it will trigger a retransmission.
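The "certain time" is the retransmission timeout (RTO), which TCP derives from measured round-trip times. The sketch below follows the standard smoothed estimator described in RFC 6298, not any particular operating system; the class name is hypothetical, and real stacks apply their own minimum and maximum bounds.

```python
class RtoEstimator:
    """Smoothed RTT estimator in the style of RFC 6298 (times in seconds)."""

    ALPHA = 1 / 8  # gain for the smoothed RTT
    BETA = 1 / 4   # gain for the RTT variation
    K = 4          # multiplier applied to the variation

    def __init__(self, min_rto: float = 1.0):
        self.srtt = None
        self.rttvar = None
        self.min_rto = min_rto  # lower bound; the value differs between stacks

    def update(self, rtt: float) -> float:
        """Feed one RTT measurement and return the new timeout."""
        if self.srtt is None:
            # First measurement initializes both estimators.
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # The variation is updated before the smoothed RTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        return max(self.min_rto, self.srtt + self.K * self.rttvar)
```

Because the variation term is multiplied by four, unstable round-trip times inflate the timeout quickly, which is one reason jitter and retransmission behavior are so closely linked.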

As a rule of thumb, a retransmission rate of 0.5% already affects performance seriously, and at around 5% connections are likely to be interrupted.
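The two thresholds above are easy to apply to packet counts taken from a capture. This is a minimal sketch; the function names and the labels "normal", "degraded", and "severe" are hypothetical, and the thresholds are simply the rule-of-thumb figures quoted in the text.

```python
def retransmission_rate(retransmitted: int, total: int) -> float:
    """Return the share of retransmitted packets as a percentage."""
    if total <= 0:
        raise ValueError("total packet count must be positive")
    return 100.0 * retransmitted / total


def assess(rate_percent: float) -> str:
    """Classify a retransmission rate against the 0.5% and 5% thresholds."""
    if rate_percent >= 5.0:
        return "severe"
    if rate_percent >= 0.5:
        return "degraded"
    return "normal"


# Hypothetical capture: 12 retransmissions among 2000 TCP packets.
print(assess(retransmission_rate(12, 2000)))
```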

In Wireshark, retransmitted packets are marked as TCP Retransmission.

To list all retransmitted packets in the current capture, apply a display filter on the expert message:

expert.message == "Retransmission (suspected)"

In recent Wireshark versions, the analysis flag tcp.analysis.retransmission is a more direct alternative.

As shown in the figure:

3.1 Case 1: Retransmission to Multiple Destinations

If, as in the figure above, the retransmissions are not concentrated on one destination but spread across multiple destination servers, the cause is usually on the link, for example a heavily loaded network card.
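Whether retransmissions are spread out or concentrated can also be checked outside the GUI. The sketch below assumes the `tshark` command-line tool is installed and uses the `tcp.analysis.retransmission` display filter rather than the expert message filter; the function names and the 80% cut-off are arbitrary choices made for illustration.

```python
import subprocess
from collections import Counter

RETRANS_FILTER = "tcp.analysis.retransmission"


def tshark_command(capture_path: str) -> list:
    """Build a tshark call that prints the destination IP of each retransmission."""
    return ["tshark", "-r", capture_path, "-Y", RETRANS_FILTER,
            "-T", "fields", "-e", "ip.dst"]


def retransmission_destinations(capture_path: str) -> list:
    """Run tshark and return one destination address per retransmitted packet."""
    result = subprocess.run(tshark_command(capture_path),
                            capture_output=True, text=True, check=True)
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def describe_spread(destinations: list, dominant_share: float = 0.8) -> str:
    """Report whether one destination accounts for most retransmissions."""
    counts = Counter(destinations)
    if not counts:
        return "no retransmissions"
    top_count = counts.most_common(1)[0][1]
    if top_count / len(destinations) >= dominant_share:
        return "concentrated"
    return "spread"
```

A "spread" result corresponds to this case, while "concentrated" suggests looking at the single destination involved.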

The IO Graph option under Wireshark’s Statistics menu plots the traffic volume over time, so you can see whether the communication on the current machine has reached the capacity of the network card.

If, like the figure above, the network card load is not high, it could be due to a fault in the network card or link, or other high-load links occupying bandwidth.

You can log in to the network devices along the link to check their packet loss rates.
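The load check that IO Graph provides can be approximated offline from packet timestamps and frame lengths. This is a minimal sketch assuming the packets have already been exported as (timestamp in seconds, frame length in bytes) pairs; the function name and the sample link speed are hypothetical.

```python
from collections import defaultdict


def utilization_per_second(packets, link_bps: float) -> dict:
    """Return link utilization in percent for each one-second bucket.

    packets: iterable of (timestamp_seconds, frame_length_bytes) pairs.
    link_bps: nominal link capacity in bits per second.
    """
    bits = defaultdict(int)
    for timestamp, length in packets:
        bits[int(timestamp)] += length * 8  # bucket by whole second
    return {second: 100.0 * total / link_bps
            for second, total in sorted(bits.items())}


# Hypothetical sample: three frames on a 100 kbit/s link.
sample = [(0.1, 1250), (0.9, 1250), (1.5, 2500)]
print(utilization_per_second(sample, link_bps=100_000))
```

Buckets that stay well below 100% while retransmissions continue point away from local saturation and toward a faulty card, a faulty link, or congestion elsewhere on the path.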

3.2 Case 2: Retransmission Only to the Same Destination

In a situation like this, where all retransmissions are concentrated on the same destination, it’s usually caused by the low processing performance of the application itself.

To further confirm if this is the cause, you can check by following these steps:

  1. As introduced in the previous section, use the IO Graph provided by Wireshark to check whether the network load is too high.
  2. Open the Conversations window from the Statistics menu. In the IPv4 tab, check the Limit to display filter box to see only the conversations in which retransmissions occurred.
  3. In the same window, switch to the TCP tab and check Limit to display filter again to see which ports are involved. This tells you which application is affected and helps pinpoint the specific issue.

Pay special attention to whether the retransmissions follow a regular period or are triggered by an event. In the image below, for instance, a retransmission occurs approximately every 30 ms, which coincides with the client performing a certain operation in the software. This indicates that the operation likely triggered the slow request.
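Periodicity can be tested by looking at the gaps between retransmission timestamps. The sketch below treats the series as periodic when the gaps vary little relative to their mean; the function names, the 20% tolerance, and the sample timestamps are assumptions made for illustration.

```python
from statistics import mean, pstdev


def retransmission_intervals(timestamps) -> list:
    """Return the gaps between consecutive retransmission timestamps."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def looks_periodic(timestamps, tolerance: float = 0.2) -> bool:
    """True if the gaps are nearly constant (low coefficient of variation)."""
    gaps = retransmission_intervals(timestamps)
    if len(gaps) < 3:
        return False  # too few samples to judge
    average = mean(gaps)
    if average <= 0:
        return False
    return pstdev(gaps) / average <= tolerance


# Hypothetical timestamps in milliseconds, roughly 30 ms apart.
print(looks_periodic([0, 30, 61, 90, 121]))
```

A periodic pattern is worth correlating with timers, polling loops, or scheduled jobs in the application, while an irregular pattern points toward event-triggered causes.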

3.3 Case 3: Application Unresponsiveness Leading to Retransmission

If multiple retransmissions occur immediately after a SYN or ACK packet is sent during connection establishment, and the interval between retransmissions grows each time because the sender backs off its retransmission timer, the usual cause is an unresponsive application.

In such circumstances, troubleshoot the reasons for application unresponsiveness. After 15 to 20 seconds, the application may attempt to re-establish the connection, or you can manually restart the application to retry connection establishment.
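The growing intervals described above are the signature of the sender backing off its retransmission timer. The following sketch checks for that pattern in a list of retransmission timestamps; the function name and the minimum growth ratio of 1.5 are hypothetical, chosen to tolerate some measurement noise around the typical doubling.

```python
def looks_like_backoff(timestamps, min_ratio: float = 1.5) -> bool:
    """True if each gap between retransmissions is clearly longer than the last."""
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if len(gaps) < 2:
        return False  # need at least two gaps to compare
    return all(previous > 0 and current / previous >= min_ratio
               for previous, current in zip(gaps, gaps[1:]))


# Hypothetical SYN retransmissions at 1 s, 3 s, 7 s and 15 s after the first SYN.
print(looks_like_backoff([0, 1, 3, 7, 15]))
```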

3.4 Case 4: Retransmission Caused by Network Jitter

TCP itself has mechanisms such as Nagle’s algorithm, the sliding window, slow start, congestion avoidance, and fast recovery to prevent network congestion.

However, network jitter poses a significant problem for the TCP protocol and often triggers TCP retransmissions.

To confirm this issue, ping the destination address and observe how much the reported time value fluctuates from one reply to the next.

You can check:

  1. Whether the link is congested and whether its status is stable.
  2. Whether the server hosting the application lacks resources, has hardware faults, or is inadequately configured.
  3. Whether any devices in the network link are overloaded or short of resources.
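The fluctuation seen in the ping times can be put into a single number. The sketch below computes jitter as the mean absolute difference between consecutive round-trip times, which is one common definition among several; the function name and the sample values are made up.

```python
from statistics import mean


def jitter_ms(rtts_ms) -> float:
    """Mean absolute difference between consecutive round-trip times."""
    if len(rtts_ms) < 2:
        raise ValueError("need at least two RTT samples")
    return mean(abs(later - earlier)
                for earlier, later in zip(rtts_ms, rtts_ms[1:]))


# Hypothetical ping results in milliseconds, with one spike.
print(jitter_ms([20, 22, 19, 45, 21]))
```

A value that is small relative to the average round-trip time suggests a stable path, whereas a value of the same order as the round-trip time itself supports the jitter hypothesis.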

4. Summary

Overall, the problems mentioned above can be approached with the following considerations:

  1. Scope: Is the problem associated with a particular host, a specific TCP connection, or a particular behavior?
  2. Step-by-step Investigation: Is the link overloaded? Are there packet losses in the link? Are there performance issues on the server or client host? Are there performance issues with the application?
  3. Final Question: Is the problem caused by network jitter?

In my experience, most performance problems originate at the business layer, which means application code is causing them. Therefore, the first thing to check is whether the application code was modified around the time the problem appeared in a way that could explain it. Only after thoroughly ruling this out should you invest effort in capturing and analyzing network link issues with tools. Otherwise, you may spend that effort in the wrong direction.

Typically, problems are not caused by network jitter, even though it is often the easiest thing to blame. More often than not, attributing an issue to network jitter is merely a sign of laziness.