Troubleshooting Asymmetric Network Path and Packet Loss with Wireshark

Background

The forward and return paths between two hosts are not always the same. In networking terms, the path is asymmetric. Network engineers will recognize the situation immediately. Put simply:

A and B communicate; C and D each stand for one or more intermediate devices on the path.

The direction A -> B follows the path A -> C -> B

The direction B -> A follows the path B -> D -> A

Such scenarios are quite common, and on their own they do not disrupt normal communication.

As the title indicates, this case is a packet loss problem in exactly this scenario. The cause turned out to be clear-cut, so I will briefly share the analysis process.

Case study from SharkFest 2011 “Packet Trace Whispering”

Problem Information

The basic information of the packet trace file is as follows:

The trace file was captured by tcpdump on Linux. It is small: only 71 packets, each truncated to 67 bytes. The file data size is 13K bytes, the capture duration is 11.64 seconds, and the average rate is 9135 bps.
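As a quick sanity check on these figures, the average rate is simply the data size in bits divided by the capture duration. The sketch below works backwards from the reported rate and duration to the implied byte count; the function names are illustrative and not part of any tool.

```python
def average_bps(total_bytes: float, duration_s: float) -> float:
    """Average bit rate: total bytes converted to bits, divided by duration."""
    return total_bytes * 8 / duration_s


def bytes_from_rate(rate_bps: float, duration_s: float) -> float:
    """Inverse of average_bps: byte count implied by a rate and a duration."""
    return rate_bps * duration_s / 8


# Figures reported in the capture summary above.
DURATION_S = 11.64
REPORTED_BPS = 9135

implied_bytes = bytes_from_rate(REPORTED_BPS, DURATION_S)
print(round(implied_bytes))  # 13291, i.e. roughly the 13K bytes reported
```

The implied size of about 13,291 bytes agrees with the rounded 13K figure, so the three reported numbers are mutually consistent.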

In the conversation statistics, we can see 1 TCP flow, client 192.168.1.1 -> server 10.10.10.10.

The expert information is as follows. It shows a number of (suspected) retransmissions and (suspected) spurious retransmissions, which is consistent with packet loss.

Problem Analysis

Opening the packet trace file shows the following packet details:

TCP Stream 0 does not include the packets of the TCP three-way handshake. Even so, the TTL field value of 128 suggests that the capture point is on the server or close to it. The RTT is about 0.1 ms, and the transmission pattern is a steady exchange of one data segment followed by one ACK.
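The TTL reasoning can be sketched as follows. The sketch assumes that senders start from one of the common initial TTL values (64, 128, or 255) and that the TTL of 128 was observed on packets sent by the server; neither assumption is stated explicitly in the trace, and the helper name is hypothetical.

```python
from typing import Tuple

# Assumption: common default initial TTL values used by operating systems.
COMMON_INITIAL_TTLS = (64, 128, 255)


def estimate_hops(observed_ttl: int) -> Tuple[int, int]:
    """Return (assumed_initial_ttl, hops_travelled) for an observed TTL.

    Picks the smallest common initial TTL that is >= the observed value.
    Each router on the path decrements the TTL by one.
    """
    if not 1 <= observed_ttl <= 255:
        raise ValueError("TTL must be between 1 and 255")
    for initial in COMMON_INITIAL_TTLS:
        if observed_ttl <= initial:
            return initial, initial - observed_ttl
    raise ValueError("unreachable for a valid TTL")


print(estimate_hops(128))  # (128, 0): no router between sender and capture point
print(estimate_hops(126))  # (128, 2): two routed hops away
```

An undecremented TTL of 128 means zero routed hops between the sender and the capture point, which is why the capture is judged to be at or next to that host.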

Clicking the black area in the lower right corner jumps straight to the problem packets, where you can see TCP retransmissions and spurious retransmissions.

You can also use the following display filter expression to quickly isolate the packets that TCP analysis has flagged as abnormal, which is a commonly used technique.
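The filter expression itself is not reproduced in this text. One widely used display filter field is `tcp.analysis.flags`, which matches packets that Wireshark's TCP analysis has annotated; treating it as the expression used in this case is an assumption. The sketch below builds the equivalent `tshark` command line, with `trace.pcap` as a hypothetical file name.

```python
import subprocess
from typing import List

# Assumption: this may not be the exact filter used in the original analysis.
DEFAULT_FILTER = "tcp.analysis.flags"


def build_tshark_command(pcap_path: str,
                         display_filter: str = DEFAULT_FILTER) -> List[str]:
    """Build a tshark command: -r reads a capture file, -Y applies a display filter."""
    return ["tshark", "-r", pcap_path, "-Y", display_filter]


def flagged_packets(pcap_path: str) -> List[str]:
    """Run tshark (must be installed) and return one summary line per match."""
    result = subprocess.run(build_tshark_command(pcap_path),
                            capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


# "trace.pcap" is a placeholder; only the command is built here, nothing is run.
print(build_tshark_command("trace.pcap"))
```

The same expression can be typed directly into the Wireshark display filter bar.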

There are 10 matching packets in total, including TCP retransmissions from the server 10.10.10.10 and TCP spurious retransmissions from the client 192.168.1.1. Why are there two such different kinds of retransmission?

A closer look at the TCP details reveals the following main points:

1. TCP retransmission on server 10.10.10.10

Packets up to and including No.47-48 are exchanged normally. Starting with No.49 (Seq 2904), however, no ACK is received, so a timeout retransmission (No.50) occurs after about 300 ms. Still no ACK arrives, and the timeout retransmissions continue at intervals of 300 ms, 600 ms, 1.2 s, 1.2 s, 1.2 s, and 2.4 s.

Notably, some of these timeout retransmissions also carry new data, so the TCP Len keeps growing, yet none of them is acknowledged.
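To compare the observed server intervals with a textbook doubling backoff, the sketch below generates an ideal doubling series and computes the ratio between successive observed intervals. It is an illustration only: the function names are hypothetical, and it does not try to explain why some intervals repeat instead of doubling.

```python
from typing import List


def doubling_backoff_ms(initial_ms: int, retries: int) -> List[int]:
    """Ideal exponential backoff: each timeout is twice the previous one."""
    return [initial_ms * (2 ** i) for i in range(retries)]


def successive_ratios(intervals_ms: List[int]) -> List[float]:
    """Ratio of each interval to the one before it (2.0 means it doubled)."""
    return [b / a for a, b in zip(intervals_ms, intervals_ms[1:])]


# Server retransmission intervals read from the trace, in milliseconds.
OBSERVED_SERVER_MS = [300, 600, 1200, 1200, 1200, 2400]

print(doubling_backoff_ms(300, 4))            # [300, 600, 1200, 2400]
print(successive_ratios(OBSERVED_SERVER_MS))  # [2.0, 2.0, 1.0, 1.0, 2.0]
```

The ratios show that the timer doubles at most steps but stays at 1.2 s for three consecutive retransmissions, so the server series is backoff-like without being a pure doubling sequence.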

2. TCP spurious retransmission on client 192.168.1.1

Unlike the initial pattern of one data segment followed by one ACK, the server 10.10.10.10 now keeps sending data in one direction without getting any response. The client 192.168.1.1 then sends a data segment with Len 11 at No.58, and the server 10.10.10.10 acknowledges it normally. Even so, the client retransmits on timeout 200 ms later, and the pattern repeats, with retransmissions at intervals of 200 ms, 400 ms, 800 ms, and 1.6 s.

Why does Wireshark mark these as spurious (false) retransmissions? Because the trace file contains both the original data segment and the ACK that acknowledges it, Wireshark infers from context that the retransmission was unnecessary.
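The distinction between the two kinds of retransmission can be sketched with a simplified model. This is not Wireshark's actual implementation: it only tracks, per sender, the highest byte already sent and the highest byte already acknowledged in the trace. Seq 2904 and Len 11 come from the trace; the other sequence numbers and lengths are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Segment:
    sender: str
    receiver: str
    seq: int     # sequence number of the first payload byte
    length: int  # payload bytes (0 for a pure ACK)
    ack: int     # cumulative ACK number for the receiver's data


def classify(trace: List[Segment]) -> List[str]:
    """Label each segment as seen from the capture point (simplified model)."""
    highest_sent: Dict[str, int] = {}   # sender -> highest seq+len seen
    highest_acked: Dict[str, int] = {}  # sender -> highest ACK seen for its data
    labels = []
    for seg in trace:
        label = "ack" if seg.length == 0 else "data"
        if seg.length > 0:
            end = seg.seq + seg.length
            if end <= highest_acked.get(seg.sender, 0):
                # The trace already contains an ACK covering this data.
                label = "spurious retransmission"
            elif end <= highest_sent.get(seg.sender, 0):
                # Seen before, but never acknowledged in the trace.
                label = "retransmission"
            highest_sent[seg.sender] = max(highest_sent.get(seg.sender, 0), end)
        # The ACK field acknowledges data sent by the other side.
        highest_acked[seg.receiver] = max(highest_acked.get(seg.receiver, 0), seg.ack)
        labels.append(label)
    return labels


trace = [
    Segment("server", "client", 2904, 100, 1),   # server data, never ACKed
    Segment("server", "client", 2904, 100, 1),   # server resends it
    Segment("client", "server", 1, 11, 2904),    # client data, Len 11
    Segment("server", "client", 3004, 0, 12),    # server ACKs the client data
    Segment("client", "server", 1, 11, 2904),    # client resends anyway
]
print(classify(trace))
```

In this model the server's repeat is an ordinary retransmission because no ACK for it appears in the trace, while the client's repeat is spurious because the server's ACK is already in the trace.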

In fact, once you recall the asymmetric network path described at the beginning, the whole process becomes easy to understand.

a. The data segments sent by the server are never acknowledged, so the server performs TCP timeout retransmissions. Note that what is lost here is the data segments sent by the server;

b. In the client -> server direction, data segments are sent and received normally, but the ACKs returned by the server never reach the client, so the client also performs TCP timeout retransmissions. Note that what is lost here is the ACKs sent by the server;

c. The root cause therefore lies in the server -> client direction: from a certain point in time onward, no packet sent in that direction reaches the client.

After a long period of continuous tracking, we finally found that the problem was caused by a software bug in a switch engine on the one-way (server -> client) path.

Summary of the problem

Packet analysis alone may not identify the root cause, but it can point us in the right direction.