MTU Troubleshooting in WAN Environments: A Comparative Wireshark Analysis

Preface

Here’s another case of web page access failure: Case 2 from Wireshark Sharkfest 2018, Point And Shoot Packet. The case, titled “Cannot See Homepage,” involves complaints from users in OSAKA who could reach some websites, such as Google, but not others, such as Apple. Users in TOKYO reported no such problems. It is a clear example of MTU troubleshooting and shows how network behavior can differ between sites.

Conclusion first: the problem is caused by an MTU issue in the WAN environment. MTU-related problems have come up often in previous case analyses, and they are generally among the easier ones to analyze. So why write another article? Mainly because this case is built around comparative analysis, with packets captured at several relevant points. Because the packet trace files are complete, the essence of the problem can be seen clearly.

Problem Information

For troubleshooting, the user first captured one pcap file on the OSAKA LAN and one on the TOKYO LAN: 2_OSAKA_FAIL_LAN.pcap and 2_TOKYO_SUCCESS_LAN.pcap.

The basic information of the two packet trace files is as follows:

2_OSAKA_FAIL_LAN.pcap was captured with tcpdump without truncation. It contains only 5 packets, spans only 0.1 seconds, and has an average rate of 46 kbps. Given the small packet count and very short duration, the user presumably filtered the capture and kept only the packets relevant to the problem.

2_TOKYO_SUCCESS_LAN.pcap was also captured with tcpdump without truncation. It contains 233 packets, spans 1.3 seconds, and has an average rate of 1263 kbps.

Wireshark’s Expert Information shows no obvious problems in either trace file, so the packets themselves need to be analyzed.

Problem Analysis

Open the packet lists of the OSAKA and TOKYO LAN captures and compare them as follows:


1. OSAKA LAN

The packet trace file was captured on the client side. After the TCP three-way handshake and the client’s GET request, the server responded only with an ACK. The trace ends after these 5 packets, and no data response from the server was ever received. With so few packets and no other symptoms, the problem cannot be determined from this trace alone, so it has to be compared with the successful trace.

2. TOKYO LAN

The packet trace file was also captured on the client side. Here, after the TCP three-way handshake, the server not only acknowledges the client’s GET request with an ACK but also sends data in segments No.6-7 (TCP Len 1414). The HTTP exchange completes with 200 OK.

3. Comparative Analysis

During the TCP three-way handshake, the client and server each advertise options such as MSS, window scale (WS) factor, and SACK. Comparing the two traces, the MSS in the SYN sent by the client is 1460 in both OSAKA and TOKYO, but in the SYN/ACK received from the server the MSS is 1460 in OSAKA and 1414 in TOKYO. This difference is what later causes some data segments to fail in transit.
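The MSS values compared here are carried in the options field of the SYN and SYN/ACK segments. As a minimal sketch of how that field is laid out, the Python function below walks a raw options byte string and pulls out MSS, window scale, and SACK-permitted. The sample byte strings are hypothetical and hand-built to mirror the values in this case (the window scale of 7 is an arbitrary assumption); they are not extracted from the actual pcap files.

```python
import struct


def parse_tcp_options(options: bytes) -> dict:
    """Walk the TCP options field (kind/length/value records) and return known options."""
    found = {}
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:  # End of Option List
            break
        if kind == 1:  # No-Operation, a single padding byte
            i += 1
            continue
        if i + 1 >= len(options):
            break  # truncated record
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            break  # malformed record
        value = options[i + 2 : i + length]
        if kind == 2 and length == 4:  # Maximum Segment Size
            found["mss"] = struct.unpack("!H", value)[0]
        elif kind == 3 and length == 3:  # Window Scale
            found["window_scale"] = value[0]
        elif kind == 4 and length == 2:  # SACK Permitted
            found["sack_permitted"] = True
        i += length
    return found


def build_options(mss: int) -> bytes:
    """Hypothetical helper: MSS, NOP, WS=7 (assumed), NOP, NOP, SACK-permitted."""
    return bytes([2, 4]) + struct.pack("!H", mss) + bytes([1, 3, 3, 7, 1, 1, 4, 2])


osaka_synack = parse_tcp_options(build_options(1460))
tokyo_synack = parse_tcp_options(build_options(1414))
print("OSAKA SYN/ACK MSS:", osaka_synack["mss"])
print("TOKYO SYN/ACK MSS:", tokyo_synack["mss"])
```

In Wireshark the same values are visible directly in the TCP options of the SYN and SYN/ACK packets, so a parser like this is only useful for understanding the layout or for scripted comparison.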

OSAKA: Both the client and the server advertise an MSS of 1460, so both sides believe that TCP segments of up to 1460 bytes can be carried for the rest of the connection. As in TOKYO, the server sends its data response after receiving the client’s GET request. These are the packets No.6-7 that the client expects to receive, with a Frame Length of 1514 (14-byte Ethernet header + 20-byte IPv4 header + 20-byte TCP header + 1460 TCP Len). Because they exceed the smallest MTU on the WAN path, No.6-7 are dropped in transit, so the client receives no further packets and the connection falls silent after the server’s ACK in No.5.

TOKYO: The client’s SYN advertises an MSS of 1460 and the server’s SYN/ACK, as received by the client, advertises 1414, so both sides end up using a maximum TCP segment length of 1414. After receiving the client’s GET request, the server therefore sends its data response, No.6-7, with a Frame Length of 1468 (14-byte Ethernet header + 20-byte IPv4 header + 20-byte TCP header + 1414 TCP Len), which fits within the smallest MTU on the WAN path. No.6-7 are delivered, and the client receives them and completes the exchange.
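The frame-length arithmetic for the two sites can be checked with a few lines of Python. This is only a sketch: the article does not state the actual path MTU, so `ASSUMED_PATH_MTU` below is an assumption inferred from the MSS of 1414 plus 40 bytes of IPv4 and TCP headers, and the header sizes assume no IP or TCP options on data segments, as in the calculation above.

```python
ETHERNET_HEADER = 14  # bytes
IPV4_HEADER = 20      # bytes, no IP options
TCP_HEADER = 20       # bytes, no TCP options on data segments

# Assumption: not stated in the article, inferred as 1414 + 20 + 20.
ASSUMED_PATH_MTU = 1454


def frame_length(tcp_len: int) -> int:
    """Ethernet frame length as shown by Wireshark for a given TCP payload size."""
    return ETHERNET_HEADER + IPV4_HEADER + TCP_HEADER + tcp_len


def ip_packet_length(tcp_len: int) -> int:
    """Size of the IP packet, which is what gets compared against the MTU."""
    return IPV4_HEADER + TCP_HEADER + tcp_len


def fits_path(tcp_len: int, path_mtu: int = ASSUMED_PATH_MTU) -> bool:
    return ip_packet_length(tcp_len) <= path_mtu


for site, tcp_len in (("OSAKA", 1460), ("TOKYO", 1414)):
    print(site, "frame:", frame_length(tcp_len),
          "IP packet:", ip_packet_length(tcp_len),
          "fits assumed path MTU:", fits_path(tcp_len))
```

Note that the MTU check applies to the IP packet, not the Ethernet frame, which is why the 14-byte Ethernet header is left out of `ip_packet_length`.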

So far, comparing the failing OSAKA LAN trace with the successful TOKYO LAN trace makes it clear that the web page access failure is an MTU problem. But what is the root cause? When accessing the same server, OSAKA users fail while TOKYO users succeed. Could the server be playing favorites, answering OSAKA users with an MSS of 1460 and TOKYO users with an MSS of 1414? Is the problem on the server? Why would the two sites be treated differently? That cannot be ruled out completely, but in most cases like this the problem lies in the network in between.

To pin down the transmission problem on the WAN, the user also captured packets on the WAN side in OSAKA and TOKYO: 2_OSAKA_FAIL_WAN.pcap and 2_TOKYO_SUCCESS_WAN.pcap. The capture points are as follows:

Client (LAN capture point) — Local router (WAN capture point) — WAN — Server

1. OSAKA LAN and WAN

The MSS advertised by the client and by the server is 1460 in both directions, and the router does not modify it in transit. Both sides therefore end up using an MSS of 1460.

2. TOKYO LAN and WAN

Once again, through comparative analysis, we can clearly find the problem. The following describes the complete process:

a. The client advertises an MSS of 1460 locally (LAN SYN). The local router rewrites it to 1414 as the SYN passes through (WAN SYN), so the server receives a client SYN with MSS 1414;

b. The server advertises an MSS of 1460 (WAN SYN/ACK). The client’s local router rewrites it to 1414 as the SYN/ACK passes through (LAN SYN/ACK), so the client receives a server SYN/ACK with MSS 1414.

The key point here is that the TCP options in the three-way handshake are a notification, not a negotiation. If they were negotiated, the server would compare the client’s SYN MSS of 1414 with its own MSS of 1460, pick the smaller value, and send 1414 in its SYN/ACK. In fact it sends 1460, which shows that each side simply announces its own value.

c. In the end, both the client and the server use an MSS of 1414. The client bases its choice on the SYN/ACK MSS of 1414 and the server on the SYN MSS of 1414, each taking the smaller of its own value and the one received.
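Steps a to c can be modeled in a few lines of Python. This is a simplified sketch, not a TCP implementation: `through_router` and `handshake` are hypothetical names, the router is assumed to rewrite the MSS option in both directions to the same clamp value, and each host is assumed to limit its outgoing segments to the smaller of its own MSS and the MSS it received.

```python
from typing import Optional


def through_router(advertised_mss: int, clamp: Optional[int]) -> int:
    """MSS carried by a SYN or SYN/ACK after crossing the router (None = no rewriting)."""
    return advertised_mss if clamp is None else min(advertised_mss, clamp)


def handshake(client_mss: int, server_mss: int, clamp: Optional[int]) -> dict:
    # Notification, not negotiation: each side announces its own MSS
    # and never echoes the value it received from the peer.
    syn_at_server = through_router(client_mss, clamp)
    synack_at_client = through_router(server_mss, clamp)
    return {
        "syn_at_server": syn_at_server,
        "synack_at_client": synack_at_client,
        # Each side sends segments no larger than the smaller of the two values it knows.
        "client_sends_up_to": min(client_mss, synack_at_client),
        "server_sends_up_to": min(server_mss, syn_at_server),
    }


osaka = handshake(client_mss=1460, server_mss=1460, clamp=None)
tokyo = handshake(client_mss=1460, server_mss=1460, clamp=1414)
print("OSAKA:", osaka)
print("TOKYO:", tokyo)
```

With no clamping the server is free to send 1460-byte segments, which matches the OSAKA trace; with a clamp of 1414 both directions are limited to 1414, which matches TOKYO.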

Summary of the problem

The problem is an MTU problem, and the root cause is that the OSAKA router did not adjust the MSS. Full-size segments therefore exceeded the MTU of the WAN path and were dropped, which users experienced as web pages failing to load. In environments that add extra headers, such as PPPoE or IPsec VPN, the MSS or MTU must be adjusted to match the actual path so that traffic can pass normally.
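As a rough illustration of how an MSS value can be derived for a link that adds headers, the sketch below subtracts the encapsulation overhead and the IPv4 and TCP headers from the Ethernet MTU. The function name is hypothetical. The 8-byte PPPoE figure (6-byte PPPoE header plus 2-byte PPP protocol field) is standard, but IPsec overhead depends on mode and cipher, so it is left as a parameter rather than a fixed number.

```python
IPV4_HEADER = 20     # bytes, no IP options
TCP_HEADER = 20      # bytes, no TCP options
PPPOE_OVERHEAD = 8   # 6-byte PPPoE header + 2-byte PPP protocol field


def mss_for_link(ethernet_mtu: int, encapsulation_overhead: int) -> int:
    """Largest TCP payload that fits once the encapsulation headers are added."""
    if encapsulation_overhead < 0:
        raise ValueError("overhead must be non-negative")
    effective_mtu = ethernet_mtu - encapsulation_overhead
    return effective_mtu - IPV4_HEADER - TCP_HEADER


print("Plain Ethernet MSS:", mss_for_link(1500, 0))
print("PPPoE MSS:", mss_for_link(1500, PPPOE_OVERHEAD))
```

The value actually configured on a router should come from the measured path MTU of the environment in question, not from a generic table.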