DHCP Troubleshooting: Analyzing DHCP Failures with Wireshark

Preface

This is a straightforward DHCP troubleshooting case from the “Point and ShootPacket” session at Wireshark Sharkfest 2018. In Case 1, “CATV Box”, the CATV boxes of some hotel customers started up normally, while the boxes of other customers failed to start properly. The case shows how DHCP problems can affect device connectivity in a network.

For troubleshooting, the user captured two pcap files, one DHCP_SUCCESS and one DHCP_FAIL.

Problem Information

The basic information of the packet trace file is as follows:

The DHCP FAIL file was captured with tcpdump without truncation. It contains only 6 packets spanning 4.4 seconds, with an average rate of 4791 bps. The DHCP SUCCESS file was also captured with tcpdump without truncation. It contains 8 packets spanning 29.4 seconds, with an average rate of 1025 bps. Wireshark's Expert Information shows no entries for either file. The trace files appear to have been anonymized and tightly filtered, so that only the packets related to the DHCP problem were retained.

Problem Analysis

A frequently used method in network protocol fault analysis is comparison. Comparing a capture taken during normal operation with one taken while the problem occurs can sometimes reveal the cause quickly. This case uses that method.

First, expand the DHCP SUCCESS packet information, as follows:

The file contains two complete, successful DHCP exchanges, each of which goes through the standard four stages of DHCP: Discover – Offer – Request – ACK.

The minor inconsistency in the client and server IP addresses can be ignored here. The server appears sometimes as 192.168.2.1 and sometimes as 192.168.2.3, and the client IP also appears as 192.168.2.2. This is probably because the packet trace file was not anonymized carefully enough.

The Info column also shows a Transaction ID for each packet. What is the Transaction ID?

The DHCP message format contains a 4-byte xid field. This Transaction ID is a random number chosen by the client and used by the client and server to associate messages and responses between the two.

RFC 2131: The client generates and records a random transaction identifier and inserts that identifier into the ‘xid’ field. The server inserts the ‘xid’ field from the DHCPDISCOVER message into the ‘xid’ field of the DHCPOFFER message and sends the DHCPOFFER message to the requesting client.
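To make the position of the xid field concrete, here is a minimal Python sketch that reads it from the fixed part of a DHCP message. The field layout follows the message format in RFC 2131; the helper names `extract_xid` and `build_sample_message` are made up for illustration, and the sample message is synthetic, not taken from the capture files.

```python
import struct

# Fixed BOOTP/DHCP header in network byte order (RFC 2131, message format):
# op(1) htype(1) hlen(1) hops(1) xid(4) secs(2) flags(2)
# ciaddr(4) yiaddr(4) siaddr(4) giaddr(4) chaddr(16) sname(64) file(128)
BOOTP_HEADER = struct.Struct("!BBBBIHH4s4s4s4s16s64s128s")


def extract_xid(payload: bytes) -> int:
    """Return the 32-bit transaction ID from a DHCP message (the UDP payload)."""
    if len(payload) < BOOTP_HEADER.size:
        raise ValueError("payload is shorter than the fixed DHCP header")
    fields = BOOTP_HEADER.unpack_from(payload)
    return fields[4]  # xid is the fifth field, at byte offset 4


def build_sample_message(xid: int) -> bytes:
    """Hypothetical helper: a bare header with op=1 (BOOTREQUEST), Ethernet."""
    zero_ip = bytes(4)
    return BOOTP_HEADER.pack(
        1, 1, 6, 0, xid, 0, 0,
        zero_ip, zero_ip, zero_ip, zero_ip,
        bytes(16), bytes(64), bytes(128),
    )


# Synthetic message carrying one of the transaction IDs discussed in this case.
sample = build_sample_message(0xFB0C077C)
print(hex(extract_xid(sample)))  # prints 0xfb0c077c
```

The sketch ignores the magic cookie and the options that follow the fixed header, since only the xid matters for this case.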

All messages within one standard DHCP exchange carry the same Transaction ID, so the two successful exchanges yield exactly two distinct values:

λ tshark -r 1_DHCP_SUCCESS.pcap -T fields -e dhcp.id | uniq
0xfb0c077c
0xe1cb5e63
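The same check can be sketched in Python by grouping messages by Transaction ID and testing whether each group forms a full Discover – Offer – Request – ACK sequence. The two transaction IDs are the ones printed by tshark; the per-packet message types are an assumption based on the four-stage description of the successful capture, and the function names are illustrative.

```python
from collections import defaultdict

# The standard four-stage sequence of a successful DHCP exchange.
EXPECTED_SEQUENCE = ["Discover", "Offer", "Request", "ACK"]


def group_by_xid(messages):
    """Map each transaction ID to the ordered list of message types seen."""
    exchanges = defaultdict(list)
    for xid, msg_type in messages:
        exchanges[xid].append(msg_type)
    return dict(exchanges)


def is_complete(msg_types):
    """True if the messages form exactly one full four-stage exchange."""
    return msg_types == EXPECTED_SEQUENCE


# Assumed packet list for the 8-packet success capture (types are illustrative).
messages = [
    (0xFB0C077C, "Discover"),
    (0xFB0C077C, "Offer"),
    (0xFB0C077C, "Request"),
    (0xFB0C077C, "ACK"),
    (0xE1CB5E63, "Discover"),
    (0xE1CB5E63, "Offer"),
    (0xE1CB5E63, "Request"),
    (0xE1CB5E63, "ACK"),
]

exchanges = group_by_xid(messages)
for xid, msg_types in exchanges.items():
    status = "complete" if is_complete(msg_types) else "incomplete"
    print(f"{xid:#010x}: {' - '.join(msg_types)} ({status})")
```

Grouping by Transaction ID rather than by IP address is deliberate: early DHCP messages are often broadcast and the client has no address yet, so the xid is the reliable key for pairing requests and replies.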

Now let’s look at the DHCP FAIL packet information, as follows:

This file contains two failed DHCP exchanges. Compared with the successful capture, the problem lies in the DHCP Transaction ID. The Transaction ID carried in the two DHCP Offer messages returned by the DHCP server 192.168.100.1 differs from the one generated by the client. The client therefore silently discards these packets, and as a result it cannot obtain an IP address for communication.

RFC 2131: If the ‘xid’ of an arriving DHCPOFFER message does not match the ‘xid’ of the most recent DHCPDISCOVER message, the DHCPOFFER message must be silently discarded. Any arriving DHCPACK messages must be silently discarded.
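The discard rule can be illustrated with a small Python sketch of a client that has sent a DHCPDISCOVER and is waiting for offers. This is a simplified model, not the code of any real DHCP client: the class name, the return strings, and the xid values are made up for illustration, and message types other than DHCPOFFER and DHCPACK are not modeled.

```python
from dataclasses import dataclass


@dataclass
class SelectingClient:
    """Hypothetical client that has sent DHCPDISCOVER and awaits DHCPOFFER."""

    last_discover_xid: int

    def handle(self, msg_type: str, xid: int) -> str:
        if msg_type == "DHCPACK":
            # No DHCPREQUEST has been sent yet, so any ACK is discarded.
            return "discard"
        if msg_type == "DHCPOFFER":
            # An offer is usable only if it echoes the xid of the last Discover.
            return "accept" if xid == self.last_discover_xid else "discard"
        # Other message types are outside this sketch; treat them as discarded.
        return "discard"


# Made-up transaction IDs; the failed capture's real values are not listed here.
client = SelectingClient(last_discover_xid=0x11111111)

print(client.handle("DHCPOFFER", 0x11111111))  # matching xid: accept
print(client.handle("DHCPOFFER", 0x22222222))  # mismatched xid: discard
print(client.handle("DHCPACK", 0x11111111))    # ACK before Request: discard
```

Because the discard is silent, the client sends nothing back to the server, which is why the failed capture shows only unanswered offers and no Request.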

Problem Summary

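The root cause is incorrect behavior of the DHCP server (a broadband router), which returns Offer messages whose Transaction ID does not match the one sent by the client. This kind of fault can occur in devices, such as some IoT devices, that ship with an imperfect DHCP software implementation.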