Analyzing a Load Balancer Communication Issue in Wireshark

Problem Description

Recently, a colleague shared a case involving a load balancer communication issue: in a test environment, the primary and backup load balancers failed to establish high availability (HA) when both units were brought up. Investigation showed that the primary and backup load balancers could not ping each other. This was particularly puzzling because both devices were directly connected to the same stacked core switch, in the same VLAN. Other hosts, whether in the same VLAN or in other VLANs, could ping both load balancers without issue. The problem appeared to be isolated to traffic between the primary and backup devices.

The two load balancers are addressed as follows:

Device    IP address    MAC address
LB-01     10.0.0.1      4b:00:05:92:01:02
LB-02     10.0.0.2      4b:00:05:92:00:09

Problem Analysis

Because the topology, configuration, and symptoms are all very simple, the verification approach is equally clear: we only need to confirm where the packets are being lost.
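Finding the loss point amounts to walking the forwarding path in order and noting the first checkpoint where the packet is no longer seen. A minimal Python sketch of that bookkeeping, assuming each checkpoint's result (seen or not seen) has already been read off a capture; the function name `locate_loss` and the checkpoint labels are hypothetical:

```python
from typing import List, Optional, Tuple

def locate_loss(path: List[Tuple[str, bool]]) -> Optional[Tuple[Optional[str], str]]:
    """Return (last checkpoint that saw the packet, first checkpoint that did not).

    `path` lists checkpoints in forwarding order, from source to destination.
    Returns None if every checkpoint saw the packet. If even the first
    checkpoint is missing, the first element of the tuple is None.
    """
    previous: Optional[str] = None
    for name, seen in path:
        if not seen:
            return previous, name
        previous = name
    return None

# The ARP reply from LB-01 to LB-02 in this case, checkpoint by checkpoint.
arp_reply_path = [
    ("LB-01 sends", True),
    ("switch receives on LB-01 port", True),
    ("switch forwards on LB-02 port", False),
    ("LB-02 receives", False),
]
print(locate_loss(arp_reply_path))
# ('switch receives on LB-01 port', 'switch forwards on LB-02 port')
```

The loss sits between the two returned checkpoints; here both belong to the switch, so the drop happens inside it.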

LB-01

Brief analysis of the capture taken on LB-01:

  1. LB-01 has a normal ARP entry for LB-02;
  2. LB-01 sends ICMP Echo Requests, but no ICMP Echo Replies come back, so the ping fails;
  3. LB-01 receives ARP broadcast requests from LB-02 asking for LB-01's MAC address, and LB-01 answers each one with a normal ARP reply. However, LB-02 keeps repeating the request, which suggests that LB-02 never receives the reply.
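When many captures need checking, the same classification can be scripted. The following is a minimal stdlib-only Python sketch, assuming a classic pcap file (not pcapng) with an Ethernet link type and untagged frames; the names `read_pcap`, `classify`, `summarize`, and `Event` are hypothetical, and Wireshark remains the tool actually used in this case.

```python
import struct
from collections import Counter
from dataclasses import dataclass
from typing import Iterator, Optional

@dataclass(frozen=True)
class Event:
    kind: str    # "arp-request", "arp-reply", "icmp-echo-request" or "icmp-echo-reply"
    src_ip: str  # ARP sender IP or IPv4 source address
    dst_ip: str  # ARP target IP or IPv4 destination address

def _ip(raw: bytes) -> str:
    return ".".join(str(b) for b in raw)

def read_pcap(data: bytes) -> Iterator[bytes]:
    """Yield raw frames from a classic pcap file held in memory."""
    magic = data[:4]
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):    # little-endian (us / ns)
        endian = "<"
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):  # big-endian (us / ns)
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")
    offset = 24  # skip the 24-byte global header
    while offset + 16 <= len(data):
        # Record header: ts_sec, ts_frac, captured length, original length
        _, _, incl_len, _ = struct.unpack(endian + "IIII", data[offset:offset + 16])
        offset += 16
        yield data[offset:offset + incl_len]
        offset += incl_len

def classify(frame: bytes) -> Optional[Event]:
    """Recognize ARP request/reply and ICMP echo request/reply; ignore the rest."""
    if len(frame) < 14:
        return None
    ethertype = struct.unpack("!H", frame[12:14])[0]
    body = frame[14:]
    if ethertype == 0x0806 and len(body) >= 28:  # ARP: opcode, sender IP, target IP
        kind = {1: "arp-request", 2: "arp-reply"}.get(struct.unpack("!H", body[6:8])[0])
        return Event(kind, _ip(body[14:18]), _ip(body[24:28])) if kind else None
    if ethertype == 0x0800 and len(body) >= 20 and body[9] == 1:  # IPv4 carrying ICMP
        ihl = (body[0] & 0x0F) * 4
        if len(body) <= ihl:
            return None
        kind = {8: "icmp-echo-request", 0: "icmp-echo-reply"}.get(body[ihl])
        return Event(kind, _ip(body[12:16]), _ip(body[16:20])) if kind else None
    return None

def summarize(data: bytes) -> Counter:
    """Count how often each (kind, src_ip, dst_ip) event occurs in a capture."""
    return Counter(e for e in map(classify, read_pcap(data)) if e is not None)
```

For LB-01's capture, `summarize(open("lb01.pcap", "rb").read())` (hypothetical file name) would be expected to show ICMP echo requests from 10.0.0.1 with no matching replies, alongside repeated ARP requests from 10.0.0.2 and ARP replies from 10.0.0.1.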

LB-02

Brief analysis of the capture taken on LB-02:

  1. LB-02 has no ARP entry for LB-01;
  2. LB-02 continuously sends ARP broadcast requests for LB-01's MAC address but never receives an ARP reply from LB-01;
  3. LB-02 also never receives the ICMP Echo Requests sent by LB-01.
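Read side by side, the two captures show the same asymmetry in the ARP exchange, which a small tally makes explicit. A minimal Python sketch, assuming each capture has already been reduced to `(kind, sender_ip, target_ip)` tuples in capture order; the function name `arp_status` is hypothetical, and the count of four attempts is illustrative:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]  # (requester IP, target IP)

def arp_status(events: Iterable[Tuple[str, str, str]]) -> Dict[Pair, Dict[str, int]]:
    """Count ARP requests and replies for each (requester, target) pair."""
    status: Dict[Pair, Dict[str, int]] = defaultdict(lambda: {"requests": 0, "replies": 0})
    for kind, sender, target in events:
        if kind == "arp-request":
            status[(sender, target)]["requests"] += 1
        elif kind == "arp-reply":
            # A reply is sent by the resolved host back to the requester,
            # so swap the addresses to match the original request.
            status[(target, sender)]["replies"] += 1
    return dict(status)

request = ("arp-request", "10.0.0.2", "10.0.0.1")  # LB-02 asks for LB-01
reply = ("arp-reply", "10.0.0.1", "10.0.0.2")      # LB-01 answers LB-02

print(arp_status([request, reply] * 4))  # as seen on LB-01
# {('10.0.0.2', '10.0.0.1'): {'requests': 4, 'replies': 4}}
print(arp_status([request] * 4))         # as seen on LB-02
# {('10.0.0.2', '10.0.0.1'): {'requests': 4, 'replies': 0}}
```

Requests that keep arriving even though replies are being sent (LB-01's view), combined with replies that never arrive (LB-02's view), point to the replies being lost in transit.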

Taken together, the LB-01 and LB-02 captures indicate that packets are being lost on the intermediate switch: frames sent by LB-01 appear not to be forwarded to LB-02. The next step is to capture traffic on the switch itself.
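This cross-check can be made mechanical by counting frames that appear in the sender's capture but have no copy in the receiver's. A minimal Python sketch, assuming both captures are loaded as lists of raw, untagged Ethernet frames from the same VLAN (so an L2 switch should deliver them unchanged); the helper names are hypothetical. Padding is stripped first because a host capturing its own outgoing ARP frames often records them before they are padded to the 60-byte Ethernet minimum, while the receiver sees the padded frame.

```python
import hashlib
from collections import Counter
from typing import Iterable

def strip_padding(frame: bytes) -> bytes:
    """Trim Ethernet padding (and any trailing FCS) for ARP and IPv4 frames."""
    ethertype = int.from_bytes(frame[12:14], "big")
    if ethertype == 0x0806:                       # 14-byte header + 28-byte ARP body
        return frame[:42]
    if ethertype == 0x0800 and len(frame) >= 18:  # IPv4 total length at bytes 16-17
        return frame[:14 + int.from_bytes(frame[16:18], "big")]
    return frame

def _digests(frames: Iterable[bytes]) -> Counter:
    return Counter(hashlib.sha256(strip_padding(f)).digest() for f in frames)

def lost_in_transit(sent: Iterable[bytes], received: Iterable[bytes]) -> int:
    """Number of frames in the sender's capture with no copy in the receiver's."""
    # Counter subtraction keeps only positive counts, so extra frames on the
    # receiving side (for example, traffic from other hosts) are ignored.
    return sum((_digests(sent) - _digests(received)).values())

arp_reply = bytes(12) + b"\x08\x06" + bytes(28)  # placeholder 42-byte ARP frame
print(lost_in_transit([arp_reply] * 4, []))                       # 4
print(lost_in_transit([arp_reply] * 4, [arp_reply + bytes(18)]))  # 3
```

Before comparing, the sender's list should be filtered to frames addressed to the receiver.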

Switch

The core switch is an H3C S6800, with two units stacked using IRF. Ports Te1/0/42 and Te2/0/42 were configured as mirroring source ports, and a ping was then run from LB-02. A brief analysis of the resulting capture:

  1. Packet 1 is LB-02's ARP broadcast request for LB-01's MAC address, captured in the inbound direction of the switch port connected to LB-02. This shows that the switch receives the request normally;
  2. Packet 2 is the same ARP broadcast request, captured in the outbound direction of the port connected to LB-01. This shows that the switch forwards the request normally;
  3. Packet 3 is LB-01's unicast ARP reply to LB-02's request, captured in the inbound direction of the port connected to LB-01. This shows that the switch receives the reply normally;
  4. However, the switch never forwards packet 3 out of the port connected to LB-02;
  5. This sequence repeats four times, as LB-02 keeps retrying ARP resolution throughout the ping.
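With both load-balancer-facing ports mirrored, a frame that the switch forwards from one of them to the other is captured twice in quick succession (once inbound, once outbound), while a frame that is received but dropped is captured only once. A minimal Python sketch of that pairing, assuming both directions of both ports are mirrored, the mirror delivers the two copies unmodified, frames are `(timestamp, bytes)` tuples in capture order, and the copies arrive within `max_gap` seconds (a hypothetical 1 ms default); the function name and the placeholder payloads are also hypothetical:

```python
from typing import List, Tuple

Frame = Tuple[float, bytes]  # (capture timestamp in seconds, raw frame)

def pair_mirror_copies(frames: List[Frame], max_gap: float = 0.001) -> List[Tuple[float, bytes, bool]]:
    """For each frame, report whether a second identical copy follows within max_gap.

    True means the frame was captured on both mirrored ports (received and forwarded);
    False means only one copy was captured, i.e. it was seen on just one port.
    """
    used = [False] * len(frames)
    result = []
    for i, (ts, data) in enumerate(frames):
        if used[i]:
            continue  # already consumed as the second copy of an earlier frame
        used[i] = True
        forwarded = False
        for j in range(i + 1, len(frames)):
            ts_j, data_j = frames[j]
            if ts_j - ts > max_gap:
                break  # frames are time-ordered, so no later copy can match
            if not used[j] and data_j == data:
                used[j] = True
                forwarded = True
                break
        result.append((ts, data, forwarded))
    return result

# Placeholder payloads standing in for the real frames.
arp_request, arp_reply = b"arp-request", b"arp-reply"
capture: List[Frame] = []
for second in range(4):  # four ARP attempts, as in the switch capture
    capture += [(second, arp_request), (second + 0.0001, arp_request), (second + 0.01, arp_reply)]

for ts, data, forwarded in pair_mirror_copies(capture):
    print(f"{ts:.2f}s {data.decode():<12} forwarded={forwarded}")
```

In this synthetic capture every ARP request is paired with its forwarded copy and every ARP reply is left unpaired, which is the pattern packets 1 to 3 describe.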

After a case was opened with H3C, the same behavior was confirmed using a traffic statistics configuration on the switch: only four ARP reply packets were matched in the inbound direction of Te1/0/42 (the port connected to LB-01), and no ARP reply packets were matched in the outbound direction of Te2/0/42 (the port connected to LB-02).

Summary of the Load Balancer Communication Issue

After H3C TAC and R&D analyzed the traffic statistics and diagnosed the fault, the problem was preliminarily attributed to a bug in the switch software version. After the switch was moved to a commonly used stable software version and rebooted, the primary and standby load balancers resumed normal communication.

This kind of problem is rare, but it is relatively simple to troubleshoot and localize. The key is to determine the packet loss point methodically: the source and destination must each be shown to send and respond correctly, and every intermediate device must be shown to both receive and forward the traffic.