Preface
When analyzing a network fault, we usually connect to a mirror port on a switch and analyze all the traffic passing through it. If there is abnormal traffic in the network, it shows up clearly in the result. This method is very effective for faults caused by viruses and worms. It was not effective, however, for a fault I encountered a few days ago on a university campus network.
Fault Description
Packets were lost irregularly, which caused a large number of TCP retransmissions. Web pages loaded very slowly and sometimes did not open at all. Pinging an Internet host (e.g. www.ids-sax2.com) showed delay or packet loss.
Network topology
The router connects to the ISP over gigabit fiber, provides NAT, and is connected to the firewall. Two backbone switches are connected to the firewall; they handle communication and run VRRP. The two backbone switches are joined by a trunk link over 10G fiber. Each aggregation switch is connected to both backbone switches over 10G fiber, and the corridor switches are connected to the aggregation switches. See the figure below:
Figure 1: Network topology
Comparative Analysis to Resolve Slow-Network Faults
Single-point analysis did not reveal any abnormal traffic, but it did show a lot of TCP retransmissions. TCP retransmission typically indicates packet loss. When packets are lost under normal traffic conditions, the usual cause is network congestion. However, ICMP packets were unaffected: whenever we pinged an Internet IP address, the delay did not change significantly. This network has no QoS mechanism, so if it were congested, every packet should have the same probability of being lost; it is impossible for only one kind of packet to be dropped. We also checked the load on the switches and routers, and it was within the normal range.
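To make the retransmission symptom concrete, here is a minimal sketch of how retransmissions can be counted in a list of captured TCP segments. It is a simplified heuristic, not how Ax3soft Unicorn works internally: a data-carrying segment is counted as a retransmission when the same flow has already sent the same sequence number. The function name, the tuple layout, and the sample addresses are assumptions made for illustration.

```python
from collections import Counter


def count_retransmissions(segments):
    """Count data segments that repeat an already-seen (flow, seq) pair.

    Each segment is a tuple:
    (src, sport, dst, dport, seq, payload_len)  # hypothetical layout
    """
    seen = set()
    retrans = Counter()
    for src, sport, dst, dport, seq, payload_len in segments:
        if payload_len == 0:
            continue  # pure ACKs carry no data and repeat legitimately
        flow = (src, sport, dst, dport)
        key = (flow, seq)
        if key in seen:
            retrans[flow] += 1  # same data sent again on the same flow
        else:
            seen.add(key)
    return retrans


# Illustrative data only (documentation address ranges).
segments = [
    ("10.0.0.5", 50000, "203.0.113.9", 80, 1000, 500),
    ("10.0.0.5", 50000, "203.0.113.9", 80, 1500, 500),
    ("10.0.0.5", 50000, "203.0.113.9", 80, 1000, 500),  # seq 1000 again
    ("203.0.113.9", 80, "10.0.0.5", 50000, 7000, 0),    # pure ACK
    ("203.0.113.9", 80, "10.0.0.5", 50000, 7000, 0),    # pure ACK, ignored
]
retransmissions = count_retransmissions(segments)
print(dict(retransmissions))
```

A real analyzer also has to handle keep-alives, SYN/FIN segments, and sequence-number wraparound, which this sketch ignores.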
In a large network the replacement method is not practical, so locating the point of failure is tricky. Finally, we mirrored the WAN port of the router and the uplink port of the core switches separately and analyzed the traffic at both points with Ax3soft Unicorn. We found that packets were being dropped between the LAN port of the router and the uplink port of the core switches; therefore, the router or the firewall was faulty. Next, we changed the NAT configuration of the router, fault
Keys to comparative analysis
1. Use it for network failures in which the traffic looks normal.
2. Deploy multiple instances of Ax3soft Unicorn at key points of the network (e.g. the WAN port of the router, the uplink port of a switch, and so on).
3. Use a filter to reduce the amount of captured traffic.
4. Start and stop all analysis projects at the same time, as far as possible.
5. Look for dropped packets by comparing the ID field of the IP packets.
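The comparison in step 5 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a feature of Ax3soft Unicorn: it assumes the packets of one direction have already been exported from each capture point as simple records, and that the NAT device leaves the IP Identification field unchanged (common, but not guaranteed). Because NAT rewrites the inside address, the records are keyed on the Internet-side address. The names PacketRecord, find_missing, and loss_by_protocol are hypothetical.

```python
from collections import Counter
from typing import NamedTuple


class PacketRecord(NamedTuple):
    remote_ip: str  # Internet-side address, not rewritten by NAT
    protocol: str   # e.g. "TCP" or "ICMP"
    ip_id: int      # 16-bit Identification field of the IPv4 header


def find_missing(sent_side, received_side):
    """Records seen at the first capture point but not at the second.

    Counters give a multiset difference, so an IP ID that legitimately
    repeats (the field wraps at 65536) is still matched one-for-one.
    """
    missing = Counter(sent_side) - Counter(received_side)
    return sorted(missing.elements())


def loss_by_protocol(sent_side, received_side):
    """Map each protocol to (lost, sent) packet counts."""
    sent = Counter(p.protocol for p in sent_side)
    lost = Counter(p.protocol for p in find_missing(sent_side, received_side))
    return {proto: (lost[proto], sent[proto]) for proto in sent}


# Illustrative outbound traffic: core switch uplink vs. router WAN port.
core_uplink = [
    PacketRecord("203.0.113.9", "TCP", 100),
    PacketRecord("203.0.113.9", "TCP", 101),
    PacketRecord("203.0.113.9", "TCP", 102),
    PacketRecord("203.0.113.9", "TCP", 103),
    PacketRecord("198.51.100.7", "ICMP", 40),
    PacketRecord("198.51.100.7", "ICMP", 41),
]
router_wan = [p for p in core_uplink if p.ip_id not in (101, 103)]

missing = find_missing(core_uplink, router_wan)
stats = loss_by_protocol(core_uplink, router_wan)
print(missing)
print(stats)
```

Breaking the loss down by protocol is what distinguishes this kind of fault from congestion: in the sample data only TCP packets disappear between the two capture points, while ICMP passes untouched.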
Conclusion
Ax3soft Unicorn is a powerful analysis tool, but the key to getting the most out of any analysis tool is proper deployment. If it is deployed inappropriately, even the best tool cannot do its job.