Network Failure Analysis: How to Fix Network Failure Quickly

What is Network Failure Analysis?

Network failures are a routine part of a network engineer’s work. Handling them efficiently requires not only troubleshooting skills but also an effective method for network failure analysis. In this guide, we’ll explore the steps to analyze network failures and write comprehensive network failure analysis reports.

Many people are so busy with daily work that they barely have time to sleep, let alone review what they have done.

However, your manager will still ask for reports, summaries, and presentations, which forces you to sit down and review your work anyway.

So today I want to talk about how to analyze network failures and how to write an analysis report.

Steps for Effective Network Failure Analysis

Generally speaking, network troubleshooting can follow the approach shown in this picture:

[Figure: Network Failure Analysis]

1. Locate the fault scope

① Network-wide failure: the source can usually be located at the Internet egress or in the core area;

② Small-scale failure: the source can usually be located in the device or link closest to the affected hosts;

③ Single-point failure: the source can usually be located at the failing host or device itself.
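
The three scopes above can be sketched as a small classifier. This is only an illustration: the function name, the 0.8 threshold for "network-wide", and the wording of the suggestions are assumptions made for the example, not part of any standard.

```python
from enum import Enum


class Scope(Enum):
    NETWORK_WIDE = "network-wide"
    SMALL_SCALE = "small-scale"
    SINGLE_POINT = "single-point"


def classify_scope(affected_hosts: int, total_hosts: int,
                   wide_ratio: float = 0.8) -> Scope:
    """Classify a failure by how many hosts report it.

    wide_ratio is an illustrative assumption: at or above this share of
    affected hosts, the failure is treated as network-wide.
    """
    if total_hosts <= 0 or not 0 < affected_hosts <= total_hosts:
        raise ValueError("need 0 < affected_hosts <= total_hosts")
    if affected_hosts == 1:
        return Scope.SINGLE_POINT
    if affected_hosts / total_hosts >= wide_ratio:
        return Scope.NETWORK_WIDE
    return Scope.SMALL_SCALE


# Where to start looking for each scope, following the list above.
FIRST_SUSPECTS = {
    Scope.NETWORK_WIDE: "egress or core devices",
    Scope.SMALL_SCALE: "the device or link closest to the affected hosts",
    Scope.SINGLE_POINT: "the failing host or device itself",
}

scope = classify_scope(affected_hosts=30, total_hosts=30)
print(scope.value, "->", FIRST_SUSPECTS[scope])
```

In the case study later in this article, every user is affected, so this sketch would point at the egress first, which is where the congestion turns out to be.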

2. Troubleshooting Network Failure

① The overall idea is “link” → “configuration”.

② First, confirm whether anyone has recently made manual changes to the network or related equipment;

③ Second, check whether the physical links and equipment are working normally;

④ Finally, check the relevant properties and configuration of the network devices.
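
The order matters more than the individual checks, so one way to picture it is an ordered checklist that stops at the first failure. The sketch below is hypothetical: the check functions are stand-in lambdas, and in practice each would be a manual step or a script of your own.

```python
from typing import Callable, List, Optional, Tuple

# Each check is a (description, function) pair; the function returns True
# when that layer looks healthy. These functions are hypothetical stand-ins.
Check = Tuple[str, Callable[[], bool]]


def first_failing_check(checks: List[Check]) -> Optional[str]:
    """Run the checks in order and return the first one that fails, or None."""
    for description, is_healthy in checks:
        if not is_healthy():
            return description
    return None


checks: List[Check] = [
    ("recent manual changes", lambda: True),           # nothing was changed
    ("physical link and equipment", lambda: False),    # e.g. a port is down
    ("device properties and configuration", lambda: True),
]
print(first_failing_check(checks))
```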

How to Write a Network Failure Analysis Report?

There are many templates for network failure analysis reports on the Internet; a quick search turns up a wide variety of them.

But from now on, when you write a report, you can think about two questions:

1. Are you working for a state-owned enterprise or a private enterprise?

2. Do you just want to get it over with, or do you want to write something valuable?

If you work in a state-owned enterprise, your reports are formal documents, and you need to pay special attention to format and wording. You don’t have much room for creativity.

If you work in a private company and want to write something of real value, something that impresses your superiors and gives you material for your year-end review, you can write more freely.

The following network failure report can serve as a reference. It is written vividly and is also a useful learning tool for others.

1. Fault description and deployment location

On Friday morning, I went to the user’s site to understand the fault phenomenon and asked about the basic network situation. The situation was as follows:

As shown in the figure above, the user’s Internet egress bandwidth is 20 Mbps, and two switches connect more than 30 user hosts and servers. Since Monday this week, users on the network have experienced intermittent Internet access: pages load very slowly and often fail to open at all.

We configured mirror ports on Switch 1 and Switch 2, deployed the Cole Network Analysis System, and captured the traffic on the uplink interfaces for analysis.

2. Network Failure Analysis

Switch 1: After capturing data on Switch 1 for 20 minutes, we found no abnormalities or traffic bursts, so we suspected that the problem came from a host connected to Switch 2.

Switch 2: We captured packets from 10:55:28 to 10:55:38. In just ten seconds, the problem in the network showed up, as shown below:

It is easy to see that the total traffic in just ten seconds reached 272 MB, all of it in 512-1023 byte packets, and that the number of TCP SYN packets exceeded 500,000. No TCP SYN-ACK packets were received in reply, which is an obvious abnormality.

Checking the TCP sessions, we found that they all behaved in the same way: 111.xx.xx.xx sending TCP packets to port 80 of 183.xx.xx.xx.
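
The pattern described here, many SYN packets and no SYN-ACK replies on the same flow, can be sketched as a simple counter. This is a minimal illustration, not how the analysis tool works: the packet tuple layout, the function name, and the threshold of 100 are assumptions, and the addresses are documentation placeholders.

```python
from collections import Counter
from typing import Iterable, List, Tuple

SYN = 0x02  # TCP flag bits
ACK = 0x10

# Hypothetical record layout: (src_ip, dst_ip, src_port, dst_port, tcp_flags)
Packet = Tuple[str, str, int, int, int]
Flow = Tuple[str, str, int]  # (client_ip, server_ip, server_port)


def unanswered_syn_flows(packets: Iterable[Packet],
                         min_syns: int = 100) -> List[Tuple[Flow, int]]:
    """Return flows with at least min_syns SYNs and no SYN-ACK in reply."""
    syns: Counter = Counter()
    answered = set()
    for src, dst, sport, dport, flags in packets:
        if flags & SYN and not flags & ACK:
            syns[(src, dst, dport)] += 1
        elif flags & SYN and flags & ACK:
            # A SYN-ACK travels in the reverse direction of the SYN.
            answered.add((dst, src, sport))
    return [(flow, count) for flow, count in syns.most_common()
            if count >= min_syns and flow not in answered]


# 150 SYNs from one host that never get a reply, plus one normal handshake.
capture = [("192.0.2.10", "203.0.113.9", 40000 + i, 80, SYN) for i in range(150)]
capture += [
    ("192.0.2.11", "203.0.113.9", 50000, 80, SYN),
    ("203.0.113.9", "192.0.2.11", 80, 50000, SYN | ACK),
]
print(unanswered_syn_flows(capture))
```

Grouping by source, destination, and destination port mirrors what the session view shows in the report: one sender, one target, one port.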

The SYN flag in the data packet:

The SYN packet is the handshake request used when TCP establishes a connection, and it normally carries no application-layer data. However, in the figure above, the packet contains 512 bytes of HTTP data, and that data is all zeros. This packet is forged.
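
To make the check concrete, here is a minimal sketch that reads a raw TCP segment and reports whether it is an initial SYN, how much payload it carries, and whether that payload is all zeros. The function and class names are made up for this example. Note that a SYN with data is not always forged (TCP Fast Open can legitimately carry data in a SYN), so treat this as one signal to combine with the others, such as the all-zero content and the missing SYN-ACK replies.

```python
import struct
from typing import NamedTuple

SYN = 0x02  # TCP flag bits
ACK = 0x10


class SegmentInfo(NamedTuple):
    is_initial_syn: bool
    payload_len: int
    payload_all_zero: bool


def inspect_tcp_segment(segment: bytes) -> SegmentInfo:
    """Inspect a raw TCP segment (TCP header plus payload, no IP header)."""
    if len(segment) < 20:
        raise ValueError("a TCP header is at least 20 bytes")
    # First 14 bytes: ports, sequence number, ack number, data offset, flags.
    _sport, _dport, _seq, _ack, offset_byte, flags = struct.unpack(
        "!HHIIBB", segment[:14])
    header_len = (offset_byte >> 4) * 4  # data offset is in 32-bit words
    if header_len < 20 or header_len > len(segment):
        raise ValueError("invalid TCP data offset")
    payload = segment[header_len:]
    is_initial_syn = bool(flags & SYN) and not flags & ACK
    all_zero = len(payload) > 0 and not any(payload)
    return SegmentInfo(is_initial_syn, len(payload), all_zero)


# Build a SYN to port 80 with 512 zero bytes of payload, like the one described.
header = struct.pack("!HHIIBBHHH", 40000, 80, 1, 0, 5 << 4, SYN, 65535, 0, 0)
print(inspect_tcp_segment(header + bytes(512)))
```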

Because the forged packets are addressed to an Internet host, they are sent out through the Internet egress. The egress link is only 20 Mbps, while the forged traffic reaches 261 Mbps, far beyond what the link can carry.

As a result, intranet hosts experience very slow Internet access or cannot reach the Internet at all.
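
The scale of the overload is easy to check with a little arithmetic. The sketch below uses only the figures quoted in this report; the function names are made up, and it assumes decimal units (1 MB = 1,000,000 bytes). Averaging 272 MB over ten seconds gives a somewhat lower rate than the 261 Mbps read from the analyzer, which presumably measures or counts the traffic differently, but both are more than ten times the capacity of the link.

```python
def average_mbps(total_bytes: int, seconds: float) -> float:
    """Average bit rate in Mbps (decimal units) over a capture window."""
    return total_bytes * 8 / seconds / 1_000_000


def oversubscription(offered_mbps: float, link_mbps: float) -> float:
    """How many times the offered traffic exceeds the link capacity."""
    return offered_mbps / link_mbps


LINK_MBPS = 20          # Internet egress bandwidth from the report
REPORTED_MBPS = 261     # forged traffic rate quoted in the report

avg = average_mbps(272_000_000, 10)   # 272 MB captured in ten seconds
print(f"average over the capture: {avg:.1f} Mbps")
print(f"reported rate vs link: {oversubscription(REPORTED_MBPS, LINK_MBPS):.2f}x")
print(f"average rate vs link: {oversubscription(avg, LINK_MBPS):.2f}x")
```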

3. Network Failure Location

As shown in the figure above, by checking the MAC address, we know that the MAC address sending the large number of packets is XX:XX:XX:XX:11:57. Once we know the MAC address launching the attack, we can find the corresponding port by checking the switch MAC address table, as shown in the following figure:

On Switch 2, the port corresponding to this MAC address is G1/0/18. Disconnecting that port solved the problem: the network returned to normal, and web pages could be browsed smoothly again.
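
If you save the MAC address table as text, the lookup can be scripted. The sketch below is hypothetical: the table layout (VLAN, MAC, type, port) and the full MAC address are invented for the example, since the report masks the real one and each vendor prints the table differently. Normalizing the MAC first matters because capture tools and switches often use different separators.

```python
import re
from typing import Dict


def normalize_mac(mac: str) -> str:
    """Reduce a MAC address to 12 lowercase hex digits, whatever the separators."""
    digits = re.sub(r"[^0-9a-fA-F]", "", mac).lower()
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    return digits


def parse_mac_table(text: str) -> Dict[str, str]:
    """Map MAC to port from lines shaped like 'VLAN MAC TYPE PORT' (assumed layout)."""
    table: Dict[str, str] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        try:
            mac = normalize_mac(fields[1])
        except ValueError:
            continue  # header or separator line
        table[mac] = fields[-1]
    return table


# Hypothetical table dump; only the port name G1/0/18 comes from the report.
SAMPLE = """\
VLAN  MAC Address     Type     Port
1     0011-2233-1157  dynamic  G1/0/18
1     0011-2233-4455  dynamic  G1/0/3
"""

table = parse_mac_table(SAMPLE)
print(table.get(normalize_mac("00:11:22:33:11:57")))
```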

Conclusion

Network failure analysis showed that a high volume of forged SYN packets saturated the Internet egress, which amounts to a DoS attack. The analysis identified the switch port the traffic came from, and follow-up actions included recommending security checks to improve network resilience.

Further troubleshooting showed that port G1/0/18 connects the mail gateway. We recommend that the user contact the mail gateway manufacturer to troubleshoot the device.

If you have read this far, you must be a real fan. Finally, I want to remind you that to solve network problems, you need to master three areas:

1. Be familiar with the OSI seven-layer model and the TCP/IP protocol stack

2. Understand the basic network devices and the OSI layer each one works at

3. Clearly understand an important principle of network troubleshooting: follow the direction of the data

You should have some understanding of the most basic network devices, such as switches, layer 3 switches, routers, and firewalls, especially the OSI layers they work at and their functions. All of this comes from solid fundamentals and experience.

Insiders know that only when your network foundation is solid can you keep improving and progressing in your network engineering career.