1. Symptoms
A well-known large telecommunications product developer recently upgraded its network. Grace, the IT manager responsible for its communication and computer networks, came to the Network Hospital today with the following report. Several newly installed servers are barely usable, and other servers occasionally show data errors and interruptions or slowdowns in access, some obvious and others less so. When the network has few users, Ping tests to the servers generally pass. When the number of users rises even slightly, 10% to 30% of Ping packets are lost. The problem servers are hard to log in to and access even when there are very few users. Strangely, login is sometimes smooth and at other times impossible, with waits of up to 5 minutes before access is granted.
The backbone was originally planned as an ATM network but was later changed to use Gigabit Ethernet switches. The company’s headquarters, a 28-story building with nearly 3000 users, has a Gigabit Ethernet switch serving as the core switch on each floor, with a single tier of 100 Mbps workgroup switches below it providing direct 100 Mbps connections to the desktop. The servers are fitted with Gigabit Ethernet cards and connect directly to the Gigabit Ethernet switches on each floor. Network maintenance staff thoroughly inspected the servers and reinstalled their operating platforms several times, but the problems persisted. They then suspected the cabling and tested all server connection cables with Fluke’s DSP100 cable tester; every result was within specification. Replacing some of the cables did not help either. Monitoring showed that traffic to these servers was below 1% most of the time. The cause remained unclear.
2. Diagnostic Process
Access to the servers was impaired, and several servers were affected at the same time, so there had to be a common cause. Grace told us that 17 servers had been newly installed, of which 7 had obvious problems and 10 were mostly working normally. All of them had been installed by the same person, Mr. Pan, a senior network engineer at the company, so differences in installation could not explain why only some servers were affected. We connected a network tester on the user side to get a first impression of the network’s state, then observed the switch ports connected to the seven servers with obvious problems. Traffic was consistently below 1%, but the proportion of delayed frames was high, at about 86% to 93%, and the proportion of FCS error frames was also significant, at roughly 5% to 11%. This indicated that a large number of packets were being sent to the servers without being acknowledged, and that the 5% to 11% of FCS error frames might be coming from the servers themselves. ICMP Ping tests to these servers showed 90% to 100% loss. Together, these findings pointed to the cables and to the physical performance of the cable interfaces on the servers and switches. We tested the hard jumpers between the servers and the switches with a DSP-4000 cable analyzer, and all seven problem servers showed RL (Return Loss) values that were out of specification. We then tested the other ten servers, and their RL values were out of specification as well. The cable analyzer located the RL faults at the ends of the jumper cables. Re-terminating the connectors and retesting still gave out-of-specification results. When we substituted one of our own soft jumpers, the connection to one of the servers recovered immediately, which pointed to the jumper cables as the source of the problem.
To confirm this, we made up a new jumper using compliant connectors that we supplied ourselves, but it still failed to meet specifications. The fault therefore lay in the jumper cable itself rather than in the connectors. We connected our only four soft jumpers to four of the servers, and all four servers immediately worked correctly. When testing the cable according to Cat5e standards, all results were within specification.
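The reasoning used in this diagnosis (a nearly idle link that still shows heavy FCS errors or Ping loss points toward the physical layer) can be sketched in a few lines of Python. The class name, function name, and thresholds below are illustrative assumptions, not part of any tester's interface or any standard; the sample figures for the faulty port are taken from the ranges reported above.

```python
from dataclasses import dataclass


@dataclass
class PortSample:
    """Counters observed on one server-to-switch port (fractions, 0.0 to 1.0)."""
    utilization: float      # share of link capacity in use
    fcs_error_ratio: float  # share of frames with FCS errors
    ping_loss: float        # share of ICMP echo requests left unanswered


def suspect_physical_layer(sample: PortSample,
                           max_idle_utilization: float = 0.01,
                           fcs_threshold: float = 0.01,
                           ping_loss_threshold: float = 0.10) -> bool:
    """Return True when errors or loss are high although the link is almost idle.

    The default thresholds are assumptions chosen for illustration only.
    """
    nearly_idle = sample.utilization <= max_idle_utilization
    high_errors = sample.fcs_error_ratio >= fcs_threshold
    heavy_loss = sample.ping_loss >= ping_loss_threshold
    return nearly_idle and (high_errors or heavy_loss)


# Figures in the range reported for the seven badly affected servers.
bad_port = PortSample(utilization=0.008, fcs_error_ratio=0.05, ping_loss=0.90)
# A hypothetical healthy port for comparison.
good_port = PortSample(utilization=0.008, fcs_error_ratio=0.0, ping_loss=0.0)

print(suspect_physical_layer(bad_port))
print(suspect_physical_layer(good_port))
```

A check like this only narrows the search. As in the case above, the cable fault itself still has to be confirmed with a cable analyzer.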
3. Conclusion
Ethernet cables contain four twisted pairs. A Gigabit Ethernet link uses all four pairs in full duplex with 5-level encoding, so each pair carries 250 Mbps in both directions. The equivalent physical bandwidth of the signal is 100 MHz, which means that Cat5e cable should in principle be able to meet Gigabit Ethernet link requirements. In practice, however, Gigabit Ethernet places higher demands on other parameters, so Category 6 or better cable is generally recommended for Gigabit Ethernet applications. Cat5e cable is typically used for rates in the 100 Mbps class, such as ATM155 at 155 Mbps. If Cat5e cable is to be used for Gigabit Ethernet, it must pass additional test parameters. Grace said that they were using Cat6 cable, but testing with the DSP-4000 cable analyzer showed that it was in fact counterfeit cable of Cat5e grade. Tested against the Cat5e standard, it still failed, which shows that the Cat5e cable they had used was of poor quality and could not pass Cat5e testing for Gigabit applications. Even products from legitimate manufacturers commonly show a failure rate of up to 20% when tested against the Cat5e standard for Gigabit applications. The DSP100 cable tester can only test to Category 5, which is why all of its results were within specification. The engineering design, however, called for Category 6 cable, and the counterfeit Category 5e cable failed when tested with the DSP-4000 cable analyzer. The four non-compliant jumpers were all less than 2 meters long, while the ten less severely affected servers had cables longer than 15 meters connecting them to the switches. This pattern is typical of RL non-compliance: on a link with non-compliant RL, the shorter the cable, the more severe the fault symptoms. This is because RL non-compliance leads to increased signal reflection.
Short cables attenuate the signal less, so most of the reflected energy is superimposed on the normal data signal at the other end of the link, distorting it severely and producing FCS errors. The build-up of delayed frames, in turn, occurs because the RL fault prevents access traffic from being delivered to the server. On longer links with non-compliant RL, attenuation is greater, so most of the reflected energy can no longer interfere effectively with the normal signal. The symptoms are therefore milder: a raised error rate or intermittent interruptions, with more error frames and more frequent interruptions when traffic is high, but the link does not block all packets. User login is strongly affected by the current average and instantaneous traffic, which shows up as large fluctuations in login time. Sometimes login goes smoothly because both are low; at other times there are long waits because one or the other is high, and many errors and retries occur during that period.
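The length effect can be illustrated with a deliberately simplified model, sketched here in Python. It assumes a single mismatch at the far end of the cable: the wanted signal crosses the cable once, while the reflection crosses it twice and also loses the return loss at the mismatch. The function names, the 10 dB return loss, and the 22 dB per 100 m attenuation are assumptions chosen for illustration, not measurements from this case.

```python
def reflected_power_fraction(return_loss_db: float) -> float:
    """Fraction of incident power reflected at a mismatch with this return loss."""
    return 10 ** (-return_loss_db / 10)


def echo_margin_db(return_loss_db: float, length_m: float,
                   attenuation_db_per_100m: float) -> float:
    """How far (in dB) a far-end reflection sits below the wanted signal.

    Simplified model: the wanted signal is attenuated by one cable
    traversal; the reflection is attenuated by two traversals plus the
    return loss at the far-end mismatch.
    """
    one_way_loss = attenuation_db_per_100m * length_m / 100
    signal_level = -one_way_loss
    echo_level = -return_loss_db - 2 * one_way_loss
    return signal_level - echo_level


ASSUMED_RL_DB = 10.0           # hypothetical poor return loss
ASSUMED_ATTEN_DB_100M = 22.0   # hypothetical attenuation at high frequency

for length in (2, 15, 90):
    margin = echo_margin_db(ASSUMED_RL_DB, length, ASSUMED_ATTEN_DB_100M)
    print(f"{length:>3} m cable: reflection is {margin:.1f} dB below the signal")
```

Under these assumptions the margin grows by one cable traversal's worth of attenuation as the cable gets longer, so the reflection is relatively strongest on the shortest jumpers. Real links have multiple reflection points and frequency-dependent loss, which this sketch ignores.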
4. Diagnostic Recommendations
Because counterfeit Category 5e cable was used and other servers occasionally show data errors and interruptions, we recommend that Grace thoroughly test the cabling of every server connected with Category 5e cable to safeguard network quality.
5. Afterword
The next day, Grace called to report that they tested a total of 200 cables, including all server connections. The cables laid in the early stages of the project were