1. Symptoms
The director of the computer center at a large household appliance manufacturer reported, with evident frustration, a persistent network problem. The problem was peculiar but followed a pattern: the network in the company’s main office building slowed down sharply during working hours and returned to normal speed after hours. It had been going on for about three months, although the exact onset date was hard to pinpoint. It began at around 8:00 AM each day and affected the entire third floor. Symptoms included a sudden drop in speed, slow file downloads from the Internet, interrupted email, and frequent errors. Users on the third floor experienced long delays when transferring files, both within the floor and with users on other floors, while users on other floors had no such problems.
When the problem first appeared it persisted for three days with no clear cause identified. The third floor housed the company’s design and development department, so to avoid delaying product development its working hours were temporarily shifted to start at 6:00 PM. Two weeks later the situation was unchanged, and the company temporarily swapped the development department with the administrative department on the second floor so that the development staff could keep working. This “temporary” arrangement lasted three months.
The network team thoroughly checked and replaced cabling systems, network platforms, all hosts and servers, and routers, but could not identify the root cause. At the suggestion of a well-known systems integrator, the cabling system then underwent certification testing, which revealed serious defects. The installed Category 5 cabling was non-compliant: it had been built with counterfeit Category 5 cable that, according to the on-site tests, met only Category 3 requirements. Most connectors and modules also failed the Category 5 tests. Further inspection showed that the cabling in the rest of the building was in the same state as on the third floor. Because the company’s network ran mostly at 10 Mbps, it had remained stable until this problem arose. The cabling had been installed three years earlier, and the original systems integrator could no longer be contacted.
The company’s board of directors decided to replace the entire cabling system. After a month of intensive construction the project was completed two days ago, but the problem was still present during startup and testing, to the disappointment of the computer center staff who had worked tirelessly to resolve it.
2. Diagnostic Process
Past cases suggest that stubborn network problems tend to have simple causes. Given the symptoms described by the “patient,” a fault in the cabling system appeared unlikely, since it had just been replaced. All network equipment had been inspected several times, so the devices themselves were also unlikely to be at fault. If the problem lay in platform installation, software applications, or routing, other floors would have been affected as well. The characteristics of the problem were more telling: because it occurred during working hours, it was probably associated with devices or environmental factors that operated on a schedule, and because it affected the entire floor, it was most likely related to common areas. The director mentioned that every device on all floors had been individually turned off and tested, and that power sources had been replaced to make sure they were functional and compliant.
As for the topology, each floor ran a traditional hub-based 10Base-T network. The floors and the adjacent buildings were interconnected through a core switch with 10 Mbps ports, and routers attached to the core switch reached the Internet over 128 kbps Frame Relay links. Remote branches and offices were connected by DDN, ISDN, or VPN. A network management system was available in the computer center, although it had never been configured or used. Because all stations on a hub share one collision domain and a switch port bounds that domain, a problem confined to one floor pointed to a fault inside that floor’s collision domain. The network statistics supplied by the center director showed a large number of collisions and error frames on the switch port connected to the third floor: 2% utilization, 35% errors (83% of them CRC errors), 96% transmission delays, and 10% collisions. The director mentioned that similar data had been seen on the network management system, but its significance had been unclear and it had been treated as unrelated to the problem.
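One way to read port figures like these is to compare them against alert thresholds. The following is a minimal sketch only: the `PortStats` type, the `flag_port` helper, and both threshold values are assumptions made for illustration, not values taken from the case or from any standard.

```python
from dataclasses import dataclass


@dataclass
class PortStats:
    """Percentages read from a switch port (hypothetical structure)."""
    utilization_pct: float
    error_pct: float
    collision_pct: float


# Hypothetical alert thresholds, chosen only for this illustration.
MAX_ERROR_PCT = 1.0
MAX_COLLISION_PCT = 5.0


def flag_port(stats: PortStats) -> list[str]:
    """Return the names of the counters that exceed their threshold."""
    findings = []
    if stats.error_pct > MAX_ERROR_PCT:
        findings.append("error rate")
    if stats.collision_pct > MAX_COLLISION_PCT:
        findings.append("collision rate")
    return findings


# Figures reported for the port facing the third floor.
third_floor = PortStats(utilization_pct=2.0, error_pct=35.0, collision_pct=10.0)
print(flag_port(third_floor))
```

With thresholds of this kind, a port showing 35% errors at only 2% utilization would be flagged long before users complain, which is the kind of signal an unconfigured management system cannot raise.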
To pinpoint the origin of this data, we decided to carry out on-site testing the following day. A network tester (F683) connected to the third-floor network showed traffic fluctuating between 67% and 95%, with error traffic fluctuating between 60% and 90%. Ghost errors accounted for 77% of the error traffic, and collisions and FCS frame errors made up the remaining 23%. Ghost errors usually indicate severe interference on the network. Because the interfering bits do not follow the Ethernet frame structure and propagate throughout the collision domain, such problems are hard to diagnose without suitable test tools. We tested power quality with an F43 power quality analyzer, which showed some harmonic content, but within the limits of the standards. The electric field strength, measured with a field strength meter up to 970 MHz, was also within acceptable limits. The most likely source of interference was therefore nearby equipment drawing large amounts of power. As noted above, the regular daytime onset pointed to scheduled equipment or the work environment, and the floor-wide scope pointed to shared areas, yet the director reported that every device on all floors, not just the third floor, had been individually turned off and tested, and that each power supply had been replaced to make sure the devices were in working order and compliant.
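The tester readings above can be combined with simple arithmetic to estimate how much of the observed traffic consisted of Ghost errors. This sketch assumes the error figure is a fraction of the observed traffic and that the 77% Ghost share held across the whole range; the helper name `share_of_total` is hypothetical.

```python
def share_of_total(error_fraction: float, share_within_errors: float) -> float:
    """Fraction of all traffic taken by one error type.

    error_fraction: errors as a fraction of observed traffic (assumed meaning).
    share_within_errors: that error type's share of the error traffic.
    """
    return error_fraction * share_within_errors


GHOST_SHARE = 0.77  # Ghost errors as a share of error traffic (from the tester)

# Lower and upper bounds of the error traffic reported by the tester.
for error_fraction in (0.60, 0.90):
    ghost = share_of_total(error_fraction, GHOST_SHARE)
    print(f"errors at {error_fraction:.0%} -> Ghost errors at about {ghost:.1%}")
```

Under these assumptions, roughly half to two thirds of what the tester saw on the floor was interference rather than valid frames.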
3. Diagnostic Recommendations
Standardized design, construction, and acceptance testing (certification testing) are essential to the quality of a network project. One critical requirement is to keep power lines separate from network cabling. If metal conduit is used to shield short runs, the conduit must be well grounded; otherwise the shielding can do more harm than good. The test results showed that not every power line carried significant harmonic content, and on most lines the content was relatively low. However, the number of nonlinear loads in the power environment is growing, so harmonic pollution can be expected to worsen. To avoid future problems, a more cautious approach to cabling is advisable.
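Harmonic content on a power line is commonly summarized as total harmonic distortion (THD), the ratio of the combined RMS value of the harmonics to the RMS value of the fundamental. The sketch below shows that calculation; the function name and the sample readings are hypothetical and are not measurements from this case.

```python
import math


def total_harmonic_distortion(fundamental_rms: float, harmonic_rms: list[float]) -> float:
    """THD = sqrt(sum of squared harmonic RMS values) / fundamental RMS value."""
    if fundamental_rms <= 0:
        raise ValueError("fundamental RMS value must be positive")
    return math.sqrt(sum(v * v for v in harmonic_rms)) / fundamental_rms


# Hypothetical readings in volts: the fundamental, then selected harmonics.
fundamental = 220.0
harmonics = [6.6, 4.4, 2.2]

thd = total_harmonic_distortion(fundamental, harmonics)
print(f"THD = {thd:.2%}")
```

Tracking a figure like this over time on each power line is one way to notice the upward trend in harmonic content before it begins to affect nearby network cabling.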
4. Afterword
A week later we contacted the user and learned that the problem had been resolved. Other floors had also been found to have power cables laid alongside the network cabling. Although those cables carried less harmonic content, they have all been changed as well, and plans are in place to inspect the cabling in the other relevant buildings.