1. Symptoms
One afternoon, the Information Center of a city's Administration for Industry and Commerce received a call reporting connectivity problems between a county-level Administration for Industry and Commerce office and the city's central network. The connection was noticeably slower than usual, and computers on the seventh floor of the county building could barely exchange data at all, while computers on the other floors could still communicate slowly but more or less normally. When the city designed its wide-area networking solution, it had not planned adequately for future self-maintenance: the maintenance work was simply handed over to the project contractor, and the city neither purchased specialized tools nor trained its own personnel for network maintenance. The contractor's engineers who had been responsible for the project had since left the company and could not help resolve the problem, so through a personal contact the Information Center reached out to a network specialist.
2. Diagnostic Process
We arrived in the city that night and began searching for the fault immediately. The city's network was quite extensive: 87 branch offices across 7 counties and 6 districts, with 64 Kb/s DDN links connecting the city center to the county offices and telephone lines connecting the county offices to the local branch offices. Using a Fluke F683 network tester, we ran a channel test from the city's central office to the affected county office. At a rate of only 4 Kb/s the test could not be completed, the response time was 804 ms, and ICMP ping showed that only about one in seven ping attempts to the county office's router succeeded. We then powered down all network equipment at the county office and unplugged every line connector from the router, leaving only the router, a hub, and a laptop connected. With this minimal setup the channel test passed at 54 Kb/s, with a response time of 46 ms and a 100% ICMP ping success rate. This proved that the problem lay not in the DDN link but inside the county office's own network.
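For readers without a dedicated tester, the same kind of reachability measurement can be approximated from any host on the link. The sketch below assumes a Linux machine with the standard `ping` command; the router address 10.0.7.1 and the attempt count are placeholders, not values from the case.

```python
#!/usr/bin/env python3
"""Rough ICMP reachability check, assuming a Linux host with the standard
`ping` command. A sketch of the kind of success-rate test the F683 channel
test performs; the target address below is a hypothetical placeholder."""
import subprocess
import time

TARGET = "10.0.7.1"   # hypothetical address of the county office router
ATTEMPTS = 50

successes = 0
total_rtt_ms = 0.0

for _ in range(ATTEMPTS):
    start = time.monotonic()
    # -c 1: send one echo request; -W 2: wait at most 2 seconds for a reply
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "2", TARGET],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
    )
    if result.returncode == 0:
        successes += 1
        total_rtt_ms += (time.monotonic() - start) * 1000  # rough RTT estimate

rate = successes / ATTEMPTS
avg_rtt = total_rtt_ms / successes if successes else float("nan")
print(f"success rate: {rate:.0%}  (a rate near 1/7 indicates severe loss)")
print(f"average response time: {avg_rtt:.0f} ms")
```

A healthy link should show a success rate close to 100% and a stable, low response time; the roughly 1/7 success rate and 804 ms response seen in this case pointed to a serious problem beyond the DDN circuit itself.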
We then drove to the county Administration for Industry and Commerce building, restored power to the network equipment, reconnected all the cable connectors, and scanned the network segment with the Fluke F683 network tester. Within 30 seconds it reported a duplicate IP address belonging to a backup router, together with a few FCS frame errors. Clearly, the fault was related to this router with the conflicting address. However, the network management staff had no information about where the router had come from, and there was no documentation for it. After repeated questioning, they finally recalled that it was an abandoned backup router that had been out of service for the past six months: it had never been removed from the rack, and although it was supposed to stay unpowered, its cable connections had never been detached. On inspection, the router turned out to be powered on! We did not pursue who had switched it on, but immediately unplugged its power cord, and within about a minute the network speed returned to normal. At this point the F683 showed that the duplicate IP address had disappeared, but some FCS frame errors remained, indicating residual problems, most likely in the cabling and link equipment.

The fact that the seventh floor had far more trouble exchanging data than the other floors matched the symptoms of the daisy-chain effect. According to the network management staff, connections between the seventh floor and the city office, and between the seventh floor and the first and second floors, had occasionally been slow and were sometimes disrupted. The project drawings documented the layout and network equipment only for floors one through five; the equipment on the sixth and seventh floors had been added by the county office six months earlier and was never documented. We therefore had to trace the cabling hub by hub. A simple count showed a total of five hubs in the path connecting the equipment on the seventh floor to the first and second floors (the router sat on the second floor), which can easily cause excessive packet delay and collisions, showing up as FCS frame errors on a 10Base-T network.
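A tester such as the F683 flags duplicate IP addresses automatically, but the same check can be improvised by broadcasting an ARP request and counting how many distinct MAC addresses answer for the suspect address. The sketch below assumes the third-party scapy library, root privileges, and a placeholder router address.

```python
#!/usr/bin/env python3
"""Minimal sketch of duplicate-IP detection on the local segment via ARP.
Assumes the third-party scapy library (pip install scapy) and root
privileges; the probed address is a hypothetical placeholder."""
from scapy.all import ARP, Ether, srp

SUSPECT_IP = "10.0.7.254"   # hypothetical address of the production router

# Broadcast an ARP request; every host claiming this IP will answer.
answered, _ = srp(
    Ether(dst="ff:ff:ff:ff:ff:ff") / ARP(pdst=SUSPECT_IP),
    timeout=2, retry=2, verbose=False,
)

macs = {reply.hwsrc for _, reply in answered}
if len(macs) > 1:
    print(f"duplicate IP {SUSPECT_IP}: claimed by {len(macs)} MACs: {sorted(macs)}")
elif macs:
    print(f"{SUSPECT_IP} is held by a single MAC: {macs.pop()}")
else:
    print(f"no host answered for {SUSPECT_IP}")
```

In a situation like this case, two MAC addresses answering for the router's IP (the production router and the forgotten backup router) would immediately explain why traffic to and through that address became erratic.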
3. Diagnostic Conclusion
The daisy-chain effect is a network error condition that arises when more than four hubs lie between any two stations on a 10 Mb/s LAN, so that frames take too long to traverse the path. In this case, the sixth and seventh floors were added later, and the network management staff, without proper planning, simply cascaded the new hubs off the existing ones, producing the daisy-chain effect. Had someone not inadvertently powered the backup router back onto the network and triggered a wide-area problem, the daisy-chain effect might have remained a hidden fault for a long time.
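The rule is easy to check once the hub topology is known: count the hubs on the path between every pair of edge hubs and flag any path with more than four. The sketch below uses only the Python standard library; the hub names and links are a hypothetical reconstruction of the cascade found in this case, not the actual documented topology.

```python
#!/usr/bin/env python3
"""Illustrative check of the 10 Mb/s repeater ("daisy chain") rule: no more
than four hubs may sit between any two stations. The topology below is a
hypothetical reconstruction of the cascaded hubs in this case."""
from collections import deque
from itertools import combinations

# Adjacency list of hubs only; stations hang off the hub for their floor.
HUB_LINKS = {
    "hub_floor1":  ["hub_floor2a"],
    "hub_floor2a": ["hub_floor1", "hub_floor2b"],   # router attached here
    "hub_floor2b": ["hub_floor2a", "hub_floor6"],
    "hub_floor6":  ["hub_floor2b", "hub_floor7"],
    "hub_floor7":  ["hub_floor6"],
}
MAX_HUBS = 4   # 10Base-T limit between any two stations

def hubs_on_path(start, end):
    """Breadth-first search; returns the number of hubs traversed,
    counting both endpoints."""
    seen = {start}
    queue = deque([(start, 1)])
    while queue:
        node, count = queue.popleft()
        if node == end:
            return count
        for nxt in HUB_LINKS[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, count + 1))
    return None

for a, b in combinations(HUB_LINKS, 2):
    n = hubs_on_path(a, b)
    if n and n > MAX_HUBS:
        print(f"{a} <-> {b}: {n} hubs in the path, exceeds the limit of {MAX_HUBS}")
```

With this hypothetical layout the check reports five hubs between the first-floor and seventh-floor hubs, which mirrors the violation found during the on-site count.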
In general, an address conflict involving a router can create a severe routing bottleneck, and conflicts involving servers, switches, or routers can also cause serious bandwidth and load problems; conflicts involving ordinary workstations are comparatively less damaging. Network maintenance and management at the city's Administration for Industry and Commerce were practically nonexistent, a situation typical of many domestic networks. In recent years the emphasis has been on building networks; it is now time to give equal priority to keeping them healthy, because as networks grow in scale, speed, and complexity, latent problems become ever more serious.
4. Diagnostic Recommendations
Reconfigure the hub connections on the sixth and seventh floors, or re-run the cabling properly. Assign a specific person to be responsible for the backup router. Train network maintenance and management staff, equip them with suitable maintenance tools, and carry out regular testing and documentation of the network's operating status. Complete network documentation is essential, and the hardware records should include the MAC address of every machine.
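Even a very simple periodic script can keep the IP-to-MAC inventory current. The sketch below assumes the third-party scapy library, root privileges, and a placeholder subnet; it is only a starting point for the documentation practice recommended above, not a full asset-management solution.

```python
#!/usr/bin/env python3
"""Sketch of a minimal periodic documentation task: sweep the local subnet
with ARP and record the IP-to-MAC mapping to a dated CSV file. Assumes the
third-party scapy library (pip install scapy), root privileges, and the
hypothetical subnet 10.0.7.0/24."""
import csv
from datetime import date
from scapy.all import ARP, Ether, srp

SUBNET = "10.0.7.0/24"   # hypothetical office subnet

# Broadcast ARP requests for every address in the subnet.
answered, _ = srp(
    Ether(dst="ff:ff:ff:ff:ff:ff") / ARP(pdst=SUBNET),
    timeout=3, verbose=False,
)

filename = f"network-inventory-{date.today().isoformat()}.csv"
with open(filename, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["ip", "mac"])
    for _, reply in sorted(answered, key=lambda pair: pair[1].psrc):
        writer.writerow([reply.psrc, reply.hwsrc])

print(f"recorded {len(answered)} hosts to {filename}")
```

Comparing successive snapshots makes an undocumented device, such as a backup router that quietly comes back online, stand out immediately.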
5. Afterword
Three weeks later, the city's Administration for Industry and Commerce carried out a comprehensive cleanup of the entire network system. Afterwards we were invited back for a spot check of the network, which was found to be in fairly good condition. As for the earlier "backup router powering-on incident," no one has "confessed" to this day.