1. Symptoms
It’s the weekend, and I’m planning my holiday when I receive a call from a bank reporting a network outage at the West Area Branch it oversees. All 33 ATMs managed by the branch are unable to provide withdrawal services, and customers are expressing strong dissatisfaction. It has been two days, and the issue remains unresolved. They are requesting immediate assistance from the Network Hospital.
The West Area Branch and the main branch occupy two separate buildings within the same compound. A pair of 90-meter fiber optic cables links the branch’s network to the main branch’s network. Routers, servers, and other equipment are all located in the main branch’s data center, which runs 100BaseT Ethernet; the West Area Branch network is 10BaseT Ethernet. Five days ago, staff noticed the network slowing down, and users complained about long waits for ATM withdrawals.
Since the West Area Branch lacks any network testing and maintenance tools, they reached out to the network administrators at the main branch for assistance in diagnosing the issue. Network monitoring from the main branch’s end showed that everything was normal. When they accessed the West Area Branch’s switch MIB from the data center, traffic appeared normal at 5%, with only a small number of CRC/FCS errors. No significant issues were found after observing captured data packets with a protocol analyzer, leading them to suspect a virus affecting the West Area Branch’s subnet.
Yesterday, virus scans, system reinstallation, and data recovery were carried out, and the symptoms eased significantly. However, the network did not survive last night’s storm and was completely offline by this morning.
To facilitate troubleshooting, the main branch’s network administrators temporarily replaced the data center switch that connects to the West Area Branch with a hub, after which the main branch’s network slowed down as well. Examination of data transfers within the West Area Branch revealed no issues, leading to the conclusion that the problem lay in the transmission channel. Removing the fiber optic cable restored normal speed in the main branch, but reattaching it brought the same symptoms back. Additional tests on the fiber optic link showed that both continuity and attenuation met requirements. Troubleshooting came to a standstill.
According to the network administrators, the fiber optic cables and switches have passed preliminary testing and appear to be functioning correctly, so the problem can be tentatively placed in the link channel. With an F683 network tester connected to the West Area Branch’s switch, the local network appears to be functioning normally. In channel tests, ICMP pings from the West Area Branch to the main branch succeed only about 0.8% of the time, and route traces to the main branch’s server succeed about 0.5% of the time. Observed from the main branch’s hub, traffic is at 18%, which is within the normal range, but the tester reports a large share of “phantom interference” errors, labeled “Ghosts” (16%). Removing the fiber optic cable drops the error rate to 0%. This confirms that the fault involves the West Area Branch’s network and its channel.
To investigate further, the switch interface connecting the West Area Branch to the main branch was replaced with a 4-port hub. Observed with the F683 network tester, traffic is at 5%, but phantom interference reaches 97%. Once the fiber optic cable is removed, the errors disappear. A search for the fiber optic junction box found that the outer casing of the box on the main branch side had been deformed and damaged by an impact (reportedly from a crane arm during air conditioning installation six months ago), and that rainwater had completely saturated Junction #3, the one used to connect to the West Area Branch. Cleaning all the fiber optic connectors in the junction box, drying the fiber optic plug sockets with a hairdryer, and then replacing and sealing the junction box resolved the issue completely.
2. Diagnostic Comments
Fiber optic links are often overlooked. In this case, the fiber optic connectors were corroded and contaminated by rainwater, so a large part of the signal sent from the West Area Branch was reflected. Tested on its own, the physical performance of the fiber optic link looked satisfactory. However, because this segment of fiber is only 90 meters long, the strong reflected signals were barely attenuated before they overlapped with the normal signals and corrupted the data structure, including the format of the frame header signals. The network tester therefore classified what it saw as phantom interference rather than as normal data. Under these conditions, only a few frames get through, and only by chance. Because the hub and switch statistics available to the administrators do not report this kind of corruption, the administrators could see only a small number of FCS/CRC errors, caused by damage to the latter half of a data frame, and errors of that kind are easy to dismiss.
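To give a feel for why a short link leaves a reflection so close behind the original signal, the following is a rough back-of-the-envelope sketch, not a model of the actual fault. It assumes the echo travels one extra round trip of the 90-meter span, that the link runs at 10 Mb/s, and that the fiber has a group index of about 1.47 and a loss of about 3 dB/km. Only the 90-meter length comes from the case; the other figures are typical values chosen for illustration.

```python
# Rough estimate of how far a reflected copy lags the original signal.
# Only LINK_LENGTH_M comes from the case; the rest are assumed typical values.
SPEED_OF_LIGHT_M_PER_S = 299_792_458
GROUP_INDEX = 1.47           # assumed typical value for silica fiber
FIBER_LOSS_DB_PER_KM = 3.0   # assumed typical multimode loss
LINK_LENGTH_M = 90           # from the case description
BIT_RATE_BPS = 10_000_000    # assumed 10 Mb/s link


def echo_estimate(length_m, bit_rate_bps,
                  group_index=GROUP_INDEX,
                  loss_db_per_km=FIBER_LOSS_DB_PER_KM):
    """Return (delay in seconds, delay in bit times, extra fiber loss in dB)
    for an echo that travels one additional round trip of the link."""
    velocity_m_per_s = SPEED_OF_LIGHT_M_PER_S / group_index
    extra_path_m = 2 * length_m
    delay_s = extra_path_m / velocity_m_per_s
    delay_bits = delay_s * bit_rate_bps
    extra_loss_db = loss_db_per_km * extra_path_m / 1000.0
    return delay_s, delay_bits, extra_loss_db


delay_s, delay_bits, extra_loss_db = echo_estimate(LINK_LENGTH_M, BIT_RATE_BPS)
print(f"echo delay: {delay_s * 1e9:.0f} ns (about {delay_bits:.1f} bit times)")
print(f"extra fiber loss on the echo: {extra_loss_db:.2f} dB")
```

Under these assumptions the echo arrives a little under a microsecond late, which is only a handful of bit times, and loses roughly half a decibel in the fiber itself. That is consistent with the description above of a strong reflection overlapping the frame it came from.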
After the system was reinstalled yesterday, the weather improved, the fiber optic connectors performed somewhat better, and the symptoms eased, which made the reinstallation look like the cure. Heavy rainfall last night then brought the network down again. Because today’s tests showed normal fiber optic link performance, troubleshooting stalled with no solution in sight.
3. Recommendations
Switches are effective at balancing network load and at isolating the rest of the network from a faulty segment, but this same isolation can make them “black holes” for network management and monitoring. Regular testing with a network tester helps catch problems in their early stages. There are several types of regular testing, which we will introduce in upcoming installments. If a fault like this one is not addressed promptly, other networks connected through the fiber optic junctions will also develop severe problems one after another.
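As one illustration of what a regular check might look like, the sketch below computes utilization and error percentages from two readings of a port’s cumulative counters and flags the link when either crosses a threshold. The `CounterSample` structure, the function names, and the threshold values are hypothetical and are not taken from the F683 or any particular switch. Counter wrap-around is ignored for brevity.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """One reading of a port's cumulative counters (hypothetical structure)."""
    timestamp_s: float
    octets: int
    frames: int
    error_frames: int


def link_health(prev, curr, link_speed_bps):
    """Return (utilization %, error %) for the interval between two samples.

    Counter wrap-around is not handled in this sketch.
    """
    interval_s = curr.timestamp_s - prev.timestamp_s
    if interval_s <= 0:
        raise ValueError("samples must be in chronological order")
    bits = (curr.octets - prev.octets) * 8
    frames = curr.frames - prev.frames
    errors = curr.error_frames - prev.error_frames
    utilization_pct = 100.0 * bits / (interval_s * link_speed_bps)
    error_pct = 100.0 * errors / frames if frames else 0.0
    return utilization_pct, error_pct


def needs_attention(utilization_pct, error_pct,
                    max_utilization_pct=40.0, max_error_pct=1.0):
    """Flag the link if either figure exceeds its (assumed) threshold."""
    return utilization_pct > max_utilization_pct or error_pct > max_error_pct


# Made-up readings taken 60 seconds apart on a 10 Mb/s link.
before = CounterSample(timestamp_s=0.0, octets=0, frames=0, error_frames=0)
after = CounterSample(timestamp_s=60.0, octets=3_750_000,
                      frames=10_000, error_frames=1_600)

utilization, errors = link_health(before, after, link_speed_bps=10_000_000)
print(f"utilization {utilization:.1f}%, errors {errors:.1f}%")
print("needs attention:", needs_attention(utilization, errors))
```

With these made-up readings, utilization looks harmless while the error share is high, which is the pattern seen in this case: traffic alone would not have raised an alarm.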