1. Symptoms
Today, the Information Center of a city’s Industrial and Commercial Bureau reported a case to the Internet Hospital: their critical enterprise data server frequently suffers from “blocking.” The report was prompted by repeated complaints from staff at the business acceptance offices and other sites, who encounter “obstructions,” slow responses, or temporary business interruptions when querying or verifying enterprise data and when registering new businesses. The problem is intermittent, and despite multiple checks, attempts to remove “malware,” and replacing the server with a faster one, the root cause has not been found. They need help finding the “culprit.”
Walking into the Information Center of the Industrial and Commercial Bureau reveals a bright, brand-new server room with a huge network topology diagram on the front wall. The diagram clearly shows the models, names, locations, speeds, link types, and connection relationships of the online devices and network equipment, among other details. At first glance, their network management appears to be in good order.
However, investigation shows that the actual network structure differs considerably from the topology diagram. Most of the machines used for business are still located in the old information center server room; only critical equipment such as the enterprise data server has been installed in the information center server room of the new Industrial and Commercial Building, where it is connected to the office network. The new building and the old information center are approximately 2000 meters apart and are connected through optical cable and routers, with a firewall on the office network side. Most office network users can access the Internet through a WAN link. The Information Center Director explained that, under the engineering plan, all equipment and personnel from the original server room were to be relocated to the new building’s server room. However, because of problems with the new building’s construction quality, only part of the equipment and most of the personnel were moved two months ago. To avoid disrupting operations, the equipment was temporarily rearranged and the network kept running. Everything seemed fine until about a month ago, when the problems began.
This Information Center is responsible for eight Industrial and Commercial Sub-Bureaus and 76 Commercial Offices, managing their network connections and business support. Sub-bureaus are connected using frame relay links, and Commercial Offices are connected to the Sub-bureaus via DDN, ISDN, or dial-up connections. A firewall isolates the business network from the office network, and, as per design requirements, most business network users should not have access to the Internet.
2. Diagnostic Process
The network management system installed in the office network shows the enterprise data server’s traffic at 28%, which is normal. A connectivity test run from the office network with a network testing tool (F683) shows 0% packet loss, indicating that the server is currently working correctly. When a network assistant is used to send traffic at a 10% load to the server, the server responds to users in the office network, but a few “unresponsive” records appear for users in the original business network. This suggests that the problem most likely lies within the original business network.
Moving the network testing tool to the old Information Center building gives the following results: network traffic is 45% (slightly high), the collision rate is 3%, the error rate is 0%, and the broadcast rate is 7% (slightly high). Overall, the network appears to be functioning normally, and the network protocol distribution also looks normal. However, the data packet exchange matrix shows “unresponsive” records for access to the enterprise data server. The issue is widespread, affecting almost 40% of workstations.
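The matrix check described above can be approximated offline. The following is a minimal sketch, assuming the conversation records have been exported as (source, destination, requests, responses) tuples. The record layout, the addresses, and the function name are hypothetical and are not taken from the F683 tool.

```python
def unresponsive_sources(records, server):
    """Find sources with unanswered requests to `server`.

    records: iterable of (src, dst, requests, responses) tuples (assumed layout).
    Returns (sorted list of affected sources, share of sources affected).
    """
    talked = set()    # every source that sent something to the server
    affected = set()  # sources with at least one unanswered request
    for src, dst, requests, responses in records:
        if dst != server:
            continue
        talked.add(src)
        if responses < requests:
            affected.add(src)
    share = len(affected) / len(talked) if talked else 0.0
    return sorted(affected), share


# Hypothetical sample data for illustration only.
SERVER = "10.1.0.10"
sample = [
    ("10.2.0.21", SERVER, 40, 40),
    ("10.2.0.22", SERVER, 35, 20),
    ("10.2.0.23", SERVER, 12, 12),
    ("10.2.0.24", SERVER, 50, 31),
    ("10.2.0.25", SERVER, 8, 8),
    ("10.2.0.21", "10.2.0.1", 5, 5),  # not addressed to the server, ignored
]
hosts, share = unresponsive_sources(sample, SERVER)
print(f"{len(hosts)} affected hosts, {share:.0%} of those talking to the server")
```

A high share of affected sources, combined with normal collision and error rates, points away from the cabling and toward an addressing or routing problem.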
To check whether this is a data link problem, ICMP Ping and ICMP Monitor tests were run. The former reported two MAC addresses responding; the latter reported numerous “destination unreachable,” “redirect,” and “congestion” alerts. Two MAC addresses answering for one IP address indicates a duplicate IP address on the data link, and the alerts indicate problems with how data frames are being routed. Enabling the testing tool’s automatic network search function revealed redundant route resolution (Proxy) operations, but no duplicate IP addresses were found within this network segment, which indicates that the duplicate lies on the data access path to the server.
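The key signal in this step is a single IP address being answered by more than one MAC address. The following is a minimal sketch of that check, assuming the replies have already been captured as (IP, MAC) pairs by some other means. The function name and the sample addresses are hypothetical.

```python
from collections import defaultdict


def find_duplicate_ips(replies):
    """Return {ip: sorted MACs} for every IP answered by more than one MAC.

    replies: iterable of (ip, mac) pairs taken from ping or ARP replies.
    MAC addresses are lower-cased so the same adapter is not counted twice.
    """
    macs_by_ip = defaultdict(set)
    for ip, mac in replies:
        macs_by_ip[ip].add(mac.lower())
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}


# Hypothetical capture: the server's address is answered by two adapters.
captured = [
    ("10.1.0.10", "00:11:22:33:44:55"),
    ("10.1.0.10", "00:AA:BB:CC:DD:EE"),
    ("10.2.0.21", "00:11:22:33:44:66"),
    ("10.2.0.21", "00:11:22:33:44:66"),
]
for ip, macs in find_duplicate_ips(captured).items():
    print(f"duplicate IP {ip}: answered by {', '.join(macs)}")
```

Repeated replies from the same adapter are harmless; only distinct MAC addresses for one IP are reported.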
Because the network management team had no MAC address backup documentation, it was recommended that all local workstations in the old building be shut down. This immediately restored normal network operation. The workstations were then powered on one at a time to find the one conflicting with the server. This revealed that one of the two workstations used by the staff in charge of the old building had the same IP address as the enterprise data server. Further inspection of this workstation found an unauthorized Proxy running on it, consistent with the network search results.
3. Conclusion
There are three reasons for the problem. The first is a duplicate IP address; the second is the operation of an unauthorized routing proxy. When business network users issue address resolution requests, the staff workstation and the data server conflict, seriously disrupting data flow (note that the data frame structure itself remains normal). Users’ access is obstructed, and application software repeatedly requests reconnection and data retransmission, which raises traffic and slows business processing. Because the conflict occurs mainly within the original information center network, the enterprise data server’s traffic appears normal and the network management system reports no errored packets.
The third reason is the lack of supervision over the staff remaining in the old building. Out of “boredom” (their own description), they set up “unauthorized” Internet access, quickly became “daytime web worms,” and disrupted regular business processes. Their activity was intermittent, which is why the problem went unresolved for over a month.
In reality, Internet users in the office network were also somewhat affected, but it went unnoticed due to the lower frequency of daytime user activity.
4. Diagnostic Recommendations
Most network management vulnerabilities originate from internal staff. Establishing a strict internal management mechanism is crucial. Additionally, it is recommended to include MAC address backups in essential documentation. Furthermore, daily automatic searches of network status will help quickly identify and eliminate unauthorized users.
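The value of a MAC address record is that a daily scan can be compared against it automatically. The following is a minimal sketch of such a comparison, assuming the record is kept as a MAC-to-IP mapping and the daily scan yields (IP, MAC) pairs. The data layout, the function name, and the sample addresses are hypothetical.

```python
def audit_against_baseline(baseline, scan):
    """Compare today's scan with the documented MAC address record.

    baseline: dict mapping MAC -> documented IP.
    scan: iterable of (ip, mac) pairs observed on the network today.
    Returns (unknown, moved):
      unknown: [(ip, mac)] for adapters missing from the record
      moved:   [(mac, documented_ip, observed_ip)] for known adapters
               seen on an address other than the documented one
    """
    known = {mac.lower(): ip for mac, ip in baseline.items()}
    unknown, moved = [], []
    for ip, mac in scan:
        mac = mac.lower()
        if mac not in known:
            unknown.append((ip, mac))
        elif known[mac] != ip:
            moved.append((mac, known[mac], ip))
    return unknown, moved


# Hypothetical record and scan for illustration only.
record = {
    "00:11:22:33:44:55": "10.1.0.10",  # enterprise data server
    "00:11:22:33:44:66": "10.2.0.21",  # staff workstation
}
todays_scan = [
    ("10.1.0.10", "00:11:22:33:44:55"),
    ("10.1.0.10", "00:11:22:33:44:66"),  # workstation using the server's IP
    ("10.2.0.30", "00:de:ad:be:ef:01"),  # adapter not in the record
]
unknown, moved = audit_against_baseline(record, todays_scan)
print("unknown adapters:", unknown)
print("adapters on undocumented addresses:", moved)
```

With such a record in place, a workstation that takes over the server’s address would show up in the second list, without having to shut down every machine to find it.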
In a healthy network maintenance plan, regular testing (including daily and cyclical tests) has long been a standard practice. Consistently conducting necessary daily tests and checks will ensure that 99.9% of network problems can be resolved within two days or less.
5. Afterword
A month later, users reported that all equipment had been moved to the new location. They now maintain regular testing and recording of network operations, and the network is working well. The nerve-wracking days have finally come to an end, and everyone can breathe a sigh of relief.