1. Symptoms
At a telecommunications company's mobile billing center, staff reported that the total number of mobile users had grown by nearly 30% over the past three months, yet mobile billing revenue had risen by only 5%. This raised doubts about whether the billing system was working properly. A check of the billing server's charge records revealed no issues, and the billing server software appeared to be functioning correctly. On the financial server's side, the billing data shown on the financial server matched the data from the billing server. The records at the local telephone exchange, however, outnumbered the mobile billing records. In an on-site test in which mobile phones were dialed 50 times, only 45 billing records were generated, and only 30 of those matched the actual call durations. After a week, the source of the problem still had not been determined.
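The dial-test figures above can be turned into rates that are easier to compare between test runs. The following is a minimal sketch, not part of the original investigation; the function name `summarize_dial_test` and the three rate names are illustrative assumptions.

```python
def summarize_dial_test(calls_placed, records_generated, records_accurate):
    """Summarize a dial test as fractions of calls and of billing records."""
    if calls_placed <= 0:
        raise ValueError("calls_placed must be positive")
    if not 0 <= records_accurate <= records_generated <= calls_placed:
        raise ValueError("expected accurate <= generated <= placed")

    missing = calls_placed - records_generated        # calls with no billing record
    inaccurate = records_generated - records_accurate  # records with a wrong duration
    return {
        # Share of test calls that produced no billing record at all.
        "missing_rate": missing / calls_placed,
        # Share of generated records whose duration did not match the call.
        "inaccurate_rate": inaccurate / records_generated if records_generated else 0.0,
        # Share of test calls that were billed correctly.
        "correct_rate": records_accurate / calls_placed,
    }


# Figures from the on-site test: 50 calls, 45 records, 30 accurate records.
summary = summarize_dial_test(50, 45, 30)
for name, value in summary.items():
    print(f"{name}: {value:.1%}")
```

With the figures from the test, 10% of calls produced no record, one-third of the generated records were inaccurate, and only 60% of calls were billed correctly.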
2. Diagnostic Process
The billing server was connected to port 5 in slot 1 of a 16-port switch, a Bay28115. Port 6 was linked to a 100Mbps Ethernet segment, and an HP OpenView network management system was also attached to this switch. When the network administrators tried to monitor the performance of port 5, they found that they could not access any data records for that port. When questioned, the network management staff explained that three months earlier they had replaced the faulty Bay28115 switch with a spare, which had been working fine ever since. The maintenance work records and logs contained no information about the Bay28115 switch and no records of network performance parameters. Asked why they had not enabled SNMP support and the management information base (MIB) on the switch, they replied that the network management system had been installed a year earlier and was used mainly to check whether equipment was connected and whether any alarms had been raised. The previous network administrator had since been reassigned, and no one else knew how to set up or use the network management system. Because the system had been installed by a system contractor and no problems were noticed after the switch was changed, no detailed inspection had been carried out.
Using a network tester with protocol dialog analysis capability, the team observed the billing server's behavior from the network segment where the network management system was located and found that the server did not respond to approximately one-third of the data packets sent to it. To avoid disrupting the system during the day, when mobile users were active, they waited until 3:00 AM and used an F683 network tester to simulate the server on its link. The tester showed the link operating at 10Mbps, whereas the original records indicated that this port should have been running at 100Mbps. Because SNMP was not enabled on the switch, they temporarily inserted a 10Mbps hub between port 5 and the server. From another port on the hub, they used the network tester to send traffic to the billing server and observed the server's data flow. This revealed a large number of collisions and FCS frame errors: at 30% traffic load, collisions and errors together amounted to 21%. An inspection of the server's cable found severe near-end crosstalk (NEXT) at the end nearest the switch.
After the plug was replaced and wired correctly, the collision rate dropped to 0.5% and the error rate to 0%. The temporary hub was then removed and SNMP support on the switch was enabled. The team sent traffic to the server from an idle switch port while watching the performance of port 5, where the billing server was connected. At 40Mbps of traffic, all parameters, including collision rate, error rate, and broadcast rate, were excellent, and the server had auto-negotiated a 100Mbps link. The 50-call dialing test was repeated twice, and the billing data was now entirely accurate. The team was reasonably certain that the billing system had fully recovered.
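The utilization, collision, and error percentages quoted above are the kind of figures a tester derives from raw counters over a sampling interval. The sketch below shows one plausible way to compute them. The source does not say how the F683 defines these rates, so the formulas (collisions and FCS errors as a percentage of observed frames, utilization ignoring preamble and inter-frame gap), the `SegmentSample` type, and the health thresholds are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SegmentSample:
    """Counters collected on a shared segment over one interval (hypothetical type)."""
    interval_s: float   # length of the sampling interval in seconds
    link_bps: float     # nominal link speed in bits per second
    octets: int         # bytes observed on the segment
    frames: int         # frames observed on the segment
    collisions: int     # collision events observed
    fcs_errors: int     # frames with a bad frame check sequence


def utilization_pct(s: SegmentSample) -> float:
    # Simplified: counts frame bytes only, ignoring preamble and inter-frame gap.
    return 100.0 * s.octets * 8 / (s.link_bps * s.interval_s)


def collision_pct(s: SegmentSample) -> float:
    return 100.0 * s.collisions / s.frames if s.frames else 0.0


def fcs_error_pct(s: SegmentSample) -> float:
    return 100.0 * s.fcs_errors / s.frames if s.frames else 0.0


def is_healthy(s: SegmentSample, max_collision_pct=5.0, max_error_pct=1.0) -> bool:
    # Thresholds are illustrative assumptions, not values from the case study.
    return collision_pct(s) <= max_collision_pct and fcs_error_pct(s) <= max_error_pct


# Made-up counters chosen only to exercise the formulas; not measured data.
sample = SegmentSample(interval_s=10, link_bps=10e6, octets=3_750_000,
                       frames=10_000, collisions=1_500, fcs_errors=600)
print(f"utilization {utilization_pct(sample):.1f}%, "
      f"collisions {collision_pct(sample):.1f}%, "
      f"FCS errors {fcs_error_pct(sample):.1f}%, "
      f"healthy={is_healthy(sample)}")
```

Keeping the three rates separate, rather than reporting one combined figure, makes it easier to tell a congested segment (collisions only) from a damaged cable (FCS errors as well).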
3. Conclusion
The cause of this issue was quite simple (a faulty plug), but its symptoms were more complex. The server used a 10/100Mbps auto-sensing Ethernet card and was intended to run on a 100Mbps link. When the network administrators replaced the switch, they accidentally damaged the plug, and although they replaced it, the latent fault remained. As a result, the server's actual working speed fell to 10Mbps, and the maintenance personnel did not notice the change. SNMP support was not enabled on the new switch, so the network management system could not monitor the performance of the billing server's port. The maintenance staff at the billing center also did not routinely use the network management system to observe and record network performance parameters, so when the fault occurred they had no way of detecting the change in the server's working speed. Notably, had the cabling been sound, the billing server would still have functioned correctly at a 10Mbps link speed, because billing traffic is typically not very heavy. In this case, however, the high collision and error rates during peak billing periods left the server unable to process a portion of the data packets, which led to inaccurate billing data.
4. Diagnostic Recommendations
Cabling systems should be tested regularly as part of routine maintenance (testing every one to two years is essential). The network should also be tested whenever network components are replaced; 100Mbps links in particular should be checked with a cable tester. Network management systems should be operated and maintained by designated personnel. Network management systems can generally cover around 35% of network issues, so it is strongly advised that important networks be equipped with SNMP or RMON support, and that the SNMP and RMON features of existing network devices be enabled. Otherwise, the network management system is virtually useless. Timely and complete documentation is necessary for effective maintenance and fault resolution.
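As an illustration of the periodic checks recommended above, the sketch below compares a port's current readings with a recorded baseline and with the previous poll. It assumes the readings have already been collected by some SNMP tool and passed in as plain dictionaries; the function `check_port` and the dictionary layout are hypothetical. The object names `ifSpeed` and `ifInErrors` and their OIDs come from the standard IF-MIB.

```python
# Standard IF-MIB column OIDs (append the interface index to address one port).
IF_SPEED_OID = "1.3.6.1.2.1.2.2.1.5"       # ifSpeed, bits per second
IF_IN_ERRORS_OID = "1.3.6.1.2.1.2.2.1.14"  # ifInErrors, Counter32

COUNTER32_MODULUS = 2 ** 32


def check_port(baseline, previous, current, max_new_errors=0):
    """Return a list of findings for one port; an empty list means no anomaly.

    baseline, previous, current: dicts with "ifSpeed" and "ifInErrors" keys
    (hypothetical layout; how the values are polled is outside this sketch).
    """
    findings = []

    # A documented 100Mbps port that now reports 10Mbps is a warning sign.
    if current["ifSpeed"] != baseline["ifSpeed"]:
        findings.append(
            f"speed changed: expected {baseline['ifSpeed']} bps, "
            f"got {current['ifSpeed']} bps"
        )

    # Counter32 values wrap at 2**32, so take the difference modulo 2**32.
    new_errors = (current["ifInErrors"] - previous["ifInErrors"]) % COUNTER32_MODULUS
    if new_errors > max_new_errors:
        findings.append(f"{new_errors} new input errors since last poll")

    return findings


# Illustrative readings only: a port documented at 100Mbps that reports 10Mbps.
baseline = {"ifSpeed": 100_000_000, "ifInErrors": 0}
previous = {"ifSpeed": 10_000_000, "ifInErrors": 1_200}
current = {"ifSpeed": 10_000_000, "ifInErrors": 1_850}
for finding in check_port(baseline, previous, current):
    print(finding)
```

Recording the baseline when a device is installed or replaced is what makes this comparison possible; without it, a port running at the wrong speed looks no different from a port that is working as designed.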
5. Afterword
After a month of operation, the billing system contributed 35% of the total revenue, to everyone's delight. Having tasted some success, the billing center decided last week to send two maintenance personnel to a one-week "Network Maintenance and Fault Diagnosis Technology" training course at the "Network Academy."