Prepare to Troubleshoot Network Performance Issues
Before you troubleshoot network performance issues such as packet loss or latency, run network tests to pinpoint the source of the problem. The following steps can help you distinguish between network and application issues. Benchmarking performance beforehand gives you a baseline to compare against when issues appear.
Before you start troubleshooting, check the following:
- Be sure that the network utilities are installed on both endpoints (on the EC2 instance and the on-premises host).
- Use an EC2 instance that supports enhanced networking, and be sure that the drivers are up to date. Enhanced networking provides higher I/O with low CPU utilization, which helps avoid instance-level issues when running performance tests. If enhanced networking isn’t turned on, see Enhanced networking on Linux or Enhanced networking on Windows.
- Connect to your EC2 instance and confirm that there’s end-to-end connectivity between the instance and your on-premises host.
Troubleshooting Network Performance Issues on Linux
Install the following tools and follow the steps to help troubleshoot network performance problems and test your network:
- AWSSupport-SetupIPMonitoringFromVPC to collect network metrics such as packet loss, latency, MTR, tcptraceroute, and tracepath.
- MTR to check for ICMP or TCP packet loss and latency problems.
- Traceroute to determine latency or routing problems.
- Hping3 to determine end-to-end TCP packet loss and latency problems.
- Tcpdump to analyze packet capture samples.
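Before starting, it can help to confirm which of these utilities are already present on each endpoint. The following is a minimal sketch, not a definitive check: the function name missing_tools is hypothetical, and the binary names (mtr, traceroute, hping3, tcpdump) are assumed to match the package names used in the install commands in this guide.

```shell
# Print the name of every listed command that is not found on PATH.
# Prints nothing when all of the commands are available.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example (run on both the EC2 instance and the on-premises host):
#   missing_tools mtr traceroute hping3 tcpdump
```

Any name that is printed still needs to be installed with the commands shown in the steps that follow.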
Step 1. Examine traceroute or MTR reports by starting from the bottom and working upward
Begin by checking for loss or latency at the final hop or destination, then review the preceding hops. If packet loss or latency persists through the final hop, it could indicate a network or routing problem. If packet loss or latency occurs at one hop along the path, it might be due to control plane rate limiting on that node. Ensure that the last reported hop matches the intended destination in the command; if not, there may be issues caused by a restrictive security group.
Utilize the AWSSupport-SetupIPMonitoringFromVPC tool to assess performance, as it collects crucial metrics to troubleshoot network performance problems. For detailed guidance, refer to Amazon VPC’s Debugging tool for network connectivity.
Troubleshoot network performance issues on Linux by checking its performance statistics. Assess CPU, memory utilization, and load average if you have access to the source or destination instance.
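As one way to put the load average in context, the sketch below divides the 1-minute load average by the number of CPUs. It assumes a Linux host that exposes /proc/loadavg and provides the nproc command. The function name load_per_cpu is hypothetical, and what counts as a high value depends on your workload.

```shell
# Print the 1-minute load average divided by the CPU count.
#   $1: contents of /proc/loadavg (the first field is the 1-minute load average)
#   $2: number of CPUs
load_per_cpu() {
  printf '%s\n' "$1" | awk -v cpus="$2" '{ printf "%.2f\n", $1 / cpus }'
}

# Example on the instance under test:
#   load_per_cpu "$(cat /proc/loadavg)" "$(nproc)"
```

A value that stays near or above 1.00 suggests the instance itself may be a bottleneck, so it is worth ruling that out before attributing latency to the network.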
Use the MTR command on Linux for continuous, real-time network performance analysis. MTR combines traceroute and ping functionalities and comes preinstalled on most Linux distributions. Alternatively, you can install it from your distribution’s software package manager.
To install MTR, run the following commands:
Amazon Linux:
sudo yum install mtr
Ubuntu:
sudo apt-get install mtr-tiny
To assess your network’s performance using MTR, conduct bidirectional tests between the public IP addresses of your EC2 instances and your on-premises host. Paths on a TCP/IP network may differ when the direction is reversed, so it’s crucial to gather MTR results for both directions. Consider using TCP-based tracing instead of ICMP, as many internet devices deprioritize ICMP-based trace requests.
Examine packet loss carefully. Single-hop packet loss typically isn’t concerning and might result from a control plane policy dropping “ICMP time exceeded” messages. However, sustained packet loss to the destination hop or across multiple hops could indicate a problem.
Note: It’s common to see a few requests time out.
ICMP-based MTR:
mtr -n -c 200 <Public IP EC2 instance/on-premises host> --report
TCP-based MTR:
mtr -n -T -c 200 <Public IP EC2 instance/on-premises host> --report
The -T argument enables TCP-based MTR, while the --report option sets MTR to report mode. In report mode, MTR runs for the number of cycles specified with the -c option, prints statistics, and then exits.
Note that TCP-based MTR tests destination TCP port 80 by default. To specify a different destination TCP port, use the -P option followed by the port number. For example, to test destination TCP port 443, use the following command:
mtr -n -T -c 200 <Public IP EC2 instance/on-premises host> -P 443 --report
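The bottom-up reading described in this step starts with the last hop. The sketch below pulls that hop and its loss percentage out of a saved report. It assumes the usual --report layout, in which each hop line begins with a hop number followed by ".|--" and the loss value is the third column. The function name mtr_final_hop is hypothetical.

```shell
# Read an MTR report on standard input and print "<last hop> <loss percent>".
# Assumes hop lines look like: "  3.|-- 203.0.113.10  1.5%  200  20.1 ..."
mtr_final_hop() {
  awk '/^ *[0-9]+[.][|]--/ { host = $2; loss = $3 }
       END {
         sub(/%$/, "", loss)              # drop the trailing percent sign
         if (host != "") print host, loss
       }'
}

# Example:
#   mtr -n -T -c 200 <Public IP EC2 instance/on-premises host> --report | mtr_final_hop
```

If the printed hop is not the destination you passed to mtr, or its loss is sustained rather than near zero, continue with the preceding hops as described above.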
Step 2. Test performance using traceroute
The Linux traceroute tool maps the path from a client node to a destination node, recording the response time in milliseconds for each router along the way. It also calculates the time each hop takes to reach its destination.
To install traceroute, use the following commands:
Amazon Linux:
sudo yum install traceroute
Ubuntu:
sudo apt-get update
sudo apt-get install traceroute
Note: Traceroute isn’t necessary if you run an MTR report. MTR provides latency and packet loss statistics to a destination.
Ensure that port 22 or the port you’re testing is open in both directions. When troubleshooting network connectivity with traceroute, execute the command from the client to the server and from the server back to the client. Paths between nodes on a TCP/IP network can vary if the direction is reversed. Opt for a TCP-based trace (using your application port) instead of ICMP, as most internet devices deprioritize ICMP-based trace requests.
ICMP-based traceroute:
sudo traceroute -I <Public IP of EC2 instance/on-premises host>
TCP-based traceroute:
sudo traceroute -n -T -p 22 <Public IP of EC2 instance/on-premises host>
In this command, -T selects TCP probes, -p 22 sets the destination port to 22, and -n skips resolving IP addresses to hostnames.
Note: You can use your application-specific port for testing. Testing with that port shows whether any intermediate device in the path is dropping your application traffic.
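To compare hops at a glance, the per-probe response times can be reduced to one average per hop. The sketch below assumes the common Linux traceroute layout, in which each hop line starts with the hop number and every response time is followed by "ms". The function name traceroute_hop_avg is hypothetical.

```shell
# Read traceroute output on standard input and print "<hop> <average RTT in ms>".
# Hops where no probe was answered are printed as "<hop> timeout".
traceroute_hop_avg() {
  awk '$1 ~ /^[0-9]+$/ {
         sum = 0; n = 0
         for (i = 2; i < NF; i++)
           if ($(i + 1) == "ms") { sum += $i; n++ }   # a value followed by "ms" is an RTT
         if (n > 0) printf "%d %.3f\n", $1, sum / n
         else       printf "%d timeout\n", $1
       }'
}

# Example:
#   sudo traceroute -n -T -p 22 <Public IP of EC2 instance/on-premises host> | traceroute_hop_avg
```

A sharp jump in the average that persists through every later hop is more meaningful than a high value at a single intermediate hop.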
Step 3. Test performance using hping3
Hping3 is a versatile command-line TCP/IP packet assembler and analyzer that measures packet loss and latency over a TCP connection. In addition to ICMP echo requests, hping3 supports TCP, UDP, and RAW-IP protocols. It also includes a traceroute mode and can send files over a covert channel. Hping3 is valuable for scanning hosts, assisting with penetration testing, testing intrusion detection systems, and transferring files between hosts.
Unlike MTR and traceroute, which capture per-hop latency, hping3 provides end-to-end min/avg/max latency over TCP, along with packet loss statistics. To install hping3, run the following commands:
Amazon Linux 2 (first install the EPEL release package for RHEL 7 to activate the EPEL repository, then install hping3 from it):
sudo amazon-linux-extras install epel -y
sudo yum --enablerepo=epel install hping3
Ubuntu:
sudo apt-get install hping3
The following command sends 50 TCP SYN packets to port 0. By default, hping3 sends TCP headers to the target host’s port 0 with a window size of 64 and no TCP flags set; the -S option sets the SYN flag, -c 50 limits the run to 50 packets, and -V turns on verbose output:
sudo hping3 -S -c 50 -V <Public IP of EC2 instance/on-premises host>
The following command sends 50 TCP SYN packets over port 22:
sudo hping3 -S -c 50 -V <Public IP of EC2 instance/on-premises host> -p 22
Note: Be sure that port 22 or the port that you’re testing is open.
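To compare runs against a benchmark, the end-to-end figures can be extracted from the closing summary. The sketch below rests on assumptions to verify on your system: that the summary contains a line ending in "% packet loss" and a line of the form "round-trip min/avg/max = a/b/c ms", and that hping3 may write this summary to standard error (hence the 2>&1 in the usage line). The function name hping3_summary is hypothetical.

```shell
# Read hping3 output on standard input and print the loss percentage
# and the min/avg/max round-trip times from the closing summary.
hping3_summary() {
  awk '/packet loss/ { for (i = 1; i <= NF; i++) if ($i ~ /%$/) loss = $i }
       /^round-trip/ { split($4, rtt, "/") }      # $4 holds "min/avg/max" values
       END {
         sub(/%$/, "", loss)
         print "loss_pct=" loss, "min_ms=" rtt[1], "avg_ms=" rtt[2], "max_ms=" rtt[3]
       }'
}

# Example:
#   sudo hping3 -S -c 50 -V <Public IP of EC2 instance/on-premises host> -p 22 2>&1 | hping3_summary
```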
Step 4. Test packet capture samples using tcpdump
When diagnosing packet loss or latency issues, it’s advisable to conduct simultaneous packet captures on both your EC2 instance and on-premises host. This approach enables the identification of request and response packets, aiding in the isolation of issues at the networking and application layers. To ensure comprehensive packet capture, it’s recommended to start the packet capture before initiating the traffic flow.
To install tcpdump, execute the following commands:
Amazon Linux:
sudo yum install tcpdump
Ubuntu:
sudo apt-get install tcpdump
After tcpdump is installed, you can run the following command to capture TCP port 22 traffic and save it in a pcap file:
sudo tcpdump -i eth0 port 22 -s0 -w samplecapture.pcap
Note: The tcpdump flag -i specifies the interface on the instance where tcpdump captures the traffic. You might need to change the interface from eth0 to the configured interface in your environment.
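If you are unsure which interface to pass to -i, one option is to use the interface that carries the default route. The sketch below assumes a Linux host with the ip command from iproute2, whose default route line contains "dev" followed by the interface name. The function name default_iface is hypothetical. On hosts with several default routes it prints only the first one.

```shell
# Read "ip route show default" output on standard input and
# print the interface name that follows the "dev" keyword.
default_iface() {
  awk '$1 == "default" {
         for (i = 2; i < NF; i++)
           if ($i == "dev") { print $(i + 1); exit }
       }'
}

# Example:
#   sudo tcpdump -i "$(ip route show default | default_iface)" port 22 -s0 -w samplecapture.pcap
```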
How to Troubleshoot Network Performance on Windows
Step 1. Check for ECN capability
1. Run the following command to determine if Explicit Congestion Notification (ECN) capability is turned on:
netsh interface tcp show global
2. If ECN capability is activated, run the following command to deactivate it:
netsh interface tcp set global ecncapability=disabled
3. If you don’t see an improvement in performance, you can re-activate ECN capability using the following command:
netsh interface tcp set global ecncapability=enabled
Step 2. Review hops and troubleshoot TCP port connectivity
First, use MTR or tracert to review hops:
MTR method:
1. Download and install WinMTR.
2. Enter the destination IP in the Host section, and then choose Start.
3. Let the test run for a minute, and then choose Stop.
4. Choose Copy text to clipboard and paste the output in a text file.
5. Look for any losses in the % column that are propagated to the destination.
Note: Ignore any hops with the No response from host message. This message indicates that those particular hops aren’t responding to the ICMP probes.
6. Review hops on the MTR reports using a bottom-up approach. For example, check for loss on the last hop or destination, and then review the preceding hops.
Tracert method:
If you don’t want to install WinMTR, you can use the built-in tracert command-line utility.
1. Perform a tracert to the destination URL or IP address.
2. Look for any hop that shows an abrupt spike in round-trip time (RTT). An abrupt spike in RTT might indicate that there’s a node under high load, which in turn induces latency or packet drops in your traffic.
Note: The -d option doesn’t resolve IP addresses to hostnames. Remove -d if IP to hostname resolution is required.
tracert -d <Public IP of EC2 instance/on-premises host>
Then, check TCP port connectivity.
Note: Because WinMTR and tracert are both ICMP-based, you can use tracetcp to troubleshoot TCP port connectivity.
1. Download WinPcap and tracetcp.
2. Extract the tracetcp ZIP file.
3. Copy tracetcp.exe to your C drive.
4. Install WinPcap.
5. Open the command prompt and change to the root of your C drive by running cd \ (for example, at the C:\Users\username> prompt).
6. Run tracetcp using one of the following commands: tracetcp.exe hostname:port or tracetcp.exe ip:port.
Step 3. Check the Windows Task Manager
If you can access the source or destination instance, check the Windows Task Manager for any issues related to CPU and memory utilization or load average.
Step 4. Take a packet capture
Note: When diagnosing packet loss or latency issues, it’s recommended to perform simultaneous packet captures on both your EC2 instance and your on-premises host. This allows you to identify request and response packets, isolating the issue at the networking and application layers. It’s also advisable to start the packet capture before initiating traffic to ensure all packets are captured for the flow.
- Install Wireshark and initiate a packet capture.
- Apply the following filter to isolate traffic between specific sources: (ip.addr eq source_IP) && (tcp.flags.syn == 1). This will display all TCP streams initiated by the specified source IP.
- Select the row corresponding to the relevant source and destination IP.
- Right-click and choose “Follow” > “TCP Stream” from the context menu. This will show the TCP flow between the source and destination IPs for investigation.
- Look for retransmissions, duplicate packets, or TCP window size notifications such as “TCP window full” or “Window size zero.” These indications may suggest that TCP buffers are running out of space.
If packet loss is detected or if the number of hops changes significantly from your benchmarks, consult your networking equipment vendor documentation. In multi-homed network environments, conduct these tests using a different ISP.