Linux Network Troubleshooting Commands [Full Guide]

One of the biggest challenges I have faced in Linux network troubleshooting has always been bridging the gap between networking and systems engineering. System administrators often lack visibility into the network and frequently blame it for outages or peculiar problems. Conversely, network administrators, who have no control over the servers, suffer from a constant state of “guilty until proven innocent” fatigue and frequently attribute issues to the endpoints.

In this guide, I cover the basics of Linux network troubleshooting from the command line, layer by layer.

Figure: TCP/IP model of the Linux command line

Linux Network Troubleshooting: Physical Layer

Let’s start with the most basic question: how do you tell if a physical interface is up? Use the ip link show command.

A DOWN indication for an interface such as ens192 in this output means that the link is not up. First, make sure the interface has not simply been disabled: an administratively disabled interface lacks the UP flag in its flag list, while an enabled interface with no link reports NO-CARRIER. You can then continue troubleshooting by checking the cabling or the remote end of the connection (such as a switch).

The output of ip link show can be difficult to parse at a glance. The -br (brief) switch prints the same information in a more readable tabular format: ip -br link show.

If the interface was disabled, bring it up with ip link set ens192 up (this requires root privileges). Once the link comes up, ens192 resumes normal operation.
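To check many interfaces at once, or from a script, the brief output is easy to parse. The following is a minimal Python sketch, assuming the column layout of ip -br link show (name, state, an optional MAC address, and the flag list last). It uses the rule that a DOWN interface without the UP flag is administratively disabled, while one that has UP but no link points at cabling or the far end. The helper names and the sample text are illustrative, not output from a real system.

```python
import subprocess


def parse_brief_link(text):
    """Parse `ip -br link show` output into (name, state, flags) tuples."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        name, state = parts[0], parts[1]
        # The flag list, e.g. <BROADCAST,MULTICAST,UP,LOWER_UP>, is the last column.
        last = parts[-1]
        flags = last.strip("<>").split(",") if last.startswith("<") else []
        rows.append((name, state, flags))
    return rows


def down_interfaces(text):
    """Return (name, admin_up) for every interface whose state is DOWN."""
    return [(name, "UP" in flags)
            for name, state, flags in parse_brief_link(text)
            if state == "DOWN"]


def live_brief_link():
    """Run `ip -br link show` on this machine (requires iproute2)."""
    return subprocess.run(["ip", "-br", "link", "show"],
                          capture_output=True, text=True, check=True).stdout


if __name__ == "__main__":
    # Illustrative sample only; call live_brief_link() on a real host instead.
    sample = (
        "lo               UNKNOWN        00:00:00:00:00:00 <LOOPBACK,UP,LOWER_UP>\n"
        "ens192           DOWN           00:50:56:00:00:01 <BROADCAST,MULTICAST>\n"
    )
    for name, admin_up in down_interfaces(sample):
        if admin_up:
            print(f"{name}: enabled but no link; check cabling or the switch")
        else:
            print(f"{name}: disabled; try: ip link set {name} up")
```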

These commands are great for troubleshooting obvious physical problems, but what about more subtle ones? An interface may negotiate at the wrong speed, or collisions and other physical layer issues may cause packets to be dropped or corrupted, resulting in costly retransmissions. How do we start troubleshooting these problems?

We can use the -s flag of the ip command to print additional statistics about an interface, for example ip -s link show ens192. A mostly clean interface shows only a small amount of received packet loss (dropped packets) and no other signs of physical layer problems, such as errors or collisions.
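The same counters are also exposed in sysfs, which makes them easy to read from a script. This is a minimal Python sketch, assuming the standard /sys/class/net/<interface>/statistics/ files. The percentage it computes is only a rough indicator, because drivers differ in exactly what each counter includes.

```python
from pathlib import Path

# A subset of the per-interface counters the Linux kernel exposes in sysfs.
COUNTERS = ("rx_packets", "rx_errors", "rx_dropped",
            "tx_packets", "tx_errors", "tx_dropped", "collisions")


def read_counters(iface, base="/sys/class/net"):
    """Read interface statistics from <base>/<iface>/statistics/ as integers."""
    stats_dir = Path(base) / iface / "statistics"
    return {name: int((stats_dir / name).read_text().strip()) for name in COUNTERS}


def rx_problem_percent(counters):
    """Rough share of received packets that were dropped or had errors, in percent.

    Approximate: drivers differ in exactly what each counter includes.
    """
    if counters["rx_packets"] == 0:
        return 0.0
    bad = counters["rx_dropped"] + counters["rx_errors"]
    return 100.0 * bad / counters["rx_packets"]
```

On a live host, rx_problem_percent(read_counters("ens192")) gives a quick number. The counters are cumulative since the interface was created, so for an ongoing problem it is more telling to read them twice a few seconds apart and compare the difference.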

For more advanced physical layer troubleshooting, the ethtool utility (for example, ethtool ens192) is an excellent choice. A particularly good use case is checking whether an interface has negotiated the correct speed. An interface that has negotiated the wrong speed (e.g., a 10Gbps interface that reports only 1Gbps) could indicate a hardware or cabling problem, or a misconfigured negotiation setting on one side of the link (e.g., a misconfigured switch port).

A link that has been properly negotiated to 1000Mbps and full-duplex mode reports Speed: 1000Mb/s and Duplex: Full in the ethtool output.
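When you check many hosts, a small parser can compare the negotiated speed with what you expect. This is a minimal Python sketch, assuming the Speed: and Duplex: lines of ethtool output. check_link and the expected speed you pass to it are hypothetical, not part of ethtool.

```python
import re


def parse_ethtool(text):
    """Extract (speed in Mb/s, duplex) from `ethtool <iface>` output."""
    speed = re.search(r"^\s*Speed:\s*(\d+)Mb/s", text, re.MULTILINE)
    duplex = re.search(r"^\s*Duplex:\s*(\w+)", text, re.MULTILINE)
    return (int(speed.group(1)) if speed else None,
            duplex.group(1) if duplex else None)


def check_link(text, expected_mbps):
    """Return a list of human-readable problems (empty if the link looks right)."""
    speed, duplex = parse_ethtool(text)
    problems = []
    if speed is None:
        problems.append("speed unknown (link down or not reported)")
    elif speed != expected_mbps:
        problems.append(f"negotiated {speed}Mb/s, expected {expected_mbps}Mb/s")
    if duplex != "Full":
        problems.append(f"duplex is {duplex}")
    return problems
```

On a live system, the text would come from subprocess.run(["ethtool", "ens192"], capture_output=True, text=True).stdout. A link without carrier typically reports Speed: Unknown!, which the sketch treats as an unknown speed.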

Linux Network Troubleshooting: Data Link Layer

The data link layer is the second layer in the OSI model and is responsible for reliable data transmission between directly connected nodes. It defines the format of data frames and how they are transmitted over the physical connection, as well as physical (MAC) addressing and media access control between nodes. The data link layer usually consists of two sublayers:

  1. Logical Link Control (LLC) Sublayer: The LLC sublayer is responsible for establishing and maintaining logical links, handling functions such as flow control and error detection and correction to ensure reliable data transmission.
  2. Medium Access Control (MAC) Sublayer: The MAC sublayer is responsible for implementing medium access control, managing data transmission when multiple nodes share the same physical medium, and processing the physical addresses of the nodes.

The Data Link layer is responsible for local network connectivity, primarily the communication of frames between hosts in the same Layer 2 domain (often called a LAN). The most relevant Layer 2 protocol for most system administrators is the Address Resolution Protocol (ARP), which maps Layer 3 IPv4 addresses to Layer 2 Ethernet MAC addresses (IPv6 uses the Neighbor Discovery Protocol for the same job). When a host tries to contact another host on its local network (such as the default gateway), it may have the other host’s IP address, but it does not know the other host’s MAC address. ARP solves this problem and figures out the MAC address for us.
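To make the mechanism concrete, the following Python sketch builds the bytes of an ARP request frame following the ARP packet layout from RFC 826: a broadcast Ethernet frame asking who owns a given IPv4 address. It only constructs the frame; actually sending it would need a raw AF_PACKET socket and root privileges. The MAC address and the gateway address 10.6.80.1 are hypothetical values.

```python
import ipaddress
import struct


def mac_bytes(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' to 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def build_arp_request(src_mac, src_ip, target_ip):
    """Build an Ethernet frame carrying an ARP request ("who has target_ip?")."""
    # Ethernet header: broadcast destination, our MAC, EtherType 0x0806 (ARP).
    eth = mac_bytes("ff:ff:ff:ff:ff:ff") + mac_bytes(src_mac) + struct.pack("!H", 0x0806)
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                                   # hardware type: Ethernet
        0x0800,                              # protocol type: IPv4
        6, 4,                                # MAC and IPv4 address lengths
        1,                                   # operation: 1 = request, 2 = reply
        mac_bytes(src_mac),                  # sender MAC
        ipaddress.IPv4Address(src_ip).packed,
        b"\x00" * 6,                         # target MAC: unknown, so zeroed
        ipaddress.IPv4Address(target_ip).packed,
    )
    return eth + arp


frame = build_arp_request("00:50:56:00:00:01", "10.6.80.202", "10.6.80.1")
print(len(frame), frame.hex())
```

The target MAC field is zeroed because that is exactly what the sender does not know yet; the owner of the target IP answers with an ARP reply (operation 2) that contains its MAC address.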

A common problem you’ll run into is a failure to populate an ARP entry, especially for the host’s default gateway. If your local host cannot successfully resolve the layer 2 MAC address of its gateway, it will not be able to send any traffic to the remote network. This problem could be caused by having the wrong IP address configured for the gateway, or it could be another problem, such as a misconfigured switch port.
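One quick sanity check for a wrongly configured gateway is whether its address even lies inside the interface's subnet, since a gateway normally has to be on the local subnet for the host to resolve it with ARP. This is a minimal Python sketch using the standard ipaddress module; the /24 prefix and the gateway addresses are hypothetical examples.

```python
import ipaddress


def gateway_on_link(interface_cidr, gateway):
    """Return True if the gateway address falls inside the interface's subnet.

    interface_cidr is an address/prefix pair as printed by `ip -br address`,
    for example "10.6.80.202/24" (the prefix length here is an assumption).
    """
    network = ipaddress.ip_interface(interface_cidr).network
    return ipaddress.ip_address(gateway) in network


print(gateway_on_link("10.6.80.202/24", "10.6.80.1"))   # same subnet
print(gateway_on_link("10.6.80.202/24", "10.6.81.1"))   # typo'd gateway
```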

We can use the ip neighbor command (ip neighbor show) to check the entries in our ARP table.

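When ARP is working, the gateway's entry contains its MAC address (shown after lladdr) and a state such as REACHABLE or STALE. If there is a problem with ARP, resolution fails: the entry has no MAC address and shows a state such as INCOMPLETE or FAILED.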
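Unresolved entries can also be spotted automatically. This is a minimal Python sketch, assuming the usual ip neighbor show line format (address, dev, an optional lladdr, optional flags such as router, and the state at the end of the line); the helper names and sample lines are illustrative.

```python
def parse_neighbors(text):
    """Parse `ip neighbor show` lines into dicts with ip, dev, lladdr, and state."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        entry = {"ip": parts[0], "dev": None, "lladdr": None, "state": parts[-1]}
        # Options appear as keyword/value pairs, e.g. "dev ens192 lladdr 00:...".
        for key in ("dev", "lladdr"):
            if key in parts:
                i = parts.index(key)
                if i + 1 < len(parts):
                    entry[key] = parts[i + 1]
        entries.append(entry)
    return entries


def unresolved(text):
    """Return entries for which ARP has not produced a MAC address."""
    return [e for e in parse_neighbors(text)
            if e["lladdr"] is None or e["state"] in ("FAILED", "INCOMPLETE")]
```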

Another common use case for the ip neighbor command is manipulating the ARP table. Imagine that your networking team has just replaced the upstream router (i.e., the default gateway for your servers). Because MAC addresses are hardware addresses assigned at the factory, the gateway's MAC address has most likely changed as well.

Linux caches ARP entries for some time, so you may not be able to send traffic to the default gateway until the old ARP entry times out. For critical systems, this outcome is undesirable. Fortunately, you can manually delete the ARP entry with ip neighbor delete <gateway IP> dev ens192, which forces a new ARP discovery process.
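If you script this cleanup, a dry-run mode lets you review the command before it runs. This is a minimal Python sketch; flush_neighbor is a hypothetical helper around ip neighbor delete, and the gateway address is a placeholder.

```python
import subprocess


def flush_neighbor(ip, dev, dry_run=True):
    """Delete one ARP (neighbor) cache entry so the kernel re-resolves it.

    With dry_run=True (the default) the command is only returned, not run;
    running it for real requires root privileges.
    """
    cmd = ["ip", "neighbor", "delete", ip, "dev", dev]
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd


print(" ".join(flush_neighbor("10.6.80.1", "ens192")))
```

After the entry is deleted, the next packet sent to the gateway (a single ping is enough) triggers a fresh ARP request.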

Linux Network Troubleshooting: Network Layer

Layer 3 involves the use of IP addresses, which should be familiar to any system administrator. IP addresses provide a way for a host to reach other hosts outside of the local network (although we usually use them within the local network as well). One of the first steps in diagnosing a problem is to check the local IP address of the machine, which can be done with the ip address command, again using the -br flag to simplify the output: ip -br address show.

In this example, the ens192 interface has the IPv4 address 10.6.80.202. If an interface has no IP address, that problem must be fixed before anything else. A missing address can be caused by a local configuration error, such as an incorrect network interface configuration file, or by a problem with DHCP.
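To find interfaces that are missing an IPv4 address across many machines, the brief output can be parsed. This is a minimal Python sketch, assuming the ip -br address show layout (name, state, then zero or more address/prefix tokens); any token that does not parse as an address is skipped. The sample text is illustrative.

```python
import ipaddress


def _parse_addr(token):
    """Return an ip_interface for 'addr/prefix' tokens, or None for anything else."""
    try:
        return ipaddress.ip_interface(token)
    except ValueError:
        return None


def addresses_by_interface(text):
    """Parse `ip -br address show` into {name: (state, [ip_interface, ...])}."""
    result = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        addrs = [a for a in map(_parse_addr, parts[2:]) if a is not None]
        result[parts[0]] = (parts[1], addrs)
    return result


def missing_ipv4(text):
    """Names of non-loopback interfaces that have no IPv4 address."""
    return [name for name, (_state, addrs) in addresses_by_interface(text).items()
            if name != "lo" and not any(a.version == 4 for a in addrs)]
```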

The frontline tool most system administrators use to diagnose layer 3 problems is the ping utility. Ping sends ICMP Echo Request packets to a remote host and expects ICMP Echo Reply packets in return. If you are having connectivity issues with a remote host, ping is a common place to start. By default, ping sends echo requests indefinitely; press CTRL+C to stop it, or limit the number of requests with the -c flag, for example ping -c 4 <host>.

Each ping reply line reports the round-trip time it took to receive the response. While ping makes it easy to tell whether a host is alive and responding, its results are not always conclusive. Many network operators block ICMP for security reasons (a practice many people disagree with), so a host that does not answer pings may still be up. Another common mistake is relying on the time field as an accurate indicator of network latency: intermediate network devices can rate-limit ICMP packets, so the reported times cannot be relied upon to represent true application latency.
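When ping runs from a script, the summary lines at the end of its output are the useful part. This is a minimal Python sketch, assuming the summary format printed by the Linux iputils ping (BSD and macOS ping word these lines differently); the sample output in the usage lines is illustrative. The caveats above still apply: 100% packet loss does not prove the host is down, and the average is ICMP round-trip time, not application latency.

```python
import re


def parse_ping_summary(text):
    """Extract sent/received counts, loss %, and average RTT from iputils ping output."""
    counts = re.search(r"(\d+) packets transmitted, (\d+) received", text)
    loss = re.search(r"([\d.]+)% packet loss", text)
    # "rtt min/avg/max/mdev = a/b/c/d ms": capture the second value (avg).
    rtt = re.search(r"= [\d.]+/([\d.]+)/[\d.]+/[\d.]+ ms", text)
    return {
        "sent": int(counts.group(1)) if counts else None,
        "received": int(counts.group(2)) if counts else None,
        "loss_percent": float(loss.group(1)) if loss else None,
        # Absent when no replies came back, so no RTT line was printed.
        "avg_rtt_ms": float(rtt.group(1)) if rtt else None,
    }


print(parse_ping_summary(
    "4 packets transmitted, 4 received, 0% packet loss, time 3004ms\n"
    "rtt min/avg/max/mdev = 0.262/0.325/0.408/0.055 ms\n"))
```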

The next tool in your layer 3 troubleshooting toolkit is the traceroute command. Traceroute uses the Time to Live (TTL) field in IP packets to determine the path that traffic takes to its destination. It sends probe packets starting with a TTL of 1; each router along the way decrements the TTL, and the router that decrements it to zero discards the packet and returns an ICMP Time Exceeded message. Traceroute then increments the TTL to discover the next hop. The resulting output (for example, from traceroute <host>) is a list of the intermediate routers that the packets traveled through on their way to the destination.
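The TTL logic can be illustrated with a toy model. This Python sketch does not send any packets (real probes need raw sockets and usually root); it simply walks a hypothetical list of routers to show which device answers each probe. The addresses come from the documentation ranges and are hypothetical.

```python
def simulate_traceroute(path, destination, max_hops=30):
    """Toy model of traceroute's TTL logic over a known list of routers.

    path: router addresses between us and the destination, in order.
    Returns (ttl, responder, message) tuples, one per TTL value tried.
    """
    hops = list(path) + [destination]
    results = []
    for ttl in range(1, min(max_hops, len(hops)) + 1):
        # Each router decrements the TTL; the one that brings it to zero
        # discards the probe and answers with ICMP Time Exceeded.
        responder = hops[ttl - 1]
        if responder == destination:
            results.append((ttl, responder, "destination reached"))
            break
        results.append((ttl, responder, "ICMP Time Exceeded"))
    return results


for hop in simulate_traceroute(["192.0.2.1", "198.51.100.1"], "203.0.113.7"):
    print(*hop)
```

Real traceroute also sends several probes per TTL (three by default) and, in its default UDP mode, recognizes the destination by the ICMP Port Unreachable message it returns.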

Traceroute may seem like a great tool, but it’s important to understand its limitations. As with ping, intermediate routers may filter the packets that traceroute relies on, such as ICMP Time Exceeded messages. More importantly, the paths that traffic takes to and from a destination are not necessarily symmetrical, and they are not always the same. Traceroute may fool you into thinking that your traffic follows a neat linear path to and from its destination, but this is rarely the case: return traffic may follow a different path, and paths can change dynamically for many reasons. While traceroute may give an accurate picture of the path in a small business network, it is often misleading when tracing across a large network or the Internet.

Another common problem is a missing upstream gateway for a particular route, or a missing default route. When an IP packet is sent to a different network, it must be sent to a gateway for further processing, and the gateway should know how to route the packet to its final destination. The gateways for different routes are stored in the routing table, which can be inspected and manipulated with the ip route command (ip route show lists the table).

Simple topologies usually have only one default gateway configured, indicated by the “default” entry at the top of the table. A missing or incorrect default gateway is a common problem.

If our topology is more complex and we need different routes for different networks, we can check the routes for a specific prefix with ip route show followed by that prefix.
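When several routes match a destination, the kernel picks the most specific one (longest prefix match). This is a simplified Python sketch of that rule; it ignores metrics, policy routing rules, and route types, and the route list is hypothetical. On a real host, ip route get followed by a destination address asks the kernel which route it would actually use.

```python
import ipaddress


def lookup_route(routes, destination):
    """Choose a route by longest prefix match (simplified model of the main table).

    routes: list of (prefix, gateway) pairs; "default" means 0.0.0.0/0 and a
    gateway of None means the network is directly connected.
    Returns the matching (prefix, gateway) pair, or None if nothing matches.
    """
    dest = ipaddress.ip_address(destination)
    best = None
    for prefix, gateway in routes:
        net = ipaddress.ip_network("0.0.0.0/0" if prefix == "default" else prefix)
        if dest in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, gateway)
    return None if best is None else (str(best[0]), best[1])


routes = [("default", "10.6.80.1"), ("10.0.0.0/8", "10.6.80.254"), ("10.6.80.0/24", None)]
print(lookup_route(routes, "10.20.1.5"))
```

Removing the "default" entry from the list reproduces the missing-default-route problem: destinations outside the known prefixes then match nothing.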

A clear sign of a DNS problem is being able to connect to a remote host by its IP address but not by its hostname. A quick nslookup query on the hostname can tell us a lot (nslookup is part of the bind-utils package on Red Hat Enterprise Linux systems).

Note: DNS is an application layer protocol, not a Layer 3 protocol, but it is worth mentioning when talking about IP addresses.
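nslookup queries DNS servers directly, whereas most applications resolve names through the system resolver (getaddrinfo), which also consults /etc/hosts according to /etc/nsswitch.conf. Comparing the two can explain cases where nslookup succeeds but an application fails, or vice versa. This is a minimal Python sketch of the application-side lookup; resolve is a hypothetical helper name.

```python
import socket


def resolve(hostname):
    """Resolve a name through the system resolver, as most applications do.

    Returns a sorted list of unique addresses, or [] if resolution fails.
    """
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})


print(resolve("localhost"))
```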

Linux Network Troubleshooting: Transport Layer

The transport layer consists of the TCP and UDP protocols: TCP is connection-oriented, and UDP is connectionless. Applications listen on sockets, which consist of an IP address and a port. The kernel delivers traffic sent to an IP address on a specific port to the application listening on that socket.

A common first step is to view which ports are listening on the local host. This is useful if you are unable to connect to a specific service on the machine, such as a web or SSH server. Another common problem is a daemon or service that fails to start because something else is already listening on its port. The ss command is very valuable for these checks, for example ss -tunlp4.

The meaning of each option:

  • -t – Display TCP ports.
  • -u – Display UDP ports.
  • -n – Do not attempt to resolve hostnames.
  • -l – Show only listening ports.
  • -p – Show the process using each socket.
  • -4 – Show only IPv4 sockets.
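The “something else is already listening” case can also be confirmed with a bind test. This is a minimal Python sketch, assuming it runs on the affected Linux host; port_in_use is a hypothetical helper. It only tells you that the port is taken, while ss -p shows which process holds it. Binding ports below 1024 requires root, so that error is re-raised rather than misreported as a conflict.

```python
import errno
import socket


def port_in_use(port, host="0.0.0.0", proto="tcp"):
    """Return True if host:port cannot be bound because another socket holds it."""
    kind = socket.SOCK_STREAM if proto == "tcp" else socket.SOCK_DGRAM
    with socket.socket(socket.AF_INET, kind) as s:
        if proto == "tcp":
            # Most TCP daemons set SO_REUSEADDR; on Linux this ignores leftover
            # TIME_WAIT connections but still conflicts with an active listener.
            s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            s.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise  # e.g. EACCES: ports below 1024 need root
    return False
```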

Looking at the output, we can see several listening services. The sshd application is listening on port 22 on all IP addresses, indicated by the *:22 output. To test a TCP connection from another host, you can use telnet or netcat, for example nc <host> 22.
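The same check can be scripted. This is a minimal Python sketch that attempts a TCP handshake, much like nc <host> <port>, and also times it; unlike ping, this measures the path to the actual service port. tcp_check is a hypothetical helper.

```python
import socket
import time


def tcp_check(host, port, timeout=3.0):
    """Attempt a TCP handshake; return (True, seconds) or (False, error text)."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.perf_counter() - start
    except OSError as exc:
        # ConnectionRefusedError: host reachable, but nothing listening (TCP RST).
        # A timeout often means a firewall is silently dropping the packets.
        return False, repr(exc)


print(tcp_check("127.0.0.1", 22))
```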

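To test UDP, use netcat's -u option, for example nc -u <host> <port>. Keep in mind that UDP is connectionless, so the absence of an error does not prove that the remote service received the data.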
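Because UDP has no handshake, the result needs more interpretation. This is a minimal Python sketch, assuming a Linux host: it uses a connected UDP socket so that an ICMP Port Unreachable from the target is reported as ConnectionRefusedError. Silence is ambiguous, since many UDP services do not answer arbitrary payloads. udp_probe and its return strings are hypothetical.

```python
import socket


def udp_probe(host, port, payload=b"ping", timeout=2.0):
    """Send one UDP datagram and wait briefly for any reply.

    Returns "reply", "refused" (ICMP Port Unreachable came back), or
    "no response" (ambiguous: dropped, filtered, or the service stayed silent).
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.connect((host, port))  # a connected UDP socket lets the kernel report ICMP errors
        s.send(payload)
        try:
            s.recv(4096)
            return "reply"
        except ConnectionRefusedError:
            return "refused"
        except socket.timeout:
            return "no response"
```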

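The same information can also be obtained with the older netstat command (for example, netstat -tunlp), which is provided by the net-tools package.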

Conclusion

These are the basic tools commonly used for Linux network troubleshooting; I hope this guide is helpful to you.