Linux Network Troubleshooting Commands [Full Guide]

One of the biggest challenges in my time working on Linux network troubleshooting has been bridging the gap between networking and systems engineering. System administrators often lack visibility into the network and frequently blame it for outages or peculiar problems. Conversely, network administrators, who have no control over the servers, grow weary of the network always being presumed guilty and frequently attribute issues to the network endpoints.

Next, I will cover the basics of Linux network troubleshooting through the Linux command line.

TCP/IP Model of Linux Command Line

Linux Network Troubleshooting: Physical Layer

Let’s start with the most basic question: How do you tell if a physical interface is up? Use the command ip link show:

# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:0c:29:b6:e3:71 brd ff:ff:ff:ff:ff:ff

Note the DOWN state reported for the ens192 interface in the output above. This means that the physical layer is not up. First make sure the interface has not simply been disabled by bringing it up with the command below. If it still reports DOWN, continue troubleshooting by checking the cabling or the remote end of the connection (such as a switch port).

# ip link set ens192 up

The output of ip link show can be difficult to parse at a quick glance. The -br (brief) flag prints this output in a more readable tabular format:

# ip -br link show
lo               UNKNOWN        00:00:00:00:00:00 <LOOPBACK,UP,LOWER_UP>
ens192           UP             00:0c:29:b6:e3:71 <BROADCAST,MULTICAST,UP,LOWER_UP>

In this example, running ip link set ens192 up solved the problem: ens192 now reports UP and has resumed normal operation.
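On a host with many interfaces, the brief output lends itself to a quick filter. The following is a minimal sketch rather than a complete health check: the helper name down_links is made up for this example, and it assumes the column layout of ip -br link show shown above (interface name first, state second).

```shell
# Hypothetical helper (not part of iproute2): reads `ip -br link show`
# output on stdin and prints every interface whose state column is DOWN.
down_links() {
    awk '$2 == "DOWN" { print $1 }'
}

# Usage on a live system:
#   ip -br link show | down_links
```

Interfaces such as lo report UNKNOWN rather than UP, which is why the sketch matches DOWN explicitly instead of treating everything that is not UP as a fault.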

These commands are great for troubleshooting obvious physical problems, but what about more subtle ones? The interface may be negotiating at the wrong speed, or collisions and other physical layer issues may be causing packets to be lost or corrupted, resulting in expensive retransmissions. How do we start troubleshooting these problems?

We can use the -s flag of the ip command to print additional statistics about the interface. The following output shows a mostly clean interface with only a small number of dropped received packets and no other signs of physical layer problems:

# ip -s link show ens192
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 00:0c:29:b6:e3:71 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast
    34107919   5808     0       6       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    434573     4487     0       0       0       0
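To watch these counters over time, it can help to reduce the statistics to the two values that matter most here. This is a sketch under stated assumptions: link_error_counters is a hypothetical helper name, and it assumes that errors and dropped are the third and fourth columns of the RX and TX counter lines, as in the output above.

```shell
# Hypothetical helper: reads `ip -s link show <iface>` output on stdin and
# prints the error and drop counters for each direction.
# Assumes errors is column 3 and dropped is column 4 of the counter lines.
link_error_counters() {
    awk '
        /^[ \t]*(RX|TX):/ {
            dir = substr($1, 1, 2)   # "RX" or "TX"
            getline                  # move to the line holding the counters
            printf "%s errors=%s dropped=%s\n", dir, $3, $4
        }
    '
}

# Usage on a live system:
#   ip -s link show ens192 | link_error_counters
```

Running it twice a few minutes apart and comparing the numbers shows whether the counters are still increasing, which is usually more telling than their absolute values.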

For more advanced physical layer troubleshooting, the ethtool utility is an excellent choice. A particularly good use case for this command is to check if an interface has negotiated the correct speed. An interface that has negotiated the wrong speed (e.g., a 10Gbps interface that only reports a 1Gbps speed) could be an indication of a hardware/cabling problem, or a misconfigured negotiation on one side of the link (e.g., a misconfigured switch port).

# ethtool ens192
Settings for ens192:
	Supported ports: [ TP ]
	Supported link modes:   1000baseT/Full
	                        10000baseT/Full
	Supported pause frame use: No
	Supports auto-negotiation: No
	Supported FEC modes: Not reported
	Advertised link modes:  Not reported
	Advertised pause frame use: No
	Advertised auto-negotiation: No
	Advertised FEC modes: Not reported
	Speed: 10000Mb/s
	Duplex: Full
	Port: Twisted Pair
	PHYAD: 0
	Transceiver: internal
	Auto-negotiation: off
	MDI-X: Unknown
	Supports Wake-on: uag
	Wake-on: d
	Link detected: yes

The above output shows a link that is running at the expected 10000 Mb/s (10 Gbps) speed in full-duplex mode.
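Checking the speed by eye does not scale to many servers, so the comparison can be scripted. The sketch below is illustrative only: link_speed_mbps and speed_matches are hypothetical helper names, and the expected speed is a value you supply. It assumes ethtool prints a line of the form "Speed: 10000Mb/s" as in the output above.

```shell
# Hypothetical helper: reads `ethtool <iface>` output on stdin and prints
# the reported speed in Mb/s. Prints nothing if no numeric speed is reported.
link_speed_mbps() {
    sed -n 's/^[[:space:]]*Speed: \([0-9][0-9]*\)Mb\/s.*$/\1/p'
}

# Hypothetical helper: succeeds only if the reported speed equals the
# expected speed given as the first argument (in Mb/s).
speed_matches() {
    actual=$(link_speed_mbps)
    [ "$actual" = "$1" ]
}

# Usage on a live system:
#   ethtool ens192 | speed_matches 10000 || echo "ens192: unexpected link speed"
```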

Linux Network Troubleshooting: Data Link Layer

The data link layer is the second layer in the OSI model and is responsible for providing reliable data transmission between directly connected nodes. It defines the format and method of transmitting data frames on physical connections, as well as physical (MAC) addressing and access control between nodes. The data link layer usually consists of two sublayers:

Logical Link Control (LLC) Sublayer: The LLC sublayer is responsible for establishing and maintaining logical links, handling functions such as flow control, error detection, and correction to ensure reliable data transmission.
Medium Access Control (MAC) Sublayer: The MAC sublayer is responsible for implementing medium access control, managing data transmission when multiple nodes share the same physical medium, and processing the physical addresses of the nodes.

The Data Link layer is responsible for local network connectivity, primarily the communication of frames between hosts in the same Layer 2 domain (often called a LAN). The most relevant Layer 2 protocol to most system administrators is the Address Resolution Protocol (ARP), which maps Layer 3 IP addresses to Layer 2 Ethernet MAC addresses. When a host tries to contact another host on its local network (such as the default gateway), it may have the other host’s IP address, but it does not know the other host’s MAC address. ARP solves this problem and figures out the MAC address for us.

A common problem you’ll run into is a failure to populate an ARP entry, especially for the host’s default gateway. If your local host cannot successfully resolve the layer 2 MAC address of its gateway, it will not be able to send any traffic to the remote network. This problem could be caused by having the wrong IP address configured for the gateway, or it could be another problem, such as a misconfigured switch port.

We can use the ip neighbor command to check the entries in our ARP table:

# ip neighbor show
10.6.80.1 dev ens192 lladdr 7c:1e:06:25:d2:d9 REACHABLE

The MAC address of the gateway is already filled in. If there is a problem with ARP, then we will see the resolution fail:

# ip neighbor show
10.6.80.1 dev ens192 FAILED
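Spotting unresolved entries in a long neighbor table can be sketched with a short filter. The helper name unresolved_neighbors is hypothetical, and the sketch assumes the state is the last field on each line, as in the output above. It also treats INCOMPLETE (resolution still in progress) as worth reporting, which is a choice made for this example.

```shell
# Hypothetical helper: reads `ip neighbor show` output on stdin and prints
# the IP address of every entry whose state is FAILED or INCOMPLETE.
unresolved_neighbors() {
    awk '$NF == "FAILED" || $NF == "INCOMPLETE" { print $1 }'
}

# Usage on a live system:
#   ip neighbor show | unresolved_neighbors
```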

Another common use case for the ip neighbor command involves manipulating the ARP table. Imagine that your networking team has just replaced the upstream router (i.e., the default gateway for your servers). The MAC address may have changed as well, since MAC addresses are hardware addresses assigned at the factory. Linux caches ARP entries for some time, so you may not be able to send traffic to the default gateway until the stale ARP entry times out. For very important systems, this outcome is undesirable. Fortunately, you can manually delete an ARP entry, which will force a new ARP discovery process. The example below deletes the cached entry for the host 10.6.80.100; the same command works for a gateway address:

# ip neighbor show
10.6.80.1 dev ens192 lladdr 7c:1e:06:25:d2:d9 REACHABLE
10.6.80.100 dev ens192 lladdr ac:1f:6b:d2:3e:bb REACHABLE

# ip neighbor delete 10.6.80.100 dev ens192
# ip neighbor show
10.6.80.1 dev ens192 lladdr 7c:1e:06:25:d2:d9 REACHABLE

Linux Network Troubleshooting: Network Layer

Layer 3 involves the use of IP addresses, which should be familiar to any system administrator. IP addresses provide a way for a host to reach other hosts outside of the local network (although we usually use them within the local network as well). One of the first steps in diagnosing a problem is to check the local IP address of the machine, which can be done using the ip address command, again utilizing the -br flag to simplify the output:

# ip -br address show
lo               UNKNOWN        127.0.0.1/8 ::1/128
ens192           UP             10.6.80.202/24 fe80::20c:29ff:feb6:e371/64
tun0             UNKNOWN        10.88.0.1 peer 10.88.0.2/32 fe80::6435:83ad:f6c6:2b59/64

The ens192 interface has an IPv4 address of 10.6.80.202. If an interface has no IP address, that is the first problem to fix. A missing IP address could be caused by a local configuration error, such as an incorrect network interface configuration file, or by a problem with DHCP.
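The same check can be sketched as a filter over the brief output. The helper name ifaces_without_ipv4 is made up for this example, and it assumes the layout shown above, where the addresses start in the third column. Note that it will also list interfaces that are intentionally IPv6-only or unconfigured, so the result is a starting point rather than a verdict.

```shell
# Hypothetical helper: reads `ip -br address show` output on stdin and
# prints every interface that has no IPv4 address in its address columns.
ifaces_without_ipv4() {
    awk '{
        has_v4 = 0
        for (i = 3; i <= NF; i++)
            if ($i ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+(\/[0-9]+)?$/)
                has_v4 = 1
        if (!has_v4) print $1
    }'
}

# Usage on a live system:
#   ip -br address show | ifaces_without_ipv4
```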

The frontline tool most system administrators use to diagnose layer 3 problems is the ping utility. Ping sends ICMP Echo Request packets to a remote host and expects an ICMP Echo Reply in return. If you are having connectivity issues with a remote host, ping is a common place to start troubleshooting. A simple ping command from the command line will send ICMP echoes to the remote host indefinitely; you will need to press CTRL+C to end it, or pass the -c <num pings> flag to limit the number of pings. For example:

# ping www.baidu.com
PING www.a.shifen.com (180.101.50.188) 56(84) bytes of data.
64 bytes from 180.101.50.188 (180.101.50.188): icmp_seq=1 ttl=50 time=20.5 ms
64 bytes from 180.101.50.188 (180.101.50.188): icmp_seq=2 ttl=50 time=20.1 ms
64 bytes from 180.101.50.188 (180.101.50.188): icmp_seq=3 ttl=50 time=20.5 ms
64 bytes from 180.101.50.188 (180.101.50.188): icmp_seq=4 ttl=50 time=20.9 ms
^C
--- www.a.shifen.com ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3004ms
rtt min/avg/max/mdev = 20.198/20.570/20.957/0.268 ms

The ping output includes the round-trip time for each response. While ping makes it easy to tell whether a host is alive and responding, a failed ping is not conclusive: many network operators block ICMP packets for security reasons, although many people disagree with this practice. Another common mistake is relying on the time field as an accurate indicator of network latency. Intermediate network devices can rate-limit ICMP packets, so ping times cannot be relied upon to provide a true representation of application latency.
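For scripted checks, the summary line is usually the most convenient thing to read. The following sketch assumes the "N% packet loss" wording of the summary shown above; ping_loss_percent is a hypothetical helper name, not something ping provides.

```shell
# Hypothetical helper: reads ping output on stdin and prints the packet
# loss percentage taken from the summary line (for example "0" or "100").
ping_loss_percent() {
    sed -n 's/^.*[, ]\([0-9.][0-9.]*\)% packet loss.*$/\1/p'
}

# Usage on a live system (4 pings, then report the loss):
#   ping -c 4 www.baidu.com | ping_loss_percent
```

Given the caveats above, a result of 100 means only that no replies came back, not necessarily that the host is down.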

The next tool in your layer 3 troubleshooting toolkit is the traceroute command. Traceroute utilizes the Time to Live (TTL) field in IP packets to determine the path that traffic takes to reach its destination. Traceroute sends probe packets with increasing TTL values, starting with a TTL of 1. When a packet's TTL expires in transit, the router that discarded it sends back an ICMP Time Exceeded message. Traceroute then increments the TTL to discover the next hop. The resulting output is a list of intermediate routers that the packets traveled through on their way to the destination:

# traceroute www.baidu.com
traceroute to www.baidu.com (180.101.50.242), 30 hops max, 60 byte packets
 1  * * *
 2  10.6.1.1 (10.6.1.1)  0.729 ms  1.493 ms  0.682 ms
 3  171.84.1.145 (171.84.1.145)  5.554 ms  6.539 ms  4.377 ms
 4  10.250.0.9 (10.250.0.9)  7.654 ms  9.507 ms  8.480 ms
 5  * * 10.34.0.2 (10.34.0.2)  1.435 ms
 6  * * *
 7  * * *
 8  36.110.63.237 (36.110.63.237)  3.988 ms  4.979 ms  3.848 ms
 9  219.141.140.73 (219.141.140.73)  3.992 ms  2.958 ms  3.919 ms
10  * 36.112.241.89 (36.112.241.89)  2.862 ms bj141-152-73.bjtelecom.net (219.141.152.73)  3.952 ms
11  202.97.92.198 (202.97.92.198)  24.938 ms 202.97.98.2 (202.97.98.2)  20.983 ms *
12  180.110.207.14 (180.110.207.14)  20.995 ms 180.110.207.2 (180.110.207.2)  21.953 ms 180.110.207.26 (180.110.207.26)  18.918 ms
13  58.213.95.210 (58.213.95.210)  21.881 ms  23.892 ms  19.861 ms
14  180.101.50.242 (58.213.96.50)  35.925 ms  34.898 ms  30.966 ms

Traceroute may seem like a great tool, but it’s important to understand its limitations. As with ping, intermediate routers may filter the packets that traceroute relies on, such as ICMP Time Exceeded messages; hops that did not answer appear as * * * in the output. More importantly, the paths that traffic takes to and from a destination are not necessarily symmetrical and are not always the same. Traceroute may fool you into thinking that your traffic follows a nice linear path both to and from its destination. However, this is rarely the case. Traffic may follow different return paths, and paths can change dynamically for many reasons. While traceroute may provide an accurate representation of the path in a small business network, it is generally not accurate when trying to trace across a large network or the Internet.
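When reading a long trace, it can be useful to list the hops that returned nothing at all. This is only a sketch: silent_hops is a hypothetical helper name, and it assumes the default three probes per hop, so a fully silent hop prints as a hop number followed by three asterisks.

```shell
# Hypothetical helper: reads traceroute output on stdin and prints the hop
# numbers for which all three probes went unanswered ("* * *").
silent_hops() {
    awk 'NF == 4 && $2 == "*" && $3 == "*" && $4 == "*" { print $1 }'
}

# Usage on a live system:
#   traceroute www.baidu.com | silent_hops
```

A silent hop in the middle of a trace that still reaches its destination usually points to filtering or rate limiting on that router rather than to packet loss on the path.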

Another common problem you will run into is a missing upstream gateway for a particular route, or a missing default route. When an IP packet is sent to a different network, it must be sent to a gateway for further processing. The gateway should know how to route the packet to its final destination. The list of gateways for different routes is stored in the routing table, which can be inspected and manipulated using the ip route command.

# ip route show
default via 10.6.80.1 dev ens192 proto static metric 100
10.6.80.0/24 dev ens192 proto kernel scope link src 10.6.80.202 metric 100

Simple topologies usually have only one default gateway configured, indicated by the “default” entry at the top of the table. A missing or incorrect default gateway is a common problem.
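A missing default route is easy to test for in a script. The sketch below assumes the "default via <gateway>" form of the entry shown above; default_gateway is a hypothetical helper name.

```shell
# Hypothetical helper: reads `ip route show` output on stdin and prints the
# gateway address of the first default route. Prints nothing if none exists.
default_gateway() {
    awk '$1 == "default" && $2 == "via" { print $3; exit }'
}

# Usage on a live system:
#   gw=$(ip route show | default_gateway)
#   [ -n "$gw" ] || echo "no default route configured"
```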

If our topology is more complex and we need to set up different routes for different networks, we can check the routes for a specific prefix:

# ip route show 10.6.80.0/24
10.6.80.0/24 dev ens192 proto kernel scope link src 10.6.80.202 metric 100

Finally, although DNS is not a Layer 3 protocol, it is worth mentioning when talking about IP addresses. A clear sign of a DNS problem is being able to connect to a remote host by its IP address but not by its hostname. A quick nslookup query on the hostname can tell us a lot (nslookup is part of the bind-utils package on Red Hat Enterprise Linux systems):

# nslookup www.baidu.com
Server:		114.114.114.114
Address:	114.114.114.114#53

Non-authoritative answer:
www.baidu.com	canonical name = www.a.shifen.com.
Name:	www.a.shifen.com
Address: 180.101.50.242
Name:	www.a.shifen.com
Address: 180.101.50.188
Name:	www.a.shifen.com
Address: 240e:e9:6002:15a:0:ff:b05c:1278
Name:	www.a.shifen.com
Address: 240e:e9:6002:15c:0:ff:b015:146f
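To compare what DNS returns with the address you expected, the answer section can be reduced to a plain list. This is a sketch that assumes the Name:/Address: layout shown above; nslookup_answers is a hypothetical helper name. It skips the resolver's own Address line, which appears before the first Name line.

```shell
# Hypothetical helper: reads nslookup output on stdin and prints only the
# addresses from the answer section (those following a "Name:" line).
nslookup_answers() {
    awk '
        /^Name:/ { in_answer = 1; next }
        in_answer && /^Address:/ { print $2 }
    '
}

# Usage on a live system:
#   nslookup www.baidu.com | nslookup_answers
```

An empty result while a direct connection to the IP address works is consistent with the DNS symptom described above.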

Linux Network Troubleshooting: Transport Layer

The transport layer consists of the TCP and UDP protocols, where TCP is a connection-oriented protocol and UDP is connectionless. Applications listen on sockets, which consist of an IP address and a port. Traffic sent to an IP address on a specific port will be delivered by the kernel to the application listening on that socket.

A good first step is to view which ports are listening on the local host. This is useful if you are unable to connect to a specific service on the machine, such as a web or SSH server. Another common problem is a daemon or service that fails to start because something else is already listening on its port. The ss command is very valuable for these types of checks.

# ss -tunlp4
Netid State      Recv-Q Send-Q    Local Address:Port           Peer Address:Port
udp   UNCONN     0      0                     *:1194                      *:*                   users:(("openvpn",pid=1023,fd=8))
udp   UNCONN     0      0                     *:69                        *:*                   users:(("xinetd",pid=3232,fd=5))
tcp   LISTEN     0      128                   *:80                        *:*                   users:(("nginx",pid=16373,fd=6),("nginx",pid=16372,fd=6))
tcp   LISTEN     0      128                   *:22                        *:*                   users:(("sshd",pid=1008,fd=3))

The flags used above have the following meanings:

-t – Display TCP sockets.
-u – Display UDP sockets.
-n – Do not attempt to resolve service names; show numeric ports.
-l – Show only listening sockets.
-p – Show the process using each socket.
-4 – Show only IPv4 sockets.
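When a service fails to start because its port is taken, the question is usually which process holds that port. The sketch below filters the ss output for one port. The helper name port_listeners is hypothetical, and it assumes the column layout shown above: the local address and port in the fifth column and the process information (which typically requires root to be populated) in the last one.

```shell
# Hypothetical helper: reads `ss -tunlp` output on stdin and prints the
# protocol and process information for sockets bound to the given port.
# Usage: ss -tunlp | port_listeners 22
port_listeners() {
    awk -v port="$1" 'NR > 1 {
        n = split($5, parts, ":")   # local address:port is the 5th column
        if (parts[n] == port) print $1, $NF
    }'
}
```

Splitting on the last colon rather than the first keeps the sketch usable for IPv6 local addresses, which contain colons themselves.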

Looking at the ss output above, we can see several listening services. The sshd application is listening on port 22 on all IP addresses, indicated by the *:22 output. To test whether a TCP port actually accepts connections, you can use Telnet or Netcat. The examples below connect to a MySQL server on port 3306:

# telnet 127.0.0.1 3306
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
J
-=3t24d?
    !?.S=`pmysql_native_password^CConnection closed by foreign host.

# nc -v 127.0.0.1 3306
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connected to 127.0.0.1:3306.
J
j89=;Y3x//.mysql_native_password

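To test UDP, you can use Netcat with the -u flag. Because UDP is connectionless, a "Connection refused" message here means the host reported that nothing is listening on that UDP port: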

# nc 127.0.0.1 -u 80
Ncat: Connection refused.

The same information can also be obtained with the older netstat command:

# netstat -tunlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      16372/nginx: master
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1008/sshd
tcp6       0      0 :::3306                 :::*                    LISTEN      1473/mysqld
tcp6       0      0 :::8080                 :::*                    LISTEN      12503/httpd
tcp6       0      0 :::80                   :::*                    LISTEN      16372/nginx: master
tcp6       0      0 :::22                   :::*                    LISTEN      1008/sshd

Conclusion

These are the basic tools commonly used for Linux network troubleshooting; I hope they will be helpful to you.
