It’s been a long, weary day, and the new server finally has Linux installed. You watch with childlike anticipation as it boots, waiting for the login prompt to appear. Your senses are heightened, and you’re keenly aware of the scent that only new hardware has when you finally see the login. Your fingers shake as you quickly type in ‘root’ and the password. As the command prompt finally appears on the screen, your mind scrambles over every step you took to get to this point. Did I configure it right? Is DHCP set up correctly? The melancholic “showdown” harmonica music is ringing through your head as you type ‘ping server2’. As your finger presses the ‘enter’ key, you see the message ‘No route to host’. The next logical step is to call your spouse and tell them you won’t be home for supper. Or is it?
As a Senior Infrastructure Engineer at a consulting company with 35+ years of experience, I’m well versed in navigating the IT space. This article offers tricks and commands to quickly debug some of the most common networking issues.
Getting Started
Before we get started: the commands we will be going over are normally part of a stock install, or they can easily be added with a ‘yum’, ‘dnf’, or ‘apt-get’ command. We will use them to run tests at different communication layers and gather information about the issue. This article will be most helpful if:
- You’re familiar with the Linux command line (bash, etc.).
- You have basic knowledge of networking.
- You made sure the network cable is plugged in.
- You made sure the network cable is plugged into the correct port.
- You made sure the “network” cable really is a network cable.
- Both computers are on the same network so no routing is required.
- You’re trying to SSH (port 22) to the other server.
I’ll also assume that these tools are installed on your Linux system:
- ip – show / manipulate routing, devices, policy routing and tunnels
- ss – utility to investigate sockets
- ping – send ICMP ECHO_REQUEST to network hosts
- dig – DNS lookup utility
- nc – (AKA NetCat or Ncat) feature-packed networking utility
- arping – send ARP REQUEST to a neighbour host
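Before you start, you can quickly confirm which of these tools are on your PATH. The function name check_tools below is hypothetical. This is a minimal sketch that relies only on the POSIX ‘command -v’ builtin. Install anything it reports with your distribution's package manager.

```shell
#!/usr/bin/env bash
# check_tools (hypothetical helper): print "missing: NAME" for every command
# given as an argument that is not found on PATH; return 1 if any are missing.
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   check_tools ip ss ping dig nc arping
```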
Network Communication Overview
Now that we are well equipped, let’s dive into an overview of network communication. It’s considered “common knowledge” that network traffic is delivered via IP addresses. IP does play an important role, but this idea resembles the claim that there’s no gravity in orbit: it simplifies the explanation of what we observe, yet it’s technically wrong. Every network interface card (NIC) is coded at the factory with a unique media access control (MAC) address, and communication between computers on the same network actually happens via MAC addresses. The IP address routes the traffic to the correct network. There, the Address Resolution Protocol (ARP) finds the destination’s MAC address, and further communication on that network uses the MAC. This information can be useful when debugging network issues.
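You can see ARP at work by inspecting the kernel’s neighbour (ARP) cache with ‘ip neigh show’. It lists each IP address the host has tried to resolve, the MAC address it found (after ‘lladdr’), and a state such as REACHABLE, STALE, or FAILED. The sketch below assumes the iproute2 line format shown in its comments and pulls the MAC for one IP out of that output. The helper name mac_for_ip is hypothetical.

```shell
#!/usr/bin/env bash
# mac_for_ip (hypothetical helper): print the MAC address that the neighbour (ARP)
# cache holds for the IP given as $1. Reads `ip neigh show` output on stdin, e.g.
#   192.168.56.108 dev enp0s8 lladdr 08:00:27:39:f6:fd REACHABLE
#   192.168.56.109 dev enp0s8  FAILED
# Entries without an "lladdr" field (e.g. FAILED or INCOMPLETE) print nothing.
mac_for_ip() {
  awk -v ip="$1" '$1 == ip {
    for (i = 2; i < NF; i++)
      if ($i == "lladdr") print $(i + 1)
  }'
}

# Usage on a live system:
#   ip neigh show dev enp0s8 | mac_for_ip 192.168.56.108
```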
If you really want to wallow in the weeds, check out the OSI model, which describes the seven layers of network communication. For what it’s worth, IP lives on layer 3 (Network) and ARP lives on layer 2 (Data Link). It also helps to tell two kinds of error apart. One is an error from above the ARP layer, such as “Connection refused”, where the target actively rejected a TCP connection. The other is an ARP-level error, such as “No route to host”, which on a local network typically means nobody answered the ARP request.
Test Environment Setup
For testing, I have two servers set up: one is online while the other is shut down. I’ll also toss in a non-existent server. This lets me show how the debugging commands behave with both successful and unsuccessful network communication. Here is the environment:
- server1 – online and networked
- server2 – offline
- server3 – doesn’t exist
It’s usually a good idea to start by verifying your own IP configuration with ‘ip addr’, although most people start with the ping tests. I’m sure some folks will argue about the most efficient order of doing things, but that has more to do with a passion for IT than with the actual problem solving.
[root@ol7u5 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:1d:3d:e6 brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.15/24 brd 10.0.2.255 scope global noprefixroute dynamic enp0s3
       valid_lft 64402sec preferred_lft 64402sec
    inet6 fe80::aebc:36a9:a533:48d/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:c1:a8:d1 brd ff:ff:ff:ff:ff:ff
    inet 192.168.56.103/24 brd 192.168.56.255 scope global noprefixroute dynamic enp0s8
       valid_lft 441sec preferred_lft 441sec
    inet6 fe80::c609:8734:a26:d5cc/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
Here we can verify that our IP address is 192.168.56.103 on device enp0s8. We’ll need the device name for the arping test later on. If you don’t see the IP address you expect, your NIC isn’t configured correctly, and fixing that is beyond the scope of this blog.
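If you want the address in a script rather than by eye, ‘ip’ has a one-line mode (‘-o’) that prints each address on a single line, with ADDRESS/PREFIX in the fourth field. This is a minimal sketch that assumes that field layout. The helper name ipv4_of is hypothetical.

```shell
#!/usr/bin/env bash
# ipv4_of (hypothetical helper): print the IPv4 address(es), without the /prefix,
# from `ip -o -4 addr show` output read on stdin. Assumed line layout:
#   3: enp0s8    inet 192.168.56.103/24 brd 192.168.56.255 scope global ... enp0s8
ipv4_of() {
  awk '$3 == "inet" { split($4, parts, "/"); print parts[1] }'
}

# Usage on a live system (enp0s8 is the device from the example above):
#   ip -o -4 addr show dev enp0s8 | ipv4_of
```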
Simple ping (icmp) test
This is a low-level connectivity test and usually the first thing most people try when testing a network connection. It sends a simple ICMP echo request to the destination server and waits for a reply, which, with any luck, it will receive. ICMP packets are routable. The caveat is that firewalls can block them, so if you’re on the same network as the destination, you may want to use the Simple ping (arp) test instead.
[root@ol7u5 ~]# ping server1
PING server1 (192.168.56.108) 56(84) bytes of data.
64 bytes from server1 (192.168.56.108): icmp_seq=1 ttl=64 time=0.177 ms
64 bytes from server1 (192.168.56.108): icmp_seq=2 ttl=64 time=0.419 ms
64 bytes from server1 (192.168.56.108): icmp_seq=3 ttl=64 time=0.484 ms
^C
--- server1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.177/0.360/0.484/0.132 ms
Each “64 bytes from server1” line shows the round-trip time of one ICMP packet. For such a line to appear, server1 must have received our ICMP echo request and answered with an ICMP echo reply.
[root@ol7u5 ~]# ping server2
PING server2 (192.168.56.109) 56(84) bytes of data.
From ol7u5.localdomain (192.168.56.103) icmp_seq=1 Destination Host Unreachable
From ol7u5.localdomain (192.168.56.103) icmp_seq=2 Destination Host Unreachable
From ol7u5.localdomain (192.168.56.103) icmp_seq=3 Destination Host Unreachable
From ol7u5.localdomain (192.168.56.103) icmp_seq=4 Destination Host Unreachable
^C
--- server2 ping statistics ---
4 packets transmitted, 0 received, +4 errors, 100% packet loss, time 3000ms
pipe 4
The “Destination Host Unreachable” error occurs when an ARP request goes unanswered. That leaves four possibilities, listed here from the most probable cause to the least probable:
- The server ‘server2’ is offline or disconnected from the network
- The DNS has the incorrect IP address for ‘server2’
- The server ‘server2’ doesn’t exist
- The routing is incorrectly configured (beyond the scope of this blog)
[root@ol7u5 ~]# ping server3
ping: server3: Name or service not known
This error is pretty easy to diagnose. It has two possible causes:
- DNS doesn’t have a record for this name (see DNS Testing)
- DNS name resolution on your server isn’t configured correctly
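A script can tell the three ping outcomes above apart by the message ping prints. This is a minimal sketch that assumes the iputils ping messages shown in this article; other ping implementations word them differently. The helper name ping_diagnosis is hypothetical. ‘Name or service not known’ goes to stderr, so the usage line captures both streams.

```shell
#!/usr/bin/env bash
# ping_diagnosis (hypothetical helper): map ping output (stdout + stderr) read on
# stdin to the likely cause described in this article.
ping_diagnosis() {
  local output
  output=$(cat)
  case "$output" in
    *"bytes from"*)                   echo "reachable" ;;
    *"Name or service not known"*)    echo "dns: name does not resolve" ;;
    *"Destination Host Unreachable"*) echo "arp: no reply (host offline, wrong IP, or routing)" ;;
    *)                                echo "unknown" ;;
  esac
}

# Usage on a live system (-c 3 sends three echo requests and then stops):
#   ping -c 3 server2 2>&1 | ping_diagnosis
```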
Simple ping (arp) test
This is the most basic connectivity test. However, you must be on the same network as the destination, because ARP operates below the network layer and isn’t routable. The good news is that, since it works below the network layer, you usually won’t have to worry about a host firewall blocking it. It works by broadcasting an ARP request that asks who has the server’s IP address, and the server that owns that IP address responds. It’s like yelling for your friend Jim in a crowded room. You need to be in the same room as Jim to get a reply, but you don’t have to know exactly where in the room he is. You just yell, and he responds.
[root@ol7u5 ~]# arping -f -I enp0s8 server1
ARPING 192.168.56.108 from 192.168.56.103 enp0s8
Unicast reply from 192.168.56.108 [08:00:27:39:F6:FD]  0.799ms
Sent 1 probes (1 broadcast(s))
Received 1 response(s)
As you can see in this example, server1 replied with its MAC address (08:00:27:39:F6:FD).
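arping ends with a summary line that counts the replies, so a script can check for at least one. This is a minimal sketch that assumes the iputils summary format shown above (‘Received N response(s)’). The helper name arp_answered is hypothetical. In the usage example, ‘-c’ caps the number of probes, and ‘-f’ and ‘-I’ are used as in the command above.

```shell
#!/usr/bin/env bash
# arp_answered (hypothetical helper): succeed if arping output on stdin reports at
# least one response, based on the iputils summary line "Received N response(s)".
arp_answered() {
  grep -Eq '^Received [1-9][0-9]* response'
}

# Usage on a live system (enp0s8 is the device found with ip addr):
#   if arping -f -c 3 -I enp0s8 server2 | arp_answered; then
#     echo "server2 answered ARP"
#   else
#     echo "no ARP reply: offline, wrong IP in DNS, or not on this network"
#   fi
```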
DNS Testing
If you’re seeing one of the DNS failures, you’ll want to narrow down the issue a bit more. First, let’s examine which DNS server is configured and what it returns:
[root@ol7u5 ~]# dig server3
; <<>> DiG 9.11.4-P2-RedHat-9.11.4-26.P2.el7_9.3 <<>> server3
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 57737
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4000
;; QUESTION SECTION:
;server3.                       IN      A
;; Query time: 1 msec
;; SERVER: 192.168.10.180#53(192.168.10.180)
;; WHEN: Tue Mar 16 11:55:30 CDT 2021
;; MSG SIZE  rcvd: 36
The first thing you’ll notice, near the end of the output, is the server that ‘dig’ is using to resolve the name. In this case it’s 192.168.10.180. If the output shows that dig can’t reach that server, the DNS configuration will need to be debugged and corrected, which unfortunately is beyond the scope of this blog.
The second thing you’ll notice is that there is a ‘QUESTION SECTION’ but no ‘ANSWER SECTION’. What does this mean? It means the DNS server didn’t return a record for the name we requested. If a DNS entry had been found, you would also see an ‘ANSWER SECTION’:
[root@ol7u5 ~]# dig server2
; <<>> DiG 9.11.4-P2-RedHat-9.11.4-26.P2.el7_9.3 <<>> server2
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 57737
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4000
;; QUESTION SECTION:
;server2.                       IN      A
;; ANSWER SECTION:
server2.                3600    IN      A       192.168.56.109
;; Query time: 1 msec
;; SERVER: 192.168.10.180#53(192.168.10.180)
;; WHEN: Tue Mar 16 11:56:53 CDT 2021
;; MSG SIZE  rcvd: 36
This is what you want to see: the ‘ANSWER SECTION’ shows that server2 resolves to 192.168.56.109.
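The ‘status’ field in the header line is also worth checking. NOERROR with an answer is what you want. NXDOMAIN means the name doesn’t exist. SERVFAIL, as in the server3 example, means the DNS server couldn’t complete the lookup. For scripts, ‘dig +short’ prints only the answer records. This is a minimal sketch that pulls the status out of the full output. The helper name dig_status is hypothetical.

```shell
#!/usr/bin/env bash
# dig_status (hypothetical helper): print the response code from the
# ";; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: ..." line of dig output on stdin.
dig_status() {
  sed -n 's/.*status: \([A-Z]*\),.*/\1/p'
}

# Usage on a live system:
#   dig server3 | dig_status      # NOERROR, NXDOMAIN, SERVFAIL, ...
#   dig +short server2            # just the address(es); empty output means no A record
```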
Listening Check
After going through the usual preliminary basics, we should make sure the target server is listening on the port. In our example we’re trying to connect via SSH from our client (ol7u5) to server2, so let’s use the ‘ss’ command to confirm that server2 is listening on the SSH port (port 22).
[root@server2 ~]# ss -tlnp | grep ":22"
LISTEN     0      128          *:22                  *:*      users:(("sshd",pid=974,fd=3))
LISTEN     0      128       [::]:22               [::]:*      users:(("sshd",pid=974,fd=4))
Here we see that the “sshd” process is listening on port 22 on all IPv4 addresses (*:22) and all IPv6 addresses ([::]:22). This is what we want to see, so now we know the service is available.
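A plain grep ":22" also matches ports such as 220 or 2222. A slightly stricter check matches the port only at the end of the local address column. This is a minimal sketch that assumes the ‘ss -tln’ column layout shown above (State, Recv-Q, Send-Q, Local Address:Port, …). The helper name is_listening is hypothetical.

```shell
#!/usr/bin/env bash
# is_listening (hypothetical helper): succeed if `ss -tln` output on stdin shows a
# LISTEN socket whose local address (4th column) ends in ":PORT" (PORT given as $1).
is_listening() {
  awk -v port="$1" '
    $1 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
    END { exit !found }
  '
}

# Usage on the target server:
#   ss -tln | is_listening 22 && echo "something is listening on port 22"
```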
Although debugging a firewall configuration issue is way beyond the scope of this blog, we can still check the firewall status.
[root@server2 ~]# systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
   Active: inactive (dead)
     Docs: man:firewalld(1)
Since the firewall is ‘inactive’, it’s not going to be an issue. If the firewall is running, the same command reports it as ‘active’:
[root@server2 ~]# systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
   Active: active (running) since Thu 2021-03-18 11:20:13 EDT; 18s ago
     Docs: man:firewalld(1)
 Main PID: 3010 (firewalld)
    Tasks: 2
   CGroup: /system.slice/firewalld.service
           └─3010 /usr/bin/python2 -Es /usr/sbin/firewalld --nofork --nopid
If the firewall is ‘active’, you can consider stopping it while you debug the network connection. This simplifies the debugging, but it also opens that server up to the network. That may not be a big deal if the server is internal, but I strongly recommend against stopping the firewall if the server has a public-facing network interface.
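Before stopping the firewall, you can check whether it already allows SSH. This is a minimal sketch with several assumptions: firewalld is managing the default zone, and SSH is opened as the ‘ssh’ service. A rule added as a raw port would only show up in ‘firewall-cmd --list-ports’, which this does not check. The function name check_firewalld_ssh is hypothetical, and firewall-cmd normally needs root.

```shell
#!/usr/bin/env bash
# check_firewalld_ssh (hypothetical helper): report whether firewalld is running
# and, if so, whether the "ssh" service is allowed in the default zone.
check_firewalld_ssh() {
  # `systemctl is-active --quiet` prints nothing and exits 0 only if the unit is active.
  if ! systemctl is-active --quiet firewalld; then
    echo "firewalld is not active"
    return 0
  fi
  # Lists services allowed in the default zone, e.g. "dhcpv6-client ssh".
  if firewall-cmd --list-services | grep -qw ssh; then
    echo "firewalld is active and the ssh service is allowed"
  else
    echo "firewalld is active and ssh is NOT in the allowed services"
  fi
}

# Usage (as root):
#   check_firewalld_ssh
```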
Simple Connection Test
In this test we will actually try to establish a TCP connection to the server and port.
[root@ol7u5 ~]# nc -zv server1 22
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connected to 192.168.56.108:22.
Ncat: 0 bytes sent, 0 bytes received in 0.01 seconds.
Well, this is certainly good news! It shows that a TCP connection to server1 on port 22 was successful. Unfortunately, it doesn’t always work this way, so let’s take a peek at some other errors you may see and what they mean.
[root@ol7u5 ~]# nc -zv server1 22
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connection refused.
Although this seems serious, it’s also easy to identify and correct. “Connection refused” is an active error response from the target server (server1) that basically says “I’m not listening on that port”. See Listening Check to confirm whether the service is running and listening.
[root@ol7u5 ~]# nc -zv server2 22
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connection timed out.
“Connection timed out” is a bit more ominous, and there are a couple of possible causes:
- the server is shut down or not on the network
- a firewall is blocking traffic
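The three Ncat results above can be mapped to their likely causes in a script. This is a minimal sketch that assumes the Ncat 7.x messages shown in this section. Ncat prints them to stderr with ‘-v’, so the usage line captures both streams. The helper name nc_diagnosis is hypothetical. ‘-w’ sets Ncat’s connect timeout.

```shell
#!/usr/bin/env bash
# nc_diagnosis (hypothetical helper): map Ncat -zv output (stdout + stderr) on stdin
# to the likely cause described above.
nc_diagnosis() {
  local output
  output=$(cat)
  case "$output" in
    *"Connected to"*)       echo "open" ;;
    *"Connection refused"*) echo "refused: nothing listening on that port" ;;
    *"timed out"*)          echo "timeout: host down or firewall dropping traffic" ;;
    *)                      echo "unknown" ;;
  esac
}

# Usage on a live system (-w 5 gives up on the connect after 5 seconds):
#   nc -zv -w 5 server2 22 2>&1 | nc_diagnosis
```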
Resolve a Unique Issue
While we have covered a multitude of problems and techniques for debugging network issues, I realize most issues are unique. No matter what your issue is or where you are in your IT project, I am confident Zirous can help. We offer technical guidance for unique projects, and Zirous experts have 35+ years of experience navigating the ever-changing technological landscape. If you are looking for a partner that cares about your immediate needs as well as your long-term success, contact us.