This article describes the principles and methods for diagnosing network problems.
To follow the instructions in this article, connect to the server to be diagnosed via SSH.
Types of network configuration and how they work
The diagnostic procedure depends on the type of network configuration.
Switching
The physical interface of the node server and the virtual interfaces of the VMs are connected to the same bridge. There is no routing within the node server. When a packet for a VM arrives on the physical interface, the node server OS uses an ARP request to determine the target virtual interface.
Routing
Virtual interfaces of the VMs are connected to the bridge. Packets are forwarded between the physical interface and the bridge according to routes in the node server OS:
- route to the subnet from which IP addresses are issued to VMs;
- host routes to specific IP addresses.
Then the packet in the bridge is sent to the virtual interface.
IP fabric
There is no bridge on the node server. Packets are forwarded between the physical and virtual interfaces according to host routes to specific IP addresses in the node server OS. The FRR service keeps these routes up to date.
Toolkit
This section contains a list of utilities needed to diagnose the network availability of VMs.
Linux
Diagnostics utilities:
- arping - to detect hosts on a computer network;
- bridge - to manage bridge interfaces;
- curl - to create network requests for different data protocols: FTP, FTPS, HTTP, HTTPS, TFTP etc.;
- dig - to check the DNS records of a domain;
- ip - to check and configure network interfaces;
- iperf3 - to measure the bandwidth of the network;
- mtr - to check the network connection;
- nc - to send and receive data via TCP and UDP;
- nmap - to explore the network and perform security checks;
- ping - to check if the remote host is available;
- ss - to analyze the network connections of the system;
- tcpdump - to analyze network traffic;
- tracepath and traceroute - to display possible packet routes.
To get detailed information about the features, syntax and options of a Linux utility, run the command:
man <utility>
If the required utility is not available on the server to be diagnosed, install the appropriate package using the commands:
yum install <package_name>
dnf install <package_name>
apt install <package_name>
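Which of the three install commands applies depends on the distribution. As a minimal sketch, the helper below picks the first package manager found in PATH. The function name detect_pkg_manager is hypothetical and not part of the platform; the sketch assumes a Bash shell.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first available package manager.
# dnf is checked before yum because on newer RHEL-family systems
# yum is usually just a compatibility alias for dnf.
detect_pkg_manager() {
    local pm
    for pm in dnf yum apt; do
        if command -v "$pm" >/dev/null 2>&1; then
            echo "$pm"
            return 0
        fi
    done
    # None of the three was found in PATH.
    echo "unknown"
    return 1
}
```

It could then be used as "$(detect_pkg_manager)" install <package_name>. Note that the package name does not always match the utility name: for example, dig is typically shipped in bind-utils on the RHEL family and in dnsutils on Debian and Ubuntu.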
Windows
Diagnostics utilities:
- ipconfig - to manage network interfaces;
- netsh - to display or edit the network configuration;
- netstat - to display information about the system's UDP and TCP connections;
- nslookup - to run DNS queries;
- ping - to check if the remote host is available;
- route - to view, delete and add static routes to the system routing table;
- tracert - to trace the route of packets to a remote host.
Preliminary diagnostics
To rule out underlying problems, do a preliminary diagnosis:
- Determine the type of network configuration in the cluster. To do this, in the web interface of the platform, go to Clusters → Network type.
- Check the availability of the node server:
  docker exec -it vm_box bash
  vmssh -p <SSH_port> <IP_address_node>
  The node server must be accessible via SSH from the vm_box container on the platform server. If the connection fails, follow the recommendations in the article "If the cluster node is unavailable".
- Make sure that the server configuration matches the selected type of network configuration.
  - switching:
    - Check for a bridge:
      ip address show
    - Check the network interfaces connected to the bridge:
      bridge link show
  - routing:
    - Check for a bridge:
      ip address show
    - Check the network interfaces connected to the bridge:
      bridge link show
    - Check for routes:
      to show IPv4 routes:
      ip route show
      to show IPv6 routes:
      ip -6 route show
  - IP fabric:
    - Check for host routes:
      to show IPv4 routes:
      ip route show
      to show IPv6 routes:
      ip -6 route show
If the configuration does not match, bring it in line with the selected type of network configuration. In complicated cases (for example, if the required bridge is missing), the most reliable way is to delete the node from the cluster, reinstall the OS on the server and reconnect the node. If this is not possible (the node is already in operation), adjust the settings in the configuration files:
- /etc/sysconfig/network-scripts for CentOS 7, AlmaLinux 8;
- /etc/network/interfaces for Ubuntu 20.04.
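The comparison between the selected network type and what is actually present on the node can be sketched as a small decision function. This is only an illustration based on the descriptions in this article: check_layout and its result strings are hypothetical, and the bridge and route counts are passed in as arguments rather than collected automatically.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare what is present on the node with what
# the selected network type needs.
# Arguments: <type> <bridge_count> <vm_route_count>
check_layout() {
    local type=$1 bridges=$2 routes=$3
    case "$type" in
        switching)
            # Switching only needs the bridge; no routes are required.
            if [ "$bridges" -gt 0 ]; then echo "ok"
            else echo "missing bridge"; fi ;;
        routing)
            # Routing needs both the bridge and routes to the VMs.
            if [ "$bridges" -eq 0 ]; then echo "missing bridge"
            elif [ "$routes" -eq 0 ]; then echo "missing routes"
            else echo "ok"; fi ;;
        ip-fabric)
            # IP fabric uses host routes and no bridge.
            if [ "$bridges" -gt 0 ]; then echo "unexpected bridge"
            elif [ "$routes" -eq 0 ]; then echo "missing routes"
            else echo "ok"; fi ;;
        *)
            echo "unknown type"
            return 1 ;;
    esac
}
```

One possible way to obtain the inputs is ip -o link show type bridge | wc -l for the number of bridges and ip route show | grep -cE '^[0-9.]+ ' for the number of IPv4 host routes. Both are rough assumptions: they also count bridges and routes that are unrelated to VMs, so an "unexpected bridge" result is a hint to inspect the node manually, not a verdict.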
Diagnosing network problems on the host server
- Determine the type of the network problem: the guest OS does not respond to the ping command, a TCP or UDP port is unavailable, etc. The choice of troubleshooting utilities depends on your network equipment settings and your hosting provider's restrictions. For example, if the ICMP protocol is completely blocked, the ping utility is useless for diagnostics.
- Run the tcpdump utility on the physical interface of the node server:
  tcpdump -i <interface_name> -enn -vvv host <VM_IP_address>
- Send packets to the problematic VM, depending on the type of problem identified:
  - if you are checking via ICMP:
    ping <VM_IP_address>
  - if you are checking the availability of an IPv4 TCP port:
    nmap -Pn -sS <IPv4_IP_address> -p <port_TCP>
  - if you are checking the availability of an IPv4 UDP port:
    nmap -Pn -sU <IPv4_IP_address> -p <port_UDP>
  - if you are checking the availability of an IPv6 TCP port:
    nmap -6 -Pn -sS <IPv6_IP_address> -p <port_TCP>
  - if you are checking the availability of an IPv6 UDP port:
    nmap -6 -Pn -sU <IPv6_IP_address> -p <port_UDP>
- If tcpdump on the physical interface does not show the sent packets, check for ARP requests (IPv4 traffic) or NDP requests (IPv6 traffic) to VMs:
  tcpdump -i <interface_name> -enn -vvv arp | grep '<IPv4_IP_address>'
  tcpdump -i <interface_name> -enn -vvv icmp6 | grep '<IPv6_IP_address>' | grep -P 'advertisement|solicitation'
  As a result, two options are possible:
  - there are no ARP/NDP requests on the physical interface. This means that the problem is not related to the operation of the node. Consult your network engineer to solve the problem;
  - there are ARP/NDP requests without responses on the physical interface. In this case, check that the requests are forwarded to the virtual interface of the VM:
    - Get the name of the virtual interface:
      virsh domiflist <VM_libvirt-domain_ID_or_name>
    - Check the requests on the virtual interface with the commands:
      ARP:
      tcpdump -i <virtual_interface_name> -enn -vvv arp | grep '<IPv4_IP_address>'
      NDP:
      tcpdump -i <virtual_interface_name> -enn -vvv icmp6 | grep '<IPv6_IP_address>' | grep -P 'advertisement|solicitation'
    If ARP/NDP requests are also displayed on the virtual interface without responses, continue the diagnostics in the guest OS.
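The nmap probes used in this section differ only in the scan type (-sS for TCP, -sU for UDP) and in the -6 flag for IPv6 targets. As a minimal sketch, the helper below assembles the matching command line from a protocol, an address and a port. The name build_probe_cmd is hypothetical, and treating any address that contains a colon as IPv6 is a simplifying assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the nmap command for a port check.
# Arguments: <tcp|udp> <VM_IP_address> <port>
build_probe_cmd() {
    local proto=$1 addr=$2 port=$3 scan family=""
    case "$proto" in
        tcp) scan="-sS" ;;   # TCP SYN scan
        udp) scan="-sU" ;;   # UDP scan
        *)
            echo "unsupported protocol: $proto" >&2
            return 1 ;;
    esac
    # Assumption: an address containing ':' is IPv6 and needs -6.
    case "$addr" in
        *:*) family="-6 " ;;
    esac
    echo "nmap ${family}-Pn ${scan} ${addr} -p ${port}"
}
```

The function only prints the command, so the result can be reviewed before it is run while tcpdump is listening on the physical interface. Both scan types normally require root privileges.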
Diagnosing network problems in the guest operating system
The principles of diagnosing network problems in the guest OS are the same as on the node server. For diagnostics:
- Check the IPv4 or IPv6 settings in the guest OS against those specified for the problematic VM in the platform. Run the commands:
  - Linux:
    - to display the network interfaces:
      ip address show
    - to display the list of IPv4 routes:
      ip route show
    - to display the list of IPv6 routes:
      ip -6 route show
  - Windows:
    - to display the list of network interfaces:
      ipconfig /all
    - to display the list of routes:
      route print
- On the node server, run the tcpdump utility on the virtual interface of the problematic VM:
  - Get the name of the virtual interface:
    virsh domiflist <VM_libvirt-domain_ID_or_name>
  - Run the command:
    tcpdump -i <virtual_interface_name> -enn -vvv
- Send ICMP (ping utility) or TCP/UDP (nmap utility) packets from the guest OS to an external address. The possible outcomes are:
  - the sent packets are not visible on the virtual interface of the problematic VM. In this case, the problem is most likely in the firewall settings in the guest OS;
  - the sent packets are visible on the virtual interface of the problematic VM. In this case, run tcpdump on the physical interface of the node server. If the packets are visible there as well, the problem is somewhere outside the node. Consult a network engineer to solve the problem;
  - the sent packets are visible on the virtual interface but not on the physical interface, that is, they are lost between the two interfaces on the node server. In this case, recheck the network and firewall settings on the node.
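The three outcomes above amount to a simple decision based on where the outgoing packets are still visible. The sketch below illustrates that logic. The function names seen_on and locate_loss and the result labels are hypothetical; the 10-second capture window is an arbitrary assumption, and seen_on requires root privileges and the timeout utility from coreutils.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "yes" if at least one packet to or from
# <ip> is captured on <interface> within 10 seconds, otherwise "no".
seen_on() {
    local iface=$1 ip=$2
    # -c 1 makes tcpdump exit after the first matching packet;
    # timeout stops the capture if nothing arrives.
    if timeout 10 tcpdump -i "$iface" -nn -c 1 "host $ip" >/dev/null 2>&1; then
        echo "yes"
    else
        echo "no"
    fi
}

# Hypothetical helper: map the two observations to the place where
# the problem should be looked for.
# Arguments: <seen_on_virtual: yes|no> <seen_on_physical: yes|no>
locate_loss() {
    local on_virtual=$1 on_physical=$2
    if [ "$on_virtual" != "yes" ]; then
        echo "guest"          # packets never leave the guest OS
    elif [ "$on_physical" = "yes" ]; then
        echo "outside-node"   # packets leave the node
    else
        echo "node"           # lost between virtual and physical interface
    fi
}
```

For example, locate_loss "$(seen_on <virtual_interface_name> <external_IP_address>)" "$(seen_on <interface_name> <external_IP_address>)" could be run on the node server while the guest OS is sending packets. The two captures run one after the other, so the traffic has to continue for the whole check.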