VMmanager Knowledge Base

Basic diagnostics

This article lists commands that help identify the causes of incorrect platform operation, as well as commands that restart containers and some services to restore their operation.

Some commands may require superuser privileges.

General diagnostics

This section lists commands that you can run as a first diagnostic step. They help rule out basic problems and reduce troubleshooting time.

Operating system (OS) version

If the OS of the platform server (master) or of a node is not supported by the platform, the installation or node connection ends with an error. To determine the OS version, run the command:

cat /etc/*release

For a list of supported operating systems, see the VMmanager documentation.

Server date and time

The platform checks the date and time during periodic synchronization with the license server. If the date or time on the platform server is incorrect, the platform may be blocked or work incorrectly. To display the current date and time on the server, run the command:

date -R
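date -R only shows the current time. To check whether the clock is also kept in sync automatically, you can inspect the output of timedatectl status. Below is a minimal sketch; the function name clock_synced is hypothetical, and it assumes the synchronization line reads "System clock synchronized" (newer systemd) or "NTP synchronized" (older systemd, for example on CentOS 7).

```shell
# clock_synced (hypothetical helper): succeeds if `timedatectl status` output
# on stdin reports that the system clock is synchronized.
# Newer systemd prints "System clock synchronized: yes",
# older versions print "NTP synchronized: yes".
clock_synced() {
    grep -Eq '(System clock|NTP) synchronized: yes'
}

# Usage on the server:
#   timedatectl status | clock_synced || echo "Clock is not synchronized"
```

If the clock is not synchronized, timedatectl set-ntp true is one way to enable synchronization; it relies on an NTP service such as chronyd or systemd-timesyncd being installed.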

Disk space and RAM usage

For the platform to work correctly, free disk space and RAM must meet the requirements in the Server requirements article of the VMmanager documentation. In addition, if there is not enough free disk space or RAM, virtual machines and backups cannot be created. To check disk space usage and file system types, run the command:

df -hT

To check RAM usage, run the command:

free -h
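df -hT and free -h are meant to be read by eye. When you check many nodes, a small filter can print only the problem values. A minimal sketch with two hypothetical helpers, disks_over_threshold and mem_available_mib; the 90% default threshold is an example value, not a platform requirement (the actual requirements are in the Server requirements article).

```shell
# disks_over_threshold (hypothetical helper): prints the mount points whose
# usage is at or above THRESHOLD percent (default 90).
# Expects `df -P` output on stdin: in POSIX format the usage percentage is
# always column 5 and the mount point column 6.
disks_over_threshold() {
    awk -v t="${1:-90}" 'NR > 1 {
        use = $5
        sub(/%/, "", use)              # "95%" -> "95"; "-" counts as 0
        if (use + 0 >= t) print $6, $5
    }'
}

# mem_available_mib (hypothetical helper): prints MemAvailable in MiB, the
# kernel's estimate of memory available to new processes without swapping.
# Reads /proc/meminfo unless another file is given (MemAvailable: kernel 3.14+).
mem_available_mib() {
    awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}

# Usage:
#   df -P | disks_over_threshold 85
#   mem_available_mib
```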

Inodes

An inode is a data structure that stores file metadata. The platform will not work correctly if the server runs out of inodes, even if free disk space remains. Typical symptoms of an inode shortage include reduced performance, inability to create files, and incorrect information in the platform interface. To check the number and percentage of inodes used on each file system, run the command:

df -i
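When df -i shows that a file system is running out of inodes, the next step is usually to find which directory holds the most files. A minimal sketch; top_inode_dirs is a hypothetical helper that counts the entries (each normally using one inode) under every first-level subdirectory. Hard-linked files are counted once per path, so the numbers are an approximation, and scanning a large tree can take time.

```shell
# top_inode_dirs (hypothetical helper): prints "count dir" for the LIMIT
# first-level subdirectories of DIR with the most entries, largest first.
# Usage: top_inode_dirs /var 10
top_inode_dirs() {
    base=${1%/}          # strip a trailing slash; "/" becomes ""
    limit=${2:-10}
    # -xdev stays on one file system, because inodes are counted per file system
    find "${base:-/}" -mindepth 1 -xdev 2>/dev/null |
        awk -v base="$base" '{
            rel = substr($0, length(base) + 2)   # path relative to DIR
            split(rel, parts, "/")
            count[parts[1]]++                     # charge entry to its top-level dir
        }
        END { for (d in count) print count[d], d }' |
        sort -rn | head -n "$limit"
}
```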

Troubleshooting Linux systems

To troubleshoot a Linux system, examine its logs. Below are tools for troubleshooting and searching for errors in the server logs. For information about the platform's own logs, see the Platform logs article in the VMmanager documentation.

Circular kernel buffer

One way to find out whether the system is working incorrectly is to examine the kernel log with the dmesg utility. The kernel writes its messages to a circular (ring) buffer while the system boots and runs. dmesg displays these messages and helps identify hardware-related problems. To search for problems, run the command:

dmesg | grep -i -E 'error|failed|critical|bug|panic'

The journalctl utility

You can use the journalctl utility to analyze logs and detect system problems. It displays the systemd journal, which includes the logs of Linux system services. To look for signs of abnormal system behavior, run the command:

journalctl | grep -i -E 'error|failed|critical|bug|panic'
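On a busy server, the grep commands above can print thousands of lines. A minimal sketch of a hypothetical helper, summarize_problems, that counts matching lines per keyword instead. It uses the same keywords and, like grep -i, matches substrings, so "bug" also counts lines containing "debug".

```shell
# summarize_problems (hypothetical helper): reads log text on stdin and prints
# "keyword count" for each keyword found (case-insensitive), sorted by keyword.
summarize_problems() {
    awk 'BEGIN { n = split("error failed critical bug panic", kw, " ") }
        {
            line = tolower($0)
            for (i = 1; i <= n; i++)
                if (index(line, kw[i])) count[kw[i]]++
        }
        END { for (k in count) print k, count[k] }' | sort
}

# Usage:
#   dmesg | summarize_problems
#   journalctl -b --no-pager | summarize_problems
```

journalctl can also filter by priority itself: journalctl -p err -b shows messages with priority err or higher from the current boot.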

CPU

CPU architecture

Reduced performance of a platform node may be caused by the technical characteristics of its CPU. CPU architecture information can also help diagnose problems with fine-tuning virtual machines. To display this information, run the command:

lscpu

CPU count on nodes

VMmanager 6 licensing counts only physical cores. To specify the exact number of cores when ordering a license, count the cores on the nodes with the command:

dmidecode --type processor | grep -i "core count" | grep -Eo "[0-9]+"

The command prints one value per physical processor (socket). If a node has several processors, add the values up.

The core count is also needed when platform management is blocked with the "CPU cores number on node exceeds license limit" error. This error occurs if the number of physical cores on a connected node exceeds the license limit. To check whether the limit is exceeded, compare the output of the dmidecode command with the VMmanager license settings.
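The check can be scripted by summing the per-socket values and comparing the total with the limit. A minimal sketch with two hypothetical helpers, total_cores and check_core_limit; the license limit is not read from the platform, you pass it in yourself from your VMmanager license settings.

```shell
# total_cores (hypothetical helper): sums the "Core Count" values of all
# processors in `dmidecode --type processor` output (stdin).
total_cores() {
    awk -F: 'tolower($1) ~ /core count/ { sum += $2 } END { print sum + 0 }'
}

# check_core_limit (hypothetical helper): compares the node's core count with
# the core limit from your VMmanager license (you supply both numbers).
check_core_limit() {
    cores=$1 limit=$2
    if [ "$cores" -gt "$limit" ]; then
        echo "exceeded: $cores cores > limit $limit"
        return 1
    fi
    echo "ok: $cores cores <= limit $limit"
}

# Usage:
#   cores=$(dmidecode --type processor | total_cores)
#   check_core_limit "$cores" 32     # 32 is an example limit
```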

System load

High system load reduces node performance. To assess the load, compare the load average with the number of physical cores obtained with the CPU count command. To display the load average, run the command:

uptime

uptime prints three load average values: for the last 1, 5, and 15 minutes. These values should be lower than the number of physical cores obtained with the CPU count command.
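uptime takes its load values from /proc/loadavg. A minimal sketch of a hypothetical helper, check_load, that applies the rule above to each of the three values; you pass the physical core count as an argument, and the "ok"/"high" labels are arbitrary.

```shell
# check_load (hypothetical helper): compares the 1-, 5- and 15-minute load
# averages with CORES (the physical core count from the CPU count command).
# Reads /proc/loadavg, which holds the same values that `uptime` prints,
# unless another file is given as the second argument.
check_load() {
    awk -v cores="$1" '
        BEGIN { split("1 5 15", minutes, " ") }
        {
            for (i = 1; i <= 3; i++)
                printf "%s-min load %s: %s\n", minutes[i], $i, ($i < cores + 0 ? "ok" : "high")
        }' "${2:-/proc/loadavg}"
}

# Usage:
#   check_load 16
```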

Virtual machines (VMs)

VM status

You can use the virsh utility to display the status of all virtual machines for troubleshooting.

To execute virsh commands, first connect to the node:

docker exec --tty --interactive vm_box ssh -i /opt/ispsystem/vm/etc/.ssh/vmmgr.1 <IP_address> -p 22

To display the status of all VMs, run the command:

virsh list --all

To display the status of a particular VM, run the command:

virsh list --all | grep <VM_name>
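To see only the VMs that are not running, you can filter the virsh list --all output. A minimal sketch; not_running_vms is a hypothetical helper that assumes VM names without spaces and the usual two header lines of virsh list. For a single VM, virsh domstate <VM_name> prints only its state.

```shell
# not_running_vms (hypothetical helper): prints "name: state" for every VM
# in `virsh list --all` output (stdin) whose state is not "running".
# Skips the two header lines; the state may contain spaces ("shut off").
not_running_vms() {
    awk 'NR > 2 && NF >= 3 {
        state = $3
        for (i = 4; i <= NF; i++) state = state " " $i
        if (state != "running") print $2 ": " state
    }'
}

# Usage on the node:
#   virsh list --all | not_running_vms
```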

The libvirt virtualization daemon

libvirt is a toolkit for managing virtualization. If the libvirt daemon (libvirtd) is not running, the platform will not work correctly. Check the status of the service with the command:

systemctl status libvirtd

If the service is stopped or inactive, start it:

systemctl start libvirtd

If libvirt is not installed, the output of the systemctl status libvirtd command contains the message:

Unit libvirtd.service could not be found

In this case:

  1. Install libvirt manually with the command for your OS:

    For RHEL-based operating systems (CentOS, AlmaLinux)
    yum install libvirt
    For Debian-based operating systems (Ubuntu)
    apt install libvirt-daemon-system
  2. Start the service:

    systemctl start libvirtd
  3. Add libvirtd to the autostart:

    systemctl enable libvirtd
  4. Re-check the status of the service to make sure that it is running.
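The installation steps can be scripted. A minimal sketch with two hypothetical helpers, assuming the package is named libvirt on RHEL-based systems and libvirt-daemon-system on Debian-based systems. libvirt_install_cmd only prints the command so that you can review it before running it; ensure_libvirtd performs steps 2 to 4.

```shell
# libvirt_install_cmd (hypothetical helper): prints the libvirt install command
# for the OS described in an os-release file (default /etc/os-release).
libvirt_install_cmd() {
    (
        unset ID ID_LIKE
        . "${1:-/etc/os-release}"     # defines ID and, usually, ID_LIKE
        case " $ID $ID_LIKE " in
            *" rhel "*|*" centos "*|*" fedora "*) echo "yum install -y libvirt" ;;
            *" debian "*|*" ubuntu "*)            echo "apt install -y libvirt-daemon-system" ;;
            *) echo "unsupported OS: $ID" >&2; exit 1 ;;
        esac
    )
}

# ensure_libvirtd (hypothetical helper): starts libvirtd if it is not active,
# adds it to autostart, then prints its state ("active" if all went well).
ensure_libvirtd() {
    if ! systemctl is-active --quiet libvirtd; then
        systemctl start libvirtd
    fi
    systemctl enable libvirtd
    systemctl is-active libvirtd
}
```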

Containers

The docker service

The Docker daemon is a service that manages containers and other Docker objects: networks, volumes, and images. If this service is not running, the platform will not work. To check the status of docker, run the command:

systemctl status docker

If the service is stopped, start it with the command:

systemctl start docker

To check the version of docker, run the command:

docker version

Restarting the docker service

If the docker service does not work correctly, restarting it may help. To do this, run the command:

systemctl restart docker.service

Restarting the service helps fix a number of errors that can occur when the platform starts, restarts, or shuts down:

  • error while removing network: network <network_name> has active endpoints

    Error example
    error while removing network: network vm_vm_box_net id 88888ggggg has active endpoints
    exit status 1 
  • ERROR: for <service_name> Cannot start service <service_name>: endpoint with name <container_name> already exists in network <network_name> 

    Error example
    ERROR: for auth_back Cannot start service auth_back: endpoint with name vm_auth_back_1 already exists in network vm_vm_box_net

    In the above example, the vm_auth_back_1 container failed to start.

  • ERROR: for input Cannot start service input: driver failed programming external connectivity on endpoint vm_input_1

To correct the above errors, restart the docker service with the command above.

If the problem persists, contact technical support from your client area: Support > Support tickets > Add.
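If you saved the output of a failed platform start or restart (for example, redirected to a file), you can check it for the docker network errors listed above before deciding to restart docker. A minimal sketch; needs_docker_restart is a hypothetical helper that matches only the three error texts from this article.

```shell
# needs_docker_restart (hypothetical helper): succeeds if the text on stdin
# contains one of the docker network errors that a docker restart usually fixes.
needs_docker_restart() {
    grep -Eq 'has active endpoints|endpoint with name .* already exists in network|driver failed programming external connectivity'
}

# Usage (start.log is a hypothetical file with the saved error output):
#   needs_docker_restart < start.log && systemctl restart docker.service
```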

Status of containers

To diagnose possible problems, display a list of containers and their statuses. To display a list of all running containers, run the command:

docker ps

To get a list of all containers, including stopped ones, run the command:

docker ps -a

To check the status of a specific container, including a stopped one, run the command:

docker ps -a | grep <container_name>
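To list only the containers that need attention, you can print names and statuses with docker ps --format and filter out those that are up. A minimal sketch; down_containers is a hypothetical helper that also reports containers whose health check marks them as unhealthy.

```shell
# down_containers (hypothetical helper): prints containers that are not up or
# are marked unhealthy. Expects tab-separated "name<TAB>status" lines, as
# produced by: docker ps -a --format '{{.Names}}\t{{.Status}}'
down_containers() {
    awk -F '\t' '$2 !~ /^Up/ || $2 ~ /unhealthy/ { print $1 ": " $2 }'
}

# Usage:
#   docker ps -a --format '{{.Names}}\t{{.Status}}' | down_containers
```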

Restarting the container

If the container does not work correctly, restarting it may help. To do this, run the command:

docker restart <container_name>

Restarting taskmgr

If Task Manager does not work correctly, for example, tasks hang, restarting the taskmgr service in the vm_box container may help. To do this, run the command:

docker exec -it vm_box supervisorctl restart taskmgr

Restarting monitor

You may need to restart the monitoring service if no statistics are displayed on the nodes. To do this, run the command:

docker exec -it vm_box supervisorctl restart monitor
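After restarting taskmgr or monitor, you can check that the supervisor programs in vm_box came back up: supervisorctl status prints one line per program with its state (RUNNING, STOPPED, FATAL, and so on). A minimal sketch; not_running_programs is a hypothetical helper. The usage line runs docker exec without -it, since a terminal is not needed when the output is piped.

```shell
# not_running_programs (hypothetical helper): prints "name state" for every
# program in `supervisorctl status` output (stdin) that is not RUNNING.
not_running_programs() {
    awk 'NF >= 2 && $2 != "RUNNING" { print $1, $2 }'
}

# Usage:
#   docker exec vm_box supervisorctl status | not_running_programs
```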

Logging

To analyze a container's events, examine its log. To display the last 100 lines of the container log, run the command:

docker logs --tail 100 <container_name>

Firewall

If the firewall has no rules for the docker service, the platform and the network may not work correctly. Docker creates the necessary rules automatically when the service starts; we do not recommend modifying or deleting them manually.

Service status and configuration

To check the status of the firewall service, run the command depending on the OS:

For Ubuntu, Astra Linux
systemctl status nftables
For CentOS, AlmaLinux
systemctl status firewalld

To display the service configuration, run the command depending on the operating system:

For Ubuntu, Astra Linux
nft list ruleset
For CentOS, AlmaLinux
firewall-cmd --list-ports
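The listings above can be long. To check specifically whether the docker rules are present, you can look for the DOCKER chain that docker creates when it manages firewall rules through iptables (its default behavior). A minimal sketch; has_docker_chains is a hypothetical helper. Whether the chain appears in nft list ruleset depends on the iptables backend: with iptables-nft it does, with the legacy backend check the output of iptables -S instead.

```shell
# has_docker_chains (hypothetical helper): succeeds if a firewall rule dump on
# stdin contains the DOCKER chain created by the docker service.
# Accepts `nft list ruleset` output ("chain DOCKER {") and
# `iptables -S` output ("-N DOCKER").
has_docker_chains() {
    grep -Eq '(^|[[:space:]])chain DOCKER([[:space:]]|$)|^-N DOCKER([[:space:]]|$)'
}

# Usage:
#   nft list ruleset | has_docker_chains || echo "docker rules are missing"
#   iptables -S | has_docker_chains || echo "docker rules are missing"
```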

Restarting the service

Restart the service if it does not work correctly or if you need to restore the default rules. To restart the service, run the command for your OS:

For CentOS, AlmaLinux
systemctl restart firewalld.service
For Ubuntu, Astra Linux
systemctl restart nftables.service

To restore the default rules:

  1. Restart the firewall service with one of the commands presented above.
  2. Restart docker with the command:

    systemctl restart docker.service
  3. Restart the platform with the command from the Restarting the platform section of this article.

Searching for information in the database

Working with the database directly involves risks. We do not recommend editing the database manually, as this can disrupt the correct operation of the platform.

Any actions with the database should be performed only after backing up the platform. 

Database queries show information about the state of VMs, nodes, and other platform objects. Below is a list of queries for retrieving data from the database.

To run queries, connect to the MySQL container:

docker exec -it mysql bash -c "mysql isp -p\$MYSQL_ROOT_PASSWORD"

VM info

The VM information includes all of its status parameters, its internal name, and node data.

To get information about the virtual machine, run the query:

select * from vm_host where id=<id_vm>\G

To display information about the node and the internal VM name, run the query:

select id,internal_name,node from vm_host where id=<id_vm>\G
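To run a query without opening an interactive MySQL session, you can pass it to the mysql client with the -e option. A minimal sketch with two hypothetical helpers: vm_host_query builds the query above and rejects non-numeric IDs, and run_isp_query runs a query in the mysql container, assuming the container name (mysql), database (isp), and MYSQL_ROOT_PASSWORD variable from the connection command in this section. Use it only for read (select) queries.

```shell
# vm_host_query (hypothetical helper): prints the query that shows the node and
# internal name of VM <id>. Only digits are accepted, which keeps arbitrary
# text out of the SQL statement.
vm_host_query() {
    case $1 in
        ''|*[!0-9]*) echo "VM id must be a number" >&2; return 1 ;;
    esac
    # \G makes mysql print each row vertically, as in the interactive queries
    printf 'select id,internal_name,node from vm_host where id=%s\\G\n' "$1"
}

# run_isp_query (hypothetical helper): runs one query in the isp database.
# The query is passed as a positional argument ($1 inside the container), so
# the host shell does not expand anything in it; MYSQL_ROOT_PASSWORD is
# expanded by bash inside the container.
run_isp_query() {
    docker exec mysql bash -c 'mysql isp -p"$MYSQL_ROOT_PASSWORD" -e "$1"' _ "$1"
}

# Usage:
#   run_isp_query "$(vm_host_query 42)"
```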

Information about the node

The query below displays the main node parameters: ID, name, IP address, and SSH port.

To check the information about the node, run the query:

select id,name,ip_addr,ssh_port from vm_node where id=<id_node>;

To check the network interfaces of the node, run the query:

select * from vm_node_interfaces where node=<id_node> \G

VM virtual disks

Virtual disk information helps diagnose disk-related problems. For example, the virt-resize: error can occur if the expand_part value (the partition to expand) or the virtual disk size stored in the database is incorrect.

To view full information about the disk, run this query:

select * from vm_disk where name = 'example_name' \G

To check the actual disk size, run the command on the node:

virsh domblkinfo --human 1111_example_name vda

Backups

By querying the database, you can check the backup schedule to identify possible problems. To get the information, run the query:

select * from vm_schedule;