The high-availability feature is based on the following components:
- Cluster node quorum;
- Virtual disk network storage;
- Cluster node accessibility diagnostics;
- Control panel fault-tolerance.
Cluster node quorum
A quorum is the minimum number of nodes that must be active for a VMmanager Cloud cluster to operate. The quorum is calculated as follows:
quorum = total number of nodes / 2 + 1
A node is considered operational if it is included in the quorum. The cluster has a quorum when more than half of its nodes are active, that is, when the number of active nodes is at least the quorum value. If only half of the nodes (or fewer) are active, the cluster does not have a quorum, and cloud functions are disabled.
For example, if the cluster includes 8 servers, the quorum consists of 5 servers. A server is included in the quorum if it can communicate with 4 or more other servers. If a server can communicate with only 3 nodes, it is not included in the quorum. This status is checked by corolistener.
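The quorum rule above can be sketched in a few lines of Python. This is only an illustration, not VMmanager code: the function names are hypothetical, and the division is assumed to round down, which matches the 8-server example and the "more than half" rule.

```python
def quorum_size(total_nodes: int) -> int:
    """Quorum value: total number of nodes / 2 + 1 (integer division is an assumption)."""
    return total_nodes // 2 + 1


def node_in_quorum(total_nodes: int, reachable_peers: int) -> bool:
    """A node counts itself plus every other node it can communicate with."""
    return reachable_peers + 1 >= quorum_size(total_nodes)


def cluster_has_quorum(total_nodes: int, active_nodes: int) -> bool:
    """The cluster has a quorum only when more than half of the nodes are active."""
    return active_nodes >= quorum_size(total_nodes)


if __name__ == "__main__":
    # The 8-server example from the text.
    print(quorum_size(8))            # 5
    print(node_in_quorum(8, 4))      # True: 4 peers plus the node itself
    print(node_in_quorum(8, 3))      # False: only 4 nodes in total
    print(cluster_has_quorum(8, 4))  # False: exactly half is not enough
```

With 8 nodes, a server that reaches 4 others sees 5 nodes in total and is included in the quorum, while a server that reaches only 3 is not.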
Virtual disk network storage
When a cluster node fails, all the virtual machines running on that node are restored on an available node. For this to work, the virtual disks of the virtual machines must be located in network storage.
Cluster node accessibility diagnostics
The diagnostics system checks that cluster nodes are accessible and, if a node fails, restores its virtual machines on available nodes. If a cluster node does not respond for more than 1 second, the system considers it failed. More information can be found under Cluster node accessibility diagnostics.
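The 1-second rule can be illustrated with a minimal sketch. It only shows the timeout comparison; how VMmanager Cloud actually polls nodes is not described here, so the function name, the dictionary of last-response timestamps, and the node names are all hypothetical.

```python
FAILURE_TIMEOUT_SECONDS = 1.0  # from the text: no response for more than 1 second


def failed_nodes(last_response, now, timeout=FAILURE_TIMEOUT_SECONDS):
    """Return the names of nodes whose last response is older than the timeout.

    last_response maps a node name to the time (in seconds) of its most
    recent response; this bookkeeping is an assumption for illustration.
    """
    return sorted(
        name for name, seen in last_response.items() if now - seen > timeout
    )


if __name__ == "__main__":
    responses = {"node1": 10.0, "node2": 8.5}
    # node2 has been silent for 1.7 s, node1 for only 0.2 s.
    print(failed_nodes(responses, now=10.2))
```

Virtual machines from the nodes returned by such a check would then be candidates for restoration on available nodes.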
Control panel fault-tolerance
When the cluster node where VMmanager Cloud is installed fails, the control panel will start automatically on one of the running nodes.
VMmanager Cloud fault-tolerance is based on the replication of the VMmanager database and required files.
The following cron job replicates VMmanager files every 5 minutes:
*/5 * * * * /usr/local/mgr5/sbin/cron-vmmgr sbin/nodereplication -c mgrfiles >/dev/null 2>&1
The system uses the rsync utility for file synchronization. It synchronizes files on all of the cluster nodes.
Standard MySQL replication is used for database replication. It is performed on the cluster nodes where "Mysql replication" is enabled (Cluster nodes -> Roles).