How to avoid VM disk corruption in Ceph v.16 (Pacific) or higher?

Description

The instructions in this article apply to platform version 2024.10.2 or lower if:

you have connected a Ceph v.16 (Pacific) or higher storage to the cluster;
you have connected a Ceph v.15 (Octopus) or lower storage to the cluster and plan to upgrade it to Ceph v.16 (Pacific) or higher.

With either of these configurations, virtual machine (VM) disks may become corrupted after a VMmanager cluster node reboots. The cause is that, starting from version 16 (Pacific), Ceph renamed the blacklist parameter to blocklist, so Ceph user permissions that still refer to osd blacklist do not cover the new osd blocklist command. This article provides instructions on how to avoid VM disk corruption.

Diagnostics

Connect to the Ceph cluster node with the MDS role via SSH. For more information about connecting via SSH, see Workstation setup.
Check the version of Ceph you are using:
```
ceph version
```
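
The version check can also be scripted. The following is a minimal sketch, assuming the command prints a line in the usual form "ceph version X.Y.Z (...) codename (stable)"; the function names ceph_major and uses_blocklist are hypothetical helpers, not part of Ceph or VMmanager.

```shell
# Extract the major version number from the output of `ceph version`.
# Assumption: the output starts with "ceph version X.Y.Z".
ceph_major() {
    printf '%s\n' "$1" | sed -n 's/^ceph version \([0-9][0-9]*\)\..*/\1/p'
}

# Succeed if the release is v.16 (Pacific) or higher,
# that is, a release that uses the "blocklist" name.
uses_blocklist() {
    major=$(ceph_major "$1")
    [ -n "$major" ] && [ "$major" -ge 16 ]
}

# Usage on the Ceph node:
#   uses_blocklist "$(ceph version)" && echo "Ceph v.16 or higher"
```

If the output has a different format, ceph_major prints nothing and uses_blocklist fails, so the result should be verified by hand in that case.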

Check if the blocklist parameter is used in the Ceph configuration:

ceph auth get <ceph_user>

Response example

[ceph_user]
    key = secret==
    caps mds = "allow rw"
    caps mon = "allow command \"osd blacklist\", allow r"
    caps osd = " allow rwx pool=libvirt-pool-1,allow class-read object_prefix rbd_children,allow rw pool=cephfs.test_vol.data"

If the response does not contain the text osd blocklist, there is a risk of VM disk corruption.
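
The same check can be done with a small filter over the command output. This is a sketch only: check_mon_caps is a hypothetical helper, and it assumes the response has a "caps mon" line like the one in the example above.

```shell
# Read the output of `ceph auth get <ceph_user>` on stdin and print
# which parameter name the mon capabilities refer to.
check_mon_caps() {
    # Keep only the mon capability line; tolerate its absence.
    caps=$(grep 'caps mon' || true)
    case "$caps" in
        *"osd blocklist"*) echo "blocklist" ;;  # new name, no action needed
        *"osd blacklist"*) echo "blacklist" ;;  # old name, risk of corruption
        *)                 echo "missing" ;;    # no such permission at all
    esac
}

# Usage on the Ceph node:
#   ceph auth get <ceph_user> | check_mon_caps
```

Any result other than blocklist corresponds to the risk described in this section.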

Solution

Connect to the Ceph cluster node with the MDS role via SSH. For more information about connecting via SSH, see Workstation setup.
If you plan to upgrade Ceph, follow the instructions in the official Ceph documentation.

Execute the command:

ceph auth caps <ceph_user> mon 'allow r, allow command "osd blocklist"' osd 'allow class-read object_prefix rbd_children, allow rwx pool=<ceph_pool>'

Response example

[ceph_user]
    key = secret==
    caps mds = "allow rw"
    caps mon = "allow command \"osd blocklist\", allow r"
    caps osd = " allow rwx pool=libvirt-pool-1,allow class-read object_prefix rbd_children,allow rw pool=cephfs.test_vol.data"

If the response contains the text osd blocklist, then the problem is fixed.
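
Because the command nests double quotes inside single quotes, it may help to print it for review before running it. The sketch below only builds the command text from this section; build_caps_cmd is a hypothetical helper, and client.example and libvirt-pool-1 in the usage line are placeholder values.

```shell
# Print (do not run) the capability update command for review.
# $1 = Ceph user, $2 = Ceph pool; both are supplied by the caller.
build_caps_cmd() {
    printf "ceph auth caps %s mon 'allow r, allow command \"osd blocklist\"' osd 'allow class-read object_prefix rbd_children, allow rwx pool=%s'\n" "$1" "$2"
}

# Usage (placeholder user and pool names):
#   build_caps_cmd client.example libvirt-pool-1
```

Note that ceph auth caps replaces the whole capability set of the user, so capabilities that are not listed in the command (for example, mds) may need to be added to it before execution.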

The article was last updated on 11.08.2024. The article was prepared by technical writers of ISPsystem.