「PVE」- Hyper-Converged Ceph Cluster

Service Deployment

Deploy Hyper-Converged Ceph Cluster – Proxmox VE

Install Ceph storage from the command line:

apt-get install --download-only ceph ceph-base ceph-mds ceph-mgr ceph-mgr-modules-core ceph-mon ceph-osd ceph-volume cryptsetup-bin libdouble-conversion3 libfmt7 libparted2 libpcre2-16-0 libqt5core5a libqt5dbus5 libqt5network5 libsqlite3-mod-ceph libthrift-0.13.0 nvme-cli parted python-pastedeploy-tpl python3-bcrypt python3-bs4 python3-cffi-backend python3-cherrypy3 python3-cryptography python3-dateutil python3-distutils python3-lib2to3 python3-logutils python3-mako python3-markupsafe python3-natsort python3-openssl python3-paste python3-pastedeploy python3-pecan python3-simplegeneric python3-singledispatch python3-soupsieve python3-tempita python3-waitress python3-webob python3-webtest python3-werkzeug shared-mime-info sudo uuid-runtime

# pveceph install

Creating an OSD

Select the node first, then add the disk.
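
The same step can also be done from the shell on the chosen node. A minimal sketch, assuming a PVE release that provides `pveceph osd create`; the `run` helper, the `DRY_RUN` switch and the `create_osd` name are hypothetical and exist only so the command can be previewed before it touches a disk:

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN=1 (the default), run it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Create an OSD on the given block device of the current node.
create_osd() {
    dev="$1"
    case "$dev" in
        /dev/?*) ;;
        *) echo "usage: create_osd /dev/<disk>" >&2; return 1 ;;
    esac
    run pveceph osd create "$dev"
}
```

Calling `create_osd /dev/sdi` prints the command; setting `DRY_RUN=0` beforehand actually runs it.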

Removing an OSD

Proxmox Ceph remove OSD – How to do it via Proxmox VE GUI and CLI?

Removing via the GUI:
1) Select the Proxmox VE node in the tree.
2) Go to the Ceph >> OSD panel, select the OSD to remove, and click the OUT button.
3) Once the status is OUT, click the STOP button. This changes the status from up to down.
4) Finally, open the More drop-down and click Destroy.

Removing via the command line:
1) ceph osd out {osd-num}
2) Watch the OSD data migration status: ceph -w
3) systemctl stop ceph-osd@{osd-num}
4) ceph osd crush remove {name}   (the CRUSH name, i.e. osd.{osd-num})
5) ceph auth del osd.{osd-num}
6) ceph osd rm {osd-num}
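
The CLI steps above can be wrapped in one function. This is a sketch, not a tested procedure: the `run` helper, the `DRY_RUN` switch and the `remove_osd` name are hypothetical. Instead of watching `ceph -w` by hand, it assumes `ceph osd safe-to-destroy` can be polled to wait for the migration to finish.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN=1 (the default), run it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Remove one OSD by numeric id, in the same order as the manual steps.
remove_osd() {
    osd_num="$1"
    case "$osd_num" in
        ''|*[!0-9]*) echo "usage: remove_osd <osd-num>" >&2; return 1 ;;
    esac

    run ceph osd out "$osd_num"

    # Assumption: poll safe-to-destroy instead of reading `ceph -w` manually.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "wait until: ceph osd safe-to-destroy osd.${osd_num}"
    else
        while ! ceph osd safe-to-destroy "osd.${osd_num}"; do
            sleep 10
        done
    fi

    run systemctl stop "ceph-osd@${osd_num}"
    run ceph osd crush remove "osd.${osd_num}"
    # Skipping this step is what later causes "entity osd.N exists but key does not match".
    run ceph auth del "osd.${osd_num}"
    run ceph osd rm "$osd_num"
}
```

Run `remove_osd 3` first to review the printed commands, then set `DRY_RUN=0` on the node that hosts the OSD to execute them.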

Cleaning Up a Monitor That Was Not Removed

[SOLVED] – CEPH how to delete dead monitor? | Proxmox Support Forum

If a node is removed directly without cleaning up Ceph first, its Monitor is left in an abnormal state.

Remove the unused Monitor:

# ceph -s
...

# ceph mon remove xxxxxxx

Problem: the Monitor has been removed, but it still shows up in the GUI.

[SOLVED] – Ghost monitor in CEPH cluster | Page 2 | Proxmox Support Forum
linux – How to remove systemd services – Super User
Re-adding Ceph Node | Proxmox Support Forum
[SOLVED] – CEPH how to delete dead monitor? | Proxmox Support Forum
Reinstall/remove dead monitor | Proxmox Support Forum

systemctl disable ceph-mon@<name-of-mon>.service
rm -rf /etc/systemd/system/ceph-mon.target.wants/ceph-mon@<name-of-mon>.service

rm -rf /var/lib/ceph/mon/<name of monitor>

vim /etc/pve/ceph.conf
... (1) Delete the section of the corresponding Monitor;
... (2) Remove its address from mon_host in [global];
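
The leftover cleanup can be sketched as one function. The `run` helper, the `DRY_RUN` switch and the `cleanup_mon` name are hypothetical, and the data directory is assumed to follow the default `ceph-<name>` layout (cluster name `ceph`). Editing /etc/pve/ceph.conf stays a manual step.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN=1 (the default), run it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Remove the systemd leftovers and data directory of an already removed Monitor.
cleanup_mon() {
    mon="$1"
    # Refuse empty names and names with a slash so rm -rf cannot leave the mon directory.
    case "$mon" in
        ''|*/*|.|..) echo "usage: cleanup_mon <name-of-mon>" >&2; return 1 ;;
    esac

    run systemctl disable "ceph-mon@${mon}.service"
    run rm -f "/etc/systemd/system/ceph-mon.target.wants/ceph-mon@${mon}.service"
    # Assumption: default cluster name "ceph", hence the ceph-<name> directory.
    run rm -rf "/var/lib/ceph/mon/ceph-${mon}"

    echo "now edit /etc/pve/ceph.conf: drop the mon section and its mon_host address"
}
```

For example, `cleanup_mon pve3` prints the three commands for a Monitor named pve3 (a made-up name) followed by the reminder.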

Problem: creating an OSD fails: … unable to find a keyring … entity osd.0 exists but key does not match …

[SOLVED] – PVE7 unable to create OSD | Proxmox Support Forum
entity osd.5 exists but key does not match | Proxmox Support Forum

When adding an OSD device, the following error is reported:

create OSD on /dev/sdi (bluestore)
wiping block device /dev/sdi
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 0.464955 s, 451 MB/s
Running command: /bin/ceph-authtool --gen-print-key
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 435258e3-31c8-4fc7-958e-3d37f881e388
 stderr: 2022-10-13T01:37:26.795+0800 7f9276af5700 -1 auth: unable to find a keyring on /etc/pve/priv/ceph.client.bootstrap-osd.keyring: (2) No such file or directory
 stderr: 2022-10-13T01:37:26.795+0800 7f9276af5700 -1 AuthRegistry(0x7f927005b868) no keyring found at /etc/pve/priv/ceph.client.bootstrap-osd.keyring, disabling cephx
 stderr: Error EEXIST: entity osd.0 exists but key does not match
-->  RuntimeError: Unable to create a new OSD id
TASK ERROR: command 'ceph-volume lvm create --cluster-fsid fc27b60a-41ba-4540-987e-0cd0f7d1b1f4 --data /dev/sdi' failed: exit code 1

Following the hint in [SOLVED] – PVE7 unable to create OSD | Proxmox Support Forum, apply this fix:

cp /etc/pve/priv/ceph.client.bootstrap-osd.keyring /etc/pve/priv/ceph.client.bootstrap-osd.keyring.backup # this file most likely does not exist, in which case cp simply fails
ceph auth get client.bootstrap-osd > /etc/pve/priv/ceph.client.bootstrap-osd.keyring

However, adding the OSD device again reports the following error:

create OSD on /dev/sdi (bluestore)
wiping block device /dev/sdi
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 0.47554 s, 441 MB/s
Running command: /bin/ceph-authtool --gen-print-key
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new a2f8be83-dfc5-4843-a984-a511f815f96b
 stderr: Error EEXIST: entity osd.0 exists but key does not match
-->  RuntimeError: Unable to create a new OSD id
TASK ERROR: command 'ceph-volume lvm create --cluster-fsid fc27b60a-41ba-4540-987e-0cd0f7d1b1f4 --data /dev/sdi' failed: exit code 1

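Following the hint in entity osd.5 exists but key does not match | Proxmox Support Forum, the cause turned out to be an improper procedure: the old OSD had not been removed correctly, so its auth entry was still registered.

Solution: ceph auth del osd.0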
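
To spot such leftovers before creating an OSD, the auth entries can be compared with the OSD ids that still exist. A minimal sketch, assuming `ceph osd ls` prints one id per line and `ceph auth ls` prints each entity name (such as osd.0) on its own unindented line; the `stale_osd_auth` name and the file arguments are hypothetical:

```shell
#!/bin/sh
# Print every osd.N auth entity whose id is not in the list of existing OSDs.
#   $1: file holding the output of `ceph osd ls`
#   $2: file holding the output of `ceph auth ls`
stale_osd_auth() {
    osd_file="$1"
    auth_file="$2"
    awk '
        # First file: remember the ids of OSDs that still exist.
        FILENAME == ARGV[1] {
            if ($0 ~ /^[0-9]+$/) live[$0] = 1
            next
        }
        # Second file: report osd.N entities with no matching OSD id.
        /^osd\.[0-9]+$/ {
            id = substr($0, 5)
            if (!(id in live)) print $0
        }
    ' "$osd_file" "$auth_file"
}
```

Usage on a cluster node would look like `ceph osd ls > osds.txt; ceph auth ls > auth.txt; stale_osd_auth osds.txt auth.txt`. Each name it prints is a candidate for `ceph auth del`, after checking that the OSD is really gone.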

Service Upgrade

To upgrade from Pacific to Quincy, follow the Proxmox VE / Ceph Pacific to Quincy document.