Problem Description
This note records issues related to Rook-Ceph and the solutions to common problems.
Solutions
For common issues, refer to the Rook Ceph Documentation / Troubleshooting docs.
[SOLVED] …/globalmount: permission denied
cephfs mount failure.permission denied · Issue #9782 · rook/rook
Problem:
# ls -l /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-9b3e9500-1903-41dd-8abc-4052b74d450b
ls: cannot access 'globalmount': Permission denied
total 4
d????????? ? ? ? ? ? globalmount
-rw-r--r-- 1 root root 138 Mar  1 09:44 vol_data.json
Solution:
# umount -lf /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-9b3e9500-1903-41dd-8abc-4052b74d450b/globalmount
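If more than one PV on the node is affected, the stale mounts can be cleaned up in one pass. A minimal sketch, assuming the default kubelet data path; it lazy-unmounts (same -lf as above) only the globalmount directories that are no longer accessible, and kubelet remounts the volume on the next attach:
for d in /var/lib/kubelet/plugins/kubernetes.io/csi/pv/*/globalmount; do
    ls "$d" >/dev/null 2>&1 || umount -lf "$d"   # only touch directories that can no longer be read
done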
[SOLVED] OSD Init Container fails to start
ceph: failed to initialize OSD · Issue #8023 · rook/rook · GitHub
Cluster unavailable after node reboot, symlink already exist · Issue #10860 · rook/rook · GitHub
Problem
In Rook Ceph, after a node reboot the Init Container of the OSD-<ID> Pod fails to start with the following error:
# kubectl logs rook-ceph-osd-5-7f759955bc-9bqt4 -c activate
...
Running command: /usr/bin/ceph-bluestore-tool prime-osd-dir --dev /dev/sdb --path /var/lib/ceph/osd/ceph-5 --no-mon-config
Running command: /usr/bin/chown -R ceph:ceph /dev/sdb
Running command: /usr/bin/ln -s /dev/sdb /var/lib/ceph/osd/ceph-5/block
 stderr: ln: failed to create symbolic link '/var/lib/ceph/osd/ceph-5/block': File exists
Traceback (most recent call last):
  File "/usr/sbin/ceph-volume", line 11, in <module>
    load_entry_point('ceph-volume==1.0.0', 'console_scripts', 'ceph-volume')()
...
Solution
Inspect the activate-osd storage directory mounted by the activate container and delete the block file in it (it is a symlink).
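A sketch of locating that directory and removing the stale symlink; the Pod name comes from the log above, and it assumes the activate-osd volume is a hostPath on the node (which is why the leftover symlink survives the reboot). Double-check the path before deleting anything:
# Find the hostPath behind the activate-osd volume from the Pod spec:
kubectl -n rook-ceph get pod rook-ceph-osd-5-7f759955bc-9bqt4 \
    -o jsonpath='{.spec.volumes[?(@.name=="activate-osd")].hostPath.path}'
# Then, on the node that runs this OSD:
#   ls -l <hostPath>/block    # confirm it is the leftover symlink
#   rm <hostPath>/block       # the activate container recreates it on the next start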
[SOLVED] cephosd: skipping device “xxx” because it contains a filesystem “ceph_bluestore”
OSD and MON memory consumption · Issue #5811 · rook/rook · GitHub
Ceph Common Issues – Rook Ceph Documentation
Problem
The disk cannot be added as an OSD, and the following error message is reported:
cephosd: skipping device "sdb" because it contains a filesystem "ceph_bluestore"
Root Cause
From the logs of the rook-ceph-osd-prepare-xxx Pod, we found that the sdb disk had already been formatted as ceph_bluestore, i.e. Ceph had already processed it.
On closer inspection, we also found that the rook-ceph-osd-prepare-xxx Pod was OOMKilled while it was running.
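A quick way to confirm the OOMKill (the Pod name below is a placeholder):
kubectl -n rook-ceph describe pod rook-ceph-osd-prepare-xxx | grep -i -B2 OOMKilled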
Solution
1) In the Helm chart, raise the resource limits (roughly 10x), and leave the requests unchanged.
2) Then, follow the Cleanup documentation and reset the disk (see the sketch after this list).
3) Finally, restart the Operator so that it re-detects the disk: kubectl -n rook-ceph delete pod -l app=rook-ceph-operator
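For step 2, the disk-zap procedure below is a sketch based on the Rook Cleanup documentation; /dev/sdb is the device from this case, and the steps should be checked against the docs for your Rook release before wiping anything:
DISK="/dev/sdb"
sgdisk --zap-all "$DISK"                                        # wipe the partition table
dd if=/dev/zero of="$DISK" bs=1M count=100 oflag=direct,dsync   # clear the bluestore signature
ls /dev/mapper/ceph-* | xargs -I% -- dmsetup remove %           # remove leftover ceph-volume LVM mappings, if any
rm -rf /dev/ceph-* /var/lib/rook                                # leftover LVM device nodes and local Rook state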
[SOLVED] mon q is low on available space
Rookio Ceph cluster : mon c is low on available space message
This alert is about the monitor's disk space, which is normally stored in /var/lib/ceph/mon. The warning is raised when this path has less than 30% available space (see mon_data_avail_warn, which defaults to 30).
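A sketch of checking the usage and, if acceptable, lowering the threshold; in Rook the mon data usually lives under dataDirHostPath (/var/lib/rook by default) on the node, and the ceph command assumes the rook-ceph-tools toolbox is deployed:
df -h /var/lib/rook                                      # on the node that hosts the mon
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- \
    ceph config set mon mon_data_avail_warn 20           # warn at 20% instead of the default 30%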
[SOLVED] MountVolume.MountDevice failed for volume … Volume ID … already exists
MountVolume.MountDevice failed for volume "pvc" … problem solved (CSDN blog, 小末)
MountDevice failed for volume pvc-f631… An operation with the given Volume ID already exists
Problem
# kubectl describe pods xxxx
...
MountVolume.MountDevice failed for volume "pvc-9aad698e-ef82-495b-a1c5-e09d07d0e072" : rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-89d24230-0571-11ea-a584-ce38896d0bb2 already exists
Root Cause
A bug in the storage plugin; the related components need to be restarted.
# 11/30/2022 In our case, one node had a problem, and Pods scheduled onto that node could not work properly.
Solution
kubectl delete -n rook-ceph pods -l app=csi-cephfsplugin-provisioner
kubectl delete -n rook-ceph pods -l app=csi-cephfsplugin
# kubectl delete pods -l app=csi-rbdplugin-provisioner
# kubectl delete pods -l app=csi-rbdplugin
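After the delete, it may be worth confirming that the CSI Pods have been recreated and are Running before retrying the workload:
kubectl -n rook-ceph get pods -l app=csi-cephfsplugin-provisioner
kubectl -n rook-ceph get pods -l app=csi-cephfsplugin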
[WIP] unable to list block devices from: /dev/mapper
[ceph_volume.util.disk][ERROR ] unable to list block devices from: /dev/mapper
[WIP] timeout expired waiting for volumes to attach or mount for pod “xxxxxxxxx”
Unable to mount volumes for pod "kube-registry-646bc578d9-vwdfd_rook-ceph(4877e2f4-ea8c-11e9-b6c3-005056814b85)": timeout expired waiting for volumes to attach or mount for pod "rook-ceph"/"kube-registry-646bc578d9-vwdfd". list of unmounted volumes=[image-store]. list of unattached volumes=[image-store default-token-dnwrv]
[WIP] [errno 110] error connecting to the cluster
In Rook-Ceph, when running the ceph status command, the command hangs and, after a while, fails with the following error:
[errno 110] error connecting to the cluster
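Notes so far: a first diagnostic pass is usually to check whether the mons are reachable at all. A sketch, assuming the rook-ceph-tools toolbox is deployed:
kubectl -n rook-ceph get pods -l app=rook-ceph-mon                                    # are the mons up?
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph status --connect-timeout 10  # fail fast instead of hanging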
[WIP] PVC is always Pending
Problem
A PVC is created via the ceph-filesystem StorageClass, but it stays in the Pending state; no PV is provisioned and bound automatically.
Root Cause
There are many possible causes; we have not pinned down the exact one.
TODO Rook Ceph is Pending
Solution
# 07/25/2022 According to feedback, clock skew made the cluster unhealthy, so it could not work properly.
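A sketch of the corresponding checks (assumes the rook-ceph-tools toolbox is deployed; the PVC name is a placeholder):
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph health detail   # look for MON_CLOCK_SKEW / HEALTH_ERR
kubectl describe pvc <pvc-name>                                          # provisioning events from the CSI provisioner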