「Kubernetes」- Cluster Certificate Renewal (Re-issuing Certificates: Part of the existing bootstrap client certificate is expired)

Problem Description

The cluster cannot be accessed. Inspecting the logs with journalctl -f -u kubelet.service shows the following error:

11月 24 14:26:59 k8s-master1 kubelet[1768]: I1124 14:26:59.608288    1768 server.go:408] Version: v1.12.1
11月 24 14:26:59 k8s-master1 kubelet[1768]: I1124 14:26:59.608812    1768 plugins.go:99] No cloud provider specified.
11月 24 14:26:59 k8s-master1 kubelet[1768]: E1124 14:26:59.616261    1768 bootstrap.go:205] Part of the existing bootstrap client certificate is expired: 2019-11-23 12:18:53 +0000 UTC

Cause Analysis

In a Kubernetes cluster, the certificates created during cluster initialization expire after one year. Once they expire, the cluster components can no longer communicate with each other, and the only fix is to renew the certificates.

# kubeadm  alpha certs check-expiration
[check-expiration] Reading configuration from the cluster...
[check-expiration] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'

CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 Feb 21, 2023 03:37 UTC   335d                                    no      
apiserver                  Feb 21, 2023 03:37 UTC   335d            ca                      no      
apiserver-etcd-client      Feb 21, 2023 03:37 UTC   335d            etcd-ca                 no      
apiserver-kubelet-client   Feb 21, 2023 03:37 UTC   335d            ca                      no      
controller-manager.conf    Feb 21, 2023 03:37 UTC   335d                                    no      
etcd-healthcheck-client    Feb 21, 2023 03:37 UTC   335d            etcd-ca                 no      
etcd-peer                  Feb 21, 2023 03:37 UTC   335d            etcd-ca                 no      
etcd-server                Feb 21, 2023 03:37 UTC   335d            etcd-ca                 no      
front-proxy-client         Feb 21, 2023 03:37 UTC   335d            front-proxy-ca          no      
scheduler.conf             Feb 21, 2023 03:37 UTC   335d                                    no      

CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      Feb 19, 2032 03:25 UTC   9y              no      
etcd-ca                 Feb 19, 2032 03:25 UTC   9y              no      
front-proxy-ca          Feb 19, 2032 03:25 UTC   9y              no

Check the certificate expiration with OpenSSL:

openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text |grep ' Not '
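The relevant lines in the output look roughly like the following (the dates shown here are only illustrative):

            Not Before: Feb 22 03:37:00 2022 GMT
            Not After : Feb 21 03:37:00 2023 GMT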

Solution

Regenerate the certificates. The certificates involved are as follows. The cluster root CA certificate and key:

/etc/kubernetes/pki/ca.crt

/etc/kubernetes/pki/ca.key

Certificates issued by the cluster root CA:

The server certificate held by the kube-apiserver component:
/etc/kubernetes/pki/apiserver.crt

/etc/kubernetes/pki/apiserver.key

The client certificate held by the kubelet component:
/etc/kubernetes/pki/apiserver-kubelet-client.crt

/etc/kubernetes/pki/apiserver-kubelet-client.key

All of the certificates issued by the cluster root CA need to be renewed (the root CAs themselves are valid for ten years by default, as the check-expiration output above shows).
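For reference, the expiry date of every certificate under /etc/kubernetes/pki can be listed with a small shell loop (a sketch assuming the default kubeadm layout, including the etcd subdirectory):

# List the expiry date of every certificate under /etc/kubernetes/pki:
for crt in /etc/kubernetes/pki/*.crt /etc/kubernetes/pki/etcd/*.crt; do
    echo "${crt}: $(openssl x509 -in "${crt}" -noout -enddate)"
done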

Environment Information

Operating system: CentOS Linux release 7.4.1708 (Core)
Cluster: Kubernetes Cluster v1.18.20, Master * 3, Worker * n

Back Up the Configuration Files (Master)

Back up the original configuration files:

rsync -avz --delete /etc/kubernetes/ /etc/kubernetes.backup/
rsync -avz --delete ~/.kube/ ~/.kube.backup/ 
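Optionally, verify that the backup is complete before continuing; if the copies match, diff produces no output:

diff -r /etc/kubernetes/ /etc/kubernetes.backup/
diff -r ~/.kube/ ~/.kube.backup/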

Extend the Certificate Validity (Master)

# Run the following command to renew all certificates:
kubeadm -v 10 alpha certs renew all

# On CentOS 7.4 with kubeadm 1.12.1, the command above triggers a stack overflow error,
# so the certificate of each component has to be renewed manually instead:
# kubeadm -v 10 alpha phase certs renew apiserver --config /etc/kubernetes/kubeadm-config.yaml
# kubeadm -v 10 alpha phase certs renew apiserver-etcd-client --config /etc/kubernetes/kubeadm-config.yaml
# kubeadm -v 10 alpha phase certs renew apiserver-kubelet-client --config /etc/kubernetes/kubeadm-config.yaml
# kubeadm -v 10 alpha phase certs renew etcd-healthcheck-client --config /etc/kubernetes/kubeadm-config.yaml
# kubeadm -v 10 alpha phase certs renew etcd-peer --config /etc/kubernetes/kubeadm-config.yaml
# kubeadm -v 10 alpha phase certs renew etcd-server  --config /etc/kubernetes/kubeadm-config.yaml
# kubeadm -v 10 alpha phase certs renew front-proxy-client --config /etc/kubernetes/kubeadm-config.yaml
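After the renewal completes, re-check the expiry dates to confirm that the new certificates are in place (on kubeadm v1.20 and later the subcommand is kubeadm certs check-expiration instead of kubeadm alpha certs check-expiration):

# Verify that the expiry dates have been extended:
kubeadm alpha certs check-expiration
openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -enddate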

Update the Configuration Files (Master)

Restart the static Pods (per the official documentation; example commands follow this list):
1) Move the manifests in /etc/kubernetes/manifests/ to another directory (do not delete them);
2) Wait 20 seconds (the kubelet file-scan interval configured by fileCheckFrequency);
3) Move the static Pod manifests back into /etc/kubernetes/manifests/.
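A minimal sketch of the manifest-move approach (the temporary directory /tmp/manifests-backup is an arbitrary choice):

# Temporarily move the static Pod manifests out of the watched directory,
# wait for at least one fileCheckFrequency interval, then move them back:
mkdir -p /tmp/manifests-backup
mv /etc/kubernetes/manifests/*.yaml /tmp/manifests-backup/
sleep 20
mv /tmp/manifests-backup/*.yaml /etc/kubernetes/manifests/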

Alternatively, delete the static Pod instances with the following commands:

# 1) Delete the static Pods on the current node
kubectl delete pods -n kube-system \
    $(kubectl get pod --all-namespaces -o=jsonpath='{.items[?(@.metadata.ownerReferences[].name=="'${HOSTNAME}'")].metadata.name}')

# 2) Delete the static Pods on all nodes
kubectl delete pods -n kube-system \
    $(kubectl get pod --all-namespaces -o=jsonpath='{.items[?(@.metadata.ownerReferences[].kind=="Node")].metadata.name}')
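Because admin.conf is also renewed, the local kubeconfig may need to be refreshed as well before kubectl can talk to the API server again (a sketch assuming the default kubeadm paths, mirroring the backup step above):

# Refresh the admin kubeconfig with the renewed client certificate,
# then confirm that the control-plane static Pods are running again:
cp /etc/kubernetes/admin.conf ~/.kube/config
kubectl get pods -n kube-system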

Restart the Service (All Nodes)

systemctl restart kubelet.service

This step is not strictly necessary:
1) Although kubelet.conf contains certificate information, kubeadm already configures rolling certificate rotation for the kubelet when deploying it, and the rotated certificates are kept under /var/lib/kubelet/pki;
2) In addition, the official documentation on manual certificate renewal does not mention any handling of the kubelet service.
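Either way, it can be worth confirming that the bootstrap certificate error from the problem description no longer appears in the kubelet logs (a quick check; the grep pattern is only an example):

systemctl status kubelet.service
journalctl -u kubelet.service --since "10 minutes ago" | grep -iE 'certificate|bootstrap'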

References

Certificate Management with kubeadm | Kubernetes
Does JsonPath support the AND (&&) operator? – Stack Overflow
how to renew the certificate when apiserver cert expired? #581
JsonPath nested condition and multiple conditions are not working · Issue #20352 · kubernetes/kubernetes
JSONPath Support | Kubernetes
k8s踩坑(三)、kubeadm证书/etcd证书过期处理
k8s采坑记 – 证书过期之kubeadm重新生成证书
Kubeadm fails – kubelet fails to find /etc/kubernetes/bootstrap-kubelet.conf #3769
Kubelet fails to authenticate to apiserver due to expired certificate #65991
kubernetes – Filter kubectl get based on anotation – Stack Overflow
kubernetes – How to identify static pods via kubectl command? – Stack Overflow
Renew kubernetes pki after expired