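Problem Description
Metrics Server is the source of container resource metrics in a Kubernetes cluster: it collects resource usage data (CPU and memory) from the nodes. When the cluster scales workloads automatically through HPA or VPA, the autoscaler calls the Metrics API to obtain the relevant resource metrics, and Metrics Server is the component that serves that API. Besides Metrics Server there is also the Heapster implementation, which has been deprecated.
Once Metrics Server is deployed, we can use kubectl top to view the resource usage of containers: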
# kubectl top pod --all-namespaces --containers
NAMESPACE     POD                                        NAME                      CPU(cores)   MEMORY(bytes)
kube-system   calico-kube-controllers-65d7476764-w88x6   calico-kube-controllers   1m           10Mi
kube-system   calico-node-jtfxz                          calico-node               19m          50Mi
kube-system   calico-node-m9j8k                          calico-node               21m          48Mi
kube-system   coredns-7ff77c879f-nbjs5                   coredns                   2m           6Mi
kube-system   coredns-7ff77c879f-v626m                   coredns                   2m           6Mi
kube-system   etcd-k8scp-01                              etcd                      13m          40Mi
kube-system   kube-apiserver-k8scp-01                    kube-apiserver            31m          317Mi
kube-system   kube-controller-manager-k8scp-01           kube-controller-manager   10m          42Mi
kube-system   kube-proxy-bj6w8                           kube-proxy                1m           10Mi
kube-system   kube-proxy-zj8rv                           kube-proxy                1m           9Mi
kube-system   kube-scheduler-k8scp-01                    kube-scheduler            3m           11Mi
kube-system   kube-vip-k8scp-01                          kube-vip                  4m           34Mi
kube-system   metrics-server-66d4d747c4-2267n            metrics-server            3m           12Mi
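This note records how to deploy Metrics Server in a Kubernetes cluster and how to handle common problems.
Solution
Our test environment is a Kubernetes cluster at version v1.16.
Step 1: Environment check
Metrics Server has specific requirements for the network and the cluster. These are not the default configuration in every cluster, so first confirm that they are met: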
1) Metrics Server must be reachable from kube-apiserver
2) The kube-apiserver must be correctly configured to enable an aggregation layer
3) Nodes must have kubelet authorization configured to match the Metrics Server configuration
4) The container runtime must implement the container metrics RPCs
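Requirement 2 can be checked before deploying. The following is only a sketch under stated assumptions, not part of Metrics Server: `missing_aggregation_flags` is a hypothetical helper that reads the kube-apiserver arguments on standard input and prints each flag from the Kubernetes "Configure the Aggregation Layer" documentation that it cannot find. It does a plain text match, so a commented-out flag would still count as present.

```shell
#!/bin/sh
# Flags the Kubernetes documentation lists for enabling the aggregation layer.
REQUIRED_AGGREGATION_FLAGS="
--requestheader-client-ca-file
--requestheader-allowed-names
--requestheader-extra-headers-prefix
--requestheader-group-headers
--requestheader-username-headers
--proxy-client-cert-file
--proxy-client-key-file
"

# Hypothetical helper: reads kube-apiserver arguments (for example a static
# Pod manifest) on stdin and prints each required flag that is absent.
missing_aggregation_flags() {
    manifest_text="$(cat)"
    for flag in $REQUIRED_AGGREGATION_FLAGS; do
        case "$manifest_text" in
            *"$flag="*) ;;                 # flag found with a value
            *) printf '%s\n' "$flag" ;;    # flag not found
        esac
    done
}
```

On a control-plane node built with kubeadm the manifest is normally /etc/kubernetes/manifests/kube-apiserver.yaml, so one way to use the helper is `missing_aggregation_flags < /etc/kubernetes/manifests/kube-apiserver.yaml`; empty output suggests that all of the listed flags are set.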
Step 2: Get the deployment file
1) Download the components.yaml file (if it cannot be reached, use the ./components.yaml file)
2) Modify it as needed.
Step 3: Apply the configuration file
# kubectl apply -f "/path/to/components.yaml"
serviceaccount/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
service/metrics-server created
deployment.apps/metrics-server created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created

# kubectl get -n kube-system pods -l k8s-app=metrics-server
NAME                              READY   STATUS    RESTARTS   AGE
metrics-server-66d4d747c4-nmzs6   1/1     Running   0          8m19s
Step 4: View Pod resource usage
# kubectl top nodes
NAME         CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
ci-testing   87m          2%     1989Mi          25%
k8scp-01     214m         10%    1398Mi          38%

# kubectl top pod --all-namespaces --containers
NAMESPACE     POD                                        NAME                      CPU(cores)   MEMORY(bytes)
kube-system   calico-kube-controllers-65d7476764-w88x6   calico-kube-controllers   1m           10Mi
kube-system   calico-node-jtfxz                          calico-node               19m          50Mi
kube-system   calico-node-m9j8k                          calico-node               21m          48Mi
kube-system   coredns-7ff77c879f-nbjs5                   coredns                   2m           6Mi
kube-system   coredns-7ff77c879f-v626m                   coredns                   2m           6Mi
kube-system   etcd-k8scp-01                              etcd                      13m          40Mi
kube-system   kube-apiserver-k8scp-01                    kube-apiserver            31m          317Mi
kube-system   kube-controller-manager-k8scp-01           kube-controller-manager   10m          42Mi
kube-system   kube-proxy-bj6w8                           kube-proxy                1m           10Mi
kube-system   kube-proxy-zj8rv                           kube-proxy                1m           9Mi
kube-system   kube-scheduler-k8scp-01                    kube-scheduler            3m           11Mi
kube-system   kube-vip-k8scp-01                          kube-vip                  4m           34Mi
kube-system   metrics-server-66d4d747c4-2267n            metrics-server            3m           12Mi
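kubectl top reads its numbers from the Metrics API, which the deployment registers as the v1beta1.metrics.k8s.io APIService. When kubectl top misbehaves, it can help to query that API directly. The sketch below assumes a working kubeconfig; the helper names `metrics_api_path` and `query_metrics` are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: builds the Metrics API path for nodes, or for the Pods
# of one namespace.
metrics_api_path() {
    kind="$1"        # "nodes" or "pods"
    namespace="$2"   # only used when kind is "pods"
    base="/apis/metrics.k8s.io/v1beta1"
    case "$kind" in
        nodes) printf '%s/nodes\n' "$base" ;;
        pods)  printf '%s/namespaces/%s/pods\n' "$base" "$namespace" ;;
        *)     echo "unknown kind: $kind" >&2; return 1 ;;
    esac
}

# Sends the request through kube-apiserver and prints the raw JSON response.
query_metrics() {
    path="$(metrics_api_path "$@")" || return 1
    kubectl get --raw "$path"
}
```

For example, `query_metrics nodes` or `query_metrics pods kube-system` should return JSON when Metrics Server is healthy; an error here points at the aggregation layer or Metrics Server itself rather than at kubectl top.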
Deploying with Helm
GitHub – kubernetes-sigs/metrics-server
metrics-server 3.8.2 · kubernetes-sigs/metrics-server
# helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server/
# helm pull metrics-server/metrics-server          # Chart v3.8.2, App v0.6.1
# helm show values ./metrics-server-3.8.2.tgz > metrics-server-3.8.2.helm-values.yaml
# helm --namespace metrics-server \
    install metrics-server ./metrics-server-3.8.2.tgz \
    -f metrics-server-3.8.2.helm-values.yaml \
    --create-namespace
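If the kubelet certificate error described under the common errors applies to a Helm-based install, the extra flag has to go through the chart values instead of components.yaml. The sketch below assumes the chart exposes an `args` list for additional command-line arguments; check the values file dumped by helm show values before relying on it. `extra_args_values` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: prints a values override that passes each given
# argument through the chart's `args` list (assumed to exist in the chart).
extra_args_values() {
    printf 'args:\n'
    for arg in "$@"; do
        printf '  - %s\n' "$arg"
    done
}
```

One possible use is `extra_args_values --kubelet-insecure-tls > extra-args.yaml`, followed by an additional `-f extra-args.yaml` on the helm install command line, placed after the main values file so that it takes precedence.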
Common Errors
x509: cannot validate certificate for 172.16.187.21 because it doesn't contain any IP SANs
kubeadm config file support --apiserver-cert-extra-sans argument? · Issue #55566 · kubernetes/kubernetes
metrics-server error because it doesn't contain any IP SANs · Issue #196 · kubernetes-sigs/metrics-server
Symptom: after Metrics Server is deployed, the Pod stays at READY 0/1, and the container log shows the following messages:
I0421 08:41:56.319259 1 serving.go:325] Generated self-signed cert (/tmp/apiserver.crt, /tmp/apiserver.key)
E0421 08:41:56.887402 1 server.go:132] unable to fully scrape metrics: [unable to fully scrape metrics from node cita-cloud-staging: unable to fetch metrics from node cita-cloud-staging: Get "https://172.16.159.15:10250/stats/summary?only_cpu_and_memory=true": x509: cannot validate certificate for 172.16.159.15 because it doesn't contain any IP SANs, unable to fully scrape metrics from node k8scp-01: unable to fetch metrics from node k8scp-01: Get "https://172.16.187.21:10250/stats/summary?only_cpu_and_memory=true": x509: cannot validate certificate for 172.16.187.21 because it doesn't contain any IP SANs]
I0421 08:41:56.890071 1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0421 08:41:56.890071 1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0421 08:41:56.890071 1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0421 08:41:56.890094 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0421 08:41:56.890101 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0421 08:41:56.890096 1 shared_informer.go:240] Waiting for caches to sync for RequestHeaderAuthRequestController
I0421 08:41:56.890506 1 dynamic_serving_content.go:130] Starting serving-cert::/tmp/apiserver.crt::/tmp/apiserver.key
I0421 08:41:56.890751 1 secure_serving.go:197] Serving securely on [::]:4443
I0421 08:41:56.890811 1 tlsconfig.go:240] Starting DynamicServingCertificateController
I0421 08:41:56.990229 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0421 08:41:56.990252 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0421 08:41:56.990229 1 shared_informer.go:247] Caches are synced for RequestHeaderAuthRequestController
I0421 08:42:25.578771 1 requestheader_controller.go:183] Shutting down RequestHeaderAuthRequestController
I0421 08:42:25.578798 1 configmap_cafile_content.go:223] Shutting down client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0421 08:42:25.578811 1 configmap_cafile_content.go:223] Shutting down client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0421 08:42:25.578914 1 tlsconfig.go:255] Shutting down DynamicServingCertificateController
I0421 08:42:25.578971 1 dynamic_serving_content.go:145] Shutting down serving-cert::/tmp/apiserver.crt::/tmp/apiserver.key
I0421 08:42:25.579049 1 secure_serving.go:241] Stopped listening on [::]:4443
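Cause: the cluster certificates were generated by kubeadm at initialization, and the SAN (Subject Alternative Name) of the certificate each kubelet serves does not include the node's IP address, so HTTPS requests addressed by IP fail with this error. It also suggests that the cluster was not initialized in the most complete way. The proper fix is to regenerate the cluster certificates and specify the SAN entries when doing so (see "Update apiserver certificates for HA k8s cluster"). To resolve the problem quickly and simply, however, we take the insecure approach here; weigh this against your own requirements.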
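The cause can be confirmed by looking at the certificate a kubelet actually serves. This is a minimal sketch that assumes openssl is installed and that port 10250 is reachable from where it runs; `cert_text_has_ip_san` and `kubelet_cert_text` are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helper: reads `openssl x509 -noout -text` output on stdin and
# succeeds only if the SAN list contains exactly the given IP address.
cert_text_has_ip_san() {
    tr ',' '\n' \
        | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
        | grep -Fxq "IP Address:$1"
}

# Fetches the certificate served on the kubelet port and prints it as text.
kubelet_cert_text() {
    openssl s_client -connect "$1:10250" </dev/null 2>/dev/null \
        | openssl x509 -noout -text
}
```

For one of the nodes in the log above, `kubelet_cert_text 172.16.187.21 | cert_text_has_ip_san 172.16.187.21 || echo "no IP SAN for this address"` reports whether the address is missing from the certificate.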
Solution: edit the components.yaml deployment file and add the --kubelet-insecure-tls option:
...
      - args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        # This is the option we added
        - --kubelet-insecure-tls
...
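When Metrics Server is already running, the same flag can be added without editing the file by patching the Deployment. This is a sketch, assuming the Deployment is named metrics-server, lives in kube-system, and has the metrics-server container first in its Pod template, as in the output shown earlier; the helper names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prints a JSON Patch that appends one argument to the
# first container of a Deployment's Pod template.
append_arg_patch() {
    printf '[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"%s"}]\n' "$1"
}

# Applies the patch to the metrics-server Deployment in kube-system.
patch_metrics_server() {
    kubectl -n kube-system patch deployment metrics-server \
        --type=json -p "$(append_arg_patch "$1")"
}
```

Calling `patch_metrics_server --kubelet-insecure-tls` triggers a new rollout. The patch appends unconditionally, so running it twice leaves the flag in the list twice; re-applying a corrected components.yaml remains the cleaner long-term fix.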
Metrics not available for pod
# kubectl top -n kube-system pod
W0702 10:48:07.400630 6619 top_pod.go:266] Metrics not available for pod kube-system/coredns-58cc8c89f4-6czm4, age: 4267h8m49.400600321s
error: Metrics not available for pod kube-system/coredns-58cc8c89f4-6czm4, age: 4267h8m49.400600321s
Resolved by adding the --kubelet-insecure-tls option; see the configuration above.
unable to fetch pod metrics for pod
E0702 02:39:09.777832 1 reststorage.go:160] unable to fetch pod metrics for pod default/counter: no metrics known for pod
E0702 02:39:09.777843 1 reststorage.go:160] unable to fetch pod metrics for pod kube-system/etcd-k8s-master-02: no metrics known for pod
E0702 02:39:09.777863 1 reststorage.go:160] unable to fetch pod metrics for pod kube-system/fluentd-p6d29: no metrics known for pod
E0702 02:39:09.777874 1 reststorage.go:160] unable to fetch pod metrics for pod kube-system/kube-proxy-xdh2z: no metrics known for pod
E0702 02:39:09.777885 1 reststorage.go:160] unable to fetch pod metrics for pod kube-system/etcd-k8s-master-01: no metrics known for pod
E0702 02:39:09.777900 1 reststorage.go:160] unable to fetch pod metrics for pod kube-system/kube-proxy-jrpnd: no metrics known for pod
Resolved by adding the --kubelet-insecure-tls option; see the configuration above.
References
Kubernetes metrics-server Installation
Installing the Kubernetes Metrics Server
Kubernetes Metrics unable to fetch pod/node metrics – Stack Overflow
Installing the Kubernetes Metrics Server – Amazon EKS
Configure the Aggregation Layer – Kubernetes
What is a SAN Certificate? – SSL.com
metrics/IMPLEMENTATIONS.md at master · kubernetes/metrics