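cAdvisor collects CPU, memory, filesystem, and network usage statistics for every container running on a given node. The kubelet embeds cAdvisor's functionality to monitor resource usage and analyze container performance (cAdvisor works at the container level, not at the Pod level).
Metrics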
Reference: cadvisor/prometheus.md (Prometheus container metrics)
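Disk and filesystem metrics (for each metric: type, description, unit where applicable, and the cAdvisor metric group it belongs to, disk or diskIO)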
Bandwidth (bytes/s) = irate(container_fs_<reads|writes>_bytes_total[5m]), where <reads|writes> selects the read or the write counter
1) container_fs_reads_bytes_total (Counter, bytes, diskIO): Cumulative count of bytes read
2) container_fs_writes_bytes_total (Counter, bytes, diskIO): Cumulative count of bytes written
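To make the formula concrete: PromQL irate() looks only at the last two samples inside the range window and divides their increase by the time between them, treating a decrease as a counter reset. The Python sketch below reimplements that rule for a plain list of (timestamp, value) samples. The function and the sample values are hypothetical illustrations, not part of cAdvisor or Prometheus.

```python
from typing import Optional, Sequence, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, counter value)


def irate(samples: Sequence[Sample]) -> Optional[float]:
    """Per-second rate from the last two samples in the window, as PromQL irate() does."""
    if len(samples) < 2:
        return None  # irate() needs at least two samples
    (t_prev, v_prev), (t_last, v_last) = samples[-2], samples[-1]
    if t_last <= t_prev:
        return None
    # A drop in a counter means it was reset (e.g. the container restarted),
    # so the increase is taken as the new value counted from zero.
    delta = v_last if v_last < v_prev else v_last - v_prev
    return delta / (t_last - t_prev)


# Hypothetical container_fs_reads_bytes_total samples, scraped every 15 s.
reads_bytes = [(0, 1_000_000), (15, 1_300_000), (30, 1_600_000), (45, 2_200_000)]
print(irate(reads_bytes))  # 40000.0 (bytes per second)
```

Because only the last two samples matter, irate() reacts quickly to short spikes; rate() averages over the whole window and usually gives smoother bandwidth graphs.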
IOPS (operations/s) = irate(container_fs_<reads|writes>_total[5m])
1) container_fs_reads_total (Counter, diskIO): Cumulative count of reads completed
2) container_fs_writes_total (Counter, diskIO): Cumulative count of writes completed
Average latency (seconds per operation) = irate(container_fs_<read|write>_seconds_total[5m]) / irate(container_fs_<reads|writes>_total[5m])
1) container_fs_read_seconds_total (Counter, seconds, diskIO): Cumulative count of seconds spent reading
2) container_fs_write_seconds_total (Counter, seconds, diskIO): Cumulative count of seconds spent writing
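The latency formula divides two per-second rates: seconds of I/O time per second, over operations completed per second, which leaves the average time per operation. A minimal Python sketch of that division, assuming hypothetical sample values and hypothetical function names, and reusing the last-two-samples rule of irate(). Where PromQL would return NaN or +Inf when no operations completed in the window, this sketch returns None instead.

```python
from typing import Optional, Sequence, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def irate(samples: Sequence[Sample]) -> Optional[float]:
    """Per-second increase between the last two samples (the irate() rule)."""
    if len(samples) < 2:
        return None
    (t_prev, v_prev), (t_last, v_last) = samples[-2], samples[-1]
    if t_last <= t_prev:
        return None
    # A decrease means the counter was reset, so count from zero.
    delta = v_last if v_last < v_prev else v_last - v_prev
    return delta / (t_last - t_prev)


def avg_latency(seconds_samples: Sequence[Sample], ops_samples: Sequence[Sample]) -> Optional[float]:
    """irate(..._seconds_total) / irate(..._total): average seconds per operation."""
    busy = irate(seconds_samples)  # seconds of I/O time per second
    ops = irate(ops_samples)       # operations completed per second
    if busy is None or not ops:
        return None  # no data, or no completed operations in the window
    return busy / ops


# Hypothetical container_fs_read_seconds_total and container_fs_reads_total samples.
read_seconds = [(0, 10.0), (15, 10.75)]
reads = [(0, 1000), (15, 1375)]
print(avg_latency(read_seconds, reads))  # about 0.002 s, i.e. 2 ms per read
```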
Merged (operations/s) = irate(container_fs_<reads|writes>_merged_total[5m])
1) container_fs_reads_merged_total (Counter, diskIO): Cumulative count of reads merged
2) container_fs_writes_merged_total (Counter, diskIO): Cumulative count of writes merged
Other disk and filesystem metrics:
1) container_blkio_device_usage_total (Counter, bytes, diskIO): Blkio device bytes usage
2) container_fs_inodes_free (Gauge, disk): Number of available Inodes
3) container_fs_inodes_total (Gauge, disk): Total number of Inodes
4) container_fs_io_current (Gauge, diskIO): Number of I/Os currently in progress
5) container_fs_io_time_seconds_total (Counter, seconds, diskIO): Cumulative count of seconds spent doing I/Os
6) container_fs_io_time_weighted_seconds_total (Counter, seconds, diskIO): Cumulative weighted I/O time
7) container_fs_limit_bytes (Gauge, bytes, disk): Number of bytes that can be consumed by the container on this filesystem
8) container_fs_sector_reads_total (Counter, diskIO): Cumulative count of sector reads completed
9) container_fs_sector_writes_total (Counter, diskIO): Cumulative count of sector writes completed
10) container_fs_usage_bytes (Gauge, bytes, disk): Number of bytes that are consumed by the container on this filesystem
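The gauges above are usually combined into ratios: filesystem usage is container_fs_usage_bytes / container_fs_limit_bytes, and inode usage is 1 - container_fs_inodes_free / container_fs_inodes_total. A small Python sketch of the same arithmetic, with hypothetical function names and hypothetical input values:

```python
from typing import Optional


def fs_usage_ratio(usage_bytes: float, limit_bytes: float) -> Optional[float]:
    """container_fs_usage_bytes / container_fs_limit_bytes; None if the limit is unknown."""
    if limit_bytes <= 0:
        return None
    return usage_bytes / limit_bytes


def inode_usage_ratio(inodes_free: float, inodes_total: float) -> Optional[float]:
    """1 - container_fs_inodes_free / container_fs_inodes_total; None if the total is 0."""
    if inodes_total <= 0:
        return None
    return 1 - inodes_free / inodes_total


# Hypothetical gauge readings for one container filesystem.
print(fs_usage_ratio(30 * 2**30, 100 * 2**30))  # 0.3
print(inode_usage_ratio(250_000, 1_000_000))    # 0.75
```

In PromQL the division matches series with identical label sets, which holds here because each pair of gauges is exported per container and filesystem.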
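Limitations:
1) Not every block device can be monitored. For example, according to the source code (cadvisor/prometheus.go), container_fs_read_seconds_total and container_fs_write_seconds_total are collected per filesystem, so a block device that is not mounted has no metrics at all;
2) Network filesystems cannot be monitored.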
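Deployment
Configure the Exporter instance
The kubelet already exposes these metrics, reachable through the API server at /api/v1/nodes/<node-name>/proxy/metrics/cadvisor, so no separate Exporter instance is needed.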
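To check the endpoint by hand from inside a Pod, a sketch like the following could request it through the API server with the Pod's service account credentials (the same token and CA files the scrape job uses). It assumes the service account is allowed to `get` the `nodes/proxy` resource via RBAC; the function names and the node name are hypothetical, and unlike the scrape job below it verifies the API server certificate against the mounted CA.

```python
import ssl
import urllib.request

# Standard in-cluster service account mount (same files as in the scrape job).
SA_DIR = "/var/run/secrets/kubernetes.io/serviceaccount"
API_SERVER = "https://kubernetes.default.svc:443"


def cadvisor_proxy_url(node_name: str) -> str:
    """API server proxy path for one node's kubelet cAdvisor metrics."""
    return f"{API_SERVER}/api/v1/nodes/{node_name}/proxy/metrics/cadvisor"


def fetch_cadvisor_metrics(node_name: str, timeout: float = 10.0) -> str:
    """Return the Prometheus text-format metrics for node_name (needs RBAC on nodes/proxy)."""
    with open(f"{SA_DIR}/token") as f:
        token = f.read().strip()
    # Verify the API server certificate with the cluster CA.
    ctx = ssl.create_default_context(cafile=f"{SA_DIR}/ca.crt")
    req = urllib.request.Request(
        cadvisor_proxy_url(node_name),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req, context=ctx, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Called as fetch_cadvisor_metrics("node-1") inside a Pod, it returns the raw exposition text; lines starting with container_fs_ are the disk metrics listed above.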
Configure the Prometheus scrape job
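```yaml
- job_name: 'kubelet-cadvisor'
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    # Disables server certificate verification (the ca_file above then has no effect).
    insecure_skip_verify: true
  # Service account token used to authenticate to the API server.
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
    - role: node  # one target per cluster node
  relabel_configs:
    # Copy node labels (__meta_kubernetes_node_label_<name>) onto the target as <name>.
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)
    # Send every scrape to the API server instead of the kubelet directly.
    - target_label: __address__
      replacement: kubernetes.default.svc:443
    # Scrape the node's cAdvisor endpoint through the API server proxy.
    - source_labels: [__meta_kubernetes_node_name]
      regex: (.+)
      target_label: __metrics_path__
      replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor
```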
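The relabeling steps are easier to follow with a concrete node. The Python sketch below imitates the three relabel_configs rules of this job for one discovered node and shows the resulting scrape URL and target labels. It is a simplification covering only these rules, and the function name and input labels are hypothetical. One consequence worth noticing: since __address__ is rewritten to the API server, every node target ends up with the same instance label, so the node labels copied by labelmap are what tell the nodes apart.

```python
import re
from typing import Dict, Tuple


def relabel_node_target(meta: Dict[str, str], scheme: str = "https") -> Tuple[str, Dict[str, str]]:
    """Mimic the kubelet-cadvisor relabel_configs for one node target."""
    labels = dict(meta)

    # Rule 1 (labelmap): copy __meta_kubernetes_node_label_<name> to <name>.
    for name, value in meta.items():
        m = re.fullmatch(r"__meta_kubernetes_node_label_(.+)", name)
        if m:
            labels[m.group(1)] = value

    # Rule 2: no source_labels, so the default regex (.*) matches the empty
    # string and the replacement is written as-is.
    labels["__address__"] = "kubernetes.default.svc:443"

    # Rule 3: a non-empty node name becomes part of the API server proxy path.
    m = re.fullmatch(r"(.+)", meta.get("__meta_kubernetes_node_name", ""))
    if m:
        labels["__metrics_path__"] = f"/api/v1/nodes/{m.group(1)}/proxy/metrics/cadvisor"

    url = f"{scheme}://{labels['__address__']}{labels.get('__metrics_path__', '/metrics')}"

    # After relabeling, instance defaults to __address__ and "__"-prefixed labels are dropped.
    labels.setdefault("instance", labels["__address__"])
    target_labels = {k: v for k, v in labels.items() if not k.startswith("__")}
    return url, target_labels


# Hypothetical labels as discovered by kubernetes_sd_configs (role: node).
discovered = {
    "__address__": "10.0.0.5:10250",
    "__meta_kubernetes_node_name": "node-1",
    "__meta_kubernetes_node_label_kubernetes_io_hostname": "node-1",
}
url, target_labels = relabel_node_target(discovered)
print(url)            # https://kubernetes.default.svc:443/api/v1/nodes/node-1/proxy/metrics/cadvisor
print(target_labels)  # {'kubernetes_io_hostname': 'node-1', 'instance': 'kubernetes.default.svc:443'}
```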
Configure Grafana dashboards
Grafana Labs/Docker-cAdvisor
Grafana Labs/Docker and OS metrics ( cadvisor, node_exporter )
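FAQ
container="POD"
What is the container="POD" label in Prometheus, and why do most examples exclude it?
It marks the pause containers: the infrastructure container each Pod runs to hold the namespaces (such as the network namespace) shared by the Pod's application containers. Because it is not an application container, examples that report application resource usage usually filter it out with container!="POD".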
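As an illustration, the filter in a query such as `sum by (pod) (container_memory_working_set_bytes{container!="POD"})` drops the pause container's series before aggregating. The Python sketch below does the same over a hypothetical list of (labels, value) series; the function name and values are made up. Real cAdvisor output may also contain Pod-level series with an empty container label, which this sketch (like the query) does not filter.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Series = Tuple[Dict[str, str], float]  # (label set, sample value)


def sum_by_pod_without_pause(series: Iterable[Series]) -> Dict[str, float]:
    """Roughly sum by (pod) (metric{container!="POD"})."""
    totals: Dict[str, float] = defaultdict(float)
    for labels, value in series:
        if labels.get("container") == "POD":
            continue  # skip the pause container
        totals[labels.get("pod", "")] += value
    return dict(totals)


# Hypothetical container_memory_working_set_bytes samples for one Pod.
samples: List[Series] = [
    ({"pod": "web-0", "container": "nginx"}, 50_000_000.0),
    ({"pod": "web-0", "container": "sidecar"}, 10_000_000.0),
    ({"pod": "web-0", "container": "POD"}, 500_000.0),
]
print(sum_by_pod_without_pause(samples))  # {'web-0': 60000000.0}
```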