「PROMETHEUS」- kubelet cAdvisor

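cAdvisor collects CPU, memory, filesystem, and network usage statistics for all containers running on a given node. The kubelet embeds cAdvisor and uses it to monitor resource usage and analyze container performance. cAdvisor works at the container level, not at the Pod level.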

Metrics

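Source: cadvisor/prometheus.md, "Prometheus container metrics". Each row below gives the metric name, type, description, unit (blank when unitless), and the metric group that cAdvisor's -disable_metrics flag refers to.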

Disk and filesystem

Bandwidth (bytes/s) = irate(container_fs_<reads|writes>_bytes_total{}[5m])
1) container_fs_reads_bytes_total	Counter	Cumulative count of bytes read	bytes	diskIO
2) container_fs_writes_bytes_total	Counter	Cumulative count of bytes written	bytes	diskIO
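The bandwidth formula above can be captured as Prometheus recording rules. This is a minimal sketch that assumes the series are scraped by the kubelet-cadvisor job described under deployment. The group name and recorded metric names are hypothetical.

```yaml
groups:
  - name: cadvisor-fs-bandwidth            # hypothetical group name
    rules:
      # Read bandwidth in bytes/s, one series per container and device
      - record: container_device:fs_read_bytes:irate5m
        expr: irate(container_fs_reads_bytes_total[5m])
      # Write bandwidth in bytes/s, one series per container and device
      - record: container_device:fs_write_bytes:irate5m
        expr: irate(container_fs_writes_bytes_total[5m])
```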

IOPS = irate(container_fs_<reads|writes>_total{}[5m])
1) container_fs_reads_total	Counter	Cumulative count of reads completed		diskIO
2) container_fs_writes_total	Counter	Cumulative count of writes completed		diskIO
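Here is a sketch of the IOPS formula as recording rules, using hypothetical names. It adds a combined read+write series. Adding the two rates works because both counters carry the same label set, so PromQL matches them one-to-one.

```yaml
groups:
  - name: cadvisor-fs-iops                 # hypothetical group name
    rules:
      # Completed reads per second
      - record: container_device:fs_reads:irate5m
        expr: irate(container_fs_reads_total[5m])
      # Completed writes per second
      - record: container_device:fs_writes:irate5m
        expr: irate(container_fs_writes_total[5m])
      # Total IOPS: reads + writes, matched on identical labels
      - record: container_device:fs_ops:irate5m
        expr: irate(container_fs_reads_total[5m]) + irate(container_fs_writes_total[5m])
```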

Latency (seconds per operation) = irate(container_fs_<read|write>_seconds_total{}[5m]) / irate(container_fs_<reads|writes>_total{}[5m])
1) container_fs_read_seconds_total	Counter	Cumulative count of seconds spent reading	seconds	diskIO
2) container_fs_write_seconds_total	Counter	Cumulative count of seconds spent writing	seconds	diskIO
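The latency formula divides two rates. The numerator metric is singular (read_seconds) while the denominator is plural (reads_total). In this minimal sketch with hypothetical rule names, a "> 0" filter on the denominator makes idle devices produce no result instead of NaN.

```yaml
groups:
  - name: cadvisor-fs-latency              # hypothetical group name
    rules:
      # Average seconds per completed read; series with no reads are dropped
      - record: container_device:fs_read_latency_seconds:irate5m
        expr: |
          irate(container_fs_read_seconds_total[5m])
            / (irate(container_fs_reads_total[5m]) > 0)
      # Average seconds per completed write; series with no writes are dropped
      - record: container_device:fs_write_latency_seconds:irate5m
        expr: |
          irate(container_fs_write_seconds_total[5m])
            / (irate(container_fs_writes_total[5m]) > 0)
```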

Merged (ops/s) = irate(container_fs_<reads|writes>_merged_total{}[5m])
1) container_fs_reads_merged_total	Counter	Cumulative count of reads merged		diskIO
2) container_fs_writes_merged_total	Counter	Cumulative count of writes merged		diskIO
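Beyond the raw merge rate, the share of requests that were merged can be derived as merged / (merged + completed), in the spirit of iostat's %rrqm. This is a sketch that assumes cAdvisor's counters follow the kernel's diskstats semantics. The rule names are hypothetical, and writes work the same way.

```yaml
groups:
  - name: cadvisor-fs-merged               # hypothetical group name
    rules:
      # Reads merged per second
      - record: container_device:fs_reads_merged:irate5m
        expr: irate(container_fs_reads_merged_total[5m])
      # Fraction of read requests merged: merged / (merged + completed).
      # "> 0" drops idle series instead of returning NaN.
      - record: container_device:fs_reads_merged:ratio_irate5m
        expr: |
          irate(container_fs_reads_merged_total[5m])
            / ((irate(container_fs_reads_merged_total[5m]) + irate(container_fs_reads_total[5m])) > 0)
```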

container_blkio_device_usage_total	Counter	Blkio device bytes usage	bytes	diskIO

container_fs_inodes_free	Gauge	Number of available Inodes		disk	
container_fs_inodes_total	Gauge	Total number of Inodes		disk	
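The two inode gauges combine into a usage ratio. This is a minimal sketch with a hypothetical rule name, and the "> 0" filter skips filesystems that report zero inodes.

```yaml
groups:
  - name: cadvisor-fs-inodes               # hypothetical group name
    rules:
      # Fraction of inodes in use (0..1); "/" binds tighter than "-"
      - record: container_device:fs_inodes_used:ratio
        expr: 1 - container_fs_inodes_free / (container_fs_inodes_total > 0)
```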

container_fs_io_current	Gauge	Number of I/Os currently in progress		diskIO	
container_fs_io_time_seconds_total	Counter	Cumulative count of seconds spent doing I/Os	seconds	diskIO	
container_fs_io_time_weighted_seconds_total	Counter	Cumulative weighted I/O time	seconds	diskIO	
container_fs_limit_bytes	Gauge	Number of bytes that can be consumed by the container on this filesystem	bytes	disk	
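The two I/O-time counters are usually read as rates. Seconds spent doing I/O per second roughly gives the fraction of time the device was busy. Weighted I/O seconds per second roughly gives the average number of I/Os in flight, as with the kernel's diskstats fields. This is a sketch with hypothetical rule names.

```yaml
groups:
  - name: cadvisor-fs-io-time              # hypothetical group name
    rules:
      # Busy fraction, approximately 0..1
      - record: container_device:fs_io_busy:ratio_irate5m
        expr: irate(container_fs_io_time_seconds_total[5m])
      # Approximate average queue depth (I/Os in progress)
      - record: container_device:fs_io_queue_depth:irate5m
        expr: irate(container_fs_io_time_weighted_seconds_total[5m])
```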

container_fs_sector_reads_total	Counter	Cumulative count of sector reads completed		diskIO	
container_fs_sector_writes_total	Counter	Cumulative count of sector writes completed		diskIO	
container_fs_usage_bytes	Gauge	Number of bytes that are consumed by the container on this filesystem	bytes	disk	
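Usage and limit share the same labels, so their ratio can be recorded and alerted on. In this sketch, the rule name, the alert name, the 0.9 threshold, and the 10m duration are all hypothetical choices.

```yaml
groups:
  - name: cadvisor-fs-usage                # hypothetical group name
    rules:
      # Bytes used / bytes the container may consume on that filesystem
      - record: container_device:fs_usage:ratio
        expr: container_fs_usage_bytes / (container_fs_limit_bytes > 0)
      # Hypothetical alert: filesystem more than 90% used for 10 minutes
      - alert: ContainerFilesystemAlmostFull
        expr: container_device:fs_usage:ratio > 0.9
        for: 10m
```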

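Limitations:
1) Not every block device can be monitored. For example, according to the source (cadvisor/prometheus.go), container_fs_read_seconds_total and container_fs_write_seconds_total are filesystem-level metrics, so a block device that is not mounted produces no such metrics.
2) Network filesystems cannot be monitored.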

Deployment

Configuring the exporter instance

No separate exporter instance is needed. The kubelet exposes the cAdvisor metrics itself, and they can be fetched through the API server at /api/v1/nodes/<node-name>/proxy/metrics/cadvisor.
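To read that path through the API server, the scraping identity must be allowed to get the nodes/proxy subresource. Node discovery (kubernetes_sd_configs with role: node) also needs to list and watch Node objects. The following is a minimal RBAC sketch. It assumes Prometheus runs in-cluster as a ServiceAccount named prometheus in a namespace named monitoring, and those names, like the role names, are hypothetical.

```yaml
# ClusterRole: permissions for node discovery and the node proxy path
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-kubelet-cadvisor        # hypothetical name
rules:
  # role: node service discovery lists and watches Node objects
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get", "list", "watch"]
  # GET /api/v1/nodes/<node-name>/proxy/... needs the nodes/proxy subresource
  - apiGroups: [""]
    resources: ["nodes/proxy"]
    verbs: ["get"]
---
# Bind the role to the (hypothetical) Prometheus ServiceAccount
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-kubelet-cadvisor        # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-kubelet-cadvisor
subjects:
  - kind: ServiceAccount
    name: prometheus                       # hypothetical ServiceAccount
    namespace: monitoring                  # hypothetical namespace
```

The token at /var/run/secrets/kubernetes.io/serviceaccount/token only carries these permissions if the Prometheus pod runs with this ServiceAccount.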

Configuring the Prometheus scrape job

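  - job_name: 'kubelet-cadvisor'
    scheme: https
    tls_config:
      # CA and token are mounted by the pod's ServiceAccount,
      # so this job assumes Prometheus runs inside the cluster
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      insecure_skip_verify: true
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    # Discover one target per Node object
    kubernetes_sd_configs:
    - role: node
    relabel_configs:
      # Copy every node label onto the target (and thus its series)
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      # Send every request to the API server instead of the node itself
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      # Route the request through the node proxy to the kubelet's cAdvisor endpoint
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor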

Configuring Grafana dashboards

Grafana Labs/Docker-cAdvisor
Grafana Labs/Docker and OS metrics ( cadvisor, node_exporter )

FAQ

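container="POD"

What is the container="POD" label in Prometheus and why do most examples exclude it?

Series labeled container="POD" belong to the pod's pause container (the infrastructure container that holds the pod's shared namespaces), not to an application container. That is why most examples exclude them with container!="POD".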
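The usual query-time fix is a container!="POD" matcher. This minimal sketch, with hypothetical group and rule names, sums per-pod write bandwidth without the pause container. The extra container!="" matcher also skips series that carry no container name, so pod-level aggregates are not counted twice.

```yaml
groups:
  - name: cadvisor-fs-per-pod              # hypothetical group name
    rules:
      # Per-pod write bandwidth (bytes/s), excluding the pause container
      - record: namespace_pod:fs_write_bytes:irate5m
        expr: |
          sum by (namespace, pod) (
            irate(container_fs_writes_bytes_total{container!="POD", container!=""}[5m])
          )
```

Alternatively, the series can be dropped at scrape time by adding a metric_relabel_configs entry with source_labels: [container], regex: POD, action: drop to the scrape job. Relabeling regexes are fully anchored, so only the exact value POD matches.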
