「PVE」- Metric Server: Metric Meanings

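The official documentation does not explain what the metrics mean (as of 03/09/2023, PVE 7.3), so we have to obtain this information by other means:
1) Analyze the pvestatd source code: git.proxmox.com Git – pve-manager.git/blob – PVE/Service/pvestatd.pm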

Metric categories

Use a Flux query to determine which metrics (measurements) are present:

// -------------------------------------------------------- // List all measurements;

import "influxdata/influxdb/schema"
schema.measurements(bucket: "pve-metric-server")

ballooninfo ----------------------------------------------- // Contains Host metrics; some metrics are Guest-related;
proxmox-support ------------------------------------------- // Contains Host metrics; some metrics are Guest-related;
system ---------------------------------------------------- // Contains Host metrics; some metrics are Guest-related;
blockstat ------------------------------------------------- // Contains Host metrics; some metrics are Guest-related;
nics ------------------------------------------------------ // Contains Host metrics; some metrics are Guest-related;
cpustat --------------------------------------------------- // Contains Host metrics only;
memory ---------------------------------------------------- // Contains Host metrics only;

// -------------------------------------------------------- // List the tag keys of a measurement, then the values of one tag;

import "influxdata/influxdb/schema"
schema.measurementTagKeys(
    bucket: "pve-metric-server",
    measurement: "proxmox-support",
)


import "influxdata/influxdb/schema"

schema.measurementTagValues(
    bucket: "pve-metric-server",
    measurement: "ballooninfo",
    tag: "host",
)

Tags

_start
_stop
_field
_measurement
host
instance
nodename
object
type
vmid

host: pve-node-name pve-guest-name
nodename: pve-node-name
object: nodes qemu storages …
type: dir lvm lvmthin nfs …
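
The object tag can also be used to narrow a tag-value lookup. The following is a minimal sketch, assuming that Guest series carry object == "qemu" (as the tag values above suggest) and that a one-hour window is enough; both the predicate and the time range are illustrative choices rather than anything PVE prescribes:

```flux
import "influxdata/influxdb/schema"

// List the values of the "host" tag, restricted to series whose
// "object" tag is "qemu" (assumed here to mean VM guests).
schema.tagValues(
    bucket: "pve-metric-server",
    tag: "host",
    predicate: (r) => r.object == "qemu",
    start: -1h,
)
```

Swapping the predicate to r.object == "nodes" should return the PVE node names instead.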

Fields

active,actual,avail,avg1,avg15,avg5,balloon,bavail,bfree,blocks,content,cpu,cpus,ctime,disk,diskread,diskwrite,enabled,failed_flush_operations,failed_rd_operations,failed_unmap_operations,failed_wr_operations,favail,ffree,files,flush_operations,flush_total_time_ns,fper,free_mem,freemem,fused,guest,guest_nice,idle,idle_time_ns,invalid_flush_operations,invalid_rd_operations,invalid_unmap_operations,invalid_wr_operations,iowait,irq,last_update,major_page_faults,max_mem,maxdisk,maxmem,mem,mem_swapped_in,mem_swapped_out,memfree,memshared,memtotal,memused,minor_page_faults,name,netin,netout,nice,pbs-library-version,per,pid,qmpstatus,rd_bytes,rd_merged,rd_operations,rd_total_time_ns,receive,running-machine,running-qemu,serial,shared,softirq,status,steal,su_bavail,su_blocks,su_favail,su_files,sum,swapfree,swaptotal,swapused,system,template,total,total_mem,transmit,type,unmap_bytes,unmap_merged,unmap_operations,unmap_total_time_ns,uptime,used,user,user_bavail,user_blocks,user_favail,user_files,user_fused,user_used,wait,wr_bytes,wr_highest_offset,wr_merged,wr_operations,wr_total_time_ns,

ballooninfo

Field

// -------------------------------------------------------- // List all fields of this measurement;

import "influxdata/influxdb/schema"
schema.measurementFieldKeys(
    bucket: "pve-metric-server",
    measurement: "ballooninfo",
)

actual,free_mem,last_update,major_page_faults,max_mem,mem_swapped_in,mem_swapped_out,minor_page_faults,total_mem

proxmox-support

Field

import "influxdata/influxdb/schema"
schema.measurementFieldKeys(
    bucket: "pve-metric-server",
    measurement: "proxmox-support",
)

pbs-library-version: this measurement contains only the pbs-library-version field. Since it carries no other metric data, we ignore it for now;

system

Field

import "influxdata/influxdb/schema"
schema.measurementFieldKeys(
    bucket: "pve-metric-server",
    measurement: "system",
)

active,avail,balloon,content,cpu,cpus,disk,diskread,diskwrite,enabled,freemem,maxdisk,maxmem,mem,name,netin,netout,pid,qmpstatus,running-machine,running-qemu,serial,shared,status,template,total,type,uptime,used,

disk, diskread, diskwrite, maxdisk: these metrics are Guest-related only;

Query

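from(bucket: "pve-metric-server")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r["_measurement"] == "system")
    |> filter(fn: (r) => r._field == "disk" or r._field == "maxdisk")
    // Field names live in _field, so pivot them into columns before testing r.disk;
    |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
    |> filter(fn: (r) => r.disk != 0)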

blockstat(host-and-guest-related)

fields

import "influxdata/influxdb/schema"
schema.measurementFieldKeys(
    bucket: "pve-metric-server",
    measurement: "blockstat",
)

bavail,bfree,blocks,failed_flush_operations,failed_rd_operations,failed_unmap_operations,failed_wr_operations,favail,ffree,files,flush_operations,flush_total_time_ns,fper,fused,idle_time_ns,invalid_flush_operations,invalid_rd_operations,invalid_unmap_operations,invalid_wr_operations,per,rd_bytes,rd_merged,rd_operations,rd_total_time_ns,su_bavail,su_blocks,su_favail,su_files,unmap_bytes,unmap_merged,unmap_operations,unmap_total_time_ns,used,user_bavail,user_blocks,user_favail,user_files,user_fused,user_used,wr_bytes,wr_highest_offset,wr_merged,wr_operations,wr_total_time_ns,

host-related fields

Get the metrics related to the PVE Node:

from(bucket: "pve-metric-server")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r["_measurement"] == "blockstat" and r.object == "nodes")
    |> keep(columns: ["_field"])
    |> distinct(column: "_field")

bavail,bfree,blocks,favail,ffree,files,fper,fused,per,su_bavail,su_blocks,su_favail,su_files,used,user_bavail,user_blocks,user_favail,user_files,user_fused,user_used,

PVE is written in Perl and obtains disk information through Filesys::Df, so these metrics are df-style output. See the Filesys::Df documentation for the meaning of each field;

The keys available in the hash are as follows:
{blocks} = Total blocks on the filesystem.
{bfree} = Total blocks free on the filesystem.
{bavail} = Total blocks available to the user executing the Perl application. This can be different than {bfree} if you have per-user quotas on the filesystem, or if the super user has a reserved amount. {bavail} can also be a negative value because of this. For instance if there is more space being used then you have available to you.
{used} = Total blocks used on the filesystem.
{per} = Percent of disk space used. This is based on the disk space available to the user executing the application. In other words, if the filesystem has 10% of its space reserved for the superuser, then the percent used can go up to 110%.

Here are the available inode keys:
{files} = Total inodes on the filesystem.
{ffree} = Total inodes free on the filesystem.
{favail} = Total inodes available to the user executing the application. See the rules for the {bavail} key.
{fused} = Total inodes used on the filesystem.
{fper} = Percent of inodes used on the filesystem. See rules for the {per} key.
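
As a cross-check on {per}, a usage percentage can be derived from the raw block counts. This is only a sketch: it assumes the df-style definition used / (used + bavail), which has not been verified against the Filesys::Df source, and the column name used_pct and the one-hour range are made up for illustration:

```flux
from(bucket: "pve-metric-server")
    |> range(start: -1h)
    |> filter(fn: (r) => r._measurement == "blockstat" and r.object == "nodes")
    |> filter(fn: (r) => r._field == "used" or r._field == "bavail")
    // Turn the two fields into columns so they can be combined row by row.
    |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
    |> filter(fn: (r) => exists r.used and exists r.bavail)
    // Assumed df-style percentage: used / (used + bavail) * 100.
    |> map(
        fn: (r) =>
            ({r with used_pct:
                    100.0 * float(v: r.used) / (float(v: r.used) + float(v: r.bavail)),
            }),
    )
```

If used_pct tracks the stored per field closely, that supports reading per as "percent of user-available space used".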

guest-related fields

Related to virtual machine instances:

from(bucket: "pve-metric-server")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r["_measurement"] == "blockstat")
    |> filter(fn: (r) => r.host == "k8s130-wn130-206" and r.instance == "scsi0")

nics

PVE Node:

from(bucket: "pve-metric-server")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r["_measurement"] == "nics")
    |> filter(fn: (r) => r.object == "nodes" and r.host == "${server}")

Field

import "influxdata/influxdb/schema"
schema.measurementFieldKeys(
    bucket: "pve-metric-server",
    measurement: "nics",
)

netin, netout => Guest-related metrics;
receive, transmit => Host-related metrics, measured in bytes;
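
Since receive and transmit are byte counts, a throughput view is usually more readable than the raw values. A minimal sketch, assuming the two fields are monotonically increasing counters (not confirmed here) and using an arbitrary one-hour range:

```flux
from(bucket: "pve-metric-server")
    |> range(start: -1h)
    |> filter(fn: (r) => r._measurement == "nics" and r.object == "nodes")
    |> filter(fn: (r) => r._field == "receive" or r._field == "transmit")
    // Rate of change per second; nonNegative hides counter resets.
    |> derivative(unit: 1s, nonNegative: true)
```

The resulting _value is in bytes per second for each interface (instance) and direction.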

cpustat(host-related)

Field

import "influxdata/influxdb/schema"
schema.measurementFieldKeys(
    bucket: "pve-metric-server",
    measurement: "cpustat",
)

avg1,avg15,avg5,cpu,cpus,ctime,guest,guest_nice,idle,iowait,irq,nice,softirq,steal,sum,system,total,used,user,wait,

Determine whether the metrics are Host- or Guest-related:

from(bucket: "pve-metric-server")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "cpustat")
  |> drop(columns: ["_field"])
  |> distinct(column: "host")

memory(host-related)

Field

import "influxdata/influxdb/schema"
schema.measurementFieldKeys(
    bucket: "pve-metric-server",
    measurement: "memory",
)

memfree,memshared,memtotal,memused,swapfree,swaptotal,swapused,

Determine whether the metrics are Host- or Guest-related:

from(bucket: "pve-metric-server")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "memory")
  |> drop(columns: ["_field"])
  |> distinct(column: "host")
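
Once the memory fields are known to be Host-only, they can be combined into a usage percentage. This is a sketch under the assumption that memused and memtotal share the same unit; the column name mem_used_pct and the one-hour range are illustrative:

```flux
from(bucket: "pve-metric-server")
    |> range(start: -1h)
    |> filter(fn: (r) => r._measurement == "memory")
    |> filter(fn: (r) => r._field == "memused" or r._field == "memtotal")
    // Put memused and memtotal side by side for each timestamp.
    |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
    |> filter(fn: (r) => r.memtotal > 0)
    |> map(
        fn: (r) =>
            ({r with mem_used_pct: 100.0 * float(v: r.memused) / float(v: r.memtotal)}),
    )
```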