「Prometheus」- Configuration Settings (=> PROMETHEUS/Configuration)

Configuration

Configuration | Prometheus

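Prom is configured through command-line flags (see ./prometheus -h) and a configuration file (--config.file). Command-line flags are mostly used for settings that rarely change, while the configuration file holds everything else;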
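
As a rough illustration of what such a configuration file might contain, here is a minimal sketch. The job name and target address are placeholders chosen for the example and are not taken from this post:

```yaml
# prometheus.yml (minimal sketch; values are illustrative assumptions)
global:
  scrape_interval: 15s        # how often targets are scraped
  evaluation_interval: 15s    # how often rules are evaluated

scrape_configs:
  - job_name: prometheus      # placeholder job name
    static_configs:
      - targets: ["localhost:9090"]   # placeholder target
```

It would be loaded with something like ./prometheus --config.file=prometheus.yml.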

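Reloading the configuration: send SIGHUP to the Prom process, or send an HTTP POST request to the /-/reload endpoint (requires the --web.enable-lifecycle flag);
If the new configuration file contains errors, it is not applied;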

This post does not go into the details of individual configuration options. We will look them up as needed in concrete scenarios;

Rules: Recording and Alerting

Recording rules | Prometheus
Alerting rules | Prometheus

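Prom supports two kinds of rules:
1) Recording rule: computes a new metric from existing metrics;
2) Alerting rule: whenever the alert expression yields one or more vector elements at a given point in time, the alert counts as active for the label sets of those elements.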

The rule_files setting in the configuration file references the rule definitions (rules are defined in separate files);
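
A minimal sketch of how rule_files might look in prometheus.yml. The directory and file names are hypothetical:

```yaml
# prometheus.yml (fragment; paths are hypothetical)
rule_files:
  - "rules/recording.rules.yml"   # a single rule file
  - "rules/alerts/*.yml"          # glob patterns are also accepted
```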

The promtool command (promtool check rules) checks whether a rule file is defined correctly;

In a rule file:
1) Alerting rules and recording rules live inside groups;
2) Rules within the same group are evaluated sequentially, in order;

Recording rule:

groups:
- name: cpu-node
  rules:
  - record: job_instance_mode:node_cpu_seconds:avg_rate5m
    expr: avg by (job, instance, mode) (rate(node_cpu_seconds_total[5m]))

Alerting rule:

groups:
- name: example
  rules:
  - alert: HighRequestLatency
    expr: job:request_latency_seconds:mean5m{job="myjob"} > 0.5
    for: 10m
    labels:
      severity: page
    annotations:
      summary: High request latency

- name: example-02
  rules:
  # Alert for any instance that is unreachable for >5 minutes.
  - alert: InstanceDown
    expr: up == 0
    for: 5m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."

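# for: how long the condition must hold continuously before the alert fires; until then, the alert is in the pending state.
# labels: adds labels to the alert, overwriting any existing labels with the same name; label values can be templated;
# annotations: a set of informational labels for longer additional information, such as an alert description or a runbook link. Annotation values can be templated;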

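# $labels: holds all label key/value pairs of an alert instance;
# $externalLabels: holds the configured external labels (the labels attached when talking to external systems);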

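Inspecting alerts at runtime:
1) The Alerts tab in the Prom web UI shows the current alerts and their state;
2) For alerts in the pending or firing state, Prom also stores a synthetic time series: ALERTS{alertname="<alert name>", alertstate="<pending or firing>", <additional alert labels>}. The sample value is 1 while the alert is in the indicated state; once that is no longer the case, the series is marked stale (it does not drop to 0).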

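Sending alert notifications:
1) Prom detects problems and evaluates alerting rules, but it does not send notifications itself;
2) Alertmanager is responsible for sending the notifications;
3) In addition, Prom can find Alertmanager instances through service discovery;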
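
The connection between Prom and Alertmanager can be sketched as below. Both host names are hypothetical; the first entry uses a static target, the second assumes a DNS SRV record exists for discovery:

```yaml
# prometheus.yml (fragment; addresses are hypothetical)
alerting:
  alertmanagers:
    # Statically configured Alertmanager
    - static_configs:
        - targets:
            - "alertmanager.example.internal:9093"
    # Alertmanager instances found through DNS-based service discovery
    - dns_sd_configs:
        - names:
            - "_alertmanager._tcp.example.internal"
```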

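To avoid producing too many alerts and series, they can be limited as follows:
1) The limit setting can be configured per group;
2) When the limit is exceeded, all series produced by the rule are discarded; for an alerting rule, all of its alerts are cleared as well (active, pending, or inactive);
3) The event is recorded as an evaluation error, and therefore no stale markers are written;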
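
A minimal sketch of a group with a limit, reusing the InstanceDown rule from above. The group name and the value 10 are arbitrary choices for illustration:

```yaml
groups:
  - name: limited-example     # hypothetical group name
    limit: 10                 # max alerts (or series) each rule in this group may produce; 0 means no limit
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: page
```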

Template: Examples and Reference

Template examples | Prometheus
Template reference | Prometheus

Templates are used widely in Prom: in the labels and annotations of alerts, and in console pages (which are themselves templated);
Prom's templating language is based on the Go templating system;

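Prom's template system offers the following features:
1) Simple variable references that produce dynamic label values and annotation values; for complex cases, link to a console page instead of writing large blocks of text in YAML;
2) Loops for generating content, along with more advanced iteration mechanisms for special cases;
3) Running a query and picking a single value out of the result;
4) Extracting parameters from the URL;
5) Defining reusable templates that can be shared across multiple consoles;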

Template reference

The official Template reference documents the functions that can be used in templates;

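There are several types of templates. The variables available differ by type, and there are other differences as well:
1) Alert field templates: provide four variables, .Value, .Labels, .ExternalLabels, and .ExternalURL;
2) Console templates: offer much richer content; the details are not covered here, see the official documentation;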
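
To make the alert field variables more concrete, here is a hedged sketch based on the HighRequestLatency rule shown earlier. The annotation wording is invented for illustration; $labels, $value, the humanizeDuration function, and the query function are taken from the official template reference:

```yaml
groups:
  - name: template-example    # hypothetical group name
    rules:
      - alert: HighRequestLatency
        expr: job:request_latency_seconds:mean5m{job="myjob"} > 0.5
        for: 10m
        labels:
          severity: page
        annotations:
          # Simple variable references; $value is the evaluated value in seconds
          summary: "High request latency for job {{ $labels.job }}"
          description: "Mean latency is {{ $value | humanizeDuration }}."
          # Loop over a query result; keep such queries lightweight,
          # because alert templates run on every rule evaluation
          down_instances: >-
            {{ range query "up == 0" }}{{ .Labels.instance }} {{ end }}
```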

Unit Testing for Rules

Unit Testing for Rules | Prometheus
How to check your prometheus.yml is valid – Robust Perception

A test file, combined with the promtool tool (promtool test rules), is used to verify that rule files are correct and behave as expected;
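
A minimal sketch of such a test file, assuming the InstanceDown rule from above is saved in a file named alerts.yml (a hypothetical name). The input series and instance address are made up for the test:

```yaml
# test.yml; run with: promtool test rules test.yml
rule_files:
  - alerts.yml                # hypothetical file containing the InstanceDown rule

evaluation_interval: 1m

tests:
  - interval: 1m              # spacing between the input samples below
    input_series:
      - series: 'up{job="myjob", instance="host1:9100"}'
        values: '0 0 0 0 0 0 0 0 0 0 0'   # target is down from 0m through 10m
    alert_rule_test:
      - eval_time: 6m         # after the 5m "for" period, so the alert should be firing
        alertname: InstanceDown
        exp_alerts:
          - exp_labels:
              severity: page
              job: myjob
              instance: host1:9100
            exp_annotations:
              summary: "Instance host1:9100 down"
              description: "host1:9100 of job myjob has been down for more than 5 minutes."
```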

This is not covered in detail here; we will dig deeper when we actually use it.

The promtool command also checks whether the configuration file is valid: promtool check config prometheus.yml

HTTPS and authentication

HTTPS and authentication | Prometheus

Prom supports TLS and basic authentication for clients connecting to it (experimental; may change in the future);
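
As a rough sketch of what this looks like, the settings live in a separate web configuration file passed via the --web.config.file flag. The file names, the user name, and the hash placeholder below are assumptions for illustration, and since the feature is experimental the format may differ between versions:

```yaml
# web-config.yml; passed as --web.config.file=web-config.yml
tls_server_config:
  cert_file: server.crt       # hypothetical certificate path
  key_file: server.key        # hypothetical private key path

basic_auth_users:
  # user name: bcrypt hash of the password (placeholder, not a real hash)
  admin: <bcrypt-hash>
```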

This is not covered in detail here; we will dig deeper when we actually use it.

Filed under: K4NZDROID - @ 11:20 PM
