「Linux」- 处理器使用率（学习笔记）

多任务操作系统，将每个 CPU 的时间划分为很短的时间片，再通过调度器轮流分配给各个任务使用，因此造成多任务同时运行的错觉。

理解概念：节拍率（Timer Interrupt Frequency）

为了维护 CPU 时间，Linux 通过事先定义的节拍率（在内核中表示为 HZ），触发时间中断，并使用全局变量 Jiffies 记录开机以来的节拍数。每发生一次时间中断，Jiffies 的值就加 1。

节拍率 HZ 是内核的可配选项，可以设置为 100、250、1000 等。系统不同，设置不同，通过查询 /boot/config 的 CONFIG_HZ 内核选项来查看它的配置值。

用户空间节拍率

节拍率 HZ 是内核选项，所以用户空间程序并不能直接访问。为了方便用户空间程序，内核还提供了一个用户空间节拍率 USER_HZ，它总是固定为 100，也就是 1/100 秒。这样，用户空间程序并不需要关心内核中 HZ 被设置成了多少，因为它看到的总是固定值 USER_HZ。

统计数据：查看处理器的总体使用率

# cat /proc/stat | grep ^cpu
cpu  11424212 19477670 9071090 783703472 711024 0 796129 0 0 0
cpu0 3497583 4899058 2362550 195054993 232905 0 330682 0 0 0
cpu1 1923176 4857103 2082146 197184640 87422 0 108655 0 0 0
cpu2 2074101 4841395 2245553 196715460 180663 0 216623 0 0 0
cpu3 3929351 4880112 2380840 194748377 210033 0 140168 0 0 0

如上命令，仅获取 CPU 相关的指标，含义如下：
1）cpu，表示的是所有 CPU 的累加
2）cpu0, cpu1, …，表示不同场景下 CPU 的累加节拍数，它的单位是 USER_HZ，也就是 10 ms（1/100 秒）

每个字段的含义可以查看 man proc 手册，它们都是 CPU 使用率相关的重要指标，很多工具使用这些数据。各列含义如下（依序）：
01）user, us：代表用户态 CPU 时间，但不包括下面的 nice 时间，但包括 guest 时间
02）nice, ni：低优先级用户态 CPU 时间，也就是进程的 nice 值被调整为 1-19 之间时的 CPU 时间。nice 可取值范围是 -20 到 19，数值越大，优先级反而越低
03）system, sys：代表内核态 CPU 时间
04）idle, id：代表空闲时间。但不包括等待 I/O 的时间（iowait）
05）iowait, wa：代表等待 I/O 的 CPU 时间
06）irq, hi：代表处理硬中断的 CPU 时间
07）softirq, si：代表处理软中断的 CPU 时间
08）steal, st：代表当操作系统运行在虚拟机中的时候，被其他虚拟机占用的 CPU 时间
09）guest, guest：代表通过虚拟化运行其他操作系统的时间，也就是运行虚拟机的 CPU 时间
10）guest_nice, gnice：代表以低优先级运行虚拟机的时间

计算处理器使用率

CPU 使用率，就是除了空闲时间外的其他时间占总 CPU 时间的百分比：1 – （空闲 / 总 CPU 时间）

每个场景的 CPU 使用率 = 该场景的 CPU 时间 / 总的 CPU 时间

但是使用开机以来的总的 CPU 时间没有太大意义，所以一般计算方法为：1 – （空闲时间_new – 空闲时间_old / 总 CPU 时间_new – 总 CPU 时间_old）

统计数据：查看特定进程的处理器使用率

进程统计可以通过 /proc/<pid>/stat 文件系统，总共有 52 列数据，通过 man proc 获取字段信息。

计算处理器使用率

方法是类似的，但是多数工具已经计算，因此我们无需手动计算。

统计数据：使用命令查看

性能分析工具给出的都是间隔一段时间的平均 CPU 使用率，所以要注意间隔时间的设置。比如 top 和 ps 这两个工具报告的 CPU 使用率，默认的结果很可能不一样，因为 top 默认使用 3 秒时间间隔，而 ps 使用的却是进程的整个生命周期。

使用 top 命令

# top
top - 09:25:09 up 22 days, 12:35, 15 users,  load average: 1.33, 1.25, 1.28
Tasks: 457 total,   2 running, 450 sleeping,   0 stopped,   5 zombie
%Cpu0  : 16.3 us,  6.7 sy,  0.0 ni, 76.0 id,  1.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  : 17.6 us,  2.9 sy,  0.0 ni, 78.4 id,  0.0 wa,  0.0 hi,  1.0 si,  0.0 st
%Cpu2  : 28.2 us,  3.9 sy,  0.0 ni, 68.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  : 47.1 us,  6.9 sy,  0.0 ni, 46.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  15731.1 total,    235.4 free,  13699.6 used,   1796.1 buff/cache
MiB Swap:  30720.0 total,  19851.2 free,  10868.8 used.    825.9 avail Mem

  PID  PPID  PGRP USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+   SID TTY      COMMAND
 1607     1  1606 libvirt+  20   0 6732.0m   3.9g   2.9m S  39.2  25.1  21068:47  1606 ?        /usr/bin/qemu-system-x86_64 -name guest=win7-
22894     1  7409 k4nz      20   0 5078.4m   1.3g  41.0m S   0.0   8.7 220:09.98  7409 tty2     /opt/thunderbird/thunderbird-60.5.0/thunderbi
22132     1  7409 k4nz      20   0 3969.3m   1.1g  65.1m S   1.0   7.3 190:11.43  7409 tty2     /usr/lib/chromium/chromium --show-component-e
 4635  4599  7409 k4nz      20   0 4489.9m 615.4m   3.9m S   1.0   3.9  80:22.94  7409 tty2     /opt/java/bin/java -Xms128m -Xmx750m -XX:MaxP

%CpuN：是将 CPU 时间变换成了 CPU 使用率进行显示

%CPU：表示进程的 CPU 使用率。它是用户态和内核态 CPU 使用率的总和，包括：
1）进程用户空间使用的 CPU
2）通过系统调用执行的内核空间 CPU
3）在就绪队列等待运行的 CPU
4）运行虚拟机占用的 CPU（在虚拟化环境中）

查看特定进程的处理器使用情况（pidstat）

# pidstat 1 5 // 每隔 1 秒输出一组数据，共输出 5 组
Linux 4.19.0-9-amd64 (laptop)   07/28/2020      _x86_64_        (4 CPU)

09:52:33 AM   UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
09:52:34 AM     0      1084    0.98    0.00    0.00    0.00    0.98     2  client_linux_am
09:52:34 AM 64055      1607    0.00   27.45   37.25    0.00   60.78     1  qemu-system-x86
09:52:34 AM     0      4199    0.00    0.98    0.00    0.00    0.98     0  kworker/0:0-events_power_efficient
09:52:34 AM  1001      6025    0.98    0.00    0.00    0.00    0.98     3  Web Content
09:52:34 AM  1001      7411    0.98    0.98    0.00    0.00    1.96     3  Xorg
09:52:34 AM     0      7995    0.98    1.96    0.00    0.00    2.94     2  pidstat
09:52:34 AM  1001     14683    0.98    0.00    0.00    0.00    0.98     3  switchhosts

...

09:52:34 AM   UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
09:52:35 AM     0      1084    1.00    0.00    0.00    0.00    1.00     2  client_linux_am
09:52:35 AM 64055      1607   10.00   24.00   22.00    1.00   56.00     1  qemu-system-x86
09:52:35 AM     0      1622    1.00    0.00    0.00    0.00    1.00     1  dockerd
09:52:35 AM  1001      6776    4.00    2.00    0.00    0.00    6.00     3  chrome
09:52:35 AM  1001      7074    1.00    0.00    0.00    0.00    1.00     2  _Postman
09:52:35 AM  1001      7411    0.00    4.00    0.00    1.00    4.00     0  Xorg
09:52:35 AM     0      7995    1.00    0.00    0.00    0.00    1.00     2  pidstat

...

Average:      UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
Average:        0      1084    0.99    0.00    0.00    0.00    0.99     -  client_linux_am
Average:    64055      1607    2.97   25.74   29.70    0.50   58.42     -  qemu-system-x86
Average:        0      1622    0.50    0.00    0.00    0.00    0.50     -  dockerd

%usr：用户态 CPU 使用率；
%system：内核态 CPU 使用率；
%guest：运行虚拟机 CPU 使用率；
%wait：在等待运行时，由任务消耗 CPU 使用率；
%CPU：总的 CPU 使用率

排查方法：定位处理器使用率较高的函数

使用 GDB 会中断程序，不适合生产环境问题排查，GDB 只适合用在性能分析的后期，当你找到了出问题的大致函数后，线下再借助它来进一步调试函数内部的问题。

我们可以使用 perf 命令追踪处理器性能问题：

方法一、perf top

perf top，实时显示占用 CPU 时钟最多的函数或者指令，因此可以用来查找热点函数

Samples: 106K of event 'cycles:ppp', 4000 Hz, Event count (approx.): 20534774874
Overhead  Shared Object                     Symbol
   3.97%  qemu-system-x86_64 (deleted)      [.] 0x0000000000633c21
   3.08%  libxul.so                         [.] 0x0000000003e14391
   2.90%  [kernel]                          [k] vmx_vcpu_run
   1.39%  [kernel]                          [k] ___bpf_prog_run
   1.22%  qemu-system-x86_64 (deleted)      [.] 0x0000000000633c10
   0.74%  libc-2.28.so                      [.] malloc
   0.68%  [kernel]                          [k] __indirect_thunk_start
   0.66%  libgirepository-1.0.so.1.0.0      [.] 0x0000000000025950
   0.55%  [kernel]                          [k] _raw_spin_lock_irqsave
   0.53%  [kernel]                          [k] __schedule
   0.53%  [kernel]                          [k] __entry_SYSCALL_64_trampoline
   0.50%  qemu-system-x86_64 (deleted)      [.] 0x0000000000633c14
   0.48%  [kernel]                          [k] menu_select
   0.46%  qemu-system-x86_64 (deleted)      [.] 0x000000000037201d
   0.45%  libglib-2.0.so.0.5800.3           [.] g_main_context_check

采样数（Samples）、事件类型（event）、事件总数量（Event count）。注意，如果采样数过少（比如只有十几个），那下面的排序和百分比就没什么实际参考价值。

Overhead ，是该符号的性能事件在所有采样中的比例，用百分比来表示。
Shared Object，该函数或指令所在的动态共享对象（Dynamic Shared Object），如内核、进程名、动态链接库名、内核模块名等
Symbol，符号名，也就是函数名。当函数名未知时，用十六进制的地址来表示。前缀是动态共享对象的类型：[.] 表示用户空间的可执行程序、动态链接库；[k] 表示内核空间

方法二、perf record, perf report

perf record 则提供保存数据的功能。保存后的数据，需要你用 perf report 解析展示。而前面的 perf top 只能实时显示。

# perf record // 按 Ctrl+C 终止采样
[ perf record: Woken up 14 times to write data ]
[ perf record: Captured and wrote 7.411 MB perf.data (83826 samples) ]

// 此时会生成 perf.data 文件（当前目录）

# perf report

为 perf top 和 perf record 加上 -g 参数，可以开启调用关系的采样，方便根据调用链来分析性能问题

方法三、使用 execsnoop 工具

https://github.com/brendangregg/perf-tools
Debian 10 => perf-tools-unstable

该工具专门用于短时进程，通过 ftrace 实时监控进程的 exec() 行为，并输出短时进程的基本信息，包括进程 PID、父进程 PID、命令行参数以及执行的结果。

理论实践：查找异常进程，定位问题函数

// 在 a 主机，执行：

# docker run --name nginx-perf -p 10000:80 -itd feisky/nginx
# docker run --name phpfpm-perf -itd --network container:nginx-perf feisky/php-fpm

// 在 b 主机，执行：

# ab -c 10 -n 100 http://10.10.50.166:10000/
...
Requests per second:    11.28 [#/sec] (mean)
Time per request:       886.599 [ms] (mean)
Time per request:       88.660 [ms] (mean, across all concurrent requests)
...

// 从输出中可以看出，平均每秒只有 11.63 个请求，非常低

// 在 b 主机，我们增加请求，并进行测试

ab -c 10 -n 10000 http://10.10.50.166:10000/

// 在 a 主机，使用 top 命令

# top
top - 11:19:05 up 218 days,  3:07,  3 users,  load average: 4.77, 1.82, 0.82
Tasks: 186 total,   7 running, 177 sleeping,   1 stopped,   1 zombie
%Cpu(s): 97.5 us,  1.5 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  1.0 si,  0.0 st
MiB Mem :   6813.8 total,    199.8 free,   3860.9 used,   2753.2 buff/cache
MiB Swap:   8192.0 total,   4145.0 free,   4047.0 used.   2505.7 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 5760 bin       20   0  328.8m   7.2m   1.6m R  46.1  0.1   0:35.35 php-fpm: pool www
 5762 bin       20   0  328.8m   7.2m   1.6m R  37.3  0.1   0:34.51 php-fpm: pool www
 5763 bin       20   0  328.8m   7.2m   1.6m R  35.3  0.1   0:34.74 php-fpm: pool www
 5761 bin       20   0  328.8m   7.2m   1.6m R  34.3  0.1   0:35.52 php-fpm: pool www
 5759 bin       20   0  328.8m   7.2m   1.6m R  32.4  0.1   0:36.23 php-fpm: pool www

// 可以看到 php-fpm 的 CPU 占用很高

// 接下来，定位占用 CPU 最高的函数

# perf top -g -p 5760 // -g 开启调用关系分析，-p 指定 php-fpm 的进程号 21515
# perf record -g -p 5760 // 由于是在容器中进行测试，所以 perf top 无法显示函数
# docker cp php-fpm-perf:/ /tmp/php-fpm-perf/ // bindfs /proc/$PID/root /tmp/foo
# perf report -g --input=./perf.data --symfs=/tmp/php-fpm-perf/
Samples: 175K of event 'cpu-clock', Event count (approx.): 43919250000
  Children      Self  Command  Shared Object           Symbol
-   99.85%     0.00%  php-fpm  [unknown]               [.] 0x6cb6258d4c544155
     0x6cb6258d4c544155
   + __libc_start_main
+   99.85%     0.00%  php-fpm  libc-2.24.so            [.] __libc_start_main
+   99.57%     0.00%  php-fpm  php-fpm                 [.] 0x000055a806174642
+   99.56%     0.00%  php-fpm  php-fpm                 [.] php_execute_script
-   99.54%     0.00%  php-fpm  php-fpm                 [.] zend_execute_scripts
   - zend_execute_scripts
      - 99.52% zend_execute
         - 96.74% execute_ex
            - 20.33% 0x55a80606da7c
                 5.96% sqrt
                 1.64% 0x55a805e2ab9d
                 1.05% 0x55a805e2ab99
                 0.52% 0x55a805e2ab91
            - 17.27% 0x55a806136ea3
               - 4.59% 0x55a806136d97
                    4.52% add_function
                 1.04% 0x55a806136c03

// 使用 上下 移动光标进行操作，使用 Enter 展开对应项

// 从上面我们可以看出，最大的消耗来自于 sqrt 与  add_function 函数

// 接下俩就是排查代码处理这个问题，然后再次进行测试

// 由于代码比较简单，我们这里跳过修改，直接进行新代码的测试

# ab -c 10 -n 10000 http://10.10.50.166:10000/
...
Requests per second:    1548.36 [#/sec] (mean)
Time per request:       6.458 [ms] (mean)
Time per request:       0.646 [ms] (mean, across all concurrent requests)
Transfer rate:          260.08 [Kbytes/sec] received
...

理论实践：处理器使用率很高，但无法定位进程（短暂进程）

第一步、初始环境，创建负载

// 在 A 主机执行

# docker run --name nginx-sp -p 10000:80 -itd feisky/nginx:sp
# docker run --name phpfpm-sp -itd --network container:nginx-sp feisky/php-fpm:sp

// 在 B 主机执行

# curl http://10.10.50.166:10000/
# ab -c 100 -n 1000 http://10.10.50.166:10000/
...
Requests per second:    65.54 [#/sec] (mean)
Time per request:       1525.747 [ms] (mean)
Time per request:       15.257 [ms] (mean, across all concurrent requests)
...

// 重点观察 Requests per second 字段，发现每秒内接受的请求量并不是很多

// 接下来进入调试阶段，持续请求，用于排查问题

# ab -c 5 -t 600 http://10.10.50.166:10000/            // 持续请求十分钟

第二步、问题排查，找到原因

// 首先，使用 top 命令观察

# top
top - 10:35:33 up 219 days,  2:23,  3 users,  load average: 4.22, 1.31, 0.61
Tasks: 199 total,   5 running, 193 sleeping,   1 stopped,   0 zombie
%Cpu0  : 74.0 us, 23.0 sy,  0.0 ni,  3.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  : 68.3 us, 23.8 sy,  0.0 ni,  3.0 id,  0.0 wa,  0.0 hi,  5.0 si,  0.0 st
MiB Mem :   6813.8 total,    263.8 free,   3943.9 used,   2606.2 buff/cache
MiB Swap:   8192.0 total,   4150.3 free,   4041.7 used.   2553.6 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  462 101       20   0   32.3m   2.1m   0.7m S   4.9  0.0   0:03.03 nginx: worker process
22608 systemd+  20   0 4298.3m   2.0g   4.5m S   4.9 30.0  16028:57 mongod --smallfiles --rest --auth
 5055 root      20   0  671.6m  42.1m   7.5m S   3.9  0.6 608:49.97 /usr/bin/dockerd -H fd:// --containerd=/ru
 4450 bin       20   0  328.8m   7.5m   1.9m S   2.9  0.1   0:01.28 php-fpm: pool www
 4458 bin       20   0  328.8m   7.5m   1.9m S   2.9  0.1   0:01.31 php-fpm: pool www
 4470 bin       20   0  328.8m   7.5m   1.9m S   2.9  0.1   0:01.25 php-fpm: pool www
 4479 bin       20   0  328.8m   7.5m   1.9m S   2.9  0.1   0:01.24 php-fpm: pool www
  407 root      20   0    9.1m   3.7m   2.7m S   2.0  0.1   0:01.52 containerd-shim -namespace moby -workdir /
 4465 bin       20   0  328.8m   7.5m   1.9m S   2.0  0.1   0:01.25 php-fpm: pool www
14995 100       20   0  142.5m  48.7m  31.9m S   2.0  0.7   6990:18 consul agent -data-dir=/consul/data -confi
15252 systemd+  20   0 2753.3m  76.6m   1.0m S   2.0  1.1   4805:58 /usr/lib/erlang/erts-8.2.1/bin/beam.smp -W
    9 root      20   0    0.0m   0.0m   0.0m S   1.0  0.0 549:39.18 [rcu_sched]
 5032 root      20   0  560.5m   1.0m   0.6m S   1.0  0.0  60:36.71 /usr/bin/python2 -Es /usr/sbin/tuned -l -P

// 观察单个进程的最高 CPU 使用率，但是进程的使用率并不是非常高，没有异常的进程
// 观察系统 CPU 使用率，系统总的 CPU 使用率非常高（%sys），并且用户 CPU 使用率非常高（%usr）
// 观察进程，没有发现过度使用 CPU 异常的进程。

// 然后，使用 pidstat 命令观察

# pidstat
... // 省略输出

// 与使用 top 类似，依旧无法观察到使用 CPU 过度的进程

// 回头再次查看 top 命令输出，会发现处于 running 状态的进程较多，且为 stress 程序

// 但是无法使用 ps / pidstat 找到这些进程，则可能是因为这些进程是：
// 临时进程，在不停的重启
// 运行时间短，在不断的创建新实例

// 接下来我们要定位到这些进程的父进程，即找到创建他们的进程

# pstree | grep -C 5 stress // 搜索 stress 进程，并显示前后 5 行输出
        |-auditd---{auditd}
        |-chronyd
        |-containerd-+-containerd-shim-+-nginx---2*[nginx]
        |            |                 `-9*[{containerd-shim}]
        |            |-containerd-shim-+-php-fpm-+-4*[php-fpm]
        |            |                 |         `-php-fpm---sh---stress---stress
        |            |                 `-10*[{containerd-shim}]
        |            |-containerd-shim-+-nginx---2*[nginx]
        |            |                 `-11*[{containerd-shim}]
        |            |-containerd-shim-+-docker-entrypoi---consul---12*[{consul}]
        |            |                 `-11*[{containerd-shim}]

// 这时候你会发现，请求还是来自于容器的，我们应该去排查代码，检查 stress 命令调用

// 当找到 stress 调用之后，需要检查日志输出，查看这个调用是否存在问题，是否有异常发生

# curl http://10.10.50.166:10000?verbose=1
Server internal error: Array
(
    [0] => stress: info: [22982] dispatching hogs: 0 cpu, 0 io, 0 vm, 1 hdd
    [1] => stress: FAIL: [22983] (563) mkstemp failed: Permission denied
    [2] => stress: FAIL: [22982] (394) <-- worker 22983 returned error 1
    [3] => stress: WARN: [22982] (396) now reaping child worker processes
    [4] => stress: FAIL: [22982] (400) kill error: No such process
    [5] => stress: FAIL: [22982] (451) failed run completed in 0s
)

// 但是我们还是要确认 CPU 时间分配到了哪里，使用 perf 分析

# perf record -g
# docker cp phpfpm-sp:/ /tmp/phpfpm-sp // bindfs /proc/$PID/root /tmp/foo
# perf report --input=./perf.data --symfs=/tmp/phpfpm-sp
Samples: 390K of event 'cpu-clock', Event count (approx.): 97539000000
  Children      Self  Command          Shared Object               Symbol
-   61.05%     0.03%  stress           stress                      [.] main
   - 61.05% main
      + 30.37% hoghdd
        12.21% random_r
        9.70% random
        3.27% rand
        1.73% rand@plt
        1.22% random_r
        1.09% random
+   30.44%    26.73%  stress           stress                      [.] hoghdd
+   13.66%    13.43%  stress           libc-2.24.so                [.] random_r
+   10.99%    10.79%  stress           libc-2.24.so                [.] random
+    5.49%     0.00%  stress           [kernel.kallsyms]           [k] 0xffffffffb7b6b758
+    5.49%     0.00%  stress           [kernel.kallsyms]           [k] 0xffffffffb7b6f915
+    3.79%     0.00%  stress           [kernel.kallsyms]           [k] 0xffffffffb7b6f5e3

// 此时我们发现主要负载来自于 random() hoghdd() 函数

该案例最主要的特征是无法通过 top/ps/pidstat 直接观察到 CPU 使用较高的进程，但是回归本质：CPU 使用率较高因为存在很多 RUNNING 状态的进程，因此应该将排查的重点放在 RUNNING 状态的进程。可以使用 ps --ppid 2 -p 2 --deselect -o stat,command | grep -E "^R" 命令进行排查。

或者在最开始我们可以直接使用 perf record 命令找到 CPU 消耗的哪里，而不是先用 top ps pidstat 进行观察。原因是 top ps pidstat 是属于进程快照，追踪某个时间点，可能在这个时间点进程并不存在，但是 perf 不同，它采用事件记录的方式。

碰到常规问题无法解释的 CPU 使用率情况时，首先要想到有可能是短时应用导致的问题：
1）应用里直接调用了其他二进制程序，这些程序通常运行时间比较短，通过 top 等工具也不容易发现
2）应用本身在不停地崩溃重启，而启动过程的资源初始化，很可能会占用相当多的 CPU 资源，前面 stress/random() 就是例子；

参考文献

05 | 基础篇：某个应用的CPU使用率居然达到100%，我该怎么办？
docker cp | Docker Documentation

Filed under: K4NZDROID - @ 2:35 PM

NOTE

/ 记录问题 / 解决问题 / 技术博客 / 工作笔记 /

Table of Contents

Categories

Recent Posts

Archives

「Linux」- 处理器使用率（学习笔记）

理解概念：节拍率（Timer Interrupt Frequency）

用户空间节拍率

统计数据：查看处理器的总体使用率

计算处理器使用率

统计数据：查看特定进程的处理器使用率

计算处理器使用率

统计数据：使用命令查看

使用 top 命令

查看特定进程的处理器使用情况（pidstat）

排查方法：定位处理器使用率较高的函数

方法一、perf top

方法二、perf record, perf report

方法三、使用 execsnoop 工具

理论实践：查找异常进程，定位问题函数

理论实践：处理器使用率很高，但无法定位进程（短暂进程）

第一步、初始环境，创建负载

第二步、问题排查，找到原因

参考文献

#ezw_tco-9 .ez-toc-title{ font-size: 120%; font-weight: 500; color: #000; } #ezw_tco-9 .ez-toc-widget-container ul.ez-toc-list li a{ font-size: 120%; font-weight: 500; color: #000; } #ezw_tco-9 .ez-toc-widget-container ul.ez-toc-list li.active{ background-color: #ededed; } Table of Contents

Categories

Recent Posts

Archives

理解概念：节拍率（Timer Interrupt Frequency）

用户空间节拍率

统计数据：查看处理器的总体使用率

计算处理器使用率

统计数据：查看特定进程的处理器使用率

计算处理器使用率

统计数据：使用命令查看

使用 top 命令

查看特定进程的处理器使用情况（pidstat）

排查方法：定位处理器使用率较高的函数

方法一、perf top

方法二、perf record, perf report

方法三、使用 execsnoop 工具

理论实践：查找异常进程，定位问题函数

理论实践：处理器使用率很高，但无法定位进程（短暂进程）

第一步、初始环境，创建负载

第二步、问题排查，找到原因

参考文献

Table of Contents