Process vs Thread#

TBD

Troubleshooting#

overview

0. /proc/stat#
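
On Linux, the first line of /proc/stat (the aggregate `cpu` line) holds cumulative tick counters. CPU percentages such as us, sy and id are derived by sampling those counters twice and comparing the deltas. The following is a minimal sketch of that idea, assuming the field order documented in proc(5): user, nice, system, idle, iowait, irq, softirq, steal. `cpu_busy_pct` is a hypothetical helper name, not part of any tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the busy percentage between two samples of the
# aggregate "cpu" line from /proc/stat.
# Fields after the label: user nice system idle iowait irq softirq steal ...
cpu_busy_pct() {
    # $1 = earlier "cpu ..." line, $2 = later "cpu ..." line
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            # Treat idle + iowait as "not busy"; total = the first eight counters.
            # guest/guest_nice are skipped because they are already counted in user/nice.
            idle[NR]  = $5 + $6
            total[NR] = 0
            for (i = 2; i <= 9; i++) total[NR] += $i
        }
        END {
            dt = total[2] - total[1]
            di = idle[2] - idle[1]
            if (dt <= 0) { print "0.0"; exit }
            printf "%.1f\n", 100 * (dt - di) / dt
        }'
}

# Example on a live system (1-second sampling window):
# a=$(head -n1 /proc/stat); sleep 1; b=$(head -n1 /proc/stat); cpu_busy_pct "$a" "$b"
```

Counting iowait as idle time is a design choice in this sketch. Tools differ in how they group the counters, so the result may not match another tool's output exactly.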

1. top#

$ top
top - 17:18:53 up 50 days, 16:06,  7 users,  load average: 0.00, 0.00, 0.00
Tasks: 127 total,   1 running, 126 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :    957.5 total,    122.6 free,    162.0 used,    673.0 buff/cache
MiB Swap:   2400.0 total,   2282.0 free,    118.0 used.    622.7 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 782513 root      20   0   17312  11036   8652 S   0.3   1.1   0:00.04 sshd
      1 root      20   0  167728   9424   6504 S   0.0   1.0   3:05.88 systemd
      2 root      20   0       0      0      0 S   0.0   0.0   0:00.66 kthreadd

# See `man top` for full details. In short:

# load average : the average number of processes in a runnable or uninterruptible state over the past 1, 5, and 15 minutes.
# e.g. `load average = 1.5` on a 6-CPU system means roughly 1/4 of the CPU capacity is under load and 3/4 is idle

# us, user     : time running un-niced user processes
# sy, system   : time running kernel processes
# ni, nice     : time running niced user processes (process priority, a negative nice value means higher priority, whereas a positive nice value means lower priority)
# id, idle     : time spent in the kernel idle handler
# wa, IO-wait  : time waiting for I/O completion
# hi           : time spent servicing hardware interrupts
# si           : time spent servicing software interrupts
# st           : time stolen from this vm by the hypervisor

# PR: The scheduling priority of the task (real priority of a process as seen by the kernel)
# NI: The nice value of the task (a priority hint for the kernel)
# VIRT: The total amount of virtual memory used by the task (code, data, and shared libraries, plus pages that have been swapped out and pages that have been mapped but not used)
# RES: A subset of the virtual address space (VIRT) representing the non-swapped physical memory (only physical memory)
# SHR: A subset of resident memory (RES) that may be used by other processes
# S: Process Status, the status of the task which can be one of:
#       D = uninterruptible sleep
#       I = idle
#       R = running
#       S = sleeping
#       T = stopped by job control signal
#       t = stopped by debugger during trace
#       Z = zombie
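
The load-average rule of thumb above (1.5 on a 6-CPU system is about 1/4 of capacity) can be sketched as a simple division. This is only an approximation: on Linux the load average also counts tasks in uninterruptible sleep, so the ratio is not a pure CPU utilization figure. `load_per_cpu` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: divide a load average by the number of CPUs.
load_per_cpu() {
    # $1 = load average (e.g. 1.5), $2 = number of CPUs (e.g. 6)
    awk -v load="$1" -v ncpu="$2" 'BEGIN {
        if (ncpu <= 0) { print "ncpu must be positive" > "/dev/stderr"; exit 1 }
        printf "%.2f\n", load / ncpu
    }'
}

# Example on a live system (1-minute load average and online CPU count):
# load_per_cpu "$(cut -d" " -f1 /proc/loadavg)" "$(nproc)"
```

A value close to or above 1.00 suggests that, on average, every CPU had at least one task running or waiting.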

2. vmstat#

vmstat reports information about processes, memory, paging, block I/O, traps, disks, and CPU activity.

Refer to `man vmstat` for details.

3. pidstat (process level)#

pidstat reports statistics (CPU, memory, disk I/O, stack) for Linux tasks (processes). Without options, it reports CPU utilization.

  • -d Report I/O statistics
  • -R Report realtime priority and scheduling policy information.
  • -r Report page faults and memory utilization.
  • -s Report stack utilization.
  • -u Report CPU utilization
  • -v Report values of some kernel tables.
  • -w Report task switching activity

# Display 2 reports of CPU statistics for every active task in the system at 1-second intervals.
$ pidstat 1 2
05:31:13 PM   UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
05:31:14 PM     0    798270    0.00    1.00    0.00    0.00    1.00     0  pidstat

05:31:14 PM   UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command

Average:      UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
Average:        0    798270    0.00    0.50    0.00    0.00    0.50     -  pidstat

# -p Select tasks (processes) for which statistics are to be reported 
$ pidstat -p 655152 2 3
05:38:56 PM   UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
05:38:58 PM     0    655152    0.00    0.00    0.00    0.00    0.00     0  node_exporter
05:39:00 PM     0    655152    0.00    0.00    0.00    0.00    0.00     0  node_exporter
05:39:02 PM     0    655152    0.00    0.00    0.00    0.00    0.00     0  node_exporter
Average:        0    655152    0.00    0.00    0.00    0.00    0.00     -  node_exporter

# See `man pidstat` for details. In short:

# PID     The identification number of the task being monitored.
# %usr    Percentage of CPU used by the task while executing at the user level (application)
# %system Percentage of CPU used by the task while executing at the system level (kernel).
# %guest  Percentage of CPU spent by the task in virtual machine (running a virtual processor)
# %wait   Percentage of CPU spent by the task while waiting to run.
# %CPU    Total percentage of CPU time used by the task. 
# CPU     Processor number to which the task is attached.
# Command The command name of the task.
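
The columns described above can be filtered with a small awk wrapper. This is a minimal sketch, assuming the `Average:` summary rows shown in the sample output. It locates columns by header name because the column set may differ between sysstat versions. `pidstat_busy` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `pidstat` output on stdin and print
# "PID %CPU Command" for every task whose average %CPU is >= the threshold in $1.
pidstat_busy() {
    awk -v min="$1" '
        $1 != "Average:" { next }          # only use the summary rows
        $2 == "UID" {                      # header row: locate columns by name
            for (i = 1; i <= NF; i++) {
                if ($i == "PID")     pid = i
                if ($i == "%CPU")    cpu = i
                if ($i == "Command") cmd = i
            }
            next
        }
        cpu && $cpu + 0 >= min + 0 { print $pid, $cpu, $cmd }
    '
}

# Example: tasks averaging at least 0.5 %CPU over five 1-second samples
# pidstat 1 5 | pidstat_busy 0.5
```

Only the first word of the command is printed, so command lines containing spaces are truncated.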

4. mpstat (cpu level)#

mpstat - Report processors related statistics.

# Display 2 reports of statistics for all processors at 1 second intervals.
$ mpstat -P ALL 1 2
11:43:14 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
11:43:15 PM  all    1.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   99.00
11:43:15 PM    0    1.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   99.00

11:43:15 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
11:43:16 PM  all    1.02    0.00    1.02    0.00    0.00    0.00    0.00    0.00    0.00   97.96
11:43:16 PM    0    1.02    0.00    1.02    0.00    0.00    0.00    0.00    0.00    0.00   97.96

Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
Average:     all    1.01    0.00    0.51    0.00    0.00    0.00    0.00    0.00    0.00   98.48
Average:       0    1.01    0.00    0.51    0.00    0.00    0.00    0.00    0.00    0.00   98.48

# See `man mpstat` for details. In short:

# %usr     Show the percentage of CPU utilization that occurred while executing at the user level (application).
# %nice    Show the percentage of CPU utilization that occurred while executing at the user level with nice priority.
# %sys     Show the percentage of CPU utilization that occurred while executing at the system level (kernel). Note that this does not include time spent servicing hardware and software interrupts.
# %iowait  Show the percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request.
# %irq     Show the percentage of time spent by the CPU or CPUs to service hardware interrupts.
# %soft    Show the percentage of time spent by the CPU or CPUs to service software interrupts.
# %steal   Show the percentage of time spent in involuntary wait by the virtual CPU or CPUs while the hypervisor was servicing another virtual processor.
# %guest   Show the percentage of time spent by the CPU or CPUs to run a virtual processor.
# %gnice   Show the percentage of time spent by the CPU or CPUs to run a niced guest.
# %idle    Show the percentage of time that the CPU or CPUs were idle and the system did not have an outstanding disk I/O request.
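
To spot individual busy processors, the per-CPU `Average:` rows can be filtered on the %idle column. This is a minimal sketch, assuming the `mpstat -P ALL` output layout shown above. `mpstat_busy_cpus` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `mpstat -P ALL` output on stdin and print
# "CPU %idle" for every individual CPU whose average %idle is below $1.
mpstat_busy_cpus() {
    awk -v max_idle="$1" '
        $1 != "Average:" { next }          # only use the summary rows
        $2 == "CPU" {                      # header row: find the %idle column
            for (i = 1; i <= NF; i++) if ($i == "%idle") idle = i
            next
        }
        # skip the "all" aggregate row and keep CPUs below the idle threshold
        idle && $2 != "all" && $idle + 0 < max_idle + 0 { print $2, $idle }
    '
}

# Example: CPUs that were less than 20% idle over five 1-second samples
# mpstat -P ALL 1 5 | mpstat_busy_cpus 20
```

A single CPU with low %idle while the others stay idle often points to a single-threaded workload or uneven interrupt distribution, which the `all` row alone would hide.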

5. perf#

perf is a set of performance analysis tools for Linux (usage notes TBD).

# Install perf on Debian/Ubuntu (requires root)
$ apt-get install linux-tools-common linux-tools-generic "linux-tools-$(uname -r)"

$ perf top
Samples: 22K of event 'cycles', 4000 Hz, Event count (approx.): 1228941005 lost: 0/0 drop: 0/734
Overhead  Shared Object                                                   Symbol
  14.88%  perf                                                            [.] __symbols__insert
   9.56%  perf                                                            [.] rb_next
   1.74%  perf                                                            [.] rust_demangle_callback
   1.53%  perf                                                            [.] output_resort
   1.44%  perf                                                            [.] dso__find_symbol
   1.23%  perf                                                            [.] rb_insert_color
   1.04%  [kernel]                                                        [k] clear_page_rep
   0.89%  perf                                                            [.] hist_entry__sort
   0.88%  perf                                                            [.] hpp__sort_overhead
   0.88%  [kernel]                                                        [k] asm_sysvec_apic_timer_interrupt
   0.82%  libslang.so.2.3.2                                               [.] SLsmg_write_chars
   0.66%  libc.so.6                                                       [.] cfree
   0.63%  libc.so.6                                                       [.] 0x00000000000a1747
   0.60%  sshd                                                            [.] 0x000000000006228c
   0.57%  [kernel]                                                        [k] memcpy_toio

# -g Enables call-graph (stack chain/backtrace) recording
$ perf top -g -p 655152

6. strace#

strace traces system calls and signals. Refer to `man strace` for details.