在運行 systemd 的系統上, 有時候關機時會碰到特定 unit 超時的問題. 尤其是多進程的 unit (例如 systemd-user-session.service 及相關的 units), 這時候要定位造成問題的進程會需要花點心思.

查找關機問題麻煩的是, 這時候進程會被一一的幹掉, 除非是 init (pid 1) 進程本身, 不然想要記錄這個過程基本都不太可行.

好在, Linux kernel 現在都內建有 ftrace, 在告訴 kernel 我們想 trace 哪些 events 後, log 會保留在 kernel 中的 ring buffer 裡, 這樣, 只要能在最後將 ring buffer 中的內容保留下來, 就可以用來分析定位問題.

保留的時間點, 在 systemd 可以很好的保證, 只要在以下位置放個 script, 並將執行權加上, systemd 在結束前就會先執行它完成我們的 log 保留需求

/lib/systemd/system-shutdown/debug.sh

例如 freedesktop.org 上的例子為將 dmesg 保留在 /

#!/bin/sh
mount -o remount,rw /
dmesg >/shutdown.log
mount -o remount,ro /

我們可以把上面的 script 改成保存 ftrace 的 ring buffer

#!/bin/sh
mount -o remount,rw /
cat /sys/kernel/debug/tracing/trace >/shutdown.log
mount -o remount,ro /

保存問題解決了, 接下來要考慮的是, 有哪些 events 對解決我們的問題有幫助. 第一個想到的就是進程的結束

# cd /sys/kernel/debug/tracing
# grep 'exit$' available_events
[...]
raw_syscalls:sys_exit
[...]
syscalls:sys_exit_exit
syscalls:sys_enter_exit
[...]
sched:sched_process_exit
[...]

上面幾個是可能的選項, syscall exit() 代表進程主動要求結束的開始, sched_process_exit 代表進程真正結束的時間點

# echo sys_exit_exit >set_events
# echo sys_exit >>set_events
# echo sched_process_exit >>set_events

另外, 進程有可能被動結束, 例如收到 signal 時

# grep signal available_events 
signal:signal_deliver
signal:signal_generate
syscalls:sys_exit_signalfd
syscalls:sys_enter_signalfd
syscalls:sys_exit_signalfd4
syscalls:sys_enter_signalfd4
[...]

# echo signal_deliver >>set_events

通過 signal_deliver 我們可以得知進程收到 signal 的時間點 (有可能進程收到但無做為). signal_generate 可得知 signal 實際產生的時間點及發送對象, 但接下來我們會記錄 kill() 調用時間, 所以就不記錄它了. 順便一提, kill() 可被用來發送 signal, 但當 signo 為 0 時也可以用來 check 特定 pid 是否存在 (但不會發出 signal, systemd 會做這樣的 check), 所以也可考慮加到記錄之列

# grep 'kill$' available_events 
syscalls:sys_exit_tkill
syscalls:sys_enter_tkill
syscalls:sys_exit_tgkill
syscalls:sys_enter_tgkill
syscalls:sys_exit_kill
syscalls:sys_enter_kill

# echo sys_enter_tkill >>set_events
# echo sys_enter_tgkill >>set_events
# echo sys_enter_kill >>set_events

最終的 script 如下

#!/bin/bash
set -e
PS4="> ${0##*/}: "
set -x

cd /sys/kernel/debug/tracing
echo 0 >tracing_on
echo >trace

echo syscalls:sys_enter_exit >set_event
echo syscalls:sys_enter_kill >>set_event
echo syscalls:sys_enter_tkill >>set_event
echo syscalls:sys_enter_tgkill >>set_event
echo signal:signal_deliver >>set_event
echo sched:sched_process_exit >>set_event

echo global >trace_clock
echo 40960 >buffer_size_kb
echo nop >current_tracer
echo 1 >tracing_on
  • echo 0 >tracing_on 停止 ftrace
  • echo >trace 清空 ring buffer
  • echo global >trace_clock 指定 global clock, 會比 per CPU local clock 慢些, 但回報的時間會在 CPU 間同步
  • echo 40960 >buffer_size_kb 指定總的 ring buffer size (所有 CPU 個別的 ring buffer 加總後的大小)
  • echo nop >current_tracer 用最簡單的 nop tracer, 對以上需求而言足夠了
  • echo 1 >tracing_on 啟動 ftrace, 開始記錄我們要求的 events

Log 看起來大致如下

# tracer: nop
#
# entries-in-buffer/entries-written: 3497/3497   #P:4
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
           <...>-2146  [000] ....   119.038270: sched_process_exit: comm=set-events pid=2146 prio=120
           <...>-2145  [001] d...   119.038350: signal_deliver: sig=17 errno=0 code=262145 sa_handler=556fb690de40 sa_flags=24000004
           <...>-2145  [001] ....   119.038835: sched_process_exit: comm=sudo pid=2145 prio=120
           <...>-2131  [000] d...   119.039021: signal_deliver: sig=17 errno=0 code=262145 sa_handler=4479c0 sa_flags=14000000
            Xorg-1747  [003] d...   119.874999: signal_deliver: sig=29 errno=0 code=128 sa_handler=55e099e095a0 sa_flags=14000000
            Xorg-1747  [003] d...   119.902560: signal_deliver: sig=29 errno=0 code=128 sa_handler=55e099e095a0 sa_flags=14000000
            Xorg-1747  [003] d...   119.904339: signal_deliver: sig=29 errno=0 code=128 sa_handler=55e099e095a0 sa_flags=14000000
           <...>-1844  [001] ....   119.938338: sys_exit(error_code: 0)
           <...>-1844  [001] ....   119.938348: sched_process_exit: comm=pool pid=1844 prio=120
            Xorg-1747  [003] d...   120.115103: signal_deliver: sig=29 errno=0 code=128 sa_handler=55e099e095a0 sa_flags=14000000
            Xorg-1747  [003] d...   120.116538: signal_deliver: sig=29 errno=0 code=128 sa_handler=55e099e095a0 sa_flags=14000000
            Xorg-1747  [003] d...   120.186543: signal_deliver: sig=29 errno=0 code=128 sa_handler=55e099e095a0 sa_flags=14000000
            Xorg-1747  [003] d...   120.186688: signal_deliver: sig=29 errno=0 code=128 sa_handler=55e099e095a0 sa_flags=14000000
            Xorg-1747  [000] d...   120.327975: signal_deliver: sig=14 errno=0 code=128 sa_handler=55e099f02da0 sa_flags=14000000
            Xorg-1747  [002] d...   120.399263: signal_deliver: sig=14 errno=0 code=128 sa_handler=55e099f02da0 sa_flags=14000000
            Xorg-1747  [002] d...   120.404417: signal_deliver: sig=14 errno=0 code=128 sa_handler=55e099f02da0 sa_flags=14000000
            Xorg-1747  [002] d...   120.409596: signal_deliver: sig=14 errno=0 code=128 sa_handler=55e099f02da0 sa_flags=14000000
            Xorg-1747  [002] d...   120.414742: signal_deliver: sig=14 errno=0 code=128 sa_handler=55e099f02da0 sa_flags=14000000
            Xorg-1747  [002] d...   120.419907: signal_deliver: sig=14 errno=0 code=128 sa_handler=55e099f02da0 sa_flags=14000000
            Xorg-1747  [002] d...   120.447898: signal_deliver: sig=14 errno=0 code=128 sa_handler=55e099f02da0 sa_flags=14000000
            Xorg-1747  [002] d...   120.469599: signal_deliver: sig=14 errno=0 code=128 sa_handler=55e099f02da0 sa_flags=14000000
            Xorg-1747  [002] d...   120.481394: signal_deliver: sig=14 errno=0 code=128 sa_handler=55e099f02da0 sa_flags=14000000
           <...>-2046  [003] ....   120.486334: sys_exit(error_code: 0)
           <...>-2046  [003] ....   120.486339: sched_process_exit: comm=pool pid=2046 prio=120
           <...>-1927  [000] ....   120.492760: sys_exit(error_code: 0)
           <...>-1927  [000] ....   120.492770: sched_process_exit: comm=pool pid=1927 prio=120
           <...>-2011  [001] ....   120.744878: sys_exit(error_code: 0)
           <...>-2011  [001] ....   120.744888: sched_process_exit: comm=pool pid=2011 prio=120
           <...>-2016  [000] ....   120.792097: sys_exit(error_code: 0)
           <...>-2016  [000] ....   120.792106: sched_process_exit: comm=pool pid=2016 prio=120
           <...>-2027  [001] ....   120.812743: sys_exit(error_code: 0)
           <...>-2027  [001] ....   120.812752: sched_process_exit: comm=pool pid=2027 prio=120
           <...>-2034  [002] ....   120.874853: sys_exit(error_code: 0)
           <...>-2034  [002] ....   120.874862: sched_process_exit: comm=pool pid=2034 prio=120
[...]

需要處理過才比較容易理解. 想瞭解的資訊有

  • 進程什麼時候結束的?
  • 為什麼結束的? exit() or kill()?
  • 收到了些什麼 signals? 被要求結束時有好好的結束嗎?

Script 如下

#!/bin/bash
declare -A kill_time
declare -A exit_time
declare -a exit_seq
declare -A pid_comm
declare -A wait_time
declare -A killer
declare -A recv_sig

# ts, pid, signo
set_kill_time()
{
    if [[ "${kill_time[$2]}" ]]; then
        return
    fi

    case "$3" in
        0 | 1 | 2 | 3 | 6 | 9 | 15)
            kill_time[$2]=$1
        ;;
    esac
}

# ts, pid
set_exit_time()
{
    if [[ "${exit_time[$2]}" ]]; then
        return
    fi

    exit_time[$2]=$1
    exit_seq+=($pid)
}

# pid, target_pid, signo
add_killer()
{
    killer[$1]+=" $3>$target_pid"
}

#ts, pid, signo
add_signal()
{
    recv_sig[$2]+=" $3($1)"
}

declare -i lineno=0
while read line; do
    let lineno++
    if [[ 0 == ${#line} ]] || [[ '#' == ${line:0:1} ]]; then
        continue
    fi

    cols=($line)
    comm=${cols[0]%-*}
    pid=${cols[0]#$comm-}
    ts=${cols[3]%:}
    ts=${ts//./}
    func=${cols[4]%(*}
    func=${func%:}

    if [[ -z "$shut_time" ]]; then
        shut_time=$ts
        ts=0
    else
        zeros=000000
        ts=$(($ts-$shut_time))
        if (( 6 >= ${#ts} )); then
            ts=${zeros:0:7-${#ts}}$ts
        fi
        ts=${ts:0:${#ts}-6}.${ts:${#ts}-6}
    fi

    case "$func" in
        sched_process_exit)
            set_exit_time $ts $pid
            comm="${cols[5]#comm=}"
        ;;
        sys_kill | sys_tgkill)
            if [[ '->' == "${cols[5]}" ]]; then
                if [[ '0x0' != "${cols[6]}" ]]; then
                    target_pid=${killer[$pid]##*>}
                    if [[ "$target_pid" ]]; then
                        add_killer $pid $target_pid $signo
                        set_exit_time $ts $target_pid
                    fi
                fi
            else
                let target_pid=0x${cols[5]%,}
                if [[ 'sys_kill' == "$func" ]]; then
                    let signo=0x${cols[7]%)}
                else
                    let signo=0x${cols[9]%)}
                fi
                set_kill_time $ts $target_pid $signo
                add_killer $pid $target_pid $signo
            fi
        ;;
        signal_deliver)
            let signo=${cols[5]#sig=}
            add_signal $ts $pid $signo
        ;;
        sys_exit)
            set_kill_time $ts $pid -
        ;;
    esac

    if [[ -z "${pid_comm[$pid]}" ]] || [[ '<...>' == "${pid_comm[$pid]}" ]]; then
        pid_comm[$pid]=$comm
    fi
done

for pid in "${exit_seq[@]}"; do
    printf "%15s(%4s) exited at %10ss, got signals: %s\n" ${pid_comm[$pid]} $pid ${exit_time[$pid]} "${recv_sig[$pid]}"
done

使用方式如下

$ ./fiilter <shutdown.log

結果如下

[...]
 scim-panel-gtk(2266) exited at  3.884117s, got signals: 
 scim-panel-gtk(2159) exited at  3.884164s, got signals:  1(3.150028) 15(3.150035)
at-spi2-registr(2173) exited at  3.885975s, got signals: 
           Xorg(1747) exited at  3.895634s, got signals:  29(0.836729) 29(0.864290) 29(0.866069) 29(1.076833) 29(1.078268) 29(1.148273) 29(1.148418) 14(1.289705) 14(1.360993
) 14(1.366147) 14(1.371326) 14(1.376472) 14(1.381637) 14(1.409628) 14(1.431329) 14(1.443124) 29(1.998043) 29(2.006202) 29(2.022365) 29(2.038427) 29(2.062376) 29(2.069829) 29
(2.077944) 29(2.086442) 29(2.101877) 29(2.109834) 29(2.117942) 29(2.125946) 29(2.136089) 29(2.148175) 29(2.158052) 29(2.165800) 29(2.174002) 29(2.190183) 29(2.198382) 29(2.2
08437) 29(2.219089) 29(2.228533) 29(2.237668) 29(2.246176) 29(2.255719) 29(2.270366) 29(2.285721) 29(2.293638) 29(2.303714) 29(2.313787) 29(2.324164) 29(2.333582) 29(2.34172
6) 29(2.352274) 29(2.361832) 29(2.372004) 29(2.381490) 29(2.398544) 29(2.413549) 29(2.421531) 29(2.431643) 29(2.441739) 29(2.451986) 29(2.461559) 29(2.470074) 29(2.479828) 2
9(2.489956) 29(2.500014) 29(2.509554) 29(2.525545) 29(2.533449) 29(2.541605) 29(2.549438) 29(2.557798) 29(2.573405) 29(2.582021) 29(2.589658) 29(2.597150) 29(2.607460) 29(2.
617566) 29(2.629573) 29(2.669550) 29(2.677281) 29(2.693330) 29(2.702131) 29(2.709302) 29(2.717498) 29(2.727314) 29(2.769571) 29(2.777288) 29(2.787306) 29(2.797253) 29(2.8051
86) 29(2.949550) 29(3.069041) 15(3.135771) 15(3.210278)
      plymouthd(2267) exited at  3.908589s, got signals: 
systemd-cgroups(2268) exited at  3.911000s, got signals: 
  systemd-udevd(2271) exited at  3.923018s, got signals: 
  systemd-udevd(2272) exited at  3.923192s, got signals: 
       plymouth(2270) exited at  3.928383s, got signals: 
systemd-cgroups(2273) exited at  3.944084s, got signals: 
  scim-im-agent(2274) exited at  4.532707s, got signals: 
  scim-im-agent(2055) exited at  5.234651s, got signals:  1(3.125658) 15(3.125659)
 scim-panel-gtk(2275) exited at 93.183315s, got signals:  9(93.182749)
 systemd-logind( 509) exited at 93.189001s, got signals:  9(93.188841)
systemd-cgroups(2279) exited at 93.191502s, got signals: 
systemd-cgroups(2278) exited at 93.195857s, got signals: 
systemd-user-se(2277) exited at 93.195867s, got signals: 
systemd-cgroups(2276) exited at 93.201005s, got signals: 
systemd-cgroups(2280) exited at 93.205846s, got signals: 
systemd-cgroups(2281) exited at 93.212758s, got signals: 
     sd-resolve( 493) exited at 93.230556s, got signals: 
systemd-timesyn( 489) exited at 93.230760s, got signals: 
systemd-random-(2284) exited at 93.231697s, got signals: 
systemd-cgroups(2282) exited at 93.235206s, got signals: 
systemd-update-(2283) exited at 93.236979s, got signals: 
[...]

可以看到, systemd-logind 因為用戶進程 scim-panel-gtk 未正常結束而等待著, 最終因為超時而被 SIGKILL 強制結束. 定位了問題進程後, 接下來就可以對它進行調適啦~