基於 Fprobe 的事件追蹤¶
概述¶
Fprobe 事件類似於 kprobe 事件,但僅限於在函式入口和出口處進行探測。這對於許多僅跟蹤某些特定函式的用例來說已經足夠了。
本文件還涵蓋了跟蹤點探針事件 (tprobe),因為它也僅適用於跟蹤點入口。使用者可以跟蹤部分跟蹤點引數,或者跟蹤沒有跟蹤事件的跟蹤點,這些跟蹤點在 tracefs 上沒有公開。
與其他動態事件一樣,fprobe 事件和跟蹤點探針事件透過 tracefs 上的 dynamic_events 介面檔案定義。
fprobe 事件概要¶
f[:[GRP1/][EVENT1]] SYM [FETCHARGS] : Probe on function entry
f[MAXACTIVE][:[GRP1/][EVENT1]] SYM%return [FETCHARGS] : Probe on function exit
t[:[GRP2/][EVENT2]] TRACEPOINT [FETCHARGS] : Probe on tracepoint
GRP1 : Group name for fprobe. If omitted, use "fprobes" for it.
GRP2 : Group name for tprobe. If omitted, use "tracepoints" for it.
EVENT1 : Event name for fprobe. If omitted, the event name is
"SYM__entry" or "SYM__exit".
EVENT2 : Event name for tprobe. If omitted, the event name is
the same as "TRACEPOINT", but if the "TRACEPOINT" starts
with a digit character, "_TRACEPOINT" is used.
MAXACTIVE : Maximum number of instances of the specified function that
can be probed simultaneously, or 0 for the default value
as defined in Documentation/trace/fprobe.rst
FETCHARGS : Arguments. Each probe can have up to 128 args.
ARG : Fetch "ARG" function argument using BTF (only for function
entry or tracepoint.) (\*1)
@ADDR : Fetch memory at ADDR (ADDR should be in kernel)
@SYM[+|-offs] : Fetch memory at SYM +|- offs (SYM should be a data symbol)
$stackN : Fetch Nth entry of stack (N >= 0)
$stack : Fetch stack address.
$argN : Fetch the Nth function argument. (N >= 1) (\*2)
$retval : Fetch return value.(\*3)
$comm : Fetch current task comm.
+|-[u]OFFS(FETCHARG) : Fetch memory at FETCHARG +|- OFFS address.(\*4)(\*5)
\IMM : Store an immediate value to the argument.
NAME=FETCHARG : Set NAME as the argument name of FETCHARG.
FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types
(u8/u16/u32/u64/s8/s16/s32/s64), hexadecimal types
(x8/x16/x32/x64), "char", "string", "ustring", "symbol", "symstr"
and bitfield are supported.
(\*1) This is available only when BTF is enabled.
(\*2) only for the probe on function entry (offs == 0). Note, this argument access
is best effort, because depending on the argument type, it may be passed on
the stack. But this only support the arguments via registers.
(\*3) only for return probe. Note that this is also best effort. Depending on the
return value type, it might be passed via a pair of registers. But this only
accesses one register.
(\*4) this is useful for fetching a field of data structures.
(\*5) "u" means user-space dereference.
有關 TYPE 的詳細資訊,請參閱 kprobetrace 文件。
出口處的函式引數¶
可以在出口探針處使用 $arg<N> fetcharg 訪問函式引數。這對於一次記錄函式引數和返回值,以及跟蹤結構欄位的差異(用於除錯函式是否正確更新給定的資料結構)很有用。請參閱下面的 示例,瞭解其工作原理。
BTF 引數¶
BTF(BPF 型別格式)引數允許使用者透過名稱而不是 $argN 來跟蹤函式和跟蹤點引數。如果核心配置了 CONFIG_BPF_SYSCALL 和 CONFIG_DEBUG_INFO_BTF,則此功能可用。如果使用者僅指定 BTF 引數,則事件的引數名稱也會自動設定為給定名稱。
# echo 'f:myprobe vfs_read count pos' >> dynamic_events
# cat dynamic_events
f:fprobes/myprobe vfs_read count=count pos=pos
它還從 BTF 資訊中選擇提取型別。例如,在上面的示例中,count 是 unsigned long,pos 是一個指標。因此,兩者都轉換為 64 位 unsigned long,但只有 pos 具有 “%Lx” 列印格式,如下所示
# cat events/fprobes/myprobe/format
name: myprobe
ID: 1313
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;
field:unsigned long __probe_ip; offset:8; size:8; signed:0;
field:u64 count; offset:16; size:8; signed:0;
field:u64 pos; offset:24; size:8; signed:0;
print fmt: "(%lx) count=%Lu pos=0x%Lx", REC->__probe_ip, REC->count, REC->pos
如果使用者不確定引數的名稱,$arg* 將會很有用。$arg* 會擴充套件到函式或跟蹤點的所有函式引數。
# echo 'f:myprobe vfs_read $arg*' >> dynamic_events
# cat dynamic_events
f:fprobes/myprobe vfs_read file=file buf=buf count=count pos=pos
BTF 也會影響 $retval。如果使用者未設定任何型別,則 retval 型別會自動從 BTF 中選取。如果函式返回 void,則 $retval 將被拒絕。
您可以使用箭頭運算子 ->(對於指標型別)和點運算子 .(對於資料結構型別)訪問資料結構的欄位。
# echo 't sched_switch preempt prev_pid=prev->pid next_pid=next->pid' >> dynamic_events
欄位訪問運算子 -> 和 . 可以組合使用,以訪問更深層的成員和成員指向的其他結構成員。例如 foo->bar.baz->qux 如果存在非命名聯合成員,您可以像 C 程式碼一樣直接訪問它。例如
struct {
union {
int a;
int b;
};
} *foo;
要訪問 a 和 b,在這種情況下請使用 foo->a 和 foo->b。
此資料欄位訪問可用於透過 $retval 返回的值,例如 $retval->name。
對於這些 BTF 引數和欄位,:string 和 :ustring 會改變行為。如果這些用於 BTF 引數或欄位,它會檢查引數或資料欄位的 BTF 型別是否為 char * 或 char []。如果不是,則拒絕應用字串型別。此外,透過 BTF 支援,您不需要記憶體解引用運算子 (+0(PTR)) 來訪問 PTR 指向的字串。它會根據 BTF 型別自動新增記憶體解引用運算子。例如:
# echo 't sched_switch prev->comm:string' >> dynamic_events
# echo 'f getname_flags%return $retval->name:string' >> dynamic_events
prev->comm 是資料結構中的嵌入式字元陣列,$retval->name 是資料結構中的字元指標。但在兩種情況下,您都可以使用 :string 型別來獲取字串。
使用示例¶
這是一個在 vfs_read() 函式入口和出口處新增 fprobe 事件的示例,帶有 BTF 引數。
# echo 'f vfs_read $arg*' >> dynamic_events
# echo 'f vfs_read%return $retval' >> dynamic_events
# cat dynamic_events
f:fprobes/vfs_read__entry vfs_read file=file buf=buf count=count pos=pos
f:fprobes/vfs_read__exit vfs_read%return arg1=$retval
# echo 1 > events/fprobes/enable
# head -n 20 trace | tail
# TASK-PID CPU# ||||| TIMESTAMP FUNCTION
# | | | ||||| | |
sh-70 [000] ...1. 335.883195: vfs_read__entry: (vfs_read+0x4/0x340) file=0xffff888005cf9a80 buf=0x7ffef36c6879 count=1 pos=0xffffc900005aff08
sh-70 [000] ..... 335.883208: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=1
sh-70 [000] ...1. 335.883220: vfs_read__entry: (vfs_read+0x4/0x340) file=0xffff888005cf9a80 buf=0x7ffef36c6879 count=1 pos=0xffffc900005aff08
sh-70 [000] ..... 335.883224: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=1
sh-70 [000] ...1. 335.883232: vfs_read__entry: (vfs_read+0x4/0x340) file=0xffff888005cf9a80 buf=0x7ffef36c687a count=1 pos=0xffffc900005aff08
sh-70 [000] ..... 335.883237: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=1
sh-70 [000] ...1. 336.050329: vfs_read__entry: (vfs_read+0x4/0x340) file=0xffff888005cf9a80 buf=0x7ffef36c6879 count=1 pos=0xffffc900005aff08
sh-70 [000] ..... 336.050343: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=1
您可以看到所有函式引數和返回值都被記錄為 signed int。
此外,這裡還有一個關於 sched_switch 跟蹤點的跟蹤點事件的示例。為了比較結果,這也啟用了 sched_switch traceevent。
# echo 't sched_switch $arg*' >> dynamic_events
# echo 1 > events/sched/sched_switch/enable
# echo 1 > events/tracepoints/sched_switch/enable
# echo > trace
# head -n 20 trace | tail
# TASK-PID CPU# ||||| TIMESTAMP FUNCTION
# | | | ||||| | |
sh-70 [000] d..2. 3912.083993: sched_switch: prev_comm=sh prev_pid=70 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120
sh-70 [000] d..3. 3912.083995: sched_switch: (__probestub_sched_switch+0x4/0x10) preempt=0 prev=0xffff88800664e100 next=0xffffffff828229c0 prev_state=1
<idle>-0 [000] d..2. 3912.084183: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=rcu_preempt next_pid=16 next_prio=120
<idle>-0 [000] d..3. 3912.084184: sched_switch: (__probestub_sched_switch+0x4/0x10) preempt=0 prev=0xffffffff828229c0 next=0xffff888004208000 prev_state=0
rcu_preempt-16 [000] d..2. 3912.084196: sched_switch: prev_comm=rcu_preempt prev_pid=16 prev_prio=120 prev_state=I ==> next_comm=swapper/0 next_pid=0 next_prio=120
rcu_preempt-16 [000] d..3. 3912.084196: sched_switch: (__probestub_sched_switch+0x4/0x10) preempt=0 prev=0xffff888004208000 next=0xffffffff828229c0 prev_state=1026
<idle>-0 [000] d..2. 3912.085191: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=rcu_preempt next_pid=16 next_prio=120
<idle>-0 [000] d..3. 3912.085191: sched_switch: (__probestub_sched_switch+0x4/0x10) preempt=0 prev=0xffffffff828229c0 next=0xffff888004208000 prev_state=0
正如您所看到的,sched_switch 跟蹤事件顯示了熟的引數,另一方面,sched_switch 跟蹤點探針事件顯示了原始的引數。這意味著您可以訪問 prev 和 next 引數指向的 task 結構中的任何欄位值。
例如,通常不跟蹤 task_struct::start_time,但透過此 traceprobe 事件,您可以跟蹤該欄位,如下所示。
# echo 't sched_switch comm=next->comm:string next->start_time' > dynamic_events
# head -n 20 trace | tail
# TASK-PID CPU# ||||| TIMESTAMP FUNCTION
# | | | ||||| | |
sh-70 [000] d..3. 5606.686577: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="rcu_preempt" usage=1 start_time=245000000
rcu_preempt-16 [000] d..3. 5606.686602: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="sh" usage=1 start_time=1596095526
sh-70 [000] d..3. 5606.686637: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="swapper/0" usage=2 start_time=0
<idle>-0 [000] d..3. 5606.687190: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="rcu_preempt" usage=1 start_time=245000000
rcu_preempt-16 [000] d..3. 5606.687202: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="swapper/0" usage=2 start_time=0
<idle>-0 [000] d..3. 5606.690317: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="kworker/0:1" usage=1 start_time=137000000
kworker/0:1-14 [000] d..3. 5606.690339: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="swapper/0" usage=2 start_time=0
<idle>-0 [000] d..3. 5606.692368: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="kworker/0:1" usage=1 start_time=137000000
返回探針允許我們訪問某些函式的結果,這些函式返回錯誤程式碼,其結果透過函式引數傳遞,例如結構初始化函式。
例如,vfs_open() 會將檔案結構連結到 inode 並更新模式。您可以使用返回探針跟蹤這些更改。
# echo 'f vfs_open mode=file->f_mode:x32 inode=file->f_inode:x64' >> dynamic_events
# echo 'f vfs_open%%return mode=file->f_mode:x32 inode=file->f_inode:x64' >> dynamic_events
# echo 1 > events/fprobes/enable
# cat trace
sh-131 [006] ...1. 1945.714346: vfs_open__entry: (vfs_open+0x4/0x40) mode=0x2 inode=0x0
sh-131 [006] ...1. 1945.714358: vfs_open__exit: (do_open+0x274/0x3d0 <- vfs_open) mode=0x4d801e inode=0xffff888008470168
cat-143 [007] ...1. 1945.717949: vfs_open__entry: (vfs_open+0x4/0x40) mode=0x1 inode=0x0
cat-143 [007] ...1. 1945.717956: vfs_open__exit: (do_open+0x274/0x3d0 <- vfs_open) mode=0x4a801d inode=0xffff888005f78d28
cat-143 [007] ...1. 1945.720616: vfs_open__entry: (vfs_open+0x4/0x40) mode=0x1 inode=0x0
cat-143 [007] ...1. 1945.728263: vfs_open__exit: (do_open+0x274/0x3d0 <- vfs_open) mode=0xa800d inode=0xffff888004ada8d8
您可以看到 file::f_mode 和 file::f_inode 在 vfs_open() 中被更新。