page owner: tracking who allocated each page
Introduction
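page owner tracks who allocated each page. It can be used to debug memory leaks or to find a memory hog. When an allocation happens, information about it, such as the call stack and the order of the pages, is stored in dedicated storage for each page. When we need to know the status of all pages, we can retrieve and analyze this information.
Although tracepoints for page allocation and free already exist, using them to work out who allocated each page is rather complex. The trace buffer has to be enlarged so that it does not wrap before the user-space program is launched. The launched program then has to dump the trace buffer continually for later analysis, which is more likely to change system behaviour than simply keeping the information in memory, so it is a poor fit for debugging.
page owner can also be used for other purposes. For example, accurate fragmentation statistics can be obtained from the gfp flag information of each page. This is already implemented and is active whenever page owner is enabled. Other uses are more than welcome.
It can also show all stacks together with their current number of allocated base pages, which gives a quick overview of where the memory is going without having to go through every page and match allocation and free operations.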
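page owner is disabled by default. To use it, add "page_owner=on" to your boot cmdline. If the kernel is built with page owner but page owner is disabled at runtime because the boot option was not given, the runtime overhead is marginal. When disabled at runtime, it needs no memory to store owner information, so there is no runtime memory overhead. In addition, page owner inserts just two unlikely branches into the page allocator hot path; if it is not enabled, allocation proceeds as in a kernel without page owner. These two unlikely branches should not affect allocation performance, especially if the static keys jump label patching functionality is available.
As for kernel code size, enabling page owner increases the kernel size by several kilobytes, but most of this code is outside the page allocator and its hot path. Building the kernel with page owner and turning it on when needed is a good option for debugging kernel memory problems.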
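Because page owner records nothing unless it was requested at boot, it can be useful to check the boot command line before starting a debugging session. The following is a minimal Python sketch, not part of the kernel tree. The function names are hypothetical, only the documented spelling "page_owner=on" is recognized, and treating the last occurrence of the parameter as the effective one is an assumption.

```python
def page_owner_requested(cmdline):
    """Return True if the boot command line asks for page_owner=on.

    Assumptions: only the documented value "on" counts as enabled, and
    if the parameter is repeated, the last occurrence is the effective one.
    """
    value = None
    for token in cmdline.split():
        if token.startswith("page_owner="):
            value = token[len("page_owner="):]
    return value == "on"


def read_boot_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as fp:
        return fp.read()
```

Calling `page_owner_requested(read_boot_cmdline())` on the target machine reports whether the option was passed. It does not confirm that the kernel was built with page owner.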
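There is one caveat caused by an implementation detail. page owner stores its information in memory provided by the struct page extension. On sparse memory systems, this memory is initialized some time after the page allocator starts, so many pages can be allocated before initialization and they will have no owner information. To compensate, these early allocated pages are examined and marked as allocated during the initialization phase. This does not mean that they carry the right owner information, but at least we can tell more accurately whether a page is allocated. On an x86-64 VM with 2GB of memory, 13343 early allocated pages were caught and marked, although most of them were allocated by the struct page extension feature itself. In any case, after that point no page is left untracked.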
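Usage
1) Build the user-space helper:

    cd tools/mm
    make page_owner_sort

2) Enable page owner: add "page_owner=on" to the boot cmdline.
3) Do the job that you want to debug.
4) Analyze information from page owner: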
    cat /sys/kernel/debug/page_owner_stacks/show_stacks > stacks.txt
    cat stacks.txt
     post_alloc_hook+0x177/0x1a0
     get_page_from_freelist+0xd01/0xd80
     __alloc_pages+0x39e/0x7e0
     allocate_slab+0xbc/0x3f0
     ___slab_alloc+0x528/0x8a0
     kmem_cache_alloc+0x224/0x3b0
     sk_prot_alloc+0x58/0x1a0
     sk_alloc+0x32/0x4f0
     inet_create+0x427/0xb50
     __sock_create+0x2e4/0x650
     inet_ctl_sock_create+0x30/0x180
     igmp_net_init+0xc1/0x130
     ops_init+0x167/0x410
     setup_net+0x304/0xa60
     copy_net_ns+0x29b/0x4a0
     create_new_namespaces+0x4a1/0x820
    nr_base_pages: 16
    ...
    ...
    echo 7000 > /sys/kernel/debug/page_owner_stacks/count_threshold
    cat /sys/kernel/debug/page_owner_stacks/show_stacks > stacks_7000.txt
    cat stacks_7000.txt
     post_alloc_hook+0x177/0x1a0
     get_page_from_freelist+0xd01/0xd80
     __alloc_pages+0x39e/0x7e0
     alloc_pages_mpol+0x22e/0x490
     folio_alloc+0xd5/0x110
     filemap_alloc_folio+0x78/0x230
     page_cache_ra_order+0x287/0x6f0
     filemap_get_pages+0x517/0x1160
     filemap_read+0x304/0x9f0
     xfs_file_buffered_read+0xe6/0x1d0 [xfs]
     xfs_file_read_iter+0x1f0/0x380 [xfs]
     __kernel_read+0x3b9/0x730
     kernel_read_file+0x309/0x4d0
     __do_sys_finit_module+0x381/0x730
     do_syscall_64+0x8d/0x150
     entry_SYSCALL_64_after_hwframe+0x62/0x6a
    nr_base_pages: 20824
    ...
    cat /sys/kernel/debug/page_owner > page_owner_full.txt
    ./page_owner_sort page_owner_full.txt sorted_page_owner.txt
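The show_stacks output lends itself to simple post-processing. The following is a minimal Python sketch, not part of the kernel tree, that ranks stacks by the number of base pages they hold. It assumes that every stack is a run of frame lines followed by one "nr_base_pages: N" line, as in the sample output, and that "..." lines are elisions to skip. The function names are hypothetical.

```python
import re

# Matches the line that closes each stack in the show_stacks sample output.
NR_LINE = re.compile(r"^nr_base_pages:\s*(\d+)$")


def parse_show_stacks(text):
    """Return a list of (nr_base_pages, [frames]) in file order.

    Assumption: each stack is a run of frame lines terminated by one
    "nr_base_pages: N" line; blank lines and "..." are ignored.
    """
    stacks = []
    frames = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped == "...":
            continue
        match = NR_LINE.match(stripped)
        if match:
            stacks.append((int(match.group(1)), frames))
            frames = []
        else:
            frames.append(stripped)
    return stacks


def top_stacks(text, limit=10):
    """Return up to `limit` stacks, largest base page count first."""
    ranked = sorted(parse_show_stacks(text), key=lambda s: s[0], reverse=True)
    return ranked[:limit]
```

For example, `top_stacks(open("stacks.txt").read(), limit=5)` would list the five largest consumers recorded in stacks.txt.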
The general output of page_owner_full.txt is as follows:

    Page allocated via order XXX, ...
    PFN XXX ...
    // Detailed stack

    Page allocated via order XXX, ...
    PFN XXX ...
    // Detailed stack

By default, it does a full PFN dump. To start from a given PFN, page_owner supports fseek:

    FILE *fp = fopen("/sys/kernel/debug/page_owner", "r");
    fseek(fp, pfn_start, SEEK_SET);

The page_owner_sort tool ignores PFN rows, puts the remaining rows in buf, uses a regexp to extract the page order value, counts the times and pages of buf, and finally sorts them according to the parameter.

See the result about who allocated each page in sorted_page_owner.txt. General output:

    XXX times, XXX pages:
    Page allocated via order XXX, ...
    // Detailed stack
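The counting step described above can be illustrated with a short Python sketch. It is not the page_owner_sort implementation, only an approximation under stated assumptions: a record starts at a line beginning with "Page allocated via order", an order-N allocation is counted as 2**N base pages, and records are grouped only when their text is identical once PFN lines are dropped. The function names are hypothetical.

```python
import re
from collections import defaultdict

# A record is assumed to start with this line, as in the general output.
ORDER_RE = re.compile(r"^Page allocated via order (\d+)")


def split_records(text):
    """Split a page_owner dump into records, dropping PFN and blank lines."""
    records = []
    current = []
    for line in text.splitlines():
        stripped = line.strip()
        if ORDER_RE.match(stripped) and current:
            records.append("\n".join(current))
            current = []
        if not stripped or stripped.startswith("PFN"):
            continue
        current.append(stripped)
    if current:
        records.append("\n".join(current))
    return records


def summarize(text, by_pages=False):
    """Return [(times, pages, record_text)], largest first.

    Sorts by times by default, or by pages when by_pages is True.
    """
    totals = defaultdict(lambda: [0, 0])
    for record in split_records(text):
        match = ORDER_RE.match(record)
        if not match:
            continue  # skip anything that is not an allocation record
        totals[record][0] += 1
        totals[record][1] += 1 << int(match.group(1))
    rows = [(times, pages, rec) for rec, (times, pages) in totals.items()]
    rows.sort(key=lambda row: row[1] if by_pages else row[0], reverse=True)
    return rows
```

Each returned row corresponds to one "XXX times, XXX pages:" entry. Real dumps may carry per-allocation details in the first line of a record, in which case identical stacks would not merge here without further normalization.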
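By default, page_owner_sort sorts by the times of buf, that is, by how often each buf occurs. To sort by the number of pages of buf instead, use the -m parameter. The detailed parameters are:

Fundamental functions: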
    Sort:
        -a              Sort by memory allocation time.
        -m              Sort by total memory.
        -p              Sort by pid.
        -P              Sort by tgid.
        -n              Sort by task command name.
        -r              Sort by memory release time.
        -s              Sort by stack trace.
        -t              Sort by times (default).
        --sort <order>  Specify sorting order. Sorting syntax is [+|-]key[,[+|-]key[,...]].
                        Choose a key from the **STANDARD FORMAT SPECIFIERS** section. The "+" is
                        optional since default direction is increasing numerical or lexicographic
                        order. Mixed use of abbreviated and complete-form of keys is allowed.

        Examples:
            ./page_owner_sort <input> <output> --sort=n,+pid,-tgid
            ./page_owner_sort <input> <output> --sort=at

Additional functions:
    Cull:
        --cull <rules>
                        Specify culling rules. Culling syntax is key[,key[,...]]. Choose a key
                        from the **STANDARD FORMAT SPECIFIERS** section.

        <rules> is a single argument in the form of a comma-separated list,
        which offers a way to specify individual culling rules. The recognized
        keywords are described in the **STANDARD FORMAT SPECIFIERS** section below.
        <rules> can be specified by the sequence of keys k1,k2, ..., as described in
        the **STANDARD FORMAT SPECIFIERS** section below. Mixed use of abbreviated and
        complete-form of keys is allowed.

        Examples:
            ./page_owner_sort <input> <output> --cull=stacktrace
            ./page_owner_sort <input> <output> --cull=st,pid,name
            ./page_owner_sort <input> <output> --cull=n,f

    Filter:
        -f              Filter out the information of blocks whose memory has been released.

    Select:
        --pid <pidlist>     Select by pid. This selects the blocks whose process ID
                            numbers appear in <pidlist>.
        --tgid <tgidlist>   Select by tgid. This selects the blocks whose thread
                            group ID numbers appear in <tgidlist>.
        --name <cmdlist>    Select by task command name. This selects the blocks whose
                            task command name appear in <cmdlist>.

        <pidlist>, <tgidlist>, <cmdlist> are single arguments in the form of a comma-separated list,
        which offers a way to specify individual selecting rules.

        Examples:
            ./page_owner_sort <input> <output> --pid=1
            ./page_owner_sort <input> <output> --tgid=1,2,3
            ./page_owner_sort <input> <output> --name name1,name2
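When page_owner_sort is driven from a script, it can help to assemble the command line in one place. The following is a minimal Python sketch that only builds an argument list from the options documented above and does not run anything. The function name, its parameters, and the default tool path "./page_owner_sort" are hypothetical. Options are placed after the input and output files, as in the examples.

```python
def build_sort_command(input_path, output_path, sort=None, cull=None,
                       pids=None, tgids=None, names=None,
                       filter_released=False, tool="./page_owner_sort"):
    """Return an argv list for page_owner_sort using documented options only.

    sort and cull are lists of keys (sort keys may carry a + or - prefix);
    pids, tgids and names are lists joined into comma-separated arguments.
    """
    argv = [tool, input_path, output_path]
    if sort:
        argv.append("--sort=" + ",".join(sort))
    if cull:
        argv.append("--cull=" + ",".join(cull))
    if filter_released:
        argv.append("-f")
    if pids:
        argv.append("--pid=" + ",".join(str(pid) for pid in pids))
    if tgids:
        argv.append("--tgid=" + ",".join(str(tgid) for tgid in tgids))
    if names:
        # Spelled as two arguments, matching the --name example.
        argv.extend(["--name", ",".join(names)])
    return argv
```

The resulting list could then be passed to `subprocess.run(argv, check=True)` from the directory where the helper was built.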
STANDARD FORMAT SPECIFIERS

For --sort option:

    KEY     LONG        DESCRIPTION
    p       pid         process ID
    tg      tgid        thread group ID
    n       name        task command name
    st      stacktrace  stack trace of the page allocation
    T       txt         full text of block
    ft      free_ts     timestamp of the page when it was released
    at      alloc_ts    timestamp of the page when it was allocated
    ator    allocator   memory allocator for pages

For --cull option:

    KEY     LONG        DESCRIPTION
    p       pid         process ID
    tg      tgid        thread group ID
    n       name        task command name
    f       free        whether the page has been released or not
    st      stacktrace  stack trace of the page allocation
    ator    allocator   memory allocator for pages
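To make the --sort syntax [+|-]key[,[+|-]key[,...]] concrete, the following is a minimal Python sketch that validates a sort specification against the --sort key table and expands abbreviated keys to their long form. It is an illustration, not the parser used by page_owner_sort. The function name is hypothetical, and reading a leading "-" as decreasing order is an inference from the statement that "+" (the default) means increasing order.

```python
# Abbreviated key -> long key, copied from the --sort table.
SORT_KEYS = {
    "p": "pid",
    "tg": "tgid",
    "n": "name",
    "st": "stacktrace",
    "T": "txt",
    "ft": "free_ts",
    "at": "alloc_ts",
    "ator": "allocator",
}


def parse_sort_spec(spec):
    """Parse "[+|-]key[,[+|-]key[,...]]" into [(long_key, descending)].

    Abbreviated and long keys may be mixed. Raises ValueError for a key
    that is not in the --sort table.
    """
    long_names = set(SORT_KEYS.values())
    result = []
    for item in spec.split(","):
        descending = item.startswith("-")
        key = item[1:] if item.startswith(("+", "-")) else item
        name = SORT_KEYS.get(key, key)
        if name not in long_names:
            raise ValueError(f"unknown sort key: {item!r}")
        result.append((name, descending))
    return result
```

With this sketch, the specification from the first --sort example, `n,+pid,-tgid`, expands to name ascending, pid ascending, tgid descending. A cull-only key such as `f` is rejected.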