summaryrefslogtreecommitdiffstats
path: root/tools/perf/scripts
diff options
context:
space:
mode:
authorLi Zhe <lizhe.67@bytedance.com>2025-06-06 10:37:42 +0800
committerAndrew Morton <akpm@linux-foundation.org>2025-07-09 22:42:06 -0700
commita03db236aebfaeadf79396dbd570896b870bda01 (patch)
tree0de4225e6e6c12c3eb63bd2c055b77ab9e88310f /tools/perf/scripts
parent96d81e4766f9e88b66a0502b5a7f34a4c20ac754 (diff)
downloadlinux-a03db236aebfaeadf79396dbd570896b870bda01.tar.gz
linux-a03db236aebfaeadf79396dbd570896b870bda01.zip
gup: optimize longterm pin_user_pages() for large folio
In the current implementation of longterm pin_user_pages(), we invoke collect_longterm_unpinnable_folios(). This function iterates through the list to check whether each folio belongs to the "longterm_unpinnabled" category. The folios in this list essentially correspond to a contiguous region of userspace addresses, with each folio representing a physical address in increments of PAGESIZE. If this userspace address range is mapped with large folio, we can optimize the performance of function collect_longterm_unpinnable_folios() by reducing the using of READ_ONCE() invoked in pofs_get_folio()->page_folio()->_compound_head(). Also, we can simplify the logic of collect_longterm_unpinnable_folios(). Instead of comparing with prev_folio after calling pofs_get_folio(), we can check whether the next page is within the same folio. The performance test results, based on v6.15, obtained through the gup_test tool from the kernel source tree are as follows. We achieve an improvement of over 66% for large folio with pagesize=2M. For small folio, we have only observed a very slight degradation in performance. Without this patch: [root@localhost ~] ./gup_test -HL -m 8192 -n 512 TAP version 13 1..1 # PIN_LONGTERM_BENCHMARK: Time: get:14391 put:10858 us# ok 1 ioctl status 0 # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0 [root@localhost ~]# ./gup_test -LT -m 8192 -n 512 TAP version 13 1..1 # PIN_LONGTERM_BENCHMARK: Time: get:130538 put:31676 us# ok 1 ioctl status 0 # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0 With this patch: [root@localhost ~] ./gup_test -HL -m 8192 -n 512 TAP version 13 1..1 # PIN_LONGTERM_BENCHMARK: Time: get:4867 put:10516 us# ok 1 ioctl status 0 # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0 [root@localhost ~]# ./gup_test -LT -m 8192 -n 512 TAP version 13 1..1 # PIN_LONGTERM_BENCHMARK: Time: get:131798 put:31328 us# ok 1 ioctl status 0 # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0 [lizhe.67@bytedance.com: whitespace fix, per David] Link: https://lkml.kernel.org/r/20250606091917.91384-1-lizhe.67@bytedance.com Link: https://lkml.kernel.org/r/20250606023742.58344-1-lizhe.67@bytedance.com Signed-off-by: Li Zhe <lizhe.67@bytedance.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Diffstat (limited to 'tools/perf/scripts')
0 files changed, 0 insertions, 0 deletions