<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/tools/testing/selftests/bpf/benchs, branch master</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=master</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=master'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2026-04-11T04:22:31Z</updated>
<entry>
<title>selftests/bpf: Remove kmalloc tracing from local storage create bench</title>
<updated>2026-04-11T04:22:31Z</updated>
<author>
<name>Amery Hung</name>
<email>ameryhung@gmail.com</email>
</author>
<published>2026-04-11T01:54:16Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=78ee02a966ad76966be516ed3d56860d7a58fe7e'/>
<id>urn:sha1:78ee02a966ad76966be516ed3d56860d7a58fe7e</id>
<content type='text'>
Remove the raw_tp/kmalloc BPF program and its associated reporting from
the local storage create benchmark. The kmalloc count per create is not
a useful metric as different code paths use different allocators (e.g.
kmalloc_nolock vs kzalloc), introducing noise that makes the number
hard to interpret.

Keep total_creates in the summary output as it is useful for normalizing
perf statistics collected alongside the benchmark.

Signed-off-by: Amery Hung &lt;ameryhung@gmail.com&gt;
Link: https://lore.kernel.org/r/20260411015419.114016-2-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>selftests/bpf: Add usdt trigger bench</title>
<updated>2026-03-03T16:39:22Z</updated>
<author>
<name>Jiri Olsa</name>
<email>jolsa@kernel.org</email>
</author>
<published>2026-02-24T10:39:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0c4fc6bd61054a9378bce149b3758f9b6e8fb5ab'/>
<id>urn:sha1:0c4fc6bd61054a9378bce149b3758f9b6e8fb5ab</id>
<content type='text'>
Add usdt trigger benchmarks:
 trig-usdt-nop  - usdt on top of nop1 instruction
 trig-usdt-nop5 - usdt on top of nop1/nop5 combo

Add them to the benchs/run_bench_uprobes.sh script.

Example run on x86_64 kernel with uprobe syscall:

  # ./benchs/run_bench_uprobes.sh
  usermode-count :  152.507 ± 0.098M/s
  syscall-count  :   14.309 ± 0.093M/s
  uprobe-nop     :    3.190 ± 0.012M/s
  uprobe-push    :    3.057 ± 0.004M/s
  uprobe-ret     :    1.095 ± 0.009M/s
  uprobe-nop5    :    7.305 ± 0.034M/s
  uretprobe-nop  :    2.175 ± 0.005M/s
  uretprobe-push :    2.109 ± 0.003M/s
  uretprobe-ret  :    0.945 ± 0.002M/s
  uretprobe-nop5 :    3.530 ± 0.006M/s
  usdt-nop       :    3.235 ± 0.008M/s   &lt;-- added
  usdt-nop5      :    7.511 ± 0.045M/s   &lt;-- added

Signed-off-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Link: https://lore.kernel.org/r/20260224103915.1369690-6-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>selftests/bpf: Refactor bpf_get_ksyms() trace helper</title>
<updated>2026-02-24T16:19:49Z</updated>
<author>
<name>Ihor Solodrai</name>
<email>ihor.solodrai@linux.dev</email>
</author>
<published>2026-02-23T19:07:25Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a1a771bd649212ef32cf9b0bcc63213a762d354a'/>
<id>urn:sha1:a1a771bd649212ef32cf9b0bcc63213a762d354a</id>
<content type='text'>
ASAN reported a memory leak in bpf_get_ksyms(): it allocates a struct
ksyms internally and never frees it.

Move struct ksyms to trace_helpers.h and return it from
bpf_get_ksyms(), giving ownership to the caller. Add filtered_syms and
filtered_cnt fields to struct ksyms to hold the filtered array of
symbols, previously returned by bpf_get_ksyms().

Fix up the call sites: kprobe_multi_test and bench_trigger.

Signed-off-by: Ihor Solodrai &lt;ihor.solodrai@linux.dev&gt;
Acked-by: Eduard Zingerman &lt;eddyz87@gmail.com&gt;
Link: https://lore.kernel.org/r/20260223190736.649171-10-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;

</content>
</entry>
<entry>
<title>selftests/bpf: Allow to benchmark trigger with stacktrace</title>
<updated>2026-01-30T21:40:09Z</updated>
<author>
<name>Jiri Olsa</name>
<email>jolsa@kernel.org</email>
</author>
<published>2026-01-26T21:18:37Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4173b494d93a8057d3ed23e65853cd76b647f870'/>
<id>urn:sha1:4173b494d93a8057d3ed23e65853cd76b647f870</id>
<content type='text'>
Add support for calling the bpf_get_stackid() helper from trigger
programs. So far this is implemented for kprobe multi.

Add the --stacktrace/-g option to enable it.

Signed-off-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Signed-off-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20260126211837.472802-7-jolsa@kernel.org
</content>
</entry>
<entry>
<title>selftests/bpf: Add perfbuf multi-producer benchmark</title>
<updated>2026-01-20T19:37:25Z</updated>
<author>
<name>Gyutae Bae</name>
<email>gyutae.bae@navercorp.com</email>
</author>
<published>2026-01-20T09:07:16Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2e6690d4f7fc41c4fae7d0a4c0bf11f1973e5650'/>
<id>urn:sha1:2e6690d4f7fc41c4fae7d0a4c0bf11f1973e5650</id>
<content type='text'>
Add a multi-producer benchmark for perfbuf to complement the existing
ringbuf multi-producer test. Unlike ringbuf, which uses a shared buffer
and experiences contention, perfbuf uses per-CPU buffers, so the test
measures scaling behavior rather than contention.

This allows developers to compare perfbuf vs ringbuf performance under
multi-producer workloads when choosing between the two for their systems.

Signed-off-by: Gyutae Bae &lt;gyutae.bae@navercorp.com&gt;
Signed-off-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20260120090716.82927-1-gyutae.opensource@navercorp.com
</content>
</entry>
<entry>
<title>selftests/bpf: Call bpf_get_numa_node_id() in trigger_count()</title>
<updated>2025-11-25T22:32:50Z</updated>
<author>
<name>Menglong Dong</name>
<email>menglong8.dong@gmail.com</email>
</author>
<published>2025-11-16T01:42:42Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f2cb0660ac99b093d833ddff46a0d046396d3d4c'/>
<id>urn:sha1:f2cb0660ac99b093d833ddff46a0d046396d3d4c</id>
<content type='text'>
The bench test "trig-kernel-count" can be used as a baseline comparison
for fentry and other benchmarks, and the call to bpf_get_numa_node_id()
should be considered part of that baseline. So call it in
trigger_count(). Also rename trigger_count() to trigger_kernel_count()
to make it easier to understand.

Signed-off-by: Menglong Dong &lt;dongml2@chinatelecom.cn&gt;
Signed-off-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20251116014242.151110-1-dongml2@chinatelecom.cn
</content>
</entry>
<entry>
<title>selftests/bpf/benchs: Add overwrite mode benchmark for BPF ring buffer</title>
<updated>2025-10-28T02:47:32Z</updated>
<author>
<name>Xu Kuohai</name>
<email>xukuohai@huawei.com</email>
</author>
<published>2025-10-18T03:57:38Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f9db3a38224ec560d7adc5f2163946839d1b649f'/>
<id>urn:sha1:f9db3a38224ec560d7adc5f2163946839d1b649f</id>
<content type='text'>
Add the --rb-overwrite option to benchmark the BPF ring buffer in overwrite
mode. Since libbpf does not yet support overwrite mode on the consumer side,
also add the --rb-bench-producer option to benchmark the producer directly,
without a consumer.

Benchmarks on an x86_64 and an arm64 CPU are shown below for reference.

- AMD EPYC 9654 (x86_64)

Ringbuf, multi-producer contention in overwrite mode, no consumer
=================================================================
rb-prod nr_prod 1    32.180 ± 0.033M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 2    9.617 ± 0.003M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 3    8.810 ± 0.002M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 4    9.272 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 8    9.173 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 12   3.086 ± 0.032M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 16   2.945 ± 0.021M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 20   2.519 ± 0.021M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 24   2.545 ± 0.021M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 28   2.363 ± 0.024M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 32   2.357 ± 0.021M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 36   2.267 ± 0.011M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 40   2.284 ± 0.020M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 44   2.215 ± 0.025M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 48   2.193 ± 0.023M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 52   2.208 ± 0.024M/s (drops 0.000 ± 0.000M/s)

- HiSilicon Kunpeng 920 (arm64)

Ringbuf, multi-producer contention in overwrite mode, no consumer
=================================================================
rb-prod nr_prod 1    14.478 ± 0.006M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 2    21.787 ± 0.010M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 3    6.045 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 4    5.352 ± 0.003M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 8    4.850 ± 0.002M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 12   3.542 ± 0.016M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 16   3.509 ± 0.021M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 20   3.171 ± 0.010M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 24   3.154 ± 0.014M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 28   2.974 ± 0.015M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 32   3.167 ± 0.014M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 36   2.903 ± 0.010M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 40   2.866 ± 0.010M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 44   2.914 ± 0.010M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 48   2.806 ± 0.012M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 52   2.840 ± 0.012M/s (drops 0.000 ± 0.000M/s)

Signed-off-by: Xu Kuohai &lt;xukuohai@huawei.com&gt;
Signed-off-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20251018035738.4039621-4-xukuohai@huaweicloud.com
</content>
</entry>
<entry>
<title>selftests/bpf: Fix incorrect array size calculation</title>
<updated>2025-09-09T16:23:47Z</updated>
<author>
<name>Jiayuan Chen</name>
<email>jiayuan.chen@linux.dev</email>
</author>
<published>2025-09-09T12:47:04Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f85981327a90c51e76f60e073cb6648b2f167226'/>
<id>urn:sha1:f85981327a90c51e76f60e073cb6648b2f167226</id>
<content type='text'>
The loop in bench_sockmap_prog_destroy() has two issues:

1. Using 'sizeof(ctx.fds)' as the loop bound results in the number of
   bytes, not the number of file descriptors, causing the loop to iterate
   far more times than intended.

2. The condition 'ctx.fds[0] &gt; 0' incorrectly checks only the first fd for
   all iterations, potentially leaving file descriptors unclosed. Change
   it to 'ctx.fds[i] &gt; 0' to check each fd properly.

These fixes ensure correct cleanup of all file descriptors when the
benchmark exits.
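
As an illustration only (this is not the benchmark's C code), the two
issues can be reproduced with Python's ctypes. The array length of 8, the
sample values, and the names fds and open_fds are hypothetical. The sketch
shows that sizeof() reports bytes rather than elements, and that each
element has to be tested by its own index.

```python
import ctypes

# Hypothetical stand-in for ctx.fds: a C array of 8 ints.
FdArray = ctypes.c_int * 8
fds = FdArray(3, 4, 0, 5, 0, 0, 6, 0)

# sizeof() counts bytes. The element count is the byte count divided by
# the size of one element, which is the kind of computation an
# ARRAY_SIZE()-style macro performs in C.
size_in_bytes = ctypes.sizeof(fds)
element_count = size_in_bytes // ctypes.sizeof(ctypes.c_int)


def open_fds(fd_array, count):
    """Return the fds that a cleanup loop bounded by count would close."""
    # Test each element in turn (fd_array[i]), not only the first one.
    # Positive values mark open fds in this sketch; 0 marks an unused slot.
    return [fd_array[i] for i in range(count) if fd_array[i] != 0]


to_close = open_fds(fds, element_count)
print(size_in_bytes, element_count, to_close)
```

Using size_in_bytes as the loop bound would index past the end of the
array, which is the over-iteration described in item 1.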

Reported-by: Dan Carpenter &lt;dan.carpenter@linaro.org&gt;
Signed-off-by: Jiayuan Chen &lt;jiayuan.chen@linux.dev&gt;
Signed-off-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20250909124721.191555-1-jiayuan.chen@linux.dev

Closes: https://lore.kernel.org/bpf/aLqfWuRR9R_KTe5e@stanley.mountain/
</content>
</entry>
<entry>
<title>selftests/bpf: add benchmark testing for kprobe-multi-all</title>
<updated>2025-09-04T16:00:25Z</updated>
<author>
<name>Menglong Dong</name>
<email>menglong8.dong@gmail.com</email>
</author>
<published>2025-09-04T02:10:11Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a85d888768ea0e024dcc9d5fb172e7be8fd7d631'/>
<id>urn:sha1:a85d888768ea0e024dcc9d5fb172e7be8fd7d631</id>
<content type='text'>
For now, the kprobe-multi benchmark hooks only a single function
during testing. Add the "kprobe-multi-all" benchmark, which hooks all
kernel functions during the run. Add "kretprobe-multi-all" as well.

Signed-off-by: Menglong Dong &lt;dongml2@chinatelecom.cn&gt;
Link: https://lore.kernel.org/r/20250904021011.14069-4-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>selftests/bpf: Add LPM trie microbenchmarks</title>
<updated>2025-08-28T00:28:14Z</updated>
<author>
<name>Matt Fleming</name>
<email>mfleming@cloudflare.com</email>
</author>
<published>2025-08-27T14:01:49Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=737433c6a559c4e8acb065cfe9b6e2ff45ad655c'/>
<id>urn:sha1:737433c6a559c4e8acb065cfe9b6e2ff45ad655c</id>
<content type='text'>
Add benchmarks for the standard set of operations: LOOKUP, INSERT,
UPDATE, DELETE. Also include benchmarks to measure the overhead of the
bench framework itself (NOOP) as well as the overhead of generating keys
(BASELINE). Lastly, this includes a benchmark for FREE (trie_free())
which is known to have terrible performance for maps with many entries.

Benchmarks operate on tries without gaps in the key range, i.e. each
test begins or ends with a trie with valid keys in the range [0,
nr_entries). This is intended to cause maximum branching when traversing
the trie.

LOOKUP, UPDATE, DELETE, and FREE fill a BPF LPM trie from userspace
using bpf_map_update_batch() and run the corresponding benchmark
operation via bpf_loop(). INSERT starts with an empty map and fills it
kernel-side from bpf_loop(). FREE records the time to free a filled LPM
trie by attaching and destroying a BPF prog. NOOP measures the overhead
of the test harness by running an empty function with bpf_loop().
BASELINE is similar to NOOP except that the function generates a key.

Each operation runs 10,000 times using bpf_loop(). Note that this value
is intentionally independent of the number of entries in the LPM trie so
that the stability of the results isn't affected by the number of
entries.

For those benchmarks that need to reset the LPM trie once it's full
(INSERT) or empty (DELETE), throughput and latency results are scaled by
the fraction of a second the operation actually ran to ignore any time
spent reinitialising the trie.
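
One way to read this scaling, as a sketch under stated assumptions: if a
reporting interval contains both benchmark work and trie
re-initialisation, dividing the operation count by only the time spent in
the operation gives the normalised rate. The function name scaled_stats
and the sample numbers are hypothetical and are not taken from the
benchmark source.

```python
def scaled_stats(ops, active_ns):
    """Return (ops per second, ns per op) over the active time only.

    ops:       operations completed in the reporting interval
    active_ns: nanoseconds actually spent running the operation,
               excluding time spent resetting the trie
    """
    if ops == 0 or active_ns == 0:
        return 0.0, 0.0
    throughput = ops * 1e9 / active_ns
    latency_ns = active_ns / ops
    return throughput, latency_ns


# Hypothetical interval: 1,000,000 inserts, of which the operation itself
# ran for 0.4 s; the remaining 0.6 s went to re-initialising the trie.
throughput, latency_ns = scaled_stats(1_000_000, 400_000_000)
print(f"{throughput / 1e6:.3f} M ops/s, {latency_ns:.3f} ns/op")
```

Without the scaling, the same interval would be reported as 1.0 M ops/s,
understating the cost-free rate of the operation by the share of time
spent on resets.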

By default, benchmarks run using sequential keys in the range [0,
nr_entries). BASELINE, LOOKUP, and UPDATE can use random keys via the
--random parameter but beware there is a runtime cost involved in
generating random keys. Other benchmarks are prohibited from using
random keys because it can skew the results, e.g. when inserting an
existing key or deleting a missing one.

All measurements are recorded from within the kernel to eliminate
syscall overhead. Most benchmarks run an XDP program to generate stats
but FREE needs to collect latencies using fentry/fexit on
map_free_deferred() because it's not possible to use fentry directly on
lpm_trie.c since commit c83508da5620 ("bpf: Avoid deadlock caused by
nested kprobe and fentry bpf programs") and there's no way to
create/destroy a map from within an XDP program.

Here is example output from an AMD EPYC 9684X 96-Core machine for each
of the benchmarks using a trie with 10K entries and a 32-bit prefix
length, e.g.

  $ ./bench lpm-trie-$op \
        --prefix_len=32 \
        --producers=1 \
        --nr_entries=10000

     noop: throughput   74.417 ± 0.032 M ops/s ( 74.417M ops/prod), latency   13.438 ns/op
 baseline: throughput   70.107 ± 0.171 M ops/s ( 70.107M ops/prod), latency   14.264 ns/op
   lookup: throughput    8.467 ± 0.047 M ops/s (  8.467M ops/prod), latency  118.109 ns/op
   insert: throughput    2.440 ± 0.015 M ops/s (  2.440M ops/prod), latency  409.290 ns/op
   update: throughput    2.806 ± 0.042 M ops/s (  2.806M ops/prod), latency  356.322 ns/op
   delete: throughput    4.625 ± 0.011 M ops/s (  4.625M ops/prod), latency  215.613 ns/op
     free: throughput    0.578 ± 0.006 K ops/s (  0.578K ops/prod), latency    1.730 ms/op
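
With a single producer, the latency column appears to be close to the
reciprocal of the throughput column. This is an observation drawn from the
numbers above, not something the benchmark is documented to guarantee. A
short sketch that checks it, with the values copied from the output above
and the helper name relative_gap chosen for illustration:

```python
# (throughput in ops/s, reported latency in seconds), copied from the
# sequential-key output above.
results = {
    "noop": (74.417e6, 13.438e-9),
    "baseline": (70.107e6, 14.264e-9),
    "lookup": (8.467e6, 118.109e-9),
    "insert": (2.440e6, 409.290e-9),
    "update": (2.806e6, 356.322e-9),
    "delete": (4.625e6, 215.613e-9),
    "free": (0.578e3, 1.730e-3),
}


def relative_gap(throughput, latency):
    """Relative difference between 1/throughput and the reported latency."""
    return abs(1.0 / throughput - latency) / latency


gaps = {name: relative_gap(*pair) for name, pair in results.items()}
for name, gap in gaps.items():
    print(f"{name:9s} {gap * 100:.3f}%")
```

The small remaining gaps are consistent with rounding and with the two
columns being averaged separately, though the source does not say how
either column is aggregated.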

And the same benchmarks using random keys:

  $ ./bench lpm-trie-$op \
        --prefix_len=32 \
        --producers=1 \
        --nr_entries=10000 \
        --random

     noop: throughput   74.259 ± 0.335 M ops/s ( 74.259M ops/prod), latency   13.466 ns/op
 baseline: throughput   35.150 ± 0.144 M ops/s ( 35.150M ops/prod), latency   28.450 ns/op
   lookup: throughput    7.119 ± 0.048 M ops/s (  7.119M ops/prod), latency  140.469 ns/op
   insert: N/A
   update: throughput    2.736 ± 0.012 M ops/s (  2.736M ops/prod), latency  365.523 ns/op
   delete: N/A
     free: N/A

Signed-off-by: Matt Fleming &lt;mfleming@cloudflare.com&gt;
Signed-off-by: Jesper Dangaard Brouer &lt;hawk@kernel.org&gt;
Link: https://lore.kernel.org/r/20250827140149.1001557-1-matt@readmodwrite.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
</feed>
