<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/bpf, branch v6.17</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.17</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.17'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2025-09-10T19:34:09Z</updated>
<entry>
<title>bpf: Reject bpf_timer for PREEMPT_RT</title>
<updated>2025-09-10T19:34:09Z</updated>
<author>
<name>Leon Hwang</name>
<email>leon.hwang@linux.dev</email>
</author>
<published>2025-09-10T12:57:39Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e25ddfb388c8b7e5f20e3bf38d627fb485003781'/>
<id>urn:sha1:e25ddfb388c8b7e5f20e3bf38d627fb485003781</id>
<content type='text'>
When enable CONFIG_PREEMPT_RT, the kernel will warn when run timer
selftests by './test_progs -t timer':

BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48

In order to avoid such warning, reject bpf_timer in verifier when
PREEMPT_RT is enabled.

Signed-off-by: Leon Hwang &lt;leon.hwang@linux.dev&gt;
Link: https://lore.kernel.org/r/20250910125740.52172-2-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;

</content>
</entry>
<entry>
<title>bpf: Tell memcg to use allow_spinning=false path in bpf_timer_init()</title>
<updated>2025-09-09T22:24:34Z</updated>
<author>
<name>Peilin Ye</name>
<email>yepeilin@google.com</email>
</author>
<published>2025-09-09T09:52:20Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=6d78b4473cdb08b74662355a9e8510bde09c511e'/>
<id>urn:sha1:6d78b4473cdb08b74662355a9e8510bde09c511e</id>
<content type='text'>
Currently, calling bpf_map_kmalloc_node() from __bpf_async_init() can
cause various locking issues; see the following stack trace (edited for
style) as one example:

...
 [10.011566]  do_raw_spin_lock.cold
 [10.011570]  try_to_wake_up             (5) double-acquiring the same
 [10.011575]  kick_pool                      rq_lock, causing a hardlockup
 [10.011579]  __queue_work
 [10.011582]  queue_work_on
 [10.011585]  kernfs_notify
 [10.011589]  cgroup_file_notify
 [10.011593]  try_charge_memcg           (4) memcg accounting raises an
 [10.011597]  obj_cgroup_charge_pages        MEMCG_MAX event
 [10.011599]  obj_cgroup_charge_account
 [10.011600]  __memcg_slab_post_alloc_hook
 [10.011603]  __kmalloc_node_noprof
...
 [10.011611]  bpf_map_kmalloc_node
 [10.011612]  __bpf_async_init
 [10.011615]  bpf_timer_init             (3) BPF calls bpf_timer_init()
 [10.011617]  bpf_prog_xxxxxxxxxxxxxxxx_fcg_runnable
 [10.011619]  bpf__sched_ext_ops_runnable
 [10.011620]  enqueue_task_scx           (2) BPF runs with rq_lock held
 [10.011622]  enqueue_task
 [10.011626]  ttwu_do_activate
 [10.011629]  sched_ttwu_pending         (1) grabs rq_lock
...

The above was reproduced on bpf-next (b338cf849ec8) by modifying
./tools/sched_ext/scx_flatcg.bpf.c to call bpf_timer_init() during
ops.runnable(), and hacking the memcg accounting code a bit to make
a bpf_timer_init() call more likely to raise an MEMCG_MAX event.

We have also run into other similar variants (both internally and on
bpf-next), including double-acquiring cgroup_file_kn_lock, the same
worker_pool::lock, etc.

As suggested by Shakeel, fix this by using __GFP_HIGH instead of
GFP_ATOMIC in __bpf_async_init(), so that e.g. if try_charge_memcg()
raises an MEMCG_MAX event, we call __memcg_memory_event() with
@allow_spinning=false and avoid calling cgroup_file_notify() there.

Depends on mm patch
"memcg: skip cgroup_file_notify if spinning is not allowed":
https://lore.kernel.org/bpf/20250905201606.66198-1-shakeel.butt@linux.dev/

v0 approach s/bpf_map_kmalloc_node/bpf_mem_alloc/
https://lore.kernel.org/bpf/20250905061919.439648-1-yepeilin@google.com/
v1 approach:
https://lore.kernel.org/bpf/20250905234547.862249-1-yepeilin@google.com/

Fixes: b00628b1c7d5 ("bpf: Introduce bpf timers.")
Suggested-by: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Signed-off-by: Peilin Ye &lt;yepeilin@google.com&gt;
Link: https://lore.kernel.org/r/20250909095222.2121438-1-yepeilin@google.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Allow fall back to interpreter for programs with stack size &lt;= 512</title>
<updated>2025-09-09T22:12:16Z</updated>
<author>
<name>KaFai Wan</name>
<email>kafai.wan@linux.dev</email>
</author>
<published>2025-09-09T14:46:14Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=df0cb5cb50bd54d3cd4d0d83417ceec6a66404aa'/>
<id>urn:sha1:df0cb5cb50bd54d3cd4d0d83417ceec6a66404aa</id>
<content type='text'>
OpenWRT users reported regression on ARMv6 devices after updating to latest
HEAD, where tcpdump filter:

tcpdump "not ether host 3c37121a2b3c and not ether host 184ecbca2a3a \
and not ether host 14130b4d3f47 and not ether host f0f61cf440b7 \
and not ether host a84b4dedf471 and not ether host d022be17e1d7 \
and not ether host 5c497967208b and not ether host 706655784d5b"

fails with warning: "Kernel filter failed: No error information"
when using config:
 # CONFIG_BPF_JIT_ALWAYS_ON is not set
 CONFIG_BPF_JIT_DEFAULT_ON=y

The issue arises because commits:
1. "bpf: Fix array bounds error with may_goto" changed default runtime to
   __bpf_prog_ret0_warn when jit_requested = 1
2. "bpf: Avoid __bpf_prog_ret0_warn when jit fails" returns error when
   jit_requested = 1 but jit fails

This change restores interpreter fallback capability for BPF programs with
stack size &lt;= 512 bytes when jit fails.

Reported-by: Felix Fietkau &lt;nbd@nbd.name&gt;
Closes: https://lore.kernel.org/bpf/2e267b4b-0540-45d8-9310-e127bf95fc63@nbd.name/
Fixes: 6ebc5030e0c5 ("bpf: Fix array bounds error with may_goto")
Signed-off-by: KaFai Wan &lt;kafai.wan@linux.dev&gt;
Acked-by: Eduard Zingerman &lt;eddyz87@gmail.com&gt;
Link: https://lore.kernel.org/r/20250909144614.2991253-1-kafai.wan@linux.dev
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;

</content>
</entry>
<entry>
<title>rqspinlock: Choose trylock fallback for NMI waiters</title>
<updated>2025-09-09T22:10:28Z</updated>
<author>
<name>Kumar Kartikeya Dwivedi</name>
<email>memxor@gmail.com</email>
</author>
<published>2025-09-09T18:49:59Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0d80e7f951be1bdd08d328fd87694be0d6e8aaa8'/>
<id>urn:sha1:0d80e7f951be1bdd08d328fd87694be0d6e8aaa8</id>
<content type='text'>
Currently, out of all 3 types of waiters in the rqspinlock slow path
(i.e., pending bit waiter, wait queue head waiter, and wait queue
non-head waiter), only the pending bit waiter and wait queue head
waiters apply deadlock checks and a timeout on their waiting loop. The
assumption here was that the wait queue head's forward progress would be
sufficient to identify cases where the lock owner or pending bit waiter
is stuck, and non-head waiters relying on the head waiter would prove to
be sufficient for their own forward progress.

However, the head waiter itself can be preempted by a non-head waiter
for the same lock (AA) or a different lock (ABBA) in a manner that
impedes its forward progress. In such a case, non-head waiters not
performing deadlock and timeout checks becomes insufficient, and the
system can enter a state of lockup.

This is typically not a concern with non-NMI lock acquisitions, as lock
holders which in run in different contexts (IRQ, non-IRQ) use "irqsave"
variants of the lock APIs, which naturally excludes such lock holders
from preempting one another on the same CPU.

It might seem likely that a similar case may occur for rqspinlock when
programs are attached to contention tracepoints (begin, end), however,
these tracepoints either precede the enqueue into the wait queue, or
succeed it, therefore cannot be used to preempt a head waiter's waiting
loop.

We must still be careful against nested kprobe and fentry programs that
may attach to the middle of the head's waiting loop to stall forward
progress and invoke another rqspinlock acquisition that proceeds as a
non-head waiter. To this end, drop CC_FLAGS_FTRACE from the rqspinlock.o
object file.

For now, this issue is resolved by falling back to a repeated trylock on
the lock word from NMI context, while performing the deadlock checks to
break out early in case forward progress is impossible, and use the
timeout as a final fallback.

A more involved fix to terminate the queue when such a condition occurs
will be made as a follow up. A selftest to stress this aspect of nested
NMI/non-NMI locking attempts will be added in a subsequent patch to the
bpf-next tree when this fix lands and trees are synchronized.

Reported-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Fixes: 164c246571e9 ("rqspinlock: Protect waiters in queue from stalls")
Signed-off-by: Kumar Kartikeya Dwivedi &lt;memxor@gmail.com&gt;
Link: https://lore.kernel.org/r/20250909184959.3509085-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;

</content>
</entry>
<entry>
<title>bpf: Fix bpf_strnstr() to handle suffix match cases better</title>
<updated>2025-09-09T22:07:58Z</updated>
<author>
<name>Rong Tao</name>
<email>rongtao@cestc.cn</email>
</author>
<published>2025-08-29T16:31:58Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7edfc024708258d75f65fadffd7e5f6ac46810b6'/>
<id>urn:sha1:7edfc024708258d75f65fadffd7e5f6ac46810b6</id>
<content type='text'>
bpf_strnstr() should not treat the ending '\0' of s2 as a matching character
if the parameter 'len' equal to s2 string length, for example:

    1. bpf_strnstr("openat", "open", 4) = -ENOENT
    2. bpf_strnstr("openat", "open", 5) = 0

This patch makes (1) return 0, fix just the `len == strlen(s2)` case.

And fix a more general case when s2 is a suffix of the first len
characters of s1.

Fixes: e91370550f1f ("bpf: Add kfuncs for read-only string operations")
Signed-off-by: Rong Tao &lt;rongtao@cestc.cn&gt;
Signed-off-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Link: https://lore.kernel.org/bpf/tencent_17DC57B9D16BC443837021BEACE84B7C1507@qq.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Fix out-of-bounds dynptr write in bpf_crypto_crypt</title>
<updated>2025-09-09T22:07:57Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2025-08-29T14:36:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f9bb6ffa7f5ad0f8ee0f53fc4a10655872ee4a14'/>
<id>urn:sha1:f9bb6ffa7f5ad0f8ee0f53fc4a10655872ee4a14</id>
<content type='text'>
Stanislav reported that in bpf_crypto_crypt() the destination dynptr's
size is not validated to be at least as large as the source dynptr's
size before calling into the crypto backend with 'len = src_len'. This
can result in an OOB write when the destination is smaller than the
source.

Concretely, in mentioned function, psrc and pdst are both linear
buffers fetched from each dynptr:

  psrc = __bpf_dynptr_data(src, src_len);
  [...]
  pdst = __bpf_dynptr_data_rw(dst, dst_len);
  [...]
  err = decrypt ?
        ctx-&gt;type-&gt;decrypt(ctx-&gt;tfm, psrc, pdst, src_len, piv) :
        ctx-&gt;type-&gt;encrypt(ctx-&gt;tfm, psrc, pdst, src_len, piv);

The crypto backend expects pdst to be large enough with a src_len length
that can be written. Add an additional src_len &gt; dst_len check and bail
out if it's the case. Note that these kfuncs are accessible under root
privileges only.

Fixes: 3e1c6f35409f ("bpf: make common crypto API for TC/XDP programs")
Reported-by: Stanislav Fort &lt;disclosure@aisle.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Cc: Vadim Fedorenko &lt;vadim.fedorenko@linux.dev&gt;
Reviewed-by: Vadim Fedorenko &lt;vadim.fedorenko@linux.dev&gt;
Link: https://lore.kernel.org/r/20250829143657.318524-1-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Check the helper function is valid in get_helper_proto</title>
<updated>2025-08-15T09:16:56Z</updated>
<author>
<name>Jiri Olsa</name>
<email>olsajiri@gmail.com</email>
</author>
<published>2025-08-14T20:06:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e4414b01c1cd9887bbde92f946c1ba94e40d6d64'/>
<id>urn:sha1:e4414b01c1cd9887bbde92f946c1ba94e40d6d64</id>
<content type='text'>
kernel test robot reported verifier bug [1] where the helper func
pointer could be NULL due to disabled config option.

As Alexei suggested we could check on that in get_helper_proto
directly. Marking tail_call helper func with BPF_PTR_POISON,
because it is unused by design.

  [1] https://lore.kernel.org/oe-lkp/202507160818.68358831-lkp@intel.com

Reported-by: kernel test robot &lt;oliver.sang@intel.com&gt;
Reported-by: syzbot+a9ed3d9132939852d0df@syzkaller.appspotmail.com
Suggested-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Paul Chaignon &lt;paul.chaignon@gmail.com&gt;
Acked-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Link: https://lore.kernel.org/bpf/20250814200655.945632-1-jolsa@kernel.org
Closes: https://lore.kernel.org/oe-lkp/202507160818.68358831-lkp@intel.com
</content>
</entry>
<entry>
<title>bpf, cpumap: Disable page_pool direct xdp_return need larger scope</title>
<updated>2025-08-15T09:08:08Z</updated>
<author>
<name>Jesper Dangaard Brouer</name>
<email>hawk@kernel.org</email>
</author>
<published>2025-08-14T18:24:37Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2b986b9e917bc88f81aa1ed386af63b26c983f1d'/>
<id>urn:sha1:2b986b9e917bc88f81aa1ed386af63b26c983f1d</id>
<content type='text'>
When running an XDP bpf_prog on the remote CPU in cpumap code
then we must disable the direct return optimization that
xdp_return can perform for mem_type page_pool.  This optimization
assumes code is still executing under RX-NAPI of the original
receiving CPU, which isn't true on this remote CPU.

The cpumap code already disabled this via helpers
xdp_set_return_frame_no_direct() and xdp_clear_return_frame_no_direct(),
but the scope didn't include xdp_do_flush().

When doing XDP_REDIRECT towards e.g devmap this causes the
function bq_xmit_all() to run with direct return optimization
enabled. This can lead to hard to find bugs.  The issue
only happens when bq_xmit_all() cannot ndo_xdp_xmit all
frames and them frees them via xdp_return_frame_rx_napi().

Fix by expanding scope to include xdp_do_flush(). This was found
by Dragos Tatulea.

Fixes: 11941f8a8536 ("bpf: cpumap: Implement generic cpumap")
Reported-by: Dragos Tatulea &lt;dtatulea@nvidia.com&gt;
Reported-by: Chris Arges &lt;carges@cloudflare.com&gt;
Signed-off-by: Jesper Dangaard Brouer &lt;hawk@kernel.org&gt;
Signed-off-by: Martin KaFai Lau &lt;martin.lau@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Tested-by: Chris Arges &lt;carges@cloudflare.com&gt;
Link: https://patch.msgid.link/175519587755.3008742.1088294435150406835.stgit@firesoul
</content>
</entry>
<entry>
<title>bpf: Fix memory leak of bpf_scc_info objects</title>
<updated>2025-08-02T16:04:57Z</updated>
<author>
<name>Eduard Zingerman</name>
<email>eddyz87@gmail.com</email>
</author>
<published>2025-08-01T23:23:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1b30d44417278196a90c79244bb43e8428586345'/>
<id>urn:sha1:1b30d44417278196a90c79244bb43e8428586345</id>
<content type='text'>
env-&gt;scc_info array contains references to bpf_scc_info objects
allocated lazily in verifier.c:scc_visit_alloc().
env-&gt;scc_cnt was supposed to track env-&gt;scc_info array size
in order to free referenced objects in verifier.c:free_states().
Fix initialization of env-&gt;scc_cnt that was omitted in
verifier.c:compute_scc().

To reproduce the bug:
- build with CONFIG_DEBUG_KMEMLEAK
- boot and load bpf program with loops, e.g.:
  ./veristat -q pyperf180.bpf.o
- initiate memleak scan and check results:
  echo scan &gt; /sys/kernel/debug/kmemleak
  cat /sys/kernel/debug/kmemleak

Fixes: c9e31900b54c ("bpf: propagate read/precision marks over state graph backedges")
Reported-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Closes: https://lore.kernel.org/bpf/CAADnVQKXUWg9uRCPD5ebRXwN4dmBCRUFFM7kN=GxymYz3zU25A@mail.gmail.com/T/
Suggested-by: Alexei Starovoitov &lt;alexei.starovoitov@gmail.com&gt;
Tested-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Eduard Zingerman &lt;eddyz87@gmail.com&gt;
Link: https://lore.kernel.org/r/20250801232330.1800436-1-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Improve ctx access verifier error message</title>
<updated>2025-08-01T16:22:44Z</updated>
<author>
<name>Paul Chaignon</name>
<email>paul.chaignon@gmail.com</email>
</author>
<published>2025-08-01T09:49:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f914876eec9e72ae94b5cee81a9dc7935c255b2f'/>
<id>urn:sha1:f914876eec9e72ae94b5cee81a9dc7935c255b2f</id>
<content type='text'>
We've already had two "error during ctx access conversion" warnings
triggered by syzkaller. Let's improve the error message by dumping the
cnt variable so that we can more easily differentiate between the
different error cases.

Signed-off-by: Paul Chaignon &lt;paul.chaignon@gmail.com&gt;
Acked-by: Eduard Zingerman &lt;eddyz87@gmail.com&gt;
Link: https://lore.kernel.org/r/cc94316c30dd76fae4a75a664b61a2dbfe68e205.1754039605.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;

</content>
</entry>
</feed>
