<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/bpf, branch v6.14</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.14</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.14'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2025-02-20T23:37:17Z</updated>
<entry>
<title>Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf</title>
<updated>2025-02-20T23:37:17Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-02-20T23:37:17Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=319fc77f8f45a1b3dba15b0cc1a869778fd222f7'/>
<id>urn:sha1:319fc77f8f45a1b3dba15b0cc1a869778fd222f7</id>
<content type='text'>
Pull BPF fixes from Daniel Borkmann:

 - Fix a soft-lockup in BPF arena_map_free on 64k page size kernels
   (Alan Maguire)

 - Fix a missing allocation failure check in BPF verifier's
   acquire_lock_state (Kumar Kartikeya Dwivedi)

 - Fix a NULL-pointer dereference in trace_kfree_skb by adding kfree_skb
   to the raw_tp_null_args set (Kuniyuki Iwashima)

 - Fix a deadlock when freeing BPF cgroup storage (Abel Wu)

 - Fix a syzbot-reported deadlock when holding BPF map's freeze_mutex
   (Andrii Nakryiko)

 - Fix a use-after-free issue in bpf_test_init when eth_skb_pkt_type is
   accessing skb data not containing an Ethernet header (Shigeru
   Yoshida)

 - Fix skipping non-existing keys in generic_map_lookup_batch (Yan Zhai)

 - Several BPF sockmap fixes to address incorrect TCP copied_seq
   calculations, which prevented correct data reads from recv(2) in user
   space (Jiayuan Chen)

 - Two fixes for BPF map lookup nullness elision (Daniel Xu)

 - Fix a NULL-pointer dereference from vmlinux BTF lookup in
   bpf_sk_storage_tracing_allowed (Jared Kangas)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests: bpf: test batch lookup on array of maps with holes
  bpf: skip non exist keys in generic_map_lookup_batch
  bpf: Handle allocation failure in acquire_lock_state
  bpf: verifier: Disambiguate get_constant_map_key() errors
  bpf: selftests: Test constant key extraction on irrelevant maps
  bpf: verifier: Do not extract constant map keys for irrelevant maps
  bpf: Fix softlockup in arena_map_free on 64k page kernel
  net: Add rx_skb of kfree_skb to raw_tp_null_args[].
  bpf: Fix deadlock when freeing cgroup storage
  selftests/bpf: Add strparser test for bpf
  selftests/bpf: Fix invalid flag of recv()
  bpf: Disable non stream socket for strparser
  bpf: Fix wrong copied_seq calculation
  strparser: Add read_sock callback
  bpf: avoid holding freeze_mutex during mmap operation
  bpf: unify VM_WRITE vs VM_MAYWRITE use in BPF map mmaping logic
  selftests/bpf: Adjust data size to have ETH_HLEN
  bpf, test_run: Fix use-after-free issue in eth_skb_pkt_type()
  bpf: Remove unnecessary BTF lookups in bpf_sk_storage_tracing_allowed
</content>
</entry>
<entry>
<title>bpf: skip non exist keys in generic_map_lookup_batch</title>
<updated>2025-02-19T01:27:37Z</updated>
<author>
<name>Yan Zhai</name>
<email>yan@cloudflare.com</email>
</author>
<published>2025-02-10T07:22:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5644c6b50ffee0a56c1e01430a8c88e34decb120'/>
<id>urn:sha1:5644c6b50ffee0a56c1e01430a8c88e34decb120</id>
<content type='text'>
The generic_map_lookup_batch currently returns EINTR if it fails with
ENOENT and retries several times on bpf_map_copy_value. The next batch
would start from the same location, presuming it's a transient issue.
This is incorrect if a map can actually have "holes", i.e.
"get_next_key" can return a key that does not point to a valid value. At
least the array of maps type may contain such holes legitimately. Right now,
when these holes show up, generic batch lookup cannot proceed any more. It
will always fail with EINTR errors.

Rather, do not retry in generic_map_lookup_batch. If it finds a non
existing element, skip to the next key. This simple solution comes with
a price that transient errors may not be recovered, and the iteration
might cycle back to the first key under parallel deletion. For example,
Hou Tao &lt;houtao@huaweicloud.com&gt; pointed out the following scenario:

For LPM trie map:
(1) -&gt;map_get_next_key(map, prev_key, key) returns a valid key

(2) bpf_map_copy_value() returns -ENOENT
It means the key must be deleted concurrently.

(3) goto next_key
It swaps the prev_key and key

(4) -&gt;map_get_next_key(map, prev_key, key) again
prev_key points to a non-existing key, for LPM trie it will treat just
like prev_key=NULL case, the returned key will be duplicated.

With the retry logic, the iteration can continue to the key next to the
deleted one. But if we directly skip to the next key, the iteration loop
would restart from the first key for the lpm_trie type.

However, not all races may be recovered. For example, if current key is
deleted after instead of before bpf_map_copy_value, or if the prev_key
also gets deleted, then the loop will still restart from the first key
for lpm_trie anyway. For generic lookup it might be better to stay
simple, i.e. just skip to the next key. To guarantee that the output
keys are not duplicated, it is better to implement map type specific
batch operations, which can properly lock the trie and synchronize with
concurrent mutators.
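
The difference between the two strategies can be sketched with a toy model.
This is not the kernel code: the list-backed "map", the helper names and the
retry count are assumptions made for illustration, and None stands in for a
hole (a key with no valid value).

```python
def get_next_key(slots, prev_key):
    """Return the index following prev_key, or None past the end (toy array map)."""
    nxt = 0 if prev_key is None else prev_key + 1
    return nxt if nxt in range(len(slots)) else None

def copy_value(slots, key):
    """Stand-in for bpf_map_copy_value(): a hole raises KeyError (think -ENOENT)."""
    value = slots[key]
    if value is None:
        raise KeyError(key)
    return value

def lookup_batch_retry(slots, retries=3):
    """Old behaviour (modelled): retry the same key, then give up with EINTR."""
    out = []
    key = get_next_key(slots, None)
    while key is not None:
        for _ in range(retries):
            try:
                out.append((key, copy_value(slots, key)))
                break
            except KeyError:
                continue
        else:
            return out, "EINTR"  # a permanent hole never resolves
        key = get_next_key(slots, key)
    return out, None

def lookup_batch_skip(slots):
    """New behaviour (modelled): a missing element is skipped, iteration goes on."""
    out = []
    key = get_next_key(slots, None)
    while key is not None:
        try:
            out.append((key, copy_value(slots, key)))
        except KeyError:
            pass  # hole: move on to the next key
        key = get_next_key(slots, key)
    return out
```

In this model a map such as ["a", None, "c"] stops the retrying variant at
index 1 every time, while the skipping variant returns both valid entries.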

Fixes: cb4d03ab499d ("bpf: Add generic support for lookup batch op")
Closes: https://lore.kernel.org/bpf/Z6JXtA1M5jAZx8xD@debian.debian/
Signed-off-by: Yan Zhai &lt;yan@cloudflare.com&gt;
Acked-by: Hou Tao &lt;houtao1@huawei.com&gt;
Link: https://lore.kernel.org/r/85618439eea75930630685c467ccefeac0942e2b.1739171594.git.yan@cloudflare.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Handle allocation failure in acquire_lock_state</title>
<updated>2025-02-08T02:17:07Z</updated>
<author>
<name>Kumar Kartikeya Dwivedi</name>
<email>memxor@gmail.com</email>
</author>
<published>2025-02-06T10:54:31Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8784714d7f27045c7cb72456cf66705b73fbc804'/>
<id>urn:sha1:8784714d7f27045c7cb72456cf66705b73fbc804</id>
<content type='text'>
The acquire_lock_state function needs to handle possible NULL values
returned by acquire_reference_state, and return -ENOMEM.

Fixes: 769b0f1c8214 ("bpf: Refactor {acquire,release}_reference_state")
Signed-off-by: Kumar Kartikeya Dwivedi &lt;memxor@gmail.com&gt;
Link: https://lore.kernel.org/r/20250206105435.2159977-24-memxor@gmail.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: verifier: Disambiguate get_constant_map_key() errors</title>
<updated>2025-02-07T23:45:44Z</updated>
<author>
<name>Daniel Xu</name>
<email>dxu@dxuuu.xyz</email>
</author>
<published>2025-02-04T17:25:18Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7968c6581507052c1c6484ee6c5cbe07381e2dbc'/>
<id>urn:sha1:7968c6581507052c1c6484ee6c5cbe07381e2dbc</id>
<content type='text'>
Refactor get_constant_map_key() to disambiguate the constant key
value from potential error values. In the case that the key is
negative, it could be confused for an error.

It's not currently an issue, as the verifier seems to track s32 spills
as u32. So even if the program wrongly uses a negative value for an
arraymap key, the verifier just thinks it's an impossibly high value
which gets correctly discarded.

Refactor anyways to make things cleaner and prevent potential future
issues.

Acked-by: Eduard Zingerman &lt;eddyz87@gmail.com&gt;
Signed-off-by: Daniel Xu &lt;dxu@dxuuu.xyz&gt;
Link: https://lore.kernel.org/r/dfe144259ae7cfc98aa63e1b388a14869a10632a.1738689872.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: verifier: Do not extract constant map keys for irrelevant maps</title>
<updated>2025-02-07T23:45:43Z</updated>
<author>
<name>Daniel Xu</name>
<email>dxu@dxuuu.xyz</email>
</author>
<published>2025-02-04T17:25:16Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=884c3a18dadfda326dffa364477cc027728219de'/>
<id>urn:sha1:884c3a18dadfda326dffa364477cc027728219de</id>
<content type='text'>
Previously, we were trying to extract constant map keys for all
bpf_map_lookup_elem(), regardless of map type. This is an issue if the
map has a u64 key and the value is very high, as it can be interpreted
as a negative signed value. This in turn is treated as an error value by
check_func_arg() which causes a valid program to be incorrectly
rejected.

Fix by only extracting constant map keys for relevant maps. This fix
works because nullness elision is only allowed for {PERCPU_}ARRAY maps,
and keys for these are within u32 range. See next commit for an example
via selftest.
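
As a rough illustration of the misinterpretation described above, the
following sketch reinterprets a u64 key as a signed 64-bit value. The helper
names are hypothetical and do not correspond to verifier functions; the set
of map types simply mirrors the {PERCPU_}ARRAY restriction stated above.

```python
def as_s64(value):
    """Reinterpret a u64 bit pattern as a two's-complement signed 64-bit value."""
    value %= 2 ** 64
    # The top bit is set exactly when the pattern needs all 64 bits.
    return value - 2 ** 64 if value.bit_length() == 64 else value

def looks_like_error(key):
    """True when a u64 key would read back as a negative, errno-style value."""
    return as_s64(key) != key % 2 ** 64

# Assumption for this sketch: only these map types are eligible for
# nullness elision, so only they need constant key extraction.
ARRAY_LIKE = {"ARRAY", "PERCPU_ARRAY"}

def should_extract_constant_key(map_type):
    """Model of the fix: skip key extraction for irrelevant map types."""
    return map_type in ARRAY_LIKE
```

For example, a hash map key of 2 ** 64 - 1 reads back as -1 here, which is
why restricting extraction to array-like maps sidesteps the problem.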

Acked-by: Eduard Zingerman &lt;eddyz87@gmail.com&gt;
Reported-by: Marc Hartmayer &lt;mhartmay@linux.ibm.com&gt;
Reported-by: Ilya Leoshkevich &lt;iii@linux.ibm.com&gt;
Tested-by: Marc Hartmayer &lt;mhartmay@linux.ibm.com&gt;
Signed-off-by: Daniel Xu &lt;dxu@dxuuu.xyz&gt;
Link: https://lore.kernel.org/r/aa868b642b026ff87ba6105ea151bc8693b35932.1738689872.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Fix softlockup in arena_map_free on 64k page kernel</title>
<updated>2025-02-06T11:41:08Z</updated>
<author>
<name>Alan Maguire</name>
<email>alan.maguire@oracle.com</email>
</author>
<published>2025-02-05T17:00:59Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=517e8a7835e8cfb398a0aeb0133de50e31cae32b'/>
<id>urn:sha1:517e8a7835e8cfb398a0aeb0133de50e31cae32b</id>
<content type='text'>
On an aarch64 kernel with CONFIG_PAGE_SIZE_64KB=y,
arena_htab tests cause a segmentation fault and soft lockup.
The same failure is not observed with 4k pages on aarch64.

It turns out arena_map_free() is calling
apply_to_existing_page_range() with the address returned by
bpf_arena_get_kern_vm_start().  If this address is not page-aligned
the code ends up calling apply_to_pte_range() with that unaligned
address causing soft lockup.

Fix it by rounding up GUARD_SZ to PAGE_SIZE &lt;&lt; 1 so that the
division by 2 in bpf_arena_get_kern_vm_start() returns
a page-aligned value.
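
The arithmetic can be checked with a small sketch. The 64 KiB base guard
size is an assumption made for this example, as are the helper names; only
the rounding to twice the page size and the division by 2 come from the
description above.

```python
BASE_GUARD = 2 ** 16  # assumed unrounded guard size (64 KiB), for illustration

def round_up(value, multiple):
    """Round value up to the next multiple of `multiple`."""
    return -(-value // multiple) * multiple

def guard_size(page_size, fixed):
    """Guard size before the fix, or rounded up to two pages after it."""
    return round_up(BASE_GUARD, page_size * 2) if fixed else BASE_GUARD

def kern_vm_offset(page_size, fixed):
    """Offset added to the VM area base: half of the guard size."""
    return guard_size(page_size, fixed) // 2

def is_page_aligned(offset, page_size):
    return offset % page_size == 0
```

Under these assumptions, 64k pages give a 32 KiB offset before the fix,
which is not page-aligned, and a 64 KiB offset after it. With 4k pages the
offset is aligned either way, matching the observation that 4k kernels are
unaffected.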

Fixes: 317460317a02 ("bpf: Introduce bpf_arena.")
Reported-by: Colm Harrington &lt;colm.harrington@oracle.com&gt;
Suggested-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Alan Maguire &lt;alan.maguire@oracle.com&gt;
Link: https://lore.kernel.org/r/20250205170059.427458-1-alan.maguire@oracle.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: Add rx_skb of kfree_skb to raw_tp_null_args[].</title>
<updated>2025-02-06T10:46:00Z</updated>
<author>
<name>Kuniyuki Iwashima</name>
<email>kuniyu@amazon.com</email>
</author>
<published>2025-02-01T03:01:42Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5da7e15fb5a12e78de974d8908f348e279922ce9'/>
<id>urn:sha1:5da7e15fb5a12e78de974d8908f348e279922ce9</id>
<content type='text'>
Yan Zhai reported a BPF prog could trigger a null-ptr-deref [0]
in trace_kfree_skb if the prog does not check if rx_sk is NULL.

Commit c53795d48ee8 ("net: add rx_sk to trace_kfree_skb") added
rx_sk to trace_kfree_skb, but rx_sk is optional and could be NULL.

Let's add kfree_skb to raw_tp_null_args[] to let the BPF verifier
validate such a prog and prevent the issue.

Now we fail to load such a prog:

  libbpf: prog 'drop': -- BEGIN PROG LOAD LOG --
  0: R1=ctx() R10=fp0
  ; int BPF_PROG(drop, struct sk_buff *skb, void *location, @ kfree_skb_sk_null.bpf.c:21
  0: (79) r3 = *(u64 *)(r1 +24)
  func 'kfree_skb' arg3 has btf_id 5253 type STRUCT 'sock'
  1: R1=ctx() R3_w=trusted_ptr_or_null_sock(id=1)
  ; bpf_printk("sk: %d, %d\n", sk, sk-&gt;__sk_common.skc_family); @ kfree_skb_sk_null.bpf.c:24
  1: (69) r4 = *(u16 *)(r3 +16)
  R3 invalid mem access 'trusted_ptr_or_null_'
  processed 2 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
  -- END PROG LOAD LOG --

Note this fix requires commit 838a10bd2ebf ("bpf: Augment raw_tp
arguments with PTR_MAYBE_NULL").

[0]:
BUG: kernel NULL pointer dereference, address: 0000000000000010
 PF: supervisor read access in kernel mode
 PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
PREEMPT SMP
RIP: 0010:bpf_prog_5e21a6db8fcff1aa_drop+0x10/0x2d
Call Trace:
 &lt;TASK&gt;
 ? __die+0x1f/0x60
 ? page_fault_oops+0x148/0x420
 ? search_bpf_extables+0x5b/0x70
 ? fixup_exception+0x27/0x2c0
 ? exc_page_fault+0x75/0x170
 ? asm_exc_page_fault+0x22/0x30
 ? bpf_prog_5e21a6db8fcff1aa_drop+0x10/0x2d
 bpf_trace_run4+0x68/0xd0
 ? unix_stream_connect+0x1f4/0x6f0
 sk_skb_reason_drop+0x90/0x120
 unix_stream_connect+0x1f4/0x6f0
 __sys_connect+0x7f/0xb0
 __x64_sys_connect+0x14/0x20
 do_syscall_64+0x47/0xc30
 entry_SYSCALL_64_after_hwframe+0x4b/0x53

Fixes: c53795d48ee8 ("net: add rx_sk to trace_kfree_skb")
Reported-by: Yan Zhai &lt;yan@cloudflare.com&gt;
Closes: https://lore.kernel.org/netdev/Z50zebTRzI962e6X@debian.debian/
Signed-off-by: Kuniyuki Iwashima &lt;kuniyu@amazon.com&gt;
Tested-by: Yan Zhai &lt;yan@cloudflare.com&gt;
Acked-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Link: https://lore.kernel.org/r/20250201030142.62703-1-kuniyu@amazon.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Fix deadlock when freeing cgroup storage</title>
<updated>2025-01-30T02:38:19Z</updated>
<author>
<name>Abel Wu</name>
<email>wuyun.abel@bytedance.com</email>
</author>
<published>2024-12-21T06:10:16Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c78f4afbd962f43a3989f45f3ca04300252b19b5'/>
<id>urn:sha1:c78f4afbd962f43a3989f45f3ca04300252b19b5</id>
<content type='text'>
The following commit
bc235cdb423a ("bpf: Prevent deadlock from recursive bpf_task_storage_[get|delete]")
first introduced deadlock prevention for fentry/fexit programs attaching
on bpf_task_storage helpers. That commit also employed the logic in map
free path in its v6 version.

Later bpf_cgrp_storage was first introduced in
c4bcfb38a95e ("bpf: Implement cgroup storage available to non-cgroup-attached bpf progs")
which faces the same issue as bpf_task_storage. Instead of its busy
counter, NULL was passed to bpf_local_storage_map_free(), which opened
a window to cause deadlock:

	&lt;TASK&gt;
		(acquiring local_storage-&gt;lock)
	_raw_spin_lock_irqsave+0x3d/0x50
	bpf_local_storage_update+0xd1/0x460
	bpf_cgrp_storage_get+0x109/0x130
	bpf_prog_a4d4a370ba857314_cgrp_ptr+0x139/0x170
	? __bpf_prog_enter_recur+0x16/0x80
	bpf_trampoline_6442485186+0x43/0xa4
	cgroup_storage_ptr+0x9/0x20
		(holding local_storage-&gt;lock)
	bpf_selem_unlink_storage_nolock.constprop.0+0x135/0x160
	bpf_selem_unlink_storage+0x6f/0x110
	bpf_local_storage_map_free+0xa2/0x110
	bpf_map_free_deferred+0x5b/0x90
	process_one_work+0x17c/0x390
	worker_thread+0x251/0x360
	kthread+0xd2/0x100
	ret_from_fork+0x34/0x50
	ret_from_fork_asm+0x1a/0x30
	&lt;/TASK&gt;

Progs:
 - A: SEC("fentry/cgroup_storage_ptr")
   - cgid (BPF_MAP_TYPE_HASH)
	Record the id of the cgroup the current task belongs
	to in this hash map, using the address of the cgroup
	as the map key.
   - cgrpa (BPF_MAP_TYPE_CGRP_STORAGE)
	If current task is a kworker, look up the above hash
	map using function parameter @owner as the key to get
	its corresponding cgroup id which is then used to get
	a trusted pointer to the cgroup through
	bpf_cgroup_from_id(). This trusted pointer can then
	be passed to bpf_cgrp_storage_get() to finally trigger
	the deadlock issue.
 - B: SEC("tp_btf/sys_enter")
   - cgrpb (BPF_MAP_TYPE_CGRP_STORAGE)
	The only purpose of this prog is to fill Prog A's
	hash map by calling bpf_cgrp_storage_get() for as
	many userspace tasks as possible.

Steps to reproduce:
 - Run A;
 - while (true) { Run B; Destroy B; }

Fix this issue by passing its busy counter to the free procedure so
it can be properly incremented before storage/smap locking.

Fixes: c4bcfb38a95e ("bpf: Implement cgroup storage available to non-cgroup-attached bpf progs")
Signed-off-by: Abel Wu &lt;wuyun.abel@bytedance.com&gt;
Acked-by: Martin KaFai Lau &lt;martin.lau@kernel.org&gt;
Link: https://lore.kernel.org/r/20241221061018.37717-1-wuyun.abel@bytedance.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'constfy-sysctl-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl</title>
<updated>2025-01-29T18:35:40Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-01-29T18:35:40Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=af13ff1c33e043b746cd96c83c7660ddf0272f73'/>
<id>urn:sha1:af13ff1c33e043b746cd96c83c7660ddf0272f73</id>
<content type='text'>
Pull sysctl table constification from Joel Granados:
 "All ctl_table declared outside of functions and that remain unmodified
  after initialization are const qualified.

  This prevents unintended modifications to proc_handler function
  pointers by placing them in the .rodata section.

  This is a continuation of the tree-wide effort started a few releases
  ago with the constification of the ctl_table struct arguments in the
  sysctl API done in 78eb4ea25cd5 ("sysctl: treewide: constify the
  ctl_table argument of proc_handlers")"

* tag 'constfy-sysctl-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl:
  treewide: const qualify ctl_tables where applicable
</content>
</entry>
<entry>
<title>bpf: avoid holding freeze_mutex during mmap operation</title>
<updated>2025-01-29T17:49:50Z</updated>
<author>
<name>Andrii Nakryiko</name>
<email>andrii@kernel.org</email>
</author>
<published>2025-01-29T01:22:46Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=bc27c52eea189e8f7492d40739b7746d67b65beb'/>
<id>urn:sha1:bc27c52eea189e8f7492d40739b7746d67b65beb</id>
<content type='text'>
We use map-&gt;freeze_mutex to prevent races between map_freeze() and
memory mapping BPF map contents with writable permissions. The way we
naively do this means we'll hold freeze_mutex for the entire duration of all
the mm and VMA manipulations, which is completely unnecessary. This can
potentially also lead to deadlocks, as reported by syzbot in [0].

So, instead, hold freeze_mutex only during writeability checks, bump
(proactively) "write active" count for the map, unlock the mutex and
proceed with mmap logic. And only if something went wrong during mmap
logic, then undo that "write active" counter increment.
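
The locking pattern can be sketched as follows. This is a user-space toy,
not the kernel implementation: the class, its fields and the exception types
are assumptions, and the toy retakes the mutex to undo the counter purely to
keep the example simple.

```python
import threading

class ToyMap:
    """Toy model of a map with a freeze mutex and a "write active" count."""

    def __init__(self):
        self.freeze_mutex = threading.Lock()
        self.frozen = False
        self.write_active = 0

    def freeze(self):
        with self.freeze_mutex:
            if self.write_active:
                raise BlockingIOError("writable mappings exist")
            self.frozen = True

    def mmap_writable(self, do_mmap):
        # Hold the mutex only for the writeability check and the
        # proactive counter bump, not for the mapping work itself.
        with self.freeze_mutex:
            if self.frozen:
                raise PermissionError("map is frozen")
            self.write_active += 1
        try:
            do_mmap()  # runs without freeze_mutex held
        except Exception:
            # Undo the proactive increment only if mapping failed.
            with self.freeze_mutex:
                self.write_active -= 1
            raise
```

The point of the design is that do_mmap() can take other locks without ever
nesting them inside freeze_mutex, while freeze() still sees the pending
writer through the counter.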

  [0] https://lore.kernel.org/bpf/678dcbc9.050a0220.303755.0066.GAE@google.com/

Fixes: fc9702273e2e ("bpf: Add mmap() support for BPF_MAP_TYPE_ARRAY")
Reported-by: syzbot+4dc041c686b7c816a71e@syzkaller.appspotmail.com
Signed-off-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Link: https://lore.kernel.org/r/20250129012246.1515826-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
</feed>
