<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/bpf/syscall.c, branch v6.8</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.8</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.8'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2024-01-05T04:40:19Z</updated>
<entry>
<title>bpf: Fix re-attachment branch in bpf_tracing_prog_attach</title>
<updated>2024-01-05T04:40:19Z</updated>
<author>
<name>Jiri Olsa</name>
<email>olsajiri@gmail.com</email>
</author>
<published>2024-01-03T19:05:46Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=715d82ba636cb3629a6e18a33bb9dbe53f9936ee'/>
<id>urn:sha1:715d82ba636cb3629a6e18a33bb9dbe53f9936ee</id>
<content type='text'>
The following case can cause a crash due to missing attach_btf:

1) load rawtp program
2) load fentry program with rawtp as target_fd
3) create tracing link for fentry program with target_fd = 0
4) repeat 3

In the end we have:

- prog-&gt;aux-&gt;dst_trampoline == NULL
- tgt_prog == NULL (because we did not provide target_fd to link_create)
- prog-&gt;aux-&gt;attach_btf == NULL (the program was loaded with attach_prog_fd=X)
- the program was loaded for tgt_prog but we have no way to find out which one

    BUG: kernel NULL pointer dereference, address: 0000000000000058
    Call Trace:
     &lt;TASK&gt;
     ? __die+0x20/0x70
     ? page_fault_oops+0x15b/0x430
     ? fixup_exception+0x22/0x330
     ? exc_page_fault+0x6f/0x170
     ? asm_exc_page_fault+0x22/0x30
     ? bpf_tracing_prog_attach+0x279/0x560
     ? btf_obj_id+0x5/0x10
     bpf_tracing_prog_attach+0x439/0x560
     __sys_bpf+0x1cf4/0x2de0
     __x64_sys_bpf+0x1c/0x30
     do_syscall_64+0x41/0xf0
     entry_SYSCALL_64_after_hwframe+0x6e/0x76

Return -EINVAL in this situation.

Fixes: f3a95075549e0 ("bpf: Allow trampoline re-attach for tracing and lsm programs")
Cc: stable@vger.kernel.org
Signed-off-by: Jiri Olsa &lt;olsajiri@gmail.com&gt;
Acked-by: Jiri Olsa &lt;olsajiri@gmail.com&gt;
Acked-by: Song Liu &lt;song@kernel.org&gt;
Signed-off-by: Dmitrii Dolgov &lt;9erthalion6@gmail.com&gt;
Link: https://lore.kernel.org/r/20240103190559.14750-4-9erthalion6@gmail.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Relax tracing prog recursive attach rules</title>
<updated>2024-01-05T04:31:34Z</updated>
<author>
<name>Dmitrii Dolgov</name>
<email>9erthalion6@gmail.com</email>
</author>
<published>2024-01-03T19:05:44Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=19bfcdf9498aa968ea293417fbbc39e523527ca8'/>
<id>urn:sha1:19bfcdf9498aa968ea293417fbbc39e523527ca8</id>
<content type='text'>
Currently, it's not allowed to attach an fentry/fexit prog to another
one fentry/fexit. At the same time it's not uncommon to see a tracing
program with lots of logic in use, and the attachment limitation
prevents usage of fentry/fexit for performance analysis (e.g. with
"bpftool prog profile" command) in this case. An example could be
falcosecurity libs project that uses tp_btf tracing programs.

Following the corresponding discussion [1], the reason for that is to
avoid tracing progs call cycles without introducing more complex
solutions. But currently it seems impossible to load and attach tracing
programs in a way that will form such a cycle. The limitation is coming
from the fact that attach_prog_fd is specified at the prog load (thus
making it impossible to attach to a program loaded after it in this
way), as well as tracing progs not implementing link_detach.

Replace "no same type" requirement with verification that no more than
one level of attachment nesting is allowed. In this way only one
fentry/fexit program could be attached to another fentry/fexit to cover
profiling use case, and still no cycle could be formed. To implement,
add a new field into bpf_prog_aux to track nested attachment for tracing
programs.

[1]: https://lore.kernel.org/bpf/20191108064039.2041889-16-ast@kernel.org/

Acked-by: Jiri Olsa &lt;olsajiri@gmail.com&gt;
Acked-by: Song Liu &lt;song@kernel.org&gt;
Signed-off-by: Dmitrii Dolgov &lt;9erthalion6@gmail.com&gt;
Link: https://lore.kernel.org/r/20240103190559.14750-2-9erthalion6@gmail.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>Revert BPF token-related functionality</title>
<updated>2023-12-19T16:23:03Z</updated>
<author>
<name>Andrii Nakryiko</name>
<email>andrii@kernel.org</email>
</author>
<published>2023-12-19T15:37:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d17aff807f845cf93926c28705216639c7279110'/>
<id>urn:sha1:d17aff807f845cf93926c28705216639c7279110</id>
<content type='text'>
This patch includes the following revert (one  conflicting BPF FS
patch and three token patch sets, represented by merge commits):
  - revert 0f5d5454c723 "Merge branch 'bpf-fs-mount-options-parsing-follow-ups'";
  - revert 750e785796bb "bpf: Support uid and gid when mounting bpffs";
  - revert 733763285acf "Merge branch 'bpf-token-support-in-libbpf-s-bpf-object'";
  - revert c35919dcce28 "Merge branch 'bpf-token-and-bpf-fs-based-delegation'".

Link: https://lore.kernel.org/bpf/CAHk-=wg7JuFYwGy=GOMbRCtOL+jwSQsdUaBsRWkDVYbxipbM5A@mail.gmail.com
Signed-off-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Fix a race condition between btf_put() and map_free()</title>
<updated>2023-12-15T01:10:32Z</updated>
<author>
<name>Yonghong Song</name>
<email>yonghong.song@linux.dev</email>
</author>
<published>2023-12-14T20:38:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=59e5791f59dd83e8aa72a4e74217eabb6e8cfd90'/>
<id>urn:sha1:59e5791f59dd83e8aa72a4e74217eabb6e8cfd90</id>
<content type='text'>
When running `./test_progs -j` in my local vm with latest kernel,
I once hit a kasan error like below:

  [ 1887.184724] BUG: KASAN: slab-use-after-free in bpf_rb_root_free+0x1f8/0x2b0
  [ 1887.185599] Read of size 4 at addr ffff888106806910 by task kworker/u12:2/2830
  [ 1887.186498]
  [ 1887.186712] CPU: 3 PID: 2830 Comm: kworker/u12:2 Tainted: G           OEL     6.7.0-rc3-00699-g90679706d486-dirty #494
  [ 1887.188034] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
  [ 1887.189618] Workqueue: events_unbound bpf_map_free_deferred
  [ 1887.190341] Call Trace:
  [ 1887.190666]  &lt;TASK&gt;
  [ 1887.190949]  dump_stack_lvl+0xac/0xe0
  [ 1887.191423]  ? nf_tcp_handle_invalid+0x1b0/0x1b0
  [ 1887.192019]  ? panic+0x3c0/0x3c0
  [ 1887.192449]  print_report+0x14f/0x720
  [ 1887.192930]  ? preempt_count_sub+0x1c/0xd0
  [ 1887.193459]  ? __virt_addr_valid+0xac/0x120
  [ 1887.194004]  ? bpf_rb_root_free+0x1f8/0x2b0
  [ 1887.194572]  kasan_report+0xc3/0x100
  [ 1887.195085]  ? bpf_rb_root_free+0x1f8/0x2b0
  [ 1887.195668]  bpf_rb_root_free+0x1f8/0x2b0
  [ 1887.196183]  ? __bpf_obj_drop_impl+0xb0/0xb0
  [ 1887.196736]  ? preempt_count_sub+0x1c/0xd0
  [ 1887.197270]  ? preempt_count_sub+0x1c/0xd0
  [ 1887.197802]  ? _raw_spin_unlock+0x1f/0x40
  [ 1887.198319]  bpf_obj_free_fields+0x1d4/0x260
  [ 1887.198883]  array_map_free+0x1a3/0x260
  [ 1887.199380]  bpf_map_free_deferred+0x7b/0xe0
  [ 1887.199943]  process_scheduled_works+0x3a2/0x6c0
  [ 1887.200549]  worker_thread+0x633/0x890
  [ 1887.201047]  ? __kthread_parkme+0xd7/0xf0
  [ 1887.201574]  ? kthread+0x102/0x1d0
  [ 1887.202020]  kthread+0x1ab/0x1d0
  [ 1887.202447]  ? pr_cont_work+0x270/0x270
  [ 1887.202954]  ? kthread_blkcg+0x50/0x50
  [ 1887.203444]  ret_from_fork+0x34/0x50
  [ 1887.203914]  ? kthread_blkcg+0x50/0x50
  [ 1887.204397]  ret_from_fork_asm+0x11/0x20
  [ 1887.204913]  &lt;/TASK&gt;
  [ 1887.204913]  &lt;/TASK&gt;
  [ 1887.205209]
  [ 1887.205416] Allocated by task 2197:
  [ 1887.205881]  kasan_set_track+0x3f/0x60
  [ 1887.206366]  __kasan_kmalloc+0x6e/0x80
  [ 1887.206856]  __kmalloc+0xac/0x1a0
  [ 1887.207293]  btf_parse_fields+0xa15/0x1480
  [ 1887.207836]  btf_parse_struct_metas+0x566/0x670
  [ 1887.208387]  btf_new_fd+0x294/0x4d0
  [ 1887.208851]  __sys_bpf+0x4ba/0x600
  [ 1887.209292]  __x64_sys_bpf+0x41/0x50
  [ 1887.209762]  do_syscall_64+0x4c/0xf0
  [ 1887.210222]  entry_SYSCALL_64_after_hwframe+0x63/0x6b
  [ 1887.210868]
  [ 1887.211074] Freed by task 36:
  [ 1887.211460]  kasan_set_track+0x3f/0x60
  [ 1887.211951]  kasan_save_free_info+0x28/0x40
  [ 1887.212485]  ____kasan_slab_free+0x101/0x180
  [ 1887.213027]  __kmem_cache_free+0xe4/0x210
  [ 1887.213514]  btf_free+0x5b/0x130
  [ 1887.213918]  rcu_core+0x638/0xcc0
  [ 1887.214347]  __do_softirq+0x114/0x37e

The error happens at bpf_rb_root_free+0x1f8/0x2b0:

  00000000000034c0 &lt;bpf_rb_root_free&gt;:
  ; {
    34c0: f3 0f 1e fa                   endbr64
    34c4: e8 00 00 00 00                callq   0x34c9 &lt;bpf_rb_root_free+0x9&gt;
    34c9: 55                            pushq   %rbp
    34ca: 48 89 e5                      movq    %rsp, %rbp
  ...
  ;       if (rec &amp;&amp; rec-&gt;refcount_off &gt;= 0 &amp;&amp;
    36aa: 4d 85 ed                      testq   %r13, %r13
    36ad: 74 a9                         je      0x3658 &lt;bpf_rb_root_free+0x198&gt;
    36af: 49 8d 7d 10                   leaq    0x10(%r13), %rdi
    36b3: e8 00 00 00 00                callq   0x36b8 &lt;bpf_rb_root_free+0x1f8&gt;
                                        &lt;==== kasan function
    36b8: 45 8b 7d 10                   movl    0x10(%r13), %r15d
                                        &lt;==== use-after-free load
    36bc: 45 85 ff                      testl   %r15d, %r15d
    36bf: 78 8c                         js      0x364d &lt;bpf_rb_root_free+0x18d&gt;

So the problem is at rec-&gt;refcount_off in the above.

I did some source code analysis and find the reason.
                                  CPU A                        CPU B
  bpf_map_put:
    ...
    btf_put with rcu callback
    ...
    bpf_map_free_deferred
      with system_unbound_wq
    ...                          ...                           ...
    ...                          btf_free_rcu:                 ...
    ...                          ...                           bpf_map_free_deferred:
    ...                          ...
    ...         ---------&gt;       btf_struct_metas_free()
    ...         | race condition ...
    ...         ---------&gt;                                     map-&gt;ops-&gt;map_free()
    ...
    ...                          btf-&gt;struct_meta_tab = NULL

In the above, map_free() corresponds to array_map_free() and eventually
calling bpf_rb_root_free() which calls:
  ...
  __bpf_obj_drop_impl(obj, field-&gt;graph_root.value_rec, false);
  ...

Here, 'value_rec' is assigned in btf_check_and_fixup_fields() with following code:

  meta = btf_find_struct_meta(btf, btf_id);
  if (!meta)
    return -EFAULT;
  rec-&gt;fields[i].graph_root.value_rec = meta-&gt;record;

So basically, 'value_rec' is a pointer to the record in struct_metas_tab.
And it is possible that that particular record has been freed by
btf_struct_metas_free() and hence we have a kasan error here.

Actually it is very hard to reproduce the failure with current bpf/bpf-next
code, I only got the above error once. To increase reproducibility, I added
a delay in bpf_map_free_deferred() to delay map-&gt;ops-&gt;map_free(), which
significantly increased reproducibility.

  diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
  index 5e43ddd1b83f..aae5b5213e93 100644
  --- a/kernel/bpf/syscall.c
  +++ b/kernel/bpf/syscall.c
  @@ -695,6 +695,7 @@ static void bpf_map_free_deferred(struct work_struct *work)
        struct bpf_map *map = container_of(work, struct bpf_map, work);
        struct btf_record *rec = map-&gt;record;

  +     mdelay(100);
        security_bpf_map_free(map);
        bpf_map_release_memcg(map);
        /* implementation dependent freeing */

Hao also provided test cases ([1]) for easily reproducing the above issue.

There are two ways to fix the issue, the v1 of the patch ([2]) moving
btf_put() after map_free callback, and the v5 of the patch ([3]) using
a kptr style fix which tries to get a btf reference during
map_check_btf(). Each approach has its pro and cons. The first approach
delays freeing btf while the second approach needs to acquire reference
depending on context which makes logic not very elegant and may
complicate things with future new data structures. Alexei
suggested in [4] going back to v1 which is what this patch
tries to do.

Rerun './test_progs -j' with the above mdelay() hack for a couple
of times and didn't observe the error for the above rb_root test cases.
Running Hou's test ([1]) is also successful.

  [1] https://lore.kernel.org/bpf/20231207141500.917136-1-houtao@huaweicloud.com/
  [2] v1: https://lore.kernel.org/bpf/20231204173946.3066377-1-yonghong.song@linux.dev/
  [3] v5: https://lore.kernel.org/bpf/20231208041621.2968241-1-yonghong.song@linux.dev/
  [4] v4: https://lore.kernel.org/bpf/CAADnVQJ3FiXUhZJwX_81sjZvSYYKCFB3BT6P8D59RS2Gu+0Z7g@mail.gmail.com/

Cc: Hou Tao &lt;houtao@huaweicloud.com&gt;
Fixes: 958cf2e273f0 ("bpf: Introduce bpf_obj_new")
Signed-off-by: Yonghong Song &lt;yonghong.song@linux.dev&gt;
Link: https://lore.kernel.org/r/20231214203815.1469107-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Reduce the scope of rcu_read_lock when updating fd map</title>
<updated>2023-12-14T04:49:11Z</updated>
<author>
<name>Hou Tao</name>
<email>houtao1@huawei.com</email>
</author>
<published>2023-12-14T04:30:09Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8f82583f9527b3be9d70d9a5d1f33435e29d0480'/>
<id>urn:sha1:8f82583f9527b3be9d70d9a5d1f33435e29d0480</id>
<content type='text'>
There is no rcu-read-lock requirement for ops-&gt;map_fd_get_ptr() or
ops-&gt;map_fd_put_ptr(), so doesn't use rcu-read-lock for these two
callbacks.

For bpf_fd_array_map_update_elem(), accessing array-&gt;ptrs doesn't need
rcu-read-lock because array-&gt;ptrs must still be allocated. For
bpf_fd_htab_map_update_elem(), htab_map_update_elem() only requires
rcu-read-lock to be held to avoid the WARN_ON_ONCE(), so only use
rcu_read_lock() during the invocation of htab_map_update_elem().

Acked-by: Yonghong Song &lt;yonghong.song@linux.dev&gt;
Signed-off-by: Hou Tao &lt;houtao1@huawei.com&gt;
Link: https://lore.kernel.org/r/20231214043010.3458072-2-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Update the comments in maybe_wait_bpf_programs()</title>
<updated>2023-12-14T01:01:42Z</updated>
<author>
<name>Hou Tao</name>
<email>houtao1@huawei.com</email>
</author>
<published>2023-12-11T08:34:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2a0c6b41eec90c2a138ea8b574836744783c67ff'/>
<id>urn:sha1:2a0c6b41eec90c2a138ea8b574836744783c67ff</id>
<content type='text'>
Since commit 638e4b825d52 ("bpf: Allows per-cpu maps and map-in-map in
sleepable programs"), sleepable BPF program can also use map-in-map, but
maybe_wait_bpf_programs() doesn't handle it accordingly. The main reason
is that using synchronize_rcu_tasks_trace() to wait for the completions
of these sleepable BPF programs may incur a very long delay and
userspace may think it is hung, so the wait for sleepable BPF programs
is skipped. Update the comments in maybe_wait_bpf_programs() to reflect
the reason.

Signed-off-by: Hou Tao &lt;houtao1@huawei.com&gt;
Acked-by: Yonghong Song &lt;yonghong.song@linux.dev&gt;
Acked-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Link: https://lore.kernel.org/r/20231211083447.1921178-1-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Set uattr-&gt;batch.count as zero before batched update or deletion</title>
<updated>2023-12-10T02:12:34Z</updated>
<author>
<name>Hou Tao</name>
<email>houtao1@huawei.com</email>
</author>
<published>2023-12-08T10:23:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=06e5c999f10269a532304e89a6adb2fbfeb0593c'/>
<id>urn:sha1:06e5c999f10269a532304e89a6adb2fbfeb0593c</id>
<content type='text'>
generic_map_{delete,update}_batch() doesn't set uattr-&gt;batch.count as
zero before it tries to allocate memory for key. If the memory
allocation fails, the value of uattr-&gt;batch.count will be incorrect.

Fix it by setting uattr-&gt;batch.count as zero beore batched update or
deletion.

Signed-off-by: Hou Tao &lt;houtao1@huawei.com&gt;
Link: https://lore.kernel.org/r/20231208102355.2628918-6-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Only call maybe_wait_bpf_programs() when map operation succeeds</title>
<updated>2023-12-10T02:12:34Z</updated>
<author>
<name>Hou Tao</name>
<email>houtao1@huawei.com</email>
</author>
<published>2023-12-08T10:23:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=67ad2c73ff29b32bd09135ec07c26e59490dbb3b'/>
<id>urn:sha1:67ad2c73ff29b32bd09135ec07c26e59490dbb3b</id>
<content type='text'>
There is no need to call maybe_wait_bpf_programs() if update or deletion
operation fails. So only call maybe_wait_bpf_programs() if update or
deletion operation succeeds.

Signed-off-by: Hou Tao &lt;houtao1@huawei.com&gt;
Link: https://lore.kernel.org/r/20231208102355.2628918-5-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Add missed maybe_wait_bpf_programs() for htab of maps</title>
<updated>2023-12-10T02:12:34Z</updated>
<author>
<name>Hou Tao</name>
<email>houtao1@huawei.com</email>
</author>
<published>2023-12-08T10:23:51Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=012772581d040607ac1f981f47f6afd2336b4580'/>
<id>urn:sha1:012772581d040607ac1f981f47f6afd2336b4580</id>
<content type='text'>
When doing batched lookup and deletion operations on htab of maps,
maybe_wait_bpf_programs() is needed to ensure all programs don't use the
inner map after the bpf syscall returns.

Instead of adding the wait in __htab_map_lookup_and_delete_batch(),
adding the wait in bpf_map_do_batch() and also removing the calling of
maybe_wait_bpf_programs() from generic_map_{delete,update}_batch().

Signed-off-by: Hou Tao &lt;houtao1@huawei.com&gt;
Link: https://lore.kernel.org/r/20231208102355.2628918-4-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Call maybe_wait_bpf_programs() only once for generic_map_update_batch()</title>
<updated>2023-12-10T02:12:33Z</updated>
<author>
<name>Hou Tao</name>
<email>houtao1@huawei.com</email>
</author>
<published>2023-12-08T10:23:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=37ba5b59d6adfa08926acd3a833608487a18c2ef'/>
<id>urn:sha1:37ba5b59d6adfa08926acd3a833608487a18c2ef</id>
<content type='text'>
Just like commit 9087c6ff8dfe ("bpf: Call maybe_wait_bpf_programs() only
once from generic_map_delete_batch()"), there is also no need to call
maybe_wait_bpf_programs() for each update in batched update, so only
call it once in generic_map_update_batch().

Signed-off-by: Hou Tao &lt;houtao1@huawei.com&gt;
Link: https://lore.kernel.org/r/20231208102355.2628918-3-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
</feed>
