<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/bpf/ringbuf.c, branch v6.0</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.0</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.0'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2022-05-23T21:31:28Z</updated>
<entry>
<title>bpf: Dynptr support for ring buffers</title>
<updated>2022-05-23T21:31:28Z</updated>
<author>
<name>Joanne Koong</name>
<email>joannelkoong@gmail.com</email>
</author>
<published>2022-05-23T21:07:09Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=bc34dee65a65e9c920c420005b8a43f2a721a458'/>
<id>urn:sha1:bc34dee65a65e9c920c420005b8a43f2a721a458</id>
<content type='text'>
Currently, our only way of writing dynamically-sized data into a ring
buffer is through bpf_ringbuf_output but this incurs an extra memcpy
cost. bpf_ringbuf_reserve + bpf_ringbuf_commit avoids this extra
memcpy, but it can only safely support reservation sizes that are
statically known since the verifier cannot guarantee that the bpf
program won’t access memory outside the reserved space.

The bpf_dynptr abstraction allows for dynamically-sized ring buffer
reservations without the extra memcpy.

There are 3 new APIs:

long bpf_ringbuf_reserve_dynptr(void *ringbuf, u32 size, u64 flags, struct bpf_dynptr *ptr);
void bpf_ringbuf_submit_dynptr(struct bpf_dynptr *ptr, u64 flags);
void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags);

These closely follow the functionalities of the original ringbuf APIs.
For example, all ringbuffer dynptrs that have been reserved must be
either submitted or discarded before the program exits.

Signed-off-by: Joanne Koong &lt;joannelkoong@gmail.com&gt;
Signed-off-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Acked-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Acked-by: David Vernet &lt;void@manifault.com&gt;
Link: https://lore.kernel.org/bpf/20220523210712.3641569-4-joannelkoong@gmail.com
</content>
</entry>
<entry>
<title>bpf: Compute map_btf_id during build time</title>
<updated>2022-04-26T18:35:21Z</updated>
<author>
<name>Menglong Dong</name>
<email>imagedong@tencent.com</email>
</author>
<published>2022-04-25T13:32:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c317ab71facc2cd0a94145973318a4c914e11acc'/>
<id>urn:sha1:c317ab71facc2cd0a94145973318a4c914e11acc</id>
<content type='text'>
For now, the field 'map_btf_id' in 'struct bpf_map_ops' for all map
types is computed during vmlinux-btf init:

  btf_parse_vmlinux() -&gt; btf_vmlinux_map_ids_init()

It will look up the btf_type according to the 'map_btf_name' field in
'struct bpf_map_ops'. This process can be done during build time,
thanks to Jiri's resolve_btfids.

selftest of map_ptr has passed:

  $96 map_ptr:OK
  Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED

Reported-by: kernel test robot &lt;lkp@intel.com&gt;
Signed-off-by: Menglong Dong &lt;imagedong@tencent.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Tag argument to be released in bpf_func_proto</title>
<updated>2022-04-26T00:31:35Z</updated>
<author>
<name>Kumar Kartikeya Dwivedi</name>
<email>memxor@gmail.com</email>
</author>
<published>2022-04-24T21:48:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8f14852e89113d738c99c375b4c8b8b7e1073df1'/>
<id>urn:sha1:8f14852e89113d738c99c375b4c8b8b7e1073df1</id>
<content type='text'>
Add a new type flag for bpf_arg_type that when set tells verifier that
for a release function, that argument's register will be the one for
which meta.ref_obj_id will be set, and which will then be released
using release_reference. To capture the regno, introduce a new field
release_regno in bpf_call_arg_meta.

This would be required in the next patch, where we may either pass NULL
or a refcounted pointer as an argument to the release function
bpf_kptr_xchg. Just releasing only when meta.ref_obj_id is set is not
enough, as there is a case where the type of argument needed matches,
but the ref_obj_id is set to 0. Hence, we must enforce that whenever
meta.ref_obj_id is zero, the register that is to be released can only
be NULL for a release function.

Since we now indicate whether an argument is to be released in
bpf_func_proto itself, is_release_function helper has lost its utility,
hence refactor code to work without it, and just rely on
meta.release_regno to know when to release state for a ref_obj_id.
Still, the restriction of one release argument and only one ref_obj_id
passed to BPF helper or kfunc remains. This may be lifted in the future.

Signed-off-by: Kumar Kartikeya Dwivedi &lt;memxor@gmail.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20220424214901.2743946-3-memxor@gmail.com
</content>
</entry>
<entry>
<title>bpf: Use VM_MAP instead of VM_ALLOC for ringbuf</title>
<updated>2022-02-03T07:15:24Z</updated>
<author>
<name>Hou Tao</name>
<email>hotforest@gmail.com</email>
</author>
<published>2022-02-02T06:01:58Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b293dcc473d22a62dc6d78de2b15e4f49515db56'/>
<id>urn:sha1:b293dcc473d22a62dc6d78de2b15e4f49515db56</id>
<content type='text'>
After commit 2fd3fb0be1d1 ("kasan, vmalloc: unpoison VM_ALLOC pages
after mapping"), non-VM_ALLOC mappings will be marked as accessible
in __get_vm_area_node() when KASAN is enabled. But now the flag for
ringbuf area is VM_ALLOC, so KASAN will complain about out-of-bounds access
after vmap() returns. Because the ringbuf area is created by mapping
allocated pages, use VM_MAP instead.

After the change, info in /proc/vmallocinfo also changes from
  [start]-[end]   24576 ringbuf_map_alloc+0x171/0x290 vmalloc user
to
  [start]-[end]   24576 ringbuf_map_alloc+0x171/0x290 vmap user

Fixes: 457f44363a88 ("bpf: Implement BPF ring buffer and verifier support for it")
Reported-by: syzbot+5ad567a418794b9b5983@syzkaller.appspotmail.com
Signed-off-by: Hou Tao &lt;houtao1@huawei.com&gt;
Signed-off-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20220202060158.6260-1-houtao1@huawei.com
</content>
</entry>
<entry>
<title>bpf: Add MEM_RDONLY for helper args that are pointers to rdonly mem.</title>
<updated>2021-12-18T21:27:41Z</updated>
<author>
<name>Hao Luo</name>
<email>haoluo@google.com</email>
</author>
<published>2021-12-17T00:31:51Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=216e3cd2f28dbbf1fe86848e0e29e6693b9f0a20'/>
<id>urn:sha1:216e3cd2f28dbbf1fe86848e0e29e6693b9f0a20</id>
<content type='text'>
Some helper functions may modify their arguments, for example,
bpf_d_path, bpf_get_stack etc. Previously, their argument types
were marked as ARG_PTR_TO_MEM, which is compatible with read-only
mem types, such as PTR_TO_RDONLY_BUF. Therefore it's legitimate,
but technically incorrect, to modify a read-only memory by passing
it into one of such helper functions.

This patch tags the bpf_args compatible with immutable memory with
MEM_RDONLY flag. The arguments that don't have this flag will be
only compatible with mutable memory types, preventing the helper
from modifying a read-only memory. The bpf_args that have
MEM_RDONLY are compatible with both mutable memory and immutable
memory.

Signed-off-by: Hao Luo &lt;haoluo@google.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20211217003152.48334-9-haoluo@google.com
</content>
</entry>
<entry>
<title>bpf: Fix false positive kmemleak report in bpf_ringbuf_area_alloc()</title>
<updated>2021-06-28T13:57:46Z</updated>
<author>
<name>Rustam Kovhaev</name>
<email>rkovhaev@gmail.com</email>
</author>
<published>2021-06-26T18:11:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ccff81e1d028bbbf8573d3364a87542386c707bf'/>
<id>urn:sha1:ccff81e1d028bbbf8573d3364a87542386c707bf</id>
<content type='text'>
kmemleak scans struct page, but it does not scan the page content. If we
allocate some memory with kmalloc(), then allocate page with alloc_page(),
and if we put kmalloc pointer somewhere inside that page, kmemleak will
report kmalloc pointer as a false positive.

We can instruct kmemleak to scan the memory area by calling kmemleak_alloc()
and kmemleak_free(), but part of struct bpf_ringbuf is mmaped to user space,
and if struct bpf_ringbuf changes we would have to revisit and review size
argument in kmemleak_alloc(), because we do not want kmemleak to scan the
user space memory. Let's simplify things and use kmemleak_not_leak() here.

For posterity, also adding additional prior analysis from Andrii:

  I think either kmemleak or syzbot are misreporting this. I've added a
  bunch of printks around all allocations performed by BPF ringbuf. [...]
  On repro side I get these two warnings:

  [vmuser@archvm bpf]$ sudo ./repro
  BUG: memory leak
  unreferenced object 0xffff88810d538c00 (size 64):
    comm "repro", pid 2140, jiffies 4294692933 (age 14.540s)
    hex dump (first 32 bytes):
      00 af 19 04 00 ea ff ff c0 ae 19 04 00 ea ff ff  ................
      80 ae 19 04 00 ea ff ff c0 29 2e 04 00 ea ff ff  .........)......
    backtrace:
      [&lt;0000000077bfbfbd&gt;] __bpf_map_area_alloc+0x31/0xc0
      [&lt;00000000587fa522&gt;] ringbuf_map_alloc.cold.4+0x48/0x218
      [&lt;0000000044d49e96&gt;] __do_sys_bpf+0x359/0x1d90
      [&lt;00000000f601d565&gt;] do_syscall_64+0x2d/0x40
      [&lt;0000000043d3112a&gt;] entry_SYSCALL_64_after_hwframe+0x44/0xae

  BUG: memory leak
  unreferenced object 0xffff88810d538c80 (size 64):
    comm "repro", pid 2143, jiffies 4294699025 (age 8.448s)
    hex dump (first 32 bytes):
      80 aa 19 04 00 ea ff ff 00 ab 19 04 00 ea ff ff  ................
      c0 ab 19 04 00 ea ff ff 80 44 28 04 00 ea ff ff  .........D(.....
    backtrace:
      [&lt;0000000077bfbfbd&gt;] __bpf_map_area_alloc+0x31/0xc0
      [&lt;00000000587fa522&gt;] ringbuf_map_alloc.cold.4+0x48/0x218
      [&lt;0000000044d49e96&gt;] __do_sys_bpf+0x359/0x1d90
      [&lt;00000000f601d565&gt;] do_syscall_64+0x2d/0x40
      [&lt;0000000043d3112a&gt;] entry_SYSCALL_64_after_hwframe+0x44/0xae

  Note that both reported leaks (ffff88810d538c80 and ffff88810d538c00)
  correspond to pages array bpf_ringbuf is allocating and tracking properly
  internally. Note also that syzbot repro doesn't close FD of created BPF
  ringbufs, and even when ./repro itself exits with error, there are still
  two forked processes hanging around in my system. So clearly ringbuf maps
  are alive at that point. So reporting any memory leak looks weird at that
  point, because that memory is being used by active referenced BPF ringbuf.

  It's also a question why repro doesn't clean up its forks. But if I do a
  `pkill repro`, I do see that all the allocated memory is /properly/ cleaned
  up [and the] "leaks" are deallocated properly.

  BTW, if I add close() right after bpf() syscall in syzbot repro, I see that
  everything is immediately deallocated, like designed. And no memory leak
  is reported. So I don't think the problem is anywhere in bpf_ringbuf code,
  rather in the leak detection and/or repro itself.

Reported-by: syzbot+5d895828587f49e7fe9b@syzkaller.appspotmail.com
Signed-off-by: Rustam Kovhaev &lt;rkovhaev@gmail.com&gt;
[ Daniel: also included analysis from Andrii to the commit log ]
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Tested-by: syzbot+5d895828587f49e7fe9b@syzkaller.appspotmail.com
Cc: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Cc: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Link: https://lore.kernel.org/bpf/CAEf4BzYk+dqs+jwu6VKXP-RttcTEGFe+ySTGWT9CRNkagDiJVA@mail.gmail.com
Link: https://lore.kernel.org/lkml/YNTAqiE7CWJhOK2M@nuc10
Link: https://lore.kernel.org/lkml/20210615101515.GC26027@arm.com
Link: https://syzkaller.appspot.com/bug?extid=5d895828587f49e7fe9b
Link: https://lore.kernel.org/bpf/20210626181156.1873604-1-rkovhaev@gmail.com
</content>
</entry>
<entry>
<title>bpf: Prevent writable memory-mapping of read-only ringbuf pages</title>
<updated>2021-05-11T11:31:10Z</updated>
<author>
<name>Andrii Nakryiko</name>
<email>andrii@kernel.org</email>
</author>
<published>2021-05-04T23:38:00Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=04ea3086c4d73da7009de1e84962a904139af219'/>
<id>urn:sha1:04ea3086c4d73da7009de1e84962a904139af219</id>
<content type='text'>
Only the very first page of BPF ringbuf that contains consumer position
counter is supposed to be mapped as writeable by user-space. Producer
position is read-only and can be modified only by the kernel code. BPF ringbuf
data pages are read-only as well and are not meant to be modified by
user-code to maintain integrity of per-record headers.

This patch allows mapping only the consumer position page as writeable;
everything else is restricted to be read-only. remap_vmalloc_range()
internally adds VM_DONTEXPAND, so all the established memory mappings can't be
extended, which prevents any future violations through mremap()'ing.
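
As a rough illustration of the rule described above, the following is a
user-space toy model, not the kernel implementation. The function name,
its parameters and the 4096-byte page size are assumptions made for the
sketch; it only encodes "a writable mapping must cover exactly the first
page, read-only mappings are unrestricted".

<![CDATA[
```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096u  /* assumed page size for this model */

/*
 * Toy model of the mapping rule. The function and parameter names are
 * hypothetical: pgoff is the page offset of the request inside the ring
 * buffer mapping, len is the requested length in bytes.
 */
static int toy_ringbuf_mmap_check(bool writable, size_t pgoff, size_t len)
{
    if (!writable)
        return 0;        /* read-only requests are accepted anywhere */
    if (pgoff != 0 || len != TOY_PAGE_SIZE)
        return -EPERM;   /* writable: consumer position page only */
    return 0;
}
```
]]>

In this model a writable request for page 0 with a length of one page is
accepted, while a writable request that starts at another page or spans
more than one page yields -EPERM. The model does not cover later
permission changes or remapping.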

Fixes: 457f44363a88 ("bpf: Implement BPF ring buffer and verifier support for it")
Reported-by: Ryota Shiga (Flatt Security)
Reported-by: Thadeu Lima de Souza Cascardo &lt;cascardo@canonical.com&gt;
Signed-off-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf, ringbuf: Deny reserve of buffers larger than ringbuf</title>
<updated>2021-05-11T11:30:45Z</updated>
<author>
<name>Thadeu Lima de Souza Cascardo</name>
<email>cascardo@canonical.com</email>
</author>
<published>2021-04-27T13:12:12Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4b81ccebaeee885ab1aa1438133f2991e3a2b6ea'/>
<id>urn:sha1:4b81ccebaeee885ab1aa1438133f2991e3a2b6ea</id>
<content type='text'>
A BPF program might try to reserve a buffer larger than the ringbuf size.
If the consumer pointer is way ahead of the producer, that would be
successfully reserved, allowing the BPF program to read or write out of
the ringbuf allocated area.
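
The problem can be sketched with a small user-space model. This is an
illustration under stated assumptions, not the kernel code: the struct,
the function names, the 8-byte header size and the exact form of the
free-space check are all hypothetical. It shows that a check based only
on the distance between the two positions can pass for an oversized
request once the user-writable consumer position has been moved far
ahead, and that an explicit upper bound on the record length rejects it.

<![CDATA[
```c
#include <stdbool.h>
#include <stdint.h>

#define HDR_SZ 8u  /* assumed per-record header size */

/* Toy stand-in for the ring buffer bookkeeping; names are hypothetical. */
struct toy_ringbuf {
    uint64_t size;      /* data area size in bytes, a power of two */
    uint64_t prod_pos;  /* producer position, advanced by the kernel side */
    uint64_t cons_pos;  /* consumer position, writable by user space */
};

/* Round a length up to the next multiple of 8. */
static uint64_t round_up8(uint64_t n)
{
    return (n + 7u) & ~(uint64_t)7u;
}

/* Free-space check only: looks at the distance between the positions. */
static bool space_check(const struct toy_ringbuf *rb, uint64_t size)
{
    uint64_t len = round_up8(size + HDR_SZ);
    uint64_t new_prod = rb->prod_pos + len;

    return new_prod - rb->cons_pos <= rb->size - 1;
}

/* Same check, preceded by an upper bound on the record length. */
static bool space_check_with_bound(const struct toy_ringbuf *rb, uint64_t size)
{
    if (round_up8(size + HDR_SZ) > rb->size)
        return false;
    return space_check(rb, size);
}
```
]]>

With size = 4096, prod_pos = 0 and cons_pos = 1048576, a request for
1048568 bytes passes space_check() in this model, because the new
producer position minus the consumer position is 0, but it is rejected
by space_check_with_bound().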

Reported-by: Ryota Shiga (Flatt Security)
Fixes: 457f44363a88 ("bpf: Implement BPF ring buffer and verifier support for it")
Signed-off-by: Thadeu Lima de Souza Cascardo &lt;cascardo@canonical.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Eliminate rlimit-based memory accounting for bpf ringbuffer</title>
<updated>2020-12-03T02:32:47Z</updated>
<author>
<name>Roman Gushchin</name>
<email>guro@fb.com</email>
</author>
<published>2020-12-01T21:58:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=abbdd0813f347f9d1eea376409a68295318b2ef5'/>
<id>urn:sha1:abbdd0813f347f9d1eea376409a68295318b2ef5</id>
<content type='text'>
Do not use rlimit-based memory accounting for bpf ringbuffer.
It has been replaced with the memcg-based memory accounting.

bpf_ringbuf_alloc() can't return anything except ERR_PTR(-ENOMEM)
and a valid pointer, so to simplify the code make it return NULL
in the first case. This allows dropping a couple of lines in
ringbuf_map_alloc() and also makes it look similar to other memory
allocating functions like kmalloc().
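
The calling convention described above can be sketched in user space as
follows. This is only an illustration: the toy_* types and functions are
hypothetical and malloc() stands in for the kernel allocators. The
allocator reports failure as NULL, and the caller translates that single
failure case into -ENOMEM.

<![CDATA[
```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the ring buffer and the map that owns it. */
struct toy_rb {
    size_t data_sz;
};

struct toy_map {
    struct toy_rb *rb;
};

/* Allocator in the kmalloc() style: NULL is the only failure value. */
static struct toy_rb *toy_rb_alloc(size_t data_sz)
{
    struct toy_rb *rb = malloc(sizeof(*rb));

    if (!rb)
        return NULL;
    rb->data_sz = data_sz;
    return rb;
}

/* Caller: a NULL result can only mean "out of memory". */
static int toy_map_init(struct toy_map *map, size_t data_sz)
{
    map->rb = toy_rb_alloc(data_sz);
    if (!map->rb)
        return -ENOMEM;
    return 0;
}
```
]]>

Because the allocator has exactly one failure mode, the caller needs no
error-pointer decoding, only a NULL test.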

Signed-off-by: Roman Gushchin &lt;guro@fb.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Song Liu &lt;songliubraving@fb.com&gt;
Acked-by: Andrii Nakryiko &lt;andriin@fb.com&gt;
Link: https://lore.kernel.org/bpf/20201201215900.3569844-28-guro@fb.com
</content>
</entry>
<entry>
<title>bpf: Memcg-based memory accounting for bpf ringbuffer</title>
<updated>2020-12-03T02:32:45Z</updated>
<author>
<name>Roman Gushchin</name>
<email>guro@fb.com</email>
</author>
<published>2020-12-01T21:58:40Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=be4035c734d12918866c5eb2c496d420aa80adeb'/>
<id>urn:sha1:be4035c734d12918866c5eb2c496d420aa80adeb</id>
<content type='text'>
Enable the memcg-based memory accounting for the memory used by
the bpf ringbuffer.

Signed-off-by: Roman Gushchin &lt;guro@fb.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20201201215900.3569844-15-guro@fb.com
</content>
</entry>
</feed>
