<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/bpf/ringbuf.c, branch v5.17</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.17</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.17'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2022-02-03T07:15:24Z</updated>
<entry>
<title>bpf: Use VM_MAP instead of VM_ALLOC for ringbuf</title>
<updated>2022-02-03T07:15:24Z</updated>
<author>
<name>Hou Tao</name>
<email>hotforest@gmail.com</email>
</author>
<published>2022-02-02T06:01:58Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b293dcc473d22a62dc6d78de2b15e4f49515db56'/>
<id>urn:sha1:b293dcc473d22a62dc6d78de2b15e4f49515db56</id>
<content type='text'>
After commit 2fd3fb0be1d1 ("kasan, vmalloc: unpoison VM_ALLOC pages
after mapping"), non-VM_ALLOC mappings will be marked as accessible
in __get_vm_area_node() when KASAN is enabled. But now the flag for
ringbuf area is VM_ALLOC, so KASAN will complain out-of-bound access
after vmap() returns. Because the ringbuf area is created by mapping
allocated pages, so use VM_MAP instead.

After the change, info in /proc/vmallocinfo also changes from
  [start]-[end]   24576 ringbuf_map_alloc+0x171/0x290 vmalloc user
to
  [start]-[end]   24576 ringbuf_map_alloc+0x171/0x290 vmap user

Fixes: 457f44363a88 ("bpf: Implement BPF ring buffer and verifier support for it")
Reported-by: syzbot+5ad567a418794b9b5983@syzkaller.appspotmail.com
Signed-off-by: Hou Tao &lt;houtao1@huawei.com&gt;
Signed-off-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20220202060158.6260-1-houtao1@huawei.com
</content>
</entry>
<entry>
<title>bpf: Add MEM_RDONLY for helper args that are pointers to rdonly mem.</title>
<updated>2021-12-18T21:27:41Z</updated>
<author>
<name>Hao Luo</name>
<email>haoluo@google.com</email>
</author>
<published>2021-12-17T00:31:51Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=216e3cd2f28dbbf1fe86848e0e29e6693b9f0a20'/>
<id>urn:sha1:216e3cd2f28dbbf1fe86848e0e29e6693b9f0a20</id>
<content type='text'>
Some helper functions may modify its arguments, for example,
bpf_d_path, bpf_get_stack etc. Previously, their argument types
were marked as ARG_PTR_TO_MEM, which is compatible with read-only
mem types, such as PTR_TO_RDONLY_BUF. Therefore it's legitimate,
but technically incorrect, to modify a read-only memory by passing
it into one of such helper functions.

This patch tags the bpf_args compatible with immutable memory with
MEM_RDONLY flag. The arguments that don't have this flag will be
only compatible with mutable memory types, preventing the helper
from modifying a read-only memory. The bpf_args that have
MEM_RDONLY are compatible with both mutable memory and immutable
memory.

Signed-off-by: Hao Luo &lt;haoluo@google.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20211217003152.48334-9-haoluo@google.com
</content>
</entry>
<entry>
<title>bpf: Fix false positive kmemleak report in bpf_ringbuf_area_alloc()</title>
<updated>2021-06-28T13:57:46Z</updated>
<author>
<name>Rustam Kovhaev</name>
<email>rkovhaev@gmail.com</email>
</author>
<published>2021-06-26T18:11:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ccff81e1d028bbbf8573d3364a87542386c707bf'/>
<id>urn:sha1:ccff81e1d028bbbf8573d3364a87542386c707bf</id>
<content type='text'>
kmemleak scans struct page, but it does not scan the page content. If we
allocate some memory with kmalloc(), then allocate page with alloc_page(),
and if we put kmalloc pointer somewhere inside that page, kmemleak will
report kmalloc pointer as a false positive.

We can instruct kmemleak to scan the memory area by calling kmemleak_alloc()
and kmemleak_free(), but part of struct bpf_ringbuf is mmaped to user space,
and if struct bpf_ringbuf changes we would have to revisit and review size
argument in kmemleak_alloc(), because we do not want kmemleak to scan the
user space memory. Let's simplify things and use kmemleak_not_leak() here.

For posterity, also adding additional prior analysis from Andrii:

  I think either kmemleak or syzbot are misreporting this. I've added a
  bunch of printks around all allocations performed by BPF ringbuf. [...]
  On repro side I get these two warnings:

  [vmuser@archvm bpf]$ sudo ./repro
  BUG: memory leak
  unreferenced object 0xffff88810d538c00 (size 64):
    comm "repro", pid 2140, jiffies 4294692933 (age 14.540s)
    hex dump (first 32 bytes):
      00 af 19 04 00 ea ff ff c0 ae 19 04 00 ea ff ff  ................
      80 ae 19 04 00 ea ff ff c0 29 2e 04 00 ea ff ff  .........)......
    backtrace:
      [&lt;0000000077bfbfbd&gt;] __bpf_map_area_alloc+0x31/0xc0
      [&lt;00000000587fa522&gt;] ringbuf_map_alloc.cold.4+0x48/0x218
      [&lt;0000000044d49e96&gt;] __do_sys_bpf+0x359/0x1d90
      [&lt;00000000f601d565&gt;] do_syscall_64+0x2d/0x40
      [&lt;0000000043d3112a&gt;] entry_SYSCALL_64_after_hwframe+0x44/0xae

  BUG: memory leak
  unreferenced object 0xffff88810d538c80 (size 64):
    comm "repro", pid 2143, jiffies 4294699025 (age 8.448s)
    hex dump (first 32 bytes):
      80 aa 19 04 00 ea ff ff 00 ab 19 04 00 ea ff ff  ................
      c0 ab 19 04 00 ea ff ff 80 44 28 04 00 ea ff ff  .........D(.....
    backtrace:
      [&lt;0000000077bfbfbd&gt;] __bpf_map_area_alloc+0x31/0xc0
      [&lt;00000000587fa522&gt;] ringbuf_map_alloc.cold.4+0x48/0x218
      [&lt;0000000044d49e96&gt;] __do_sys_bpf+0x359/0x1d90
      [&lt;00000000f601d565&gt;] do_syscall_64+0x2d/0x40
      [&lt;0000000043d3112a&gt;] entry_SYSCALL_64_after_hwframe+0x44/0xae

  Note that both reported leaks (ffff88810d538c80 and ffff88810d538c00)
  correspond to pages array bpf_ringbuf is allocating and tracking properly
  internally. Note also that syzbot repro doesn't close FD of created BPF
  ringbufs, and even when ./repro itself exits with error, there are still
  two forked processes hanging around in my system. So clearly ringbuf maps
  are alive at that point. So reporting any memory leak looks weird at that
  point, because that memory is being used by active referenced BPF ringbuf.

  It's also a question why repro doesn't clean up its forks. But if I do a
  `pkill repro`, I do see that all the allocated memory is /properly/ cleaned
  up [and the] "leaks" are deallocated properly.

  BTW, if I add close() right after bpf() syscall in syzbot repro, I see that
  everything is immediately deallocated, like designed. And no memory leak
  is reported. So I don't think the problem is anywhere in bpf_ringbuf code,
  rather in the leak detection and/or repro itself.

Reported-by: syzbot+5d895828587f49e7fe9b@syzkaller.appspotmail.com
Signed-off-by: Rustam Kovhaev &lt;rkovhaev@gmail.com&gt;
[ Daniel: also included analysis from Andrii to the commit log ]
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Tested-by: syzbot+5d895828587f49e7fe9b@syzkaller.appspotmail.com
Cc: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Cc: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Link: https://lore.kernel.org/bpf/CAEf4BzYk+dqs+jwu6VKXP-RttcTEGFe+ySTGWT9CRNkagDiJVA@mail.gmail.com
Link: https://lore.kernel.org/lkml/YNTAqiE7CWJhOK2M@nuc10
Link: https://lore.kernel.org/lkml/20210615101515.GC26027@arm.com
Link: https://syzkaller.appspot.com/bug?extid=5d895828587f49e7fe9b
Link: https://lore.kernel.org/bpf/20210626181156.1873604-1-rkovhaev@gmail.com
</content>
</entry>
<entry>
<title>bpf: Prevent writable memory-mapping of read-only ringbuf pages</title>
<updated>2021-05-11T11:31:10Z</updated>
<author>
<name>Andrii Nakryiko</name>
<email>andrii@kernel.org</email>
</author>
<published>2021-05-04T23:38:00Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=04ea3086c4d73da7009de1e84962a904139af219'/>
<id>urn:sha1:04ea3086c4d73da7009de1e84962a904139af219</id>
<content type='text'>
Only the very first page of BPF ringbuf that contains consumer position
counter is supposed to be mapped as writeable by user-space. Producer
position is read-only and can be modified only by the kernel code. BPF ringbuf
data pages are read-only as well and are not meant to be modified by
user-code to maintain integrity of per-record headers.

This patch allows to map only consumer position page as writeable and
everything else is restricted to be read-only. remap_vmalloc_range()
internally adds VM_DONTEXPAND, so all the established memory mappings can't be
extended, which prevents any future violations through mremap()'ing.

Fixes: 457f44363a88 ("bpf: Implement BPF ring buffer and verifier support for it")
Reported-by: Ryota Shiga (Flatt Security)
Reported-by: Thadeu Lima de Souza Cascardo &lt;cascardo@canonical.com&gt;
Signed-off-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf, ringbuf: Deny reserve of buffers larger than ringbuf</title>
<updated>2021-05-11T11:30:45Z</updated>
<author>
<name>Thadeu Lima de Souza Cascardo</name>
<email>cascardo@canonical.com</email>
</author>
<published>2021-04-27T13:12:12Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4b81ccebaeee885ab1aa1438133f2991e3a2b6ea'/>
<id>urn:sha1:4b81ccebaeee885ab1aa1438133f2991e3a2b6ea</id>
<content type='text'>
A BPF program might try to reserve a buffer larger than the ringbuf size.
If the consumer pointer is way ahead of the producer, that would be
successfully reserved, allowing the BPF program to read or write out of
the ringbuf allocated area.

Reported-by: Ryota Shiga (Flatt Security)
Fixes: 457f44363a88 ("bpf: Implement BPF ring buffer and verifier support for it")
Signed-off-by: Thadeu Lima de Souza Cascardo &lt;cascardo@canonical.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Eliminate rlimit-based memory accounting for bpf ringbuffer</title>
<updated>2020-12-03T02:32:47Z</updated>
<author>
<name>Roman Gushchin</name>
<email>guro@fb.com</email>
</author>
<published>2020-12-01T21:58:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=abbdd0813f347f9d1eea376409a68295318b2ef5'/>
<id>urn:sha1:abbdd0813f347f9d1eea376409a68295318b2ef5</id>
<content type='text'>
Do not use rlimit-based memory accounting for bpf ringbuffer.
It has been replaced with the memcg-based memory accounting.

bpf_ringbuf_alloc() can't return anything except ERR_PTR(-ENOMEM)
and a valid pointer, so to simplify the code make it return NULL
in the first case. This allows to drop a couple of lines in
ringbuf_map_alloc() and also makes it look similar to other memory
allocating function like kmalloc().

Signed-off-by: Roman Gushchin &lt;guro@fb.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Song Liu &lt;songliubraving@fb.com&gt;
Acked-by: Andrii Nakryiko &lt;andriin@fb.com&gt;
Link: https://lore.kernel.org/bpf/20201201215900.3569844-28-guro@fb.com
</content>
</entry>
<entry>
<title>bpf: Memcg-based memory accounting for bpf ringbuffer</title>
<updated>2020-12-03T02:32:45Z</updated>
<author>
<name>Roman Gushchin</name>
<email>guro@fb.com</email>
</author>
<published>2020-12-01T21:58:40Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=be4035c734d12918866c5eb2c496d420aa80adeb'/>
<id>urn:sha1:be4035c734d12918866c5eb2c496d420aa80adeb</id>
<content type='text'>
Enable the memcg-based memory accounting for the memory used by
the bpf ringbuffer.

Signed-off-by: Roman Gushchin &lt;guro@fb.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20201201215900.3569844-15-guro@fb.com
</content>
</entry>
<entry>
<title>bpf: Add map_meta_equal map ops</title>
<updated>2020-08-28T13:41:30Z</updated>
<author>
<name>Martin KaFai Lau</name>
<email>kafai@fb.com</email>
</author>
<published>2020-08-28T01:18:06Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f4d05259213ff1e91f767c91dcab455f68308fac'/>
<id>urn:sha1:f4d05259213ff1e91f767c91dcab455f68308fac</id>
<content type='text'>
Some properties of the inner map is used in the verification time.
When an inner map is inserted to an outer map at runtime,
bpf_map_meta_equal() is currently used to ensure those properties
of the inserting inner map stays the same as the verification
time.

In particular, the current bpf_map_meta_equal() checks max_entries which
turns out to be too restrictive for most of the maps which do not use
max_entries during the verification time.  It limits the use case that
wants to replace a smaller inner map with a larger inner map.  There are
some maps do use max_entries during verification though.  For example,
the map_gen_lookup in array_map_ops uses the max_entries to generate
the inline lookup code.

To accommodate differences between maps, the map_meta_equal is added
to bpf_map_ops.  Each map-type can decide what to check when its
map is used as an inner map during runtime.

Also, some map types cannot be used as an inner map and they are
currently black listed in bpf_map_meta_alloc() in map_in_map.c.
It is not unusual that the new map types may not aware that such
blacklist exists.  This patch enforces an explicit opt-in
and only allows a map to be used as an inner map if it has
implemented the map_meta_equal ops.  It is based on the
discussion in [1].

All maps that support inner map has its map_meta_equal points
to bpf_map_meta_equal in this patch.  A later patch will
relax the max_entries check for most maps.  bpf_types.h
counts 28 map types.  This patch adds 23 ".map_meta_equal"
by using coccinelle.  -5 for
	BPF_MAP_TYPE_PROG_ARRAY
	BPF_MAP_TYPE_(PERCPU)_CGROUP_STORAGE
	BPF_MAP_TYPE_STRUCT_OPS
	BPF_MAP_TYPE_ARRAY_OF_MAPS
	BPF_MAP_TYPE_HASH_OF_MAPS

The "if (inner_map-&gt;inner_map_meta)" check in bpf_map_meta_alloc()
is moved such that the same error is returned.

[1]: https://lore.kernel.org/bpf/20200522022342.899756-1-kafai@fb.com/

Signed-off-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Link: https://lore.kernel.org/bpf/20200828011806.1970400-1-kafai@fb.com
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net</title>
<updated>2020-07-11T07:46:00Z</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2020-07-11T07:46:00Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=71930d61025e7d0254f3c682cb1b5242e0499cf3'/>
<id>urn:sha1:71930d61025e7d0254f3c682cb1b5242e0499cf3</id>
<content type='text'>
All conflicts seemed rather trivial, with some guidance from
Saeed Mameed on the tc_ct.c one.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf: Remove redundant synchronize_rcu.</title>
<updated>2020-07-01T15:07:13Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@kernel.org</email>
</author>
<published>2020-06-30T04:33:39Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=bba1dc0b55ac462d24ed1228ad49800c238cd6d7'/>
<id>urn:sha1:bba1dc0b55ac462d24ed1228ad49800c238cd6d7</id>
<content type='text'>
bpf_free_used_maps() or close(map_fd) will trigger map_free callback.
bpf_free_used_maps() is called after bpf prog is no longer executing:
bpf_prog_put-&gt;call_rcu-&gt;bpf_prog_free-&gt;bpf_free_used_maps.
Hence there is no need to call synchronize_rcu() to protect map elements.

Note that hash_of_maps and array_of_maps update/delete inner maps via
sys_bpf() that calls maybe_wait_bpf_programs() and synchronize_rcu().

Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Andrii Nakryiko &lt;andriin@fb.com&gt;
Acked-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20200630043343.53195-2-alexei.starovoitov@gmail.com
</content>
</entry>
</feed>
