<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/bpf/core.c, branch v4.19</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v4.19</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v4.19'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2018-08-16T19:55:32Z</updated>
<entry>
<title>bpf: fix a rcu usage warning in bpf_prog_array_copy_core()</title>
<updated>2018-08-16T19:55:32Z</updated>
<author>
<name>Yonghong Song</name>
<email>yhs@fb.com</email>
</author>
<published>2018-08-14T18:01:12Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=965931e3a803a506482616f89239eff6901c17b8'/>
<id>urn:sha1:965931e3a803a506482616f89239eff6901c17b8</id>
<content type='text'>
Commit 394e40a29788 ("bpf: extend bpf_prog_array to store pointers
to the cgroup storage") refactored the bpf_prog_array_copy_core()
to accommodate the new structure bpf_prog_array_item, which contains
the bpf_prog pointer itself.

In the old code, we had
   perf_event_query_prog_array():
     mutex_lock(...)
     bpf_prog_array_copy_call():
       prog = rcu_dereference_check(array, 1)-&gt;progs
       bpf_prog_array_copy_core(prog, ...)
     mutex_unlock(...)

With the above commit, we had
   perf_event_query_prog_array():
     mutex_lock(...)
     bpf_prog_array_copy_call():
       bpf_prog_array_copy_core(array, ...):
         item = rcu_dereference(array)-&gt;items;
         ...
     mutex_unlock(...)

The new code will trigger a lockdep rcu checking warning.
The fix is to change rcu_dereference() to rcu_dereference_check()
to prevent such a warning.

Reported-by: syzbot+6e72317008eef84a216b@syzkaller.appspotmail.com
Fixes: 394e40a29788 ("bpf: extend bpf_prog_array to store pointers to the cgroup storage")
Cc: Roman Gushchin &lt;guro@fb.com&gt;
Signed-off-by: Yonghong Song &lt;yhs@fb.com&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Roman Gushchin &lt;guro@fb.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
</content>
</entry>
<entry>
<title>bpf: introduce the bpf_get_local_storage() helper function</title>
<updated>2018-08-02T22:47:32Z</updated>
<author>
<name>Roman Gushchin</name>
<email>guro@fb.com</email>
</author>
<published>2018-08-02T21:27:24Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=cd3394317653837e2eb5c5d0904a8996102af9fc'/>
<id>urn:sha1:cd3394317653837e2eb5c5d0904a8996102af9fc</id>
<content type='text'>
The bpf_get_local_storage() helper function is used
to get a pointer to the bpf local storage from a bpf program.

It takes a pointer to a storage map and flags as arguments.
Right now it accepts only cgroup storage maps, and the flags
argument has to be 0. In the future it can be extended to support
other types of local storage, e.g. thread local storage.

Signed-off-by: Roman Gushchin &lt;guro@fb.com&gt;
Cc: Alexei Starovoitov &lt;ast@kernel.org&gt;
Cc: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
</content>
</entry>
<entry>
<title>bpf: extend bpf_prog_array to store pointers to the cgroup storage</title>
<updated>2018-08-02T22:47:32Z</updated>
<author>
<name>Roman Gushchin</name>
<email>guro@fb.com</email>
</author>
<published>2018-08-02T21:27:21Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=394e40a29788820c9c0526b1c3497c9e0ec2a126'/>
<id>urn:sha1:394e40a29788820c9c0526b1c3497c9e0ec2a126</id>
<content type='text'>
This patch converts bpf_prog_array from an array of prog pointers
to the array of struct bpf_prog_array_item elements.

This makes it possible to efficiently store a cgroup storage pointer
for each bpf program attached to a cgroup.
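
<![CDATA[As a rough user-space illustration of the new layout (not the kernel
definition: the stand-in types and prog_array_length() are assumptions
made for this sketch, as is the NULL-terminated array convention), each
slot pairs a program pointer with its cgroup storage pointer:

```c
#include <stddef.h>

/* Minimal stand-ins for the kernel types (hypothetical, for illustration). */
struct bpf_prog { int id; };
struct bpf_cgroup_storage { int dummy; };

/* One array slot: the program plus the cgroup storage attached with it. */
struct bpf_prog_array_item {
	struct bpf_prog *prog;
	struct bpf_cgroup_storage *cgroup_storage;
};

/* Count slots up to the terminating entry whose prog pointer is NULL. */
static size_t prog_array_length(const struct bpf_prog_array_item *items)
{
	size_t cnt = 0;

	while (items[cnt].prog)
		cnt++;
	return cnt;
}
```

Keeping the storage pointer next to the prog pointer means a walker that
already visits each slot gets the storage without a separate lookup.]]>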

Signed-off-by: Roman Gushchin &lt;guro@fb.com&gt;
Cc: Alexei Starovoitov &lt;ast@kernel.org&gt;
Cc: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
</content>
</entry>
<entry>
<title>bpf: bpf_prog_array_alloc() should return a generic non-rcu pointer</title>
<updated>2018-07-18T13:01:20Z</updated>
<author>
<name>Roman Gushchin</name>
<email>guro@fb.com</email>
</author>
<published>2018-07-13T19:41:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d29ab6e1fa21ebc2a8a771015dd9e0e5d4e28dc1'/>
<id>urn:sha1:d29ab6e1fa21ebc2a8a771015dd9e0e5d4e28dc1</id>
<content type='text'>
Currently the return type of the bpf_prog_array_alloc() is
struct bpf_prog_array __rcu *, which is not quite correct.
Obviously, the returned pointer is a generic pointer, which
is valid for an indefinite amount of time and it's not shared
with anyone else, so there is no sense in marking it as __rcu.

This change eliminates the following sparse warnings:
kernel/bpf/core.c:1544:31: warning: incorrect type in return expression (different address spaces)
kernel/bpf/core.c:1544:31:    expected struct bpf_prog_array [noderef] &lt;asn:4&gt;*
kernel/bpf/core.c:1544:31:    got void *
kernel/bpf/core.c:1548:17: warning: incorrect type in return expression (different address spaces)
kernel/bpf/core.c:1548:17:    expected struct bpf_prog_array [noderef] &lt;asn:4&gt;*
kernel/bpf/core.c:1548:17:    got struct bpf_prog_array *&lt;noident&gt;
kernel/bpf/core.c:1681:15: warning: incorrect type in assignment (different address spaces)
kernel/bpf/core.c:1681:15:    expected struct bpf_prog_array *array
kernel/bpf/core.c:1681:15:    got struct bpf_prog_array [noderef] &lt;asn:4&gt;*

Fixes: 324bda9e6c5a ("bpf: multi program support for cgroup+bpf")
Signed-off-by: Roman Gushchin &lt;guro@fb.com&gt;
Cc: Alexei Starovoitov &lt;ast@kernel.org&gt;
Cc: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
</content>
</entry>
<entry>
<title>bpf: undo prog rejection on read-only lock failure</title>
<updated>2018-06-29T17:47:35Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2018-06-28T21:34:59Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=85782e037f8aba8922dadb24a1523ca0b82ab8bc'/>
<id>urn:sha1:85782e037f8aba8922dadb24a1523ca0b82ab8bc</id>
<content type='text'>
Partially undo commit 9facc336876f ("bpf: reject any prog that failed
read-only lock") since it caused a regression: syzkaller managed
to cause a panic via fault injection deep in the set_memory_ro()
path by letting an allocation fail: In x86's __change_page_attr_set_clr()
it was able to change the attributes of the primary mapping but not in
the alias mapping via cpa_process_alias(), so the second, inner call
to the __change_page_attr() via __change_page_attr_set_clr() had to split
a larger page and failed in the alloc_pages() with the artificially triggered
allocation error which is then propagated down to the call site.

Thus, for set_memory_ro() this means that it returned with an error, but
from debugging a probe_kernel_write() revealed EFAULT on that memory since
the primary mapping succeeded to get changed. Therefore the subsequent
hdr-&gt;locked = 0 reset triggered the panic as it was performed on read-only
memory, so call-site assumptions were in fact wrong to assume that it would
either succeed /or/ not succeed at all since there's no such rollback in
set_memory_*() calls from partial change of mappings, in other words, we're
left in a state that is "half done". A later undo via set_memory_rw() is
succeeding though due to matching permissions on that part (aka due to the
try_preserve_large_page() succeeding). While reproducing locally with
explicitly triggering this error, the initial splitting only happens on
rare occasions and in real world it would additionally need oom conditions,
but that said, it could partially fail. Therefore, it is definitely wrong
to bail out on set_memory_ro() error and reject the program with the
set_memory_*() semantics we have today. We shouldn't have gone the extra mile
since no other user in tree today in fact checks for any set_memory_*()
errors, e.g. neither module_enable_ro() / module_disable_ro() for module
RO/NX handling which is mostly default these days nor kprobes core with
alloc_insn_page() / free_insn_page() as examples that could be invoked long
after bootup and original 314beb9bcabf ("x86: bpf_jit_comp: secure bpf jit
against spraying attacks") did neither when it got first introduced to BPF
so "improving" with bailing out was clearly not right when set_memory_*()
cannot handle it today.
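
<![CDATA[The "half done" state can be pictured with a small user-space toy
model. Everything here (struct region, toy_set_memory_ro(), toy_write(),
the fail_alias switch) is hypothetical and only mimics the behaviour
described above; it is not how the x86 code is structured:

```c
#include <errno.h>
#include <stdbool.h>

/* Toy region with two mappings of the same memory (hypothetical model). */
struct region {
	bool primary_ro;
	bool alias_ro;
};

/*
 * Non-atomic permission change: the primary mapping is updated first,
 * then the alias. If the alias step fails (fail_alias stands in for the
 * injected allocation failure), the primary change is NOT rolled back.
 */
static int toy_set_memory_ro(struct region *r, bool fail_alias)
{
	r->primary_ro = true;
	if (fail_alias)
		return -ENOMEM;
	r->alias_ro = true;
	return 0;
}

/* A write through the primary mapping faults once it is read-only. */
static int toy_write(const struct region *r)
{
	return r->primary_ro ? -EFAULT : 0;
}
```

A caller that sees the error and then writes to the region as if nothing
had changed hits -EFAULT in this model, which corresponds to the
hdr->locked = 0 reset described above.]]>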

Kees suggested that if set_memory_*() can fail, we should annotate it with
__must_check, and all callers need to deal with it gracefully given those
set_memory_*() markings aren't "advisory", but they're expected to actually
do what they say. This might be an option worth pursuing in the future
but would at the same time require that set_memory_*() calls from supporting
archs are guaranteed to be "atomic" in that they provide rollback if part
of the range fails, once that happened, the transition from RW -&gt; RO could
be made more robust that way, while subsequent RO -&gt; RW transition /must/
continue guaranteeing to always succeed the undo part.

Reported-by: syzbot+a4eb8c7766952a1ca872@syzkaller.appspotmail.com
Reported-by: syzbot+d866d1925855328eac3b@syzkaller.appspotmail.com
Fixes: 9facc336876f ("bpf: reject any prog that failed read-only lock")
Cc: Laura Abbott &lt;labbott@redhat.com&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: reject any prog that failed read-only lock</title>
<updated>2018-06-15T18:14:25Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2018-06-15T00:30:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9facc336876f7ecf9edba4c67b90426fde4ec898'/>
<id>urn:sha1:9facc336876f7ecf9edba4c67b90426fde4ec898</id>
<content type='text'>
We currently lock any JITed image as read-only via bpf_jit_binary_lock_ro()
as well as the BPF image as read-only through bpf_prog_lock_ro(). In
the case any of these would fail we throw a WARN_ON_ONCE() in order to
yell loudly to the log. Perhaps, to some extent, this may be comparable
to an allocation where __GFP_NOWARN is explicitly not set.

Added via 65869a47f348 ("bpf: improve read-only handling"), this behavior
is slightly different compared to any of the other in-kernel set_memory_ro()
users who do not check the return code of set_memory_ro() and friends /at
all/ (e.g. in the case of module_enable_ro() / module_disable_ro()). Given
that in BPF this is a mandatory hardening step, we want to know whether there
are any issues that would leave both BPF data writable. So it happens
that syzkaller enabled fault injection and it triggered memory allocation
failure deep inside x86's change_page_attr_set_clr() which was triggered
from set_memory_ro().

Now, there are two options: i) leaving everything as is, and ii) reworking
the image locking code in order to have a final checkpoint out of the
central bpf_prog_select_runtime() which probes whether any of the calls
during prog setup weren't successful, and then bailing out with an error.
Option ii) is a better approach since this additional paranoia avoids
altogether leaving any potential W+X pages from BPF side in the system.
Therefore, let's be strict about it, and reject programs on such an unlikely
occasion. While testing I noticed also that one bpf_prog_lock_ro()
call was missing on the outer dummy prog in case of calls, e.g. in the
destructor we call bpf_prog_free_deferred() on the main prog where we
try to bpf_prog_unlock_free() the program, and since we go via
bpf_prog_select_runtime() do that as well.

Reported-by: syzbot+3b889862e65a98317058@syzkaller.appspotmail.com
Reported-by: syzbot+9e762b52dd17e616a7a5@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: fix panic in prog load calls cleanup</title>
<updated>2018-06-15T18:14:25Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2018-06-15T00:30:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7d1982b4e335c1b184406b7566f6041bfe313c35'/>
<id>urn:sha1:7d1982b4e335c1b184406b7566f6041bfe313c35</id>
<content type='text'>
While testing I found that when hitting the error path in bpf_prog_load()
where we jump to free_used_maps and prog contained BPF to BPF calls
that were JITed earlier, then we never clean up the bpf_prog_kallsyms_add()
done under jit_subprogs(). Add proper API to make BPF kallsyms deletion
more clear and fix that.

Fixes: 1c2a088a6626 ("bpf: x64: add JIT support for multi-function programs")
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: implement bpf_get_current_cgroup_id() helper</title>
<updated>2018-06-04T01:22:41Z</updated>
<author>
<name>Yonghong Song</name>
<email>yhs@fb.com</email>
</author>
<published>2018-06-03T22:59:41Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=bf6fa2c893c5237b48569a13fa3c673041430b6c'/>
<id>urn:sha1:bf6fa2c893c5237b48569a13fa3c673041430b6c</id>
<content type='text'>
bpf has been used extensively for tracing. For example, bcc
contains an almost full set of bpf-based tools to trace kernel
and user functions/events. Most tracing tools are currently
either filtered based on pid or system-wide.

Containers have been used quite extensively in industry and
cgroup is often used together to provide resource isolation
and protection. Several processes may run inside the same
container. It is often desirable to get container-level tracing
results as well, e.g. syscall count, function count, I/O
activity, etc.

This patch implements a new helper, bpf_get_current_cgroup_id(),
which will return cgroup id based on the cgroup within which
the current task is running.

A later patch will provide an example to show that
userspace can get the same cgroup id so it could
configure a filter or policy in the bpf program based on
task cgroup id.

The helper is currently implemented for tracing. It can
be added to other program types as well when needed.

Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Yonghong Song &lt;yhs@fb.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: bpf_prog_array_copy() should return -ENOENT if exclude_prog not found</title>
<updated>2018-05-30T10:37:38Z</updated>
<author>
<name>Sean Young</name>
<email>sean@mess.org</email>
</author>
<published>2018-05-27T11:24:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=170a7e3ea0709eae12c8f944b9f33c54fe80c6c1'/>
<id>urn:sha1:170a7e3ea0709eae12c8f944b9f33c54fe80c6c1</id>
<content type='text'>
This makes it possible for bpf prog detach to return -ENOENT.
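
<![CDATA[A minimal user-space sketch of the idea, assuming a plain
NULL-terminated list of program pointers; struct prog and
prog_list_copy() are hypothetical names for this illustration, not the
kernel function or its signature:

```c
#include <errno.h>
#include <stddef.h>

struct prog { int id; };	/* stand-in for struct bpf_prog */

/*
 * Copy the NULL-terminated list src into dst, skipping exclude.
 * dst must have room for as many entries as src, terminator included.
 * Returns 0 on success, or -ENOENT if exclude is non-NULL but is not
 * present in src; dst is left untouched in that case.
 */
static int prog_list_copy(struct prog *const *src, struct prog **dst,
			  const struct prog *exclude)
{
	struct prog *const *p;
	size_t n = 0;

	if (exclude) {
		/* First pass: make sure there is something to remove. */
		for (p = src; *p; p++)
			if (*p == exclude)
				break;
		if (!*p)
			return -ENOENT;
	}

	/* Second pass: copy everything except the excluded program. */
	for (p = src; *p; p++)
		if (*p != exclude)
			dst[n++] = *p;
	dst[n] = NULL;
	return 0;
}
```

With this shape, a detach path can simply propagate the return value, so
asking to detach a program that was never attached reports -ENOENT
instead of silently succeeding.]]>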

Acked-by: Yonghong Song &lt;yhs@fb.com&gt;
Signed-off-by: Sean Young &lt;sean@mess.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net</title>
<updated>2018-05-21T20:01:54Z</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2018-05-21T20:01:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=6f6e434aa267a6030477876d89444fe3a6b7a48d'/>
<id>urn:sha1:6f6e434aa267a6030477876d89444fe3a6b7a48d</id>
<content type='text'>
S390 bpf_jit.S is removed in net-next and had changes in 'net',
since that code isn't used any more, take the removal.

TLS data structures split the TX and RX components in 'net-next',
put the new struct members from the bug fix in 'net' into the RX
part.

The 'net-next' tree had some reworking of how the ERSPAN code works in
the GRE tunneling code, overlapping with a one-line headroom
calculation fix in 'net'.

Overlapping changes in __sock_map_ctx_update_elem(), keep the bits
that read the prog members via READ_ONCE() into local variables
before using them.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
