<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/bpf/cgroup.c, branch v5.4</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.4</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.4'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2019-10-30T19:49:13Z</updated>
<entry>
<title>bpf: Allow narrow loads of bpf_sysctl fields with offset &gt; 0</title>
<updated>2019-10-30T19:49:13Z</updated>
<author>
<name>Ilya Leoshkevich</name>
<email>iii@linux.ibm.com</email>
</author>
<published>2019-10-28T12:29:02Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7541c87c9b7a7e07c84481f37f2c19063b44469b'/>
<id>urn:sha1:7541c87c9b7a7e07c84481f37f2c19063b44469b</id>
<content type='text'>
"ctx:file_pos sysctl:read read ok narrow" works on s390 by accident: it
reads the wrong byte, which happens to have the expected value of 0.
Improve the test by seeking to the 4th byte and expecting 4 instead of
0.

This makes the latent problem apparent: the test attempts to read the
first byte of bpf_sysctl.file_pos, assuming this is the least-significant
byte, which is not the case on big-endian machines: a non-zero offset is
needed.

The point of the test is to verify narrow loads, so we cannot cheat our
way out by simply using BPF_W. The existence of the test means that such
loads have to be supported, most likely because llvm can generate them.
Fix the test by adding a big-endian variant, which uses an offset to
access the least-significant byte of bpf_sysctl.file_pos.

This reveals the final problem: verifier rejects accesses to bpf_sysctl
fields with offset &gt; 0. Such accesses are already allowed for a wide
range of structs: __sk_buff, bpf_sock_addr and sk_msg_md to name a few.
Extend this support to bpf_sysctl by using bpf_ctx_range instead of
offsetof when matching field offsets.

Fixes: 7b146cebe30c ("bpf: Sysctl hook")
Fixes: e1550bfe0de4 ("bpf: Add file_pos field to bpf_sysctl ctx")
Fixes: 9a1027e52535 ("selftests/bpf: Test file_pos field in bpf_sysctl ctx")
Signed-off-by: Ilya Leoshkevich &lt;iii@linux.ibm.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Andrey Ignatov &lt;rdna@fb.com&gt;
Acked-by: Andrii Nakryiko &lt;andriin@fb.com&gt;
Link: https://lore.kernel.org/bpf/20191028122902.9763-1-iii@linux.ibm.com
</content>
</entry>
<entry>
<title>bpf: fix accessing bpf_sysctl.file_pos on s390</title>
<updated>2019-09-16T09:44:05Z</updated>
<author>
<name>Ilya Leoshkevich</name>
<email>iii@linux.ibm.com</email>
</author>
<published>2019-08-16T10:53:00Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d895a0f16fadb26d22ab531c49768f7642ae5c3e'/>
<id>urn:sha1:d895a0f16fadb26d22ab531c49768f7642ae5c3e</id>
<content type='text'>
"ctx:file_pos sysctl:read write ok" fails on s390 with "Read value  !=
nux". This is because verifier rewrites a complete 32-bit
bpf_sysctl.file_pos update to a partial update of the first 32 bits of
64-bit *bpf_sysctl_kern.ppos, which is not correct on big-endian
systems.

Fix by using an offset on big-endian systems.

Ditto for bpf_sysctl.file_pos reads. Currently the test does not detect
a problem there, since it expects to see 0, which it gets with high
probability in error cases, so change it to seek to offset 3 and expect
3 in bpf_sysctl.file_pos.

Fixes: e1550bfe0de4 ("bpf: Add file_pos field to bpf_sysctl ctx")
Signed-off-by: Ilya Leoshkevich &lt;iii@linux.ibm.com&gt;
Acked-by: Yonghong Song &lt;yhs@fb.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Link: https://lore.kernel.org/bpf/20190816105300.49035-1-iii@linux.ibm.com/
</content>
</entry>
<entry>
<title>bpf: always allocate at least 16 bytes for setsockopt hook</title>
<updated>2019-08-01T20:55:52Z</updated>
<author>
<name>Stanislav Fomichev</name>
<email>sdf@google.com</email>
</author>
<published>2019-07-29T21:51:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9babe825da769cfbd7080f95e6bddbe98ce6b95d'/>
<id>urn:sha1:9babe825da769cfbd7080f95e6bddbe98ce6b95d</id>
<content type='text'>
Since we always allocate memory, allocate just a little bit more
for the BPF program in case it need to override user input with
bigger value. The canonical example is TCP_CONGESTION where
input string might be too small to override (nv -&gt; bbr or cubic).

16 bytes are chosen to match the size of TCP_CA_NAME_MAX and can
be extended in the future if needed.

Signed-off-by: Stanislav Fomichev &lt;sdf@google.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: cgroup: Fix build error without CONFIG_NET</title>
<updated>2019-07-08T15:17:00Z</updated>
<author>
<name>YueHaibing</name>
<email>yuehaibing@huawei.com</email>
</author>
<published>2019-07-03T08:26:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=6705fea0c799a4efb9a9ce2968a2f7a570e33dc2'/>
<id>urn:sha1:6705fea0c799a4efb9a9ce2968a2f7a570e33dc2</id>
<content type='text'>
If CONFIG_NET is not set and CONFIG_CGROUP_BPF=y,
gcc building fails:

kernel/bpf/cgroup.o: In function `cg_sockopt_func_proto':
cgroup.c:(.text+0x237e): undefined reference to `bpf_sk_storage_get_proto'
cgroup.c:(.text+0x2394): undefined reference to `bpf_sk_storage_delete_proto'
kernel/bpf/cgroup.o: In function `__cgroup_bpf_run_filter_getsockopt':
(.text+0x2a1f): undefined reference to `lock_sock_nested'
(.text+0x2ca2): undefined reference to `release_sock'
kernel/bpf/cgroup.o: In function `__cgroup_bpf_run_filter_setsockopt':
(.text+0x3006): undefined reference to `lock_sock_nested'
(.text+0x32bb): undefined reference to `release_sock'

Reported-by: Hulk Robot &lt;hulkci@huawei.com&gt;
Suggested-by: Stanislav Fomichev &lt;sdf@fomichev.me&gt;
Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")
Signed-off-by: YueHaibing &lt;yuehaibing@huawei.com&gt;
Acked-by: Yonghong Song &lt;yhs@fb.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
</content>
</entry>
<entry>
<title>bpf: implement getsockopt and setsockopt hooks</title>
<updated>2019-06-27T22:25:16Z</updated>
<author>
<name>Stanislav Fomichev</name>
<email>sdf@google.com</email>
</author>
<published>2019-06-27T20:38:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0d01da6afc5402f60325c5da31b22f7d56689b49'/>
<id>urn:sha1:0d01da6afc5402f60325c5da31b22f7d56689b49</id>
<content type='text'>
Implement new BPF_PROG_TYPE_CGROUP_SOCKOPT program type and
BPF_CGROUP_{G,S}ETSOCKOPT cgroup hooks.

BPF_CGROUP_SETSOCKOPT can modify user setsockopt arguments before
passing them down to the kernel or bypass kernel completely.
BPF_CGROUP_GETSOCKOPT can can inspect/modify getsockopt arguments that
kernel returns.
Both hooks reuse existing PTR_TO_PACKET{,_END} infrastructure.

The buffer memory is pre-allocated (because I don't think there is
a precedent for working with __user memory from bpf). This might be
slow to do for each {s,g}etsockopt call, that's why I've added
__cgroup_bpf_prog_array_is_empty that exits early if there is nothing
attached to a cgroup. Note, however, that there is a race between
__cgroup_bpf_prog_array_is_empty and BPF_PROG_RUN_ARRAY where cgroup
program layout might have changed; this should not be a problem
because in general there is a race between multiple calls to
{s,g}etsocktop and user adding/removing bpf progs from a cgroup.

The return code of the BPF program is handled as follows:
* 0: EPERM
* 1: success, continue with next BPF program in the cgroup chain

v9:
* allow overwriting setsockopt arguments (Alexei Starovoitov):
  * use set_fs (same as kernel_setsockopt)
  * buffer is always kzalloc'd (no small on-stack buffer)

v8:
* use s32 for optlen (Andrii Nakryiko)

v7:
* return only 0 or 1 (Alexei Starovoitov)
* always run all progs (Alexei Starovoitov)
* use optval=0 as kernel bypass in setsockopt (Alexei Starovoitov)
  (decided to use optval=-1 instead, optval=0 might be a valid input)
* call getsockopt hook after kernel handlers (Alexei Starovoitov)

v6:
* rework cgroup chaining; stop as soon as bpf program returns
  0 or 2; see patch with the documentation for the details
* drop Andrii's and Martin's Acked-by (not sure they are comfortable
  with the new state of things)

v5:
* skip copy_to_user() and put_user() when ret == 0 (Martin Lau)

v4:
* don't export bpf_sk_fullsock helper (Martin Lau)
* size != sizeof(__u64) for uapi pointers (Martin Lau)
* offsetof instead of bpf_ctx_range when checking ctx access (Martin Lau)

v3:
* typos in BPF_PROG_CGROUP_SOCKOPT_RUN_ARRAY comments (Andrii Nakryiko)
* reverse christmas tree in BPF_PROG_CGROUP_SOCKOPT_RUN_ARRAY (Andrii
  Nakryiko)
* use __bpf_md_ptr instead of __u32 for optval{,_end} (Martin Lau)
* use BPF_FIELD_SIZEOF() for consistency (Martin Lau)
* new CG_SOCKOPT_ACCESS macro to wrap repeated parts

v2:
* moved bpf_sockopt_kern fields around to remove a hole (Martin Lau)
* aligned bpf_sockopt_kern-&gt;buf to 8 bytes (Martin Lau)
* bpf_prog_array_is_empty instead of bpf_prog_array_length (Martin Lau)
* added [0,2] return code check to verifier (Martin Lau)
* dropped unused buf[64] from the stack (Martin Lau)
* use PTR_TO_SOCKET for bpf_sockopt-&gt;sk (Martin Lau)
* dropped bpf_target_off from ctx rewrites (Martin Lau)
* use return code for kernel bypass (Martin Lau &amp; Andrii Nakryiko)

Cc: Andrii Nakryiko &lt;andriin@fb.com&gt;
Cc: Martin Lau &lt;kafai@fb.com&gt;
Signed-off-by: Stanislav Fomichev &lt;sdf@google.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: fix cgroup bpf release synchronization</title>
<updated>2019-06-27T20:51:58Z</updated>
<author>
<name>Roman Gushchin</name>
<email>guro@fb.com</email>
</author>
<published>2019-06-25T21:38:58Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e5c891a349d7c556b7b9dc231d6dd78e88a29e5c'/>
<id>urn:sha1:e5c891a349d7c556b7b9dc231d6dd78e88a29e5c</id>
<content type='text'>
Since commit 4bfc0bb2c60e ("bpf: decouple the lifetime of cgroup_bpf
from cgroup itself"), cgroup_bpf release occurs asynchronously
(from a worker context), and before the release of the cgroup itself.

This introduced a previously non-existing race between the release
and update paths. E.g. if a leaf's cgroup_bpf is released and a new
bpf program is attached to the one of ancestor cgroups at the same
time. The race may result in double-free and other memory corruptions.

To fix the problem, let's protect the body of cgroup_bpf_release()
with cgroup_mutex, as it was effectively previously, when all this
code was called from the cgroup release path with cgroup mutex held.

Also let's skip cgroups, which have no chances to invoke a bpf
program, on the update path. If the cgroup bpf refcnt reached 0,
it means that the cgroup is offline (no attached processes), and
there are no associated sockets left. It means there is no point
in updating effective progs array! And it can lead to a leak,
if it happens after the release. So, let's skip such cgroups.

Big thanks for Tejun Heo for discovering and debugging of this problem!

Fixes: 4bfc0bb2c60e ("bpf: decouple the lifetime of cgroup_bpf from cgroup itself")
Reported-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Roman Gushchin &lt;guro@fb.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net</title>
<updated>2019-06-22T12:59:24Z</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2019-06-22T12:59:24Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=92ad6325cb891bb455487bfe90cc47d18aa6ec37'/>
<id>urn:sha1:92ad6325cb891bb455487bfe90cc47d18aa6ec37</id>
<content type='text'>
Minor SPDX change conflict.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 451</title>
<updated>2019-06-19T15:09:08Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2019-06-04T08:10:45Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f85d208658468b1a298f31daddb05a7684969cd4'/>
<id>urn:sha1:f85d208658468b1a298f31daddb05a7684969cd4</id>
<content type='text'>
Based on 1 normalized pattern(s):

  this file is subject to the terms and conditions of version 2 of the
  gnu general public license see the file copying in the main
  directory of the linux distribution for more details

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 5 file(s).

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Enrico Weigelt &lt;info@metux.net&gt;
Reviewed-by: Allison Randal &lt;allison@lohutok.net&gt;
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190604081200.872755311@linutronix.de
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>bpf: Update __cgroup_bpf_run_filter_skb with cn</title>
<updated>2019-05-31T23:41:29Z</updated>
<author>
<name>brakmo</name>
<email>brakmo@fb.com</email>
</author>
<published>2019-05-28T23:59:37Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e7a3160d092aa4dd7fb2ef23335cb3a98400aec6'/>
<id>urn:sha1:e7a3160d092aa4dd7fb2ef23335cb3a98400aec6</id>
<content type='text'>
For egress packets, __cgroup_bpf_fun_filter_skb() will now call
BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY() instead of PROG_CGROUP_RUN_ARRAY()
in order to propagate congestion notifications (cn) requests to TCP
callers.

For egress packets, this function can return:
   NET_XMIT_SUCCESS    (0)    - continue with packet output
   NET_XMIT_DROP       (1)    - drop packet and notify TCP to call cwr
   NET_XMIT_CN         (2)    - continue with packet output and notify TCP
                                to call cwr
   -EPERM                     - drop packet

For ingress packets, this function will return -EPERM if any attached
program was found and if it returned != 1 during execution. Otherwise 0
is returned.

Signed-off-by: Lawrence Brakmo &lt;brakmo@fb.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: cgroup: properly use bpf_prog_array api</title>
<updated>2019-05-29T13:17:35Z</updated>
<author>
<name>Stanislav Fomichev</name>
<email>sdf@google.com</email>
</author>
<published>2019-05-28T21:14:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=dbcc1ba26e43bd32cb308e50ac4cb4a29d2f5967'/>
<id>urn:sha1:dbcc1ba26e43bd32cb308e50ac4cb4a29d2f5967</id>
<content type='text'>
Now that we don't have __rcu markers on the bpf_prog_array helpers,
let's use proper rcu_dereference_protected to obtain array pointer
under mutex.

We also don't need __rcu annotations on cgroup_bpf.inactive since
it's not read/updated concurrently.

v4:
* drop cgroup_rcu_xyz wrappers and use rcu APIs directly; presumably
  should be more clear to understand which mutex/refcount protects
  each particular place

v3:
* amend cgroup_rcu_dereference to include percpu_ref_is_dying;
  cgroup_bpf is now reference counted and we don't hold cgroup_mutex
  anymore in cgroup_bpf_release

v2:
* replace xchg with rcu_swap_protected

Cc: Roman Gushchin &lt;guro@fb.com&gt;
Signed-off-by: Stanislav Fomichev &lt;sdf@google.com&gt;
Acked-by: Roman Gushchin &lt;guro@fb.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
</content>
</entry>
</feed>
