<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/bpf/cgroup.c, branch v5.3</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.3</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.3'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2019-07-08T15:17:00Z</updated>
<entry>
<title>bpf: cgroup: Fix build error without CONFIG_NET</title>
<updated>2019-07-08T15:17:00Z</updated>
<author>
<name>YueHaibing</name>
<email>yuehaibing@huawei.com</email>
</author>
<published>2019-07-03T08:26:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=6705fea0c799a4efb9a9ce2968a2f7a570e33dc2'/>
<id>urn:sha1:6705fea0c799a4efb9a9ce2968a2f7a570e33dc2</id>
<content type='text'>
If CONFIG_NET is not set and CONFIG_CGROUP_BPF=y,
gcc building fails:

kernel/bpf/cgroup.o: In function `cg_sockopt_func_proto':
cgroup.c:(.text+0x237e): undefined reference to `bpf_sk_storage_get_proto'
cgroup.c:(.text+0x2394): undefined reference to `bpf_sk_storage_delete_proto'
kernel/bpf/cgroup.o: In function `__cgroup_bpf_run_filter_getsockopt':
(.text+0x2a1f): undefined reference to `lock_sock_nested'
(.text+0x2ca2): undefined reference to `release_sock'
kernel/bpf/cgroup.o: In function `__cgroup_bpf_run_filter_setsockopt':
(.text+0x3006): undefined reference to `lock_sock_nested'
(.text+0x32bb): undefined reference to `release_sock'

Reported-by: Hulk Robot &lt;hulkci@huawei.com&gt;
Suggested-by: Stanislav Fomichev &lt;sdf@fomichev.me&gt;
Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")
Signed-off-by: YueHaibing &lt;yuehaibing@huawei.com&gt;
Acked-by: Yonghong Song &lt;yhs@fb.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
</content>
</entry>
<entry>
<title>bpf: implement getsockopt and setsockopt hooks</title>
<updated>2019-06-27T22:25:16Z</updated>
<author>
<name>Stanislav Fomichev</name>
<email>sdf@google.com</email>
</author>
<published>2019-06-27T20:38:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0d01da6afc5402f60325c5da31b22f7d56689b49'/>
<id>urn:sha1:0d01da6afc5402f60325c5da31b22f7d56689b49</id>
<content type='text'>
Implement new BPF_PROG_TYPE_CGROUP_SOCKOPT program type and
BPF_CGROUP_{G,S}ETSOCKOPT cgroup hooks.

BPF_CGROUP_SETSOCKOPT can modify user setsockopt arguments before
passing them down to the kernel or bypass kernel completely.
BPF_CGROUP_GETSOCKOPT can can inspect/modify getsockopt arguments that
kernel returns.
Both hooks reuse existing PTR_TO_PACKET{,_END} infrastructure.

The buffer memory is pre-allocated (because I don't think there is
a precedent for working with __user memory from bpf). This might be
slow to do for each {s,g}etsockopt call, that's why I've added
__cgroup_bpf_prog_array_is_empty that exits early if there is nothing
attached to a cgroup. Note, however, that there is a race between
__cgroup_bpf_prog_array_is_empty and BPF_PROG_RUN_ARRAY where cgroup
program layout might have changed; this should not be a problem
because in general there is a race between multiple calls to
{s,g}etsocktop and user adding/removing bpf progs from a cgroup.

The return code of the BPF program is handled as follows:
* 0: EPERM
* 1: success, continue with next BPF program in the cgroup chain

v9:
* allow overwriting setsockopt arguments (Alexei Starovoitov):
  * use set_fs (same as kernel_setsockopt)
  * buffer is always kzalloc'd (no small on-stack buffer)

v8:
* use s32 for optlen (Andrii Nakryiko)

v7:
* return only 0 or 1 (Alexei Starovoitov)
* always run all progs (Alexei Starovoitov)
* use optval=0 as kernel bypass in setsockopt (Alexei Starovoitov)
  (decided to use optval=-1 instead, optval=0 might be a valid input)
* call getsockopt hook after kernel handlers (Alexei Starovoitov)

v6:
* rework cgroup chaining; stop as soon as bpf program returns
  0 or 2; see patch with the documentation for the details
* drop Andrii's and Martin's Acked-by (not sure they are comfortable
  with the new state of things)

v5:
* skip copy_to_user() and put_user() when ret == 0 (Martin Lau)

v4:
* don't export bpf_sk_fullsock helper (Martin Lau)
* size != sizeof(__u64) for uapi pointers (Martin Lau)
* offsetof instead of bpf_ctx_range when checking ctx access (Martin Lau)

v3:
* typos in BPF_PROG_CGROUP_SOCKOPT_RUN_ARRAY comments (Andrii Nakryiko)
* reverse christmas tree in BPF_PROG_CGROUP_SOCKOPT_RUN_ARRAY (Andrii
  Nakryiko)
* use __bpf_md_ptr instead of __u32 for optval{,_end} (Martin Lau)
* use BPF_FIELD_SIZEOF() for consistency (Martin Lau)
* new CG_SOCKOPT_ACCESS macro to wrap repeated parts

v2:
* moved bpf_sockopt_kern fields around to remove a hole (Martin Lau)
* aligned bpf_sockopt_kern-&gt;buf to 8 bytes (Martin Lau)
* bpf_prog_array_is_empty instead of bpf_prog_array_length (Martin Lau)
* added [0,2] return code check to verifier (Martin Lau)
* dropped unused buf[64] from the stack (Martin Lau)
* use PTR_TO_SOCKET for bpf_sockopt-&gt;sk (Martin Lau)
* dropped bpf_target_off from ctx rewrites (Martin Lau)
* use return code for kernel bypass (Martin Lau &amp; Andrii Nakryiko)

Cc: Andrii Nakryiko &lt;andriin@fb.com&gt;
Cc: Martin Lau &lt;kafai@fb.com&gt;
Signed-off-by: Stanislav Fomichev &lt;sdf@google.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: fix cgroup bpf release synchronization</title>
<updated>2019-06-27T20:51:58Z</updated>
<author>
<name>Roman Gushchin</name>
<email>guro@fb.com</email>
</author>
<published>2019-06-25T21:38:58Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e5c891a349d7c556b7b9dc231d6dd78e88a29e5c'/>
<id>urn:sha1:e5c891a349d7c556b7b9dc231d6dd78e88a29e5c</id>
<content type='text'>
Since commit 4bfc0bb2c60e ("bpf: decouple the lifetime of cgroup_bpf
from cgroup itself"), cgroup_bpf release occurs asynchronously
(from a worker context), and before the release of the cgroup itself.

This introduced a previously non-existing race between the release
and update paths. E.g. if a leaf's cgroup_bpf is released and a new
bpf program is attached to the one of ancestor cgroups at the same
time. The race may result in double-free and other memory corruptions.

To fix the problem, let's protect the body of cgroup_bpf_release()
with cgroup_mutex, as it was effectively previously, when all this
code was called from the cgroup release path with cgroup mutex held.

Also let's skip cgroups, which have no chances to invoke a bpf
program, on the update path. If the cgroup bpf refcnt reached 0,
it means that the cgroup is offline (no attached processes), and
there are no associated sockets left. It means there is no point
in updating effective progs array! And it can lead to a leak,
if it happens after the release. So, let's skip such cgroups.

Big thanks for Tejun Heo for discovering and debugging of this problem!

Fixes: 4bfc0bb2c60e ("bpf: decouple the lifetime of cgroup_bpf from cgroup itself")
Reported-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Roman Gushchin &lt;guro@fb.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net</title>
<updated>2019-06-22T12:59:24Z</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2019-06-22T12:59:24Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=92ad6325cb891bb455487bfe90cc47d18aa6ec37'/>
<id>urn:sha1:92ad6325cb891bb455487bfe90cc47d18aa6ec37</id>
<content type='text'>
Minor SPDX change conflict.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 451</title>
<updated>2019-06-19T15:09:08Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2019-06-04T08:10:45Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f85d208658468b1a298f31daddb05a7684969cd4'/>
<id>urn:sha1:f85d208658468b1a298f31daddb05a7684969cd4</id>
<content type='text'>
Based on 1 normalized pattern(s):

  this file is subject to the terms and conditions of version 2 of the
  gnu general public license see the file copying in the main
  directory of the linux distribution for more details

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 5 file(s).

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Enrico Weigelt &lt;info@metux.net&gt;
Reviewed-by: Allison Randal &lt;allison@lohutok.net&gt;
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190604081200.872755311@linutronix.de
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>bpf: Update __cgroup_bpf_run_filter_skb with cn</title>
<updated>2019-05-31T23:41:29Z</updated>
<author>
<name>brakmo</name>
<email>brakmo@fb.com</email>
</author>
<published>2019-05-28T23:59:37Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e7a3160d092aa4dd7fb2ef23335cb3a98400aec6'/>
<id>urn:sha1:e7a3160d092aa4dd7fb2ef23335cb3a98400aec6</id>
<content type='text'>
For egress packets, __cgroup_bpf_fun_filter_skb() will now call
BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY() instead of PROG_CGROUP_RUN_ARRAY()
in order to propagate congestion notifications (cn) requests to TCP
callers.

For egress packets, this function can return:
   NET_XMIT_SUCCESS    (0)    - continue with packet output
   NET_XMIT_DROP       (1)    - drop packet and notify TCP to call cwr
   NET_XMIT_CN         (2)    - continue with packet output and notify TCP
                                to call cwr
   -EPERM                     - drop packet

For ingress packets, this function will return -EPERM if any attached
program was found and if it returned != 1 during execution. Otherwise 0
is returned.

Signed-off-by: Lawrence Brakmo &lt;brakmo@fb.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: cgroup: properly use bpf_prog_array api</title>
<updated>2019-05-29T13:17:35Z</updated>
<author>
<name>Stanislav Fomichev</name>
<email>sdf@google.com</email>
</author>
<published>2019-05-28T21:14:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=dbcc1ba26e43bd32cb308e50ac4cb4a29d2f5967'/>
<id>urn:sha1:dbcc1ba26e43bd32cb308e50ac4cb4a29d2f5967</id>
<content type='text'>
Now that we don't have __rcu markers on the bpf_prog_array helpers,
let's use proper rcu_dereference_protected to obtain array pointer
under mutex.

We also don't need __rcu annotations on cgroup_bpf.inactive since
it's not read/updated concurrently.

v4:
* drop cgroup_rcu_xyz wrappers and use rcu APIs directly; presumably
  should be more clear to understand which mutex/refcount protects
  each particular place

v3:
* amend cgroup_rcu_dereference to include percpu_ref_is_dying;
  cgroup_bpf is now reference counted and we don't hold cgroup_mutex
  anymore in cgroup_bpf_release

v2:
* replace xchg with rcu_swap_protected

Cc: Roman Gushchin &lt;guro@fb.com&gt;
Signed-off-by: Stanislav Fomichev &lt;sdf@google.com&gt;
Acked-by: Roman Gushchin &lt;guro@fb.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
</content>
</entry>
<entry>
<title>bpf: decouple the lifetime of cgroup_bpf from cgroup itself</title>
<updated>2019-05-28T16:30:02Z</updated>
<author>
<name>Roman Gushchin</name>
<email>guro@fb.com</email>
</author>
<published>2019-05-25T16:37:39Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4bfc0bb2c60e2f4cc8eb60f03cf8dfa72336272a'/>
<id>urn:sha1:4bfc0bb2c60e2f4cc8eb60f03cf8dfa72336272a</id>
<content type='text'>
Currently the lifetime of bpf programs attached to a cgroup is bound
to the lifetime of the cgroup itself. It means that if a user
forgets (or intentionally avoids) to detach a bpf program before
removing the cgroup, it will stay attached up to the release of the
cgroup. Since the cgroup can stay in the dying state (the state
between being rmdir()'ed and being released) for a very long time, it
leads to a waste of memory. Also, it blocks a possibility to implement
the memcg-based memory accounting for bpf objects, because a circular
reference dependency will occur. Charged memory pages are pinning the
corresponding memory cgroup, and if the memory cgroup is pinning
the attached bpf program, nothing will be ever released.

A dying cgroup can not contain any processes, so the only chance for
an attached bpf program to be executed is a live socket associated
with the cgroup. So in order to release all bpf data early, let's
count associated sockets using a new percpu refcounter. On cgroup
removal the counter is transitioned to the atomic mode, and as soon
as it reaches 0, all bpf programs are detached.

Because cgroup_bpf_release() can block, it can't be called from
the percpu ref counter callback directly, so instead an asynchronous
work is scheduled.

The reference counter is not socket specific, and can be used for any
other types of programs, which can be executed from a cgroup-bpf hook
outside of the process context, had such a need arise in the future.

Signed-off-by: Roman Gushchin &lt;guro@fb.com&gt;
Cc: jolsa@redhat.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: add map helper functions push, pop, peek in more BPF programs</title>
<updated>2019-04-16T08:24:02Z</updated>
<author>
<name>Alban Crequy</name>
<email>alban@kinvolk.io</email>
</author>
<published>2019-04-14T16:58:46Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=02a8c817a31606b6b37c2b755f6569903f44241e'/>
<id>urn:sha1:02a8c817a31606b6b37c2b755f6569903f44241e</id>
<content type='text'>
commit f1a2e44a3aec ("bpf: add queue and stack maps") introduced new BPF
helper functions:
- BPF_FUNC_map_push_elem
- BPF_FUNC_map_pop_elem
- BPF_FUNC_map_peek_elem

but they were made available only for network BPF programs. This patch
makes them available for tracepoint, cgroup and lirc programs.

Signed-off-by: Alban Crequy &lt;alban@kinvolk.io&gt;
Cc: Mauricio Vasquez B &lt;mauricio.vasquez@polito.it&gt;
Acked-by: Song Liu &lt;songliubraving@fb.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
</content>
</entry>
<entry>
<title>bpf: Fix distinct pointer types warning for ARCH=i386</title>
<updated>2019-04-12T23:11:55Z</updated>
<author>
<name>Andrey Ignatov</name>
<email>rdna@fb.com</email>
</author>
<published>2019-04-12T23:01:01Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=51356ac89b5a15e5207e8740d5f4f8b71cb7332f'/>
<id>urn:sha1:51356ac89b5a15e5207e8740d5f4f8b71cb7332f</id>
<content type='text'>
Fix a new warning reported by kbuild for make ARCH=i386:

   In file included from kernel/bpf/cgroup.c:11:0:
   kernel/bpf/cgroup.c: In function '__cgroup_bpf_run_filter_sysctl':
   include/linux/kernel.h:827:29: warning: comparison of distinct pointer types lacks a cast
      (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
                                ^
   include/linux/kernel.h:841:4: note: in expansion of macro '__typecheck'
      (__typecheck(x, y) &amp;&amp; __no_side_effects(x, y))
       ^~~~~~~~~~~
   include/linux/kernel.h:851:24: note: in expansion of macro '__safe_cmp'
     __builtin_choose_expr(__safe_cmp(x, y), \
                           ^~~~~~~~~~
   include/linux/kernel.h:860:19: note: in expansion of macro '__careful_cmp'
    #define min(x, y) __careful_cmp(x, y, &lt;)
                      ^~~~~~~~~~~~~
&gt;&gt; kernel/bpf/cgroup.c:837:17: note: in expansion of macro 'min'
      ctx.new_len = min(PAGE_SIZE, *pcount);
                    ^~~

Fixes: 4e63acdff864 ("bpf: Introduce bpf_sysctl_{get,set}_new_value helpers")
Signed-off-by: Andrey Ignatov &lt;rdna@fb.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
</feed>
