<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/bpf, branch v4.20</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v4.20</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v4.20'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2018-12-13T18:35:40Z</updated>
<entry>
<title>bpf: verifier: make sure callees don't prune with caller differences</title>
<updated>2018-12-13T18:35:40Z</updated>
<author>
<name>Jakub Kicinski</name>
<email>jakub.kicinski@netronome.com</email>
</author>
<published>2018-12-13T00:29:07Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7640ead939247e91e84b7ec6ec001f30193cc7df'/>
<id>urn:sha1:7640ead939247e91e84b7ec6ec001f30193cc7df</id>
<content type='text'>
Currently for liveness and state pruning the register parentage
chains don't include states of the callee.  This makes some sense
as the callee can't access those registers.  However, this means
that READs done after the callee returns will not propagate into
the states of the callee.  Callee will then perform pruning
disregarding differences in caller state.

Example:

   0: (85) call bpf_user_rnd_u32
   1: (b7) r8 = 0
   2: (55) if r0 != 0x0 goto pc+1
   3: (b7) r8 = 1
   4: (bf) r1 = r8
   5: (85) call pc+4
   6: (15) if r8 == 0x1 goto pc+1
   7: (05) *(u64 *)(r9 - 8) = r3
   8: (b7) r0 = 0
   9: (95) exit

   10: (15) if r1 == 0x0 goto pc+0
   11: (95) exit

Here we acquire unknown state with call to get_random() [1].  Then
we store this random state in r8 (either 0 or 1) [1 - 3], and make
a call on line 5.  Callee does nothing but a trivial conditional
jump (to create a pruning point).  Upon return caller checks the
state of r8 and either performs an unsafe read or not.

Verifier will first explore the path with r8 == 1, creating a pruning
point at [11].  The parentage chain for r8 will include only callers
states so once verifier reaches [6] it will mark liveness only on states
in the caller, and not [11].  Now when verifier walks the paths with
r8 == 0 it will reach [11] and since REG_LIVE_READ on r8 was not
propagated there it will prune the walk entirely (stop walking
the entire program, not just the callee).  Since [6] was never walked
with r8 == 0, [7] will be considered dead and replaced with "goto -1"
causing hang at runtime.

This patch weaves the callee's explored states onto the callers
parentage chain.  Rough parentage for r8 would have looked like this
before:

[0] [1] [2] [3] [4] [5]   [10]      [11]      [6]      [7]
     |           |      ,---|----.    |        |        |
  sl0:         sl0:    / sl0:     \ sl0:      sl0:     sl0:
  fr0: r8 &lt;-- fr0: r8&lt;+--fr0: r8   `fr0: r8  ,fr0: r8&lt;-fr0: r8
                       \ fr1: r8 &lt;- fr1: r8 /
                        \__________________/

after:

[0] [1] [2] [3] [4] [5]   [10]      [11]      [6]      [7]
     |           |          |         |        |        |
   sl0:         sl0:      sl0:       sl0:      sl0:     sl0:
   fr0: r8 &lt;-- fr0: r8 &lt;- fr0: r8 &lt;- fr0: r8 &lt;-fr0: r8&lt;-fr0: r8
                          fr1: r8 &lt;- fr1: r8

Now the mark from instruction 6 will travel through callees states.

Note that we don't have to connect r0 because its overwritten by
callees state on return and r1 - r5 because those are not alive
any more once a call is made.

v2:
 - don't connect the callees registers twice (Alexei: suggestion &amp; code)
 - add more details to the comment (Ed &amp; Alexei)
v1: don't unnecessarily link caller saved regs (Jiong)

Fixes: f4d7e40a5b71 ("bpf: introduce function calls (verification)")
Reported-by: David Beckett &lt;david.beckett@netronome.com&gt;
Signed-off-by: Jakub Kicinski &lt;jakub.kicinski@netronome.com&gt;
Reviewed-by: Jiong Wang &lt;jiong.wang@netronome.com&gt;
Reviewed-by: Edward Cree &lt;ecree@solarflare.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: fix bpf_jit_limit knob for PAGE_SIZE &gt;= 64K</title>
<updated>2018-12-12T03:12:21Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2018-12-11T11:14:12Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=fdadd04931c2d7cd294dc5b2b342863f94be53a3'/>
<id>urn:sha1:fdadd04931c2d7cd294dc5b2b342863f94be53a3</id>
<content type='text'>
Michael and Sandipan report:

  Commit ede95a63b5 introduced a bpf_jit_limit tuneable to limit BPF
  JIT allocations. At compile time it defaults to PAGE_SIZE * 40000,
  and is adjusted again at init time if MODULES_VADDR is defined.

  For ppc64 kernels, MODULES_VADDR isn't defined, so we're stuck with
  the compile-time default at boot-time, which is 0x9c400000 when
  using 64K page size. This overflows the signed 32-bit bpf_jit_limit
  value:

  root@ubuntu:/tmp# cat /proc/sys/net/core/bpf_jit_limit
  -1673527296

  and can cause various unexpected failures throughout the network
  stack. In one case `strace dhclient eth0` reported:

  setsockopt(5, SOL_SOCKET, SO_ATTACH_FILTER, {len=11, filter=0x105dd27f8},
             16) = -1 ENOTSUPP (Unknown error 524)

  and similar failures can be seen with tools like tcpdump. This doesn't
  always reproduce however, and I'm not sure why. The more consistent
  failure I've seen is an Ubuntu 18.04 KVM guest booted on a POWER9
  host would time out on systemd/netplan configuring a virtio-net NIC
  with no noticeable errors in the logs.

Given this and also given that in near future some architectures like
arm64 will have a custom area for BPF JIT image allocations we should
get rid of the BPF_JIT_LIMIT_DEFAULT fallback / default entirely. For
4.21, we have an overridable bpf_jit_alloc_exec(), bpf_jit_free_exec()
so therefore add another overridable bpf_jit_alloc_exec_limit() helper
function which returns the possible size of the memory area for deriving
the default heuristic in bpf_jit_charge_init().

Like bpf_jit_alloc_exec() and bpf_jit_free_exec(), the new
bpf_jit_alloc_exec_limit() assumes that module_alloc() is the default
JIT memory provider, and therefore in case archs implement their custom
module_alloc() we use MODULES_{END,_VADDR} for limits and otherwise for
vmalloc_exec() cases like on ppc64 we use VMALLOC_{END,_START}.

Additionally, for archs supporting large page sizes, we should change
the sysctl to be handled as long to not run into sysctl restrictions
in future.

Fixes: ede95a63b5e8 ("bpf: add bpf_jit_limit knob to restrict unpriv allocations")
Reported-by: Sandipan Das &lt;sandipan@linux.ibm.com&gt;
Reported-by: Michael Roth &lt;mdroth@linux.vnet.ibm.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Tested-by: Michael Roth &lt;mdroth@linux.vnet.ibm.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: add per-insn complexity limit</title>
<updated>2018-12-04T16:22:02Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@kernel.org</email>
</author>
<published>2018-12-04T06:46:06Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ceefbc96fa5c5b975d87bf8e89ba8416f6b764d9'/>
<id>urn:sha1:ceefbc96fa5c5b975d87bf8e89ba8416f6b764d9</id>
<content type='text'>
malicious bpf program may try to force the verifier to remember
a lot of distinct verifier states.
Put a limit to number of per-insn 'struct bpf_verifier_state'.
Note that hitting the limit doesn't reject the program.
It potentially makes the verifier do more steps to analyze the program.
It means that malicious programs will hit BPF_COMPLEXITY_LIMIT_INSNS sooner
instead of spending cpu time walking long link list.

The limit of BPF_COMPLEXITY_LIMIT_STATES==64 affects cilium progs
with slight increase in number of "steps" it takes to successfully verify
the programs:
                       before    after
bpf_lb-DLB_L3.o         1940      1940
bpf_lb-DLB_L4.o         3089      3089
bpf_lb-DUNKNOWN.o       1065      1065
bpf_lxc-DDROP_ALL.o     28052  |  28162
bpf_lxc-DUNKNOWN.o      35487  |  35541
bpf_netdev.o            10864     10864
bpf_overlay.o           6643      6643
bpf_lcx_jit.o           38437     38437

But it also makes malicious program to be rejected in 0.4 seconds vs 6.5
Hence apply this limit to unprivileged programs only.

Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Edward Cree &lt;ecree@solarflare.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
</content>
</entry>
<entry>
<title>bpf: improve verifier branch analysis</title>
<updated>2018-12-04T16:22:02Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@kernel.org</email>
</author>
<published>2018-12-04T06:46:05Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4f7b3e82589e0de723780198ec7983e427144c0a'/>
<id>urn:sha1:4f7b3e82589e0de723780198ec7983e427144c0a</id>
<content type='text'>
pathological bpf programs may try to force verifier to explode in
the number of branch states:
  20: (d5) if r1 s&lt;= 0x24000028 goto pc+0
  21: (b5) if r0 &lt;= 0xe1fa20 goto pc+2
  22: (d5) if r1 s&lt;= 0x7e goto pc+0
  23: (b5) if r0 &lt;= 0xe880e000 goto pc+0
  24: (c5) if r0 s&lt; 0x2100ecf4 goto pc+0
  25: (d5) if r1 s&lt;= 0xe880e000 goto pc+1
  26: (c5) if r0 s&lt; 0xf4041810 goto pc+0
  27: (d5) if r1 s&lt;= 0x1e007e goto pc+0
  28: (b5) if r0 &lt;= 0xe86be000 goto pc+0
  29: (07) r0 += 16614
  30: (c5) if r0 s&lt; 0x6d0020da goto pc+0
  31: (35) if r0 &gt;= 0x2100ecf4 goto pc+0

Teach verifier to recognize always taken and always not taken branches.
This analysis is already done for == and != comparison.
Expand it to all other branches.

It also helps real bpf programs to be verified faster:
                       before  after
bpf_lb-DLB_L3.o         2003    1940
bpf_lb-DLB_L4.o         3173    3089
bpf_lb-DUNKNOWN.o       1080    1065
bpf_lxc-DDROP_ALL.o     29584   28052
bpf_lxc-DUNKNOWN.o      36916   35487
bpf_netdev.o            11188   10864
bpf_overlay.o           6679    6643
bpf_lcx_jit.o           39555   38437

Reported-by: Anatoly Trosinenko &lt;anatoly.trosinenko@gmail.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Edward Cree &lt;ecree@solarflare.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
</content>
</entry>
<entry>
<title>bpf: check pending signals while verifying programs</title>
<updated>2018-12-04T16:22:02Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@kernel.org</email>
</author>
<published>2018-12-04T06:46:04Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c3494801cd1785e2c25f1a5735fa19ddcf9665da'/>
<id>urn:sha1:c3494801cd1785e2c25f1a5735fa19ddcf9665da</id>
<content type='text'>
Malicious user space may try to force the verifier to use as much cpu
time and memory as possible. Hence check for pending signals
while verifying the program.
Note that suspend of sys_bpf(PROG_LOAD) syscall will lead to EAGAIN,
since the kernel has to release the resources used for program verification.

Reported-by: Anatoly Trosinenko &lt;anatoly.trosinenko@gmail.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Edward Cree &lt;ecree@solarflare.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
</content>
</entry>
<entry>
<title>bpf: btf: check name validity for various types</title>
<updated>2018-11-29T00:03:04Z</updated>
<author>
<name>Yonghong Song</name>
<email>yhs@fb.com</email>
</author>
<published>2018-11-27T21:23:28Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=eb04bbb608e683f8fd3ef7f716e2fa32dd90861f'/>
<id>urn:sha1:eb04bbb608e683f8fd3ef7f716e2fa32dd90861f</id>
<content type='text'>
This patch added name checking for the following types:
 . BTF_KIND_PTR, BTF_KIND_ARRAY, BTF_KIND_VOLATILE,
   BTF_KIND_CONST, BTF_KIND_RESTRICT:
     the name must be null
 . BTF_KIND_STRUCT, BTF_KIND_UNION: the struct/member name
     is either null or a valid identifier
 . BTF_KIND_ENUM: the enum type name is either null or a valid
     identifier; the enumerator name must be a valid identifier.
 . BTF_KIND_FWD: the name must be a valid identifier
 . BTF_KIND_TYPEDEF: the name must be a valid identifier

For those places a valid name is required, the name must be
a valid C identifier. This can be relaxed later if we found
use cases for a different (non-C) frontend.

Fixes: 69b693f0aefa ("bpf: btf: Introduce BPF Type Format (BTF)")
Acked-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Signed-off-by: Yonghong Song &lt;yhs@fb.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: btf: implement btf_name_valid_identifier()</title>
<updated>2018-11-29T00:03:04Z</updated>
<author>
<name>Yonghong Song</name>
<email>yhs@fb.com</email>
</author>
<published>2018-11-27T21:23:27Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=cdbb096adddb3f42584cecb5ec2e07c26815b71f'/>
<id>urn:sha1:cdbb096adddb3f42584cecb5ec2e07c26815b71f</id>
<content type='text'>
Function btf_name_valid_identifier() have been implemented in
bpf-next commit 2667a2626f4d ("bpf: btf: Add BTF_KIND_FUNC and
BTF_KIND_FUNC_PROTO"). Backport this function so later patch
can use it.

Fixes: 69b693f0aefa ("bpf: btf: Introduce BPF Type Format (BTF)")
Signed-off-by: Yonghong Song &lt;yhs@fb.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf, ppc64: generalize fetching subprog into bpf_jit_get_func_addr</title>
<updated>2018-11-27T01:34:24Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2018-11-26T13:05:38Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e2c95a61656d29ceaac97b6a975c8a1f26e26f15'/>
<id>urn:sha1:e2c95a61656d29ceaac97b6a975c8a1f26e26f15</id>
<content type='text'>
Make fetching of the BPF call address from ppc64 JIT generic. ppc64
was using a slightly different variant rather than through the insns'
imm field encoding as the target address would not fit into that space.
Therefore, the target subprog number was encoded into the insns' offset
and fetched through fp-&gt;aux-&gt;func[off]-&gt;bpf_func instead. Given there
are other JITs with this issue and the mechanism of fetching the address
is JIT-generic, move it into the core as a helper instead. On the JIT
side, we get information on whether the retrieved address is a fixed
one, that is, not changing through JIT passes, or a dynamic one. For
the former, JITs can optimize their imm emission because this doesn't
change jump offsets throughout JIT process.

Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Reviewed-by: Sandipan Das &lt;sandipan@linux.ibm.com&gt;
Tested-by: Sandipan Das &lt;sandipan@linux.ibm.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: fix integer overflow in queue_stack_map</title>
<updated>2018-11-22T20:29:40Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@kernel.org</email>
</author>
<published>2018-11-22T18:49:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=813961de3ee6474dd5703e883471fd941d6c8f69'/>
<id>urn:sha1:813961de3ee6474dd5703e883471fd941d6c8f69</id>
<content type='text'>
Fix the following issues:

- allow queue_stack_map for root only
- fix u32 max_entries overflow
- disallow value_size == 0

Fixes: f1a2e44a3aec ("bpf: add queue and stack maps")
Reported-by: Wei Wu &lt;ww9210@gmail.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Cc: Mauricio Vasquez B &lt;mauricio.vasquez@polito.it&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
</content>
</entry>
<entry>
<title>bpf: allocate local storage buffers using GFP_ATOMIC</title>
<updated>2018-11-17T05:17:49Z</updated>
<author>
<name>Roman Gushchin</name>
<email>guroan@gmail.com</email>
</author>
<published>2018-11-14T18:00:34Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=569a933b03f3c48b392fe67c0086b3a6b9306b5a'/>
<id>urn:sha1:569a933b03f3c48b392fe67c0086b3a6b9306b5a</id>
<content type='text'>
Naresh reported an issue with the non-atomic memory allocation of
cgroup local storage buffers:

[   73.047526] BUG: sleeping function called from invalid context at
/srv/oe/build/tmp-rpb-glibc/work-shared/intel-corei7-64/kernel-source/mm/slab.h:421
[   73.060915] in_atomic(): 1, irqs_disabled(): 0, pid: 3157, name: test_cgroup_sto
[   73.068342] INFO: lockdep is turned off.
[   73.072293] CPU: 2 PID: 3157 Comm: test_cgroup_sto Not tainted
4.20.0-rc2-next-20181113 #1
[   73.080548] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS
2.0b 07/27/2017
[   73.088018] Call Trace:
[   73.090463]  dump_stack+0x70/0xa5
[   73.093783]  ___might_sleep+0x152/0x240
[   73.097619]  __might_sleep+0x4a/0x80
[   73.101191]  __kmalloc_node+0x1cf/0x2f0
[   73.105031]  ? cgroup_storage_update_elem+0x46/0x90
[   73.109909]  cgroup_storage_update_elem+0x46/0x90

cgroup_storage_update_elem() (as well as other update map update
callbacks) is called with disabled preemption, so GFP_ATOMIC
allocation should be used: e.g. alloc_htab_elem() in hashtab.c.

Reported-by: Naresh Kamboju &lt;naresh.kamboju@linaro.org&gt;
Tested-by: Naresh Kamboju &lt;naresh.kamboju@linaro.org&gt;
Signed-off-by: Roman Gushchin &lt;guro@fb.com&gt;
Cc: Alexei Starovoitov &lt;ast@kernel.org&gt;
Cc: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
</feed>
