<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/bpf, branch v5.3</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.3</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.3'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2019-09-05T12:06:58Z</updated>
<entry>
<title>bpf: fix precision tracking of stack slots</title>
<updated>2019-09-05T12:06:58Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@kernel.org</email>
</author>
<published>2019-09-03T22:16:17Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2339cd6cd0b5401fa3fe886bf1c0cb8822041957'/>
<id>urn:sha1:2339cd6cd0b5401fa3fe886bf1c0cb8822041957</id>
<content type='text'>
The problem can be seen in the following two tests:
0: (bf) r3 = r10
1: (55) if r3 != 0x7b goto pc+0
2: (7a) *(u64 *)(r3 -8) = 0
3: (79) r4 = *(u64 *)(r10 -8)
..
0: (85) call bpf_get_prandom_u32#7
1: (bf) r3 = r10
2: (55) if r3 != 0x7b goto pc+0
3: (7b) *(u64 *)(r3 -8) = r0
4: (79) r4 = *(u64 *)(r10 -8)

When backtracking need to mark R4 it will mark slot fp-8.
But ST or STX into fp-8 could belong to the same block of instructions.
When backtracing is done the parent state may have fp-8 slot
as "unallocated stack". Which will cause verifier to warn
and incorrectly reject such programs.

Writes into stack via non-R10 register are rare. llvm always
generates canonical stack spill/fill.
For such pathological case fall back to conservative precision
tracking instead of rejecting.

Reported-by: syzbot+c8d66267fd2b5955287e@syzkaller.appspotmail.com
Fixes: b5dc0163d8fd ("bpf: precise scalar_value tracking")
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
</content>
</entry>
<entry>
<title>bpf: handle 32-bit zext during constant blinding</title>
<updated>2019-08-26T21:05:01Z</updated>
<author>
<name>Naveen N. Rao</name>
<email>naveen.n.rao@linux.vnet.ibm.com</email>
</author>
<published>2019-08-21T19:23:58Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ede7c460b1da5be7b8ef4efe47f1687babf06408'/>
<id>urn:sha1:ede7c460b1da5be7b8ef4efe47f1687babf06408</id>
<content type='text'>
Since BPF constant blinding is performed after the verifier pass, the
ALU32 instructions inserted for doubleword immediate loads don't have a
corresponding zext instruction. This is causing a kernel oops on powerpc
and can be reproduced by running 'test_cgroup_storage' with
bpf_jit_harden=2.

Fix this by emitting BPF_ZEXT during constant blinding if
prog-&gt;aux-&gt;verifier_zext is set.

Fixes: a4b1d3c1ddf6cb ("bpf: verifier: insert zero extension according to analysis result")
Reported-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Signed-off-by: Naveen N. Rao &lt;naveen.n.rao@linux.vnet.ibm.com&gt;
Reviewed-by: Jiong Wang &lt;jiong.wang@netronome.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
</content>
</entry>
<entry>
<title>bpf: fix use after free in prog symbol exposure</title>
<updated>2019-08-23T23:17:47Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2019-08-23T20:14:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c751798aa224fadc5124b49eeb38fb468c0fa039'/>
<id>urn:sha1:c751798aa224fadc5124b49eeb38fb468c0fa039</id>
<content type='text'>
syzkaller managed to trigger the warning in bpf_jit_free() which checks via
bpf_prog_kallsyms_verify_off() for potentially unlinked JITed BPF progs
in kallsyms, and subsequently trips over GPF when walking kallsyms entries:

  [...]
  8021q: adding VLAN 0 to HW filter on device batadv0
  8021q: adding VLAN 0 to HW filter on device batadv0
  WARNING: CPU: 0 PID: 9869 at kernel/bpf/core.c:810 bpf_jit_free+0x1e8/0x2a0
  Kernel panic - not syncing: panic_on_warn set ...
  CPU: 0 PID: 9869 Comm: kworker/0:7 Not tainted 5.0.0-rc8+ #1
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  Workqueue: events bpf_prog_free_deferred
  Call Trace:
   __dump_stack lib/dump_stack.c:77 [inline]
   dump_stack+0x113/0x167 lib/dump_stack.c:113
   panic+0x212/0x40b kernel/panic.c:214
   __warn.cold.8+0x1b/0x38 kernel/panic.c:571
   report_bug+0x1a4/0x200 lib/bug.c:186
   fixup_bug arch/x86/kernel/traps.c:178 [inline]
   do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
   do_invalid_op+0x36/0x40 arch/x86/kernel/traps.c:290
   invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:973
  RIP: 0010:bpf_jit_free+0x1e8/0x2a0
  Code: 02 4c 89 e2 83 e2 07 38 d0 7f 08 84 c0 0f 85 86 00 00 00 48 ba 00 02 00 00 00 00 ad de 0f b6 43 02 49 39 d6 0f 84 5f fe ff ff &lt;0f&gt; 0b e9 58 fe ff ff 48 b8 00 00 00 00 00 fc ff df 4c 89 e2 48 c1
  RSP: 0018:ffff888092f67cd8 EFLAGS: 00010202
  RAX: 0000000000000007 RBX: ffffc90001947000 RCX: ffffffff816e9d88
  RDX: dead000000000200 RSI: 0000000000000008 RDI: ffff88808769f7f0
  RBP: ffff888092f67d00 R08: fffffbfff1394059 R09: fffffbfff1394058
  R10: fffffbfff1394058 R11: ffffffff89ca02c7 R12: ffffc90001947002
  R13: ffffc90001947020 R14: ffffffff881eca80 R15: ffff88808769f7e8
  BUG: unable to handle kernel paging request at fffffbfff400d000
  #PF error: [normal kernel read fault]
  PGD 21ffee067 P4D 21ffee067 PUD 21ffed067 PMD 9f942067 PTE 0
  Oops: 0000 [#1] PREEMPT SMP KASAN
  CPU: 0 PID: 9869 Comm: kworker/0:7 Not tainted 5.0.0-rc8+ #1
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  Workqueue: events bpf_prog_free_deferred
  RIP: 0010:bpf_get_prog_addr_region kernel/bpf/core.c:495 [inline]
  RIP: 0010:bpf_tree_comp kernel/bpf/core.c:558 [inline]
  RIP: 0010:__lt_find include/linux/rbtree_latch.h:115 [inline]
  RIP: 0010:latch_tree_find include/linux/rbtree_latch.h:208 [inline]
  RIP: 0010:bpf_prog_kallsyms_find+0x107/0x2e0 kernel/bpf/core.c:632
  Code: 00 f0 ff ff 44 38 c8 7f 08 84 c0 0f 85 fa 00 00 00 41 f6 45 02 01 75 02 0f 0b 48 39 da 0f 82 92 00 00 00 48 89 d8 48 c1 e8 03 &lt;42&gt; 0f b6 04 30 84 c0 74 08 3c 03 0f 8e 45 01 00 00 8b 03 48 c1 e0
  [...]

Upon further debugging, it turns out that whenever we trigger this
issue, the kallsyms removal in bpf_prog_ksym_node_del() was /skipped/
but yet bpf_jit_free() reported that the entry is /in use/.

Problem is that symbol exposure via bpf_prog_kallsyms_add() but also
perf_event_bpf_event() were done /after/ bpf_prog_new_fd(). Once the
fd is exposed to the public, a parallel close request came in right
before we attempted to do the bpf_prog_kallsyms_add().

Given at this time the prog reference count is one, we start to rip
everything underneath us via bpf_prog_release() -&gt; bpf_prog_put().
The memory is eventually released via deferred free, so we're seeing
that bpf_jit_free() has a kallsym entry because we added it from
bpf_prog_load() but /after/ bpf_prog_put() from the remote CPU.

Therefore, move both notifications /before/ we install the fd. The
issue was never seen between bpf_prog_alloc_id() and bpf_prog_new_fd()
because upon bpf_prog_get_fd_by_id() we'll take another reference to
the BPF prog, so we're still holding the original reference from the
bpf_prog_load().

Fixes: 6ee52e2a3fe4 ("perf, bpf: Introduce PERF_RECORD_BPF_EVENT")
Fixes: 74451e66d516 ("bpf: make jited programs visible in traces")
Reported-by: syzbot+bd3bba6ff3fcea7a6ec6@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Cc: Song Liu &lt;songliubraving@fb.com&gt;
</content>
</entry>
<entry>
<title>bpf: fix precision tracking in presence of bpf2bpf calls</title>
<updated>2019-08-23T23:17:12Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@kernel.org</email>
</author>
<published>2019-08-21T21:07:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=6754172c208d9d3dae208c6494611ac167d56688'/>
<id>urn:sha1:6754172c208d9d3dae208c6494611ac167d56688</id>
<content type='text'>
While adding extra tests for precision tracking and extra infra
to adjust verifier heuristics the existing test
"calls: cross frame pruning - liveness propagation" started to fail.
The root cause is the same as described in verifer.c comment:

 * Also if parent's curframe &gt; frame where backtracking started,
 * the verifier need to mark registers in both frames, otherwise callees
 * may incorrectly prune callers. This is similar to
 * commit 7640ead93924 ("bpf: verifier: make sure callees don't prune with caller differences")
 * For now backtracking falls back into conservative marking.

Turned out though that returning -ENOTSUPP from backtrack_insn() and
doing mark_all_scalars_precise() in the current parentage chain is not enough.
Depending on how is_state_visited() heuristic is creating parentage chain
it's possible that callee will incorrectly prune caller.
Fix the issue by setting precise=true earlier and more aggressively.
Before this fix the precision tracking _within_ functions that don't do
bpf2bpf calls would still work. Whereas now precision tracking is completely
disabled when bpf2bpf calls are present anywhere in the program.

No difference in cilium tests (they don't have bpf2bpf calls).
No difference in test_progs though some of them have bpf2bpf calls,
but precision tracking wasn't effective there.

Fixes: b5dc0163d8fd ("bpf: precise scalar_value tracking")
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf</title>
<updated>2019-07-26T00:35:03Z</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2019-07-26T00:35:03Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=28ba934d28cf46c1ea92f6aff28e8d9ec52e194e'/>
<id>urn:sha1:28ba934d28cf46c1ea92f6aff28e8d9ec52e194e</id>
<content type='text'>
Alexei Starovoitov says:

====================
pull-request: bpf 2019-07-25

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) fix segfault in libbpf, from Andrii.

2) fix gso_segs access, from Eric.

3) tls/sockmap fixes, from Jakub and John.
====================

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf: fix narrower loads on s390</title>
<updated>2019-07-23T20:59:33Z</updated>
<author>
<name>Ilya Leoshkevich</name>
<email>iii@linux.ibm.com</email>
</author>
<published>2019-07-19T09:18:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d9b8aadaffa65809d146cf0f8632a22a946367d7'/>
<id>urn:sha1:d9b8aadaffa65809d146cf0f8632a22a946367d7</id>
<content type='text'>
The very first check in test_pkt_md_access is failing on s390, which
happens because loading a part of a struct __sk_buff field produces
an incorrect result.

The preprocessed code of the check is:

{
	__u8 tmp = *((volatile __u8 *)&amp;skb-&gt;len +
		((sizeof(skb-&gt;len) - sizeof(__u8)) / sizeof(__u8)));
	if (tmp != ((*(volatile __u32 *)&amp;skb-&gt;len) &amp; 0xFF)) return 2;
};

clang generates the following code for it:

      0:	71 21 00 03 00 00 00 00	r2 = *(u8 *)(r1 + 3)
      1:	61 31 00 00 00 00 00 00	r3 = *(u32 *)(r1 + 0)
      2:	57 30 00 00 00 00 00 ff	r3 &amp;= 255
      3:	5d 23 00 1d 00 00 00 00	if r2 != r3 goto +29 &lt;LBB0_10&gt;

Finally, verifier transforms it to:

  0: (61) r2 = *(u32 *)(r1 +104)
  1: (bc) w2 = w2
  2: (74) w2 &gt;&gt;= 24
  3: (bc) w2 = w2
  4: (54) w2 &amp;= 255
  5: (bc) w2 = w2

The problem is that when verifier emits the code to replace a partial
load of a struct __sk_buff field (*(u8 *)(r1 + 3)) with a full load of
struct sk_buff field (*(u32 *)(r1 + 104)), an optional shift and a
bitwise AND, it assumes that the machine is little endian and
incorrectly decides to use a shift.

Adjust shift count calculation to account for endianness.

Fixes: 31fd85816dbe ("bpf: permits narrower load from bpf program context fields")
Signed-off-by: Ilya Leoshkevich &lt;iii@linux.ibm.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2019-07-20T17:45:15Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2019-07-20T17:45:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e6023adc5c6af79ac8ac5b17939f58091fa0d870'/>
<id>urn:sha1:e6023adc5c6af79ac8ac5b17939f58091fa0d870</id>
<content type='text'>
Pull core fixes from Thomas Gleixner:

 - A collection of objtool fixes which address recent fallout partially
   exposed by newer toolchains, clang, BPF and general code changes.

 - Force USER_DS for user stack traces

[ Note: the "objtool fixes" are not all to objtool itself, but for
  kernel code that triggers objtool warnings.

  Things like missing function size annotations, or code that confuses
  the unwinder etc.   - Linus]

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
  objtool: Support conditional retpolines
  objtool: Convert insn type to enum
  objtool: Fix seg fault on bad switch table entry
  objtool: Support repeated uses of the same C jump table
  objtool: Refactor jump table code
  objtool: Refactor sibling call detection logic
  objtool: Do frame pointer check before dead end check
  objtool: Change dead_end_function() to return boolean
  objtool: Warn on zero-length functions
  objtool: Refactor function alias logic
  objtool: Track original function across branches
  objtool: Add mcsafe_handle_tail() to the uaccess safe list
  bpf: Disable GCC -fgcse optimization for ___bpf_prog_run()
  x86/uaccess: Remove redundant CLACs in getuser/putuser error paths
  x86/uaccess: Don't leak AC flag into fentry from mcsafe_handle_tail()
  x86/uaccess: Remove ELF function annotation from copy_user_handle_tail()
  x86/head/64: Annotate start_cpu0() as non-callable
  x86/entry: Fix thunk function ELF sizes
  x86/kvm: Don't call kvm_spurious_fault() from .fixup
  x86/kvm: Replace vmx_vmenter()'s call to kvm_spurious_fault() with UD2
  ...
</content>
</entry>
<entry>
<title>bpf: Disable GCC -fgcse optimization for ___bpf_prog_run()</title>
<updated>2019-07-18T19:01:06Z</updated>
<author>
<name>Josh Poimboeuf</name>
<email>jpoimboe@redhat.com</email>
</author>
<published>2019-07-18T01:36:45Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3193c0836f203a91bef96d88c64cccf0be090d9c'/>
<id>urn:sha1:3193c0836f203a91bef96d88c64cccf0be090d9c</id>
<content type='text'>
On x86-64, with CONFIG_RETPOLINE=n, GCC's "global common subexpression
elimination" optimization results in ___bpf_prog_run()'s jumptable code
changing from this:

	select_insn:
		jmp *jumptable(, %rax, 8)
		...
	ALU64_ADD_X:
		...
		jmp *jumptable(, %rax, 8)
	ALU_ADD_X:
		...
		jmp *jumptable(, %rax, 8)

to this:

	select_insn:
		mov jumptable, %r12
		jmp *(%r12, %rax, 8)
		...
	ALU64_ADD_X:
		...
		jmp *(%r12, %rax, 8)
	ALU_ADD_X:
		...
		jmp *(%r12, %rax, 8)

The jumptable address is placed in a register once, at the beginning of
the function.  The function execution can then go through multiple
indirect jumps which rely on that same register value.  This has a few
issues:

1) Objtool isn't smart enough to be able to track such a register value
   across multiple recursive indirect jumps through the jump table.

2) With CONFIG_RETPOLINE enabled, this optimization actually results in
   a small slowdown.  I measured a ~4.7% slowdown in the test_bpf
   "tcpdump port 22" selftest.

   This slowdown is actually predicted by the GCC manual:

     Note: When compiling a program using computed gotos, a GCC
     extension, you may get better run-time performance if you
     disable the global common subexpression elimination pass by
     adding -fno-gcse to the command line.

So just disable the optimization for this function.

Fixes: e55a73251da3 ("bpf: Fix ORC unwinding in non-JIT BPF code")
Reported-by: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Signed-off-by: Josh Poimboeuf &lt;jpoimboe@redhat.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/30c3ca29ba037afcbd860a8672eef0021addf9fe.1563413318.git.jpoimboe@redhat.com

</content>
</entry>
<entry>
<title>Merge branch 'x86/debug' into core/urgent</title>
<updated>2019-07-18T18:50:48Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2019-07-18T18:50:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=54f698f31e595f3f0caf28bfe2bde0c3373f9a60'/>
<id>urn:sha1:54f698f31e595f3f0caf28bfe2bde0c3373f9a60</id>
<content type='text'>
Pick up the two pending objtool patches as the next round of objtool fixes
depend on them.
</content>
</entry>
<entry>
<title>bpf: fix BTF verifier size resolution logic</title>
<updated>2019-07-15T21:02:17Z</updated>
<author>
<name>Andrii Nakryiko</name>
<email>andriin@fb.com</email>
</author>
<published>2019-07-12T17:25:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1acc5d5c5832da9a98b22374a8fae08ffe31b3f8'/>
<id>urn:sha1:1acc5d5c5832da9a98b22374a8fae08ffe31b3f8</id>
<content type='text'>
BTF verifier has a size resolution bug which in some circumstances leads to
invalid size resolution for, e.g., TYPEDEF modifier.  This happens if we have
[1] PTR -&gt; [2] TYPEDEF -&gt; [3] ARRAY, in which case due to being in pointer
context ARRAY size won't be resolved (because for pointer it doesn't matter, so
it's a sink in pointer context), but it will be permanently remembered as zero
for TYPEDEF and TYPEDEF will be marked as RESOLVED. Eventually ARRAY size will
be resolved correctly, but TYPEDEF resolved_size won't be updated anymore.
This, subsequently, will lead to erroneous map creation failure, if that
TYPEDEF is specified as either key or value, as key_size/value_size won't
correspond to resolved size of TYPEDEF (kernel will believe it's zero).

Note, that if BTF was ordered as [1] ARRAY &lt;- [2] TYPEDEF &lt;- [3] PTR, this
won't be a problem, as by the time we get to TYPEDEF, ARRAY's size is already
calculated and stored.

This bug manifests itself in rejecting BTF-defined maps that use array
typedef as a value type:

typedef int array_t[16];

struct {
    __uint(type, BPF_MAP_TYPE_ARRAY);
    __type(value, array_t); /* i.e., array_t *value; */
} test_map SEC(".maps");

The fix consists on not relying on modifier's resolved_size and instead using
modifier's resolved_id (type ID for "concrete" type to which modifier
eventually resolves) and doing size determination for that resolved type. This
allow to preserve existing "early DFS termination" logic for PTR or
STRUCT_OR_ARRAY contexts, but still do correct size determination for modifier
types.

Fixes: eb3f595dab40 ("bpf: btf: Validate type reference")
Cc: Martin KaFai Lau &lt;kafai@fb.com&gt;
Signed-off-by: Andrii Nakryiko &lt;andriin@fb.com&gt;
Acked-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
</content>
</entry>
</feed>
