<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/bpf/core.c, branch v5.12</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.12</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.12'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2021-03-17T23:22:51Z</updated>
<entry>
<title>bpf: Fix fexit trampoline.</title>
<updated>2021-03-17T23:22:51Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@kernel.org</email>
</author>
<published>2021-03-16T21:00:07Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e21aa341785c679dd409c8cb71f864c00fe6c463'/>
<id>urn:sha1:e21aa341785c679dd409c8cb71f864c00fe6c463</id>
<content type='text'>
The fexit/fmod_ret programs can be attached to kernel functions that can
sleep. synchronize_rcu_tasks() will not wait for such sleeping tasks to
complete, so the trampoline image can be freed while a task is still asleep
inside the traced function; when the task wakes up, its return IP points
into freed memory, causing a crash.

Solve this by holding a percpu_ref (percpu_ref_get/put) for the duration of
the trampoline call and by separating the lifetime of the trampoline from
that of its image. The "half page" optimization has to be removed, since the
first_half-&gt;second_half-&gt;first_half transition cannot be guaranteed to
complete in deterministic time; every trampoline update now becomes a new
image. An image with fmod_ret or fexit progs is freed via percpu_ref_kill
plus call_rcu_tasks: together they wait for the original function and the
trampoline asm to complete. The trampoline itself is patched from nop to jmp
to skip the fexit progs, and the images are freed independently of the
trampoline. An image with only fentry progs is freed via
call_rcu_tasks_trace+call_rcu_tasks, which waits for both sleepable and
non-sleepable progs to complete.
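
A minimal sketch of the image-lifetime scheme described above (the
tramp_image struct and the tramp_image_free_rcu callback are illustrative
names, not the upstream code):

    /* Each trampoline image carries a percpu_ref that is held while the
     * trampoline asm executes; the release callback fires only once all
     * in-flight invocations have dropped their reference.
     */
    struct tramp_image {
            void *image;                    /* executable trampoline page */
            struct percpu_ref pcref;        /* held for the call's duration */
            struct rcu_head rcu;
    };

    static void tramp_image_release(struct percpu_ref *pcref)
    {
            struct tramp_image *im = container_of(pcref, struct tramp_image, pcref);

            /* No trampoline asm references the image anymore; still wait
             * for sleepable tasks via RCU-tasks before freeing the page.
             */
            call_rcu_tasks(&amp;im-&gt;rcu, tramp_image_free_rcu);
    }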

Fixes: fec56f5890d9 ("bpf: Introduce BPF trampoline")
Reported-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;  # for RCU
Link: https://lore.kernel.org/bpf/20210316210007.38949-1-alexei.starovoitov@gmail.com
</content>
</entry>
<entry>
<title>bpf: Explicitly zero-extend R0 after 32-bit cmpxchg</title>
<updated>2021-03-05T03:06:03Z</updated>
<author>
<name>Brendan Jackman</name>
<email>jackmanb@google.com</email>
</author>
<published>2021-03-05T02:56:46Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=39491867ace594b4912c35f576864d204beed2b3'/>
<id>urn:sha1:39491867ace594b4912c35f576864d204beed2b3</id>
<content type='text'>
As pointed out by Ilya and explained in the new comment, there's a
discrepancy between x86 and BPF CMPXCHG semantics: BPF always loads
the value from memory into r0, while x86 only does so when r0 and the
value in memory are different. The same issue affects s390.

At first this might sound like pure semantics, but it makes a real
difference when the comparison is 32-bit, since the load will
zero-extend r0/rax.

The fix is to explicitly zero-extend rax after doing such a
CMPXCHG. Since this problem affects multiple archs, this is done in
the verifier by patching in a BPF_ZEXT_REG instruction after every
32-bit cmpxchg. Any archs that don't need such manual zero-extension
can do a look-ahead with insn_is_zext to skip the unnecessary mov.
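
Roughly, the verifier-side patching looks like this (is_cmpxchg_insn is the
helper named in this changelog; the glue around the verifier's existing
bpf_patch_insn_data() helper is a sketch, not the exact diff):

    if (is_cmpxchg_insn(insn) &amp;&amp; BPF_SIZE(insn-&gt;code) == BPF_W) {
            struct bpf_insn patch[] = {
                    *insn,                          /* keep the cmpxchg */
                    BPF_ZEXT_REG(BPF_REG_0),        /* mov32 r0, r0, marked zext */
            };
            struct bpf_prog *new_prog;

            /* JITs whose 32-bit ops already zero-extend can recognize
             * the mov via insn_is_zext() and skip emitting it.
             */
            new_prog = bpf_patch_insn_data(env, i, patch, ARRAY_SIZE(patch));
    }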

Note this still goes on top of Ilya's patch:

https://lore.kernel.org/bpf/20210301154019.129110-1-iii@linux.ibm.com/T/#u

Differences v5-&gt;v6[1]:
 - Moved is_cmpxchg_insn and ensured it can be safely re-used. Also renamed it
   and removed 'inline' to match the style of the is_*_function helpers.
 - Fixed up comments in verifier test (thanks for the careful review, Martin!)

Differences v4-&gt;v5[1]:
 - Moved the logic entirely into opt_subreg_zext_lo32_rnd_hi32, thanks to Martin
   for suggesting this.

Differences v3-&gt;v4[1]:
 - Moved the optimization against pointless zext into the correct place:
   opt_subreg_zext_lo32_rnd_hi32 is called _after_ fixup_bpf_calls.

Differences v2-&gt;v3[1]:
 - Moved patching into fixup_bpf_calls (patch incoming to rename this function)
 - Added extra commentary on bpf_jit_needs_zext
 - Added check to avoid adding a pointless zext(r0) if there's already one there.

Difference v1-&gt;v2[1]: Now solved centrally in the verifier instead of
  specifically for the x86 JIT. Thanks to Ilya and Daniel for the suggestions!

[1] v5: https://lore.kernel.org/bpf/CA+i-1C3ytZz6FjcPmUg5s4L51pMQDxWcZNvM86w4RHZ_o2khwg@mail.gmail.com/T/#t
    v4: https://lore.kernel.org/bpf/CA+i-1C3ytZz6FjcPmUg5s4L51pMQDxWcZNvM86w4RHZ_o2khwg@mail.gmail.com/T/#t
    v3: https://lore.kernel.org/bpf/08669818-c99d-0d30-e1db-53160c063611@iogearbox.net/T/#t
    v2: https://lore.kernel.org/bpf/08669818-c99d-0d30-e1db-53160c063611@iogearbox.net/T/#t
    v1: https://lore.kernel.org/bpf/d7ebaefb-bfd6-a441-3ff2-2fdfe699b1d2@iogearbox.net/T/#t

Reported-by: Ilya Leoshkevich &lt;iii@linux.ibm.com&gt;
Fixes: 5ffa25502b5a ("bpf: Add instructions for atomic_[cmp]xchg")
Signed-off-by: Brendan Jackman &lt;jackmanb@google.com&gt;
Acked-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Acked-by: Ilya Leoshkevich &lt;iii@linux.ibm.com&gt;
Tested-by: Ilya Leoshkevich &lt;iii@linux.ibm.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Clear percpu pointers in bpf_prog_clone_free()</title>
<updated>2021-02-22T17:08:35Z</updated>
<author>
<name>Cong Wang</name>
<email>cong.wang@bytedance.com</email>
</author>
<published>2021-02-18T00:16:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=53f523f3052ac16bbc7718032aa6b848f971d28c'/>
<id>urn:sha1:53f523f3052ac16bbc7718032aa6b848f971d28c</id>
<content type='text'>
Similar to bpf_prog_realloc(), bpf_prog_clone_create() also copies
the percpu pointers, but the clone still shares them with the original
prog, so we have to clear these two percpu pointers in
bpf_prog_clone_free(). Otherwise we would get a double free:

 BUG: kernel NULL pointer dereference, address: 0000000000000000
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: 0000 [#1] SMP PTI
 CPU: 13 PID: 8140 Comm: kworker/13:247 Kdump: loaded Tainted: G                W    OE
  5.11.0-rc4.bm.1-amd64+ #1
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
 test_bpf: #1 TXA
 Workqueue: events bpf_prog_free_deferred
 RIP: 0010:percpu_ref_get_many.constprop.97+0x42/0xf0
 Code: [...]
 RSP: 0018:ffffa6bce1f9bda0 EFLAGS: 00010002
 RAX: 0000000000000001 RBX: 0000000000000000 RCX: 00000000021dfc7b
 RDX: ffffffffae2eeb90 RSI: 867f92637e338da5 RDI: 0000000000000046
 RBP: ffffa6bce1f9bda8 R08: 0000000000000000 R09: 0000000000000001
 R10: 0000000000000046 R11: 0000000000000000 R12: 0000000000000280
 R13: 0000000000000000 R14: 0000000000000000 R15: ffff9b5f3ffdedc0
 FS:    0000000000000000(0000) GS:ffff9b5f2fb40000(0000) knlGS:0000000000000000
 CS:    0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 000000027c36c002 CR4: 00000000003706e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
    refill_obj_stock+0x5e/0xd0
    free_percpu+0xee/0x550
    __bpf_prog_free+0x4d/0x60
    process_one_work+0x26a/0x590
    worker_thread+0x3c/0x390
    ? process_one_work+0x590/0x590
    kthread+0x130/0x150
    ? kthread_park+0x80/0x80
    ret_from_fork+0x1f/0x30

This bug is 100% reproducible with test_kmod.sh.
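
The shape of the fix, per the description above and the two commits in the
Fixes tags below:

    static void bpf_prog_clone_free(struct bpf_prog *fp)
    {
            /* The clone shares aux and the two percpu pointers with the
             * original prog; clear them so __bpf_prog_free() doesn't
             * free_percpu() memory the original still owns.
             */
            fp-&gt;aux = NULL;
            fp-&gt;stats = NULL;
            fp-&gt;active = NULL;
            __bpf_prog_free(fp);
    }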

Fixes: 700d4796ef59 ("bpf: Optimize program stats")
Fixes: ca06f55b9002 ("bpf: Add per-program recursion prevention mechanism")
Reported-by: Jiang Wang &lt;jiang.wang@bytedance.com&gt;
Signed-off-by: Cong Wang &lt;cong.wang@bytedance.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Cc: Alexei Starovoitov &lt;ast@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20210218001647.71631-1-xiyou.wangcong@gmail.com
</content>
</entry>
<entry>
<title>bpf: Clear per_cpu pointers during bpf_prog_realloc</title>
<updated>2021-02-12T03:35:00Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@kernel.org</email>
</author>
<published>2021-02-12T03:35:00Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1336c662474edec3966c96c8de026f794d16b804'/>
<id>urn:sha1:1336c662474edec3966c96c8de026f794d16b804</id>
<content type='text'>
bpf_prog_realloc() copies the contents of struct bpf_prog, including its
percpu pointers. Those pointers have to be cleared in the old struct before
it is freed.
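
Sketched, inside bpf_prog_realloc(): the memcpy hands the percpu pointers to
the new struct, so the old struct must drop them before it is freed:

    memcpy(fp, fp_old, fp_old-&gt;pages * PAGE_SIZE);
    ...
    /* fp now owns the copied stats/active percpu pointers; clear them
     * in the old struct so __bpf_prog_free() doesn't free_percpu()
     * memory the new prog still uses.
     */
    fp_old-&gt;stats = NULL;
    fp_old-&gt;active = NULL;
    __bpf_prog_free(fp_old);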

Reported-by: Ilya Leoshkevich &lt;iii@linux.ibm.com&gt;
Fixes: 700d4796ef59 ("bpf: Optimize program stats")
Fixes: ca06f55b9002 ("bpf: Add per-program recursion prevention mechanism")
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Add per-program recursion prevention mechanism</title>
<updated>2021-02-11T15:19:13Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@kernel.org</email>
</author>
<published>2021-02-10T03:36:29Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ca06f55b90020cd97f4cc6d52db95436162e7dcf'/>
<id>urn:sha1:ca06f55b90020cd97f4cc6d52db95436162e7dcf</id>
<content type='text'>
Since both sleepable and non-sleepable programs execute under
migrate_disable, add a recursion prevention mechanism to both types of
programs when they're executed via the bpf trampoline.
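
A sketch of the mechanism in the trampoline enter/exit paths (prog-&gt;active
is the percpu counter added here; inc_misses_counter is an illustrative
stats hook):

    /* Migration is disabled, so any value other than 1 after the
     * increment means this prog is already running on this CPU,
     * i.e. we recursed; skip execution instead of re-entering.
     */
    if (unlikely(__this_cpu_inc_return(*prog-&gt;active) != 1)) {
            inc_misses_counter(prog);
            return 0;       /* tells the trampoline to skip the prog */
    }
    /* ... prog runs ... */
    __this_cpu_dec(*prog-&gt;active);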

Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20210210033634.62081-5-alexei.starovoitov@gmail.com
</content>
</entry>
<entry>
<title>bpf: Optimize program stats</title>
<updated>2021-02-11T15:17:50Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@kernel.org</email>
</author>
<published>2021-02-10T03:36:26Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=700d4796ef59f5faf240d307839bd419e2b6bdff'/>
<id>urn:sha1:700d4796ef59f5faf240d307839bd419e2b6bdff</id>
<content type='text'>
Move bpf_prog_stats from prog-&gt;aux into prog to avoid one extra load
in the critical path of program execution.
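
The hot-path difference, sketched:

    /* before: two dependent loads through prog-&gt;aux on every run */
    stats = this_cpu_ptr(prog-&gt;aux-&gt;stats);
    /* after: stats lives directly in struct bpf_prog, one load fewer */
    stats = this_cpu_ptr(prog-&gt;stats);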

Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20210210033634.62081-2-alexei.starovoitov@gmail.com
</content>
</entry>
<entry>
<title>bpf: Add bitwise atomic instructions</title>
<updated>2021-01-15T02:34:29Z</updated>
<author>
<name>Brendan Jackman</name>
<email>jackmanb@google.com</email>
</author>
<published>2021-01-14T18:17:49Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=981f94c3e92146705baf97fb417a5ed1ab1a79a5'/>
<id>urn:sha1:981f94c3e92146705baf97fb417a5ed1ab1a79a5</id>
<content type='text'>
This adds instructions for

atomic[64]_[fetch_]and
atomic[64]_[fetch_]or
atomic[64]_[fetch_]xor

All these operations are isomorphic enough to implement with the same
verifier, interpreter, and x86 JIT code, hence a single commit.

The main interesting thing here is that x86 doesn't directly support
the fetch_ versions of these operations, so we need to generate a CMPXCHG
loop in the JIT. This requires two temporary registers; IIUC it's safe to
use BPF_REG_AX and x86's AUX_REG for this purpose.
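
A C rendering of the loop the JIT emits for, e.g., a 64-bit fetch_and,
assuming ptr and src hold the memory operand and source value; compiler
builtins stand in for the emitted LOCK CMPXCHG:

    u64 old = *ptr;                 /* initial load, lands in rax */
    for (;;) {
            u64 new = old &amp; src;    /* the bitwise op, in a scratch reg */
            /* lock cmpxchg: if *ptr still equals old, store new and
             * stop; otherwise old is refreshed from memory and we retry.
             */
            if (__atomic_compare_exchange_n(ptr, &amp;old, new, false,
                                            __ATOMIC_SEQ_CST,
                                            __ATOMIC_SEQ_CST))
                    break;
    }
    /* 'old' now holds the pre-op value, i.e. the fetch_ result */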

Signed-off-by: Brendan Jackman &lt;jackmanb@google.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Yonghong Song &lt;yhs@fb.com&gt;
Link: https://lore.kernel.org/bpf/20210114181751.768687-10-jackmanb@google.com
</content>
</entry>
<entry>
<title>bpf: Pull out a macro for interpreting atomic ALU operations</title>
<updated>2021-01-15T02:34:29Z</updated>
<author>
<name>Brendan Jackman</name>
<email>jackmanb@google.com</email>
</author>
<published>2021-01-14T18:17:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=462910670e4ac91509829c5549bd0227668176fb'/>
<id>urn:sha1:462910670e4ac91509829c5549bd0227668176fb</id>
<content type='text'>
Since the atomic operations that are added in subsequent commits are
all isomorphic with BPF_ADD, pull out a macro to avoid the
interpreter becoming dominated by lines of atomic-related code.

Note that this sacrifices interpreter performance (combining
STX_ATOMIC_W and STX_ATOMIC_DW into a single switch case means that we
need an extra conditional branch to differentiate them) in favour of
compact and (relatively!) simple C code.
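
The macro, paraphrased; the BPF_FETCH variant mirrors this with
atomic[64]_fetch_##KOP and writes the old value back into SRC:

    #define ATOMIC_ALU_OP(BOP, KOP)                                     \
            case BOP:                                                   \
                    if (BPF_SIZE(insn-&gt;code) == BPF_W)                  \
                            atomic_##KOP((u32) SRC, (atomic_t *)        \
                                    (unsigned long) (DST + insn-&gt;off)); \
                    else                                                \
                            atomic64_##KOP((u64) SRC, (atomic64_t *)    \
                                    (unsigned long) (DST + insn-&gt;off)); \
                    break;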

Signed-off-by: Brendan Jackman &lt;jackmanb@google.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Yonghong Song &lt;yhs@fb.com&gt;
Link: https://lore.kernel.org/bpf/20210114181751.768687-9-jackmanb@google.com
</content>
</entry>
<entry>
<title>bpf: Add instructions for atomic_[cmp]xchg</title>
<updated>2021-01-15T02:34:29Z</updated>
<author>
<name>Brendan Jackman</name>
<email>jackmanb@google.com</email>
</author>
<published>2021-01-14T18:17:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5ffa25502b5ab3d639829a2d1e316cff7f59a41e'/>
<id>urn:sha1:5ffa25502b5ab3d639829a2d1e316cff7f59a41e</id>
<content type='text'>
This adds two atomic opcodes, both of which include the BPF_FETCH
flag. XCHG without the BPF_FETCH flag would naturally encode
atomic_set. This is not supported because it would be of limited
value to userspace (it doesn't imply any barriers). CMPXCHG without
BPF_FETCH would be an atomic compare-and-write. We don't have such
an operation in the kernel so it isn't provided to BPF either.

There are two significant design decisions made for the CMPXCHG
instruction:

 - This operation fundamentally has 3 operands, but we only have two
   register fields, so the operand we compare against (the kernel's
   API calls it 'old') is hard-coded to be R0. x86 has a similar
   design (and A64 doesn't have this problem). An encoding sketch
   follows this list.

   A potential alternative might be to encode the other operand's
   register number in the immediate field.

 - The kernel's atomic_cmpxchg returns the old value, while the C11
   userspace APIs return a boolean indicating the comparison
   result. Which should BPF do? A64 returns the old value. x86 returns
   the old value in the hard-coded register (and also sets a
   flag). That means return-old-value is easier to JIT, so that's
   what we use.
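
An encoding sketch using the BPF_ATOMIC_OP() helper macro from this series:

    BPF_MOV64_IMM(BPF_REG_0, 3),    /* r0 = expected ('old') value */
    BPF_MOV64_IMM(BPF_REG_1, 4),    /* r1 = new value */
    /* if *(u64 *)(r2 + 0) == r0 then *(u64 *)(r2 + 0) = r1;
     * either way r0 is loaded with the value that was in memory.
     */
    BPF_ATOMIC_OP(BPF_DW, BPF_CMPXCHG, BPF_REG_2, BPF_REG_1, 0),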

Signed-off-by: Brendan Jackman &lt;jackmanb@google.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Yonghong Song &lt;yhs@fb.com&gt;
Link: https://lore.kernel.org/bpf/20210114181751.768687-8-jackmanb@google.com
</content>
</entry>
<entry>
<title>bpf: Add BPF_FETCH field / create atomic_fetch_add instruction</title>
<updated>2021-01-15T02:34:29Z</updated>
<author>
<name>Brendan Jackman</name>
<email>jackmanb@google.com</email>
</author>
<published>2021-01-14T18:17:46Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5ca419f2864a2c60940dcf4bbaeb69546200e36f'/>
<id>urn:sha1:5ca419f2864a2c60940dcf4bbaeb69546200e36f</id>
<content type='text'>
The BPF_FETCH field can be set in bpf_insn.imm, for BPF_ATOMIC
instructions, in order to have the previous value of the
atomically-modified memory location loaded into the src register
after an atomic op is carried out.
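
For example, a fetch-and-add via the BPF_ATOMIC_OP() helper macro from this
series:

    BPF_MOV64_IMM(BPF_REG_1, 1),
    /* atomically: tmp = *(u64 *)(r2 + 0); *(u64 *)(r2 + 0) += r1;
     * r1 = tmp, i.e. the pre-add value comes back in the src register.
     */
    BPF_ATOMIC_OP(BPF_DW, BPF_ADD | BPF_FETCH, BPF_REG_2, BPF_REG_1, 0),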

Suggested-by: Yonghong Song &lt;yhs@fb.com&gt;
Signed-off-by: Brendan Jackman &lt;jackmanb@google.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Link: https://lore.kernel.org/bpf/20210114181751.768687-7-jackmanb@google.com
</content>
</entry>
</feed>
