<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/bpf/core.c, branch v6.0</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.0</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.0'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2022-08-23T22:27:14Z</updated>
<entry>
<title>bpf: Fix a data-race around bpf_jit_limit.</title>
<updated>2022-08-23T22:27:14Z</updated>
<author>
<name>Kuniyuki Iwashima</name>
<email>kuniyu@amazon.com</email>
</author>
<published>2022-08-23T21:58:04Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0947ae1121083d363d522ff7518ee72b55bd8d29'/>
<id>urn:sha1:0947ae1121083d363d522ff7518ee72b55bd8d29</id>
<content type='text'>
While reading bpf_jit_limit, it can be changed concurrently via sysctl,
WRITE_ONCE() in __do_proc_doulongvec_minmax(). The size of bpf_jit_limit
is long, so we need to add a paired READ_ONCE() to avoid load-tearing.

Fixes: ede95a63b5e8 ("bpf: add bpf_jit_limit knob to restrict unpriv allocations")
Signed-off-by: Kuniyuki Iwashima &lt;kuniyu@amazon.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Link: https://lore.kernel.org/bpf/20220823215804.2177-1-kuniyu@amazon.com
</content>
</entry>
<entry>
<title>Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next</title>
<updated>2022-07-22T23:55:44Z</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2022-07-22T23:55:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b3fce974d4239bd46ae81bba07b59f255eb979d7'/>
<id>urn:sha1:b3fce974d4239bd46ae81bba07b59f255eb979d7</id>
<content type='text'>
Daniel Borkmann says:

====================
bpf-next 2022-07-22

We've added 73 non-merge commits during the last 12 day(s) which contain
a total of 88 files changed, 3458 insertions(+), 860 deletions(-).

The main changes are:

1) Implement BPF trampoline for arm64 JIT, from Xu Kuohai.

2) Add ksyscall/kretsyscall section support to libbpf to simplify tracing kernel
   syscalls through kprobe mechanism, from Andrii Nakryiko.

3) Allow for livepatch (KLP) and BPF trampolines to attach to the same kernel
   function, from Song Liu &amp; Jiri Olsa.

4) Add new kfunc infrastructure for netfilter's CT e.g. to insert and change
   entries, from Kumar Kartikeya Dwivedi &amp; Lorenzo Bianconi.

5) Add a ksym BPF iterator to allow for more flexible and efficient interactions
   with kernel symbols, from Alan Maguire.

6) Bug fixes in libbpf e.g. for uprobe binary path resolution, from Dan Carpenter.

7) Fix BPF subprog function names in stack traces, from Alexei Starovoitov.

8) libbpf support for writing custom perf event readers, from Jon Doron.

9) Switch to use SPDX tag for BPF helper man page, from Alejandro Colomar.

10) Fix xsk send-only sockets when in busy poll mode, from Maciej Fijalkowski.

11) Reparent BPF maps and their charging on memcg offlining, from Roman Gushchin.

12) Multiple follow-up fixes around BPF lsm cgroup infra, from Stanislav Fomichev.

13) Use bootstrap version of bpftool where possible to speed up builds, from Pu Lehui.

14) Cleanup BPF verifier's check_func_arg() handling, from Joanne Koong.

15) Make non-prealloced BPF map allocations low priority to play better with
    memcg limits, from Yafang Shao.

16) Fix BPF test runner to reject zero-length data for skbs, from Zhengchao Shao.

17) Various smaller cleanups and improvements all over the place.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (73 commits)
  bpf: Simplify bpf_prog_pack_[size|mask]
  bpf: Support bpf_trampoline on functions with IPMODIFY (e.g. livepatch)
  bpf, x64: Allow to use caller address from stack
  ftrace: Allow IPMODIFY and DIRECT ops on the same function
  ftrace: Add modify_ftrace_direct_multi_nolock
  bpf/selftests: Fix couldn't retrieve pinned program in xdp veth test
  bpf: Fix build error in case of !CONFIG_DEBUG_INFO_BTF
  selftests/bpf: Fix test_verifier failed test in unprivileged mode
  selftests/bpf: Add negative tests for new nf_conntrack kfuncs
  selftests/bpf: Add tests for new nf_conntrack kfuncs
  selftests/bpf: Add verifier tests for trusted kfunc args
  net: netfilter: Add kfuncs to set and change CT status
  net: netfilter: Add kfuncs to set and change CT timeout
  net: netfilter: Add kfuncs to allocate and insert CT
  net: netfilter: Deduplicate code in bpf_{xdp,skb}_ct_lookup
  bpf: Add documentation for kfuncs
  bpf: Add support for forcing kfunc args to be trusted
  bpf: Switch to new kfunc flags infrastructure
  tools/resolve_btfids: Add support for 8-byte BTF sets
  bpf: Introduce 8-byte BTF set
  ...
====================

Link: https://lore.kernel.org/r/20220722221218.29943-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Simplify bpf_prog_pack_[size|mask]</title>
<updated>2022-07-22T20:08:27Z</updated>
<author>
<name>Song Liu</name>
<email>song@kernel.org</email>
</author>
<published>2022-07-13T20:49:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ea2babac63d40e59926dc5de4550dac94cc3c6d2'/>
<id>urn:sha1:ea2babac63d40e59926dc5de4550dac94cc3c6d2</id>
<content type='text'>
Simplify the logic that selects bpf_prog_pack_size, and always use
(PMD_SIZE * num_possible_nodes()). This is a good tradeoff, as most of
the performance benefit observed is from less direct map fragmentation [0].

Also, module_alloc(4MB) may not allocate 4MB aligned memory. Therefore,
we cannot use (ptr &amp; bpf_prog_pack_mask) to find the correct address of
bpf_prog_pack. Fix this by checking the header address falls in the range
of pack-&gt;ptr and (pack-&gt;ptr + bpf_prog_pack_size).

  [0] https://lore.kernel.org/bpf/20220707223546.4124919-1-song@kernel.org/

Signed-off-by: Song Liu &lt;song@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Reviewed-by: Stanislav Fomichev &lt;sdf@google.com&gt;
Link: https://lore.kernel.org/bpf/20220713204950.3015201-1-song@kernel.org
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net</title>
<updated>2022-07-14T22:27:35Z</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2022-07-14T21:19:42Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=816cd1688331e0ffa1927889c15e7ed56650a183'/>
<id>urn:sha1:816cd1688331e0ffa1927889c15e7ed56650a183</id>
<content type='text'>
include/net/sock.h
  310731e2f161 ("net: Fix data-races around sysctl_mem.")
  e70f3c701276 ("Revert "net: set SK_MEM_QUANTUM to 4096"")
https://lore.kernel.org/all/20220711120211.7c8b7cba@canb.auug.org.au/

net/ipv4/fib_semantics.c
  747c14307214 ("ip: fix dflt addr selection for connected nexthop")
  d62607c3fe45 ("net: rename reference+tracking helpers")

net/tls/tls.h
include/net/tls.h
  3d8c51b25a23 ("net/tls: Check for errors in tls_device_init")
  587903142308 ("tls: create an internal header")

Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf, x86: fix freeing of not-finalized bpf_prog_pack</title>
<updated>2022-07-13T00:33:22Z</updated>
<author>
<name>Song Liu</name>
<email>song@kernel.org</email>
</author>
<published>2022-07-06T00:26:12Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1d5f82d9dd477d5c66e0214a68c3e4f308eadd6d'/>
<id>urn:sha1:1d5f82d9dd477d5c66e0214a68c3e4f308eadd6d</id>
<content type='text'>
syzbot reported a few issues with bpf_prog_pack [1], [2]. This only happens
with multiple subprogs. In jit_subprogs(), we first call bpf_int_jit_compile()
on each sub program. And then, we call it on each sub program again. jit_data
is not freed in the first call of bpf_int_jit_compile(). Similarly we don't
call bpf_jit_binary_pack_finalize() in the first call of bpf_int_jit_compile().

If bpf_int_jit_compile() failed for one sub program, we will call
bpf_jit_binary_pack_finalize() for this sub program. However, we don't have a
chance to call it for other sub programs. Then we will hit "goto out_free" in
jit_subprogs(), and call bpf_jit_free on some subprograms that haven't got
bpf_jit_binary_pack_finalize() yet.

At this point, bpf_jit_binary_pack_free() is called and the whole 2MB page is
freed erroneously.

Fix this with a custom bpf_jit_free() for x86_64, which calls
bpf_jit_binary_pack_finalize() if necessary. Also, with custom
bpf_jit_free(), bpf_prog_aux-&gt;use_bpf_prog_pack is not needed any more,
remove it.

Fixes: 1022a5498f6f ("bpf, x86_64: Use bpf_jit_binary_pack_alloc")
[1] https://syzkaller.appspot.com/bug?extid=2f649ec6d2eea1495a8f
[2] https://syzkaller.appspot.com/bug?extid=87f65c75f4a72db05445
Reported-by: syzbot+2f649ec6d2eea1495a8f@syzkaller.appspotmail.com
Reported-by: syzbot+87f65c75f4a72db05445@syzkaller.appspotmail.com
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
Link: https://lore.kernel.org/r/20220706002612.4013790-1-song@kernel.org
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Make sure mac_header was set before using it</title>
<updated>2022-07-07T18:13:13Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2022-07-07T12:39:00Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0326195f523a549e0a9d7fd44c70b26fd7265090'/>
<id>urn:sha1:0326195f523a549e0a9d7fd44c70b26fd7265090</id>
<content type='text'>
Classic BPF has a way to load bytes starting from the mac header.

Some skbs do not have a mac header, and skb_mac_header()
in this case is returning a pointer that 65535 bytes after
skb-&gt;head.

Existing range check in bpf_internal_load_pointer_neg_helper()
was properly kicking and no illegal access was happening.

New sanity check in skb_mac_header() is firing, so we need
to avoid it.

WARNING: CPU: 1 PID: 28990 at include/linux/skbuff.h:2785 skb_mac_header include/linux/skbuff.h:2785 [inline]
WARNING: CPU: 1 PID: 28990 at include/linux/skbuff.h:2785 bpf_internal_load_pointer_neg_helper+0x1b1/0x1c0 kernel/bpf/core.c:74
Modules linked in:
CPU: 1 PID: 28990 Comm: syz-executor.0 Not tainted 5.19.0-rc4-syzkaller-00865-g4874fb9484be #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/29/2022
RIP: 0010:skb_mac_header include/linux/skbuff.h:2785 [inline]
RIP: 0010:bpf_internal_load_pointer_neg_helper+0x1b1/0x1c0 kernel/bpf/core.c:74
Code: ff ff 45 31 f6 e9 5a ff ff ff e8 aa 27 40 00 e9 3b ff ff ff e8 90 27 40 00 e9 df fe ff ff e8 86 27 40 00 eb 9e e8 2f 2c f3 ff &lt;0f&gt; 0b eb b1 e8 96 27 40 00 e9 79 fe ff ff 90 41 57 41 56 41 55 41
RSP: 0018:ffffc9000309f668 EFLAGS: 00010216
RAX: 0000000000000118 RBX: ffffffffffeff00c RCX: ffffc9000e417000
RDX: 0000000000040000 RSI: ffffffff81873f21 RDI: 0000000000000003
RBP: ffff8880842878c0 R08: 0000000000000003 R09: 000000000000ffff
R10: 000000000000ffff R11: 0000000000000001 R12: 0000000000000004
R13: ffff88803ac56c00 R14: 000000000000ffff R15: dffffc0000000000
FS: 00007f5c88a16700(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fdaa9f6c058 CR3: 000000003a82c000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
&lt;TASK&gt;
____bpf_skb_load_helper_32 net/core/filter.c:276 [inline]
bpf_skb_load_helper_32+0x191/0x220 net/core/filter.c:264

Fixes: f9aefd6b2aa3 ("net: warn if mac header was not set")
Reported-by: syzbot &lt;syzkaller@googlegroups.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Link: https://lore.kernel.org/bpf/20220707123900.945305-1-edumazet@google.com
</content>
</entry>
<entry>
<title>bpf: minimize number of allocated lsm slots per program</title>
<updated>2022-06-29T20:21:52Z</updated>
<author>
<name>Stanislav Fomichev</name>
<email>sdf@google.com</email>
</author>
<published>2022-06-28T17:43:07Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c0e19f2c9a3edd38e4b1bdae98eb44555d02bc31'/>
<id>urn:sha1:c0e19f2c9a3edd38e4b1bdae98eb44555d02bc31</id>
<content type='text'>
Previous patch adds 1:1 mapping between all 211 LSM hooks
and bpf_cgroup program array. Instead of reserving a slot per
possible hook, reserve 10 slots per cgroup for lsm programs.
Those slots are dynamically allocated on demand and reclaimed.

struct cgroup_bpf {
	struct bpf_prog_array *    effective[33];        /*     0   264 */
	/* --- cacheline 4 boundary (256 bytes) was 8 bytes ago --- */
	struct hlist_head          progs[33];            /*   264   264 */
	/* --- cacheline 8 boundary (512 bytes) was 16 bytes ago --- */
	u8                         flags[33];            /*   528    33 */

	/* XXX 7 bytes hole, try to pack */

	struct list_head           storages;             /*   568    16 */
	/* --- cacheline 9 boundary (576 bytes) was 8 bytes ago --- */
	struct bpf_prog_array *    inactive;             /*   584     8 */
	struct percpu_ref          refcnt;               /*   592    16 */
	struct work_struct         release_work;         /*   608    72 */

	/* size: 680, cachelines: 11, members: 7 */
	/* sum members: 673, holes: 1, sum holes: 7 */
	/* last cacheline: 40 bytes */
};

Reviewed-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Signed-off-by: Stanislav Fomichev &lt;sdf@google.com&gt;
Link: https://lore.kernel.org/r/20220628174314.1216643-5-sdf@google.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: per-cgroup lsm flavor</title>
<updated>2022-06-29T20:21:51Z</updated>
<author>
<name>Stanislav Fomichev</name>
<email>sdf@google.com</email>
</author>
<published>2022-06-28T17:43:06Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=69fd337a975c7e690dfe49d9cb4fe5ba1e6db44e'/>
<id>urn:sha1:69fd337a975c7e690dfe49d9cb4fe5ba1e6db44e</id>
<content type='text'>
Allow attaching to lsm hooks in the cgroup context.

Attaching to per-cgroup LSM works exactly like attaching
to other per-cgroup hooks. New BPF_LSM_CGROUP is added
to trigger new mode; the actual lsm hook we attach to is
signaled via existing attach_btf_id.

For the hooks that have 'struct socket' or 'struct sock' as its first
argument, we use the cgroup associated with that socket. For the rest,
we use 'current' cgroup (this is all on default hierarchy == v2 only).
Note that for some hooks that work on 'struct sock' we still
take the cgroup from 'current' because some of them work on the socket
that hasn't been properly initialized yet.

Behind the scenes, we allocate a shim program that is attached
to the trampoline and runs cgroup effective BPF programs array.
This shim has some rudimentary ref counting and can be shared
between several programs attaching to the same lsm hook from
different cgroups.

Note that this patch bloats cgroup size because we add 211
cgroup_bpf_attach_type(s) for simplicity sake. This will be
addressed in the subsequent patch.

Also note that we only add non-sleepable flavor for now. To enable
sleepable use-cases, bpf_prog_run_array_cg has to grab trace rcu,
shim programs have to be freed via trace rcu, cgroup_bpf.effective
should be also trace-rcu-managed + maybe some other changes that
I'm not aware of.

Reviewed-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Signed-off-by: Stanislav Fomichev &lt;sdf@google.com&gt;
Link: https://lore.kernel.org/r/20220628174314.1216643-4-sdf@google.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf, x64: Add predicate for bpf2bpf with tailcalls support in JIT</title>
<updated>2022-06-21T16:52:04Z</updated>
<author>
<name>Tony Ambardar</name>
<email>tony.ambardar@gmail.com</email>
</author>
<published>2022-06-17T10:57:34Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=95acd8817e66d031d2e6ee7def3f1e1874819317'/>
<id>urn:sha1:95acd8817e66d031d2e6ee7def3f1e1874819317</id>
<content type='text'>
The BPF core/verifier is hard-coded to permit mixing bpf2bpf and tail
calls for only x86-64. Change the logic to instead rely on a new weak
function 'bool bpf_jit_supports_subprog_tailcalls(void)', which a capable
JIT backend can override.

Update the x86-64 eBPF JIT to reflect this.

Signed-off-by: Tony Ambardar &lt;Tony.Ambardar@gmail.com&gt;
[jakub: drop MIPS bits and tweak patch subject]
Signed-off-by: Jakub Sitnicki &lt;jakub@cloudflare.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Link: https://lore.kernel.org/bpf/20220617105735.733938-2-jakub@cloudflare.com
</content>
</entry>
<entry>
<title>bpf: implement sleepable uprobes by chaining gps</title>
<updated>2022-06-17T02:27:29Z</updated>
<author>
<name>Delyan Kratunov</name>
<email>delyank@fb.com</email>
</author>
<published>2022-06-14T23:10:46Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8c7dcb84e3b744b2b70baa7a44a9b1881c33a9c9'/>
<id>urn:sha1:8c7dcb84e3b744b2b70baa7a44a9b1881c33a9c9</id>
<content type='text'>
uprobes work by raising a trap, setting a task flag from within the
interrupt handler, and processing the actual work for the uprobe on the
way back to userspace. As a result, uprobe handlers already execute in a
might_fault/_sleep context. The primary obstacle to sleepable bpf uprobe
programs is therefore on the bpf side.

Namely, the bpf_prog_array attached to the uprobe is protected by normal
rcu. In order for uprobe bpf programs to become sleepable, it has to be
protected by the tasks_trace rcu flavor instead (and kfree() called after
a corresponding grace period).

Therefore, the free path for bpf_prog_array now chains a tasks_trace and
normal grace periods one after the other.

Users who iterate under tasks_trace read section would
be safe, as would users who iterate under normal read sections (from
non-sleepable locations).

The downside is that the tasks_trace latency affects all perf_event-attached
bpf programs (and not just uprobe ones). This is deemed safe given the
possible attach rates for kprobe/uprobe/tp programs.

Separately, non-sleepable programs need access to dynamically sized
rcu-protected maps, so bpf_run_prog_array_sleepables now conditionally takes
an rcu read section, in addition to the overarching tasks_trace section.

Signed-off-by: Delyan Kratunov &lt;delyank@fb.com&gt;
Link: https://lore.kernel.org/r/ce844d62a2fd0443b08c5ab02e95bc7149f9aeb1.1655248076.git.delyank@fb.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
</feed>
