<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/bpf/core.c, branch v5.5</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.5</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.5'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2019-12-17T16:58:02Z</updated>
<entry>
<title>bpf: Fix cgroup local storage prog tracking</title>
<updated>2019-12-17T16:58:02Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2019-12-17T12:28:16Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e47304232b373362228bf233f17bd12b11c9aafc'/>
<id>urn:sha1:e47304232b373362228bf233f17bd12b11c9aafc</id>
<content type='text'>
Recently noticed that we're tracking programs related to local storage maps
through their prog pointer. This is a wrong assumption since the prog pointer
can still change throughout the verification process, for example, whenever
bpf_patch_insn_single() is called.

Therefore, the prog pointer that was assigned via bpf_cgroup_storage_assign()
is not guaranteed to be the same as we pass in bpf_cgroup_storage_release()
and the map would therefore remain in busy state forever. Fix this by using
the prog's aux pointer which is stable throughout verification and beyond.

Fixes: de9cbbaadba5 ("bpf: introduce cgroup storage maps")
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Cc: Roman Gushchin &lt;guro@fb.com&gt;
Cc: Martin KaFai Lau &lt;kafai@fb.com&gt;
Link: https://lore.kernel.org/bpf/1471c69eca3022218666f909bc927a92388fd09e.1576580332.git.daniel@iogearbox.net
</content>
</entry>
<entry>
<title>bpf: Fix missing prog untrack in release_maps</title>
<updated>2019-12-16T18:59:29Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2019-12-16T16:49:00Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a2ea07465c8d7984cc6b8b1f0b3324f9b138094a'/>
<id>urn:sha1:a2ea07465c8d7984cc6b8b1f0b3324f9b138094a</id>
<content type='text'>
Commit da765a2f5993 ("bpf: Add poke dependency tracking for prog array
maps") wrongly assumed that in case of prog load errors, we're cleaning
up all program tracking via bpf_free_used_maps().

However, it can happen that we're still at the point where we didn't copy
map pointers into the prog's aux section such that env-&gt;prog-&gt;aux-&gt;used_maps
is still zero, running into a UAF. In such case, the verifier has similar
release_maps() helper that drops references to used maps from its env.

Consolidate the release code into __bpf_free_used_maps() and call it from
all sides to fix it.

Fixes: da765a2f5993 ("bpf: Add poke dependency tracking for prog array maps")
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Yonghong Song &lt;yhs@fb.com&gt;
Link: https://lore.kernel.org/bpf/1c2909484ca524ae9f55109b06f22b6213e76376.1576514756.git.daniel@iogearbox.net
</content>
</entry>
<entry>
<title>bpf: Add poke dependency tracking for prog array maps</title>
<updated>2019-11-25T01:04:11Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2019-11-22T20:07:58Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=da765a2f599304a81a25e77908d1790414ecdbb6'/>
<id>urn:sha1:da765a2f599304a81a25e77908d1790414ecdbb6</id>
<content type='text'>
This work adds program tracking to prog array maps. This is needed such
that upon prog array updates/deletions we can fix up all programs which
make use of this tail call map. We add ops-&gt;map_poke_{un,}track()
helpers to maps to maintain the list of programs and ops-&gt;map_poke_run()
for triggering the actual update.

bpf_array_aux is extended to contain the list head and poke_mutex in
order to serialize program patching during updates/deletions.
bpf_free_used_maps() will untrack the program shortly before dropping
the reference to the map. For clearing out the prog array once all urefs
are dropped we need to use schedule_work() to have a sleepable context.

The prog_array_map_poke_run() is triggered during updates/deletions and
walks the maintained prog list. It checks in their poke_tabs whether the
map and key is matching and runs the actual bpf_arch_text_poke() for
patching in the nop or new jmp location. Depending on the type of update,
we use one of BPF_MOD_{NOP_TO_JUMP,JUMP_TO_NOP,JUMP_TO_JUMP}.

Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Andrii Nakryiko &lt;andriin@fb.com&gt;
Link: https://lore.kernel.org/bpf/1fb364bb3c565b3e415d5ea348f036ff379e779d.1574452833.git.daniel@iogearbox.net
</content>
</entry>
<entry>
<title>bpf: Add initial poke descriptor table for jit images</title>
<updated>2019-11-25T01:04:11Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2019-11-22T20:07:57Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a66886fe6c24ebeeb6dc10fbd9b75158029eacf7'/>
<id>urn:sha1:a66886fe6c24ebeeb6dc10fbd9b75158029eacf7</id>
<content type='text'>
Add initial poke table data structures and management to the BPF
prog that can later be used by JITs. Also add an instance of poke
specific data for tail call maps; plan for later work is to extend
this also for BPF static keys.

Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Andrii Nakryiko &lt;andriin@fb.com&gt;
Link: https://lore.kernel.org/bpf/1db285ec2ea4207ee0455b3f8e191a4fc58b9ade.1574452833.git.daniel@iogearbox.net
</content>
</entry>
<entry>
<title>bpf: Move owner type, jited info into array auxiliary data</title>
<updated>2019-11-25T01:04:11Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2019-11-22T20:07:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2beee5f57441413b64a9c2bd657e17beabb98d1c'/>
<id>urn:sha1:2beee5f57441413b64a9c2bd657e17beabb98d1c</id>
<content type='text'>
We're going to extend this with further information which is only
relevant for prog array at this point. Given this info is not used
in critical path, move it into its own structure such that the main
array map structure can be kept on diet.

Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Andrii Nakryiko &lt;andriin@fb.com&gt;
Link: https://lore.kernel.org/bpf/b9ddccdb0f6f7026489ee955f16c96381e1e7238.1574452833.git.daniel@iogearbox.net
</content>
</entry>
<entry>
<title>bpf: Move bpf_free_used_maps into sleepable section</title>
<updated>2019-11-25T01:03:44Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2019-11-22T20:07:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=6332be04c039a72fca32ed0a4265bac58d606bb6'/>
<id>urn:sha1:6332be04c039a72fca32ed0a4265bac58d606bb6</id>
<content type='text'>
We later on are going to need a sleepable context as opposed to plain
RCU callback in order to untrack programs we need to poke at runtime
and tracking as well as image update is performed under mutex.

Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Andrii Nakryiko &lt;andriin@fb.com&gt;
Link: https://lore.kernel.org/bpf/09823b1d5262876e9b83a8e75df04cf0467357a4.1574452833.git.daniel@iogearbox.net
</content>
</entry>
<entry>
<title>bpf: Support attaching tracing BPF program to other BPF programs</title>
<updated>2019-11-15T22:45:24Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@kernel.org</email>
</author>
<published>2019-11-14T18:57:17Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5b92a28aae4dd0f88778d540ecfdcdaec5a41723'/>
<id>urn:sha1:5b92a28aae4dd0f88778d540ecfdcdaec5a41723</id>
<content type='text'>
Allow FENTRY/FEXIT BPF programs to attach to other BPF programs of any type
including their subprograms. This feature allows snooping on input and output
packets in XDP, TC programs including their return values. In order to do that
the verifier needs to track types not only of vmlinux, but types of other BPF
programs as well. The verifier also needs to translate uapi/linux/bpf.h types
used by networking programs into kernel internal BTF types used by FENTRY/FEXIT
BPF programs. In some cases LLVM optimizations can remove arguments from BPF
subprograms without adjusting BTF info that LLVM backend knows. When BTF info
disagrees with actual types that the verifiers sees the BPF trampoline has to
fallback to conservative and treat all arguments as u64. The FENTRY/FEXIT
program can still attach to such subprograms, but it won't be able to recognize
pointer types like 'struct sk_buff *' and it won't be able to pass them to
bpf_skb_output() for dumping packets to user space. The FENTRY/FEXIT program
would need to use bpf_probe_read_kernel() instead.

The BPF_PROG_LOAD command is extended with attach_prog_fd field. When it's set
to zero the attach_btf_id is one vmlinux BTF type ids. When attach_prog_fd
points to previously loaded BPF program the attach_btf_id is BTF type id of
main function or one of its subprograms.

Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Song Liu &lt;songliubraving@fb.com&gt;
Link: https://lore.kernel.org/bpf/20191114185720.1641606-18-ast@kernel.org
</content>
</entry>
<entry>
<title>bpf: Introduce BPF trampoline</title>
<updated>2019-11-15T22:41:51Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@kernel.org</email>
</author>
<published>2019-11-14T18:57:04Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=fec56f5890d93fc2ed74166c397dc186b1c25951'/>
<id>urn:sha1:fec56f5890d93fc2ed74166c397dc186b1c25951</id>
<content type='text'>
Introduce BPF trampoline concept to allow kernel code to call into BPF programs
with practically zero overhead.  The trampoline generation logic is
architecture dependent.  It's converting native calling convention into BPF
calling convention.  BPF ISA is 64-bit (even on 32-bit architectures). The
registers R1 to R5 are used to pass arguments into BPF functions. The main BPF
program accepts only single argument "ctx" in R1. Whereas CPU native calling
convention is different. x86-64 is passing first 6 arguments in registers
and the rest on the stack. x86-32 is passing first 3 arguments in registers.
sparc64 is passing first 6 in registers. And so on.

The trampolines between BPF and kernel already exist.  BPF_CALL_x macros in
include/linux/filter.h statically compile trampolines from BPF into kernel
helpers. They convert up to five u64 arguments into kernel C pointers and
integers. On 64-bit architectures this BPF_to_kernel trampolines are nops. On
32-bit architecture they're meaningful.

The opposite job kernel_to_BPF trampolines is done by CAST_TO_U64 macros and
__bpf_trace_##call() shim functions in include/trace/bpf_probe.h. They convert
kernel function arguments into array of u64s that BPF program consumes via
R1=ctx pointer.

This patch set is doing the same job as __bpf_trace_##call() static
trampolines, but dynamically for any kernel function. There are ~22k global
kernel functions that are attachable via nop at function entry. The function
arguments and types are described in BTF.  The job of btf_distill_func_proto()
function is to extract useful information from BTF into "function model" that
architecture dependent trampoline generators will use to generate assembly code
to cast kernel function arguments into array of u64s.  For example the kernel
function eth_type_trans has two pointers. They will be casted to u64 and stored
into stack of generated trampoline. The pointer to that stack space will be
passed into BPF program in R1. On x86-64 such generated trampoline will consume
16 bytes of stack and two stores of %rdi and %rsi into stack. The verifier will
make sure that only two u64 are accessed read-only by BPF program. The verifier
will also recognize the precise type of the pointers being accessed and will
not allow typecasting of the pointer to a different type within BPF program.

The tracing use case in the datacenter demonstrated that certain key kernel
functions have (like tcp_retransmit_skb) have 2 or more kprobes that are always
active.  Other functions have both kprobe and kretprobe.  So it is essential to
keep both kernel code and BPF programs executing at maximum speed. Hence
generated BPF trampoline is re-generated every time new program is attached or
detached to maintain maximum performance.

To avoid the high cost of retpoline the attached BPF programs are called
directly. __bpf_prog_enter/exit() are used to support per-program execution
stats.  In the future this logic will be optimized further by adding support
for bpf_stats_enabled_key inside generated assembly code. Introduction of
preemptible and sleepable BPF programs will completely remove the need to call
to __bpf_prog_enter/exit().

Detach of a BPF program from the trampoline should not fail. To avoid memory
allocation in detach path the half of the page is used as a reserve and flipped
after each attach/detach. 2k bytes is enough to call 40+ BPF programs directly
which is enough for BPF tracing use cases. This limit can be increased in the
future.

BPF_TRACE_FENTRY programs have access to raw kernel function arguments while
BPF_TRACE_FEXIT programs have access to kernel return value as well. Often
kprobe BPF program remembers function arguments in a map while kretprobe
fetches arguments from a map and analyzes them together with return value.
BPF_TRACE_FEXIT accelerates this typical use case.

Recursion prevention for kprobe BPF programs is done via per-cpu
bpf_prog_active counter. In practice that turned out to be a mistake. It
caused programs to randomly skip execution. The tracing tools missed results
they were looking for. Hence BPF trampoline doesn't provide builtin recursion
prevention. It's a job of BPF program itself and will be addressed in the
follow up patches.

BPF trampoline is intended to be used beyond tracing and fentry/fexit use cases
in the future. For example to remove retpoline cost from XDP programs.

Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Andrii Nakryiko &lt;andriin@fb.com&gt;
Acked-by: Song Liu &lt;songliubraving@fb.com&gt;
Link: https://lore.kernel.org/bpf/20191114185720.1641606-5-ast@kernel.org
</content>
</entry>
<entry>
<title>bpf: Add bpf_arch_text_poke() helper</title>
<updated>2019-11-15T22:41:28Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@kernel.org</email>
</author>
<published>2019-11-14T18:57:03Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5964b2000f283ff5df366f718e0f083ebbaae977'/>
<id>urn:sha1:5964b2000f283ff5df366f718e0f083ebbaae977</id>
<content type='text'>
Add bpf_arch_text_poke() helper that is used by BPF trampoline logic to patch
nops/calls in kernel text into calls into BPF trampoline and to patch
calls/nops inside BPF programs too.

Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Song Liu &lt;songliubraving@fb.com&gt;
Acked-by: Andrii Nakryiko &lt;andriin@fb.com&gt;
Link: https://lore.kernel.org/bpf/20191114185720.1641606-4-ast@kernel.org
</content>
</entry>
<entry>
<title>bpf: Support doubleword alignment in bpf_jit_binary_alloc</title>
<updated>2019-11-15T21:25:00Z</updated>
<author>
<name>Ilya Leoshkevich</name>
<email>iii@linux.ibm.com</email>
</author>
<published>2019-11-15T12:37:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b7b3fc8dd95bc02bd30680da258e09dda55270db'/>
<id>urn:sha1:b7b3fc8dd95bc02bd30680da258e09dda55270db</id>
<content type='text'>
Currently passing alignment greater than 4 to bpf_jit_binary_alloc does
not work: in such cases it silently aligns only to 4 bytes.

On s390, in order to load a constant from memory in a large (&gt;512k) BPF
program, one must use lgrl instruction, whose memory operand must be
aligned on an 8-byte boundary.

This patch makes it possible to request 8-byte alignment from
bpf_jit_binary_alloc, and also makes it issue a warning when an
unsupported alignment is requested.

Signed-off-by: Ilya Leoshkevich &lt;iii@linux.ibm.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Link: https://lore.kernel.org/bpf/20191115123722.58462-1-iii@linux.ibm.com
</content>
</entry>
</feed>
