<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/bpf/verifier.c, branch v6.12</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.12</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.12'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2024-10-30T02:43:16Z</updated>
<entry>
<title>bpf: disallow 40-bytes extra stack for bpf_fastcall patterns</title>
<updated>2024-10-30T02:43:16Z</updated>
<author>
<name>Eduard Zingerman</name>
<email>eddyz87@gmail.com</email>
</author>
<published>2024-10-29T19:39:11Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d0b98f6a17a5cb336121302bce0c97eb5fe32d16'/>
<id>urn:sha1:d0b98f6a17a5cb336121302bce0c97eb5fe32d16</id>
<content type='text'>
Hou Tao reported an issue with bpf_fastcall patterns allowing extra
stack space above the MAX_BPF_STACK limit. This extra stack allowance
is not properly integrated with the following verifier parts
(see the sketch after this list):
- backtracking logic still assumes that the stack can't exceed
  MAX_BPF_STACK;
- bpf_verifier_env-&gt;scratched_stack_slots assumes that only 64 slots
  are available.

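The second assumption can be seen directly from the field's type:
scratched_stack_slots is a u64 bitmask with one bit per 8-byte slot,
so any slot at or below fp-520 overflows it. A minimal sketch of the
marking logic (simplified from the verifier source):

    /* one bit per 8-byte stack slot: bit 0 is fp-8, bit 63 is fp-512 */
    static void mark_stack_slot_scratched(struct bpf_verifier_env *env, u32 spi)
    {
            /* for spi &gt;= 64 (offsets at or below fp-520) the shift
             * overflows and the slot is never recorded as scratched */
            env-&gt;scratched_stack_slots |= 1ULL &lt;&lt; spi;
    }
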
Here is an example of an issue with precision tracking
(note that stack slot -8 is tracked as precise instead of -520):

    0: (b7) r1 = 42                       ; R1_w=42
    1: (b7) r2 = 42                       ; R2_w=42
    2: (7b) *(u64 *)(r10 -512) = r1       ; R1_w=42 R10=fp0 fp-512_w=42
    3: (7b) *(u64 *)(r10 -520) = r2       ; R2_w=42 R10=fp0 fp-520_w=42
    4: (85) call bpf_get_smp_processor_id#8       ; R0_w=scalar(...)
    5: (79) r2 = *(u64 *)(r10 -520)       ; R2_w=42 R10=fp0 fp-520_w=42
    6: (79) r1 = *(u64 *)(r10 -512)       ; R1_w=42 R10=fp0 fp-512_w=42
    7: (bf) r3 = r10                      ; R3_w=fp0 R10=fp0
    8: (0f) r3 += r2
    mark_precise: frame0: last_idx 8 first_idx 0 subseq_idx -1
    mark_precise: frame0: regs=r2 stack= before 7: (bf) r3 = r10
    mark_precise: frame0: regs=r2 stack= before 6: (79) r1 = *(u64 *)(r10 -512)
    mark_precise: frame0: regs=r2 stack= before 5: (79) r2 = *(u64 *)(r10 -520)
    mark_precise: frame0: regs= stack=-8 before 4: (85) call bpf_get_smp_processor_id#8
    mark_precise: frame0: regs= stack=-8 before 3: (7b) *(u64 *)(r10 -520) = r2
    mark_precise: frame0: regs=r2 stack= before 2: (7b) *(u64 *)(r10 -512) = r1
    mark_precise: frame0: regs=r2 stack= before 1: (b7) r2 = 42
    9: R2_w=42 R3_w=fp42
    9: (95) exit

This patch disables the additional allowance for the moment.
Also, two test cases are removed:
- bpf_fastcall_max_stack_ok:
  it fails without the additional stack allowance;
- bpf_fastcall_max_stack_fail:
  this test is no longer necessary: stack size now follows the
  regular rules, and pattern invalidation is checked by other
  test cases.

Reported-by: Hou Tao &lt;houtao@huaweicloud.com&gt;
Closes: https://lore.kernel.org/bpf/20241023022752.172005-1-houtao@huaweicloud.com/
Fixes: 5b5f51bff1b6 ("bpf: no_caller_saved_registers attribute for helper calls")
Signed-off-by: Eduard Zingerman &lt;eddyz87@gmail.com&gt;
Acked-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Tested-by: Hou Tao &lt;houtao1@huawei.com&gt;
Link: https://lore.kernel.org/r/20241029193911.1575719-1-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Force checkpoint when jmp history is too long</title>
<updated>2024-10-29T18:42:21Z</updated>
<author>
<name>Eduard Zingerman</name>
<email>eddyz87@gmail.com</email>
</author>
<published>2024-10-29T17:26:40Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=aa30eb3260b2dea3a68d3c42a39f9a09c5e99cee'/>
<id>urn:sha1:aa30eb3260b2dea3a68d3c42a39f9a09c5e99cee</id>
<content type='text'>
A specifically crafted program might trick the verifier into growing
a very long jump history within a single bpf_verifier_state instance.
A very long jump history makes mark_chain_precision() unreasonably
slow, especially when the verifier processes a loop.

Mitigate this by forcing a new state in is_state_visited() when the
current state's jump history is too long.

Use the same constant as in `skip_inf_loop_check`, but multiply it by
an arbitrarily chosen factor of 2 to account for the jump history
containing not only information about jumps, but also information
about stack accesses.

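A sketch of the resulting condition (the jump budget used by
`skip_inf_loop_check` is 20, hence 2 * 20 = 40; exact placement
within is_state_visited() simplified):

    /* force a checkpoint once the current state's jump history
     * grows past twice the skip_inf_loop_check jump budget */
    if (cur-&gt;jmp_history_cnt &gt; 40)
            force_new_state = true;
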
For an example of a problematic program, consider the code below.
Without this patch, the verifier processes it for ~15 minutes before
failing to allocate a big-enough chunk for jmp_history.

    0: r7 = *(u16 *)(r1 +0);
    1: r7 += 0x1ab064b9;
    2: if r7 &amp; 0x702000 goto 1b;
    3: r7 &amp;= 0x1ee60e;
    4: r7 += r1;
    5: if r7 s&gt; 0x37d2 goto +0;
    6: r0 = 0;
    7: exit;

Perf profiling shows that ~95% of the time is spent in
mark_chain_precision().

The easiest way to explain why this program causes problems is to
apply the following patch:

    diff --git a/include/linux/bpf.h b/include/linux/bpf.h
    index 0c216e71cec7..4b4823961abe 100644
    --- a/include/linux/bpf.h
    +++ b/include/linux/bpf.h
    @@ -1926,7 +1926,7 @@ struct bpf_array {
            };
     };

    -#define BPF_COMPLEXITY_LIMIT_INSNS      1000000 /* yes. 1M insns */
    +#define BPF_COMPLEXITY_LIMIT_INSNS      256 /* yes. 1M insns */
     #define MAX_TAIL_CALL_CNT 33

     /* Maximum number of loops for bpf_loop and bpf_iter_num.
    diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
    index f514247ba8ba..75e88be3bb3e 100644
    --- a/kernel/bpf/verifier.c
    +++ b/kernel/bpf/verifier.c
    @@ -18024,8 +18024,13 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
     skip_inf_loop_check:
                            if (!force_new_state &amp;&amp;
                                env-&gt;jmps_processed - env-&gt;prev_jmps_processed &lt; 20 &amp;&amp;
    -                           env-&gt;insn_processed - env-&gt;prev_insn_processed &lt; 100)
    +                           env-&gt;insn_processed - env-&gt;prev_insn_processed &lt; 100) {
    +                               verbose(env, "is_state_visited: suppressing checkpoint at %d, %d jmps processed, cur-&gt;jmp_history_cnt is %d\n",
    +                                       env-&gt;insn_idx,
    +                                       env-&gt;jmps_processed - env-&gt;prev_jmps_processed,
    +                                       cur-&gt;jmp_history_cnt);
                                    add_new_state = false;
    +                       }
                            goto miss;
                    }
                    /* If sl-&gt;state is a part of a loop and this loop's entry is a part of
    @@ -18142,6 +18147,9 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
            if (!add_new_state)
                    return 0;

    +       verbose(env, "is_state_visited: new checkpoint at %d, resetting env-&gt;jmps_processed\n",
    +               env-&gt;insn_idx);
    +
            /* There were no equivalent states, remember the current one.
             * Technically the current state is not proven to be safe yet,
             * but it will either reach outer most bpf_exit (which means it's safe)

And observe the verification log:

    ...
    is_state_visited: new checkpoint at 5, resetting env-&gt;jmps_processed
    5: R1=ctx() R7=ctx(...)
    5: (65) if r7 s&gt; 0x37d2 goto pc+0     ; R7=ctx(...)
    6: (b7) r0 = 0                        ; R0_w=0
    7: (95) exit

    from 5 to 6: R1=ctx() R7=ctx(...) R10=fp0
    6: R1=ctx() R7=ctx(...) R10=fp0
    6: (b7) r0 = 0                        ; R0_w=0
    7: (95) exit
    is_state_visited: suppressing checkpoint at 1, 3 jmps processed, cur-&gt;jmp_history_cnt is 74

    from 2 to 1: R1=ctx() R7_w=scalar(...) R10=fp0
    1: R1=ctx() R7_w=scalar(...) R10=fp0
    1: (07) r7 += 447767737
    is_state_visited: suppressing checkpoint at 2, 3 jmps processed, cur-&gt;jmp_history_cnt is 75
    2: R7_w=scalar(...)
    2: (45) if r7 &amp; 0x702000 goto pc-2
    ... mark_precise 152 steps for r7 ...
    2: R7_w=scalar(...)
    is_state_visited: suppressing checkpoint at 1, 4 jmps processed, cur-&gt;jmp_history_cnt is 75
    1: (07) r7 += 447767737
    is_state_visited: suppressing checkpoint at 2, 4 jmps processed, cur-&gt;jmp_history_cnt is 76
    2: R7_w=scalar(...)
    2: (45) if r7 &amp; 0x702000 goto pc-2
    ...
    BPF program is too large. Processed 257 insn

The log output shows that the checkpoint at label (1) is never
created, because it is suppressed by the `skip_inf_loop_check` logic:
a. When the 'if' at (2) is processed, it pushes a state with insn_idx
   (1) onto the stack and proceeds to (3);
b. At (5) a checkpoint is created, and this resets
   env-&gt;{jmps,insns}_processed;
c. Verification proceeds and reaches `exit`;
d. The state saved at step (a) is popped from the stack and
   is_state_visited() considers whether a checkpoint needs to be
   added, but because env-&gt;{jmps,insns}_processed had just been
   reset at step (b), the `skip_inf_loop_check` logic forces
   `add_new_state` to false;
e. The verifier proceeds with the current state, which slowly
   accumulates more and more entries in the jump history.

The accumulation of entries in the jump history is a problem because
of two factors:
- it eventually exhausts the memory available for kmalloc()
  allocations;
- mark_chain_precision() traverses the jump history of a state,
  meaning that if `r7` is marked precise, the verifier iterates
  over the ever-growing jump history until the parent state
  boundary is reached.

(note: the log also shows a REG INVARIANTS VIOLATION warning
       upon jset processing, but that's another bug to fix).

With this patch applied, the example above is rejected by the
verifier in under 1s, upon reaching the 1M instruction limit.

The program is a simplified reproducer from a syzbot report.
Previous discussion can be found at [1].
The patch does not cause any changes in verification performance
when tested on selftests from veristat.cfg and the cilium programs
taken from [2].

[1] https://lore.kernel.org/bpf/20241009021254.2805446-1-eddyz87@gmail.com/
[2] https://github.com/anakryiko/cilium

Changelog:
- v1 -&gt; v2:
  - moved patch to bpf tree;
  - moved force_new_state variable initialization after declaration and
    shortened the comment.
v1: https://lore.kernel.org/bpf/20241018020307.1766906-1-eddyz87@gmail.com/

Fixes: 2589726d12a1 ("bpf: introduce bounded loops")
Reported-by: syzbot+7e46cdef14bf496a3ab4@syzkaller.appspotmail.com
Signed-off-by: Eduard Zingerman &lt;eddyz87@gmail.com&gt;
Signed-off-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Acked-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Link: https://lore.kernel.org/bpf/20241029172641.1042523-1-eddyz87@gmail.com

Closes: https://lore.kernel.org/bpf/670429f6.050a0220.49194.0517.GAE@google.com/
</content>
</entry>
<entry>
<title>bpf: fix do_misc_fixups() for bpf_get_branch_snapshot()</title>
<updated>2024-10-24T05:16:45Z</updated>
<author>
<name>Andrii Nakryiko</name>
<email>andrii@kernel.org</email>
</author>
<published>2024-10-23T16:19:16Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9806f283140ef3e4d259b7646bd8c66026bbaac5'/>
<id>urn:sha1:9806f283140ef3e4d259b7646bd8c66026bbaac5</id>
<content type='text'>
We need `goto next_insn;` at the end of patching instead of
`continue;`. It currently works by accident, by making the verifier
re-process the patched instructions.

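For reference, the patching idiom used throughout do_misc_fixups()
looks roughly like this (a sketch, with the surrounding loop and
error handling omitted):

    new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
    delta    += cnt - 1;
    env-&gt;prog = prog = new_prog;
    insn      = new_prog-&gt;insnsi + i + delta;
    goto next_insn; /* using 'continue' here instead makes the loop
                     * re-process the freshly patched instructions */
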
Reported-by: Shung-Hsi Yu &lt;shung-hsi.yu@suse.com&gt;
Fixes: 314a53623cd4 ("bpf: inline bpf_get_branch_snapshot() helper")
Signed-off-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Acked-by: Yonghong Song &lt;yonghong.song@linux.dev&gt;
Acked-by: Shung-Hsi Yu &lt;shung-hsi.yu@suse.com&gt;
Link: https://lore.kernel.org/r/20241023161916.2896274-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Fix overloading of MEM_UNINIT's meaning</title>
<updated>2024-10-22T22:42:56Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2024-10-21T15:28:06Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8ea607330a39184f51737c6ae706db7fdca7628e'/>
<id>urn:sha1:8ea607330a39184f51737c6ae706db7fdca7628e</id>
<content type='text'>
Lonial reported an issue in the BPF verifier where check_mem_size_reg()
has the following code:

    if (!tnum_is_const(reg-&gt;var_off))
        /* For unprivileged variable accesses, disable raw
         * mode so that the program is required to
         * initialize all the memory that the helper could
         * just partially fill up.
         */
        meta = NULL;

This means that writes are not checked when the register containing
the size of the passed buffer does not have a constant value. Through
this bug, a BPF program can write to a map which is marked as
read-only, for example, .rodata global maps.

The problem is that MEM_UNINIT's initial meaning, "the passed buffer
to the BPF helper does not need to be initialized", which was added
back in commit 435faee1aae9 ("bpf, verifier: add ARG_PTR_TO_RAW_STACK
type"), got overloaded over time with "the passed buffer is being
written to".

The trouble, however, is that checks such as the above, which were
added later via 06c1c049721a ("bpf: allow helpers access to variable
memory"), set meta to NULL in order to force the user to always
initialize the passed buffer to the helper. Due to the current double
meaning of MEM_UNINIT, this bypasses the verifier's write checks to
the memory (though not the boundary checks) and only assumes the
memory is read instead.

Fix this by reverting MEM_UNINIT back to its original meaning, and
introducing MEM_WRITE as an explicit annotation on BPF helpers which
then triggers the BPF verifier's checks for writing to memory.

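As an illustration of the new scheme (helper and argument slots
chosen arbitrarily), a helper that fills a caller-provided buffer now
carries both flags, and MEM_UNINIT alone no longer implies a write:

    /* the buffer may be uninitialized on entry (MEM_UNINIT) and is
     * written to by the helper (MEM_WRITE) */
    .arg1_type = ARG_PTR_TO_MEM | MEM_UNINIT | MEM_WRITE,
    .arg2_type = ARG_CONST_SIZE,
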
Some notes: check_arg_pair_ok() ensures that for ARG_CONST_SIZE{,_OR_ZERO}
we can access fn-&gt;arg_type[arg - 1] since it must contain a preceding
ARG_PTR_TO_MEM. For check_mem_reg() the meta argument can be removed
altogether since we do check both BPF_READ and BPF_WRITE. Same for the
equivalent check_kfunc_mem_size_reg().

Fixes: 7b3552d3f9f6 ("bpf: Reject writes for PTR_TO_MAP_KEY in check_helper_mem_access")
Fixes: 97e6d7dab1ca ("bpf: Check PTR_TO_MEM | MEM_RDONLY in check_helper_mem_access")
Fixes: 15baa55ff5b0 ("bpf/verifier: allow all functions to read user provided context")
Reported-by: Lonial Con &lt;kongln9170@gmail.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Kumar Kartikeya Dwivedi &lt;memxor@gmail.com&gt;
Link: https://lore.kernel.org/r/20241021152809.33343-2-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Fix incorrect delta propagation between linked registers</title>
<updated>2024-10-17T18:06:34Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2024-10-16T13:49:11Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3878ae04e9fc24dacb77a1d32bd87e7d8108599e'/>
<id>urn:sha1:3878ae04e9fc24dacb77a1d32bd87e7d8108599e</id>
<content type='text'>
Nathaniel reported a bug in the linked scalar delta tracking, which can lead
to accepting a program with OOB access. The specific code is related to the
sync_linked_regs() function and the BPF_ADD_CONST flag, which signifies a
constant offset between two scalar registers tracked by the same register id.

The verifier attempts to track "similar" scalars in order to propagate bounds
information learned about one scalar to others. For instance, if r1 and r2
are known to contain the same value, then upon encountering 'if (r1 != 0x1234)
goto xyz', not only does it know that r1 is equal to 0x1234 on the path where
that conditional jump is not taken, it also knows that r2 is.

Additionally, with env-&gt;bpf_capable set, the verifier will track scalars
which should be a constant delta apart (if r1 is known to be one greater than
r2, then if r1 is known to be equal to 0x1234, r2 must be equal to 0x1233.)
The code path for the latter in adjust_reg_min_max_vals() is reached when
processing both 32 and 64-bit addition operations. While adjust_reg_min_max_vals()
knows whether dst_reg was produced by a 32 or a 64-bit addition (based on the
alu32 bool), the only information saved in dst_reg is the id of the source
register (reg-&gt;id, or'ed by BPF_ADD_CONST) and the value of the constant
offset (reg-&gt;off).

Later, the function sync_linked_regs() will attempt to use this
information to propagate bounds information from one register
(known_reg) to others, meaning, for all R in linked_regs, it copies
the known_reg range into R (possibly adjusting for the delta) for
the case of R-&gt;id == known_reg-&gt;id.

For the delta adjustment, that is, when reg-&gt;id matches and carries
BPF_ADD_CONST, the verifier adjusts the register as reg = known_reg;
reg += delta, where delta is computed as
(s32)reg-&gt;off - (s32)known_reg-&gt;off and placed as a scalar into a
fake_reg in order to simulate the addition reg += fake_reg. This is
only correct, however, if the value in reg was created by a 64-bit
addition.
When reg contains the result of a 32-bit addition operation, its upper 32
bits will always be zero. sync_linked_regs() on the other hand, may cause
the verifier to believe that the addition between fake_reg and reg overflows
into those upper bits. For example, if reg was generated by adding the
constant 1 to known_reg using a 32-bit alu operation, then reg-&gt;off is 1
and known_reg-&gt;off is 0. If known_reg is known to be the constant 0xFFFFFFFF,
sync_linked_regs() will tell the verifier that reg is equal to the constant
0x100000000. This is incorrect as the actual value of reg will be 0, as the
32-bit addition will wrap around.

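The same arithmetic in plain C, for illustration:

    u32 w = 0xFFFFFFFFu;      /* value of known_reg */
    u32 lo = w + 1;           /* what the alu32 add produces: wraps to 0 */
    u64 wide = (u64)w + 1;    /* what sync_linked_regs() assumes: 0x100000000 */
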
Example:

  0: (b7) r0 = 0;             R0_w=0
  1: (18) r1 = 0x80000001;    R1_w=0x80000001
  3: (37) r1 /= 1;            R1_w=scalar()
  4: (bf) r2 = r1;            R1_w=scalar(id=1) R2_w=scalar(id=1)
  5: (bf) r4 = r1;            R1_w=scalar(id=1) R4_w=scalar(id=1)
  6: (04) w2 += 2147483647;   R2_w=scalar(id=1+2147483647,smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
  7: (04) w4 += 0 ;           R4_w=scalar(id=1+0,smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
  8: (15) if r2 == 0x0 goto pc+1
 10: R0=0 R1=0xffffffff80000001 R2=0x7fffffff R4=0xffffffff80000001 R10=fp0

What can be seen here is that r1 is copied to r2 and r4, such that
{r1,r2,r4}.id are all the same, which later lets sync_linked_regs()
be invoked. Then, in the next step, constants are added with alu32
to r2 and r4, setting their -&gt;off as well as id |= BPF_ADD_CONST.
Next, the conditional binds r2 and propagates ranges to its linked
registers. The verifier now believes r4=0xffffffff80000001, while
actually r4=r1=0x80000001.

One approach for a simple fix, suitable also for stable, is to limit
the constant delta tracking to 64-bit alu addition only. If necessary
at some later point, BPF_ADD_CONST could be split into
BPF_ADD_CONST64 and BPF_ADD_CONST32 to avoid mixing the two, at the
cost of further complicating sync_linked_regs(). However, none of the
tests added in dedf56d775c0 ("selftests/bpf: Add tests for add_const")
make this necessary at this point, meaning BPF CI also passes when
tracking is limited to 64-bit alu addition.

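A sketch of that fix (shape and identifiers illustrative, not the
exact kernel code): in adjust_reg_min_max_vals(), record the link
only for 64-bit adds, so an alu32 add leaves no delta to propagate:

    /* src_is_const/src_val stand in for the existing checks */
    if (!alu32 &amp;&amp; dst_reg-&gt;id &amp;&amp; src_is_const) {
            dst_reg-&gt;id |= BPF_ADD_CONST;
            dst_reg-&gt;off = (s32)src_val;
    }
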
Fixes: 98d7ca374ba4 ("bpf: Track delta between "linked" registers.")
Reported-by: Nathaniel Theis &lt;nathaniel.theis@nccgroup.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Reviewed-by: Eduard Zingerman &lt;eddyz87@gmail.com&gt;
Link: https://lore.kernel.org/bpf/20241016134913.32249-1-daniel@iogearbox.net
</content>
</entry>
<entry>
<title>bpf: Fix truncation bug in coerce_reg_to_size_sx()</title>
<updated>2024-10-15T18:16:24Z</updated>
<author>
<name>Dimitar Kanaliev</name>
<email>dimitar.kanaliev@siteground.com</email>
</author>
<published>2024-10-14T12:11:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ae67b9fb8c4e981e929a665dcaa070f4b05ebdb4'/>
<id>urn:sha1:ae67b9fb8c4e981e929a665dcaa070f4b05ebdb4</id>
<content type='text'>
coerce_reg_to_size_sx() updates the register state after a sign-extension
operation. However, there's a bug in the assignment order of the unsigned
min/max values, leading to incorrect truncation:

  0: (85) call bpf_get_prandom_u32#7    ; R0_w=scalar()
  1: (57) r0 &amp;= 1                       ; R0_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=1,var_off=(0x0; 0x1))
  2: (07) r0 += 254                     ; R0_w=scalar(smin=umin=smin32=umin32=254,smax=umax=smax32=umax32=255,var_off=(0xfe; 0x1))
  3: (bf) r0 = (s8)r0                   ; R0_w=scalar(smin=smin32=-2,smax=smax32=-1,umin=umin32=0xfffffffe,umax=0xffffffff,var_off=(0xfffffffffffffffe; 0x1))

In the current implementation, the unsigned 32-bit min/max values
(u32_min_value and u32_max_value) are assigned directly from the 64-bit
signed min/max values (s64_min and s64_max):

  reg-&gt;umin_value = reg-&gt;u32_min_value = s64_min;
  reg-&gt;umax_value = reg-&gt;u32_max_value = s64_max;

Due to the chained assignment, this is equivalent to:

  reg-&gt;u32_min_value = s64_min;  // Unintended truncation
  reg-&gt;umin_value = reg-&gt;u32_min_value;
  reg-&gt;u32_max_value = s64_max;  // Unintended truncation
  reg-&gt;umax_value = reg-&gt;u32_max_value;

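A standalone C program demonstrating the chained-assignment
truncation (same C semantics as in the verifier code above):

    #include &lt;stdio.h&gt;
    #include &lt;stdint.h&gt;

    int main(void)
    {
            int64_t s64_min = -2;  /* i.e. 0xfffffffffffffffe */
            uint64_t umin_value;
            uint32_t u32_min_value;

            /* evaluates right-to-left through the 32-bit field */
            umin_value = u32_min_value = s64_min;

            /* prints "fffffffe fffffffe": the 64-bit bound lost its
             * upper 32 bits on the way through u32_min_value */
            printf("%llx %x\n", (unsigned long long)umin_value, u32_min_value);
            return 0;
    }
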
Fixes: 1f9a1ea821ff ("bpf: Support new sign-extension load insns")
Reported-by: Shung-Hsi Yu &lt;shung-hsi.yu@suse.com&gt;
Reported-by: Zac Ecob &lt;zacecob@protonmail.com&gt;
Signed-off-by: Dimitar Kanaliev &lt;dimitar.kanaliev@siteground.com&gt;
Acked-by: Yonghong Song &lt;yonghong.song@linux.dev&gt;
Reviewed-by: Shung-Hsi Yu &lt;shung-hsi.yu@suse.com&gt;
Link: https://lore.kernel.org/r/20241014121155.92887-2-dimitar.kanaliev@siteground.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: fix kfunc btf caching for modules</title>
<updated>2024-10-10T17:44:03Z</updated>
<author>
<name>Toke Høiland-Jørgensen</name>
<email>toke@redhat.com</email>
</author>
<published>2024-10-10T13:27:07Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=6cb86a0fdece87e126323ec1bb19deb16a52aedf'/>
<id>urn:sha1:6cb86a0fdece87e126323ec1bb19deb16a52aedf</id>
<content type='text'>
The verifier contains a cache for looking up module BTF objects when
calling kfuncs defined in modules. This cache uses a 'struct
bpf_kfunc_btf_tab', which contains a sorted list of BTF objects that
were already seen in the current verifier run, and the BTF objects are
looked up by the offset stored in the relocated call instruction using
bsearch().

The first time a given offset is seen, the module BTF is loaded from
the file descriptor passed in by libbpf and stored into the cache.
However, there's a bug in the code storing the new entry: it keeps a
pointer to the new cache entry, calls sort() to keep the cache sorted
for the next bsearch() lookup, and then returns the entry through
that saved pointer. Because sort() moves the entries in place *by
value*, the saved pointer may no longer point to the right entry, in
which case the wrong BTF object will be returned.

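In miniature (types and names simplified), the buggy pattern looks
like this:

    b = &amp;tab-&gt;entries[tab-&gt;nr_entries++];
    b-&gt;offset = offset;
    b-&gt;btf = btf;
    /* keep the table sorted for the next bsearch() lookup */
    sort(tab-&gt;entries, tab-&gt;nr_entries, sizeof(*b), cmp_by_offset, NULL);
    return b-&gt;btf; /* BUG: sort() moved the entries by value, so 'b'
                    * may now refer to a different module's BTF */
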
The end result of this is an intermittent bug where, if a BPF program
calls two functions with the same signature in two different modules,
the function from the wrong module may sometimes end up being called.
Whether this happens depends on the order of the calls in the BPF
program (as that affects whether sort() reorders the array of BTF
objects), making it especially hard to track down. Simon, credited as
reporter below, spent significant effort analysing and creating a
reproducer for this issue. The reproducer is added as a selftest in a
subsequent patch.

The fix is straightforward: simply don't use the stored pointer after
calling sort(). Since we already have an on-stack pointer to the BTF
object itself at the point where the function returns, just use that,
and populate it from the cache entry in the branch where the lookup
succeeds.

Fixes: 2357672c54c3 ("bpf: Introduce BPF support for kernel module function calls")
Reported-by: Simon Sundberg &lt;simon.sundberg@kau.se&gt;
Acked-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Acked-by: Kumar Kartikeya Dwivedi &lt;memxor@gmail.com&gt;
Signed-off-by: Toke Høiland-Jørgensen &lt;toke@redhat.com&gt;
Link: https://lore.kernel.org/r/20241010-fix-kfunc-btf-caching-for-modules-v2-1-745af6c1af98@redhat.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: use kvzalloc to allocate BPF verifier environment</title>
<updated>2024-10-10T01:13:05Z</updated>
<author>
<name>Rik van Riel</name>
<email>riel@surriel.com</email>
</author>
<published>2024-10-08T21:07:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=434247637c66e1be2bc71a9987d4c3f0d8672387'/>
<id>urn:sha1:434247637c66e1be2bc71a9987d4c3f0d8672387</id>
<content type='text'>
The kzalloc() call in bpf_check() can fail when memory is very
fragmented, which in turn can lead to an OOM kill.

Use kvzalloc() to fall back to vmalloc() when memory is too
fragmented to allocate the order-3 sized BPF verifier environment.

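The shape of the change is a one-line substitution (sketch; the
matching kfree() on the release side becomes kvfree()):

    -    env = kzalloc(sizeof(struct bpf_verifier_env), GFP_KERNEL);
    +    env = kvzalloc(sizeof(struct bpf_verifier_env), GFP_KERNEL);
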
Admittedly this is not a very common case, and only happens on systems
where memory has already been squeezed close to the limit, but this does
not seem like much of a hot path, and it's a simple enough fix.

Signed-off-by: Rik van Riel &lt;riel@surriel.com&gt;
Reviewed-by: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Link: https://lore.kernel.org/r/20241008170735.16766766@imladris.surriel.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: sync_linked_regs() must preserve subreg_def</title>
<updated>2024-10-01T15:18:52Z</updated>
<author>
<name>Eduard Zingerman</name>
<email>eddyz87@gmail.com</email>
</author>
<published>2024-09-24T21:08:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e9bd9c498cb0f5843996dbe5cbce7a1836a83c70'/>
<id>urn:sha1:e9bd9c498cb0f5843996dbe5cbce7a1836a83c70</id>
<content type='text'>
Range propagation must not affect subreg_def marks, otherwise the
following example is rewritten incorrectly by the verifier when the
BPF_F_TEST_RND_HI32 flag is set:

  0: call bpf_ktime_get_ns                   call bpf_ktime_get_ns
  1: r0 &amp;= 0x7fffffff       after verifier   r0 &amp;= 0x7fffffff
  2: w1 = w0                rewrites         w1 = w0
  3: if w0 &lt; 10 goto +0     --------------&gt;  r11 = 0x2f5674a6     (r)
  4: r1 &gt;&gt;= 32                               r11 &lt;&lt;= 32           (r)
  5: r0 = r1                                 r1 |= r11            (r)
  6: exit;                                   if w0 &lt; 0xa goto pc+0
                                             r1 &gt;&gt;= 32
                                             r0 = r1
                                             exit

(or the zero extension of w1 at (2) is missing for architectures
 that require zero extension of the upper register half).

The following happens without this patch:
- r0 is marked as not a subreg at (0);
- w1 is marked as a subreg at (2);
- w1's subreg_def is overridden at (3) by copy_register_state();
- w1 is read at (5), but mark_insn_zext() does not mark (2)
  for zero extension, because w1's subreg_def is not set;
- because of the BPF_F_TEST_RND_HI32 flag, the verifier inserts a
  random value for the hi32 bits of (2) (marked (r));
- this random value is read at (5).

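A sketch of the fix in sync_linked_regs() (placement assumed): save
the subregister mark across the range copy and restore it afterwards:

    u32 subreg_def = reg-&gt;subreg_def;

    copy_register_state(reg, known_reg);
    /* range propagation must not clobber the subregister mark,
     * otherwise mark_insn_zext() never sees the w1 write at (2) */
    reg-&gt;subreg_def = subreg_def;
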
Fixes: 75748837b7e5 ("bpf: Propagate scalar ranges through register assignments.")
Reported-by: Lonial Con &lt;kongln9170@gmail.com&gt;
Signed-off-by: Lonial Con &lt;kongln9170@gmail.com&gt;
Signed-off-by: Eduard Zingerman &lt;eddyz87@gmail.com&gt;
Signed-off-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Closes: https://lore.kernel.org/bpf/7e2aa30a62d740db182c170fdd8f81c596df280d.camel@gmail.com
Link: https://lore.kernel.org/bpf/20240924210844.1758441-1-eddyz87@gmail.com
</content>
</entry>
<entry>
<title>Merge tag 'bpf-next-6.12-struct-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next</title>
<updated>2024-09-24T21:54:26Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2024-09-24T21:54:26Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=fa8380a06bd0523e51f826520aac1beb8c585521'/>
<id>urn:sha1:fa8380a06bd0523e51f826520aac1beb8c585521</id>
<content type='text'>
Pull bpf 'struct fd' updates from Alexei Starovoitov:
 "This includes struct_fd BPF changes from Al and Andrii"

* tag 'bpf-next-6.12-struct-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
  bpf: convert bpf_token_create() to CLASS(fd, ...)
  security,bpf: constify struct path in bpf_token_create() LSM hook
  bpf: more trivial fdget() conversions
  bpf: trivial conversions for fdget()
  bpf: switch maps to CLASS(fd, ...)
  bpf: factor out fetching bpf_map from FD and adding it to used_maps list
  bpf: switch fdget_raw() uses to CLASS(fd_raw, ...)
  bpf: convert __bpf_prog_get() to CLASS(fd, ...)
</content>
</entry>
</feed>
