<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/kprobes.c, branch v6.8</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.8</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.8'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2024-02-08T14:29:29Z</updated>
<entry>
<title>kprobes: Remove unnecessary initial values of variables</title>
<updated>2024-02-08T14:29:29Z</updated>
<author>
<name>Li zeming</name>
<email>zeming@nfschina.com</email>
</author>
<published>2023-09-19T01:28:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9efd24ec5599ed485b7c4d9aeb731141f6285167'/>
<id>urn:sha1:9efd24ec5599ed485b7c4d9aeb731141f6285167</id>
<content type='text'>
ri and sym is assigned first, so it does not need to initialize the
assignment.

Link: https://lore.kernel.org/all/20230919012823.7815-1-zeming@nfschina.com/

Signed-off-by: Li zeming &lt;zeming@nfschina.com&gt;
Acked-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Signed-off-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
</content>
</entry>
<entry>
<title>kprobes: consistent rcu api usage for kretprobe holder</title>
<updated>2023-12-01T05:53:55Z</updated>
<author>
<name>JP Kobryn</name>
<email>inwardvessel@gmail.com</email>
</author>
<published>2023-12-01T05:53:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d839a656d0f3caca9f96e9bf912fd394ac6a11bc'/>
<id>urn:sha1:d839a656d0f3caca9f96e9bf912fd394ac6a11bc</id>
<content type='text'>
It seems that the pointer-to-kretprobe "rp" within the kretprobe_holder is
RCU-managed, based on the (non-rethook) implementation of get_kretprobe().
The thought behind this patch is to make use of the RCU API where possible
when accessing this pointer so that the needed barriers are always in place
and to self-document the code.

The __rcu annotation to "rp" allows for sparse RCU checking. Plain writes
done to the "rp" pointer are changed to make use of the RCU macro for
assignment. For the single read, the implementation of get_kretprobe()
is simplified by making use of an RCU macro which accomplishes the same,
but note that the log warning text will be more generic.

I did find that there is a difference in assembly generated between the
usage of the RCU macros vs without. For example, on arm64, when using
rcu_assign_pointer(), the corresponding store instruction is a
store-release (STLR) which has an implicit barrier. When normal assignment
is done, a regular store (STR) is found. In the macro case, this seems to
be a result of rcu_assign_pointer() using smp_store_release() when the
value to write is not NULL.

Link: https://lore.kernel.org/all/20231122132058.3359-1-inwardvessel@gmail.com/

Fixes: d741bf41d7c7 ("kprobes: Remove kretprobe hash")
Cc: stable@vger.kernel.org
Signed-off-by: JP Kobryn &lt;inwardvessel@gmail.com&gt;
Acked-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Signed-off-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
</content>
</entry>
<entry>
<title>kprobes: kretprobe scalability improvement</title>
<updated>2023-10-18T14:59:54Z</updated>
<author>
<name>wuqiang.matt</name>
<email>wuqiang.matt@bytedance.com</email>
</author>
<published>2023-10-17T13:56:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4bbd9345565933823f38a419df65661f12adbe5e'/>
<id>urn:sha1:4bbd9345565933823f38a419df65661f12adbe5e</id>
<content type='text'>
kretprobe is using freelist to manage return-instances, but freelist,
as LIFO queue based on singly linked list, scales badly and reduces
the overall throughput of kretprobed routines, especially for high
contention scenarios.

Here's a typical throughput test of sys_prctl (counts in 10 seconds,
measured with perf stat -a -I 10000 -e syscalls:sys_enter_prctl):

OS: Debian 10 X86_64, Linux 6.5rc7 with freelist
HW: XEON 8336C x 2, 64 cores/128 threads, DDR4 3200MT/s

         1T       2T       4T       8T      16T      24T
   24150045 29317964 15446741 12494489 18287272 17708768
        32T      48T      64T      72T      96T     128T
   16200682 13737658 11645677 11269858 10470118  9931051

This patch introduces objpool to replace freelist. objpool is a
high performance queue, which can bring near-linear scalability
to kretprobed routines. Tests of kretprobe throughput show the
biggest ratio as 159x of original freelist. Here's the result:

                  1T         2T         4T         8T        16T
native:     41186213   82336866  164250978  328662645  658810299
freelist:   24150045   29317964   15446741   12494489   18287272
objpool:    23926730   48010314   96125218  191782984  385091769
                 32T        48T        64T        96T       128T
native:   1330338351 1969957941 2512291791 2615754135 2671040914
freelist:   16200682   13737658   11645677   10470118    9931051
objpool:   764481096 1147149781 1456220214 1502109662 1579015050

Testings on 96-core ARM64 output similarly, but with the biggest
ratio up to 448x:

OS: Debian 10 AARCH64, Linux 6.5rc7
HW: Kunpeng-920 96 cores/2 sockets/4 NUMA nodes, DDR4 2933 MT/s

                  1T         2T         4T         8T        16T
native: .   30066096   63569843  126194076  257447289  505800181
freelist:   16152090   11064397   11124068    7215768    5663013
objpool:    13997541   28032100   55726624  110099926  221498787
                 24T        32T        48T        64T        96T
native:    763305277 1015925192 1521075123 2033009392 3021013752
freelist:    5015810    4602893    3766792    3382478    2945292
objpool:   328192025  439439564  668534502  887401381 1319972072

Link: https://lore.kernel.org/all/20231017135654.82270-4-wuqiang.matt@bytedance.com/

Signed-off-by: wuqiang.matt &lt;wuqiang.matt@bytedance.com&gt;
Acked-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Signed-off-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
</content>
</entry>
<entry>
<title>kernel: kprobes: Use struct_size()</title>
<updated>2023-08-23T00:38:17Z</updated>
<author>
<name>Ruan Jinjie</name>
<email>ruanjinjie@huawei.com</email>
</author>
<published>2023-07-25T19:54:24Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8865aea0471c512c2d94220c60a0083cefcb9348'/>
<id>urn:sha1:8865aea0471c512c2d94220c60a0083cefcb9348</id>
<content type='text'>
Use struct_size() instead of hand-writing it, when allocating a structure
with a flex array.

This is less verbose.

Link: https://lore.kernel.org/all/20230725195424.3469242-1-ruanjinjie@huawei.com/

Signed-off-by: Ruan Jinjie &lt;ruanjinjie@huawei.com&gt;
Acked-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Signed-off-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
</content>
</entry>
<entry>
<title>kprobes: Prohibit probing on CFI preamble symbol</title>
<updated>2023-07-29T14:32:26Z</updated>
<author>
<name>Masami Hiramatsu (Google)</name>
<email>mhiramat@kernel.org</email>
</author>
<published>2023-07-11T01:50:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=de02f2ac5d8cfb311f44f2bf144cc20002f1fbbd'/>
<id>urn:sha1:de02f2ac5d8cfb311f44f2bf144cc20002f1fbbd</id>
<content type='text'>
Do not allow to probe on "__cfi_" or "__pfx_" started symbol, because those
are used for CFI and not executed. Probing it will break the CFI.

Link: https://lore.kernel.org/all/168904024679.116016.18089228029322008512.stgit@devnote2/

Signed-off-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Reviewed-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'probes-fixes-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace</title>
<updated>2023-07-12T19:01:16Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-07-12T19:01:16Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9a3236ce48406c3190dfa06137636525001b32f5'/>
<id>urn:sha1:9a3236ce48406c3190dfa06137636525001b32f5</id>
<content type='text'>
Pull probes fixes from Masami Hiramatsu:

 - Fix fprobe's rethook release issues:

     - Release rethook after ftrace_ops is unregistered so that the
       rethook is not accessed after free.

     - Stop rethook before ftrace_ops is unregistered so that the
       rethook is NOT used after exiting unregister_fprobe()

 - Fix eprobe cleanup logic. If it attaches to multiple events and
   failes to enable one of them, rollback all enabled events correctly.

 - Fix fprobe to unlock ftrace recursion lock correctly when it missed
   by another running kprobe.

 - Cleanup kprobe to remove unnecessary NULL.

 - Cleanup kprobe to remove unnecessary 0 initializations.

* tag 'probes-fixes-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  fprobe: Ensure running fprobe_exit_handler() finished before calling rethook_free()
  kernel: kprobes: Remove unnecessary ‘0’ values
  kprobes: Remove unnecessary ‘NULL’ values from correct_ret_addr
  fprobe: add unlock to match a succeeded ftrace_test_recursion_trylock
  kernel/trace: Fix cleanup logic of enable_trace_eprobe
  fprobe: Release rethook after the ftrace_ops is unregistered
</content>
</entry>
<entry>
<title>kernel: kprobes: Remove unnecessary ‘0’ values</title>
<updated>2023-07-10T15:50:51Z</updated>
<author>
<name>Li zeming</name>
<email>zeming@nfschina.com</email>
</author>
<published>2023-07-11T18:53:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ed9492dfef8738ca68879f5690dda5a04f1897dc'/>
<id>urn:sha1:ed9492dfef8738ca68879f5690dda5a04f1897dc</id>
<content type='text'>
it is assigned first, so it does not need to initialize the assignment.

Link: https://lore.kernel.org/all/20230711185353.3218-1-zeming@nfschina.com/

Signed-off-by: Li zeming &lt;zeming@nfschina.com&gt;
Acked-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Signed-off-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
</content>
</entry>
<entry>
<title>kprobes: Remove unnecessary ‘NULL’ values from correct_ret_addr</title>
<updated>2023-07-10T15:50:35Z</updated>
<author>
<name>Li zeming</name>
<email>zeming@nfschina.com</email>
</author>
<published>2023-07-04T19:43:59Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e1164787f22b51010cfdc6a5e41b744435836b79'/>
<id>urn:sha1:e1164787f22b51010cfdc6a5e41b744435836b79</id>
<content type='text'>
The 'correct_ret_addr' pointer is always set in the later code, no need
to initialize it at definition time.

Link: https://lore.kernel.org/all/20230704194359.3124-1-zeming@nfschina.com/

Signed-off-by: Li zeming &lt;zeming@nfschina.com&gt;
Acked-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Signed-off-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
</content>
</entry>
<entry>
<title>fprobe: Pass return address to the handlers</title>
<updated>2023-06-06T12:39:55Z</updated>
<author>
<name>Masami Hiramatsu (Google)</name>
<email>mhiramat@kernel.org</email>
</author>
<published>2023-06-06T12:39:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=cb16330d12741f6dae56aad5acf62f5be3a06c4e'/>
<id>urn:sha1:cb16330d12741f6dae56aad5acf62f5be3a06c4e</id>
<content type='text'>
Pass return address as 'ret_ip' to the fprobe entry and return handlers
so that the fprobe user handler can get the reutrn address without
analyzing arch-dependent pt_regs.

Link: https://lore.kernel.org/all/168507467664.913472.11642316698862778600.stgit@mhiramat.roam.corp.google.com/

Signed-off-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
</content>
</entry>
<entry>
<title>x86/kprobes: Fix arch_check_optimized_kprobe check within optimized_kprobe range</title>
<updated>2023-02-20T23:49:16Z</updated>
<author>
<name>Yang Jihong</name>
<email>yangjihong1@huawei.com</email>
</author>
<published>2023-02-20T23:49:16Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f1c97a1b4ef709e3f066f82e3ba3108c3b133ae6'/>
<id>urn:sha1:f1c97a1b4ef709e3f066f82e3ba3108c3b133ae6</id>
<content type='text'>
When arch_prepare_optimized_kprobe calculating jump destination address,
it copies original instructions from jmp-optimized kprobe (see
__recover_optprobed_insn), and calculated based on length of original
instruction.

arch_check_optimized_kprobe does not check KPROBE_FLAG_OPTIMATED when
checking whether jmp-optimized kprobe exists.
As a result, setup_detour_execution may jump to a range that has been
overwritten by jump destination address, resulting in an inval opcode error.

For example, assume that register two kprobes whose addresses are
&lt;func+9&gt; and &lt;func+11&gt; in "func" function.
The original code of "func" function is as follows:

   0xffffffff816cb5e9 &lt;+9&gt;:     push   %r12
   0xffffffff816cb5eb &lt;+11&gt;:    xor    %r12d,%r12d
   0xffffffff816cb5ee &lt;+14&gt;:    test   %rdi,%rdi
   0xffffffff816cb5f1 &lt;+17&gt;:    setne  %r12b
   0xffffffff816cb5f5 &lt;+21&gt;:    push   %rbp

1.Register the kprobe for &lt;func+11&gt;, assume that is kp1, corresponding optimized_kprobe is op1.
  After the optimization, "func" code changes to:

   0xffffffff816cc079 &lt;+9&gt;:     push   %r12
   0xffffffff816cc07b &lt;+11&gt;:    jmp    0xffffffffa0210000
   0xffffffff816cc080 &lt;+16&gt;:    incl   0xf(%rcx)
   0xffffffff816cc083 &lt;+19&gt;:    xchg   %eax,%ebp
   0xffffffff816cc084 &lt;+20&gt;:    (bad)
   0xffffffff816cc085 &lt;+21&gt;:    push   %rbp

Now op1-&gt;flags == KPROBE_FLAG_OPTIMATED;

2. Register the kprobe for &lt;func+9&gt;, assume that is kp2, corresponding optimized_kprobe is op2.

register_kprobe(kp2)
  register_aggr_kprobe
    alloc_aggr_kprobe
      __prepare_optimized_kprobe
        arch_prepare_optimized_kprobe
          __recover_optprobed_insn    // copy original bytes from kp1-&gt;optinsn.copied_insn,
                                      // jump address = &lt;func+14&gt;

3. disable kp1:

disable_kprobe(kp1)
  __disable_kprobe
    ...
    if (p == orig_p || aggr_kprobe_disabled(orig_p)) {
      ret = disarm_kprobe(orig_p, true)       // add op1 in unoptimizing_list, not unoptimized
      orig_p-&gt;flags |= KPROBE_FLAG_DISABLED;  // op1-&gt;flags ==  KPROBE_FLAG_OPTIMATED | KPROBE_FLAG_DISABLED
    ...

4. unregister kp2
__unregister_kprobe_top
  ...
  if (!kprobe_disabled(ap) &amp;&amp; !kprobes_all_disarmed) {
    optimize_kprobe(op)
      ...
      if (arch_check_optimized_kprobe(op) &lt; 0) // because op1 has KPROBE_FLAG_DISABLED, here not return
        return;
      p-&gt;kp.flags |= KPROBE_FLAG_OPTIMIZED;   //  now op2 has KPROBE_FLAG_OPTIMIZED
  }

"func" code now is:

   0xffffffff816cc079 &lt;+9&gt;:     int3
   0xffffffff816cc07a &lt;+10&gt;:    push   %rsp
   0xffffffff816cc07b &lt;+11&gt;:    jmp    0xffffffffa0210000
   0xffffffff816cc080 &lt;+16&gt;:    incl   0xf(%rcx)
   0xffffffff816cc083 &lt;+19&gt;:    xchg   %eax,%ebp
   0xffffffff816cc084 &lt;+20&gt;:    (bad)
   0xffffffff816cc085 &lt;+21&gt;:    push   %rbp

5. if call "func", int3 handler call setup_detour_execution:

  if (p-&gt;flags &amp; KPROBE_FLAG_OPTIMIZED) {
    ...
    regs-&gt;ip = (unsigned long)op-&gt;optinsn.insn + TMPL_END_IDX;
    ...
  }

The code for the destination address is

   0xffffffffa021072c:  push   %r12
   0xffffffffa021072e:  xor    %r12d,%r12d
   0xffffffffa0210731:  jmp    0xffffffff816cb5ee &lt;func+14&gt;

However, &lt;func+14&gt; is not a valid start instruction address. As a result, an error occurs.

Link: https://lore.kernel.org/all/20230216034247.32348-3-yangjihong1@huawei.com/

Fixes: f66c0447cca1 ("kprobes: Set unoptimized flag after unoptimizing code")
Signed-off-by: Yang Jihong &lt;yangjihong1@huawei.com&gt;
Cc: stable@vger.kernel.org
Acked-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Signed-off-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
</content>
</entry>
</feed>
