<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/trace/ring_buffer.c, branch v6.3</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.3</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.3'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2023-04-03T15:51:38Z</updated>
<entry>
<title>ring-buffer: Fix race while reader and writer are on the same page</title>
<updated>2023-04-03T15:51:38Z</updated>
<author>
<name>Zheng Yejian</name>
<email>zhengyejian1@huawei.com</email>
</author>
<published>2023-03-25T02:12:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=6455b6163d8c680366663cdb8c679514d55fc30c'/>
<id>urn:sha1:6455b6163d8c680366663cdb8c679514d55fc30c</id>
<content type='text'>
When user reads file 'trace_pipe', kernel keeps printing following logs
that warn at "cpu_buffer-&gt;reader_page-&gt;read &gt; rb_page_size(reader)" in
rb_get_reader_page(). It just looks like there's an infinite loop in
tracing_read_pipe(). This problem occurs several times on arm64 platform
when testing v5.10 and below.

  Call trace:
   rb_get_reader_page+0x248/0x1300
   rb_buffer_peek+0x34/0x160
   ring_buffer_peek+0xbc/0x224
   peek_next_entry+0x98/0xbc
   __find_next_entry+0xc4/0x1c0
   trace_find_next_entry_inc+0x30/0x94
   tracing_read_pipe+0x198/0x304
   vfs_read+0xb4/0x1e0
   ksys_read+0x74/0x100
   __arm64_sys_read+0x24/0x30
   el0_svc_common.constprop.0+0x7c/0x1bc
   do_el0_svc+0x2c/0x94
   el0_svc+0x20/0x30
   el0_sync_handler+0xb0/0xb4
   el0_sync+0x160/0x180

Then I dump the vmcore and look into the problematic per_cpu ring_buffer,
I found that tail_page/commit_page/reader_page are on the same page while
reader_page-&gt;read is obviously abnormal:
  tail_page == commit_page == reader_page == {
    .write = 0x100d20,
    .read = 0x8f9f4805,  // Far greater than 0xd20, obviously abnormal!!!
    .entries = 0x10004c,
    .real_end = 0x0,
    .page = {
      .time_stamp = 0x857257416af0,
      .commit = 0xd20,  // This page hasn't been full filled.
      // .data[0...0xd20] seems normal.
    }
 }

The root cause is most likely the race that reader and writer are on the
same page while reader saw an event that not fully committed by writer.

To fix this, add memory barriers to make sure the reader can see the
content of what is committed. Since commit a0fcaaed0c46 ("ring-buffer: Fix
race between reset page and reading page") has added the read barrier in
rb_get_reader_page(), here we just need to add the write barrier.

Link: https://lore.kernel.org/linux-trace-kernel/20230325021247.2923907-1-zhengyejian1@huawei.com

Cc: stable@vger.kernel.org
Fixes: 77ae365eca89 ("ring-buffer: make lockless")
Suggested-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
Signed-off-by: Zheng Yejian &lt;zhengyejian1@huawei.com&gt;
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'trace-v6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace</title>
<updated>2023-03-19T17:46:02Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-03-19T17:46:02Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=eaba52d63bfcf0047ce3a1bb011b35d4f066df8e'/>
<id>urn:sha1:eaba52d63bfcf0047ce3a1bb011b35d4f066df8e</id>
<content type='text'>
Pull tracing fixes from Steven Rostedt:

 - Fix setting affinity of hwlat threads in containers

   Using sched_set_affinity() has unwanted side effects when being
   called within a container. Use set_cpus_allowed_ptr() instead

 - Fix per cpu thread management of the hwlat tracer:
    - Do not start per_cpu threads if one is already running for the CPU
    - When starting per_cpu threads, do not clear the kthread variable
      as it may already be set to running per cpu threads

 - Fix return value for test_gen_kprobe_cmd()

   On error the return value was overwritten by being set to the result
   of the call from kprobe_event_delete(), which would likely succeed,
   and thus have the function return success

 - Fix splice() reads from the trace file that was broken by commit
   36e2c7421f02 ("fs: don't allow splice read/write without explicit
   ops")

 - Remove obsolete and confusing comment in ring_buffer.c

   The original design of the ring buffer used struct page flags for
   tricks to optimize, which was shortly removed due to them being
   tricks. But a comment for those tricks remained

 - Set local functions and variables to static

* tag 'trace-v6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing/hwlat: Replace sched_setaffinity with set_cpus_allowed_ptr
  ring-buffer: remove obsolete comment for free_buffer_page()
  tracing: Make splice_read available again
  ftrace: Set direct_ops storage-class-specifier to static
  trace/hwlat: Do not start per-cpu thread if it is already running
  trace/hwlat: Do not wipe the contents of per-cpu thread data
  tracing/osnoise: set several trace_osnoise.c variables storage-class-specifier to static
  tracing: Fix wrong return in kprobe_event_gen_test.c
</content>
</entry>
<entry>
<title>ring-buffer: remove obsolete comment for free_buffer_page()</title>
<updated>2023-03-19T17:23:22Z</updated>
<author>
<name>Vlastimil Babka</name>
<email>vbabka@suse.cz</email>
</author>
<published>2023-03-15T14:24:46Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a98151ad53b53f010ee364ec2fd06445b328578b'/>
<id>urn:sha1:a98151ad53b53f010ee364ec2fd06445b328578b</id>
<content type='text'>
The comment refers to mm/slob.c which is being removed. It comes from
commit ed56829cb319 ("ring_buffer: reset buffer page when freeing") and
according to Steven the borrowed code was a page mapcount and mapping
reset, which was later removed by commit e4c2ce82ca27 ("ring_buffer:
allocate buffer page pointer"). Thus the comment is not accurate anyway,
remove it.

Link: https://lore.kernel.org/linux-trace-kernel/20230315142446.27040-1-vbabka@suse.cz

Cc: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Reported-by: Mike Rapoport &lt;mike.rapoport@gmail.com&gt;
Suggested-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
Fixes: e4c2ce82ca27 ("ring_buffer: allocate buffer page pointer")
Signed-off-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Reviewed-by: Mukesh Ojha &lt;quic_mojha@quicinc.com&gt;
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'trace-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace</title>
<updated>2023-02-23T18:20:49Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-02-23T18:20:49Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b72b5fecc1b8a2e595bd03d7d257c88ea3f9fd45'/>
<id>urn:sha1:b72b5fecc1b8a2e595bd03d7d257c88ea3f9fd45</id>
<content type='text'>
Pull tracing updates from Steven Rostedt:

 - Add function names as a way to filter function addresses

 - Add sample module to test ftrace ops and dynamic trampolines

 - Allow stack traces to be passed from beginning event to end event for
   synthetic events. This will allow seeing the stack trace of when a
   task is scheduled out and recorded when it gets scheduled back in.

 - Add trace event helper __get_buf() to use as a temporary buffer when
   printing out trace event output.

 - Add kernel command line to create trace instances on boot up.

 - Add enabling of events to instances created at boot up.

 - Add trace_array_puts() to write into instances.

 - Allow boot instances to take a snapshot at the end of boot up.

 - Allow live patch modules to include trace events

 - Minor fixes and clean ups

* tag 'trace-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (31 commits)
  tracing: Remove unnecessary NULL assignment
  tracepoint: Allow livepatch module add trace event
  tracing: Always use canonical ftrace path
  tracing/histogram: Fix stacktrace histogram Documententation
  tracing/histogram: Fix stacktrace key
  tracing/histogram: Fix a few problems with stacktrace variable printing
  tracing: Add BUILD_BUG() to make sure stacktrace fits in strings
  tracing/histogram: Don't use strlen to find length of stacktrace variables
  tracing: Allow boot instances to have snapshot buffers
  tracing: Add trace_array_puts() to write into instance
  tracing: Add enabling of events to boot instances
  tracing: Add creation of instances at boot command line
  tracing: Fix trace_event_raw_event_synth() if else statement
  samples: ftrace: Make some global variables static
  ftrace: sample: avoid open-coded 64-bit division
  samples: ftrace: Include the nospec-branch.h only for x86
  tracing: Acquire buffer from temparary trace sequence
  tracing/histogram: Wrap remaining shell snippets in code blocks
  tracing/osnoise: No need for schedule_hrtimeout range
  bpf/tracing: Use stage6 of tracing to not duplicate macros
  ...
</content>
</entry>
<entry>
<title>tracing: Always use canonical ftrace path</title>
<updated>2023-02-18T19:34:09Z</updated>
<author>
<name>Ross Zwisler</name>
<email>zwisler@chromium.org</email>
</author>
<published>2023-02-15T22:33:45Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2455f0e124d317dd08d337a7550a78a224d4ba41'/>
<id>urn:sha1:2455f0e124d317dd08d337a7550a78a224d4ba41</id>
<content type='text'>
The canonical location for the tracefs filesystem is at /sys/kernel/tracing.

But, from Documentation/trace/ftrace.rst:

  Before 4.1, all ftrace tracing control files were within the debugfs
  file system, which is typically located at /sys/kernel/debug/tracing.
  For backward compatibility, when mounting the debugfs file system,
  the tracefs file system will be automatically mounted at:

  /sys/kernel/debug/tracing

Many comments and Kconfig help messages in the tracing code still refer
to this older debugfs path, so let's update them to avoid confusion.

Link: https://lore.kernel.org/linux-trace-kernel/20230215223350.2658616-2-zwisler@google.com

Acked-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Reviewed-by: Mukesh Ojha &lt;quic_mojha@quicinc.com&gt;
Signed-off-by: Ross Zwisler &lt;zwisler@google.com&gt;
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
</content>
</entry>
<entry>
<title>ring-buffer: Handle race between rb_move_tail and rb_check_pages</title>
<updated>2023-02-15T16:59:16Z</updated>
<author>
<name>Mukesh Ojha</name>
<email>quic_mojha@quicinc.com</email>
</author>
<published>2023-02-14T12:06:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8843e06f67b14f71c044bf6267b2387784c7e198'/>
<id>urn:sha1:8843e06f67b14f71c044bf6267b2387784c7e198</id>
<content type='text'>
It seems a data race between ring_buffer writing and integrity check.
That is, RB_FLAG of head_page is been updating, while at same time
RB_FLAG was cleared when doing integrity check rb_check_pages():

  rb_check_pages()            rb_handle_head_page():
  --------                    --------
  rb_head_page_deactivate()
                              rb_head_page_set_normal()
  rb_head_page_activate()

We do intergrity test of the list to check if the list is corrupted and
it is still worth doing it. So, let's refactor rb_check_pages() such that
we no longer clear and set flag during the list sanity checking.

[1] and [2] are the test to reproduce and the crash report respectively.

1:
``` read_trace.sh
  while true;
  do
    # the "trace" file is closed after read
    head -1 /sys/kernel/tracing/trace &gt; /dev/null
  done
```
``` repro.sh
  sysctl -w kernel.panic_on_warn=1
  # function tracer will writing enough data into ring_buffer
  echo function &gt; /sys/kernel/tracing/current_tracer
  ./read_trace.sh &amp;
  ./read_trace.sh &amp;
  ./read_trace.sh &amp;
  ./read_trace.sh &amp;
  ./read_trace.sh &amp;
  ./read_trace.sh &amp;
  ./read_trace.sh &amp;
  ./read_trace.sh &amp;
```

2:
------------[ cut here ]------------
WARNING: CPU: 9 PID: 62 at kernel/trace/ring_buffer.c:2653
rb_move_tail+0x450/0x470
Modules linked in:
CPU: 9 PID: 62 Comm: ksoftirqd/9 Tainted: G        W          6.2.0-rc6+
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
RIP: 0010:rb_move_tail+0x450/0x470
Code: ff ff 4c 89 c8 f0 4d 0f b1 02 48 89 c2 48 83 e2 fc 49 39 d0 75 24
83 e0 03 83 f8 02 0f 84 e1 fb ff ff 48 8b 57 10 f0 ff 42 08 &lt;0f&gt; 0b 83
f8 02 0f 84 ce fb ff ff e9 db
RSP: 0018:ffffb5564089bd00 EFLAGS: 00000203
RAX: 0000000000000000 RBX: ffff9db385a2bf81 RCX: ffffb5564089bd18
RDX: ffff9db281110100 RSI: 0000000000000fe4 RDI: ffff9db380145400
RBP: ffff9db385a2bf80 R08: ffff9db385a2bfc0 R09: ffff9db385a2bfc2
R10: ffff9db385a6c000 R11: ffff9db385a2bf80 R12: 0000000000000000
R13: 00000000000003e8 R14: ffff9db281110100 R15: ffffffffbb006108
FS:  0000000000000000(0000) GS:ffff9db3bdcc0000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005602323024c8 CR3: 0000000022e0c000 CR4: 00000000000006e0
Call Trace:
 &lt;TASK&gt;
 ring_buffer_lock_reserve+0x136/0x360
 ? __do_softirq+0x287/0x2df
 ? __pfx_rcu_softirq_qs+0x10/0x10
 trace_function+0x21/0x110
 ? __pfx_rcu_softirq_qs+0x10/0x10
 ? __do_softirq+0x287/0x2df
 function_trace_call+0xf6/0x120
 0xffffffffc038f097
 ? rcu_softirq_qs+0x5/0x140
 rcu_softirq_qs+0x5/0x140
 __do_softirq+0x287/0x2df
 run_ksoftirqd+0x2a/0x30
 smpboot_thread_fn+0x188/0x220
 ? __pfx_smpboot_thread_fn+0x10/0x10
 kthread+0xe7/0x110
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x2c/0x50
 &lt;/TASK&gt;
---[ end trace 0000000000000000 ]---

[ crash report and test reproducer credit goes to Zheng Yejian]

Link: https://lore.kernel.org/linux-trace-kernel/1676376403-16462-1-git-send-email-quic_mojha@quicinc.com

Cc: &lt;mhiramat@kernel.org&gt;
Cc: stable@vger.kernel.org
Fixes: 1039221cc278 ("ring-buffer: Do not disable recording when there is an iterator")
Reported-by: Zheng Yejian &lt;zhengyejian1@huawei.com&gt;
Signed-off-by: Mukesh Ojha &lt;quic_mojha@quicinc.com&gt;
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
</content>
</entry>
<entry>
<title>tracing: Add NULL checks for buffer in ring_buffer_free_read_page()</title>
<updated>2023-01-25T15:31:24Z</updated>
<author>
<name>Jia-Ju Bai</name>
<email>baijiaju1990@gmail.com</email>
</author>
<published>2023-01-13T12:55:01Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3e4272b9954094907f16861199728f14002fcaf6'/>
<id>urn:sha1:3e4272b9954094907f16861199728f14002fcaf6</id>
<content type='text'>
In a previous commit 7433632c9ff6, buffer, buffer-&gt;buffers and
buffer-&gt;buffers[cpu] in ring_buffer_wake_waiters() can be NULL,
and thus the related checks are added.

However, in the same call stack, these variables are also used in
ring_buffer_free_read_page():

tracing_buffers_release()
  ring_buffer_wake_waiters(iter-&gt;array_buffer-&gt;buffer)
    cpu_buffer = buffer-&gt;buffers[cpu] -&gt; Add checks by previous commit
  ring_buffer_free_read_page(iter-&gt;array_buffer-&gt;buffer)
    cpu_buffer = buffer-&gt;buffers[cpu] -&gt; No check

Thus, to avod possible null-pointer derefernces, the related checks
should be added.

These results are reported by a static tool designed by myself.

Link: https://lkml.kernel.org/r/20230113125501.760324-1-baijiaju1990@gmail.com

Reported-by: TOTE Robot &lt;oslab@tsinghua.edu.cn&gt;
Signed-off-by: Jia-Ju Bai &lt;baijiaju1990@gmail.com&gt;
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
</content>
</entry>
<entry>
<title>ring-buffer: Handle resize in early boot up</title>
<updated>2022-12-10T18:36:04Z</updated>
<author>
<name>Steven Rostedt</name>
<email>rostedt@goodmis.org</email>
</author>
<published>2022-12-09T15:11:51Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=88ca6a71dcab4a4ba6e6e2ff66415a5c4f86e874'/>
<id>urn:sha1:88ca6a71dcab4a4ba6e6e2ff66415a5c4f86e874</id>
<content type='text'>
With the new command line option that allows trace event triggers to be
added at boot, the "snapshot" trigger will allocate the snapshot buffer
very early, when interrupts can not be enabled. Allocating the ring buffer
is not the problem, but it also resizes it, which is, as the resize code
does synchronization that can not be preformed at early boot.

To handle this, first change the raw_spin_lock_irq() in rb_insert_pages()
to raw_spin_lock_irqsave(), such that the unlocking of that spin lock will
not enable interrupts.

Next, where it calls schedule_work_on(), disable migration and check if
the CPU to update is the current CPU, and if so, perform the work
directly, otherwise re-enable migration and call the schedule_work_on() to
the CPU that is being updated. The rb_insert_pages() just needs to be run
on the CPU that it is updating, and does not need preemption nor
interrupts disabled when calling it.

Link: https://lore.kernel.org/lkml/Y5J%2FCajlNh1gexvo@google.com/
Link: https://lore.kernel.org/linux-trace-kernel/20221209101151.1fec1167@gandalf.local.home
Cc: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;

Fixes: a01fdc897fa5 ("tracing: Add trace_trigger kernel command line option")
Reported-by: Ross Zwisler &lt;zwisler@google.com&gt;
Signed-off-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Tested-by: Ross Zwisler &lt;zwisler@google.com&gt;
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
</content>
</entry>
<entry>
<title>ring_buffer: Remove unused "event" parameter</title>
<updated>2022-11-24T00:08:30Z</updated>
<author>
<name>Song Chen</name>
<email>chensong_2000@189.cn</email>
</author>
<published>2022-10-20T14:06:51Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=04aabc32fb677f91d676fd306bca1043805e78d5'/>
<id>urn:sha1:04aabc32fb677f91d676fd306bca1043805e78d5</id>
<content type='text'>
After commit a389d86f7fd0 ("ring-buffer: Have nested events still record
running time stamp"), the "event" parameter is no longer used in either
ring_buffer_unlock_commit() or rb_commit(). Best to remove it.

Link: https://lkml.kernel.org/r/1666274811-24138-1-git-send-email-chensong_2000@189.cn

Signed-off-by: Song Chen &lt;chensong_2000@189.cn&gt;
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
</content>
</entry>
<entry>
<title>ring_buffer: Do not deactivate non-existant pages</title>
<updated>2022-11-17T22:10:40Z</updated>
<author>
<name>Daniil Tatianin</name>
<email>d-tatianin@yandex-team.ru</email>
</author>
<published>2022-11-14T14:31:29Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=56f4ca0a79a9f1af98f26c54b9b89ba1f9bcc6bd'/>
<id>urn:sha1:56f4ca0a79a9f1af98f26c54b9b89ba1f9bcc6bd</id>
<content type='text'>
rb_head_page_deactivate() expects cpu_buffer to contain a valid list of
-&gt;pages, so verify that the list is actually present before calling it.

Found by Linux Verification Center (linuxtesting.org) with the SVACE
static analysis tool.

Link: https://lkml.kernel.org/r/20221114143129.3534443-1-d-tatianin@yandex-team.ru

Cc: stable@vger.kernel.org
Fixes: 77ae365eca895 ("ring-buffer: make lockless")
Signed-off-by: Daniil Tatianin &lt;d-tatianin@yandex-team.ru&gt;
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
</content>
</entry>
</feed>
