<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/trace/trace.c, branch v6.8</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.8</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.8'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2024-03-10T16:27:47Z</updated>
<entry>
<title>tracing: Use .flush() call to wake up readers</title>
<updated>2024-03-10T16:27:47Z</updated>
<author>
<name>Steven Rostedt (Google)</name>
<email>rostedt@goodmis.org</email>
</author>
<published>2024-03-08T20:24:05Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e5d7c1916562f0e856eb3d6f569629fcd535fed2'/>
<id>urn:sha1:e5d7c1916562f0e856eb3d6f569629fcd535fed2</id>
<content type='text'>
The .release() function does not get called until all readers of a file
descriptor are finished.

If a thread is blocked on reading a file descriptor in ring_buffer_wait(),
and another thread closes the file descriptor, it will not wake up the
other thread as ring_buffer_wake_waiters() is called by .release(), and
that will not get called until the .read() is finished.

The issue originally showed up in trace-cmd, but the readers are actually
other processes with their own file descriptors. So calling close() would wake
up the other tasks because they are blocked on another descriptor then the
one that was closed(). But there's other wake ups that solve that issue.

When a thread is blocked on a read, it can still hang even when another
thread closed its descriptor.

This is what the .flush() callback is for. Have the .flush() wake up the
readers.

Link: https://lore.kernel.org/linux-trace-kernel/20240308202432.107909457@goodmis.org

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: linke li &lt;lilinke99@qq.com&gt;
Cc: Rabin Vincent &lt;rabin@rab.in&gt;
Fixes: f3ddb74ad0790 ("tracing: Wake up ring buffer waiters on closing of the file")
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
</content>
</entry>
<entry>
<title>tracing: Limit trace_marker writes to just 4K</title>
<updated>2024-03-06T18:27:26Z</updated>
<author>
<name>Steven Rostedt (Google)</name>
<email>rostedt@goodmis.org</email>
</author>
<published>2024-03-05T03:34:33Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=095fe4891282be510af0db1b03587b512c0de31d'/>
<id>urn:sha1:095fe4891282be510af0db1b03587b512c0de31d</id>
<content type='text'>
Limit the max print event of trace_marker to just 4K string size. This must
also be less than the amount that can be held by a trace_seq along with
the text that is before the output (like the task name, PID, CPU, state,
etc). As trace_seq is made to handle large events (some greater than 4K).
Make the max size of a trace_marker write event be 4K which is guaranteed
to fit in the trace_seq buffer.

Link: https://lore.kernel.org/linux-trace-kernel/20240304223433.4ba47dff@gandalf.local.home

Suggested-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Reviewed-by: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Reviewed-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
</content>
</entry>
<entry>
<title>tracing: Inform kmemleak of saved_cmdlines allocation</title>
<updated>2024-02-14T17:36:34Z</updated>
<author>
<name>Steven Rostedt (Google)</name>
<email>rostedt@goodmis.org</email>
</author>
<published>2024-02-14T16:20:46Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2394ac4145ea91b92271e675a09af2a9ea6840b7'/>
<id>urn:sha1:2394ac4145ea91b92271e675a09af2a9ea6840b7</id>
<content type='text'>
The allocation of the struct saved_cmdlines_buffer structure changed from:

        s = kmalloc(sizeof(*s), GFP_KERNEL);
	s-&gt;saved_cmdlines = kmalloc_array(TASK_COMM_LEN, val, GFP_KERNEL);

to:

	orig_size = sizeof(*s) + val * TASK_COMM_LEN;
	order = get_order(orig_size);
	size = 1 &lt;&lt; (order + PAGE_SHIFT);
	page = alloc_pages(GFP_KERNEL, order);
	if (!page)
		return NULL;

	s = page_address(page);
	memset(s, 0, sizeof(*s));

	s-&gt;saved_cmdlines = kmalloc_array(TASK_COMM_LEN, val, GFP_KERNEL);

Where that s-&gt;saved_cmdlines allocation looks to be a dangling allocation
to kmemleak. That's because kmemleak only keeps track of kmalloc()
allocations. For allocations that use page_alloc() directly, the kmemleak
needs to be explicitly informed about it.

Add kmemleak_alloc() and kmemleak_free() around the page allocation so
that it doesn't give the following false positive:

unreferenced object 0xffff8881010c8000 (size 32760):
  comm "swapper", pid 0, jiffies 4294667296
  hex dump (first 32 bytes):
    ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
    ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
  backtrace (crc ae6ec1b9):
    [&lt;ffffffff86722405&gt;] kmemleak_alloc+0x45/0x80
    [&lt;ffffffff8414028d&gt;] __kmalloc_large_node+0x10d/0x190
    [&lt;ffffffff84146ab1&gt;] __kmalloc+0x3b1/0x4c0
    [&lt;ffffffff83ed7103&gt;] allocate_cmdlines_buffer+0x113/0x230
    [&lt;ffffffff88649c34&gt;] tracer_alloc_buffers.isra.0+0x124/0x460
    [&lt;ffffffff8864a174&gt;] early_trace_init+0x14/0xa0
    [&lt;ffffffff885dd5ae&gt;] start_kernel+0x12e/0x3c0
    [&lt;ffffffff885f5758&gt;] x86_64_start_reservations+0x18/0x30
    [&lt;ffffffff885f582b&gt;] x86_64_start_kernel+0x7b/0x80
    [&lt;ffffffff83a001c3&gt;] secondary_startup_64_no_verify+0x15e/0x16b

Link: https://lore.kernel.org/linux-trace-kernel/87r0hfnr9r.fsf@kernel.org/
Link: https://lore.kernel.org/linux-trace-kernel/20240214112046.09a322d6@gandalf.local.home

Cc: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Cc: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Fixes: 44dc5c41b5b1 ("tracing: Fix wasted memory in saved_cmdlines logic")
Reported-by: Kalle Valo &lt;kvalo@kernel.org&gt;
Tested-by: Kalle Valo &lt;kvalo@kernel.org&gt;
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
</content>
</entry>
<entry>
<title>tracing: Use ring_buffer_record_is_set_on() in tracer_tracing_is_on()</title>
<updated>2024-02-13T17:04:17Z</updated>
<author>
<name>Sven Schnelle</name>
<email>svens@linux.ibm.com</email>
</author>
<published>2024-02-05T06:53:40Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a6eaa24f1cc2c7aecec6047556bdfe32042094c3'/>
<id>urn:sha1:a6eaa24f1cc2c7aecec6047556bdfe32042094c3</id>
<content type='text'>
tracer_tracing_is_on() checks whether record_disabled is not zero. This
checks both the record_disabled counter and the RB_BUFFER_OFF flag.
Reading the source it looks like this function should only check for
the RB_BUFFER_OFF flag. Therefore use ring_buffer_record_is_set_on().
This fixes spurious fails in the 'test for function traceon/off triggers'
test from the ftrace testsuite when the system is under load.

Link: https://lore.kernel.org/linux-trace-kernel/20240205065340.2848065-1-svens@linux.ibm.com

Cc: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Cc: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Tested-By: Mete Durlu &lt;meted@linux.ibm.com&gt;
Signed-off-by: Sven Schnelle &lt;svens@linux.ibm.com&gt;
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
</content>
</entry>
<entry>
<title>tracing: Fix wasted memory in saved_cmdlines logic</title>
<updated>2024-02-09T11:43:21Z</updated>
<author>
<name>Steven Rostedt (Google)</name>
<email>rostedt@goodmis.org</email>
</author>
<published>2024-02-09T11:36:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=44dc5c41b5b1267d4dd037d26afc0c4d3a568acb'/>
<id>urn:sha1:44dc5c41b5b1267d4dd037d26afc0c4d3a568acb</id>
<content type='text'>
While looking at improving the saved_cmdlines cache I found a huge amount
of wasted memory that should be used for the cmdlines.

The tracing data saves pids during the trace. At sched switch, if a trace
occurred, it will save the comm of the task that did the trace. This is
saved in a "cache" that maps pids to comms and exposed to user space via
the /sys/kernel/tracing/saved_cmdlines file. Currently it only caches by
default 128 comms.

The structure that uses this creates an array to store the pids using
PID_MAX_DEFAULT (which is usually set to 32768). This causes the structure
to be of the size of 131104 bytes on 64 bit machines.

In hex: 131104 = 0x20020, and since the kernel allocates generic memory in
powers of two, the kernel would allocate 0x40000 or 262144 bytes to store
this structure. That leaves 131040 bytes of wasted space.

Worse, the structure points to an allocated array to store the comm names,
which is 16 bytes times the amount of names to save (currently 128), which
is 2048 bytes. Instead of allocating a separate array, make the structure
end with a variable length string and use the extra space for that.

This is similar to a recommendation that Linus had made about eventfs_inode names:

  https://lore.kernel.org/all/20240130190355.11486-5-torvalds@linux-foundation.org/

Instead of allocating a separate string array to hold the saved comms,
have the structure end with: char saved_cmdlines[]; and round up to the
next power of two over sizeof(struct saved_cmdline_buffers) + num_cmdlines * TASK_COMM_LEN
It will use this extra space for the saved_cmdline portion.

Now, instead of saving only 128 comms by default, by using this wasted
space at the end of the structure it can save over 8000 comms and even
saves space by removing the need for allocating the other array.

Link: https://lore.kernel.org/linux-trace-kernel/20240209063622.1f7b6d5f@rorschach.local.home

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Cc: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Cc: Vincent Donnefort &lt;vdonnefort@google.com&gt;
Cc: Sven Schnelle &lt;svens@linux.ibm.com&gt;
Cc: Mete Durlu &lt;meted@linux.ibm.com&gt;
Fixes: 939c7a4f04fcd ("tracing: Introduce saved_cmdlines_size file")
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'trace-v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace</title>
<updated>2024-01-18T22:35:29Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2024-01-18T22:35:29Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a2ded784cd7fd83e567829637068cd86aeffbeaf'/>
<id>urn:sha1:a2ded784cd7fd83e567829637068cd86aeffbeaf</id>
<content type='text'>
Pull tracing updates from Steven Rostedt:

 - Allow kernel trace instance creation to specify what events are
   created

   Inside the kernel, a subsystem may create a tracing instance that it
   can use to send events to user space. This sub-system may not care
   about the thousands of events that exist in eventfs. Allow the
   sub-system to specify what sub-systems of events it cares about, and
   only those events are exposed to this instance.

 - Allow the ring buffer to be broken up into bigger sub-buffers than
   just the architecture page size.

   A new tracefs file called "buffer_subbuf_size_kb" is created. The
   user can now specify a minimum size the sub-buffer may be in
   kilobytes. Note, that the implementation currently make the
   sub-buffer size a power of 2 pages (1, 2, 4, 8, 16, ...) but the user
   only writes in kilobyte size, and the sub-buffer will be updated to
   the next size that it will can accommodate it. If the user writes in
   10, it will change the size to be 4 pages on x86 (16K), as that is
   the next available size that can hold 10K pages.

 - Update the debug output when a corrupt time is detected in the ring
   buffer. If the ring buffer detects inconsistent timestamps, there's a
   debug config options that will dump the contents of the meta data of
   the sub-buffer that is used for debugging. Add some more information
   to this dump that helps with debugging.

 - Add more timestamp debugging checks (only triggers when the config is
   enabled)

 - Increase the trace_seq iterator to 2 page sizes.

 - Allow strings written into tracefs_marker to be larger. Up to just
   under 2 page sizes (based on what trace_seq can hold).

 - Increase the trace_maker_raw write to be as big as a sub-buffer can
   hold.

 - Remove 32 bit time stamp logic, now that the rb_time_cmpxchg() has
   been removed.

 - More selftests were added.

 - Some code clean ups as well.

* tag 'trace-v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (29 commits)
  ring-buffer: Remove stale comment from ring_buffer_size()
  tracing histograms: Simplify parse_actions() function
  tracing/selftests: Remove exec permissions from trace_marker.tc test
  ring-buffer: Use subbuf_order for buffer page masking
  tracing: Update subbuffer with kilobytes not page order
  ringbuffer/selftest: Add basic selftest to test changing subbuf order
  ring-buffer: Add documentation on the buffer_subbuf_order file
  ring-buffer: Just update the subbuffers when changing their allocation order
  ring-buffer: Keep the same size when updating the order
  tracing: Stop the tracing while changing the ring buffer subbuf size
  tracing: Update snapshot order along with main buffer order
  ring-buffer: Make sure the spare sub buffer used for reads has same size
  ring-buffer: Do no swap cpu buffers if order is different
  ring-buffer: Clear pages on error in ring_buffer_subbuf_order_set() failure
  ring-buffer: Read and write to ring buffers with custom sub buffer size
  ring-buffer: Set new size of the ring buffer sub page
  ring-buffer: Add interface for configuring trace sub buffer size
  ring-buffer: Page size per ring buffer
  ring-buffer: Have ring_buffer_print_page_header() be able to access ring_buffer_iter
  ring-buffer: Check if absolute timestamp goes backwards
  ...
</content>
</entry>
<entry>
<title>tracing: Fix blocked reader of snapshot buffer</title>
<updated>2023-12-29T14:18:49Z</updated>
<author>
<name>Steven Rostedt (Google)</name>
<email>rostedt@goodmis.org</email>
</author>
<published>2023-12-28T14:51:49Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=39a7dc23a1ed0fe81141792a09449d124c5953bd'/>
<id>urn:sha1:39a7dc23a1ed0fe81141792a09449d124c5953bd</id>
<content type='text'>
If an application blocks on the snapshot or snapshot_raw files, expecting
to be woken up when a snapshot occurs, it will not happen. Or it may
happen with an unexpected result.

That result is that the application will be reading the main buffer
instead of the snapshot buffer. That is because when the snapshot occurs,
the main and snapshot buffers are swapped. But the reader has a descriptor
still pointing to the buffer that it originally connected to.

This is fine for the main buffer readers, as they may be blocked waiting
for a watermark to be hit, and when a snapshot occurs, the data that the
main readers want is now on the snapshot buffer.

But for waiters of the snapshot buffer, they are waiting for an event to
occur that will trigger the snapshot and they can then consume it quickly
to save the snapshot before the next snapshot occurs. But to do this, they
need to read the new snapshot buffer, not the old one that is now
receiving new data.

Also, it does not make sense to have a watermark "buffer_percent" on the
snapshot buffer, as the snapshot buffer is static and does not receive new
data except all at once.

Link: https://lore.kernel.org/linux-trace-kernel/20231228095149.77f5b45d@gandalf.local.home

Cc: stable@vger.kernel.org
Cc: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Acked-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Fixes: debdd57f5145f ("tracing: Make a snapshot feature available from userspace")
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
</content>
</entry>
<entry>
<title>tracing: Update subbuffer with kilobytes not page order</title>
<updated>2023-12-21T16:04:15Z</updated>
<author>
<name>Steven Rostedt (Google)</name>
<email>rostedt@goodmis.org</email>
</author>
<published>2023-12-19T18:54:29Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2f84b39f48476615186af5b3220ad5a2c756679b'/>
<id>urn:sha1:2f84b39f48476615186af5b3220ad5a2c756679b</id>
<content type='text'>
Using page order for deciding what the size of the ring buffer sub buffers
are is exposing a bit too much of the implementation. Although the sub
buffers are only allocated in orders of pages, allow the user to specify
the minimum size of each sub-buffer via kilobytes like they can with the
buffer size itself.

If the user specifies 3 via:

  echo 3 &gt; buffer_subbuf_size_kb

Then the sub-buffer size will round up to 4kb (on a 4kb page size system).

If they specify:

  echo 6 &gt; buffer_subbuf_size_kb

The sub-buffer size will become 8kb.

and so on.

Link: https://lore.kernel.org/linux-trace-kernel/20231219185631.809766769@goodmis.org

Cc: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Tzvetomir Stoyanov &lt;tz.stoyanov@gmail.com&gt;
Cc: Vincent Donnefort &lt;vdonnefort@google.com&gt;
Cc: Kent Overstreet &lt;kent.overstreet@gmail.com&gt;
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
</content>
</entry>
<entry>
<title>tracing: Stop the tracing while changing the ring buffer subbuf size</title>
<updated>2023-12-21T16:00:56Z</updated>
<author>
<name>Steven Rostedt (Google)</name>
<email>rostedt@goodmis.org</email>
</author>
<published>2023-12-19T18:54:24Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=fa4b54af5ba17a59ce082e9d710eb638bdce7637'/>
<id>urn:sha1:fa4b54af5ba17a59ce082e9d710eb638bdce7637</id>
<content type='text'>
Because the main buffer and the snapshot buffer need to be the same for
some tracers, otherwise it will fail and disable all tracing, the tracers
need to be stopped while updating the sub buffer sizes so that the tracers
see the main and snapshot buffers with the same sub buffer size.

Link: https://lore.kernel.org/linux-trace-kernel/20231219185630.353222794@goodmis.org

Cc: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Tzvetomir Stoyanov &lt;tz.stoyanov@gmail.com&gt;
Cc: Vincent Donnefort &lt;vdonnefort@google.com&gt;
Cc: Kent Overstreet &lt;kent.overstreet@gmail.com&gt;
Fixes: 2808e31ec12e ("ring-buffer: Add interface for configuring trace sub buffer size")
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
</content>
</entry>
<entry>
<title>tracing: Update snapshot order along with main buffer order</title>
<updated>2023-12-21T15:55:57Z</updated>
<author>
<name>Steven Rostedt (Google)</name>
<email>rostedt@goodmis.org</email>
</author>
<published>2023-12-19T18:54:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=aa067682adf19ca78f81cd57539282dbfd7cce21'/>
<id>urn:sha1:aa067682adf19ca78f81cd57539282dbfd7cce21</id>
<content type='text'>
When updating the order of the sub buffers for the main buffer, make sure
that if the snapshot buffer exists, that it gets its order updated as
well.

Link: https://lore.kernel.org/linux-trace-kernel/20231219185630.054668186@goodmis.org

Cc: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Tzvetomir Stoyanov &lt;tz.stoyanov@gmail.com&gt;
Cc: Vincent Donnefort &lt;vdonnefort@google.com&gt;
Cc: Kent Overstreet &lt;kent.overstreet@gmail.com&gt;
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
</content>
</entry>
</feed>
