<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/events/ring_buffer.c, branch v6.14</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.14</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.14'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2025-01-10T17:16:50Z</updated>
<entry>
<title>perf: map pages in advance</title>
<updated>2025-01-10T17:16:50Z</updated>
<author>
<name>Lorenzo Stoakes</name>
<email>lorenzo.stoakes@oracle.com</email>
</author>
<published>2025-01-03T15:31:51Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b709eb872e19a19607bbb6d2975bc264d59735cf'/>
<id>urn:sha1:b709eb872e19a19607bbb6d2975bc264d59735cf</id>
<content type='text'>
We are adjusting struct page to make it smaller, removing unneeded fields
which correctly belong to struct folio.

Two of those fields are page-&gt;index and page-&gt;mapping. Perf is currently
making use of both of these. This is unnecessary. This patch eliminates
this.

Perf establishes its own internally controlled memory-mapped pages using
vm_ops hooks. The first page in the mapping is the read/write user control
page, and the rest of the mapping consists of read-only pages.

The VMA is backed by kernel memory either from the buddy allocator or
vmalloc depending on configuration. It is intended to be mapped read/write,
but because it has a page_mkwrite() hook, vma_wants_writenotify() indicates
that it should be mapped read-only.

When a write fault occurs, the provided page_mkwrite() hook,
perf_mmap_fault() (doing double duty handing faults as well) uses the
vmf-&gt;pgoff field to determine if this is the first page, allowing for the
desired read/write first page, read-only rest mapping.

For this to work the implementation has to carefully work around faulting
logic. When a page is write-faulted, the fault() hook is called first, then
its page_mkwrite() hook is called (to allow for dirty tracking in file
systems).

On fault we set the folio's mapping in perf_mmap_fault(), this is because
when do_page_mkwrite() is subsequently invoked, it treats a missing mapping
as an indicator that the fault should be retried.

We also set the folio's index so, given the folio is being treated as faux
user memory, it correctly references its offset within the VMA.

This explains why the mapping and index fields are used - but it's not
necessary.

We preallocate pages when perf_mmap() is called for the first time via
rb_alloc(), and further allocate auxiliary pages via rb_aux_alloc() as
needed if the mapping requires it.

This allocation is done in the f_ops-&gt;mmap() hook provided in perf_mmap(),
and so we can instead simply map all the memory right away here - there's
no point in handling (read) page faults when we don't demand page nor need
to be notified about them (perf does not).

This patch therefore changes this logic to map everything when the mmap()
hook is called, establishing a PFN map. It implements vm_ops-&gt;pfn_mkwrite()
to provide the required read/write vs. read-only behaviour, which does not
require the previously implemented workarounds.

While it is not ideal to use a VM_PFNMAP here, doing anything else will
result in the page_mkwrite() hook need to be provided, which requires the
same page-&gt;mapping hack this patch seeks to undo.

It will also result in the pages being treated as folios and placed on the
rmap, which really does not make sense for these mappings.

Semantically it makes sense to establish this as some kind of special
mapping, as the pages are managed by perf and are not strictly user pages,
but currently the only means by which we can do so functionally while
maintaining the required R/W and R/O behaviour is a PFN map.

There should be no change to actual functionality as a result of this
change.

Signed-off-by: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20250103153151.124163-1-lorenzo.stoakes@oracle.com
</content>
</entry>
<entry>
<title>perf/aux: Fix AUX buffer serialization</title>
<updated>2024-09-04T16:22:56Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2024-09-02T08:14:24Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2ab9d830262c132ab5db2f571003d80850d56b2a'/>
<id>urn:sha1:2ab9d830262c132ab5db2f571003d80850d56b2a</id>
<content type='text'>
Ole reported that event-&gt;mmap_mutex is strictly insufficient to
serialize the AUX buffer, add a per RB mutex to fully serialize it.

Note that in the lock order comment the perf_event::mmap_mutex order
was already wrong, that is, it nesting under mmap_lock is not new with
this patch.

Fixes: 45bfb2e50471 ("perf: Add AUX area to ring buffer for raw data streams")
Reported-by: Ole &lt;ole@binarygecko.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>perf: Make rb_alloc_aux() return an error immediately if nr_pages &lt;= 0</title>
<updated>2024-07-04T14:00:23Z</updated>
<author>
<name>Adrian Hunter</name>
<email>adrian.hunter@intel.com</email>
</author>
<published>2024-06-24T20:11:01Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0ca4da2412da05fb9dd0b5d90dcc8026219f0f29'/>
<id>urn:sha1:0ca4da2412da05fb9dd0b5d90dcc8026219f0f29</id>
<content type='text'>
rb_alloc_aux() should not be called with nr_pages &lt;= 0. Make it more robust
and readable by returning an error immediately in that case.

Signed-off-by: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/20240624201101.60186-8-adrian.hunter@intel.com
</content>
</entry>
<entry>
<title>perf: Fix default aux_watermark calculation</title>
<updated>2024-07-04T14:00:23Z</updated>
<author>
<name>Adrian Hunter</name>
<email>adrian.hunter@intel.com</email>
</author>
<published>2024-06-24T20:11:00Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=43deb76b19663a96ec2189d8f4eb9a9dc2d7623f'/>
<id>urn:sha1:43deb76b19663a96ec2189d8f4eb9a9dc2d7623f</id>
<content type='text'>
The default aux_watermark is half the AUX area buffer size. In general,
on a 64-bit architecture, the AUX area buffer size could be a bigger than
fits in a 32-bit type, but the calculation does not allow for that
possibility.

However the aux_watermark value is recorded in a u32, so should not be
more than U32_MAX either.

Fix by doing the calculation in a correctly sized type, and limiting the
result to U32_MAX.

Fixes: d68e6799a5c8 ("perf: Cap allocation order at aux_watermark")
Signed-off-by: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/20240624201101.60186-7-adrian.hunter@intel.com
</content>
</entry>
<entry>
<title>perf/ring_buffer: Trigger IO signals for watermark_wakeup</title>
<updated>2024-04-14T20:26:32Z</updated>
<author>
<name>Kyle Huey</name>
<email>me@kylehuey.com</email>
</author>
<published>2024-04-13T14:16:18Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=fd20bb51ed3913e0d25085eb79e8c0babfb4ee28'/>
<id>urn:sha1:fd20bb51ed3913e0d25085eb79e8c0babfb4ee28</id>
<content type='text'>
perf_output_wakeup() already marks the perf event fd available for polling.
Trigger IO signals with FASYNC too.

Signed-off-by: Kyle Huey &lt;khuey@kylehuey.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Link: https://lore.kernel.org/r/20240413141618.4160-3-khuey@kylehuey.com
</content>
</entry>
<entry>
<title>mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER</title>
<updated>2024-01-08T23:27:15Z</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2023-12-28T14:47:04Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5e0a760b44417f7cadd79de2204d6247109558a0'/>
<id>urn:sha1:5e0a760b44417f7cadd79de2204d6247109558a0</id>
<content type='text'>
commit 23baf831a32c ("mm, treewide: redefine MAX_ORDER sanely") has
changed the definition of MAX_ORDER to be inclusive.  This has caused
issues with code that was not yet upstream and depended on the previous
definition.

To draw attention to the altered meaning of the define, rename MAX_ORDER
to MAX_PAGE_ORDER.

Link: https://lkml.kernel.org/r/20231228144704.14033-2-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>perf/core: Bail out early if the request AUX area is out of bound</title>
<updated>2023-09-12T15:46:31Z</updated>
<author>
<name>Shuai Xue</name>
<email>xueshuai@linux.alibaba.com</email>
</author>
<published>2023-09-07T00:43:07Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=54aee5f15b83437f23b2b2469bcf21bdd9823916'/>
<id>urn:sha1:54aee5f15b83437f23b2b2469bcf21bdd9823916</id>
<content type='text'>
When perf-record with a large AUX area, e.g 4GB, it fails with:

    #perf record -C 0 -m ,4G -e arm_spe_0// -- sleep 1
    failed to mmap with 12 (Cannot allocate memory)

and it reveals a WARNING with __alloc_pages():

	------------[ cut here ]------------
	WARNING: CPU: 44 PID: 17573 at mm/page_alloc.c:5568 __alloc_pages+0x1ec/0x248
	Call trace:
	 __alloc_pages+0x1ec/0x248
	 __kmalloc_large_node+0xc0/0x1f8
	 __kmalloc_node+0x134/0x1e8
	 rb_alloc_aux+0xe0/0x298
	 perf_mmap+0x440/0x660
	 mmap_region+0x308/0x8a8
	 do_mmap+0x3c0/0x528
	 vm_mmap_pgoff+0xf4/0x1b8
	 ksys_mmap_pgoff+0x18c/0x218
	 __arm64_sys_mmap+0x38/0x58
	 invoke_syscall+0x50/0x128
	 el0_svc_common.constprop.0+0x58/0x188
	 do_el0_svc+0x34/0x50
	 el0_svc+0x34/0x108
	 el0t_64_sync_handler+0xb8/0xc0
	 el0t_64_sync+0x1a4/0x1a8

'rb-&gt;aux_pages' allocated by kcalloc() is a pointer array which is used to
maintains AUX trace pages. The allocated page for this array is physically
contiguous (and virtually contiguous) with an order of 0..MAX_ORDER. If the
size of pointer array crosses the limitation set by MAX_ORDER, it reveals a
WARNING.

So bail out early with -ENOMEM if the request AUX area is out of bound,
e.g.:

    #perf record -C 0 -m ,4G -e arm_spe_0// -- sleep 1
    failed to mmap with 12 (Cannot allocate memory)

Signed-off-by: Shuai Xue &lt;xueshuai@linux.alibaba.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>perf/ring_buffer: Use local_try_cmpxchg in __perf_output_begin</title>
<updated>2023-07-10T07:52:36Z</updated>
<author>
<name>Uros Bizjak</name>
<email>ubizjak@gmail.com</email>
</author>
<published>2023-07-08T09:00:37Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1af61adb3a23192023fec1733bd4c8500f53e546'/>
<id>urn:sha1:1af61adb3a23192023fec1733bd4c8500f53e546</id>
<content type='text'>
Use local_try_cmpxchg instead of local_cmpxchg (*ptr, old, new) == old
in __perf_output_begin.  x86 CMPXCHG instruction returns success in ZF
flag, so this change saves a compare after cmpxchg (and related move
instruction in front of cmpxchg).

Also, try_cmpxchg implicitly assigns old *ptr value to "old" when cmpxchg
fails. There is no need to re-read the value in the loop.

No functional change intended.

Signed-off-by: Uros Bizjak &lt;ubizjak@gmail.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20230708090048.63046-2-ubizjak@gmail.com
</content>
</entry>
<entry>
<title>mm, treewide: redefine MAX_ORDER sanely</title>
<updated>2023-04-06T02:42:46Z</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2023-03-15T11:31:33Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=23baf831a32c04f9a968812511540b1b3e648bf5'/>
<id>urn:sha1:23baf831a32c04f9a968812511540b1b3e648bf5</id>
<content type='text'>
MAX_ORDER currently defined as number of orders page allocator supports:
user can ask buddy allocator for page order between 0 and MAX_ORDER-1.

This definition is counter-intuitive and lead to number of bugs all over
the kernel.

Change the definition of MAX_ORDER to be inclusive: the range of orders
user can ask from buddy allocator is 0..MAX_ORDER now.

[kirill@shutemov.name: fix min() warning]
  Link: https://lkml.kernel.org/r/20230315153800.32wib3n5rickolvh@box
[akpm@linux-foundation.org: fix another min_t warning]
[kirill@shutemov.name: fixups per Zi Yan]
  Link: https://lkml.kernel.org/r/20230316232144.b7ic4cif4kjiabws@box.shutemov.name
[akpm@linux-foundation.org: fix underlining in docs]
  Link: https://lore.kernel.org/oe-kbuild-all/202303191025.VRCTk6mP-lkp@intel.com/
Link: https://lkml.kernel.org/r/20230315113133.11326-11-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Reviewed-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;	[powerpc]
Cc: "Kirill A. Shutemov" &lt;kirill@shutemov.name&gt;
Cc: Zi Yan &lt;ziy@nvidia.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>perf/core: fix MAX_ORDER usage in rb_alloc_aux_page()</title>
<updated>2023-04-06T02:42:45Z</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2023-03-15T11:31:29Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=934487e98fdd2d7762e893af7cbe788cfd39ff84'/>
<id>urn:sha1:934487e98fdd2d7762e893af7cbe788cfd39ff84</id>
<content type='text'>
MAX_ORDER is not inclusive: the maximum allocation order buddy allocator
can deliver is MAX_ORDER-1.

Fix MAX_ORDER usage in rb_alloc_aux_page().

Link: https://lkml.kernel.org/r/20230315113133.11326-7-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@kernel.org&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Ian Rogers &lt;irogers@google.com&gt;
Cc: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
</feed>
