<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/events/core.c, branch v6.7</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.7</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.7'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2023-12-15T11:33:23Z</updated>
<entry>
<title>perf: Fix perf_event_validate_size() lockdep splat</title>
<updated>2023-12-15T11:33:23Z</updated>
<author>
<name>Mark Rutland</name>
<email>mark.rutland@arm.com</email>
</author>
<published>2023-12-15T11:24:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7e2c1e4b34f07d9aa8937fab88359d4a0fce468e'/>
<id>urn:sha1:7e2c1e4b34f07d9aa8937fab88359d4a0fce468e</id>
<content type='text'>
When lockdep is enabled, the for_each_sibling_event(sibling, event)
macro checks that event-&gt;ctx-&gt;mutex is held. When creating a new group
leader event, we call perf_event_validate_size() on a partially
initialized event where event-&gt;ctx is NULL, and so when
for_each_sibling_event() attempts to check event-&gt;ctx-&gt;mutex, we get a
splat, as reported by Lucas De Marchi:

  WARNING: CPU: 8 PID: 1471 at kernel/events/core.c:1950 __do_sys_perf_event_open+0xf37/0x1080

This only happens for a new event which is its own group_leader, and in
this case there cannot be any sibling events. Thus it's safe to skip the
check for siblings, which avoids having to make invasive and ugly
changes to for_each_sibling_event().

Avoid the splat by bailing out early when the new event is its own
group_leader.

Fixes: 382c27f4ed28f803 ("perf: Fix perf_event_validate_size()")
Closes: https://lore.kernel.org/lkml/20231214000620.3081018-1-lucas.demarchi@intel.com/
Closes: https://lore.kernel.org/lkml/ZXpm6gQ%2Fd59jGsuW@xpf.sh.intel.com/
Reported-by: Lucas De Marchi &lt;lucas.demarchi@intel.com&gt;
Reported-by: Pengfei Xu &lt;pengfei.xu@intel.com&gt;
Signed-off-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20231215112450.3972309-1-mark.rutland@arm.com
</content>
</entry>
<entry>
<title>perf: Fix perf_event_validate_size()</title>
<updated>2023-11-29T14:43:50Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2023-11-29T14:24:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=382c27f4ed28f803b1f1473ac2d8db0afc795a1b'/>
<id>urn:sha1:382c27f4ed28f803b1f1473ac2d8db0afc795a1b</id>
<content type='text'>
Budimir noted that perf_event_validate_size() only checks the size of
the newly added event, even though the sizes of all existing events
can also change due to not all events having the same read_format.

When we attach the new event, perf_group_attach(), we do re-compute
the size for all events.

Fixes: a723968c0ed3 ("perf: Fix u16 overflows")
Reported-by: Budimir Markovic &lt;markovicbudimir@gmail.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
</content>
</entry>
<entry>
<title>perf/core: Fix cpuctx refcounting</title>
<updated>2023-11-15T03:18:31Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2023-06-09T10:34:46Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=889c58b3155ff4c8e8671c95daef63d6fabbb6b1'/>
<id>urn:sha1:889c58b3155ff4c8e8671c95daef63d6fabbb6b1</id>
<content type='text'>
Audit of the refcounting turned up that perf_pmu_migrate_context()
fails to migrate the ctx refcount.

Fixes: bd2756811766 ("perf: Rewrite core context handling")
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Link: https://lkml.kernel.org/r/20230612093539.085862001@infradead.org
Cc: &lt;stable@vger.kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'perf-core-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2023-10-30T23:44:35Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-10-30T23:44:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=bceb7accb7b60f9844807c7433af06493ed058b7'/>
<id>urn:sha1:bceb7accb7b60f9844807c7433af06493ed058b7</id>
<content type='text'>
Pull performance event updates from Ingo Molnar:
 - Add AMD Unified Memory Controller (UMC) events introduced with Zen 4
 - Simplify &amp; clean up the uncore management code
 - Fall back from RDPMC to RDMSR on certain uncore PMUs
 - Improve per-package and cstate event reading
 - Extend the Intel ref-cycles event to GP counters
 - Fix Intel MTL event constraints
 - Improve the Intel hybrid CPU handling code
 - Micro-optimize the RAPL code
 - Optimize perf_cgroup_switch()
 - Improve large AUX area error handling
 - Misc fixes and cleanups

* tag 'perf-core-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits)
  perf/x86/amd/uncore: Pass through error code for initialization failures, instead of -ENODEV
  perf/x86/amd/uncore: Fix uninitialized return value in amd_uncore_init()
  x86/cpu: Fix the AMD Fam 17h, Fam 19h, Zen2 and Zen4 MSR enumerations
  perf: Optimize perf_cgroup_switch()
  perf/x86/amd/uncore: Add memory controller support
  perf/x86/amd/uncore: Add group exclusivity
  perf/x86/amd/uncore: Use rdmsr if rdpmc is unavailable
  perf/x86/amd/uncore: Move discovery and registration
  perf/x86/amd/uncore: Refactor uncore management
  perf/core: Allow reading package events from perf_event_read_local
  perf/x86/cstate: Allow reading the package statistics from local CPU
  perf/x86/intel/pt: Fix kernel-doc comments
  perf/x86/rapl: Annotate 'struct rapl_pmus' with __counted_by
  perf/core: Rename perf_proc_update_handler() -&gt; perf_event_max_sample_rate_handler(), for readability
  perf/x86/rapl: Fix "Using plain integer as NULL pointer" Sparse warning
  perf/x86/rapl: Use local64_try_cmpxchg in rapl_event_update()
  perf/x86/rapl: Stop doing cpu_relax() in the local64_cmpxchg() loop in rapl_event_update()
  perf/core: Bail out early if the request AUX area is out of bound
  perf/x86/intel: Extend the ref-cycles event to GP counters
  perf/x86/intel: Fix broken fixed event constraints extension
  ...
</content>
</entry>
<entry>
<title>perf/core: Fix potential NULL deref</title>
<updated>2023-10-24T10:15:12Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2023-10-24T09:42:21Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a71ef31485bb51b846e8db8b3a35e432cc15afb5'/>
<id>urn:sha1:a71ef31485bb51b846e8db8b3a35e432cc15afb5</id>
<content type='text'>
Smatch is awesome.

Fixes: 32671e3799ca ("perf: Disallow mis-matched inherited group reads")
Reported-by: Dan Carpenter &lt;dan.carpenter@linaro.org&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>perf: Disallow mis-matched inherited group reads</title>
<updated>2023-10-19T08:09:42Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2023-10-18T11:56:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=32671e3799ca2e4590773fd0e63aaa4229e50c06'/>
<id>urn:sha1:32671e3799ca2e4590773fd0e63aaa4229e50c06</id>
<content type='text'>
Because group consistency is non-atomic between parent (filedesc) and children
(inherited) events, it is possible for PERF_FORMAT_GROUP read() to try and sum
non-matching counter groups -- with non-sensical results.

Add group_generation to distinguish the case where a parent group removes and
adds an event and thus has the same number, but a different configuration of
events as inherited groups.

This became a problem when commit fa8c269353d5 ("perf/core: Invert
perf_read_group() loops") flipped the order of child_list and sibling_list.
Previously it would iterate the group (sibling_list) first, and for each
sibling traverse the child_list. In this order, only the group composition of
the parent is relevant. By flipping the order the group composition of the
child (inherited) events becomes an issue and the mis-match in group
composition becomes evident.

That said; even prior to this commit, while reading of a group that is not
equally inherited was not broken, it still made no sense.

(Ab)use ECHILD as error return to indicate issues with child process group
composition.

Fixes: fa8c269353d5 ("perf/core: Invert perf_read_group() loops")
Reported-by: Budimir Markovic &lt;markovicbudimir@gmail.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20231018115654.GK33217@noisy.programming.kicks-ass.net
</content>
</entry>
<entry>
<title>perf: Optimize perf_cgroup_switch()</title>
<updated>2023-10-12T17:28:38Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2023-10-09T21:04:25Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f06cc667f79909e9175460b167c277b7c64d3df0'/>
<id>urn:sha1:f06cc667f79909e9175460b167c277b7c64d3df0</id>
<content type='text'>
Namhyung reported that bd2756811766 ("perf: Rewrite core context handling")
regresses context switch overhead when perf-cgroup is in use together
with 'slow' PMUs like uncore.

Specifically, perf_cgroup_switch()'s perf_ctx_disable() /
ctx_sched_out() etc.. all iterate the full list of active PMUs for
that CPU, even if they don't have cgroup events.

Previously there was cgrp_cpuctx_list which linked the relevant PMUs
together, but that got lost in the rework. Instead of re-instruducing
a similar list, let the perf_event_pmu_context iteration skip those
that do not have cgroup events. This avoids growing multiple versions
of the perf_event_pmu_context iteration.

Measured performance (on a slightly different patch):

Before)

  $ taskset -c 0 ./perf bench sched pipe -l 10000 -G AAA,BBB
  # Running 'sched/pipe' benchmark:
  # Executed 10000 pipe operations between two processes

       Total time: 0.901 [sec]

        90.128700 usecs/op
            11095 ops/sec

After)

  $ taskset -c 0 ./perf bench sched pipe -l 10000 -G AAA,BBB
  # Running 'sched/pipe' benchmark:
  # Executed 10000 pipe operations between two processes

       Total time: 0.065 [sec]

         6.560100 usecs/op
           152436 ops/sec

Fixes: bd2756811766 ("perf: Rewrite core context handling")
Reported-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
Debugged-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20231009210425.GC6307@noisy.programming.kicks-ass.net
</content>
</entry>
<entry>
<title>perf/core: Allow reading package events from perf_event_read_local</title>
<updated>2023-10-09T14:12:22Z</updated>
<author>
<name>Tero Kristo</name>
<email>tero.kristo@linux.intel.com</email>
</author>
<published>2023-09-13T12:59:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1765bb61bb18a7b81f68806de6e8b8f5000f65bf'/>
<id>urn:sha1:1765bb61bb18a7b81f68806de6e8b8f5000f65bf</id>
<content type='text'>
Per-package perf events are typically registered with a single CPU only,
however they can be read across all the CPUs within the package.
Currently perf_event_read maps the event CPU according to the topology
information to avoid an unnecessary SMP call, however
perf_event_read_local deals with hard values and rejects a read with a
failure if the CPU is not the one exactly registered. Allow similar
mapping within the perf_event_read_local if the perf event in question
can support this.

This allows users like BPF code to read the package perf events properly
across different CPUs within a package.

Signed-off-by: Tero Kristo &lt;tero.kristo@linux.intel.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20230913125956.3652667-1-tero.kristo@linux.intel.com
</content>
</entry>
<entry>
<title>perf/core: Rename perf_proc_update_handler() -&gt; perf_event_max_sample_rate_handler(), for readability</title>
<updated>2023-10-05T08:26:50Z</updated>
<author>
<name>Xiu Jianfeng</name>
<email>xiujianfeng@huawei.com</email>
</author>
<published>2023-07-21T09:06:07Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e6814ec3ba1994561db9b1c05a80227d30cc18fa'/>
<id>urn:sha1:e6814ec3ba1994561db9b1c05a80227d30cc18fa</id>
<content type='text'>
Follow the naming pattern of the other sysctl handlers in perf.

Signed-off-by: Xiu Jianfeng &lt;xiujianfeng@huawei.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Link: https://lore.kernel.org/r/20230721090607.172002-1-xiujianfeng@huawei.com
</content>
</entry>
<entry>
<title>Merge tag 'mm-stable-2023-08-28-18-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm</title>
<updated>2023-08-29T21:25:26Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-08-29T21:25:26Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b96a3e9142fdf346b05b20e867b4f0dfca119e96'/>
<id>urn:sha1:b96a3e9142fdf346b05b20e867b4f0dfca119e96</id>
<content type='text'>
Pull MM updates from Andrew Morton:

 - Some swap cleanups from Ma Wupeng ("fix WARN_ON in
   add_to_avail_list")

 - Peter Xu has a series (mm/gup: Unify hugetlb, speed up thp") which
   reduces the special-case code for handling hugetlb pages in GUP. It
   also speeds up GUP handling of transparent hugepages.

 - Peng Zhang provides some maple tree speedups ("Optimize the fast path
   of mas_store()").

 - Sergey Senozhatsky has improved te performance of zsmalloc during
   compaction (zsmalloc: small compaction improvements").

 - Domenico Cerasuolo has developed additional selftest code for zswap
   ("selftests: cgroup: add zswap test program").

 - xu xin has doe some work on KSM's handling of zero pages. These
   changes are mainly to enable the user to better understand the
   effectiveness of KSM's treatment of zero pages ("ksm: support
   tracking KSM-placed zero-pages").

 - Jeff Xu has fixes the behaviour of memfd's
   MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED sysctl ("mm/memfd: fix sysctl
   MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED").

 - David Howells has fixed an fscache optimization ("mm, netfs, fscache:
   Stop read optimisation when folio removed from pagecache").

 - Axel Rasmussen has given userfaultfd the ability to simulate memory
   poisoning ("add UFFDIO_POISON to simulate memory poisoning with
   UFFD").

 - Miaohe Lin has contributed some routine maintenance work on the
   memory-failure code ("mm: memory-failure: remove unneeded PageHuge()
   check").

 - Peng Zhang has contributed some maintenance work on the maple tree
   code ("Improve the validation for maple tree and some cleanup").

 - Hugh Dickins has optimized the collapsing of shmem or file pages into
   THPs ("mm: free retracted page table by RCU").

 - Jiaqi Yan has a patch series which permits us to use the healthy
   subpages within a hardware poisoned huge page for general purposes
   ("Improve hugetlbfs read on HWPOISON hugepages").

 - Kemeng Shi has done some maintenance work on the pagetable-check code
   ("Remove unused parameters in page_table_check").

 - More folioification work from Matthew Wilcox ("More filesystem folio
   conversions for 6.6"), ("Followup folio conversions for zswap"). And
   from ZhangPeng ("Convert several functions in page_io.c to use a
   folio").

 - page_ext cleanups from Kemeng Shi ("minor cleanups for page_ext").

 - Baoquan He has converted some architectures to use the
   GENERIC_IOREMAP ioremap()/iounmap() code ("mm: ioremap: Convert
   architectures to take GENERIC_IOREMAP way").

 - Anshuman Khandual has optimized arm64 tlb shootdown ("arm64: support
   batched/deferred tlb shootdown during page reclamation/migration").

 - Better maple tree lockdep checking from Liam Howlett ("More strict
   maple tree lockdep"). Liam also developed some efficiency
   improvements ("Reduce preallocations for maple tree").

 - Cleanup and optimization to the secondary IOMMU TLB invalidation,
   from Alistair Popple ("Invalidate secondary IOMMU TLB on permission
   upgrade").

 - Ryan Roberts fixes some arm64 MM selftest issues ("selftests/mm fixes
   for arm64").

 - Kemeng Shi provides some maintenance work on the compaction code
   ("Two minor cleanups for compaction").

 - Some reduction in mmap_lock pressure from Matthew Wilcox ("Handle
   most file-backed faults under the VMA lock").

 - Aneesh Kumar contributes code to use the vmemmap optimization for DAX
   on ppc64, under some circumstances ("Add support for DAX vmemmap
   optimization for ppc64").

 - page-ext cleanups from Kemeng Shi ("add page_ext_data to get client
   data in page_ext"), ("minor cleanups to page_ext header").

 - Some zswap cleanups from Johannes Weiner ("mm: zswap: three
   cleanups").

 - kmsan cleanups from ZhangPeng ("minor cleanups for kmsan").

 - VMA handling cleanups from Kefeng Wang ("mm: convert to
   vma_is_initial_heap/stack()").

 - DAMON feature work from SeongJae Park ("mm/damon/sysfs-schemes:
   implement DAMOS tried total bytes file"), ("Extend DAMOS filters for
   address ranges and DAMON monitoring targets").

 - Compaction work from Kemeng Shi ("Fixes and cleanups to compaction").

 - Liam Howlett has improved the maple tree node replacement code
   ("maple_tree: Change replacement strategy").

 - ZhangPeng has a general code cleanup - use the K() macro more widely
   ("cleanup with helper macro K()").

 - Aneesh Kumar brings memmap-on-memory to ppc64 ("Add support for
   memmap on memory feature on ppc64").

 - pagealloc cleanups from Kemeng Shi ("Two minor cleanups for pcp list
   in page_alloc"), ("Two minor cleanups for get pageblock
   migratetype").

 - Vishal Moola introduces a memory descriptor for page table tracking,
   "struct ptdesc" ("Split ptdesc from struct page").

 - memfd selftest maintenance work from Aleksa Sarai ("memfd: cleanups
   for vm.memfd_noexec").

 - MM include file rationalization from Hugh Dickins ("arch: include
   asm/cacheflush.h in asm/hugetlb.h").

 - THP debug output fixes from Hugh Dickins ("mm,thp: fix sloppy text
   output").

 - kmemleak improvements from Xiaolei Wang ("mm/kmemleak: use
   object_cache instead of kmemleak_initialized").

 - More folio-related cleanups from Matthew Wilcox ("Remove _folio_dtor
   and _folio_order").

 - A VMA locking scalability improvement from Suren Baghdasaryan
   ("Per-VMA lock support for swap and userfaults").

 - pagetable handling cleanups from Matthew Wilcox ("New page table
   range API").

 - A batch of swap/thp cleanups from David Hildenbrand ("mm/swap: stop
   using page-&gt;private on tail pages for THP_SWAP + cleanups").

 - Cleanups and speedups to the hugetlb fault handling from Matthew
   Wilcox ("Change calling convention for -&gt;huge_fault").

 - Matthew Wilcox has also done some maintenance work on the MM
   subsystem documentation ("Improve mm documentation").

* tag 'mm-stable-2023-08-28-18-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (489 commits)
  maple_tree: shrink struct maple_tree
  maple_tree: clean up mas_wr_append()
  secretmem: convert page_is_secretmem() to folio_is_secretmem()
  nios2: fix flush_dcache_page() for usage from irq context
  hugetlb: add documentation for vma_kernel_pagesize()
  mm: add orphaned kernel-doc to the rst files.
  mm: fix clean_record_shared_mapping_range kernel-doc
  mm: fix get_mctgt_type() kernel-doc
  mm: fix kernel-doc warning from tlb_flush_rmaps()
  mm: remove enum page_entry_size
  mm: allow -&gt;huge_fault() to be called without the mmap_lock held
  mm: move PMD_ORDER to pgtable.h
  mm: remove checks for pte_index
  memcg: remove duplication detection for mem_cgroup_uncharge_swap
  mm/huge_memory: work on folio-&gt;swap instead of page-&gt;private when splitting folio
  mm/swap: inline folio_set_swap_entry() and folio_swap_entry()
  mm/swap: use dedicated entry for swap in folio
  mm/swap: stop using page-&gt;private on tail pages for THP_SWAP
  selftests/mm: fix WARNING comparing pointer to 0
  selftests: cgroup: fix test_kmem_memcg_deletion kernel mem check
  ...
</content>
</entry>
</feed>
