<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/drivers/perf/Makefile, branch master</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=master</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=master'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2025-09-22T12:05:11Z</updated>
<entry>
<title>perf: Fujitsu: Add the Uncore PMU driver</title>
<updated>2025-09-22T12:05:11Z</updated>
<author>
<name>Koichi Okuno</name>
<email>fj2767dz@fujitsu.com</email>
</author>
<published>2025-09-09T03:02:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=bad11557eed2592c017a06752e58df49080b4a6a'/>
<id>urn:sha1:bad11557eed2592c017a06752e58df49080b4a6a</id>
<content type='text'>
This adds a new dynamic PMU to the Perf Events framework to program and
control the Uncore PMUs in Fujitsu chips.

This driver exports formatting and event information to sysfs so it can
be used by the perf user space tools with the syntaxes:

perf stat -e pci_iod0_pci0/ea-pci/ ls
perf stat -e pci_iod0_pci0/event=0x80/ ls
perf stat -e mac_iod0_mac0_ch0/ea-mac/ ls
perf stat -e mac_iod0_mac0_ch0/event=0x80/ ls

FUJITSU-MONAKA PMU Events Specification v1.1 URL:
https://github.com/fujitsu/FUJITSU-MONAKA

Reviewed-by: Yicong Yang &lt;yangyicong@hisilicon.com&gt;
Signed-off-by: Koichi Okuno &lt;fj2767dz@fujitsu.com&gt;
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
</content>
</entry>
<entry>
<title>perf: arm_pmuv3: Add support for the Branch Record Buffer Extension (BRBE)</title>
<updated>2025-07-08T16:58:49Z</updated>
<author>
<name>Rob Herring (Arm)</name>
<email>robh@kernel.org</email>
</author>
<published>2025-06-11T18:01:14Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=58074a0fce66c6c97b35ce8a28ed4e7b780f9a8f'/>
<id>urn:sha1:58074a0fce66c6c97b35ce8a28ed4e7b780f9a8f</id>
<content type='text'>
The ARMv9.2 architecture introduces the optional Branch Record Buffer
Extension (BRBE), which records information about branches as they are
executed into set of branch record registers. BRBE is similar to x86's
Last Branch Record (LBR) and PowerPC's Branch History Rolling Buffer
(BHRB).

BRBE supports filtering by exception level and can filter just the
source or target address if excluded to avoid leaking privileged
addresses. The h/w filter would be sufficient except when there are
multiple events with disjoint filtering requirements. In this case, BRBE
is configured with a union of all the events' desired branches, and then
the recorded branches are filtered based on each event's filter. For
example, with one event capturing kernel events and another event
capturing user events, BRBE will be configured to capture both kernel
and user branches. When handling event overflow, the branch records have
to be filtered by software to only include kernel or user branch
addresses for that event. In contrast, x86 simply configures LBR using
the last installed event which seems broken.

It is possible on x86 to configure branch filter such that no branches
are ever recorded (e.g. -j save_type). For BRBE, events with a
configuration that will result in no samples are rejected.

Recording branches in KVM guests is not supported like x86. However,
perf on x86 allows requesting branch recording in guests. The guest
events are recorded, but the resulting branches are all from the host.
For BRBE, events with branch recording and "exclude_host" set are
rejected. Requiring "exclude_guest" to be set did not work. The default
for the perf tool does set "exclude_guest" if no exception level
options are specified. However, specifying kernel or user events
defaults to including both host and guest. In this case, only host
branches are recorded.

BRBE can support some additional exception branch types compared to
x86. On x86, all exceptions other than syscalls are recorded as IRQ.
With BRBE, it is possible to better categorize these exceptions. One
limitation relative to x86 is we cannot distinguish a syscall return
from other exception returns. So all exception returns are recorded as
ERET type. The FIQ branch type is omitted as the only FIQ user is Apple
platforms which don't support BRBE. The debug branch types are omitted
as there is no clear need for them.

BRBE records are invalidated whenever events are reconfigured, a new
task is scheduled in, or after recording is paused (and the records
have been recorded for the event). The architecture allows branch
records to be invalidated by the PE under implementation defined
conditions. It is expected that these conditions are rare.

Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Co-developed-by: Anshuman Khandual &lt;anshuman.khandual@arm.com&gt;
Signed-off-by: Anshuman Khandual &lt;anshuman.khandual@arm.com&gt;
Co-developed-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Signed-off-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Tested-by: James Clark &lt;james.clark@linaro.org&gt;
Signed-off-by: Rob Herring (Arm) &lt;robh@kernel.org&gt;
tested-by: Adam Young &lt;admiyo@os.amperecomputing.com&gt;
Acked-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Link: https://lore.kernel.org/r/20250611-arm-brbe-v19-v23-4-e7775563036e@kernel.org
[will: Fix sparse warnings about mixed declarations and code.
       Fix C99 comment syntax.]
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
</content>
</entry>
<entry>
<title>perf/marvell: Marvell PEM performance monitor support</title>
<updated>2024-10-28T17:35:35Z</updated>
<author>
<name>Gowthami Thiagarajan</name>
<email>gthiagarajan@marvell.com</email>
</author>
<published>2024-10-28T05:53:09Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e1dce56443a4a18978fe39ee4af663e5b6b31422'/>
<id>urn:sha1:e1dce56443a4a18978fe39ee4af663e5b6b31422</id>
<content type='text'>
PCI Express Interface PMU includes various performance counters
to monitor the data that is transmitted over the PCIe link. The
counters track various inbound and outbound transactions which
includes separate counters for posted/non-posted/completion TLPs.
Also, inbound and outbound memory read requests along with their
latencies can also be monitored. Address Translation Services(ATS)events
such as ATS Translation, ATS Page Request, ATS Invalidation along with
their corresponding latencies are also supported.

The performance counters are 64 bits wide.

For instance,
perf stat -e ib_tlp_pr &lt;workload&gt;
tracks the inbound posted TLPs for the workload.

Co-developed-by: Linu Cherian &lt;lcherian@marvell.com&gt;
Signed-off-by: Linu Cherian &lt;lcherian@marvell.com&gt;
Signed-off-by: Gowthami Thiagarajan &lt;gthiagarajan@marvell.com&gt;
Link: https://lore.kernel.org/r/20241028055309.17893-1-gthiagarajan@marvell.com
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
</content>
</entry>
<entry>
<title>perf: Add driver for Arm NI-700 interconnect PMU</title>
<updated>2024-09-06T11:58:28Z</updated>
<author>
<name>Robin Murphy</name>
<email>robin.murphy@arm.com</email>
</author>
<published>2024-09-04T17:34:03Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4d5a7680f2b4d0c2955e1d9f9a594b050d637436'/>
<id>urn:sha1:4d5a7680f2b4d0c2955e1d9f9a594b050d637436</id>
<content type='text'>
The Arm NI-700 Network-on-Chip Interconnect has a relatively
straightforward design with a hierarchy of voltage, power, and clock
domains, where each clock domain then contains a number of interface
units and a PMU which can monitor events thereon. As such, it begets a
relatively straightforward driver to interface those PMUs with perf.

Even more so than with arm-cmn, users will require detailed knowledge of
the wider system topology in order to meaningfully analyse anything,
since the interconnect itself cannot know what lies beyond the boundary
of each inscrutably-numbered interface. Given that, for now they are
also expected to refer to the NI-700 documentation for the relevant
event IDs to provide as well. An identifier is implemented so we can
come back and add jevents if anyone really wants to.

Signed-off-by: Robin Murphy &lt;robin.murphy@arm.com&gt;
Link: https://lore.kernel.org/r/9933058d0ab8138c78a61cd6852ea5d5ff48e393.1725470837.git.robin.murphy@arm.com
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
</content>
</entry>
<entry>
<title>perf/arm: Move 32-bit PMU drivers to drivers/perf/</title>
<updated>2024-07-03T13:07:14Z</updated>
<author>
<name>Rob Herring (Arm)</name>
<email>robh@kernel.org</email>
</author>
<published>2024-06-26T22:32:27Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8d75537bebfa14fcd493b1f840b07a8cff5a640d'/>
<id>urn:sha1:8d75537bebfa14fcd493b1f840b07a8cff5a640d</id>
<content type='text'>
It is preferred to put drivers under drivers/ rather than under arch/.
The PMU drivers also depend on arm_pmu.c, so it's better to place them
all together.

Acked-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Signed-off-by: Rob Herring (Arm) &lt;robh@kernel.org&gt;
Link: https://lore.kernel.org/r/20240626-arm-pmu-3-9-icntr-v2-3-c9784b4f4065@kernel.org
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
</content>
</entry>
<entry>
<title>perf: starfive: Add StarLink PMU support</title>
<updated>2024-03-04T14:19:48Z</updated>
<author>
<name>Ji Sheng Teoh</name>
<email>jisheng.teoh@starfivetech.com</email>
</author>
<published>2024-02-29T07:27:17Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c2b24812f7bc5fbd6f2f92af070856fbe4c37b40'/>
<id>urn:sha1:c2b24812f7bc5fbd6f2f92af070856fbe4c37b40</id>
<content type='text'>
This patch adds support for StarFive's StarLink PMU (Performance
Monitor Unit). StarLink PMU integrates one or more CPU cores with
a shared L3 memory system. The PMU supports overflow interrupt,
up to 16 programmable 64bit event counters, and an independent
64bit cycle counter. StarLink PMU is accessed via MMIO.

Example Perf stat output:
[root@user]# perf stat -a -e /starfive_starlink_pmu/cycles/ \
	-e /starfive_starlink_pmu/read_miss/ \
	-e /starfive_starlink_pmu/read_hit/ \
	-e /starfive_starlink_pmu/release_request/  \
	-e /starfive_starlink_pmu/write_hit/ \
	-e /starfive_starlink_pmu/write_miss/ \
	-e /starfive_starlink_pmu/write_request/ \
	-e /starfive_starlink_pmu/writeback/ \
	-e /starfive_starlink_pmu/read_request/ \
	-- openssl speed rsa2048
Doing 2048 bits private rsa's for 10s: 5 2048 bits private RSA's in
2.84s
Doing 2048 bits public rsa's for 10s: 169 2048 bits public RSA's in
2.42s
version: 3.0.11
built on: Tue Sep 19 13:02:31 2023 UTC
options: bn(64,64)
CPUINFO: N/A
                  sign    verify    sign/s verify/s
rsa 2048 bits 0.568000s 0.014320s      1.8     69.8
/////////
 Performance counter stats for 'system wide':

         649991998      starfive_starlink_pmu/cycles/
           1009690      starfive_starlink_pmu/read_miss/
           1079750      starfive_starlink_pmu/read_hit/
           2089405      starfive_starlink_pmu/release_request/
               129      starfive_starlink_pmu/write_hit/
                70      starfive_starlink_pmu/write_miss/
               194      starfive_starlink_pmu/write_request/
            150080      starfive_starlink_pmu/writeback/
           2089423      starfive_starlink_pmu/read_request/

      27.062755678 seconds time elapsed

Signed-off-by: Ji Sheng Teoh &lt;jisheng.teoh@starfivetech.com&gt;
Link: https://lore.kernel.org/r/20240229072720.3987876-2-jisheng.teoh@starfivetech.com
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
</content>
</entry>
<entry>
<title>drivers/perf: add DesignWare PCIe PMU driver</title>
<updated>2023-12-13T15:35:28Z</updated>
<author>
<name>Shuai Xue</name>
<email>xueshuai@linux.alibaba.com</email>
</author>
<published>2023-12-08T02:56:51Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=af9597adc2f1e3609c67c9792a2469bb64e43ae9'/>
<id>urn:sha1:af9597adc2f1e3609c67c9792a2469bb64e43ae9</id>
<content type='text'>
This commit adds the PCIe Performance Monitoring Unit (PMU) driver support
for T-Head Yitian SoC chip. Yitian is based on the Synopsys PCI Express
Core controller IP which provides statistics feature. The PMU is a PCIe
configuration space register block provided by each PCIe Root Port in a
Vendor-Specific Extended Capability named RAS D.E.S (Debug, Error
injection, and Statistics).

To facilitate collection of statistics the controller provides the
following two features for each Root Port:

- one 64-bit counter for Time Based Analysis (RX/TX data throughput and
  time spent in each low-power LTSSM state) and
- one 32-bit counter for Event Counting (error and non-error events for
  a specified lane)

Note: There is no interrupt for counter overflow.

This driver adds PMU devices for each PCIe Root Port. And the PMU device is
named based the BDF of Root Port. For example,

    30:03.0 PCI bridge: Device 1ded:8000 (rev 01)

the PMU device name for this Root Port is dwc_rootport_3018.

Example usage of counting PCIe RX TLP data payload (Units of bytes)::

    $# perf stat -a -e dwc_rootport_3018/Rx_PCIe_TLP_Data_Payload/

average RX bandwidth can be calculated like this:

    PCIe TX Bandwidth = Rx_PCIe_TLP_Data_Payload / Measure_Time_Window

Signed-off-by: Shuai Xue &lt;xueshuai@linux.alibaba.com&gt;
Reviewed-by: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Reviewed-by: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Reviewed-by: Yicong Yang &lt;yangyicong@hisilicon.com&gt;
Reviewed-and-tested-by: Ilkka Koskinen &lt;ilkka@os.amperecomputing.com&gt;
Link: https://lore.kernel.org/r/20231208025652.87192-5-xueshuai@linux.alibaba.com
[will: Fix sparse error due to use of uninitialised 'vsec' symbol in
 dwc_pcie_match_des_cap()]
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'cxl-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl</title>
<updated>2023-07-01T15:58:41Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-07-01T15:58:41Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d25f002575146d67b5ebea541e6db3696c957c25'/>
<id>urn:sha1:d25f002575146d67b5ebea541e6db3696c957c25</id>
<content type='text'>
Pull CXL updates from Dan Williams:
 "The highlights in terms of new functionality are support for the
  standard CXL Performance Monitor definition that appeared in CXL 3.0,
  support for device sanitization (wiping all data from a device),
  secure-erase (re-keying encryption of user data), and support for
  firmware update. The firmware update support is notable as it reuses
  the simple sysfs_upload interface to just cat(1) a blob to a sysfs
  file and pipe that to the device.

  Additionally there are a substantial number of cleanups and
  reorganizations to get ready for RCH error handling (RCH == Restricted
  CXL Host == current shipping hardware generation / pre CXL-2.0
  topologies) and type-2 (accelerator / vendor specific) devices.

  For vendor specific devices they implement a subset of what the
  generic type-3 (generic memory expander) driver expects. As a result
  the rework decouples optional infrastructure from the core driver
  context.

  For RCH topologies, where the specification working group did not want
  to confuse pre-CXL-aware operating systems, many of the standard
  registers are hidden which makes support standard bus features like
  AER (PCIe Advanced Error Reporting) difficult. The rework arranges for
  the driver to help the PCI-AER core. Bjorn is on board with this
  direction but a late regression disocvery means the completion of this
  functionality needs to cook a bit longer, so it is code
  reorganizations only for now.

  Summary:

   - Add infrastructure for supporting background commands along with
     support for device sanitization and firmware update

   - Introduce a CXL performance monitoring unit driver based on the
     common definition in the specification.

   - Land some preparatory cleanup and refactoring for the anticipated
     arrival of CXL type-2 (accelerator devices) and CXL RCH (CXL-v1.1
     topology) error handling.

   - Rework CPU cache management with respect to region configuration
     (device hotplug or other dynamic changes to memory interleaving)

   - Fix region reconfiguration vs CXL decoder ordering rules"

* tag 'cxl-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (51 commits)
  cxl: Fix one kernel-doc comment
  cxl/pci: Use correct flag for sanitize polling
  docs: perf: Minimal introduction the the CXL PMU device and driver
  perf: CXL Performance Monitoring Unit driver
  tools/testing/cxl: add firmware update emulation to CXL memdevs
  tools/testing/cxl: Use named effects for the Command Effect Log
  tools/testing/cxl: Fix command effects for inject/clear poison
  cxl: add a firmware update mechanism using the sysfs firmware loader
  cxl/test: Add Secure Erase opcode support
  cxl/mem: Support Secure Erase
  cxl/test: Add Sanitize opcode support
  cxl/mem: Wire up Sanitization support
  cxl/mbox: Add sanitization handling machinery
  cxl/mem: Introduce security state sysfs file
  cxl/mbox: Allow for IRQ_NONE case in the isr
  Revert "cxl/port: Enable the HDM decoder capability for switch ports"
  cxl/memdev: Formalize endpoint port linkage
  cxl/pci: Unconditionally unmask 256B Flit errors
  cxl/region: Manage decoder target_type at decoder-attach time
  cxl/hdm: Default CXL_DEVTYPE_DEVMEM decoders to CXL_DECODER_DEVMEM
  ...
</content>
</entry>
<entry>
<title>perf: CXL Performance Monitoring Unit driver</title>
<updated>2023-06-26T00:47:09Z</updated>
<author>
<name>Jonathan Cameron</name>
<email>Jonathan.Cameron@huawei.com</email>
</author>
<published>2023-05-26T09:58:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5d7107c72796df3be2ba574f1cf6eca75c60d5ef'/>
<id>urn:sha1:5d7107c72796df3be2ba574f1cf6eca75c60d5ef</id>
<content type='text'>
CXL rev 3.0 introduces a standard performance monitoring hardware
block to CXL. Instances are discovered using CXL Register Locator DVSEC
entries. Each CXL component may have multiple PMUs.

This initial driver supports a subset of types of counter.
It supports counters that are either fixed or configurable, but requires
that they support the ability to freeze and write value whilst frozen.

Development done with QEMU model which will be posted shortly.

Example:

$ perf stat -a -e cxl_pmu_mem0.0/h2d_req_snpcur/ -e cxl_pmu_mem0.0/h2d_req_snpdata/ -e cxl_pmu_mem0.0/clock_ticks/ sleep 1

Performance counter stats for 'system wide':

96,757,023,244,321      cxl_pmu_mem0.0/h2d_req_snpcur/
96,757,023,244,365      cxl_pmu_mem0.0/h2d_req_snpdata/
193,514,046,488,653      cxl_pmu_mem0.0/clock_ticks/

       1.090539600 seconds time elapsed

Reviewed-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
Reviewed-by: Kan Liang &lt;kan.liang@linux.intel.com&gt;
Signed-off-by: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Link: https://lore.kernel.org/r/20230526095824.16336-5-Jonathan.Cameron@huawei.com
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</content>
</entry>
<entry>
<title>drivers/perf: imx_ddr: Add support for NXP i.MX9 SoC DDRC PMU driver</title>
<updated>2023-06-09T11:01:10Z</updated>
<author>
<name>Xu Yang</name>
<email>xu.yang_2@nxp.com</email>
</author>
<published>2023-04-18T10:29:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=55691f99d417084305e575c477f06556f93dfb0b'/>
<id>urn:sha1:55691f99d417084305e575c477f06556f93dfb0b</id>
<content type='text'>
Add ddr performance monitor support for i.MX93.

There are 11 counters for ddr performance events.
- Counter 0 is a 64-bit counter that counts only clock cycles.
- Counter 1-10 are 32-bit counters that can monitor counter-specific
  events in addition to counting reference events.

For example:
  perf stat -a -e imx9_ddr0/ddrc_pm_1,counter=1/,imx9_ddr0/ddrc_pm_2,counter=2/ ls

Besides, this ddr pmu support AXI filter capability. It's implemented as
counter-specific events. It now supports read transaction, write transaction
and read beat events which corresponding respecitively to counter 2, 3 and 4.
axi_mask and axi_id need to be as event parameters.

For example:
  perf stat -a -I 1000 -e imx9_ddr0/eddrtq_pm_rd_trans_filt,counter=2,axi_mask=ID_MASK,axi_id=ID/
  perf stat -a -I 1000 -e imx9_ddr0/eddrtq_pm_wr_trans_filt,counter=3,axi_mask=ID_MASK,axi_id=ID/
  perf stat -a -I 1000 -e imx9_ddr0/eddrtq_pm_rd_beat_filt,counter=4,axi_mask=ID_MASK,axi_id=ID/

Signed-off-by: Xu Yang &lt;xu.yang_2@nxp.com&gt;
Acked-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Link: https://lore.kernel.org/r/20230418102910.2065651-1-xu.yang_2@nxp.com
[will: Remove redundant error message on platform_get_irq() failure]
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
</content>
</entry>
</feed>
