summaryrefslogtreecommitdiffstats
path: root/arch/x86/kernel/cpu/resctrl
AgeCommit message (Collapse)AuthorLines
2026-03-04x86/resctrl: Fix SNC detectionTony Luck-31/+5
Now that the x86 topology code has a sensible nodes-per-package measure, that does not depend on the online status of CPUs, use this to divinate the SNC mode. Note that when Cluster on Die (CoD) is configured on older systems this will also show multiple NUMA nodes per package. Intel Resource Director Technology is incomaptible with CoD. Print a warning and do not use the fixup MSR_RMID_SNC_CONFIG. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ingo Molnar <mingo@kernel.org> Tested-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Chen Yu <yu.c.chen@intel.com> Link: https://patch.msgid.link/aaCxbbgjL6OZ6VMd@agluck-desk3 Link: https://patch.msgid.link/20260303110100.367976706@infradead.org
2026-02-10Merge tag 'x86_cache_for_v7.0_rc1' of ↵Linus Torvalds-102/+628
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 resource control updates from Borislav Petkov: - Extend the resctrl machinery to support telemetry monitoring on Intel (Tony Luck) The practical usage of this is being able to tell how much energy or how much work can be attributed to a group of tasks tracked under a single idenitifier. Prepend this work with proper refactoring of resctrl domains handling code. * tag 'x86_cache_for_v7.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits) x86,fs/resctrl: Update documentation for telemetry events x86/resctrl: Enable RDT_RESOURCE_PERF_PKG fs/resctrl: Move RMID initialization to first mount x86,fs/resctrl: Compute number of RMIDs as minimum across resources fs/resctrl: Move allocation/free of closid_num_dirty_rmid[] x86/resctrl: Handle number of RMIDs supported by RDT_RESOURCE_PERF_PKG x86/resctrl: Add energy/perf choices to rdt boot option x86,fs/resctrl: Handle domain creation/deletion for RDT_RESOURCE_PERF_PKG fs/resctrl: Refactor rmdir_mondata_subdir_allrdtgrp() fs/resctrl: Refactor mkdir_mondata_subdir() x86/resctrl: Read telemetry events x86/resctrl: Find and enable usable telemetry events x86,fs/resctrl: Add architectural event pointer x86,fs/resctrl: Fill in details of events for performance and energy GUIDs x86/resctrl: Discover hardware telemetry events fs/resctrl: Emphasize that L3 monitoring resource is required for summing domains x86,fs/resctrl: Add and initialize a resource for package scope monitoring x86,fs/resctrl: Add an architectural hook called for first mount x86,fs/resctrl: Support binary fixed point event counters x86,fs/resctrl: Handle events that can be read from any CPU ...
2026-01-13x86/resctrl: Fix memory bandwidth counter width for HygonXiaochen Shen-2/+16
The memory bandwidth calculation relies on reading the hardware counter and measuring the delta between samples. To ensure accurate measurement, the software reads the counter frequently enough to prevent it from rolling over twice between reads. The default Memory Bandwidth Monitoring (MBM) counter width is 24 bits. Hygon CPUs provide a 32-bit width counter, but they do not support the MBM capability CPUID leaf (0xF.[ECX=1]:EAX) to report the width offset (from 24 bits). Consequently, the kernel falls back to the 24-bit default counter width, which causes incorrect overflow handling on Hygon CPUs. Fix this by explicitly setting the counter width offset to 8 bits (resulting in a 32-bit total counter width) for Hygon CPUs. Fixes: d8df126349da ("x86/cpu/hygon: Add missing resctrl_cpu_detect() in bsp_init helper") Signed-off-by: Xiaochen Shen <shenxiaochen@open-hieco.net> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20251209062650.1536952-3-shenxiaochen@open-hieco.net
2026-01-13x86/resctrl: Add missing resctrl initialization for HygonXiaochen Shen-2/+4
Hygon CPUs supporting Platform QoS features currently undergo partial resctrl initialization through resctrl_cpu_detect() in the Hygon BSP init helper and AMD/Hygon common initialization code. However, several critical data structures remain uninitialized for Hygon CPUs in the following paths: - get_mem_config()-> __rdt_get_mem_config_amd(): rdt_resource::membw,alloc_capable hw_res::num_closid - rdt_init_res_defs()->rdt_init_res_defs_amd(): rdt_resource::cache hw_res::msr_base,msr_update Add the missing AMD/Hygon common initialization to ensure proper Platform QoS functionality on Hygon CPUs. Fixes: d8df126349da ("x86/cpu/hygon: Add missing resctrl_cpu_detect() in bsp_init helper") Signed-off-by: Xiaochen Shen <shenxiaochen@open-hieco.net> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20251209062650.1536952-2-shenxiaochen@open-hieco.net
2026-01-10x86/resctrl: Enable RDT_RESOURCE_PERF_PKGTony Luck-0/+22
Since telemetry events are enumerated on resctrl mount the RDT_RESOURCE_PERF_PKG resource is not considered "monitoring capable" during early resctrl initialization. This means that the domain list for RDT_RESOURCE_PERF_PKG is not built when the CPU hotplug notifiers are registered and run for the first time right after resctrl initialization. Mark the RDT_RESOURCE_PERF_PKG as "monitoring capable" upon successful telemetry event enumeration to ensure future CPU hotplug events include this resource and initialize its domain list for CPUs that are already online. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
2026-01-10x86,fs/resctrl: Compute number of RMIDs as minimum across resourcesTony Luck-2/+13
resctrl assumes that only the L3 resource supports monitor events, so it simply takes the rdt_resource::num_rmid from RDT_RESOURCE_L3 as the system's number of RMIDs. The addition of telemetry events in a different resource breaks that assumption. Compute the number of available RMIDs as the minimum value across all mon_capable resources (analogous to how the number of CLOSIDs is computed across alloc_capable resources). Note that mount time enumeration of the telemetry resource means that this number can be reduced. If this happens, then some memory will be wasted as the allocations for rdt_l3_mon_domain::mbm_states[] and rdt_l3_mon_domain::rmid_busy_llc created during resctrl initialization will be larger than needed. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
2026-01-10x86/resctrl: Handle number of RMIDs supported by RDT_RESOURCE_PERF_PKGTony Luck-1/+52
There are now three meanings for "number of RMIDs": 1) The number for legacy features enumerated by CPUID leaf 0xF. This is the maximum number of distinct values that can be loaded into MSR_IA32_PQR_ASSOC. Note that systems with Sub-NUMA Cluster mode enabled will force scaling down the CPUID enumerated value by the number of SNC nodes per L3-cache. 2) The number of registers in MMIO space for each event. This is enumerated in the XML files and is the value initialized into event_group::num_rmid. 3) The number of "hardware counters" (this isn't a strictly accurate description of how things work, but serves as a useful analogy that does describe the limitations) feeding to those MMIO registers. This is enumerated in telemetry_region::num_rmids returned by intel_pmt_get_regions_by_feature(). Event groups with insufficient "hardware counters" to track all RMIDs are difficult for users to use, since the system may reassign "hardware counters" at any time. This means that users cannot reliably collect two consecutive event counts to compute the rate at which events are occurring. Disable such event groups by default. The user may override this with a command line "rdt=" option. In this case limit an under-resourced event group's number of possible monitor resource groups to the lowest number of "hardware counters". Scan all enabled event groups and assign the RDT_RESOURCE_PERF_PKG resource "num_rmid" value to the smallest of these values as this value will be used later to compare against the number of RMIDs supported by other resources to determine how many monitoring resource groups are supported. N.B. Change type of resctrl_mon::num_rmid to u32 to match its usage and the type of event_group::num_rmid so that min(r->num_rmid, e->num_rmid) won't complain about mixing signed and unsigned types. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
2026-01-09x86/resctrl: Add energy/perf choices to rdt boot optionTony Luck-0/+42
Legacy resctrl features are enumerated by X86_FEATURE_* flags. These may be overridden by quirks to disable features in the case of errata. Users can use kernel command line options to either disable a feature, or to force enable a feature that was disabled by a quirk. A different approach is needed for hardware features that do not have an X86_FEATURE_* flag. Update parsing of the "rdt=" boot parameter to call the telemetry driver directly to handle new "perf" and "energy" options that controls activation of telemetry monitoring of the named type. By itself a "perf" or "energy" option controls the forced enabling or disabling (with ! prefix) of all event groups of the named type. A ":guid" suffix allows for fine grained control per event group. [ bp: s/intel_aet_option/intel_handle_aet_option/g ] Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
2026-01-09x86,fs/resctrl: Handle domain creation/deletion for RDT_RESOURCE_PERF_PKGTony Luck-0/+59
The L3 resource has several requirements for domains. There are per-domain structures that hold the 64-bit values of counters, and elements to keep track of the overflow and limbo threads. None of these are needed for the PERF_PKG resource. The hardware counters are wide enough that they do not wrap around for decades. Define a new rdt_perf_pkg_mon_domain structure which just consists of the standard rdt_domain_hdr to keep track of domain id and CPU mask. Update resctrl_online_mon_domain() for RDT_RESOURCE_PERF_PKG. The only action needed for this resource is to create and populate domain directories if a domain is added while resctrl is mounted. Similarly resctrl_offline_mon_domain() only needs to remove domain directories. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
2026-01-09x86/resctrl: Read telemetry eventsTony Luck-0/+60
Introduce intel_aet_read_event() to read telemetry events for resource RDT_RESOURCE_PERF_PKG. There may be multiple aggregators tracking each package, so scan all of them and add up all counters. Aggregators may return an invalid data indication if they have received no records for a given RMID. The user will see "Unavailable" if none of the aggregators on a package provide valid counts. Resctrl now uses readq() so depends on X86_64. Update Kconfig. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
2026-01-09x86/resctrl: Find and enable usable telemetry eventsTony Luck-2/+61
Every event group has a private copy of the data of all telemetry event aggregators (aka "telemetry regions") tracking its feature type. Included may be regions that have the same feature type but tracking different GUID from the event group's. Traverse the event group's telemetry region data and mark all regions that are not usable by the event group as unusable by clearing those regions' MMIO addresses. A region is considered unusable if: 1) GUID does not match the GUID of the event group. 2) Package ID is invalid. 3) The enumerated size of the MMIO region does not match the expected value from the XML description file. Hereafter any telemetry region with an MMIO address is considered valid for the event group it is associated with. Enable all the event group's events as long as there is at least one usable region from where data for its events can be read. Enabling of an event can fail if the same event has already been enabled as part of another event group. It should never happen that the same event is described by different GUID supported by the same system so just WARN (via resctrl_enable_mon_event()) and skip the event. Note that it is architecturally possible that some telemetry events are only supported by a subset of the packages in the system. It is not expected that systems will ever do this. If they do the user will see event files in resctrl that always return "Unavailable". Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
2026-01-09x86,fs/resctrl: Add architectural event pointerTony Luck-4/+4
The resctrl file system layer passes the domain, RMID, and event id to the architecture to fetch an event counter. Fetching a telemetry event counter requires additional information that is private to the architecture, for example, the offset into MMIO space from where the counter should be read. Add mon_evt::arch_priv that architecture can use for any private data related to the event. The resctrl filesystem initializes mon_evt::arch_priv when the architecture enables the event and passes it back to architecture when needing to fetch an event counter. Suggested-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
2026-01-09x86,fs/resctrl: Fill in details of events for performance and energy GUIDsTony Luck-0/+66
The telemetry event aggregators of the Intel Clearwater Forest CPU support two RMID-based feature types: "energy" with GUID 0x26696143¹, and "perf" with GUID 0x26557651². The event counter offsets in an aggregator's MMIO space are arranged in groups for each RMID. E.g., the "energy" counters for GUID 0x26696143 are arranged like this: MMIO offset:0x0000 Counter for RMID 0 PMT_EVENT_ENERGY MMIO offset:0x0008 Counter for RMID 0 PMT_EVENT_ACTIVITY MMIO offset:0x0010 Counter for RMID 1 PMT_EVENT_ENERGY MMIO offset:0x0018 Counter for RMID 1 PMT_EVENT_ACTIVITY ... MMIO offset:0x23F0 Counter for RMID 575 PMT_EVENT_ENERGY MMIO offset:0x23F8 Counter for RMID 575 PMT_EVENT_ACTIVITY After all counters there are three status registers that provide indications of how many times an aggregator was unable to process event counts, the time stamp for the most recent loss of data, and the time stamp of the most recent successful update. MMIO offset:0x2400 AGG_DATA_LOSS_COUNT MMIO offset:0x2408 AGG_DATA_LOSS_TIMESTAMP MMIO offset:0x2410 LAST_UPDATE_TIMESTAMP Define event_group structures for both of these aggregator types and define the events tracked by the aggregators in the file system code. PMT_EVENT_ENERGY and PMT_EVENT_ACTIVITY are produced in fixed point format. File system code must output as floating point values. ¹https://github.com/intel/Intel-PMT/blob/main/xml/CWF/OOBMSM/RMID-ENERGY/cwf_aggregator.xml ²https://github.com/intel/Intel-PMT/blob/main/xml/CWF/OOBMSM/RMID-PERF/cwf_aggregator.xml [ bp: Massage commit message. ] Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
2026-01-09x86/resctrl: Discover hardware telemetry eventsTony Luck-0/+122
Each CPU collects data for telemetry events that it sends to the nearest telemetry event aggregator either when the value of MSR_IA32_PQR_ASSOC.RMID changes, or when a two millisecond timer expires. There is a feature type ("energy" or "perf"), GUID, and MMIO region associated with each aggregator. This combination links to an XML description of the set of telemetry events tracked by the aggregator. XML files are published by Intel in a GitHub repository¹. The telemetry event aggregators maintain per-RMID per-event counts of the total seen for all the CPUs. There may be multiple telemetry event aggregators per package. There are separate sets of aggregators for each feature type. Aggregators in a set may have different GUIDs. All aggregators with the same feature type and GUID are symmetric keeping counts for the same set of events for the CPUs that provide data to them. The XML file for each aggregator provides the following information: 0) Feature type of the events ("perf" or "energy") 1) Which telemetry events are tracked by the aggregator. 2) The order in which the event counters appear for each RMID. 3) The value type of each event counter (integer or fixed-point). 4) The number of RMIDs supported. 5) Which additional aggregator status registers are included. 6) The total size of the MMIO region for an aggregator. Introduce struct event_group that condenses the relevant information from an XML file. Hereafter an "event group" refers to a group of events of a particular feature type (event_group::pfname set to "energy" or "perf") with a particular GUID. Use event_group::pfname to determine the feature id needed to obtain the aggregator details. It will later be used in console messages and with the rdt= boot parameter. The INTEL_PMT_TELEMETRY driver enumerates support for telemetry events. This driver provides intel_pmt_get_regions_by_feature() to list all available telemetry event aggregators of a given feature type. The list includes the "guid", the base address in MMIO space for the region where the event counters are exposed, and the package id where the all the CPUs that report to this aggregator are located. Call INTEL_PMT_TELEMETRY's intel_pmt_get_regions_by_feature() for each event group to obtain a private copy of that event group's aggregator data. Duplicate the aggregator data between event groups that have the same feature type but different GUID. Further processing on this private copy will be unique to the event group. ¹https://github.com/intel/Intel-PMT [ bp: Zap text explaining the code, s/guid/GUID/g ] Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
2026-01-09x86,fs/resctrl: Add and initialize a resource for package scope monitoringTony Luck-0/+10
Add a new PERF_PKG resource and introduce package level scope for monitoring telemetry events so that CPU hotplug notifiers can build domains at the package granularity. Use the physical package ID available via topology_physical_package_id() to identify the monitoring domains with package level scope. This enables user space to use: /sys/devices/system/cpu/cpuX/topology/physical_package_id to identify the monitoring domain a CPU is associated with. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
2026-01-09x86,fs/resctrl: Add an architectural hook called for first mountTony Luck-0/+4
Enumeration of Intel telemetry events is an asynchronous process involving several mutually dependent drivers added as auxiliary devices during the device_initcall() phase of Linux boot. The process finishes after the probe functions of these drivers completes. But this happens after resctrl_arch_late_init() is executed. Tracing the enumeration process shows that it does complete a full seven seconds before the earliest possible mount of the resctrl file system (when included in /etc/fstab for automatic mount by systemd). Add a hook for use by telemetry event enumeration and initialization and run it once at the beginning of resctrl mount without any locks held. The architecture is responsible for any required locking. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20260105191711.GBaVwON5nZn-uO6Sqg@fat_crate.local
2026-01-05x86,fs/resctrl: Support binary fixed point event countersTony Luck-3/+3
resctrl assumes that all monitor events can be displayed as unsigned decimal integers. Hardware architecture counters may provide some telemetry events with greater precision where the event is not a simple count, but is a measurement of some sort (e.g. Joules for energy consumed). Add a new argument to resctrl_enable_mon_event() for architecture code to inform the file system that the value for a counter is a fixed-point value with a specific number of binary places. Only allow architecture to use floating point format on events that the file system has marked with mon_evt::is_floating_point which reflects the contract with user space on how the event values are displayed. Display fixed point values with values rounded to ceil(binary_bits * log10(2)) decimal places. Special case for zero binary bits to print "{value}.0". Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
2026-01-05x86,fs/resctrl: Handle events that can be read from any CPUTony Luck-3/+3
resctrl assumes that monitor events can only be read from a CPU in the cpumask_t set of each domain. This is true for x86 events accessed with an MSR interface, but may not be true for other access methods such as MMIO. Introduce and use flag mon_evt::any_cpu, settable by architecture, that indicates there are no restrictions on which CPU can read that event. This flag is not supported by the L3 event reading that requires to be run on a CPU that belongs to the L3 domain of the event being read. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
2026-01-05x86,fs/resctrl: Rename some L3 specific functionsTony Luck-11/+13
With the arrival of monitor events tied to new domains associated with a different resource it would be clearer if the L3 resource specific functions are more accurately named. Rename three groups of functions: Functions that allocate/free architecture per-RMID MBM state information: arch_domain_mbm_alloc() -> l3_mon_domain_mbm_alloc() mon_domain_free() -> l3_mon_domain_free() Functions that allocate/free filesystem per-RMID MBM state information: domain_setup_mon_state() -> domain_setup_l3_mon_state() domain_destroy_mon_state() -> domain_destroy_l3_mon_state() Initialization/exit: rdt_get_mon_l3_config() -> rdt_get_l3_mon_config() resctrl_mon_resource_init() -> resctrl_l3_mon_resource_init() resctrl_mon_resource_exit() -> resctrl_l3_mon_resource_exit() Ensure kernel-doc descriptions of these functions' return values are present and correctly formatted. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
2026-01-05x86,fs/resctrl: Rename struct rdt_mon_domain and rdt_hw_mon_domainTony Luck-33/+33
The upcoming telemetry event monitoring is not tied to the L3 resource and will have a new domain structure. Rename the L3 resource specific domain data structures to include "l3_" in their names to avoid confusion between the different resource specific domain structures: rdt_mon_domain -> rdt_l3_mon_domain rdt_hw_mon_domain -> rdt_hw_l3_mon_domain No functional change. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
2026-01-05x86,fs/resctrl: Use struct rdt_domain_hdr when reading countersTony Luck-3/+9
Convert the whole call sequence from mon_event_read() to resctrl_arch_rmid_read() to pass resource independent struct rdt_domain_hdr instead of an L3 specific domain structure to prepare for monitoring events in other resources. This additional layer of indirection obscures which aspects of event counting depend on a valid domain. Event initialization, support for assignable counters, and normal event counting implicitly depend on a valid domain while summing of domains does not. Split summing domains from the core event counting handling to make their respective dependencies obvious. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
2026-01-04x86,fs/resctrl: Refactor domain create/remove using struct rdt_domain_hdrTony Luck-2/+2
Up until now, all monitoring events were associated with the L3 resource and it made sense to use the L3 specific "struct rdt_mon_domain *" argument to functions operating on domains. Telemetry events will be tied to a new resource with its instances represented by a new domain structure that, just like struct rdt_mon_domain, starts with the generic struct rdt_domain_hdr. Prepare to support domains belonging to different resources by changing the calling convention of functions operating on domains. Pass the generic header and use that to find the domain specific structure where needed. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
2026-01-04x86/resctrl: Clean up domain_remove_cpu_ctrl()Tony Luck-15/+14
For symmetry with domain_remove_cpu_mon() refactor domain_remove_cpu_ctrl() to take an early return when removing a CPU does not empty the domain. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
2026-01-04x86/resctrl: Refactor domain_remove_cpu_mon() ready for new domain typesTony Luck-10/+17
New telemetry events will be associated with a new package scoped resource with a new domain structure. Refactor domain_remove_cpu_mon() so all the L3 domain processing is separate from the general domain action of clearing the CPU bit in the mask. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
2026-01-04x86/resctrl: Move L3 initialization into new helper functionTony Luck-30/+34
Carve out the resource monitoring domain init code into a separate helper in order to be able to initialize new types of monitoring domains besides the usual L3 ones. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
2026-01-04x86,fs/resctrl: Improve domain type checkingTony Luck-4/+6
Every resctrl resource has a list of domain structures. struct rdt_ctrl_domain and struct rdt_mon_domain both begin with struct rdt_domain_hdr with rdt_domain_hdr::type used in validity checks before accessing the domain of a particular type. Add the resource id to struct rdt_domain_hdr in preparation for a new monitoring domain structure that will be associated with a new monitoring resource. Improve existing domain validity checks with a new helper domain_header_is_valid() that checks both domain type and resource id. domain_header_is_valid() should be used before every call to container_of() that accesses a domain structure. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
2025-12-02Merge tag 'x86_cache_for_v6.19_rc1' of ↵Linus Torvalds-0/+55
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 resource control updates from Borislav Petkov: - Add support for AMD's Smart Data Cache Injection feature which allows for direct insertion of data from I/O devices into the L3 cache, thus bypassing DRAM and saving its bandwidth; the resctrl side of the feature allows the size of the L3 used for data injection to be controlled - Add Intel Clearwater Forest to the list of CPUs which support Sub-NUMA clustering - Other fixes and cleanups * tag 'x86_cache_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: fs/resctrl: Update bit_usage to reflect io_alloc fs/resctrl: Introduce interface to modify io_alloc capacity bitmasks fs/resctrl: Modify struct rdt_parse_data to pass mode and CLOSID fs/resctrl: Introduce interface to display io_alloc CBMs fs/resctrl: Add user interface to enable/disable io_alloc feature fs/resctrl: Introduce interface to display "io_alloc" support x86,fs/resctrl: Implement "io_alloc" enable/disable handlers x86,fs/resctrl: Detect io_alloc feature x86/resctrl: Add SDCIAE feature in the command line options x86/cpufeatures: Add support for L3 Smart Data Cache Injection Allocation Enforcement fs/resctrl: Consider sparse masks when initializing new group's allocation x86/resctrl: Support Sub-NUMA Cluster (SNC) mode on Clearwater Forest
2025-11-21x86,fs/resctrl: Implement "io_alloc" enable/disable handlersBabu Moger-0/+45
"io_alloc" is the generic name of the new resctrl feature that enables system software to configure the portion of cache allocated for I/O traffic. On AMD systems, "io_alloc" resctrl feature is backed by AMD's L3 Smart Data Cache Injection Allocation Enforcement (SDCIAE). Introduce the architecture-specific functions that resctrl fs should call to enable, disable, or check status of the "io_alloc" feature. Change SDCIAE state by setting (to enable) or clearing (to disable) bit 1 of MSR_IA32_L3_QOS_EXT_CFG on all logical processors within the cache domain. Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://patch.msgid.link/9e9070100c320eab5368e088a3642443dee95ed7.1762995456.git.babu.moger@amd.com
2025-11-21x86,fs/resctrl: Detect io_alloc featureBabu Moger-0/+7
AMD's SDCIAE (SDCI Allocation Enforcement) PQE feature enables system software to control the portions of L3 cache used for direct insertion of data from I/O devices into the L3 cache. Introduce a generic resctrl cache resource property "io_alloc_capable" as the first part of the new "io_alloc" resctrl feature that will support AMD's SDCIAE. Any architecture can set a cache resource as "io_alloc_capable" if a portion of the cache can be allocated for I/O traffic. Set the "io_alloc_capable" property for the L3 cache resource on x86 (AMD) systems that support SDCIAE. Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://patch.msgid.link/df85a9a6081674fd3ef6b4170920485512ce2ded.1762995456.git.babu.moger@amd.com
2025-11-21x86/resctrl: Add SDCIAE feature in the command line optionsBabu Moger-0/+2
Add a kernel command-line parameter to enable or disable the exposure of the L3 Smart Data Cache Injection Allocation Enforcement (SDCIAE) hardware feature to resctrl. Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://patch.msgid.link/c623edf7cb369ba9da966de47d9f1b666778a40e.1762995456.git.babu.moger@amd.com
2025-10-20x86,fs/resctrl: Fix NULL pointer dereference with events force-disabled in ↵Babu Moger-1/+10
mbm_event mode The following NULL pointer dereference is encountered on mount of resctrl fs after booting a system that supports assignable counters with the "rdt=!mbmtotal,!mbmlocal" kernel parameters: BUG: kernel NULL pointer dereference, address: 0000000000000008 RIP: 0010:mbm_cntr_get Call Trace: rdtgroup_assign_cntr_event rdtgroup_assign_cntrs rdt_get_tree Specifying the kernel parameter "rdt=!mbmtotal,!mbmlocal" effectively disables the legacy X86_FEATURE_CQM_MBM_TOTAL and X86_FEATURE_CQM_MBM_LOCAL features and the MBM events they represent. This results in the per-domain MBM event related data structures to not be allocated during early initialization. resctrl fs initialization follows by implicitly enabling both MBM total and local events on a system that supports assignable counters (mbm_event mode), but this enabling occurs after the per-domain data structures have been created. After booting, resctrl fs assumes that an enabled event can access all its state. This results in NULL pointer dereference when resctrl attempts to access the un-allocated structures of an enabled event. Remove the late MBM event enabling from resctrl fs. This leaves a problem where the X86_FEATURE_CQM_MBM_TOTAL and X86_FEATURE_CQM_MBM_LOCAL features may be disabled while assignable counter (mbm_event) mode is enabled without any events to support. Switching between the "default" and "mbm_event" mode without any events is not practical. Create a dependency between the X86_FEATURE_{CQM_MBM_TOTAL,CQM_MBM_LOCAL} and X86_FEATURE_ABMC (assignable counter) hardware features. An x86 system that supports assignable counters now requires support of X86_FEATURE_CQM_MBM_TOTAL or X86_FEATURE_CQM_MBM_LOCAL. This ensures all needed MBM related data structures are created before use and that it is only possible to switch between "default" and "mbm_event" mode when the same events are available in both modes. This dependency does not exist in the hardware but this usage of these feature settings work for known systems. [ bp: Massage commit message. ] Fixes: 13390861b426e ("x86,fs/resctrl: Detect Assignable Bandwidth Monitoring feature details") Co-developed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://patch.msgid.link/a62e6ac063d0693475615edd213d5be5e55443e6.1760560934.git.babu.moger@amd.com
2025-10-13x86/resctrl: Fix miscount of bandwidth event when reactivating previously ↵Babu Moger-4/+10
unavailable RMID Users can create as many monitoring groups as the number of RMIDs supported by the hardware. However, on AMD systems, only a limited number of RMIDs are guaranteed to be actively tracked by the hardware. RMIDs that exceed this limit are placed in an "Unavailable" state. When a bandwidth counter is read for such an RMID, the hardware sets MSR_IA32_QM_CTR.Unavailable (bit 62). When such an RMID starts being tracked again the hardware counter is reset to zero. MSR_IA32_QM_CTR.Unavailable remains set on first read after tracking re-starts and is clear on all subsequent reads as long as the RMID is tracked. resctrl miscounts the bandwidth events after an RMID transitions from the "Unavailable" state back to being tracked. This happens because when the hardware starts counting again after resetting the counter to zero, resctrl in turn compares the new count against the counter value stored from the previous time the RMID was tracked. This results in resctrl computing an event value that is either undercounting (when new counter is more than stored counter) or a mistaken overflow (when new counter is less than stored counter). Reset the stored value (arch_mbm_state::prev_msr) of MSR_IA32_QM_CTR to zero whenever the RMID is in the "Unavailable" state to ensure accurate counting after the RMID resets to zero when it starts to be tracked again. Example scenario that results in mistaken overflow ================================================== 1. The resctrl filesystem is mounted, and a task is assigned to a monitoring group. $mount -t resctrl resctrl /sys/fs/resctrl $mkdir /sys/fs/resctrl/mon_groups/test1/ $echo 1234 > /sys/fs/resctrl/mon_groups/test1/tasks $cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes 21323 <- Total bytes on domain 0 "Unavailable" <- Total bytes on domain 1 Task is running on domain 0. Counter on domain 1 is "Unavailable". 2. The task runs on domain 0 for a while and then moves to domain 1. The counter starts incrementing on domain 1. $cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes 7345357 <- Total bytes on domain 0 4545 <- Total bytes on domain 1 3. At some point, the RMID in domain 0 transitions to the "Unavailable" state because the task is no longer executing in that domain. $cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes "Unavailable" <- Total bytes on domain 0 434341 <- Total bytes on domain 1 4. Since the task continues to migrate between domains, it may eventually return to domain 0. $cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes 17592178699059 <- Overflow on domain 0 3232332 <- Total bytes on domain 1 In this case, the RMID on domain 0 transitions from "Unavailable" state to active state. The hardware sets MSR_IA32_QM_CTR.Unavailable (bit 62) when the counter is read and begins tracking the RMID counting from 0. Subsequent reads succeed but return a value smaller than the previously saved MSR value (7345357). Consequently, the resctrl's overflow logic is triggered, it compares the previous value (7345357) with the new, smaller value and incorrectly interprets this as a counter overflow, adding a large delta. In reality, this is a false positive: the counter did not overflow but was simply reset when the RMID transitioned from "Unavailable" back to active state. Here is the text from APM [1] available from [2]. "In PQOS Version 2.0 or higher, the MBM hardware will set the U bit on the first QM_CTR read when it begins tracking an RMID that it was not previously tracking. The U bit will be zero for all subsequent reads from that RMID while it is still tracked by the hardware. Therefore, a QM_CTR read with the U bit set when that RMID is in use by a processor can be considered 0 when calculating the difference with a subsequent read." [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming Publication # 24593 Revision 3.41 section 19.3.3 Monitoring L3 Memory Bandwidth (MBM). [ bp: Split commit message into smaller paragraph chunks for better consumption. ] Fixes: 4d05bf71f157d ("x86/resctrl: Introduce AMD QOS feature") Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Reinette Chatre <reinette.chatre@intel.com> Cc: stable@vger.kernel.org # needs adjustments for <= v6.17 Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 # [2]
2025-10-13x86/resctrl: Support Sub-NUMA Cluster (SNC) mode on Clearwater ForestChen Yu-0/+1
Clearwater Forest supports SNC mode. Add it to the snc_cpu_ids[] table. Signed-off-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Acked-by: Tony Luck <tony.luck@intel.com>
2025-09-15x86/resctrl: Configure mbm_event mode if supportedBabu Moger-0/+16
Configure mbm_event mode on AMD platforms. On AMD platforms, it is recommended to use the mbm_event mode, if supported, to prevent the hardware from resetting counters between reads. This can result in misleading values or display "Unavailable" if no counter is assigned to the event. Enable mbm_event mode, known as ABMC (Assignable Bandwidth Monitoring Counters) on AMD, by default if the system supports it. Update ABMC across all logical processors within the resctrl domain to ensure proper functionality. Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
2025-09-15x86/resctrl: Implement resctrl_arch_reset_cntr() and resctrl_arch_cntr_read()Babu Moger-0/+75
System software reads resctrl event data for a particular resource by writing the RMID and Event Identifier (EvtID) to the QM_EVTSEL register and then reading the event data from the QM_CTR register. In ABMC mode, the event data of a specific counter ID is read by setting the following fields: QM_EVTSEL.ExtendedEvtID = 1, QM_EVTSEL.EvtID = L3CacheABMC (=1) and setting QM_EVTSEL.RMID to the desired counter ID. Reading the QM_CTR then returns the contents of the specified counter ID. RMID_VAL_ERROR bit is set if the counter configuration is invalid, or if an invalid counter ID is set in the QM_EVTSEL.RMID field. RMID_VAL_UNAVAIL bit is set if the counter data is unavailable. Introduce resctrl_arch_reset_cntr() and resctrl_arch_cntr_read() to reset and read event data for a specific counter. Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
2025-09-15x86/resctrl: Refactor resctrl_arch_rmid_read()Babu Moger-15/+23
resctrl_arch_rmid_read() adjusts the value obtained from MSR_IA32_QM_CTR to account for the overflow for MBM events and apply counter scaling for all the events. This logic is common to both reading an RMID and reading a hardware counter directly. Refactor the hardware value adjustment logic into get_corrected_val() to prepare for support of reading a hardware counter. Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
2025-09-15x86,fs/resctrl: Implement resctrl_arch_config_cntr() to assign a counter ↵Babu Moger-0/+36
with ABMC The ABMC feature allows users to assign a hardware counter to an RMID, event pair and monitor bandwidth usage as long as it is assigned. The hardware continues to track the assigned counter until it is explicitly unassigned by the user. Implement an x86 architecture-specific handler to configure a counter. This architecture specific handler is called by resctrl fs when a counter is assigned or unassigned as well as when an already assigned counter's configuration should be updated. Configure counters by writing to the L3_QOS_ABMC_CFG MSR, specifying the counter ID, bandwidth source (RMID), and event configuration. The ABMC feature details are documented in APM [1] available from [2]. [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth Monitoring (ABMC). Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 # [2]
2025-09-15x86/resctrl: Add data structures and definitions for ABMC assignmentBabu Moger-0/+36
The ABMC feature allows users to assign a hardware counter to an RMID, event pair and monitor bandwidth usage as long as it is assigned. The hardware continues to track the assigned counter until it is explicitly unassigned by the user. The ABMC feature implements an MSR L3_QOS_ABMC_CFG (C000_03FDh). ABMC counter assignment is done by setting the counter id, bandwidth source (RMID) and bandwidth configuration. Attempts to read or write the MSR when ABMC is not enabled will result in a #GP(0) exception. Introduce the data structures and definitions for MSR L3_QOS_ABMC_CFG (0xC000_03FDh): ========================================================================= Bits Mnemonic Description Access Reset Type Value ========================================================================= 63 CfgEn Configuration Enable R/W 0 62 CtrEn Enable/disable counting R/W 0 61:53 – Reserved MBZ 0 52:48 CtrID Counter Identifier R/W 0 47 IsCOS BwSrc field is a CLOSID R/W 0 (not an RMID) 46:44 – Reserved MBZ 0 43:32 BwSrc Bandwidth Source R/W 0 (RMID or CLOSID) 31:0 BwType Bandwidth configuration R/W 0 tracked by the CtrID ========================================================================== The ABMC feature details are documented in APM [1] available from [2]. [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth Monitoring (ABMC). [ bp: Touchups. ] Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 # [2]
2025-09-15x86/resctrl: Add support to enable/disable AMD ABMC featureBabu Moger-0/+50
Add the functionality to enable/disable the AMD ABMC feature. The AMD ABMC feature is enabled by setting enabled bit(0) in the L3_QOS_EXT_CFG MSR. When the state of ABMC is changed, the MSR needs to be updated on all the logical processors in the QOS Domain. Hardware counters will reset when ABMC state is changed. [ bp: Massage commit message. ] Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 # [2]
2025-09-15x86,fs/resctrl: Detect Assignable Bandwidth Monitoring feature detailsBabu Moger-5/+13
ABMC feature details are reported via CPUID Fn8000_0020_EBX_x5. Bits Description 15:0 MAX_ABMC Maximum Supported Assignable Bandwidth Monitoring Counter ID + 1 The ABMC feature details are documented in APM [1] available from [2]. [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth Monitoring (ABMC). Detect the feature and number of assignable counters supported. For backward compatibility, upon detecting the assignable counter feature, enable the mbm_total_bytes and mbm_local_bytes events that users are familiar with as part of original L3 MBM support. Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 # [2]
2025-09-15x86,fs/resctrl: Consolidate monitoring related data from rdt_resourceBabu Moger-7/+7
The cache allocation and memory bandwidth allocation feature properties are consolidated into struct resctrl_cache and struct resctrl_membw respectively. In preparation for more monitoring properties that will clobber the existing resource struct more, re-organize the monitoring specific properties to also be in a separate structure. Also convert "bandwidth sources" terminology to "memory transactions" to have consistency within resctrl for related monitoring features. [ bp: Massage commit message. ] Suggested-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
2025-09-15x86/resctrl: Add ABMC feature in the command line optionsBabu Moger-0/+2
Add a kernel command-line parameter to enable or disable the exposure of the ABMC (Assignable Bandwidth Monitoring Counters) hardware feature to resctrl. Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
2025-09-15x86,fs/resctrl: Prepare for more monitor eventsTony Luck-41/+43
There's a rule in computer programming that objects appear zero, once, or many times. So code accordingly. There are two MBM events and resctrl is coded with a lot of if (local) do one thing if (total) do a different thing Change the rdt_mon_domain and rdt_hw_mon_domain structures to hold arrays of pointers to per event data instead of explicit fields for total and local bandwidth. Simplify by coding for many events using loops on which are enabled. Move resctrl_is_mbm_event() to <linux/resctrl.h> so it can be used more widely. Also provide a for_each_mbm_event_id() helper macro. Cleanup variable names in functions touched to consistently use "eventid" for those with type enum resctrl_event_id. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
2025-09-15x86/resctrl: Remove the rdt_mon_features global variableTony Luck-9/+5
rdt_mon_features is used as a bitmask of enabled monitor events. A monitor event's status is now maintained in mon_evt::enabled with all monitor events' mon_evt structures found in the filesystem's mon_event_all[] array. Remove the remaining uses of rdt_mon_features. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
2025-09-15x86,fs/resctrl: Replace architecture event enabled checksTony Luck-4/+4
The resctrl file system now has complete knowledge of the status of every event. So there is no need for per-event function calls to check. Replace each of the resctrl_arch_is_{event}enabled() calls with resctrl_is_mon_event_enabled(QOS_{EVENT}). No functional change. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
2025-09-15x86,fs/resctrl: Consolidate monitor event descriptionsTony Luck-3/+9
There are currently only three monitor events, all associated with the RDT_RESOURCE_L3 resource. Growing support for additional events will be easier with some restructuring to have a single point in file system code where all attributes of all events are defined. Place all event descriptions into an array mon_event_all[]. Doing this has the beneficial side effect of removing the need for rdt_resource::evt_list. Add resctrl_event_id::QOS_FIRST_EVENT for a lower bound on range checks for event ids and as the starting index to scan mon_event_all[]. Drop the code that builds evt_list and change the two places where the list is scanned to scan mon_event_all[] instead using a new helper macro for_each_mon_event(). Architecture code now informs file system code which events are available with resctrl_enable_mon_event(). Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
2025-06-16x86,fs/resctrl: Remove inappropriate references to cacheinfo in the resctrl ↵Qinyun Tan-2/+4
subsystem In the resctrl subsystem's Sub-NUMA Cluster (SNC) mode, the rdt_mon_domain structure representing a NUMA node relies on the cacheinfo interface (rdt_mon_domain::ci) to store L3 cache information (e.g., shared_cpu_map) for monitoring. The L3 cache information of a SNC NUMA node determines which domains are summed for the "top level" L3-scoped events. rdt_mon_domain::ci is initialized using the first online CPU of a NUMA node. When this CPU goes offline, its shared_cpu_map is cleared to contain only the offline CPU itself. Subsequently, attempting to read counters via smp_call_on_cpu(offline_cpu) fails (and error ignored), returning zero values for "top-level events" without any error indication. Replace the cacheinfo references in struct rdt_mon_domain and struct rmid_read with the cacheinfo ID (a unique identifier for the L3 cache). rdt_domain_hdr::cpu_mask contains the online CPUs associated with that domain. When reading "top-level events", select a CPU from rdt_domain_hdr::cpu_mask and utilize its L3 shared_cpu_map to determine valid CPUs for reading RMID counter via the MSR interface. Considering all CPUs associated with the L3 cache improves the chances of picking a housekeeping CPU on which the counter reading work can be queued, avoiding an unnecessary IPI. Fixes: 328ea68874642 ("x86/resctrl: Prepare for new Sub-NUMA Cluster (SNC) monitor files") Signed-off-by: Qinyun Tan <qinyuntan@linux.alibaba.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250530182053.37502-2-qinyuntan@linux.alibaba.com
2025-05-27Merge tag 'x86_cache_for_v6.16_rc1' of ↵Linus Torvalds-7223/+42
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 resource control updates from Borislav Petkov: "Carve out the resctrl filesystem-related code into fs/resctrl/ so that multiple architectures can share the fs API for manipulating their respective hw resource control implementation. This is the second step in the work towards sharing the resctrl filesystem interface, the next one being plugging ARM's MPAM into the aforementioned fs API" * tag 'x86_cache_for_v6.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits) MAINTAINERS: Add reviewers for fs/resctrl x86,fs/resctrl: Move the resctrl filesystem code to live in /fs/resctrl x86/resctrl: Always initialise rid field in rdt_resources_all[] x86/resctrl: Relax some asm #includes x86/resctrl: Prefer alloc(sizeof(*foo)) idiom in rdt_init_fs_context() x86/resctrl: Squelch whitespace anomalies in resctrl core code x86/resctrl: Move pseudo lock prototypes to include/linux/resctrl.h x86/resctrl: Fix types in resctrl_arch_mon_ctx_{alloc,free}() stubs x86/resctrl: Move enum resctrl_event_id to resctrl.h x86/resctrl: Move the filesystem bits to headers visible to fs/resctrl fs/resctrl: Add boiler plate for external resctrl code x86/resctrl: Add 'resctrl' to the title of the resctrl documentation x86/resctrl: Split trace.h x86/resctrl: Expand the width of domid by replacing mon_data_bits x86/resctrl: Add end-marker to the resctrl_event_id enum x86/resctrl: Move is_mba_sc() out of core.c x86/resctrl: Drop __init/__exit on assorted symbols x86/resctrl: Resctrl_exit() teardown resctrl but leave the mount point x86/resctrl: Check all domains are offline in resctrl_exit() x86/resctrl: Rename resctrl_sched_in() to begin with "resctrl_arch_" ...
2025-05-16x86,fs/resctrl: Move the resctrl filesystem code to live in /fs/resctrlJames Morse-7348/+15
Resctrl is a filesystem interface to hardware that provides cache allocation policy and bandwidth control for groups of tasks or CPUs. To support more than one architecture, resctrl needs to live in /fs/. Move the code that is concerned with the filesystem interface to /fs/resctrl. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250515165855.31452-25-james.morse@arm.com
2025-05-16x86/resctrl: Always initialise rid field in rdt_resources_all[]James Morse-5/+5
x86 has an array, rdt_resources_all[], of all possible resources. The for-each-resource walkers depend on the rid field of all resources being initialised. If the array ever grows due to another architecture adding a resource type that is not defined on x86, the for-each-resources walkers will loop forever. Initialise all the rid values in resctrl_arch_late_init() before any for-each-resource walker can be called. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250515165855.31452-24-james.morse@arm.com