<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/drivers/edac, branch v6.16</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.16</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.16'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2025-06-30T08:57:24Z</updated>
<entry>
<title>EDAC: Initialize EDAC features sysfs attributes</title>
<updated>2025-06-30T08:57:24Z</updated>
<author>
<name>Shiju Jose</name>
<email>shiju.jose@huawei.com</email>
</author>
<published>2025-06-26T10:13:44Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1e14ea901dc8d976d355ddc3e0de84ee86ef0596'/>
<id>urn:sha1:1e14ea901dc8d976d355ddc3e0de84ee86ef0596</id>
<content type='text'>
Fix the lockdep splat caused by missing sysfs_attr_init() calls for the
recently added EDAC features' sysfs attributes.

In lockdep_init_map_type(), the check for the lock-class key if
(!static_obj(key) &amp;&amp; !is_dynamic_key(key)) causes the splat.

  Backtrace:
  RIP: 0010:lockdep_init_map_type
  Call Trace:
   __kernfs_create_file
  sysfs_add_file_mode_ns
  internal_create_group
  internal_create_groups
  device_add
  ? __init_waitqueue_head
  edac_dev_register
  devm_cxl_memdev_edac_register
  ? lock_acquire
  ? find_held_lock
  ? cxl_mem_probe
  ? cxl_mem_probe
  ? lockdep_hardirqs_on
  ? cxl_mem_probe
  cxl_mem_probe

  [ bp: Massage. ]

Fixes: f90b738166fe ("EDAC: Add scrub control feature")
Fixes: bcbd069b11b0 ("EDAC: Add a Error Check Scrub control feature")
Fixes: 699ea5219c4b ("EDAC: Add a memory repair control feature")
Reported-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
Suggested-by: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Signed-off-by: Shiju Jose &lt;shiju.jose@huawei.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Reviewed-by: Jonathan Cameron &lt;jonathan.cameron@huawei.com&gt;
Link: https://lore.kernel.org/20250626101344.1726-1-shiju.jose@huawei.com
</content>
</entry>
<entry>
<title>EDAC/amd64: Fix size calculation for Non-Power-of-Two DIMMs</title>
<updated>2025-06-25T14:40:03Z</updated>
<author>
<name>Avadhut Naik</name>
<email>avadhut.naik@amd.com</email>
</author>
<published>2025-05-29T20:50:04Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a3f3040657417aeadb9622c629d4a0c2693a0f93'/>
<id>urn:sha1:a3f3040657417aeadb9622c629d4a0c2693a0f93</id>
<content type='text'>
Each Chip-Select (CS) of a Unified Memory Controller (UMC) on AMD Zen-based
SOCs has an Address Mask and a Secondary Address Mask register associated with
it. The amd64_edac module logs DIMM sizes on a per-UMC per-CS granularity
during init using these two registers.

Currently, the module primarily considers only the Address Mask register for
computing DIMM sizes. The Secondary Address Mask register is only considered
for odd CS. Additionally, if it has been considered, the Address Mask register
is ignored altogether for that CS. For power-of-two DIMMs i.e. DIMMs whose
total capacity is a power of two (32GB, 64GB, etc), this is not an issue
since only the Address Mask register is used.

For non-power-of-two DIMMs, i.e., DIMMs whose total capacity is not a power of
two (48GB, 96GB, etc), the Secondary Address Mask register is used in
conjunction with the Address Mask register. Since the module considers only
one of the two registers for a given CS, the size it computes is incorrect:
the Secondary Address Mask register is not considered for even CS, and the
Address Mask register is not considered for odd CS.

Introduce a new helper function so that both Address Mask and Secondary
Address Mask registers are considered, when valid, for computing DIMM sizes.
Also rename some variables for greater clarity.
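
As a rough illustration of why both masks matter, the following Python sketch
models a chip select whose capacity is split across two power-of-two regions.
The mask encoding (mask equals region size in bytes minus one) and all
function names are hypothetical; this is not the register layout or the
helper used by amd64_edac.

```python
GIB = 1024 ** 3

def region_size(mask):
    """Size in bytes of one region under the hypothetical encoding; 0 means unused."""
    return mask + 1 if mask else 0

def cs_size_old(addr_mask, addr_mask_sec, cs_is_odd):
    """Old behaviour as described above: only one mask is considered per CS."""
    if cs_is_odd and addr_mask_sec:
        return region_size(addr_mask_sec)
    return region_size(addr_mask)

def cs_size_new(addr_mask, addr_mask_sec):
    """Fixed behaviour: both masks contribute when valid (non-zero here)."""
    return region_size(addr_mask) + region_size(addr_mask_sec)

# A 48 GiB chip select modelled as a 32 GiB region plus a 16 GiB region.
primary = 32 * GIB - 1
secondary = 16 * GIB - 1
print(cs_size_old(primary, secondary, False) // GIB)
print(cs_size_old(primary, secondary, True) // GIB)
print(cs_size_new(primary, secondary) // GIB)
```

Under these assumptions the single-mask logic reports 32 or 16 GiB depending
on the CS parity, while the combined calculation reports the full 48 GiB.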

Fixes: 81f5090db843 ("EDAC/amd64: Support asymmetric dual-rank DIMMs")
Closes: https://lore.kernel.org/dbec22b6-00f2-498b-b70d-ab6f8a5ec87e@natrix.lt
Reported-by: Žilvinas Žaltiena &lt;zilvinas@natrix.lt&gt;
Signed-off-by: Avadhut Naik &lt;avadhut.naik@amd.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Reviewed-by: Yazen Ghannam &lt;yazen.ghannam@amd.com&gt;
Tested-by: Žilvinas Žaltiena &lt;zilvinas@natrix.lt&gt;
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/20250529205013.403450-1-avadhut.naik@amd.com
</content>
</entry>
<entry>
<title>EDAC/igen6: Fix NULL pointer dereference</title>
<updated>2025-06-18T18:19:45Z</updated>
<author>
<name>Qiuxu Zhuo</name>
<email>qiuxu.zhuo@intel.com</email>
</author>
<published>2025-06-18T16:23:06Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=88efa0de3285be66969b71ec137d9dab1ee19e52'/>
<id>urn:sha1:88efa0de3285be66969b71ec137d9dab1ee19e52</id>
<content type='text'>
A kernel panic was reported with the following kernel log:

  EDAC igen6: Expected 2 mcs, but only 1 detected.
  BUG: unable to handle page fault for address: 000000000000d570
  ...
  Hardware name: Notebook V54x_6x_TU/V54x_6x_TU, BIOS Dasharo (coreboot+UEFI) v0.9.0 07/17/2024
  RIP: e030:ecclog_handler+0x7e/0xf0 [igen6_edac]
  ...
  igen6_probe+0x2a0/0x343 [igen6_edac]
  ...
  igen6_init+0xc5/0xff0 [igen6_edac]
  ...

This issue occurred because one memory controller was disabled by
the BIOS but the igen6_edac driver still checked all the memory
controllers, including this absent one, to identify the source of
the error. Accessing the null MMIO for the absent memory controller
resulted in the oops above.

Fix this issue by reverting the configuration structure to non-const
and updating the field 'res_cfg-&gt;num_imc' to reflect the number of
detected memory controllers.
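
The idea can be sketched in Python as follows. The class and function names
are hypothetical stand-ins chosen for this illustration and do not mirror the
driver's actual data structures.

```python
class ResCfg:
    """Hypothetical stand-in for the driver's per-SoC configuration."""
    def __init__(self, num_imc):
        self.num_imc = num_imc

def probe(res_cfg, mmio_windows):
    """Keep only present controllers and record how many were found."""
    detected = [w for w in mmio_windows if w is not None]
    res_cfg.num_imc = len(detected)  # the field must be writable, hence non-const
    return detected

def ecclog_handler(res_cfg, imcs):
    """Visit only the controllers counted during probe."""
    return [imcs[i]["name"] for i in range(res_cfg.num_imc)]

cfg = ResCfg(num_imc=2)
imcs = probe(cfg, [{"name": "mc0"}, None])  # second controller disabled by BIOS
print(ecclog_handler(cfg, imcs))
```

Because the handler's loop bound comes from the updated count, the absent
controller is never touched.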

Fixes: 20e190b1c1fd ("EDAC/igen6: Skip absent memory controllers")
Reported-by: Marek Marczykowski-Górecki &lt;marmarek@invisiblethingslab.com&gt;
Closes: https://lore.kernel.org/all/aFFN7RlXkaK_loQb@mail-itl/
Suggested-by: Borislav Petkov &lt;bp@alien8.de&gt;
Signed-off-by: Qiuxu Zhuo &lt;qiuxu.zhuo@intel.com&gt;
Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Tested-by: Marek Marczykowski-Górecki &lt;marmarek@invisiblethingslab.com&gt;
Link: https://lore.kernel.org/r/20250618162307.1523736-1-qiuxu.zhuo@intel.com
</content>
</entry>
<entry>
<title>EDAC/amd64: Correct number of UMCs for family 19h models 70h-7fh</title>
<updated>2025-06-16T21:11:14Z</updated>
<author>
<name>Avadhut Naik</name>
<email>avadhut.naik@amd.com</email>
</author>
<published>2025-06-13T00:51:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b2e673ae53ef4b943f68585207a5f21cfc9a0714'/>
<id>urn:sha1:b2e673ae53ef4b943f68585207a5f21cfc9a0714</id>
<content type='text'>
AMD's Family 19h-based Models 70h-7fh support 4 unified memory controllers
(UMC) per processor die.

The amd64_edac driver, however, assumes only 2 UMCs are supported since the
max_mcs variable for these models has not been explicitly set to 4. As a
result, the module in some instances logs incomplete or incorrect memory
information to dmesg during initialization.

Fixes: 6c79e42169fe ("EDAC/amd64: Add support for ECC on family 19h model 60h-7Fh")
Closes: https://lore.kernel.org/all/27dc093f-ce27-4c71-9e81-786150a040b6@reox.at/
Reported-by: reox &lt;mailinglist@reox.at&gt;
Signed-off-by: Avadhut Naik &lt;avadhut.naik@amd.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Cc: stable@kernel.org
Link: https://lore.kernel.org/20250613005233.2330627-1-avadhut.naik@amd.com
</content>
</entry>
<entry>
<title>Merge tag 'cxl-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl</title>
<updated>2025-06-03T20:24:14Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-06-03T20:24:14Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=29e9359005dd1ac5f9683608891718e6a32a20a3'/>
<id>urn:sha1:29e9359005dd1ac5f9683608891718e6a32a20a3</id>
<content type='text'>
Pull Compute Express Link (CXL) updates from Dave Jiang:

 - Remove always true condition in cxl features code

 - Add verification of CHBS length for CXL 2.0

 - Ignore interleave granularity when interleave ways is 1

 - Add update addressing missing MODULE_DESCRIPTION for cxl_test

 - A series of cleanups/refactor to prep for AMD Zen5 translate code

 - Clean %pa debug printk in core/hdm.c

 - Documentation updates:
     - Update to CXL Maturity Map
     - Fixes to source linking in CXL documentation
     - CXL documentation fixes, spelling corrections
     - A large collection of CXL documentation for the entire CXL
       subsystem, including documentation on CXL related platform and
       firmware notes

 - Remove redundant code of cxlctl_get_supported_features()

 - Series to support CXL RAS Features
     - Including "Patrol Scrub Control", "Error Check Scrub",
       "Perform Maintenance" and "Memory Sparing". The series
       connects CXL to EDAC.

* tag 'cxl-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (53 commits)
  cxl/edac: Add CXL memory device soft PPR control feature
  cxl/edac: Add CXL memory device memory sparing control feature
  cxl/edac: Support for finding memory operation attributes from the current boot
  cxl/edac: Add support for PERFORM_MAINTENANCE command
  cxl/edac: Add CXL memory device ECS control feature
  cxl/edac: Add CXL memory device patrol scrub control feature
  cxl: Update prototype of function get_support_feature_info()
  EDAC: Update documentation for the CXL memory patrol scrub control feature
  cxl/features: Remove the inline specifier from to_cxlfs()
  cxl/feature: Remove redundant code of get supported features
  docs: ABI: Fix "firwmare" to "firmware"
  cxl/Documentation: Fix typo in sysfs write_bandwidth attribute path
  cxl: doc/linux/access-coordinates Update access coordinates calculation methods
  cxl: docs/platform/acpi/srat Add generic target documentation
  cxl: docs/platform/cdat reference documentation
  Documentation: Update the CXL Maturity Map
  cxl: Sync up the driver-api/cxl documentation
  cxl: docs - add self-referencing cross-links
  cxl: docs/allocation/hugepages
  cxl: docs/allocation/reclaim
  ...
</content>
</entry>
<entry>
<title>EDAC/altera: Use correct write width with the INTTEST register</title>
<updated>2025-05-29T15:38:55Z</updated>
<author>
<name>Niravkumar L Rabara</name>
<email>niravkumar.l.rabara@intel.com</email>
</author>
<published>2025-05-27T14:57:07Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e5ef4cd2a47f27c0c9d8ff6c0f63a18937c071a3'/>
<id>urn:sha1:e5ef4cd2a47f27c0c9d8ff6c0f63a18937c071a3</id>
<content type='text'>
On the SoCFPGA platform, the INTTEST register supports only 16-bit writes.
A 32-bit write triggers an SError to the CPU so do 16-bit accesses only.

  [ bp: AI-massage the commit message. ]

Fixes: c7b4be8db8bc ("EDAC, altera: Add Arria10 OCRAM ECC support")
Signed-off-by: Niravkumar L Rabara &lt;niravkumar.l.rabara@intel.com&gt;
Signed-off-by: Matthew Gerlach &lt;matthew.gerlach@altera.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Acked-by: Dinh Nguyen &lt;dinguyen@kernel.org&gt;
Cc: stable@kernel.org
Link: https://lore.kernel.org/20250527145707.25458-1-matthew.gerlach@altera.com
</content>
</entry>
<entry>
<title>Merge tag 'edac_updates_for_v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras</title>
<updated>2025-05-27T17:13:06Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-05-27T17:13:06Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ada1b0436b5a290923b072b2eb0368a7869bf680'/>
<id>urn:sha1:ada1b0436b5a290923b072b2eb0368a7869bf680</id>
<content type='text'>
Pull EDAC updates from Borislav Petkov:

 - ie31200: Add support for Raptor Lake-S and Alder Lake-S compute dies

 - Rework how per-channel RRL register tracking is done in order to
   support newer hardware with different RRL configurations and refactor
   that code. Add support for Granite Rapids server

 - i10nm: explicitly set RRL modes to fix any wrong BIOS programming

 - Properly save and restore Retry Read error Log channel configuration
   info on Intel drivers

 - igen6: Handle correctly the case of fused off memory controllers on
   Arizona Beach and Amston Lake SoCs before adding support for them

 - the usual set of fixes and cleanups

* tag 'edac_updates_for_v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC/bluefield: Don't use bluefield_edac_readl() result on error
  EDAC/i10nm: Fix the bitwise operation between variables of different sizes
  EDAC/ie31200: Add two Intel SoCs for EDAC support
  EDAC/{skx_common,i10nm}: Add RRL support for Intel Granite Rapids server
  EDAC/{skx_common,i10nm}: Refactor show_retry_rd_err_log()
  EDAC/{skx_common,i10nm}: Refactor enable_retry_rd_err_log()
  EDAC/{skx_common,i10nm}: Structure the per-channel RRL registers
  EDAC/i10nm: Explicitly set the modes of the RRL register sets
  EDAC/{skx_common,i10nm}: Fix the loss of saved RRL for HBM pseudo channel 0
  EDAC/skx_common: Fix general protection fault
  EDAC/igen6: Add Intel Amston Lake SoCs support
  EDAC/igen6: Add Intel Arizona Beach SoCs support
  EDAC/igen6: Skip absent memory controllers
</content>
</entry>
<entry>
<title>Merge tag 'irq-cleanups-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2025-05-27T15:07:32Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-05-27T15:07:32Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2bd1bea5fa6aa79bc563a57919730eb809651b28'/>
<id>urn:sha1:2bd1bea5fa6aa79bc563a57919730eb809651b28</id>
<content type='text'>
Pull irq cleanups from Thomas Gleixner:
 "A set of cleanups for the generic interrupt subsystem:

   - Consolidate on one set of functions for the interrupt domain code
     to get rid of pointlessly duplicated code with only marginally
     different semantics.

   - Update the documentation accordingly and consolidate the coding
     style of the irqdomain header"

* tag 'irq-cleanups-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits)
  irqdomain: Consolidate coding style
  irqdomain: Fix kernel-doc and add it to Documentation
  Documentation: irqdomain: Update it
  Documentation: irq-domain.rst: Simple improvements
  Documentation: irq/concepts: Minor improvements
  Documentation: irq/concepts: Add commas and reflow
  irqdomain: Improve kernel-docs of functions
  irqdomain: Make struct irq_domain_info variables const
  irqdomain: Use irq_domain_instantiate()'s return value as initializers
  irqdomain: Drop irq_linear_revmap()
  pinctrl: keembay: Switch to irq_find_mapping()
  irqchip/armada-370-xp: Switch to irq_find_mapping()
  gpu: ipu-v3: Switch to irq_find_mapping()
  gpio: idt3243x: Switch to irq_find_mapping()
  sh: Switch to irq_find_mapping()
  powerpc: Switch to irq_find_mapping()
  irqdomain: Drop irq_domain_add_*() functions
  powerpc: Switch irq_domain_add_nomap() to use fwnode
  thermal: Switch to irq_domain_create_linear()
  soc: Switch to irq_domain_create_*()
  ...
</content>
</entry>
<entry>
<title>cxl/edac: Add CXL memory device memory sparing control feature</title>
<updated>2025-05-23T20:24:53Z</updated>
<author>
<name>Shiju Jose</name>
<email>shiju.jose@huawei.com</email>
</author>
<published>2025-05-21T12:47:45Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=588ca944c27729c7f950d1f44c6d6700a919969a'/>
<id>urn:sha1:588ca944c27729c7f950d1f44c6d6700a919969a</id>
<content type='text'>
Memory sparing is defined as a repair function that replaces a portion of
memory with a portion of functional memory at that same DPA. The subclasses
for this operation vary in terms of the scope of the sparing being
performed. The cacheline sparing subclass refers to a sparing action that
can replace a full cacheline. Row sparing is provided as an alternative to
PPR sparing functions and its scope is that of a single DDR row.
As per CXL r3.2 Table 8-125, footnote 1, memory sparing is preferred over
PPR when possible.
Bank sparing allows an entire bank to be replaced. Rank sparing is defined
as an operation in which an entire DDR rank is replaced.

Memory sparing maintenance operations may be supported by CXL devices
that implement CXL.mem protocol. A sparing maintenance operation requests
the CXL device to perform a repair operation on its media.
For example, a CXL device with DRAM components that support memory sparing
features may implement sparing maintenance operations.

The host may issue a query command by setting the query resources flag in the
input payload (CXL spec 3.2 Table 8-120) to determine the availability of
sparing resources for a given address. In response to a query request,
the device shall report the resource availability by producing the memory
sparing event record (CXL spec 3.2 Table 8-60) in which the Channel, Rank,
Nibble Mask, Bank Group, Bank, Row, Column, Sub-Channel fields are a copy
of the values specified in the request.

During the execution of a sparing maintenance operation, a CXL memory
device:
- may not retain data
- may not be able to process CXL.mem requests correctly.
These CXL memory device capabilities are specified by restriction flags
in the memory sparing feature readable attributes.

When a CXL device identifies an error on a memory component, the device
may inform the host about the need for a memory sparing maintenance
operation by using a DRAM event record in which the 'maintenance needed'
flag may be set. The event record contains some of the DPA, Channel, Rank,
Nibble Mask, Bank Group, Bank, Row, Column, Sub-Channel fields that
should be repaired. The userspace tool requests a maintenance operation
if the 'maintenance needed' flag is set in the CXL DRAM error record.

CXL spec 3.2 section 8.2.10.7.1.4 describes the device's memory sparing
maintenance operation feature.

CXL spec 3.2 section 8.2.10.7.2.3 describes the memory sparing feature
discovery and configuration.

Add support for controlling the CXL memory device memory sparing feature.
Register with the EDAC driver, which gets the memory repair attr descriptors
from the EDAC memory repair driver and exposes sysfs repair control
attributes for memory sparing to userspace. For example, CXL memory
sparing control for the CXL mem0 device is exposed in
/sys/bus/edac/devices/cxl_mem0/mem_repairX/

Use case
========
1. The CXL device identifies a failure in a memory component and reports it
   to userspace in a CXL DRAM trace event with the DPA and other attributes
   of the memory to repair, such as channel, rank, nibble mask, bank group,
   bank, row, column, sub-channel.

2. Rasdaemon processes the trace event and may issue a query request in
   sysfs to check the resources available for memory sparing if any of the
   following conditions is met:
 - 'maintenance needed' flag is set in the event record.
 - 'threshold event' flag is set for the CVME threshold feature.
 - The number of corrected errors reported on a CXL.mem media to
   userspace exceeds the threshold value for the corrected error count
   defined by the userspace policy.

3. Rasdaemon processes the memory sparing trace event and issues a repair
   request for memory sparing.

The kernel CXL driver shall report the memory sparing event record to
userspace with the resource availability in order for rasdaemon to process
the event record and issue a repair request in sysfs for the memory sparing
operation in the CXL device.

Note: Based on feedback from the community, the 'query' sysfs attribute has
been removed and reporting the memory sparing error record to userspace is
not supported. Instead, userspace issues the sparing operation and the kernel
forwards it to the CXL memory device when the 'maintenance needed' flag is
set in the DRAM event record.

Add checks to ensure that the memory to be repaired is offline or, if it is
online, that the request originates from a CXL DRAM error record reported in
the current boot, before requesting a memory sparing operation on the device.
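
The gating rule above can be sketched in Python as follows. This is a
simplified model, not the kernel code: the function name is hypothetical, and
matching on the DPA alone is an assumption made for brevity, since a real
check may compare further attributes of the error record.

```python
def may_request_sparing(memory_online, dpa, dram_error_dpas_this_boot):
    """Return True when a sparing request may be sent to the device."""
    if not memory_online:
        return True
    # Online memory is only repaired if the address was reported in a
    # CXL DRAM error record during the current boot.
    return dpa in dram_error_dpas_this_boot

seen = {0x1000, 0x2000}
print(may_request_sparing(False, 0x3000, seen))
print(may_request_sparing(True, 0x1000, seen))
print(may_request_sparing(True, 0x3000, seen))
```

Only the last call is refused: the memory is online and its address was not
reported in an error record during this boot.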

Note: Tested memory sparing feature control with QEMU patch
      "hw/cxl: Add emulation for memory sparing control feature"
      https://lore.kernel.org/linux-cxl/20250509172229.726-1-shiju.jose@huawei.com/T/#m5f38512a95670d75739f9dad3ee91b95c7f5c8d6

Reviewed-by: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Reviewed-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
Signed-off-by: Shiju Jose &lt;shiju.jose@huawei.com&gt;
Reviewed-by: Alison Schofield &lt;alison.schofield@intel.com&gt;
Acked-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Link: https://patch.msgid.link/20250521124749.817-8-shiju.jose@huawei.com
Signed-off-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
</content>
</entry>
<entry>
<title>EDAC/bluefield: Don't use bluefield_edac_readl() result on error</title>
<updated>2025-05-22T15:58:28Z</updated>
<author>
<name>David Thompson</name>
<email>davthompson@nvidia.com</email>
</author>
<published>2025-03-18T21:47:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ea3b0b7f541b9511abe2b89547c95458804f38e2'/>
<id>urn:sha1:ea3b0b7f541b9511abe2b89547c95458804f38e2</id>
<content type='text'>
The bluefield_edac_readl() routine returns an uninitialized result on error
paths. In those cases the calling routine should not use the uninitialized
result. The driver should simply log the error, and then return early.
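
The pattern can be sketched in Python as follows. The helper names, the
register map, and the choice of bit 0 as the field of interest are all
hypothetical; the sketch only shows the log-and-return-early shape, not the
driver's actual code.

```python
import errno

def edac_readl(regs, offset):
    """Hypothetical read helper returning (error, value); value is None on error."""
    if offset not in regs:
        return -errno.EIO, None
    return 0, regs[offset]

def check_ecc_status(regs, offset, log):
    """Log and return early on a failed read instead of using the result."""
    err, value = edac_readl(regs, offset)
    if err:
        log.append("read failed at offset 0x%x" % offset)
        return None
    return value % 2  # bit 0 of the register, as an example field

log = []
print(check_ecc_status({0x10: 0x3}, 0x10, log))
print(check_ecc_status({0x10: 0x3}, 0x20, log))
print(log)
```

The second call never inspects the value, because the read reported an error
first.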

Fixes: e41967575474 ("EDAC/bluefield: Use Arm SMC for EMI access on BlueField-2")
Signed-off-by: David Thompson &lt;davthompson@nvidia.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Reviewed-by: Shravan Kumar Ramani &lt;shravankr@nvidia.com&gt;
Link: https://lore.kernel.org/20250318214747.12271-1-davthompson@nvidia.com
</content>
</entry>
</feed>
