<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/drivers/base/memory.c, branch v6.17</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.17</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.17'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2025-07-10T05:41:59Z</updated>
<entry>
<title>drivers/base/node: optimize memory block registration to reduce boot time</title>
<updated>2025-07-10T05:41:59Z</updated>
<author>
<name>Donet Tom</name>
<email>donettom@linux.ibm.com</email>
</author>
<published>2025-05-28T17:18:00Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4f745def815d86eece3614aa0cd10a25cf94fed4'/>
<id>urn:sha1:4f745def815d86eece3614aa0cd10a25cf94fed4</id>
<content type='text'>
Patch series "drivers/base/node.c: optimization and cleanups", v7.


This patch (of 7)

During node device initialization, `memory blocks` are registered under
each NUMA node.  The `memory blocks` to be registered are identified using
the node's start and end PFNs, which are obtained from the node's pg_data

However, not all PFNs within this range necessarily belong to the same
node—some may belong to other nodes.  Additionally, due to the
discontiguous nature of physical memory, certain sections within a `memory
block` may be absent.

As a result, `memory blocks` that fall between a node's start and end PFNs
may span across multiple nodes, and some sections within those blocks may
be missing.  `Memory blocks` have a fixed size, which is architecture
dependent.

Due to these considerations, the memory block registration is currently
performed as follows:

for_each_online_node(nid):
    start_pfn = pgdat-&gt;node_start_pfn;
    end_pfn = pgdat-&gt;node_start_pfn + node_spanned_pages;
    for_each_memory_block_between(PFN_PHYS(start_pfn), PFN_PHYS(end_pfn))
        mem_blk = memory_block_id(pfn_to_section_nr(pfn));
        pfn_mb_start=section_nr_to_pfn(mem_blk-&gt;start_section_nr)
        pfn_mb_end = pfn_start + memory_block_pfns - 1
        for (pfn = pfn_mb_start; pfn &lt; pfn_mb_end; pfn++):
            if (get_nid_for_pfn(pfn) != nid):
                continue;
            else
                do_register_memory_block_under_node(nid, mem_blk,
                                                        MEMINIT_EARLY);

Here, we derive the start and end PFNs from the node's pg_data, then
determine the memory blocks that may belong to the node.  For each `memory
block` in this range, we inspect all PFNs it contains and check their
associated NUMA node ID.  If a PFN within the block matches the current
node, the memory block is registered under that node.

If CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, get_nid_for_pfn() performs
a binary search in the `memblock regions` to determine the NUMA node ID
for a given PFN.  If it is not enabled, the node ID is retrieved directly
from the struct page.

On large systems, this process can become time-consuming, especially since
we iterate over each `memory block` and all PFNs within it until a match
is found.  When CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, the
additional overhead of the binary search increases the execution time
significantly, potentially leading to soft lockups during boot.

In this patch, we iterate over `memblock region` to identify the `memory
blocks` that belong to the current NUMA node.  `memblock regions` are
contiguous memory ranges, each associated with a single NUMA node, and
they do not span across multiple nodes.

for_each_memory_region(r): // r =&gt; region
  if (!node_online(r-&gt;nid)):
    continue;
  else
    for_each_memory_block_between(r-&gt;base, r-&gt;base + r-&gt;size - 1):
      do_register_memory_block_under_node(r-&gt;nid, mem_blk, MEMINIT_EARLY);

We iterate over all memblock regions, and if the node associated with the
region is online, we calculate the start and end memory blocks based on
the region's start and end PFNs.  We then register all the memory blocks
within that range under the region node.

Test Results on My system with 32TB RAM
=======================================
1. Boot time with CONFIG_DEFERRED_STRUCT_PAGE_INIT enabled.

Without this patch
------------------
Startup finished in 1min 16.528s (kernel)

With this patch
---------------
Startup finished in 17.236s (kernel) - 78% Improvement

2. Boot time with CONFIG_DEFERRED_STRUCT_PAGE_INIT disabled.

Without this patch
------------------
Startup finished in 28.320s (kernel)

With this patch
---------------
Startup finished in 15.621s (kernel) - 46% Improvement

[donettom@linux.ibm.com: restore removed extra line]
  Link: https://lkml.kernel.org/r/20250609140354.467908-1-donettom@linux.ibm.com
Link: https://lkml.kernel.org/r/2a0a05c2dffc62a742bf1dd030098be4ce99be28.1748452241.git.donettom@linux.ibm.com
Link: https://lkml.kernel.org/r/2a0a05c2dffc62a742bf1dd030098be4ce99be28.1748452241.git.donettom@linux.ibm.com
Signed-off-by: Donet Tom &lt;donettom@linux.ibm.com&gt;
Acked-by: David Hildenbrand &lt;david@redhat.com&gt;
Acked-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Acked-by: Mike Rapoport (Microsoft) &lt;rppt@kernel.org&gt;
Acked-by: Zi Yan &lt;ziy@nvidia.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>memory: implement memory_block_advise/probe_max_size</title>
<updated>2025-05-12T00:48:07Z</updated>
<author>
<name>Gregory Price</name>
<email>gourry@gourry.net</email>
</author>
<published>2025-01-27T15:34:03Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=737e9d021993c99377bb61e762c0d6945a37615b'/>
<id>urn:sha1:737e9d021993c99377bb61e762c0d6945a37615b</id>
<content type='text'>
Patch series "memory,x86,acpi: hotplug memory alignment advisement", v8.

When physical address regions are not aligned to memory block size, the
misaligned portion is lost (stranded capacity).

Block size (min/max/selected) is architecture defined.  Most architectures
tend to use the minimum block size or some simplistic heurist.  On x86,
memory block size increases up to 2GB, and is otherwise fitted to the
alignment of non-hotplug (i.e.  not special purpose memory).

CXL exposes its memory for management through the ACPI CEDT (CXL Early
Detection Table) in a field called the CXL Fixed Memory Window.  Per the
CXL specification, this memory must be aligned to at least 256MB.

When a CFMW aligns on a size less than the block size, this causes a loss
of up to 2GB per CFMW on x86.  It is not uncommon for CFMW to be allocated
per-device - though this behavior is BIOS defined.

This patch set provides 3 things:
 1) implement advise/query functions in driverse/base/memory.c to
    report/query architecture agnostic hotplug block alignment advice.
 2) update x86 memblock size logic to consider the hotplug advice
 3) add code in acpi/numa/srat.c to report CFMW alignment advice

The advisement interfaces are design to be called during arch_init code
prior to allocator and smp_init.  start_kernel will call these through
setup_arch() (via acpi and mm/init_64.c on x86), which occurs prior to
mm_core_init and smp_init - so no need for atomics.

There's an attempt to signal callers to advise() that query has already
occurred, but this is predicated on the notion that query actually occurs
(which presently only happens on the x86 arch).  This is to assist
debugging future users.  Otherwise, the advise() call has been marked
__init to help static discovery of bad call times.

Once query is called the first time, it will always return the same value.

Interfaces return -EBUSY and 0 respectively on systems without hotplug.


This patch (of 3):

Hotplug memory sources may have opinions on what the memblock size should
be - usually for alignment purposes.  For example, CXL memory extents can
be 256MB with a matching alignment.  If this size/alignment is smaller
than the block size, it can result in stranded capacity.

Implement memory_block_advise_max_size for use prior to allocator init,
for software to advise the system on the max block size.

Implement memory_block_probe_max_size for use by arch init code to
calculate the best block size.  Use of advice is architecture defined.

The probe value can never change after first probe.  Calls to advise after
probe will return -EBUSY to aid debugging.

On systems without hotplug, always return -ENODEV and 0 respectively.

Link: https://lkml.kernel.org/r/20250127153405.3379117-1-gourry@gourry.net
Link: https://lkml.kernel.org/r/20250127153405.3379117-2-gourry@gourry.net
Signed-off-by: Gregory Price &lt;gourry@gourry.net&gt;
Suggested-by: Ira Weiny &lt;ira.weiny@intel.com&gt;
Acked-by: David Hildenbrand &lt;david@redhat.com&gt;
Acked-by: Mike Rapoport (Microsoft) &lt;rppt@kernel.org&gt;
Acked-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Tested-by: Fan Ni &lt;fan.ni@samsung.com&gt;
Reviewed-by: Ira Weiny &lt;ira.weiny@intel.com&gt;
Acked-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Alison Schofield &lt;alison.schofield@intel.com&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Borislav Betkov &lt;bp@alien8.de&gt;
Cc: Bruno Faccini &lt;bfaccini@nvidia.com&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Dave Jiang &lt;dave.jiang@intel.com&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Haibo Xu &lt;haibo1.xu@intel.com&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Joanthan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Cc: Len Brown &lt;lenb@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Cc: Robert Richter &lt;rrichter@amd.com&gt;
Cc: Thomas Gleinxer &lt;tglx@linutronix.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>drivers/base/memory: Avoid overhead from for_each_present_section_nr()</title>
<updated>2025-04-15T16:19:49Z</updated>
<author>
<name>Gavin Shan</name>
<email>gshan@redhat.com</email>
</author>
<published>2025-04-10T12:51:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b9792abb76ae1649080b8d48092c52e24c7bbdc2'/>
<id>urn:sha1:b9792abb76ae1649080b8d48092c52e24c7bbdc2</id>
<content type='text'>
for_each_present_section_nr() was introduced to add_boot_memory_block()
by commit 61659efdb35c ("drivers/base/memory: improve add_boot_memory_block()").
It causes unnecessary overhead when the present sections are really
sparse. next_present_section_nr() called by the macro to find the next
present section, which is far away from the spanning sections in the
specified block. Too much time consumed by next_present_section_nr()
in this case, which can lead to softlockup as observed by Aditya Gupta
on IBM Power10 machine.

  watchdog: BUG: soft lockup - CPU#248 stuck for 22s! [swapper/248:1]
  Modules linked in:
  CPU: 248 UID: 0 PID: 1 Comm: swapper/248 Not tainted 6.15.0-rc1-next-20250408 #1 VOLUNTARY
  Hardware name: 9105-22A POWER10 (raw) 0x800200 opal:v7.1-107-gfda75d121942 PowerNV
  NIP:  c00000000209218c LR: c000000002092204 CTR: 0000000000000000
  REGS: c00040000418fa30 TRAP: 0900   Not tainted  (6.15.0-rc1-next-20250408)
  MSR:  9000000002009033 &lt;SF,HV,VEC,EE,ME,IR,DR,RI,LE&gt;  CR: 28000428  XER: 00000000
  CFAR: 0000000000000000 IRQMASK: 0
  GPR00: c000000002092204 c00040000418fcd0 c000000001b08100 0000000000000040
  GPR04: 0000000000013e00 c000c03ffebabb00 0000000000c03fff c000400fff587f80
  GPR08: 0000000000000000 00000000001196f7 0000000000000000 0000000028000428
  GPR12: 0000000000000000 c000000002e80000 c00000000001007c 0000000000000000
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR28: c000000002df7f70 0000000000013dc0 c0000000011dd898 0000000008000000
  NIP [c00000000209218c] memory_dev_init+0x114/0x1e0
  LR [c000000002092204] memory_dev_init+0x18c/0x1e0
  Call Trace:
  [c00040000418fcd0] [c000000002092204] memory_dev_init+0x18c/0x1e0 (unreliable)
  [c00040000418fd50] [c000000002091348] driver_init+0x78/0xa4
  [c00040000418fd70] [c0000000020063ac] kernel_init_freeable+0x22c/0x370
  [c00040000418fde0] [c0000000000100a8] kernel_init+0x34/0x25c
  [c00040000418fe50] [c00000000000cd94] ret_from_kernel_user_thread+0x14/0x1c

Avoid the overhead by folding for_each_present_section_nr() to the outer
loop. add_boot_memory_block() is dropped after that.

Fixes: 61659efdb35c ("drivers/base/memory: improve add_boot_memory_block()")
Closes: https://lore.kernel.org/linux-mm/20250409180344.477916-1-adityag@linux.ibm.com
Reported-by: Aditya Gupta &lt;adityag@linux.ibm.com&gt;
Signed-off-by: Gavin Shan &lt;gshan@redhat.com&gt;
Acked-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Tested-by: Aditya Gupta &lt;adityag@linux.ibm.com&gt;
Acked-by: David Hildenbrand &lt;david@redhat.com&gt;
Link: https://lore.kernel.org/r/20250410125110.1232329-1-gshan@redhat.com
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>drivers/base/memory: improve add_boot_memory_block()</title>
<updated>2025-03-18T05:07:01Z</updated>
<author>
<name>Gavin Shan</name>
<email>gshan@redhat.com</email>
</author>
<published>2025-03-11T23:30:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=61659efdb35ce6c6ac7639342098f3c4548b794b'/>
<id>urn:sha1:61659efdb35ce6c6ac7639342098f3c4548b794b</id>
<content type='text'>
Patch series "drivers/base/memory: Two cleanups", v3.

Two cleanups to drivers/base/memory.


This patch (of 2)L

It's unnecessary to count the present sections for the specified block
since the block will be added if any section in the block is present. 
Besides, for_each_present_section_nr() can be reused as Andrew Morton
suggested.

Improve by using for_each_present_section_nr() and dropping the
unnecessary @section_count.

No functional changes intended.

Link: https://lkml.kernel.org/r/20250311233045.148943-1-gshan@redhat.com
Link: https://lkml.kernel.org/r/20250311233045.148943-2-gshan@redhat.com
Signed-off-by: Gavin Shan &lt;gshan@redhat.com&gt;
Acked-by: David Hildenbrand &lt;david@redhat.com&gt;
Acked-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Danilo Krummrich &lt;dakr@kernel.org&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: "Rafael J. Wysocki" &lt;rafael@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>drivers/base/memory: simplify outputting of valid_zones_show()</title>
<updated>2025-03-17T05:05:56Z</updated>
<author>
<name>Shiyang Ruan</name>
<email>ruansy.fnst@fujitsu.com</email>
</author>
<published>2025-01-08T01:52:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=100bc3b877fca43e46485883eaf179aec7a11c86'/>
<id>urn:sha1:100bc3b877fca43e46485883eaf179aec7a11c86</id>
<content type='text'>
No need to specify position at the first writing to the buf because the
@len is always 0 at this time.  Use sysfs_emit() instead to simplify it. 
Also avoid setting/checking default_zone with a conditional operator.

Link: https://lkml.kernel.org/r/20250108015223.1522887-1-ruansy.fnst@fujitsu.com
Signed-off-by: Shiyang Ruan &lt;ruansy.fnst@fujitsu.com&gt;
Acked-by: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: "Rafael J. Wysocki" &lt;rafael@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: add build-time option for hotplug memory default online type</title>
<updated>2025-01-26T04:22:21Z</updated>
<author>
<name>Gregory Price</name>
<email>gourry@gourry.net</email>
</author>
<published>2024-12-20T21:07:09Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=44d46b76c3a4b514a0cc9dab147ed430e5c1d699'/>
<id>urn:sha1:44d46b76c3a4b514a0cc9dab147ed430e5c1d699</id>
<content type='text'>
Memory hotplug presently auto-onlines memory into a zone the kernel deems
appropriate if CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y.

The memhp_default_state boot param enables runtime config, but it's not
possible to do this at build-time.

Remove CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE, and replace it with
CONFIG_MHP_DEFAULT_ONLINE_TYPE_* choices that sync with the boot param.

Selections:
  CONFIG_MHP_DEFAULT_ONLINE_TYPE_OFFLINE
    =&gt; mhp_default_online_type = "offline"
       Memory will not be onlined automatically.

  CONFIG_MHP_DEFAULT_ONLINE_TYPE_ONLINE_AUTO
    =&gt; mhp_default_online_type = "online"
       Memory will be onlined automatically in a zone deemed.
       appropriate by the kernel.

  CONFIG_MHP_DEFAULT_ONLINE_TYPE_ONLINE_KERNEL
    =&gt; mhp_default_online_type = "online_kernel"
       Memory will be onlined automatically.
       The zone may allow kernel data (e.g. ZONE_NORMAL).

  CONFIG_MHP_DEFAULT_ONLINE_TYPE_ONLINE_MOVABLE
    =&gt; mhp_default_online_type = "online_movable"
       Memory will be onlined automatically.
       The zone will be ZONE_MOVABLE.

Default to CONFIG_MHP_DEFAULT_ONLINE_TYPE_OFFLINE to match the existing
default CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=n behavior.

Existing users of CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y should use
CONFIG_MHP_DEFAULT_ONLINE_TYPE_ONLINE_AUTO.

[gourry@gourry.net: update KConfig comments]
  Link: https://lkml.kernel.org/r/20241226182918.648799-1-gourry@gourry.net
Link: https://lkml.kernel.org/r/20241220210709.300066-1-gourry@gourry.net
Signed-off-by: Gregory Price &lt;gourry@gourry.net&gt;
Acked-by: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Huacai Chen &lt;chenhuacai@kernel.org&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: "Rafael J. Wysocki" &lt;rafael@kernel.org&gt;
Cc: WANG Xuerui &lt;kernel@xen0n.name&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>crash: add a new kexec flag for hotplug support</title>
<updated>2024-04-23T04:59:01Z</updated>
<author>
<name>Sourabh Jain</name>
<email>sourabhjain@linux.ibm.com</email>
</author>
<published>2024-03-26T05:54:09Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=79365026f86948b52c3cb7bf099dded92c559b4c'/>
<id>urn:sha1:79365026f86948b52c3cb7bf099dded92c559b4c</id>
<content type='text'>
Commit a72bbec70da2 ("crash: hotplug support for kexec_load()")
introduced a new kexec flag, `KEXEC_UPDATE_ELFCOREHDR`. Kexec tool uses
this flag to indicate to the kernel that it is safe to modify the
elfcorehdr of the kdump image loaded using the kexec_load system call.

However, it is possible that architectures may need to update kexec
segments other then elfcorehdr. For example, FDT (Flatten Device Tree)
on PowerPC. Introducing a new kexec flag for every new kexec segment
may not be a good solution. Hence, a generic kexec flag bit,
`KEXEC_CRASH_HOTPLUG_SUPPORT`, is introduced to share the CPU/Memory
hotplug support intent between the kexec tool and the kernel for the
kexec_load system call.

Now we have two kexec flags that enables crash hotplug support for
kexec_load system call. First is KEXEC_UPDATE_ELFCOREHDR (only used in
x86), and second is KEXEC_CRASH_HOTPLUG_SUPPORT (for all architectures).

To simplify the process of finding and reporting the crash hotplug
support the following changes are introduced.

1. Define arch specific function to process the kexec flags and
   determine crash hotplug support

2. Rename the @update_elfcorehdr member of struct kimage to
   @hotplug_support and populate it for both kexec_load and
   kexec_file_load syscalls, because architecture can update more than
   one kexec segment

3. Let generic function crash_check_hotplug_support report hotplug
   support for loaded kdump image based on value of @hotplug_support

To bring the x86 crash hotplug support in line with the above points,
the following changes have been made:

- Introduce the arch_crash_hotplug_support function to process kexec
  flags and determine crash hotplug support

- Remove the arch_crash_hotplug_[cpu|memory]_support functions

Signed-off-by: Sourabh Jain &lt;sourabhjain@linux.ibm.com&gt;
Acked-by: Baoquan He &lt;bhe@redhat.com&gt;
Acked-by: Hari Bathini &lt;hbathini@linux.ibm.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://msgid.link/20240326055413.186534-3-sourabhjain@linux.ibm.com

</content>
</entry>
<entry>
<title>mm/memory_hotplug: introduce MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE notifiers</title>
<updated>2024-02-22T00:00:01Z</updated>
<author>
<name>Sumanth Korikkar</name>
<email>sumanthk@linux.ibm.com</email>
</author>
<published>2024-01-08T13:27:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c5f1e2d1890935a734c302b9b8579748222b8e1e'/>
<id>urn:sha1:c5f1e2d1890935a734c302b9b8579748222b8e1e</id>
<content type='text'>
Patch series "implement "memmap on memory" feature on s390".

This series provides "memmap on memory" support on s390 platform.  "memmap
on memory" allows struct pages array to be allocated from the hotplugged
memory range instead of allocating it from main system memory.

s390 currently preallocates struct pages array for all potentially
possible memory, which ensures memory onlining always succeeds, but with
the cost of significant memory consumption from the available system
memory during boottime.  In certain extreme configuration, this could lead
to ipl failure.

"memmap on memory" ensures struct pages array are populated from self
contained hotplugged memory range instead of depleting the available
system memory and this could eliminate ipl failure on s390 platform.

On other platforms, system might go OOM when the physically hotplugged
memory depletes the available memory before it is onlined.  Hence, "memmap
on memory" feature was introduced as described in commit a08a2ae34613
("mm,memory_hotplug: allocate memmap from the added memory range").

Unlike other architectures, s390 memory blocks are not physically
accessible until it is online.  To make it physically accessible two new
memory notifiers MEM_PREPARE_ONLINE / MEM_FINISH_OFFLINE are added and
this notifier lets the hypervisor inform that the memory should be made
physically accessible.  This allows for "memmap on memory" initialization
during memory hotplug onlining phase, which is performed before calling
MEM_GOING_ONLINE notifier.

Patch 1 introduces MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE memory notifiers
to prepare the transition of memory to and from a physically accessible
state.  New mhp_flag MHP_OFFLINE_INACCESSIBLE is introduced to ensure
altmap cannot be written when adding memory - before it is set online. 
This enhancement is crucial for implementing the "memmap on memory"
feature for s390 in a subsequent patch.

Patches 2 allocates vmemmap pages from self-contained memory range for
s390.  It allocates memory map (struct pages array) from the hotplugged
memory range, rather than using system memory by passing altmap to vmemmap
functions.

Patch 3 removes unhandled memory notifier types on s390.

Patch 4 implements MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE memory notifiers
on s390.  MEM_PREPARE_ONLINE memory notifier makes memory block physical
accessible via sclp assign command.  The notifier ensures self-contained
memory maps are accessible and hence enabling the "memmap on memory" on
s390.  MEM_FINISH_OFFLINE memory notifier shifts the memory block to an
inaccessible state via sclp unassign command.

Patch 5 finally enables MHP_MEMMAP_ON_MEMORY on s390.


This patch (of 5):

Introduce MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE memory notifiers to
prepare the transition of memory to and from a physically accessible
state.  This enhancement is crucial for implementing the "memmap on
memory" feature for s390 in a subsequent patch.

Platforms such as x86 can support physical memory hotplug via ACPI.  When
there is physical memory hotplug, ACPI event leads to the memory addition
with the following callchain:

acpi_memory_device_add()
  -&gt; acpi_memory_enable_device()
     -&gt; __add_memory()

After this, the hotplugged memory is physically accessible, and altmap
support prepared, before the "memmap on memory" initialization in
memory_block_online() is called.

On s390, memory hotplug works in a different way.  The available hotplug
memory has to be defined upfront in the hypervisor, but it is made
physically accessible only when the user sets it online via sysfs,
currently in the MEM_GOING_ONLINE notifier.  This is too late and "memmap
on memory" initialization is performed before calling MEM_GOING_ONLINE
notifier.

During the memory hotplug addition phase, altmap support is prepared and
during the memory onlining phase s390 requires memory to be physically
accessible and then subsequently initiate the "memmap on memory"
initialization process.

The memory provider will handle new MEM_PREPARE_ONLINE /
MEM_FINISH_OFFLINE notifications and make the memory accessible.

The mhp_flag MHP_OFFLINE_INACCESSIBLE is introduced and is relevant when
used along with MHP_MEMMAP_ON_MEMORY, because the altmap cannot be written
(e.g., poisoned) when adding memory -- before it is set online.  This
allows for adding memory with an altmap that is not currently made
available by a hypervisor.  When onlining that memory, the hypervisor can
be instructed to make that memory accessible via the new notifiers and the
onlining phase will not require any memory allocations, which is helpful
in low-memory situations.

All architectures ignore unknown memory notifiers.  Therefore, the
introduction of these new notifiers does not result in any functional
modifications across architectures.

Link: https://lkml.kernel.org/r/20240108132747.3238763-1-sumanthk@linux.ibm.com
Link: https://lkml.kernel.org/r/20240108132747.3238763-2-sumanthk@linux.ibm.com
Signed-off-by: Sumanth Korikkar &lt;sumanthk@linux.ibm.com&gt;
Suggested-by: Gerald Schaefer &lt;gerald.schaefer@linux.ibm.com&gt;
Suggested-by: David Hildenbrand &lt;david@redhat.com&gt;
Acked-by: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Alexander Gordeev &lt;agordeev@linux.ibm.com&gt;
Cc: Aneesh Kumar K.V &lt;aneesh.kumar@linux.ibm.com&gt;
Cc: Anshuman Khandual &lt;anshuman.khandual@arm.com&gt;
Cc: Heiko Carstens &lt;hca@linux.ibm.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'driver-core-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core</title>
<updated>2024-01-18T17:48:40Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2024-01-18T17:48:40Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=80955ae955d15ea5c11d55cd50032a5243a6dfd6'/>
<id>urn:sha1:80955ae955d15ea5c11d55cd50032a5243a6dfd6</id>
<content type='text'>
Pull driver core updates from Greg KH:
 "Here are the set of driver core and kernfs changes for 6.8-rc1.
  Nothing major in here this release cycle, just lots of small cleanups
  and some tweaks on kernfs that in the very end, got reverted and will
  come back in a safer way next release cycle.

  Included in here are:

   - more driver core 'const' cleanups and fixes

   - fw_devlink=rpm is now the default behavior

   - kernfs tiny changes to remove some string functions

   - cpu handling in the driver core is updated to work better on many
     systems that add topologies and cpus after booting

   - other minor changes and cleanups

  All of the cpu handling patches have been acked by the respective
  maintainers and are coming in here in one series. Everything has been
  in linux-next for a while with no reported issues"

* tag 'driver-core-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (51 commits)
  Revert "kernfs: convert kernfs_idr_lock to an irq safe raw spinlock"
  kernfs: convert kernfs_idr_lock to an irq safe raw spinlock
  class: fix use-after-free in class_register()
  PM: clk: make pm_clk_add_notifier() take a const pointer
  EDAC: constantify the struct bus_type usage
  kernfs: fix reference to renamed function
  driver core: device.h: fix Excess kernel-doc description warning
  driver core: class: fix Excess kernel-doc description warning
  driver core: mark remaining local bus_type variables as const
  driver core: container: make container_subsys const
  driver core: bus: constantify subsys_register() calls
  driver core: bus: make bus_sort_breadthfirst() take a const pointer
  kernfs: d_obtain_alias(NULL) will do the right thing...
  driver core: Better advertise dev_err_probe()
  kernfs: Convert kernfs_path_from_node_locked() from strlcpy() to strscpy()
  kernfs: Convert kernfs_name_locked() from strlcpy() to strscpy()
  kernfs: Convert kernfs_walk_ns() from strlcpy() to strscpy()
  initramfs: Expose retained initrd as sysfs file
  fs/kernfs/dir: obey S_ISGID
  kernel/cgroup: use kernfs_create_dir_ns()
  ...
</content>
</entry>
<entry>
<title>driver core: mark remaining local bus_type variables as const</title>
<updated>2023-12-21T12:56:30Z</updated>
<author>
<name>Greg Kroah-Hartman</name>
<email>gregkh@linuxfoundation.org</email>
</author>
<published>2023-12-19T15:35:09Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=580fc9c750fde7404f2d726637135fb785c67e86'/>
<id>urn:sha1:580fc9c750fde7404f2d726637135fb785c67e86</id>
<content type='text'>
Now that the driver core can properly handle constant struct bus_type,
change the local driver core bus_type variables to be a constant
structure as well, placing them into read-only memory which can not be
modified at runtime.

Cc: Ira Weiny &lt;ira.weiny@intel.com&gt;
Cc: Rafael J. Wysocki &lt;rafael@kernel.org&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Kevin Hilman &lt;khilman@kernel.org&gt;
Cc: Ulf Hansson &lt;ulf.hansson@linaro.org&gt;
Cc: Len Brown &lt;len.brown@intel.com&gt;
Acked-by: William Breathitt Gray &lt;william.gray@linaro.org&gt;
Acked-by: Dave Ertman &lt;david.m.ertman@intel.com&gt;
Link: https://lore.kernel.org/r/2023121908-paver-follow-cc21@gregkh
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
</feed>
