<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/drivers/base/memory.c, branch v5.5</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.5</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.5'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2019-12-01T20:59:04Z</updated>
<entry>
<title>drivers/base/memory.c: drop the mem_sysfs_mutex</title>
<updated>2019-12-01T20:59:04Z</updated>
<author>
<name>David Hildenbrand</name>
<email>david@redhat.com</email>
</author>
<published>2019-12-01T01:54:14Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=848e19ad3c3352b6e0906f05b282a3e22c67c98f'/>
<id>urn:sha1:848e19ad3c3352b6e0906f05b282a3e22c67c98f</id>
<content type='text'>
The mem_sysfs_mutex isn't really helpful.  Also, it's not really clear
what the mutex protects at all.

The device lists of the memory subsystem are protected separately.  We
don't need that mutex when looking up.  creating, or removing
independent devices.  find_memory_block_by_id() will perform locking on
its own and grab a reference of the returned device.

At the time memory_dev_init() is called, we cannot have concurrent
hot(un)plug operations yet - we're still fairly early during boot.  We
don't need any locking.

The creation/removal of memory block devices should be protected on a
higher level - especially using the device hotplug lock to avoid
documented issues (see Documentation/core-api/memory-hotplug.rst) - or
if that is reworked, using similar locking.

Protecting in the context of these functions only doesn't really make
sense.  Especially, if we would have a situation where the same memory
blocks are created/deleted at the same time, there is something horribly
going wrong (imagining adding/removing a DIMM at the same time from two
call paths) - after the functions succeeded something else in the
callers would blow up (e.g., create_memory_block_devices() succeeded but
there are no memory block devices anymore).

All relevant call paths (except when adding memory early during boot via
ACPI, which is now documented) hold the device hotplug lock when adding
memory, and when removing memory.  Let's document that instead.

Add a simple safety net to create_memory_block_devices() in case we
would actually remove memory blocks while adding them, so we'll never
dereference a NULL pointer.  Simplify memory_dev_init() now that the
lock is gone.

Link: http://lkml.kernel.org/r/20190925082621.4927-1-david@redhat.com
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Reviewed-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: "Rafael J. Wysocki" &lt;rafael@kernel.org&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm, soft-offline: convert parameter to pfn</title>
<updated>2019-12-01T20:59:04Z</updated>
<author>
<name>Naoya Horiguchi</name>
<email>nao.horiguchi@gmail.com</email>
</author>
<published>2019-12-01T01:53:38Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=feec24a6139d4640c6ef344e0271a8cd4d509e60'/>
<id>urn:sha1:feec24a6139d4640c6ef344e0271a8cd4d509e60</id>
<content type='text'>
Currently soft_offline_page() receives struct page, and its sibling
memory_failure() receives pfn.  This discrepancy looks weird and makes
precheck on pfn validity tricky.  So let's align them.

Link: http://lkml.kernel.org/r/20191016234706.GA5493@www9186uo.sakura.ne.jp
Signed-off-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Acked-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/memory_hotplug: fix try_offline_node()</title>
<updated>2019-11-16T02:34:00Z</updated>
<author>
<name>David Hildenbrand</name>
<email>david@redhat.com</email>
</author>
<published>2019-11-16T01:34:57Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2c91f8fc6c999fe10185d8ad99fda1759f662f70'/>
<id>urn:sha1:2c91f8fc6c999fe10185d8ad99fda1759f662f70</id>
<content type='text'>
try_offline_node() is pretty much broken right now:

 - The node span is updated when onlining memory, not when adding it. We
   ignore memory that was mever onlined. Bad.

 - We touch possible garbage memmaps. The pfn_to_nid(pfn) can easily
   trigger a kernel panic. Bad for memory that is offline but also bad
   for subsection hotadd with ZONE_DEVICE, whereby the memmap of the
   first PFN of a section might contain garbage.

 - Sections belonging to mixed nodes are not properly considered.

As memory blocks might belong to multiple nodes, we would have to walk
all pageblocks (or at least subsections) within present sections.
However, we don't have a way to identify whether a memmap that is not
online was initialized (relevant for ZONE_DEVICE).  This makes things
more complicated.

Luckily, we can piggy pack on the node span and the nid stored in memory
blocks.  Currently, the node span is grown when calling
move_pfn_range_to_zone() - e.g., when onlining memory, and shrunk when
removing memory, before calling try_offline_node().  Sysfs links are
created via link_mem_sections(), e.g., during boot or when adding
memory.

If the node still spans memory or if any memory block belongs to the
nid, we don't set the node offline.  As memory blocks that span multiple
nodes cannot get offlined, the nid stored in memory blocks is reliable
enough (for such online memory blocks, the node still spans the memory).

Introduce for_each_memory_block() to efficiently walk all memory blocks.

Note: We will soon stop shrinking the ZONE_DEVICE zone and the node span
when removing ZONE_DEVICE memory to fix similar issues (access of
garbage memmaps) - until we have a reliable way to identify whether
these memmaps were properly initialized.  This implies later, that once
a node had ZONE_DEVICE memory, we won't be able to set a node offline -
which should be acceptable.

Since commit f1dd2cd13c4b ("mm, memory_hotplug: do not associate
hotadded memory to zones until online") memory that is added is not
assoziated with a zone/node (memmap not initialized).  The introducing
commit 60a5a19e7419 ("memory-hotplug: remove sysfs file of node")
already missed that we could have multiple nodes for a section and that
the zone/node span is updated when onlining pages, not when adding them.

I tested this by hotplugging two DIMMs to a memory-less and cpu-less
NUMA node.  The node is properly onlined when adding the DIMMs.  When
removing the DIMMs, the node is properly offlined.

Masayoshi Mizuma reported:

: Without this patch, memory hotplug fails as panic:
:
:  BUG: kernel NULL pointer dereference, address: 0000000000000000
:  ...
:  Call Trace:
:   remove_memory_block_devices+0x81/0xc0
:   try_remove_memory+0xb4/0x130
:   __remove_memory+0xa/0x20
:   acpi_memory_device_remove+0x84/0x100
:   acpi_bus_trim+0x57/0x90
:   acpi_bus_trim+0x2e/0x90
:   acpi_device_hotplug+0x2b2/0x4d0
:   acpi_hotplug_work_fn+0x1a/0x30
:   process_one_work+0x171/0x380
:   worker_thread+0x49/0x3f0
:   kthread+0xf8/0x130
:   ret_from_fork+0x35/0x40

[david@redhat.com: v3]
  Link: http://lkml.kernel.org/r/20191102120221.7553-1-david@redhat.com
Link: http://lkml.kernel.org/r/20191028105458.28320-1-david@redhat.com
Fixes: 60a5a19e7419 ("memory-hotplug: remove sysfs file of node")
Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") # visiable after d0dc12e86b319
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Tested-by: Masayoshi Mizuma &lt;m.mizuma@jp.fujitsu.com&gt;
Cc: Tang Chen &lt;tangchen@cn.fujitsu.com&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: "Rafael J. Wysocki" &lt;rafael@kernel.org&gt;
Cc: Keith Busch &lt;keith.busch@intel.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: "Peter Zijlstra (Intel)" &lt;peterz@infradead.org&gt;
Cc: Jani Nikula &lt;jani.nikula@intel.com&gt;
Cc: Nayna Jain &lt;nayna@linux.ibm.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Pavel Tatashin &lt;pasha.tatashin@soleen.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>drivers/base/memory.c: don't access uninitialized memmaps in soft_offline_page_store()</title>
<updated>2019-10-19T10:32:31Z</updated>
<author>
<name>David Hildenbrand</name>
<email>david@redhat.com</email>
</author>
<published>2019-10-19T03:19:16Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=641fe2e9387a36f9ee01d7c69382d1fe147a5e98'/>
<id>urn:sha1:641fe2e9387a36f9ee01d7c69382d1fe147a5e98</id>
<content type='text'>
Uninitialized memmaps contain garbage and in the worst case trigger kernel
BUGs, especially with CONFIG_PAGE_POISONING.  They should not get touched.

Right now, when trying to soft-offline a PFN that resides on a memory
block that was never onlined, one gets a misleading error with
CONFIG_PAGE_POISONING:

  :/# echo 5637144576 &gt; /sys/devices/system/memory/soft_offline_page
  [   23.097167] soft offline: 0x150000 page already poisoned

But the actual result depends on the garbage in the memmap.

soft_offline_page() can only work with online pages, it returns -EIO in
case of ZONE_DEVICE.  Make sure to only forward pages that are online
(iow, managed by the buddy) and, therefore, have an initialized memmap.

Add a check against pfn_to_online_page() and similarly return -EIO.

Link: http://lkml.kernel.org/r/20191010141200.8985-1-david@redhat.com
Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online")	[visible after d0dc12e86b319]
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Acked-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: "Rafael J. Wysocki" &lt;rafael@kernel.org&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[4.13+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>drivers/base/memory.c: don't store end_section_nr in memory blocks</title>
<updated>2019-09-24T22:54:09Z</updated>
<author>
<name>David Hildenbrand</name>
<email>david@redhat.com</email>
</author>
<published>2019-09-23T22:35:49Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b6c88d3b9d38f9448e0fcf44847a075ea81d5ca2'/>
<id>urn:sha1:b6c88d3b9d38f9448e0fcf44847a075ea81d5ca2</id>
<content type='text'>
Each memory block spans the same amount of sections/pages/bytes.  The size
is determined before the first memory block is created.  No need to store
what we can easily calculate - and the calculations even look simpler now.

Michal brought up the idea of variable-sized memory blocks.  However, if
we ever implement something like this, we will need an API compatibility
switch and reworks at various places (most code assumes a fixed memory
block size).  So let's cleanup what we have right now.

While at it, fix the variable naming in register_mem_sect_under_node() -
we no longer talk about a single section.

Link: http://lkml.kernel.org/r/20190809110200.2746-1-david@redhat.com
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: "Rafael J. Wysocki" &lt;rafael@kernel.org&gt;
Cc: Pavel Tatashin &lt;pasha.tatashin@soleen.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>driver/base/memory.c: validate memory block size early</title>
<updated>2019-09-24T22:54:09Z</updated>
<author>
<name>David Hildenbrand</name>
<email>david@redhat.com</email>
</author>
<published>2019-09-23T22:35:46Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=902ce63b337381092ff865f542e854ff3d0ebe2b'/>
<id>urn:sha1:902ce63b337381092ff865f542e854ff3d0ebe2b</id>
<content type='text'>
Let's validate the memory block size early, when initializing the memory
device infrastructure.  Fail hard in case the value is not suitable.

As nobody checks the return value of memory_dev_init(), turn it into a
void function and fail with a panic in all scenarios instead.  Otherwise,
we'll crash later during boot when core/drivers expect that the memory
device infrastructure (including memory_block_size_bytes()) works as
expected.

I think long term, we should move the whole memory block size
configuration (set_memory_block_size_order() and
memory_block_size_bytes()) into drivers/base/memory.c.

Link: http://lkml.kernel.org/r/20190806090142.22709-1-david@redhat.com
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: "Rafael J. Wysocki" &lt;rafael@kernel.org&gt;
Cc: Pavel Tatashin &lt;pasha.tatashin@soleen.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>drivers/base/memory.c: fixup documentation of removable/phys_index/block_size_bytes</title>
<updated>2019-09-24T22:54:09Z</updated>
<author>
<name>David Hildenbrand</name>
<email>david@redhat.com</email>
</author>
<published>2019-09-23T22:35:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f915fb7fb2c1c57be05880012b46d2edd5124797'/>
<id>urn:sha1:f915fb7fb2c1c57be05880012b46d2edd5124797</id>
<content type='text'>
Let's rephrase to memory block terminology and add some further
clarifications.

Link: http://lkml.kernel.org/r/20190806080826.5963-1-david@redhat.com
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Reviewed-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: "Rafael J. Wysocki" &lt;rafael@kernel.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>drivers/base/node.c: simplify unregister_memory_block_under_nodes()</title>
<updated>2019-09-24T22:54:09Z</updated>
<author>
<name>David Hildenbrand</name>
<email>david@redhat.com</email>
</author>
<published>2019-09-23T22:35:40Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d84f2f5a755208da3f93e17714631485cb3da11c'/>
<id>urn:sha1:d84f2f5a755208da3f93e17714631485cb3da11c</id>
<content type='text'>
We don't allow to offline memory block devices that belong to multiple
numa nodes.  Therefore, such devices can never get removed.  It is
sufficient to process a single node when removing the memory block.  No
need to iterate over each and every PFN.

We already have the nid stored for each memory block.  Make sure that the
nid always has a sane value.

Please note that checking for node_online(nid) is not required.  If we
would have a memory block belonging to a node that is no longer offline,
then we would have a BUG in the node offlining code.

Link: http://lkml.kernel.org/r/20190719135244.15242-1-david@redhat.com
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: "Rafael J. Wysocki" &lt;rafael@kernel.org&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;
Cc: Pavel Tatashin &lt;pasha.tatashin@soleen.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>drivers/base/memory.c: get rid of find_memory_block_hinted()</title>
<updated>2019-07-19T00:08:07Z</updated>
<author>
<name>David Hildenbrand</name>
<email>david@redhat.com</email>
</author>
<published>2019-07-18T22:57:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=dd625285910d3cff535fa76355e49949513918a4'/>
<id>urn:sha1:dd625285910d3cff535fa76355e49949513918a4</id>
<content type='text'>
No longer needed, let's remove it.  Also, drop the "hint" parameter
completely from "find_memory_block_by_id", as nobody needs it anymore.

[david@redhat.com: v3]
  Link: http://lkml.kernel.org/r/20190620183139.4352-7-david@redhat.com
[david@redhat.com: handle zero-length walks]
  Link: http://lkml.kernel.org/r/1c2edc22-afd7-2211-c4c7-40e54e5007e8@redhat.com
Link: http://lkml.kernel.org/r/20190614100114.311-7-david@redhat.com
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Reviewed-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Tested-by: Qian Cai &lt;cai@lca.pw&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: "Rafael J. Wysocki" &lt;rafael@kernel.org&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;
Cc: Pavel Tatashin &lt;pasha.tatashin@soleen.com&gt;
Cc: Andrew Banman &lt;andrew.banman@hpe.com&gt;
Cc: Mike Travis &lt;mike.travis@hpe.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Wei Yang &lt;richard.weiyang@gmail.com&gt;
Cc: Arun KS &lt;arunks@codeaurora.org&gt;
Cc: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/memory_hotplug: move and simplify walk_memory_blocks()</title>
<updated>2019-07-19T00:08:06Z</updated>
<author>
<name>David Hildenbrand</name>
<email>david@redhat.com</email>
</author>
<published>2019-07-18T22:57:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ea8846411ad686ff626e00bb2c3821b3db2ab56a'/>
<id>urn:sha1:ea8846411ad686ff626e00bb2c3821b3db2ab56a</id>
<content type='text'>
Let's move walk_memory_blocks() to the place where memory block logic
resides and simplify it.  While at it, add a type for the callback
function.

Link: http://lkml.kernel.org/r/20190614100114.311-6-david@redhat.com
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Reviewed-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: "Rafael J. Wysocki" &lt;rafael@kernel.org&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;
Cc: Pavel Tatashin &lt;pasha.tatashin@soleen.com&gt;
Cc: Andrew Banman &lt;andrew.banman@hpe.com&gt;
Cc: Mike Travis &lt;mike.travis@hpe.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Wei Yang &lt;richard.weiyang@gmail.com&gt;
Cc: Arun KS &lt;arunks@codeaurora.org&gt;
Cc: Qian Cai &lt;cai@lca.pw&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
