<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/mm/memory_hotplug.c, branch v5.8</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.8</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.8'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2020-06-26T07:27:38Z</updated>
<entry>
<title>mm/memory_hotplug.c: fix false softlockup during pfn range removal</title>
<updated>2020-06-26T07:27:38Z</updated>
<author>
<name>Ben Widawsky</name>
<email>ben.widawsky@intel.com</email>
</author>
<published>2020-06-26T03:30:51Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b7e3debdd0408c0dca5d4750371afa5003f792dc'/>
<id>urn:sha1:b7e3debdd0408c0dca5d4750371afa5003f792dc</id>
<content type='text'>
When working with very large nodes, poisoning the struct pages (for which
there will be very many) can take a very long time.  If the system is
using voluntary preemptions, the software watchdog will not be able to
detect forward progress.  This patch addresses this issue by offering to
give up time like __remove_pages() does.  This behavior was introduced in
v5.6 with: commit d33695b16a9f ("mm/memory_hotplug: poison memmap in
remove_pfn_range_from_zone()")

Alternately, init_page_poison could do this cond_resched(), but it seems
to me that the caller of init_page_poison() is what actually knows whether
or not it should relax its own priority.

Based on Dan's notes, I think this is perfectly safe: commit f931ab479dd2
("mm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done}")

Aside from fixing the lockup, it is also a friendlier thing to do on lower
core systems that might wipe out large chunks of hotplug memory (probably
not a very common case).

Fixes this kind of splat:

  watchdog: BUG: soft lockup - CPU#46 stuck for 22s! [daxctl:9922]
  irq event stamp: 138450
  hardirqs last  enabled at (138449): [&lt;ffffffffa1001f26&gt;] trace_hardirqs_on_thunk+0x1a/0x1c
  hardirqs last disabled at (138450): [&lt;ffffffffa1001f42&gt;] trace_hardirqs_off_thunk+0x1a/0x1c
  softirqs last  enabled at (138448): [&lt;ffffffffa1e00347&gt;] __do_softirq+0x347/0x456
  softirqs last disabled at (138443): [&lt;ffffffffa10c416d&gt;] irq_exit+0x7d/0xb0
  CPU: 46 PID: 9922 Comm: daxctl Not tainted 5.7.0-BEN-14238-g373c6049b336 #30
  Hardware name: Intel Corporation PURLEY/PURLEY, BIOS PLYXCRB1.86B.0578.D07.1902280810 02/28/2019
  RIP: 0010:memset_erms+0x9/0x10
  Code: c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 f3 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 &lt;f3&gt; aa 4c 89 c8 c3 90 49 89 fa 40 0f b6 ce 48 b8 01 01 01 01 01 01
  Call Trace:
   remove_pfn_range_from_zone+0x3a/0x380
   memunmap_pages+0x17f/0x280
   release_nodes+0x22a/0x260
   __device_release_driver+0x172/0x220
   device_driver_detach+0x3e/0xa0
   unbind_store+0x113/0x130
   kernfs_fop_write+0xdc/0x1c0
   vfs_write+0xde/0x1d0
   ksys_write+0x58/0xd0
   do_syscall_64+0x5a/0x120
   entry_SYSCALL_64_after_hwframe+0x49/0xb3
  Built 2 zonelists, mobility grouping on.  Total pages: 49050381
  Policy zone: Normal
  Built 3 zonelists, mobility grouping on.  Total pages: 49312525
  Policy zone: Normal

David said: "It really only is an issue for devmem.  Ordinary
hotplugged system memory is not affected (onlined/offlined in memory
block granularity)."

Link: http://lkml.kernel.org/r/20200619231213.1160351-1-ben.widawsky@intel.com
Fixes: commit d33695b16a9f ("mm/memory_hotplug: poison memmap in remove_pfn_range_from_zone()")
Signed-off-by: Ben Widawsky &lt;ben.widawsky@intel.com&gt;
Reported-by: "Scargall, Steve" &lt;steve.scargall@intel.com&gt;
Reported-by: Ben Widawsky &lt;ben.widawsky@intel.com&gt;
Acked-by: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Vishal Verma &lt;vishal.l.verma@intel.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost</title>
<updated>2020-06-10T20:42:09Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2020-06-10T20:42:09Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=09102704c67457c6cdea6c0394c34843484a852c'/>
<id>urn:sha1:09102704c67457c6cdea6c0394c34843484a852c</id>
<content type='text'>
Pull virtio updates from Michael Tsirkin:

 - virtio-mem: paravirtualized memory hotplug

 - support doorbell mapping for vdpa

 - config interrupt support in ifc

 - fixes all over the place

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (40 commits)
  vhost/test: fix up after API change
  virtio_mem: convert device block size into 64bit
  virtio-mem: drop unnecessary initialization
  ifcvf: implement config interrupt in IFCVF
  vhost: replace -1 with VHOST_FILE_UNBIND in ioctls
  vhost_vdpa: Support config interrupt in vdpa
  ifcvf: ignore continuous setting same status value
  virtio-mem: Don't rely on implicit compiler padding for requests
  virtio-mem: Try to unplug the complete online memory block first
  virtio-mem: Use -ETXTBSY as error code if the device is busy
  virtio-mem: Unplug subblocks right-to-left
  virtio-mem: Drop manual check for already present memory
  virtio-mem: Add parent resource for all added "System RAM"
  virtio-mem: Better retry handling
  virtio-mem: Offline and remove completely unplugged memory blocks
  mm/memory_hotplug: Introduce offline_and_remove_memory()
  virtio-mem: Allow to offline partially unplugged memory blocks
  mm: Allow to offline unmovable PageOffline() pages via MEM_GOING_OFFLINE
  virtio-mem: Paravirtualized memory hotunplug part 2
  virtio-mem: Paravirtualized memory hotunplug part 1
  ...
</content>
</entry>
<entry>
<title>mm/memory_hotplug: fix a typo in comment "recoreded"-&gt;"recorded"</title>
<updated>2020-06-05T02:06:23Z</updated>
<author>
<name>Ethon Paul</name>
<email>ethp@qq.com</email>
</author>
<published>2020-06-04T23:48:58Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=52cfc24578c32aa4d604fcb38e081ed0eb4cfb5c'/>
<id>urn:sha1:52cfc24578c32aa4d604fcb38e081ed0eb4cfb5c</id>
<content type='text'>
There is a typo in comment, fix it.
s/recoreded/recorded

Signed-off-by: Ethon Paul &lt;ethp@qq.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Ralph Campbell &lt;rcampbell@nvidia.com&gt;
Link: http://lkml.kernel.org/r/20200410160328.13843-1-ethp@qq.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/memory_hotplug: introduce add_memory_driver_managed()</title>
<updated>2020-06-05T02:06:23Z</updated>
<author>
<name>David Hildenbrand</name>
<email>david@redhat.com</email>
</author>
<published>2020-06-04T23:48:41Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7b7b27214bba1966772f9213cd2d8e5d67f8487f'/>
<id>urn:sha1:7b7b27214bba1966772f9213cd2d8e5d67f8487f</id>
<content type='text'>
Patch series "mm/memory_hotplug: Interface to add driver-managed system
ram", v4.

kexec (via kexec_load()) can currently not properly handle memory added
via dax/kmem, and will have similar issues with virtio-mem.  kexec-tools
will currently add all memory to the fixed-up initial firmware memmap.  In
case of dax/kmem, this means that - in contrast to a proper reboot - how
that persistent memory will be used can no longer be configured by the
kexec'd kernel.  In case of virtio-mem it will be harmful, because that
memory might contain inaccessible pieces that require coordination with
hypervisor first.

In both cases, we want to let the driver in the kexec'd kernel handle
detecting and adding the memory, like during an ordinary reboot.
Introduce add_memory_driver_managed().  More on the samentics are in patch
#1.

In the future, we might want to make this behavior configurable for
dax/kmem- either by configuring it in the kernel (which would then also
allow to configure kexec_file_load()) or in kexec-tools by also adding
"System RAM (kmem)" memory from /proc/iomem to the fixed-up initial
firmware memmap.

More on the motivation can be found in [1] and [2].

[1] https://lkml.kernel.org/r/20200429160803.109056-1-david@redhat.com
[2] https://lkml.kernel.org/r/20200430102908.10107-1-david@redhat.com

This patch (of 3):

Some device drivers rely on memory they managed to not get added to the
initial (firmware) memmap as system RAM - so it's not used as initial
system RAM by the kernel and the driver is under control.  While this is
the case during cold boot and after a reboot, kexec is not aware of that
and might add such memory to the initial (firmware) memmap of the kexec
kernel.  We need ways to teach kernel and userspace that this system ram
is different.

For example, dax/kmem allows to decide at runtime if persistent memory is
to be used as system ram.  Another future user is virtio-mem, which has to
coordinate with its hypervisor to deal with inaccessible parts within
memory resources.

We want to let users in the kernel (esp. kexec) but also user space
(esp. kexec-tools) know that this memory has different semantics and
needs to be handled differently:
1. Don't create entries in /sys/firmware/memmap/
2. Name the memory resource "System RAM ($DRIVER)" (exposed via
   /proc/iomem) ($DRIVER might be "kmem", "virtio_mem").
3. Flag the memory resource IORESOURCE_MEM_DRIVER_MANAGED

/sys/firmware/memmap/ [1] represents the "raw firmware-provided memory
map" because "on most architectures that firmware-provided memory map is
modified afterwards by the kernel itself".  The primary user is kexec on
x86-64.  Since commit d96ae5309165 ("memory-hotplug: create
/sys/firmware/memmap entry for new memory"), we add all hotplugged memory
to that firmware memmap - which makes perfect sense for traditional memory
hotplug on x86-64, where real HW will also add hotplugged DIMMs to the
firmware memmap.  We replicate what the "raw firmware-provided memory map"
looks like after hot(un)plug.

To keep things simple, let the user provide the full resource name instead
of only the driver name - this way, we don't have to manually
allocate/craft strings for memory resources.  Also use the resource name
to make decisions, to avoid passing additional flags.  In case the name
isn't "System RAM", it's special.

We don't have to worry about firmware_map_remove() on the removal path.
If there is no entry, it will simply return with -EINVAL.

We'll adapt dax/kmem in a follow-up patch.

[1] https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-firmware-memmap

Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Acked-by: Pankaj Gupta &lt;pankaj.gupta.linux@gmail.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Pankaj Gupta &lt;pankaj.gupta.linux@gmail.com&gt;
Cc: Wei Yang &lt;richard.weiyang@gmail.com&gt;
Cc: Baoquan He &lt;bhe@redhat.com&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Eric Biederman &lt;ebiederm@xmission.com&gt;
Cc: Pavel Tatashin &lt;pasha.tatashin@soleen.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Link: http://lkml.kernel.org/r/20200508084217.9160-1-david@redhat.com
Link: http://lkml.kernel.org/r/20200508084217.9160-3-david@redhat.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/memory_hotplug: handle memblocks only with CONFIG_ARCH_KEEP_MEMBLOCK</title>
<updated>2020-06-05T02:06:23Z</updated>
<author>
<name>David Hildenbrand</name>
<email>david@redhat.com</email>
</author>
<published>2020-06-04T23:48:38Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=52219aeaf2dc6f7607704af2c40e3866fb04aed2'/>
<id>urn:sha1:52219aeaf2dc6f7607704af2c40e3866fb04aed2</id>
<content type='text'>
The comment in add_memory_resource() is stale: hotadd_new_pgdat() will no
longer call get_pfn_range_for_nid(), as a hotadded pgdat will simply span
no pages at all, until memory is moved to the zone/node via
move_pfn_range_to_zone() - e.g., when onlining memory blocks.

The only archs that care about memblocks for hotplugged memory (either for
iterating over all system RAM or testing for memory validity) are arm64,
s390x, and powerpc - due to CONFIG_ARCH_KEEP_MEMBLOCK.  Without
CONFIG_ARCH_KEEP_MEMBLOCK, we can simply stop messing with memblocks.

Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Acked-by: Mike Rapoport &lt;rppt@linux.ibm.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Baoquan He &lt;bhe@redhat.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Pankaj Gupta &lt;pankaj.gupta.linux@gmail.com&gt;
Cc: Mike Rapoport &lt;rppt@linux.ibm.com&gt;
Cc: Anshuman Khandual &lt;anshuman.khandual@arm.com&gt;
Link: http://lkml.kernel.org/r/20200422155353.25381-3-david@redhat.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/memory_hotplug: set node_start_pfn of hotadded pgdat to 0</title>
<updated>2020-06-05T02:06:23Z</updated>
<author>
<name>David Hildenbrand</name>
<email>david@redhat.com</email>
</author>
<published>2020-06-04T23:48:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c68ab18c6aee0397574afb418f6775f23379198e'/>
<id>urn:sha1:c68ab18c6aee0397574afb418f6775f23379198e</id>
<content type='text'>
Patch series "mm/memory_hotplug: handle memblocks only with
CONFIG_ARCH_KEEP_MEMBLOCK", v1.

A hotadded node/pgdat will span no pages at all, until memory is moved to
the zone/node via move_pfn_range_to_zone() -&gt; resize_pgdat_range - e.g.,
when onlining memory blocks.  We don't have to initialize the
node_start_pfn to the memory we are adding.

This patch (of 2):

Especially, there is an inconsistency:
 - Hotplugging memory to a memory-less node with cpus: node_start_pf ==  0
 - Offlining and removing last memory from a node: node_start_pfn == 0
 - Hotplugging memory to a memory-less node without cpus: node_start_pfn != 0

As soon as memory is onlined, node_start_pfn is overwritten with the
actual start.  E.g., when adding two DIMMs but only onlining one of both,
only that DIMM (with online memory blocks) is spanned by the node.

Currently, the validity of node_start_pfn really is linked to
node_spanned_pages != 0.  With node_spanned_pages == 0 (e.g., before
onlining memory), it has no meaning.

So let's stop setting node_start_pfn, just to be overwritten via
move_pfn_range_to_zone().  This avoids confusion when looking at the code,
wondering which magic will be performed with the node_start_pfn in this
function, when hotadding a pgdat.

Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Acked-by: Pankaj Gupta &lt;pankaj.gupta.linux@gmail.com&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Baoquan He &lt;bhe@redhat.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Pankaj Gupta &lt;pankaj.gupta.linux@gmail.com&gt;
Cc: Anshuman Khandual &lt;anshuman.khandual@arm.com&gt;
Cc: Mike Rapoport &lt;rppt@linux.ibm.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Link: http://lkml.kernel.org/r/20200422155353.25381-1-david@redhat.com
Link: http://lkml.kernel.org/r/20200422155353.25381-2-david@redhat.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/memory_hotplug: remove is_mem_section_removable()</title>
<updated>2020-06-05T02:06:23Z</updated>
<author>
<name>David Hildenbrand</name>
<email>david@redhat.com</email>
</author>
<published>2020-06-04T23:48:31Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=04f3465c98665b7c5a3484d7194f1858954069f5'/>
<id>urn:sha1:04f3465c98665b7c5a3484d7194f1858954069f5</id>
<content type='text'>
Fortunately, all users of is_mem_section_removable() are gone.  Get rid of
it, including some now unnecessary functions.

Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Wei Yang &lt;richard.weiyang@gmail.com&gt;
Reviewed-by: Baoquan He &lt;bhe@redhat.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Link: http://lkml.kernel.org/r/20200407135416.24093-3-david@redhat.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/memory_hotplug: refrain from adding memory into an impossible node</title>
<updated>2020-06-05T02:06:23Z</updated>
<author>
<name>Vishal Verma</name>
<email>vishal.l.verma@intel.com</email>
</author>
<published>2020-06-04T23:48:25Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=fa6d9ec790550b758215b6c6fa9f940878c3e2a2'/>
<id>urn:sha1:fa6d9ec790550b758215b6c6fa9f940878c3e2a2</id>
<content type='text'>
A misbehaving qemu created a situation where the ACPI SRAT table
advertised one fewer proximity domains than intended.  The NFIT table did
describe all the expected proximity domains.  This caused the device dax
driver to assign an impossible target_node to the device, and when
hotplugged as system memory, this would fail with the following signature:

   BUG: kernel NULL pointer dereference, address: 0000000000000088
   #PF: supervisor read access in kernel mode
   #PF: error_code(0x0000) - not-present page
   PGD 80000001767d4067 P4D 80000001767d4067 PUD 10e0c4067 PMD 0
   Oops: 0000 [#1] SMP PTI
   CPU: 4 PID: 22737 Comm: kswapd3 Tainted: G           O      5.6.0-rc5 #9
   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
   RIP: 0010:prepare_kswapd_sleep+0x7c/0xc0
   Code: 89 df e8 87 fd ff ff 89 c2 31 c0 84 d2 74 e6 0f 1f 44 00 00 48 8b 05 fb af 7a 01 48 63 93 88 1d 01 00 48 8b 84 d0 20 0f 00 00 &lt;48&gt; 3b 98 88 00 00 00 75 28 f0 80 a0 80 00 00 00 fe f0 80 a3 38 20
   RSP: 0018:ffffc900017a3e78 EFLAGS: 00010202
   RAX: 0000000000000000 RBX: ffff8881209e0000 RCX: 0000000000000000
   RDX: 0000000000000003 RSI: 0000000000000000 RDI: ffff8881209e0e80
   RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000008000
   R10: 0000000000000000 R11: 0000000000000003 R12: 0000000000000003
   R13: 0000000000000003 R14: 0000000000000000 R15: ffffc900017a3ec8
   FS:  0000000000000000(0000) GS:ffff888318c00000(0000) knlGS:0000000000000000
   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
   CR2: 0000000000000088 CR3: 0000000120b50002 CR4: 00000000001606e0
   Call Trace:
    kswapd+0x103/0x520
    kthread+0x120/0x140
    ret_from_fork+0x3a/0x50

Add a check in the add_memory path to fail if the node to which we are
adding memory is in the node_possible_map

Signed-off-by: Vishal Verma &lt;vishal.l.verma@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Acked-by: David Hildenbrand &lt;david@redhat.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Link: http://lkml.kernel.org/r/20200416225438.15208-1-vishal.l.verma@intel.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/memory_hotplug: Introduce offline_and_remove_memory()</title>
<updated>2020-06-04T19:36:52Z</updated>
<author>
<name>David Hildenbrand</name>
<email>david@redhat.com</email>
</author>
<published>2020-05-07T14:01:32Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=08b3acd7a68fc17902e1cb6b146389322840deab'/>
<id>urn:sha1:08b3acd7a68fc17902e1cb6b146389322840deab</id>
<content type='text'>
virtio-mem wants to offline and remove a memory block once it unplugged
all subblocks (e.g., using alloc_contig_range()). Let's provide
an interface to do that from a driver. virtio-mem already supports to
offline partially unplugged memory blocks. Offlining a fully unplugged
memory block will not require to migrate any pages. All unplugged
subblocks are PageOffline() and have a reference count of 0 - so
offlining code will simply skip them.

All we need is an interface to offline and remove the memory from kernel
module context, where we don't have access to the memory block devices
(esp. find_memory_block() and device_offline()) and the device hotplug
lock.

To keep things simple, allow to only work on a single memory block.

Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Tested-by: Pankaj Gupta &lt;pankaj.gupta.linux@gmail.com&gt;
Acked-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Pavel Tatashin &lt;pasha.tatashin@soleen.com&gt;
Cc: Wei Yang &lt;richard.weiyang@gmail.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Qian Cai &lt;cai@lca.pw&gt;
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Link: https://lore.kernel.org/r/20200507140139.17083-9-david@redhat.com
Signed-off-by: Michael S. Tsirkin &lt;mst@redhat.com&gt;
</content>
</entry>
<entry>
<title>mm: Allow to offline unmovable PageOffline() pages via MEM_GOING_OFFLINE</title>
<updated>2020-06-04T19:36:52Z</updated>
<author>
<name>David Hildenbrand</name>
<email>david@redhat.com</email>
</author>
<published>2020-05-07T14:01:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=aa218795cb5fd583c94fc838dc76b7379dc4976a'/>
<id>urn:sha1:aa218795cb5fd583c94fc838dc76b7379dc4976a</id>
<content type='text'>
virtio-mem wants to allow to offline memory blocks of which some parts
were unplugged (allocated via alloc_contig_range()), especially, to later
offline and remove completely unplugged memory blocks. The important part
is that PageOffline() has to remain set until the section is offline, so
these pages will never get accessed (e.g., when dumping). The pages should
not be handed back to the buddy (which would require clearing PageOffline()
and result in issues if offlining fails and the pages are suddenly in the
buddy).

Let's allow to do that by allowing to isolate any PageOffline() page
when offlining. This way, we can reach the memory hotplug notifier
MEM_GOING_OFFLINE, where the driver can signal that he is fine with
offlining this page by dropping its reference count. PageOffline() pages
with a reference count of 0 can then be skipped when offlining the
pages (like if they were free, however they are not in the buddy).

Anybody who uses PageOffline() pages and does not agree to offline them
(e.g., Hyper-V balloon, XEN balloon, VMWare balloon for 2MB pages) will not
decrement the reference count and make offlining fail when trying to
migrate such an unmovable page. So there should be no observable change.
Same applies to balloon compaction users (movable PageOffline() pages), the
pages will simply be migrated.

Note 1: If offlining fails, a driver has to increment the reference
	count again in MEM_CANCEL_OFFLINE.

Note 2: A driver that makes use of this has to be aware that re-onlining
	the memory block has to be handled by hooking into onlining code
	(online_page_callback_t), resetting the page PageOffline() and
	not giving them to the buddy.

Reviewed-by: Alexander Duyck &lt;alexander.h.duyck@linux.intel.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Tested-by: Pankaj Gupta &lt;pankaj.gupta.linux@gmail.com&gt;
Acked-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Juergen Gross &lt;jgross@suse.com&gt;
Cc: Konrad Rzeszutek Wilk &lt;konrad.wilk@oracle.com&gt;
Cc: Pavel Tatashin &lt;pavel.tatashin@microsoft.com&gt;
Cc: Alexander Duyck &lt;alexander.h.duyck@linux.intel.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Anthony Yznaga &lt;anthony.yznaga@oracle.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Mike Rapoport &lt;rppt@linux.ibm.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Anshuman Khandual &lt;anshuman.khandual@arm.com&gt;
Cc: Qian Cai &lt;cai@lca.pw&gt;
Cc: Pingfan Liu &lt;kernelfans@gmail.com&gt;
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Link: https://lore.kernel.org/r/20200507140139.17083-7-david@redhat.com
Signed-off-by: Michael S. Tsirkin &lt;mst@redhat.com&gt;
</content>
</entry>
</feed>
