<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/mm/page_alloc.c, branch v5.8</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.8</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.8'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2020-07-03T23:15:25Z</updated>
<entry>
<title>mm/page_alloc: fix documentation error</title>
<updated>2020-07-03T23:15:25Z</updated>
<author>
<name>Joel Savitz</name>
<email>jsavitz@redhat.com</email>
</author>
<published>2020-07-03T22:15:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8beeae86b8e19a4714531f8764b68bd828ffba2a'/>
<id>urn:sha1:8beeae86b8e19a4714531f8764b68bd828ffba2a</id>
<content type='text'>
When I increased the upper bound of the min_free_kbytes value in
ee8eb9a5fe863 ("mm/page_alloc: increase default min_free_kbytes bound") I
forgot to tweak the above comment to reflect the new value.  This patch
fixes that mistake.

Signed-off-by: Joel Savitz &lt;jsavitz@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Rafael Aquini &lt;aquini@redhat.com&gt;
Cc: Fabrizio D'Angelo &lt;fdangelo@redhat.com&gt;
Link: http://lkml.kernel.org/r/20200624221236.29560-1-jsavitz@redhat.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost</title>
<updated>2020-06-10T20:42:09Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2020-06-10T20:42:09Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=09102704c67457c6cdea6c0394c34843484a852c'/>
<id>urn:sha1:09102704c67457c6cdea6c0394c34843484a852c</id>
<content type='text'>
Pull virtio updates from Michael Tsirkin:

 - virtio-mem: paravirtualized memory hotplug

 - support doorbell mapping for vdpa

 - config interrupt support in ifc

 - fixes all over the place

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (40 commits)
  vhost/test: fix up after API change
  virtio_mem: convert device block size into 64bit
  virtio-mem: drop unnecessary initialization
  ifcvf: implement config interrupt in IFCVF
  vhost: replace -1 with VHOST_FILE_UNBIND in ioctls
  vhost_vdpa: Support config interrupt in vdpa
  ifcvf: ignore continuous setting same status value
  virtio-mem: Don't rely on implicit compiler padding for requests
  virtio-mem: Try to unplug the complete online memory block first
  virtio-mem: Use -ETXTBSY as error code if the device is busy
  virtio-mem: Unplug subblocks right-to-left
  virtio-mem: Drop manual check for already present memory
  virtio-mem: Add parent resource for all added "System RAM"
  virtio-mem: Better retry handling
  virtio-mem: Offline and remove completely unplugged memory blocks
  mm/memory_hotplug: Introduce offline_and_remove_memory()
  virtio-mem: Allow to offline partially unplugged memory blocks
  mm: Allow to offline unmovable PageOffline() pages via MEM_GOING_OFFLINE
  virtio-mem: Paravirtualized memory hotunplug part 2
  virtio-mem: Paravirtualized memory hotunplug part 1
  ...
</content>
</entry>
<entry>
<title>kernel/sysctl: support handling command line aliases</title>
<updated>2020-06-08T18:05:56Z</updated>
<author>
<name>Vlastimil Babka</name>
<email>vbabka@suse.cz</email>
</author>
<published>2020-06-08T04:40:27Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0a477e1ae21b28267b9bd8599f75c115291b1666'/>
<id>urn:sha1:0a477e1ae21b28267b9bd8599f75c115291b1666</id>
<content type='text'>
We can now handle sysctl parameters on kernel command line, but
historically some parameters introduced their own command line
equivalent, which we don't want to remove for compatibility reasons.

We can, however, convert them to the generic infrastructure with a table
translating the legacy command line parameters to their sysctl names,
and removing the one-off param handlers.

This patch adds the support and makes the first conversion to
demonstrate it, on the (deprecated) numa_zonelist_order parameter.

Signed-off-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Luis Chamberlain &lt;mcgrof@kernel.org&gt;
Acked-by: Kees Cook &lt;keescook@chromium.org&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: "Eric W . Biederman" &lt;ebiederm@xmission.com&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: "Guilherme G . Piccoli" &lt;gpiccoli@canonical.com&gt;
Cc: Iurii Zaikin &lt;yzaikin@google.com&gt;
Cc: Ivan Teterevkov &lt;ivan.teterevkov@nutanix.com&gt;
Cc: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: http://lkml.kernel.org/r/20200427180433.7029-3-vbabka@suse.cz
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: Allow to offline unmovable PageOffline() pages via MEM_GOING_OFFLINE</title>
<updated>2020-06-04T19:36:52Z</updated>
<author>
<name>David Hildenbrand</name>
<email>david@redhat.com</email>
</author>
<published>2020-05-07T14:01:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=aa218795cb5fd583c94fc838dc76b7379dc4976a'/>
<id>urn:sha1:aa218795cb5fd583c94fc838dc76b7379dc4976a</id>
<content type='text'>
virtio-mem wants to allow to offline memory blocks of which some parts
were unplugged (allocated via alloc_contig_range()), especially, to later
offline and remove completely unplugged memory blocks. The important part
is that PageOffline() has to remain set until the section is offline, so
these pages will never get accessed (e.g., when dumping). The pages should
not be handed back to the buddy (which would require clearing PageOffline()
and result in issues if offlining fails and the pages are suddenly in the
buddy).

Let's allow to do that by allowing to isolate any PageOffline() page
when offlining. This way, we can reach the memory hotplug notifier
MEM_GOING_OFFLINE, where the driver can signal that he is fine with
offlining this page by dropping its reference count. PageOffline() pages
with a reference count of 0 can then be skipped when offlining the
pages (like if they were free, however they are not in the buddy).

Anybody who uses PageOffline() pages and does not agree to offline them
(e.g., Hyper-V balloon, XEN balloon, VMWare balloon for 2MB pages) will not
decrement the reference count and make offlining fail when trying to
migrate such an unmovable page. So there should be no observable change.
Same applies to balloon compaction users (movable PageOffline() pages), the
pages will simply be migrated.

Note 1: If offlining fails, a driver has to increment the reference
	count again in MEM_CANCEL_OFFLINE.

Note 2: A driver that makes use of this has to be aware that re-onlining
	the memory block has to be handled by hooking into onlining code
	(online_page_callback_t), resetting the page PageOffline() and
	not giving them to the buddy.

Reviewed-by: Alexander Duyck &lt;alexander.h.duyck@linux.intel.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Tested-by: Pankaj Gupta &lt;pankaj.gupta.linux@gmail.com&gt;
Acked-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Juergen Gross &lt;jgross@suse.com&gt;
Cc: Konrad Rzeszutek Wilk &lt;konrad.wilk@oracle.com&gt;
Cc: Pavel Tatashin &lt;pavel.tatashin@microsoft.com&gt;
Cc: Alexander Duyck &lt;alexander.h.duyck@linux.intel.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Anthony Yznaga &lt;anthony.yznaga@oracle.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Mike Rapoport &lt;rppt@linux.ibm.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Anshuman Khandual &lt;anshuman.khandual@arm.com&gt;
Cc: Qian Cai &lt;cai@lca.pw&gt;
Cc: Pingfan Liu &lt;kernelfans@gmail.com&gt;
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Link: https://lore.kernel.org/r/20200507140139.17083-7-david@redhat.com
Signed-off-by: Michael S. Tsirkin &lt;mst@redhat.com&gt;
</content>
</entry>
<entry>
<title>virtio-mem: Paravirtualized memory hotunplug part 2</title>
<updated>2020-06-04T19:36:52Z</updated>
<author>
<name>David Hildenbrand</name>
<email>david@redhat.com</email>
</author>
<published>2020-05-07T14:01:29Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=255f598507083905995ecab96392770ae03aac7f'/>
<id>urn:sha1:255f598507083905995ecab96392770ae03aac7f</id>
<content type='text'>
We also want to unplug online memory (contained in online memory blocks
and, therefore, managed by the buddy), and eventually replug it later.

When requested to unplug memory, we use alloc_contig_range() to allocate
subblocks in online memory blocks (so we are the owner) and send them to
our hypervisor. When requested to plug memory, we can replug such memory
using free_contig_range() after asking our hypervisor.

We also want to mark all allocated pages PG_offline, so nobody will
touch them. To differentiate pages that were never onlined when
onlining the memory block from pages allocated via alloc_contig_range(), we
use PageDirty(). Based on this flag, virtio_mem_fake_online() can either
online the pages for the first time or use free_contig_range().

It is worth noting that there are no guarantees on how much memory can
actually get unplugged again. All device memory might completely be
fragmented with unmovable data, such that no subblock can get unplugged.

We are not touching the ZONE_MOVABLE. If memory is onlined to the
ZONE_MOVABLE, it can only get unplugged after that memory was offlined
manually by user space. In normal operation, virtio-mem memory is
suggested to be onlined to ZONE_NORMAL. In the future, we will try to
make unplug more likely to succeed.

Add a module parameter to control if online memory shall be touched.

As we want to access alloc_contig_range()/free_contig_range() from
kernel module context, export the symbols.

Note: Whenever virtio-mem uses alloc_contig_range(), all affected pages
are on the same node, in the same zone, and contain no holes.

Acked-by: Michal Hocko &lt;mhocko@suse.com&gt; # to export contig range allocator API
Tested-by: Pankaj Gupta &lt;pankaj.gupta.linux@gmail.com&gt;
Cc: "Michael S. Tsirkin" &lt;mst@redhat.com&gt;
Cc: Jason Wang &lt;jasowang@redhat.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Igor Mammedov &lt;imammedo@redhat.com&gt;
Cc: Dave Young &lt;dyoung@redhat.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Pavel Tatashin &lt;pasha.tatashin@soleen.com&gt;
Cc: Stefan Hajnoczi &lt;stefanha@redhat.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Mike Rapoport &lt;rppt@linux.ibm.com&gt;
Cc: Alexander Duyck &lt;alexander.h.duyck@linux.intel.com&gt;
Cc: Alexander Potapenko &lt;glider@google.com&gt;
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Link: https://lore.kernel.org/r/20200507140139.17083-6-david@redhat.com
Signed-off-by: Michael S. Tsirkin &lt;mst@redhat.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'akpm' (patches from Andrew)</title>
<updated>2020-06-04T03:24:15Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2020-06-04T03:24:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ee01c4d72adffb7d424535adf630f2955748fa8b'/>
<id>urn:sha1:ee01c4d72adffb7d424535adf630f2955748fa8b</id>
<content type='text'>
Merge more updates from Andrew Morton:
 "More mm/ work, plenty more to come

  Subsystems affected by this patch series: slub, memcg, gup, kasan,
  pagealloc, hugetlb, vmscan, tools, mempolicy, memblock, hugetlbfs,
  thp, mmap, kconfig"

* akpm: (131 commits)
  arm64: mm: use ARCH_HAS_DEBUG_WX instead of arch defined
  x86: mm: use ARCH_HAS_DEBUG_WX instead of arch defined
  riscv: support DEBUG_WX
  mm: add DEBUG_WX support
  drivers/base/memory.c: cache memory blocks in xarray to accelerate lookup
  mm/thp: rename pmd_mknotpresent() as pmd_mkinvalid()
  powerpc/mm: drop platform defined pmd_mknotpresent()
  mm: thp: don't need to drain lru cache when splitting and mlocking THP
  hugetlbfs: get unmapped area below TASK_UNMAPPED_BASE for hugetlbfs
  sparc32: register memory occupied by kernel as memblock.memory
  include/linux/memblock.h: fix minor typo and unclear comment
  mm, mempolicy: fix up gup usage in lookup_node
  tools/vm/page_owner_sort.c: filter out unneeded line
  mm: swap: memcg: fix memcg stats for huge pages
  mm: swap: fix vmstats for huge pages
  mm: vmscan: limit the range of LRU type balancing
  mm: vmscan: reclaim writepage is IO cost
  mm: vmscan: determine anon/file pressure balance at the reclaim root
  mm: balance LRU lists based on relative thrashing
  mm: only count actual rotations as LRU reclaim cost
  ...
</content>
</entry>
<entry>
<title>mm/vmscan.c: change prototype for shrink_page_list</title>
<updated>2020-06-04T03:09:47Z</updated>
<author>
<name>Maninder Singh</name>
<email>maninder1.s@samsung.com</email>
</author>
<published>2020-06-03T23:01:18Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=730ec8c01a2bd6a311ada404398f44c142ac5e8e'/>
<id>urn:sha1:730ec8c01a2bd6a311ada404398f44c142ac5e8e</id>
<content type='text'>
commit 3c710c1ad11b ("mm, vmscan extract shrink_page_list reclaim counters
into a struct") changed data type for the function, so changing return
type for funciton and its caller.

Signed-off-by: Vaneet Narang &lt;v.narang@samsung.com&gt;
Signed-off-by: Maninder Singh &lt;maninder1.s@samsung.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Amit Sahrawat &lt;a.sahrawat@samsung.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Link: http://lkml.kernel.org/r/1588168259-25604-1-git-send-email-maninder1.s@samsung.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/page_alloc.c: add missing newline</title>
<updated>2020-06-04T03:09:45Z</updated>
<author>
<name>Chen Tao</name>
<email>chentao107@huawei.com</email>
</author>
<published>2020-06-03T23:00:02Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=633bf2fe8da0520c74ee73e9eb82d78948c3b3cc'/>
<id>urn:sha1:633bf2fe8da0520c74ee73e9eb82d78948c3b3cc</id>
<content type='text'>
Add missing line breaks on pr_warn().

Signed-off-by: Chen Tao &lt;chentao107@huawei.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Link: http://lkml.kernel.org/r/20200603063547.235825-1-chentao107@huawei.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: make deferred init's max threads arch-specific</title>
<updated>2020-06-04T03:09:45Z</updated>
<author>
<name>Daniel Jordan</name>
<email>daniel.m.jordan@oracle.com</email>
</author>
<published>2020-06-03T22:59:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ecd096506922332fdb36ff1211e03601befe6e18'/>
<id>urn:sha1:ecd096506922332fdb36ff1211e03601befe6e18</id>
<content type='text'>
Using padata during deferred init has only been tested on x86, so for now
limit it to this architecture.

If another arch wants this, it can find the max thread limit that's best
for it and override deferred_page_init_max_threads().

Signed-off-by: Daniel Jordan &lt;daniel.m.jordan@oracle.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Tested-by: Josh Triplett &lt;josh@joshtriplett.org&gt;
Cc: Alexander Duyck &lt;alexander.h.duyck@linux.intel.com&gt;
Cc: Alex Williamson &lt;alex.williamson@redhat.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Cc: Jason Gunthorpe &lt;jgg@ziepe.ca&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Kirill Tkhai &lt;ktkhai@virtuozzo.com&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Pavel Machek &lt;pavel@ucw.cz&gt;
Cc: Pavel Tatashin &lt;pasha.tatashin@soleen.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Cc: Robert Elliott &lt;elliott@hpe.com&gt;
Cc: Shile Zhang &lt;shile.zhang@linux.alibaba.com&gt;
Cc: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
Cc: Steven Sistare &lt;steven.sistare@oracle.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Zi Yan &lt;ziy@nvidia.com&gt;
Link: http://lkml.kernel.org/r/20200527173608.2885243-8-daniel.m.jordan@oracle.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: parallelize deferred_init_memmap()</title>
<updated>2020-06-04T03:09:45Z</updated>
<author>
<name>Daniel Jordan</name>
<email>daniel.m.jordan@oracle.com</email>
</author>
<published>2020-06-03T22:59:51Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e44431498f5fbf427f139aa413cf381b4fa3a600'/>
<id>urn:sha1:e44431498f5fbf427f139aa413cf381b4fa3a600</id>
<content type='text'>
Deferred struct page init is a significant bottleneck in kernel boot.
Optimizing it maximizes availability for large-memory systems and allows
spinning up short-lived VMs as needed without having to leave them
running.  It also benefits bare metal machines hosting VMs that are
sensitive to downtime.  In projects such as VMM Fast Restart[1], where
guest state is preserved across kexec reboot, it helps prevent application
and network timeouts in the guests.

Multithread to take full advantage of system memory bandwidth.

The maximum number of threads is capped at the number of CPUs on the node
because speedups always improve with additional threads on every system
tested, and at this phase of boot, the system is otherwise idle and
waiting on page init to finish.

Helper threads operate on section-aligned ranges to both avoid false
sharing when setting the pageblock's migrate type and to avoid accessing
uninitialized buddy pages, though max order alignment is enough for the
latter.

The minimum chunk size is also a section.  There was benefit to using
multiple threads even on relatively small memory (1G) systems, and this is
the smallest size that the alignment allows.

The time (milliseconds) is the slowest node to initialize since boot
blocks until all nodes finish.  intel_pstate is loaded in active mode
without hwp and with turbo enabled, and intel_idle is active as well.

    Intel(R) Xeon(R) Platinum 8167M CPU @ 2.00GHz (Skylake, bare metal)
      2 nodes * 26 cores * 2 threads = 104 CPUs
      384G/node = 768G memory

                   kernel boot                 deferred init
                   ------------------------    ------------------------
    node% (thr)    speedup  time_ms (stdev)    speedup  time_ms (stdev)
          (  0)         --   4089.7 (  8.1)         --   1785.7 (  7.6)
       2% (  1)       1.7%   4019.3 (  1.5)       3.8%   1717.7 ( 11.8)
      12% (  6)      34.9%   2662.7 (  2.9)      79.9%    359.3 (  0.6)
      25% ( 13)      39.9%   2459.0 (  3.6)      91.2%    157.0 (  0.0)
      37% ( 19)      39.2%   2485.0 ( 29.7)      90.4%    172.0 ( 28.6)
      50% ( 26)      39.3%   2482.7 ( 25.7)      90.3%    173.7 ( 30.0)
      75% ( 39)      39.0%   2495.7 (  5.5)      89.4%    190.0 (  1.0)
     100% ( 52)      40.2%   2443.7 (  3.8)      92.3%    138.0 (  1.0)

    Intel(R) Xeon(R) CPU E5-2699C v4 @ 2.20GHz (Broadwell, kvm guest)
      1 node * 16 cores * 2 threads = 32 CPUs
      192G/node = 192G memory

                   kernel boot                 deferred init
                   ------------------------    ------------------------
    node% (thr)    speedup  time_ms (stdev)    speedup  time_ms (stdev)
          (  0)         --   1988.7 (  9.6)         --   1096.0 ( 11.5)
       3% (  1)       1.1%   1967.0 ( 17.6)       0.3%   1092.7 ( 11.0)
      12% (  4)      41.1%   1170.3 ( 14.2)      73.8%    287.0 (  3.6)
      25% (  8)      47.1%   1052.7 ( 21.9)      83.9%    177.0 ( 13.5)
      38% ( 12)      48.9%   1016.3 ( 12.1)      86.8%    144.7 (  1.5)
      50% ( 16)      48.9%   1015.7 (  8.1)      87.8%    134.0 (  4.4)
      75% ( 24)      49.1%   1012.3 (  3.1)      88.1%    130.3 (  2.3)
     100% ( 32)      49.5%   1004.0 (  5.3)      88.5%    125.7 (  2.1)

    Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz (Haswell, bare metal)
      2 nodes * 18 cores * 2 threads = 72 CPUs
      128G/node = 256G memory

                   kernel boot                 deferred init
                   ------------------------    ------------------------
    node% (thr)    speedup  time_ms (stdev)    speedup  time_ms (stdev)
          (  0)         --   1680.0 (  4.6)         --    627.0 (  4.0)
       3% (  1)       0.3%   1675.7 (  4.5)      -0.2%    628.0 (  3.6)
      11% (  4)      25.6%   1250.7 (  2.1)      67.9%    201.0 (  0.0)
      25% (  9)      30.7%   1164.0 ( 17.3)      81.8%    114.3 ( 17.7)
      36% ( 13)      31.4%   1152.7 ( 10.8)      84.0%    100.3 ( 17.9)
      50% ( 18)      31.5%   1150.7 (  9.3)      83.9%    101.0 ( 14.1)
      75% ( 27)      31.7%   1148.0 (  5.6)      84.5%     97.3 (  6.4)
     100% ( 36)      32.0%   1142.3 (  4.0)      85.6%     90.0 (  1.0)

    AMD EPYC 7551 32-Core Processor (Zen, kvm guest)
      1 node * 8 cores * 2 threads = 16 CPUs
      64G/node = 64G memory

                   kernel boot                 deferred init
                   ------------------------    ------------------------
    node% (thr)    speedup  time_ms (stdev)    speedup  time_ms (stdev)
          (  0)         --   1029.3 ( 25.1)         --    240.7 (  1.5)
       6% (  1)      -0.6%   1036.0 (  7.8)      -2.2%    246.0 (  0.0)
      12% (  2)      11.8%    907.7 (  8.6)      44.7%    133.0 (  1.0)
      25% (  4)      13.9%    886.0 ( 10.6)      62.6%     90.0 (  6.0)
      38% (  6)      17.8%    845.7 ( 14.2)      69.1%     74.3 (  3.8)
      50% (  8)      16.8%    856.0 ( 22.1)      72.9%     65.3 (  5.7)
      75% ( 12)      15.4%    871.0 ( 29.2)      79.8%     48.7 (  7.4)
     100% ( 16)      21.0%    813.7 ( 21.0)      80.5%     47.0 (  5.2)

Server-oriented distros that enable deferred page init sometimes run in
small VMs, and they still benefit even though the fraction of boot time
saved is smaller:

    AMD EPYC 7551 32-Core Processor (Zen, kvm guest)
      1 node * 2 cores * 2 threads = 4 CPUs
      16G/node = 16G memory

                   kernel boot                 deferred init
                   ------------------------    ------------------------
    node% (thr)    speedup  time_ms (stdev)    speedup  time_ms (stdev)
          (  0)         --    716.0 ( 14.0)         --     49.7 (  0.6)
      25% (  1)       1.8%    703.0 (  5.3)      -4.0%     51.7 (  0.6)
      50% (  2)       1.6%    704.7 (  1.2)      43.0%     28.3 (  0.6)
      75% (  3)       2.7%    696.7 ( 13.1)      49.7%     25.0 (  0.0)
     100% (  4)       4.1%    687.0 ( 10.4)      55.7%     22.0 (  0.0)

    Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz (Haswell, kvm guest)
      1 node * 2 cores * 2 threads = 4 CPUs
      14G/node = 14G memory

                   kernel boot                 deferred init
                   ------------------------    ------------------------
    node% (thr)    speedup  time_ms (stdev)    speedup  time_ms (stdev)
          (  0)         --    787.7 (  6.4)         --    122.3 (  0.6)
      25% (  1)       0.2%    786.3 ( 10.8)      -2.5%    125.3 (  2.1)
      50% (  2)       5.9%    741.0 ( 13.9)      37.6%     76.3 ( 19.7)
      75% (  3)       8.3%    722.0 ( 19.0)      49.9%     61.3 (  3.2)
     100% (  4)       9.3%    714.7 (  9.5)      56.4%     53.3 (  1.5)

On Josh's 96-CPU and 192G memory system:

    Without this patch series:
    [    0.487132] node 0 initialised, 23398907 pages in 292ms
    [    0.499132] node 1 initialised, 24189223 pages in 304ms
    ...
    [    0.629376] Run /sbin/init as init process

    With this patch series:
    [    0.231435] node 1 initialised, 24189223 pages in 32ms
    [    0.236718] node 0 initialised, 23398907 pages in 36ms

[1] https://static.sched.com/hosted_files/kvmforum2019/66/VMM-fast-restart_kvmforum2019.pdf

Signed-off-by: Daniel Jordan &lt;daniel.m.jordan@oracle.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Tested-by: Josh Triplett &lt;josh@joshtriplett.org&gt;
Reviewed-by: Alexander Duyck &lt;alexander.h.duyck@linux.intel.com&gt;
Cc: Alex Williamson &lt;alex.williamson@redhat.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Cc: Jason Gunthorpe &lt;jgg@ziepe.ca&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Kirill Tkhai &lt;ktkhai@virtuozzo.com&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Pavel Machek &lt;pavel@ucw.cz&gt;
Cc: Pavel Tatashin &lt;pasha.tatashin@soleen.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Cc: Robert Elliott &lt;elliott@hpe.com&gt;
Cc: Shile Zhang &lt;shile.zhang@linux.alibaba.com&gt;
Cc: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
Cc: Steven Sistare &lt;steven.sistare@oracle.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Zi Yan &lt;ziy@nvidia.com&gt;
Link: http://lkml.kernel.org/r/20200527173608.2885243-7-daniel.m.jordan@oracle.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
