<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/mm, branch v5.1</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.1</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.1'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2019-04-26T16:18:05Z</updated>
<entry>
<title>mm/page_alloc.c: fix never set ALLOC_NOFRAGMENT flag</title>
<updated>2019-04-26T16:18:05Z</updated>
<author>
<name>Andrey Ryabinin</name>
<email>aryabinin@virtuozzo.com</email>
</author>
<published>2019-04-26T05:24:01Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8118b82eb756e271929697e8ada5f637dc443af1'/>
<id>urn:sha1:8118b82eb756e271929697e8ada5f637dc443af1</id>
<content type='text'>
Commit 0a79cdad5eb2 ("mm: use alloc_flags to record if kswapd can wake")
removed setting of the ALLOC_NOFRAGMENT flag.  Bring it back.

The runtime effect is that ALLOC_NOFRAGMENT behaviour is restored so
that allocations are spread across local zones to avoid fragmentation
due to mixing pageblocks as long as possible.

Link: http://lkml.kernel.org/r/20190423120806.3503-2-aryabinin@virtuozzo.com
Fixes: 0a79cdad5eb2 ("mm: use alloc_flags to record if kswapd can wake")
Signed-off-by: Andrey Ryabinin &lt;aryabinin@virtuozzo.com&gt;
Acked-by: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/page_alloc.c: avoid potential NULL pointer dereference</title>
<updated>2019-04-26T16:18:05Z</updated>
<author>
<name>Andrey Ryabinin</name>
<email>aryabinin@virtuozzo.com</email>
</author>
<published>2019-04-26T05:23:58Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8139ad043d632c0e9e12d760068a7a8e91659aa1'/>
<id>urn:sha1:8139ad043d632c0e9e12d760068a7a8e91659aa1</id>
<content type='text'>
ac.preferred_zoneref-&gt;zone passed to alloc_flags_nofragment() can be NULL.
The 'zone' pointer is unconditionally dereferenced in alloc_flags_nofragment().
Bail out on NULL zone to avoid a potential crash.  Currently we don't see
any crashes only because alloc_flags_nofragment() has another bug which
allows the compiler to optimize away all accesses to 'zone'.

Link: http://lkml.kernel.org/r/20190423120806.3503-1-aryabinin@virtuozzo.com
Fixes: 6bb154504f8b ("mm, page_alloc: spread allocations across zones before introducing fragmentation")
Signed-off-by: Andrey Ryabinin &lt;aryabinin@virtuozzo.com&gt;
Acked-by: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm, page_alloc: always use a captured page regardless of compaction result</title>
<updated>2019-04-26T16:18:05Z</updated>
<author>
<name>Mel Gorman</name>
<email>mgorman@techsingularity.net</email>
</author>
<published>2019-04-26T05:23:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ee8ab0eeb49bd3982090c8f14dc9cc65bcd13c5c'/>
<id>urn:sha1:ee8ab0eeb49bd3982090c8f14dc9cc65bcd13c5c</id>
<content type='text'>
During the development of commit 5e1f0f098b46 ("mm, compaction: capture
a page under direct compaction"), a paranoid check was added to ensure
that a captured page, if one was available after compaction, was
consistent with the final state of compaction.  The intent was to catch
serious programming bugs such as using a stale page pointer and causing
corruption problems.

However, it is possible to get a captured page even if compaction was
unsuccessful if an interrupt triggered and happened to free pages in
interrupt context that got merged into a suitable high-order page.  It's
highly unlikely but Li Wang did report the following warning on s390
occurring when testing OOM handling.  Note that the warning is slightly
edited for clarity.

  WARNING: CPU: 0 PID: 9783 at mm/page_alloc.c:3777 __alloc_pages_direct_compact+0x182/0x190
  Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs
    lockd grace fscache sunrpc pkey ghash_s390 prng xts aes_s390
    des_s390 des_generic sha512_s390 zcrypt_cex4 zcrypt vmur binfmt_misc
    ip_tables xfs libcrc32c dasd_fba_mod qeth_l2 dasd_eckd_mod dasd_mod
    qeth qdio lcs ctcm ccwgroup fsm dm_mirror dm_region_hash dm_log
    dm_mod
  CPU: 0 PID: 9783 Comm: copy.sh Kdump: loaded Not tainted 5.1.0-rc 5 #1

This patch simply removes the check entirely instead of trying to be
clever about pages freed from interrupt context.  If a serious
programming error was introduced, it is highly likely to be caught by
prep_new_page() instead.

Link: http://lkml.kernel.org/r/20190419085133.GH18914@techsingularity.net
Fixes: 5e1f0f098b46 ("mm, compaction: capture a page under direct compaction")
Signed-off-by: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Reported-by: Li Wang &lt;liwang@redhat.com&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: do not boost watermarks to avoid fragmentation for the DISCONTIG memory model</title>
<updated>2019-04-26T16:18:05Z</updated>
<author>
<name>Mel Gorman</name>
<email>mgorman@techsingularity.net</email>
</author>
<published>2019-04-26T05:23:51Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=24512228b7a3f412b5a51f189df302616b021c33'/>
<id>urn:sha1:24512228b7a3f412b5a51f189df302616b021c33</id>
<content type='text'>
Mikulas Patocka reported that commit 1c30844d2dfe ("mm: reclaim small
amounts of memory when an external fragmentation event occurs") "broke"
memory management on parisc.

The machine is not NUMA, but the DISCONTIG model creates three pgdats
for the following ranges even though it's a UMA machine:

        0) Start 0x0000000000000000 End 0x000000003fffffff Size   1024 MB
        1) Start 0x0000000100000000 End 0x00000001bfdfffff Size   3070 MB
        2) Start 0x0000004040000000 End 0x00000040ffffffff Size   3072 MB

Mikulas reported:

	With the patch 1c30844d2, the kernel will incorrectly reclaim the
	first zone when it fills up, ignoring the fact that there are two
	completely free zones. Basically, it limits cache size to 1GiB.

	For example, if I run:
	# dd if=/dev/sda of=/dev/null bs=1M count=2048

	- with the proper kernel, there should be "Buffers - 2GiB"
	when this command finishes. With the patch 1c30844d2, buffers
	will consume just 1GiB or slightly more, because the kernel was
	incorrectly reclaiming them.

The page allocator and reclaim make assumptions that pgdats really
represent NUMA nodes and zones represent ranges, and make decisions on
that basis.  Watermark boosting for small pgdats leads to unexpected
results even though this would have behaved reasonably on SPARSEMEM.

DISCONTIG is essentially deprecated and even parisc plans to move to
SPARSEMEM, so there is no need to be fancy; this patch simply disables
watermark boosting by default on DISCONTIGMEM.

Link: http://lkml.kernel.org/r/20190419094335.GJ18914@techsingularity.net
Fixes: 1c30844d2dfe ("mm: reclaim small amounts of memory when an external fragmentation event occurs")
Signed-off-by: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Reported-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Tested-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: James Bottomley &lt;James.Bottomley@hansenpartnership.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/memory_hotplug.c: drop memory device reference after find_memory_block()</title>
<updated>2019-04-26T16:18:04Z</updated>
<author>
<name>David Hildenbrand</name>
<email>david@redhat.com</email>
</author>
<published>2019-04-26T05:23:37Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=89c02e69fc5245f8a2f34b58b42d43a737af1a5e'/>
<id>urn:sha1:89c02e69fc5245f8a2f34b58b42d43a737af1a5e</id>
<content type='text'>
Right now we are using find_memory_block() to get the node id for the
pfn range to online.  We fail to drop the reference to the memory
block device.  While the device still gets unregistered via
device_unregister(), resulting in no user visible problem, the device is
never released via device_release(), resulting in a memory leak.  Fix
that by calling put_device().

Link: http://lkml.kernel.org/r/20190411110955.1430-1-david@redhat.com
Fixes: d0dc12e86b31 ("mm/memory_hotplug: optimize memory hotplug")
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Reviewed-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Reviewed-by: Wei Yang &lt;richard.weiyang@gmail.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Acked-by: Pankaj Gupta &lt;pagupta@redhat.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Pavel Tatashin &lt;pasha.tatashin@soleen.com&gt;
Cc: Qian Cai &lt;cai@lca.pw&gt;
Cc: Arun KS &lt;arunks@codeaurora.org&gt;
Cc: Mathieu Malaterre &lt;malat@debian.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge branch 'for-5.1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu</title>
<updated>2019-04-19T22:37:22Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2019-04-19T22:37:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4c3f49ae1306c05e91211c06feddfd0a4a57fabd'/>
<id>urn:sha1:4c3f49ae1306c05e91211c06feddfd0a4a57fabd</id>
<content type='text'>
Pull percpu fixlet from Dennis Zhou:
 "This stops printing the base address of percpu memory on
  initialization"

* 'for-5.1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu:
  percpu: stop printing kernel addresses
</content>
</entry>
<entry>
<title>coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping</title>
<updated>2019-04-19T16:46:05Z</updated>
<author>
<name>Andrea Arcangeli</name>
<email>aarcange@redhat.com</email>
</author>
<published>2019-04-19T00:50:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=04f5866e41fb70690e28397487d8bd8eea7d712a'/>
<id>urn:sha1:04f5866e41fb70690e28397487d8bd8eea7d712a</id>
<content type='text'>
The core dumping code has always run without holding the mmap_sem for
writing, even though that is the only way to ensure that the entire vma
layout will not change from under it.  Only using some signal
serialization on the processes belonging to the mm is not nearly enough.
This was pointed out earlier.  For example in Hugh's post from Jul 2017:

  https://lkml.kernel.org/r/alpine.LSU.2.11.1707191716030.2055@eggly.anvils

  "Not strictly relevant here, but a related note: I was very surprised
   to discover, only quite recently, how handle_mm_fault() may be called
   without down_read(mmap_sem) - when core dumping. That seems a
   misguided optimization to me, which would also be nice to correct"

In particular, because growsdown and growsup can move
vm_start/vm_end, the various loops the core dump does around the vma will
not be consistent if page faults can happen concurrently.

Pretty much all users calling mmget_not_zero()/get_task_mm() and then
taking the mmap_sem had the potential to introduce unexpected side
effects in the core dumping code.

Adding mmap_sem for writing around the -&gt;core_dump invocation is a
viable long term fix, but it requires removing all copy user and page
faults and replacing them with get_dump_page() for all binary formats,
which is not suitable as a short term fix.

For the time being this solution manually covers the places that can
confuse the core dump either by altering the vma layout or the vma flags
while it runs.  Once -&gt;core_dump runs under mmap_sem for writing the
function mmget_still_valid() can be dropped.

Allowing mmap_sem protected sections to run in parallel with the
coredump provides some minor parallelism advantage to the swapoff code
(which seems to be safe enough by never mangling any vma field and can
keep doing swapins in parallel to the core dumping) and to some other
corner case.

In order to facilitate backporting I added "Fixes: 86039bd3b4e6";
however, the side effect of this same race condition in /proc/pid/mem
should be reproducible since before 2.6.12-rc2, so I couldn't add any
other "Fixes:" because there's no hash beyond the git genesis commit.

Because find_extend_vma() is the only location outside of the process
context that could modify the "mm" structures under mmap_sem for
reading, by adding the mmget_still_valid() check to it, all other cases
that take the mmap_sem for reading don't need the new check after
mmget_not_zero()/get_task_mm().  The expand_stack() in page fault
context also doesn't need the new check, because all tasks under core
dumping are frozen.

Link: http://lkml.kernel.org/r/20190325224949.11068-1-aarcange@redhat.com
Fixes: 86039bd3b4e6 ("userfaultfd: add new syscall to provide memory externalization")
Signed-off-by: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Reported-by: Jann Horn &lt;jannh@google.com&gt;
Suggested-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Peter Xu &lt;peterx@redhat.com&gt;
Reviewed-by: Mike Rapoport &lt;rppt@linux.ibm.com&gt;
Reviewed-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Reviewed-by: Jann Horn &lt;jannh@google.com&gt;
Acked-by: Jason Gunthorpe &lt;jgg@mellanox.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/kmemleak.c: fix unused-function warning</title>
<updated>2019-04-19T16:46:05Z</updated>
<author>
<name>Arnd Bergmann</name>
<email>arnd@arndb.de</email>
</author>
<published>2019-04-19T00:50:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=dce5b0bdeec61bdbee56121ceb1d014151d5cab1'/>
<id>urn:sha1:dce5b0bdeec61bdbee56121ceb1d014151d5cab1</id>
<content type='text'>
The only references outside of the #ifdef have been removed, so now we
get a warning in non-SMP configurations:

  mm/kmemleak.c:1404:13: error: unused function 'scan_large_block' [-Werror,-Wunused-function]

Add a new #ifdef around it.

Link: http://lkml.kernel.org/r/20190416123148.3502045-1-arnd@arndb.de
Fixes: 298a32b13208 ("kmemleak: powerpc: skip scanning holes in the .bss section")
Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Acked-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Vincent Whitchurch &lt;vincent.whitchurch@axis.com&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: fix inactive list balancing between NUMA nodes and cgroups</title>
<updated>2019-04-19T16:46:05Z</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2019-04-19T00:50:34Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3b991208b897f52507168374033771a984b947b1'/>
<id>urn:sha1:3b991208b897f52507168374033771a984b947b1</id>
<content type='text'>
During !CONFIG_CGROUP reclaim, we expand the inactive list size if it's
thrashing on the node that is about to be reclaimed.  But when cgroups
are enabled, we suddenly ignore the node scope and use the cgroup scope
only.  The result is that pressure bleeds between NUMA nodes depending
on whether cgroups are merely compiled into Linux.  This behavioral
difference is unexpected and undesirable.

When the refault adaptivity of the inactive list was first introduced,
there were no statistics at the lruvec level - the intersection of node
and memcg - so it was better than nothing.

But now that we have that infrastructure, use lruvec_page_state() to
make the list balancing decision always NUMA aware.

[hannes@cmpxchg.org: fix bisection hole]
  Link: http://lkml.kernel.org/r/20190417155241.GB23013@cmpxchg.org
Link: http://lkml.kernel.org/r/20190412144438.2645-1-hannes@cmpxchg.org
Fixes: 2a2e48854d70 ("mm: vmscan: fix IO/refault regression in cache workingset transition")
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Reviewed-by: Shakeel Butt &lt;shakeelb@google.com&gt;
Cc: Roman Gushchin &lt;guro@fb.com&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/hotplug: treat CMA pages as unmovable</title>
<updated>2019-04-19T16:46:05Z</updated>
<author>
<name>Qian Cai</name>
<email>cai@lca.pw</email>
</author>
<published>2019-04-19T00:50:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1a9f219157b22d0ffb340a9c5f431afd02cd2cf3'/>
<id>urn:sha1:1a9f219157b22d0ffb340a9c5f431afd02cd2cf3</id>
<content type='text'>
has_unmovable_pages() is used when allocating CMA and gigantic pages as
well as by memory hotplug.  The latter doesn't know how to offline a CMA
pool properly now, but if an unused (free) CMA page is encountered, then
has_unmovable_pages() happily considers it free memory and
propagates this up the call chain.  Memory offlining code then frees the
page without a proper CMA teardown, which leads to accounting issues.
Moreover, if the same memory range is onlined again, the memory never
gets back to the CMA pool.

State after memory offline:

 # grep cma /proc/vmstat
 nr_free_cma 205824

 # cat /sys/kernel/debug/cma/cma-kvm_cma/count
 209920

Also, kmemleak still thinks those memory addresses are reserved, as shown
below, even though they have already been used by the buddy allocator
after onlining.  This patch fixes the situation by treating CMA pageblocks
as unmovable except when has_unmovable_pages() is called as part of CMA
allocation.

  Offlined Pages 4096
  kmemleak: Cannot insert 0xc000201f7d040008 into the object search tree (overlaps existing)
  Call Trace:
    dump_stack+0xb0/0xf4 (unreliable)
    create_object+0x344/0x380
    __kmalloc_node+0x3ec/0x860
    kvmalloc_node+0x58/0x110
    seq_read+0x41c/0x620
    __vfs_read+0x3c/0x70
    vfs_read+0xbc/0x1a0
    ksys_read+0x7c/0x140
    system_call+0x5c/0x70
  kmemleak: Kernel memory leak detector disabled
  kmemleak: Object 0xc000201cc8000000 (size 13757317120):
  kmemleak:   comm "swapper/0", pid 0, jiffies 4294937297
  kmemleak:   min_count = -1
  kmemleak:   count = 0
  kmemleak:   flags = 0x5
  kmemleak:   checksum = 0
  kmemleak:   backtrace:
       cma_declare_contiguous+0x2a4/0x3b0
       kvm_cma_reserve+0x11c/0x134
       setup_arch+0x300/0x3f8
       start_kernel+0x9c/0x6e8
       start_here_common+0x1c/0x4b0
  kmemleak: Automatic memory scanning thread ended

[cai@lca.pw: use is_migrate_cma_page() and update commit log]
  Link: http://lkml.kernel.org/r/20190416170510.20048-1-cai@lca.pw
Link: http://lkml.kernel.org/r/20190413002623.8967-1-cai@lca.pw
Signed-off-by: Qian Cai &lt;cai@lca.pw&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Reviewed-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
