<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/mm, branch v4.5</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v4.5</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v4.5'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2016-03-12T00:17:47Z</updated>
<entry>
<title>mm/mempool: avoid KASAN marking mempool poison checks as use-after-free</title>
<updated>2016-03-12T00:17:47Z</updated>
<author>
<name>Matthew Dawson</name>
<email>matthew@mjdsystems.ca</email>
</author>
<published>2016-03-11T21:08:07Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7640131032db9118a78af715ac77ba2debeeb17c'/>
<id>urn:sha1:7640131032db9118a78af715ac77ba2debeeb17c</id>
<content type='text'>
When removing an element from the mempool, mark it as unpoisoned in KASAN
before verifying its contents for SLUB/SLAB debugging.  Otherwise KASAN
will flag the reads checking the element use-after-free writes as
use-after-free reads.

Signed-off-by: Matthew Dawson &lt;matthew@mjdsystems.ca&gt;
Acked-by: Andrey Ryabinin &lt;aryabinin@virtuozzo.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/hugetlb: use EOPNOTSUPP in hugetlb sysctl handlers</title>
<updated>2016-03-09T23:43:42Z</updated>
<author>
<name>Jan Stancek</name>
<email>jstancek@redhat.com</email>
</author>
<published>2016-03-09T22:08:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=86613628b3d367743f71b945c203774c522404f4'/>
<id>urn:sha1:86613628b3d367743f71b945c203774c522404f4</id>
<content type='text'>
Replace ENOTSUPP with EOPNOTSUPP.  If hugepages are not supported, this
value is propagated to userspace.  EOPNOTSUPP is part of uapi and is
widely supported by libc libraries.

It gives nicer message to user, rather than:

  # cat /proc/sys/vm/nr_hugepages
  cat: /proc/sys/vm/nr_hugepages: Unknown error 524

And also LTP's proc01 test was failing because this ret code (524)
was unexpected:

  proc01      1  TFAIL  :  proc01.c:396: read failed: /proc/sys/vm/nr_hugepages: errno=???(524): Unknown error 524
  proc01      2  TFAIL  :  proc01.c:396: read failed: /proc/sys/vm/nr_hugepages_mempolicy: errno=???(524): Unknown error 524
  proc01      3  TFAIL  :  proc01.c:396: read failed: /proc/sys/vm/nr_overcommit_hugepages: errno=???(524): Unknown error 524

Signed-off-by: Jan Stancek &lt;jstancek@redhat.com&gt;
Acked-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Acked-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Acked-by: David Rientjes &lt;rientjes@google.com&gt;
Acked-by: Hillf Danton &lt;hillf.zj@alibaba-inc.com&gt;
Cc: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Paul Gortmaker &lt;paul.gortmaker@windriver.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm, thp: fix migration of PTE-mapped transparent huge pages</title>
<updated>2016-03-09T23:43:42Z</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2016-03-09T22:08:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0a2e280b6d8ea4afef07c749070705d6af403b7f'/>
<id>urn:sha1:0a2e280b6d8ea4afef07c749070705d6af403b7f</id>
<content type='text'>
We don't have native support of THP migration, so we have to split huge
page into small pages in order to migrate it to different node.  This
includes PTE-mapped huge pages.

I made mistake in refcounting patchset: we don't actually split
PTE-mapped huge page in queue_pages_pte_range(), if we step on head
page.

The result is that the head page is queued for migration, but none of
tail pages: putting head page on queue takes pin on the page and any
subsequent attempts of split_huge_pages() would fail and we skip queuing
tail pages.

unmap_and_move_huge_page() will eventually split the huge pages, but
only one of 512 pages would get migrated.

Let's fix the situation.

Fixes: 248db92da13f2507 ("migrate_pages: try to split pages on queuing")
Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>kasan: add functions to clear stack poison</title>
<updated>2016-03-09T23:43:42Z</updated>
<author>
<name>Mark Rutland</name>
<email>mark.rutland@arm.com</email>
</author>
<published>2016-03-09T22:08:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e3ae116339f9a0c77523abc95e338fa405946e07'/>
<id>urn:sha1:e3ae116339f9a0c77523abc95e338fa405946e07</id>
<content type='text'>
Functions which the compiler has instrumented for ASAN place poison on
the stack shadow upon entry and remove this poison prior to returning.

In some cases (e.g. hotplug and idle), CPUs may exit the kernel a
number of levels deep in C code.  If there are any instrumented
functions on this critical path, these will leave portions of the idle
thread stack shadow poisoned.

If a CPU returns to the kernel via a different path (e.g. a cold
entry), then depending on stack frame layout subsequent calls to
instrumented functions may use regions of the stack with stale poison,
resulting in (spurious) KASAN splats to the console.

Contemporary GCCs always add stack shadow poisoning when ASAN is
enabled, even when asked to not instrument a function [1], so we can't
simply annotate functions on the critical path to avoid poisoning.

Instead, this series explicitly removes any stale poison before it can
be hit.  In the common hotplug case we clear the entire stack shadow in
common code, before a CPU is brought online.

On architectures which perform a cold return as part of cpu idle may
retain an architecture-specific amount of stack contents.  To retain the
poison for this retained context, the arch code must call the core KASAN
code, passing a "watermark" stack pointer value beyond which shadow will
be cleared.  Architectures which don't perform a cold return as part of
idle do not need any additional code.

This patch (of 3):

Functions which the compiler has instrumented for KASAN place poison on
the stack shadow upon entry and remove this poision prior to returning.

In some cases (e.g.  hotplug and idle), CPUs may exit the kernel a number
of levels deep in C code.  If there are any instrumented functions on this
critical path, these will leave portions of the stack shadow poisoned.

If a CPU returns to the kernel via a different path (e.g.  a cold entry),
then depending on stack frame layout subsequent calls to instrumented
functions may use regions of the stack with stale poison, resulting in
(spurious) KASAN splats to the console.

To avoid this, we must clear stale poison from the stack prior to
instrumented functions being called.  This patch adds functions to the
KASAN core for removing poison from (portions of) a task's stack.  These
will be used by subsequent patches to avoid problems with hotplug and
idle.

Signed-off-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Acked-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Reviewed-by: Andrey Ryabinin &lt;aryabinin@virtuozzo.com&gt;
Cc: Alexander Potapenko &lt;glider@google.com&gt;
Cc: Lorenzo Pieralisi &lt;lorenzo.pieralisi@arm.com&gt;
Cc: Will Deacon &lt;will.deacon@arm.com&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: __delete_from_page_cache show Bad page if mapped</title>
<updated>2016-03-09T23:43:42Z</updated>
<author>
<name>Hugh Dickins</name>
<email>hughd@google.com</email>
</author>
<published>2016-03-09T22:08:07Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=06b241f32c711d7ca868a0351dd97fe91fd8817b'/>
<id>urn:sha1:06b241f32c711d7ca868a0351dd97fe91fd8817b</id>
<content type='text'>
Commit e1534ae95004 ("mm: differentiate page_mapped() from
page_mapcount() for compound pages") changed the famous
BUG_ON(page_mapped(page)) in __delete_from_page_cache() to
VM_BUG_ON_PAGE(page_mapped(page)): which gives us more info when
CONFIG_DEBUG_VM=y, but nothing at all when not.

Although it has not usually been very helpul, being hit long after the
error in question, we do need to know if it actually happens on users'
systems; but reinstating a crash there is likely to be opposed :)

In the non-debug case, pr_alert("BUG: Bad page cache") plus dump_page(),
dump_stack(), add_taint() - I don't really believe LOCKDEP_NOW_UNRELIABLE,
but that seems to be the standard procedure now.  Move that, or the
VM_BUG_ON_PAGE(), up before the deletion from tree: so that the
unNULLified page-&gt;mapping gives a little more information.

If the inode is being evicted (rather than truncated), it won't have any
vmas left, so it's safe(ish) to assume that the raised mapcount is
erroneous, and we can discount it from page_count to avoid leaking the
page (I'm less worried by leaking the occasional 4kB, than losing a
potential 2MB page with each 4kB page leaked).

Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Acked-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Sasha Levin &lt;sasha.levin@oracle.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/hugetlb: hugetlb_no_page: rate-limit warning message</title>
<updated>2016-03-09T23:43:42Z</updated>
<author>
<name>Geoffrey Thomas</name>
<email>geofft@ldpreload.com</email>
</author>
<published>2016-03-09T22:08:04Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=910154d520c97cd0095a889e6b878041c91111a6'/>
<id>urn:sha1:910154d520c97cd0095a889e6b878041c91111a6</id>
<content type='text'>
The warning message "killed due to inadequate hugepage pool" simply
indicates that SIGBUS was sent, not that the process was forcibly killed.
If the process has a signal handler installed does not fix the problem,
this message can rapidly spam the kernel log.

On my amd64 dev machine that does not have hugepages configured, I can
reproduce the repeated warnings easily by setting vm.nr_hugepages=2 (i.e.,
4 megabytes of huge pages) and running something that sets a signal
handler and forks, like

  #include &lt;sys/mman.h&gt;
  #include &lt;signal.h&gt;
  #include &lt;stdlib.h&gt;
  #include &lt;unistd.h&gt;

  sig_atomic_t counter = 10;
  void handler(int signal)
  {
      if (counter-- == 0)
         exit(0);
  }

  int main(void)
  {
      int status;
      char *addr = mmap(NULL, 4 * 1048576, PROT_READ | PROT_WRITE,
              MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
      if (addr == MAP_FAILED) {perror("mmap"); return 1;}
      *addr = 'x';
      switch (fork()) {
         case -1:
            perror("fork"); return 1;
         case 0:
            signal(SIGBUS, handler);
            *addr = 'x';
            break;
         default:
            *addr = 'x';
            wait(&amp;status);
            if (WIFSIGNALED(status)) {
               psignal(WTERMSIG(status), "child");
            }
            break;
      }
  }

Signed-off-by: Geoffrey Thomas &lt;geofft@ldpreload.com&gt;
Cc: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: Hillf Danton &lt;hillf.zj@alibaba-inc.com&gt;
Cc: "Kirill A. Shutemov" &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>dax: move writeback calls into the filesystems</title>
<updated>2016-02-27T18:28:52Z</updated>
<author>
<name>Ross Zwisler</name>
<email>ross.zwisler@linux.intel.com</email>
</author>
<published>2016-02-26T23:19:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7f6d5b529b7dfe2fca30cbf4bc81e16575090025'/>
<id>urn:sha1:7f6d5b529b7dfe2fca30cbf4bc81e16575090025</id>
<content type='text'>
Previously calls to dax_writeback_mapping_range() for all DAX filesystems
(ext2, ext4 &amp; xfs) were centralized in filemap_write_and_wait_range().

dax_writeback_mapping_range() needs a struct block_device, and it used
to get that from inode-&gt;i_sb-&gt;s_bdev.  This is correct for normal inodes
mounted on ext2, ext4 and XFS filesystems, but is incorrect for DAX raw
block devices and for XFS real-time files.

Instead, call dax_writeback_mapping_range() directly from the filesystem
-&gt;writepages function so that it can supply us with a valid block
device.  This also fixes DAX code to properly flush caches in response
to sync(2).

Signed-off-by: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Cc: Al Viro &lt;viro@ftp.linux.org.uk&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Jens Axboe &lt;axboe@fb.com&gt;
Cc: Matthew Wilcox &lt;matthew.r.wilcox@intel.com&gt;
Cc: Theodore Ts'o &lt;tytso@mit.edu&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: numa: quickly fail allocations for NUMA balancing on full nodes</title>
<updated>2016-02-27T18:28:52Z</updated>
<author>
<name>Mel Gorman</name>
<email>mgorman@techsingularity.net</email>
</author>
<published>2016-02-26T23:19:31Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8479eba7781fa9ffb28268840de6facfc12c35a7'/>
<id>urn:sha1:8479eba7781fa9ffb28268840de6facfc12c35a7</id>
<content type='text'>
Commit 4167e9b2cf10 ("mm: remove GFP_THISNODE") removed the GFP_THISNODE
flag combination due to confusing semantics.  It noted that
alloc_misplaced_dst_page() was one such user after changes made by
commit e97ca8e5b864 ("mm: fix GFP_THISNODE callers and clarify").

Unfortunately when GFP_THISNODE was removed, users of
alloc_misplaced_dst_page() started waking kswapd and entering direct
reclaim because the wrong GFP flags are cleared.  The consequence is
that workloads that used to fit into memory now get reclaimed which is
addressed by this patch.

The problem can be demonstrated with "mutilate" that exercises memcached
which is software dedicated to memory object caching.  The configuration
uses 80% of memory and is run 3 times for varying numbers of clients.
The results on a 4-socket NUMA box are

mutilate
                            4.4.0                 4.4.0
                          vanilla           numaswap-v1
Hmean    1      8394.71 (  0.00%)     8395.32 (  0.01%)
Hmean    4     30024.62 (  0.00%)    34513.54 ( 14.95%)
Hmean    7     32821.08 (  0.00%)    70542.96 (114.93%)
Hmean    12    55229.67 (  0.00%)    93866.34 ( 69.96%)
Hmean    21    39438.96 (  0.00%)    85749.21 (117.42%)
Hmean    30    37796.10 (  0.00%)    50231.49 ( 32.90%)
Hmean    47    18070.91 (  0.00%)    38530.13 (113.22%)

The metric is queries/second with the more the better.  The results are
way outside of the noise and the reason for the improvement is obvious
from some of the vmstats

                                 4.4.0       4.4.0
                               vanillanumaswap-v1r1
Minor Faults                1929399272  2146148218
Major Faults                  19746529        3567
Swap Ins                      57307366        9913
Swap Outs                     50623229       17094
Allocation stalls                35909         443
DMA allocs                           0           0
DMA32 allocs                  72976349   170567396
Normal allocs               5306640898  5310651252
Movable allocs                       0           0
Direct pages scanned         404130893      799577
Kswapd pages scanned         160230174           0
Kswapd pages reclaimed        55928786           0
Direct pages reclaimed         1843936       41921
Page writes file                  2391           0
Page writes anon              50623229       17094

The vanilla kernel is swapping like crazy with large amounts of direct
reclaim and kswapd activity.  The figures are aggregate but it's known
that the bad activity is throughout the entire test.

Note that simple streaming anon/file memory consumers also see this
problem but it's not as obvious.  In those cases, kswapd is awake when
it should not be.

As there are at least two reclaim-related bugs out there, it's worth
spelling out the user-visible impact.  This patch only addresses bugs
related to excessive reclaim on NUMA hardware when the working set is
larger than a NUMA node.  There is a bug related to high kswapd CPU
usage but the reports are against laptops and other UMA hardware and is
not addressed by this patch.

Signed-off-by: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[4.1+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: thp: fix SMP race condition between THP page fault and MADV_DONTNEED</title>
<updated>2016-02-27T18:28:52Z</updated>
<author>
<name>Andrea Arcangeli</name>
<email>aarcange@redhat.com</email>
</author>
<published>2016-02-26T23:19:28Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ad33bb04b2a6cee6c1f99fabb15cddbf93ff0433'/>
<id>urn:sha1:ad33bb04b2a6cee6c1f99fabb15cddbf93ff0433</id>
<content type='text'>
pmd_trans_unstable()/pmd_none_or_trans_huge_or_clear_bad() were
introduced to locklessy (but atomically) detect when a pmd is a regular
(stable) pmd or when the pmd is unstable and can infinitely transition
from pmd_none() and pmd_trans_huge() from under us, while only holding
the mmap_sem for reading (for writing not).

While holding the mmap_sem only for reading, MADV_DONTNEED can run from
under us and so before we can assume the pmd to be a regular stable pmd
we need to compare it against pmd_none() and pmd_trans_huge() in an
atomic way, with pmd_trans_unstable().  The old pmd_trans_huge() left a
tiny window for a race.

Useful applications are unlikely to notice the difference as doing
MADV_DONTNEED concurrently with a page fault would lead to undefined
behavior.

[akpm@linux-foundation.org: tidy up comment grammar/layout]
Signed-off-by: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Reported-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>thp: call pmdp_invalidate() with correct virtual address</title>
<updated>2016-02-24T18:46:30Z</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2016-02-24T15:58:03Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2ac015e293bbe3858533009612eac58842daf325'/>
<id>urn:sha1:2ac015e293bbe3858533009612eac58842daf325</id>
<content type='text'>
Sebastian Ott and Gerald Schaefer reported random crashes on s390.
It was bisected to my THP refcounting patchset.

The problem is that pmdp_invalidated() called with wrong virtual
address. It got offset up by HPAGE_PMD_SIZE by loop over ptes.

The solution is to introduce new variable to be used in loop and don't
touch 'haddr'.

Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Reported-and-tested-by: Gerald Schaefer &lt;gerald.schaefer@de.ibm.com&gt;
Reported-and-tested-by Sebastian Ott &lt;sebott@linux.vnet.ibm.com&gt;
Reviewed-by: Will Deacon &lt;will.deacon@arm.com&gt;
Cc: Christian Borntraeger &lt;borntraeger@de.ibm.com&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: Aneesh Kumar K.V &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Sasha Levin &lt;sasha.levin@oracle.com&gt;
Cc: Jerome Marchand &lt;jmarchan@redhat.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
