<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/mm/memory.c, branch v4.8</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v4.8</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v4.8'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2016-09-25T22:43:42Z</updated>
<entry>
<title>mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing</title>
<updated>2016-09-25T22:43:42Z</updated>
<author>
<name>Lorenzo Stoakes</name>
<email>lstoakes@gmail.com</email>
</author>
<published>2016-09-11T22:54:25Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=38e088546522e1e86d2b8f401a1354ad3a9b3303'/>
<id>urn:sha1:38e088546522e1e86d2b8f401a1354ad3a9b3303</id>
<content type='text'>
The NUMA balancing logic uses an arch-specific PROT_NONE page table flag
defined by pte_protnone() or pmd_protnone() to mark PTEs or huge page
PMDs respectively as requiring balancing upon a subsequent page fault.
User-defined PROT_NONE memory regions which also have this flag set will
not normally invoke the NUMA balancing code as do_page_fault() will send
a segfault to the process before handle_mm_fault() is even called.

However if access_remote_vm() is invoked to access a PROT_NONE region of
memory, handle_mm_fault() is called via faultin_page() and
__get_user_pages() without any access checks being performed, meaning
the NUMA balancing logic is incorrectly invoked on a non-NUMA memory
region.

A simple means of triggering this problem is to access PROT_NONE mmap'd
memory using /proc/self/mem which reliably results in the NUMA handling
functions being invoked when CONFIG_NUMA_BALANCING is set.

This issue was reported in bugzilla (issue 99101) which includes some
simple repro code.

There are BUG_ON() checks in do_numa_page() and do_huge_pmd_numa_page()
added at commit c0e7cad to avoid accidentally provoking strange
behaviour by attempting to apply NUMA balancing to pages that are in
fact PROT_NONE.  The BUG_ON()'s are consistently triggered by the repro.

This patch moves the PROT_NONE check into mm/memory.c rather than
invoking BUG_ON() as faulting in these pages via faultin_page() is a
valid reason for reaching the NUMA check with the PROT_NONE page table
flag set and is therefore not always a bug.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=99101
Reported-by: Trevor Saunders &lt;tbsaunde@tbsaunde.org&gt;
Signed-off-by: Lorenzo Stoakes &lt;lstoakes@gmail.com&gt;
Acked-by: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: move swap-in anonymous page into active list</title>
<updated>2016-08-02T21:31:41Z</updated>
<author>
<name>Minchan Kim</name>
<email>minchan@kernel.org</email>
</author>
<published>2016-08-02T21:02:25Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1a8018fb4c6976559c3f04bcf760822381be501d'/>
<id>urn:sha1:1a8018fb4c6976559c3f04bcf760822381be501d</id>
<content type='text'>
Every swap-in anonymous page starts from inactive lru list's head.  It
should be activated unconditionally when VM decide to reclaim because
page table entry for the page always usually has marked accessed bit.
Thus, their window size for getting a new referece is 2 * NR_inactive +
NR_active while others is NR_inactive + NR_active.

It's not fair that it has more chance to be referenced compared to other
newly allocated page which starts from active lru list's head.

Johannes:

: The page can still have a valid copy on the swap device, so prefering to
: reclaim that page over a fresh one could make sense.  But as you point
: out, having it start inactive instead of active actually ends up giving it
: *more* LRU time, and that seems to be without justification.

Rik:

: The reason newly read in swap cache pages start on the inactive list is
: that we do some amount of read-around, and do not know which pages will
: get used.
:
: However, immediately activating the ones that DO get used, like your patch
: does, is the right thing to do.

Link: http://lkml.kernel.org/r/1469762740-17860-1-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim &lt;minchan@kernel.org&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Nadav Amit &lt;nadav.amit@gmail.com&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: fail prefaulting if page table allocation fails</title>
<updated>2016-08-02T21:31:41Z</updated>
<author>
<name>Vegard Nossum</name>
<email>vegard.nossum@oracle.com</email>
</author>
<published>2016-08-02T21:02:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c5f88bd29ab42d5d1e77085b5f69d5c6da20324e'/>
<id>urn:sha1:c5f88bd29ab42d5d1e77085b5f69d5c6da20324e</id>
<content type='text'>
I ran into this:

    BUG: sleeping function called from invalid context at mm/page_alloc.c:3784
    in_atomic(): 0, irqs_disabled(): 0, pid: 1434, name: trinity-c1
    2 locks held by trinity-c1/1434:
     #0:  (&amp;mm-&gt;mmap_sem){......}, at: [&lt;ffffffff810ce31e&gt;] __do_page_fault+0x1ce/0x8f0
     #1:  (rcu_read_lock){......}, at: [&lt;ffffffff81378f86&gt;] filemap_map_pages+0xd6/0xdd0

    CPU: 0 PID: 1434 Comm: trinity-c1 Not tainted 4.7.0+ #58
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
    Call Trace:
      dump_stack+0x65/0x84
      panic+0x185/0x2dd
      ___might_sleep+0x51c/0x600
      __might_sleep+0x90/0x1a0
      __alloc_pages_nodemask+0x5b1/0x2160
      alloc_pages_current+0xcc/0x370
      pte_alloc_one+0x12/0x90
      __pte_alloc+0x1d/0x200
      alloc_set_pte+0xe3e/0x14a0
      filemap_map_pages+0x42b/0xdd0
      handle_mm_fault+0x17d5/0x28b0
      __do_page_fault+0x310/0x8f0
      trace_do_page_fault+0x18d/0x310
      do_async_page_fault+0x27/0xa0
      async_page_fault+0x28/0x30

The important bits from the above is that filemap_map_pages() is calling
into the page allocator while holding rcu_read_lock (sleeping is not
allowed inside RCU read-side critical sections).

According to Kirill Shutemov, the prefaulting code in do_fault_around()
is supposed to take care of this, but missing error handling means that
the allocation failure can go unnoticed.

We don't need to return VM_FAULT_OOM (or any other error) here, since we
can just let the normal fault path try again.

Fixes: 7267ec008b5c ("mm: postpone page table allocation until we have page to map")
Link: http://lkml.kernel.org/r/1469708107-11868-1-git-send-email-vegard.nossum@oracle.com
Signed-off-by: Vegard Nossum &lt;vegard.nossum@oracle.com&gt;
Acked-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: "Hillf Danton" &lt;hillf.zj@alibaba-inc.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>thp: introduce CONFIG_TRANSPARENT_HUGE_PAGECACHE</title>
<updated>2016-07-26T23:19:19Z</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2016-07-26T22:26:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e496cf3d782135c1cca0d154d4b924517ff58de0'/>
<id>urn:sha1:e496cf3d782135c1cca0d154d4b924517ff58de0</id>
<content type='text'>
For file mappings, we don't deposit page tables on THP allocation
because it's not strictly required to implement split_huge_pmd(): we can
just clear pmd and let following page faults to reconstruct the page
table.

But Power makes use of deposited page table to address MMU quirk.

Let's hide THP page cache, including huge tmpfs, under separate config
option, so it can be forbidden on Power.

We can revert the patch later once solution for Power found.

Link: http://lkml.kernel.org/r/1466021202-61880-36-git-send-email-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Aneesh Kumar K.V &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>shmem: add huge pages support</title>
<updated>2016-07-26T23:19:19Z</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2016-07-26T22:26:18Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=800d8c63b2e989c2e349632d1648119bf5862f01'/>
<id>urn:sha1:800d8c63b2e989c2e349632d1648119bf5862f01</id>
<content type='text'>
Here's basic implementation of huge pages support for shmem/tmpfs.

It's all pretty streight-forward:

  - shmem_getpage() allcoates huge page if it can and try to inserd into
    radix tree with shmem_add_to_page_cache();

  - shmem_add_to_page_cache() puts the page onto radix-tree if there's
    space for it;

  - shmem_undo_range() removes huge pages, if it fully within range.
    Partial truncate of huge pages zero out this part of THP.

    This have visible effect on fallocate(FALLOC_FL_PUNCH_HOLE)
    behaviour. As we don't really create hole in this case,
    lseek(SEEK_HOLE) may have inconsistent results depending what
    pages happened to be allocated.

  - no need to change shmem_fault: core-mm will map an compound page as
    huge if VMA is suitable;

Link: http://lkml.kernel.org/r/1466021202-61880-30-git-send-email-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>thp: handle file COW faults</title>
<updated>2016-07-26T23:19:19Z</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2016-07-26T22:25:40Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=af9e4d5f2de2eabdc7145e077ba48b2a638465c6'/>
<id>urn:sha1:af9e4d5f2de2eabdc7145e077ba48b2a638465c6</id>
<content type='text'>
File COW for THP is handled on pte level: just split the pmd.

It's not clear how benefitial would be allocation of huge pages on COW
faults.  And it would require some code to make them work.

I think at some point we can consider teaching khugepaged to collapse
pages in COW mappings, but allocating huge on fault is probably
overkill.

Link: http://lkml.kernel.org/r/1466021202-61880-16-git-send-email-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>thp, vmstats: add counters for huge file pages</title>
<updated>2016-07-26T23:19:19Z</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2016-07-26T22:25:31Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=95ecedcd6abbb05d8177331e2fa697888dcd634b'/>
<id>urn:sha1:95ecedcd6abbb05d8177331e2fa697888dcd634b</id>
<content type='text'>
THP_FILE_ALLOC: how many times huge page was allocated and put page
cache.

THP_FILE_MAPPED: how many times file huge page was mapped.

Link: http://lkml.kernel.org/r/1466021202-61880-13-git-send-email-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: introduce do_set_pmd()</title>
<updated>2016-07-26T23:19:19Z</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2016-07-26T22:25:29Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1010245964415bb7403463115bab2cd26244b445'/>
<id>urn:sha1:1010245964415bb7403463115bab2cd26244b445</id>
<content type='text'>
With postponed page table allocation we have chance to setup huge pages.
do_set_pte() calls do_set_pmd() if following criteria met:

 - page is compound;
 - pmd entry in pmd_none();
 - vma has suitable size and alignment;

Link: http://lkml.kernel.org/r/1466021202-61880-12-git-send-email-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>rmap: support file thp</title>
<updated>2016-07-26T23:19:19Z</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2016-07-26T22:25:26Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=dd78fedde4b99b322f2dc849d467d365a82e23ca'/>
<id>urn:sha1:dd78fedde4b99b322f2dc849d467d365a82e23ca</id>
<content type='text'>
Naive approach: on mapping/unmapping the page as compound we update
-&gt;_mapcount on each 4k page.  That's not efficient, but it's not obvious
how we can optimize this.  We can look into optimization later.

PG_double_map optimization doesn't work for file pages since lifecycle
of file pages is different comparing to anon pages: file page can be
mapped again at any time.

Link: http://lkml.kernel.org/r/1466021202-61880-11-git-send-email-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: postpone page table allocation until we have page to map</title>
<updated>2016-07-26T23:19:19Z</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2016-07-26T22:25:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7267ec008b5cd8b3579e188b1ff238815643e372'/>
<id>urn:sha1:7267ec008b5cd8b3579e188b1ff238815643e372</id>
<content type='text'>
The idea (and most of code) is borrowed again: from Hugh's patchset on
huge tmpfs[1].

Instead of allocation pte page table upfront, we postpone this until we
have page to map in hands.  This approach opens possibility to map the
page as huge if filesystem supports this.

Comparing to Hugh's patch I've pushed page table allocation a bit
further: into do_set_pte().  This way we can postpone allocation even in
faultaround case without moving do_fault_around() after __do_fault().

do_set_pte() got renamed to alloc_set_pte() as it can allocate page
table if required.

[1] http://lkml.kernel.org/r/alpine.LSU.2.11.1502202015090.14414@eggly.anvils

Link: http://lkml.kernel.org/r/1466021202-61880-10-git-send-email-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
