<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/mm/memory.c, branch v3.12</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v3.12</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v3.12'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2013-10-29T10:37:52Z</updated>
<entry>
<title>mm: numa: Sanitize task_numa_fault() callsites</title>
<updated>2013-10-29T10:37:52Z</updated>
<author>
<name>Mel Gorman</name>
<email>mgorman@suse.de</email>
</author>
<published>2013-10-07T10:28:45Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c61109e34f60f6e85bb43c5a1cd51c0e3db40847'/>
<id>urn:sha1:c61109e34f60f6e85bb43c5a1cd51c0e3db40847</id>
<content type='text'>
There are three callers of task_numa_fault():

 - do_huge_pmd_numa_page():
     Accounts against the current node, not the node where the
     page resides, unless we migrated, in which case it accounts
     against the node we migrated to.

 - do_numa_page():
     Accounts against the current node, not the node where the
     page resides, unless we migrated, in which case it accounts
     against the node we migrated to.

 - do_pmd_numa_page():
     Accounts not at all when the page isn't migrated, otherwise
     accounts against the node we migrated towards.

This seems wrong to me; all three sites should have the same
sementaics, furthermore we should accounts against where the page
really is, we already know where the task is.

So modify all three sites to always account; we did after all receive
the fault; and always account to where the page is after migration,
regardless of success.

They all still differ on when they clear the PTE/PMD; ideally that
would get sorted too.

Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Reviewed-by: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Srikar Dronamraju &lt;srikar@linux.vnet.ibm.com&gt;
Cc: &lt;stable@kernel.org&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: http://lkml.kernel.org/r/1381141781-10992-8-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>mm: memcg: handle non-error OOM situations more gracefully</title>
<updated>2013-10-17T04:35:53Z</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2013-10-16T20:46:59Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4942642080ea82d99ab5b653abb9a12b7ba31f4a'/>
<id>urn:sha1:4942642080ea82d99ab5b653abb9a12b7ba31f4a</id>
<content type='text'>
Commit 3812c8c8f395 ("mm: memcg: do not trap chargers with full
callstack on OOM") assumed that only a few places that can trigger a
memcg OOM situation do not return VM_FAULT_OOM, like optional page cache
readahead.  But there are many more and it's impractical to annotate
them all.

First of all, we don't want to invoke the OOM killer when the failed
allocation is gracefully handled, so defer the actual kill to the end of
the fault handling as well.  This simplifies the code quite a bit for
added bonus.

Second, since a failed allocation might not be the abrupt end of the
fault, the memcg OOM handler needs to be re-entrant until the fault
finishes for subsequent allocation attempts.  If an allocation is
attempted after the task already OOMed, allow it to bypass the limit so
that it can quickly finish the fault and invoke the OOM killer.

Reported-by: azurIt &lt;azurit@pobox.sk&gt;
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: &lt;stable@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: migration: do not lose soft dirty bit if page is in migration state</title>
<updated>2013-10-17T04:35:52Z</updated>
<author>
<name>Cyrill Gorcunov</name>
<email>gorcunov@gmail.com</email>
</author>
<published>2013-10-16T20:46:51Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c3d16e16522fe3fe8759735850a0676da18f4b1d'/>
<id>urn:sha1:c3d16e16522fe3fe8759735850a0676da18f4b1d</id>
<content type='text'>
If page migration is turned on in config and the page is migrating, we
may lose the soft dirty bit.  If fork and mprotect are called on
migrating pages (once migration is complete) pages do not obtain the
soft dirty bit in the correspond pte entries.  Fix it adding an
appropriate test on swap entries.

Signed-off-by: Cyrill Gorcunov &lt;gorcunov@openvz.org&gt;
Cc: Pavel Emelyanov &lt;xemul@parallels.com&gt;
Cc: Andy Lutomirski &lt;luto@amacapital.net&gt;
Cc: Matt Mackall &lt;mpm@selenic.com&gt;
Cc: Xiao Guangrong &lt;xiaoguangrong@linux.vnet.ibm.com&gt;
Cc: Marcelo Tosatti &lt;mtosatti@redhat.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@gmail.com&gt;
Cc: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: "Aneesh Kumar K.V" &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Cc: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: Mel Gorman &lt;mel@csn.ul.ie&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>thp: consolidate code between handle_mm_fault() and do_huge_pmd_anonymous_page()</title>
<updated>2013-09-12T22:38:03Z</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2013-09-12T22:14:05Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c02925540ca7019465a43c00f8a3c0186ddace2b'/>
<id>urn:sha1:c02925540ca7019465a43c00f8a3c0186ddace2b</id>
<content type='text'>
do_huge_pmd_anonymous_page() has copy-pasted piece of handle_mm_fault()
to handle fallback path.

Let's consolidate code back by introducing VM_FAULT_FALLBACK return
code.

Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Acked-by: Hillf Danton &lt;dhillf@gmail.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Matthew Wilcox &lt;willy@linux.intel.com&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: memcg: do not trap chargers with full callstack on OOM</title>
<updated>2013-09-12T22:38:02Z</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2013-09-12T22:13:44Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3812c8c8f3953921ef18544110dafc3505c1ac62'/>
<id>urn:sha1:3812c8c8f3953921ef18544110dafc3505c1ac62</id>
<content type='text'>
The memcg OOM handling is incredibly fragile and can deadlock.  When a
task fails to charge memory, it invokes the OOM killer and loops right
there in the charge code until it succeeds.  Comparably, any other task
that enters the charge path at this point will go to a waitqueue right
then and there and sleep until the OOM situation is resolved.  The problem
is that these tasks may hold filesystem locks and the mmap_sem; locks that
the selected OOM victim may need to exit.

For example, in one reported case, the task invoking the OOM killer was
about to charge a page cache page during a write(), which holds the
i_mutex.  The OOM killer selected a task that was just entering truncate()
and trying to acquire the i_mutex:

OOM invoking task:
  mem_cgroup_handle_oom+0x241/0x3b0
  mem_cgroup_cache_charge+0xbe/0xe0
  add_to_page_cache_locked+0x4c/0x140
  add_to_page_cache_lru+0x22/0x50
  grab_cache_page_write_begin+0x8b/0xe0
  ext3_write_begin+0x88/0x270
  generic_file_buffered_write+0x116/0x290
  __generic_file_aio_write+0x27c/0x480
  generic_file_aio_write+0x76/0xf0           # takes -&gt;i_mutex
  do_sync_write+0xea/0x130
  vfs_write+0xf3/0x1f0
  sys_write+0x51/0x90
  system_call_fastpath+0x18/0x1d

OOM kill victim:
  do_truncate+0x58/0xa0              # takes i_mutex
  do_last+0x250/0xa30
  path_openat+0xd7/0x440
  do_filp_open+0x49/0xa0
  do_sys_open+0x106/0x240
  sys_open+0x20/0x30
  system_call_fastpath+0x18/0x1d

The OOM handling task will retry the charge indefinitely while the OOM
killed task is not releasing any resources.

A similar scenario can happen when the kernel OOM killer for a memcg is
disabled and a userspace task is in charge of resolving OOM situations.
In this case, ALL tasks that enter the OOM path will be made to sleep on
the OOM waitqueue and wait for userspace to free resources or increase
the group's limit.  But a userspace OOM handler is prone to deadlock
itself on the locks held by the waiting tasks.  For example one of the
sleeping tasks may be stuck in a brk() call with the mmap_sem held for
writing but the userspace handler, in order to pick an optimal victim,
may need to read files from /proc/&lt;pid&gt;, which tries to acquire the same
mmap_sem for reading and deadlocks.

This patch changes the way tasks behave after detecting a memcg OOM and
makes sure nobody loops or sleeps with locks held:

1. When OOMing in a user fault, invoke the OOM killer and restart the
   fault instead of looping on the charge attempt.  This way, the OOM
   victim can not get stuck on locks the looping task may hold.

2. When OOMing in a user fault but somebody else is handling it
   (either the kernel OOM killer or a userspace handler), don't go to
   sleep in the charge context.  Instead, remember the OOMing memcg in
   the task struct and then fully unwind the page fault stack with
   -ENOMEM.  pagefault_out_of_memory() will then call back into the
   memcg code to check if the -ENOMEM came from the memcg, and then
   either put the task to sleep on the memcg's OOM waitqueue or just
   restart the fault.  The OOM victim can no longer get stuck on any
   lock a sleeping task may hold.

Debugged by Michal Hocko.

Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Reported-by: azurIt &lt;azurit@pobox.sk&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: memcg: enable memcg OOM killer only for user faults</title>
<updated>2013-09-12T22:38:01Z</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2013-09-12T22:13:42Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=519e52473ebe9db5cdef44670d5a97f1fd53d721'/>
<id>urn:sha1:519e52473ebe9db5cdef44670d5a97f1fd53d721</id>
<content type='text'>
System calls and kernel faults (uaccess, gup) can handle an out of memory
situation gracefully and just return -ENOMEM.

Enable the memcg OOM killer only for user faults, where it's really the
only option available.

Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: azurIt &lt;azurit@pobox.sk&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: migrate: add hugepage migration code to move_pages()</title>
<updated>2013-09-11T22:57:48Z</updated>
<author>
<name>Naoya Horiguchi</name>
<email>n-horiguchi@ah.jp.nec.com</email>
</author>
<published>2013-09-11T21:22:04Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e632a938d914d271bec26e570d36c755a1e35e4c'/>
<id>urn:sha1:e632a938d914d271bec26e570d36c755a1e35e4c</id>
<content type='text'>
Extend move_pages() to handle vma with VM_HUGETLB set.  We will be able to
migrate hugepage with move_pages(2) after applying the enablement patch
which comes later in this series.

We avoid getting refcount on tail pages of hugepage, because unlike thp,
hugepage is not split and we need not care about races with splitting.

And migration of larger (1GB for x86_64) hugepage are not enabled.

Signed-off-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Acked-by: Andi Kleen &lt;ak@linux.intel.com&gt;
Reviewed-by: Wanpeng Li &lt;liwanp@linux.vnet.ibm.com&gt;
Cc: Hillf Danton &lt;dhillf@gmail.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: "Aneesh Kumar K.V" &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: move pgtable related functions to right place</title>
<updated>2013-09-11T22:57:30Z</updated>
<author>
<name>Joonsoo Kim</name>
<email>iamjoonsoo.kim@lge.com</email>
</author>
<published>2013-09-11T21:21:28Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=bc4b4448dba660afc8df3790564320302d9709a1'/>
<id>urn:sha1:bc4b4448dba660afc8df3790564320302d9709a1</id>
<content type='text'>
pgtable related functions are mostly in pgtable-generic.c.
So move remaining functions from memory.c to pgtable-generic.c.

Signed-off-by: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge 3.11-rc6 into char-misc-next</title>
<updated>2013-08-19T03:40:33Z</updated>
<author>
<name>Greg Kroah-Hartman</name>
<email>gregkh@linuxfoundation.org</email>
</author>
<published>2013-08-19T03:40:33Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5bc0b123dcb2bb65b0b1ec57e591459dcf583d3d'/>
<id>urn:sha1:5bc0b123dcb2bb65b0b1ec57e591459dcf583d3d</id>
<content type='text'>
We want these fixes in this tree.

Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>Fix TLB gather virtual address range invalidation corner cases</title>
<updated>2013-08-16T15:52:46Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2013-08-15T18:42:25Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2b047252d087be7f2ba088b4933cd904f92e6fce'/>
<id>urn:sha1:2b047252d087be7f2ba088b4933cd904f92e6fce</id>
<content type='text'>
Ben Tebulin reported:

 "Since v3.7.2 on two independent machines a very specific Git
  repository fails in 9/10 cases on git-fsck due to an SHA1/memory
  failures.  This only occurs on a very specific repository and can be
  reproduced stably on two independent laptops.  Git mailing list ran
  out of ideas and for me this looks like some very exotic kernel issue"

and bisected the failure to the backport of commit 53a59fc67f97 ("mm:
limit mmu_gather batching to fix soft lockups on !CONFIG_PREEMPT").

That commit itself is not actually buggy, but what it does is to make it
much more likely to hit the partial TLB invalidation case, since it
introduces a new case in tlb_next_batch() that previously only ever
happened when running out of memory.

The real bug is that the TLB gather virtual memory range setup is subtly
buggered.  It was introduced in commit 597e1c3580b7 ("mm/mmu_gather:
enable tlb flush range in generic mmu_gather"), and the range handling
was already fixed at least once in commit e6c495a96ce0 ("mm: fix the TLB
range flushed when __tlb_remove_page() runs out of slots"), but that fix
was not complete.

The problem with the TLB gather virtual address range is that it isn't
set up by the initial tlb_gather_mmu() initialization (which didn't get
the TLB range information), but it is set up ad-hoc later by the
functions that actually flush the TLB.  And so any such case that forgot
to update the TLB range entries would potentially miss TLB invalidates.

Rather than try to figure out exactly which particular ad-hoc range
setup was missing (I personally suspect it's the hugetlb case in
zap_huge_pmd(), which didn't have the same logic as zap_pte_range()
did), this patch just gets rid of the problem at the source: make the
TLB range information available to tlb_gather_mmu(), and initialize it
when initializing all the other tlb gather fields.

This makes the patch larger, but conceptually much simpler.  And the end
result is much more understandable; even if you want to play games with
partial ranges when invalidating the TLB contents in chunks, now the
range information is always there, and anybody who doesn't want to
bother with it won't introduce subtle bugs.

Ben verified that this fixes his problem.

Reported-bisected-and-tested-by: Ben Tebulin &lt;tebulin@googlemail.com&gt;
Build-testing-by: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;
Build-testing-by: Richard Weinberger &lt;richard.weinberger@gmail.com&gt;
Reviewed-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Acked-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
