<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/mm, branch v4.8</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v4.8</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v4.8'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2016-09-30T22:26:52Z</updated>
<entry>
<title>mm: workingset: fix crash in shadow node shrinker caused by replace_page_cache_page()</title>
<updated>2016-09-30T22:26:52Z</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2016-09-30T22:11:29Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=22f2ac51b6d643666f4db093f13144f773ff3f3a'/>
<id>urn:sha1:22f2ac51b6d643666f4db093f13144f773ff3f3a</id>
<content type='text'>
Antonio reports the following crash when using fuse under memory pressure:

  kernel BUG at /build/linux-a2WvEb/linux-4.4.0/mm/workingset.c:346!
  invalid opcode: 0000 [#1] SMP
  Modules linked in: all of them
  CPU: 2 PID: 63 Comm: kswapd0 Not tainted 4.4.0-36-generic #55-Ubuntu
  Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
  task: ffff88040cae6040 ti: ffff880407488000 task.ti: ffff880407488000
  RIP: shadow_lru_isolate+0x181/0x190
  Call Trace:
    __list_lru_walk_one.isra.3+0x8f/0x130
    list_lru_walk_one+0x23/0x30
    scan_shadow_nodes+0x34/0x50
    shrink_slab.part.40+0x1ed/0x3d0
    shrink_zone+0x2ca/0x2e0
    kswapd+0x51e/0x990
    kthread+0xd8/0xf0
    ret_from_fork+0x3f/0x70

which corresponds to the following sanity check in the shadow node
tracking:

  BUG_ON(node-&gt;count &amp; RADIX_TREE_COUNT_MASK);

The workingset code tracks radix tree nodes that exclusively contain
shadow entries of evicted pages in them, and this (somewhat obscure)
line checks whether there are real pages left that would interfere with
reclaim of the radix tree node under memory pressure.

While discussing ways how fuse might sneak pages into the radix tree
past the workingset code, Miklos pointed to replace_page_cache_page(),
and indeed there is a problem there: it properly accounts for the old
page being removed - __delete_from_page_cache() does that - but then
does a raw raw radix_tree_insert(), not accounting for the replacement
page.  Eventually the page count bits in node-&gt;count underflow while
leaving the node incorrectly linked to the shadow node LRU.

To address this, make sure replace_page_cache_page() uses the tracked
page insertion code, page_cache_tree_insert().  This fixes the page
accounting and makes sure page-containing nodes are properly unlinked
from the shadow node LRU again.

Also, make the sanity checks a bit less obscure by using the helpers for
checking the number of pages and shadows in a radix tree node.

Fixes: 449dd6984d0e ("mm: keep page cache radix tree nodes in check")
Link: http://lkml.kernel.org/r/20160919155822.29498-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Reported-by: Antonio SJ Musumeci &lt;trapexit@spawn.link&gt;
Debugged-by: Miklos Szeredi &lt;miklos@szeredi.hu&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[3.15+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mem-hotplug: use nodes that contain memory as mask in new_node_page()</title>
<updated>2016-09-28T23:19:02Z</updated>
<author>
<name>Li Zhong</name>
<email>zhong@linux.vnet.ibm.com</email>
</author>
<published>2016-09-28T22:22:38Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=231e97e2b8ec9a1556ced5d8a89cda03a480b179'/>
<id>urn:sha1:231e97e2b8ec9a1556ced5d8a89cda03a480b179</id>
<content type='text'>
9bb627be47a5 ("mem-hotplug: don't clear the only node in new_node_page()")
prevents allocating from an empty nodemask, but as David points out, it is
still wrong.  As node_online_map may include memoryless nodes, only
allocating from these nodes is meaningless.

This patch uses node_states[N_MEMORY] mask to prevent the above case.

Fixes: 9bb627be47a5 ("mem-hotplug: don't clear the only node in new_node_page()")
Fixes: 394e31d2ceb4 ("mem-hotplug: alloc new page from a nearest neighbor node when mem-offline")
Link: http://lkml.kernel.org/r/1474447117.28370.6.camel@TP420
Signed-off-by: Li Zhong &lt;zhong@linux.vnet.ibm.com&gt;
Suggested-by: David Rientjes &lt;rientjes@google.com&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: John Allen &lt;jallen@linux.vnet.ibm.com&gt;
Cc: Xishi Qiu &lt;qiuxishi@huawei.com&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: Tetsuo Handa &lt;penguin-kernel@i-love.sakura.ne.jp&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm,ksm: fix endless looping in allocating memory when ksm enable</title>
<updated>2016-09-28T23:19:01Z</updated>
<author>
<name>zhong jiang</name>
<email>zhongjiang@huawei.com</email>
</author>
<published>2016-09-28T22:22:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5b398e416e880159fe55eefd93c6588fa072cd66'/>
<id>urn:sha1:5b398e416e880159fe55eefd93c6588fa072cd66</id>
<content type='text'>
I hit the following hung task when runing a OOM LTP test case with 4.1
kernel.

Call trace:
[&lt;ffffffc000086a88&gt;] __switch_to+0x74/0x8c
[&lt;ffffffc000a1bae0&gt;] __schedule+0x23c/0x7bc
[&lt;ffffffc000a1c09c&gt;] schedule+0x3c/0x94
[&lt;ffffffc000a1eb84&gt;] rwsem_down_write_failed+0x214/0x350
[&lt;ffffffc000a1e32c&gt;] down_write+0x64/0x80
[&lt;ffffffc00021f794&gt;] __ksm_exit+0x90/0x19c
[&lt;ffffffc0000be650&gt;] mmput+0x118/0x11c
[&lt;ffffffc0000c3ec4&gt;] do_exit+0x2dc/0xa74
[&lt;ffffffc0000c46f8&gt;] do_group_exit+0x4c/0xe4
[&lt;ffffffc0000d0f34&gt;] get_signal+0x444/0x5e0
[&lt;ffffffc000089fcc&gt;] do_signal+0x1d8/0x450
[&lt;ffffffc00008a35c&gt;] do_notify_resume+0x70/0x78

The oom victim cannot terminate because it needs to take mmap_sem for
write while the lock is held by ksmd for read which loops in the page
allocator

ksm_do_scan
	scan_get_next_rmap_item
		down_read
		get_next_rmap_item
			alloc_rmap_item   #ksmd will loop permanently.

There is no way forward because the oom victim cannot release any memory
in 4.1 based kernel.  Since 4.6 we have the oom reaper which would solve
this problem because it would release the memory asynchronously.
Nevertheless we can relax alloc_rmap_item requirements and use
__GFP_NORETRY because the allocation failure is acceptable as ksm_do_scan
would just retry later after the lock got dropped.

Such a patch would be also easy to backport to older stable kernels which
do not have oom_reaper.

While we are at it add GFP_NOWARN so the admin doesn't have to be alarmed
by the allocation failure.

Link: http://lkml.kernel.org/r/1474165570-44398-1-git-send-email-zhongjiang@huawei.com
Signed-off-by: zhong jiang &lt;zhongjiang@huawei.com&gt;
Suggested-by: Hugh Dickins &lt;hughd@google.com&gt;
Suggested-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Acked-by: Hugh Dickins &lt;hughd@google.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing</title>
<updated>2016-09-25T22:43:42Z</updated>
<author>
<name>Lorenzo Stoakes</name>
<email>lstoakes@gmail.com</email>
</author>
<published>2016-09-11T22:54:25Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=38e088546522e1e86d2b8f401a1354ad3a9b3303'/>
<id>urn:sha1:38e088546522e1e86d2b8f401a1354ad3a9b3303</id>
<content type='text'>
The NUMA balancing logic uses an arch-specific PROT_NONE page table flag
defined by pte_protnone() or pmd_protnone() to mark PTEs or huge page
PMDs respectively as requiring balancing upon a subsequent page fault.
User-defined PROT_NONE memory regions which also have this flag set will
not normally invoke the NUMA balancing code as do_page_fault() will send
a segfault to the process before handle_mm_fault() is even called.

However if access_remote_vm() is invoked to access a PROT_NONE region of
memory, handle_mm_fault() is called via faultin_page() and
__get_user_pages() without any access checks being performed, meaning
the NUMA balancing logic is incorrectly invoked on a non-NUMA memory
region.

A simple means of triggering this problem is to access PROT_NONE mmap'd
memory using /proc/self/mem which reliably results in the NUMA handling
functions being invoked when CONFIG_NUMA_BALANCING is set.

This issue was reported in bugzilla (issue 99101) which includes some
simple repro code.

There are BUG_ON() checks in do_numa_page() and do_huge_pmd_numa_page()
added at commit c0e7cad to avoid accidentally provoking strange
behaviour by attempting to apply NUMA balancing to pages that are in
fact PROT_NONE.  The BUG_ON()'s are consistently triggered by the repro.

This patch moves the PROT_NONE check into mm/memory.c rather than
invoking BUG_ON() as faulting in these pages via faultin_page() is a
valid reason for reaching the NUMA check with the PROT_NONE page table
flag set and is therefore not always a bug.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=99101
Reported-by: Trevor Saunders &lt;tbsaunde@tbsaunde.org&gt;
Signed-off-by: Lorenzo Stoakes &lt;lstoakes@gmail.com&gt;
Acked-by: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge branch 'hughd-fixes' (patches from Hugh Dickins)</title>
<updated>2016-09-24T18:31:45Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2016-09-24T18:31:45Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0f26574178f6c698e5d76e66ca68a95cc35eef9f'/>
<id>urn:sha1:0f26574178f6c698e5d76e66ca68a95cc35eef9f</id>
<content type='text'>
Merge VM fixes from High Dickins:
 "I get the impression that Andrew is away or busy at the moment, so I'm
  going to send you three independent uncontroversial little mm fixes
  directly - though none is strictly a 4.8 regression fix.

   - shmem: fix tmpfs to handle the huge= option properly from Toshi
     Kani is a one-liner to fix a major embarrassment in 4.8's hugepages
     on tmpfs feature: although Hillf pointed it out in June, somehow
     both Kirill and I repeatedly dropped the ball on this one.  You
     might wonder if the feature got tested at all with that bug in:
     yes, it did, but for wider testing coverage, Kirill and I had each
     relied too much on an override which bypasses that condition.

   - huge tmpfs: fix Committed_AS leak just a run-of-the-mill accounting
     fix in the same feature.

   - mm: delete unnecessary and unsafe init_tlb_ubc() is an unrelated
     fix to 4.3's TLB flush batching in reclaim: the bug would be rare,
     and none of us will be shamed if this one misses 4.8; but it got
     such a quick ack from Mel today that I'm inclined to offer it along
     with the first two"

* emailed patches from Hugh Dickins &lt;hughd@google.com&gt;:
  mm: delete unnecessary and unsafe init_tlb_ubc()
  huge tmpfs: fix Committed_AS leak
  shmem: fix tmpfs to handle the huge= option properly
</content>
</entry>
<entry>
<title>mm: delete unnecessary and unsafe init_tlb_ubc()</title>
<updated>2016-09-24T18:20:01Z</updated>
<author>
<name>Hugh Dickins</name>
<email>hughd@google.com</email>
</author>
<published>2016-09-24T03:27:04Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b385d21f27d86426472f6ae92a231095f7de2a8d'/>
<id>urn:sha1:b385d21f27d86426472f6ae92a231095f7de2a8d</id>
<content type='text'>
init_tlb_ubc() looked unnecessary to me: tlb_ubc is statically
initialized with zeroes in the init_task, and copied from parent to
child while it is quiescent in arch_dup_task_struct(); so I went to
delete it.

But inserted temporary debug WARN_ONs in place of init_tlb_ubc() to
check that it was always empty at that point, and found them firing:
because memcg reclaim can recurse into global reclaim (when allocating
biosets for swapout in my case), and arrive back at the init_tlb_ubc()
in shrink_node_memcg().

Resetting tlb_ubc.flush_required at that point is wrong: if the upper
level needs a deferred TLB flush, but the lower level turns out not to,
we miss a TLB flush.  But fortunately, that's the only part of the
protocol that does not nest: with the initialization removed, cpumask
collects bits from upper and lower levels, and flushes TLB when needed.

Fixes: 72b252aed506 ("mm: send one IPI per CPU to TLB flush all entries after unmapping pages")
Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Acked-by: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: stable@vger.kernel.org # 4.3+
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>huge tmpfs: fix Committed_AS leak</title>
<updated>2016-09-24T18:20:01Z</updated>
<author>
<name>Hugh Dickins</name>
<email>hughd@google.com</email>
</author>
<published>2016-09-24T03:24:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=71664665c3e3ca5ff61ef5fc65480f82cd575eb2'/>
<id>urn:sha1:71664665c3e3ca5ff61ef5fc65480f82cd575eb2</id>
<content type='text'>
Under swapping load on huge tmpfs, /proc/meminfo's Committed_AS grows
bigger and bigger: just a cosmetic issue for most users, but disabling
for those who run without overcommit (/proc/sys/vm/overcommit_memory 2).

shmem_uncharge() was forgetting to unaccount __vm_enough_memory's
charge, and shmem_charge() was forgetting it on the filesystem-full
error path.

Fixes: 800d8c63b2e9 ("shmem: add huge pages support")
Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Acked-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>shmem: fix tmpfs to handle the huge= option properly</title>
<updated>2016-09-24T18:20:01Z</updated>
<author>
<name>Toshi Kani</name>
<email>toshi.kani@hpe.com</email>
</author>
<published>2016-09-24T03:21:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3089bf614c7e2fd441ee001e3ff3d18326f6f091'/>
<id>urn:sha1:3089bf614c7e2fd441ee001e3ff3d18326f6f091</id>
<content type='text'>
shmem_get_unmapped_area() checks SHMEM_SB(sb)-&gt;huge incorrectly, which
leads to a reversed effect of "huge=" mount option.

Fix the check in shmem_get_unmapped_area().

Note, the default value of SHMEM_SB(sb)-&gt;huge remains as
SHMEM_HUGE_NEVER.  User will need to specify "huge=" option to enable
huge page mappings.

Reported-by: Hillf Danton &lt;hillf.zj@alibaba-inc.com&gt;
Signed-off-by: Toshi Kani &lt;toshi.kani@hpe.com&gt;
Acked-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Reviewed-by: Aneesh Kumar K.V &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: usercopy: Check for module addresses</title>
<updated>2016-09-20T23:07:39Z</updated>
<author>
<name>Laura Abbott</name>
<email>labbott@redhat.com</email>
</author>
<published>2016-09-20T15:56:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=aa4f0601115319a52c80f468c8f007e5aa9277cb'/>
<id>urn:sha1:aa4f0601115319a52c80f468c8f007e5aa9277cb</id>
<content type='text'>
While running a compile on arm64, I hit a memory exposure

usercopy: kernel memory exposure attempt detected from fffffc0000f3b1a8 (buffer_head) (1 bytes)
------------[ cut here ]------------
kernel BUG at mm/usercopy.c:75!
Internal error: Oops - BUG: 0 [#1] SMP
Modules linked in: ip6t_rpfilter ip6t_REJECT
nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_broute bridge stp
llc ebtable_nat ip6table_security ip6table_raw ip6table_nat
nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle
iptable_security iptable_raw iptable_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle
ebtable_filter ebtables ip6table_filter ip6_tables vfat fat xgene_edac
xgene_enet edac_core i2c_xgene_slimpro i2c_core at803x realtek xgene_dma
mdio_xgene gpio_dwapb gpio_xgene_sb xgene_rng mailbox_xgene_slimpro nfsd
auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c sdhci_of_arasan
sdhci_pltfm sdhci mmc_core xhci_plat_hcd gpio_keys
CPU: 0 PID: 19744 Comm: updatedb Tainted: G        W 4.8.0-rc3-threadinfo+ #1
Hardware name: AppliedMicro X-Gene Mustang Board/X-Gene Mustang Board, BIOS 3.06.12 Aug 12 2016
task: fffffe03df944c00 task.stack: fffffe00d128c000
PC is at __check_object_size+0x70/0x3f0
LR is at __check_object_size+0x70/0x3f0
...
[&lt;fffffc00082b4280&gt;] __check_object_size+0x70/0x3f0
[&lt;fffffc00082cdc30&gt;] filldir64+0x158/0x1a0
[&lt;fffffc0000f327e8&gt;] __fat_readdir+0x4a0/0x558 [fat]
[&lt;fffffc0000f328d4&gt;] fat_readdir+0x34/0x40 [fat]
[&lt;fffffc00082cd8f8&gt;] iterate_dir+0x190/0x1e0
[&lt;fffffc00082cde58&gt;] SyS_getdents64+0x88/0x120
[&lt;fffffc0008082c70&gt;] el0_svc_naked+0x24/0x28

fffffc0000f3b1a8 is a module address. Modules may have compiled in
strings which could get copied to userspace. In this instance, it
looks like "." which matches with a size of 1 byte. Extend the
is_vmalloc_addr check to be is_vmalloc_or_module_addr to cover
all possible cases.

Signed-off-by: Laura Abbott &lt;labbott@redhat.com&gt;
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
</content>
</entry>
<entry>
<title>mm: memcontrol: make per-cpu charge cache IRQ-safe for socket accounting</title>
<updated>2016-09-19T22:36:17Z</updated>
<author>
<name>Johannes Weiner</name>
<email>jweiner@fb.com</email>
</author>
<published>2016-09-19T21:44:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=db2ba40c277dc545bab531671c3f45ac0afea6f8'/>
<id>urn:sha1:db2ba40c277dc545bab531671c3f45ac0afea6f8</id>
<content type='text'>
During cgroup2 rollout into production, we started encountering css
refcount underflows and css access crashes in the memory controller.
Splitting the heavily shared css reference counter into logical users
narrowed the imbalance down to the cgroup2 socket memory accounting.

The problem turns out to be the per-cpu charge cache.  Cgroup1 had a
separate socket counter, but the new cgroup2 socket accounting goes
through the common charge path that uses a shared per-cpu cache for all
memory that is being tracked.  Those caches are safe against scheduling
preemption, but not against interrupts - such as the newly added packet
receive path.  When cache draining is interrupted by network RX taking
pages out of the cache, the resuming drain operation will put references
of in-use pages, thus causing the imbalance.

Disable IRQs during all per-cpu charge cache operations.

Fixes: f7e1cb6ec51b ("mm: memcontrol: account socket memory in unified hierarchy memory controller")
Link: http://lkml.kernel.org/r/20160914194846.11153-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Vladimir Davydov &lt;vdavydov@virtuozzo.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[4.5+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
