<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/mm/page_alloc.c, branch v4.8</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v4.8</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v4.8'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2016-09-02T00:52:01Z</updated>
<entry>
<title>mm, vmscan: only allocate and reclaim from zones with pages managed by the buddy allocator</title>
<updated>2016-09-02T00:52:01Z</updated>
<author>
<name>Mel Gorman</name>
<email>mgorman@techsingularity.net</email>
</author>
<published>2016-09-01T23:14:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=6aa303defb7454a2520c4ddcdf6b081f62a15890'/>
<id>urn:sha1:6aa303defb7454a2520c4ddcdf6b081f62a15890</id>
<content type='text'>
Firmware Assisted Dump (FA_DUMP) on ppc64 reserves substantial amounts
of memory when booting a secondary kernel.  Srikar Dronamraju reported
that on such systems multiple nodes may have no memory managed by the
buddy allocator while populated_zone() still returns true for their zones.

Commit 1d82de618ddd ("mm, vmscan: make kswapd reclaim in terms of
nodes") was reported to cause kswapd to spin at 100% CPU usage when
fadump was enabled.  The old code happened to deal with the situation of
a populated node with zero free pages by coincidence, but the current
code tries to reclaim populated zones without realising that this is
impossible.

We cannot just convert populated_zone() as many existing users really
need to check for present_pages.  This patch introduces a managed_zone()
helper and uses it in the few cases where it is critical that the check
is made for managed pages -- zonelist construction and page reclaim.
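
For illustration, a minimal sketch of the distinction, assuming the
helpers simply test the respective zone counters (a sketch, not the
exact kernel definitions):

  /* A zone can be populated (present_pages != 0) while none of its
   * pages are managed by the buddy allocator, e.g. when fadump has
   * reserved them. */
  static inline bool populated_zone(struct zone *zone)
  {
          return !!zone-&gt;present_pages;
  }

  static inline bool managed_zone(struct zone *zone)
  {
          return !!zone-&gt;managed_pages;
  }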

Link: http://lkml.kernel.org/r/20160831195104.GB8119@techsingularity.net
Signed-off-by: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Reported-by: Srikar Dronamraju &lt;srikar@linux.vnet.ibm.com&gt;
Tested-by: Srikar Dronamraju &lt;srikar@linux.vnet.ibm.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm, oom: prevent premature OOM killer invocation for high order request</title>
<updated>2016-09-02T00:52:01Z</updated>
<author>
<name>Michal Hocko</name>
<email>mhocko@suse.com</email>
</author>
<published>2016-09-01T23:14:41Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=6b4e3181d7bd5ca5ab6f45929e4a5ffa7ab4ab7f'/>
<id>urn:sha1:6b4e3181d7bd5ca5ab6f45929e4a5ffa7ab4ab7f</id>
<content type='text'>
There have been several reports of premature OOM killer invocation in
the 4.7 kernel, where an order-2 allocation request (for the kernel
stack) invoked the OOM killer even during basic workloads (light IO or
even a kernel compile on some filesystems).  In all reported cases the memory is
fragmented and there are no order-2+ pages available.  There is usually
a large amount of slab memory (usually dentries/inodes) and further
debugging has shown that there are way too many unmovable blocks which
are skipped during the compaction.  Multiple reporters have confirmed
that the current linux-next which includes [1] and [2] helped and OOMs
are not reproducible anymore.

A simpler fix for the late rc and stable kernels is to simply ignore
the compaction feedback and retry as long as reclaim is making progress
and we are not getting OOM for order-0 pages.  We already do that for
CONFIG_COMPACTION=n, so let's reuse the same code when compaction is
enabled as well.
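
A rough sketch of the retry rule being reused, assuming a hypothetical
order0_watermarks_ok() helper (illustrative, not the exact kernel
code):

  /*
   * Keep retrying a non-costly high-order allocation as long as
   * reclaim makes progress and an order-0 allocation would still
   * pass the watermark checks, instead of trusting compaction
   * feedback.
   */
  if (order &amp;&amp; order &lt;= PAGE_ALLOC_COSTLY_ORDER &amp;&amp;
      did_some_progress &amp;&amp; order0_watermarks_ok(ac))
          goto retry;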

[1] http://lkml.kernel.org/r/20160810091226.6709-1-vbabka@suse.cz
[2] http://lkml.kernel.org/r/f7a9ea9d-bb88-bfd6-e340-3a933559305a@suse.cz

Fixes: 0a0337e0d1d1 ("mm, oom: rework oom detection")
Link: http://lkml.kernel.org/r/20160823074339.GB23577@dhcp22.suse.cz
Signed-off-by: Michal Hocko &lt;mhocko@suse.com&gt;
Tested-by: Olaf Hering &lt;olaf@aepfle.de&gt;
Tested-by: Ralf-Peter Rohbeck &lt;Ralf-Peter.Rohbeck@quantum.com&gt;
Cc: Markus Trippelsdorf &lt;markus@trippelsdorf.de&gt;
Cc: Arkadiusz Miskiewicz &lt;a.miskiewicz@gmail.com&gt;
Cc: Ralf-Peter Rohbeck &lt;Ralf-Peter.Rohbeck@quantum.com&gt;
Cc: Jiri Slaby &lt;jslaby@suse.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Joonsoo Kim &lt;js1304@gmail.com&gt;
Cc: Tetsuo Handa &lt;penguin-kernel@i-love.sakura.ne.jp&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[4.7.x]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>proc, meminfo: use correct helpers for calculating LRU sizes in meminfo</title>
<updated>2016-08-11T23:58:13Z</updated>
<author>
<name>Mel Gorman</name>
<email>mgorman@techsingularity.net</email>
</author>
<published>2016-08-11T22:32:57Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2f95ff90b99600f53df4a0aa652322d349d67957'/>
<id>urn:sha1:2f95ff90b99600f53df4a0aa652322d349d67957</id>
<content type='text'>
meminfo_proc_show() and si_mem_available() use the wrong helpers when
calculating the sizes of the LRU lists.  The user-visible impact is that
there appears to be an abnormally high number of unevictable pages.
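
The class of fix, sketched diff-style: after the node-LRU rework the
LRU counters live on the node, so the node-based helper must be used
(treat the exact call site as illustrative):

  -	pages[lru] = global_page_state(NR_LRU_BASE + lru);
  +	pages[lru] = global_node_page_state(NR_LRU_BASE + lru);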

Link: http://lkml.kernel.org/r/20160805105805.GR2799@techsingularity.net
Signed-off-by: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/page_alloc.c: recalculate some of node threshold when on/offline memory</title>
<updated>2016-08-10T23:40:56Z</updated>
<author>
<name>Joonsoo Kim</name>
<email>iamjoonsoo.kim@lge.com</email>
</author>
<published>2016-08-10T23:27:49Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=6423aa8192c596848e1b23bd4193dc0924e7274d'/>
<id>urn:sha1:6423aa8192c596848e1b23bd4193dc0924e7274d</id>
<content type='text'>
Some of the node thresholds depend on the number of managed pages in
the node.  When memory goes online or offline, this number can change,
and the thresholds need to be adjusted accordingly.

Add the recalculation at the appropriate places and clean up the
related functions for easier maintenance.
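
A sketch of one such recalculation, close to the patch but with
details elided (treat as illustrative):

  /* Recompute the per-node min_unmapped_pages threshold from the
   * current managed page counts whenever memory goes on/offline. */
  static void setup_min_unmapped_ratio(void)
  {
          pg_data_t *pgdat;
          struct zone *zone;

          for_each_online_pgdat(pgdat)
                  pgdat-&gt;min_unmapped_pages = 0;

          for_each_zone(zone)
                  zone-&gt;zone_pgdat-&gt;min_unmapped_pages +=
                          (zone-&gt;managed_pages *
                           sysctl_min_unmapped_ratio) / 100;
  }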

Link: http://lkml.kernel.org/r/1470724248-26780-2-git-send-email-iamjoonsoo.kim@lge.com
Signed-off-by: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Acked-by: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/page_alloc.c: fix wrong initialization when sysctl_min_unmapped_ratio changes</title>
<updated>2016-08-10T23:40:56Z</updated>
<author>
<name>Joonsoo Kim</name>
<email>iamjoonsoo.kim@lge.com</email>
</author>
<published>2016-08-10T23:27:46Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=81cbcbc2d810c0ce49fba81f864302e1afe5ff27'/>
<id>urn:sha1:81cbcbc2d810c0ce49fba81f864302e1afe5ff27</id>
<content type='text'>
When sysctl_min_unmapped_ratio changes, the handler must zero
min_unmapped_pages before recalculating it; the old code zeroed
min_slab_pages instead.
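
Sketched diff-style, the fix amounts to zeroing the right field in the
sysctl handler (illustrative):

  	for_each_online_pgdat(pgdat)
  -		pgdat-&gt;min_slab_pages = 0;
  +		pgdat-&gt;min_unmapped_pages = 0;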

Fixes: a5f5f91da6 (mm: convert zone_reclaim to node_reclaim)
Link: http://lkml.kernel.org/r/1470724248-26780-1-git-send-email-iamjoonsoo.kim@lge.com
Signed-off-by: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Acked-by: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: memcontrol: only mark charged pages with PageKmemcg</title>
<updated>2016-08-09T17:14:10Z</updated>
<author>
<name>Vladimir Davydov</name>
<email>vdavydov@virtuozzo.com</email>
</author>
<published>2016-08-08T20:03:12Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c4159a75b64c0e67caededf4d7372c1b58a5f42a'/>
<id>urn:sha1:c4159a75b64c0e67caededf4d7372c1b58a5f42a</id>
<content type='text'>
To distinguish non-slab pages charged to kmemcg we mark them PageKmemcg,
which sets page-&gt;_mapcount to -512.  Currently, we set/clear PageKmemcg
in __alloc_pages_nodemask()/free_pages_prepare() for any page allocated
with __GFP_ACCOUNT, including those that aren't actually charged to any
cgroup, i.e. allocated from the root cgroup context.  To avoid overhead
in case cgroups are not used, we only do that if memcg_kmem_enabled() is
true.  The latter is set iff there are kmem-enabled memory cgroups
(online or offline).  The root cgroup is not considered kmem-enabled.
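
For reference, a sketch of how a _mapcount-based page marker can be
implemented (the -512 value is from the text above; helper names are
illustrative):

  #define PAGE_KMEMCG_MAPCOUNT_VALUE      (-512)

  static inline int PageKmemcg(struct page *page)
  {
          return atomic_read(&amp;page-&gt;_mapcount) ==
                          PAGE_KMEMCG_MAPCOUNT_VALUE;
  }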

As a result, if a page is allocated with __GFP_ACCOUNT for the root
cgroup while kmem-enabled memory cgroups exist, and is freed after all
of them have been removed, e.g.

  # no memory cgroup has been created yet, create one
  mkdir /sys/fs/cgroup/memory/test
  # run something allocating pages with __GFP_ACCOUNT, e.g.
  # a program using pipe
  dmesg | tail
  # remove the memory cgroup
  rmdir /sys/fs/cgroup/memory/test

we'll get a bad page state bug complaining about page-&gt;_mapcount != -1:

  BUG: Bad page state in process swapper/0  pfn:1fd945c
  page:ffffea007f651700 count:0 mapcount:-511 mapping:          (null) index:0x0
  flags: 0x1000000000000000()

To avoid that, let's mark with PageKmemcg only those pages that are
actually charged to and hence pin a non-root memory cgroup.

Fixes: 4949148ad433 ("mm: charge/uncharge kmemcg from generic page allocator paths")
Reported-and-tested-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: Vladimir Davydov &lt;vdavydov@virtuozzo.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: initialise per_cpu_nodestats for all online pgdats at boot</title>
<updated>2016-08-05T00:02:09Z</updated>
<author>
<name>Mel Gorman</name>
<email>mgorman@techsingularity.net</email>
</author>
<published>2016-08-04T22:31:49Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b4911ea2bcf52345bf3dc73292b8c4676f6192c6'/>
<id>urn:sha1:b4911ea2bcf52345bf3dc73292b8c4676f6192c6</id>
<content type='text'>
Paul Mackerras and Reza Arbab reported that machines with memoryless
nodes fail when vmstats are refreshed.  Paul reported an oops as follows

  Unable to handle kernel paging request for data at address 0xff7a10000
  Faulting instruction address: 0xc000000000270cd0
  Oops: Kernel access of bad area, sig: 11 [#1]
  SMP NR_CPUS=2048 NUMA PowerNV
  Modules linked in:
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.7.0-kvm+ #118
  task: c000000ff0680010 task.stack: c000000ff0704000
  NIP: c000000000270cd0 LR: c000000000270ce8 CTR: 0000000000000000
  REGS: c000000ff0707900 TRAP: 0300   Not tainted  (4.7.0-kvm+)
  MSR: 9000000102009033 &lt;SF,HV,VEC,EE,ME,IR,DR,RI,LE,TM[E]&gt;  CR: 846b6824  XER: 20000000
  CFAR: c000000000008768 DAR: 0000000ff7a10000 DSISR: 42000000 SOFTE: 1
  NIP refresh_zone_stat_thresholds+0x80/0x240
  LR refresh_zone_stat_thresholds+0x98/0x240
  Call Trace:
    refresh_zone_stat_thresholds+0xb8/0x240 (unreliable)

Both supplied potential fixes, but one potentially missed some checks
and the other had redundant initialisations.  This version initialises
per_cpu_nodestats on a per-pgdat basis instead of on a per-zone basis.
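
The approach, sketched (close to the actual change, but treat struct
and field names as illustrative):

  /* Initialise pgdat-level per-cpu stats for every online node at
   * boot, instead of doing it zone by zone. */
  for_each_online_pgdat(pgdat)
          pgdat-&gt;per_cpu_nodestats =
                  alloc_percpu(struct per_cpu_nodestat);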

Link: http://lkml.kernel.org/r/20160804092404.GI2799@techsingularity.net
Signed-off-by: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Reported-by: Paul Mackerras &lt;paulus@ozlabs.org&gt;
Reported-by: Reza Arbab &lt;arbab@linux.vnet.ibm.com&gt;
Tested-by: Reza Arbab &lt;arbab@linux.vnet.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>treewide: replace obsolete _refok by __ref</title>
<updated>2016-08-02T21:31:41Z</updated>
<author>
<name>Fabian Frederick</name>
<email>fabf@skynet.be</email>
</author>
<published>2016-08-02T21:03:33Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=bd721ea73e1f965569b40620538c942001f76294'/>
<id>urn:sha1:bd721ea73e1f965569b40620538c942001f76294</id>
<content type='text'>
There was only one use of __initdata_refok and __exit_refok.

__init_refok was used 46 times, against 82 uses of __ref.

These definitions have been obsolete since commit 312b1485fb50
("Introduce new section reference annotations tags: __ref, __refdata,
__refconst").

This patch removes the following compatibility definitions and replaces
them treewide.

/* compatibility defines */
#define __init_refok     __ref
#define __initdata_refok __refdata
#define __exit_refok     __ref

I can also provide separate patches if necessary (one patch per tree,
then check in a month or two before removing the old definitions).
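
An illustrative instance of the treewide replacement (hypothetical
function name, diff-style):

  -static int __init_refok setup_example_table(void)
  +static int __ref setup_example_table(void)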

[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/1466796271-3043-1-git-send-email-fabf@skynet.be
Signed-off-by: Fabian Frederick &lt;fabf@skynet.be&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Sam Ravnborg &lt;sam@ravnborg.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm, compaction: simplify contended compaction handling</title>
<updated>2016-07-28T23:07:41Z</updated>
<author>
<name>Vlastimil Babka</name>
<email>vbabka@suse.cz</email>
</author>
<published>2016-07-28T22:49:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c3486f5376696034d0fcbef8ba70c70cfcb26f51'/>
<id>urn:sha1:c3486f5376696034d0fcbef8ba70c70cfcb26f51</id>
<content type='text'>
Async compaction detects contention either due to failing trylock on
zone-&gt;lock or lru_lock, or by need_resched().  Since 1f9efdef4f3f ("mm,
compaction: khugepaged should not give up due to need_resched()") the
code got quite complicated to distinguish these two up to the
__alloc_pages_slowpath() level, so different decisions could be taken
for khugepaged allocations.

After the recent changes, khugepaged allocations no longer check for
contended compaction, so we again don't need to distinguish lock and
sched contention, and can simplify the currently convoluted code a lot.
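
Sketched, the simplification collapses both sources of contention into
a single flag instead of distinct lock/sched contention states
(illustrative, not the exact kernel code):

  /* Abort and record contention whether it comes from a contended
   * lock or from a pending reschedule. */
  if (need_resched() || spin_is_contended(lock)) {
          cc-&gt;contended = true;
          return true;    /* abort this compaction pass */
  }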

However, I believe it's also possible to simplify even more and
completely remove the check for contended compaction after the initial
async compaction for costly orders, which was originally aimed at THP
page fault allocations.  There are several reasons why this can be done
now:

- with the new defaults, THP page faults no longer do reclaim/compaction at
  all, unless the system admin has overridden the default or the application
  has indicated via madvise that it can benefit from THPs.  In both cases the
  potential extra latency is expected and worth the benefits.
- even if reclaim/compaction proceeds after this patch where it previously
  wouldn't, the second compaction attempt is still async and will detect the
  contention and back off if it persists
- there are still heuristics, such as deferred compaction and pageblock skip
  bits, in place that prevent excessive THP page fault latencies

Link: http://lkml.kernel.org/r/20160721073614.24395-9-vbabka@suse.cz
Signed-off-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Acked-by: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm, compaction: introduce direct compaction priority</title>
<updated>2016-07-28T23:07:41Z</updated>
<author>
<name>Vlastimil Babka</name>
<email>vbabka@suse.cz</email>
</author>
<published>2016-07-28T22:49:28Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a5508cd83f10f663e05d212cb81f600a3af46e40'/>
<id>urn:sha1:a5508cd83f10f663e05d212cb81f600a3af46e40</id>
<content type='text'>
In the context of direct compaction, for some types of allocations we
would like compaction either to succeed or to fail definitively while
trying as hard as possible.  The current async/sync_light migration mode is
insufficient, as there are heuristics such as caching scanner positions,
marking pageblocks as unsuitable or deferring compaction for a zone.  At
least the final compaction attempt should be able to override these
heuristics.

To communicate how hard compaction should try, we replace migration mode
with a new enum compact_priority and change the relevant function
signatures.  In compact_zone_order() where struct compact_control is
constructed, the priority is mapped to suitable control flags.  This
patch itself has no functional change, as the current priority levels
are mapped back to the same migration modes as before.  Expanding them
will be done next.
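
A sketch of the new enum and its mapping back to migration modes
(values and mapping as described above, but treat as illustrative):

  enum compact_priority {
          COMPACT_PRIO_SYNC_LIGHT,
          MIN_COMPACT_PRIORITY = COMPACT_PRIO_SYNC_LIGHT,
          DEF_COMPACT_PRIORITY = COMPACT_PRIO_SYNC_LIGHT,
          COMPACT_PRIO_ASYNC,
          INIT_COMPACT_PRIORITY = COMPACT_PRIO_ASYNC
  };

  /* In compact_zone_order(), priority maps to a migration mode, so
   * behaviour is unchanged by this patch: */
  .mode = prio == COMPACT_PRIO_ASYNC ?
                          MIGRATE_ASYNC : MIGRATE_SYNC_LIGHT,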

Note that the !CONFIG_COMPACTION variant of try_to_compact_pages() is
removed, as its only caller exists under CONFIG_COMPACTION.

Link: http://lkml.kernel.org/r/20160721073614.24395-8-vbabka@suse.cz
Signed-off-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Acked-by: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
