<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/mm/page_alloc.c, branch v3.12</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v3.12</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v3.12'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2013-09-30T21:31:01Z</updated>
<entry>
<title>revert "mm/memory-hotplug: fix lowmem count overflow when offline pages"</title>
<updated>2013-09-30T21:31:01Z</updated>
<author>
<name>Joonyoung Shim</name>
<email>jy0922.shim@samsung.com</email>
</author>
<published>2013-09-30T20:45:01Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7393dc45f6ed5d3aba43b06d49eb3b15f1318906'/>
<id>urn:sha1:7393dc45f6ed5d3aba43b06d49eb3b15f1318906</id>
<content type='text'>
This reverts commit cea27eb2a202 ("mm/memory-hotplug: fix lowmem count
overflow when offline pages").

The bug addressed by commit cea27eb was later fixed in a different way by
commit 3dcc0571cd64 ("mm: correctly update zone-&gt;managed_pages").  That
commit enhances memory_hotplug.c to adjust totalhigh_pages when
hot-removing memory; for details, please refer to:

  http://marc.info/?l=linux-mm&amp;m=136957578620221&amp;w=2

As a result, commit cea27eb2a202 now causes totalhigh_pages to be
decreased twice, hence the revert.

Signed-off-by: Joonyoung Shim &lt;jy0922.shim@samsung.com&gt;
Reviewed-by: Wanpeng Li &lt;liwanp@linux.vnet.ibm.com&gt;
Cc: Jiang Liu &lt;liuj97@gmail.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: Bartlomiej Zolnierkiewicz &lt;b.zolnierkie@samsung.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: correct the comment about the value for buddy _mapcount</title>
<updated>2013-09-11T22:58:06Z</updated>
<author>
<name>Wang Sheng-Hui</name>
<email>shhuiw@gmail.com</email>
</author>
<published>2013-09-11T21:22:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=cf6fe945389e674130dc7564392930cf7fb9f5e8'/>
<id>urn:sha1:cf6fe945389e674130dc7564392930cf7fb9f5e8</id>
<content type='text'>
Setting _mapcount to PAGE_BUDDY_MAPCOUNT_VALUE is what marks a page as
buddy, not the magic number -2, so correct the comment accordingly.
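
For illustration, a minimal sketch of how the marker is applied (the
helper name here is made up; the kernel's own code differs in detail):

  static inline void mark_page_buddy(struct page *page)
  {
          /* PAGE_BUDDY_MAPCOUNT_VALUE, not a bare -2, identifies a
           * page that sits on the buddy allocator's free lists. */
          atomic_set(&amp;page-&gt;_mapcount, PAGE_BUDDY_MAPCOUNT_VALUE);
  }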

Signed-off-by: Wang Sheng-Hui &lt;shhuiw@gmail.com&gt;
Cc: Mel Gorman &lt;mel@csn.ul.ie&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: vmscan: fix do_try_to_free_pages() livelock</title>
<updated>2013-09-11T22:58:01Z</updated>
<author>
<name>Lisa Du</name>
<email>cldu@marvell.com</email>
</author>
<published>2013-09-11T21:22:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=6e543d5780e36ff5ee56c44d7e2e30db3457a7ed'/>
<id>urn:sha1:6e543d5780e36ff5ee56c44d7e2e30db3457a7ed</id>
<content type='text'>
This patch is based on KOSAKI's work, with a little more description
added; please refer to https://lkml.org/lkml/2012/6/14/74.

I found that the system can enter a state where a zone has lots of free
pages, but only order-0 and order-1 ones, which means the zone is heavily
fragmented.  A high-order allocation can then stall the direct reclaim
path for a long time (e.g. 60 seconds), especially in an environment with
no swap and no compaction.  This problem happened on v3.4, but the issue
still seems to live in the current tree; the reason is that
do_try_to_free_pages() enters a livelock:

kswapd will go to sleep if the zones have been fully scanned and are still
not balanced, since it sees little point in trying all over again and
risking an infinite loop.  Instead it changes the order from high-order to
order-0, because kswapd considers order-0 the most important; see commit
73ce02e9 for details.  If the watermarks are OK, kswapd goes back to sleep
and may leave zone-&gt;all_unreclaimable = 0, assuming that high-order
users can still perform direct reclaim if they wish.

Direct reclaim keeps reclaiming for a high order that is not a
COSTLY_ORDER, without invoking the OOM killer, until kswapd turns on
zone-&gt;all_unreclaimable; this is done to avoid a premature OOM kill.
So direct reclaim depends on kswapd to break out of this loop.

In the worst case, direct reclaim may keep reclaiming pages forever while
kswapd sleeps forever, until something like a watchdog detects the
situation and finally kills the process, as described in:
http://thread.gmane.org/gmane.linux.kernel.mm/103737

We can't turn on zone-&gt;all_unreclaimable from the direct reclaim path
because the direct reclaim path doesn't take any locks, so doing it that
way would be racy.  Thus this patch removes the zone-&gt;all_unreclaimable
field completely and recalculates the zone's reclaimable state every time.

Note: we can't take the approach of having direct reclaim look at
zone-&gt;pages_scanned directly while kswapd continues to use
zone-&gt;all_unreclaimable, because that is racy.  Commit 929bea7c71
("vmscan: all_unreclaimable() use zone-&gt;all_unreclaimable as a name)")
describes the details.
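
As an illustration, the recalculated state boils down to a check of this
shape (the threshold and helpers are simplified here, not the precise
kernel code):

  /* A zone still counts as reclaimable as long as it has not been
   * scanned more than a few times the pages it could reclaim. */
  static bool zone_reclaimable(struct zone *zone)
  {
          return zone-&gt;pages_scanned &lt;
                  zone_reclaimable_pages(zone) * 6;
  }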

[akpm@linux-foundation.org: uninline zone_reclaimable_pages() and zone_reclaimable()]
Cc: Aaditya Kumar &lt;aaditya.kumar.30@gmail.com&gt;
Cc: Ying Han &lt;yinghan@google.com&gt;
Cc: Nick Piggin &lt;npiggin@gmail.com&gt;
Acked-by: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Mel Gorman &lt;mel@csn.ul.ie&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Bob Liu &lt;lliubbo@gmail.com&gt;
Cc: Neil Zhang &lt;zhangwm@marvell.com&gt;
Cc: Russell King - ARM Linux &lt;linux@arm.linux.org.uk&gt;
Reviewed-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Acked-by: Minchan Kim &lt;minchan@kernel.org&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Signed-off-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Signed-off-by: Lisa Du &lt;cldu@marvell.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: page_alloc: fix comment get_page_from_freelist</title>
<updated>2013-09-11T22:57:56Z</updated>
<author>
<name>SeungHun Lee</name>
<email>waydi1@gmail.com</email>
</author>
<published>2013-09-11T21:22:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3b11f0aaae830f0f569cb8fb7fd26f4133ebdabd'/>
<id>urn:sha1:3b11f0aaae830f0f569cb8fb7fd26f4133ebdabd</id>
<content type='text'>
cpuset_zone_allowed() was changed to cpuset_zone_allowed_softwall() and
its comment was moved to __cpuset_node_allowed_softwall(), so fix the
stale reference in this comment.

Signed-off-by: SeungHun Lee &lt;waydi1@gmail.com&gt;
Acked-by: David Rientjes &lt;rientjes@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>memblock, numa: binary search node id</title>
<updated>2013-09-11T22:57:51Z</updated>
<author>
<name>Yinghai Lu</name>
<email>yinghai@kernel.org</email>
</author>
<published>2013-09-11T21:22:17Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e76b63f80d938a1319eb5fb0ae7ea69bddfbae38'/>
<id>urn:sha1:e76b63f80d938a1319eb5fb0ae7ea69bddfbae38</id>
<content type='text'>
The current early_pfn_to_nid(), on architectures that support memblock,
walks memblock.memory one entry at a time, so it takes too many tries for
pfns near the end.

We can use the existing memblock_search() to find the node id for a given
pfn, which saves time on bigger systems whose memblock.memory array has
many entries.

Here are the timing differences for several machines.  In each case with
the patch less time was spent in __early_pfn_to_nid().

                        3.11-rc5        with patch      difference (%)
                        --------        ----------      --------------
UV1: 256 nodes  9TB:     411.66          402.47         -9.19 (2.23%)
UV2: 255 nodes 16TB:    1141.02         1138.12         -2.90 (0.25%)
UV2:  64 nodes  2TB:     128.15          126.53         -1.62 (1.26%)
UV2:  32 nodes  2TB:     121.87          121.07         -0.80 (0.66%)
                        Time in seconds.
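
A minimal sketch of the lookup, assuming the memblock.memory regions are
kept sorted by base address (names and details here are illustrative, not
the exact memblock API):

  /* Binary-search the sorted regions for the one containing addr and
   * return its node id, or -1 if no region covers addr. */
  static int region_to_nid(struct memblock_type *type, u64 addr)
  {
          unsigned int lo = 0, hi = type-&gt;cnt;

          while (lo &lt; hi) {
                  unsigned int mid = lo + (hi - lo) / 2;
                  struct memblock_region *r = &amp;type-&gt;regions[mid];

                  if (addr &lt; r-&gt;base)
                          hi = mid;
                  else if (addr &gt;= r-&gt;base + r-&gt;size)
                          lo = mid + 1;
                  else
                          return r-&gt;nid;
          }
          return -1;
  }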

Signed-off-by: Yinghai Lu &lt;yinghai@kernel.org&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Russ Anderson &lt;rja@sgi.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: memory-hotplug: enable memory hotplug to handle hugepage</title>
<updated>2013-09-11T22:57:48Z</updated>
<author>
<name>Naoya Horiguchi</name>
<email>n-horiguchi@ah.jp.nec.com</email>
</author>
<published>2013-09-11T21:22:09Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c8721bbbdd36382de51cd6b7a56322e0acca2414'/>
<id>urn:sha1:c8721bbbdd36382de51cd6b7a56322e0acca2414</id>
<content type='text'>
Until now we couldn't offline memory blocks which contain hugepages,
because a hugepage was considered an unmovable page.  With this patch
series a hugepage becomes movable, so by using hugepage migration we can
offline such memory blocks.

What's different from other users of hugepage migration is that we need to
decompose all the hugepages inside the target memory block into free buddy
pages after hugepage migration, because otherwise free hugepages remaining
in the memory block interfere with the memory offlining.  For this reason
we introduce the new functions dissolve_free_huge_page() and
dissolve_free_huge_pages().
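
A minimal sketch of what dissolve_free_huge_page() has to do (locking and
helper details are simplified here, not the exact kernel code):

  /* Take a free hugepage off its hstate free lists and hand the
   * memory back to the buddy allocator. */
  static void dissolve_free_huge_page(struct page *page)
  {
          spin_lock(&amp;hugetlb_lock);
          if (PageHuge(page) &amp;&amp; !page_count(page)) {
                  struct hstate *h = page_hstate(page);
                  int nid = page_to_nid(page);

                  list_del(&amp;page-&gt;lru);
                  h-&gt;free_huge_pages--;
                  h-&gt;free_huge_pages_node[nid]--;
                  update_and_free_page(h, page);
          }
          spin_unlock(&amp;hugetlb_lock);
  }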

Other than that, what this patch does is straightforward: it adds hugepage
migration code, that is, hugepage handling in the functions which scan
over pfns and collect the pages to be migrated, and a hugepage allocation
path in alloc_migrate_target().

As for larger hugepages (1GB on x86_64), it's not easy to hot-remove them
because they are larger than a memory block, so for now we simply let the
operation fail as before.

[yongjun_wei@trendmicro.com.cn: remove duplicated include]
Signed-off-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Acked-by: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Hillf Danton &lt;dhillf@gmail.com&gt;
Cc: Wanpeng Li &lt;liwanp@linux.vnet.ibm.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: "Aneesh Kumar K.V" &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Signed-off-by: Wei Yongjun &lt;yongjun_wei@trendmicro.com.cn&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: use zone_is_empty() instead of if(zone-&gt;spanned_pages)</title>
<updated>2013-09-11T22:57:38Z</updated>
<author>
<name>Xishi Qiu</name>
<email>qiuxishi@huawei.com</email>
</author>
<published>2013-09-11T21:21:45Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8080fc038e91265e1002df7cae805fc17bb772fc'/>
<id>urn:sha1:8080fc038e91265e1002df7cae805fc17bb772fc</id>
<content type='text'>
Use "zone_is_empty()" instead of "if (zone-&gt;spanned_pages)".
Simplify the code, no functional change.
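
For reference, the helper is just a readability wrapper, roughly:

  static inline bool zone_is_empty(struct zone *zone)
  {
          return zone-&gt;spanned_pages == 0;
  }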

Signed-off-by: Xishi Qiu &lt;qiuxishi@huawei.com&gt;
Cc: Cody P Schafer &lt;cody@linux.vnet.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>vmstat: create separate function to fold per cpu diffs into local counters</title>
<updated>2013-09-11T22:57:31Z</updated>
<author>
<name>Christoph Lameter</name>
<email>cl@linux.com</email>
</author>
<published>2013-09-11T21:21:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2bb921e526656556e68f99f5f15a4a1bf2691844'/>
<id>urn:sha1:2bb921e526656556e68f99f5f15a4a1bf2691844</id>
<content type='text'>
The main idea behind this patchset is to reduce the vmstat update overhead
by avoiding interrupt enable/disable and the use of per cpu atomics.

This patch (of 3):

It is better to have a separate folding function because
refresh_cpu_vm_stats() also does other things, like expiring pages in the
page allocator caches.

With a separate function, refresh_cpu_vm_stats() is only called from the
local cpu, which allows additional optimizations.

The folding function is only called when a cpu is being downed, and
therefore no other processor will be accessing its counters.  This also
simplifies synchronization.
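
A minimal sketch of such a fold-on-cpu-down function (simplified; names
and the exact counter layout follow the existing vmstat code only
loosely):

  /* Move the dead cpu's per-cpu diffs into the zone counters and
   * accumulate them for one update of the global counters. */
  void cpu_vm_stats_fold(int cpu)
  {
          int i, global_diff[NR_VM_ZONE_STAT_ITEMS] = { 0, };
          struct zone *zone;

          for_each_populated_zone(zone) {
                  struct per_cpu_pageset *p =
                          per_cpu_ptr(zone-&gt;pageset, cpu);

                  for (i = 0; i &lt; NR_VM_ZONE_STAT_ITEMS; i++) {
                          int v = p-&gt;vm_stat_diff[i];

                          if (!v)
                                  continue;
                          p-&gt;vm_stat_diff[i] = 0;
                          atomic_long_add(v, &amp;zone-&gt;vm_stat[i]);
                          global_diff[i] += v;
                  }
          }

          for (i = 0; i &lt; NR_VM_ZONE_STAT_ITEMS; i++)
                  if (global_diff[i])
                          atomic_long_add(global_diff[i], &amp;vm_stat[i]);
  }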

[akpm@linux-foundation.org: fix UP build]
Signed-off-by: Christoph Lameter &lt;cl@linux.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
CC: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Joonsoo Kim &lt;js1304@gmail.com&gt;
Cc: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm, page_alloc: add unlikely macro to help compiler optimization</title>
<updated>2013-09-11T22:57:29Z</updated>
<author>
<name>Joonsoo Kim</name>
<email>iamjoonsoo.kim@lge.com</email>
</author>
<published>2013-09-11T21:21:26Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e66f09725771ac5b1b868d6b19eba84e30ffad88'/>
<id>urn:sha1:e66f09725771ac5b1b868d6b19eba84e30ffad88</id>
<content type='text'>
We rarely allocate a page with ALLOC_NO_WATERMARKS, and it is only used in
the slow path.  To help the compiler optimize, add the unlikely() macro to
the ALLOC_NO_WATERMARKS check.

This patch has no effect right now, because gcc already optimizes this
properly.  But we cannot assume that gcc always gets it right, and nobody
re-evaluates whether gcc still optimizes properly after their changes; for
example, this is not optimized properly on v3.10.  So adding a compiler
hint here is reasonable.
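
The change itself is just a hint around the existing test, roughly (the
surrounding context is simplified):

  /* before */
  if (alloc_flags &amp; ALLOC_NO_WATERMARKS)
          goto try_this_zone;

  /* after: tell the compiler this is the rare branch */
  if (unlikely(alloc_flags &amp; ALLOC_NO_WATERMARKS))
          goto try_this_zone;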

Signed-off-by: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: page_alloc: fair zone allocator policy</title>
<updated>2013-09-11T22:57:23Z</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2013-09-11T21:20:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=81c0a2bb515fd4daae8cab64352877480792b515'/>
<id>urn:sha1:81c0a2bb515fd4daae8cab64352877480792b515</id>
<content type='text'>
Each zone that holds userspace pages of one workload must be aged at a
speed proportional to the zone size.  Otherwise, the time an individual
page gets to stay in memory depends on the zone it happened to be
allocated in.  Asymmetry in the zone aging creates rather unpredictable
aging behavior and results in the wrong pages being reclaimed, activated
etc.

But exactly this happens right now because of the way the page allocator
and kswapd interact.  The page allocator uses per-node lists of all zones
in the system, ordered by preference, when allocating a new page.  When
the first iteration does not yield any results, kswapd is woken up and the
allocator retries.  Due to the way kswapd reclaims zones below the high
watermark while a zone can be allocated from when it is above the low
watermark, the allocator may keep kswapd running while kswapd reclaim
ensures that the page allocator can keep allocating from the first zone in
the zonelist for extended periods of time.  Meanwhile the other zones
rarely see new allocations and thus get aged much slower in comparison.

The result is that the occasional page placed in lower zones gets
relatively more time in memory, even gets promoted to the active list
after its peers have long been evicted.  Meanwhile, the bulk of the
working set may be thrashing on the preferred zone even though there may
be significant amounts of memory available in the lower zones.

Even the most basic test -- repeatedly reading a file slightly bigger than
memory -- shows how broken the zone aging is.  In this scenario, no single
page should be able to stay in memory long enough to get referenced twice
and activated, but activation happens in spades:

  $ grep active_file /proc/zoneinfo
      nr_inactive_file 0
      nr_active_file 0
      nr_inactive_file 0
      nr_active_file 8
      nr_inactive_file 1582
      nr_active_file 11994
  $ cat data data data data &gt;/dev/null
  $ grep active_file /proc/zoneinfo
      nr_inactive_file 0
      nr_active_file 70
      nr_inactive_file 258753
      nr_active_file 443214
      nr_inactive_file 149793
      nr_active_file 12021

Fix this with a very simple round robin allocator.  Each zone is allowed a
batch of allocations that is proportional to the zone's size, after which
it is treated as full.  The batch counters are reset when all zones have
been tried and the allocator enters the slowpath and kicks off kswapd
reclaim.  Allocation and reclaim are now fairly spread out to all
available/allowable zones:

  $ grep active_file /proc/zoneinfo
      nr_inactive_file 0
      nr_active_file 0
      nr_inactive_file 174
      nr_active_file 4865
      nr_inactive_file 53
      nr_active_file 860
  $ cat data data data data &gt;/dev/null
  $ grep active_file /proc/zoneinfo
      nr_inactive_file 0
      nr_active_file 0
      nr_inactive_file 666622
      nr_active_file 4988
      nr_inactive_file 190969
      nr_active_file 937

When zone_reclaim_mode is enabled, allocations will now spread out to all
zones on the local node, not just the first preferred zone (which on a 4G
node might be a tiny Normal zone).
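
A minimal sketch of the batching idea in the fast path (illustrative
pseudo-C with made-up helper names, not the patch itself; the real code
tracks the batch as a per-zone counter and resets it on entry to the
slowpath):

  /* Skip zones whose fair-allocation batch is used up; a successful
   * allocation consumes 1 &lt;&lt; order pages of the batch. */
  static struct page *fair_alloc(struct zonelist *zonelist, int order,
                                 int alloc_flags, int high_zoneidx)
  {
          struct page *page = NULL;
          struct zoneref *z;
          struct zone *zone;

          for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) {
                  if ((alloc_flags &amp; ALLOC_FAIR) &amp;&amp;
                      zone_alloc_batch(zone) &lt;= 0)
                          continue;
                  page = try_alloc_from(zone, order);
                  if (page) {
                          consume_alloc_batch(zone, 1 &lt;&lt; order);
                          break;
                  }
          }

          if (!page) {
                  /* All allowed zones' batches are depleted: reset them
                   * in proportion to zone size and kick kswapd before
                   * the caller retries via the slowpath. */
                  for_each_zone_zonelist(zone, z, zonelist, high_zoneidx)
                          reset_alloc_batch(zone);
                  wake_kswapds(zonelist, order);
          }
          return page;
  }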

Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: Mel Gorman &lt;mgorman@suse.de&gt;
Reviewed-by: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Paul Bolle &lt;paul.bollee@gmail.com&gt;
Cc: Zlatko Calusic &lt;zcalusic@bitsync.net&gt;
Tested-by: Kevin Hilman &lt;khilman@linaro.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
