<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/mm/memory_hotplug.c, branch v3.13</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v3.13</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v3.13'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2013-11-13T03:09:09Z</updated>
<entry>
<title>mem-hotplug: introduce movable_node boot option</title>
<updated>2013-11-13T03:09:09Z</updated>
<author>
<name>Tang Chen</name>
<email>tangchen@cn.fujitsu.com</email>
</author>
<published>2013-11-12T23:08:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c5320926e370b4cfb8f10c2169e26f960079cf67'/>
<id>urn:sha1:c5320926e370b4cfb8f10c2169e26f960079cf67</id>
<content type='text'>
The hot-Pluggable field in SRAT specifies which memory is hotpluggable.
As we mentioned before, if hotpluggable memory is used by the kernel, it
cannot be hot-removed.  So memory hotplug users may want to set all
hotpluggable memory in ZONE_MOVABLE so that the kernel won't use it.

Memory hotplug users may also set a node as movable node, which has
ZONE_MOVABLE only, so that the whole node can be hot-removed.

But the kernel cannot use memory in ZONE_MOVABLE.  By doing this, the
kernel cannot use memory in movable nodes.  This will cause NUMA
performance down.  And other users may be unhappy.

So we need a way to allow users to enable and disable this functionality.
In this patch, we introduce movable_node boot option to allow users to
choose to not to consume hotpluggable memory at early boot time and later
we can set it as ZONE_MOVABLE.

To achieve this, the movable_node boot option will control the memblock
allocation direction.  That said, after memblock is ready, before SRAT is
parsed, we should allocate memory near the kernel image as we explained in
the previous patches.  So if movable_node boot option is set, the kernel
does the following:

1. After memblock is ready, make memblock allocate memory bottom up.
2. After SRAT is parsed, make memblock behave as default, allocate memory
   top down.

Users can specify "movable_node" in kernel commandline to enable this
functionality.  For those who don't use memory hotplug or who don't want
to lose their NUMA performance, just don't specify anything.  The kernel
will work as before.

Signed-off-by: Tang Chen &lt;tangchen@cn.fujitsu.com&gt;
Signed-off-by: Zhang Yanfei &lt;zhangyanfei@cn.fujitsu.com&gt;
Suggested-by: Kamezawa Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Suggested-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Toshi Kani &lt;toshi.kani@hp.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Wanpeng Li &lt;liwanp@linux.vnet.ibm.com&gt;
Cc: Thomas Renninger &lt;trenn@suse.de&gt;
Cc: Yinghai Lu &lt;yinghai@kernel.org&gt;
Cc: Jiang Liu &lt;jiang.liu@huawei.com&gt;
Cc: Wen Congyang &lt;wency@cn.fujitsu.com&gt;
Cc: Lai Jiangshan &lt;laijs@cn.fujitsu.com&gt;
Cc: Yasuaki Ishimatsu &lt;isimatu.yasuaki@jp.fujitsu.com&gt;
Cc: Taku Izumi &lt;izumi.taku@jp.fujitsu.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Michal Nazarewicz &lt;mina86@mina86.com&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/sparsemem: use PAGES_PER_SECTION to remove redundant nr_pages parameter</title>
<updated>2013-11-13T03:09:06Z</updated>
<author>
<name>Zhang Yanfei</name>
<email>zhangyanfei@cn.fujitsu.com</email>
</author>
<published>2013-11-12T23:07:42Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=85b35feaecd4d2284505b22708795bc1f03fc897'/>
<id>urn:sha1:85b35feaecd4d2284505b22708795bc1f03fc897</id>
<content type='text'>
For below functions,

- sparse_add_one_section()
- kmalloc_section_memmap()
- __kmalloc_section_memmap()
- __kfree_section_memmap()

they are always invoked to operate on one memory section, so it is
redundant to always pass a nr_pages parameter, which is the page numbers
in one section.  So we can directly use predefined macro PAGES_PER_SECTION
instead of passing the parameter.

Signed-off-by: Zhang Yanfei &lt;zhangyanfei@cn.fujitsu.com&gt;
Cc: Wen Congyang &lt;wency@cn.fujitsu.com&gt;
Cc: Tang Chen &lt;tangchen@cn.fujitsu.com&gt;
Cc: Toshi Kani &lt;toshi.kani@hp.com&gt;
Cc: Yasuaki Ishimatsu &lt;isimatu.yasuaki@jp.fujitsu.com&gt;
Cc: Yinghai Lu &lt;yinghai@kernel.org&gt;
Cc: Yasunori Goto &lt;y-goto@jp.fujitsu.com&gt;
Cc: Andy Whitcroft &lt;apw@shadowen.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>cpu/mem hotplug: add try_online_node() for cpu_up()</title>
<updated>2013-11-13T03:09:04Z</updated>
<author>
<name>Toshi Kani</name>
<email>toshi.kani@hp.com</email>
</author>
<published>2013-11-12T23:07:25Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=01b0f19707c51ef247404e6af1d4a97a11ba34f7'/>
<id>urn:sha1:01b0f19707c51ef247404e6af1d4a97a11ba34f7</id>
<content type='text'>
cpu_up() has #ifdef CONFIG_MEMORY_HOTPLUG code blocks, which call
mem_online_node() to put its node online if offlined and then call
build_all_zonelists() to initialize the zone list.

These steps are specific to memory hotplug, and should be managed in
mm/memory_hotplug.c.  lock_memory_hotplug() should also be held for the
whole steps.

For this reason, this patch replaces mem_online_node() with
try_online_node(), which performs the whole steps with
lock_memory_hotplug() held.  try_online_node() is named after
try_offline_node() as they have similar purpose.

There is no functional change in this patch.

Signed-off-by: Toshi Kani &lt;toshi.kani@hp.com&gt;
Reviewed-by: Yasuaki Ishimatsu &lt;isimatu.yasuaki@jp.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/memory_hotplug.c: use pfn_to_nid() instead of page_to_nid(pfn_to_page())</title>
<updated>2013-11-13T03:09:04Z</updated>
<author>
<name>Xishi Qiu</name>
<email>qiuxishi@huawei.com</email>
</author>
<published>2013-11-12T23:07:21Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9c2606b77d6bffb422928bca66c8dc84d85089be'/>
<id>urn:sha1:9c2606b77d6bffb422928bca66c8dc84d85089be</id>
<content type='text'>
Use "pfn_to_nid(pfn)" instead of "page_to_nid(pfn_to_page(pfn))".

Signed-off-by: Xishi Qiu &lt;qiuxishi@huawei.com&gt;
Acked-by: Yasuaki Ishimatsu &lt;isimatu.yasuaki@jp.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/memory_hotplug.c: rename the function is_memblock_offlined_cb()</title>
<updated>2013-11-13T03:09:04Z</updated>
<author>
<name>Xishi Qiu</name>
<email>qiuxishi@huawei.com</email>
</author>
<published>2013-11-12T23:07:20Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d6de9d5349db61e134ab7fb6b2436a4c7938714c'/>
<id>urn:sha1:d6de9d5349db61e134ab7fb6b2436a4c7938714c</id>
<content type='text'>
A is_memblock_offlined() return or 1 means memory block is offlined, but
is_memblock_offlined_cb() returning 1 means memory block is not offlined,
this will confuse somebody, so rename the function.

Signed-off-by: Xishi Qiu &lt;qiuxishi@huawei.com&gt;
Acked-by: Yasuaki Ishimatsu &lt;isimatu.yasuaki@jp.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: use pgdat_end_pfn() to simplify the code in others</title>
<updated>2013-11-13T03:09:03Z</updated>
<author>
<name>Xishi Qiu</name>
<email>qiuxishi@huawei.com</email>
</author>
<published>2013-11-12T23:07:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=83285c72e08c42848808039ef2d3b67a1bb88832'/>
<id>urn:sha1:83285c72e08c42848808039ef2d3b67a1bb88832</id>
<content type='text'>
Use "pgdat_end_pfn()" instead of "pgdat-&gt;node_start_pfn +
pgdat-&gt;node_spanned_pages".  Simplify the code, no functional change.

Signed-off-by: Xishi Qiu &lt;qiuxishi@huawei.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'pm+acpi-fixes-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm</title>
<updated>2013-09-12T18:22:45Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2013-09-12T18:22:45Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=02b9735c12892e04d3e101b06e4c6d64a814f566'/>
<id>urn:sha1:02b9735c12892e04d3e101b06e4c6d64a814f566</id>
<content type='text'>
Pull ACPI and power management fixes from Rafael Wysocki:
 "All of these commits are fixes that have emerged recently and some of
  them fix bugs introduced during this merge window.

  Specifics:

   1) ACPI-based PCI hotplug (ACPIPHP) fixes related to spurious events

      After the recent ACPIPHP changes we've seen some interesting
      breakage on a system that triggers device check notifications
      during boot for non-existing devices.  Although those
      notifications are really spurious, we should be able to deal with
      them nevertheless and that shouldn't introduce too much overhead.
      Four commits to make that work properly.

   2) Memory hotplug and hibernation mutual exclusion rework

      This was maent to be a cleanup, but it happens to fix a classical
      ABBA deadlock between system suspend/hibernation and ACPI memory
      hotplug which is possible if they are started roughly at the same
      time.  Three commits rework memory hotplug so that it doesn't
      acquire pm_mutex and make hibernation use device_hotplug_lock
      which prevents it from racing with memory hotplug.

   3) ACPI Intel LPSS (Low-Power Subsystem) driver crash fix

      The ACPI LPSS driver crashes during boot on Apple Macbook Air with
      Haswell that has slightly unusual BIOS configuration in which one
      of the LPSS device's _CRS method doesn't return all of the
      information expected by the driver.  Fix from Mika Westerberg, for
      stable.

   4) ACPICA fix related to Store-&gt;ArgX operation

      AML interpreter fix for obscure breakage that causes AML to be
      executed incorrectly on some machines (observed in practice).
      From Bob Moore.

   5) ACPI core fix for PCI ACPI device objects lookup

      There still are cases in which there is more than one ACPI device
      object matching a given PCI device and we don't choose the one
      that the BIOS expects us to choose, so this makes the lookup take
      more criteria into account in those cases.

   6) Fix to prevent cpuidle from crashing in some rare cases

      If the result of cpuidle_get_driver() is NULL, which can happen on
      some systems, cpuidle_driver_ref() will crash trying to use that
      pointer and the Daniel Fu's fix prevents that from happening.

   7) cpufreq fixes related to CPU hotplug

      Stephen Boyd reported a number of concurrency problems with
      cpufreq related to CPU hotplug which are addressed by a series of
      fixes from Srivatsa S Bhat and Viresh Kumar.

   8) cpufreq fix for time conversion in time_in_state attribute

      Time conversion carried out by cpufreq when user space attempts to
      read /sys/devices/system/cpu/cpu*/cpufreq/stats/time_in_state
      won't work correcty if cputime_t doesn't map directly to jiffies.
      Fix from Andreas Schwab.

   9) Revert of a troublesome cpufreq commit

      Commit 7c30ed5 (cpufreq: make sure frequency transitions are
      serialized) was intended to address some known concurrency
      problems in cpufreq related to the ordering of transitions, but
      unfortunately it introduced several problems of its own, so I
      decided to revert it now and address the original problems later
      in a more robust way.

  10) Intel Haswell CPU models for intel_pstate from Nell Hardcastle.

  11) cpufreq fixes related to system suspend/resume

      The recent cpufreq changes that made it preserve CPU sysfs
      attributes over suspend/resume cycles introduced a possible NULL
      pointer dereference that caused it to crash during the second
      attempt to suspend.  Three commits from Srivatsa S Bhat fix that
      problem and a couple of related issues.

  12) cpufreq locking fix

      cpufreq_policy_restore() should acquire the lock for reading, but
      it acquires it for writing.  Fix from Lan Tianyu"

* tag 'pm+acpi-fixes-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (25 commits)
  cpufreq: Acquire the lock in cpufreq_policy_restore() for reading
  cpufreq: Prevent problems in update_policy_cpu() if last_cpu == new_cpu
  cpufreq: Restructure if/else block to avoid unintended behavior
  cpufreq: Fix crash in cpufreq-stats during suspend/resume
  intel_pstate: Add Haswell CPU models
  Revert "cpufreq: make sure frequency transitions are serialized"
  cpufreq: Use signed type for 'ret' variable, to store negative error values
  cpufreq: Remove temporary fix for race between CPU hotplug and sysfs-writes
  cpufreq: Synchronize the cpufreq store_*() routines with CPU hotplug
  cpufreq: Invoke __cpufreq_remove_dev_finish() after releasing cpu_hotplug.lock
  cpufreq: Split __cpufreq_remove_dev() into two parts
  cpufreq: Fix wrong time unit conversion
  cpufreq: serialize calls to __cpufreq_governor()
  cpufreq: don't allow governor limits to be changed when it is disabled
  ACPI / bind: Prefer device objects with _STA to those without it
  ACPI / hotplug / PCI: Avoid parent bus rescans on spurious device checks
  ACPI / hotplug / PCI: Use _OST to notify firmware about notify status
  ACPI / hotplug / PCI: Avoid doing too much for spurious notifies
  ACPICA: Fix for a Store-&gt;ArgX when ArgX contains a reference to a field.
  ACPI / hotplug / PCI: Don't trim devices before scanning the namespace
  ...
</content>
</entry>
<entry>
<title>mm: memory-hotplug: enable memory hotplug to handle hugepage</title>
<updated>2013-09-11T22:57:48Z</updated>
<author>
<name>Naoya Horiguchi</name>
<email>n-horiguchi@ah.jp.nec.com</email>
</author>
<published>2013-09-11T21:22:09Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c8721bbbdd36382de51cd6b7a56322e0acca2414'/>
<id>urn:sha1:c8721bbbdd36382de51cd6b7a56322e0acca2414</id>
<content type='text'>
Until now we can't offline memory blocks which contain hugepages because a
hugepage is considered as an unmovable page.  But now with this patch
series, a hugepage has become movable, so by using hugepage migration we
can offline such memory blocks.

What's different from other users of hugepage migration is that we need to
decompose all the hugepages inside the target memory block into free buddy
pages after hugepage migration, because otherwise free hugepages remaining
in the memory block intervene the memory offlining.  For this reason we
introduce new functions dissolve_free_huge_page() and
dissolve_free_huge_pages().

Other than that, what this patch does is straightforwardly to add hugepage
migration code, that is, adding hugepage code to the functions which scan
over pfn and collect hugepages to be migrated, and adding a hugepage
allocation function to alloc_migrate_target().

As for larger hugepages (1GB for x86_64), it's not easy to do hotremove
over them because it's larger than memory block.  So we now simply leave
it to fail as it is.

[yongjun_wei@trendmicro.com.cn: remove duplicated include]
Signed-off-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Acked-by: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Hillf Danton &lt;dhillf@gmail.com&gt;
Cc: Wanpeng Li &lt;liwanp@linux.vnet.ibm.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: "Aneesh Kumar K.V" &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Signed-off-by: Wei Yongjun &lt;yongjun_wei@trendmicro.com.cn&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/hotplug: remove stop_machine() from try_offline_node()</title>
<updated>2013-09-11T22:57:41Z</updated>
<author>
<name>Toshi Kani</name>
<email>toshi.kani@hp.com</email>
</author>
<published>2013-09-11T21:21:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0f1cfe9d0d06fe44c2b310401d2db101968e8c58'/>
<id>urn:sha1:0f1cfe9d0d06fe44c2b310401d2db101968e8c58</id>
<content type='text'>
lock_device_hotplug() serializes hotplug &amp; online/offline operations.  The
lock is held in common sysfs online/offline interfaces and ACPI hotplug
code paths.

And here are the code paths:

- CPU &amp; Mem online/offline via sysfs online
	store_online()-&gt;lock_device_hotplug()

- Mem online via sysfs state:
	store_mem_state()-&gt;lock_device_hotplug()

- ACPI CPU &amp; Mem hot-add:
	acpi_scan_bus_device_check()-&gt;lock_device_hotplug()

- ACPI CPU &amp; Mem hot-delete:
	acpi_scan_hot_remove()-&gt;lock_device_hotplug()

try_offline_node() off-lines a node if all memory sections and cpus are
removed on the node.  It is called from acpi_processor_remove() and
acpi_memory_remove_memory()-&gt;remove_memory() paths, both of which are in
the ACPI hotplug code.

try_offline_node() calls stop_machine() to stop all cpus while checking
all cpu status with the assumption that the caller is not protected from
CPU hotplug or CPU online/offline operations.  However, the caller is
always serialized with lock_device_hotplug().  Also, the code needs to be
properly serialized with a lock, not by stopping all cpus at a random
place with stop_machine().

This patch removes the use of stop_machine() in try_offline_node() and
adds comments to try_offline_node() and remove_memory() that
lock_device_hotplug() is required.

Signed-off-by: Toshi Kani &lt;toshi.kani@hp.com&gt;
Acked-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Tang Chen &lt;tangchen@cn.fujitsu.com&gt;
Cc: Yasuaki Ishimatsu &lt;isimatu.yasuaki@jp.fujitsu.com&gt;
Cc: Wanpeng Li &lt;liwanp@linux.vnet.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/hotplug: verify hotplug memory range</title>
<updated>2013-09-11T22:57:39Z</updated>
<author>
<name>Toshi Kani</name>
<email>toshi.kani@hp.com</email>
</author>
<published>2013-09-11T21:21:49Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=27356f54c8c32609ff45b4ed333bb64fb2eef374'/>
<id>urn:sha1:27356f54c8c32609ff45b4ed333bb64fb2eef374</id>
<content type='text'>
add_memory() and remove_memory() can only handle a memory range aligned
with section.  There are problems when an unaligned range is added and
then deleted as follows:

 - add_memory() with an unaligned range succeeds, but __add_pages()
   called from add_memory() adds a whole section of pages even though
   a given memory range is less than the section size.
 - remove_memory() to the added unaligned range hits BUG_ON() in
   __remove_pages().

This patch changes add_memory() and remove_memory() to check if a given
memory range is aligned with section at the beginning.  As the result,
add_memory() fails with -EINVAL when a given range is unaligned, and does
not add such memory range.  This prevents remove_memory() to be called
with an unaligned range as well.  Note that remove_memory() has to use
BUG_ON() since this function cannot fail.

[akpm@linux-foundation.org: avoid printk warnings]
Signed-off-by: Toshi Kani &lt;toshi.kani@hp.com&gt;
Acked-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Reviewed-by: Tang Chen &lt;tangchen@cn.fujitsu.com&gt;
Reviewed-by: Wanpeng Li &lt;liwanp@linux.vnet.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
