linux/mm/memory_hotplug.c, branch v3.17

memory-hotplug: add zone_for_memory() for selecting zone for new memory

2014-08-07T01:01:21Z

This series of patches fixes a problem when adding memory in bad manner. For example: for a x86_64 machine booted with "mem=400M" and with 2GiB memory installed, following commands cause problem: # echo 0x40000000 > /sys/devices/system/memory/probe [ 28.613895] init_memory_mapping: [mem 0x40000000-0x47ffffff] # echo 0x48000000 > /sys/devices/system/memory/probe [ 28.693675] init_memory_mapping: [mem 0x48000000-0x4fffffff] # echo online_movable > /sys/devices/system/memory/memory9/state # echo 0x50000000 > /sys/devices/system/memory/probe [ 29.084090] init_memory_mapping: [mem 0x50000000-0x57ffffff] # echo 0x58000000 > /sys/devices/system/memory/probe [ 29.151880] init_memory_mapping: [mem 0x58000000-0x5fffffff] # echo online_movable > /sys/devices/system/memory/memory11/state # echo online> /sys/devices/system/memory/memory8/state # echo online> /sys/devices/system/memory/memory10/state # echo offline> /sys/devices/system/memory/memory9/state [ 30.558819] Offlined Pages 32768 # free total used free shared buffers cached Mem: 780588 18014398509432020 830552 0 0 51180 -/+ buffers/cache: 18014398509380840 881732 Swap: 0 0 0 This is because the above commands probe higher memory after online a section with online_movable, which causes ZONE_HIGHMEM (or ZONE_NORMAL for systems without ZONE_HIGHMEM) overlaps ZONE_MOVABLE. After the second online_movable, the problem can be observed from zoneinfo: # cat /proc/zoneinfo ... Node 0, zone Movable pages free 65491 min 250 low 312 high 375 scanned 0 spanned 18446744073709518848 present 65536 managed 65536 ... This series of patches solve the problem by checking ZONE_MOVABLE when choosing zone for new memory. If new memory is inside or higher than ZONE_MOVABLE, makes it go there instead. After applying this series of patches, following are free and zoneinfo result (after offlining memory9): bash-4.2# free total used free shared buffers cached Mem: 780956 80112 700844 0 0 51180 -/+ buffers/cache: 28932 752024 Swap: 0 0 0 bash-4.2# cat /proc/zoneinfo Node 0, zone DMA pages free 3389 min 14 low 17 high 21 scanned 0 spanned 4095 present 3998 managed 3977 nr_free_pages 3389 ... start_pfn: 1 inactive_ratio: 1 Node 0, zone DMA32 pages free 73724 min 341 low 426 high 511 scanned 0 spanned 98304 present 98304 managed 92958 nr_free_pages 73724 ... start_pfn: 4096 inactive_ratio: 1 Node 0, zone Normal pages free 32630 min 120 low 150 high 180 scanned 0 spanned 32768 present 32768 managed 32768 nr_free_pages 32630 ... start_pfn: 262144 inactive_ratio: 1 Node 0, zone Movable pages free 65476 min 241 low 301 high 361 scanned 0 spanned 98304 present 65536 managed 65536 nr_free_pages 65476 ... start_pfn: 294912 inactive_ratio: 1 This patch (of 7): Introduce zone_for_memory() in arch independent code for arch_add_memory() use. Many arch_add_memory() function simply selects ZONE_HIGHMEM or ZONE_NORMAL and add new memory into it. However, with the existance of ZONE_MOVABLE, the selection method should be carefully considered: if new, higher memory is added after ZONE_MOVABLE is setup, the default zone and ZONE_MOVABLE may overlap each other. should_add_memory_movable() checks the status of ZONE_MOVABLE. If it has already contain memory, compare the address of new memory and movable memory. If new memory is higher than movable, it should be added into ZONE_MOVABLE instead of default zone. Signed-off-by: Wang Nan Cc: Zhang Yanfei Cc: Dave Hansen Cc: Ingo Molnar Cc: Yinghai Lu Cc: "Mel Gorman" Cc: Thomas Gleixner Cc: "H. Peter Anvin" Cc: "Luck, Tony" Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Chris Metcalf Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

mem-hotplug: introduce MMOP_OFFLINE to replace the hard coding -1

2014-08-07T01:01:16Z

In store_mem_state(), we have: ... 334 else if (!strncmp(buf, "offline", min_t(int, count, 7))) 335 online_type = -1; ... 355 case -1: 356 ret = device_offline(&mem->dev); 357 break; ... Here, "offline" is hard coded as -1. This patch does the following renaming: ONLINE_KEEP -> MMOP_ONLINE_KEEP ONLINE_KERNEL -> MMOP_ONLINE_KERNEL ONLINE_MOVABLE -> MMOP_ONLINE_MOVABLE and introduces MMOP_OFFLINE = -1 to avoid hard coding. Signed-off-by: Tang Chen Cc: Hu Tao Cc: Greg Kroah-Hartman Cc: Lai Jiangshan Cc: Yasuaki Ishimatsu Cc: Gu Zheng Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

mm/memory_hotplug.c: add __meminit to grow_zone_span/grow_pgdat_span

2014-08-07T01:01:15Z

grow_zone_span and grow_pgdat_span are only called by __meminit __add_zone Signed-off-by: Fabian Frederick Cc: Toshi Kani Acked-by: David Rientjes Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

mm, migration: add destination page freeing callback

2014-06-04T23:54:06Z

Memory migration uses a callback defined by the caller to determine how to allocate destination pages. When migration fails for a source page, however, it frees the destination page back to the system. This patch adds a memory migration callback defined by the caller to determine how to free destination pages. If a caller, such as memory compaction, builds its own freelist for migration targets, this can reuse already freed memory instead of scanning additional memory. If the caller provides a function to handle freeing of destination pages, it is called when page migration fails. If the caller passes NULL then freeing back to the system will be handled as usual. This patch introduces no functional change. Signed-off-by: David Rientjes Reviewed-by: Naoya Horiguchi Acked-by: Mel Gorman Acked-by: Vlastimil Babka Cc: Greg Thelen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

mm/memory_hotplug.c: use PFN_DOWN()

2014-06-04T23:54:02Z

Replace ((x) >> PAGE_SHIFT) with the pfn macro. Signed-off-by: Fabian Frederick Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

mem-hotplug: implement get/put_online_mems

2014-06-04T23:53:59Z

kmem_cache_{create,destroy,shrink} need to get a stable value of cpu/node online mask, because they init/destroy/access per-cpu/node kmem_cache parts, which can be allocated or destroyed on cpu/mem hotplug. To protect against cpu hotplug, these functions use {get,put}_online_cpus. However, they do nothing to synchronize with memory hotplug - taking the slab_mutex does not eliminate the possibility of race as described in patch 2. What we need there is something like get_online_cpus, but for memory. We already have lock_memory_hotplug, which serves for the purpose, but it's a bit of a hammer right now, because it's backed by a mutex. As a result, it imposes some limitations to locking order, which are not desirable, and can't be used just like get_online_cpus. That's why in patch 1 I substitute it with get/put_online_mems, which work exactly like get/put_online_cpus except they block not cpu, but memory hotplug. [ v1 can be found at https://lkml.org/lkml/2014/4/6/68. I NAK'ed it by myself, because it used an rw semaphore for get/put_online_mems, making them dead lock prune. ] This patch (of 2): {un}lock_memory_hotplug, which is used to synchronize against memory hotplug, is currently backed by a mutex, which makes it a bit of a hammer - threads that only want to get a stable value of online nodes mask won't be able to proceed concurrently. Also, it imposes some strong locking ordering rules on it, which narrows down the set of its usage scenarios. This patch introduces get/put_online_mems, which are the same as get/put_online_cpus, but for memory hotplug, i.e. executing a code inside a get/put_online_mems section will guarantee a stable value of online nodes, present pages, etc. lock_memory_hotplug()/unlock_memory_hotplug() are removed altogether. Signed-off-by: Vladimir Davydov Cc: Christoph Lameter Cc: Pekka Enberg Cc: Tang Chen Cc: Zhang Yanfei Cc: Toshi Kani Cc: Xishi Qiu Cc: Jiang Liu Cc: Rafael J. Wysocki Cc: David Rientjes Cc: Wen Congyang Cc: Yasuaki Ishimatsu Cc: Lai Jiangshan Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

mm/memory_hotplug.c: move register_memory_resource out of the lock_memory_hotplug

2014-01-24T00:36:52Z

We don't need to do register_memory_resource() under lock_memory_hotplug() since it has its own lock and doesn't make any callbacks. Also register_memory_resource return NULL on failure so we don't have anything to cleanup at this point. The reason for this rfc is I was doing some experiments with hotplugging of memory on some of our larger systems. While it seems to work, it can be quite slow. With some preliminary digging I found that lock_memory_hotplug is clearly ripe for breakup. It could be broken up per nid or something but it also covers the online_page_callback. The online_page_callback shouldn't be very hard to break out. Also there is the issue of various structures(wmarks come to mind) that are only updated under the lock_memory_hotplug that would need to be dealt with. Cc: Tang Chen Cc: Wen Congyang Cc: Kamezawa Hiroyuki Reviewed-by: Yasuaki Ishimatsu Cc: "Rafael J. Wysocki" Cc: Hedi Cc: Mike Travis Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

mm: print more details for bad_page()

2014-01-24T00:36:50Z

bad_page() is cool in that it prints out a bunch of data about the page. But, I can never remember which page flags are good and which are bad, or whether ->index or ->mapping is required to be NULL. This patch allows bad/dump_page() callers to specify a string about why they are dumping the page and adds explanation strings to a number of places. It also adds a 'bad_flags' argument to bad_page(), which it then dumps out separately from the flags which are actually set. This way, the messages will show specifically why the page was bad, *specifically* which flags it is complaining about, if it was a page flag combination which was the problem. [akpm@linux-foundation.org: switch to pr_alert] Signed-off-by: Dave Hansen Reviewed-by: Christoph Lameter Cc: Andi Kleen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

mm/memory_hotplug.c: use memblock apis for early memory allocations

2014-01-22T00:19:47Z

Correct ensure_zone_is_initialized() function description according to the introduced memblock APIs for early memory allocations. Signed-off-by: Grygorii Strashko Signed-off-by: Santosh Shilimkar Cc: "Rafael J. Wysocki" Cc: Arnd Bergmann Cc: Christoph Lameter Cc: Greg Kroah-Hartman Cc: H. Peter Anvin Cc: Johannes Weiner Cc: KAMEZAWA Hiroyuki Cc: Konrad Rzeszutek Wilk Cc: Michal Hocko Cc: Paul Walmsley Cc: Pavel Machek Cc: Russell King Cc: Tejun Heo Cc: Tony Lindgren Cc: Yinghai Lu Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

mm/memblock: remove unnecessary inclusions of bootmem.h

2014-01-22T00:19:46Z

Clean-up to remove depedency with bootmem headers. Signed-off-by: Grygorii Strashko Signed-off-by: Santosh Shilimkar Reviewed-by: Tejun Heo Cc: Yinghai Lu Cc: Arnd Bergmann Cc: Greg Kroah-Hartman Cc: "Rafael J. Wysocki" Cc: Christoph Lameter Cc: H. Peter Anvin Cc: Johannes Weiner Cc: KAMEZAWA Hiroyuki Cc: Konrad Rzeszutek Wilk Cc: Michal Hocko Cc: Paul Walmsley Cc: Pavel Machek Cc: Russell King Cc: Tony Lindgren Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds