From 55ac590c2fadad785d60dd70c12d62823bc2cd39 Mon Sep 17 00:00:00 2001 From: Tang Chen Date: Tue, 21 Jan 2014 15:49:35 -0800 Subject: memblock, mem_hotplug: make memblock skip hotpluggable regions if needed Linux kernel cannot migrate pages used by the kernel. As a result, hotpluggable memory used by the kernel won't be able to be hot-removed. To solve this problem, the basic idea is to prevent memblock from allocating hotpluggable memory for the kernel at early time, and arrange all hotpluggable memory in ACPI SRAT(System Resource Affinity Table) as ZONE_MOVABLE when initializing zones. In the previous patches, we have marked hotpluggable memory regions with MEMBLOCK_HOTPLUG flag in memblock.memory. In this patch, we make memblock skip these hotpluggable memory regions in the default top-down allocation function if movable_node boot option is specified. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Tang Chen Signed-off-by: Zhang Yanfei Cc: "H. Peter Anvin" Cc: "Rafael J . Wysocki" Cc: Chen Tang Cc: Gong Chen Cc: Ingo Molnar Cc: Jiang Liu Cc: Johannes Weiner Cc: Lai Jiangshan Cc: Larry Woodman Cc: Len Brown Cc: Liu Jiang Cc: Mel Gorman Cc: Michal Nazarewicz Cc: Minchan Kim Cc: Prarit Bhargava Cc: Rik van Riel Cc: Taku Izumi Cc: Tejun Heo Cc: Thomas Gleixner Cc: Thomas Renninger Cc: Toshi Kani Cc: Vasilis Liaskovitis Cc: Wanpeng Li Cc: Wen Congyang Cc: Yasuaki Ishimatsu Cc: Yinghai Lu Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/memory_hotplug.c | 1 + 1 file changed, 1 insertion(+) (limited to 'mm/memory_hotplug.c') diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 489f235502db..01e39afde1cb 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -1446,6 +1446,7 @@ static int __init cmdline_parse_movable_node(char *p) * the kernel away from hotpluggable memory. */ memblock_set_bottom_up(true); + movable_node_enabled = true; #else pr_warn("movable_node option not supported\n"); #endif -- cgit v1.2.3 From 869a84e1ca163b737236dae997db4a6a1e230b9b Mon Sep 17 00:00:00 2001 From: Grygorii Strashko Date: Tue, 21 Jan 2014 15:50:10 -0800 Subject: mm/memblock: remove unnecessary inclusions of bootmem.h Clean-up to remove depedency with bootmem headers. Signed-off-by: Grygorii Strashko Signed-off-by: Santosh Shilimkar Reviewed-by: Tejun Heo Cc: Yinghai Lu Cc: Arnd Bergmann Cc: Greg Kroah-Hartman Cc: "Rafael J. Wysocki" Cc: Christoph Lameter Cc: H. Peter Anvin Cc: Johannes Weiner Cc: KAMEZAWA Hiroyuki Cc: Konrad Rzeszutek Wilk Cc: Michal Hocko Cc: Paul Walmsley Cc: Pavel Machek Cc: Russell King Cc: Tony Lindgren Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/char/mem.c | 1 - mm/memory_hotplug.c | 1 - 2 files changed, 2 deletions(-) (limited to 'mm/memory_hotplug.c') diff --git a/drivers/char/mem.c b/drivers/char/mem.c index f895a8c8a244..92c5937f80c3 100644 --- a/drivers/char/mem.c +++ b/drivers/char/mem.c @@ -22,7 +22,6 @@ #include #include #include -#include #include #include #include diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 01e39afde1cb..af4935ee444f 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -9,7 +9,6 @@ #include #include #include -#include #include #include #include -- cgit v1.2.3 From 9e43aa2b8d1cb3137bd7e60d5fead83d0569de2b Mon Sep 17 00:00:00 2001 From: Santosh Shilimkar Date: Tue, 21 Jan 2014 15:50:43 -0800 Subject: mm/memory_hotplug.c: use memblock apis for early memory allocations Correct ensure_zone_is_initialized() function description according to the introduced memblock APIs for early memory allocations. Signed-off-by: Grygorii Strashko Signed-off-by: Santosh Shilimkar Cc: "Rafael J. Wysocki" Cc: Arnd Bergmann Cc: Christoph Lameter Cc: Greg Kroah-Hartman Cc: H. Peter Anvin Cc: Johannes Weiner Cc: KAMEZAWA Hiroyuki Cc: Konrad Rzeszutek Wilk Cc: Michal Hocko Cc: Paul Walmsley Cc: Pavel Machek Cc: Russell King Cc: Tejun Heo Cc: Tony Lindgren Cc: Yinghai Lu Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/memory_hotplug.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'mm/memory_hotplug.c') diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index af4935ee444f..cc2ab37220b7 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -268,7 +268,7 @@ static void fix_zone_id(struct zone *zone, unsigned long start_pfn, } /* Can fail with -ENOMEM from allocating a wait table with vmalloc() or - * alloc_bootmem_node_nopanic() */ + * alloc_bootmem_node_nopanic()/memblock_virt_alloc_node_nopanic() */ static int __ref ensure_zone_is_initialized(struct zone *zone, unsigned long start_pfn, unsigned long num_pages) { -- cgit v1.2.3 From f0b791a34cb3cffd2bbc3ca4365c9b719fa2c9f3 Mon Sep 17 00:00:00 2001 From: Dave Hansen Date: Thu, 23 Jan 2014 15:52:49 -0800 Subject: mm: print more details for bad_page() bad_page() is cool in that it prints out a bunch of data about the page. But, I can never remember which page flags are good and which are bad, or whether ->index or ->mapping is required to be NULL. This patch allows bad/dump_page() callers to specify a string about why they are dumping the page and adds explanation strings to a number of places. It also adds a 'bad_flags' argument to bad_page(), which it then dumps out separately from the flags which are actually set. This way, the messages will show specifically why the page was bad, *specifically* which flags it is complaining about, if it was a page flag combination which was the problem. [akpm@linux-foundation.org: switch to pr_alert] Signed-off-by: Dave Hansen Reviewed-by: Christoph Lameter Cc: Andi Kleen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/mm.h | 4 ++- mm/balloon_compaction.c | 4 +-- mm/memory.c | 2 +- mm/memory_hotplug.c | 2 +- mm/page_alloc.c | 72 ++++++++++++++++++++++++++++++++++++------------- 5 files changed, 61 insertions(+), 23 deletions(-) (limited to 'mm/memory_hotplug.c') diff --git a/include/linux/mm.h b/include/linux/mm.h index a512dd836931..03bbcb84d96e 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2029,7 +2029,9 @@ extern void shake_page(struct page *p, int access); extern atomic_long_t num_poisoned_pages; extern int soft_offline_page(struct page *page, int flags); -extern void dump_page(struct page *page); +extern void dump_page(struct page *page, char *reason); +extern void dump_page_badflags(struct page *page, char *reason, + unsigned long badflags); #if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLBFS) extern void clear_huge_page(struct page *page, diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c index 07dbc8ec46cf..6e45a5074bf0 100644 --- a/mm/balloon_compaction.c +++ b/mm/balloon_compaction.c @@ -267,7 +267,7 @@ void balloon_page_putback(struct page *page) put_page(page); } else { WARN_ON(1); - dump_page(page); + dump_page(page, "not movable balloon page"); } unlock_page(page); } @@ -287,7 +287,7 @@ int balloon_page_migrate(struct page *newpage, BUG_ON(!trylock_page(newpage)); if (WARN_ON(!__is_movable_balloon_page(page))) { - dump_page(page); + dump_page(page, "not movable balloon page"); unlock_page(newpage); return rc; } diff --git a/mm/memory.c b/mm/memory.c index 86487dfa5e59..71d70c082b98 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -671,7 +671,7 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr, current->comm, (long long)pte_val(pte), (long long)pmd_val(*pmd)); if (page) - dump_page(page); + dump_page(page, "bad pte"); printk(KERN_ALERT "addr:%p vm_flags:%08lx anon_vma:%p mapping:%p index:%lx\n", (void *)addr, vma->vm_flags, vma->anon_vma, mapping, index); diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index cc2ab37220b7..a512a47241a4 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -1309,7 +1309,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn) #ifdef CONFIG_DEBUG_VM printk(KERN_ALERT "removing pfn %lx from LRU failed\n", pfn); - dump_page(page); + dump_page(page, "failed to remove from LRU"); #endif put_page(page); /* Because we don't have big zone->lock. we should diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 533e2147d14f..1939f4446a36 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -295,7 +295,7 @@ static inline int bad_range(struct zone *zone, struct page *page) } #endif -static void bad_page(struct page *page) +static void bad_page(struct page *page, char *reason, unsigned long bad_flags) { static unsigned long resume; static unsigned long nr_shown; @@ -329,7 +329,7 @@ static void bad_page(struct page *page) printk(KERN_ALERT "BUG: Bad page state in process %s pfn:%05lx\n", current->comm, page_to_pfn(page)); - dump_page(page); + dump_page_badflags(page, reason, bad_flags); print_modules(); dump_stack(); @@ -383,7 +383,7 @@ static int destroy_compound_page(struct page *page, unsigned long order) int bad = 0; if (unlikely(compound_order(page) != order)) { - bad_page(page); + bad_page(page, "wrong compound order", 0); bad++; } @@ -392,8 +392,11 @@ static int destroy_compound_page(struct page *page, unsigned long order) for (i = 1; i < nr_pages; i++) { struct page *p = page + i; - if (unlikely(!PageTail(p) || (p->first_page != page))) { - bad_page(page); + if (unlikely(!PageTail(p))) { + bad_page(page, "PageTail not set", 0); + bad++; + } else if (unlikely(p->first_page != page)) { + bad_page(page, "first_page not consistent", 0); bad++; } __ClearPageTail(p); @@ -618,12 +621,23 @@ out: static inline int free_pages_check(struct page *page) { - if (unlikely(page_mapcount(page) | - (page->mapping != NULL) | - (atomic_read(&page->_count) != 0) | - (page->flags & PAGE_FLAGS_CHECK_AT_FREE) | - (mem_cgroup_bad_page_check(page)))) { - bad_page(page); + char *bad_reason = NULL; + unsigned long bad_flags = 0; + + if (unlikely(page_mapcount(page))) + bad_reason = "nonzero mapcount"; + if (unlikely(page->mapping != NULL)) + bad_reason = "non-NULL mapping"; + if (unlikely(atomic_read(&page->_count) != 0)) + bad_reason = "nonzero _count"; + if (unlikely(page->flags & PAGE_FLAGS_CHECK_AT_FREE)) { + bad_reason = "PAGE_FLAGS_CHECK_AT_FREE flag(s) set"; + bad_flags = PAGE_FLAGS_CHECK_AT_FREE; + } + if (unlikely(mem_cgroup_bad_page_check(page))) + bad_reason = "cgroup check failed"; + if (unlikely(bad_reason)) { + bad_page(page, bad_reason, bad_flags); return 1; } page_cpupid_reset_last(page); @@ -843,12 +857,23 @@ static inline void expand(struct zone *zone, struct page *page, */ static inline int check_new_page(struct page *page) { - if (unlikely(page_mapcount(page) | - (page->mapping != NULL) | - (atomic_read(&page->_count) != 0) | - (page->flags & PAGE_FLAGS_CHECK_AT_PREP) | - (mem_cgroup_bad_page_check(page)))) { - bad_page(page); + char *bad_reason = NULL; + unsigned long bad_flags = 0; + + if (unlikely(page_mapcount(page))) + bad_reason = "nonzero mapcount"; + if (unlikely(page->mapping != NULL)) + bad_reason = "non-NULL mapping"; + if (unlikely(atomic_read(&page->_count) != 0)) + bad_reason = "nonzero _count"; + if (unlikely(page->flags & PAGE_FLAGS_CHECK_AT_PREP)) { + bad_reason = "PAGE_FLAGS_CHECK_AT_PREP flag set"; + bad_flags = PAGE_FLAGS_CHECK_AT_PREP; + } + if (unlikely(mem_cgroup_bad_page_check(page))) + bad_reason = "cgroup check failed"; + if (unlikely(bad_reason)) { + bad_page(page, bad_reason, bad_flags); return 1; } return 0; @@ -6494,12 +6519,23 @@ static void dump_page_flags(unsigned long flags) printk(")\n"); } -void dump_page(struct page *page) +void dump_page_badflags(struct page *page, char *reason, unsigned long badflags) { printk(KERN_ALERT "page:%p count:%d mapcount:%d mapping:%p index:%#lx\n", page, atomic_read(&page->_count), page_mapcount(page), page->mapping, page->index); dump_page_flags(page->flags); + if (reason) + pr_alert("page dumped because: %s\n", reason); + if (page->flags & badflags) { + pr_alert("bad because of flags:\n"); + dump_page_flags(page->flags & badflags); + } mem_cgroup_print_bad_page(page); } + +void dump_page(struct page *page, char *reason) +{ + dump_page_badflags(page, reason, 0); +} -- cgit v1.2.3 From ac13c4622bda2a9ff8f57bbbfeff48b2a42d0963 Mon Sep 17 00:00:00 2001 From: Nathan Zimmer Date: Thu, 23 Jan 2014 15:53:26 -0800 Subject: mm/memory_hotplug.c: move register_memory_resource out of the lock_memory_hotplug We don't need to do register_memory_resource() under lock_memory_hotplug() since it has its own lock and doesn't make any callbacks. Also register_memory_resource return NULL on failure so we don't have anything to cleanup at this point. The reason for this rfc is I was doing some experiments with hotplugging of memory on some of our larger systems. While it seems to work, it can be quite slow. With some preliminary digging I found that lock_memory_hotplug is clearly ripe for breakup. It could be broken up per nid or something but it also covers the online_page_callback. The online_page_callback shouldn't be very hard to break out. Also there is the issue of various structures(wmarks come to mind) that are only updated under the lock_memory_hotplug that would need to be dealt with. Cc: Tang Chen Cc: Wen Congyang Cc: Kamezawa Hiroyuki Reviewed-by: Yasuaki Ishimatsu Cc: "Rafael J. Wysocki" Cc: Hedi Cc: Mike Travis Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/memory_hotplug.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) (limited to 'mm/memory_hotplug.c') diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index a512a47241a4..a650db29606f 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -1107,17 +1107,18 @@ int __ref add_memory(int nid, u64 start, u64 size) if (ret) return ret; - lock_memory_hotplug(); - res = register_memory_resource(start, size); ret = -EEXIST; if (!res) - goto out; + return ret; { /* Stupid hack to suppress address-never-null warning */ void *p = NODE_DATA(nid); new_pgdat = !p; } + + lock_memory_hotplug(); + new_node = !node_online(nid); if (new_node) { pgdat = hotadd_new_pgdat(nid, start); -- cgit v1.2.3