linux/mm/vmalloc.c, branch v3.1

mm: sync vmalloc address space page tables in alloc_vm_area()

2011-09-15T01:09:38Z

Xen backend drivers (e.g., blkback and netback) would sometimes fail to map grant pages into the vmalloc address space allocated with alloc_vm_area(). The GNTTABOP_map_grant_ref would fail because Xen could not find the page (in the L2 table) containing the PTEs it needed to update. (XEN) mm.c:3846:d0 Could not find L1 PTE for address fbb42000 netback and blkback were making the hypercall from a kernel thread where task->active_mm != &init_mm and alloc_vm_area() was only updating the page tables for init_mm. The usual method of deferring the update to the page tables of other processes (i.e., after taking a fault) doesn't work as a fault cannot occur during the hypercall. This would work on some systems depending on what else was using vmalloc. Fix this by reverting ef691947d8a3 ("vmalloc: remove vmalloc_sync_all() from alloc_vm_area()") and add a comment to explain why it's needed. Signed-off-by: David Vrabel Cc: Jeremy Fitzhardinge Cc: Konrad Rzeszutek Wilk Cc: Ian Campbell Cc: Keir Fraser Cc: [3.0.x] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

mm: fix wrong vmap address calculations with odd NR_CPUS values

2011-08-14T19:32:52Z

Commit db64fe02258f ("mm: rewrite vmap layer") introduced code that does address calculations under the assumption that VMAP_BLOCK_SIZE is a power of two. However, this might not be true if CONFIG_NR_CPUS is not set to a power of two. Wrong vmap_block index/offset values could lead to memory corruption. However, this has never been observed in practice (or never been diagnosed correctly); what caught this was the BUG_ON in vb_alloc() that checks for inconsistent vmap_block indices. To fix this, ensure that VMAP_BLOCK_SIZE always is a power of two. BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=31572 Reported-by: Pavel Kysilka Reported-by: Matias A. Fonzo Signed-off-by: Clemens Ladisch Signed-off-by: Stefan Richter Cc: Nick Piggin Cc: Jeremy Fitzhardinge Cc: Krzysztof Helt Cc: Andrew Morton Cc: 2.6.28+ Signed-off-by: Linus Torvalds

atomic: use

2011-07-26T23:49:47Z

This allows us to move duplicated code in (atomic_inc_not_zero() for now) to Signed-off-by: Arun Sharma Reviewed-by: Eric Dumazet Cc: Ingo Molnar Cc: David Miller Cc: Eric Dumazet Acked-by: Mike Frysinger Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

vmalloc,rcu: Convert call_rcu(rcu_free_vb) to kfree_rcu()

2011-07-20T21:10:18Z

The rcu callback rcu_free_vb() just calls a kfree(), so we use kfree_rcu() instead of the call_rcu(rcu_free_vb). Signed-off-by: Lai Jiangshan Signed-off-by: Paul E. McKenney Cc: Andrew Morton Cc: Namhyung Kim Cc: David Rientjes Reviewed-by: Josh Triplett

vmalloc,rcu: Convert call_rcu(rcu_free_va) to kfree_rcu()

2011-07-20T21:10:17Z

The rcu callback rcu_free_va() just calls a kfree(), so we use kfree_rcu() instead of the call_rcu(rcu_free_va). Signed-off-by: Lai Jiangshan Signed-off-by: Paul E. McKenney Cc: Andrew Morton Cc: Namhyung Kim Cc: David Rientjes Reviewed-by: Josh Triplett

Merge branch 'upstream/tidy-xen-mmu-2.6.39' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen

2011-05-27T02:01:15Z

* 'upstream/tidy-xen-mmu-2.6.39' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen: xen: fix compile without CONFIG_XEN_DEBUG_FS Use arbitrary_virt_to_machine() to deal with ioremapped pud updates. Use arbitrary_virt_to_machine() to deal with ioremapped pmd updates. xen/mmu: remove all ad-hoc stats stuff xen: use normal virt_to_machine for ptes xen: make a pile of mmu pvop functions static vmalloc: remove vmalloc_sync_all() from alloc_vm_area() xen: condense everything onto xen_set_pte xen: use mmu_update for xen_set_pte_at() xen: drop all the special iomap pte paths.

mm: print vmalloc() state after allocation failures

2011-05-25T15:39:22Z

I was tracking down a page allocation failure that ended up in vmalloc(). Since vmalloc() uses 0-order pages, if somebody asks for an insane amount of memory, we'll still get a warning with "order:0" in it. That's not very useful. During recovery, vmalloc() also nicely frees all of the memory that it got up to the point of the failure. That is wonderful, but it also quickly hides any issues. We have a much different sitation if vmalloc() repeatedly fails 10GB in to: vmalloc(100 * 1<<30); versus repeatedly failing 4096 bytes in to a: vmalloc(8192); This patch will print out messages that look like this: [ 68.123503] vmalloc: allocation failure, allocated 6680576 of 13426688 bytes [ 68.124218] bash: page allocation failure: order:0, mode:0xd2 [ 68.124811] Pid: 3770, comm: bash Not tainted 2.6.39-rc3-00082-g85f2e68-dirty #333 [ 68.125579] Call Trace: [ 68.125853] [] warn_alloc_failed+0x146/0x170 [ 68.126464] [] ? printk+0x6c/0x70 [ 68.126791] [] ? alloc_pages_current+0x94/0xe0 [ 68.127661] [] __vmalloc_node_range+0x237/0x290 ... The 'order' variable is added for clarity when calling warn_alloc_failed() to avoid having an unexplained '0' as an argument. The 'tmp_mask' is because adding an open-coded '| __GFP_NOWARN' would take us over 80 columns for the alloc_pages_node() call. If we are going to add a line, it might as well be one that makes the sucker easier to read. As a side issue, I also noticed that ctl_ioctl() does vmalloc() based solely on an unverified value passed in from userspace. Granted, it's under CAP_SYS_ADMIN, but it still frightens me a bit. Signed-off-by: Dave Hansen Cc: Johannes Weiner Cc: David Rientjes Cc: Michal Nazarewicz Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

mm/vmalloc: remove guard page from between vmap blocks

2011-05-25T15:39:11Z

The vmap allocator is used to, among other things, allocate per-cpu vmap blocks, where each vmap block is naturally aligned to its own size. Obviously, leaving a guard page after each vmap area forbids packing vmap blocks efficiently and can make the kernel run out of possible vmap blocks long before overall vmap space is exhausted. The new interface to map a user-supplied page array into linear vmalloc space (vm_map_ram) insists on allocating from a vmap block (instead of falling back to a custom area) when the area size is below a certain threshold. With heavy users of this interface (e.g. XFS) and limited vmalloc space on 32-bit, vmap block exhaustion is a real problem. Remove the guard page from the core vmap allocator. vmalloc and the old vmap interface enforce a guard page on their own at a higher level. Note that without this patch, we had accidental guard pages after those vm_map_ram areas that happened to be at the end of a vmap block, but not between every area. This patch removes this accidental guard page only. If we want guard pages after every vm_map_ram area, this should be done separately. And just like with vmalloc and the old interface on a different level, not in the core allocator. Mel pointed out: "If necessary, the guard page could be reintroduced as a debugging-only option (CONFIG_DEBUG_PAGEALLOC?). Otherwise it seems reasonable." Signed-off-by: Johannes Weiner Cc: Nick Piggin Cc: Dave Chinner Acked-by: Mel Gorman Cc: Hugh Dickins Cc: Christoph Hellwig Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

vmalloc: remove vmalloc_sync_all() from alloc_vm_area()

2011-05-20T21:14:32Z

There's no need for it: it will get faulted into the current pagetable as needed. Signed-off-by: Jeremy Fitzhardinge

vmalloc: remove confusing comment on vwrite()

2011-03-23T00:44:09Z

KM_USER1 is never used for vwrite() path so the caller doesn't need to guarantee it is not used. Only the caller should guarantee is KM_USER0 and it is commented already. Signed-off-by: Namhyung Kim Acked-by: KAMEZAWA Hiroyuki Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds