<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/mm/vmalloc.c, branch v3.0</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v3.0</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v3.0'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2011-05-27T02:01:15Z</updated>
<entry>
<title>Merge branch 'upstream/tidy-xen-mmu-2.6.39' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen</title>
<updated>2011-05-27T02:01:15Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2011-05-27T02:01:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=dc7acbb2518f250050179c8581a972df3b6a24f1'/>
<id>urn:sha1:dc7acbb2518f250050179c8581a972df3b6a24f1</id>
<content type='text'>
* 'upstream/tidy-xen-mmu-2.6.39' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen:
  xen: fix compile without CONFIG_XEN_DEBUG_FS
  Use arbitrary_virt_to_machine() to deal with ioremapped pud updates.
  Use arbitrary_virt_to_machine() to deal with ioremapped pmd updates.
  xen/mmu: remove all ad-hoc stats stuff
  xen: use normal virt_to_machine for ptes
  xen: make a pile of mmu pvop functions static
  vmalloc: remove vmalloc_sync_all() from alloc_vm_area()
  xen: condense everything onto xen_set_pte
  xen: use mmu_update for xen_set_pte_at()
  xen: drop all the special iomap pte paths.
</content>
</entry>
<entry>
<title>mm: print vmalloc() state after allocation failures</title>
<updated>2011-05-25T15:39:22Z</updated>
<author>
<name>Dave Hansen</name>
<email>dave@linux.vnet.ibm.com</email>
</author>
<published>2011-05-25T00:12:18Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=22943ab116af1ead4dc112ec408a93cf1365b34a'/>
<id>urn:sha1:22943ab116af1ead4dc112ec408a93cf1365b34a</id>
<content type='text'>
I was tracking down a page allocation failure that ended up in vmalloc().
Since vmalloc() uses 0-order pages, if somebody asks for an insane amount
of memory, we'll still get a warning with "order:0" in it.  That's not
very useful.

During recovery, vmalloc() also nicely frees all of the memory that it got
up to the point of the failure.  That is wonderful, but it also quickly
hides any issues.  We have a much different sitation if vmalloc()
repeatedly fails 10GB in to:

	vmalloc(100 * 1&lt;&lt;30);

versus repeatedly failing 4096 bytes in to a:

	vmalloc(8192);

This patch will print out messages that look like this:

[   68.123503] vmalloc: allocation failure, allocated 6680576 of 13426688 bytes
[   68.124218] bash: page allocation failure: order:0, mode:0xd2
[   68.124811] Pid: 3770, comm: bash Not tainted 2.6.39-rc3-00082-g85f2e68-dirty #333
[   68.125579] Call Trace:
[   68.125853]  [&lt;ffffffff810f6da6&gt;] warn_alloc_failed+0x146/0x170
[   68.126464]  [&lt;ffffffff8107e05c&gt;] ? printk+0x6c/0x70
[   68.126791]  [&lt;ffffffff8112b5d4&gt;] ? alloc_pages_current+0x94/0xe0
[   68.127661]  [&lt;ffffffff8111ed37&gt;] __vmalloc_node_range+0x237/0x290
...

The 'order' variable is added for clarity when calling warn_alloc_failed()
to avoid having an unexplained '0' as an argument.

The 'tmp_mask' is because adding an open-coded '| __GFP_NOWARN' would take
us over 80 columns for the alloc_pages_node() call.  If we are going to
add a line, it might as well be one that makes the sucker easier to read.

As a side issue, I also noticed that ctl_ioctl() does vmalloc() based
solely on an unverified value passed in from userspace.  Granted, it's
under CAP_SYS_ADMIN, but it still frightens me a bit.

Signed-off-by: Dave Hansen &lt;dave@linux.vnet.ibm.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Michal Nazarewicz &lt;mina86@mina86.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/vmalloc: remove guard page from between vmap blocks</title>
<updated>2011-05-25T15:39:11Z</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2011-05-25T00:11:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=248ac0e1943ad1796393d281b096184719eb3f97'/>
<id>urn:sha1:248ac0e1943ad1796393d281b096184719eb3f97</id>
<content type='text'>
The vmap allocator is used to, among other things, allocate per-cpu vmap
blocks, where each vmap block is naturally aligned to its own size.
Obviously, leaving a guard page after each vmap area forbids packing vmap
blocks efficiently and can make the kernel run out of possible vmap blocks
long before overall vmap space is exhausted.

The new interface to map a user-supplied page array into linear vmalloc
space (vm_map_ram) insists on allocating from a vmap block (instead of
falling back to a custom area) when the area size is below a certain
threshold.  With heavy users of this interface (e.g.  XFS) and limited
vmalloc space on 32-bit, vmap block exhaustion is a real problem.

Remove the guard page from the core vmap allocator.  vmalloc and the old
vmap interface enforce a guard page on their own at a higher level.

Note that without this patch, we had accidental guard pages after those
vm_map_ram areas that happened to be at the end of a vmap block, but not
between every area.  This patch removes this accidental guard page only.

If we want guard pages after every vm_map_ram area, this should be done
separately.  And just like with vmalloc and the old interface on a
different level, not in the core allocator.

Mel pointed out: "If necessary, the guard page could be reintroduced as a
debugging-only option (CONFIG_DEBUG_PAGEALLOC?).  Otherwise it seems
reasonable."

Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Nick Piggin &lt;npiggin@kernel.dk&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Acked-by: Mel Gorman &lt;mel@csn.ul.ie&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>vmalloc: remove vmalloc_sync_all() from alloc_vm_area()</title>
<updated>2011-05-20T21:14:32Z</updated>
<author>
<name>Jeremy Fitzhardinge</name>
<email>jeremy.fitzhardinge@citrix.com</email>
</author>
<published>2010-12-01T23:45:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ef691947d8a3d479e67652312783aedcf629320a'/>
<id>urn:sha1:ef691947d8a3d479e67652312783aedcf629320a</id>
<content type='text'>
There's no need for it: it will get faulted into the current pagetable
as needed.

Signed-off-by: Jeremy Fitzhardinge &lt;jeremy.fitzhardinge@citrix.com&gt;
</content>
</entry>
<entry>
<title>vmalloc: remove confusing comment on vwrite()</title>
<updated>2011-03-23T00:44:09Z</updated>
<author>
<name>Namhyung Kim</name>
<email>namhyung@gmail.com</email>
</author>
<published>2011-03-22T23:33:41Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a42931bf9c02fbf3628a27a2a5c55d2b83e4ff20'/>
<id>urn:sha1:a42931bf9c02fbf3628a27a2a5c55d2b83e4ff20</id>
<content type='text'>
KM_USER1 is never used for vwrite() path so the caller doesn't need to
guarantee it is not used.  Only the caller should guarantee is KM_USER0
and it is commented already.

Signed-off-by: Namhyung Kim &lt;namhyung@gmail.com&gt;
Acked-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: vmap area cache</title>
<updated>2011-03-23T00:44:00Z</updated>
<author>
<name>Nick Piggin</name>
<email>npiggin@suse.de</email>
</author>
<published>2011-03-22T23:30:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=89699605fe7cfd8611900346f61cb6cbf179b10a'/>
<id>urn:sha1:89699605fe7cfd8611900346f61cb6cbf179b10a</id>
<content type='text'>
Provide a free area cache for the vmalloc virtual address allocator, based
on the algorithm used by the user virtual memory allocator.

This reduces the number of rbtree operations and linear traversals over
the vmap extents in order to find a free area, by starting off at the last
point that a free area was found.

The free area cache is reset if areas are freed behind it, or if we are
searching for a smaller area or alignment than last time.  So allocation
patterns are not changed (verified by corner-case and random test cases in
userspace testing).

This solves a regression caused by lazy vunmap TLB purging introduced in
db64fe02 (mm: rewrite vmap layer).  That patch will leave extents in the
vmap allocator after they are vunmapped, and until a significant number
accumulate that can be flushed in a single batch.  So in a workload that
vmalloc/vfree frequently, a chain of extents will build up from
VMALLOC_START address, which have to be iterated over each time (giving an
O(n) type of behaviour).

After this patch, the search will start from where it left off, giving
closer to an amortized O(1).

This is verified to solve regressions reported Steven in GFS2, and Avi in
KVM.

Hugh's update:

: I tried out the recent mmotm, and on one machine was fortunate to hit
: the BUG_ON(first-&gt;va_start &lt; addr) which seems to have been stalling
: your vmap area cache patch ever since May.

: I can get you addresses etc, I did dump a few out; but once I stared
: at them, it was easier just to look at the code: and I cannot see how
: you would be so sure that first-&gt;va_start &lt; addr, once you've done
: that addr = ALIGN(max(...), align) above, if align is over 0x1000
: (align was 0x8000 or 0x4000 in the cases I hit: ioremaps like Steve).

: I originally got around it by just changing the
: 		if (first-&gt;va_start &lt; addr) {
: to
: 		while (first-&gt;va_start &lt; addr) {
: without thinking about it any further; but that seemed unsatisfactory,
: why would we want to loop here when we've got another very similar
: loop just below it?

: I am never going to admit how long I've spent trying to grasp your
: "while (n)" rbtree loop just above this, the one with the peculiar
: 		if (!first &amp;&amp; tmp-&gt;va_start &lt; addr + size)
: in.  That's unfamiliar to me, I'm guessing it's designed to save a
: subsequent rb_next() in a few circumstances (at risk of then setting
: a wrong cached_hole_size?); but they did appear few to me, and I didn't
: feel I could sign off something with that in when I don't grasp it,
: and it seems responsible for extra code and mistaken BUG_ON below it.

: I've reverted to the familiar rbtree loop that find_vma() does (but
: with va_end &gt;= addr as you had, to respect the additional guard page):
: and then (given that cached_hole_size starts out 0) I don't see the
: need for any complications below it.  If you do want to keep that loop
: as you had it, please add a comment to explain what it's trying to do,
: and where addr is relative to first when you emerge from it.

: Aren't your tests "size &lt;= cached_hole_size" and
: "addr + size &gt; first-&gt;va_start" forgetting the guard page we want
: before the next area?  I've changed those.

: I have not changed your many "addr + size - 1 &lt; addr" overflow tests,
: but have since come to wonder, shouldn't they be "addr + size &lt; addr"
: tests - won't the vend checks go wrong if addr + size is 0?

: I have added a few comments - Wolfgang Wander's 2.6.13 description of
: 1363c3cd8603a913a27e2995dccbd70d5312d8e6 Avoiding mmap fragmentation
: helped me a lot, perhaps a pointer to that would be good too.  And I found
: it easier to understand when I renamed cached_start slightly and moved the
: overflow label down.

: This patch would go after your mm-vmap-area-cache.patch in mmotm.
: Trivially, nobody is going to get that BUG_ON with this patch, and it
: appears to work fine on my machines; but I have not given it anything like
: the testing you did on your original, and may have broken all the
: performance you were aiming for.  Please take a look and test it out
: integrate with yours if you're satisfied - thanks.

[akpm@linux-foundation.org: add locking comment]
Signed-off-by: Nick Piggin &lt;npiggin@suse.de&gt;
Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Reviewed-by: Minchan Kim &lt;minchan.kim@gmail.com&gt;
Reported-and-tested-by: Steven Whitehouse &lt;swhiteho@redhat.com&gt;
Reported-and-tested-by: Avi Kivity &lt;avi@redhat.com&gt;
Tested-by: "Barry J. Marson" &lt;bmarson@redhat.com&gt;
Cc: Prarit Bhargava &lt;prarit@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6</title>
<updated>2011-01-14T04:15:35Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2011-01-14T04:15:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=52cfd503ad7176d23a5dd7af3981744feb60622f'/>
<id>urn:sha1:52cfd503ad7176d23a5dd7af3981744feb60622f</id>
<content type='text'>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (59 commits)
  ACPI / PM: Fix build problems for !CONFIG_ACPI related to NVS rework
  ACPI: fix resource check message
  ACPI / Battery: Update information on info notification and resume
  ACPI: Drop device flag wake_capable
  ACPI: Always check if _PRW is present before trying to evaluate it
  ACPI / PM: Check status of power resources under mutexes
  ACPI / PM: Rename acpi_power_off_device()
  ACPI / PM: Drop acpi_power_nocheck
  ACPI / PM: Drop acpi_bus_get_power()
  Platform / x86: Make fujitsu_laptop use acpi_bus_update_power()
  ACPI / Fan: Rework the handling of power resources
  ACPI / PM: Register power resource devices as soon as they are needed
  ACPI / PM: Register acpi_power_driver early
  ACPI / PM: Add function for updating device power state consistently
  ACPI / PM: Add function for device power state initialization
  ACPI / PM: Introduce __acpi_bus_get_power()
  ACPI / PM: Introduce function for refcounting device power resources
  ACPI / PM: Add functions for manipulating lists of power resources
  ACPI / PM: Prevent acpi_power_get_inferred_state() from making changes
  ACPICA: Update version to 20101209
  ...
</content>
</entry>
<entry>
<title>vmalloc: remove redundant unlikely()</title>
<updated>2011-01-14T01:32:36Z</updated>
<author>
<name>Tobias Klauser</name>
<email>tklauser@distanz.ch</email>
</author>
<published>2011-01-13T23:46:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ddf9c6d472825ceda66b3adff0f6437dbcd37f71'/>
<id>urn:sha1:ddf9c6d472825ceda66b3adff0f6437dbcd37f71</id>
<content type='text'>
IS_ERR() already implies unlikely(), so it can be omitted here.

Signed-off-by: Tobias Klauser &lt;tklauser@distanz.ch&gt;
Reviewed-by: Minchan Kim &lt;minchan.kim@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: unify module_alloc code for vmalloc</title>
<updated>2011-01-14T01:32:34Z</updated>
<author>
<name>David Rientjes</name>
<email>rientjes@google.com</email>
</author>
<published>2011-01-13T23:46:02Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d0a21265dfb5fa8ae54e90d0fb6d1c215b10a28a'/>
<id>urn:sha1:d0a21265dfb5fa8ae54e90d0fb6d1c215b10a28a</id>
<content type='text'>
Four architectures (arm, mips, sparc, x86) use __vmalloc_area() for
module_init().  Much of the code is duplicated and can be generalized in a
globally accessible function, __vmalloc_node_range().

__vmalloc_node() now calls into __vmalloc_node_range() with a range of
[VMALLOC_START, VMALLOC_END) for functionally equivalent behavior.

Each architecture may then use __vmalloc_node_range() directly to remove
the duplication of code.

Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Russell King &lt;linux@arm.linux.org.uk&gt;
Cc: Ralf Baechle &lt;ralf@linux-mips.org&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: remove gfp mask from pcpu_get_vm_areas</title>
<updated>2011-01-14T01:32:34Z</updated>
<author>
<name>David Rientjes</name>
<email>rientjes@google.com</email>
</author>
<published>2011-01-13T23:46:01Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ec3f64fc9c196a304c4b7db3e1ff56d640628509'/>
<id>urn:sha1:ec3f64fc9c196a304c4b7db3e1ff56d640628509</id>
<content type='text'>
pcpu_get_vm_areas() only uses GFP_KERNEL allocations, so remove the gfp_t
formal and use the mask internally.

Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
