<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/mm/vmalloc.c, branch v4.0</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v4.0</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v4.0'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2015-03-13T01:46:08Z</updated>
<entry>
<title>kasan, module, vmalloc: rework shadow allocation for modules</title>
<updated>2015-03-13T01:46:08Z</updated>
<author>
<name>Andrey Ryabinin</name>
<email>a.ryabinin@samsung.com</email>
</author>
<published>2015-03-12T23:26:11Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a5af5aa8b67dfdba36c853b70564fd2dfe73d478'/>
<id>urn:sha1:a5af5aa8b67dfdba36c853b70564fd2dfe73d478</id>
<content type='text'>
The current approach to handling shadow memory for modules is broken.

Shadow memory can be freed only after the memory it shadows is no longer
used.  vfree() called from interrupt context may use the memory it is
freeing to store a 'struct llist_node' in it:

    void vfree(const void *addr)
    {
    ...
        if (unlikely(in_interrupt())) {
            struct vfree_deferred *p = this_cpu_ptr(&amp;vfree_deferred);
            if (llist_add((struct llist_node *)addr, &amp;p-&gt;list))
                    schedule_work(&amp;p-&gt;wq);

This list node is later used in free_work(), which actually frees the
memory.  Currently module_memfree() called in interrupt context will free
the shadow before freeing the module's memory, which can provoke a kernel
crash.

So shadow memory should be freed after the module's memory.  However, such
a deallocation order could race with kasan_module_alloc() in module_alloc().

Free the shadow right before releasing the vm area.  At this point the
vfree()'d memory is no longer used and not yet available for other
allocations.  A new VM_KASAN flag indicates that a vm area has dynamically
allocated shadow memory, so KASan frees the shadow only if it was
previously allocated.
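
A minimal sketch of the resulting teardown order (illustrative only; the
helper name and exact placement follow the patch only approximately):

    static void __vunmap(const void *addr, int deallocate_pages)
    {
        struct vm_struct *area = find_vm_area(addr);
        ...
        /* free the shadow while the vm area still owns the range */
        if (area-&gt;flags &amp; VM_KASAN)
                kasan_free_shadow(area);

        remove_vm_area(addr);   /* only now is the range reusable */
        ...
    }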

Signed-off-by: Andrey Ryabinin &lt;a.ryabinin@samsung.com&gt;
Acked-by: Rusty Russell &lt;rusty@rustcorp.com.au&gt;
Cc: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: vmalloc: pass additional vm_flags to __vmalloc_node_range()</title>
<updated>2015-02-14T05:21:42Z</updated>
<author>
<name>Andrey Ryabinin</name>
<email>a.ryabinin@samsung.com</email>
</author>
<published>2015-02-13T22:40:07Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=cb9e3c292d0115499c660028ad35ac5501d722b5'/>
<id>urn:sha1:cb9e3c292d0115499c660028ad35ac5501d722b5</id>
<content type='text'>
To instrument global variables, KASan will shadow the memory backing
modules.  So on module load we will need to allocate memory for the shadow
and map it at the shadow address that corresponds to the address returned
by module_alloc().

__vmalloc_node_range() could be used for this purpose, except that it puts
a guard hole after the allocated area.  A guard hole in shadow memory is a
problem because at some future point we might need shadow memory at the
address occupied by the guard hole, so we could fail to allocate shadow
for module_alloc().

Now that we have the VM_NO_GUARD flag disabling the guard page, we need a
way to pass it into __vmalloc_node_range().  Add a new 'vm_flags' parameter
to __vmalloc_node_range().
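
The resulting prototype looks roughly like this (a sketch of the new
signature, not the literal header change):

    void *__vmalloc_node_range(unsigned long size, unsigned long align,
                               unsigned long start, unsigned long end,
                               gfp_t gfp_mask, pgprot_t prot,
                               unsigned long vm_flags, int node,
                               const void *caller);

A caller such as module_alloc() can then pass 0, or VM_NO_GUARD where the
guard hole must be suppressed.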

Signed-off-by: Andrey Ryabinin &lt;a.ryabinin@samsung.com&gt;
Cc: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Cc: Konstantin Serebryany &lt;kcc@google.com&gt;
Cc: Dmitry Chernenkov &lt;dmitryc@google.com&gt;
Signed-off-by: Andrey Konovalov &lt;adech.fo@gmail.com&gt;
Cc: Yuri Gribov &lt;tetra2005@gmail.com&gt;
Cc: Konstantin Khlebnikov &lt;koct9i@gmail.com&gt;
Cc: Sasha Levin &lt;sasha.levin@oracle.com&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: Andi Kleen &lt;andi@firstfloor.org&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Pekka Enberg &lt;penberg@kernel.org&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: vmalloc: add flag preventing guard hole allocation</title>
<updated>2015-02-14T05:21:42Z</updated>
<author>
<name>Andrey Ryabinin</name>
<email>a.ryabinin@samsung.com</email>
</author>
<published>2015-02-13T22:40:03Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=71394fe50146202f2c8d92cf50f5ebc761acf254'/>
<id>urn:sha1:71394fe50146202f2c8d92cf50f5ebc761acf254</id>
<content type='text'>
To instrument global variables, KASan will shadow the memory backing
modules.  So on module load we will need to allocate memory for the shadow
and map it at the shadow address that corresponds to the address returned
by module_alloc().

__vmalloc_node_range() could be used for this purpose, except that it puts
a guard hole after the allocated area.  A guard hole in shadow memory is a
problem because at some future point we might need shadow memory at the
address occupied by the guard hole, so we could fail to allocate shadow
for module_alloc().

Add a new vm_struct flag 'VM_NO_GUARD' indicating that the vm area doesn't
have a guard hole.
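
Conceptually, the area size accounting then only subtracts the guard page
when one is present; a sketch (not the literal diff):

    static inline size_t get_vm_area_size(const struct vm_struct *area)
    {
        if (!(area-&gt;flags &amp; VM_NO_GUARD))
                /* return the actual size without the guard page */
                return area-&gt;size - PAGE_SIZE;
        return area-&gt;size;
    }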

Signed-off-by: Andrey Ryabinin &lt;a.ryabinin@samsung.com&gt;
Cc: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Cc: Konstantin Serebryany &lt;kcc@google.com&gt;
Cc: Dmitry Chernenkov &lt;dmitryc@google.com&gt;
Signed-off-by: Andrey Konovalov &lt;adech.fo@gmail.com&gt;
Cc: Yuri Gribov &lt;tetra2005@gmail.com&gt;
Cc: Konstantin Khlebnikov &lt;koct9i@gmail.com&gt;
Cc: Sasha Levin &lt;sasha.levin@oracle.com&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: Andi Kleen &lt;andi@firstfloor.org&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Pekka Enberg &lt;penberg@kernel.org&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/vmalloc.c: fix memory ordering bug</title>
<updated>2014-12-13T20:42:49Z</updated>
<author>
<name>Dmitry Vyukov</name>
<email>dvyukov@google.com</email>
</author>
<published>2014-12-13T00:56:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7e5b528b4ce31208ef5c240c14beec4853d8262c'/>
<id>urn:sha1:7e5b528b4ce31208ef5c240c14beec4853d8262c</id>
<content type='text'>
Read memory barriers must follow the read operations.
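
The general pattern this enforces (an illustration of the rule, with use()
standing in for the dependent reads; not the exact hunks of this patch):

    /* writer: publish the fields, then clear the flag */
    vm-&gt;addr = addr;
    vm-&gt;size = size;
    smp_wmb();
    vm-&gt;flags &amp;= ~VM_UNINITIALIZED;

    /* reader: test the flag, then the barrier, then the dependent reads */
    if (vm-&gt;flags &amp; VM_UNINITIALIZED)
        continue;
    smp_rmb();
    use(vm-&gt;addr, vm-&gt;size);    /* the reads the smp_rmb() is meant to order */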

Signed-off-by: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Cc: Eric Dumazet &lt;edumazet@google.com&gt;
Acked-by: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/vmalloc.c: replace printk with pr_warn</title>
<updated>2014-12-11T01:41:05Z</updated>
<author>
<name>Pintu Kumar</name>
<email>pintu.k@samsung.com</email>
</author>
<published>2014-12-10T23:42:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0cbc8533b75b6d8e3416e598e9dbf40d8bcf4e01'/>
<id>urn:sha1:0cbc8533b75b6d8e3416e598e9dbf40d8bcf4e01</id>
<content type='text'>
This patch replaces printk(KERN_WARNING ...) with pr_warn().
It also saves an extra line that was only needed for formatting.
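
For example (the message text here is only illustrative):

    /* before */
    printk(KERN_WARNING
        "vmalloc: allocation failure: %lu bytes\n", size);

    /* after */
    pr_warn("vmalloc: allocation failure: %lu bytes\n", size);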

Signed-off-by: Pintu Kumar &lt;pintu.k@samsung.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/vmalloc.c: use seq_open_private() instead of seq_open()</title>
<updated>2014-10-10T02:25:56Z</updated>
<author>
<name>Rob Jones</name>
<email>rob.jones@codethink.co.uk</email>
</author>
<published>2014-10-09T22:28:01Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=703394c1005caeccaaf64945c1b6d6cc3af0cd1d'/>
<id>urn:sha1:703394c1005caeccaaf64945c1b6d6cc3af0cd1d</id>
<content type='text'>
Using seq_open_private() removes boilerplate code from vmalloc_open().

The resultant code is shorter and easier to follow.
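
A sketch of the simplified open routine (assuming the existing vmalloc_op
seq_operations; not the literal diff):

    static int vmalloc_open(struct inode *inode, struct file *file)
    {
        if (IS_ENABLED(CONFIG_NUMA))
                return seq_open_private(file, &amp;vmalloc_op,
                                        nr_node_ids * sizeof(unsigned int));
        else
                return seq_open(file, &amp;vmalloc_op);
    }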

However, please note that seq_open_private() calls kzalloc() rather than
kmalloc(), which may affect timing due to the memory initialisation
overhead.

Signed-off-by: Rob Jones &lt;rob.jones@codethink.co.uk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/vmalloc.c: clean up map_vm_area third argument</title>
<updated>2014-08-07T01:01:19Z</updated>
<author>
<name>WANG Chao</name>
<email>chaowang@redhat.com</email>
</author>
<published>2014-08-06T23:06:58Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f6f8ed47353597dcb895eb4a15a28af657392e72'/>
<id>urn:sha1:f6f8ed47353597dcb895eb4a15a28af657392e72</id>
<content type='text'>
Currently map_vm_area() takes (struct page *** pages) as its third
argument, and after mapping it advances (*pages) to point to (*pages +
nr_mapped_pages).

This increment is useless to its callers these days.  The callers don't
care about it, and actually work around it by passing a copy of the
pointer to map_vm_area().

The callers can always guarantee that all the pages can be mapped into the
vm_area specified by the first argument, and they only care about whether
map_vm_area() fails or not.

This patch cleans up the pointer movement in map_vm_area() and updates
its callers accordingly.
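
In effect the signature changes from a triple pointer to a plain page
array (sketch):

    /* before: *pages is advanced past the pages that were mapped */
    int map_vm_area(struct vm_struct *area, pgprot_t prot, struct page ***pages);

    /* after: callers pass the array and only check the return value */
    int map_vm_area(struct vm_struct *area, pgprot_t prot, struct page **pages);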

Signed-off-by: WANG Chao &lt;chaowang@redhat.com&gt;
Cc: Zhang Yanfei &lt;zhangyanfei@cn.fujitsu.com&gt;
Acked-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Nitin Gupta &lt;ngupta@vflare.org&gt;
Cc: Rusty Russell &lt;rusty@rustcorp.com.au&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm, vmalloc: constify allocation mask</title>
<updated>2014-08-07T01:01:18Z</updated>
<author>
<name>David Rientjes</name>
<email>rientjes@google.com</email>
</author>
<published>2014-08-06T23:06:28Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=930f036b4ff6501b91e09bba4bf94423203dabd9'/>
<id>urn:sha1:930f036b4ff6501b91e09bba4bf94423203dabd9</id>
<content type='text'>
tmp_mask in the __vmalloc_area_node() iteration never changes, so it can
be moved to function scope and marked const.  This causes the movl and orl
to be done only once per call rather than area-&gt;nr_pages times.

nested_gfp can also be marked const.
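
Roughly (identifier names here are illustrative; the point is that the
mask is built once, outside the per-page loop):

    const gfp_t alloc_mask = gfp_mask | __GFP_NOWARN;    /* computed once */

    for (i = 0; i &lt; area-&gt;nr_pages; i++) {
        struct page *page = alloc_page(alloc_mask);
        ...
    }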

Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/vmalloc.c: add a schedule point to vmalloc()</title>
<updated>2014-08-07T01:01:18Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2014-08-06T23:06:25Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=660654f90e7f8f6d8163276d47fc1573a39c7007'/>
<id>urn:sha1:660654f90e7f8f6d8163276d47fc1573a39c7007</id>
<content type='text'>
It is not uncommon on busy servers to get stuck for hundreds of ms in
vmalloc() calls (like file descriptor table expansions).

Add a cond_resched() to __vmalloc_area_node() to be gentle to
other tasks.
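
A sketch of the allocation loop with the schedule point (gated on
__GFP_WAIT, per the note below):

    for (i = 0; i &lt; area-&gt;nr_pages; i++) {
        if (gfp_mask &amp; __GFP_WAIT)
                cond_resched();     /* give other tasks a chance to run */
        ...allocate and install page i...
    }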

[akpm@linux-foundation.org: only do it for __GFP_WAIT, per David]
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Acked-by: David Rientjes &lt;rientjes@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>vmalloc: use rcu list iterator to reduce vmap_area_lock contention</title>
<updated>2014-08-07T01:01:15Z</updated>
<author>
<name>Joonsoo Kim</name>
<email>iamjoonsoo.kim@lge.com</email>
</author>
<published>2014-08-06T23:05:06Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=474750aba88817c53f39424e5567b8e4acc4b39b'/>
<id>urn:sha1:474750aba88817c53f39424e5567b8e4acc4b39b</id>
<content type='text'>
Richard Yao reported a month ago that his system has trouble with
vmap_area_lock contention during performance analysis via /proc/meminfo.
Andrew asked why his analysis reads /proc/meminfo so heavily, but he
didn't answer.

  https://lkml.org/lkml/2014/4/10/416

Although I'm not sure whether this is the right usage, there is a solution
that reduces vmap_area_lock contention with no side effects: just use the
rcu list iterator in get_vmalloc_info().
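
That is, get_vmalloc_info() walks vmap_area_list under rcu_read_lock()
instead of taking vmap_area_lock; roughly (fields abridged):

    rcu_read_lock();
    list_for_each_entry_rcu(va, &amp;vmap_area_list, list) {
        ...accumulate vmi-&gt;used and vmi-&gt;largest_chunk from va...
    }
    rcu_read_unlock();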

RCU can be used in this function because the whole RCU protocol is already
respected by the writers, since Nick Piggin's commit db64fe02258f1 ("mm:
rewrite vmap layer") back in linux-2.6.28.

Specifically:
   insertions use list_add_rcu(),
   deletions use list_del_rcu() and kfree_rcu().

Note the rb tree is not used from the rcu reader (it would not be safe);
only vmap_area_list has full RCU protection.

Note that __purge_vmap_area_lazy() already uses this rcu protection.

        rcu_read_lock();
        list_for_each_entry_rcu(va, &amp;vmap_area_list, list) {
                if (va-&gt;flags &amp; VM_LAZY_FREE) {
                        if (va-&gt;va_start &lt; *start)
                                *start = va-&gt;va_start;
                        if (va-&gt;va_end &gt; *end)
                                *end = va-&gt;va_end;
                        nr += (va-&gt;va_end - va-&gt;va_start) &gt;&gt; PAGE_SHIFT;
                        list_add_tail(&amp;va-&gt;purge_list, &amp;valist);
                        va-&gt;flags |= VM_LAZY_FREEING;
                        va-&gt;flags &amp;= ~VM_LAZY_FREE;
                }
        }
        rcu_read_unlock();

Peter:

: While rcu list traversal over the vmap_area_list is safe, this may
: arrive at different results than the spinlocked version. The rcu list
: traversal version will not be a 'snapshot' of a single, valid instant
: of the entire vmap_area_list, but rather a potential amalgam of
: different list states.

Joonsoo:

: Yes, you are right, but I don't think that we should be strict here.
: Meminfo is already not a 'snapshot' at specific time.  While we try to get
: certain stats, the other stats can change.  And, although we may arrive at
: different results than the spinlocked version, the difference would not be
: large and would not make serious side-effect.

[edumazet@google.com: add more commit description]
Signed-off-by: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Reported-by: Richard Yao &lt;ryao@gentoo.org&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Peter Hurley &lt;peter@hurleysoftware.com&gt;
Cc: Zhang Yanfei &lt;zhangyanfei.yes@gmail.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Andi Kleen &lt;andi@firstfloor.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
