<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/mm/backing-dev.c, branch v5.3</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.3</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.3'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2019-06-03T13:49:07Z</updated>
<entry>
<title>backing-dev: no need to check return value of debugfs_create functions</title>
<updated>2019-06-03T13:49:07Z</updated>
<author>
<name>Greg Kroah-Hartman</name>
<email>gregkh@linuxfoundation.org</email>
</author>
<published>2019-01-22T15:21:07Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2d146b924ec3c0873f06308d149684dc1105d9a3'/>
<id>urn:sha1:2d146b924ec3c0873f06308d149684dc1105d9a3</id>
<content type='text'>
When calling debugfs functions, there is no need to ever check the
return value.  The function can work or not, but the code logic should
never do something different based on this.

And as the return value does not matter at all, no need to save the
dentry in struct backing_dev_info, so delete it.

Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Anders Roxell &lt;anders.roxell@linaro.org&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: linux-mm@kvack.org
Reviewed-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>treewide: Add SPDX license identifier for missed files</title>
<updated>2019-05-21T08:50:45Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2019-05-19T12:08:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=457c89965399115e5cd8bf38f9c597293405703d'/>
<id>urn:sha1:457c89965399115e5cd8bf38f9c597293405703d</id>
<content type='text'>
Add SPDX license identifiers to all files which:

 - Have no license information of any form

 - Have EXPORT_.*_SYMBOL_GPL inside which was used in the
   initial scan/conversion to ignore the file

These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:

  GPL-2.0-only

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>writeback: synchronize sync(2) against cgroup writeback membership switches</title>
<updated>2019-01-22T21:39:38Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2017-12-12T16:38:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7fc5854f8c6efae9e7624970ab49a1eac2faefb1'/>
<id>urn:sha1:7fc5854f8c6efae9e7624970ab49a1eac2faefb1</id>
<content type='text'>
sync_inodes_sb() can race against cgwb (cgroup writeback) membership
switches and fail to writeback some inodes.  For example, if an inode
switches to another wb while sync_inodes_sb() is in progress, the new
wb might not be visible to bdi_split_work_to_wbs() at all or the inode
might jump from a wb which hasn't issued writebacks yet to one which
already has.

This patch adds backing_dev_info-&gt;wb_switch_rwsem to synchronize cgwb
switch path against sync_inodes_sb() so that sync_inodes_sb() is
guaranteed to see all the target wbs and inodes can't jump wbs to
escape syncing.

v2: Fixed misplaced rwsem init.  Spotted by Jiufei.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-by: Jiufei Xue &lt;xuejiufei@gmail.com&gt;
Link: http://lkml.kernel.org/r/dc694ae2-f07f-61e1-7097-7c8411cee12d@gmail.com
Acked-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>blkcg: delay blkg destruction until after writeback has finished</title>
<updated>2018-08-31T20:48:56Z</updated>
<author>
<name>Dennis Zhou (Facebook)</name>
<email>dennisszhou@gmail.com</email>
</author>
<published>2018-08-31T20:22:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=59b57717fff8b562825d9d25e0180ad7e8048ca9'/>
<id>urn:sha1:59b57717fff8b562825d9d25e0180ad7e8048ca9</id>
<content type='text'>
Currently, blkcg destruction relies on a sequence of events:
  1. Destruction starts. blkcg_css_offline() is called and blkgs
     release their reference to the blkcg. This immediately destroys
     the cgwbs (writeback).
  2. With blkgs giving up their reference, the blkcg ref count should
     become zero and eventually call blkcg_css_free() which finally
     frees the blkcg.

Jiufei Xue reported that there is a race between blkcg_bio_issue_check()
and cgroup_rmdir(). To remedy this, blkg destruction becomes contingent
on the completion of all writeback associated with the blkcg. A count of
the number of cgwbs is maintained and once that goes to zero, blkg
destruction can follow. This should prevent premature blkg destruction
related to writeback.

The new process for blkcg cleanup is as follows:
  1. Destruction starts. blkcg_css_offline() is called which offlines
     writeback. Blkg destruction is delayed on the cgwb_refcnt count to
     avoid punting potentially large amounts of outstanding writeback
     to root while maintaining any ongoing policies. Here, the base
     cgwb_refcnt is put back.
  2. When the cgwb_refcnt becomes zero, blkcg_destroy_blkgs() is called
     and handles destruction of blkgs. This is where the css reference
     held by each blkg is released.
  3. Once the blkcg ref count goes to zero, blkcg_css_free() is called.
     This finally frees the blkg.

It seems in the past blk-throttle didn't do the most understandable
things with taking data from a blkg while associating with current. So,
the simplification and unification of what blk-throttle is doing caused
this.

Fixes: 08e18eab0c579 ("block: add bi_blkg to the bio for cgroups")
Reviewed-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Signed-off-by: Dennis Zhou &lt;dennisszhou@gmail.com&gt;
Cc: Jiufei Xue &lt;jiufei.xue@linux.alibaba.com&gt;
Cc: Joseph Qi &lt;joseph.qi@linux.alibaba.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Josef Bacik &lt;josef@toxicpanda.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>bdi: use irqsave variant of refcount_dec_and_lock()</title>
<updated>2018-08-22T17:52:46Z</updated>
<author>
<name>Anna-Maria Gleixner</name>
<email>anna-maria@linutronix.de</email>
</author>
<published>2018-08-22T04:55:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=060288a7320b2837a8ff2af1b3643bdbd5e568f6'/>
<id>urn:sha1:060288a7320b2837a8ff2af1b3643bdbd5e568f6</id>
<content type='text'>
The irqsave variant of refcount_dec_and_lock handles irqsave/restore when
taking/releasing the spin lock.  With this variant the call of
local_irq_save/restore is no longer required.

[bigeasy@linutronix.de: s@atomic_dec_and_lock@refcount_dec_and_lock@g]
Link: http://lkml.kernel.org/r/20180703200141.28415-5-bigeasy@linutronix.de
Signed-off-by: Anna-Maria Gleixner &lt;anna-maria@linutronix.de&gt;
Signed-off-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>bdi: use refcount_t for reference counting instead atomic_t</title>
<updated>2018-08-22T17:52:46Z</updated>
<author>
<name>Sebastian Andrzej Siewior</name>
<email>bigeasy@linutronix.de</email>
</author>
<published>2018-08-22T04:55:31Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e58dd0de5eadf145895b13451a1fef8ef03946eb'/>
<id>urn:sha1:e58dd0de5eadf145895b13451a1fef8ef03946eb</id>
<content type='text'>
refcount_t type and corresponding API should be used instead of atomic_t
when the variable is used as a reference counter.  This permits avoiding
accidental refcounter overflows that might lead to use-after-free
situations.

Link: http://lkml.kernel.org/r/20180703200141.28415-4-bigeasy@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Reviewed-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Suggested-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>bdi: Fix another oops in wb_workfn()</title>
<updated>2018-06-22T18:08:07Z</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2018-06-18T13:46:58Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3ee7e8697d5860b173132606d80a9cd35e7113ee'/>
<id>urn:sha1:3ee7e8697d5860b173132606d80a9cd35e7113ee</id>
<content type='text'>
syzbot is reporting NULL pointer dereference at wb_workfn() [1] due to
wb-&gt;bdi-&gt;dev being NULL. And Dmitry confirmed that wb-&gt;state was
WB_shutting_down after wb-&gt;bdi-&gt;dev became NULL. This indicates that
unregister_bdi() failed to call wb_shutdown() on one of wb objects.

The problem is in cgwb_bdi_unregister() which does cgwb_kill() and thus
drops bdi's reference to wb structures before going through the list of
wbs again and calling wb_shutdown() on each of them. This way the loop
iterating through all wbs can easily miss a wb if that wb has already
passed through cgwb_remove_from_bdi_list() called from wb_shutdown()
from cgwb_release_workfn() and as a result fully shutdown bdi although
wb_workfn() for this wb structure is still running. In fact there are
also other ways cgwb_bdi_unregister() can race with
cgwb_release_workfn() leading e.g. to use-after-free issues:

CPU1                            CPU2
                                cgwb_bdi_unregister()
                                  cgwb_kill(*slot);

cgwb_release()
  queue_work(cgwb_release_wq, &amp;wb-&gt;release_work);
cgwb_release_workfn()
                                  wb = list_first_entry(&amp;bdi-&gt;wb_list, ...)
                                  spin_unlock_irq(&amp;cgwb_lock);
  wb_shutdown(wb);
  ...
  kfree_rcu(wb, rcu);
                                  wb_shutdown(wb); -&gt; oops use-after-free

We solve these issues by synchronizing writeback structure shutdown from
cgwb_bdi_unregister() with cgwb_release_workfn() using a new mutex. That
way we also no longer need synchronization using WB_shutting_down as the
mutex provides it for CONFIG_CGROUP_WRITEBACK case and without
CONFIG_CGROUP_WRITEBACK wb_shutdown() can be called only once from
bdi_unregister().

Reported-by: syzbot &lt;syzbot+4a7438e774b21ddd8eca@syzkaller.appspotmail.com&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>memcg: writeback: use memcg-&gt;cgwb_list directly</title>
<updated>2018-06-08T00:34:36Z</updated>
<author>
<name>Wang Long</name>
<email>wanglong19@meituan.com</email>
</author>
<published>2018-06-08T00:07:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9ccc361716362e8abb01d9944f8df97b4aac3e23'/>
<id>urn:sha1:9ccc361716362e8abb01d9944f8df97b4aac3e23</id>
<content type='text'>
mem_cgroup_cgwb_list is a very simple wrapper and it will never be used
outside of code under CONFIG_CGROUP_WRITEBACK.  so use memcg-&gt;cgwb_list
directly.

Link: http://lkml.kernel.org/r/1524406173-212182-1-git-send-email-wanglong19@meituan.com
Signed-off-by: Wang Long &lt;wanglong19@meituan.com&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Reviewed-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>bdi: Move cgroup bdi_writeback to a dedicated low concurrency workqueue</title>
<updated>2018-05-23T21:28:50Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2018-05-23T17:56:32Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f183464684190bacbfb14623bd3e4e51b7575b4c'/>
<id>urn:sha1:f183464684190bacbfb14623bd3e4e51b7575b4c</id>
<content type='text'>
From 0aa2e9b921d6db71150633ff290199554f0842a8 Mon Sep 17 00:00:00 2001
From: Tejun Heo &lt;tj@kernel.org&gt;
Date: Wed, 23 May 2018 10:29:00 -0700

cgwb_release() punts the actual release to cgwb_release_workfn() on
system_wq.  Depending on the number of cgroups or block devices, there
can be a lot of cgwb_release_workfn() in flight at the same time.

We're periodically seeing close to 256 kworkers getting stuck with the
following stack trace and overtime the entire system gets stuck.

  [&lt;ffffffff810ee40c&gt;] _synchronize_rcu_expedited.constprop.72+0x2fc/0x330
  [&lt;ffffffff810ee634&gt;] synchronize_rcu_expedited+0x24/0x30
  [&lt;ffffffff811ccf23&gt;] bdi_unregister+0x53/0x290
  [&lt;ffffffff811cd1e9&gt;] release_bdi+0x89/0xc0
  [&lt;ffffffff811cd645&gt;] wb_exit+0x85/0xa0
  [&lt;ffffffff811cdc84&gt;] cgwb_release_workfn+0x54/0xb0
  [&lt;ffffffff810a68d0&gt;] process_one_work+0x150/0x410
  [&lt;ffffffff810a71fd&gt;] worker_thread+0x6d/0x520
  [&lt;ffffffff810ad3dc&gt;] kthread+0x12c/0x160
  [&lt;ffffffff81969019&gt;] ret_from_fork+0x29/0x40
  [&lt;ffffffffffffffff&gt;] 0xffffffffffffffff

The events leading to the lockup are...

1. A lot of cgwb_release_workfn() is queued at the same time and all
   system_wq kworkers are assigned to execute them.

2. They all end up calling synchronize_rcu_expedited().  One of them
   wins and tries to perform the expedited synchronization.

3. However, that invovles queueing rcu_exp_work to system_wq and
   waiting for it.  Because #1 is holding all available kworkers on
   system_wq, rcu_exp_work can't be executed.  cgwb_release_workfn()
   is waiting for synchronize_rcu_expedited() which in turn is waiting
   for cgwb_release_workfn() to free up some of the kworkers.

We shouldn't be scheduling hundreds of cgwb_release_workfn() at the
same time.  There's nothing to be gained from that.  This patch
updates cgwb release path to use a dedicated percpu workqueue with
@max_active of 1.

While this resolves the problem at hand, it might be a good idea to
isolate rcu_exp_work to its own workqueue too as it can be used from
various paths and is prone to this sort of indirect A-A deadlocks.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: "Paul E. McKenney" &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>bdi: Fix use after free bug in debugfs_remove()</title>
<updated>2018-05-03T15:36:24Z</updated>
<author>
<name>Tetsuo Handa</name>
<email>penguin-kernel@I-love.SAKURA.ne.jp</email>
</author>
<published>2018-04-23T02:21:03Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f53823c18131e755905b4f654196fd2cc3953f6e'/>
<id>urn:sha1:f53823c18131e755905b4f654196fd2cc3953f6e</id>
<content type='text'>
syzbot is reporting use after free bug in debugfs_remove() [1].

This is because fault injection made memory allocation for
debugfs_create_file() from bdi_debug_register() from bdi_register_va()
fail and continued with setting WB_registered. But when debugfs_remove()
is called from debugfs_remove(bdi-&gt;debug_dir) from bdi_debug_unregister()
 from bdi_unregister() from release_bdi() because WB_registered was set
by bdi_register_va(), IS_ERR_OR_NULL(bdi-&gt;debug_dir) == false despite
debugfs_remove(bdi-&gt;debug_dir) was already called from bdi_register_va().

Fix this by making IS_ERR_OR_NULL(bdi-&gt;debug_dir) == true.

[1] https://syzkaller.appspot.com/bug?id=5ab4efd91a96dcea9b68104f159adf4af2a6dfc1

Signed-off-by: Tetsuo Handa &lt;penguin-kernel@I-love.SAKURA.ne.jp&gt;
Reported-by: syzbot &lt;syzbot+049cb4ae097049dac137@syzkaller.appspotmail.com&gt;
Fixes: 97f07697932e6faf ("bdi: convert bdi_debug_register to int")
Cc: weiping zhang &lt;zhangweiping@didichuxing.com&gt;
Reviewed-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
</feed>
