<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/fs/fs-writeback.c, branch v5.3</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.3</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.3'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2019-07-10T15:00:57Z</updated>
<entry>
<title>blkcg, writeback: Add wbc-&gt;no_cgroup_owner</title>
<updated>2019-07-10T15:00:57Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2019-06-27T20:39:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=27b36d8fa81fa8274fb72f4eb1484026f6b6daa8'/>
<id>urn:sha1:27b36d8fa81fa8274fb72f4eb1484026f6b6daa8</id>
<content type='text'>
When writeback IOs are bounced through async layers, the IOs should
only be accounted against the wbc from the original bdi writeback to
avoid confusing cgroup inode ownership arbitration.  Add
wbc-&gt;no_cgroup_owner to allow disabling wbc cgroup owner accounting.
This will be used make btrfs compression work well with cgroup IO
control.

v2: Renamed from no_wbc_acct to no_cgroup_owner and added comment as
    per Jan.

Reviewed-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>blkcg, writeback: Rename wbc_account_io() to wbc_account_cgroup_owner()</title>
<updated>2019-07-10T15:00:57Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2019-06-27T20:39:49Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=34e51a5e1a6e939ed7d99c38173821ab86d577f4'/>
<id>urn:sha1:34e51a5e1a6e939ed7d99c38173821ab86d577f4</id>
<content type='text'>
wbc_account_io() does a very specific job - try to see which cgroup is
actually dirtying an inode and transfer its ownership to the majority
dirtier if needed.  The name is too generic and confusing.  Let's
rename it to something more specific.

Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>cgroup, blkcg: Prepare some symbols for module and !CONFIG_CGROUP usages</title>
<updated>2019-07-10T15:00:57Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2019-06-27T20:39:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9b0eb69b75bccada2d341d7e7ca342f0cb1c9a6a'/>
<id>urn:sha1:9b0eb69b75bccada2d341d7e7ca342f0cb1c9a6a</id>
<content type='text'>
btrfs is going to use css_put() and wbc helpers to improve cgroup
writeback support.  Add dummy css_get() definition and export wbc
helpers to prepare for module and !CONFIG_CGROUP builds.

Reported-by: kbuild test robot &lt;lkp@intel.com&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>blkcg, writeback: dead memcgs shouldn't contribute to writeback ownership arbitration</title>
<updated>2019-06-15T16:39:42Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2019-06-13T22:30:41Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=6631142229005e1b1c311a09efe9fb3cfdac8559'/>
<id>urn:sha1:6631142229005e1b1c311a09efe9fb3cfdac8559</id>
<content type='text'>
wbc_account_io() collects information on cgroup ownership of writeback
pages to determine which cgroup should own the inode.  Pages can stay
associated with dead memcgs but we want to avoid attributing IOs to
dead blkcgs as much as possible as the association is likely to be
stale.  However, currently, pages associated with dead memcgs
contribute to the accounting delaying and/or confusing the
arbitration.

Fix it by ignoring pages associated with dead memcgs.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>treewide: Add SPDX license identifier for missed files</title>
<updated>2019-05-21T08:50:45Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2019-05-19T12:08:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=457c89965399115e5cd8bf38f9c597293405703d'/>
<id>urn:sha1:457c89965399115e5cd8bf38f9c597293405703d</id>
<content type='text'>
Add SPDX license identifiers to all files which:

 - Have no license information of any form

 - Have EXPORT_.*_SYMBOL_GPL inside which was used in the
   initial scan/conversion to ignore the file

These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:

  GPL-2.0-only

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>fs/writeback.c: use rcu_barrier() to wait for inflight wb switches going into workqueue when umount</title>
<updated>2019-05-18T22:52:26Z</updated>
<author>
<name>Jiufei Xue</name>
<email>jiufei.xue@linux.alibaba.com</email>
</author>
<published>2019-05-17T21:31:44Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ec084de929e419e51bcdafaafe567d9e7d0273b7'/>
<id>urn:sha1:ec084de929e419e51bcdafaafe567d9e7d0273b7</id>
<content type='text'>
synchronize_rcu() didn't wait for call_rcu() callbacks, so inode wb
switch may not go to the workqueue after synchronize_rcu().  Thus
previous scheduled switches was not finished even flushing the
workqueue, which will cause a NULL pointer dereferenced followed below.

  VFS: Busy inodes after unmount of vdd. Self-destruct in 5 seconds.  Have a nice day...
  BUG: unable to handle kernel NULL pointer dereference at 0000000000000278
    evict+0xb3/0x180
    iput+0x1b0/0x230
    inode_switch_wbs_work_fn+0x3c0/0x6a0
    worker_thread+0x4e/0x490
    ? process_one_work+0x410/0x410
    kthread+0xe6/0x100
    ret_from_fork+0x39/0x50

Replace the synchronize_rcu() call with a rcu_barrier() to wait for all
pending callbacks to finish.  And inc isw_nr_in_flight after call_rcu()
in inode_switch_wbs() to make more sense.

Link: http://lkml.kernel.org/r/20190429024108.54150-1-jiufei.xue@linux.alibaba.com
Signed-off-by: Jiufei Xue &lt;jiufei.xue@linux.alibaba.com&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Suggested-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>writeback: synchronize sync(2) against cgroup writeback membership switches</title>
<updated>2019-01-22T21:39:38Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2017-12-12T16:38:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7fc5854f8c6efae9e7624970ab49a1eac2faefb1'/>
<id>urn:sha1:7fc5854f8c6efae9e7624970ab49a1eac2faefb1</id>
<content type='text'>
sync_inodes_sb() can race against cgwb (cgroup writeback) membership
switches and fail to writeback some inodes.  For example, if an inode
switches to another wb while sync_inodes_sb() is in progress, the new
wb might not be visible to bdi_split_work_to_wbs() at all or the inode
might jump from a wb which hasn't issued writebacks yet to one which
already has.

This patch adds backing_dev_info-&gt;wb_switch_rwsem to synchronize cgwb
switch path against sync_inodes_sb() so that sync_inodes_sb() is
guaranteed to see all the target wbs and inodes can't jump wbs to
escape syncing.

v2: Fixed misplaced rwsem init.  Spotted by Jiufei.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-by: Jiufei Xue &lt;xuejiufei@gmail.com&gt;
Link: http://lkml.kernel.org/r/dc694ae2-f07f-61e1-7097-7c8411cee12d@gmail.com
Acked-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>fs: Convert writeback to XArray</title>
<updated>2018-10-21T14:46:42Z</updated>
<author>
<name>Matthew Wilcox</name>
<email>willy@infradead.org</email>
</author>
<published>2017-12-04T15:46:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=04edf02cdd37a7e2e23ce8de77cb06d22ef2f503'/>
<id>urn:sha1:04edf02cdd37a7e2e23ce8de77cb06d22ef2f503</id>
<content type='text'>
A couple of short loops.

Signed-off-by: Matthew Wilcox &lt;willy@infradead.org&gt;
</content>
</entry>
<entry>
<title>bdi: Fix oops in wb_workfn()</title>
<updated>2018-05-03T22:11:37Z</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2018-05-03T16:26:26Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b8b784958eccbf8f51ebeee65282ca3fd59ea391'/>
<id>urn:sha1:b8b784958eccbf8f51ebeee65282ca3fd59ea391</id>
<content type='text'>
Syzbot has reported that it can hit a NULL pointer dereference in
wb_workfn() due to wb-&gt;bdi-&gt;dev being NULL. This indicates that
wb_workfn() was called for an already unregistered bdi which should not
happen as wb_shutdown() called from bdi_unregister() should make sure
all pending writeback works are completed before bdi is unregistered.
Except that wb_workfn() itself can requeue the work with:

	mod_delayed_work(bdi_wq, &amp;wb-&gt;dwork, 0);

and if this happens while wb_shutdown() is waiting in:

	flush_delayed_work(&amp;wb-&gt;dwork);

the dwork can get executed after wb_shutdown() has finished and
bdi_unregister() has cleared wb-&gt;bdi-&gt;dev.

Make wb_workfn() use wakeup_wb() for requeueing the work which takes all
the necessary precautions against racing with bdi unregistration.

CC: Tetsuo Handa &lt;penguin-kernel@I-love.SAKURA.ne.jp&gt;
CC: Tejun Heo &lt;tj@kernel.org&gt;
Fixes: 839a8e8660b6777e7fe4e80af1a048aebe2b5977
Reported-by: syzbot &lt;syzbot+9873874c735f2892e7e9@syzkaller.appspotmail.com&gt;
Reviewed-by: Dave Chinner &lt;dchinner@redhat.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>writeback: safer lock nesting</title>
<updated>2018-04-21T00:18:35Z</updated>
<author>
<name>Greg Thelen</name>
<email>gthelen@google.com</email>
</author>
<published>2018-04-20T21:55:42Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2e898e4c0a3897ccd434adac5abb8330194f527b'/>
<id>urn:sha1:2e898e4c0a3897ccd434adac5abb8330194f527b</id>
<content type='text'>
lock_page_memcg()/unlock_page_memcg() use spin_lock_irqsave/restore() if
the page's memcg is undergoing move accounting, which occurs when a
process leaves its memcg for a new one that has
memory.move_charge_at_immigrate set.

unlocked_inode_to_wb_begin,end() use spin_lock_irq/spin_unlock_irq() if
the given inode is switching writeback domains.  Switches occur when
enough writes are issued from a new domain.

This existing pattern is thus suspicious:
    lock_page_memcg(page);
    unlocked_inode_to_wb_begin(inode, &amp;locked);
    ...
    unlocked_inode_to_wb_end(inode, locked);
    unlock_page_memcg(page);

If both inode switch and process memcg migration are both in-flight then
unlocked_inode_to_wb_end() will unconditionally enable interrupts while
still holding the lock_page_memcg() irq spinlock.  This suggests the
possibility of deadlock if an interrupt occurs before unlock_page_memcg().

    truncate
    __cancel_dirty_page
    lock_page_memcg
    unlocked_inode_to_wb_begin
    unlocked_inode_to_wb_end
    &lt;interrupts mistakenly enabled&gt;
                                    &lt;interrupt&gt;
                                    end_page_writeback
                                    test_clear_page_writeback
                                    lock_page_memcg
                                    &lt;deadlock&gt;
    unlock_page_memcg

Due to configuration limitations this deadlock is not currently possible
because we don't mix cgroup writeback (a cgroupv2 feature) and
memory.move_charge_at_immigrate (a cgroupv1 feature).

If the kernel is hacked to always claim inode switching and memcg
moving_account, then this script triggers lockup in less than a minute:

  cd /mnt/cgroup/memory
  mkdir a b
  echo 1 &gt; a/memory.move_charge_at_immigrate
  echo 1 &gt; b/memory.move_charge_at_immigrate
  (
    echo $BASHPID &gt; a/cgroup.procs
    while true; do
      dd if=/dev/zero of=/mnt/big bs=1M count=256
    done
  ) &amp;
  while true; do
    sync
  done &amp;
  sleep 1h &amp;
  SLEEP=$!
  while true; do
    echo $SLEEP &gt; a/cgroup.procs
    echo $SLEEP &gt; b/cgroup.procs
  done

The deadlock does not seem possible, so it's debatable if there's any
reason to modify the kernel.  I suggest we should to prevent future
surprises.  And Wang Long said "this deadlock occurs three times in our
environment", so there's more reason to apply this, even to stable.
Stable 4.4 has minor conflicts applying this patch.  For a clean 4.4 patch
see "[PATCH for-4.4] writeback: safer lock nesting"
https://lkml.org/lkml/2018/4/11/146

Wang Long said "this deadlock occurs three times in our environment"

[gthelen@google.com: v4]
  Link: http://lkml.kernel.org/r/20180411084653.254724-1-gthelen@google.com
[akpm@linux-foundation.org: comment tweaks, struct initialization simplification]
Change-Id: Ibb773e8045852978f6207074491d262f1b3fb613
Link: http://lkml.kernel.org/r/20180410005908.167976-1-gthelen@google.com
Fixes: 682aa8e1a6a1 ("writeback: implement unlocked_inode_to_wb transaction and use it for stat updates")
Signed-off-by: Greg Thelen &lt;gthelen@google.com&gt;
Reported-by: Wang Long &lt;wanglong19@meituan.com&gt;
Acked-by: Wang Long &lt;wanglong19@meituan.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Reviewed-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[v4.2+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
