<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/mm/backing-dev.c, branch v6.3</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.3</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.3'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2023-04-16T17:41:26Z</updated>
<entry>
<title>writeback, cgroup: fix null-ptr-deref write in bdi_split_work_to_wbs</title>
<updated>2023-04-16T17:41:26Z</updated>
<author>
<name>Baokun Li</name>
<email>libaokun1@huawei.com</email>
</author>
<published>2023-04-10T13:08:26Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1ba1199ec5747f475538c0d25a32804e5ba1dfde'/>
<id>urn:sha1:1ba1199ec5747f475538c0d25a32804e5ba1dfde</id>
<content type='text'>
KASAN report null-ptr-deref:
==================================================================
BUG: KASAN: null-ptr-deref in bdi_split_work_to_wbs+0x5c5/0x7b0
Write of size 8 at addr 0000000000000000 by task sync/943
CPU: 5 PID: 943 Comm: sync Tainted: 6.3.0-rc5-next-20230406-dirty #461
Call Trace:
 &lt;TASK&gt;
 dump_stack_lvl+0x7f/0xc0
 print_report+0x2ba/0x340
 kasan_report+0xc4/0x120
 kasan_check_range+0x1b7/0x2e0
 __kasan_check_write+0x24/0x40
 bdi_split_work_to_wbs+0x5c5/0x7b0
 sync_inodes_sb+0x195/0x630
 sync_inodes_one_sb+0x3a/0x50
 iterate_supers+0x106/0x1b0
 ksys_sync+0x98/0x160
[...]
==================================================================

The race that causes the above issue is as follows:

           cpu1                     cpu2
-------------------------|-------------------------
inode_switch_wbs
 INIT_WORK(&amp;isw-&gt;work, inode_switch_wbs_work_fn)
 queue_rcu_work(isw_wq, &amp;isw-&gt;work)
 // queue_work async
  inode_switch_wbs_work_fn
   wb_put_many(old_wb, nr_switched)
    percpu_ref_put_many
     ref-&gt;data-&gt;release(ref)
     cgwb_release
      queue_work(cgwb_release_wq, &amp;wb-&gt;release_work)
      // queue_work async
       &amp;wb-&gt;release_work
       cgwb_release_workfn
                            ksys_sync
                             iterate_supers
                              sync_inodes_one_sb
                               sync_inodes_sb
                                bdi_split_work_to_wbs
                                 kmalloc(sizeof(*work), GFP_ATOMIC)
                                 // alloc memory failed
        percpu_ref_exit
         ref-&gt;data = NULL
         kfree(data)
                                 wb_get(wb)
                                  percpu_ref_get(&amp;wb-&gt;refcnt)
                                   percpu_ref_get_many(ref, 1)
                                    atomic_long_add(nr, &amp;ref-&gt;data-&gt;count)
                                     atomic64_add(i, v)
                                     // trigger null-ptr-deref

bdi_split_work_to_wbs() traverses &amp;bdi-&gt;wb_list to split work into all
wbs.  If the allocation of new work fails, the on-stack fallback will be
used and the reference count of the current wb is increased afterwards. 
If cgroup writeback membership switches occur before getting the reference
count and the current wb is released as old_wd, then calling wb_get() or
wb_put() will trigger the null pointer dereference above.

This issue was introduced in v4.3-rc7 (see fix tag1).  Both
sync_inodes_sb() and __writeback_inodes_sb_nr() calls to
bdi_split_work_to_wbs() can trigger this issue.  For scenarios called via
sync_inodes_sb(), originally commit 7fc5854f8c6e ("writeback: synchronize
sync(2) against cgroup writeback membership switches") reduced the
possibility of the issue by adding wb_switch_rwsem, but in v5.14-rc1 (see
fix tag2) removed the "inode_io_list_del_locked(inode, old_wb)" from
inode_switch_wbs_work_fn() so that wb-&gt;state contains WB_has_dirty_io,
thus old_wb is not skipped when traversing wbs in bdi_split_work_to_wbs(),
and the issue becomes easily reproducible again.

To solve this problem, percpu_ref_exit() is called under RCU protection to
avoid race between cgwb_release_workfn() and bdi_split_work_to_wbs(). 
Moreover, replace wb_get() with wb_tryget() in bdi_split_work_to_wbs(),
and skip the current wb if wb_tryget() fails because the wb has already
been shutdown.

Link: https://lkml.kernel.org/r/20230410130826.1492525-1-libaokun1@huawei.com
Fixes: b817525a4a80 ("writeback: bdi_writeback iteration must not skip dying ones")
Signed-off-by: Baokun Li &lt;libaokun1@huawei.com&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Andreas Dilger &lt;adilger.kernel@dilger.ca&gt;
Cc: Christian Brauner &lt;brauner@kernel.org&gt;
Cc: Dennis Zhou &lt;dennis@kernel.org&gt;
Cc: Hou Tao &lt;houtao1@huawei.com&gt;
Cc: yangerkun &lt;yangerkun@huawei.com&gt;
Cc: Zhang Yi &lt;yi.zhang@huawei.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: add /sys/class/bdi/&lt;bdi&gt;/min_ratio_fine knob</title>
<updated>2022-11-30T23:59:06Z</updated>
<author>
<name>Stefan Roesch</name>
<email>shr@devkernel.io</email>
</author>
<published>2022-11-19T00:52:14Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ad3e6dabf6f7d9ffd68eb711191ef16cdbdd25f0'/>
<id>urn:sha1:ad3e6dabf6f7d9ffd68eb711191ef16cdbdd25f0</id>
<content type='text'>
This adds the min_ratio_fine knob. The knob specifies the values not
based on 1 of 100, but instead 1 per million.

Link: https://lkml.kernel.org/r/20221119005215.3052436-20-shr@devkernel.io
Signed-off-by: Stefan Roesch &lt;shr@devkernel.io&gt;
Cc: Chris Mason &lt;clm@meta.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: add /sys/class/bdi/&lt;bdi&gt;/max_ratio_fine knob</title>
<updated>2022-11-30T23:59:06Z</updated>
<author>
<name>Stefan Roesch</name>
<email>shr@devkernel.io</email>
</author>
<published>2022-11-19T00:52:11Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=bca52dcbadc583f4db6435599c44a79f97293f06'/>
<id>urn:sha1:bca52dcbadc583f4db6435599c44a79f97293f06</id>
<content type='text'>
This adds the max_ratio_fine knob. The knob specifies the values not
based on 1 of 100, but instead 1 per million.

Link: https://lkml.kernel.org/r/20221119005215.3052436-17-shr@devkernel.io
Signed-off-by: Stefan Roesch &lt;shr@devkernel.io&gt;
Cc: Chris Mason &lt;clm@meta.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: add /sys/class/bdi/&lt;bdi&gt;/min_bytes knob</title>
<updated>2022-11-30T23:59:05Z</updated>
<author>
<name>Stefan Roesch</name>
<email>shr@devkernel.io</email>
</author>
<published>2022-11-19T00:52:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9c84819bd64ec15cb15d041c45ebe4725e9d4f3b'/>
<id>urn:sha1:9c84819bd64ec15cb15d041c45ebe4725e9d4f3b</id>
<content type='text'>
bdi has two existing knobs to limit the amount of dirty memory:
min_ratio and max_ratio. However the granularity of the knobs is limited
and often it is more convenient to specify limits in terms of bytes.
This change adds the min_bytes knob.

It does not store the min_bytes value, instead it converts the max_bytes
value to a ratio. The value is therefore more an approximation than an
absolute value.

It also maintains the sum over all the bdi min_ratio values stored in
the variable bdi_min_ratio.

Link: https://lkml.kernel.org/r/20221119005215.3052436-14-shr@devkernel.io
Signed-off-by: Stefan Roesch &lt;shr@devkernel.io&gt;
Cc: Chris Mason &lt;clm@meta.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: add knob /sys/class/bdi/&lt;bdi&gt;/max_bytes</title>
<updated>2022-11-30T23:59:04Z</updated>
<author>
<name>Stefan Roesch</name>
<email>shr@devkernel.io</email>
</author>
<published>2022-11-19T00:52:03Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c56e049a5e401a177c7c9b39a3bcc973ff5cec0b'/>
<id>urn:sha1:c56e049a5e401a177c7c9b39a3bcc973ff5cec0b</id>
<content type='text'>
This adds the new knob max_bytes to specify a dirty memory limit for the
corresponding bdi. The specified bytes value is converted to a ratio.

Link: https://lkml.kernel.org/r/20221119005215.3052436-9-shr@devkernel.io
Signed-off-by: Stefan Roesch &lt;shr@devkernel.io&gt;
Cc: Chris Mason &lt;clm@meta.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: use part per 1000000 for bdi ratios</title>
<updated>2022-11-30T23:59:03Z</updated>
<author>
<name>Stefan Roesch</name>
<email>shr@devkernel.io</email>
</author>
<published>2022-11-19T00:51:59Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ae82291e9ca47c3d6da6b77a00f427754aca413e'/>
<id>urn:sha1:ae82291e9ca47c3d6da6b77a00f427754aca413e</id>
<content type='text'>
To get finer granularity for ratio calculations use part per million
instead of percentiles. This is especially important if we want to
automatically convert byte values to ratios. Otherwise the values that
are actually used can be quite different. This is also important for
machines with more main memory (1% of 256GB is already 2.5GB).

Link: https://lkml.kernel.org/r/20221119005215.3052436-5-shr@devkernel.io
Signed-off-by: Stefan Roesch &lt;shr@devkernel.io&gt;
Cc: Chris Mason &lt;clm@meta.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: add knob /sys/class/bdi/&lt;bdi&gt;/strict_limit</title>
<updated>2022-11-30T23:59:03Z</updated>
<author>
<name>Stefan Roesch</name>
<email>shr@devkernel.io</email>
</author>
<published>2022-11-19T00:51:57Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=27bbe9d48d4e298864e18b39f091342c68b81637'/>
<id>urn:sha1:27bbe9d48d4e298864e18b39f091342c68b81637</id>
<content type='text'>
Add a new knob to /sys/class/bdi/&lt;bdi&gt;/strict_limit. This new knob
allows to set/unset the flag BDI_CAP_STRICTLIMIT in the bdi
capabilities.

Link: https://lkml.kernel.org/r/20221119005215.3052436-3-shr@devkernel.io
Signed-off-by: Stefan Roesch &lt;shr@devkernel.io&gt;
Cc: Chris Mason &lt;clm@meta.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: backing-dev: Remove the unneeded result variable</title>
<updated>2022-09-12T03:26:02Z</updated>
<author>
<name>ye xingchen</name>
<email>ye.xingchen@zte.com.cn</email>
</author>
<published>2022-08-26T07:19:06Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3083da7bcf56a4922b996ea3551847488a43a8b6'/>
<id>urn:sha1:3083da7bcf56a4922b996ea3551847488a43a8b6</id>
<content type='text'>
Return the value cgwb_bdi_init() directly instead of storing it in another
redundant variable.

Link: https://lkml.kernel.org/r/20220826071906.252419-1-ye.xingchen@zte.com.cn
Signed-off-by: ye xingchen &lt;ye.xingchen@zte.com.cn&gt;
Reported-by: Zeal Robot &lt;zealci@zte.com.cn&gt;
Reviewed-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>writeback: avoid use-after-free after removing device</title>
<updated>2022-08-28T21:02:43Z</updated>
<author>
<name>Khazhismel Kumykov</name>
<email>khazhy@chromium.org</email>
</author>
<published>2022-08-01T15:50:34Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f87904c075515f3e1d8f4a7115869d3b914674fd'/>
<id>urn:sha1:f87904c075515f3e1d8f4a7115869d3b914674fd</id>
<content type='text'>
When a disk is removed, bdi_unregister gets called to stop further
writeback and wait for associated delayed work to complete.  However,
wb_inode_writeback_end() may schedule bandwidth estimation dwork after
this has completed, which can result in the timer attempting to access the
just freed bdi_writeback.

Fix this by checking if the bdi_writeback is alive, similar to when
scheduling writeback work.

Since this requires wb-&gt;work_lock, and wb_inode_writeback_end() may get
called from interrupt, switch wb-&gt;work_lock to an irqsafe lock.

Link: https://lkml.kernel.org/r/20220801155034.3772543-1-khazhy@google.com
Fixes: 45a2966fd641 ("writeback: fix bandwidth estimate for spiky workload")
Signed-off-by: Khazhismel Kumykov &lt;khazhy@google.com&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Cc: Michael Stapelberg &lt;stapelberg+linux@google.com&gt;
Cc: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>init: Initialize noop_backing_dev_info early</title>
<updated>2022-06-16T08:55:57Z</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2022-06-15T13:22:29Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4bca7e80b6455772b4bf3f536dcbc19aac424d6a'/>
<id>urn:sha1:4bca7e80b6455772b4bf3f536dcbc19aac424d6a</id>
<content type='text'>
noop_backing_dev_info is used by superblocks of various
pseudofilesystems such as kdevtmpfs. After commit 10e14073107d
("writeback: Fix inode-&gt;i_io_list not be protected by inode-&gt;i_lock
error") this broke because __mark_inode_dirty() started to access more
fields from noop_backing_dev_info and this led to crashes inside
locked_inode_to_wb_and_lock_list() called from __mark_inode_dirty().
Fix the problem by initializing noop_backing_dev_info before the
filesystems get mounted.

Fixes: 10e14073107d ("writeback: Fix inode-&gt;i_io_list not be protected by inode-&gt;i_lock error")
Reported-and-tested-by: Suzuki K Poulose &lt;suzuki.poulose@arm.com&gt;
Reported-and-tested-by: Alexandru Elisei &lt;alexandru.elisei@arm.com&gt;
Reported-and-tested-by: Guenter Roeck &lt;linux@roeck-us.net&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
</content>
</entry>
</feed>
