<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/mm/backing-dev.c, branch v5.14</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.14</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.14'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2021-07-24T00:43:28Z</updated>
<entry>
<title>writeback, cgroup: remove wb from offline list before releasing refcnt</title>
<updated>2021-07-24T00:43:28Z</updated>
<author>
<name>Roman Gushchin</name>
<email>guro@fb.com</email>
</author>
<published>2021-07-23T22:50:29Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b43a9e76b4cc78cdaa8c809dd31cd452797b7661'/>
<id>urn:sha1:b43a9e76b4cc78cdaa8c809dd31cd452797b7661</id>
<content type='text'>
Boyang reported that the commit c22d70a162d3 ("writeback, cgroup:
release dying cgwbs by switching attached inodes") causes the kernel to
crash while running xfstests generic/256 on ext4 on aarch64 and ppc64le.

  run fstests generic/256 at 2021-07-12 05:41:40
  EXT4-fs (vda3): mounted filesystem with ordered data mode. Opts: . Quota mode: none.
  Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
  Mem abort info:
     ESR = 0x96000005
     EC = 0x25: DABT (current EL), IL = 32 bits
     SET = 0, FnV = 0
     EA = 0, S1PTW = 0
     FSC = 0x05: level 1 translation fault
  Data abort info:
     ISV = 0, ISS = 0x00000005
     CM = 0, WnR = 0
  user pgtable: 64k pages, 48-bit VAs, pgdp=00000000b0502000
  [0000000000000000] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
  Internal error: Oops: 96000005 [#1] SMP
  Modules linked in: dm_flakey dm_snapshot dm_bufio dm_zero dm_mod loop tls rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill sunrpc ext4 vfat fat mbcache jbd2 drm fuse xfs libcrc32c crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce virtio_blk virtio_net net_failover virtio_console failover virtio_mmio aes_neon_bs [last unloaded: scsi_debug]
  CPU: 0 PID: 408468 Comm: kworker/u8:5 Tainted: G X --------- ---  5.14.0-0.rc1.15.bx.el9.aarch64 #1
  Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
  Workqueue: events_unbound cleanup_offline_cgwbs_workfn
  pstate: 004000c5 (nzcv daIF +PAN -UAO -TCO BTYPE=--)
  pc : cleanup_offline_cgwbs_workfn+0x320/0x394
  lr : cleanup_offline_cgwbs_workfn+0xe0/0x394
  sp : ffff80001554fd10
  x29: ffff80001554fd10 x28: 0000000000000000 x27: 0000000000000001
  x26: 0000000000000000 x25: 00000000000000e0 x24: ffffd2a2fbe671a8
  x23: ffff80001554fd88 x22: ffffd2a2fbe67198 x21: ffffd2a2fc25a730
  x20: ffff210412bc3000 x19: ffff210412bc3280 x18: 0000000000000000
  x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
  x14: 0000000000000000 x13: 0000000000000030 x12: 0000000000000040
  x11: ffff210481572238 x10: ffff21048157223a x9 : ffffd2a2fa276c60
  x8 : ffff210484106b60 x7 : 0000000000000000 x6 : 000000000007d18a
  x5 : ffff210416a86400 x4 : ffff210412bc0280 x3 : 0000000000000000
  x2 : ffff80001554fd88 x1 : ffff210412bc0280 x0 : 0000000000000003
  Call trace:
     cleanup_offline_cgwbs_workfn+0x320/0x394
     process_one_work+0x1f4/0x4b0
     worker_thread+0x184/0x540
     kthread+0x114/0x120
     ret_from_fork+0x10/0x18
  Code: d63f0020 97f99963 17ffffa6 f8588263 (f9400061)
  ---[ end trace e250fe289272792a ]---
  Kernel panic - not syncing: Oops: Fatal exception
  SMP: stopping secondary CPUs
  SMP: failed to stop secondary CPUs 0-2
  Kernel Offset: 0x52a2e9fa0000 from 0xffff800010000000
  PHYS_OFFSET: 0xfff0defca0000000
  CPU features: 0x00200251,23200840
  Memory Limit: none
  ---[ end Kernel panic - not syncing: Oops: Fatal exception ]---

The problem happens when cgwb_release_workfn() races with
cleanup_offline_cgwbs_workfn(): wb_tryget() in
cleanup_offline_cgwbs_workfn() can be called after percpu_ref_exit() is
cgwb_release_workfn(), which is basically a use-after-free error.

Fix the problem by making removing the writeback structure from the
offline list before releasing the percpu reference counter.  It will
guarantee that cleanup_offline_cgwbs_workfn() will not see and not
access writeback structures which are about to be released.

Link: https://lkml.kernel.org/r/20210716201039.3762203-1-guro@fb.com
Fixes: c22d70a162d3 ("writeback, cgroup: release dying cgwbs by switching attached inodes")
Signed-off-by: Roman Gushchin &lt;guro@fb.com&gt;
Reported-by: Boyang Xue &lt;bxue@redhat.com&gt;
Suggested-by: Jan Kara &lt;jack@suse.cz&gt;
Tested-by: Darrick J. Wong &lt;djwong@kernel.org&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Cc: Dave Chinner &lt;dchinner@redhat.com&gt;
Cc: Murphy Zhou &lt;jencce.kernel@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>writeback, cgroup: release dying cgwbs by switching attached inodes</title>
<updated>2021-06-29T17:53:48Z</updated>
<author>
<name>Roman Gushchin</name>
<email>guro@fb.com</email>
</author>
<published>2021-06-29T02:36:03Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c22d70a162d3cc177282c4487be4d54876ca55c8'/>
<id>urn:sha1:c22d70a162d3cc177282c4487be4d54876ca55c8</id>
<content type='text'>
Asynchronously try to release dying cgwbs by switching attached inodes to
the nearest living ancestor wb.  It helps to get rid of per-cgroup
writeback structures themselves and of pinned memory and block cgroups,
which are significantly larger structures (mostly due to large per-cpu
statistics data).  This prevents memory waste and helps to avoid different
scalability problems caused by large piles of dying cgroups.

Reuse the existing mechanism of inode switching used for foreign inode
detection.  To speed things up batch up to 115 inode switching in a single
operation (the maximum number is selected so that the resulting struct
inode_switch_wbs_context can fit into 1024 bytes).  Because every
switching consists of two steps divided by an RCU grace period, it would
be too slow without batching.  Please note that the whole batch counts as
a single operation (when increasing/decreasing isw_nr_in_flight).  This
allows to keep umounting working (flush the switching queue), however
prevents cleanups from consuming the whole switching quota and effectively
blocking the frn switching.

A cgwb cleanup operation can fail due to different reasons (e.g.  not
enough memory, the cgwb has an in-flight/pending io, an attached inode in
a wrong state, etc).  In this case the next scheduled cleanup will make a
new attempt.  An attempt is made each time a new cgwb is offlined (in
other words a memcg and/or a blkcg is deleted by a user).  In the future
an additional attempt scheduled by a timer can be implemented.

[guro@fb.com: replace open-coded "115" with arithmetic]
  Link: https://lkml.kernel.org/r/YMEcSBcq/VXMiPPO@carbon.dhcp.thefacebook.com
[guro@fb.com: add smp_mb() to inode_prepare_wbs_switch()]
  Link: https://lkml.kernel.org/r/YMFa+guFw7OFjf3X@carbon.dhcp.thefacebook.com
[willy@infradead.org: fix documentation]
  Link: https://lkml.kernel.org/r/20210615200242.1716568-2-willy@infradead.org

Link: https://lkml.kernel.org/r/20210608230225.2078447-9-guro@fb.com
Signed-off-by: Roman Gushchin &lt;guro@fb.com&gt;
Signed-off-by: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Dennis Zhou &lt;dennis@kernel.org&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Dave Chinner &lt;dchinner@redhat.com&gt;
Cc: Jan Kara &lt;jack@suse.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>writeback, cgroup: keep list of inodes attached to bdi_writeback</title>
<updated>2021-06-29T17:53:48Z</updated>
<author>
<name>Roman Gushchin</name>
<email>guro@fb.com</email>
</author>
<published>2021-06-29T02:35:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f3b6a6df38aa514d97e8c6fcc748be1d4142bec9'/>
<id>urn:sha1:f3b6a6df38aa514d97e8c6fcc748be1d4142bec9</id>
<content type='text'>
Currently there is no way to iterate over inodes attached to a specific
cgwb structure.  It limits the ability to efficiently reclaim the
writeback structure itself and associated memory and block cgroup
structures without scanning all inodes belonging to a sb, which can be
prohibitively expensive.

While dirty/in-active-writeback an inode belongs to one of the
bdi_writeback's io lists: b_dirty, b_io, b_more_io and b_dirty_time.  Once
cleaned up, it's removed from all io lists.  So the inode-&gt;i_io_list can
be reused to maintain the list of inodes, attached to a bdi_writeback
structure.

This patch introduces a new wb-&gt;b_attached list, which contains all inodes
which were dirty at least once and are attached to the given cgwb.  Inodes
attached to the root bdi_writeback structures are never placed on such
list.  The following patch will use this list to try to release cgwbs
structures more efficiently.

Link: https://lkml.kernel.org/r/20210608230225.2078447-6-guro@fb.com
Signed-off-by: Roman Gushchin &lt;guro@fb.com&gt;
Suggested-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Dennis Zhou &lt;dennis@kernel.org&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Dave Chinner &lt;dchinner@redhat.com&gt;
Cc: Jan Kara &lt;jack@suse.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/backing-dev.c: use might_alloc()</title>
<updated>2021-02-26T17:41:01Z</updated>
<author>
<name>Daniel Vetter</name>
<email>daniel.vetter@ffwll.ch</email>
</author>
<published>2021-02-26T01:18:45Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c1ca59a1f21e360b26e26c187a4e42f22bb768d3'/>
<id>urn:sha1:c1ca59a1f21e360b26e26c187a4e42f22bb768d3</id>
<content type='text'>
Now that my little helper has landed, use it more.  On top of the existing
check this also uses lockdep through the fs_reclaim annotations.

[akpm@linux-foundation.org: include linux/sched/mm.h]

Link: https://lkml.kernel.org/r/20210113135009.3606813-2-daniel.vetter@ffwll.ch
Signed-off-by: Daniel Vetter &lt;daniel.vetter@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: backing-dev: Remove duplicated macro definition</title>
<updated>2021-02-24T21:38:28Z</updated>
<author>
<name>Baolin Wang</name>
<email>baolin.wang@linux.alibaba.com</email>
</author>
<published>2021-02-24T20:02:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=6986c3e2b19505e9b2112fc2e548e9f99fa3021f'/>
<id>urn:sha1:6986c3e2b19505e9b2112fc2e548e9f99fa3021f</id>
<content type='text'>
Move the K() macro a little forward to remove the same macro definition.

Link: https://lkml.kernel.org/r/d1ccdf2d3116dce9814f2bcc1f0415ecb4c76ea5.1612862230.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Reviewed-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm:backing-dev: use sysfs_emit in macro defining functions</title>
<updated>2020-12-15T20:13:47Z</updated>
<author>
<name>Joe Perches</name>
<email>joe@perches.com</email>
</author>
<published>2020-12-15T03:14:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5e4c0d86cf4a7a22abb9468e84f4480dd6b67032'/>
<id>urn:sha1:5e4c0d86cf4a7a22abb9468e84f4480dd6b67032</id>
<content type='text'>
The cocci script used in commit bdacbb8d04f ("mm: Use sysfs_emit for
struct kobject * uses") does not convert the name##_show macro because the
macro uses concatenation via ##.

Convert it by hand.

Link: https://lkml.kernel.org/r/45ec6cfc177d743f9c0ebaf35e43969dce43af42.1605376435.git.joe@perches.com
Signed-off-by: Joe Perches &lt;joe@perches.com&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Cc: Pekka Enberg &lt;penberg@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>bdi: replace BDI_CAP_NO_{WRITEBACK,ACCT_DIRTY} with a single flag</title>
<updated>2020-09-24T19:43:39Z</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2020-09-24T06:51:40Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f56753ac2a90810726334df04d735e9f8f5a32d9'/>
<id>urn:sha1:f56753ac2a90810726334df04d735e9f8f5a32d9</id>
<content type='text'>
Replace the two negative flags that are always used together with a
single positive flag that indicates the writeback capability instead
of two related non-capabilities.  Also remove the pointless wrappers
to just check the flag.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Johannes Thumshirn &lt;johannes.thumshirn@wdc.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>bdi: invert BDI_CAP_NO_ACCT_WB</title>
<updated>2020-09-24T19:43:39Z</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2020-09-24T06:51:39Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=823423ef55f4d9c470b1edc9c5b5c93d06abfaae'/>
<id>urn:sha1:823423ef55f4d9c470b1edc9c5b5c93d06abfaae</id>
<content type='text'>
Replace BDI_CAP_NO_ACCT_WB with a positive BDI_CAP_WRITEBACK_ACCT to
make the checks more obvious.  Also remove the pointless
bdi_cap_account_writeback wrapper that just obsfucates the check.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Johannes Thumshirn &lt;johannes.thumshirn@wdc.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>bdi: replace BDI_CAP_STABLE_WRITES with a queue and a sb flag</title>
<updated>2020-09-24T19:43:39Z</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2020-09-24T06:51:38Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1cb039f3dc1619eb795c54aad0a98fdb379b4237'/>
<id>urn:sha1:1cb039f3dc1619eb795c54aad0a98fdb379b4237</id>
<content type='text'>
The BDI_CAP_STABLE_WRITES is one of the few bits of information in the
backing_dev_info shared between the block drivers and the writeback code.
To help untangling the dependency replace it with a queue flag and a
superblock flag derived from it.  This also helps with the case of e.g.
a file system requiring stable writes due to its own checksumming, but
not forcing it on other users of the block device like the swap code.

One downside is that we an't support the stable_pages_required bdi
attribute in sysfs anymore.  It is replaced with a queue attribute which
also is writable for easier testing.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Johannes Thumshirn &lt;johannes.thumshirn@wdc.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>bdi: initialize -&gt;ra_pages and -&gt;io_pages in bdi_init</title>
<updated>2020-09-24T19:43:39Z</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2020-09-24T06:51:32Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=55b2598e84e97efc5d952958cb5e34236c43276b'/>
<id>urn:sha1:55b2598e84e97efc5d952958cb5e34236c43276b</id>
<content type='text'>
Set up a readahead size by default, as very few users have a good
reason to change it.  This means code, ecryptfs, and orangefs now
set up the values while they were previously missing it, while ubifs,
mtd and vboxsf manually set it to 0 to avoid readahead.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Acked-by: David Sterba &lt;dsterba@suse.com&gt; [btrfs]
Acked-by: Richard Weinberger &lt;richard@nod.at&gt; [ubifs, mtd]
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
</feed>
