<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/fs, branch v5.3</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.3</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.3'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2019-09-15T19:32:03Z</updated>
<entry>
<title>Revert "ext4: make __ext4_get_inode_loc plug"</title>
<updated>2019-09-15T19:32:03Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2019-09-15T19:32:03Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=72dbcf72156641fde4d8ea401e977341bfd35a05'/>
<id>urn:sha1:72dbcf72156641fde4d8ea401e977341bfd35a05</id>
<content type='text'>
This reverts commit b03755ad6f33b7b8cd7312a3596a2dbf496de6e7.

This is sad, and done for all the wrong reasons.  Because that commit is
good, and does exactly what it says: avoids a lot of small disk requests
for the inode table read-ahead.

However, it turns out that it causes an entirely unrelated problem: the
getrandom() system call was introduced back in 2014 by commit
c6e9d6f38894 ("random: introduce getrandom(2) system call"), and people
use it as a convenient source of good random numbers.

But part of the current semantics for getrandom() is that it waits for
the entropy pool to fill at least partially (unlike /dev/urandom).  And
at least ArchLinux apparently has a systemd that uses getrandom() at
boot time, and the improvements in IO patterns means that existing
installations suddenly start hanging, waiting for entropy that will
never happen.

It seems to be an unlucky combination of not _quite_ enough entropy,
together with a particular systemd version and configuration.  Lennart
says that the systemd-random-seed process (which is what does this early
access) is supposed to not block any other boot activity, but sadly that
doesn't actually seem to be the case (possibly due bogus dependencies on
cryptsetup for encrypted swapspace).

The correct fix is to fix getrandom() to not block when it's not
appropriate, but that fix is going to take a lot more discussion.  Do we
just make it act like /dev/urandom by default, and add a new flag for
"wait for entropy"? Do we add a boot-time option? Or do we just limit
the amount of time it will wait for entropy?

So in the meantime, we do the revert to give us time to discuss the
eventual fix for the fundamental problem, at which point we can re-apply
the ext4 inode table access optimization.

Reported-by: Ahmed S. Darwish &lt;darwish.07@gmail.com&gt;
Cc: Ted Ts'o &lt;tytso@mit.edu&gt;
Cc: Willy Tarreau &lt;w@1wt.eu&gt;
Cc: Alexander E. Patrakov &lt;patrakov@gmail.com&gt;
Cc: Lennart Poettering &lt;mzxreary@0pointer.de&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'for-5.3-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux</title>
<updated>2019-09-13T08:48:47Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2019-09-13T08:48:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1b304a1ae45de4df7d773f0a39d1100aabca615b'/>
<id>urn:sha1:1b304a1ae45de4df7d773f0a39d1100aabca615b</id>
<content type='text'>
Pull btrfs fixes from David Sterba:
 "Here are two fixes, one of them urgent fixing a bug introduced in 5.2
  and reported by many users. It took time to identify the root cause,
  catching the 5.3 release is higly desired also to push the fix to 5.2
  stable tree.

  The bug is a mess up of return values after adding proper error
  handling and honestly the kind of bug that can cause sleeping
  disorders until it's caught. My appologies to everybody who was
  affected.

  Summary of what could happen:

  1) either a hang when committing a transaction, if this happens
     there's no risk of corruption, still the hang is very inconvenient
     and can't be resolved without a reboot

  2) writeback for some btree nodes may never be started and we end up
     committing a transaction without noticing that, this is really
     serious and that will lead to the "parent transid verify failed"
     messages"

* tag 'for-5.3-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  Btrfs: fix unwritten extent buffers and hangs on future writeback attempts
  Btrfs: fix assertion failure during fsync and use of stale transaction
</content>
</entry>
<entry>
<title>Btrfs: fix unwritten extent buffers and hangs on future writeback attempts</title>
<updated>2019-09-12T11:37:25Z</updated>
<author>
<name>Filipe Manana</name>
<email>fdmanana@suse.com</email>
</author>
<published>2019-09-11T16:42:00Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=18dfa7117a3f379862dcd3f67cadd678013bb9dd'/>
<id>urn:sha1:18dfa7117a3f379862dcd3f67cadd678013bb9dd</id>
<content type='text'>
The lock_extent_buffer_io() returns 1 to the caller to tell it everything
went fine and the callers needs to start writeback for the extent buffer
(submit a bio, etc), 0 to tell the caller everything went fine but it does
not need to start writeback for the extent buffer, and a negative value if
some error happened.

When it's about to return 1 it tries to lock all pages, and if a try lock
on a page fails, and we didn't flush any existing bio in our "epd", it
calls flush_write_bio(epd) and overwrites the return value of 1 to 0 or
an error. The page might have been locked elsewhere, not with the goal
of starting writeback of the extent buffer, and even by some code other
than btrfs, like page migration for example, so it does not mean the
writeback of the extent buffer was already started by some other task,
so returning a 0 tells the caller (btree_write_cache_pages()) to not
start writeback for the extent buffer. Note that epd might currently have
either no bio, so flush_write_bio() returns 0 (success) or it might have
a bio for another extent buffer with a lower index (logical address).

Since we return 0 with the EXTENT_BUFFER_WRITEBACK bit set on the
extent buffer and writeback is never started for the extent buffer,
future attempts to writeback the extent buffer will hang forever waiting
on that bit to be cleared, since it can only be cleared after writeback
completes. Such hang is reported with a trace like the following:

  [49887.347053] INFO: task btrfs-transacti:1752 blocked for more than 122 seconds.
  [49887.347059]       Not tainted 5.2.13-gentoo #2
  [49887.347060] "echo 0 &gt; /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [49887.347062] btrfs-transacti D    0  1752      2 0x80004000
  [49887.347064] Call Trace:
  [49887.347069]  ? __schedule+0x265/0x830
  [49887.347071]  ? bit_wait+0x50/0x50
  [49887.347072]  ? bit_wait+0x50/0x50
  [49887.347074]  schedule+0x24/0x90
  [49887.347075]  io_schedule+0x3c/0x60
  [49887.347077]  bit_wait_io+0x8/0x50
  [49887.347079]  __wait_on_bit+0x6c/0x80
  [49887.347081]  ? __lock_release.isra.29+0x155/0x2d0
  [49887.347083]  out_of_line_wait_on_bit+0x7b/0x80
  [49887.347084]  ? var_wake_function+0x20/0x20
  [49887.347087]  lock_extent_buffer_for_io+0x28c/0x390
  [49887.347089]  btree_write_cache_pages+0x18e/0x340
  [49887.347091]  do_writepages+0x29/0xb0
  [49887.347093]  ? kmem_cache_free+0x132/0x160
  [49887.347095]  ? convert_extent_bit+0x544/0x680
  [49887.347097]  filemap_fdatawrite_range+0x70/0x90
  [49887.347099]  btrfs_write_marked_extents+0x53/0x120
  [49887.347100]  btrfs_write_and_wait_transaction.isra.4+0x38/0xa0
  [49887.347102]  btrfs_commit_transaction+0x6bb/0x990
  [49887.347103]  ? start_transaction+0x33e/0x500
  [49887.347105]  transaction_kthread+0x139/0x15c

So fix this by not overwriting the return value (ret) with the result
from flush_write_bio(). We also need to clear the EXTENT_BUFFER_WRITEBACK
bit in case flush_write_bio() returns an error, otherwise it will hang
any future attempts to writeback the extent buffer, and undo all work
done before (set back EXTENT_BUFFER_DIRTY, etc).

This is a regression introduced in the 5.2 kernel.

Fixes: 2e3c25136adfb ("btrfs: extent_io: add proper error handling to lock_extent_buffer_for_io()")
Fixes: f4340622e0226 ("btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up")
Reported-by: Zdenek Sojka &lt;zsojka@seznam.cz&gt;
Link: https://lore.kernel.org/linux-btrfs/GpO.2yos.3WGDOLpx6t%7D.1TUDYM@seznam.cz/T/#u
Reported-by: Stefan Priebe - Profihost AG &lt;s.priebe@profihost.ag&gt;
Link: https://lore.kernel.org/linux-btrfs/5c4688ac-10a7-fb07-70e8-c5d31a3fbb38@profihost.ag/T/#t
Reported-by: Drazen Kacar &lt;drazen.kacar@oradian.com&gt;
Link: https://lore.kernel.org/linux-btrfs/DB8PR03MB562876ECE2319B3E579590F799C80@DB8PR03MB5628.eurprd03.prod.outlook.com/
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204377
Signed-off-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
<entry>
<title>Btrfs: fix assertion failure during fsync and use of stale transaction</title>
<updated>2019-09-12T11:37:19Z</updated>
<author>
<name>Filipe Manana</name>
<email>fdmanana@suse.com</email>
</author>
<published>2019-09-10T14:26:49Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=410f954cb1d1c79ae485dd83a175f21954fd87cd'/>
<id>urn:sha1:410f954cb1d1c79ae485dd83a175f21954fd87cd</id>
<content type='text'>
Sometimes when fsync'ing a file we need to log that other inodes exist and
when we need to do that we acquire a reference on the inodes and then drop
that reference using iput() after logging them.

That generally is not a problem except if we end up doing the final iput()
(dropping the last reference) on the inode and that inode has a link count
of 0, which can happen in a very short time window if the logging path
gets a reference on the inode while it's being unlinked.

In that case we end up getting the eviction callback, btrfs_evict_inode(),
invoked through the iput() call chain which needs to drop all of the
inode's items from its subvolume btree, and in order to do that, it needs
to join a transaction at the helper function evict_refill_and_join().
However because the task previously started a transaction at the fsync
handler, btrfs_sync_file(), it has current-&gt;journal_info already pointing
to a transaction handle and therefore evict_refill_and_join() will get
that transaction handle from btrfs_join_transaction(). From this point on,
two different problems can happen:

1) evict_refill_and_join() will often change the transaction handle's
   block reserve (-&gt;block_rsv) and set its -&gt;bytes_reserved field to a
   value greater than 0. If evict_refill_and_join() never commits the
   transaction, the eviction handler ends up decreasing the reference
   count (-&gt;use_count) of the transaction handle through the call to
   btrfs_end_transaction(), and after that point we have a transaction
   handle with a NULL -&gt;block_rsv (which is the value prior to the
   transaction join from evict_refill_and_join()) and a -&gt;bytes_reserved
   value greater than 0. If after the eviction/iput completes the inode
   logging path hits an error or it decides that it must fallback to a
   transaction commit, the btrfs fsync handle, btrfs_sync_file(), gets a
   non-zero value from btrfs_log_dentry_safe(), and because of that
   non-zero value it tries to commit the transaction using a handle with
   a NULL -&gt;block_rsv and a non-zero -&gt;bytes_reserved value. This makes
   the transaction commit hit an assertion failure at
   btrfs_trans_release_metadata() because -&gt;bytes_reserved is not zero but
   the -&gt;block_rsv is NULL. The produced stack trace for that is like the
   following:

   [192922.917158] assertion failed: !trans-&gt;bytes_reserved, file: fs/btrfs/transaction.c, line: 816
   [192922.917553] ------------[ cut here ]------------
   [192922.917922] kernel BUG at fs/btrfs/ctree.h:3532!
   [192922.918310] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
   [192922.918666] CPU: 2 PID: 883 Comm: fsstress Tainted: G        W         5.1.4-btrfs-next-47 #1
   [192922.919035] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
   [192922.919801] RIP: 0010:assfail.constprop.25+0x18/0x1a [btrfs]
   (...)
   [192922.920925] RSP: 0018:ffffaebdc8a27da8 EFLAGS: 00010286
   [192922.921315] RAX: 0000000000000051 RBX: ffff95c9c16a41c0 RCX: 0000000000000000
   [192922.921692] RDX: 0000000000000000 RSI: ffff95cab6b16838 RDI: ffff95cab6b16838
   [192922.922066] RBP: ffff95c9c16a41c0 R08: 0000000000000000 R09: 0000000000000000
   [192922.922442] R10: ffffaebdc8a27e70 R11: 0000000000000000 R12: ffff95ca731a0980
   [192922.922820] R13: 0000000000000000 R14: ffff95ca84c73338 R15: ffff95ca731a0ea8
   [192922.923200] FS:  00007f337eda4e80(0000) GS:ffff95cab6b00000(0000) knlGS:0000000000000000
   [192922.923579] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
   [192922.923948] CR2: 00007f337edad000 CR3: 00000001e00f6002 CR4: 00000000003606e0
   [192922.924329] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
   [192922.924711] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
   [192922.925105] Call Trace:
   [192922.925505]  btrfs_trans_release_metadata+0x10c/0x170 [btrfs]
   [192922.925911]  btrfs_commit_transaction+0x3e/0xaf0 [btrfs]
   [192922.926324]  btrfs_sync_file+0x44c/0x490 [btrfs]
   [192922.926731]  do_fsync+0x38/0x60
   [192922.927138]  __x64_sys_fdatasync+0x13/0x20
   [192922.927543]  do_syscall_64+0x60/0x1c0
   [192922.927939]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
   (...)
   [192922.934077] ---[ end trace f00808b12068168f ]---

2) If evict_refill_and_join() decides to commit the transaction, it will
   be able to do it, since the nested transaction join only increments the
   transaction handle's -&gt;use_count reference counter and it does not
   prevent the transaction from getting committed. This means that after
   eviction completes, the fsync logging path will be using a transaction
   handle that refers to an already committed transaction. What happens
   when using such a stale transaction can be unpredictable, we are at
   least having a use-after-free on the transaction handle itself, since
   the transaction commit will call kmem_cache_free() against the handle
   regardless of its -&gt;use_count value, or we can end up silently losing
   all the updates to the log tree after that iput() in the logging path,
   or using a transaction handle that in the meanwhile was allocated to
   another task for a new transaction, etc, pretty much unpredictable
   what can happen.

In order to fix both of them, instead of using iput() during logging, use
btrfs_add_delayed_iput(), so that the logging path of fsync never drops
the last reference on an inode, that step is offloaded to a safe context
(usually the cleaner kthread).

The assertion failure issue was sporadically triggered by the test case
generic/475 from fstests, which loads the dm error target while fsstress
is running, which lead to fsync failing while logging inodes with -EIO
errors and then trying later to commit the transaction, triggering the
assertion failure.

CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Signed-off-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
<entry>
<title>Merge tag 'configfs-for-5.3' of git://git.infradead.org/users/hch/configfs</title>
<updated>2019-09-06T19:44:08Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2019-09-06T19:44:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=30d7030b2fb39d9b53270b2fe0caac8e8792890c'/>
<id>urn:sha1:30d7030b2fb39d9b53270b2fe0caac8e8792890c</id>
<content type='text'>
Pull configfs fixes from Christoph Hellwig:
 "Late configfs fixes from Al that fix pretty nasty removal vs attribute
  access races"

* tag 'configfs-for-5.3' of git://git.infradead.org/users/hch/configfs:
  configfs: provide exclusion between IO and removals
  configfs: new object reprsenting tree fragments
  configfs_register_group() shouldn't be (and isn't) called in rmdirable parts
  configfs: stash the data we need into configfs_buffer at open time
</content>
</entry>
<entry>
<title>configfs: provide exclusion between IO and removals</title>
<updated>2019-09-04T20:33:51Z</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2019-08-31T07:43:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b0841eefd9693827afb9888235e26ddd098f9cef'/>
<id>urn:sha1:b0841eefd9693827afb9888235e26ddd098f9cef</id>
<content type='text'>
Make sure that attribute methods are not called after the item
has been removed from the tree.  To do so, we
	* at the point of no return in removals, grab -&gt;frag_sem
exclusive and mark the fragment dead.
	* call the methods of attributes with -&gt;frag_sem taken
shared and only after having verified that the fragment is still
alive.

	The main benefit is for method instances - they are
guaranteed that the objects they are accessing *and* all ancestors
are still there.  Another win is that we don't need to bother
with extra refcount on config_item when opening a file -
the item will be alive for as long as it stays in the tree, and
we won't touch it/attributes/any associated data after it's
been removed from the tree.

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
</entry>
<entry>
<title>configfs: new object reprsenting tree fragments</title>
<updated>2019-09-02T20:10:44Z</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2019-08-25T23:56:13Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=47320fbe11a6059ae502c9c16b668022fdb4cf76'/>
<id>urn:sha1:47320fbe11a6059ae502c9c16b668022fdb4cf76</id>
<content type='text'>
Refcounted, hangs of configfs_dirent, created by operations that add
fragments to configfs tree (mkdir and configfs_register_{subsystem,group}).
Will be used in the next commit to provide exclusion between fragment
removal and -&gt;show/-&gt;store calls.

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
</entry>
<entry>
<title>configfs_register_group() shouldn't be (and isn't) called in rmdirable parts</title>
<updated>2019-09-02T20:10:44Z</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2019-08-30T03:13:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f19e4ed1e1edbfa3c9ccb9fed17759b7d6db24c6'/>
<id>urn:sha1:f19e4ed1e1edbfa3c9ccb9fed17759b7d6db24c6</id>
<content type='text'>
revert cc57c07343bd "configfs: fix registered group removal"
It was an attempt to handle something that fundamentally doesn't
work - configfs_register_group() should never be done in a part
of tree that can be rmdir'ed.  And in mainline it never had been,
so let's not borrow trouble; the fix was racy anyway, it would take
a lot more to make that work and desired semantics is not clear.

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
</entry>
<entry>
<title>configfs: stash the data we need into configfs_buffer at open time</title>
<updated>2019-09-02T20:10:43Z</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2019-08-30T15:30:03Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ff4dd081977da56566a848f071aed8fa92d604a1'/>
<id>urn:sha1:ff4dd081977da56566a848f071aed8fa92d604a1</id>
<content type='text'>
simplifies the -&gt;read()/-&gt;write()/-&gt;release() instances nicely

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
</entry>
<entry>
<title>NFS: Fix inode fileid checks in attribute revalidation code</title>
<updated>2019-09-02T17:10:19Z</updated>
<author>
<name>Trond Myklebust</name>
<email>trond.myklebust@hammerspace.com</email>
</author>
<published>2019-08-28T15:26:13Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=eb3d8f42231aec65b64b079dd17bd6c008a3fe29'/>
<id>urn:sha1:eb3d8f42231aec65b64b079dd17bd6c008a3fe29</id>
<content type='text'>
We want to throw out the attrbute if it refers to the mounted on fileid,
and not the real fileid. However we do not want to block cache consistency
updates from NFSv4 writes.

Reported-by: Murphy Zhou &lt;jencce.kernel@gmail.com&gt;
Fixes: 7e10cc25bfa0 ("NFS: Don't refresh attributes with mounted-on-file...")
Signed-off-by: Trond Myklebust &lt;trond.myklebust@hammerspace.com&gt;
</content>
</entry>
</feed>
