<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/fs/ext4/fsync.c, branch v4.0</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v4.0</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v4.0'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2013-06-13T02:38:04Z</updated>
<entry>
<title>ext4: Fix fsync error handling after filesystem abort</title>
<updated>2013-06-13T02:38:04Z</updated>
<author>
<name>Dmitry Monakhov</name>
<email>dmonakhov@openvz.org</email>
</author>
<published>2013-06-13T02:38:04Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4418e14112e3ca85e8492a4489a3552b0cc526a8'/>
<id>urn:sha1:4418e14112e3ca85e8492a4489a3552b0cc526a8</id>
<content type='text'>
If filesystem was aborted after inode's write back is complete
but before its metadata was updated we may return success
results in data loss.
In order to handle fs abort correctly we have to check
fs state once we discover that it is in MS_RDONLY state

Test case: http://patchwork.ozlabs.org/patch/244297

Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Dmitry Monakhov &lt;dmonakhov@openvz.org&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
</content>
</entry>
<entry>
<title>ext4: remove i_mutex from ext4_file_sync()</title>
<updated>2013-06-04T18:40:39Z</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2013-06-04T18:40:39Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=92e6222dfb85db780ebd8caea6a3f9326c375bc0'/>
<id>urn:sha1:92e6222dfb85db780ebd8caea6a3f9326c375bc0</id>
<content type='text'>
After removal of ext4_flush_unwritten_io() call, ext4_file_sync()
doesn't need i_mutex anymore. Forcing of transaction commits doesn't
need i_mutex as there's nothing inode specific in that code apart from
grabbing transaction ids from the inode. So remove the lock.

Reviewed-by: Zheng Liu &lt;wenqing.lz@taobao.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
</content>
</entry>
<entry>
<title>ext4: use generic_file_fsync() in ext4_file_fsync() in nojournal mode</title>
<updated>2013-06-04T18:40:09Z</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2013-06-04T18:40:09Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=37b10dd06334ebc89f551d405a0fe27e1a622458'/>
<id>urn:sha1:37b10dd06334ebc89f551d405a0fe27e1a622458</id>
<content type='text'>
Just use the generic function instead of duplicating it.  We only need
to reshuffle the read-only check a bit (which is there to prevent
writing to a filesystem which has been remounted read-only after error
I assume).

Reviewed-by: Zheng Liu &lt;wenqing.lz@taobao.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
</content>
</entry>
<entry>
<title>ext4: defer clearing of PageWriteback after extent conversion</title>
<updated>2013-06-04T18:23:41Z</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2013-06-04T18:23:41Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b0857d309faefaf5443752458e8af1a4b22b3e92'/>
<id>urn:sha1:b0857d309faefaf5443752458e8af1a4b22b3e92</id>
<content type='text'>
Currently PageWriteback bit gets cleared from put_io_page() called
from ext4_end_bio().  This is somewhat inconvenient as extent tree is
not fully updated at that time (unwritten extents are not marked as
written) so we cannot read the data back yet.  This design was
dictated by lock ordering as we cannot start a transaction while
PageWriteback bit is set (we could easily deadlock with
ext4_da_writepages()).  But now that we use transaction reservation
for extent conversion, locking issues are solved and we can move
PageWriteback bit clearing after extent conversion is done.  As a
result we can remove wait for unwritten extent conversion from
ext4_sync_file() because it already implicitely happens through
wait_on_page_writeback().

We implement deferring of PageWriteback clearing by queueing completed
bios to appropriate io_end and processing all the pages when io_end is
going to be freed instead of at the moment ext4_io_end() is called.

Reviewed-by: Zheng Liu &lt;wenqing.lz@taobao.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
</content>
</entry>
<entry>
<title>ext4/jbd2: don't wait (forever) for stale tid caused by wraparound</title>
<updated>2013-04-04T02:02:52Z</updated>
<author>
<name>Theodore Ts'o</name>
<email>tytso@mit.edu</email>
</author>
<published>2013-04-04T02:02:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d76a3a77113db020d9bb1e894822869410450bd9'/>
<id>urn:sha1:d76a3a77113db020d9bb1e894822869410450bd9</id>
<content type='text'>
In the case where an inode has a very stale transaction id (tid) in
i_datasync_tid or i_sync_tid, it's possible that after a very large
(2**31) number of transactions, that the tid number space might wrap,
causing tid_geq()'s calculations to fail.

Commit deeeaf13 "jbd2: fix fsync() tid wraparound bug", later modified
by commit e7b04ac0 "jbd2: don't wake kjournald unnecessarily",
attempted to fix this problem, but it only avoided kjournald spinning
forever by fixing the logic in jbd2_log_start_commit().

Unfortunately, in the codepaths in fs/ext4/fsync.c and fs/ext4/inode.c
that might call jbd2_log_start_commit() with a stale tid, those
functions will subsequently call jbd2_log_wait_commit() with the same
stale tid, and then wait for a very long time.  To fix this, we
replace the calls to jbd2_log_start_commit() and
jbd2_log_wait_commit() with a call to a new function,
jbd2_complete_transaction(), which will correctly handle stale tid's.

As a bonus, jbd2_complete_transaction() will avoid locking
j_state_lock for writing unless a commit needs to be started.  This
should have a small (but probably not measurable) improvement for
ext4's scalability.

Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Reported-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
Reported-by: George Barnett &lt;gbarnett@atlassian.com&gt;
Cc: stable@vger.kernel.org


</content>
</entry>
<entry>
<title>ext4: fix an incorrect comment about i_mutex</title>
<updated>2012-12-25T18:31:52Z</updated>
<author>
<name>Andy Lutomirski</name>
<email>luto@amacapital.net</email>
</author>
<published>2012-12-25T18:31:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ad96f7115593e962dd22a0519021eafaba56f5e3'/>
<id>urn:sha1:ad96f7115593e962dd22a0519021eafaba56f5e3</id>
<content type='text'>
i_mutex is not held when -&gt;sync_file is called.

Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Andy Lutomirski &lt;luto@amacapital.net&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
</content>
</entry>
<entry>
<title>ext4: use sync_inode_metadata() when syncing inode metadata</title>
<updated>2012-12-10T19:06:03Z</updated>
<author>
<name>Guo Chao</name>
<email>yan@linux.vnet.ibm.com</email>
</author>
<published>2012-12-10T19:06:03Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=64744e03c6871e5e4678478bab1b8c3ba6cca395'/>
<id>urn:sha1:64744e03c6871e5e4678478bab1b8c3ba6cca395</id>
<content type='text'>
We have a dedicated interface to sync inode metadata.  Use it to
simplify ext4's code some.

Signed-off-by: Guo Chao &lt;yan@linux.vnet.ibm.com&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Reviewed-by: Lukas Czerner &lt;lczerner@redhat.com&gt;
</content>
</entry>
<entry>
<title>ext4: fix ext4_flush_completed_IO wait semantics</title>
<updated>2012-10-05T15:31:55Z</updated>
<author>
<name>Dmitry Monakhov</name>
<email>dmonakhov@openvz.org</email>
</author>
<published>2012-10-05T15:31:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c278531d39f3158bfee93dc67da0b77e09776de2'/>
<id>urn:sha1:c278531d39f3158bfee93dc67da0b77e09776de2</id>
<content type='text'>
BUG #1) All places where we call ext4_flush_completed_IO are broken
    because buffered io and DIO/AIO goes through three stages
    1) submitted io,
    2) completed io (in i_completed_io_list) conversion pended
    3) finished  io (conversion done)
    And by calling ext4_flush_completed_IO we will flush only
    requests which were in (2) stage, which is wrong because:
     1) punch_hole and truncate _must_ wait for all outstanding unwritten io
      regardless to it's state.
     2) fsync and nolock_dio_read should also wait because there is
        a time window between end_page_writeback() and ext4_add_complete_io()
        As result integrity fsync is broken in case of buffered write
        to fallocated region:
        fsync                                      blkdev_completion
	 -&gt;filemap_write_and_wait_range
                                                   -&gt;ext4_end_bio
                                                     -&gt;end_page_writeback
          &lt;-- filemap_write_and_wait_range return
	 -&gt;ext4_flush_completed_IO
   	 sees empty i_completed_io_list but pended
   	 conversion still exist
                                                     -&gt;ext4_add_complete_io

BUG #2) Race window becomes wider due to the 'ext4: completed_io
locking cleanup V4' patch series

This patch make following changes:
1) ext4_flush_completed_io() now first try to flush completed io and when
   wait for any outstanding unwritten io via ext4_unwritten_wait()
2) Rename function to more appropriate name.
3) Assert that all callers of ext4_flush_unwritten_io should hold i_mutex to
   prevent endless wait

Signed-off-by: Dmitry Monakhov &lt;dmonakhov@openvz.org&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
</content>
</entry>
<entry>
<title>ext4: completed_io locking cleanup</title>
<updated>2012-09-29T04:14:55Z</updated>
<author>
<name>Dmitry Monakhov</name>
<email>dmonakhov@openvz.org</email>
</author>
<published>2012-09-29T04:14:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=28a535f9a0df060569dcc786e5bc2e1de43d7dc7'/>
<id>urn:sha1:28a535f9a0df060569dcc786e5bc2e1de43d7dc7</id>
<content type='text'>
Current unwritten extent conversion state-machine is very fuzzy.
- For unknown reason it performs conversion under i_mutex. What for?
  My diagnosis:
  We already protect extent tree with i_data_sem, truncate and punch_hole
  should wait for DIO, so the only data we have to protect is end_io-&gt;flags
  modification, but only flush_completed_IO and end_io_work modified this
  flags and we can serialize them via i_completed_io_lock.

  Currently all these games with mutex_trylock result in the following deadlock
   truncate:                          kworker:
    ext4_setattr                       ext4_end_io_work
    mutex_lock(i_mutex)
    inode_dio_wait(inode)  -&gt;BLOCK
                             DEADLOCK&lt;- mutex_trylock()
                                        inode_dio_done()
  #TEST_CASE1_BEGIN
  MNT=/mnt_scrach
  unlink $MNT/file
  fallocate -l $((1024*1024*1024)) $MNT/file
  aio-stress -I 100000 -O -s 100m -n -t 1 -c 10 -o 2 -o 3 $MNT/file
  sleep 2
  truncate -s 0 $MNT/file
  #TEST_CASE1_END

Or use 286's xfstests https://github.com/dmonakhov/xfstests/blob/devel/286

This patch makes state machine simple and clean:

(1) xxx_end_io schedule final extent conversion simply by calling
    ext4_add_complete_io(), which append it to ei-&gt;i_completed_io_list
    NOTE1: because of (2A) work should be queued only if
    -&gt;i_completed_io_list was empty, otherwise the work is scheduled already.

(2) ext4_flush_completed_IO is responsible for handling all pending
    end_io from ei-&gt;i_completed_io_list
    Flushing sequence consists of following stages:
    A) LOCKED: Atomically drain completed_io_list to local_list
    B) Perform extents conversion
    C) LOCKED: move converted io's to to_free list for final deletion
       	     This logic depends on context which we was called from.
    D) Final end_io context destruction
    NOTE1: i_mutex is no longer required because end_io-&gt;flags modification
    is protected by ei-&gt;ext4_complete_io_lock

Full list of changes:
- Move all completion end_io related routines to page-io.c in order to improve
  logic locality
- Move open coded logic from various xx_end_xx routines to ext4_add_complete_io()
- remove EXT4_IO_END_FSYNC
- Improve SMP scalability by removing useless i_mutex which does not
  protect io-&gt;flags anymore.
- Reduce lock contention on i_completed_io_lock by optimizing list walk.
- Rename ext4_end_io_nolock to end4_end_io and make it static
- Check flush completion status to ext4_ext_punch_hole(). Because it is
  not good idea to punch blocks from corrupted inode.

Changes since V3 (in request to Jan's comments):
  Fall back to active flush_completed_IO() approach in order to prevent
  performance issues with nolocked DIO reads.
Changes since V2:
  Fix use-after-free caused by race truncate vs end_io_work

Signed-off-by: Dmitry Monakhov &lt;dmonakhov@openvz.org&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
</content>
</entry>
<entry>
<title>ext4: check return value of blkdev_issue_flush()</title>
<updated>2012-08-17T13:58:17Z</updated>
<author>
<name>Theodore Ts'o</name>
<email>tytso@mit.edu</email>
</author>
<published>2012-08-17T13:58:17Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a4a39040e9dacb5fa28b7ee7f842237ee534e7bf'/>
<id>urn:sha1:a4a39040e9dacb5fa28b7ee7f842237ee534e7bf</id>
<content type='text'>
    
blkdev_issue_flush() can fail; make sure the error gets properly
propagated.
    
This is a port of the equivalent ext3 patch from commit 44f4f729e7a1.
    
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
</content>
</entry>
</feed>
