<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/fs/ext4/fsync.c, branch v3.1</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v3.1</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v3.1'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2011-08-01T23:56:03Z</updated>
<entry>
<title>Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4</title>
<updated>2011-08-01T23:56:03Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2011-08-01T23:56:03Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=60ad4466821a96913a9b567115e194ed1087c2d7'/>
<id>urn:sha1:60ad4466821a96913a9b567115e194ed1087c2d7</id>
<content type='text'>
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (60 commits)
  ext4: prevent memory leaks from ext4_mb_init_backend() on error path
  ext4: use EXT4_BAD_INO for buddy cache to avoid colliding with valid inode #
  ext4: use ext4_msg() instead of printk in mballoc
  ext4: use ext4_kvzalloc()/ext4_kvmalloc() for s_group_desc and s_group_info
  ext4: introduce ext4_kvmalloc(), ext4_kzalloc(), and ext4_kvfree()
  ext4: use the correct error exit path in ext4_init_inode_table()
  ext4: add missing kfree() on error return path in add_new_gdb()
  ext4: change umode_t in tracepoint headers to be an explicit __u16
  ext4: fix races in ext4_sync_parent()
  ext4: Fix overflow caused by missing cast in ext4_fallocate()
  ext4: add action of moving index in ext4_ext_rm_idx for Punch Hole
  ext4: simplify parameters of reserve_backup_gdb()
  ext4: simplify parameters of add_new_gdb()
  ext4: remove lock_buffer in bclean() and setup_new_group_blocks()
  ext4: simplify journal handling in setup_new_group_blocks()
  ext4: let setup_new_group_blocks() set multiple bits at a time
  ext4: fix a typo in ext4_group_extend()
  ext4: let ext4_group_add_blocks() handle 0 blocks quickly
  ext4: let ext4_group_add_blocks() return an error code
  ext4: rename ext4_add_groupblocks() to ext4_group_add_blocks()
  ...

Fix up conflict in fs/ext4/inode.c: commit aacfc19c626e ("fs: simplify
the blockdev_direct_IO prototype") had changed the ext4_ind_direct_IO()
function for the new simplified calling convention, while commit
dae1e52cb126 ("ext4: move ext4_ind_* functions from inode.c to
indirect.c") moved the function to another file.
</content>
</entry>
<entry>
<title>ext4: fix races in ext4_sync_parent()</title>
<updated>2011-07-30T16:34:19Z</updated>
<author>
<name>Theodore Ts'o</name>
<email>tytso@mit.edu</email>
</author>
<published>2011-07-30T16:34:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d59729f4e794f814b25ccd2aebfbe606242c4544'/>
<id>urn:sha1:d59729f4e794f814b25ccd2aebfbe606242c4544</id>
<content type='text'>
Fix problems if fsync() races against a rename of a parent directory
as pointed out by Al Viro in his own inimitable way:

&gt;While we are at it, could somebody please explain what the hell is ext4
&gt;doing in
&gt;static int ext4_sync_parent(struct inode *inode)
&gt;{
&gt;        struct writeback_control wbc;
&gt;        struct dentry *dentry = NULL;
&gt;        int ret = 0;
&gt;
&gt;        while (inode &amp;&amp; ext4_test_inode_state(inode, EXT4_STATE_NEWENTRY)) {
&gt;                ext4_clear_inode_state(inode, EXT4_STATE_NEWENTRY);
&gt;                dentry = list_entry(inode-&gt;i_dentry.next,
&gt;                                    struct dentry, d_alias);
&gt;                if (!dentry || !dentry-&gt;d_parent || !dentry-&gt;d_parent-&gt;d_inode)
&gt;                        break;
&gt;                inode = dentry-&gt;d_parent-&gt;d_inode;
&gt;                ret = sync_mapping_buffers(inode-&gt;i_mapping);
&gt;                ...
&gt;Note that dentry obviously can't be NULL there.  dentry-&gt;d_parent is never
&gt;NULL.  And dentry-&gt;d_parent would better not be negative, for crying out
&gt;loud!  What's worse, there's no guarantees that dentry-&gt;d_parent will
&gt;remain our parent over that sync_mapping_buffers() *and* that inode won't
&gt;just be freed under us (after rename() and memory pressure leading to
&gt;eviction of what used to be our dentry-&gt;d_parent)......

Reported-by: Al Viro &lt;viro@ZenIV.linux.org.uk&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
</content>
</entry>
<entry>
<title>fs: push i_mutex and filemap_write_and_wait down into -&gt;fsync() handlers</title>
<updated>2011-07-21T00:47:59Z</updated>
<author>
<name>Josef Bacik</name>
<email>josef@redhat.com</email>
</author>
<published>2011-07-17T00:44:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=02c24a82187d5a628c68edfe71ae60dc135cd178'/>
<id>urn:sha1:02c24a82187d5a628c68edfe71ae60dc135cd178</id>
<content type='text'>
Btrfs needs to be able to control how filemap_write_and_wait_range() is called
in fsync to make it less of a painful operation, so push down taking i_mutex and
the calling of filemap_write_and_wait() down into the -&gt;fsync() handlers.  Some
file systems can drop taking the i_mutex altogether it seems, like ext3 and
ocfs2.  For correctness sake I just pushed everything down in all cases to make
sure that we keep the current behavior the same for everybody, and then each
individual fs maintainer can make up their mind about what to do from there.
Thanks,

Acked-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Josef Bacik &lt;josef@redhat.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>ext4: fix waiting and sending of a barrier in ext4_sync_file()</title>
<updated>2011-05-24T16:00:54Z</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2011-05-24T16:00:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=93628ffb9ba67c154849ac6c387f98f5e3198b84'/>
<id>urn:sha1:93628ffb9ba67c154849ac6c387f98f5e3198b84</id>
<content type='text'>
jbd2_log_start_commit() returns 1 only when we really start a
transaction.  But we also need to wait for a transaction when the
commit is already running.  Fix this problem by waiting for
transaction commit unconditionally (which is just a quick check if the
transaction is already committed).

Also we have to be more careful with sending of a barrier because when
transaction is being committed in parallel to ext4_sync_file()
running, we cannot be sure that the barrier the journalling code sends
happens after we wrote all the data for fsync (note that not every
data writeout needs to trigger metadata changes thus commit of some
metadata changes can be running while other data is still written
out). So use jbd2_will_send_data_barrier() helper to detect the common
cases when we can be sure barrier will be issued by the commit code
and issue the barrier ourselves in the remaining cases.

Reported-by: Edward Goggin &lt;egoggin@vmware.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
</content>
</entry>
<entry>
<title>ext4: use EXT4FS_DEBUG instead of EXT4_DEBUG in fsync.c</title>
<updated>2011-05-09T14:25:54Z</updated>
<author>
<name>Tao Ma</name>
<email>boyu.mt@taobao.com</email>
</author>
<published>2011-05-09T14:25:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e8bbe8c401c61408ea226b33b824f231c8f9ccae'/>
<id>urn:sha1:e8bbe8c401c61408ea226b33b824f231c8f9ccae</id>
<content type='text'>
We have EXT4FS_DEBUG for some old debug and CONFIG_EXT4_DEBUG
for the new mballoc debug, but there isn't any EXT4_DEBUG.

As CONFIG_EXT4_DEBUG seems to be only used in mballoc, use
EXT4FS_DEBUG in fsync.c.

[ It doesn't really matter; although I'm including this commit for
  consistency's sake.  The whole point of the #ifdef's is to disable
  the debugging code.  In general you're not going to want to enable
  all of the code protected by EXT4FS_DEBUG at the same time.  -- Ted ]

Signed-off-by: Tao Ma &lt;boyu.mt@taobao.com&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
</content>
</entry>
<entry>
<title>Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4</title>
<updated>2011-04-11T22:45:47Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2011-04-11T22:45:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a97b52022a73ec12e43f0b2c7d4bd1f40f89c81d'/>
<id>urn:sha1:a97b52022a73ec12e43f0b2c7d4bd1f40f89c81d</id>
<content type='text'>
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix data corruption regression by reverting commit 6de9843dab3f
  ext4: Allow indirect-block file to grow the file size to max file size
  ext4: allow an active handle to be started when freezing
  ext4: sync the directory inode in ext4_sync_parent()
  ext4: init timer earlier to avoid a kernel panic in __save_error_info
  jbd2: fix potential memory leak on transaction commit
  ext4: fix a double free in ext4_register_li_request
  ext4: fix credits computing for indirect mapped files
  ext4: remove unnecessary [cm]time update of quota file
  jbd2: move bdget out of critical section
</content>
</entry>
<entry>
<title>ext4: sync the directory inode in ext4_sync_parent()</title>
<updated>2011-04-11T02:05:31Z</updated>
<author>
<name>Curt Wohlgemuth</name>
<email>curtw@google.com</email>
</author>
<published>2011-04-11T02:05:31Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0893ed458b4b1d7c7667ca7ffb8b11febe7e7e6c'/>
<id>urn:sha1:0893ed458b4b1d7c7667ca7ffb8b11febe7e7e6c</id>
<content type='text'>
ext4 has taken the stance that, in the absence of a journal,
when an fsync/fdatasync of an inode is done, the parent
directory should be sync'ed if this inode entry is new.
ext4_sync_parent(), which implements this, does indeed sync
the dirent pages for parent directories, but it does not
sync the directory *inode*.  This patch fixes this.

Also now return error status from ext4_sync_parent().

I tested this using a power fail test, which panics a
machine running a file server getting requests from a
client.  Without this patch, on about every other test run,
the server is missing many, many files that had been synced.
With this patch, on &gt; 6 runs, I see zero files being lost.

Google-Bug-Id: 4179519
Signed-off-by: Curt Wohlgemuth &lt;curtw@google.com&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
</content>
</entry>
<entry>
<title>Fix common misspellings</title>
<updated>2011-03-31T14:26:23Z</updated>
<author>
<name>Lucas De Marchi</name>
<email>lucas.demarchi@profusion.mobi</email>
</author>
<published>2011-03-31T01:57:33Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=25985edcedea6396277003854657b5f3cb31a628'/>
<id>urn:sha1:25985edcedea6396277003854657b5f3cb31a628</id>
<content type='text'>
Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi &lt;lucas.demarchi@profusion.mobi&gt;
</content>
</entry>
<entry>
<title>ext4: add more tracepoints and use dev_t in the trace buffer</title>
<updated>2011-03-22T01:38:05Z</updated>
<author>
<name>Jiaying Zhang</name>
<email>jiayingz@google.com</email>
</author>
<published>2011-03-22T01:38:05Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0562e0bad483d10e9651fbb8f21dc3d0bad57374'/>
<id>urn:sha1:0562e0bad483d10e9651fbb8f21dc3d0bad57374</id>
<content type='text'>
- Add more ext4 tracepoints.
- Change ext4 tracepoints to use dev_t field with MAJOR/MINOR macros
so that we can save 4 bytes in the ring buffer on some platforms.
- Add sync_mode to ext4_da_writepages, ext4_da_write_pages, and
ext4_da_writepages_result tracepoints. Also remove for_reclaim
field from ext4_da_writepages since it is usually not very useful.

Signed-off-by: Jiaying Zhang &lt;jiayingz@google.com&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
</content>
</entry>
<entry>
<title>ext4: flush the i_completed_io_list during ext4_truncate</title>
<updated>2011-01-10T17:47:05Z</updated>
<author>
<name>Jiaying Zhang</name>
<email>jiayingz@google.com</email>
</author>
<published>2011-01-10T17:47:05Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3889fd57ea3c58209354862523275774fca9db03'/>
<id>urn:sha1:3889fd57ea3c58209354862523275774fca9db03</id>
<content type='text'>
Ted first found the bug when running 2.6.36 kernel with dioread_nolock
mount option that xfstests #13 complained about wrong file size during fsck.
However, the bug exists in the older kernels as well although it is
somehow harder to trigger.

The problem is that ext4_end_io_work() can happen after we have truncated an
inode to a smaller size. Then when ext4_end_io_work() calls 
ext4_convert_unwritten_extents(), we may reallocate some blocks that have 
been truncated, so the inode size becomes inconsistent with the allocated
blocks. 

The following patch flushes the i_completed_io_list during truncate to reduce 
the risk that some pending end_io requests are executed later and convert 
already truncated blocks to initialized. 

Note that although the fix helps reduce the problem a lot there may still 
be a race window between vmtruncate() and ext4_end_io_work(). The fundamental
problem is that if vmtruncate() is called without either i_mutex or i_alloc_sem
held, it can race with an ongoing write request so that the io_end request is
processed later when the corresponding blocks have been truncated.

Ted and I have discussed the problem offline and we saw a few ways to fix
the race completely:

a) We guarantee that i_mutex lock and i_alloc_sem write lock are both hold 
whenever vmtruncate() is called. The i_mutex lock prevents any new write
requests from entering writeback and the i_alloc_sem prevents the race
from ext4_page_mkwrite(). Currently we hold both locks if vmtruncate()
is called from do_truncate(), which is probably the most common case.
However, there are places where we may call vmtruncate() without holding
either i_mutex or i_alloc_sem. I would like to ask for other people's
opinions on what locks are expected to be held before calling vmtruncate().
There seems a disagreement among the callers of that function.

b) We change the ext4 write path so that we change the extent tree to contain 
the newly allocated blocks and update i_size both at the same time --- when 
the write of the data blocks is completed.

c) We add some additional locking to synchronize vmtruncate() and 
ext4_end_io_work(). This approach may have performance implications so we
need to be careful.

All of the above proposals may require more substantial changes, so
we may consider to take the following patch as a bandaid.

Signed-off-by: Jiaying Zhang &lt;jiayingz@google.com&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;

</content>
</entry>
</feed>
