<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/fs/buffer.c, branch v2.6.25</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v2.6.25</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v2.6.25'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2008-04-04T21:38:17Z</updated>
<entry>
<title>Be more careful about marking buffers dirty</title>
<updated>2008-04-04T21:38:17Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2008-04-04T21:38:17Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1be62dc190ebaca331038962c873e7967de6cc4b'/>
<id>urn:sha1:1be62dc190ebaca331038962c873e7967de6cc4b</id>
<content type='text'>
Mikulas Patocka noted that the optimization where we check if a buffer
was already dirty (and we avoid re-dirtying it) was not really SMP-safe.

Since the read of the old status was not synchronized with anything, an
aggressive CPU re-ordering of memory accesses might have moved that read
up to before the data was even written to the buffer, and another CPU
that cleaned it again, causing the newly dirty state to never actually
hit the disk.

Admittedly this would probably never trigger in practice, but it's still
wrong.

Mikulas sent a patch that fixed the problem, but I dislike the subtlety
of the whole optimization, so this is an alternate fix that is more
explicit about the particular SMP ordering for the optimization, and
separates out the speculative reads of the buffer state into its own
conditional (and makes the memory barrier only happen if we are likely
to actually hit the optimized case in the first place).

I considered removing the optimization entirely, but Andrew argued for
it's continued existence. I'm a push-over.

Cc: Mikulas Patocka &lt;mikulas@artax.karlin.mff.cuni.cz&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>vfs: fix data leak in nobh_write_end()</title>
<updated>2008-03-28T21:45:21Z</updated>
<author>
<name>Dmitri Monakhov</name>
<email>dmonakhov@openvz.org</email>
</author>
<published>2008-03-28T21:15:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5b41e74ad1b0bf7bc51765ae74e5dc564afc3e48'/>
<id>urn:sha1:5b41e74ad1b0bf7bc51765ae74e5dc564afc3e48</id>
<content type='text'>
Current nobh_write_end() implementation ignore partial writes(copied &lt; len)
case if page was fully mapped and simply mark page as Uptodate, which is
totally wrong because area [pos+copied, pos+len) wasn't updated explicitly in
previous write_begin call.  It simply contains garbage from pagecache and
result in data leakage.

#TEST_CASE_BEGIN:
~~~~~~~~~~~~~~~~
In fact issue triggered by classical testcase
	open("/mnt/test", O_RDWR|O_CREAT|O_TRUNC, 0666) = 3
	ftruncate(3, 409600)                    = 0
	writev(3, [{"a", 1}, {NULL, 4095}], 2)  = 1
##TESTCASE_SOURCE:
~~~~~~~~~~~~~~~~~
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;fcntl.h&gt;
#include &lt;sys/uio.h&gt;
#include &lt;sys/mman.h&gt;
#include &lt;errno.h&gt;
int main(int argc, char **argv)
{
	int fd,  ret;
	void* p;
	struct iovec iov[2];
	fd = open(argv[1], O_RDWR|O_CREAT|O_TRUNC, 0666);
	ftruncate(fd, 409600);
	iov[0].iov_base="a";
	iov[0].iov_len=1;
	iov[1].iov_base=NULL;
	iov[1].iov_len=4096;
	ret = writev(fd, iov, sizeof(iov)/sizeof(struct iovec));
	printf("writev  = %d, err = %d\n", ret, errno);
	return 0;
}
##TESTCASE RESULT:
~~~~~~~~~~~~~~~~~~
[root@ts63 ~]# mount | grep mnt2
/dev/mapper/test on /mnt2 type ext2 (rw,nobh)
[root@ts63 ~]#  /tmp/writev /mnt2/test
writev  = 1, err = 0
[root@ts63 ~]# hexdump -C /mnt2/test

00000000  61 65 62 6f 6f 74 00 00  f0 b9 b4 59 3a 00 00 00  |aeboot.....Y:...|
00000010  20 00 00 00 00 00 00 00  21 00 00 00 00 00 00 00  | .......!.......|
00000020  df df df df df df df df  df df df df df df df df  |................|
00000030  3a 00 00 00 2a 00 00 00  21 00 00 00 00 00 00 00  |:...*...!.......|
00000040  60 c0 8c 00 00 00 00 00  40 4a 8d 00 00 00 00 00  |`.......@J......|
00000050  00 00 00 00 00 00 00 00  41 00 00 00 00 00 00 00  |........A.......|
00000060  74 69 6d 65 20 64 64 20  69 66 3d 2f 64 65 76 2f  |time dd if=/dev/|
00000070  6c 6f 6f 70 30 20 20 6f  66 3d 2f 64 65 76 2f 6e  |loop0  of=/dev/n|
skip..
00000f50  00 00 00 00 00 00 00 00  31 00 00 00 00 00 00 00  |........1.......|
00000f60  6d 6b 66 73 2e 65 78 74  33 20 2f 64 65 76 2f 76  |mkfs.ext3 /dev/v|
00000f70  7a 76 67 2f 74 65 73 74  20 2d 62 34 30 39 36 00  |zvg/test -b4096.|
00000f80  a0 fe 8c 00 00 00 00 00  21 00 00 00 00 00 00 00  |........!.......|
00000f90  23 31 32 30 35 39 35 30  34 30 34 00 3a 00 00 00  |#1205950404.:...|
00000fa0  20 00 8d 00 00 00 00 00  21 00 00 00 00 00 00 00  | .......!.......|
00000fb0  d0 cf 8c 00 00 00 00 00  10 d0 8c 00 00 00 00 00  |................|
00000fc0  00 00 00 00 00 00 00 00  41 00 00 00 00 00 00 00  |........A.......|
00000fd0  6d 6f 75 6e 74 20 2f 64  65 76 2f 76 7a 76 67 2f  |mount /dev/vzvg/|
00000fe0  74 65 73 74 20 20 2f 76  7a 20 2d 6f 20 64 61 74  |test  /vz -o dat|
00000ff0  61 3d 77 72 69 74 65 62  61 63 6b 00 00 00 00 00  |a=writeback.....|
00001000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

As you can see file's page contains garbage from pagecache instead of zeros.
#TEST_CASE_END

Attached patch:
- Add sanity check BUG_ON in order to prevent incorrect usage by caller,
  This is function invariant because page can has buffers and in no zero
  *fadata pointer at the same time.
- Always attach buffers to page is it is partial write case.
- Always switch back to generic_write_end if page has buffers.
  This is reasonable because if page already has buffer then generic_write_begin
  was called previously.

Signed-off-by: Dmitri Monakhov &lt;dmonakhov@openvz.org&gt;
Reviewed-by: Nick Piggin &lt;npiggin@suse.de&gt;
Cc: &lt;stable@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>fs: fix kernel-doc notation warnings</title>
<updated>2008-03-20T01:53:36Z</updated>
<author>
<name>Randy Dunlap</name>
<email>randy.dunlap@oracle.com</email>
</author>
<published>2008-03-20T00:01:00Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a6b91919e0881a0d0a4ae5211d5c879a8c7ca92b'/>
<id>urn:sha1:a6b91919e0881a0d0a4ae5211d5c879a8c7ca92b</id>
<content type='text'>
Fix kernel-doc notation warnings in fs/.

Warning(mmotm-2008-0314-1449//fs/super.c:560): missing initial short description on line:
 *	mark_files_ro
Warning(mmotm-2008-0314-1449//fs/locks.c:1277): missing initial short description on line:
 *	lease_get_mtime
Warning(mmotm-2008-0314-1449//fs/locks.c:1277): missing initial short description on line:
 *	lease_get_mtime
Warning(mmotm-2008-0314-1449//fs/namei.c:1368): missing initial short description on line:
 * lookup_one_len:  filesystem helper to lookup single pathname component
Warning(mmotm-2008-0314-1449//fs/buffer.c:3221): missing initial short description on line:
 * bh_uptodate_or_lock: Test whether the buffer is uptodate
Warning(mmotm-2008-0314-1449//fs/buffer.c:3240): missing initial short description on line:
 * bh_submit_read: Submit a locked buffer for reading
Warning(mmotm-2008-0314-1449//fs/fs-writeback.c:30): missing initial short description on line:
 * writeback_acquire: attempt to get exclusive writeback access to a device
Warning(mmotm-2008-0314-1449//fs/fs-writeback.c:47): missing initial short description on line:
 * writeback_in_progress: determine whether there is writeback in progress
Warning(mmotm-2008-0314-1449//fs/fs-writeback.c:58): missing initial short description on line:
 * writeback_release: relinquish exclusive writeback access against a device.
Warning(mmotm-2008-0314-1449//include/linux/jbd.h:351): contents before sections
Warning(mmotm-2008-0314-1449//include/linux/jbd.h:561): contents before sections
Warning(mmotm-2008-0314-1449//fs/jbd/transaction.c:1935): missing initial short description on line:
 * void journal_invalidatepage()

Signed-off-by: Randy Dunlap &lt;randy.dunlap@oracle.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>vfs: fix NULL pointer dereference in fsync_buffers_list()</title>
<updated>2008-03-05T00:35:10Z</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2008-03-04T22:28:33Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e3892296de632e3f9299d9fabe0c746740004891'/>
<id>urn:sha1:e3892296de632e3f9299d9fabe0c746740004891</id>
<content type='text'>
Fix NULL pointer dereference in fsync_buffers_list() introduced by recent fix
of races in private_list handling.  Since bh-&gt;b_assoc_map has been cleared in
__remove_assoc_queue() we should really use original value stored in the
'mapping' variable.

Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>docbook: fix filesystems.tmpl source files</title>
<updated>2008-03-03T18:47:13Z</updated>
<author>
<name>Randy Dunlap</name>
<email>randy.dunlap@oracle.com</email>
</author>
<published>2008-03-01T06:02:31Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=78a4a50a86b0a54f7ecbc164267b6c762760254c'/>
<id>urn:sha1:78a4a50a86b0a54f7ecbc164267b6c762760254c</id>
<content type='text'>
Fix docbook problems in filesystems.tmpl.
These cause the generated docbook to be incorrect.

Signed-off-by: Randy Dunlap &lt;randy.dunlap@oracle.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>buffer_head: fix private_list handling</title>
<updated>2008-02-08T17:22:42Z</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2008-02-08T12:21:59Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=535ee2fbf79ab52d26bce3d2e127c9007503581e'/>
<id>urn:sha1:535ee2fbf79ab52d26bce3d2e127c9007503581e</id>
<content type='text'>
There are two possible races in handling of private_list in buffer cache.

1) When fsync_buffers_list() processes a private_list, it clears
   b_assoc_mapping and moves buffer to its private list.  Now
   drop_buffers() comes, sees a buffer is on list so it calls
   __remove_assoc_queue() which complains about b_assoc_mapping being
   cleared (as it cannot propagate possible IO error).  This race has been
   actually observed in the wild.

2) When fsync_buffers_list() processes a private_list,
   mark_buffer_dirty_inode() can be called on bh which is already on the
   private list of fsync_buffers_list().  As buffer is on some list (note
   that the check is performed without private_lock), it is not readded to
   the mapping's private_list and after fsync_buffers_list() finishes, we
   have a dirty buffer which should be on private_list but it isn't.  This
   race has not been reported, probably because most (but not all) callers
   of mark_buffer_dirty_inode() hold i_mutex and thus are serialized with
   fsync().

Fix these issues by not clearing b_assoc_map when fsync_buffers_list()
moves buffer to a dedicated list and by reinserting buffer in private_list
when it is found dirty after we have submitted buffer for IO.  We also
change the tests whether a buffer is on a private list from
!list_empty(&amp;bh-&gt;b_assoc_buffers) to bh-&gt;b_assoc_map so that they are
single word reads and hence lockless checks are safe.

Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Cc: Nick Piggin &lt;npiggin@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>fs: remove fastcall, it is always empty</title>
<updated>2008-02-08T17:22:31Z</updated>
<author>
<name>Harvey Harrison</name>
<email>harvey.harrison@gmail.com</email>
</author>
<published>2008-02-08T12:19:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=fc9b52cd8f5f459b88adcf67c47668425ae31a78'/>
<id>urn:sha1:fc9b52cd8f5f459b88adcf67c47668425ae31a78</id>
<content type='text'>
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Harvey Harrison &lt;harvey.harrison@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>rewrite rd</title>
<updated>2008-02-08T17:22:30Z</updated>
<author>
<name>Nick Piggin</name>
<email>npiggin@suse.de</email>
</author>
<published>2008-02-08T12:19:49Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9db5579be4bb5320c3248f6acf807aedf05ae143'/>
<id>urn:sha1:9db5579be4bb5320c3248f6acf807aedf05ae143</id>
<content type='text'>
This is a rewrite of the ramdisk block device driver.

The old one is really difficult because it effectively implements a block
device which serves data out of its own buffer cache.  It relies on the dirty
bit being set, to pin its backing store in cache, however there are non
trivial paths which can clear the dirty bit (eg.  try_to_free_buffers()),
which had recently lead to data corruption.  And in general it is completely
wrong for a block device driver to do this.

The new one is more like a regular block device driver.  It has no idea about
vm/vfs stuff.  It's backing store is similar to the buffer cache (a simple
radix-tree of pages), but it doesn't know anything about page cache (the pages
in the radix tree are not pagecache pages).

There is one slight downside -- direct block device access and filesystem
metadata access goes through an extra copy and gets stored in RAM twice.
However, this downside is only slight, because the real buffercache of the
device is now reclaimable (because we're not playing crazy games with it), so
under memory intensive situations, footprint should effectively be the same --
maybe even a slight advantage to the new driver because it can also reclaim
buffer heads.

The fact that it now goes through all the regular vm/fs paths makes it
much more useful for testing, too.

   text    data     bss     dec     hex filename
   2837     849     384    4070     fe6 drivers/block/rd.o
   3528     371      12    3911     f47 drivers/block/brd.o

Text is larger, but data and bss are smaller, making total size smaller.

A few other nice things about it:
- Similar structure and layout to the new loop device handlinag.
- Dynamic ramdisk creation.
- Runtime flexible buffer head size (because it is no longer part of the
  ramdisk code).
- Boot / load time flexible ramdisk size, which could easily be extended
  to a per-ramdisk runtime changeable size (eg. with an ioctl).
- Can use highmem for the backing store.

[akpm@linux-foundation.org: fix build]
[byron.bbradley@gmail.com: make rd_size non-static]
Signed-off-by: Nick Piggin &lt;npiggin@suse.de&gt;
Signed-off-by: Byron Bradley &lt;byron.bbradley@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>bufferhead: revert constructor removal</title>
<updated>2008-02-05T17:44:14Z</updated>
<author>
<name>Christoph Lameter</name>
<email>clameter@sgi.com</email>
</author>
<published>2008-02-05T06:28:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b98938c373117043598002f197200d7ed08acd49'/>
<id>urn:sha1:b98938c373117043598002f197200d7ed08acd49</id>
<content type='text'>
The constructor for buffer_head slabs was removed recently.  We need the
constructor back in slab defrag in order to insure that slab objects always
have a definite state even before we allocated them.

I think we mistakenly merged the removal of the constuctor into a cleanup
patch.  You (ie: akpm) had a test that showed that the removal of the
constructor led to a small regression.  The prior state makes things easier
for slab defrag.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Christoph Lameter &lt;clameter@sgi.com&gt;
Cc: Nick Piggin &lt;nickpiggin@yahoo.com.au&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user</title>
<updated>2008-02-05T17:44:13Z</updated>
<author>
<name>Christoph Lameter</name>
<email>clameter@sgi.com</email>
</author>
<published>2008-02-05T06:28:29Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=eebd2aa355692afaf9906f62118620f1a1c19dbb'/>
<id>urn:sha1:eebd2aa355692afaf9906f62118620f1a1c19dbb</id>
<content type='text'>
Simplify page cache zeroing of segments of pages through 3 functions

zero_user_segments(page, start1, end1, start2, end2)

        Zeros two segments of the page. It takes the position where to
        start and end the zeroing which avoids length calculations and
	makes code clearer.

zero_user_segment(page, start, end)

        Same for a single segment.

zero_user(page, start, length)

        Length variant for the case where we know the length.

We remove the zero_user_page macro. Issues:

1. Its a macro. Inline functions are preferable.

2. The KM_USER0 macro is only defined for HIGHMEM.

   Having to treat this special case everywhere makes the
   code needlessly complex. The parameter for zeroing is always
   KM_USER0 except in one single case that we open code.

Avoiding KM_USER0 makes a lot of code not having to be dealing
with the special casing for HIGHMEM anymore. Dealing with
kmap is only necessary for HIGHMEM configurations. In those
configurations we use KM_USER0 like we do for a series of other
functions defined in highmem.h.

Since KM_USER0 is depends on HIGHMEM the existing zero_user_page
function could not be a macro. zero_user_* functions introduced
here can be be inline because that constant is not used when these
functions are called.

Also extract the flushing of the caches to be outside of the kmap.

[akpm@linux-foundation.org: fix nfs and ntfs build]
[akpm@linux-foundation.org: fix ntfs build some more]
Signed-off-by: Christoph Lameter &lt;clameter@sgi.com&gt;
Cc: Steven French &lt;sfrench@us.ibm.com&gt;
Cc: Michael Halcrow &lt;mhalcrow@us.ibm.com&gt;
Cc: &lt;linux-ext4@vger.kernel.org&gt;
Cc: Steven Whitehouse &lt;swhiteho@redhat.com&gt;
Cc: Trond Myklebust &lt;trond.myklebust@fys.uio.no&gt;
Cc: "J. Bruce Fields" &lt;bfields@fieldses.org&gt;
Cc: Anton Altaparmakov &lt;aia21@cantab.net&gt;
Cc: Mark Fasheh &lt;mark.fasheh@oracle.com&gt;
Cc: David Chinner &lt;dgc@sgi.com&gt;
Cc: Michael Halcrow &lt;mhalcrow@us.ibm.com&gt;
Cc: Steven French &lt;sfrench@us.ibm.com&gt;
Cc: Steven Whitehouse &lt;swhiteho@redhat.com&gt;
Cc: Trond Myklebust &lt;trond.myklebust@fys.uio.no&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
