<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/fs/buffer.c, branch v3.17</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v3.17</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v3.17'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2014-09-22T15:41:16Z</updated>
<entry>
<title>Fix nasty 32-bit overflow bug in buffer i/o code.</title>
<updated>2014-09-22T15:41:16Z</updated>
<author>
<name>Anton Altaparmakov</name>
<email>aia21@cam.ac.uk</email>
</author>
<published>2014-09-22T00:53:03Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f2d5a94436cc7cc0221b9a81bba2276a25187dd3'/>
<id>urn:sha1:f2d5a94436cc7cc0221b9a81bba2276a25187dd3</id>
<content type='text'>
On 32-bit architectures, the legacy buffer_head functions are not always
handling the sector number with the proper 64-bit types, and will thus
fail on 4TB+ disks.

Any code that uses __getblk() (and thus bread(), breadahead(),
sb_bread(), sb_breadahead(), sb_getblk()), and calls it using a 64-bit
block on a 32-bit arch (where "long" is 32-bit) causes an inifinite loop
in __getblk_slow() with an infinite stream of errors logged to dmesg
like this:

  __find_get_block_slow() failed. block=6740375944, b_blocknr=2445408648
  b_state=0x00000020, b_size=512
  device sda1 blocksize: 512

Note how in hex block is 0x191C1F988 and b_blocknr is 0x91C1F988 i.e. the
top 32-bits are missing (in this case the 0x1 at the top).

This is because grow_dev_page() is broken and has a 32-bit overflow due
to shifting the page index value (a pgoff_t - which is just 32 bits on
32-bit architectures) left-shifted as the block number.  But the top
bits to get lost as the pgoff_t is not type cast to sector_t / 64-bit
before the shift.

This patch fixes this issue by type casting "index" to sector_t before
doing the left shift.

Note this is not a theoretical bug but has been seen in the field on a
4TiB hard drive with logical sector size 512 bytes.

This patch has been verified to fix the infinite loop problem on 3.17-rc5
kernel using a 4TB disk image mounted using "-o loop".  Without this patch
doing a "find /nt" where /nt is an NTFS volume causes the inifinite loop
100% reproducibly whilst with the patch it works fine as expected.

Signed-off-by: Anton Altaparmakov &lt;aia21@cantab.net&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>sched: Remove proliferation of wait_on_bit() action functions</title>
<updated>2014-07-16T13:10:39Z</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2014-07-07T05:16:04Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=743162013d40ca612b4cb53d3a200dff2d9ab26e'/>
<id>urn:sha1:743162013d40ca612b4cb53d3a200dff2d9ab26e</id>
<content type='text'>
The current "wait_on_bit" interface requires an 'action'
function to be provided which does the actual waiting.
There are over 20 such functions, many of them identical.
Most cases can be satisfied by one of just two functions, one
which uses io_schedule() and one which just uses schedule().

So:
 Rename wait_on_bit and        wait_on_bit_lock to
        wait_on_bit_action and wait_on_bit_lock_action
 to make it explicit that they need an action function.

 Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io
 which are *not* given an action function but implicitly use
 a standard one.
 The decision to error-out if a signal is pending is now made
 based on the 'mode' argument rather than being encoded in the action
 function.

 All instances of the old wait_on_bit and wait_on_bit_lock which
 can use the new version have been changed accordingly and their
 action functions have been discarded.
 wait_on_bit{_lock} does not return any specific error code in the
 event of a signal so the caller must check for non-zero and
 interpolate their own error code as appropriate.

The wait_on_bit() call in __fscache_wait_on_invalidate() was
ambiguous as it specified TASK_UNINTERRUPTIBLE but used
fscache_wait_bit_interruptible as an action function.
David Howells confirms this should be uniformly
"uninterruptible"

The main remaining user of wait_on_bit{,_lock}_action is NFS
which needs to use a freezer-aware schedule() call.

A comment in fs/gfs2/glock.c notes that having multiple 'action'
functions is useful as they display differently in the 'wchan'
field of 'ps'. (and /proc/$PID/wchan).
As the new bit_wait{,_io} functions are tagged "__sched", they
will not show up at all, but something higher in the stack.  So
the distinction will still be visible, only with different
function names (gds2_glock_wait versus gfs2_glock_dq_wait in the
gfs2/glock.c case).

Since first version of this patch (against 3.15) two new action
functions appeared, on in NFS and one in CIFS.  CIFS also now
uses an action function that makes the same freezer aware
schedule call as NFS.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
Acked-by: David Howells &lt;dhowells@redhat.com&gt; (fscache, keys)
Acked-by: Steven Whitehouse &lt;swhiteho@redhat.com&gt; (gfs2)
Acked-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Steve French &lt;sfrench@samba.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brown
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>mm: non-atomically mark page accessed during page cache allocation where possible</title>
<updated>2014-06-04T23:54:10Z</updated>
<author>
<name>Mel Gorman</name>
<email>mgorman@suse.de</email>
</author>
<published>2014-06-04T23:10:31Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2457aec63745e235bcafb7ef312b182d8682f0fc'/>
<id>urn:sha1:2457aec63745e235bcafb7ef312b182d8682f0fc</id>
<content type='text'>
aops-&gt;write_begin may allocate a new page and make it visible only to have
mark_page_accessed called almost immediately after.  Once the page is
visible the atomic operations are necessary which is noticable overhead
when writing to an in-memory filesystem like tmpfs but should also be
noticable with fast storage.  The objective of the patch is to initialse
the accessed information with non-atomic operations before the page is
visible.

The bulk of filesystems directly or indirectly use
grab_cache_page_write_begin or find_or_create_page for the initial
allocation of a page cache page.  This patch adds an init_page_accessed()
helper which behaves like the first call to mark_page_accessed() but may
called before the page is visible and can be done non-atomically.

The primary APIs of concern in this care are the following and are used
by most filesystems.

	find_get_page
	find_lock_page
	find_or_create_page
	grab_cache_page_nowait
	grab_cache_page_write_begin

All of them are very similar in detail to the patch creates a core helper
pagecache_get_page() which takes a flags parameter that affects its
behavior such as whether the page should be marked accessed or not.  Then
old API is preserved but is basically a thin wrapper around this core
function.

Each of the filesystems are then updated to avoid calling
mark_page_accessed when it is known that the VM interfaces have already
done the job.  There is a slight snag in that the timing of the
mark_page_accessed() has now changed so in rare cases it's possible a page
gets to the end of the LRU as PageReferenced where as previously it might
have been repromoted.  This is expected to be rare but it's worth the
filesystem people thinking about it in case they see a problem with the
timing change.  It is also the case that some filesystems may be marking
pages accessed that previously did not but it makes sense that filesystems
have consistent behaviour in this regard.

The test case used to evaulate this is a simple dd of a large file done
multiple times with the file deleted on each iterations.  The size of the
file is 1/10th physical memory to avoid dirty page balancing.  In the
async case it will be possible that the workload completes without even
hitting the disk and will have variable results but highlight the impact
of mark_page_accessed for async IO.  The sync results are expected to be
more stable.  The exception is tmpfs where the normal case is for the "IO"
to not hit the disk.

The test machine was single socket and UMA to avoid any scheduling or NUMA
artifacts.  Throughput and wall times are presented for sync IO, only wall
times are shown for async as the granularity reported by dd and the
variability is unsuitable for comparison.  As async results were variable
do to writback timings, I'm only reporting the maximum figures.  The sync
results were stable enough to make the mean and stddev uninteresting.

The performance results are reported based on a run with no profiling.
Profile data is based on a separate run with oprofile running.

async dd
                                    3.15.0-rc3            3.15.0-rc3
                                       vanilla           accessed-v2
ext3    Max      elapsed     13.9900 (  0.00%)     11.5900 ( 17.16%)
tmpfs	Max      elapsed      0.5100 (  0.00%)      0.4900 (  3.92%)
btrfs   Max      elapsed     12.8100 (  0.00%)     12.7800 (  0.23%)
ext4	Max      elapsed     18.6000 (  0.00%)     13.3400 ( 28.28%)
xfs	Max      elapsed     12.5600 (  0.00%)      2.0900 ( 83.36%)

The XFS figure is a bit strange as it managed to avoid a worst case by
sheer luck but the average figures looked reasonable.

        samples percentage
ext3       86107    0.9783  vmlinux-3.15.0-rc4-vanilla        mark_page_accessed
ext3       23833    0.2710  vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed
ext3        5036    0.0573  vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed
ext4       64566    0.8961  vmlinux-3.15.0-rc4-vanilla        mark_page_accessed
ext4        5322    0.0713  vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed
ext4        2869    0.0384  vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed
xfs        62126    1.7675  vmlinux-3.15.0-rc4-vanilla        mark_page_accessed
xfs         1904    0.0554  vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed
xfs          103    0.0030  vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed
btrfs      10655    0.1338  vmlinux-3.15.0-rc4-vanilla        mark_page_accessed
btrfs       2020    0.0273  vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed
btrfs        587    0.0079  vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed
tmpfs      59562    3.2628  vmlinux-3.15.0-rc4-vanilla        mark_page_accessed
tmpfs       1210    0.0696  vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed
tmpfs         94    0.0054  vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed

[akpm@linux-foundation.org: don't run init_page_accessed() against an uninitialised pointer]
Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: Theodore Ts'o &lt;tytso@mit.edu&gt;
Cc: "Paul E. McKenney" &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Tested-by: Prabhakar Lad &lt;prabhakar.csengg@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>fs: buffer: do not use unnecessary atomic operations when discarding buffers</title>
<updated>2014-06-04T23:54:10Z</updated>
<author>
<name>Mel Gorman</name>
<email>mgorman@suse.de</email>
</author>
<published>2014-06-04T23:10:29Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e7470ee89f003634a88e7b5e5a7b65b3025987de'/>
<id>urn:sha1:e7470ee89f003634a88e7b5e5a7b65b3025987de</id>
<content type='text'>
Discarding buffers uses a bunch of atomic operations when discarding
buffers because ......  I can't think of a reason.  Use a cmpxchg loop to
clear all the necessary flags.  In most (all?) cases this will be a single
atomic operations.

[akpm@linux-foundation.org: move BUFFER_FLAGS_DISCARD into the .c file]
Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: Theodore Ts'o &lt;tytso@mit.edu&gt;
Cc: "Paul E. McKenney" &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>fs/buffer.c: remove block_write_full_page_endio()</title>
<updated>2014-06-04T23:54:02Z</updated>
<author>
<name>Matthew Wilcox</name>
<email>matthew.r.wilcox@intel.com</email>
</author>
<published>2014-06-04T23:07:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1b938c0827478df268d2336469ec48d400a2eb3e'/>
<id>urn:sha1:1b938c0827478df268d2336469ec48d400a2eb3e</id>
<content type='text'>
The last in-tree caller of block_write_full_page_endio() was removed in
January 2013.  It's time to remove the EXPORT_SYMBOL, which leaves
block_write_full_page() as the only caller of
block_write_full_page_endio(), so inline block_write_full_page_endio()
into block_write_full_page().

Signed-off-by: Matthew Wilcox &lt;matthew.r.wilcox@intel.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Dheeraj Reddy &lt;dheeraj.reddy@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>arch: Mass conversion of smp_mb__*()</title>
<updated>2014-04-18T12:20:48Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2014-03-17T17:06:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4e857c58efeb99393cba5a5d0d8ec7117183137c'/>
<id>urn:sha1:4e857c58efeb99393cba5a5d0d8ec7117183137c</id>
<content type='text'>
Mostly scripted conversion of the smp_mb__* barriers.

Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Acked-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: linux-arch@vger.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs</title>
<updated>2014-04-12T21:49:50Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2014-04-12T21:49:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5166701b368caea89d57b14bf41cf39e819dad51'/>
<id>urn:sha1:5166701b368caea89d57b14bf41cf39e819dad51</id>
<content type='text'>
Pull vfs updates from Al Viro:
 "The first vfs pile, with deep apologies for being very late in this
  window.

  Assorted cleanups and fixes, plus a large preparatory part of iov_iter
  work.  There's a lot more of that, but it'll probably go into the next
  merge window - it *does* shape up nicely, removes a lot of
  boilerplate, gets rid of locking inconsistencie between aio_write and
  splice_write and I hope to get Kent's direct-io rewrite merged into
  the same queue, but some of the stuff after this point is having
  (mostly trivial) conflicts with the things already merged into
  mainline and with some I want more testing.

  This one passes LTP and xfstests without regressions, in addition to
  usual beating.  BTW, readahead02 in ltp syscalls testsuite has started
  giving failures since "mm/readahead.c: fix readahead failure for
  memoryless NUMA nodes and limit readahead pages" - might be a false
  positive, might be a real regression..."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
  missing bits of "splice: fix racy pipe-&gt;buffers uses"
  cifs: fix the race in cifs_writev()
  ceph_sync_{,direct_}write: fix an oops on ceph_osdc_new_request() failure
  kill generic_file_buffered_write()
  ocfs2_file_aio_write(): switch to generic_perform_write()
  ceph_aio_write(): switch to generic_perform_write()
  xfs_file_buffered_aio_write(): switch to generic_perform_write()
  export generic_perform_write(), start getting rid of generic_file_buffer_write()
  generic_file_direct_write(): get rid of ppos argument
  btrfs_file_aio_write(): get rid of ppos
  kill the 5th argument of generic_file_buffered_write()
  kill the 4th argument of __generic_file_aio_write()
  lustre: don't open-code kernel_recvmsg()
  ocfs2: don't open-code kernel_recvmsg()
  drbd: don't open-code kernel_recvmsg()
  constify blk_rq_map_user_iov() and friends
  lustre: switch to kernel_sendmsg()
  ocfs2: don't open-code kernel_sendmsg()
  take iov_iter stuff to mm/iov_iter.c
  process_vm_access: tidy up a bit
  ...
</content>
</entry>
<entry>
<title>switch -&gt;is_partially_uptodate() to saner arguments</title>
<updated>2014-04-02T03:19:19Z</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2014-02-03T02:16:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c186afb4dbd0050a537b96c7fbee2dba3b57fc38'/>
<id>urn:sha1:c186afb4dbd0050a537b96c7fbee2dba3b57fc38</id>
<content type='text'>
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>Merge branch 'master' into for-next</title>
<updated>2014-02-20T13:54:28Z</updated>
<author>
<name>Jiri Kosina</name>
<email>jkosina@suse.cz</email>
</author>
<published>2014-02-20T13:54:28Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d4263348f796f29546f90802177865dd4379dd0a'/>
<id>urn:sha1:d4263348f796f29546f90802177865dd4379dd0a</id>
<content type='text'>
</content>
</entry>
<entry>
<title>treewide: Fix typo in Documentation/DocBook</title>
<updated>2014-02-19T13:58:17Z</updated>
<author>
<name>Masanari Iida</name>
<email>standby24x7@gmail.com</email>
</author>
<published>2014-02-18T13:54:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e227867f12302633737bd2a48a10a9a72c0630cb'/>
<id>urn:sha1:e227867f12302633737bd2a48a10a9a72c0630cb</id>
<content type='text'>
This patch fix spelling typo in Documentation/DocBook.
It is because .html and .xml files are generated by make htmldocs,
I have to fix a typo within the source files.

Signed-off-by: Masanari Iida &lt;standby24x7@gmail.com&gt;
Acked-by: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Signed-off-by: Jiri Kosina &lt;jkosina@suse.cz&gt;
</content>
</entry>
</feed>
