<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/fs/block_dev.c, branch v3.1</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v3.1</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v3.1'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2011-09-10T07:20:21Z</updated>
<entry>
<title>Avoid dereferencing a 'request_queue' after last close.</title>
<updated>2011-09-10T07:20:21Z</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2011-09-10T07:20:21Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=94007751bb02797ba87bac7aacee2731ac2039a3'/>
<id>urn:sha1:94007751bb02797ba87bac7aacee2731ac2039a3</id>
<content type='text'>
On the last close of an 'md' device which as been stopped, the device
is destroyed and in particular the request_queue is freed.  The free
is done in a separate thread so it might happen a short time later.

__blkdev_put calls bdev_inode_switch_bdi *after* -&gt;release has been
called.

Since commit f758eeabeb96f878c860e8f110f94ec8820822a9
bdev_inode_switch_bdi will dereference the 'old' bdi, which lives
inside a request_queue, to get a spin lock.  This causes the last
close on an md device to sometime take a spin_lock which lives in
freed memory - which results in an oops.

So move the called to bdev_inode_switch_bdi before the call to
-&gt;release.

Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
Acked-by: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
Cc: stable@kernel.org
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
</entry>
<entry>
<title>fix block device fallout from -&gt;fsync() changes</title>
<updated>2011-08-02T01:33:47Z</updated>
<author>
<name>Rafael J. Wysocki</name>
<email>rjw@sisk.pl</email>
</author>
<published>2011-08-02T00:17:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=da5aa861bea09197e6ae4d7c46618616064891e4'/>
<id>urn:sha1:da5aa861bea09197e6ae4d7c46618616064891e4</id>
<content type='text'>
blkdev_fsync() needs to write pages in pagecache...

Signed-off-by: Rafael J. Wysocki &lt;rjw@sisk.pl&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>block: initialise bd_super in bdget()</title>
<updated>2011-08-01T05:57:44Z</updated>
<author>
<name>Lachlan McIlroy</name>
<email>lmcilroy@redhat.com</email>
</author>
<published>2011-06-30T01:01:45Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=782b94cdf577b4df1feb376f372dccc28e66a771'/>
<id>urn:sha1:782b94cdf577b4df1feb376f372dccc28e66a771</id>
<content type='text'>
bd_super is currently reset to NULL in kill_block_super() so we rely on previous
users of the block_device object to initialise this value for the next user.
This quirk was exposed on RHEL5 when a third party filesystem did not always use
kill_block_super() and therefore bd_super wasn't being reset when a block_device
object was recycled within the cache.  This may not be a problem upstream but
makes sense to be defensive.

Signed-off-by: Lachlan McIlroy &lt;lmcilroy@redhat.com&gt;
Reviewed-by: Eric Sandeen &lt;sandeen@redhat.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/writeback</title>
<updated>2011-07-26T17:39:54Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2011-07-26T17:39:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f01ef569cddb1a8627b1c6b3a134998ad1cf4b22'/>
<id>urn:sha1:f01ef569cddb1a8627b1c6b3a134998ad1cf4b22</id>
<content type='text'>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/writeback: (27 commits)
  mm: properly reflect task dirty limits in dirty_exceeded logic
  writeback: don't busy retry writeback on new/freeing inodes
  writeback: scale IO chunk size up to half device bandwidth
  writeback: trace global_dirty_state
  writeback: introduce max-pause and pass-good dirty limits
  writeback: introduce smoothed global dirty limit
  writeback: consolidate variable names in balance_dirty_pages()
  writeback: show bdi write bandwidth in debugfs
  writeback: bdi write bandwidth estimation
  writeback: account per-bdi accumulated written pages
  writeback: make writeback_control.nr_to_write straight
  writeback: skip tmpfs early in balance_dirty_pages_ratelimited_nr()
  writeback: trace event writeback_queue_io
  writeback: trace event writeback_single_inode
  writeback: remove .nonblocking and .encountered_congestion
  writeback: remove writeback_control.more_io
  writeback: skip balance_dirty_pages() for in-memory fs
  writeback: add bdi_dirty_limit() kernel-doc
  writeback: avoid extra sync work at enqueue time
  writeback: elevate queue_io() into wb_writeback()
  ...

Fix up trivial conflicts in fs/fs-writeback.c and mm/filemap.c
</content>
</entry>
<entry>
<title>Merge branch 'for-3.1/core' of git://git.kernel.dk/linux-block</title>
<updated>2011-07-25T17:33:36Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2011-07-25T17:33:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=096a705bbc080a4041636d07514560da8d78acbe'/>
<id>urn:sha1:096a705bbc080a4041636d07514560da8d78acbe</id>
<content type='text'>
* 'for-3.1/core' of git://git.kernel.dk/linux-block: (24 commits)
  block: strict rq_affinity
  backing-dev: use synchronize_rcu_expedited instead of synchronize_rcu
  block: fix patch import error in max_discard_sectors check
  block: reorder request_queue to remove 64 bit alignment padding
  CFQ: add think time check for group
  CFQ: add think time check for service tree
  CFQ: move think time check variables to a separate struct
  fixlet: Remove fs_excl from struct task.
  cfq: Remove special treatment for metadata rqs.
  block: document blk_plug list access
  block: avoid building too big plug list
  compat_ioctl: fix make headers_check regression
  block: eliminate potential for infinite loop in blkdev_issue_discard
  compat_ioctl: fix warning caused by qemu
  block: flush MEDIA_CHANGE from drivers on close(2)
  blk-throttle: Make total_nr_queued unsigned
  block: Add __attribute__((format(printf...) and fix fallout
  fs/partitions/check.c: make local symbols static
  block:remove some spare spaces in genhd.c
  block:fix the comment error in blkdev.h
  ...
</content>
</entry>
<entry>
<title>fs: push i_mutex and filemap_write_and_wait down into -&gt;fsync() handlers</title>
<updated>2011-07-21T00:47:59Z</updated>
<author>
<name>Josef Bacik</name>
<email>josef@redhat.com</email>
</author>
<published>2011-07-17T00:44:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=02c24a82187d5a628c68edfe71ae60dc135cd178'/>
<id>urn:sha1:02c24a82187d5a628c68edfe71ae60dc135cd178</id>
<content type='text'>
Btrfs needs to be able to control how filemap_write_and_wait_range() is called
in fsync to make it less of a painful operation, so push down taking i_mutex and
the calling of filemap_write_and_wait() down into the -&gt;fsync() handlers.  Some
file systems can drop taking the i_mutex altogether it seems, like ext3 and
ocfs2.  For correctness sake I just pushed everything down in all cases to make
sure that we keep the current behavior the same for everybody, and then each
individual fs maintainer can make up their mind about what to do from there.
Thanks,

Acked-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Josef Bacik &lt;josef@redhat.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>fs: handle SEEK_HOLE/SEEK_DATA properly in all fs's that define their own llseek</title>
<updated>2011-07-21T00:47:58Z</updated>
<author>
<name>Josef Bacik</name>
<email>josef@redhat.com</email>
</author>
<published>2011-07-18T17:21:38Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=06222e491e663dac939f04b125c9dc52126a75c4'/>
<id>urn:sha1:06222e491e663dac939f04b125c9dc52126a75c4</id>
<content type='text'>
This converts everybody to handle SEEK_HOLE/SEEK_DATA properly.  In some cases
we just return -EINVAL, in others we do the normal generic thing, and in others
we're simply making sure that the properly due-dilligence is done.  For example
in NFS/CIFS we need to make sure the file size is update properly for the
SEEK_HOLE and SEEK_DATA case, but since it calls the generic llseek stuff itself
that is all we have to do.  Thanks,

Signed-off-by: Josef Bacik &lt;josef@redhat.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>block: flush MEDIA_CHANGE from drivers on close(2)</title>
<updated>2011-07-01T14:17:47Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2011-07-01T14:17:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=85ef06d1d252f6a2e73b678591ab71caad4667bb'/>
<id>urn:sha1:85ef06d1d252f6a2e73b678591ab71caad4667bb</id>
<content type='text'>
Currently, only open(2) is defined as the 'clearing' point.  It has
two roles - first, it's an acknowledgement from userland indicating
that the event has been received and kernel can clear pending states
and proceed to generate more events.  Secondly, it's passed on to
device drivers as a hint indicating that a synchronization point has
been reached and it might want to take a deeper look at the device.

The latter currently is only used by sr which uses two different
mechanisms - GET_EVENT_MEDIA_STATUS_NOTIFICATION and TEST_UNIT_READY
to discover events, where the former is lighter weight and safe to be
used repeatedly but may not provide full coverage.  Among other
things, GET_EVENT can't detect media removal while TUR can.

This patch makes close(2) - blkdev_put() - indicate clearing hint for
MEDIA_CHANGE to drivers.  disk_check_events() is renamed to
disk_flush_events() and updated to take @mask for events to flush
which is or'd to ev-&gt;clearing and will be passed to the driver on the
next -&gt;check_events() invocation.

This change makes sr generate MEDIA_CHANGE when media is ejected from
userland - e.g. with eject(1).

Note: Given the current usage, it seems @clearing hint is needlessly
complex.  disk_clear_events() can simply clear all events and the hint
can be boolean @flush.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Kay Sievers &lt;kay.sievers@vrfy.org&gt;
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</content>
</entry>
<entry>
<title>block: use the passed in @bdev when claiming if partno is zero</title>
<updated>2011-06-13T10:45:48Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2011-06-13T10:45:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d4c208b86b8be4254eba0e74071496e599f94639'/>
<id>urn:sha1:d4c208b86b8be4254eba0e74071496e599f94639</id>
<content type='text'>
6b4517a791 (block: implement bd_claiming and claiming block)
introduced claiming block to support O_EXCL blkdev opens properly.

bd_start_claiming() looks up the part 0 bdev and starts claiming
block.  The function assumed that there is only one part 0 bdev and
always used bdget_disk(disk, 0) to look it up; unfortunately, this
isn't true for some drivers (floppy) which use multiple block devices
to denote different operating parameters for the same physical device.
There can be multiple part 0 bdev's for the same device number.

This incorrect assumption caused the wrong bdev to be used during
claiming leading to unbalanced bd_holders as reported in the following
bug.

  https://bugzilla.kernel.org/show_bug.cgi?id=28522

This patch updates bd_start_claiming() such that it uses the bdev
specified as argument if its partno is zero.

Note that this means that different bdev's can be used for the same
device and O_EXCL check can be effectively bypassed.  It has always
been broken that way and floppy is fortunately on its way out.  Leave
that breakage alone.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-by: Alex Villacis Lasso &lt;avillaci@ceibo.fiec.espol.edu.ec&gt;
Tested-by: Alex Villacis Lasso &lt;avillaci@ceibo.fiec.espol.edu.ec&gt;
Cc: stable@kernel.org	# &gt;= v2.6.36
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</content>
</entry>
<entry>
<title>writeback: split inode_wb_list_lock into bdi_writeback.list_lock</title>
<updated>2011-06-08T00:25:21Z</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@infradead.org</email>
</author>
<published>2011-04-22T00:19:44Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f758eeabeb96f878c860e8f110f94ec8820822a9'/>
<id>urn:sha1:f758eeabeb96f878c860e8f110f94ec8820822a9</id>
<content type='text'>
Split the global inode_wb_list_lock into a per-bdi_writeback list_lock,
as it's currently the most contended lock in the system for metadata
heavy workloads.  It won't help for single-filesystem workloads for
which we'll need the I/O-less balance_dirty_pages, but at least we
can dedicate a cpu to spinning on each bdi now for larger systems.

Based on earlier patches from Nick Piggin and Dave Chinner.

It reduces lock contentions to 1/4 in this test case:
10 HDD JBOD, 100 dd on each disk, XFS, 6GB ram

lock_stat version 0.3
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                              class name    con-bounces    contentions   waittime-min   waittime-max waittime-total    acq-bounces   acquisitions   holdtime-min   holdtime-max holdtime-total
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
vanilla 2.6.39-rc3:
                      inode_wb_list_lock:         42590          44433           0.12         147.74      144127.35         252274         886792           0.08         121.34      917211.23
                      ------------------
                      inode_wb_list_lock              2          [&lt;ffffffff81165da5&gt;] bdev_inode_switch_bdi+0x29/0x85
                      inode_wb_list_lock             34          [&lt;ffffffff8115bd0b&gt;] inode_wb_list_del+0x22/0x49
                      inode_wb_list_lock          12893          [&lt;ffffffff8115bb53&gt;] __mark_inode_dirty+0x170/0x1d0
                      inode_wb_list_lock          10702          [&lt;ffffffff8115afef&gt;] writeback_single_inode+0x16d/0x20a
                      ------------------
                      inode_wb_list_lock              2          [&lt;ffffffff81165da5&gt;] bdev_inode_switch_bdi+0x29/0x85
                      inode_wb_list_lock             19          [&lt;ffffffff8115bd0b&gt;] inode_wb_list_del+0x22/0x49
                      inode_wb_list_lock           5550          [&lt;ffffffff8115bb53&gt;] __mark_inode_dirty+0x170/0x1d0
                      inode_wb_list_lock           8511          [&lt;ffffffff8115b4ad&gt;] writeback_sb_inodes+0x10f/0x157

2.6.39-rc3 + patch:
                &amp;(&amp;wb-&gt;list_lock)-&gt;rlock:         11383          11657           0.14         151.69       40429.51          90825         527918           0.11         145.90      556843.37
                ------------------------
                &amp;(&amp;wb-&gt;list_lock)-&gt;rlock             10          [&lt;ffffffff8115b189&gt;] inode_wb_list_del+0x5f/0x86
                &amp;(&amp;wb-&gt;list_lock)-&gt;rlock           1493          [&lt;ffffffff8115b1ed&gt;] writeback_inodes_wb+0x3d/0x150
                &amp;(&amp;wb-&gt;list_lock)-&gt;rlock           3652          [&lt;ffffffff8115a8e9&gt;] writeback_sb_inodes+0x123/0x16f
                &amp;(&amp;wb-&gt;list_lock)-&gt;rlock           1412          [&lt;ffffffff8115a38e&gt;] writeback_single_inode+0x17f/0x223
                ------------------------
                &amp;(&amp;wb-&gt;list_lock)-&gt;rlock              3          [&lt;ffffffff8110b5af&gt;] bdi_lock_two+0x46/0x4b
                &amp;(&amp;wb-&gt;list_lock)-&gt;rlock              6          [&lt;ffffffff8115b189&gt;] inode_wb_list_del+0x5f/0x86
                &amp;(&amp;wb-&gt;list_lock)-&gt;rlock           2061          [&lt;ffffffff8115af97&gt;] __mark_inode_dirty+0x173/0x1cf
                &amp;(&amp;wb-&gt;list_lock)-&gt;rlock           2629          [&lt;ffffffff8115a8e9&gt;] writeback_sb_inodes+0x123/0x16f

hughd@google.com: fix recursive lock when bdi_lock_two() is called with new the same as old
akpm@linux-foundation.org: cleanup bdev_inode_switch_bdi() comment

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
</content>
</entry>
</feed>
