<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/fs/block_dev.c, branch v3.9</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v3.9</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v3.9'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2013-04-01T22:48:47Z</updated>
<entry>
<title>loop: prevent bdev freeing while device in use</title>
<updated>2013-04-01T22:48:47Z</updated>
<author>
<name>Anatol Pomozov</name>
<email>anatol.pomozov@gmail.com</email>
</author>
<published>2013-04-01T16:47:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c1681bf8a7b1b98edee8b862a42c19c4e53205fd'/>
<id>urn:sha1:c1681bf8a7b1b98edee8b862a42c19c4e53205fd</id>
<content type='text'>
struct block_device lifecycle is defined by its inode (see fs/block_dev.c) -
block_device allocated first time we access /dev/loopXX and deallocated on
bdev_destroy_inode. When we create the device "losetup /dev/loopXX afile"
we want that block_device stay alive until we destroy the loop device
with "losetup -d".

But because we do not hold /dev/loopXX inode its counter goes 0, and
inode/bdev can be destroyed at any moment. Usually it happens at memory
pressure or when user drops inode cache (like in the test below). When later in
loop_clr_fd() we want to use bdev we have use-after-free error with following
stack:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000280
  bd_set_size+0x10/0xa0
  loop_clr_fd+0x1f8/0x420 [loop]
  lo_ioctl+0x200/0x7e0 [loop]
  lo_compat_ioctl+0x47/0xe0 [loop]
  compat_blkdev_ioctl+0x341/0x1290
  do_filp_open+0x42/0xa0
  compat_sys_ioctl+0xc1/0xf20
  do_sys_open+0x16e/0x1d0
  sysenter_dispatch+0x7/0x1a

To prevent use-after-free we need to grab the device in loop_set_fd()
and put it later in loop_clr_fd().

The issue is reprodusible on current Linus head and v3.3. Here is the test:

  dd if=/dev/zero of=loop.file bs=1M count=1
  while [ true ]; do
    losetup /dev/loop0 loop.file
    echo 2 &gt; /proc/sys/vm/drop_caches
    losetup -d /dev/loop0
  done

[ Doing bdgrab/bput in loop_set_fd/loop_clr_fd is safe, because every
  time we call loop_set_fd() we check that loop_device-&gt;lo_state is
  Lo_unbound and set it to Lo_bound If somebody will try to set_fd again
  it will get EBUSY.  And if we try to loop_clr_fd() on unbound loop
  device we'll get ENXIO.

  loop_set_fd/loop_clr_fd (and any other loop ioctl) is called under
  loop_device-&gt;lo_ctl_mutex. ]

Signed-off-by: Anatol Pomozov &lt;anatol.pomozov@gmail.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge branch 'for-3.9/core' of git://git.kernel.dk/linux-block</title>
<updated>2013-02-28T20:52:24Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2013-02-28T20:52:24Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ee89f81252179dcbf6cd65bd48299f5e52292d88'/>
<id>urn:sha1:ee89f81252179dcbf6cd65bd48299f5e52292d88</id>
<content type='text'>
Pull block IO core bits from Jens Axboe:
 "Below are the core block IO bits for 3.9.  It was delayed a few days
  since my workstation kept crashing every 2-8h after pulling it into
  current -git, but turns out it is a bug in the new pstate code (divide
  by zero, will report separately).  In any case, it contains:

   - The big cfq/blkcg update from Tejun and and Vivek.

   - Additional block and writeback tracepoints from Tejun.

   - Improvement of the should sort (based on queues) logic in the plug
     flushing.

   - _io() variants of the wait_for_completion() interface, using
     io_schedule() instead of schedule() to contribute to io wait
     properly.

   - Various little fixes.

  You'll get two trivial merge conflicts, which should be easy enough to
  fix up"

Fix up the trivial conflicts due to hlist traversal cleanups (commit
b67bfe0d42ca: "hlist: drop the node parameter from iterators").

* 'for-3.9/core' of git://git.kernel.dk/linux-block: (39 commits)
  block: remove redundant check to bd_openers()
  block: use i_size_write() in bd_set_size()
  cfq: fix lock imbalance with failed allocations
  drivers/block/swim3.c: fix null pointer dereference
  block: don't select PERCPU_RWSEM
  block: account iowait time when waiting for completion of IO request
  sched: add wait_for_completion_io[_timeout]
  writeback: add more tracepoints
  block: add block_{touch|dirty}_buffer tracepoint
  buffer: make touch_buffer() an exported function
  block: add @req to bio_{front|back}_merge tracepoints
  block: add missing block_bio_complete() tracepoint
  block: Remove should_sort judgement when flush blk_plug
  block,elevator: use new hashtable implementation
  cfq-iosched: add hierarchical cfq_group statistics
  cfq-iosched: collect stats from dead cfqgs
  cfq-iosched: separate out cfqg_stats_reset() from cfq_pd_reset_stats()
  blkcg: make blkcg_print_blkgs() grab q locks instead of blkcg lock
  block: RCU free request_queue
  blkcg: implement blkg_[rw]stat_recursive_sum() and blkg_[rw]stat_merge()
  ...
</content>
</entry>
<entry>
<title>Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs</title>
<updated>2013-02-27T04:16:07Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2013-02-27T04:16:07Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d895cb1af15c04c522a25c79cc429076987c089b'/>
<id>urn:sha1:d895cb1af15c04c522a25c79cc429076987c089b</id>
<content type='text'>
Pull vfs pile (part one) from Al Viro:
 "Assorted stuff - cleaning namei.c up a bit, fixing -&gt;d_name/-&gt;d_parent
  locking violations, etc.

  The most visible changes here are death of FS_REVAL_DOT (replaced with
  "has -&gt;d_weak_revalidate()") and a new helper getting from struct file
  to inode.  Some bits of preparation to xattr method interface changes.

  Misc patches by various people sent this cycle *and* ocfs2 fixes from
  several cycles ago that should've been upstream right then.

  PS: the next vfs pile will be xattr stuff."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
  saner proc_get_inode() calling conventions
  proc: avoid extra pde_put() in proc_fill_super()
  fs: change return values from -EACCES to -EPERM
  fs/exec.c: make bprm_mm_init() static
  ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
  ocfs2: fix possible use-after-free with AIO
  ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
  get_empty_filp()/alloc_file() leave both -&gt;f_pos and -&gt;f_version zero
  target: writev() on single-element vector is pointless
  export kernel_write(), convert open-coded instances
  fs: encode_fh: return FILEID_INVALID if invalid fid_type
  kill f_vfsmnt
  vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
  nfsd: handle vfs_getattr errors in acl protocol
  switch vfs_getattr() to struct path
  default SET_PERSONALITY() in linux/elf.h
  ceph: prepopulate inodes only when request is aborted
  d_hash_and_lookup(): export, switch open-coded instances
  9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
  9p: split dropping the acls from v9fs_set_create_acl()
  ...
</content>
</entry>
<entry>
<title>new helper: file_inode(file)</title>
<updated>2013-02-23T04:31:31Z</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2013-01-23T22:07:38Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=496ad9aa8ef448058e36ca7a787c61f2e63f0f54'/>
<id>urn:sha1:496ad9aa8ef448058e36ca7a787c61f2e63f0f54</id>
<content type='text'>
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>block: remove redundant check to bd_openers()</title>
<updated>2013-02-22T09:42:46Z</updated>
<author>
<name>Guo Chao</name>
<email>yan@linux.vnet.ibm.com</email>
</author>
<published>2013-02-21T23:16:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=de33127d8d3f1d570aad8c2223cd81b206636bc1'/>
<id>urn:sha1:de33127d8d3f1d570aad8c2223cd81b206636bc1</id>
<content type='text'>
bd_openers is stable under bd_mutex, no need to check it twice.

Signed-off-by: Guo Chao &lt;yan@linux.vnet.ibm.com&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Guo Chao &lt;yan@linux.vnet.ibm.com&gt;
Cc: M. Hindess &lt;hindessm@uk.ibm.com&gt;
Cc: Nikanth Karthikesan &lt;knikanth@suse.de&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: use i_size_write() in bd_set_size()</title>
<updated>2013-02-22T09:42:46Z</updated>
<author>
<name>Guo Chao</name>
<email>yan@linux.vnet.ibm.com</email>
</author>
<published>2013-02-21T23:16:42Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d646a02a9d44d1421f273ae3923d97b47b918176'/>
<id>urn:sha1:d646a02a9d44d1421f273ae3923d97b47b918176</id>
<content type='text'>
blkdev_ioctl(GETBLKSIZE) uses i_size_read() to read size of block device.
If we update block size directly, reader may see intermediate result in
some machines and configurations.  Use i_size_write() instead.

Signed-off-by: Guo Chao &lt;yan@linux.vnet.ibm.com&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Guo Chao &lt;yan@linux.vnet.ibm.com&gt;
Cc: M. Hindess &lt;hindessm@uk.ibm.com&gt;
Cc: Nikanth Karthikesan &lt;knikanth@suse.de&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>fs/block_dev.c: page cache wrongly left invalidated after revalidate_disk()</title>
<updated>2013-02-22T01:22:16Z</updated>
<author>
<name>MITSUNARI Shigeo</name>
<email>herumi@nifty.com</email>
</author>
<published>2013-02-22T00:42:01Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7630b661da330b35dd57b6f5d6d62b386f2dd751'/>
<id>urn:sha1:7630b661da330b35dd57b6f5d6d62b386f2dd751</id>
<content type='text'>
We found that bdev-&gt;bd_invalidated was left set once revalidate_disk()
is called, which results in page cache flush every time that device is
open.

Specifically, we found this problem in MD block device.  Once we resize
a MD device, mdadm --monitor periodically flush all page cache for that
device every 60 or 1000 seconds when it opens the device.

This bug lies since at least 3.2.0 till the latest kernel(3.6.2).  Patch
is attached.

The following steps will reproduce the problem.

1. prepair a block device (eg /dev/sdb).

2. create two partitions:

   sudo parted /dev/sdb
   mklabel gpt
   mkpart primary 0% 50%
   mkpart primary 50% 100%

3. create a md device.

   sudo mdadm -C /dev/md/hoge -l 1 -n 2 -e 1.2 --assume-clean --auto=md --symlink=no /dev/sdb1 /dev/sdb2

4. create file system and mount it

   sudo mkfs.ext3 /dev/md/hoge
   sudo mkdir /mnt/test
   sudo mount /dev/md/hoge /mnt/test

5. try to resize the device

   sudo mdadm -G /dev/md/hoge --size=max

6. create a file to fill file cache.

  sudo dd if=/dev/urandom of=/mnt/test/data bs=1M count=10

and verify the current status of file by free command.

7. mdadm monitor will open the md device every 1000 seconds and you
   will find all file cache on the device are cleared.

The timing can be reduced by the following steps.

a) kill mdadm and restart it with --delay option

   /sbin/mdadm --monitor --delay=30 --pid-file /var/run/mdadm/monitor.pid --daemonise --scan --syslog

or open the md device directly.

   sudo dd if=/dev/md/hoge of=/dev/null bs=4096 count=1

Signed-off-by: MITSUNARI Shigeo &lt;herumi@nifty.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>lseek: the "whence" argument is called "whence"</title>
<updated>2012-12-18T01:15:12Z</updated>
<author>
<name>Andrew Morton</name>
<email>akpm@linux-foundation.org</email>
</author>
<published>2012-12-17T23:59:39Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=965c8e59cfcf845ecde2265a1d1bfee5f011d302'/>
<id>urn:sha1:965c8e59cfcf845ecde2265a1d1bfee5f011d302</id>
<content type='text'>
But the kernel decided to call it "origin" instead.  Fix most of the
sites.

Acked-by: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>vfs: fix O_DIRECT read past end of block device</title>
<updated>2012-12-08T16:28:26Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2012-12-08T00:48:39Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=684c9aaebbb0ea3a9954d605d4908e650659e7db'/>
<id>urn:sha1:684c9aaebbb0ea3a9954d605d4908e650659e7db</id>
<content type='text'>
The direct-IO write path already had the i_size checks in mm/filemap.c,
but it turns out the read path did not, and removing the block size
checks in fs/block_dev.c (commit bbec0270bdd8: "blkdev_max_block: make
private to fs/buffer.c") removed the magic "shrink IO to past the end of
the device" code there.

Fix it by truncating the IO to the size of the block device, like the
write path already does.

NOTE! I suspect the write path would be *much* better off doing it this
way in fs/block_dev.c, rather than hidden deep in mm/filemap.c.  The
mm/filemap.c code is extremely hard to follow, and has various
conditionals on the target being a block device (ie the flag passed in
to 'generic_write_checks()', along with a conditional update of the
inode timestamp etc).

It is also quite possible that we should treat this whole block device
size as a "s_maxbytes" issue, and try to make the logic even more
generic.  However, in the meantime this is the fairly minimal targeted
fix.

Noted by Milan Broz thanks to a regression test for the cryptsetup
reencrypt tool.

Reported-and-tested-by: Milan Broz &lt;mbroz@redhat.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>blkdev_max_block: make private to fs/buffer.c</title>
<updated>2012-11-30T01:48:12Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2012-11-29T20:31:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=bbec0270bdd887f96377065ee38b8848b5afa395'/>
<id>urn:sha1:bbec0270bdd887f96377065ee38b8848b5afa395</id>
<content type='text'>
We really don't want to look at the block size for the raw block device
accesses in fs/block-dev.c, because it may be changing from under us.
So get rid of the max_block logic entirely, since the caller should
already have done it anyway.

That leaves the only user of this function in fs/buffer.c, so move the
whole function there and make it static.

Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
