<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/drivers/md/linear.c, branch v4.11</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v4.11</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v4.11'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2017-02-24T22:42:19Z</updated>
<entry>
<title>Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md</title>
<updated>2017-02-24T22:42:19Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2017-02-24T22:42:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a682e0035494c449e53a57d039f86f75b9e2fe67'/>
<id>urn:sha1:a682e0035494c449e53a57d039f86f75b9e2fe67</id>
<content type='text'>
Pull md updates from Shaohua Li:
 "Mainly fixes bugs and improves performance:

   - Improve scalability for raid1 from Coly

   - Improve raid5-cache read performance, disk efficiency and IO
     pattern from Song and me

   - Fix a race condition of disk hotplug for linear from Coly

   - A few cleanup patches from Ming and Byungchul

   - Fix a memory leak from Neil

   - Fix WRITE SAME IO failure from me

   - Add doc for raid5-cache from me"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md: (23 commits)
  md/raid1: fix write behind issues introduced by bio_clone_bioset_partial
  md/raid1: handle flush request correctly
  md/linear: shutup lockdep warnning
  md/raid1: fix a use-after-free bug
  RAID1: avoid unnecessary spin locks in I/O barrier code
  RAID1: a new I/O barrier implementation to remove resync window
  md/raid5: Don't reinvent the wheel but use existing llist API
  md: fast clone bio in bio_clone_mddev()
  md: remove unnecessary check on mddev
  md/raid1: use bio_clone_bioset_partial() in case of write behind
  md: fail if mddev-&gt;bio_set can't be created
  block: introduce bio_clone_bioset_partial()
  md: disable WRITE SAME if it fails in underlayer disks
  md/raid5-cache: exclude reclaiming stripes in reclaim check
  md/raid5-cache: stripe reclaim only counts valid stripes
  MD: add doc for raid5-cache
  Documentation: move MD related doc into a separate dir
  md: ensure md devices are freed before module is unloaded.
  md/r5cache: improve journal device efficiency
  md/r5cache: enable chunk_aligned_read with write back cache
  ...
</content>
</entry>
<entry>
<title>md/linear: shutup lockdep warnning</title>
<updated>2017-02-23T19:59:42Z</updated>
<author>
<name>Shaohua Li</name>
<email>shli@fb.com</email>
</author>
<published>2017-02-21T19:57:01Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d939cdfde34f50b95254b375f498447c82190b3e'/>
<id>urn:sha1:d939cdfde34f50b95254b375f498447c82190b3e</id>
<content type='text'>
Commit 03a9e24(md linear: fix a race between linear_add() and
linear_congested()) introduces the warnning.

Acked-by: Coly Li &lt;colyli@suse.de&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
</entry>
<entry>
<title>md: disable WRITE SAME if it fails in underlayer disks</title>
<updated>2017-02-14T03:24:16Z</updated>
<author>
<name>Shaohua Li</name>
<email>shli@fb.com</email>
</author>
<published>2017-02-14T00:21:49Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=26483819f89c5cf9d27620d70c95afeeeb9bece5'/>
<id>urn:sha1:26483819f89c5cf9d27620d70c95afeeeb9bece5</id>
<content type='text'>
This makes md do the same thing as dm for write same IO failure. Please
see 7eee4ae(dm: disable WRITE SAME if it fails) for details why we need
this.

We did a little bit different than dm. Instead of disabling writesame in
the first IO error, we disable it till next writesame IO coming after
the first IO error. This way we don't need to clone a bio.

Also reported here: https://bugzilla.kernel.org/show_bug.cgi?id=118581

Suggested-by: NeilBrown &lt;neilb@suse.com&gt;
Acked-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
</entry>
<entry>
<title>md linear: fix a race between linear_add() and linear_congested()</title>
<updated>2017-02-13T17:17:50Z</updated>
<author>
<name>colyli@suse.de</name>
<email>colyli@suse.de</email>
</author>
<published>2017-01-28T13:11:49Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=03a9e24ef2aaa5f1f9837356aed79c860521407a'/>
<id>urn:sha1:03a9e24ef2aaa5f1f9837356aed79c860521407a</id>
<content type='text'>
Recently I receive a bug report that on Linux v3.0 based kerenl, hot add
disk to a md linear device causes kernel crash at linear_congested(). From
the crash image analysis, I find in linear_congested(), mddev-&gt;raid_disks
contains value N, but conf-&gt;disks[] only has N-1 pointers available. Then
a NULL pointer deference crashes the kernel.

There is a race between linear_add() and linear_congested(), RCU stuffs
used in these two functions cannot avoid the race. Since Linuv v4.0
RCU code is replaced by introducing mddev_suspend().  After checking the
upstream code, it seems linear_congested() is not called in
generic_make_request() code patch, so mddev_suspend() cannot provent it
from being called. The possible race still exists.

Here I explain how the race still exists in current code.  For a machine
has many CPUs, on one CPU, linear_add() is called to add a hard disk to a
md linear device; at the same time on other CPU, linear_congested() is
called to detect whether this md linear device is congested before issuing
an I/O request onto it.

Now I use a possible code execution time sequence to demo how the possible
race happens,

seq    linear_add()                linear_congested()
 0                                 conf=mddev-&gt;private
 1   oldconf=mddev-&gt;private
 2   mddev-&gt;raid_disks++
 3                              for (i=0; i&lt;mddev-&gt;raid_disks;i++)
 4                                bdev_get_queue(conf-&gt;disks[i].rdev-&gt;bdev)
 5   mddev-&gt;private=newconf

In linear_add() mddev-&gt;raid_disks is increased in time seq 2, and on
another CPU in linear_congested() the for-loop iterates conf-&gt;disks[i] by
the increased mddev-&gt;raid_disks in time seq 3,4. But conf with one more
element (which is a pointer to struct dev_info type) to conf-&gt;disks[] is
not updated yet, accessing its structure member in time seq 4 will cause a
NULL pointer deference fault.

To fix this race, there are 2 parts of modification in the patch,
 1) Add 'int raid_disks' in struct linear_conf, as a copy of
    mddev-&gt;raid_disks. It is initialized in linear_conf(), always being
    consistent with pointers number of 'struct dev_info disks[]'. When
    iterating conf-&gt;disks[] in linear_congested(), use conf-&gt;raid_disks to
    replace mddev-&gt;raid_disks in the for-loop, then NULL pointer deference
    will not happen again.
 2) RCU stuffs are back again, and use kfree_rcu() in linear_add() to
    free oldconf memory. Because oldconf may be referenced as mddev-&gt;private
    in linear_congested(), kfree_rcu() makes sure that its memory will not
    be released until no one uses it any more.
Also some code comments are added in this patch, to make this modification
to be easier understandable.

This patch can be applied for kernels since v4.0 after commit:
3be260cc18f8 ("md/linear: remove rcu protections in favour of
suspend/resume"). But this bug is reported on Linux v3.0 based kernel, for
people who maintain kernels before Linux v4.0, they need to do some back
back port to this patch.

Changelog:
 - V3: add 'int raid_disks' in struct linear_conf, and use kfree_rcu() to
       replace rcu_call() in linear_add().
 - v2: add RCU stuffs by suggestion from Shaohua and Neil.
 - v1: initial effort.

Signed-off-by: Coly Li &lt;colyli@suse.de&gt;
Cc: Shaohua Li &lt;shli@fb.com&gt;
Cc: Neil Brown &lt;neilb@suse.com&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
</entry>
<entry>
<title>block: Use pointer to backing_dev_info from request_queue</title>
<updated>2017-02-02T15:20:48Z</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2017-02-02T14:56:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=dc3b17cc8bf21307c7e076e7c778d5db756f7871'/>
<id>urn:sha1:dc3b17cc8bf21307c7e076e7c778d5db756f7871</id>
<content type='text'>
We will want to have struct backing_dev_info allocated separately from
struct request_queue. As the first step add pointer to backing_dev_info
to request_queue and convert all users touching it. No functional
changes in this patch.

Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
<entry>
<title>md: add block tracing for bio_remapping</title>
<updated>2016-11-18T17:32:50Z</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.com</email>
</author>
<published>2016-11-18T02:22:04Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=109e37653033a5fcd3bf8cab0ed6a7ff433f758a'/>
<id>urn:sha1:109e37653033a5fcd3bf8cab0ed6a7ff433f758a</id>
<content type='text'>
The block tracing infrastructure (accessed with blktrace/blkparse)
supports the tracing of mapping bios from one device to another.
This is currently used when a bio in a partition is mapped to the
whole device, when bios are mapped by dm, and for mapping in md/raid5.
Other md personalities do not include this tracing yet, so add it.

When a read-error is detected we redirect the request to a different device.
This could justifiably be seen as a new mapping for the originial bio,
or a secondary mapping for the bio that errors.  This patch uses
the second option.

When md is used under dm-raid, the mappings are not traced as we do
not have access to the block device number of the parent.

Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
</entry>
<entry>
<title>md/linear: replace printk() with pr_*()</title>
<updated>2016-11-07T23:08:21Z</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.com</email>
</author>
<published>2016-11-02T03:16:49Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a2e202afa6aecd0ca0f7863deca4267bf1346e3d'/>
<id>urn:sha1:a2e202afa6aecd0ca0f7863deca4267bf1346e3d</id>
<content type='text'>
Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
</entry>
<entry>
<title>block: rename bio bi_rw to bi_opf</title>
<updated>2016-08-07T20:41:02Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@fb.com</email>
</author>
<published>2016-08-05T21:35:16Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1eff9d322a444245c67515edb52bc0eb68374aa8'/>
<id>urn:sha1:1eff9d322a444245c67515edb52bc0eb68374aa8</id>
<content type='text'>
Since commit 63a4cc24867d, bio-&gt;bi_rw contains flags in the lower
portion and the op code in the higher portions. This means that
old code that relies on manually setting bi_rw is most likely
going to be broken. Instead of letting that brokeness linger,
rename the member, to force old and out-of-tree code to break
at compile time instead of at runtime.

No intended functional changes in this commit.

Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
<entry>
<title>block, drivers, fs: rename REQ_FLUSH to REQ_PREFLUSH</title>
<updated>2016-06-07T19:41:38Z</updated>
<author>
<name>Mike Christie</name>
<email>mchristi@redhat.com</email>
</author>
<published>2016-06-05T19:32:25Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=28a8f0d317bf225ff15008f5dd66ae16242dd843'/>
<id>urn:sha1:28a8f0d317bf225ff15008f5dd66ae16242dd843</id>
<content type='text'>
To avoid confusion between REQ_OP_FLUSH, which is handled by
request_fn drivers, and upper layers requesting the block layer
perform a flush sequence along with possibly a WRITE, this patch
renames REQ_FLUSH to REQ_PREFLUSH.

Signed-off-by: Mike Christie &lt;mchristi@redhat.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Hannes Reinecke &lt;hare@suse.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
<entry>
<title>md: use bio op accessors</title>
<updated>2016-06-07T19:41:38Z</updated>
<author>
<name>Mike Christie</name>
<email>mchristi@redhat.com</email>
</author>
<published>2016-06-05T19:32:07Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=796a5cf083c2631180ad209c3ebb7d11d776cd72'/>
<id>urn:sha1:796a5cf083c2631180ad209c3ebb7d11d776cd72</id>
<content type='text'>
Separate the op from the rq_flag_bits and have md
set/get the bio using bio_set_op_attrs/bio_op.

Signed-off-by: Mike Christie &lt;mchristi@redhat.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Hannes Reinecke &lt;hare@suse.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
</feed>
