<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/drivers/md/raid1.c, branch v4.3</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v4.3</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v4.3'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2015-10-24T05:24:22Z</updated>
<entry>
<title>md/raid1: don't clear bitmap bit when bad-block-list write fails.</title>
<updated>2015-10-24T05:24:22Z</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.com</email>
</author>
<published>2015-10-24T05:02:16Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=bd8688a199b864944bf62eebed0ca13b46249453'/>
<id>urn:sha1:bd8688a199b864944bf62eebed0ca13b46249453</id>
<content type='text'>
When a write fails and a bad-block-list is present, we can
update the bad-block-list instead of writing the data.  If
this succeeds then it is OK clear the relevant bitmap-bit as
no further 'sync' of the block is needed.

However if writing the bad-block-list fails then we need to
treat the write as failed and particularly must not clear
the bitmap bit.  Otherwise the device can be re-added (after
any hardware connection issues are resolved) and because the
relevant bit in the bitmap is clear, that block will not be
resynced.  This leads to data corruption.

We already delay the final bio_endio() on the write until
the bad-block-list is written so that when the write
returns: either that data is safe, the bad-block record is
safe, or the fact that the device is faulty is safe.
However we *don't* delay the clearing of the bitmap, so the
bitmap bit can be recorded as cleared before we know if the
bad-block-list was written safely.

So: delay that until the write really is safe.
i.e. move the call to close_write() until just before
calling bio_endio(), and recheck the 'is array degraded'
status before making that call.

This bug goes back to v3.1 when bad-block-lists were
introduced, though it only affects arrays created with
mdadm-3.3 or later as only those have bad-block lists.

Backports will require at least
Commit: 55ce74d4bfe1 ("md/raid1: ensure device failure recorded before write request returns.")
as well.  I'll send that to 'stable' separately.

Note that of the two tests of R1BIO_WriteError that this
patch adds, the first is certain to fail and the second is
certain to succeed.  However doing it this way makes the
patch more obviously correct.  I will tidy the code up in a
future merge window.

Reported-and-tested-by: Nate Dailey &lt;nate.dailey@stratus.com&gt;
Cc: Jes Sorensen &lt;Jes.Sorensen@redhat.com&gt;
Fixes: cd5ff9a16f08 ("md/raid1:  Handle write errors by updating badblock log.")
Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
</content>
</entry>
<entry>
<title>md/raid1: submit_bio_wait() returns 0 on success</title>
<updated>2015-10-20T20:20:15Z</updated>
<author>
<name>Jes Sorensen</name>
<email>Jes.Sorensen@redhat.com</email>
</author>
<published>2015-10-20T16:09:12Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=203d27b0226a05202438ddb39ef0ef1acb14a759'/>
<id>urn:sha1:203d27b0226a05202438ddb39ef0ef1acb14a759</id>
<content type='text'>
This was introduced with 9e882242c6193ae6f416f2d8d8db0d9126bd996b
which changed the return value of submit_bio_wait() to return != 0 on
error, but didn't update the caller accordingly.

Fixes: 9e882242c6 ("block: Add submit_bio_wait(), remove from md")
Cc: stable@vger.kernel.org (v3.10)
Reported-by: Bill Kuzeja &lt;William.Kuzeja@stratus.com&gt;
Signed-off-by: Jes Sorensen &lt;Jes.Sorensen@redhat.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
</content>
</entry>
<entry>
<title>crash in md-raid1 and md-raid10 due to incorrect list manipulation</title>
<updated>2015-10-08T21:33:46Z</updated>
<author>
<name>Mikulas Patocka</name>
<email>mpatocka@redhat.com</email>
</author>
<published>2015-10-01T19:17:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a452744bcbf706eac65abb4c98496a366820c60a'/>
<id>urn:sha1:a452744bcbf706eac65abb4c98496a366820c60a</id>
<content type='text'>
The commit 55ce74d4bfe1b9444436264c637f39a152d1e5ac (md/raid1: ensure
device failure recorded before write request returns) is causing crash in
the LVM2 testsuite test shell/lvchange-raid.sh. For me the crash is 100%
reproducible.

The reason for the crash is that the newly added code in raid1d moves the
list from conf-&gt;bio_end_io_list to tmp, then tests if tmp is non-empty and
then incorrectly pops the bio from conf-&gt;bio_end_io_list (which is empty
because the list was alrady moved).

Raid-10 has a similar bug.

Kernel Fault: Code=15 regs=000000006ccb8640 (Addr=0000000100000000)
CPU: 3 PID: 1930 Comm: mdX_raid1 Not tainted 4.2.0-rc5-bisect+ #35
task: 000000006cc1f258 ti: 000000006ccb8000 task.ti: 000000006ccb8000

     YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
PSW: 00001000000001001111111000001111 Not tainted
r00-03  000000ff0804fe0f 000000001059d000 000000001059f818 000000007f16be38
r04-07  000000001059d000 000000007f16be08 0000000000200200 0000000000000001
r08-11  000000006ccb8260 000000007b7934d0 0000000000000001 0000000000000000
r12-15  000000004056f320 0000000000000000 0000000000013dd0 0000000000000000
r16-19  00000000f0d00ae0 0000000000000000 0000000000000000 0000000000000001
r20-23  000000000800000f 0000000042200390 0000000000000000 0000000000000000
r24-27  0000000000000001 000000000800000f 000000007f16be08 000000001059d000
r28-31  0000000100000000 000000006ccb8560 000000006ccb8640 0000000000000000
sr00-03  0000000000249800 0000000000000000 0000000000000000 0000000000249800
sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000

IASQ: 0000000000000000 0000000000000000 IAOQ: 000000001059f61c 000000001059f620
 IIR: 0f8010c6    ISR: 0000000000000000  IOR: 0000000100000000
 CPU:        3   CR30: 000000006ccb8000 CR31: 0000000000000000
 ORIG_R28: 000000001059d000
 IAOQ[0]: call_bio_endio+0x34/0x1a8 [raid1]
 IAOQ[1]: call_bio_endio+0x38/0x1a8 [raid1]
 RP(r2): raid_end_bio_io+0x88/0x168 [raid1]
Backtrace:
 [&lt;000000001059f818&gt;] raid_end_bio_io+0x88/0x168 [raid1]
 [&lt;00000000105a4f64&gt;] raid1d+0x144/0x1640 [raid1]
 [&lt;000000004017fd5c&gt;] kthread+0x144/0x160

Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Fixes: 55ce74d4bfe1 ("md/raid1: ensure device failure recorded before write request returns.")
Fixes: 95af587e95aa ("md/raid10: ensure device failure recorded before write request returns.")
Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
</content>
</entry>
<entry>
<title>md/raid1: Avoid raid1 resync getting stuck</title>
<updated>2015-10-02T07:23:44Z</updated>
<author>
<name>Jes Sorensen</name>
<email>Jes.Sorensen@redhat.com</email>
</author>
<published>2015-09-16T14:20:05Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e8ff8bf09ff49733534ff3cee91bde030186055f'/>
<id>urn:sha1:e8ff8bf09ff49733534ff3cee91bde030186055f</id>
<content type='text'>
close_sync() needs to set conf-&gt;next_resync to a large, but safe value
below MaxSector and use it to determine whether or not to set
start_next_window in wait_barrier()

Solution suggested by Neil Brown.

Reported-by: Nate Dailey &lt;nate.dailey@stratus.com&gt;
Tested-by: Xiao Ni &lt;xni@redhat.com&gt;
Signed-off-by: Jes Sorensen &lt;Jes.Sorensen@redhat.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
</content>
</entry>
<entry>
<title>md: drop null test before destroy functions</title>
<updated>2015-10-02T07:23:44Z</updated>
<author>
<name>Julia Lawall</name>
<email>Julia.Lawall@lip6.fr</email>
</author>
<published>2015-09-13T12:15:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=644df1a85fc4b0c7a16800f55717261546f4e651'/>
<id>urn:sha1:644df1a85fc4b0c7a16800f55717261546f4e651</id>
<content type='text'>
Remove unneeded NULL test.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// &lt;smpl&gt;
@@ expression x; @@
-if (x != NULL)
  \(kmem_cache_destroy\|mempool_destroy\|dma_pool_destroy\)(x);
// &lt;/smpl&gt;

Signed-off-by: Julia Lawall &lt;Julia.Lawall@lip6.fr&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
</content>
</entry>
<entry>
<title>Merge linux-block/for-4.3/core into md/for-linux</title>
<updated>2015-09-05T09:08:32Z</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.com</email>
</author>
<published>2015-09-05T09:07:04Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e89c6fdf9e0eb1b5a03574d4ca73e83eae8deb91'/>
<id>urn:sha1:e89c6fdf9e0eb1b5a03574d4ca73e83eae8deb91</id>
<content type='text'>
There were a few conflicts that are fairly easy to resolve.

Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'for-4.3/core' of git://git.kernel.dk/linux-block</title>
<updated>2015-09-02T20:10:25Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2015-09-02T20:10:25Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1081230b748de8f03f37f80c53dfa89feda9b8de'/>
<id>urn:sha1:1081230b748de8f03f37f80c53dfa89feda9b8de</id>
<content type='text'>
Pull core block updates from Jens Axboe:
 "This first core part of the block IO changes contains:

   - Cleanup of the bio IO error signaling from Christoph.  We used to
     rely on the uptodate bit and passing around of an error, now we
     store the error in the bio itself.

   - Improvement of the above from myself, by shrinking the bio size
     down again to fit in two cachelines on x86-64.

   - Revert of the max_hw_sectors cap removal from a revision again,
     from Jeff Moyer.  This caused performance regressions in various
     tests.  Reinstate the limit, bump it to a more reasonable size
     instead.

   - Make /sys/block/&lt;dev&gt;/queue/discard_max_bytes writeable, by me.
     Most devices have huge trim limits, which can cause nasty latencies
     when deleting files.  Enable the admin to configure the size down.
     We will look into having a more sane default instead of UINT_MAX
     sectors.

   - Improvement of the SGP gaps logic from Keith Busch.

   - Enable the block core to handle arbitrarily sized bios, which
     enables a nice simplification of bio_add_page() (which is an IO hot
     path).  From Kent.

   - Improvements to the partition io stats accounting, making it
     faster.  From Ming Lei.

   - Also from Ming Lei, a basic fixup for overflow of the sysfs pending
     file in blk-mq, as well as a fix for a blk-mq timeout race
     condition.

   - Ming Lin has been carrying Kents above mentioned patches forward
     for a while, and testing them.  Ming also did a few fixes around
     that.

   - Sasha Levin found and fixed a use-after-free problem introduced by
     the bio-&gt;bi_error changes from Christoph.

   - Small blk cgroup cleanup from Viresh Kumar"

* 'for-4.3/core' of git://git.kernel.dk/linux-block: (26 commits)
  blk: Fix bio_io_vec index when checking bvec gaps
  block: Replace SG_GAPS with new queue limits mask
  block: bump BLK_DEF_MAX_SECTORS to 2560
  Revert "block: remove artifical max_hw_sectors cap"
  blk-mq: fix race between timeout and freeing request
  blk-mq: fix buffer overflow when reading sysfs file of 'pending'
  Documentation: update notes in biovecs about arbitrarily sized bios
  block: remove bio_get_nr_vecs()
  fs: use helper bio_add_page() instead of open coding on bi_io_vec
  block: kill merge_bvec_fn() completely
  md/raid5: get rid of bio_fits_rdev()
  md/raid5: split bio for chunk_aligned_read
  block: remove split code in blkdev_issue_{discard,write_same}
  btrfs: remove bio splitting and merge_bvec_fn() calls
  bcache: remove driver private bio splitting code
  block: simplify bio_add_page()
  block: make generic_make_request handle arbitrarily sized bios
  blk-cgroup: Drop unlikely before IS_ERR(_OR_NULL)
  block: don't access bio-&gt;bi_error after bio_put()
  block: shrink struct bio down to 2 cache lines again
  ...
</content>
</entry>
<entry>
<title>md/raid1: ensure device failure recorded before write request returns.</title>
<updated>2015-08-31T17:43:23Z</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.com</email>
</author>
<published>2015-08-14T01:11:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=55ce74d4bfe1b9444436264c637f39a152d1e5ac'/>
<id>urn:sha1:55ce74d4bfe1b9444436264c637f39a152d1e5ac</id>
<content type='text'>
When a write to one of the legs of a RAID1 fails, the failure is
recorded in the metadata of the other leg(s) so that after a restart
the data on the failed drive wont be trusted even if that drive seems
to be working again  (maybe a cable was unplugged).

Similarly when we record a bad-block in response to a write failure,
we must not let the write complete until the bad-block update is safe.

Currently there is no interlock between the write request completing
and the metadata update.  So it is possible that the write will
complete, the app will confirm success in some way, and then the
machine will crash before the metadata update completes.

This is an extremely small hole for a racy to fit in, but it is
theoretically possible and so should be closed.

So:
 - set MD_CHANGE_PENDING when requesting a metadata update for a
   failed device, so we can know with certainty when it completes
 - queue requests that experienced an error on a new queue which
   is only processed after the metadata update completes
 - call raid_end_bio_io() on bios in that queue when the time comes.

Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
</content>
</entry>
<entry>
<title>md: close some races between setting and checking sync_action.</title>
<updated>2015-08-31T17:30:40Z</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.com</email>
</author>
<published>2015-07-06T02:26:57Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=985ca973b68cac0adfa83497db231da7f99c6ed9'/>
<id>urn:sha1:985ca973b68cac0adfa83497db231da7f99c6ed9</id>
<content type='text'>
When checking sync_action in a script, we want to be sure it is
as accurate as possible.
As resync/reshape etc doesn't always start immediately (a separate
thread is scheduled to do it), it is best if 'action_show'
checks if MD_RECOVER_NEEDED is set (which it does) and in that
case reports what is likely to start soon (which it only sometimes
does).

So:
 - report 'reshape' if reshape_position suggests one might start.
 - set MD_RECOVERY_RECOVER in raid1_reshape(), because that is very
   likely to happen next.

Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
</content>
</entry>
<entry>
<title>block: kill merge_bvec_fn() completely</title>
<updated>2015-08-13T18:31:57Z</updated>
<author>
<name>Kent Overstreet</name>
<email>kent.overstreet@gmail.com</email>
</author>
<published>2015-04-28T06:48:34Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8ae126660fddbeebb9251a174e6fa45b6ad8f932'/>
<id>urn:sha1:8ae126660fddbeebb9251a174e6fa45b6ad8f932</id>
<content type='text'>
As generic_make_request() is now able to handle arbitrarily sized bios,
it's no longer necessary for each individual block driver to define its
own -&gt;merge_bvec_fn() callback. Remove every invocation completely.

Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Lars Ellenberg &lt;drbd-dev@lists.linbit.com&gt;
Cc: drbd-user@lists.linbit.com
Cc: Jiri Kosina &lt;jkosina@suse.cz&gt;
Cc: Yehuda Sadeh &lt;yehuda@inktank.com&gt;
Cc: Sage Weil &lt;sage@inktank.com&gt;
Cc: Alex Elder &lt;elder@kernel.org&gt;
Cc: ceph-devel@vger.kernel.org
Cc: Alasdair Kergon &lt;agk@redhat.com&gt;
Cc: Mike Snitzer &lt;snitzer@redhat.com&gt;
Cc: dm-devel@redhat.com
Cc: Neil Brown &lt;neilb@suse.de&gt;
Cc: linux-raid@vger.kernel.org
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Cc: "Martin K. Petersen" &lt;martin.petersen@oracle.com&gt;
Acked-by: NeilBrown &lt;neilb@suse.de&gt; (for the 'md' bits)
Acked-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Kent Overstreet &lt;kent.overstreet@gmail.com&gt;
[dpark: also remove -&gt;merge_bvec_fn() in dm-thin as well as
 dm-era-target, and resolve merge conflicts]
Signed-off-by: Dongsu Park &lt;dpark@posteo.net&gt;
Signed-off-by: Ming Lin &lt;ming.l@ssi.samsung.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
</feed>
