<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/drivers/md, branch v4.3</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v4.3</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v4.3'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2015-11-01T04:20:49Z</updated>
<entry>
<title>Merge tag 'md/4.3-rc7-fixes' of git://neil.brown.name/md</title>
<updated>2015-11-01T04:20:49Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2015-11-01T04:20:49Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=af7eba0158e9b4632dcd94c1cd4100689666e14f'/>
<id>urn:sha1:af7eba0158e9b4632dcd94c1cd4100689666e14f</id>
<content type='text'>
Pull md bug fixes from Neil Brown:
 "Two more bug fixes for md.

  One bugfix for a list corruption in raid5 because of incorrect
  locking.

  Other for possible data corruption when a recovering device is failed,
  removed, and re-added.

  Both tagged for -stable"

* tag 'md/4.3-rc7-fixes' of git://neil.brown.name/md:
  Revert "md: allow a partially recovered device to be hot-added to an array."
  md/raid5: fix locking in handle_stripe_clean_event()
</content>
</entry>
<entry>
<title>Revert "md: allow a partially recovered device to be hot-added to an array."</title>
<updated>2015-10-31T00:00:56Z</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.com</email>
</author>
<published>2015-10-31T00:00:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d01552a76d71f9879af448e9142389ee9be6e95b'/>
<id>urn:sha1:d01552a76d71f9879af448e9142389ee9be6e95b</id>
<content type='text'>
This reverts commit 7eb418851f3278de67126ea0c427641ab4792c57.

This commit is poorly justified, I can find not discusison in email,
and it clearly causes a problem.

If a device which is being recovered fails and is subsequently
re-added to an array, there could easily have been changes to the
array *before* the point where the recovery was up to.  So the
recovery must start again from the beginning.

If a spare is being recovered and fails, then when it is re-added we
really should do a bitmap-based recovery up to the recovery-offset,
and then a full recovery from there.  Before this reversion, we only
did the "full recovery from there" which is not corect.  After this
reversion with will do a full recovery from the start, which is safer
but not ideal.

It will be left to a future patch to arrange the two different styles
of recovery.

Reported-and-tested-by: Nate Dailey &lt;nate.dailey@stratus.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
Cc: stable@vger.kernel.org (3.14+)
Fixes: 7eb418851f32 ("md: allow a partially recovered device to be hot-added to an array.")
</content>
</entry>
<entry>
<title>md/raid5: fix locking in handle_stripe_clean_event()</title>
<updated>2015-10-30T23:53:50Z</updated>
<author>
<name>Roman Gushchin</name>
<email>klamm@yandex-team.ru</email>
</author>
<published>2015-10-30T23:53:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b8a9d66d043ffac116100775a469f05f5158c16f'/>
<id>urn:sha1:b8a9d66d043ffac116100775a469f05f5158c16f</id>
<content type='text'>
After commit 566c09c53455 ("raid5: relieve lock contention in get_active_stripe()")
__find_stripe() is called under conf-&gt;hash_locks + hash.
But handle_stripe_clean_event() calls remove_hash() under
conf-&gt;device_lock.

Under some cirscumstances the hash chain can be circuited,
and we get an infinite loop with disabled interrupts and locked hash
lock in __find_stripe(). This leads to hard lockup on multiple CPUs
and following system crash.

I was able to reproduce this behavior on raid6 over 6 ssd disks.
The devices_handle_discard_safely option should be set to enable trim
support. The following script was used:

for i in `seq 1 32`; do
    dd if=/dev/zero of=large$i bs=10M count=100 &amp;
done

neilb: original was against a 3.x kernel.  I forward-ported
  to 4.3-rc.  This verison is suitable for any kernel since
  Commit: 59fc630b8b5f ("RAID5: batch adjacent full stripe write")
  (v4.1+).  I'll post a version for earlier kernels to stable.

Signed-off-by: Roman Gushchin &lt;klamm@yandex-team.ru&gt;
Fixes: 566c09c53455 ("raid5: relieve lock contention in get_active_stripe()")
Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
Cc: Shaohua Li &lt;shli@kernel.org&gt;
Cc: &lt;stable@vger.kernel.org&gt; # 3.13 - 4.2
</content>
</entry>
<entry>
<title>Merge tag 'md/4.3-rc6-fixes' of git://neil.brown.name/md</title>
<updated>2015-10-26T22:41:48Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2015-10-26T22:41:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ce6f9886037f5566cb8e440b9caa5e7d7334e53b'/>
<id>urn:sha1:ce6f9886037f5566cb8e440b9caa5e7d7334e53b</id>
<content type='text'>
Pull md fixes from Neil Brown:
 "Some raid1/raid10 fixes.

  I meant to get this to you before -rc7, but what with all the travel
  plans..

  Two fixes for bugs that are in both raid1 and raid10.  Both related to
  bad-block-lists and at least one needs to be back ported to 3.1.

  Also a revision for the "new" layout in raid10.  This "new" code
  (which aims to improve robustness) actually reduces robustness in some
  cases.  It probably isn't in use at all as not public user-space code
  makes use of these new layouts.  However just in case someone has
  their own code, it would be good to get the WARNing out for them
  sooner"

* tag 'md/4.3-rc6-fixes' of git://neil.brown.name/md:
  md/raid10: fix the 'new' raid10 layout to work correctly.
  md/raid10: don't clear bitmap bit when bad-block-list write fails.
  md/raid1: don't clear bitmap bit when bad-block-list write fails.
  md/raid10: submit_bio_wait() returns 0 on success
  md/raid1: submit_bio_wait() returns 0 on success
</content>
</entry>
<entry>
<title>md/raid10: fix the 'new' raid10 layout to work correctly.</title>
<updated>2015-10-24T05:24:25Z</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.com</email>
</author>
<published>2015-10-22T02:20:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8bce6d35b308d73cdb2ee273c95d711a55be688c'/>
<id>urn:sha1:8bce6d35b308d73cdb2ee273c95d711a55be688c</id>
<content type='text'>
In Linux 3.9 we introduce a new 'far' layout for RAID10 which was
supposed to rotate the replicas differently and so provide better
resilience.  In particular it could survive more combinations of 2
drive failures.

Unfortunately. due to a coding error, this some did what was wanted,
sometimes improved less than we hoped, and sometimes - in very
unlikely circumstances - put multiple replicas on the same device so
the redundancy was harmed.

No public user-space tool has created arrays using this layout so it
is very unlikely that zero-redundancy arrays actually exist.  Probably
no arrays using any form of the new layout exist.  But we cannot be
certain.

So use another bit in the 'layout' number and introduce a bug-fixed
version of the layout.
Also when assembling an array, if it has a zero-redundancy layout,
give a warning.

Reported-by: Heinz Mauelshagen &lt;heinzm@redhat.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
</content>
</entry>
<entry>
<title>md/raid10: don't clear bitmap bit when bad-block-list write fails.</title>
<updated>2015-10-24T05:24:23Z</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.com</email>
</author>
<published>2015-10-24T05:23:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c340702ca26a628832fade4f133d8160a55c29cc'/>
<id>urn:sha1:c340702ca26a628832fade4f133d8160a55c29cc</id>
<content type='text'>
When a write fails and a bad-block-list is present, we can
update the bad-block-list instead of writing the data.  If
this succeeds then it is OK clear the relevant bitmap-bit as
no further 'sync' of the block is needed.

However if writing the bad-block-list fails then we need to
treat the write as failed and particularly must not clear
the bitmap bit.  Otherwise the device can be re-added (after
any hardware connection issues are resolved) and because the
relevant bit in the bitmap is clear, that block will not be
resynced.  This leads to data corruption.

We already delay the final bio_endio() on the write until
the bad-block-list is written so that when the write
returns: either that data is safe, the bad-block record is
safe, or the fact that the device is faulty is safe.
However we *don't* delay the clearing of the bitmap, so the
bitmap bit can be recorded as cleared before we know if the
bad-block-list was written safely.

So: delay that until the write really is safe.
i.e. move the call to close_write() until just before
calling bio_endio(), and recheck the 'is array degraded'
status before making that call.

This bug goes back to v3.1 when bad-block-lists were
introduced, though it only affects arrays created with
mdadm-3.3 or later as only those have bad-block lists.

Backports will require at least
Commit: 95af587e95aa ("md/raid10: ensure device failure recorded before write request returns.")
as well.  I'll send that to 'stable' separately.

Note that of the two tests of R10BIO_WriteError that this
patch adds, the first is certain to fail and the second is
certain to succeed.  However doing it this way makes the
patch more obviously correct.  I will tidy the code up in a
future merge window.

Reported-by: Nate Dailey &lt;nate.dailey@stratus.com&gt;
Fixes: bd870a16c594 ("md/raid10:  Handle write errors by updating badblock log.")
Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
</content>
</entry>
<entry>
<title>md/raid1: don't clear bitmap bit when bad-block-list write fails.</title>
<updated>2015-10-24T05:24:22Z</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.com</email>
</author>
<published>2015-10-24T05:02:16Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=bd8688a199b864944bf62eebed0ca13b46249453'/>
<id>urn:sha1:bd8688a199b864944bf62eebed0ca13b46249453</id>
<content type='text'>
When a write fails and a bad-block-list is present, we can
update the bad-block-list instead of writing the data.  If
this succeeds then it is OK clear the relevant bitmap-bit as
no further 'sync' of the block is needed.

However if writing the bad-block-list fails then we need to
treat the write as failed and particularly must not clear
the bitmap bit.  Otherwise the device can be re-added (after
any hardware connection issues are resolved) and because the
relevant bit in the bitmap is clear, that block will not be
resynced.  This leads to data corruption.

We already delay the final bio_endio() on the write until
the bad-block-list is written so that when the write
returns: either that data is safe, the bad-block record is
safe, or the fact that the device is faulty is safe.
However we *don't* delay the clearing of the bitmap, so the
bitmap bit can be recorded as cleared before we know if the
bad-block-list was written safely.

So: delay that until the write really is safe.
i.e. move the call to close_write() until just before
calling bio_endio(), and recheck the 'is array degraded'
status before making that call.

This bug goes back to v3.1 when bad-block-lists were
introduced, though it only affects arrays created with
mdadm-3.3 or later as only those have bad-block lists.

Backports will require at least
Commit: 55ce74d4bfe1 ("md/raid1: ensure device failure recorded before write request returns.")
as well.  I'll send that to 'stable' separately.

Note that of the two tests of R1BIO_WriteError that this
patch adds, the first is certain to fail and the second is
certain to succeed.  However doing it this way makes the
patch more obviously correct.  I will tidy the code up in a
future merge window.

Reported-and-tested-by: Nate Dailey &lt;nate.dailey@stratus.com&gt;
Cc: Jes Sorensen &lt;Jes.Sorensen@redhat.com&gt;
Fixes: cd5ff9a16f08 ("md/raid1:  Handle write errors by updating badblock log.")
Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
</content>
</entry>
<entry>
<title>dm cache: the CLEAN_SHUTDOWN flag was not being set</title>
<updated>2015-10-23T18:02:56Z</updated>
<author>
<name>Joe Thornber</name>
<email>ejt@redhat.com</email>
</author>
<published>2015-10-22T17:10:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3201ac452e84a8a368197d648c9b7011e061804a'/>
<id>urn:sha1:3201ac452e84a8a368197d648c9b7011e061804a</id>
<content type='text'>
If the CLEAN_SHUTDOWN flag is not set when a cache is loaded then all cache
blocks are marked as dirty and a full writeback occurs.

__commit_transaction() is responsible for setting/clearing
CLEAN_SHUTDOWN (based the flags_mutator that is passed in).

Fix this issue, of the cache's on-disk flags being wrong, by making sure
__commit_transaction() does not reset the flags after the mutator has
altered the flags in preparation for them being serialized to disk.

before:

sb_flags = mutator(le32_to_cpu(disk_super-&gt;flags));
disk_super-&gt;flags = cpu_to_le32(sb_flags);
disk_super-&gt;flags = cpu_to_le32(cmd-&gt;flags);

after:

disk_super-&gt;flags = cpu_to_le32(cmd-&gt;flags);
sb_flags = mutator(le32_to_cpu(disk_super-&gt;flags));
disk_super-&gt;flags = cpu_to_le32(sb_flags);

Reported-by: Bogdan Vasiliev &lt;bogdan.vasiliev@gmail.com&gt;
Signed-off-by: Joe Thornber &lt;ejt@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Cc: stable@vger.kernel.org
</content>
</entry>
<entry>
<title>dm btree: fix leak of bufio-backed block in btree_split_beneath error path</title>
<updated>2015-10-23T18:02:55Z</updated>
<author>
<name>Mike Snitzer</name>
<email>snitzer@redhat.com</email>
</author>
<published>2015-10-22T14:56:40Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4dcb8b57df3593dcb20481d9d6cf79d1dc1534be'/>
<id>urn:sha1:4dcb8b57df3593dcb20481d9d6cf79d1dc1534be</id>
<content type='text'>
btree_split_beneath()'s error path had an outstanding FIXME that speaks
directly to the potential for _not_ cleaning up a previously allocated
bufio-backed block.

Fix this by releasing the previously allocated bufio block using
unlock_block().

Reported-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Acked-by: Joe Thornber &lt;thornber@redhat.com&gt;
Cc: stable@vger.kernel.org
</content>
</entry>
<entry>
<title>dm btree remove: fix a bug when rebalancing nodes after removal</title>
<updated>2015-10-23T18:02:55Z</updated>
<author>
<name>Joe Thornber</name>
<email>ejt@redhat.com</email>
</author>
<published>2015-10-21T17:36:49Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2871c69e025e8bc507651d5a9cf81a8a7da9d24b'/>
<id>urn:sha1:2871c69e025e8bc507651d5a9cf81a8a7da9d24b</id>
<content type='text'>
Commit 4c7e309340ff ("dm btree remove: fix bug in redistribute3") wasn't
a complete fix for redistribute3().

The redistribute3 function takes 3 btree nodes and shares out the entries
evenly between them.  If the three nodes in total contained
(MAX_ENTRIES * 3) - 1 entries between them then this was erroneously getting
rebalanced as (MAX_ENTRIES - 1) on the left and right, and (MAX_ENTRIES + 1) in
the center.

Fix this issue by being more careful about calculating the target number
of entries for the left and right nodes.

Unit tested in userspace using this program:
https://github.com/jthornber/redistribute3-test/blob/master/redistribute3_t.c

Signed-off-by: Joe Thornber &lt;ejt@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Cc: stable@vger.kernel.org
</content>
</entry>
</feed>
