<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/drivers/md/raid10.c, branch v4.5</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v4.5</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v4.5'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2016-01-20T21:52:20Z</updated>
<entry>
<title>MD: rename some functions</title>
<updated>2016-01-20T21:52:20Z</updated>
<author>
<name>Shaohua Li</name>
<email>shli@fb.com</email>
</author>
<published>2016-01-20T21:52:20Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=849674e4fb175e47b7504249f7367367b18fe6a1'/>
<id>urn:sha1:849674e4fb175e47b7504249f7367367b18fe6a1</id>
<content type='text'>
These short function names are hard to search. Rename them to make vim happy.

Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
</entry>
<entry>
<title>md/raid: only permit hot-add of compatible integrity profiles</title>
<updated>2016-01-14T00:49:57Z</updated>
<author>
<name>Dan Williams</name>
<email>dan.j.williams@intel.com</email>
</author>
<published>2016-01-14T00:00:07Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1501efadc524a0c99494b576923091589a52d2a4'/>
<id>urn:sha1:1501efadc524a0c99494b576923091589a52d2a4</id>
<content type='text'>
It is not safe for an integrity profile to be changed while i/o is
in-flight in the queue.  Prevent adding new disks or otherwise online
spares to an array if the device has an incompatible integrity profile.

The original change to the blk_integrity_unregister implementation in
md, commit c7bfced9a671 "md: suspend i/o during runtime
blk_integrity_unregister" introduced an immediate hang regression.

This policy of disallowing changes to the integrity profile once one has
been established is shared with DM.

Here is an abbreviated log from a test run that:
1/ Creates a degraded raid1 with an integrity-enabled device (pmem0s) [   59.076127]
2/ Tries to add an integrity-disabled device (pmem1m) [   90.489209]
3/ Retries with an integrity-enabled device (pmem1s) [  205.671277]

[   59.076127] md/raid1:md0: active with 1 out of 2 mirrors
[   59.078302] md: data integrity enabled on md0
[..]
[   90.489209] md0: incompatible integrity profile for pmem1m
[..]
[  205.671277] md: super_written gets error=-5
[  205.677386] md/raid1:md0: Disk failure on pmem1m, disabling device.
[  205.677386] md/raid1:md0: Operation continuing on 1 devices.
[  205.683037] RAID1 conf printout:
[  205.684699]  --- wd:1 rd:2
[  205.685972]  disk 0, wo:0, o:1, dev:pmem0s
[  205.687562]  disk 1, wo:1, o:1, dev:pmem1s
[  205.691717] md: recovery of RAID array md0

Fixes: c7bfced9a671 ("md: suspend i/o during runtime blk_integrity_unregister")
Cc: &lt;stable@vger.kernel.org&gt;
Cc: Mike Snitzer &lt;snitzer@redhat.com&gt;
Reported-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
</content>
</entry>
<entry>
<title>md/raid10: fix data corruption and crash during resync</title>
<updated>2015-12-18T04:19:16Z</updated>
<author>
<name>Artur Paszkiewicz</name>
<email>artur.paszkiewicz@intel.com</email>
</author>
<published>2015-12-18T04:19:16Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=cc57858831e3e9678291de730c4b4d2e52a19f59'/>
<id>urn:sha1:cc57858831e3e9678291de730c4b4d2e52a19f59</id>
<content type='text'>
The commit c31df25f20e3 ("md/raid10: make sync_request_write() call
bio_copy_data()") replaced manual data copying with bio_copy_data() but
it doesn't work as intended. The source bio (fbio) is already processed,
so its bvec_iter has bi_size == 0 and bi_idx == bi_vcnt.  Because of
this, bio_copy_data() either does not copy anything, or worse, copies
data from the -&gt;bi_next bio if it is set.  This causes wrong data to be
written to drives during resync and sometimes lockups/crashes in
bio_copy_data():

[  517.338478] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [md126_raid10:3319]
[  517.347324] Modules linked in: raid10 xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter ip_tables x86_pkg_temp_thermal coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul cryptd shpchp pcspkr ipmi_si ipmi_msghandler tpm_crb acpi_power_meter acpi_cpufreq ext4 mbcache jbd2 sr_mod cdrom sd_mod e1000e ax88179_178a usbnet mii ahci ata_generic crc32c_intel libahci ptp pata_acpi libata pps_core wmi sunrpc dm_mirror dm_region_hash dm_log dm_mod
[  517.440555] CPU: 0 PID: 3319 Comm: md126_raid10 Not tainted 4.3.0-rc6+ #1
[  517.448384] Hardware name: Intel Corporation PURLEY/PURLEY, BIOS PLYDCRB1.86B.0055.D14.1509221924 09/22/2015
[  517.459768] task: ffff880153773980 ti: ffff880150df8000 task.ti: ffff880150df8000
[  517.468529] RIP: 0010:[&lt;ffffffff812e1888&gt;]  [&lt;ffffffff812e1888&gt;] bio_copy_data+0xc8/0x3c0
[  517.478164] RSP: 0018:ffff880150dfbc98  EFLAGS: 00000246
[  517.484341] RAX: ffff880169356688 RBX: 0000000000001000 RCX: 0000000000000000
[  517.492558] RDX: 0000000000000000 RSI: ffffea0001ac2980 RDI: ffffea0000d835c0
[  517.500773] RBP: ffff880150dfbd08 R08: 0000000000000001 R09: ffff880153773980
[  517.508987] R10: ffff880169356600 R11: 0000000000001000 R12: 0000000000010000
[  517.517199] R13: 000000000000e000 R14: 0000000000000000 R15: 0000000000001000
[  517.525412] FS:  0000000000000000(0000) GS:ffff880174a00000(0000) knlGS:0000000000000000
[  517.534844] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  517.541507] CR2: 00007f8a044d5fed CR3: 0000000169504000 CR4: 00000000001406f0
[  517.549722] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  517.557929] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  517.566144] Stack:
[  517.568626]  ffff880174a16bc0 ffff880153773980 ffff880169356600 0000000000000000
[  517.577659]  0000000000000001 0000000000000001 ffff880153773980 ffff88016a61a800
[  517.586715]  ffff880150dfbcf8 0000000000000001 ffff88016dd209e0 0000000000001000
[  517.595773] Call Trace:
[  517.598747]  [&lt;ffffffffa043ef95&gt;] raid10d+0xfc5/0x1690 [raid10]
[  517.605610]  [&lt;ffffffff816697ae&gt;] ? __schedule+0x29e/0x8e2
[  517.611987]  [&lt;ffffffff814ff206&gt;] md_thread+0x106/0x140
[  517.618072]  [&lt;ffffffff810c1d80&gt;] ? wait_woken+0x80/0x80
[  517.624252]  [&lt;ffffffff814ff100&gt;] ? super_1_load+0x520/0x520
[  517.630817]  [&lt;ffffffff8109ef89&gt;] kthread+0xc9/0xe0
[  517.636506]  [&lt;ffffffff8109eec0&gt;] ? flush_kthread_worker+0x70/0x70
[  517.643653]  [&lt;ffffffff8166d99f&gt;] ret_from_fork+0x3f/0x70
[  517.649929]  [&lt;ffffffff8109eec0&gt;] ? flush_kthread_worker+0x70/0x70

Signed-off-by: Artur Paszkiewicz &lt;artur.paszkiewicz@intel.com&gt;
Reviewed-by: Shaohua Li &lt;shli@kernel.org&gt;
Cc: stable@vger.kernel.org (v4.2+)
Fixes: c31df25f20e3 ("md/raid10: make sync_request_write() call bio_copy_data()")
Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
</content>
</entry>
<entry>
<title>Merge tag 'md/4.4' of git://neil.brown.name/md</title>
<updated>2015-11-05T05:12:47Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2015-11-05T05:12:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ac322de6bf5416cb145b58599297b8be73cd86ac'/>
<id>urn:sha1:ac322de6bf5416cb145b58599297b8be73cd86ac</id>
<content type='text'>
Pull md updates from Neil Brown:
 "Two major components to this update.

   1) The clustered-raid1 support from SUSE is nearly complete.  There
      are a few outstanding issues being worked on.  Maybe half a dozen
      patches will bring this to a usable state.

   2) The first stage of journalled-raid5 support from Facebook makes an
      appearance.  With a journal device configured (typically NVRAM or
      SSD), the "RAID5 write hole" should be closed - a crash during
      degraded operations cannot result in data corruption.

      The next stage will be to use the journal as a write-behind cache
      so that latency can be reduced and in some cases throughput
      increased by performing more full-stripe writes.

* tag 'md/4.4' of git://neil.brown.name/md: (66 commits)
  MD: when RAID journal is missing/faulty, block RESTART_ARRAY_RW
  MD: set journal disk -&gt;raid_disk
  MD: kick out journal disk if it's not fresh
  raid5-cache: start raid5 readonly if journal is missing
  MD: add new bit to indicate raid array with journal
  raid5-cache: IO error handling
  raid5: journal disk can't be removed
  raid5-cache: add trim support for log
  MD: fix info output for journal disk
  raid5-cache: use bio chaining
  raid5-cache: small log-&gt;seq cleanup
  raid5-cache: new helper: r5_reserve_log_entry
  raid5-cache: inline r5l_alloc_io_unit into r5l_new_meta
  raid5-cache: take rdev-&gt;data_offset into account early on
  raid5-cache: refactor bio allocation
  raid5-cache: clean up r5l_get_meta
  raid5-cache: simplify state machine when caches flushes are not needed
  raid5-cache: factor out a helper to run all stripes for an I/O unit
  raid5-cache: rename flushed_ios to finished_ios
  raid5-cache: free I/O units earlier
  ...
</content>
</entry>
<entry>
<title>Merge branch 'for-4.4/integrity' of git://git.kernel.dk/linux-block</title>
<updated>2015-11-05T04:51:48Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2015-11-05T04:51:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=527d1529e38b36fd22e65711b653ab773179d9e8'/>
<id>urn:sha1:527d1529e38b36fd22e65711b653ab773179d9e8</id>
<content type='text'>
Pull block integrity updates from Jens Axboe:
 "This is the joint work of Dan and Martin, cleaning up and improving
  the support for block data integrity"

* 'for-4.4/integrity' of git://git.kernel.dk/linux-block:
  block, libnvdimm, nvme: provide a built-in blk_integrity nop profile
  block: blk_flush_integrity() for bio-based drivers
  block: move blk_integrity to request_queue
  block: generic request_queue reference counting
  nvme: suspend i/o during runtime blk_integrity_unregister
  md: suspend i/o during runtime blk_integrity_unregister
  md, dm, scsi, nvme, libnvdimm: drop blk_integrity_unregister() at shutdown
  block: Inline blk_integrity in struct gendisk
  block: Export integrity data interval size in sysfs
  block: Reduce the size of struct blk_integrity
  block: Consolidate static integrity profile properties
  block: Move integrity kobject to struct gendisk
</content>
</entry>
<entry>
<title>md/raid10: fix the 'new' raid10 layout to work correctly.</title>
<updated>2015-10-24T05:24:25Z</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.com</email>
</author>
<published>2015-10-22T02:20:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8bce6d35b308d73cdb2ee273c95d711a55be688c'/>
<id>urn:sha1:8bce6d35b308d73cdb2ee273c95d711a55be688c</id>
<content type='text'>
In Linux 3.9 we introduced a new 'far' layout for RAID10 which was
supposed to rotate the replicas differently and so provide better
resilience.  In particular it could survive more combinations of 2
drive failures.

Unfortunately, due to a coding error, this sometimes did what was wanted,
sometimes improved less than we hoped, and sometimes - in very
unlikely circumstances - put multiple replicas on the same device so
the redundancy was harmed.

No public user-space tool has created arrays using this layout so it
is very unlikely that zero-redundancy arrays actually exist.  Probably
no arrays using any form of the new layout exist.  But we cannot be
certain.

So use another bit in the 'layout' number and introduce a bug-fixed
version of the layout.
Also when assembling an array, if it has a zero-redundancy layout,
give a warning.

Reported-by: Heinz Mauelshagen &lt;heinzm@redhat.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
</content>
</entry>
<entry>
<title>md/raid10: don't clear bitmap bit when bad-block-list write fails.</title>
<updated>2015-10-24T05:24:23Z</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.com</email>
</author>
<published>2015-10-24T05:23:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c340702ca26a628832fade4f133d8160a55c29cc'/>
<id>urn:sha1:c340702ca26a628832fade4f133d8160a55c29cc</id>
<content type='text'>
When a write fails and a bad-block-list is present, we can
update the bad-block-list instead of writing the data.  If
this succeeds then it is OK to clear the relevant bitmap-bit as
no further 'sync' of the block is needed.

However if writing the bad-block-list fails then we need to
treat the write as failed and particularly must not clear
the bitmap bit.  Otherwise the device can be re-added (after
any hardware connection issues are resolved) and because the
relevant bit in the bitmap is clear, that block will not be
resynced.  This leads to data corruption.

We already delay the final bio_endio() on the write until
the bad-block-list is written so that when the write
returns: either that data is safe, the bad-block record is
safe, or the fact that the device is faulty is safe.
However we *don't* delay the clearing of the bitmap, so the
bitmap bit can be recorded as cleared before we know if the
bad-block-list was written safely.

So: delay that until the write really is safe.
i.e. move the call to close_write() until just before
calling bio_endio(), and recheck the 'is array degraded'
status before making that call.
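
The ordering problem can be sketched with a small Python model.  This
is only an illustration, not the raid10 code: the function name, its
parameters and the use of a set as the "bitmap" are all assumptions,
and the 'is array degraded' recheck is left out.

```python
def finish_failed_write(bitmap, block, badblocks_written, defer_clear):
    """Hypothetical model of completing a write handled via the bad-block list.

    bitmap: set of block numbers that still need a resync.
    badblocks_written: whether the bad-block-list update was written safely.
    defer_clear: True models the fixed ordering, False the buggy one.
    """
    if not defer_clear:
        # Buggy ordering: the bit is cleared before the outcome is known.
        bitmap.discard(block)
        return bitmap
    if badblocks_written:
        # Fixed ordering: clear only once the bad-block record is safe.
        bitmap.discard(block)
    return bitmap
```

With the buggy ordering a failed bad-block-list write still leaves the
set empty, so the block would be skipped by a later resync; with the
deferred ordering the block stays marked.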

This bug goes back to v3.1 when bad-block-lists were
introduced, though it only affects arrays created with
mdadm-3.3 or later as only those have bad-block lists.

Backports will require at least
Commit: 95af587e95aa ("md/raid10: ensure device failure recorded before write request returns.")
as well.  I'll send that to 'stable' separately.

Note that of the two tests of R10BIO_WriteError that this
patch adds, the first is certain to fail and the second is
certain to succeed.  However doing it this way makes the
patch more obviously correct.  I will tidy the code up in a
future merge window.

Reported-by: Nate Dailey &lt;nate.dailey@stratus.com&gt;
Fixes: bd870a16c594 ("md/raid10:  Handle write errors by updating badblock log.")
Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
</content>
</entry>
<entry>
<title>md: suspend i/o during runtime blk_integrity_unregister</title>
<updated>2015-10-21T20:43:38Z</updated>
<author>
<name>Dan Williams</name>
<email>dan.j.williams@intel.com</email>
</author>
<published>2015-10-21T17:20:02Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c7bfced9a6716ff66c9d61f934bb60af08d4688c'/>
<id>urn:sha1:c7bfced9a6716ff66c9d61f934bb60af08d4688c</id>
<content type='text'>
Synchronize pending i/o against a change in the integrity profile to
avoid the possibility of spurious integrity errors.  Given linear_add()
is suspending the mddev before manipulating the mddev, do the same for
the other personalities.

Acked-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
<entry>
<title>md/raid10: submit_bio_wait() returns 0 on success</title>
<updated>2015-10-20T20:24:29Z</updated>
<author>
<name>Jes Sorensen</name>
<email>Jes.Sorensen@redhat.com</email>
</author>
<published>2015-10-20T16:09:13Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=681ab4696062f5aa939c9e04d058732306a97176'/>
<id>urn:sha1:681ab4696062f5aa939c9e04d058732306a97176</id>
<content type='text'>
This was introduced with 9e882242c6193ae6f416f2d8d8db0d9126bd996b
which changed the return value of submit_bio_wait() to return != 0 on
error, but didn't update the caller accordingly.
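
As a rough illustration of this class of bug, here is a Python model
(not the kernel code; submit_bio_wait_model and both caller functions
are hypothetical names, and -5 merely stands in for an errno-style
error):

```python
def submit_bio_wait_model(io_ok):
    """Hypothetical stand-in: 0 on success, a negative value on error."""
    return 0 if io_ok else -5


def stale_caller(io_ok):
    # Old convention: a nonzero return value was taken to mean success.
    return bool(submit_bio_wait_model(io_ok))


def fixed_caller(io_ok):
    # New convention: 0 means success, anything else is an error.
    return submit_bio_wait_model(io_ok) == 0
```

The stale caller reports every successful write as a failure and
every failed write as a success; the fixed caller does not.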

Fixes: 9e882242c6 ("block: Add submit_bio_wait(), remove from md")
Cc: stable@vger.kernel.org (v3.10)
Reported-by: Bill Kuzeja &lt;William.Kuzeja@stratus.com&gt;
Signed-off-by: Jes Sorensen &lt;Jes.Sorensen@redhat.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'md-next' of git://github.com/goldwynr/linux into for-next</title>
<updated>2015-10-13T20:09:52Z</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.com</email>
</author>
<published>2015-10-13T20:09:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c2a06c38d92d044a69a3eae0138ab95ff0788030'/>
<id>urn:sha1:c2a06c38d92d044a69a3eae0138ab95ff0788030</id>
<content type='text'>
md-cluster: A better way for METADATA_UPDATED processing

The processing of METADATA_UPDATED message is too simple and prone to
errors. Besides, it would not update the internal data structures as
required.

This set of patches reads the superblock from one of the devices of the MD
and checks for changes in the in-memory data structures. If there is a change,
it performs the necessary actions to keep the internal data structures
as they would be on the primary node.

An example is if a device turns faulty. The algorithm is:

1. The initiator node marks the device as faulty and updates the superblock
2. The initiator node sends METADATA_UPDATED with an advisory device number to the rest of the nodes.
3. The receiving node on receiving the METADATA_UPDATED message
  3.1 Reads the superblock
  3.2 Detects a device has failed by comparing with memory structure
  3.3 Calls the necessary functions to record the failure and get the device out of the active array.
  3.4 Acknowledges the message.
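
The receiving side of the steps above can be sketched in Python.  This
is a simplified model under stated assumptions, not the md-cluster
implementation: handle_metadata_updated, the set of faulty device names
and the state dictionary are all hypothetical.

```python
def handle_metadata_updated(on_disk_faulty, in_memory):
    """Hypothetical receiver-side model of steps 3.1 to 3.3.

    on_disk_faulty: device names marked faulty in the superblock just read.
    in_memory: dict mapping device name to "active" or "faulty".
    Returns the devices newly recorded as failed; the caller would then
    acknowledge the message (step 3.4).
    """
    newly_failed = []
    for dev in sorted(in_memory):
        # Step 3.2: compare the superblock state with the in-memory structure.
        if dev in on_disk_faulty and in_memory[dev] == "active":
            # Step 3.3: record the failure and drop the device from the active set.
            in_memory[dev] = "faulty"
            newly_failed.append(dev)
    return newly_failed
```

Handling the same message twice is harmless in this model, because
the second comparison finds no difference.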

The patch series also fixes adding the disk which was impacted because of
the changes.

Patches can also be found at
https://github.com/goldwynr/linux branch md-next

Changes since V2:
 - Fix status synchronization after --add and --re-add operations
 - Included Guoqing's patches on endian correctness, zeroing cmsg etc
 - Restructure add_new_disk() and cancel()
</content>
</entry>
</feed>
