linux/drivers/md, branch v3.12

raid5: avoid finding "discard" stripe

2013-10-24T02:00:24Z

SCSI discard will damage discard stripe bio setting, eg, some fields are changed. If the stripe is reused very soon, we have wrong bios setting. We remove discard stripe from hash list, so next time the strip will be fully initialized. Suitable for backport to 3.7+. Cc: (3.7+) Signed-off-by: Shaohua Li Signed-off-by: NeilBrown

raid5: set bio bi_vcnt 0 for discard request

2013-10-24T01:57:36Z

SCSI layer will add new payload for discard request. If two bios are merged to one, the second bio has bi_vcnt 1 which is set in raid5. This will confuse SCSI and cause oops. Suitable for backport to 3.7+ Cc: stable@vger.kernel.org (v3.7+) Reported-by: Jes Sorensen Signed-off-by: Shaohua Li Signed-off-by: NeilBrown Acked-by: Martin K. Petersen

md: avoid deadlock when md_set_badblocks.

2013-10-24T01:57:11Z

When operate harddisk and hit errors, md_set_badblocks is called after scsi_restart_operations which already disabled the irq. but md_set_badblocks will call write_sequnlock_irq and enable irq. so softirq can preempt the current thread and that may cause a deadlock. I think this situation should use write_sequnlock_irqsave/irqrestore instead. I met the situation and the call trace is below: [ 638.919974] BUG: spinlock recursion on CPU#0, scsi_eh_13/1010 [ 638.921923] lock: 0xffff8800d4d51fc8, .magic: dead4ead, .owner: scsi_eh_13/1010, .owner_cpu: 0 [ 638.923890] CPU: 0 PID: 1010 Comm: scsi_eh_13 Not tainted 3.12.0-rc5+ #37 [ 638.925844] Hardware name: To be filled by O.E.M. To be filled by O.E.M./MAHOBAY, BIOS 4.6.5 03/05/2013 [ 638.927816] ffff880037ad4640 ffff880118c03d50 ffffffff8172ff85 0000000000000007 [ 638.929829] ffff8800d4d51fc8 ffff880118c03d70 ffffffff81730030 ffff8800d4d51fc8 [ 638.931848] ffffffff81a72eb0 ffff880118c03d90 ffffffff81730056 ffff8800d4d51fc8 [ 638.933884] Call Trace: [ 638.935867] [] dump_stack+0x55/0x76 [ 638.937878] [] spin_dump+0x8a/0x8f [ 638.939861] [] spin_bug+0x21/0x26 [ 638.941836] [] do_raw_spin_lock+0xa4/0xc0 [ 638.943801] [] _raw_spin_lock+0x66/0x80 [ 638.945747] [] ? scsi_device_unbusy+0x9d/0xd0 [ 638.947672] [] ? _raw_spin_unlock+0x2b/0x50 [ 638.949595] [] scsi_device_unbusy+0x9d/0xd0 [ 638.951504] [] scsi_finish_command+0x37/0xe0 [ 638.953388] [] scsi_softirq_done+0xa8/0x140 [ 638.955248] [] blk_done_softirq+0x7b/0x90 [ 638.957116] [] __do_softirq+0xfd/0x330 [ 638.958987] [] ? __lock_release+0x6f/0x100 [ 638.960861] [] call_softirq+0x1c/0x30 [ 638.962724] [] do_softirq+0x8d/0xc0 [ 638.964565] [] irq_exit+0x10e/0x150 [ 638.966390] [] smp_apic_timer_interrupt+0x4a/0x60 [ 638.968223] [] apic_timer_interrupt+0x6f/0x80 [ 638.970079] [] ? __lock_release+0x6f/0x100 [ 638.971899] [] ? _raw_spin_unlock_irq+0x3a/0x50 [ 638.973691] [] ? _raw_spin_unlock_irq+0x30/0x50 [ 638.975475] [] md_set_badblocks+0x1f3/0x4a0 [ 638.977243] [] rdev_set_badblocks+0x27/0x80 [ 638.978988] [] raid5_end_read_request+0x36b/0x4e0 [raid456] [ 638.980723] [] bio_endio+0x1d/0x40 [ 638.982463] [] req_bio_endio.isra.65+0x83/0xa0 [ 638.984214] [] blk_update_request+0x7f/0x350 [ 638.985967] [] blk_update_bidi_request+0x31/0x90 [ 638.987710] [] __blk_end_bidi_request+0x20/0x50 [ 638.989439] [] __blk_end_request_all+0x1f/0x30 [ 638.991149] [] blk_peek_request+0x106/0x250 [ 638.992861] [] ? scsi_kill_request.isra.32+0xe9/0x130 [ 638.994561] [] scsi_request_fn+0x4a/0x3d0 [ 638.996251] [] __blk_run_queue+0x37/0x50 [ 638.997900] [] blk_run_queue+0x2f/0x50 [ 638.999553] [] scsi_run_queue+0xe0/0x1c0 [ 639.001185] [] scsi_run_host_queues+0x21/0x40 [ 639.002798] [] scsi_restart_operations+0x177/0x200 [ 639.004391] [] scsi_error_handler+0xc9/0xe0 [ 639.005996] [] ? scsi_unjam_host+0xd0/0xd0 [ 639.007600] [] kthread+0xdb/0xe0 [ 639.009205] [] ? flush_kthread_worker+0x170/0x170 [ 639.010821] [] ret_from_fork+0x7c/0xb0 [ 639.012437] [] ? flush_kthread_worker+0x170/0x170 This bug was introduce in commit 2e8ac30312973dd20e68073653 (the first time rdev_set_badblock was call from interrupt context), so this patch is appropriate for 3.5 and subsequent kernels. Cc: (3.5+) Signed-off-by: Bian Yu Reviewed-by: Jianpeng Ma Signed-off-by: NeilBrown

md: Fix skipping recovery for read-only arrays.

2013-10-24T01:55:17Z

Since: commit 7ceb17e87bde79d285a8b988cfed9eaeebe60b86 md: Allow devices to be re-added to a read-only array. spares are activated on a read-only array. In case of raid1 and raid10 personalities it causes that not-in-sync devices are marked in-sync without checking if recovery has been finished. If a read-only array is degraded and one of its devices is not in-sync (because the array has been only partially recovered) recovery will be skipped. This patch adds checking if recovery has been finished before marking a device in-sync for raid1 and raid10 personalities. In case of raid5 personality such condition is already present (at raid5.c:6029). Bug was introduced in 3.10 and causes data corruption. Cc: stable@vger.kernel.org Signed-off-by: Pawel Baldysiak Signed-off-by: Lukasz Dorau Signed-off-by: NeilBrown

bcache: Fixed incorrect order of arguments to bio_alloc_bioset()

2013-10-23T06:55:36Z

Signed-off-by: Kent Overstreet Cc: linux-stable # >= v3.10 Signed-off-by: Linus Torvalds

dm snapshot: fix data corruption

2013-10-16T02:17:47Z

This patch fixes a particular type of data corruption that has been encountered when loading a snapshot's metadata from disk. When we allocate a new chunk in persistent_prepare, we increment ps->next_free and we make sure that it doesn't point to a metadata area by further incrementing it if necessary. When we load metadata from disk on device activation, ps->next_free is positioned after the last used data chunk. However, if this last used data chunk is followed by a metadata area, ps->next_free is positioned erroneously to the metadata area. A newly-allocated chunk is placed at the same location as the metadata area, resulting in data or metadata corruption. This patch changes the code so that ps->next_free skips the metadata area when metadata are loaded in function read_exceptions. The patch also moves a piece of code from persistent_prepare_exception to a separate function skip_metadata to avoid code duplication. CVE-2013-4299 Signed-off-by: Mikulas Patocka Cc: stable@vger.kernel.org Cc: Mike Snitzer Signed-off-by: Alasdair G Kergon

bcache: Fix a null ptr deref regression

2013-10-11T01:17:39Z

Commit c0f04d88e46d ("bcache: Fix flushes in writeback mode") was fixing a reported data corruption bug, but it seems some last minute refactoring or rebasing introduced a null pointer deref. Signed-off-by: Kent Overstreet Cc: linux-stable # >= v3.10 Reported-by: Gabriel de Perthuis Signed-off-by: Linus Torvalds

Merge tag 'dm-3.12-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

2013-09-25T22:12:46Z

Pull device-mapper fixes from Mike Snitzer: "A few fixes for dm-snapshot, a 32 bit fix for dm-stats, a couple error handling fixes for dm-multipath. A fix for the thin provisioning target to not expose non-zero discard limits if discards are disabled. Lastly, add two DM module parameters which allow users to tune the emergency memory reserves that DM mainatins per device -- this helps fix a long-standing issue for dm-multipath. The conservative default reserve for request-based dm-multipath devices (256) has proven problematic for users with many multipathed SCSI devices but relatively little memory. To responsibly select a smaller value users should use the new nr_bios tracepoint info (via commit 75afb352 "block: Add nr_bios to block_rq_remap tracepoint") to determine the peak number of bios their workloads create" * tag 'dm-3.12-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm: add reserved_bio_based_ios module parameter dm: add reserved_rq_based_ios module parameter dm: lower bio-based mempool reservation dm thin: do not expose non-zero discard limits if discards disabled dm mpath: disable WRITE SAME if it fails dm-snapshot: fix performance degradation due to small hash size dm snapshot: workaround for a false positive lockdep warning dm stats: fix possible counter corruption on 32-bit systems dm mpath: do not fail path on -ENOSPC

bcache: Fix flushes in writeback mode

2013-09-24T21:41:43Z

In writeback mode, when we get a cache flush we need to make sure we issue a flush to the backing device. The code for sending down an extra flush was wrong - by cloning the bio we were probably getting flags that didn't make sense for a bare flush, and also the old code was firing for FUA bios, for which we don't need to send a flush to the backing device. This was causing data corruption somehow - the mechanism was never determined, but this patch fixes it for the users that were seeing it. Signed-off-by: Kent Overstreet Cc: linux-stable # >= v3.10 Signed-off-by: Linus Torvalds

bcache: Fix for handling overlapping extents when reading in a btree node

2013-09-24T21:41:43Z

btree_sort_fixup() was overly clever, because it was trying to avoid pulling a key off the btree iterator in more than one place. This led to a really obscure bug where we'd break early from the loop in btree_sort_fixup() if the current key overlapped with keys in more than one older set, and the next key it overlapped with was zero size. Signed-off-by: Kent Overstreet Cc: linux-stable # >= v3.10 Signed-off-by: Linus Torvalds