linux/drivers/md/dm-snap.c, branch v4.5

dm snapshot: fix hung bios when copy error occurs

2016-01-09T01:03:05Z

When there is an error copying a chunk dm-snapshot can incorrectly hold associated bios indefinitely, resulting in hung IO. The function copy_callback sets pe->error if there was error copying the chunk, and then calls complete_exception. complete_exception calls pending_complete on error, otherwise it calls commit_exception with commit_callback (and commit_callback calls complete_exception). The persistent exception store (dm-snap-persistent.c) assumes that calls to prepare_exception and commit_exception are paired. persistent_prepare_exception increases ps->pending_count and persistent_commit_exception decreases it. If there is a copy error, persistent_prepare_exception is called but persistent_commit_exception is not. This results in the variable ps->pending_count never returning to zero and that causes some pending exceptions (and their associated bios) to be held forever. Fix this by unconditionally calling commit_exception regardless of whether the copy was successful. A new "valid" parameter is added to commit_exception -- when the copy fails this parameter is set to zero so that the chunk that failed to copy (and all following chunks) is not recorded in the snapshot store. Also, remove commit_callback now that it is merely a wrapper around pending_complete. Signed-off-by: Mikulas Patocka Signed-off-by: Mike Snitzer Cc: stable@vger.kernel.org

dm: don't save and restore bi_private

2015-12-10T15:38:56Z

Device mapper used the field bi_private to point to dm_target_io. However, since kernel 3.15, the bi_private field is unused, and so the targets do not need to save and restore this field. This patch removes code that saves and restores bi_private from dm-cache, dm-snapshot and dm-verity. Signed-off-by: Mikulas Patocka Signed-off-by: Mike Snitzer

dm snapshot: add new persistent store option to support overflow

2015-10-09T20:57:03Z

Commit 76c44f6d80 introduced the possibly for "Overflow" to be reported by the snapshot device's status. Older userspace (e.g. lvm2) does not handle the "Overflow" status response. Fix this incompatibility by requiring newer userspace code, that can cope with "Overflow", request the persistent store with overflow support by using "PO" (Persistent with Overflow) for the snapshot store type. Reported-by: Zdenek Kabelac Fixes: 76c44f6d80 ("dm snapshot: don't invalidate on-disk image on snapshot write overflow") Reviewed-by: Mikulas Patocka Signed-off-by: Mike Snitzer

Merge tag 'dm-4.3-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

2015-09-02T23:35:26Z

Pull device mapper update from Mike Snitzer: - a couple small cleanups in dm-cache, dm-verity, persistent-data's dm-btree, and DM core. - a 4.1-stable fix for dm-cache that fixes the leaking of deferred bio prison cells - a 4.2-stable fix that adds feature reporting for the dm-stats features added in 4.2 - improve DM-snapshot to not invalidate the on-disk snapshot if snapshot device write overflow occurs; but a write overflow triggered through the origin device will still invalidate the snapshot. - optimize DM-thinp's async discard submission a bit now that late bio splitting has been included in block core. - switch DM-cache's SMQ policy lock from using a mutex to a spinlock; improves performance on very low latency devices (eg. NVMe SSD). - document DM RAID 4/5/6's discard support [ I did not pull the slab changes, which weren't appropriate for this tree, and weren't obviously the right thing to do anyway. At the very least they need some discussion and explanation before getting merged. Because not pulling the actual tagged commit but doing a partial pull instead, this merge commit thus also obviously is missing the git signature from the original tag ] * tag 'dm-4.3-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm cache: fix use after freeing migrations dm cache: small cleanups related to deferred prison cell cleanup dm cache: fix leaking of deferred bio prison cells dm raid: document RAID 4/5/6 discard support dm stats: report precise_timestamps and histogram in @stats_list output dm thin: optimize async discard submission dm snapshot: don't invalidate on-disk image on snapshot write overflow dm: remove unlikely() before IS_ERR() dm: do not override error code returned from dm_get_device() dm: test return value for DM_MAPIO_SUBMITTED dm verity: remove unused mempool dm cache: move wake_waker() from free_migrations() to where it is needed dm btree remove: remove unused function get_nr_entries() dm btree: remove unused "dm_block_t root" parameter in btree_split_sibling() dm cache policy smq: change the mutex to a spinlock

block: kill merge_bvec_fn() completely

2015-08-13T18:31:57Z

As generic_make_request() is now able to handle arbitrarily sized bios, it's no longer necessary for each individual block driver to define its own ->merge_bvec_fn() callback. Remove every invocation completely. Cc: Jens Axboe Cc: Lars Ellenberg Cc: drbd-user@lists.linbit.com Cc: Jiri Kosina Cc: Yehuda Sadeh Cc: Sage Weil Cc: Alex Elder Cc: ceph-devel@vger.kernel.org Cc: Alasdair Kergon Cc: Mike Snitzer Cc: dm-devel@redhat.com Cc: Neil Brown Cc: linux-raid@vger.kernel.org Cc: Christoph Hellwig Cc: "Martin K. Petersen" Acked-by: NeilBrown (for the 'md' bits) Acked-by: Mike Snitzer Signed-off-by: Kent Overstreet [dpark: also remove ->merge_bvec_fn() in dm-thin as well as dm-era-target, and resolve merge conflicts] Signed-off-by: Dongsu Park Signed-off-by: Ming Lin Signed-off-by: Jens Axboe

dm snapshot: don't invalidate on-disk image on snapshot write overflow

2015-08-12T20:22:55Z

When the snapshot overflows because of a write to the origin, the on-disk image has to be invalidated. However, when the snapshot overflows because of a write to the snapshot, the on-disk image doesn't have to be invalidated. Change the behavior so that the on-disk image is not invalidated in this case. When the snapshot overflows, the variable snapshot_overflowed is set. All writes to the snapshot are disallowed to minimize filesystem corruption - this condition is cleared when the snapshot is deactivated and activated. The user can extend the overflowed snapshot, deactivate and activate it again, run fsck (if journaling filesystem is not used) mount it and recover the data. Signed-off-by: Mikulas Patocka Signed-off-by: Mike Snitzer

block: add a bi_error field to struct bio

2015-07-29T14:55:15Z

Currently we have two different ways to signal an I/O error on a BIO: (1) by clearing the BIO_UPTODATE flag (2) by returning a Linux errno value to the bi_end_io callback The first one has the drawback of only communicating a single possible error (-EIO), and the second one has the drawback of not beeing persistent when bios are queued up, and are not passed along from child to parent bio in the ever more popular chaining scenario. Having both mechanisms available has the additional drawback of utterly confusing driver authors and introducing bugs where various I/O submitters only deal with one of them, and the others have to add boilerplate code to deal with both kinds of error returns. So add a new bi_error field to store an errno value directly in struct bio and remove the existing mechanisms to clean all this up. Signed-off-by: Christoph Hellwig Reviewed-by: Hannes Reinecke Reviewed-by: NeilBrown Signed-off-by: Jens Axboe

block: remove management of bi_remaining when restoring original bi_end_io

2015-05-22T14:58:55Z

Commit c4cf5261 ("bio: skip atomic inc/dec of ->bi_remaining for non-chains") regressed all existing callers that followed this pattern: 1) saving a bio's original bi_end_io 2) wiring up an intermediate bi_end_io 3) restoring the original bi_end_io from intermediate bi_end_io 4) calling bio_endio() to execute the restored original bi_end_io The regression was due to BIO_CHAIN only ever getting set if bio_inc_remaining() is called. For the above pattern it isn't set until step 3 above (step 2 would've needed to establish BIO_CHAIN). As such the first bio_endio(), in step 2 above, never decremented __bi_remaining before calling the intermediate bi_end_io -- leaving __bi_remaining with the value 1 instead of 0. When bio_inc_remaining() occurred during step 3 it brought it to a value of 2. When the second bio_endio() was called, in step 4 above, it should've called the original bi_end_io but it didn't because there was an extra reference that wasn't dropped (due to atomic operations being optimized away since BIO_CHAIN wasn't set upfront). Fix this issue by removing the __bi_remaining management complexity for all callers that use the above pattern -- bio_chain() is the only interface that _needs_ to be concerned with __bi_remaining. For the above pattern callers just expect the bi_end_io they set to get called! Remove bio_endio_nodec() and also remove all bio_inc_remaining() calls that aren't associated with the bio_chain() interface. Also, the bio_inc_remaining() interface has been moved local to bio.c. Fixes: c4cf5261 ("bio: skip atomic inc/dec of ->bi_remaining for non-chains") Reviewed-by: Christoph Hellwig Reviewed-by: Jan Kara Signed-off-by: Mike Snitzer Signed-off-by: Jens Axboe

bio: skip atomic inc/dec of ->bi_remaining for non-chains

2015-05-05T19:32:47Z

Struct bio has an atomic ref count for chained bio's, and we use this to know when to end IO on the bio. However, most bio's are not chained, so we don't need to always introduce this atomic operation as part of ending IO. Add a helper to elevate the bi_remaining count, and flag the bio as now actually needing the decrement at end_io time. Rename the field to __bi_remaining to catch any current users of this doing the incrementing manually. For high IOPS workloads, this reduces the overhead of bio_endio() substantially. Tested-by: Robert Elliott Acked-by: Kent Overstreet Reviewed-by: Jan Kara Signed-off-by: Jens Axboe

dm snapshot: suspend merging snapshot when doing exception handover

2015-02-27T19:53:16Z

The "dm snapshot: suspend origin when doing exception handover" commit fixed a exception store handover bug associated with pending exceptions to the "snapshot-origin" target. However, a similar problem exists in snapshot merging. When snapshot merging is in progress, we use the target "snapshot-merge" instead of "snapshot-origin". Consequently, during exception store handover, we must find the snapshot-merge target and suspend its associated mapped_device. To avoid lockdep warnings, the target must be suspended and resumed without holding _origins_lock. Introduce a dm_hold() function that grabs a reference on a mapped_device, but unlike dm_get(), it doesn't crash if the device has the DMF_FREEING flag set, it returns an error in this case. In snapshot_resume() we grab the reference to the origin device using dm_hold() while holding _origins_lock (_origins_lock guarantees that the device won't disappear). Then we release _origins_lock, suspend the device and grab _origins_lock again. NOTE to stable@ people: When backporting to kernels 3.18 and older, use dm_internal_suspend and dm_internal_resume instead of dm_internal_suspend_fast and dm_internal_resume_fast. Signed-off-by: Mikulas Patocka Signed-off-by: Mike Snitzer Cc: stable@vger.kernel.org