linux/drivers/block/drbd, branch v3.16

drbd: fix regression 'out of mem, failed to invoke fence-peer helper'

2014-07-10T09:06:03Z

Since Linux kernel 3.13, kthread_run() internally uses wait_for_completion_killable(). We may sometimes call kthread_run() while we still have a signal pending, one that we used to kick our threads out of potentially blocking network functions. kthread_run() then mistakes that pending signal for a new fatal signal and fails. Fix: flush_signals() before kthread_run(). Signed-off-by: Philipp Reisner Signed-off-by: Lars Ellenberg Signed-off-by: Jens Axboe

drbd: fix NULL pointer deref in blk_add_request_payload

2014-06-25T15:53:47Z

Discards don't have any payload. But the scsi layer still expects a bio_vec it can use internally, see sd_setup_discard_cmnd() and blk_add_request_payload(). Signed-off-by: Philipp Reisner Signed-off-by: Lars Ellenberg Signed-off-by: Jens Axboe

drbd: use list_first_entry_or_null in first_peer_device/first_connection

2014-04-30T19:46:56Z

If there are no peer_devices or connections, I'd rather have NULL than some "arbitrary" address pretending to point to a struct. This helps to avoid hard-to-debug symptoms in case we ever try to use and dereference a drbd_connection or drbd_peer_device where we in fact don't have any connection at all. Signed-off-by: Philipp Reisner Signed-off-by: Lars Ellenberg Signed-off-by: Jens Axboe
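
The difference between the two accessors can be illustrated with a userspace sketch. The list helpers below are simplified stand-ins for the kernel's `<linux/list.h>` macros (the real ones are written differently), and `struct connection` and `struct resource` are hypothetical, stripped-down placeholders rather than the actual DRBD structures.

```c
#include <stddef.h>

/* Minimal userspace stand-ins for the kernel's intrusive list helpers. */
struct list_head {
    struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Hypothetical helper name; appends node at the tail of the list. */
static void list_add_tail_entry(struct list_head *node, struct list_head *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Blindly converts head->next. On an empty list, head->next is the head
 * itself, so the result is an address that only pretends to be an entry. */
#define list_first_entry(head, type, member) \
    container_of((head)->next, type, member)

/* Yields NULL on an empty list instead of a bogus pointer. */
#define list_first_entry_or_null(head, type, member) \
    ((head)->next != (head) ? list_first_entry(head, type, member) : NULL)

/* Hypothetical, stripped-down stand-ins for the DRBD structures. */
struct connection {
    int id;
    struct list_head connections; /* link in resource->connections */
};

struct resource {
    struct list_head connections; /* list of struct connection */
};

static struct connection *first_connection(struct resource *resource)
{
    return list_first_entry_or_null(&resource->connections,
                                    struct connection, connections);
}
```

With this shape, a caller can test the result against NULL before dereferencing it; with plain `list_first_entry()` an empty list would silently yield a pointer computed from the list head.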

drbd: Allow attaching of a newly created device to any backing device

2014-04-30T19:46:56Z

A newly created device was never exposed before, i.e. it has an exposed_data_uuid of 0. It is then valid to attach it to a backing device with any current_uuid (including, of course, a newly created one (4)). Signed-off-by: Philipp Reisner Signed-off-by: Lars Ellenberg Signed-off-by: Jens Axboe
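
The rule can be sketched as a small predicate. The function name `attach_allowed` is hypothetical, and the plain equality test for the already-exposed case is an assumption of this sketch; the driver's actual comparison may differ in detail.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch only: decide whether a device may attach to a backing device.
 * exposed_data_uuid == 0 means the device was never exposed before. */
static bool attach_allowed(uint64_t exposed_data_uuid,
                           uint64_t backing_current_uuid)
{
    if (exposed_data_uuid == 0)
        return true; /* never exposed: any backing device is acceptable */

    /* Assumption: otherwise the exposed data must match the backing
     * device's current UUID exactly. */
    return exposed_data_uuid == backing_current_uuid;
}
```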

drbd: Test cstate while holding req_lock

2014-04-30T19:46:56Z

In case a connection transitions into C_TIMEOUT within the timer function (request_timer_fn()) we need to make sure that the receiver thread (potentially running on a different CPU) sees the updated cstate later on. Signed-off-by: Philipp Reisner Signed-off-by: Lars Ellenberg Signed-off-by: Jens Axboe

drbd: use blk_set_stacking_limits()

2014-04-30T19:46:55Z

...instead of directly assigning to q->limits.discard_zeroes_data Signed-off-by: Philipp Reisner Signed-off-by: Lars Ellenberg Signed-off-by: Jens Axboe

drbd: evaluate disk and network timeout on different requests

2014-04-30T19:46:55Z

Just because it is the oldest not yet completed request does not make it the oldest request waiting for disk. Or waiting for the peer. And we completely missed already completed requests that would still hold references to activity log extents, waiting only for the barrier ack. Find two oldest not yet completely processed requests, one that is still waiting for local completion, and one that is still waiting for some response from the peer. These may or may not be the same request object. Then separately apply the network and disk timeouts, respectively. Signed-off-by: Philipp Reisner Signed-off-by: Lars Ellenberg Signed-off-by: Jens Axboe
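
The selection described above can be sketched in userspace C. Everything here is illustrative: `struct request`, its fields, `find_oldest()` and `timed_out()` are hypothetical names, the request array is assumed to be ordered oldest first, and a timeout of 0 is assumed to mean "disabled".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified request state. */
struct request {
    unsigned long start_ticks; /* submission time, in timer ticks */
    bool local_pending;        /* still waiting for local disk completion */
    bool net_pending;          /* still waiting for a response from the peer */
};

struct oldest {
    const struct request *waiting_for_disk;
    const struct request *waiting_for_peer;
};

/* Walk the requests oldest first and remember the first one that is still
 * waiting for the disk and the first one still waiting for the peer.
 * Both results may point to the same request, or either may be NULL. */
static struct oldest find_oldest(const struct request *reqs, size_t n)
{
    struct oldest o = { NULL, NULL };

    for (size_t i = 0; i < n; i++) {
        if (!o.waiting_for_disk && reqs[i].local_pending)
            o.waiting_for_disk = &reqs[i];
        if (!o.waiting_for_peer && reqs[i].net_pending)
            o.waiting_for_peer = &reqs[i];
        if (o.waiting_for_disk && o.waiting_for_peer)
            break;
    }
    return o;
}

/* True if req exists, the timeout is enabled (non-zero), and req has been
 * pending for at least timeout ticks. Unsigned subtraction keeps the
 * comparison correct across counter wrap-around. */
static bool timed_out(const struct request *req, unsigned long now,
                      unsigned long timeout)
{
    return req && timeout && now - req->start_ticks >= timeout;
}
```

A caller would then apply the disk timeout to `waiting_for_disk` and the network timeout to `waiting_for_peer`, each with its own configured value.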

drbd: Fix a hole in the challenge-response connection authentication

2014-04-30T19:46:55Z

In the implementation as it was, the two peers sent each other a challenge, and each expected the challenge hashed with the shared secret back. An attacker could simply wait for the challenge of the peer, and send the same challenge back. Then it waits for the response, and sends the same response back. Prevent this by not accepting a challenge from the peer that is the same as the challenge sent to the peer. Signed-off-by: Philipp Reisner Signed-off-by: Lars Ellenberg Signed-off-by: Jens Axboe
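
The added check can be sketched as follows. The function name `peer_challenge_acceptable` and the `CHALLENGE_LEN` value are assumptions made for this illustration, not taken from the commit text.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define CHALLENGE_LEN 64 /* assumed challenge length for this sketch */

/* Reject a peer challenge that is too short or that merely reflects the
 * challenge we sent. A reflected challenge would let the peer obtain the
 * correct response from us and replay it, without knowing the secret. */
static bool peer_challenge_acceptable(const unsigned char *my_challenge,
                                      const unsigned char *peer_challenge,
                                      size_t peer_len)
{
    if (peer_len < CHALLENGE_LEN)
        return false; /* payload too small to compare safely */

    if (memcmp(my_challenge, peer_challenge, CHALLENGE_LEN) == 0)
        return false; /* peer presented our own challenge */

    return true;
}
```

Only after this check passes would the hashed response be computed and sent; a failed check should abort the authentication attempt.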

drbd: always implicitly close last epoch when idle

2014-04-30T19:46:55Z

Once our sender thread needs to wait_for_work(), and actually needs to schedule(), we check just before doing so whether it is useful to implicitly close the last epoch. The condition was too strict: it implicitly closed the epoch only if there had been no new (write) requests at all. The assumption was that if there were new requests, they would always be communicated one way or another, and would send the necessary epoch-separating barriers explicitly. This is not always true, e.g. when becoming diskless, or while explicitly starting a full resync. The last communicated epoch could stay open for a long time, locking down the corresponding activity log extents. It is safe to always implicitly send that last barrier as soon as we determine that there cannot be more requests in the last communicated epoch, even if there have been (uncommunicated) new requests in new epochs meanwhile. Signed-off-by: Philipp Reisner Signed-off-by: Lars Ellenberg Signed-off-by: Jens Axboe

drbd: add back some fairness to AL transactions

2014-04-30T19:46:55Z

When batching more updates to the activity log into single transactions, we lost the ability for new requests to force themselves into the active set: all preparation steps became non-blocking, and if all currently hot extents stay busy, they could starve out new incoming requests to cold extents for quite a while. This can only happen if your IO backend accepts more IO operations per average DRBD replication round trip time than you have al-extents configured. If we have incoming requests to cold extents, do at least one blocking update per transaction. In an artificial worst-case workload on SSD with an asynchronous 600 ms replication link, with al-extents = 7 (the minimum we allow), and a concurrent full resync, some write requests were observed to be starved for 40 seconds without this patch. With this patch, the application observed a worst-case latency of twice the replication round trip time. Signed-off-by: Philipp Reisner Signed-off-by: Lars Ellenberg Signed-off-by: Jens Axboe