linux/block, branch v2.6.26

[SCSI] bsg: fix oops on remove

2008-07-12T15:14:56Z

If you do a modremove of any sas driver, you run into an oops on shutdown when the host is removed (coming from the host bsg device). The root cause seems to be a use after free of the bsg_class_device: bsg_kref_release_function uses it (to do a put_device(bcd->parent)) after bcd->release has been called. In sas (and possibly many other things), bcd->release frees the queue which contains the bsg_class_device, so we get a put_device on freed memory. Fix this by taking a copy of the pointer to the parent before releasing bsg.

Acked-by: FUJITA Tomonori
Signed-off-by: James Bottomley

block: Fix the starving writes bug in the anticipatory IO scheduler

2008-07-01T07:06:42Z

The AS scheduler alternates between issuing read and write batches. It does the batch switch only after all requests from the previous batch are completed.

When switching to a write batch, if there is an on-going read request, it waits for its completion and indicates its intention of switching by setting ad->changed_batch and the new direction, but it does not update the batch_expire_time for the new write batch, which it does in the case of no previous pending requests. On completion of the read request, it sees that we were waiting for the switch, schedules work for kblockd right away and resets the ad->changed_batch flag. Now when kblockd enters dispatch_request, where it is expected to pick up a write request, it in turn ends the write batch, because the batch_expire_time was not updated and still holds the expire timestamp of the previous batch.

This results in write starvation in every case where there is an intention to switch to a write batch but a previous read request is still in flight: the batch gets reverted to a read batch right away. This also holds true in the reverse case (switching from a write batch to a read batch with an in-flight write request).

I've checked that this bug exists on 2.6.11, 2.6.18, 2.6.24 and linux-2.6-block git HEAD. I've tested the fix on x86 platforms with SCSI drives where the driver asks for the next request while a current request is in flight. This patch is based off linux-2.6-block git HEAD.

Bug reproduction:
A simple scenario which reproduces this bug is:
- dd if=/dev/hda3 of=/dev/null &
- lilo

The lilo run takes forever to complete.

This can also be reproduced fairly easily with the earlier dd and another test program doing msync(). The example test program below should print out a message after every iteration, but it simply hangs forever. With this bugfix it makes forward progress.
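==== Example test program using msync() (thanks to suleiman AT google DOT com)

#define _GNU_SOURCE		/* for O_NOATIME */
#include <err.h>
#include <fcntl.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Read the x86 time-stamp counter; "=a"/"=d" works on i386 and x86-64. */
static inline uint64_t rdtsc(void)
{
	uint32_t lo, hi;

	__asm__ __volatile__("rdtsc" : "=a" (lo), "=d" (hi));
	return ((uint64_t)hi << 32) | lo;
}

int main(int argc, char **argv)
{
	struct stat st;
	uint64_t e, s, t;
	char *p;
	long i;
	int fd;

	if (argc < 2) {
		printf("Usage: %s <file>\n", argv[0]);
		return (1);
	}

	if ((fd = open(argv[1], O_RDWR | O_NOATIME)) < 0)
		err(1, "open");
	if (fstat(fd, &st) < 0)
		err(1, "fstat");
	p = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		err(1, "mmap");

	t = 0;
	for (i = 0; i < 1000; i++) {
		*p = 0;
		msync(p, 4096, MS_SYNC);	/* wait for the dirty page to be written */
		s = rdtsc();
		*p = 0;				/* re-dirty the page */
		__asm__ __volatile__("" ::: "memory");
		e = rdtsc();
		if (argc > 2)
			printf("%ld: %" PRIu64 " cycles %jd %jd\n", i, e - s,
			       (intmax_t)s, (intmax_t)e);
		t += e - s;
	}
	printf("average time: %" PRIu64 " cycles\n", t / 1000);
	munmap(p, st.st_size);
	close(fd);
	return (0);
}

Cc:
Acked-by: Nick Piggin
Signed-off-by: Jens Axboe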

block: disable IRQs until data is written to relay channel

2008-06-12T18:20:57Z

Because relay_reserve may be run from interrupt context, we must always keep IRQs disabled until the reserved data has been written. Otherwise a call to relay_reserve from an interrupt may expose previously reserved data to user space before it has been completely written. Update the new message code and an old but related comment.

Signed-off-by: Carl Henrik Lunde
Signed-off-by: Jens Axboe
Signed-off-by: Linus Torvalds

Fix invalid access errors in blk_lookup_devt

2008-06-09T17:06:24Z

Commit 30f2f0eb4bd2c43d10a8b0d872c6e5ad8f31c9a0 ("block: do_mounts - accept root=") extended blk_lookup_devt() to be able to look up partitions that had not yet been registered, but in the process made the assumption that the '&block_class.devices' list only contains disk devices and that you can do 'dev_to_disk(dev)' on them.

That isn't actually true. The block_class device list also contains the partitions we've discovered so far, and you can't just do a 'dev_to_disk()' on those.

So make sure to only work on devices that block/genhd.c has registered itself, something we can test by checking the 'dev->type' member. This makes the loop in blk_lookup_devt() match the other such loops in this file.

[ We may want to do an alternate version that knows to handle _either_ whole-disk devices or partitions, but for now this is the minimal fix for a series of crashes reported by Mariusz Kozlowski in http://lkml.org/lkml/2008/5/25/25 and Ingo in http://lkml.org/lkml/2008/6/9/39 ]

Reported-by: Mariusz Kozlowski
Reported-by: Ingo Molnar
Cc: Neil Brown
Cc: Joao Luis Meloni Assirati
Acked-by: Kay Sievers
Cc: Greg Kroah-Hartman
Signed-off-by: Linus Torvalds

cfq-iosched: fix RCU problem in cfq_cic_lookup()

2008-05-28T12:49:28Z

cfq_cic_lookup() needs to properly protect ioc->ioc_data before dereferencing it, and also to exclude updaters of ioc->ioc_data. Also add a number of comments documenting why the existing RCU usage is OK.

Thanks a lot to "Paul E. McKenney" for review and comments!

Signed-off-by: Jens Axboe

block: make blktrace use per-cpu buffers for message notes

2008-05-28T12:49:27Z

Currently it uses a single static char array, which risks being corrupted when multiple users issue message notes at the same time. Allocate the buffers dynamically when the trace is set up, and make them per-cpu instead.

The default max message size of 1k is also very large; the interface is mainly for small text notes, so shrink it to 128 bytes.

Signed-off-by: Jens Axboe

Added in elevator switch message to blktrace stream

2008-05-28T12:49:27Z

Signed-off-by: Alan D. Brunelle
Signed-off-by: Jens Axboe

Added in MESSAGE notes for blktraces

2008-05-28T12:49:27Z

Allows messages to be inserted into blktrace streams.

Signed-off-by: Alan D. Brunelle
Signed-off-by: Jens Axboe

block: reorder cfq_queue to save space on 64bit builds

2008-05-28T12:49:27Z

Saves 8 bytes of padding and increases objects per slab from 30 to 32 on my AMD64 config.

Signed-off-by: Richard Kennedy
Signed-off-by: Jens Axboe

block: Move the second call to get_request to the end of the loop

2008-05-28T12:49:27Z

In get_request_wait(), the second call to get_request() can be moved to the end of the while loop: if the first call to get_request() fails, the second call, made without sleeping in between, will fail as well.

Signed-off-by: Zhang Yanmin
Signed-off-by: Jens Axboe