path: root/include/linux/raid
Age | Commit message | Author | Files | Lines
2005-02-07 | [PATCH] raid5 overlapping read hack | Neil Brown | 1 | -0/+2
If we detect an overlap, we set a flag and wait for a wakeup. When requests are handled, if the flag was set, we perform the wakeup. Note that the code currently in -mm is badly broken. With this patch applied, it passes tests that use O_DIRECT to cause lots of overlapping requests. Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
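The flag-and-wakeup scheme described above can be sketched in userspace C with a mutex and a condition variable. This is an analogy built on assumptions, not the raid5 code. The kernel uses its own wait queues, and `struct stripe`, `busy`, `overlap`, and both function names are hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a stripe that requests may overlap on. */
struct stripe {
    pthread_mutex_t lock;
    pthread_cond_t wait_for_overlap;
    bool busy;    /* an earlier request still covers this range */
    bool overlap; /* a later request is sleeping until busy clears */
};

/* Called when a new request arrives for this stripe. */
void stripe_add_request(struct stripe *s)
{
    pthread_mutex_lock(&s->lock);
    while (s->busy) {
        s->overlap = true; /* overlap detected: set the flag, then sleep */
        pthread_cond_wait(&s->wait_for_overlap, &s->lock);
    }
    s->busy = true;
    pthread_mutex_unlock(&s->lock);
}

/* Called once the earlier request has been handled. */
void stripe_handle(struct stripe *s)
{
    pthread_mutex_lock(&s->lock);
    s->busy = false;
    if (s->overlap) { /* only wake sleepers if one of them set the flag */
        s->overlap = false;
        pthread_cond_broadcast(&s->wait_for_overlap);
    }
    pthread_mutex_unlock(&s->lock);
}
```

A request that overlaps one still in flight sleeps in `stripe_add_request()`. Because of the flag, `stripe_handle()` only broadcasts when someone is actually waiting.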
2005-01-07 | [PATCH] md: improve 'hash' code in linear.c | Neil Brown | 1 | -6/+1
The hashtable that linear uses to find the right device stores two pointers for every entry. The second is always one of: the first plus 1, or NULL. When NULL, it is never accessed, so any value can be stored. Thus it could always be "first plus 1", and so we don't need to store it, as it is trivial to calculate. This patch halves the size of this table, which results in some simpler code as well. Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
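Here is a minimal sketch of the halved table. It assumes that devices are stored contiguously in array order and that each hash slot covers no more sectors than the smallest device. Under those assumptions a slot can straddle at most one device boundary, so the candidate that used to be stored as the second pointer is always `dev + 1`. `struct dev_info`, `struct linear_map`, and `which_dev()` are illustrative names, not necessarily the ones in linear.c.

```c
#include <stdint.h>

typedef uint64_t sector_t;

/* Hypothetical per-device record; devices sit contiguously in array order. */
struct dev_info {
    sector_t offset; /* first array sector mapped to this device */
    sector_t size;   /* sectors contributed by this device */
};

/* Each slot covers `spacing` sectors (<= the smallest device) and stores
 * only the device that holds the slot's first sector. */
struct linear_map {
    struct dev_info **hash;
    sector_t spacing;
};

struct dev_info *which_dev(const struct linear_map *m, sector_t sector)
{
    struct dev_info *dev = m->hash[sector / m->spacing];
    /* At most one boundary per slot, so the only other candidate is dev + 1. */
    if (sector >= dev->offset + dev->size)
        dev++;
    return dev;
}
```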
2004-11-10 | [PATCH] md: "Faulty" personality | Neil Brown | 1 | -1/+6
The 'faulty' personality provides a layer over any block device in which errors may be synthesised. A variety of errors are possible, including transient and persistent read and write errors, and read errors that persist until the next write. The error mode can be changed on a live array. Accessing this personality requires mdadm 1.8.0 or later. Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-11-10 | [PATCH] md: fix problem with md/linear for devices larger than 2 terabytes | Neil Brown | 1 | -2/+2
Some size fields were "int" instead of "sector_t". Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-10-25 | [PATCH] md: fixes to make version-1 superblocks work in md driver | Neil Brown | 2 | -3/+3
Add some missing data_offset additions and some le_to_cpu conversions, and fix a few other little mistakes. Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-10-25 | [PATCH] md: make retry_list non-global in raid1 and multipath | Neil Brown | 2 | -1/+3
Both raid1 and multipath have a "retry_list" which is global, so all raid1 arrays (for example) use the same list. This is rather ugly, and it is simple enough to make it per-array, so this patch does that. It also changes the multipath code to use list.h lists instead of roll-your-own. Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-09-13 | [PATCH] mark md_interrupt_thread static | Christoph Hellwig | 1 | -1/+0
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-08-23 | [PATCH] md: RAID10 module | Neil Brown | 2 | -1/+107
This patch adds a 'raid10' module which provides features similar to both raid0 and raid1 in the one array. Various combinations of layout are supported. This code is still "experimental", but appears to work. Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-08-23 | [PATCH] md: assorted fixes/improvements to generic md resync code. | Neil Brown | 2 | -1/+6
1/ Introduce "mddev->resync_max_sectors" so that an md personality can ask for resync to cover a different address range than that of a single drive. raid10 will use this.
2/ Fix is_mddev_idle so that if there seems to be a negative number of events, it doesn't immediately assume activity.
3/ Make "sync_io" (the count of IO sectors used for array resync) an atomic_t to avoid SMP races.
4/ Pass md_sync_acct a "block_device" rather than the containing "rdev", as the whole rdev isn't needed. Also make this an inline function.
5/ Make sure recovery gets interrupted on any error.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
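Item 2 can be illustrated with a signed comparison. This sketch assumes, as item 3 suggests, that the event count is a device's total I/O minus md's own resync I/O, and that a small amount of slack is tolerated. `IDLE_SLACK` and `device_busy()` are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical: sectors of non-resync I/O tolerated before we call it activity. */
#define IDLE_SLACK 32

/* curr_events: device's total sectors minus md's own resync sectors (assumed).
 * Returns true, and records the new baseline, if the device looks busy. */
bool device_busy(long long *last_events, long long curr_events)
{
    /* Signed arithmetic: a negative delta simply reads as "no activity";
     * an unsigned difference would instead wrap to a huge value. */
    long long delta = curr_events - *last_events;
    if (delta > IDLE_SLACK) {
        *last_events = curr_events;
        return true;
    }
    return false;
}
```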
2004-06-04 | [PATCH] md: support reshaping raid1 arrays - adding or removing drives. | Neil Brown | 2 | -0/+17
This allows the number of "raid_disks" in a raid1 to be changed. This requires allocating a new pool of "r1bio" structures with a different number of bios, suspending IO, and swapping the new pool in place of the old (and a few other related changes). Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-06-04 | [PATCH] md: allow md arrays to be resized if devices are large enough. | Neil Brown | 1 | -0/+1
It is possible to have raid1/4/5/6 arrays that do not use all the space on the drive. This can be done explicitly, or it can happen if you, one by one, replace all the drives with larger devices. This patch extends the "SET_ARRAY_INFO" ioctl (which was previously invalid on active arrays) to allow some attributes of the array to be changed, and implements changing of the "size" attribute. "size" is the amount of each device that is actually used. If "size" is increased, the new space will immediately be "resynced". Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-06-04 | [PATCH] md: make sure md_check_recovery will remove a faulty device when ->nr_pending hits 0 | Neil Brown | 1 | -0/+8
md_check_recovery only locks a device and does stuff when it thinks there is a real likelihood that something needs doing. So the test at the top must cover all possibilities. But it didn't cover the possibility that the last outstanding request on a failed device had finished and so the device needed to be removed. As a result, a failed drive might not get removed from the personality's perspective on the array, and so it could never be removed from the array as a whole. With this patch, whenever ->nr_pending hits zero on a faulty device, MD_RECOVERY_NEEDED is set so that md_check_recovery will do stuff. Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-05-19 | [PATCH] Remove blk_run_queues() remnants | Andrew Morton | 1 | -1/+1
It no longer exists.
2004-04-12 | [PATCH] unplugging: md update | Andrew Morton | 1 | -3/+4
From: Neil Brown <neilb@cse.unsw.edu.au> I've made a bunch of changes to the 'md' bits - largely moving the unplugging into the individual personalities which know more about which drives are actually in use.
2004-04-12 | [PATCH] per-backing dev unplugging | Andrew Morton | 2 | -26/+1
From: Jens Axboe <axboe@suse.de>, Chris Mason, me, others. The global unplug list causes horrid spinlock contention on many-disk many-CPU setups - throughput is worse than halved. The other problem with the global unplugging is of course that it will cause the unplugging of queues which are unrelated to the I/O upon which the caller is about to wait. So what we do to solve these problems is to remove the global unplug and set up the infrastructure under which the VFS can tell the block layer to unplug only those queues which are relevant to the page or buffer_head which is about to be waited upon. We do this via the very appropriate address_space->backing_dev_info structure. Most of the complexity is in devicemapper, MD and swapper_space, because for these backing devices, multiple queues may need to be unplugged to complete a page/buffer I/O. In each case we ensure that data structures are in place to permit us to identify all the lower-level queues which contribute to the higher-level backing_dev_info. Each contributing queue is told to unplug in response to a higher-level unplug. To simplify things in various places we also introduce the concept of a "synchronous BIO": it is tagged with BIO_RW_SYNC. The block layer will perform an immediate unplug when it sees one of these go past.
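The stacking idea can be sketched as follows. A higher-level queue records which lower-level queues contribute to it, and its unplug hook forwards the request only to those. The `struct queue` layout and the function names are hypothetical. Per the message, the real code hangs this off `backing_dev_info` and differs in detail.

```c
#include <stddef.h>

struct queue;
typedef void (*unplug_fn)(struct queue *q);

/* Hypothetical queue. Leaf devices have no lower queues; a stacked device
 * (md, device-mapper) lists the queues that contribute to it. */
struct queue {
    unplug_fn unplug;
    struct queue **lower;
    size_t nr_lower;
    int plugged;
};

/* Leaf queue: start dispatching whatever was held back. */
void unplug_leaf(struct queue *q)
{
    q->plugged = 0;
}

/* Stacked queue: forward the unplug to every contributing queue, and only
 * to those, so unrelated devices are left alone. Works recursively. */
void unplug_stacked(struct queue *q)
{
    for (size_t i = 0; i < q->nr_lower; i++)
        q->lower[i]->unplug(q->lower[i]);
    q->plugged = 0;
}
```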
2004-03-28 | [PATCH] md: Convert a number of "unsigned long"s to "sector_t"s | Andrew Morton | 1 | -2/+2
From: NeilBrown <neilb@cse.unsw.edu.au> This helps raid5 work on at least one very large array. Thanks to Evan Felix <evan.felix@pnl.gov>
2004-02-18 | [PATCH] md: Allow partitioning of MD devices. | Andrew Morton | 1 | -10/+3
From: NeilBrown <neilb@cse.unsw.edu.au> With this patch, md uses two major numbers for arrays. One major is number 9, with name 'md', and has unpartitioned md arrays, one per minor number. The other major is allocated dynamically, with name 'mdp', and has one array for every 64 minors, allowing for up to 63 partitions. The arrays under one major are completely separate from the arrays under the other. The preferred names for devices with the new major are of the form:
/dev/md/d1p3 # partition 3 of device 1 - minor 67
When a partitioned md device is assembled, the partitions are not recognised until after the whole-array device is opened again. A future version of mdadm will perform this open so that the need will be transparent.
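The numbering above works out to `minor = 64 * device + partition`, with partition 0 presumably being the whole array. Here is a sketch with hypothetical helper names:

```c
/* 64 minors per mdp array, i.e. a 6-bit shift (macro names are hypothetical). */
#define MDP_MINOR_SHIFT 6
#define MDP_MINORS (1u << MDP_MINOR_SHIFT)

/* Minor number for partition `part` (0 = whole array, 1..63) of array `dev`. */
unsigned mdp_minor(unsigned dev, unsigned part)
{
    return dev * MDP_MINORS + part;
}

/* Inverse mapping: which array, and which partition of it. */
unsigned mdp_dev(unsigned minor)  { return minor / MDP_MINORS; }
unsigned mdp_part(unsigned minor) { return minor % MDP_MINORS; }
```

For `/dev/md/d1p3`, `mdp_minor(1, 3)` gives 67, matching the example.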
2004-02-18 | [PATCH] md: Avoid unnecessary bio allocation during raid1 resync | Andrew Morton | 1 | -0/+1
From: NeilBrown <neilb@cse.unsw.edu.au> For each resync request, we allocate an "r1_bio" which has a bio "master_bio" attached that goes largely unused. We also allocate a read_bio which is used. This patch removes the read_bio and just uses the master_bio instead. This fixes a bug wherein bi_bdev of the master_bio wasn't being set, but was being used. We also introduce a new "sectors" field into the r1_bio, as we can no longer rely on master_bio->bi_sectors.
2004-02-18 | [PATCH] md: Remove some un-needed fields from r1bio_s | Andrew Morton | 1 | -4/+2
From: NeilBrown <neilb@cse.unsw.edu.au> next_r1 is never used, so it can just go. read_bio isn't needed as we can easily use one of the pointers in the write_bios array - write_bios[->read_disk]. So rename "write_bios" to "bios" and store the pointer to the read bio in there.
2004-02-18 | [PATCH] md: Discard the cmd field from r1_bio structure | Andrew Morton | 1 | -3/+2
From: NeilBrown <neilb@cse.unsw.edu.au> The only time it is really needed is to differentiate a retry-on-fail from a write-after-read-for-resync request to raid1d. So we use a bit in 'state' for that.
2004-02-03 | [PATCH] md: Change the way the name of an md device is printed in error messages. | Andrew Morton | 1 | -0/+4
From: NeilBrown <neilb@cse.unsw.edu.au> Instead of using ("md%d", mdidx(mddev)), we now use ("%s", mdname(mddev)), where mdname is the disk_name field in the associated gendisk structure. This allows future flexibility in naming.
2004-01-20 | [PATCH] md: Remove the 'disks' array from md which holds the gendisk structures. | Andrew Morton | 1 | -0/+2
From: NeilBrown <neilb@cse.unsw.edu.au> Move the pointers into mddev. This reduces dependence on MAX_MD_DEVS.
2004-01-20 | [PATCH] md: Fix typo in comment | Andrew Morton | 1 | -1/+1
From: NeilBrown <neilb@cse.unsw.edu.au> Thanks dann frazier <dannf@hp.com>
2004-01-20 | [PATCH] RAID-6 | Andrew Morton | 1 | -1/+4
From: "H. Peter Anvin" <hpa@zytor.com> RAID6 implementation. See Kconfig help for usage details. The next release of `mdadm' has raid6 userspace support.
2003-09-22 | [PATCH] 32-bit dev_t: internal use | Alexander Viro | 1 | -5/+1
Starting the conversion:
* internal dev_t made 32bit.
* new helpers - new_encode_dev(), new_decode_dev(), huge_encode_dev(), huge_decode_dev(), new_valid_dev(). They do encoding/decoding of 32bit and 64bit values; for now huge_... are aliases for new_... and new_valid_dev() is always true. We do 12:20 for 32bit; representation is compatible with 16bit one - we have major in bits 19--8 and minor in 31--20,7--0. That's what the userland sees; internally we have (major << 20)|minor, of course.
* MKDEV(), MAJOR() and MINOR() updated.
* several places used to handle Missed'em'V dev_t (14:18 split) manually; that stuff had been taken into common helpers.
Now we can start replacing old_... with new_... and huge_..., depending on the width available. MKDEV() callers should (for now) make sure that major and minor are within 12:20. That's what the next chunk will do.
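The 12:20 split can be checked with a small userspace restatement of the encoding described above. The function and macro names here are illustrative. Only the bit layout comes from the message.

```c
#include <stdint.h>

/* Internal layout: major in the top 12 bits, minor in the low 20 bits. */
#define MINORBITS 20
#define MINORMASK ((1u << MINORBITS) - 1)
#define MKDEV(ma, mi) (((uint32_t)(ma) << MINORBITS) | (uint32_t)(mi))
#define MAJOR(dev)    ((unsigned)((dev) >> MINORBITS))
#define MINOR(dev)    ((unsigned)((dev) & MINORMASK))

/* User-visible form: major in bits 19..8; minor split between bits 7..0
 * (low byte) and bits 31..20 (upper 12 bits). */
uint32_t encode_dev32(uint32_t dev)
{
    unsigned major = MAJOR(dev), minor = MINOR(dev);
    return (minor & 0xffu) | (major << 8) | ((minor & ~0xffu) << 12);
}

uint32_t decode_dev32(uint32_t v)
{
    unsigned major = (v & 0xfff00u) >> 8;
    unsigned minor = (v & 0xffu) | ((v >> 12) & 0xfff00u);
    return MKDEV(major, minor);
}
```

For major and minor below 256, the result is exactly the old 16-bit `(major << 8) | minor`, which is the compatibility the message describes.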
2003-08-06 | [PATCH] Proper block queue reference counting | Jens Axboe | 1 | -1/+1
To be able to keep proper references to block queues, we make blk_init_queue() return the queue that it initialized, and let it be independently allocated and then cleaned up on the last reference. I have grepped high and low, and there really shouldn't be any broken uses of blk_init_queue() left in the kernel drivers. The added bonus is that blk_init_queue() error checking is explicit now; most of the drivers were broken in this regard (even IDE/SCSI). No drivers have embedded request queue structures. Drivers that don't use blk_init_queue() but blk_queue_make_request() should allocate the queue with blk_alloc_queue(gfp_mask). I've converted all of them to do that, too. They can call blk_cleanup_queue() now too, though using the define blk_put_queue() is probably cleaner.
2003-05-26 | [PATCH] md: Replace bdev_partition_name with calls to bdevname | Neil Brown | 1 | -15/+0
2003-05-26 | [PATCH] md: Remove dependence on MD_SB_DISKS in linear personality | Neil Brown | 1 | -1/+1
Linear uses one array sized by MD_SB_DISKS inside a structure. We move it to the end of the structure, declare it as size 0, and arrange for appropriate extra space to be allocated on structure allocation.
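The "move it to the end and size it at allocation time" technique looks like this in C. The patch describes a zero-length array, which is a GNU C extension. This sketch uses the equivalent C99 flexible array member instead. `struct linear_conf`, `struct dev_info`, and `alloc_linear_conf()` are hypothetical names.

```c
#include <stdlib.h>

/* Hypothetical per-component record. */
struct dev_info {
    unsigned long long offset;
    unsigned long long size;
};

/* The variable-length array must be the last member. */
struct linear_conf {
    int raid_disks;
    struct dev_info disks[]; /* C99 flexible array member */
};

/* One zeroed allocation holds the header plus raid_disks entries,
 * so no fixed MD_SB_DISKS bound is needed. Returns NULL on failure. */
struct linear_conf *alloc_linear_conf(int raid_disks)
{
    struct linear_conf *conf =
        calloc(1, sizeof(*conf) + (size_t)raid_disks * sizeof(conf->disks[0]));
    if (conf)
        conf->raid_disks = raid_disks;
    return conf;
}
```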
2003-05-26 | [PATCH] md: Remove MD_SB_DISKS limits from raid1 | Neil Brown | 1 | -5/+6
raid1 uses MD_SB_DISKS to size two data structures, but the new version-1 superblock allows for more than this number of disks (and most actual arrays use many fewer). This patch sizes the two arrays dynamically. One becomes a separate kmalloced array. The other is moved to the end of the containing structure, and appropriate extra space is allocated. Also, change r1buf_pool_alloc (which allocates buffers for a mempool for doing resync) to not get r1bio structures from the r1bio pool (which could exhaust the pool) but instead to allocate them separately.
2003-05-26 | [PATCH] md: Remove dependency on MD_SB_DISKS from raid0 | Neil Brown | 1 | -2/+3
Arrays with a type-1 superblock can have more than MD_SB_DISKS devices, so we remove the dependency on that number from raid0, replacing several fixed-size arrays with one dynamically allocated array.
2003-05-26 | [PATCH] md: Remove dependency on MD_SB_DISKS from raid5 | Neil Brown | 1 | -1/+1
One embedded array gets moved to the end of the structure and sized dynamically.
2003-05-26 | [PATCH] md: Remove dependency on MD_SB_DISKS from multipath | Neil Brown | 1 | -1/+1
Multipath has a dependency on MD_SB_DISKS, which is no longer authoritative. We change it to use a separately allocated array.
2003-05-26 | [PATCH] md: Improve raid0 mapping code to simplify and reduce mem usage. | Neil Brown | 1 | -9/+5
To cope with a raid0 array with differently sized devices, raid0 divides an array into "strip zones". The first zone covers the start of all devices, up to an offset equal to the size of the smallest device. The second strip zone covers the remaining devices up to the size of the next smallest device, and so on. In order to determine which strip zone a given address is in, the array is logically divided into slices the size of the smallest zone, and a 'hash' table is created listing the first and, if relevant, second zone in each slice. As the smallest zone can be very small (imagine an array with a 76G drive and a 75.5G drive), this hash table can be rather large. With this patch, we limit the size of the hash table to one page, at the possible cost of making several probes into the zone list before we find the correct zone. We also cope with the possibility that a zone could be larger than a 32-bit sector address would allow.
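Here is a sketch of the bounded lookup described above. It assumes a slice size (`spacing`) has already been chosen so the table fits in one page, which means a slice may now span several zones. Struct and function names are hypothetical, and the real raid0 code differs in detail.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t sector_t;

/* Hypothetical strip zone: array sectors [start, start + size) are striped
 * across nb_dev devices. Zones are stored in ascending order of start. */
struct strip_zone {
    sector_t start;
    sector_t size;
    int nb_dev;
};

struct raid0_map {
    struct strip_zone *zones;
    size_t nr_zones;
    struct strip_zone **hash; /* one entry per slice */
    sector_t spacing;         /* slice size; may exceed the smallest zone */
};

/* Fill the table: entry i points at the zone holding sector i * spacing. */
void build_hash(struct raid0_map *m, size_t nr_slices)
{
    size_t z = 0;
    for (size_t i = 0; i < nr_slices; i++) {
        sector_t s = (sector_t)i * m->spacing;
        while (z + 1 < m->nr_zones && s >= m->zones[z].start + m->zones[z].size)
            z++;
        m->hash[i] = &m->zones[z];
    }
}

/* Look up the zone for `sector`, which must lie inside the array. */
struct strip_zone *find_zone(const struct raid0_map *m, sector_t sector)
{
    struct strip_zone *z = m->hash[sector / m->spacing];
    /* A coarse slice can cross several zones, so probe forward. */
    while (sector >= z->start + z->size)
        z++;
    return z;
}
```

The trade-off is the one the message names: fewer table entries in exchange for possibly several probes per lookup.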
2003-05-26 | [PATCH] md: Use new single page bio splitting for raid0 and linear | Neil Brown | 3 | -2/+1
Sometimes raid0 and linear are required to take a single-page bio that spans two devices. We use bio_split to split such a bio into two. At the same time, bio.h is included by linux/raid/md.h, so we don't include it elsewhere anymore. We also modify the mergeable_bvec functions to allow a bvec that doesn't fit if it is the first bvec to be added to the bio, and are careful never to return a negative length from a mergeable_bvec function.
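Here is a sketch of the two boundary calculations this message touches, for raid0-style chunking. It assumes 512-byte sectors and a power-of-two chunk size. The function names are hypothetical, not the kernel's.

```c
#include <stdint.h>

typedef uint64_t sector_t;

/* Bytes that may still be added to a bio that starts at `sector` and already
 * holds bio_sectors sectors, without crossing a chunk boundary. */
unsigned mergeable_bytes(sector_t sector, unsigned bio_sectors,
                         unsigned bvec_len, unsigned chunk_sects)
{
    long left = (long)chunk_sects
              - (long)((sector & (chunk_sects - 1)) + bio_sectors);
    if (left < 0)
        left = 0;        /* never report a negative length */
    if ((unsigned long)left * 512 <= bvec_len && bio_sectors == 0)
        return bvec_len; /* always accept the first bvec; it is split later */
    return (unsigned)left * 512;
}

/* Sectors to put in the first half of a split, or 0 if the bio fits. */
unsigned split_point(sector_t sector, unsigned len, unsigned chunk_sects)
{
    unsigned room = chunk_sects - (unsigned)(sector & (chunk_sects - 1));
    return len > room ? room : 0;
}
```

Accepting an oversized first bvec guarantees forward progress. The resulting single-page bio is then the case that `bio_split` handles.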
2003-05-07 | [PATCH] remove partition_name() | Andrew Morton | 1 | -2/+13
From: Christoph Hellwig <hch@lst.de> partition_name() is a variant of __bdevname() that caches results and returns a pointer to kmalloc()ed data instead of printing into a buffer. Due to its caching it gets utterly confused when the name for a dev_t changes (which can happen easily now with device mapper, and probably in the future with dynamic dev_t users). It's only used by the raid code, and most calls are through a wrapper, bdev_partition_name(), which takes a struct block_device * that may be NULL. The patch below changes bdev_partition_name() to call bdevname() if possible, and changes the other calls, where we really have nothing more than a dev_t, to use __bdevname(). Btw, it would be nice if someone who knows the md code a bit better than me could remove bdev_partition_name() in favour of direct calls to bdevname() where possible - that would also get rid of the "returns pointer to string on stack" issue that this patch can't fix yet.
2003-04-03 | [PATCH] md: Cleanups for md to move device size calculations into personalities | Neil Brown | 2 | -2/+1
2003-03-26 | [PATCH] md: Convert md personalities to new module interface | Neil Brown | 1 | -0/+1
Thanks to Angus Sawyer <angus.sawyer@dsl.pipex.com> and Daniel McNeil <daniel@osdl.org>
2003-03-14 | [PATCH] md: Add new superblock format for md | Neil Brown | 1 | -0/+53
Superblock format '1' resolves a number of issues with superblock format '0'. It is more dense and can support many more sub-devices. It does not contain unneeded redundancy. It adds a few new useful fields.
2003-03-14 | [PATCH] md: Allow components of MD raid array to have data start at offset from start of device. | Neil Brown | 1 | -0/+1
Normally the data stored on a component of a RAID array is stored from the start of the device. This patch allows a per-device data_offset so the data can start elsewhere. This will allow RAID arrays where the metadata is at the head of the device rather than the tail.
2003-03-14 | [PATCH] md: Fulltime delayed 'safe_mode' for md | Neil Brown | 1 | -1/+3
From: Angus Sawyer <angus.sawyer@dsl.pipex.com> If there are no writes for 20 milliseconds, write out superblock to mark array as clean. Write out superblock with dirty flag before allowing any further write to succeed. If an md thread gets signaled with SIGKILL, reduce the delay to 0. Also tidy up some printk's and make sure writing the superblock isn't noisy.
2003-03-14 | [PATCH] md: Remove md_recoveryd thread for md | Neil Brown | 5 | -10/+9
The md_recoveryd thread is responsible for initiating and cleaning up resync threads. This job can be equally well done by the per-array threads for those arrays which might need it. So the mdrecoveryd thread is gone, and the core code that it ran is now run by raid5d, raid1d or multipathd. We add an MD_RECOVERY_NEEDED flag so those daemons don't have to bother trying to lock the md array unless it is likely that something needs to be done. Also modify the names of all threads to include the number of the md device.
2003-03-14 | [PATCH] md: Tidy up recovery_running flags in md | Neil Brown | 1 | -8/+14
Md uses ->recovery_running and ->recovery_err to keep track of the status of recovery. This is rather ad hoc and race prone. This patch changes it to ->recovery, which has bit flags for various states.
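The switch from separate fields to one word of bit flags can be sketched with C11 atomics, standing in for the kernel's `set_bit()`/`test_bit()` family. Only `MD_RECOVERY_NEEDED` appears in this log. The other flag names and all bit values are assumptions.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit numbers. */
enum { MD_RECOVERY_RUNNING, MD_RECOVERY_ERR, MD_RECOVERY_NEEDED };

struct md_recovery {
    atomic_ulong bits; /* every recovery state lives in one word */
};

void recovery_init(struct md_recovery *r) { atomic_init(&r->bits, 0); }

void recovery_set(struct md_recovery *r, int bit)
{
    atomic_fetch_or(&r->bits, 1UL << bit);
}

void recovery_clear(struct md_recovery *r, int bit)
{
    atomic_fetch_and(&r->bits, ~(1UL << bit));
}

bool recovery_test(struct md_recovery *r, int bit)
{
    return (atomic_load(&r->bits) >> bit) & 1UL;
}

/* Clear a bit and report whether it was set, in one atomic step, so two
 * threads cannot both act on the same request. */
bool recovery_test_and_clear(struct md_recovery *r, int bit)
{
    return (atomic_fetch_and(&r->bits, ~(1UL << bit)) >> bit) & 1UL;
}
```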
2003-03-14 | [PATCH] md: Convert /proc/mdstat to use seq_file | Neil Brown | 2 | -1/+2
From: Angus Sawyer <angus.sawyer@dsl.pipex.com> Mainly a straightforward conversion of sprintf -> seq_printf. seq_start and seq_next are modelled on /proc/partitions. Locking/ref counting is as for ITERATE_MDDEV.
pos == 0 -> header
pos == n -> nth mddev
pos == 0x10000 -> tail
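Here is a sketch of the position mapping listed above. It assumes the arrays can be indexed in order, whereas the real iterator walks the mddev list with `ITERATE_MDDEV`-style locking. All names are hypothetical.

```c
#include <stddef.h>

#define MDSTAT_POS_HEADER 0L
#define MDSTAT_POS_TAIL   0x10000L

enum mdstat_kind { MDSTAT_HEADER, MDSTAT_ARRAY, MDSTAT_TAIL, MDSTAT_END };

/* Classify position `pos` given nr_arrays arrays; for MDSTAT_ARRAY,
 * *idx_out receives the zero-based index (pos n is the nth array). */
enum mdstat_kind mdstat_classify(long pos, size_t nr_arrays, size_t *idx_out)
{
    if (pos == MDSTAT_POS_HEADER)
        return MDSTAT_HEADER;
    if (pos == MDSTAT_POS_TAIL)
        return MDSTAT_TAIL;
    if (pos > 0 && (size_t)pos <= nr_arrays) {
        *idx_out = (size_t)pos - 1;
        return MDSTAT_ARRAY;
    }
    return MDSTAT_END;
}

/* Advance: after the last array, jump straight to the tail position;
 * one past the tail ends the iteration. */
long mdstat_next(long pos, size_t nr_arrays)
{
    if (pos == MDSTAT_POS_TAIL)
        return MDSTAT_POS_TAIL + 1;
    if ((size_t)pos >= nr_arrays)
        return MDSTAT_POS_TAIL;
    return pos + 1;
}
```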
2003-02-17 | [PATCH] Provide a 'safe-mode' for soft raid. | Neil Brown | 2 | -1/+7
When a raid1 or raid5 array is in 'safe-mode', the array is marked clean whenever there are no outstanding write requests, and is marked dirty again before allowing any write request to proceed. This means that an unclean shutdown while no write activity is happening will NOT cause a resync to be required. However, it does mean extra updates to the superblock. Currently safe-mode is turned on by sending SIGKILL to the raid thread, as would happen at a normal shutdown. This should mean that the reboot notifier is no longer needed. After looking more at performance issues I may make safe-mode be on all the time. I will almost certainly make it on when RAID5 is degraded, as an unclean shutdown of a degraded RAID5 means data loss. This code was provided by Angus Sawyer <angus.sawyer@dsl.pipex.com>
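The dirty/clean rule can be sketched as two hooks around each write. This ignores locking and the actual superblock I/O, and every name in it is hypothetical.

```c
#include <stdbool.h>

/* Hypothetical array state. A real driver needs locking and would write the
 * superblock to every device; here "writing" just records the new state. */
struct safemode_array {
    int pending_writes; /* writes issued but not yet completed */
    bool sb_dirty;      /* what the on-disk superblock currently says */
    int sb_writes;      /* superblock updates performed so far */
};

static void write_superblock(struct safemode_array *a, bool dirty)
{
    a->sb_dirty = dirty;
    a->sb_writes++;
}

/* Before any write may proceed, make sure the array is marked dirty. */
void safemode_write_start(struct safemode_array *a)
{
    if (!a->sb_dirty)
        write_superblock(a, true);
    a->pending_writes++;
}

/* When the last outstanding write completes, mark the array clean. */
void safemode_write_end(struct safemode_array *a)
{
    if (--a->pending_writes == 0)
        write_superblock(a, false);
}
```

Two overlapping writes cost one dirty update and one clean update. That is the "extra updates to the superblock" trade-off the message mentions.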
2003-02-17 | [PATCH] Add name of md device to name of thread managing that device. | Neil Brown | 3 | -0/+3
This allows the thread to be easily identified and signalled. The point of signalling will appear in the next patch.
2003-01-05 | [PATCH] md: Record location of incomplete resync at shutdown and restart from there. | Neil Brown | 2 | -2/+10
Add a new field to the md superblock, in an unused area, to record where resync was up to on a clean shutdown while resync is active. Restart from this point. The extra field is verified by having a second copy of the event counter. If the second event counter is wrong, we ignore the extra field. This patch thanks to Angus Sawyer <angus.sawyer@dsl.pipex.com>
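The validity check can be sketched as a comparison of the two event counters. The field names are hypothetical. Reading a mismatch as "written by a kernel that did not maintain the new field" is an assumption about why the second copy exists.

```c
#include <stdint.h>

/* Hypothetical slice of the superblock. */
struct sb_resync {
    uint64_t events;      /* main event counter */
    uint64_t recovery_cp; /* where resync had reached at clean shutdown */
    uint64_t cp_events;   /* second copy of the event counter */
};

/* Sector to restart resync from: trust the checkpoint only when the
 * copied counter matches; otherwise ignore it and start from 0. */
uint64_t resync_start(const struct sb_resync *sb)
{
    return sb->cp_events == sb->events ? sb->recovery_cp : 0;
}
```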
2002-10-30 | [PATCH] md: factor out MD superblock handling code | Neil Brown | 1 | -1/+3
Define an interface for interpreting and updating superblocks so we can more easily define new formats. With this patch, (almost) all superblock layout information is located in a small set of routines dedicated to superblock handling. This will allow us to provide a similar set for a different format. The two exceptions are:
1/ autostart_array, where the devices listed in the superblock are searched for.
2/ raid5 'knows' the maximum number of devices for compute_parity.
These will be addressed in a later patch.
2002-10-28 | [PATCH] removed a bunch of gratuitous kdev_t uses | Alexander Viro | 1 | -5/+0
2002-10-08 | [PATCH] 64-bit sector_t - driver changes | Andrew Morton | 3 | -7/+7
From Peter Chubb
Compaq Smart array sector_t cleanup: prepare for possible 64-bit sector_t
Clean up loop device to allow huge backing files.
MD transition to 64-bit sector_t.
- Hold sizes and offsets as sector_t not int;
- use 64-bit arithmetic if necessary to map block-in-raid to zone and block-in-zone
2002-09-21 | [PATCH] removal of bogus exports | Alexander Viro | 2 | -12/+1
partition_name() moved from md.c to partitions/check.c; disk_name() is not exported anymore; partition_name() takes dev_t instead of kdev_t.