<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/block, branch v2.6.28</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v2.6.28</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v2.6.28'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2008-12-05T22:49:18Z</updated>
<entry>
<title>Enforce a minimum SG_IO timeout</title>
<updated>2008-12-05T22:49:18Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2008-12-05T22:49:18Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f2f1fa78a155524b849edf359e42a3001ea652c0'/>
<id>urn:sha1:f2f1fa78a155524b849edf359e42a3001ea652c0</id>
<content type='text'>
There's no point in having too short SG_IO timeouts, since if the
command does end up timing out, we'll end up through the reset sequence
that is several seconds long in order to abort the command that timed
out.

As a result, shorter timeouts than a few seconds simply do not make
sense, as the recovery would be longer than the timeout itself.

Add a BLK_MIN_SG_TIMEOUT to match the existing BLK_DEFAULT_SG_TIMEOUT.
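
The clamping idea can be sketched as follows, in Python rather than kernel
C. The function name, the fallback to the default when no timeout is
given, and the numeric values are all assumptions made for illustration;
none of them is taken from the patch itself.

```python
# Hypothetical values in seconds, chosen only for illustration; the real
# constants are defined in the kernel headers and expressed in jiffies.
DEFAULT_SG_TIMEOUT = 60
MIN_SG_TIMEOUT = 7


def effective_sg_timeout(requested):
    """Return the timeout to apply to an SG_IO request.

    Assumption: a requested value of 0 means "not set" and falls back to
    the default. Anything shorter than the minimum is raised to the
    minimum, since recovery from a timeout takes several seconds anyway.
    """
    timeout = requested if requested else DEFAULT_SG_TIMEOUT
    return max(timeout, MIN_SG_TIMEOUT)
```

With these illustrative values, a request asking for a 1 second timeout
would be given 7 seconds, while a request asking for 30 seconds would be
left unchanged.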

Suggested-by: Alan Cox &lt;alan@lxorguk.ukuu.org.uk&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Jens Axboe &lt;jens.axboe@oracle.com&gt;
Cc: Jeff Garzik &lt;jeff@garzik.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>[PATCH 1/2] kill FMODE_NDELAY_NOW</title>
<updated>2008-12-04T09:22:57Z</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2008-11-05T13:58:42Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=fd4ce1acd0f8558033b1a6968001552bd7671e6d'/>
<id>urn:sha1:fd4ce1acd0f8558033b1a6968001552bd7671e6d</id>
<content type='text'>
Update FMODE_NDELAY before each ioctl call so that we can kill the
magic FMODE_NDELAY_NOW.  It would be even better to do this directly
in setfl(), but for that we'd need to have FMODE_NDELAY for all files,
not just block special files.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>[PATCH] Fix block dev compat ioctl handling</title>
<updated>2008-12-04T09:22:55Z</updated>
<author>
<name>Andreas Schwab</name>
<email>schwab@suse.de</email>
</author>
<published>2008-10-31T21:39:46Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1c925604e1038c7c65b91a92d14dc972b3a70a97'/>
<id>urn:sha1:1c925604e1038c7c65b91a92d14dc972b3a70a97</id>
<content type='text'>
Commit 33c2dca4957bd0da3e1af7b96d0758d97e708ef6 (trim file propagation
in block/compat_ioctl.c) removed the handling of some ioctls from
compat_blkdev_driver_ioctl.  That caused them to be rejected as unknown
by the compat layer.

Signed-off-by: Andreas Schwab &lt;schwab@suse.de&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>block: fix setting of max_segment_size and seg_boundary mask</title>
<updated>2008-12-03T11:55:55Z</updated>
<author>
<name>Milan Broz</name>
<email>mbroz@redhat.com</email>
</author>
<published>2008-12-03T11:55:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0e435ac26e3f951d83338ed3d4ab7dc0fe0055bc'/>
<id>urn:sha1:0e435ac26e3f951d83338ed3d4ab7dc0fe0055bc</id>
<content type='text'>
Fix setting of max_segment_size and seg_boundary mask for stacked md/dm
devices.

When stacking devices (LVM over MD over SCSI) some of the request queue
parameters are not set up correctly in some cases by default, namely
max_segment_size and seg_boundary mask.

If you create an MD device over SCSI, these attributes are zeroed.

The problem arises when another device-mapper mapping is stacked on top of
this one. The queue attributes are then set in DM this way:

request_queue   max_segment_size  seg_boundary_mask
SCSI                65536             0xffffffff
MD RAID1                0                      0
LVM                 65536                 -1 (64bit)

Unfortunately bio_add_page (resp.  bio_phys_segments) calculates the number
of physical segments according to these parameters.

During generic_make_request() the segment count is recalculated and can
increase bio-&gt;bi_phys_segments over the allowed limit.  (After
bio_clone() in a stacked operation.)

This is especially a problem in the CCISS driver, where it produces an oops
here:

    BUG_ON(creq-&gt;nr_phys_segments &gt; MAXSGENTRIES);

(MAXSGENTRIES is 31 by default.)

Sometimes even this command is enough to cause an oops:

  dd iflag=direct if=/dev/&lt;vg&gt;/&lt;lv&gt; of=/dev/null bs=128000 count=10

This command generates bios with 250 sectors, allocated in 32 4k-pages
(last page uses only 1024 bytes).

For the LVM layer, it allocates a bio with 31 segments (still OK for CCISS);
unfortunately, on the lower layer it is recalculated to 32 segments, which
violates the CCISS restriction and triggers the BUG_ON().
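
The arithmetic behind these numbers can be checked with a short Python
sketch. It assumes 512-byte sectors and 4k pages, and it assumes the worst
case in which no adjacent pages are merged, so that every page counts as
its own segment. The function and constant names are made up for this
illustration.

```python
SECTOR_SIZE = 512     # bytes per sector (assumed)
PAGE_SIZE = 4096      # bytes per 4k page
MAX_SG_ENTRIES = 31   # the CCISS limit quoted above


def describe_request(block_size):
    """Return (sectors, pages, bytes used in the last page) for one bio."""
    sectors = block_size // SECTOR_SIZE
    full_pages, tail_bytes = divmod(block_size, PAGE_SIZE)
    pages = full_pages + (1 if tail_bytes else 0)
    return sectors, pages, tail_bytes


sectors, pages, tail_bytes = describe_request(128000)
print(sectors, pages, tail_bytes)   # 250 32 1024
print(pages - MAX_SG_ENTRIES)       # 1 segment over the limit, worst case
```

So a count of 31 is only possible when at least two pages are merged into
one segment; once the lower layer recalculates with different queue
parameters and no merge happens, the count becomes 32.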

The patch tries to fix it by:

 * initializing the attributes above in the request queue constructor
   blk_queue_make_request()

 * making sure that blk_queue_stack_limits() inherits the settings

 (DM uses its own function to set the limits because
 blk_queue_stack_limits() was introduced later.  It should probably switch
 to the generic stack limit function too.)

 * setting the default seg_boundary value in one place (blkdev.h)

 * using this mask as the default in DM (instead of -1, which differs on
   64bit)

Bugs related to this:
https://bugzilla.redhat.com/show_bug.cgi?id=471639
http://bugzilla.kernel.org/show_bug.cgi?id=8672

Signed-off-by: Milan Broz &lt;mbroz@redhat.com&gt;
Reviewed-by: Alasdair G Kergon &lt;agk@redhat.com&gt;
Cc: Neil Brown &lt;neilb@suse.de&gt;
Cc: FUJITA Tomonori &lt;fujita.tomonori@lab.ntt.co.jp&gt;
Cc: Tejun Heo &lt;htejun@gmail.com&gt;
Cc: Mike Miller &lt;mike.miller@hp.com&gt;
Signed-off-by: Jens Axboe &lt;jens.axboe@oracle.com&gt;
</content>
</entry>
<entry>
<title>block: internal dequeue shouldn't start timer</title>
<updated>2008-12-03T11:41:26Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2008-12-03T11:41:26Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=53a08807c01989c6847bb135d8d43f61c5dfdda5'/>
<id>urn:sha1:53a08807c01989c6847bb135d8d43f61c5dfdda5</id>
<content type='text'>
blkdev_dequeue_request() and elv_dequeue_request() are equivalent and
both start the timeout timer.  The barrier code dequeues the original
barrier request but doesn't pass the request itself to the lower level
driver, only broken-down proxy requests; however, the original barrier
request goes through the same dequeue path, so the timeout timer is
started on it.  If the barrier sequence takes long enough, this timer
expires, but the low level driver has no idea about this request and an
oops follows.

The timeout timer shouldn't have been started on the original barrier
request as it never goes through actual IO.  This patch unexports
elv_dequeue_request(), which has no external user anyway, makes it
operate on the elevator proper w/o adding the timer, and makes
blkdev_dequeue_request() call elv_dequeue_request() and add the timer.
Internal users which don't pass the request to the driver - barrier code
and end_that_request_last() - are converted to use
elv_dequeue_request().

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Mike Anderson &lt;andmike@linux.vnet.ibm.com&gt;
Signed-off-by: Jens Axboe &lt;jens.axboe@oracle.com&gt;
</content>
</entry>
<entry>
<title>block: set disk-&gt;node_id before it's being used</title>
<updated>2008-12-03T11:41:20Z</updated>
<author>
<name>Cheng Renquan</name>
<email>crquan@gmail.com</email>
</author>
<published>2008-11-20T07:37:37Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=bf91db18ac2852a3ff39fe25ff56c5557c0fff78'/>
<id>urn:sha1:bf91db18ac2852a3ff39fe25ff56c5557c0fff78</id>
<content type='text'>
disk-&gt;node_id is referenced during allocation in disk_expand_part_tbl(),
so we should set it before it is used there.

Signed-off-by: Cheng Renquan &lt;crquan@gmail.com&gt;
Signed-off-by: Jens Axboe &lt;jens.axboe@oracle.com&gt;
</content>
</entry>
<entry>
<title>When block layer fails to map iov, it calls bio_unmap_user to undo</title>
<updated>2008-12-03T11:41:20Z</updated>
<author>
<name>Petr Vandrovec</name>
<email>petr@vandrovec.name</email>
</author>
<published>2008-11-19T10:12:14Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=53cc0b2948bcb8a084982e6c1f9bd7b337e0df38'/>
<id>urn:sha1:53cc0b2948bcb8a084982e6c1f9bd7b337e0df38</id>
<content type='text'>
mapping.  That is fine if the pages were mapped, but if they were provided
by someone else and just copied, bad things happen: the pages are
released once here and once by the caller, leading to a user-triggerable
BUG at include/linux/mm.h:246.

Signed-off-by: Petr Vandrovec &lt;petr@vandrovec.name&gt;
Signed-off-by: Jens Axboe &lt;jens.axboe@oracle.com&gt;
</content>
</entry>
<entry>
<title>block: hold extra reference to bio in blk_rq_map_user_iov()</title>
<updated>2008-11-18T14:08:56Z</updated>
<author>
<name>Jens Axboe</name>
<email>jens.axboe@oracle.com</email>
</author>
<published>2008-11-18T14:07:05Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c26156b2534c75bb3cdedf76f6ad1340971cf5bd'/>
<id>urn:sha1:c26156b2534c75bb3cdedf76f6ad1340971cf5bd</id>
<content type='text'>
If the size passed in is OK but we end up mapping too many segments,
we call the unmap path directly like from IO completion. But from IO
completion we have an extra reference to the bio, so this error case
goes OOPS when it attempts to free an already freed bio.

Fix it by getting an extra reference to the bio before calling the
unmap failure case.

Reported-by: Petr Vandrovec &lt;vandrove@vc.cvut.cz&gt;

Signed-off-by: Jens Axboe &lt;jens.axboe@oracle.com&gt;
</content>
</entry>
<entry>
<title>block: fix boot failure with CONFIG_DEBUG_BLOCK_EXT_DEVT=y and nash</title>
<updated>2008-11-18T14:08:56Z</updated>
<author>
<name>Zhang, Yanmin</name>
<email>yanmin_zhang@linux.intel.com</email>
</author>
<published>2008-11-14T07:26:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=561ec68e4de7947167937c49c451728e6b19e63b'/>
<id>urn:sha1:561ec68e4de7947167937c49c451728e6b19e63b</id>
<content type='text'>
We ran into a system boot failure with kernel 2.6.28-rc. We found it on a
couple of machines, including a T61 notebook, a nehalem machine, and another
HPC NX6325 notebook.  All the machines use FedoraCore 8 or FedoraCore 9.
With kernels prior to 2.6.28-rc, system boot doesn't fail.

I debugged it and located the root cause. Please see
http://bugzilla.kernel.org/show_bug.cgi?id=11899
https://bugzilla.redhat.com/show_bug.cgi?id=471517

As a matter of fact, there are 2 bugs.

1) root=/dev/sda1: system boot randomly fails, typically about once in
every 5 boots. nash has a bug: some of its functions misuse the return
value 0.  Sometimes 0 means timeout and no uevent available. Sometimes
0 means nash got a uevent, but the uevent isn't block-related (for
example, usb). If by coincidence the kernel tells nash that uevents are
available but also sets the timeout, nash might stop collecting the
other uevents in the queue if the current uevent isn't block-related.  I
worked out a patch for nash to fix it.
http://bugzilla.kernel.org/attachment.cgi?id=18858

2) root=LABEL=/: the system can never boot. initrd init reports that
switchroot fails. Here is an execution path of nash when booting:
    (1) nash reads /sys/block/sda/dev; assume the major is 8 (on my desktop);
    (2) nash queries /proc/devices with the major number and finds the line
	"8 sd";
    (3) nash uses 'sd' to search its own probe table to find the device (DISK)
	type for the device and adds it to its own list;
    (4) Later on, it probes all devices in its list to get filesystem
	labels; scsi always registers "8 sd".

When the major is 259, nash fails to find the device (DISK) type. I enabled
CONFIG_DEBUG_BLOCK_EXT_DEVT=y when compiling the kernel, so 259 is picked
for device /dev/sda1, which causes nash to fail to find the device (DISK)
type.
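
The lookup in step (2) and the reason it fails for major 259 can be
sketched in Python. This is not nash's code: the parser, its name, and the
sample /proc/devices text below are made up for illustration, and only the
"8 sd" line and the blkext name come from this report.

```python
def parse_block_majors(proc_devices_text):
    """Map block major numbers to driver names from /proc/devices text."""
    majors = {}
    in_block_section = False
    for line in proc_devices_text.splitlines():
        line = line.strip()
        if line == "Block devices:":
            in_block_section = True
            continue
        if line == "Character devices:":
            in_block_section = False
            continue
        if in_block_section and line:
            major, name = line.split(None, 1)
            majors[int(major)] = name
    return majors


# Made-up sample in the layout of /proc/devices.
SAMPLE = """Character devices:
  1 mem
  4 tty

Block devices:
  8 sd
 65 sd
"""

majors = parse_block_majors(SAMPLE)
print(majors.get(8))     # sd
print(majors.get(259))   # None: no name to look up in the probe table

# With an entry registered for the extended major, the lookup succeeds.
patched = parse_block_majors(SAMPLE + "259 blkext\n")
print(patched.get(259))  # blkext
```

Without a name for major 259 there is nothing to search the probe table
with, which matches the failure described above.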

To fix issue 2), I created a patch for nash and another patch for the
kernel.

http://bugzilla.kernel.org/attachment.cgi?id=18859
http://bugzilla.kernel.org/attachment.cgi?id=18837

Below is the patch for kernel 2.6.28-rc4. It registers blkext, a new
block device, in /proc/devices.

With 2 patches on nash and 1 patch on the kernel, I booted my machines
dozens of times without failure.

Signed-off-by: Zhang Yanmin &lt;yanmin.zhang@linux.intel.com&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;jens.axboe@oracle.com&gt;
</content>
</entry>
<entry>
<title>block: make add_partition() return pointer to hd_struct</title>
<updated>2008-11-18T14:08:56Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2008-11-10T06:29:58Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ba32929a91fe2c0628f5be62d1597b379c8d3062'/>
<id>urn:sha1:ba32929a91fe2c0628f5be62d1597b379c8d3062</id>
<content type='text'>
Make add_partition() return a pointer to the new hd_struct on success
and an ERR_PTR() value on failure.  This change will be used to fix an md
autodetection bug.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Neil Brown &lt;neilb@suse.de&gt;
Signed-off-by: Jens Axboe &lt;jens.axboe@oracle.com&gt;
</content>
</entry>
</feed>
