<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/block, branch v4.20</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v4.20</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v4.20'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2018-12-11T23:19:38Z</updated>
<entry>
<title>block: Fix null_blk_zoned creation failure with small number of zones</title>
<updated>2018-12-11T23:19:38Z</updated>
<author>
<name>Shin'ichiro Kawasaki</name>
<email>shinichiro.kawasaki@wdc.com</email>
</author>
<published>2018-12-11T12:08:26Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=927b6b2d69b4cc900fa50d7e46d8f1fa91c91b3a'/>
<id>urn:sha1:927b6b2d69b4cc900fa50d7e46d8f1fa91c91b3a</id>
<content type='text'>
null_blk_zoned creation fails if the number of zones specified is equal to or is
smaller than 64 due to a memory allocation failure in blk_alloc_zones(). With
such a small number of zones, the required memory size for all zones descriptors
fits in a single page, and the page order for alloc_pages_node() is zero. Allow
this value in blk_alloc_zones() for the allocation to succeed.

Fixes: bf5054569653 "block: Introduce blk_revalidate_disk_zones()"
Reviewed-by: Damien Le Moal &lt;damien.lemoal@wdc.com&gt;
Signed-off-by: Shin'ichiro Kawasaki &lt;shinichiro.kawasaki@wdc.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block/bio: Do not zero user pages</title>
<updated>2018-12-10T20:37:20Z</updated>
<author>
<name>Keith Busch</name>
<email>keith.busch@intel.com</email>
</author>
<published>2018-12-10T15:44:42Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f55adad601c6a97c8c9628195453e0fb23b4a0ae'/>
<id>urn:sha1:f55adad601c6a97c8c9628195453e0fb23b4a0ae</id>
<content type='text'>
We don't need to zero fill the bio if not using kernel allocated pages.

Fixes: f3587d76da05 ("block: Clear kernel memory before copying to user") # v4.20-rc2
Reported-by: Todd Aiken &lt;taiken@mvtech.ca&gt;
Cc: Laurence Oberman &lt;loberman@redhat.com&gt;
Cc: stable@vger.kernel.org
Cc: Bart Van Assche &lt;bvanassche@acm.org&gt;
Tested-by: Laurence Oberman &lt;loberman@redhat.com&gt;
Signed-off-by: Keith Busch &lt;keith.busch@intel.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>blk-mq: punt failed direct issue to dispatch list</title>
<updated>2018-12-07T15:16:11Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2018-12-07T05:17:44Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c616cbee97aed4bc6178f148a7240206dcdb85a6'/>
<id>urn:sha1:c616cbee97aed4bc6178f148a7240206dcdb85a6</id>
<content type='text'>
After the direct dispatch corruption fix, we permanently disallow direct
dispatch of non read/write requests. This works fine off the normal IO
path, as they will be retried like any other failed direct dispatch
request. But for the blk_insert_cloned_request() that only DM uses to
bypass the bottom level scheduler, we always first attempt direct
dispatch. For some types of requests, that's now a permanent failure,
and no amount of retrying will make that succeed. This results in a
livelock.

Instead of making special cases for what we can direct issue, and now
having to deal with DM solving the livelock while still retaining a BUSY
condition feedback loop, always just add a request that has been through
-&gt;queue_rq() to the hardware queue dispatch list. These are safe to use
as no merging can take place there. Additionally, if requests do have
prepped data from drivers, we aren't dependent on them not sharing space
in the request structure to safely add them to the IO scheduler lists.

This basically reverts ffe81d45322c and is based on a patch from Ming,
but with the list insert case covered as well.

Fixes: ffe81d45322c ("blk-mq: fix corruption with direct issue")
Cc: stable@vger.kernel.org
Suggested-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Reported-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Tested-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Acked-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block, bfq: fix decrement of num_active_groups</title>
<updated>2018-12-07T14:40:07Z</updated>
<author>
<name>Paolo Valente</name>
<email>paolo.valente@linaro.org</email>
</author>
<published>2018-12-06T18:18:18Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ba7aeae5539c7a7cccc4cf07a2bc61281a93c50e'/>
<id>urn:sha1:ba7aeae5539c7a7cccc4cf07a2bc61281a93c50e</id>
<content type='text'>
Since commit '2d29c9f89fcd ("block, bfq: improve asymmetric scenarios
detection")', if there are process groups with I/O requests waiting for
completion, then BFQ tags the scenario as 'asymmetric'. This detection
is needed for preserving service guarantees (for details, see comments
on the computation * of the variable asymmetric_scenario in the
function bfq_better_to_idle).

Unfortunately, commit '2d29c9f89fcd ("block, bfq: improve asymmetric
scenarios detection")' contains an error exactly in the updating of
the number of groups with I/O requests waiting for completion: if a
group has more than one descendant process, then the above number of
groups, which is renamed from num_active_groups to a more appropriate
num_groups_with_pending_reqs by this commit, may happen to be wrongly
decremented multiple times, namely every time one of the descendant
processes gets all its pending I/O requests completed.

A correct, complete solution should work as follows. Consider a group
that is inactive, i.e., that has no descendant process with pending
I/O inside BFQ queues. Then suppose that num_groups_with_pending_reqs
is still accounting for this group, because the group still has some
descendant process with some I/O request still in
flight. num_groups_with_pending_reqs should be decremented when the
in-flight request of the last descendant process is finally completed
(assuming that nothing else has changed for the group in the meantime,
in terms of composition of the group and active/inactive state of
child groups and processes). To accomplish this, an additional
pending-request counter must be added to entities, and must be
updated correctly.

To avoid this additional field and operations, this commit resorts to
the following tradeoff between simplicity and accuracy: for an
inactive group that is still counted in num_groups_with_pending_reqs,
this commit decrements num_groups_with_pending_reqs when the first
descendant process of the group remains with no request waiting for
completion.

This simplified scheme provides a fix to the unbalanced decrements
introduced by 2d29c9f89fcd. Since this error was also caused by lack
of comments on this non-trivial issue, this commit also adds related
comments.

Fixes: 2d29c9f89fcd ("block, bfq: improve asymmetric scenarios detection")
Reported-by: Steven Barrett &lt;steven@liquorix.net&gt;
Tested-by: Steven Barrett &lt;steven@liquorix.net&gt;
Tested-by: Lucjan Lucjanov &lt;lucjan.lucjanov@gmail.com&gt;
Reviewed-by: Federico Motta &lt;federico@willer.it&gt;
Signed-off-by: Paolo Valente &lt;paolo.valente@linaro.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>blk-mq: fix corruption with direct issue</title>
<updated>2018-12-05T03:06:48Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2018-12-05T03:06:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ffe81d45322cc3cb140f0db080a4727ea284661e'/>
<id>urn:sha1:ffe81d45322cc3cb140f0db080a4727ea284661e</id>
<content type='text'>
If we attempt a direct issue to a SCSI device, and it returns BUSY, then
we queue the request up normally. However, the SCSI layer may have
already setup SG tables etc for this particular command. If we later
merge with this request, then the old tables are no longer valid. Once
we issue the IO, we only read/write the original part of the request,
not the new state of it.

This causes data corruption, and is most often noticed with the file
system complaining about the just read data being invalid:

[  235.934465] EXT4-fs error (device sda1): ext4_iget:4831: inode #7142: comm dpkg-query: bad extra_isize 24937 (inode size 256)

because most of it is garbage...

This doesn't happen from the normal issue path, as we will simply defer
the request to the hardware queue dispatch list if we fail. Once it's on
the dispatch list, we never merge with it.

Fix this from the direct issue path by flagging the request as
REQ_NOMERGE so we don't change the size of it before issue.

See also:
  https://bugzilla.kernel.org/show_bug.cgi?id=201685

Tested-by: Guenter Roeck &lt;linux@roeck-us.net&gt;
Fixes: 6ce3dd6eec1 ("blk-mq: issue directly if hw queue isn't busy in case of 'none'")
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: fix single range discard merge</title>
<updated>2018-11-30T17:07:57Z</updated>
<author>
<name>Ming Lei</name>
<email>ming.lei@redhat.com</email>
</author>
<published>2018-11-30T16:38:18Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2a5cf35cd6c56b2924bce103413ad3381bdc31fa'/>
<id>urn:sha1:2a5cf35cd6c56b2924bce103413ad3381bdc31fa</id>
<content type='text'>
There are actually two kinds of discard merge:

- one is the normal discard merge, just like normal read/write request,
and call it single-range discard

- another is the multi-range discard, queue_max_discard_segments(rq-&gt;q) &gt; 1

For the former case, queue_max_discard_segments(rq-&gt;q) is 1, and we
should handle this kind of discard merge like the normal read/write
request.

This patch fixes the following kernel panic issue[1], which is caused by
not removing the single-range discard request from elevator queue.

Guangwu has one raid discard test case, in which this issue is a bit
easier to trigger, and I verified that this patch can fix the kernel
panic issue in Guangwu's test case.

[1] kernel panic log from Jens's report

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000148
 PGD 0 P4D 0.
 Oops: 0000 [#1] SMP PTI
 CPU: 37 PID: 763 Comm: kworker/37:1H Not tainted \
4.20.0-rc3-00649-ge64d9a554a91-dirty #14  Hardware name: Wiwynn \
Leopard-Orv2/Leopard-DDR BW, BIOS LBM08   03/03/2017       Workqueue: kblockd \
blk_mq_run_work_fn                                            RIP: \
0010:blk_mq_get_driver_tag+0x81/0x120                                       Code: 24 \
10 48 89 7c 24 20 74 21 83 fa ff 0f 95 c0 48 8b 4c 24 28 65 48 33 0c 25 28 00 00 00 \
0f 85 96 00 00 00 48 83 c4 30 5b 5d c3 &lt;48&gt; 8b 87 48 01 00 00 8b 40 04 39 43 20 72 37 \
f6 87 b0 00 00 00 02  RSP: 0018:ffffc90004aabd30 EFLAGS: 00010246                     \
  RAX: 0000000000000003 RBX: ffff888465ea1300 RCX: ffffc90004aabde8
 RDX: 00000000ffffffff RSI: ffffc90004aabde8 RDI: 0000000000000000
 RBP: 0000000000000000 R08: ffff888465ea1348 R09: 0000000000000000
 R10: 0000000000001000 R11: 00000000ffffffff R12: ffff888465ea1300
 R13: 0000000000000000 R14: ffff888465ea1348 R15: ffff888465d10000
 FS:  0000000000000000(0000) GS:ffff88846f9c0000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000148 CR3: 000000000220a003 CR4: 00000000003606e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
  blk_mq_dispatch_rq_list+0xec/0x480
  ? elv_rb_del+0x11/0x30
  blk_mq_do_dispatch_sched+0x6e/0xf0
  blk_mq_sched_dispatch_requests+0xfa/0x170
  __blk_mq_run_hw_queue+0x5f/0xe0
  process_one_work+0x154/0x350
  worker_thread+0x46/0x3c0
  kthread+0xf5/0x130
  ? process_one_work+0x350/0x350
  ? kthread_destroy_worker+0x50/0x50
  ret_from_fork+0x1f/0x30
 Modules linked in: sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel \
kvm switchtec irqbypass iTCO_wdt iTCO_vendor_support efivars cdc_ether usbnet mii \
cdc_acm i2c_i801 lpc_ich mfd_core ipmi_si ipmi_devintf ipmi_msghandler acpi_cpufreq \
button sch_fq_codel nfsd nfs_acl lockd grace auth_rpcgss oid_registry sunrpc nvme \
nvme_core fuse sg loop efivarfs autofs4  CR2: 0000000000000148                        \

 ---[ end trace 340a1fb996df1b9b ]---
 RIP: 0010:blk_mq_get_driver_tag+0x81/0x120
 Code: 24 10 48 89 7c 24 20 74 21 83 fa ff 0f 95 c0 48 8b 4c 24 28 65 48 33 0c 25 28 \
00 00 00 0f 85 96 00 00 00 48 83 c4 30 5b 5d c3 &lt;48&gt; 8b 87 48 01 00 00 8b 40 04 39 43 \
20 72 37 f6 87 b0 00 00 00 02

Fixes: 445251d0f4d329a ("blk-mq: fix discard merge with scheduler attached")
Reported-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Guangwu Zhang &lt;guazhang@redhat.com&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Jianchao Wang &lt;jianchao.w.wang@oracle.com&gt;
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>SCSI: fix queue cleanup race before queue initialization is done</title>
<updated>2018-11-14T15:19:10Z</updated>
<author>
<name>Ming Lei</name>
<email>ming.lei@redhat.com</email>
</author>
<published>2018-11-14T08:25:51Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8dc765d438f1e42b3e8227b3b09fad7d73f4ec9a'/>
<id>urn:sha1:8dc765d438f1e42b3e8227b3b09fad7d73f4ec9a</id>
<content type='text'>
c2856ae2f315d ("blk-mq: quiesce queue before freeing queue") has
already fixed this race, however the implied synchronize_rcu()
in blk_mq_quiesce_queue() can slow down LUN probe a lot, so caused
performance regression.

Then 1311326cf4755c7 ("blk-mq: avoid to synchronize rcu inside blk_cleanup_queue()")
tried to quiesce queue for avoiding unnecessary synchronize_rcu()
only when queue initialization is done, because it is usual to see
lots of inexistent LUNs which need to be probed.

However, turns out it isn't safe to quiesce queue only when queue
initialization is done. Because when one SCSI command is completed,
the user of sending command can be waken up immediately, then the
scsi device may be removed, meantime the run queue in scsi_end_request()
is still in-progress, so kernel panic can be caused.

In Red Hat QE lab, there are several reports about this kind of kernel
panic triggered during kernel booting.

This patch tries to address the issue by grabing one queue usage
counter during freeing one request and the following run queue.

Fixes: 1311326cf4755c7 ("blk-mq: avoid to synchronize rcu inside blk_cleanup_queue()")
Cc: Andrew Jones &lt;drjones@redhat.com&gt;
Cc: Bart Van Assche &lt;bart.vanassche@wdc.com&gt;
Cc: linux-scsi@vger.kernel.org
Cc: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: James E.J. Bottomley &lt;jejb@linux.vnet.ibm.com&gt;
Cc: stable &lt;stable@vger.kernel.org&gt;
Cc: jianchao.wang &lt;jianchao.w.wang@oracle.com&gt;
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: fix 32 bit overflow in __blkdev_issue_discard()</title>
<updated>2018-11-14T15:17:18Z</updated>
<author>
<name>Dave Chinner</name>
<email>dchinner@redhat.com</email>
</author>
<published>2018-11-14T15:17:18Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4800bf7bc8c725e955fcbc6191cc872f43f506d3'/>
<id>urn:sha1:4800bf7bc8c725e955fcbc6191cc872f43f506d3</id>
<content type='text'>
A discard cleanup merged into 4.20-rc2 causes fstests xfs/259 to
fall into an endless loop in the discard code. The test is creating
a device that is exactly 2^32 sectors in size to test mkfs boundary
conditions around the 32 bit sector overflow region.

mkfs issues a discard for the entire device size by default, and
hence this throws a sector count of 2^32 into
blkdev_issue_discard(). It takes the number of sectors to discard as
a sector_t - a 64 bit value.

The commit ba5d73851e71 ("block: cleanup __blkdev_issue_discard")
takes this sector count and casts it to a 32 bit value before
comapring it against the maximum allowed discard size the device
has. This truncates away the upper 32 bits, and so if the lower 32
bits of the sector count is zero, it starts issuing discards of
length 0. This causes the code to fall into an endless loop, issuing
a zero length discards over and over again on the same sector.

Fixes: ba5d73851e71 ("block: cleanup __blkdev_issue_discard")
Tested-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
Reviewed-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
Signed-off-by: Dave Chinner &lt;dchinner@redhat.com&gt;

Killed pointless WARN_ON().

Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: copy ioprio in __bio_clone_fast() and bounce</title>
<updated>2018-11-12T17:35:25Z</updated>
<author>
<name>Hannes Reinecke</name>
<email>hare@suse.com</email>
</author>
<published>2018-11-12T17:35:25Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ca474b73896bf6e0c1eb8787eb217b0f80221610'/>
<id>urn:sha1:ca474b73896bf6e0c1eb8787eb217b0f80221610</id>
<content type='text'>
We need to copy the io priority, too; otherwise the clone will run
with a different priority than the original one.

Fixes: 43b62ce3ff0a ("block: move bio io prio to a new field")
Signed-off-by: Hannes Reinecke &lt;hare@suse.com&gt;
Signed-off-by: Jean Delvare &lt;jdelvare@suse.de&gt;

Fixed up subject, and ordered stores.

Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>Merge tag 'for-linus-20181109' of git://git.kernel.dk/linux-block</title>
<updated>2018-11-09T22:31:51Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2018-11-09T22:31:51Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=dc5db21865507d0d9f706bd97c4a85315d60d0c5'/>
<id>urn:sha1:dc5db21865507d0d9f706bd97c4a85315d60d0c5</id>
<content type='text'>
Pull block layer fixes from Jens Axboe:

 - Two fixes for an ubd regression, one for missing locking, and one for
   a missing initialization of a field. The latter was an old latent
   bug, but it's now visible and triggers (Me, Anton Ivanov)

 - Set of NVMe fixes via Christoph, but applied manually due to a git
   tree mixup (Christoph, Sagi)

 - Fix for a discard split regression, in three patches (Ming)

 - Update libata git trees (Geert)

 - SPDX identifier for sata_rcar (Kuninori Morimoto)

 - Virtual boundary merge fix (Johannes)

 - Preemptively clear memory we are going to pass to userspace, in case
   the driver does a short read (Keith)

* tag 'for-linus-20181109' of git://git.kernel.dk/linux-block:
  block: make sure writesame bio is aligned with logical block size
  block: cleanup __blkdev_issue_discard()
  block: make sure discard bio is aligned with logical block size
  Revert "nvmet-rdma: use a private workqueue for delete"
  nvme: make sure ns head inherits underlying device limits
  nvmet: don't try to add ns to p2p map unless it actually uses it
  sata_rcar: convert to SPDX identifiers
  ubd: fix missing initialization of io_req
  block: Clear kernel memory before copying to user
  MAINTAINERS: Fix remaining pointers to obsolete libata.git
  ubd: fix missing lock around request issue
  block: respect virtual boundary mask in bvecs
</content>
</entry>
</feed>
