<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/drivers/block, branch v6.4</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.4</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.4'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2023-06-21T08:14:28Z</updated>
<entry>
<title>Revert "virtio-blk: support completion batching for the IRQ path"</title>
<updated>2023-06-21T08:14:28Z</updated>
<author>
<name>Michael S. Tsirkin</name>
<email>mst@redhat.com</email>
</author>
<published>2023-06-08T21:42:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=afd384f0dbea2229fd11159efb86a5b41051c4a9'/>
<id>urn:sha1:afd384f0dbea2229fd11159efb86a5b41051c4a9</id>
<content type='text'>
This reverts commit 07b679f70d73483930e8d3c293942416d9cd5c13.

This change appears to have broken things...
We now see applications hanging during disk accesses.
e.g.
multi-port virtio-blk device running in h/w (FPGA)
Host running a simple 'fio' test.
[global]
thread=1
direct=1
ioengine=libaio
norandommap=1
group_reporting=1
bs=4K
rw=read
iodepth=128
runtime=1
numjobs=4
time_based
[job0]
filename=/dev/vda
[job1]
filename=/dev/vdb
[job2]
filename=/dev/vdc
...
[job15]
filename=/dev/vdp

i.e. 16 disks; 4 queues per disk; simple burst of 4KB reads
This is repeatedly run in a loop.

After a few, normally &lt;10 seconds, fio hangs.
With 64 queues (16 disks), failure occurs within a few seconds; with 8 queues (2 disks) it may take ~hour before hanging.
Last message:
fio-3.19
Starting 8 threads
Jobs: 1 (f=1): [_(7),R(1)][68.3%][eta 03h:11m:06s]
I think this means at the end of the run 1 queue was left incomplete.

'diskstats' (run while fio is hung) shows no outstanding transactions.
e.g.
$ cat /proc/diskstats
...
252       0 vda 1843140071 0 14745120568 712568645 0 0 0 0 0 3117947 712568645 0 0 0 0 0 0
252      16 vdb 1816291511 0 14530332088 704905623 0 0 0 0 0 3117711 704905623 0 0 0 0 0 0
...

Other stats (in the h/w, and added to the virtio-blk driver ([a]virtio_queue_rq(), [b]virtblk_handle_req(), [c]virtblk_request_done()) all agree, and show every request had a completion, and that virtblk_request_done() never gets called.
e.g.
PF= 0                         vq=0           1           2           3
[a]request_count     -   839416590   813148916   105586179    84988123
[b]completion1_count -   839416590   813148916   105586179    84988123
[c]completion2_count -           0           0           0           0

PF= 1                         vq=0           1           2           3
[a]request_count     -   823335887   812516140   104582672    75856549
[b]completion1_count -   823335887   812516140   104582672    75856549
[c]completion2_count -           0           0           0           0

i.e. the issue is after the virtio-blk driver.

This change was introduced in kernel 6.3.0.
I am seeing this using 6.3.3.
If I run with an earlier kernel (5.15), it does not occur.
If I make a simple patch to the 6.3.3 virtio-blk driver, to skip the blk_mq_add_to_batch()call, it does not fail.
e.g.
kernel 5.15 - this is OK
virtio_blk.c,virtblk_done() [irq handler]
                 if (likely(!blk_should_fake_timeout(req-&gt;q))) {
                          blk_mq_complete_request(req);
                 }

kernel 6.3.3 - this fails
virtio_blk.c,virtblk_handle_req() [irq handler]
                 if (likely(!blk_should_fake_timeout(req-&gt;q))) {
                          if (!blk_mq_complete_request_remote(req)) {
                                  if (!blk_mq_add_to_batch(req, iob, virtblk_vbr_status(vbr), virtblk_complete_batch)) {
                                           virtblk_request_done(req);    //this never gets called... so blk_mq_add_to_batch() must always succeed
                                   }
                          }
                 }

If I do, kernel 6.3.3 - this is OK
virtio_blk.c,virtblk_handle_req() [irq handler]
                 if (likely(!blk_should_fake_timeout(req-&gt;q))) {
                          if (!blk_mq_complete_request_remote(req)) {
                                   virtblk_request_done(req); //force this here...
                                  if (!blk_mq_add_to_batch(req, iob, virtblk_vbr_status(vbr), virtblk_complete_batch)) {
                                           virtblk_request_done(req);    //this never gets called... so blk_mq_add_to_batch() must always succeed
                                   }
                          }
                 }

Perhaps you might like to fix/test/revert this change...
Martin

Reported-by: kernel test robot &lt;lkp@intel.com&gt;
Closes: https://lore.kernel.org/oe-kbuild-all/202306090826.C1fZmdMe-lkp@intel.com/
Cc: Suwan Kim &lt;suwan.kim027@gmail.com&gt;
Tested-by: edliaw@google.com
Reported-by: "Roberts, Martin" &lt;martin.roberts@intel.com&gt;
Message-Id: &lt;336455b4f630f329380a8f53ee8cad3868764d5c.1686295549.git.mst@redhat.com&gt;
Signed-off-by: Michael S. Tsirkin &lt;mst@redhat.com&gt;
</content>
</entry>
<entry>
<title>Merge tag 'block-6.4-2023-06-09' of git://git.kernel.dk/linux</title>
<updated>2023-06-09T21:21:20Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-06-09T21:21:20Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=64569520920a3ca5d456ddd9f4f95fc6ea9b8b45'/>
<id>urn:sha1:64569520920a3ca5d456ddd9f4f95fc6ea9b8b45</id>
<content type='text'>
Pull block fixes from Jens Axboe:

 - Fix an issue with the hardware queue nr_active, causing it to become
   imbalanced (Tian)

 - Fix an issue with null_blk not releasing pages if configured as
   memory backed (Nitesh)

 - Fix a locking issue in dasd (Jan)

* tag 'block-6.4-2023-06-09' of git://git.kernel.dk/linux:
  s390/dasd: Use correct lock while counting channel queue length
  null_blk: Fix: memory release when memory_backed=1
  blk-mq: fix blk_mq_hw_ctx active request accounting
</content>
</entry>
<entry>
<title>rbd: get snapshot context after exclusive lock is ensured to be held</title>
<updated>2023-06-06T07:54:27Z</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2023-06-05T14:33:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=870611e4877eff1e8413c3fb92a585e45d5291f6'/>
<id>urn:sha1:870611e4877eff1e8413c3fb92a585e45d5291f6</id>
<content type='text'>
Move capturing the snapshot context into the image request state
machine, after exclusive lock is ensured to be held for the duration of
dealing with the image request.  This is needed to ensure correctness
of fast-diff states (OBJECT_EXISTS vs OBJECT_EXISTS_CLEAN) and object
deltas computed based off of them.  Otherwise the object map that is
forked for the snapshot isn't guaranteed to accurately reflect the
contents of the snapshot when the snapshot is taken under I/O.  This
breaks differential backup and snapshot-based mirroring use cases with
fast-diff enabled: since some object deltas may be incomplete, the
destination image may get corrupted.

Cc: stable@vger.kernel.org
Link: https://tracker.ceph.com/issues/61472
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Dongsheng Yang &lt;dongsheng.yang@easystack.cn&gt;
</content>
</entry>
<entry>
<title>rbd: move RBD_OBJ_FLAG_COPYUP_ENABLED flag setting</title>
<updated>2023-06-06T07:53:59Z</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2023-06-05T14:33:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=09fe05c57b5aaf23e2c35036c98ea9f282b19a77'/>
<id>urn:sha1:09fe05c57b5aaf23e2c35036c98ea9f282b19a77</id>
<content type='text'>
Move RBD_OBJ_FLAG_COPYUP_ENABLED flag setting into the object request
state machine to allow for the snapshot context to be captured in the
image request state machine rather than in rbd_queue_workfn().

Cc: stable@vger.kernel.org
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Dongsheng Yang &lt;dongsheng.yang@easystack.cn&gt;
</content>
</entry>
<entry>
<title>null_blk: Fix: memory release when memory_backed=1</title>
<updated>2023-06-05T22:15:35Z</updated>
<author>
<name>Nitesh Shetty</name>
<email>nj.shetty@samsung.com</email>
</author>
<published>2023-06-05T06:23:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8cfb98196cceec35416041c6b91212d2b99392e4'/>
<id>urn:sha1:8cfb98196cceec35416041c6b91212d2b99392e4</id>
<content type='text'>
Memory/pages are not freed, when unloading nullblk driver.

Steps to reproduce issue
  1.free -h
        total        used        free      shared  buff/cache   available
Mem:    7.8Gi       260Mi       7.1Gi       3.0Mi       395Mi       7.3Gi
Swap:      0B          0B          0B
  2.modprobe null_blk memory_backed=1
  3.dd if=/dev/urandom of=/dev/nullb0 oflag=direct bs=1M count=1000
  4.modprobe -r null_blk
  5.free -h
        total        used        free      shared  buff/cache   available
Mem:    7.8Gi       1.2Gi       6.1Gi       3.0Mi       398Mi       6.3Gi
Swap:      0B          0B          0B

Signed-off-by: Anuj Gupta &lt;anuj20.g@samsung.com&gt;
Signed-off-by: Nitesh Shetty &lt;nj.shetty@samsung.com&gt;
Link: https://lore.kernel.org/r/20230605062354.24785-1-nj.shetty@samsung.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>Merge tag 'for-linus-6.4-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip</title>
<updated>2023-05-27T16:42:56Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-05-27T16:42:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4e893b5aa4ac2c8a56a40d18fe87e9d2295e5dcf'/>
<id>urn:sha1:4e893b5aa4ac2c8a56a40d18fe87e9d2295e5dcf</id>
<content type='text'>
Pull xen fixes from Juergen Gross:

 - a double free fix in the Xen pvcalls backend driver

 - a fix for a regression causing the MSI related sysfs entries to not
   being created in Xen PV guests

 - a fix in the Xen blkfront driver for handling insane input data
   better

* tag 'for-linus-6.4-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  x86/pci/xen: populate MSI sysfs entries
  xen/pvcalls-back: fix double frees with pvcalls_new_active_socket()
  xen/blkfront: Only check REQ_FUA for writes
</content>
</entry>
<entry>
<title>xen/blkfront: Only check REQ_FUA for writes</title>
<updated>2023-05-24T14:35:39Z</updated>
<author>
<name>Ross Lagerwall</name>
<email>ross.lagerwall@citrix.com</email>
</author>
<published>2023-04-26T16:40:05Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b6ebaa8100090092aa602530d7e8316816d0c98d'/>
<id>urn:sha1:b6ebaa8100090092aa602530d7e8316816d0c98d</id>
<content type='text'>
The existing code silently converts read operations with the
REQ_FUA bit set into write-barrier operations. This results in data
loss as the backend scribbles zeroes over the data instead of returning
it.

While the REQ_FUA bit doesn't make sense on a read operation, at least
one well-known out-of-tree kernel module does set it and since it
results in data loss, let's be safe here and only look at REQ_FUA for
writes.

Signed-off-by: Ross Lagerwall &lt;ross.lagerwall@citrix.com&gt;
Acked-by: Juergen Gross &lt;jgross@suse.com&gt;
Link: https://lore.kernel.org/r/20230426164005.2213139-1-ross.lagerwall@citrix.com
Signed-off-by: Juergen Gross &lt;jgross@suse.com&gt;
</content>
</entry>
<entry>
<title>ublk: fix AB-BA lockdep warning</title>
<updated>2023-05-18T13:59:08Z</updated>
<author>
<name>Ming Lei</name>
<email>ming.lei@redhat.com</email>
</author>
<published>2023-05-17T13:34:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ac5902f84bb546c64aea02c439c2579cbf40318f'/>
<id>urn:sha1:ac5902f84bb546c64aea02c439c2579cbf40318f</id>
<content type='text'>
When handling UBLK_IO_FETCH_REQ, ctx-&gt;uring_lock is grabbed first, then
ub-&gt;mutex is acquired.

When handling UBLK_CMD_STOP_DEV or UBLK_CMD_DEL_DEV, ub-&gt;mutex is
grabbed first, then calling io_uring_cmd_done() for canceling uring
command, in which ctx-&gt;uring_lock may be required.

Real deadlock only happens when all the above commands are issued from
same uring context, and in reality different uring contexts are often used
for handing control command and IO command.

Fix the issue by using io_uring_cmd_complete_in_task() to cancel command
in ublk_cancel_dev(ublk_cancel_queue).

Reported-by: Shinichiro Kawasaki &lt;shinichiro.kawasaki@wdc.com&gt;
Closes: https://lore.kernel.org/linux-block/becol2g7sawl4rsjq2dztsbc7mqypfqko6wzsyoyazqydoasml@rcxarzwidrhk
Cc: Ziyang Zhang &lt;ZiyangZhang@linux.alibaba.com&gt;
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Tested-by: Shinichiro Kawasaki &lt;shinichiro.kawasaki@wdc.com&gt;
Link: https://lore.kernel.org/r/20230517133408.210944-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>ublk: fix command op code check</title>
<updated>2023-05-12T15:09:06Z</updated>
<author>
<name>Ming Lei</name>
<email>ming.lei@redhat.com</email>
</author>
<published>2023-05-05T15:31:42Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e485bd9e2c419142430ae6fe3e8f64e3059aef50'/>
<id>urn:sha1:e485bd9e2c419142430ae6fe3e8f64e3059aef50</id>
<content type='text'>
In case of CONFIG_BLKDEV_UBLK_LEGACY_OPCODES, type of cmd opcode could
be 0 or 'u'; and type can only be 'u' if CONFIG_BLKDEV_UBLK_LEGACY_OPCODES
isn't set.

So fix the wrong check.

Fixes: 2d786e66c966 ("block: ublk: switch to ioctl command encoding")
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Link: https://lore.kernel.org/r/20230505153142.1258336-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block/rnbd: replace REQ_OP_FLUSH with REQ_OP_WRITE</title>
<updated>2023-05-12T14:56:42Z</updated>
<author>
<name>Guoqing Jiang</name>
<email>guoqing.jiang@linux.dev</email>
</author>
<published>2023-05-12T03:46:31Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5e6e08087a4acb4ee3574cea32dbff0f63c7f608'/>
<id>urn:sha1:5e6e08087a4acb4ee3574cea32dbff0f63c7f608</id>
<content type='text'>
Since flush bios are implemented as writes with no data and
the preflush flag per Christoph's comment [1].

And we need to change it in rnbd accordingly. Otherwise, I
got splatting when create fs from rnbd client.

[  464.028545] ------------[ cut here ]------------
[  464.028553] WARNING: CPU: 0 PID: 65 at block/blk-core.c:751 submit_bio_noacct+0x32c/0x5d0
[ ... ]
[  464.028668] CPU: 0 PID: 65 Comm: kworker/0:1H Tainted: G           OE      6.4.0-rc1 #9
[  464.028671] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014
[  464.028673] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
[  464.028717] RIP: 0010:submit_bio_noacct+0x32c/0x5d0
[  464.028720] Code: 03 0f 85 51 fe ff ff 48 8b 43 18 8b 88 04 03 00 00 85 c9 0f 85 3f fe ff ff e9 be fd ff ff 0f b6 d0 3c 0d 74 26 83 fa 01 74 21 &lt;0f&gt; 0b b8 0a 00 00 00 e9 56 fd ff ff 4c 89 e7 e8 70 a1 03 00 84 c0
[  464.028722] RSP: 0018:ffffaf3680b57c68 EFLAGS: 00010202
[  464.028724] RAX: 0000000000060802 RBX: ffffa09dcc18bf00 RCX: 0000000000000000
[  464.028726] RDX: 0000000000000002 RSI: 0000000000000000 RDI: ffffa09dde081d00
[  464.028727] RBP: ffffaf3680b57c98 R08: ffffa09dde081d00 R09: ffffa09e38327200
[  464.028729] R10: 0000000000000000 R11: 0000000000000000 R12: ffffa09dde081d00
[  464.028730] R13: ffffa09dcb06e1e8 R14: 0000000000000000 R15: 0000000000200000
[  464.028733] FS:  0000000000000000(0000) GS:ffffa09e3bc00000(0000) knlGS:0000000000000000
[  464.028735] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  464.028736] CR2: 000055a4e8206c40 CR3: 0000000119f06000 CR4: 00000000003506f0
[  464.028738] Call Trace:
[  464.028740]  &lt;TASK&gt;
[  464.028746]  submit_bio+0x1b/0x80
[  464.028748]  rnbd_srv_rdma_ev+0x50d/0x10c0 [rnbd_server]
[  464.028754]  ? percpu_ref_get_many.constprop.0+0x55/0x140 [rtrs_server]
[  464.028760]  ? __this_cpu_preempt_check+0x13/0x20
[  464.028769]  process_io_req+0x1dc/0x450 [rtrs_server]
[  464.028775]  rtrs_srv_inv_rkey_done+0x67/0xb0 [rtrs_server]
[  464.028780]  __ib_process_cq+0xbc/0x1f0 [ib_core]
[  464.028793]  ib_cq_poll_work+0x2b/0xa0 [ib_core]
[  464.028804]  process_one_work+0x2a9/0x580

[1]. https://lore.kernel.org/all/ZFHgefWofVt24tRl@infradead.org/

Signed-off-by: Guoqing Jiang &lt;guoqing.jiang@linux.dev&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Chaitanya Kulkarni &lt;kch@nvidia.com&gt;
Link: https://lore.kernel.org/r/20230512034631.28686-1-guoqing.jiang@linux.dev
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
</feed>
