<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/drivers/nvme, branch v5.7</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.7</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.7'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2020-05-27T18:32:56Z</updated>
<entry>
<title>nvme-pci: avoid race between nvme_reap_pending_cqes() and nvme_poll()</title>
<updated>2020-05-27T18:32:56Z</updated>
<author>
<name>Dongli Zhang</name>
<email>dongli.zhang@oracle.com</email>
</author>
<published>2020-05-27T16:13:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9210c075cef29c1f764b4252f93105103bdfb292'/>
<id>urn:sha1:9210c075cef29c1f764b4252f93105103bdfb292</id>
<content type='text'>
There may be a race between nvme_reap_pending_cqes() and nvme_poll(), e.g.,
when doing live reset while polling the nvme device.

      CPU X                        CPU Y
                               nvme_poll()
nvme_dev_disable()
-&gt; nvme_stop_queues()
-&gt; nvme_suspend_io_queues()
-&gt; nvme_suspend_queue()
                               -&gt; spin_lock(&amp;nvmeq-&gt;cq_poll_lock);
-&gt; nvme_reap_pending_cqes()
   -&gt; nvme_process_cq()        -&gt; nvme_process_cq()

In the above scenario, the nvme_process_cq() for the same queue may be
running on both CPU X and CPU Y concurrently.

It is much more easier to reproduce the issue when CONFIG_PREEMPT is
enabled in kernel. When CONFIG_PREEMPT is disabled, it would take longer
time for nvme_stop_queues()--&gt;blk_mq_quiesce_queue() to wait for grace
period.

This patch protects nvme_process_cq() with nvmeq-&gt;cq_poll_lock in
nvme_reap_pending_cqes().

Fixes: fa46c6fb5d61 ("nvme/pci: move cqe check after device shutdown")
Signed-off-by: Dongli Zhang &lt;dongli.zhang@oracle.com&gt;
Reviewed-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Reviewed-by: Keith Busch &lt;kbusch@kernel.org&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
</entry>
<entry>
<title>nvme-pci: dma read memory barrier for completions</title>
<updated>2020-05-12T16:02:24Z</updated>
<author>
<name>Keith Busch</name>
<email>kbusch@kernel.org</email>
</author>
<published>2020-05-08T20:04:06Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b69e2ef24b7b4867f80f47e2781e95d0bacd15cb'/>
<id>urn:sha1:b69e2ef24b7b4867f80f47e2781e95d0bacd15cb</id>
<content type='text'>
Control dependencies do not guarantee load order across the condition,
allowing a CPU to predict and speculate memory reads.

Commit 324b494c2862 inlined verifying a new completion with its
handling. At least one architecture was observed to access the contents
out of order, resulting in the driver using stale data for the
completion.

Add a dma read barrier before reading the completion queue entry and
after the condition its contents depend on to ensure the read order is
determinsitic.

Reported-by: John Garry &lt;john.garry@huawei.com&gt;
Suggested-by: Will Deacon &lt;will@kernel.org&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
Tested-by: John Garry &lt;john.garry@huawei.com&gt;
Acked-by: Will Deacon &lt;will@kernel.org&gt;
Reviewed-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
</entry>
<entry>
<title>nvme: fix possible hang when ns scanning fails during error recovery</title>
<updated>2020-05-09T22:07:58Z</updated>
<author>
<name>Sagi Grimberg</name>
<email>sagi@grimberg.me</email>
</author>
<published>2020-05-06T22:44:02Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=59c7c3caaaf8750df4ec3255082f15eb4e371514'/>
<id>urn:sha1:59c7c3caaaf8750df4ec3255082f15eb4e371514</id>
<content type='text'>
When the controller is reconnecting, the host fails I/O and admin
commands as the host cannot reach the controller. ns scanning may
revalidate namespaces during that period and it is wrong to remove
namespaces due to these failures as we may hang (see 205da2434301).

One command that may fail is nvme_identify_ns_descs. Since we return
success due to having ns identify descriptor list optional, we continue
to compare ns identifiers in nvme_revalidate_disk, obviously fail and
return -ENODEV to nvme_validate_ns, which will remove the namespace.

Exactly what we don't want to happen.

Fixes: 22802bf742c2 ("nvme: Namepace identification descriptor list is optional")
Tested-by: Anton Eidelman &lt;anton@lightbitslabs.com&gt;
Signed-off-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Reviewed-by: Keith Busch &lt;kbusch@kernel.org&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>nvme-pci: fix "slimmer CQ head update"</title>
<updated>2020-05-09T22:07:58Z</updated>
<author>
<name>Alexey Dobriyan</name>
<email>adobriyan@gmail.com</email>
</author>
<published>2020-05-07T20:07:04Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a8de6639169b90e3dc4f27e752a3c5abac5e90da'/>
<id>urn:sha1:a8de6639169b90e3dc4f27e752a3c5abac5e90da</id>
<content type='text'>
Pre-incrementing -&gt;cq_head can't be done in memory because OOB value
can be observed by another context.

This devalues space savings compared to original code :-\

	$ ./scripts/bloat-o-meter ../vmlinux-000 ../obj/vmlinux
	add/remove: 0/0 grow/shrink: 0/4 up/down: 0/-32 (-32)
	Function                                     old     new   delta
	nvme_poll_irqdisable                         464     456      -8
	nvme_poll                                    455     447      -8
	nvme_irq                                     388     380      -8
	nvme_dev_disable                             955     947      -8

But the code is minimal now: one read for head, one read for q_depth,
one increment, one comparison, single instruction phase bit update and
one write for new head.

Signed-off-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Reported-by: John Garry &lt;john.garry@huawei.com&gt;
Tested-by: John Garry &lt;john.garry@huawei.com&gt;
Fixes: e2a366a4b0feaeb ("nvme-pci: slimmer CQ head update")
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>nvme: prevent double free in nvme_alloc_ns() error handling</title>
<updated>2020-04-27T15:08:06Z</updated>
<author>
<name>Niklas Cassel</name>
<email>niklas.cassel@wdc.com</email>
</author>
<published>2020-04-27T12:34:41Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=132be62387c7a72a38872676c18b0dfae264adb8'/>
<id>urn:sha1:132be62387c7a72a38872676c18b0dfae264adb8</id>
<content type='text'>
When jumping to the out_put_disk label, we will call put_disk(), which will
trigger a call to disk_release(), which calls blk_put_queue().

Later in the cleanup code, we do blk_cleanup_queue(), which will also call
blk_put_queue().

Putting the queue twice is incorrect, and will generate a KASAN splat.

Set the disk-&gt;queue pointer to NULL, before calling put_disk(), so that the
first call to blk_put_queue() will not free the queue.

The second call to blk_put_queue() uses another pointer to the same queue,
so this call will still free the queue.

Fixes: 85136c010285 ("lightnvm: simplify geometry enumeration")
Signed-off-by: Niklas Cassel &lt;niklas.cassel@wdc.com&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
</entry>
<entry>
<title>Merge tag 'block-5.7-2020-04-10' of git://git.kernel.dk/linux-block</title>
<updated>2020-04-10T17:06:54Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2020-04-10T17:06:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8df2a0a6da450b0fc28f1fed110817c1d98b84c2'/>
<id>urn:sha1:8df2a0a6da450b0fc28f1fed110817c1d98b84c2</id>
<content type='text'>
Pull block fixes from Jens Axboe:
 "Here's a set of fixes that should go into this merge window. This
  contains:

   - NVMe pull request from Christoph with various fixes

   - Better discard support for loop (Evan)

   - Only call -&gt;commit_rqs() if we have queued IO (Keith)

   - blkcg offlining fixes (Tejun)

   - fix (and fix the fix) for busy partitions"

* tag 'block-5.7-2020-04-10' of git://git.kernel.dk/linux-block:
  block: fix busy device checking in blk_drop_partitions again
  block: fix busy device checking in blk_drop_partitions
  nvmet-rdma: fix double free of rdma queue
  blk-mq: don't commit_rqs() if none were queued
  nvme-fc: Revert "add module to ops template to allow module references"
  nvme: fix deadlock caused by ANA update wrong locking
  nvmet-rdma: fix bonding failover possible NULL deref
  loop: Better discard support for block devices
  loop: Report EOPNOTSUPP properly
  nvmet: fix NULL dereference when removing a referral
  nvme: inherit stable pages constraint in the mpath stack device
  blkcg: don't offline parent blkcg first
  blkcg: rename blkcg-&gt;cgwb_refcnt to -&gt;online_pin and always use it
  nvme-tcp: fix possible crash in recv error flow
  nvme-tcp: don't poll a non-live queue
  nvme-tcp: fix possible crash in write_zeroes processing
  nvmet-fc: fix typo in comment
  nvme-rdma: Replace comma with a semicolon
  nvme-fcloop: fix deallocation of working context
  nvme: fix compat address handling in several ioctls
</content>
</entry>
<entry>
<title>nvmet-rdma: fix double free of rdma queue</title>
<updated>2020-04-07T16:33:45Z</updated>
<author>
<name>Israel Rukshin</name>
<email>israelr@mellanox.com</email>
</author>
<published>2020-04-07T11:02:28Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=21f9024355e58772ec5d7fc3534aa5e29d72a8b6'/>
<id>urn:sha1:21f9024355e58772ec5d7fc3534aa5e29d72a8b6</id>
<content type='text'>
In case rdma accept fails at nvmet_rdma_queue_connect(), release work is
scheduled. Later on, a new RDMA CM event may arrive since we didn't
destroy the cm-id and call nvmet_rdma_queue_connect_fail(), which
schedule another release work. This will cause calling
nvmet_rdma_free_queue twice. To fix this we implicitly destroy the cm_id
with non-zero ret code, which guarantees that new rdma_cm events will
not arrive afterwards. Also add a qp pointer to nvmet_rdma_queue
structure, so we can use it when the cm_id pointer is NULL or was
destroyed.

Signed-off-by: Israel Rukshin &lt;israelr@mellanox.com&gt;
Suggested-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Reviewed-by: Max Gurtovoy &lt;maxg@mellanox.com&gt;
Reviewed-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
</entry>
<entry>
<title>nvme-fc: Revert "add module to ops template to allow module references"</title>
<updated>2020-04-04T07:09:39Z</updated>
<author>
<name>James Smart</name>
<email>jsmart2021@gmail.com</email>
</author>
<published>2020-04-03T14:33:20Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8c5c660529209a0e324c1c1a35ce3f83d67a2aa5'/>
<id>urn:sha1:8c5c660529209a0e324c1c1a35ce3f83d67a2aa5</id>
<content type='text'>
The original patch was to resolve the lldd being able to be unloaded
while being used to talk to the boot device of the system. However, the
end result of the original patch is that any driver unload while a nvme
controller is live via the lldd is now being prohibited. Given the module
reference, the module teardown routine can't be called, thus there's no
way, other than manual actions to terminate the controllers.

Fixes: 863fbae929c7 ("nvme_fc: add module to ops template to allow module references")
Cc: &lt;stable@vger.kernel.org&gt; # v5.4+
Signed-off-by: James Smart &lt;jsmart2021@gmail.com&gt;
Reviewed-by: Himanshu Madhani &lt;himanshu.madhani@oracle.com&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
</entry>
<entry>
<title>nvme: fix deadlock caused by ANA update wrong locking</title>
<updated>2020-04-04T07:07:03Z</updated>
<author>
<name>Sagi Grimberg</name>
<email>sagi@grimberg.me</email>
</author>
<published>2020-04-02T16:34:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=657f1975e9d9c880fa13030e88ba6cc84964f1db'/>
<id>urn:sha1:657f1975e9d9c880fa13030e88ba6cc84964f1db</id>
<content type='text'>
The deadlock combines 4 flows in parallel:
- ns scanning (triggered from reconnect)
- request timeout
- ANA update (triggered from reconnect)
- I/O coming into the mpath device

(1) ns scanning triggers disk revalidation -&gt; update disk info -&gt;
    freeze queue -&gt; but blocked, due to (2)

(2) timeout handler reference the g_usage_counter - &gt; but blocks in
    the transport .timeout() handler, due to (3)

(3) the transport timeout handler (indirectly) calls nvme_stop_queue() -&gt;
    which takes the (down_read) namespaces_rwsem - &gt; but blocks, due to (4)

(4) ANA update takes the (down_write) namespaces_rwsem -&gt; calls
    nvme_mpath_set_live() -&gt; which synchronize the ns_head srcu
    (see commit 504db087aacc) -&gt; but blocks, due to (5)

(5) I/O came into nvme_mpath_make_request -&gt; took srcu_read_lock -&gt;
    direct_make_request &gt; blk_queue_enter -&gt; but blocked, due to (1)

==&gt; the request queue is under freeze -&gt; deadlock.

The fix is making ANA update take a read lock as the namespaces list
is not manipulated, it is just the ns and ns-&gt;head that are being
updated (which is protected with the ns-&gt;head lock).

Fixes: 0d0b660f214dc ("nvme: add ANA support")
Signed-off-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Reviewed-by: Keith Busch &lt;kbusch@kernel.org&gt;
Reviewed-by: Hannes Reinecke &lt;hare@suse.de&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
</entry>
<entry>
<title>nvmet-rdma: fix bonding failover possible NULL deref</title>
<updated>2020-04-04T07:06:55Z</updated>
<author>
<name>Sagi Grimberg</name>
<email>sagi@grimberg.me</email>
</author>
<published>2020-04-02T15:48:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a032e4f6d60d0aca4f6570d2ad33105a2b9ba385'/>
<id>urn:sha1:a032e4f6d60d0aca4f6570d2ad33105a2b9ba385</id>
<content type='text'>
RDMA_CM_EVENT_ADDR_CHANGE event occur in the case of bonding failover
on normal as well as on listening cm_ids. Hence this event will
immediately trigger a NULL dereference trying to disconnect a queue
for a cm_id that actually belongs to the port.

To fix this we provide a different handler for the listener cm_ids
that will defer a work to disable+(re)enable the port which essentially
destroys and setups another listener cm_id

Reported-by: Alex Lyakas &lt;alex@zadara.com&gt;
Signed-off-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Reviewed-by: Max Gurtovoy &lt;maxg@mellanox.com&gt;
Tested-by: Alex Lyakas &lt;alex@zadara.com&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
</entry>
</feed>
