<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/drivers/nvme, branch v5.4</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.4</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.4'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2019-11-09T02:15:55Z</updated>
<entry>
<title>Merge tag 'for-linus-2019-11-08' of git://git.kernel.dk/linux-block</title>
<updated>2019-11-09T02:15:55Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2019-11-09T02:15:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5cb8418cb533222709f362d264653a634eb8c7ac'/>
<id>urn:sha1:5cb8418cb533222709f362d264653a634eb8c7ac</id>
<content type='text'>
Pull block fixes from Jens Axboe:

 - Two NVMe device removal crash fixes, and a compat fixup for for an
   ioctl that was introduced in this release (Anton, Charles, Max - via
   Keith)

 - Missing error path mutex unlock for drbd (Dan)

 - cgroup writeback fixup on dead memcg (Tejun)

 - blkcg online stats print fix (Tejun)

* tag 'for-linus-2019-11-08' of git://git.kernel.dk/linux-block:
  cgroup,writeback: don't switch wbs immediately on dead wbs if the memcg is dead
  block: drbd: remove a stray unlock in __drbd_send_protocol()
  blkcg: make blkcg_print_stat() print stats only for online blkgs
  nvme: change nvme_passthru_cmd64 to explicitly mark rsvd
  nvme-multipath: fix crash in nvme_mpath_clear_ctrl_paths
  nvme-rdma: fix a segmentation fault during module unload
</content>
</entry>
<entry>
<title>nvme-multipath: fix crash in nvme_mpath_clear_ctrl_paths</title>
<updated>2019-11-05T15:30:37Z</updated>
<author>
<name>Anton Eidelman</name>
<email>anton@lightbitslabs.com</email>
</author>
<published>2019-11-02T00:27:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=763303a83a095a88c3a8a0d1abf97165db2e8bf5'/>
<id>urn:sha1:763303a83a095a88c3a8a0d1abf97165db2e8bf5</id>
<content type='text'>
nvme_mpath_clear_ctrl_paths() iterates through
the ctrl-&gt;namespaces list while holding ctrl-&gt;scan_lock.
This does not seem to be the correct way of protecting
from concurrent list modification.

Specifically, nvme_scan_work() sorts ctrl-&gt;namespaces
AFTER unlocking scan_lock.

This may result in the following (rare) crash in ctrl disconnect
during scan_work:

    BUG: kernel NULL pointer dereference, address: 0000000000000050
    Oops: 0000 [#1] SMP PTI
    CPU: 0 PID: 3995 Comm: nvme 5.3.5-050305-generic
    RIP: 0010:nvme_mpath_clear_current_path+0xe/0x90 [nvme_core]
    ...
    Call Trace:
     nvme_mpath_clear_ctrl_paths+0x3c/0x70 [nvme_core]
     nvme_remove_namespaces+0x35/0xe0 [nvme_core]
     nvme_do_delete_ctrl+0x47/0x90 [nvme_core]
     nvme_sysfs_delete+0x49/0x60 [nvme_core]
     dev_attr_store+0x17/0x30
     sysfs_kf_write+0x3e/0x50
     kernfs_fop_write+0x11e/0x1a0
     __vfs_write+0x1b/0x40
     vfs_write+0xb9/0x1a0
     ksys_write+0x67/0xe0
     __x64_sys_write+0x1a/0x20
     do_syscall_64+0x5a/0x130
     entry_SYSCALL_64_after_hwframe+0x44/0xa9
    RIP: 0033:0x7f8d02bfb154

Fix:
After taking scan_lock in nvme_mpath_clear_ctrl_paths()
down_read(&amp;ctrl-&gt;namespaces_rwsem) as well to make list traversal safe.
This will not cause deadlocks because taking scan_lock never happens
while holding the namespaces_rwsem.
Moreover, scan work downs namespaces_rwsem in the same order.

Alternative: sort ctrl-&gt;namespaces in nvme_scan_work()
while still holding the scan_lock.
This would leave nvme_mpath_clear_ctrl_paths() without correct protection
against ctrl-&gt;namespaces modification by anyone other than scan_work.

Reviewed-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Anton Eidelman &lt;anton@lightbitslabs.com&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
</content>
</entry>
<entry>
<title>nvme-rdma: fix a segmentation fault during module unload</title>
<updated>2019-11-05T15:29:23Z</updated>
<author>
<name>Max Gurtovoy</name>
<email>maxg@mellanox.com</email>
</author>
<published>2019-10-29T14:42:27Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9ad9e8d6ca29c1446d81c6518ae634a2141dfd22'/>
<id>urn:sha1:9ad9e8d6ca29c1446d81c6518ae634a2141dfd22</id>
<content type='text'>
In case there are controllers that are not associated with any RDMA
device (e.g. during unsuccessful reconnection) and the user will unload
the module, these controllers will not be freed and will access already
freed memory. The same logic appears in other fabric drivers as well.

Fixes: 87fd125344d6 ("nvme-rdma: remove redundant reference between ib_device and tagset")
Reviewed-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Signed-off-by: Max Gurtovoy &lt;maxg@mellanox.com&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net</title>
<updated>2019-11-02T00:48:11Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2019-11-02T00:48:11Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1204c70d9dcba31164f78ad5d8c88c42335d51f8'/>
<id>urn:sha1:1204c70d9dcba31164f78ad5d8c88c42335d51f8</id>
<content type='text'>
Pull networking fixes from David Miller:

 1) Fix free/alloc races in batmanadv, from Sven Eckelmann.

 2) Several leaks and other fixes in kTLS support of mlx5 driver, from
    Tariq Toukan.

 3) BPF devmap_hash cost calculation can overflow on 32-bit, from Toke
    Høiland-Jørgensen.

 4) Add an r8152 device ID, from Kazutoshi Noguchi.

 5) Missing include in ipv6's addrconf.c, from Ben Dooks.

 6) Use siphash in flow dissector, from Eric Dumazet. Attackers can
    easily infer the 32-bit secret otherwise etc.

 7) Several netdevice nesting depth fixes from Taehee Yoo.

 8) Fix several KCSAN reported errors, from Eric Dumazet. For example,
    when doing lockless skb_queue_empty() checks, and accessing
    sk_napi_id/sk_incoming_cpu lockless as well.

 9) Fix jumbo packet handling in RXRPC, from David Howells.

10) Bump SOMAXCONN and tcp_max_syn_backlog values, from Eric Dumazet.

11) Fix DMA synchronization in gve driver, from Yangchun Fu.

12) Several bpf offload fixes, from Jakub Kicinski.

13) Fix sk_page_frag() recursion during memory reclaim, from Tejun Heo.

14) Fix ping latency during high traffic rates in hisilicon driver, from
    Jiangfent Xiao.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (146 commits)
  net: fix installing orphaned programs
  net: cls_bpf: fix NULL deref on offload filter removal
  selftests: bpf: Skip write only files in debugfs
  selftests: net: reuseport_dualstack: fix uninitalized parameter
  r8169: fix wrong PHY ID issue with RTL8168dp
  net: dsa: bcm_sf2: Fix IMP setup for port different than 8
  net: phylink: Fix phylink_dbg() macro
  gve: Fixes DMA synchronization.
  inet: stop leaking jiffies on the wire
  ixgbe: Remove duplicate clear_bit() call
  Documentation: networking: device drivers: Remove stray asterisks
  e1000: fix memory leaks
  i40e: Fix receive buffer starvation for AF_XDP
  igb: Fix constant media auto sense switching when no cable is connected
  net: ethernet: arc: add the missed clk_disable_unprepare
  igb: Enable media autosense for the i350.
  igb/igc: Don't warn on fatal read failures when the device is removed
  tcp: increase tcp_max_syn_backlog max value
  net: increase SOMAXCONN to 4096
  netdevsim: Fix use-after-free during device dismantle
  ...
</content>
</entry>
<entry>
<title>nvme-multipath: remove unused groups_only mode in ana log</title>
<updated>2019-10-29T14:55:00Z</updated>
<author>
<name>Anton Eidelman</name>
<email>anton@lightbitslabs.com</email>
</author>
<published>2019-10-18T18:32:51Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=86cccfbf773fafb88898c5627aa3727b02bc4708'/>
<id>urn:sha1:86cccfbf773fafb88898c5627aa3727b02bc4708</id>
<content type='text'>
groups_only mode in nvme_read_ana_log() is no longer used: remove it.

Reviewed-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Signed-off-by: Anton Eidelman &lt;anton@lightbitslabs.com&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>nvme-multipath: fix possible io hang after ctrl reconnect</title>
<updated>2019-10-29T14:55:00Z</updated>
<author>
<name>Anton Eidelman</name>
<email>anton@lightbitslabs.com</email>
</author>
<published>2019-10-18T18:32:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=af8fd0424713a2adb812d10d55e86718152cf656'/>
<id>urn:sha1:af8fd0424713a2adb812d10d55e86718152cf656</id>
<content type='text'>
The following scenario results in an IO hang:
1) ctrl completes a request with NVME_SC_ANA_TRANSITION.
   NVME_NS_ANA_PENDING bit in ns-&gt;flags is set and ana_work is triggered.
2) ana_work: nvme_read_ana_log() tries to get the ANA log page from the ctrl.
   This fails because ctrl disconnects.
   Therefore nvme_update_ns_ana_state() is not called
   and NVME_NS_ANA_PENDING bit in ns-&gt;flags is not cleared.
3) ctrl reconnects: nvme_mpath_init(ctrl,...) calls
   nvme_read_ana_log(ctrl, groups_only=true).
   However, nvme_update_ana_state() does not update namespaces
   because nr_nsids = 0 (due to groups_only mode).
4) scan_work calls nvme_validate_ns() finds the ns and re-validates OK.

Result:
The ctrl is now live but NVME_NS_ANA_PENDING bit in ns-&gt;flags is still set.
Consequently ctrl will never be considered a viable path by __nvme_find_path().
IO will hang if ctrl is the only or the last path to the namespace.

More generally, while ctrl is reconnecting, its ANA state may change.
And because nvme_mpath_init() requests ANA log in groups_only mode,
these changes are not propagated to the existing ctrl namespaces.
This may result in a mal-function or an IO hang.

Solution:
nvme_mpath_init() will nvme_read_ana_log() with groups_only set to false.
This will not harm the new ctrl case (no namespaces present),
and will make sure the ANA state of namespaces gets updated after reconnect.

Note: Another option would be for nvme_mpath_init() to invoke
nvme_parse_ana_log(..., nvme_set_ns_ana_state) for each existing namespace.

Reviewed-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Signed-off-by: Anton Eidelman &lt;anton@lightbitslabs.com&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>net: use skb_queue_empty_lockless() in busy poll contexts</title>
<updated>2019-10-28T20:33:41Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2019-10-24T05:44:51Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3f926af3f4d688e2e11e7f8ed04e277a14d4d4a4'/>
<id>urn:sha1:3f926af3f4d688e2e11e7f8ed04e277a14d4d4a4</id>
<content type='text'>
Busy polling usually runs without locks.
Let's use skb_queue_empty_lockless() instead of skb_queue_empty()

Also uses READ_ONCE() in __skb_try_recv_datagram() to address
a similar potential problem.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>nvme-pci: Set the prp2 correctly when using more than 4k page</title>
<updated>2019-10-18T14:09:41Z</updated>
<author>
<name>Kevin Hao</name>
<email>haokexin@gmail.com</email>
</author>
<published>2019-10-18T02:53:14Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a4f40484e7f1dff56bb9f286cc59ffa36e0259eb'/>
<id>urn:sha1:a4f40484e7f1dff56bb9f286cc59ffa36e0259eb</id>
<content type='text'>
In the current code, the nvme is using a fixed 4k PRP entry size,
but if the kernel use a page size which is more than 4k, we should
consider the situation that the bv_offset may be larger than the
dev-&gt;ctrl.page_size. Otherwise we may miss setting the prp2 and then
cause the command can't be executed correctly.

Fixes: dff824b2aadb ("nvme-pci: optimize mapping of small single segment requests")
Cc: stable@vger.kernel.org
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Kevin Hao &lt;haokexin@gmail.com&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
</content>
</entry>
<entry>
<title>nvme-tcp: fix possible leakage during error flow</title>
<updated>2019-10-15T13:47:29Z</updated>
<author>
<name>Max Gurtovoy</name>
<email>maxg@mellanox.com</email>
</author>
<published>2019-10-13T16:57:38Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=28a4cac48c7e897a0b4e7d79a53a8e4fe40337ae'/>
<id>urn:sha1:28a4cac48c7e897a0b4e7d79a53a8e4fe40337ae</id>
<content type='text'>
During nvme_tcp_setup_cmd_pdu error flow, one must call nvme_cleanup_cmd
since it's symmetric to nvme_setup_cmd.

Signed-off-by: Max Gurtovoy &lt;maxg@mellanox.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
</content>
</entry>
<entry>
<title>nvmet-loop: fix possible leakage during error flow</title>
<updated>2019-10-15T13:47:28Z</updated>
<author>
<name>Max Gurtovoy</name>
<email>maxg@mellanox.com</email>
</author>
<published>2019-10-13T16:57:37Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5812d04c4c7455627d8722e04ab99a737cfe9713'/>
<id>urn:sha1:5812d04c4c7455627d8722e04ab99a737cfe9713</id>
<content type='text'>
During nvme_loop_queue_rq error flow, one must call nvme_cleanup_cmd since
it's symmetric to nvme_setup_cmd.

Signed-off-by: Max Gurtovoy &lt;maxg@mellanox.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
</content>
</entry>
</feed>
