<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/block, branch v4.3</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v4.3</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v4.3'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2015-10-28T00:12:58Z</updated>
<entry>
<title>block: re-add discard_granularity and alignment checks</title>
<updated>2015-10-28T00:12:58Z</updated>
<author>
<name>Ming Lin</name>
<email>ming.l@ssi.samsung.com</email>
</author>
<published>2015-10-22T16:59:42Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a22c4d7e34402ccdf3414f64c50365436eba7b93'/>
<id>urn:sha1:a22c4d7e34402ccdf3414f64c50365436eba7b93</id>
<content type='text'>
In commit b49a087 ("block: remove split code in
blkdev_issue_{discard,write_same}"), discard_granularity and alignment
checks were removed. Ideally, with bio late splitting, the upper layers
shouldn't need to depend on device's limits.

Christoph reported a discard regression on the HGST Ultrastar SN100 NVMe
device when running mkfs.xfs. We have not found the root cause yet.

This patch re-adds discard_granularity and alignment checks by reverting
the related changes in commit b49a087. The good thing is now we can
remove the 2G discard size cap and just use UINT_MAX to avoid bi_size
overflow.
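
For illustration only, the re-added checks can be sketched in Python
(this is not the kernel's C code). The function name split_discard, its
parameters, and the sample values are assumptions made for this example;
offsets and lengths are in 512-byte sectors.

```python
UINT_MAX = 2**32 - 1
SECTOR_SIZE = 512


def split_discard(sector, nr_sects, granularity_bytes, alignment_bytes,
                  max_discard_sectors):
    """Split a discard range into chunks that end on granularity boundaries."""
    granularity = max(granularity_bytes // SECTOR_SIZE, 1)
    alignment = (alignment_bytes // SECTOR_SIZE) % granularity

    # Cap each chunk so that its byte count still fits in 32 bits.
    max_sects = min(max_discard_sectors, UINT_MAX // SECTOR_SIZE)
    max_sects -= max_sects % granularity
    if max_sects == 0:
        raise ValueError("device cannot discard at this granularity")

    chunks = []
    while nr_sects != 0:
        req_sects = min(nr_sects, max_sects)
        end_sect = sector + req_sects
        # Not the last chunk and not aligned: round the end down.
        if req_sects != nr_sects and end_sect % granularity != alignment:
            end_sect = ((end_sect - alignment) // granularity * granularity
                        + alignment)
            req_sects = end_sect - sector
        chunks.append((sector, req_sects))
        sector += req_sects
        nr_sects -= req_sects
    return chunks
```

With a 4096-byte granularity, zero alignment offset and a 16-sector cap,
split_discard(3, 40, 4096, 0, 16) returns [(3, 13), (16, 16), (32, 11)]:
the first chunk is shortened so that later chunks start on a boundary.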

Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Tested-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Ming Lin &lt;ming.l@ssi.samsung.com&gt;
Reviewed-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
<entry>
<title>block: don't release bdi while request_queue has live references</title>
<updated>2015-10-15T15:53:28Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2015-09-08T16:20:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b02176f30cd30acccd3b633ab7d9aed8b5da52ff'/>
<id>urn:sha1:b02176f30cd30acccd3b633ab7d9aed8b5da52ff</id>
<content type='text'>
bdi's are initialized in two steps, bdi_init() and bdi_register(), but
destroyed in a single step by bdi_destroy() which, for a bdi embedded
in a request_queue, is called during blk_cleanup_queue() which makes
the queue invisible and starts the draining of remaining usages.

A request_queue's user can access the congestion state of the embedded
bdi as long as it holds a reference to the queue.  As such, it may
access the congested state of a queue which finished
blk_cleanup_queue() but hasn't reached blk_release_queue() yet.
Because the congested state was embedded in backing_dev_info which in
turn is embedded in request_queue, accessing the congested state after
bdi_destroy() was called was fine.  The bdi was destroyed but the
memory region for the congested state remained accessible till the
queue got released.

a13f35e87140 ("writeback: don't embed root bdi_writeback_congested in
bdi_writeback") changed the situation.  Now, the root congested state
which is expected to be pinned while request_queue remains accessible
is separately reference counted and the base ref is put during
bdi_destroy().  This means that the root congested state may go away
prematurely while the queue is between bdi_destroy() and
blk_release_queue(), which was detected by Andrey's KASAN tests.

The root cause of this problem is that bdi doesn't distinguish the two
steps of destruction, unregistration and release, and now the root
congested state actually requires a separate release step.  To fix the
issue, this patch separates out bdi_unregister() and bdi_exit() from
bdi_destroy().  bdi_unregister() is called from blk_cleanup_queue()
and bdi_exit() from blk_release_queue().  bdi_destroy() is now just a
simple wrapper calling the two steps back-to-back.
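
As a rough model of the split described above (a Python sketch, not the
kernel implementation), the two teardown steps can be pictured like
this. The class names and the "not congested" return value are
assumptions for illustration; only the unregister/exit/destroy split
comes from this patch.

```python
class Congested:
    """Separately reference-counted congestion state (hypothetical model)."""

    def __init__(self):
        self.refs = 1          # base reference held by the bdi
        self.freed = False

    def put(self):
        self.refs -= 1
        if self.refs == 0:
            self.freed = True

    def read(self):
        if self.freed:
            raise RuntimeError("use-after-free of congested state")
        return "not congested"


class Bdi:
    def __init__(self):
        self.congested = Congested()
        self.registered = True

    def unregister(self):      # step 1: make the bdi invisible
        self.registered = False

    def exit(self):            # step 2: drop the base reference
        self.congested.put()

    def destroy(self):         # wrapper: both steps back-to-back
        self.unregister()
        self.exit()


class RequestQueue:
    def __init__(self):
        self.bdi = Bdi()

    def cleanup(self):         # models blk_cleanup_queue()
        self.bdi.unregister()

    def release(self):         # models blk_release_queue()
        self.bdi.exit()


q = RequestQueue()
q.cleanup()
state_between = q.bdi.congested.read()   # still valid: only unregistered
q.release()
freed_after_release = q.bdi.congested.freed
```

In this model the congested state stays readable between cleanup() and
release(), which is the window the KASAN report hit.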

While at it, the prototype of bdi_destroy() is moved right below
bdi_setup_and_register() so that the counterpart operations are
located together.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Fixes: a13f35e87140 ("writeback: don't embed root bdi_writeback_congested in bdi_writeback")
Cc: stable@vger.kernel.org # v4.2+
Reported-and-tested-by: Andrey Konovalov &lt;andreyknvl@google.com&gt;
Link: http://lkml.kernel.org/g/CAAeHK+zUJ74Zn17=rOyxacHU18SgCfC6bsYW=6kCY5GXJBwGfQ@mail.gmail.com
Reviewed-by: Jan Kara &lt;jack@suse.com&gt;
Reviewed-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
<entry>
<title>blk-mq: fix use-after-free in blk_mq_free_tag_set()</title>
<updated>2015-10-15T14:45:58Z</updated>
<author>
<name>Junichi Nomura</name>
<email>j-nomura@ce.jp.nec.com</email>
</author>
<published>2015-10-14T05:02:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f42d79ab67322e51b92dd7aa965e310c71352a64'/>
<id>urn:sha1:f42d79ab67322e51b92dd7aa965e310c71352a64</id>
<content type='text'>
tags is freed in blk_mq_free_rq_map() and should not be used after that.
The problem doesn't manifest if CONFIG_CPUMASK_OFFSTACK is false because
free_cpumask_var() is then a no-op.

tags-&gt;cpumask is allocated in blk_mq_init_tags(), so it's natural to
free the cpumask in its counterpart, blk_mq_free_tags().

Fixes: f26cdc8536ad ("blk-mq: Shared tag enhancements")
Signed-off-by: Jun'ichi Nomura &lt;j-nomura@ce.jp.nec.com&gt;
Cc: Keith Busch &lt;keith.busch@intel.com&gt;
Reviewed-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
<entry>
<title>blk-mq: factor out a helper to iterate all tags for a request_queue</title>
<updated>2015-10-01T08:10:57Z</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2015-09-27T19:01:51Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0bf6cd5b9531bcc29c0a5e504b6ce2984c6fd8d8'/>
<id>urn:sha1:0bf6cd5b9531bcc29c0a5e504b6ce2984c6fd8d8</id>
<content type='text'>
And replace blk_mq_tag_busy_iter with it - the driver use was replaced
with a new helper a while ago, and internal to the block layer we
only need the new version.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
<entry>
<title>blk-mq: fix racy updates of rq-&gt;errors</title>
<updated>2015-10-01T08:10:55Z</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2015-09-27T19:01:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f4829a9b7a61e159367350008a608b062c4f6840'/>
<id>urn:sha1:f4829a9b7a61e159367350008a608b062c4f6840</id>
<content type='text'>
blk_mq_complete_request may be a no-op if the request has already
been completed by other means (e.g. a timeout or cancellation), but
currently drivers have to set rq-&gt;errors before calling
blk_mq_complete_request, which might leave us with the wrong error value.

Add an error parameter to blk_mq_complete_request so that we can
defer setting rq-&gt;errors until we know we won the race to complete the
request.
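
The idea can be sketched in Python as follows (an illustration only,
not the kernel code). The Request class, mark_complete() and
complete_request() are hypothetical names, and the lock stands in for
whatever atomic test-and-set the real code uses.

```python
import threading


class Request:
    def __init__(self):
        self._lock = threading.Lock()
        self.completed = False
        self.errors = 0

    def mark_complete(self):
        """Atomically claim completion; True only for the first caller."""
        with self._lock:
            if self.completed:
                return False
            self.completed = True
            return True


def complete_request(rq, error):
    # The error is stored only by the caller that won the race.
    if rq.mark_complete():
        rq.errors = error
        return True
    return False


rq = Request()
timeout_won = complete_request(rq, -5)   # e.g. a timeout completes first
driver_won = complete_request(rq, 0)     # the later completion is a no-op
```

Here the second call does not overwrite rq.errors, so the value set by
the first completer survives.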

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Sagi Grimberg &lt;sagig@mellanox.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
<entry>
<title>blk-mq: fix deadlock when reading cpu_list</title>
<updated>2015-09-29T17:32:51Z</updated>
<author>
<name>Akinobu Mita</name>
<email>akinobu.mita@gmail.com</email>
</author>
<published>2015-09-26T17:09:25Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=60de074ba1e8f327db19bc33d8530131ac01695d'/>
<id>urn:sha1:60de074ba1e8f327db19bc33d8530131ac01695d</id>
<content type='text'>
CPU hotplug handling for blk-mq (blk_mq_queue_reinit) acquires
all_q_mutex in blk_mq_queue_reinit_notify() and then removes sysfs
entries by blk_mq_sysfs_unregister().  Removing a sysfs entry blocks
until the active reference count of the kernfs_node drops to zero.

On the other hand, reading the blk_mq_hw_sysfs_cpu sysfs entry (e.g.
/sys/block/nullb0/mq/0/cpu_list) acquires all_q_mutex in
blk_mq_hw_sysfs_cpus_show().

If these happen at the same time, a deadlock can occur: one side
waits for the active reference to drop to zero while holding
all_q_mutex, and the other tries to acquire all_q_mutex while holding
the active reference.

The reason that all_q_mutex is acquired in blk_mq_hw_sysfs_cpus_show()
is to avoid reading an incomplete hctx-&gt;cpumask.  Since reading a sysfs
entry for blk-mq needs to acquire q-&gt;sysfs_lock, we can avoid both the
deadlock and reading an incomplete hctx-&gt;cpumask by holding
q-&gt;sysfs_lock while hctx-&gt;cpumask is being updated.

Signed-off-by: Akinobu Mita &lt;akinobu.mita@gmail.com&gt;
Reviewed-by: Ming Lei &lt;tom.leiming@gmail.com&gt;
Cc: Ming Lei &lt;tom.leiming@gmail.com&gt;
Cc: Wanpeng Li &lt;wanpeng.li@hotmail.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
<entry>
<title>blk-mq: avoid inserting requests before establishing new mapping</title>
<updated>2015-09-29T17:32:50Z</updated>
<author>
<name>Akinobu Mita</name>
<email>akinobu.mita@gmail.com</email>
</author>
<published>2015-09-26T17:09:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5778322e67ed34dc9f391a4a5cbcbb856071ceba'/>
<id>urn:sha1:5778322e67ed34dc9f391a4a5cbcbb856071ceba</id>
<content type='text'>
Notifier callbacks for the CPU_ONLINE action can run on a CPU other
than the one which was just onlined.  So it is possible for a
process running on the just-onlined CPU to insert a request and run
the hw queue before the new mapping is established, which is done by
blk_mq_queue_reinit_notify().

This can cause a problem when the CPU has just been onlined for the
first time since the request queue was initialized.  At this time
ctx-&gt;index_hw for the CPU, which is the index in hctx-&gt;ctxs[] for this
ctx, is still zero before blk_mq_queue_reinit_notify() is called by
notifier callbacks for the CPU_ONLINE action.

For example, there is a single hw queue (hctx) and two CPU queues
(ctx0 for CPU0, and ctx1 for CPU1).  Now CPU1 is just onlined and
a request is inserted into ctx1-&gt;rq_list, which sets bit0 in the
pending bitmap as ctx1-&gt;index_hw is still zero.

And then while running the hw queue, flush_busy_ctxs() finds bit0 is
set in the pending bitmap and tries to retrieve requests in
hctx-&gt;ctxs[0]-&gt;rq_list.  But hctx-&gt;ctxs[0] is a pointer to ctx0, so the
request in ctx1-&gt;rq_list is ignored.

Fix it by ensuring that the new mapping is established before the
onlined CPU starts running.
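
The example above can be replayed with a small Python model (a sketch
under simplifying assumptions, not the kernel data structures). The
classes, the set used as a pending bitmap, and the remap() helper are
all hypothetical stand-ins.

```python
class Ctx:
    def __init__(self, name):
        self.name = name
        self.index_hw = 0       # stays zero until the mapping is rebuilt
        self.rq_list = []


class Hctx:
    def __init__(self, ctxs):
        self.ctxs = list(ctxs)  # software queues currently mapped
        self.pending = set()    # bit numbers set in the pending bitmap


def insert_request(hctx, ctx, rq):
    ctx.rq_list.append(rq)
    hctx.pending.add(ctx.index_hw)


def flush_busy_ctxs(hctx):
    found = []
    for bit in sorted(hctx.pending):
        ctx = hctx.ctxs[bit]
        found.extend(ctx.rq_list)
        ctx.rq_list.clear()
    hctx.pending.clear()
    return found


def remap(hctx, ctxs):
    """Establish the new mapping and assign each ctx its index."""
    hctx.ctxs = list(ctxs)
    for i, ctx in enumerate(hctx.ctxs):
        ctx.index_hw = i


ctx0, ctx1 = Ctx("ctx0"), Ctx("ctx1")
hctx = Hctx([ctx0])                 # CPU1 was offline at init time

insert_request(hctx, ctx1, "rq-A")  # stale index_hw of 0 sets bit0
lost = flush_busy_ctxs(hctx)        # looks in ctx0 and finds nothing
stranded = list(ctx1.rq_list)       # rq-A is still sitting in ctx1

remap(hctx, [ctx0, ctx1])           # mapping established first
insert_request(hctx, ctx1, "rq-B")
seen = flush_busy_ctxs(hctx)
```

Before remap() the flush returns nothing and rq-A stays stranded in
ctx1; once the mapping exists, the flush reaches ctx1 as expected.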

Signed-off-by: Akinobu Mita &lt;akinobu.mita@gmail.com&gt;
Reviewed-by: Ming Lei &lt;tom.leiming@gmail.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Ming Lei &lt;tom.leiming@gmail.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
<entry>
<title>blk-mq: fix q-&gt;mq_usage_counter access race</title>
<updated>2015-09-29T17:32:48Z</updated>
<author>
<name>Akinobu Mita</name>
<email>akinobu.mita@gmail.com</email>
</author>
<published>2015-09-26T17:09:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0e6263682014d480b8d7b8c10287f4536066b54f'/>
<id>urn:sha1:0e6263682014d480b8d7b8c10287f4536066b54f</id>
<content type='text'>
CPU hotplug handling for blk-mq (blk_mq_queue_reinit) accesses
q-&gt;mq_usage_counter while freezing all request queues in all_q_list.
On the other hand, q-&gt;mq_usage_counter is deinitialized in
blk_mq_free_queue() before deleting the queue from all_q_list.

So if a CPU hotplug event occurs in that window, percpu_ref_kill() is
called with a q-&gt;mq_usage_counter which has already been marked dead,
and it triggers a warning.  Fix it by deleting the queue from
all_q_list before destroying q-&gt;mq_usage_counter.

Signed-off-by: Akinobu Mita &lt;akinobu.mita@gmail.com&gt;
Reviewed-by: Ming Lei &lt;tom.leiming@gmail.com&gt;
Cc: Ming Lei &lt;tom.leiming@gmail.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
<entry>
<title>blk-mq: Fix use after free of q-&gt;mq_map</title>
<updated>2015-09-29T17:32:46Z</updated>
<author>
<name>Akinobu Mita</name>
<email>akinobu.mita@gmail.com</email>
</author>
<published>2015-09-26T17:09:21Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a723bab3d7529133f71fc8a5e96f86e3639a0d13'/>
<id>urn:sha1:a723bab3d7529133f71fc8a5e96f86e3639a0d13</id>
<content type='text'>
CPU hotplug handling for blk-mq (blk_mq_queue_reinit) updates
q-&gt;mq_map by blk_mq_update_queue_map() for all request queues in
all_q_list.  On the other hand, q-&gt;mq_map is released before deleting
the queue from all_q_list.

So if a CPU hotplug event occurs in that window, an invalid memory
access can happen.  Fix it by releasing q-&gt;mq_map in blk_mq_release()
so that it happens later than removal from all_q_list.

Signed-off-by: Akinobu Mita &lt;akinobu.mita@gmail.com&gt;
Suggested-by: Ming Lei &lt;tom.leiming@gmail.com&gt;
Reviewed-by: Ming Lei &lt;tom.leiming@gmail.com&gt;
Cc: Ming Lei &lt;tom.leiming@gmail.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
<entry>
<title>blk-mq: fix sysfs registration/unregistration race</title>
<updated>2015-09-29T17:32:45Z</updated>
<author>
<name>Akinobu Mita</name>
<email>akinobu.mita@gmail.com</email>
</author>
<published>2015-09-26T17:09:20Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4593fdbe7a2f44d5e64c627c715dd0bcec9bdf14'/>
<id>urn:sha1:4593fdbe7a2f44d5e64c627c715dd0bcec9bdf14</id>
<content type='text'>
There is a race between cpu hotplug handling and adding/deleting
gendisk for blk-mq, where both are trying to register and unregister
the same sysfs entries.

null_add_dev
    --&gt; blk_mq_init_queue
        --&gt; blk_mq_init_allocated_queue
            --&gt; add to 'all_q_list' (*)
    --&gt; add_disk
        --&gt; blk_register_queue
            --&gt; blk_mq_register_disk (++)

null_del_dev
    --&gt; del_gendisk
        --&gt; blk_unregister_queue
            --&gt; blk_mq_unregister_disk (--)
    --&gt; blk_cleanup_queue
        --&gt; blk_mq_free_queue
            --&gt; del from 'all_q_list' (*)

blk_mq_queue_reinit
    --&gt; blk_mq_sysfs_unregister (-)
    --&gt; blk_mq_sysfs_register (+)

While the request queue is on 'all_q_list' (*),
blk_mq_queue_reinit() can be called for the queue at any time by the
CPU hotplug callback.  But blk_mq_sysfs_unregister (-) and
blk_mq_sysfs_register (+) in blk_mq_queue_reinit must not be called
before blk_mq_register_disk (++) or after blk_mq_unregister_disk (--)
has finished, because '/sys/block/*/mq/' does not exist then.

There is already a BLK_MQ_F_SYSFS_UP flag in hctx-&gt;flags which can
be used to track this sysfs state, but it only fixes this issue
partially.

In order to fix it completely, we just need a per-queue flag instead of
a per-hctx flag, with appropriate locking.  So this introduces
q-&gt;mq_sysfs_init_done, which is properly protected by all_q_mutex.
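
A minimal Python sketch of the flag-plus-mutex idea follows (for
illustration; it is not the kernel code, and where the real code checks
the flag may differ). The Queue class, the event list and the function
names are hypothetical.

```python
import threading

all_q_mutex = threading.Lock()


class Queue:
    def __init__(self):
        self.mq_sysfs_init_done = False
        self.sysfs_events = []


def register_disk(q):
    with all_q_mutex:
        q.sysfs_events.append("register")
        q.mq_sysfs_init_done = True


def unregister_disk(q):
    with all_q_mutex:
        q.sysfs_events.append("unregister")
        q.mq_sysfs_init_done = False


def queue_reinit(q):
    """Hotplug path: touch sysfs only while the entries exist."""
    with all_q_mutex:
        if not q.mq_sysfs_init_done:
            return False
        q.sysfs_events.append("reinit-unregister")
        q.sysfs_events.append("reinit-register")
        return True


q = Queue()
early = queue_reinit(q)      # before the disk is registered: skipped
register_disk(q)
during = queue_reinit(q)     # entries exist: re-registered
unregister_disk(q)
late = queue_reinit(q)       # after unregistration: skipped again
```

Because the flag is read and written under the same mutex, the hotplug
path never sees a half-registered queue in this model.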

Also, we need to ensure that blk_mq_map_swqueue() is called with
all_q_mutex held.  Since hctx-&gt;nr_ctx is reset temporarily and
updated in blk_mq_map_swqueue(), we should avoid
blk_mq_register_hctx() seeing the temporary hctx-&gt;nr_ctx value
in CPU hotplug handling or adding/deleting gendisk.

Signed-off-by: Akinobu Mita &lt;akinobu.mita@gmail.com&gt;
Reviewed-by: Ming Lei &lt;tom.leiming@gmail.com&gt;
Cc: Ming Lei &lt;tom.leiming@gmail.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
</feed>
