<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/trace/blktrace.c, branch v4.14</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v4.14</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v4.14'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2017-09-25T14:56:05Z</updated>
<entry>
<title>blktrace: Fix potential deadlock between delete &amp; sysfs ops</title>
<updated>2017-09-25T14:56:05Z</updated>
<author>
<name>Waiman Long</name>
<email>longman@redhat.com</email>
</author>
<published>2017-09-20T19:12:20Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5acb3cc2c2e9d3020a4fee43763c6463767f1572'/>
<id>urn:sha1:5acb3cc2c2e9d3020a4fee43763c6463767f1572</id>
<content type='text'>
The lockdep code had reported the following unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(s_active#228);
                               lock(&amp;bdev-&gt;bd_mutex/1);
                               lock(s_active#228);
  lock(&amp;bdev-&gt;bd_mutex);

 *** DEADLOCK ***

The deadlock may happen when one task (CPU1) is trying to delete a
partition in a block device and another task (CPU0) is accessing
tracing sysfs file (e.g. /sys/block/dm-1/trace/act_mask) in that
partition.

The s_active isn't an actual lock. It is a reference count (kn-&gt;count)
on the sysfs (kernfs) file. Removal of a sysfs file, however, require
a wait until all the references are gone. The reference count is
treated like a rwsem using lockdep instrumentation code.

The fact that a thread is in the sysfs callback method or in the
ioctl call means there is a reference to the opended sysfs or device
file. That should prevent the underlying block structure from being
removed.

Instead of using bd_mutex in the block_device structure, a new
blk_trace_mutex is now added to the request_queue structure to protect
access to the blk_trace structure.

Suggested-by: Christoph Hellwig &lt;hch@infradead.org&gt;
Signed-off-by: Waiman Long &lt;longman@redhat.com&gt;
Acked-by: Steven Rostedt (VMware) &lt;rostedt@goodmis.org&gt;

Fix typo in patch subject line, and prune a comment detailing how
the code used to work.

Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: replace bi_bdev with a gendisk pointer and partitions index</title>
<updated>2017-08-23T18:49:55Z</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2017-08-23T17:10:32Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=74d46992e0d9dee7f1f376de0d56d31614c8a17a'/>
<id>urn:sha1:74d46992e0d9dee7f1f376de0d56d31614c8a17a</id>
<content type='text'>
This way we don't need a block_device structure to submit I/O.  The
block_device has different life time rules from the gendisk and
request_queue and is usually only available when the block device node
is open.  Other callers need to explicitly create one (e.g. the lightnvm
passthrough code, or the new nvme multipathing code).

For the actual I/O path all that we need is the gendisk, which exists
once per block device.  But given that the block layer also does
partition remapping we additionally need a partition index, which is
used for said remapping in generic_make_request.

Note that all the block drivers generally want request_queue or
sometimes the gendisk, so this removes a layer of indirection all
over the stack.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: use standard blktrace API to output cgroup info for debug notes</title>
<updated>2017-07-29T15:00:03Z</updated>
<author>
<name>Shaohua Li</name>
<email>shli@fb.com</email>
</author>
<published>2017-07-12T18:49:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=35fe6d763229e8fc0eb5f9b93a401673cfcb5e1e'/>
<id>urn:sha1:35fe6d763229e8fc0eb5f9b93a401673cfcb5e1e</id>
<content type='text'>
Currently cfq/bfq/blk-throttle output cgroup info in trace in their own
way. Now we have standard blktrace API for this, so convert them to use
it.

Note, this changes the behavior a little bit. cgroup info isn't output
by default, we only do this with 'blk_cgroup' option enabled. cgroup
info isn't output as a string by default too, we only do this with
'blk_cgname' option enabled. Also cgroup info is output in different
position of the note string. I think these behavior changes aren't a big
issue (actually we make trace data shorter which is good), since the
blktrace note is solely for debugging.

Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>blktrace: add an option to allow displaying cgroup path</title>
<updated>2017-07-29T15:00:03Z</updated>
<author>
<name>Shaohua Li</name>
<email>shli@fb.com</email>
</author>
<published>2017-07-12T18:49:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=69fd5c391763bd94a40dd152bc72a7f230137150'/>
<id>urn:sha1:69fd5c391763bd94a40dd152bc72a7f230137150</id>
<content type='text'>
By default we output cgroup id in blktrace. This adds an option to
display cgroup path. Since get cgroup path is a relativly heavy
operation, we don't enable it by default.

with the option enabled, blktrace will output something like this:
dd-1353  [007] d..2   293.015252:   8,0   /test/level  D   R 24 + 8 [dd]

Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>blktrace: export cgroup info in trace</title>
<updated>2017-07-29T15:00:03Z</updated>
<author>
<name>Shaohua Li</name>
<email>shli@fb.com</email>
</author>
<published>2017-07-12T18:49:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ca1136c99b66b1566781ff12ecddc635d570f932'/>
<id>urn:sha1:ca1136c99b66b1566781ff12ecddc635d570f932</id>
<content type='text'>
Currently blktrace isn't cgroup aware. blktrace prints out task name of
current context, but the task of current context isn't always in the
cgroup where the BIO comes from. We can't use task name to find out IO
cgroup. For example, Writeback BIOs always comes from flusher thread but
the BIOs are for different blk cgroups. Request could be requeued and
dispatched from completely different tasks. MD/DM are another examples.

This patch tries to fix the gap. We print out cgroup fhandle info in
blktrace. Userspace can use open_by_handle_at() syscall to find the
cgroup by fhandle. Or userspace can use name_to_handle_at() syscall to
find fhandle for a cgroup and use a BPF program to filter out blktrace
for a specific cgroup.

We add a new 'blk_cgroup' trace option for blk tracer. It's default off.
Application which doesn't know the new option isn't affected.  When it's
on, we output fhandle info right after blk_io_trace with an extra bit
set in event action. So from application point of view, blktrace with
the option will output new actions.

I didn't change blk trace event yet, since I'm not sure if changing the
trace event output is an ABI issue. If not, I'll do it later.

Acked-by: Steven Rostedt (VMware) &lt;rostedt@goodmis.org&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: switch bios to blk_status_t</title>
<updated>2017-06-09T15:27:32Z</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2017-06-03T07:38:06Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4e4cbee93d56137ebff722be022cae5f70ef84fb'/>
<id>urn:sha1:4e4cbee93d56137ebff722be022cae5f70ef84fb</id>
<content type='text'>
Replace bi_error with a new bi_status to allow for a clear conversion.
Note that device mapper overloaded bi_error with a private value, which
we'll have to keep arround at least for now and thus propagate to a
proper blk_status_t value.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
<entry>
<title>blktrace: fix integer parse</title>
<updated>2017-05-19T15:21:15Z</updated>
<author>
<name>Shaohua Li</name>
<email>shli@fb.com</email>
</author>
<published>2017-05-19T15:04:59Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5f3394530fbe90d3bcd1c204618960bc50236578'/>
<id>urn:sha1:5f3394530fbe90d3bcd1c204618960bc50236578</id>
<content type='text'>
sscanf is a very poor way to parse integer. For example, I input
"discard" for act_mask, it gets 0xd and completely messes up. Using
correct API to do integer parse.

This patch also makes attributes accept any base of integer.

Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
<entry>
<title>block: remove the errors field from struct request</title>
<updated>2017-04-20T18:16:10Z</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2017-04-20T14:03:16Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=caf7df12272118e0274c8353bcfeaf60c7743a47'/>
<id>urn:sha1:caf7df12272118e0274c8353bcfeaf60c7743a47</id>
<content type='text'>
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Bart Van Assche &lt;Bart.VanAssche@sandisk.com&gt;
Acked-by: Roger Pau Monné &lt;roger.pau@citrix.com&gt;
Reviewed-by: Konrad Rzeszutek Wilk &lt;konrad.wilk@oracle.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
<entry>
<title>blktrace: remove the unused block_rq_abort tracepoint</title>
<updated>2017-04-20T18:16:10Z</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2017-04-20T14:03:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=cee4b7ce3f9161c88f7255a3d73c1c4d5bbabea7'/>
<id>urn:sha1:cee4b7ce3f9161c88f7255a3d73c1c4d5bbabea7</id>
<content type='text'>
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Johannes Thumshirn &lt;jthumshirn@suse.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
<entry>
<title>blktrace: use existing disk debugfs directory</title>
<updated>2017-02-02T17:20:16Z</updated>
<author>
<name>Omar Sandoval</name>
<email>osandov@fb.com</email>
</author>
<published>2017-01-31T22:53:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=6ac93117ab009d3901ed5d3d0f79056eb5fc0afd'/>
<id>urn:sha1:6ac93117ab009d3901ed5d3d0f79056eb5fc0afd</id>
<content type='text'>
We may already have a directory to put the blktrace stuff in if

1. The disk uses blk-mq
2. CONFIG_BLK_DEBUG_FS is enabled
3. We are tracing the whole disk and not a partition

Instead of hardcoding this very specific case, let's use the new
debugfs_lookup(). If the directory exists, we use it, otherwise we
create one and clean it up later.

Fixes: 07e4fead45e6 ("blk-mq: create debugfs directory tree")
Signed-off-by: Omar Sandoval &lt;osandov@fb.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
</feed>
