<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/io_uring, branch v6.15</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.15</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.15'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2025-05-22T01:24:18Z</updated>
<entry>
<title>io_uring/net: only retry recv bundle for a full transfer</title>
<updated>2025-05-22T01:24:18Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2025-05-22T00:51:49Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3a08988123c868dbfdd054541b1090fb891fa49e'/>
<id>urn:sha1:3a08988123c868dbfdd054541b1090fb891fa49e</id>
<content type='text'>
If a shorter-than-assumed transfer was seen, a partial buffer will have
been filled. In that case it isn't sane to attempt to fill more into
the bundle before posting a completion, as that would cause a gap in
the received data.

Check if the iterator has hit zero, and only allow a bundle operation
to continue if that is the case.

Also ensure that, when putting finished buffers, only the current
transfer is accounted; otherwise too many buffers may be put for a
short transfer.
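
A minimal sketch of the retry decision (the helper name here is
hypothetical; struct io_async_msghdr and iov_iter_count() are the
existing kernel names):

  /* Only continue a recv bundle if the previous receive consumed the
   * whole iterator; a short transfer must post its completion now, or
   * the next receive would leave a gap in the data. */
  static bool recv_bundle_may_continue(struct io_async_msghdr *kmsg)
  {
          return iov_iter_count(&amp;kmsg-&gt;msg.msg_iter) == 0;
  }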

Link: https://github.com/axboe/liburing/issues/1409
Cc: stable@vger.kernel.org
Fixes: 7c71a0af81ba ("io_uring/net: improve recv bundles")
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring: fix overflow resched cqe reordering</title>
<updated>2025-05-21T13:01:54Z</updated>
<author>
<name>Pavel Begunkov</name>
<email>asml.silence@gmail.com</email>
</author>
<published>2025-05-17T12:27:37Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a7d755ed9ce9738af3db602eb29d32774a180bc7'/>
<id>urn:sha1:a7d755ed9ce9738af3db602eb29d32774a180bc7</id>
<content type='text'>
Leaving the CQ critical section in the middle of an overflow flush can
cause CQE reordering, since the cached CQ pointers are reset and any new
CQE emitters that might get called in between are not going to be forced
into io_cqe_cache_refill().

Fixes: eac2ca2d682f9 ("io_uring: check if we need to reschedule during overflow flush")
Signed-off-by: Pavel Begunkov &lt;asml.silence@gmail.com&gt;
Link: https://lore.kernel.org/r/90ba817f1a458f091f355f407de1c911d2b93bbf.1747483784.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring/cmd: axe duplicate io_uring_cmd_import_fixed_vec() declaration</title>
<updated>2025-05-20T20:36:41Z</updated>
<author>
<name>Caleb Sander Mateos</name>
<email>csander@purestorage.com</email>
</author>
<published>2025-05-20T19:33:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f1774d9d4e104639a9122bde3b1fe58a0c0dcde7'/>
<id>urn:sha1:f1774d9d4e104639a9122bde3b1fe58a0c0dcde7</id>
<content type='text'>
io_uring_cmd_import_fixed_vec() is declared in both
include/linux/io_uring/cmd.h and io_uring/uring_cmd.h. The declarations
are identical (if redundant) for CONFIG_IO_URING=y. But if
CONFIG_IO_URING=N, include/linux/io_uring/cmd.h declares the function as
static inline while io_uring/uring_cmd.h declares it as extern. This
causes linker errors if the declaration in io_uring/uring_cmd.h is used.

Remove the declaration in io_uring/uring_cmd.h to avoid linker errors
and prevent the declarations getting out of sync.
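
For illustration, the conflicting shape of the two headers was roughly
the following (argument lists elided, stub return value assumed):

  /* include/linux/io_uring/cmd.h */
  #if defined(CONFIG_IO_URING)
  int io_uring_cmd_import_fixed_vec(/* ... */);
  #else
  static inline int io_uring_cmd_import_fixed_vec(/* ... */)
  {
          return -EOPNOTSUPP;    /* stub return value illustrative */
  }
  #endif

  /* io_uring/uring_cmd.h carried a second, always-extern declaration;
   * that duplicate is what gets removed here. */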

Signed-off-by: Caleb Sander Mateos &lt;csander@purestorage.com&gt;
Fixes: ef4902752972 ("io_uring/cmd: introduce io_uring_cmd_import_fixed_vec")
Link: https://lore.kernel.org/r/20250520193337.1374509-1-csander@purestorage.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring/fdinfo: grab ctx-&gt;uring_lock around io_uring_show_fdinfo()</title>
<updated>2025-05-14T13:15:28Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2025-05-13T21:02:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d871198ee431d90f5308d53998c1ba1d5db5619a'/>
<id>urn:sha1:d871198ee431d90f5308d53998c1ba1d5db5619a</id>
<content type='text'>
Not everything in there requires locking, which is why the 'has_lock'
variable exists. But enough of it does that it's a bit unwieldy to
manage. Wrap the whole thing in a -&gt;uring_lock trylock, and just return
with no output if we fail to grab it. The existing trylock() already
provides greatly diminished utility/output for the failure case.

This fixes an issue with reading the SQE fields, if the ring is being
actively resized at the same time.
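
A simplified sketch of the resulting shape (the internal helper name is
illustrative; mutex_trylock() on ctx-&gt;uring_lock is the real mechanism):

  void io_uring_show_fdinfo(struct seq_file *m, struct file *file)
  {
          struct io_ring_ctx *ctx = file-&gt;private_data;

          /* Ring busy, e.g. being resized: emit nothing at all. */
          if (!mutex_trylock(&amp;ctx-&gt;uring_lock))
                  return;
          __io_uring_show_fdinfo(ctx, m);    /* illustrative helper */
          mutex_unlock(&amp;ctx-&gt;uring_lock);
  }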

Reported-by: Jann Horn &lt;jannh@google.com&gt;
Fixes: 79cfe9e59c2a ("io_uring/register: add IORING_REGISTER_RESIZE_RINGS")
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring/memmap: don't use page_address() on a highmem page</title>
<updated>2025-05-12T15:27:41Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2025-05-12T15:06:06Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f446c6311e86618a1f81eb576b56a6266307238f'/>
<id>urn:sha1:f446c6311e86618a1f81eb576b56a6266307238f</id>
<content type='text'>
For older/32-bit systems with highmem, don't assume that the pages in
a mapped region always have a kernel mapping. If io_region_init_ptr()
finds that the pages are coalescable, also check whether the first page
is a HighMem page. If it is, fall through to the usual vmap() mapping
rather than attempting to get the unmapped page address.
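
A rough sketch of the added check (variable names are illustrative;
PageHighMem(), page_address() and vmap() are the kernel helpers in play):

  if (PageHighMem(pages[0])) {
          /* No permanent kernel mapping; use the regular vmap() path. */
          ptr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
  } else {
          ptr = page_address(pages[0]);
  }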

Cc: stable@vger.kernel.org
Fixes: c4d0ac1c1567 ("io_uring/memmap: optimise single folio regions")
Link: https://lore.kernel.org/all/681fe2fb.050a0220.f2294.001a.GAE@google.com/
Reported-by: syzbot+5b8c4abafcb1d791ccfc@syzkaller.appspotmail.com
Link: https://lore.kernel.org/all/681fed0a.050a0220.f2294.001c.GAE@google.com/
Reported-by: syzbot+6456a99dfdc2e78c4feb@syzkaller.appspotmail.com
Tested-by: syzbot+6456a99dfdc2e78c4feb@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring/uring_cmd: fix hybrid polling initialization issue</title>
<updated>2025-05-12T13:17:02Z</updated>
<author>
<name>hexue</name>
<email>xue01.he@samsung.com</email>
</author>
<published>2025-05-12T05:20:25Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=63166b815dc163b2e46426cecf707dc5923d6d13'/>
<id>urn:sha1:63166b815dc163b2e46426cecf707dc5923d6d13</id>
<content type='text'>
Modify the check for whether the timer is initialized during IO
transfer when passthrough is used with hybrid polling, to ensure that
it's always set up correctly.

Cc: stable@vger.kernel.org
Fixes: 01ee194d1aba ("io_uring: add support for hybrid IOPOLL")
Signed-off-by: hexue &lt;xue01.he@samsung.com&gt;
Link: https://lore.kernel.org/r/20250512052025.293031-1-xue01.he@samsung.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring/sqpoll: Increase task_work submission batch size</title>
<updated>2025-05-09T13:56:53Z</updated>
<author>
<name>Gabriel Krisman Bertazi</name>
<email>krisman@suse.de</email>
</author>
<published>2025-05-08T18:12:03Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=92835cebab120f8a5f023a26a792a2ac3f816c4f'/>
<id>urn:sha1:92835cebab120f8a5f023a26a792a2ac3f816c4f</id>
<content type='text'>
Our QA team reported a 10%-23% throughput reduction on an io_uring
sqpoll testcase doing IO to a null_blk, which I traced back to a
reduction of the device submission queue depth utilization. It turns out
that, after commit af5d68f8892f ("io_uring/sqpoll: manage task_work
privately"), we capped the number of task_work entries that can be
completed from a single spin of sqpoll to only 8 entries, before the
sqpoll goes around to (potentially) sleep.  While this cap doesn't drive
the submission side directly, it impacts the completion behavior, which
affects the number of IOs queued by fio per sqpoll cycle on the
submission side, and io_uring ends up seeing fewer IOs per sqpoll cycle.
As a result, block layer plugging is less effective, and we see more
time spent inside the block layer in profiling charts, and increased
submission latency measured by fio.

There are other places that have increased overhead once sqpoll sleeps
more often, such as the sqpoll utilization calculation.  But, in this
microbenchmark, those were not representative enough in perf charts, and
their removal didn't yield measurable changes in throughput.  The major
overhead comes from the fact that we plug less, and less often, when
submitting to the block layer.

My benchmark is:

fio --ioengine=io_uring --direct=1 --iodepth=128 --runtime=300 --bs=4k \
    --invalidate=1 --time_based  --ramp_time=10 --group_reporting=1 \
    --filename=/dev/nullb0 --name=RandomReads-direct-nullb-sqpoll-4k-1 \
    --rw=randread --numjobs=1 --sqthread_poll

On one machine, tested on top of Linux 6.15-rc1, we have the following
baseline:
  READ: bw=4994MiB/s (5236MB/s), 4994MiB/s-4994MiB/s (5236MB/s-5236MB/s), io=439GiB (471GB), run=90001-90001msec

With this patch:
  READ: bw=5762MiB/s (6042MB/s), 5762MiB/s-5762MiB/s (6042MB/s-6042MB/s), io=506GiB (544GB), run=90001-90001msec

which is a 15% improvement in measured bandwidth.  The average
submission latency is noticeably lowered too.  As measured by
fio:

Baseline:
   lat (usec): min=20, max=241, avg=99.81, stdev=3.38
Patched:
   lat (usec): min=26, max=226, avg=86.48, stdev=4.82

If we look at blktrace, we can also see the plugging behavior is
improved. In the baseline, we end up limited to plugging 8 requests in
the block layer regardless of the device queue depth size, while after
patching we can drive more io, and we manage to utilize the full device
queue.

In the baseline, after a stabilization phase, an ordinary submission
looks like:
  254,0    1    49942     0.016028795  5977  U   N [iou-sqp-5976] 7

After patching, I see consistently more requests per unplug.
  254,0    1     4996     0.001432872  3145  U   N [iou-sqp-3144] 32

Ideally, the cap size would at least be deep enough to fill the device
queue, but we can't predict that behavior, or assume all IO goes to a
single device, and thus can't guess the ideal batch size.  We also don't
want to let the tw run unbounded, though I'm not sure it would really
be a problem.  Instead, let's just give it a more sensible value that
will allow for more efficient batching.  I've tested with different cap
values, and initially proposed to increase the cap to 1024.  Jens argued
it is too big of a bump, and I found that, with 32, I'm no longer able
to observe this bottleneck on any of my machines.
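
The change itself boils down to bumping the per-iteration task_work cap
in io_uring/sqpoll.c from 8 to 32 (the constant name shown here is
approximate):

  -#define IORING_TW_CAP_ENTRIES_VALUE  8
  +#define IORING_TW_CAP_ENTRIES_VALUE  32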

Fixes: af5d68f8892f ("io_uring/sqpoll: manage task_work privately")
Signed-off-by: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
Link: https://lore.kernel.org/r/20250508181203.3785544-1-krisman@suse.de
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring: ensure deferred completions are flushed for multishot</title>
<updated>2025-05-07T13:55:15Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2025-05-07T13:34:24Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=687b2bae0efff9b25e071737d6af5004e6e35af5'/>
<id>urn:sha1:687b2bae0efff9b25e071737d6af5004e6e35af5</id>
<content type='text'>
Multishot normally uses io_req_post_cqe() to post completions, but when
stopping it, it may finish up with a deferred completion. This is fine,
except if another multishot event triggers before the deferred completions
get flushed. If this occurs, then CQEs may get reordered in the CQ ring,
as new multishot completions get posted before the deferred ones are
flushed. This can cause confusion on the application side, if strict
ordering is required for the use case.

When posting a multishot completion via io_req_post_cqe(), flush any
pending deferred completions first.
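
Conceptually, the fix amounts to something like the following near the
top of io_req_post_cqe() (a sketch; exact placement may differ):

  /* Don't let a fresh multishot CQE jump ahead of completions that
   * are still sitting in the deferred list. */
  if (!wq_list_empty(&amp;ctx-&gt;submit_state.compl_reqs))
          __io_submit_flush_completions(ctx);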

Cc: stable@vger.kernel.org # 6.1+
Reported-by: Norman Maurer &lt;norman_maurer@apple.com&gt;
Reported-by: Christian Mazakas &lt;christian.mazakas@gmail.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring: always arm linked timeouts prior to issue</title>
<updated>2025-05-04T15:15:58Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2025-05-04T14:06:28Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b53e523261bf058ea4a518b482222e7a277b186b'/>
<id>urn:sha1:b53e523261bf058ea4a518b482222e7a277b186b</id>
<content type='text'>
There are a few spots where linked timeouts are armed, and not all of
them adhere to the pre-arm, attempt-issue, post-arm pattern. This can
be problematic if the linked request returns that it will trigger a
callback later, and does so before the linked timeout is fully armed.

Consolidate all the linked timeout handling into __io_issue_sqe(),
rather than have it spread throughout the various issue entry points.
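
Schematically, the ordering now centralized in __io_issue_sqe() looks
like this (helper names approximate; not the literal diff):

  struct io_kiocb *link = io_prep_linked_timeout(req);  /* pre-arm */

  ret = def-&gt;issue(req, issue_flags);                   /* attempt issue */

  /* Post-arm: only now start the timer, so a callback triggered by
   * the issue step can't race with a half-armed timeout. */
  if (link)
          io_queue_linked_timeout(link);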

Cc: stable@vger.kernel.org
Link: https://github.com/axboe/liburing/issues/1390
Reported-by: Chase Hiltz &lt;chase@path.net&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring/fdinfo: annotate racy sq/cq head/tail reads</title>
<updated>2025-04-30T13:17:17Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2025-04-30T13:17:17Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f024d3a8ded0d8d2129ae123d7a5305c29ca44ce'/>
<id>urn:sha1:f024d3a8ded0d8d2129ae123d7a5305c29ca44ce</id>
<content type='text'>
syzbot complains about the cached SQ head read, and it's totally right.
But we don't need to care: it's just reading fdinfo, and reading the CQ
or SQ tail/head entries is known to be racy, in that they are just a
view into that very instant and may of course be outdated by the time
they are reported.

Annotate both the SQ head and CQ tail read with data_race() to avoid
this syzbot complaint.
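
For reference, the annotation is just a matter of wrapping the two racy
loads with data_race(), e.g. (a sketch; the exact expressions read in
fdinfo.c may differ):

  sq_head = data_race(ctx-&gt;cached_sq_head);
  cq_tail = data_race(ctx-&gt;cached_cq_tail);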

Link: https://lore.kernel.org/io-uring/6811f6dc.050a0220.39e3a1.0d0e.GAE@google.com/
Reported-by: syzbot+3e77fd302e99f5af9394@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
</feed>
