<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/io_uring/kbuf.c, branch v6.8</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.8</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.8'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2023-12-21T16:47:06Z</updated>
<entry>
<title>io_uring/kbuf: add method for returning provided buffer ring head</title>
<updated>2023-12-21T16:47:06Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2023-12-21T16:02:57Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d293b1a89694fc4918d9a4330a71ba2458f9d581'/>
<id>urn:sha1:d293b1a89694fc4918d9a4330a71ba2458f9d581</id>
<content type='text'>
The tail of the provided ring buffer is shared between the kernel and
the application, but the head is private to the kernel as the
application doesn't need to see it. However, this also prevents the
application from knowing how many buffers the kernel has consumed.
Usually this is fine, as the information is inherently racy in that
the kernel could be consuming buffers continually, but for cleanup
purposes it may be relevant to know how many buffers are still left
in the ring.

Add IORING_REGISTER_PBUF_STATUS which will return status for a given
provided buffer ring. Right now it just returns the head, but space
is reserved for more information later in, if needed.

Link: https://github.com/axboe/liburing/discussions/1020
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring/kbuf: check for buffer list readiness after NULL check</title>
<updated>2023-12-05T14:02:13Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2023-12-05T14:02:13Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9865346b7e8374b57f1c3ccacdc77846c6352ff4'/>
<id>urn:sha1:9865346b7e8374b57f1c3ccacdc77846c6352ff4</id>
<content type='text'>
Move the buffer list 'is_ready' check below the validity check for
the buffer list for a given group.

Fixes: 5cf4f52e6d8a ("io_uring: free io_buffer_list entries via RCU")
Reported-by: Dan Carpenter &lt;dan.carpenter@linaro.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring/kbuf: Fix an NULL vs IS_ERR() bug in io_alloc_pbuf_ring()</title>
<updated>2023-12-05T13:59:56Z</updated>
<author>
<name>Dan Carpenter</name>
<email>dan.carpenter@linaro.org</email>
</author>
<published>2023-12-05T12:37:17Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e53f7b54b1fdecae897f25002ff0cff04faab228'/>
<id>urn:sha1:e53f7b54b1fdecae897f25002ff0cff04faab228</id>
<content type='text'>
The io_mem_alloc() function returns error pointers, not NULL.  Update
the check accordingly.

Fixes: b10b73c102a2 ("io_uring/kbuf: recycle freed mapped buffer ring entries")
Signed-off-by: Dan Carpenter &lt;dan.carpenter@linaro.org&gt;
Link: https://lore.kernel.org/r/5ed268d3-a997-4f64-bd71-47faa92101ab@moroto.mountain
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring: free io_buffer_list entries via RCU</title>
<updated>2023-11-28T18:45:02Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2023-11-28T00:54:40Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5cf4f52e6d8aa2d3b7728f568abbf9d42a3af252'/>
<id>urn:sha1:5cf4f52e6d8aa2d3b7728f568abbf9d42a3af252</id>
<content type='text'>
mmap_lock nests under uring_lock out of necessity, as we may be doing
user copies with uring_lock held. However, for mmap of provided buffer
rings, we attempt to grab uring_lock with mmap_lock already held from
do_mmap(). This makes lockdep, rightfully, complain:

WARNING: possible circular locking dependency detected
6.7.0-rc1-00009-gff3337ebaf94-dirty #4438 Not tainted
------------------------------------------------------
buf-ring.t/442 is trying to acquire lock:
ffff00020e1480a8 (&amp;ctx-&gt;uring_lock){+.+.}-{3:3}, at: io_uring_validate_mmap_request.isra.0+0x4c/0x140

but task is already holding lock:
ffff0000dc226190 (&amp;mm-&gt;mmap_lock){++++}-{3:3}, at: vm_mmap_pgoff+0x124/0x264

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-&gt; #1 (&amp;mm-&gt;mmap_lock){++++}-{3:3}:
       __might_fault+0x90/0xbc
       io_register_pbuf_ring+0x94/0x488
       __arm64_sys_io_uring_register+0x8dc/0x1318
       invoke_syscall+0x5c/0x17c
       el0_svc_common.constprop.0+0x108/0x130
       do_el0_svc+0x2c/0x38
       el0_svc+0x4c/0x94
       el0t_64_sync_handler+0x118/0x124
       el0t_64_sync+0x168/0x16c

-&gt; #0 (&amp;ctx-&gt;uring_lock){+.+.}-{3:3}:
       __lock_acquire+0x19a0/0x2d14
       lock_acquire+0x2e0/0x44c
       __mutex_lock+0x118/0x564
       mutex_lock_nested+0x20/0x28
       io_uring_validate_mmap_request.isra.0+0x4c/0x140
       io_uring_mmu_get_unmapped_area+0x3c/0x98
       get_unmapped_area+0xa4/0x158
       do_mmap+0xec/0x5b4
       vm_mmap_pgoff+0x158/0x264
       ksys_mmap_pgoff+0x1d4/0x254
       __arm64_sys_mmap+0x80/0x9c
       invoke_syscall+0x5c/0x17c
       el0_svc_common.constprop.0+0x108/0x130
       do_el0_svc+0x2c/0x38
       el0_svc+0x4c/0x94
       el0t_64_sync_handler+0x118/0x124
       el0t_64_sync+0x168/0x16c

From that mmap(2) path, we really just need to ensure that the buffer
list doesn't go away from underneath us. For the lower indexed entries,
they never go away until the ring is freed and we can always sanely
reference those as long as the caller has a file reference. For the
higher indexed ones in our xarray, we just need to ensure that the
buffer list remains valid while we return the address of it.

Free the higher indexed io_buffer_list entries via RCU. With that we can
avoid needing -&gt;uring_lock inside mmap(2), and simply hold the RCU read
lock around the buffer list lookup and address check.

To ensure that the arrayed lookup either returns a valid fully formulated
entry via RCU lookup, add an 'is_ready' flag that we access with store
and release memory ordering. This isn't needed for the xarray lookups,
but doesn't hurt either. Since this isn't a fast path, retain it across
both types. Similarly, for the allocated array inside the ctx, ensure
we use the proper load/acquire as setup could in theory be running in
parallel with mmap.

While in there, add a few lockdep checks for documentation purposes.

Cc: stable@vger.kernel.org
Fixes: c56e022c0a27 ("io_uring: add support for user mapped provided buffer ring")
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring/kbuf: prune deferred locked cache when tearing down</title>
<updated>2023-11-28T18:45:02Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2023-11-28T00:02:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=07d6063d3d3beb3168d3ac9fdef7bca81254d983'/>
<id>urn:sha1:07d6063d3d3beb3168d3ac9fdef7bca81254d983</id>
<content type='text'>
We used to just use our page list for final teardown, which would ensure
that we got all the buffers, even the ones that were not on the normal
cached list. But while moving to slab for the io_buffers, we know only
prune this list, not the deferred locked list that we have. This can
cause a leak of memory, if the workload ends up using the intermediate
locked list.

Fix this by always pruning both lists when tearing down.

Fixes: b3a4dbc89d40 ("io_uring/kbuf: Use slab for struct io_buffer objects")
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring/kbuf: recycle freed mapped buffer ring entries</title>
<updated>2023-11-28T18:45:02Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2023-11-28T18:17:25Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b10b73c102a2eab91e1cd62a03d6446f1dfecc64'/>
<id>urn:sha1:b10b73c102a2eab91e1cd62a03d6446f1dfecc64</id>
<content type='text'>
Right now we stash any potentially mmap'ed provided ring buffer range
for freeing at release time, regardless of when they get unregistered.
Since we're keeping track of these ranges anyway, keep track of their
registration state as well, and use that to recycle ranges when
appropriate rather than always allocate new ones.

The lookup is a basic scan of entries, checking for the best matching
free entry.

Fixes: c392cbecd8ec ("io_uring/kbuf: defer release of mapped buffer rings")
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring/kbuf: defer release of mapped buffer rings</title>
<updated>2023-11-28T14:56:16Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2023-11-27T23:47:04Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c392cbecd8eca4c53f2bf508731257d9d0a21c2d'/>
<id>urn:sha1:c392cbecd8eca4c53f2bf508731257d9d0a21c2d</id>
<content type='text'>
If a provided buffer ring is setup with IOU_PBUF_RING_MMAP, then the
kernel allocates the memory for it and the application is expected to
mmap(2) this memory. However, io_uring uses remap_pfn_range() for this
operation, so we cannot rely on normal munmap/release on freeing them
for us.

Stash an io_buf_free entry away for each of these, if any, and provide
a helper to free them post -&gt;release().

Cc: stable@vger.kernel.org
Fixes: c56e022c0a27 ("io_uring: add support for user mapped provided buffer ring")
Reported-by: Jann Horn &lt;jannh@google.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring: indicate if io_kbuf_recycle did recycle anything</title>
<updated>2023-11-06T20:41:58Z</updated>
<author>
<name>Dylan Yudaken</name>
<email>dyudaken@gmail.com</email>
</author>
<published>2023-11-06T20:39:07Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=89d528ba2f8281de61163c6b62e598b64d832175'/>
<id>urn:sha1:89d528ba2f8281de61163c6b62e598b64d832175</id>
<content type='text'>
It can be useful to know if io_kbuf_recycle did actually recycle the
buffer on the request, or if it left the request alone.

Signed-off-by: Dylan Yudaken &lt;dyudaken@gmail.com&gt;
Link: https://lore.kernel.org/r/20231106203909.197089-2-dyudaken@gmail.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>Merge tag 'for-6.7/io_uring-2023-10-30' of git://git.kernel.dk/linux</title>
<updated>2023-11-01T21:09:19Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-11-01T21:09:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ffa059b262ba72571e7fefe7fa2b4ebb6776b277'/>
<id>urn:sha1:ffa059b262ba72571e7fefe7fa2b4ebb6776b277</id>
<content type='text'>
Pull io_uring updates from Jens Axboe:
 "This contains the core io_uring updates, of which there are not many,
  and adds support for using WAITID through io_uring and hence not
  needing to block on these kinds of events.

  Outside of that, tweaks to the legacy provided buffer handling and
  some cleanups related to cancelations for uring_cmd support"

* tag 'for-6.7/io_uring-2023-10-30' of git://git.kernel.dk/linux:
  io_uring/poll: use IOU_F_TWQ_LAZY_WAKE for wakeups
  io_uring/kbuf: Use slab for struct io_buffer objects
  io_uring/kbuf: Allow the full buffer id space for provided buffers
  io_uring/kbuf: Fix check of BID wrapping in provided buffers
  io_uring/rsrc: cleanup io_pin_pages()
  io_uring: cancelable uring_cmd
  io_uring: retain top 8bits of uring_cmd flags for kernel internal use
  io_uring: add IORING_OP_WAITID support
  exit: add internal include file with helpers
  exit: add kernel_waitid_prepare() helper
  exit: move core of do_wait() into helper
  exit: abstract out should_wake helper for child_wait_callback()
  io_uring/rw: add support for IORING_OP_READ_MULTISHOT
  io_uring/rw: mark readv/writev as vectored in the opcode definition
  io_uring/rw: split io_read() into a helper
</content>
</entry>
<entry>
<title>io_uring/kbuf: Use slab for struct io_buffer objects</title>
<updated>2023-10-05T14:38:17Z</updated>
<author>
<name>Gabriel Krisman Bertazi</name>
<email>krisman@suse.de</email>
</author>
<published>2023-10-05T00:05:31Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b3a4dbc89d4021b3f90ff6a13537111a004f9d07'/>
<id>urn:sha1:b3a4dbc89d4021b3f90ff6a13537111a004f9d07</id>
<content type='text'>
The allocation of struct io_buffer for metadata of provided buffers is
done through a custom allocator that directly gets pages and
fragments them.  But, slab would do just fine, as this is not a hot path
(in fact, it is a deprecated feature) and, by keeping a custom allocator
implementation we lose benefits like tracking, poisoning,
sanitizers. Finally, the custom code is more complex and requires
keeping the list of pages in struct ctx for no good reason.  This patch
cleans this path up and just uses slab.

I microbenchmarked it by forcing the allocation of a large number of
objects with the least number of io_uring commands possible (keeping
nbufs=USHRT_MAX), with and without the patch.  There is a slight
increase in time spent in the allocation with slab, of course, but even
when allocating to system resources exhaustion, which is not very
realistic and happened around 1/2 billion provided buffers for me, it
wasn't a significant hit in system time.  Specially if we think of a
real-world scenario, an application doing register/unregister of
provided buffers will hit ctx-&gt;io_buffers_cache more often than actually
going to slab.

Signed-off-by: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
Link: https://lore.kernel.org/r/20231005000531.30800-4-krisman@suse.de
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
</feed>
