<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/io_uring, branch v6.7</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.7</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.7'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2023-12-14T23:52:13Z</updated>
<entry>
<title>io_uring/cmd: fix breakage in SOCKET_URING_OP_SIOC* implementation</title>
<updated>2023-12-14T23:52:13Z</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2023-12-14T21:34:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1ba0e9d69b2000e95267c888cbfa91d823388d47'/>
<id>urn:sha1:1ba0e9d69b2000e95267c888cbfa91d823388d47</id>
<content type='text'>
	In 8e9fad0e70b7 "io_uring: Add io_uring command support for sockets"
you've got an include of asm-generic/ioctls.h done in io_uring/uring_cmd.c.
That had been done for the sake of this chunk -
+               ret = prot-&gt;ioctl(sk, SIOCINQ, &amp;arg);
+               if (ret)
+                       return ret;
+               return arg;
+       case SOCKET_URING_OP_SIOCOUTQ:
+               ret = prot-&gt;ioctl(sk, SIOCOUTQ, &amp;arg);

SIOC{IN,OUT}Q are defined to symbols (FIONREAD and TIOCOUTQ) that come from
ioctls.h, all right, but the values vary by the architecture.

FIONREAD is
	0x467F on mips
	0x4004667F on alpha, powerpc and sparc
	0x8004667F on sh and xtensa
	0x541B everywhere else
TIOCOUTQ is
	0x7472 on mips
	0x40047473 on alpha, powerpc and sparc
	0x80047473 on sh and xtensa
	0x5411 everywhere else

-&gt;ioctl() expects the same values it would've gotten from userland; all
places where we compare with SIOC{IN,OUT}Q are using asm/ioctls.h, so
they pick the correct values.  io_uring_cmd_sock(), OTOH, ends up
passing the default ones.
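
For illustration only: a minimal Python sketch of why the numbers above
differ. It assumes the usual _IOR() layout with an 8-bit nr field and an
8-bit type field. The size-field widths (13 vs 14 bits), the read
direction value of 2, and the helper names ioc_read() and fionread_for()
are assumptions of this sketch, not something taken from the patch.
Shifts are written as multiplication by powers of two.

```python
NR_BITS = 8      # width of the command-number field (assumed)
TYPE_BITS = 8    # width of the type-character field (assumed)
IOC_READ = 2     # direction value for "read" (assumed)

def ioc_read(type_char, nr, size, size_bits):
    """Encode an _IOR()-style ioctl number for a given size-field width."""
    size_shift = NR_BITS + TYPE_BITS
    dir_shift = size_shift + size_bits
    # The fields do not overlap, so adding the scaled fields is the same
    # as shifting each one into place and OR-ing them together.
    return (IOC_READ * 2**dir_shift + size * 2**size_shift
            + ord(type_char) * 2**NR_BITS + nr)

# Values quoted in the commit message above.
FIONREAD = {"mips": 0x467F, "alpha": 0x4004667F, "powerpc": 0x4004667F,
            "sparc": 0x4004667F, "sh": 0x8004667F, "xtensa": 0x8004667F}
FIONREAD_DEFAULT = 0x541B

def fionread_for(arch):
    """Return the FIONREAD value userland would pass on this architecture."""
    return FIONREAD.get(arch, FIONREAD_DEFAULT)

# A 13-bit size field reproduces the alpha/powerpc/sparc value, a 14-bit
# one the sh/xtensa value; mips and the default are plain constants.
assert ioc_read("f", 127, 4, 13) == fionread_for("powerpc")
assert ioc_read("f", 127, 4, 14) == fionread_for("sh")
assert fionread_for("x86_64") == 0x541B
```

The point of the sketch is that a caller which hardcodes the default
constant hands the protocol handler a number it does not recognise on
the architectures in the table.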

Fixes: 8e9fad0e70b7 ("io_uring: Add io_uring command support for sockets")
Cc:  &lt;stable@vger.kernel.org&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Link: https://lore.kernel.org/r/20231214213408.GT1674809@ZenIV
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring/poll: don't enable lazy wake for POLLEXCLUSIVE</title>
<updated>2023-12-13T15:58:15Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2023-12-13T15:58:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=595e52284d24adc376890d3fc93bdca4707d9aca'/>
<id>urn:sha1:595e52284d24adc376890d3fc93bdca4707d9aca</id>
<content type='text'>
There are a few quirks around using lazy wake for poll unconditionally,
and one of them is related to EPOLLEXCLUSIVE. Those may trigger
exclusive wakeups, which wake a limited number of entries in the wait
queue. If that wake number is less than the number of entries someone is
waiting for (and that someone is also using DEFER_TASKRUN), then we can
get stuck waiting for more entries while we should be processing the ones
we already got.

If we're doing exclusive poll waits, flag the request as not being
compatible with lazy wakeups.

Reported-by: Pavel Begunkov &lt;asml.silence@gmail.com&gt;
Fixes: 6ce4a93dbb5b ("io_uring/poll: use IOU_F_TWQ_LAZY_WAKE for wakeups")
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring/af_unix: disable sending io_uring over sockets</title>
<updated>2023-12-07T17:35:19Z</updated>
<author>
<name>Pavel Begunkov</name>
<email>asml.silence@gmail.com</email>
</author>
<published>2023-12-06T13:26:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=705318a99a138c29a512a72c3e0043b3cd7f55f4'/>
<id>urn:sha1:705318a99a138c29a512a72c3e0043b3cd7f55f4</id>
<content type='text'>
File reference cycles have caused lots of problems for io_uring
in the past, and it still doesn't work exactly right and races with
unix_stream_read_generic(). The safest fix would be to completely
disallow sending io_uring files over sockets via SCM_RIGHTS, so there
are no possible cycles involving registered files, thus rendering
SCM accounting on the io_uring side unnecessary.

Cc:  &lt;stable@vger.kernel.org&gt;
Fixes: 0091bfc81741b ("io_uring/af_unix: defer registered files gc to io_uring release")
Reported-and-suggested-by: Jann Horn &lt;jannh@google.com&gt;
Signed-off-by: Pavel Begunkov &lt;asml.silence@gmail.com&gt;
Link: https://lore.kernel.org/r/c716c88321939156909cfa1bd8b0faaf1c804103.1701868795.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring/kbuf: check for buffer list readiness after NULL check</title>
<updated>2023-12-05T14:02:13Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2023-12-05T14:02:13Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9865346b7e8374b57f1c3ccacdc77846c6352ff4'/>
<id>urn:sha1:9865346b7e8374b57f1c3ccacdc77846c6352ff4</id>
<content type='text'>
Move the buffer list 'is_ready' check below the validity check for
the buffer list for a given group.

Fixes: 5cf4f52e6d8a ("io_uring: free io_buffer_list entries via RCU")
Reported-by: Dan Carpenter &lt;dan.carpenter@linaro.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring/kbuf: Fix a NULL vs IS_ERR() bug in io_alloc_pbuf_ring()</title>
<updated>2023-12-05T13:59:56Z</updated>
<author>
<name>Dan Carpenter</name>
<email>dan.carpenter@linaro.org</email>
</author>
<published>2023-12-05T12:37:17Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e53f7b54b1fdecae897f25002ff0cff04faab228'/>
<id>urn:sha1:e53f7b54b1fdecae897f25002ff0cff04faab228</id>
<content type='text'>
The io_mem_alloc() function returns error pointers, not NULL.  Update
the check accordingly.

Fixes: b10b73c102a2 ("io_uring/kbuf: recycle freed mapped buffer ring entries")
Signed-off-by: Dan Carpenter &lt;dan.carpenter@linaro.org&gt;
Link: https://lore.kernel.org/r/5ed268d3-a997-4f64-bd71-47faa92101ab@moroto.mountain
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring: fix mutex_unlock with unreferenced ctx</title>
<updated>2023-12-04T02:09:28Z</updated>
<author>
<name>Pavel Begunkov</name>
<email>asml.silence@gmail.com</email>
</author>
<published>2023-12-03T15:37:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f7b32e785042d2357c5abc23ca6db1b92c91a070'/>
<id>urn:sha1:f7b32e785042d2357c5abc23ca6db1b92c91a070</id>
<content type='text'>
Callers of mutex_unlock() have to make sure that the mutex stays alive
for the whole duration of the function call. For io_uring that means
that the following pattern is not valid unless we ensure that the
context outlives the mutex_unlock() call.

mutex_lock(&amp;ctx-&gt;uring_lock);
req_put(req); // typically via io_req_task_submit()
mutex_unlock(&amp;ctx-&gt;uring_lock);

Most contexts are fine: io-wq pins requests, syscalls hold the file,
task works are taking ctx references and so on. However, the task work
fallback path doesn't follow the rule.

Cc:  &lt;stable@vger.kernel.org&gt;
Fixes: 04fc6c802d ("io_uring: save ctx put/get for task_work submit")
Reported-by: Jann Horn &lt;jannh@google.com&gt;
Signed-off-by: Pavel Begunkov &lt;asml.silence@gmail.com&gt;
Link: https://lore.kernel.org/io-uring/CAG48ez3xSoYb+45f1RLtktROJrpiDQ1otNvdR+YLQf7m+Krj5Q@mail.gmail.com/
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring: use fget/fput consistently</title>
<updated>2023-11-28T18:56:29Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2023-11-28T17:29:58Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=73363c262d6a7d26063da96610f61baf69a70f7c'/>
<id>urn:sha1:73363c262d6a7d26063da96610f61baf69a70f7c</id>
<content type='text'>
Normally within a syscall it's fine to use fdget/fdput for grabbing a
file from the file table, and it's fine within io_uring as well. We do
that via io_uring_enter(2), io_uring_register(2), and then also for
cancel which is invoked from the latter. io_uring cannot close its own
file descriptors as that is explicitly rejected, and for the cancel
side of things, the file itself is just used as a lookup cookie.

However, it is more prudent to ensure that full references are always
grabbed. For anything threaded, either explicitly in the application
itself or through use of the io-wq worker threads, this is what happens
anyway. Generalize it and use fget/fput throughout.

Also see the below link for more details.

Link: https://lore.kernel.org/io-uring/CAG48ez1htVSO3TqmrF8QcX2WFuYTRM-VZ_N10i-VZgbtg=NNqw@mail.gmail.com/
Suggested-by: Jann Horn &lt;jannh@google.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring: free io_buffer_list entries via RCU</title>
<updated>2023-11-28T18:45:02Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2023-11-28T00:54:40Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5cf4f52e6d8aa2d3b7728f568abbf9d42a3af252'/>
<id>urn:sha1:5cf4f52e6d8aa2d3b7728f568abbf9d42a3af252</id>
<content type='text'>
mmap_lock nests under uring_lock out of necessity, as we may be doing
user copies with uring_lock held. However, for mmap of provided buffer
rings, we attempt to grab uring_lock with mmap_lock already held from
do_mmap(). This makes lockdep, rightfully, complain:

WARNING: possible circular locking dependency detected
6.7.0-rc1-00009-gff3337ebaf94-dirty #4438 Not tainted
------------------------------------------------------
buf-ring.t/442 is trying to acquire lock:
ffff00020e1480a8 (&amp;ctx-&gt;uring_lock){+.+.}-{3:3}, at: io_uring_validate_mmap_request.isra.0+0x4c/0x140

but task is already holding lock:
ffff0000dc226190 (&amp;mm-&gt;mmap_lock){++++}-{3:3}, at: vm_mmap_pgoff+0x124/0x264

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-&gt; #1 (&amp;mm-&gt;mmap_lock){++++}-{3:3}:
       __might_fault+0x90/0xbc
       io_register_pbuf_ring+0x94/0x488
       __arm64_sys_io_uring_register+0x8dc/0x1318
       invoke_syscall+0x5c/0x17c
       el0_svc_common.constprop.0+0x108/0x130
       do_el0_svc+0x2c/0x38
       el0_svc+0x4c/0x94
       el0t_64_sync_handler+0x118/0x124
       el0t_64_sync+0x168/0x16c

-&gt; #0 (&amp;ctx-&gt;uring_lock){+.+.}-{3:3}:
       __lock_acquire+0x19a0/0x2d14
       lock_acquire+0x2e0/0x44c
       __mutex_lock+0x118/0x564
       mutex_lock_nested+0x20/0x28
       io_uring_validate_mmap_request.isra.0+0x4c/0x140
       io_uring_mmu_get_unmapped_area+0x3c/0x98
       get_unmapped_area+0xa4/0x158
       do_mmap+0xec/0x5b4
       vm_mmap_pgoff+0x158/0x264
       ksys_mmap_pgoff+0x1d4/0x254
       __arm64_sys_mmap+0x80/0x9c
       invoke_syscall+0x5c/0x17c
       el0_svc_common.constprop.0+0x108/0x130
       do_el0_svc+0x2c/0x38
       el0_svc+0x4c/0x94
       el0t_64_sync_handler+0x118/0x124
       el0t_64_sync+0x168/0x16c

From that mmap(2) path, we really just need to ensure that the buffer
list doesn't go away from underneath us. For the lower indexed entries,
they never go away until the ring is freed and we can always sanely
reference those as long as the caller has a file reference. For the
higher indexed ones in our xarray, we just need to ensure that the
buffer list remains valid while we return the address of it.

Free the higher indexed io_buffer_list entries via RCU. With that we can
avoid needing -&gt;uring_lock inside mmap(2), and simply hold the RCU read
lock around the buffer list lookup and address check.

To ensure that the arrayed lookup returns a valid, fully formed
entry via RCU lookup, add an 'is_ready' flag that we access with
store/release memory ordering. This isn't needed for the xarray lookups,
but doesn't hurt either. Since this isn't a fast path, retain it across
both types. Similarly, for the allocated array inside the ctx, ensure
we use the proper load/acquire as setup could in theory be running in
parallel with mmap.

While in there, add a few lockdep checks for documentation purposes.

Cc: stable@vger.kernel.org
Fixes: c56e022c0a27 ("io_uring: add support for user mapped provided buffer ring")
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring/kbuf: prune deferred locked cache when tearing down</title>
<updated>2023-11-28T18:45:02Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2023-11-28T00:02:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=07d6063d3d3beb3168d3ac9fdef7bca81254d983'/>
<id>urn:sha1:07d6063d3d3beb3168d3ac9fdef7bca81254d983</id>
<content type='text'>
We used to just use our page list for final teardown, which would ensure
that we got all the buffers, even the ones that were not on the normal
cached list. But since moving to slab for the io_buffers, we now only
prune this list, not the deferred locked list that we have. This can
cause a leak of memory, if the workload ends up using the intermediate
locked list.

Fix this by always pruning both lists when tearing down.

Fixes: b3a4dbc89d40 ("io_uring/kbuf: Use slab for struct io_buffer objects")
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring/kbuf: recycle freed mapped buffer ring entries</title>
<updated>2023-11-28T18:45:02Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2023-11-28T18:17:25Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b10b73c102a2eab91e1cd62a03d6446f1dfecc64'/>
<id>urn:sha1:b10b73c102a2eab91e1cd62a03d6446f1dfecc64</id>
<content type='text'>
Right now we stash any potentially mmap'ed provided ring buffer range
for freeing at release time, regardless of when they get unregistered.
Since we're keeping track of these ranges anyway, keep track of their
registration state as well, and use that to recycle ranges when
appropriate rather than always allocate new ones.

The lookup is a basic scan of entries, checking for the best matching
free entry.
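
For illustration only: a minimal Python sketch of such a best-match scan,
assuming "best matching" means the smallest free range that is still
large enough. FreeEntry, its fields, and find_best_free() are
hypothetical names for this sketch, not the kernel's.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class FreeEntry:
    """Hypothetical stand-in for one tracked buffer ring range."""
    size: int      # size of the range in bytes
    in_use: bool   # True while the range is still registered

def find_best_free(entries: List[FreeEntry], needed: int) -> Optional[FreeEntry]:
    """Linear scan for the smallest unused entry that can hold `needed` bytes."""
    best = None
    for entry in entries:
        # Skip ranges that are still registered or too small to reuse.
        if entry.in_use or not entry.size >= needed:
            continue
        # Prefer the tighter fit so large ranges stay available.
        if best is None or best.size > entry.size:
            best = entry
    return best

entries = [FreeEntry(4096, False), FreeEntry(16384, False),
           FreeEntry(8192, True), FreeEntry(8192, False)]
match = find_best_free(entries, 8000)
if match is not None:
    match.in_use = True   # recycle the range instead of allocating a new one
```

If the scan finds nothing suitable, the caller would fall back to
allocating a fresh range, as before.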

Fixes: c392cbecd8ec ("io_uring/kbuf: defer release of mapped buffer rings")
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
</feed>
