<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/io_uring, branch v6.4</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.4</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.4'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2023-06-21T13:34:17Z</updated>
<entry>
<title>io_uring/net: use the correct msghdr union member in io_sendmsg_copy_hdr</title>
<updated>2023-06-21T13:34:17Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2023-06-20T22:11:51Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=26fed83653d0154704cadb7afc418f315c7ac1f0'/>
<id>urn:sha1:26fed83653d0154704cadb7afc418f315c7ac1f0</id>
<content type='text'>
Rather than assign the user pointer to msghdr-&gt;msg_control, assign it
to msghdr-&gt;msg_control_user to make sparse happy. They are in a union
so the end result is the same, but let's avoid new sparse warnings and
squash this one.
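
For illustration only (a Python ctypes sketch, not the kernel's struct
definition): two union members alias the same storage, so assigning via
either one produces the same bytes, and the split exists purely so the
static checker can track the pointer's address space.

```python
# Illustrative sketch only, NOT the kernel's msghdr: two union members
# alias the same storage, so writing either yields identical bytes.
import ctypes

class CtrlUnion(ctypes.Union):
    _fields_ = [
        ("msg_control", ctypes.c_void_p),       # kernel-pointer view
        ("msg_control_user", ctypes.c_void_p),  # user-pointer view
    ]

u = CtrlUnion()
u.msg_control_user = 0x1000
assert u.msg_control == u.msg_control_user  # same bytes either way
```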

Reported-by: kernel test robot &lt;lkp@intel.com&gt;
Closes: https://lore.kernel.org/oe-kbuild-all/202306210654.mDMcyMuB-lkp@intel.com/
Fixes: cac9e4418f4c ("io_uring/net: save msghdr-&gt;msg_control for retries")
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring/net: disable partial retries for recvmsg with cmsg</title>
<updated>2023-06-21T13:34:07Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2023-06-19T15:41:05Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=78d0d2063bab954d19a1696feae4c7706a626d48'/>
<id>urn:sha1:78d0d2063bab954d19a1696feae4c7706a626d48</id>
<content type='text'>
We cannot sanely handle partial retries for recvmsg if we have cmsg
attached. If we didn't disable them, we would just overwrite the initial
cmsg header on each retry. Alternatively we could increment and handle
this appropriately, but it doesn't seem worth the complication.

Move the MSG_WAITALL check into the non-multishot case while at it,
since MSG_WAITALL is explicitly disabled for multishot anyway.

Link: https://lore.kernel.org/io-uring/0b0d4411-c8fd-4272-770b-e030af6919a0@kernel.dk/
Cc: stable@vger.kernel.org # 5.10+
Reported-by: Stefan Metzmacher &lt;metze@samba.org&gt;
Reviewed-by: Stefan Metzmacher &lt;metze@samba.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring/net: clear msg_controllen on partial sendmsg retry</title>
<updated>2023-06-21T13:33:48Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2023-06-19T15:35:34Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b1dc492087db0f2e5a45f1072a743d04618dd6be'/>
<id>urn:sha1:b1dc492087db0f2e5a45f1072a743d04618dd6be</id>
<content type='text'>
If we have cmsg attached AND we have transferred at least some data,
clear msg_controllen on retry so we don't attempt to send the cmsg again.
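
As a sketch of the rule this enforces (illustrative Python with
hypothetical names, not the kernel code): once any data has gone out,
zero the control length before retrying so the ancillary data is sent
at most once.

```python
# Illustrative sketch (hypothetical names): retry a partial send, but
# clear the control length after the first partial transfer so the
# cmsg payload is never transmitted twice.
def send_with_retry(send_once, msg):
    sent_total = 0
    while True:
        sent = send_once(msg)
        sent_total += sent
        if sent_total >= msg["iov_len"]:
            return sent_total
        if msg["msg_controllen"] > 0 and sent_total > 0:
            msg["msg_controllen"] = 0  # don't re-send the cmsg on retry
```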

Cc: stable@vger.kernel.org # 5.10+
Fixes: cac9e4418f4c ("io_uring/net: save msghdr-&gt;msg_control for retries")
Reported-by: Stefan Metzmacher &lt;metze@samba.org&gt;
Reviewed-by: Stefan Metzmacher &lt;metze@samba.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring/poll: serialize poll linked timer start with poll removal</title>
<updated>2023-06-18T02:21:52Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2023-06-18T01:50:24Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ef7dfac51d8ed961b742218f526bd589f3900a59'/>
<id>urn:sha1:ef7dfac51d8ed961b742218f526bd589f3900a59</id>
<content type='text'>
We selectively grab the ctx-&gt;uring_lock for poll update/removal, but
we really should grab it from the start to fully synchronize with
linked timeouts. Normally this is indeed the case, but if requests
are forced async by the application, we don't fully cover removal
and timer disarm within the uring_lock.

Make this simpler by having consistent locking state for poll removal.

Cc: stable@vger.kernel.org # 6.1+
Reported-by: Querijn Voet &lt;querijnqyn@gmail.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring/io-wq: clear current-&gt;worker_private on exit</title>
<updated>2023-06-14T18:54:55Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2023-06-14T01:26:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=adeaa3f290ecf7f6a6a5c53219a4686cbdff5fbd'/>
<id>urn:sha1:adeaa3f290ecf7f6a6a5c53219a4686cbdff5fbd</id>
<content type='text'>
A recent fix stopped clearing PF_IO_WORKER from current-&gt;flags on exit,
which means we can now call the inc/dec running hooks on a worker after
it has been removed, if the task schedules in/out as part of exit.

If this happens after an RCU grace period has passed, then the struct
pointed to by current-&gt;worker_private may have been freed, and we would
be accessing freed memory.

Ensure this doesn't happen by clearing the task worker_private field.
Both io_wq_worker_running() and io_wq_worker_sleeping() check this
field before going any further, and we don't need any accounting etc
done after this worker has exited.
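
The shape of the fix can be sketched as follows (illustrative Python,
logic simplified; the names mirror the commit but the code is not the
kernel's): clearing the per-task pointer makes later scheduler hooks
bail out before touching possibly-freed state.

```python
# Hypothetical sketch: clear the per-task pointer on exit so scheduler
# hooks that run later see None and skip accounting entirely.
def io_wq_worker_exit(task):
    task["worker_private"] = None  # the fix: clear before final exit

def io_wq_worker_running(task):
    worker = task.get("worker_private")
    if worker is None:
        return False  # worker already gone; nothing to account
    worker["running_count"] += 1
    return True
```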

Fixes: fd37b884003c ("io_uring/io-wq: don't clear PF_IO_WORKER on exit")
Reported-by: Zorro Lang &lt;zlang@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring/net: save msghdr-&gt;msg_control for retries</title>
<updated>2023-06-14T01:26:42Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2023-06-12T19:51:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=cac9e4418f4cbd548ccb065b3adcafe073f7f7d2'/>
<id>urn:sha1:cac9e4418f4cbd548ccb065b3adcafe073f7f7d2</id>
<content type='text'>
If the application sets -&gt;msg_control and we have to later retry this
command, or if it got queued with IOSQE_ASYNC to begin with, then we
need to retain the original msg_control value. This is due to the net
stack overwriting this field with an in-kernel pointer, to copy it
in. Hitting that path for the second time will now fail the copy from
user, as it's attempting to copy from a non-user address.
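
The save/restore idea can be sketched like this (illustrative Python;
the helper names are hypothetical, not the kernel's): stash the caller's
original control pointer at prep time and put it back before a retry,
since the first pass replaces it with an in-kernel value.

```python
# Hypothetical sketch: preserve the original user control pointer so a
# retried submission does not see the in-kernel copy from the first pass.
def prep_msg(msg, state):
    state["saved_control"] = msg["msg_control"]  # stash user pointer

def import_msg(msg):
    msg["msg_control"] = "kernel-copy"  # stack overwrites the field

def retry_msg(msg, state):
    msg["msg_control"] = state["saved_control"]  # restore before retry
    import_msg(msg)
```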

Cc: stable@vger.kernel.org # 5.10+
Link: https://github.com/axboe/liburing/issues/880
Reported-and-tested-by: Marek Majkowski &lt;marek@cloudflare.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring/io-wq: don't clear PF_IO_WORKER on exit</title>
<updated>2023-06-12T18:16:09Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2023-06-12T18:11:57Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=fd37b884003c7e46a0337b6e9212326d3ee1f40d'/>
<id>urn:sha1:fd37b884003c7e46a0337b6e9212326d3ee1f40d</id>
<content type='text'>
A recent commit gated the core dumping task exit logic on current-&gt;flags
remaining consistent in terms of PF_{IO,USER}_WORKER at task exit time.
This exposed a problem with the io-wq handling of that, which explicitly
clears PF_IO_WORKER before calling do_exit().

The reason for this manual clearing of PF_IO_WORKER is historical:
io-wq used to potentially trigger a sleep on exit, and as the io-wq
thread is exiting, it should not participate in any further accounting.
But these days we don't need to rely on current-&gt;flags anymore, so we
can safely remove the PF_IO_WORKER clearing.

Reported-by: Zorro Lang &lt;zlang@redhat.com&gt;
Reported-by: Dave Chinner &lt;david@fromorbit.com&gt;
Link: https://lore.kernel.org/all/ZIZSPyzReZkGBEFy@dread.disaster.area/
Fixes: f9010dbdce91 ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression")
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>io_uring: undeprecate epoll_ctl support</title>
<updated>2023-05-27T02:22:41Z</updated>
<author>
<name>Ben Noordhuis</name>
<email>info@bnoordhuis.nl</email>
</author>
<published>2023-05-06T09:55:02Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4ea0bf4b98d66a7a790abb285539f395596bae92'/>
<id>urn:sha1:4ea0bf4b98d66a7a790abb285539f395596bae92</id>
<content type='text'>
Libuv recently started using it so there is at least one consumer now.

Cc: stable@vger.kernel.org
Fixes: 61a2732af4b0 ("io_uring: deprecate epoll_ctl support")
Link: https://github.com/libuv/libuv/pull/3979
Signed-off-by: Ben Noordhuis &lt;info@bnoordhuis.nl&gt;
Link: https://lore.kernel.org/r/20230506095502.13401-1-info@bnoordhuis.nl
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring: unlock sqd-&gt;lock before sq thread release CPU</title>
<updated>2023-05-25T15:30:13Z</updated>
<author>
<name>Wenwen Chen</name>
<email>wenwen.chen@samsung.com</email>
</author>
<published>2023-05-25T08:26:26Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=533ab73f5b5c95dcb4152b52d5482abcc824c690'/>
<id>urn:sha1:533ab73f5b5c95dcb4152b52d5482abcc824c690</id>
<content type='text'>
When idle, the sq thread voluntarily releases the CPU by calling the
cond_resched() and schedule() interfaces, making more CPU time
available to other threads.

The problem is that the sq thread does not always unlock sqd-&gt;lock
before releasing the CPU, which can leave other threads blocked on
sqd-&gt;lock for a long time. For example, io_sq_offload_create(),
io_register_iowq_max_workers() and io_ring_exit_work() all require
sqd-&gt;lock.

Unlocking sqd-&gt;lock before the sq thread releases the CPU lets those
paths respond to user requests promptly.
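
The pattern being applied can be sketched with Python threads
(illustrative only, not the kernel code): drop the shared lock before
voluntarily yielding the CPU, then re-take it once running again.

```python
# Hypothetical sketch: release the shared lock before yielding so that
# waiters are not stalled while this thread is idle.
import threading
import time

sqd_lock = threading.Lock()

def sq_thread_idle_once():
    with sqd_lock:
        pass            # idle-time work done under sqd_lock
    time.sleep(0)       # stand-in for cond_resched()/schedule()
    with sqd_lock:      # re-take the lock once running again
        pass
```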

Signed-off-by: Kanchan Joshi &lt;joshi.k@samsung.com&gt;
Signed-off-by: Wenwen Chen &lt;wenwen.chen@samsung.com&gt;
Link: https://lore.kernel.org/r/20230525082626.577862-1-wenwen.chen@samsung.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>Merge tag 'for-6.4/io_uring-2023-05-07' of git://git.kernel.dk/linux</title>
<updated>2023-05-07T17:00:09Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-05-07T17:00:09Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=03e5cb7b50feb687508946a702febaba24c77f0b'/>
<id>urn:sha1:03e5cb7b50feb687508946a702febaba24c77f0b</id>
<content type='text'>
Pull more io_uring updates from Jens Axboe:
 "Nothing major in here, just two different parts:

   - A small series from Breno that enables passing the full SQE down
     for -&gt;uring_cmd().

     This is a prerequisite for enabling full network socket operations.
     Queued up a bit late because of some stylistic concerns that got
     resolved, would be nice to have this in 6.4-rc1 so the dependent
     work will be easier to handle for 6.5.

   - Fix for the huge page coalescing, which was a regression introduced
     in the 6.3 kernel release (Tobias)"

* tag 'for-6.4/io_uring-2023-05-07' of git://git.kernel.dk/linux:
  io_uring: Remove unnecessary BUILD_BUG_ON
  io_uring: Pass whole sqe to commands
  io_uring: Create a helper to return the SQE size
  io_uring/rsrc: check for nonconsecutive pages
</content>
</entry>
</feed>
