<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/io_uring/io_uring.c, branch v6.3</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.3</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.3'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2023-04-14T12:38:23Z</updated>
<entry>
<title>io_uring: complete request via task work in case of DEFER_TASKRUN</title>
<updated>2023-04-14T12:38:23Z</updated>
<author>
<name>Ming Lei</name>
<email>ming.lei@redhat.com</email>
</author>
<published>2023-04-14T07:53:13Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=860e1c7f8b0b43fbf91b4d689adfaa13adb89452'/>
<id>urn:sha1:860e1c7f8b0b43fbf91b4d689adfaa13adb89452</id>
<content type='text'>
So far io_req_complete_post() only covers DEFER_TASKRUN by completing
request via task work when the request is completed from IOWQ.

However, uring command could be completed from any context, and if io
uring is setup with DEFER_TASKRUN, the command is required to be
completed from current context, otherwise wait on IORING_ENTER_GETEVENTS
can't be wakeup, and may hang forever.

The issue can be observed on removing ublk device, but turns out it is
one generic issue for uring command &amp; DEFER_TASKRUN, so solve it in
io_uring core code.

Fixes: e6aeb2721d3b ("io_uring: complete all requests in task context")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/linux-block/b3fc9991-4c53-9218-a8cc-5b4dd3952108@kernel.dk/
Reported-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Kanchan Joshi &lt;joshi.k@samsung.com&gt;
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring: fix memory leak when removing provided buffers</title>
<updated>2023-04-01T22:52:12Z</updated>
<author>
<name>Wojciech Lukowicz</name>
<email>wlukowicz01@gmail.com</email>
</author>
<published>2023-04-01T19:50:39Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b4a72c0589fdea6259720375426179888969d6a2'/>
<id>urn:sha1:b4a72c0589fdea6259720375426179888969d6a2</id>
<content type='text'>
When removing provided buffers, io_buffer structs are not being disposed
of, leading to a memory leak. They can't be freed individually, because
they are allocated in page-sized groups. They need to be added to some
free list instead, such as io_buffers_cache. All callers already hold
the lock protecting it, apart from when destroying buffers, so had to
extend the lock there.

Fixes: cc3cec8367cb ("io_uring: speedup provided buffer handling")
Signed-off-by: Wojciech Lukowicz &lt;wlukowicz01@gmail.com&gt;
Link: https://lore.kernel.org/r/20230401195039.404909-2-wlukowicz01@gmail.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring: silence variable ‘prev’ set but not used warning</title>
<updated>2023-03-09T17:10:58Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2023-03-09T16:51:13Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=fa780334a8c392d959ae05eb19f2410b3a1e6cb0'/>
<id>urn:sha1:fa780334a8c392d959ae05eb19f2410b3a1e6cb0</id>
<content type='text'>
If io_uring.o is built with W=1, it triggers a warning:

io_uring/io_uring.c: In function ‘__io_submit_flush_completions’:
io_uring/io_uring.c:1502:40: warning: variable ‘prev’ set but not used [-Wunused-but-set-variable]
 1502 |         struct io_wq_work_node *node, *prev;
      |                                        ^~~~

which is due to the wq_list_for_each() iterator always keeping a 'prev'
variable. Most users need this to remove an entry from a list, for
example, but __io_submit_flush_completions() never does that.

Add a basic helper that doesn't track prev instead, and use that in
that function.

Reported-by: Vincenzo Palazzo &lt;vincenzopalazzodev@gmail.com&gt;
Reviewed-by: Vincenzo Palazzo &lt;vincenzopalazzodev@gmail.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>Merge tag 'io_uring-6.3-2023-03-03' of git://git.kernel.dk/linux</title>
<updated>2023-03-03T18:25:29Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-03-03T18:25:29Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=53ae7e117637ff201fdf038b68e76a7202112dea'/>
<id>urn:sha1:53ae7e117637ff201fdf038b68e76a7202112dea</id>
<content type='text'>
Pull more io_uring updates from Jens Axboe:
 "Here's a set of fixes/changes that didn't make the first cut, either
  because they got queued before I sent the early merge request, or
  fixes that came in afterwards. In detail:

   - Don't set MSG_NOSIGNAL on recv/recvmsg opcodes, as AF_PACKET will
     error out (David)

   - Fix for spurious poll wakeups (me)

   - Fix for a file leak for buffered reads in certain conditions
     (Joseph)

   - Don't allow registered buffers of mixed types (Pavel)

   - Improve handling of huge pages for registered buffers (Pavel)

   - Provided buffer ring size calculation fix (Wojciech)

   - Minor cleanups (me)"

* tag 'io_uring-6.3-2023-03-03' of git://git.kernel.dk/linux:
  io_uring/poll: don't pass in wake func to io_init_poll_iocb()
  io_uring: fix fget leak when fs don't support nowait buffered read
  io_uring/poll: allow some retries for poll triggering spuriously
  io_uring: remove MSG_NOSIGNAL from recvmsg
  io_uring/rsrc: always initialize 'folio' to NULL
  io_uring/rsrc: optimise registered huge pages
  io_uring/rsrc: optimise single entry advance
  io_uring/rsrc: disallow multi-source reg buffers
  io_uring: remove unused wq_list_merge
  io_uring: fix size calculation when registering buf ring
  io_uring/rsrc: fix a comment in io_import_fixed()
  io_uring: rename 'in_idle' to 'in_cancel'
  io_uring: consolidate the put_ref-and-return section of adding work
</content>
</entry>
<entry>
<title>io_uring: fix fget leak when fs don't support nowait buffered read</title>
<updated>2023-02-28T12:58:42Z</updated>
<author>
<name>Joseph Qi</name>
<email>joseph.qi@linux.alibaba.com</email>
</author>
<published>2023-02-28T04:54:59Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=54aa7f2330b82884f4a1afce0220add6e8312f8b'/>
<id>urn:sha1:54aa7f2330b82884f4a1afce0220add6e8312f8b</id>
<content type='text'>
Heming reported a BUG when using io_uring doing link-cp on ocfs2. [1]

Do the following steps can reproduce this BUG:
mount -t ocfs2 /dev/vdc /mnt/ocfs2
cp testfile /mnt/ocfs2/
./link-cp /mnt/ocfs2/testfile /mnt/ocfs2/testfile.1
umount /mnt/ocfs2

Then umount will fail, and it outputs:
umount: /mnt/ocfs2: target is busy.

While tracing umount, it blames mnt_get_count() not return as expected.
Do a deep investigation for fget()/fput() on related code flow, I've
finally found that fget() leaks since ocfs2 doesn't support nowait
buffered read.

io_issue_sqe
|-io_assign_file  // do fget() first
  |-io_read
  |-io_iter_do_read
    |-ocfs2_file_read_iter  // return -EOPNOTSUPP
  |-kiocb_done
    |-io_rw_done
      |-__io_complete_rw_common  // set REQ_F_REISSUE
    |-io_resubmit_prep
      |-io_req_prep_async  // override req-&gt;file, leak happens

This was introduced by commit a196c78b5443 in v5.18. Fix it by don't
re-assign req-&gt;file if it has already been assigned.

[1] https://lore.kernel.org/ocfs2-devel/ab580a75-91c8-d68a-3455-40361be1bfa8@linux.alibaba.com/T/#t

Fixes: a196c78b5443 ("io_uring: assign non-fixed early for async work")
Cc: &lt;stable@vger.kernel.org&gt;
Reported-by: Heming Zhao &lt;heming.zhao@suse.com&gt;
Signed-off-by: Joseph Qi &lt;joseph.qi@linux.alibaba.com&gt;
Cc: Xiaoguang Wang &lt;xiaoguang.wang@linux.alibaba.com&gt;
Link: https://lore.kernel.org/r/20230228045459.13524-1-joseph.qi@linux.alibaba.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>Merge tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm</title>
<updated>2023-02-24T01:09:35Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-02-24T01:09:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3822a7c40997dc86b1458766a3f146d62393f084'/>
<id>urn:sha1:3822a7c40997dc86b1458766a3f146d62393f084</id>
<content type='text'>
Pull MM updates from Andrew Morton:

 - Daniel Verkamp has contributed a memfd series ("mm/memfd: add
   F_SEAL_EXEC") which permits the setting of the memfd execute bit at
   memfd creation time, with the option of sealing the state of the X
   bit.

 - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset()
   thread-safe for pmd unshare") which addresses a rare race condition
   related to PMD unsharing.

 - Several folioification patch serieses from Matthew Wilcox, Vishal
   Moola, Sidhartha Kumar and Lorenzo Stoakes

 - Johannes Weiner has a series ("mm: push down lock_page_memcg()")
   which does perform some memcg maintenance and cleanup work.

 - SeongJae Park has added DAMOS filtering to DAMON, with the series
   "mm/damon/core: implement damos filter".

   These filters provide users with finer-grained control over DAMOS's
   actions. SeongJae has also done some DAMON cleanup work.

 - Kairui Song adds a series ("Clean up and fixes for swap").

 - Vernon Yang contributed the series "Clean up and refinement for maple
   tree".

 - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It
   adds to MGLRU an LRU of memcgs, to improve the scalability of global
   reclaim.

 - David Hildenbrand has added some userfaultfd cleanup work in the
   series "mm: uffd-wp + change_protection() cleanups".

 - Christoph Hellwig has removed the generic_writepages() library
   function in the series "remove generic_writepages".

 - Baolin Wang has performed some maintenance on the compaction code in
   his series "Some small improvements for compaction".

 - Sidhartha Kumar is doing some maintenance work on struct page in his
   series "Get rid of tail page fields".

 - David Hildenbrand contributed some cleanup, bugfixing and
   generalization of pte management and of pte debugging in his series
   "mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with
   swap PTEs".

 - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation
   flag in the series "Discard __GFP_ATOMIC".

 - Sergey Senozhatsky has improved zsmalloc's memory utilization with
   his series "zsmalloc: make zspage chain size configurable".

 - Joey Gouly has added prctl() support for prohibiting the creation of
   writeable+executable mappings.

   The previous BPF-based approach had shortcomings. See "mm: In-kernel
   support for memory-deny-write-execute (MDWE)".

 - Waiman Long did some kmemleak cleanup and bugfixing in the series
   "mm/kmemleak: Simplify kmemleak_cond_resched() &amp; fix UAF".

 - T.J. Alumbaugh has contributed some MGLRU cleanup work in his series
   "mm: multi-gen LRU: improve".

 - Jiaqi Yan has provided some enhancements to our memory error
   statistics reporting, mainly by presenting the statistics on a
   per-node basis. See the series "Introduce per NUMA node memory error
   statistics".

 - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog
   regression in compaction via his series "Fix excessive CPU usage
   during compaction".

 - Christoph Hellwig does some vmalloc maintenance work in the series
   "cleanup vfree and vunmap".

 - Christoph Hellwig has removed block_device_operations.rw_page() in
   ths series "remove -&gt;rw_page".

 - We get some maple_tree improvements and cleanups in Liam Howlett's
   series "VMA tree type safety and remove __vma_adjust()".

 - Suren Baghdasaryan has done some work on the maintainability of our
   vm_flags handling in the series "introduce vm_flags modifier
   functions".

 - Some pagemap cleanup and generalization work in Mike Rapoport's
   series "mm, arch: add generic implementation of pfn_valid() for
   FLATMEM" and "fixups for generic implementation of pfn_valid()"

 - Baoquan He has done some work to make /proc/vmallocinfo and
   /proc/kcore better represent the real state of things in his series
   "mm/vmalloc.c: allow vread() to read out vm_map_ram areas".

 - Jason Gunthorpe rationalized the GUP system's interface to the rest
   of the kernel in the series "Simplify the external interface for
   GUP".

 - SeongJae Park wishes to migrate people from DAMON's debugfs interface
   over to its sysfs interface. To support this, we'll temporarily be
   printing warnings when people use the debugfs interface. See the
   series "mm/damon: deprecate DAMON debugfs interface".

 - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes
   and clean-ups" series.

 - Huang Ying has provided a dramatic reduction in migration's TLB flush
   IPI rates with the series "migrate_pages(): batch TLB flushing".

 - Arnd Bergmann has some objtool fixups in "objtool warning fixes".

* tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (505 commits)
  include/linux/migrate.h: remove unneeded externs
  mm/memory_hotplug: cleanup return value handing in do_migrate_range()
  mm/uffd: fix comment in handling pte markers
  mm: change to return bool for isolate_movable_page()
  mm: hugetlb: change to return bool for isolate_hugetlb()
  mm: change to return bool for isolate_lru_page()
  mm: change to return bool for folio_isolate_lru()
  objtool: add UACCESS exceptions for __tsan_volatile_read/write
  kmsan: disable ftrace in kmsan core code
  kasan: mark addr_has_metadata __always_inline
  mm: memcontrol: rename memcg_kmem_enabled()
  sh: initialize max_mapnr
  m68k/nommu: add missing definition of ARCH_PFN_OFFSET
  mm: percpu: fix incorrect size in pcpu_obj_full_size()
  maple_tree: reduce stack usage with gcc-9 and earlier
  mm: page_alloc: call panic() when memoryless node allocation fails
  mm: multi-gen LRU: avoid futile retries
  migrate_pages: move THP/hugetlb migration support check to simplify code
  migrate_pages: batch flushing TLB
  migrate_pages: share more code between _unmap and _move
  ...
</content>
</entry>
<entry>
<title>io_uring: rename 'in_idle' to 'in_cancel'</title>
<updated>2023-02-22T16:57:23Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2023-02-17T15:27:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8d664282a03fec09682f10252d3c785c2513691d'/>
<id>urn:sha1:8d664282a03fec09682f10252d3c785c2513691d</id>
<content type='text'>
This better describes what it does - it's incremented when the task is
currently undergoing a cancelation operation, due to exiting or exec'ing.

Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring: consolidate the put_ref-and-return section of adding work</title>
<updated>2023-02-22T16:57:23Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2023-02-17T15:22:17Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ce8e04f6e5d3b2d14cd00cc4c0b1cc8cbdcf4d12'/>
<id>urn:sha1:ce8e04f6e5d3b2d14cd00cc4c0b1cc8cbdcf4d12</id>
<content type='text'>
We've got a few cases of this, move them to one section and just use
gotos to get there. Reduces the text section on both arm64 and x86-64,
using gcc-12.2.

Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring: Support calling io_uring_register with a registered ring fd</title>
<updated>2023-02-16T13:09:30Z</updated>
<author>
<name>Josh Triplett</name>
<email>josh@joshtriplett.org</email>
</author>
<published>2023-02-15T00:42:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7d3fd88d61a41016da01889f076fd1c60c7298fc'/>
<id>urn:sha1:7d3fd88d61a41016da01889f076fd1c60c7298fc</id>
<content type='text'>
Add a new flag IORING_REGISTER_USE_REGISTERED_RING (set via the high bit
of the opcode) to treat the fd as a registered index rather than a file
descriptor.

This makes it possible for a library to open an io_uring, register the
ring fd, close the ring fd, and subsequently use the ring entirely via
registered index.

Signed-off-by: Josh Triplett &lt;josh@joshtriplett.org&gt;
Link: https://lore.kernel.org/r/f2396369e638284586b069dbddffb8c992afba95.1676419314.git.josh@joshtriplett.org
[axboe: remove extra high bit clear]
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring: if a linked request has REQ_F_FORCE_ASYNC then run it async</title>
<updated>2023-01-29T22:18:26Z</updated>
<author>
<name>Dylan Yudaken</name>
<email>dylany@meta.com</email>
</author>
<published>2023-01-27T13:52:24Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=6bb30855560e6343e7b88595d7c3159d0f848a04'/>
<id>urn:sha1:6bb30855560e6343e7b88595d7c3159d0f848a04</id>
<content type='text'>
REQ_F_FORCE_ASYNC was being ignored for re-queueing linked
requests. Instead obey that flag.

Signed-off-by: Dylan Yudaken &lt;dylany@meta.com&gt;
Link: https://lore.kernel.org/r/20230127135227.3646353-2-dylany@meta.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
</feed>
