<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/ipc/sem.c, branch v5.14</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.14</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.14'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2021-07-01T18:06:07Z</updated>
<entry>
<title>ipc/sem.c: use READ_ONCE()/WRITE_ONCE() for use_global_lock</title>
<updated>2021-07-01T18:06:07Z</updated>
<author>
<name>Manfred Spraul</name>
<email>manfred@colorfullife.com</email>
</author>
<published>2021-07-01T01:57:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=17d056e0bdaab3d3f1fbec1ac154addcc4183aed'/>
<id>urn:sha1:17d056e0bdaab3d3f1fbec1ac154addcc4183aed</id>
<content type='text'>
The patch solves three weaknesses in ipc/sem.c:

1) The initial read of use_global_lock in sem_lock() is an intentional
   race.  KCSAN detects these accesses and prints a warning.

2) The code assumes that plain C read/writes are not mangled by the CPU
   or the compiler.

3) The comment it sysvipc_sem_proc_show() was hard to understand: The
   rest of the comments in ipc/sem.c speaks about sem_perm.lock, and
   suddenly this function speaks about ipc_lock_object().

To solve 1) and 2), use READ_ONCE()/WRITE_ONCE().  Plain C reads are used
in code that owns sma-&gt;sem_perm.lock.

The comment is updated to solve 3)

[manfred@colorfullife.com: use READ_ONCE()/WRITE_ONCE() for use_global_lock]
  Link: https://lkml.kernel.org/r/20210627161919.3196-3-manfred@colorfullife.com

Link: https://lkml.kernel.org/r/20210514175319.12195-1-manfred@colorfullife.com
Signed-off-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Reviewed-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
Reviewed-by: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Cc: &lt;1vier1@web.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>ipc sem: use kvmalloc for sem_undo allocation</title>
<updated>2021-07-01T18:06:07Z</updated>
<author>
<name>Vasily Averin</name>
<email>vvs@virtuozzo.com</email>
</author>
<published>2021-07-01T01:57:09Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=fc37a3b8b4388e73e8e3525556d9f1feeb232bb9'/>
<id>urn:sha1:fc37a3b8b4388e73e8e3525556d9f1feeb232bb9</id>
<content type='text'>
Patch series "ipc: allocations cleanup", v2.

Some ipc objects use the wrong allocation functions: small objects can use
kmalloc(), and vice versa, potentially large objects can use kmalloc().

This patch (of 2):

Size of sem_undo can exceed one page and with the maximum possible nsems =
32000 it can grow up to 64Kb.  Let's switch its allocation to kvmalloc to
avoid user-triggered disruptive actions like OOM killer in case of
high-order memory shortage.

User triggerable high order allocations are quite a problem on heavily
fragmented systems.  They can be a DoS vector.

Link: https://lkml.kernel.org/r/ebc3ac79-3190-520d-81ce-22ad194986ec@virtuozzo.com
Link: https://lkml.kernel.org/r/a6354fd9-2d55-2e63-dd4d-fa7dc1d11134@virtuozzo.com
Signed-off-by: Vasily Averin &lt;vvs@virtuozzo.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Reviewed-by: Shakeel Butt &lt;shakeelb@google.com&gt;
Acked-by: Roman Gushchin &lt;guro@fb.com&gt;
Cc: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Cc: Dmitry Safonov &lt;0x7f454c46@gmail.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Cc: Vladimir Davydov &lt;vdavydov.dev@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>ipc/mqueue, msg, sem: avoid relying on a stack reference past its expiry</title>
<updated>2021-05-23T01:09:07Z</updated>
<author>
<name>Varad Gautam</name>
<email>varad.gautam@suse.com</email>
</author>
<published>2021-05-23T00:41:49Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a11ddb37bf367e6b5239b95ca759e5389bb46048'/>
<id>urn:sha1:a11ddb37bf367e6b5239b95ca759e5389bb46048</id>
<content type='text'>
do_mq_timedreceive calls wq_sleep with a stack local address.  The
sender (do_mq_timedsend) uses this address to later call pipelined_send.

This leads to a very hard to trigger race where a do_mq_timedreceive
call might return and leave do_mq_timedsend to rely on an invalid
address, causing the following crash:

  RIP: 0010:wake_q_add_safe+0x13/0x60
  Call Trace:
   __x64_sys_mq_timedsend+0x2a9/0x490
   do_syscall_64+0x80/0x680
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7f5928e40343

The race occurs as:

1. do_mq_timedreceive calls wq_sleep with the address of `struct
   ext_wait_queue` on function stack (aliased as `ewq_addr` here) - it
   holds a valid `struct ext_wait_queue *` as long as the stack has not
   been overwritten.

2. `ewq_addr` gets added to info-&gt;e_wait_q[RECV].list in wq_add, and
   do_mq_timedsend receives it via wq_get_first_waiter(info, RECV) to call
   __pipelined_op.

3. Sender calls __pipelined_op::smp_store_release(&amp;this-&gt;state,
   STATE_READY).  Here is where the race window begins.  (`this` is
   `ewq_addr`.)

4. If the receiver wakes up now in do_mq_timedreceive::wq_sleep, it
   will see `state == STATE_READY` and break.

5. do_mq_timedreceive returns, and `ewq_addr` is no longer guaranteed
   to be a `struct ext_wait_queue *` since it was on do_mq_timedreceive's
   stack.  (Although the address may not get overwritten until another
   function happens to touch it, which means it can persist around for an
   indefinite time.)

6. do_mq_timedsend::__pipelined_op() still believes `ewq_addr` is a
   `struct ext_wait_queue *`, and uses it to find a task_struct to pass to
   the wake_q_add_safe call.  In the lucky case where nothing has
   overwritten `ewq_addr` yet, `ewq_addr-&gt;task` is the right task_struct.
   In the unlucky case, __pipelined_op::wake_q_add_safe gets handed a
   bogus address as the receiver's task_struct causing the crash.

do_mq_timedsend::__pipelined_op() should not dereference `this` after
setting STATE_READY, as the receiver counterpart is now free to return.
Change __pipelined_op to call wake_q_add_safe on the receiver's
task_struct returned by get_task_struct, instead of dereferencing `this`
which sits on the receiver's stack.

As Manfred pointed out, the race potentially also exists in
ipc/msg.c::expunge_all and ipc/sem.c::wake_up_sem_queue_prepare.  Fix
those in the same way.

Link: https://lkml.kernel.org/r/20210510102950.12551-1-varad.gautam@suse.com
Fixes: c5b2cbdbdac563 ("ipc/mqueue.c: update/document memory barriers")
Fixes: 8116b54e7e23ef ("ipc/sem.c: document and update memory barriers")
Fixes: 0d97a82ba830d8 ("ipc/msg.c: update and document memory barriers")
Signed-off-by: Varad Gautam &lt;varad.gautam@suse.com&gt;
Reported-by: Matthias von Faber &lt;matthias.vonfaber@aox-tech.de&gt;
Acked-by: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Acked-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Cc: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>ipc/sem.c: spelling fix</title>
<updated>2021-05-07T07:26:34Z</updated>
<author>
<name>Bhaskar Chowdhury</name>
<email>unixbhaskar@gmail.com</email>
</author>
<published>2021-05-07T01:06:41Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7497835f7e8dae01c4850ce7204f6a8a7f58f2e5'/>
<id>urn:sha1:7497835f7e8dae01c4850ce7204f6a8a7f58f2e5</id>
<content type='text'>
s/purpuse/purpose/

Link: https://lkml.kernel.org/r/20210319221432.26631-1-unixbhaskar@gmail.com
Signed-off-by: Bhaskar Chowdhury &lt;unixbhaskar@gmail.com&gt;
Acked-by: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>ipc/sem.c: mundane typo fixes</title>
<updated>2021-05-07T07:26:33Z</updated>
<author>
<name>Bhaskar Chowdhury</name>
<email>unixbhaskar@gmail.com</email>
</author>
<published>2021-05-07T01:05:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b1989a3db45a6e8a5f1178bab621e8b9b8838602'/>
<id>urn:sha1:b1989a3db45a6e8a5f1178bab621e8b9b8838602</id>
<content type='text'>
s/runtine/runtime/
s/AQUIRE/ACQUIRE/
s/seperately/separately/
s/wont/won\'t/
s/succesfull/successful/

Link: https://lkml.kernel.org/r/20210326022240.26375-1-unixbhaskar@gmail.com
Signed-off-by: Bhaskar Chowdhury &lt;unixbhaskar@gmail.com&gt;
Acked-by: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>treewide: Use fallthrough pseudo-keyword</title>
<updated>2020-08-23T22:36:59Z</updated>
<author>
<name>Gustavo A. R. Silva</name>
<email>gustavoars@kernel.org</email>
</author>
<published>2020-08-23T22:36:59Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=df561f6688fef775baa341a0f5d960becd248b11'/>
<id>urn:sha1:df561f6688fef775baa341a0f5d960becd248b11</id>
<content type='text'>
Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva &lt;gustavoars@kernel.org&gt;
</content>
</entry>
<entry>
<title>ipc: uninline functions</title>
<updated>2020-08-12T17:58:02Z</updated>
<author>
<name>Alexey Dobriyan</name>
<email>adobriyan@gmail.com</email>
</author>
<published>2020-08-12T01:37:05Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=00898e8599a184ae05e2c00903e85dc80d82a43d'/>
<id>urn:sha1:00898e8599a184ae05e2c00903e85dc80d82a43d</id>
<content type='text'>
Two functions are only called via function pointers, don't bother
inlining them.

Signed-off-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Cc: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Link: http://lkml.kernel.org/r/20200710200312.GA960353@localhost.localdomain
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Revert "ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem()"</title>
<updated>2020-02-21T19:22:15Z</updated>
<author>
<name>Ioanna Alifieraki</name>
<email>ioanna-maria.alifieraki@canonical.com</email>
</author>
<published>2020-02-21T04:04:00Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=edf28f4061afe4c2d9eb1c3323d90e882c1d6800'/>
<id>urn:sha1:edf28f4061afe4c2d9eb1c3323d90e882c1d6800</id>
<content type='text'>
This reverts commit a97955844807e327df11aa33869009d14d6b7de0.

Commit a97955844807 ("ipc,sem: remove uneeded sem_undo_list lock usage
in exit_sem()") removes a lock that is needed.  This leads to a process
looping infinitely in exit_sem() and can also lead to a crash.  There is
a reproducer available in [1] and with the commit reverted the issue
does not reproduce anymore.

Using the reproducer found in [1] is fairly easy to reach a point where
one of the child processes is looping infinitely in exit_sem between
for(;;) and if (semid == -1) block, while it's trying to free its last
sem_undo structure which has already been freed by freeary().

Each sem_undo struct is on two lists: one per semaphore set (list_id)
and one per process (list_proc).  The list_id list tracks undos by
semaphore set, and the list_proc by process.

Undo structures are removed either by freeary() or by exit_sem().  The
freeary function is invoked when the user invokes a syscall to remove a
semaphore set.  During this operation freeary() traverses the list_id
associated with the semaphore set and removes the undo structures from
both the list_id and list_proc lists.

For this case, exit_sem() is called at process exit.  Each process
contains a struct sem_undo_list (referred to as "ulp") which contains
the head for the list_proc list.  When the process exits, exit_sem()
traverses this list to remove each sem_undo struct.  As in freeary(),
whenever a sem_undo struct is removed from list_proc, it is also removed
from the list_id list.

Removing elements from list_id is safe for both exit_sem() and freeary()
due to sem_lock().  Removing elements from list_proc is not safe;
freeary() locks &amp;un-&gt;ulp-&gt;lock when it performs
list_del_rcu(&amp;un-&gt;list_proc) but exit_sem() does not (locking was
removed by commit a97955844807 ("ipc,sem: remove uneeded sem_undo_list
lock usage in exit_sem()").

This can result in the following situation while executing the
reproducer [1] : Consider a child process in exit_sem() and the parent
in freeary() (because of semctl(sid[i], NSEM, IPC_RMID)).

 - The list_proc for the child contains the last two undo structs A and
   B (the rest have been removed either by exit_sem() or freeary()).

 - The semid for A is 1 and semid for B is 2.

 - exit_sem() removes A and at the same time freeary() removes B.

 - Since A and B have different semid sem_lock() will acquire different
   locks for each process and both can proceed.

The bug is that they remove A and B from the same list_proc at the same
time because only freeary() acquires the ulp lock. When exit_sem()
removes A it makes ulp-&gt;list_proc.next to point at B and at the same
time freeary() removes B setting B-&gt;semid=-1.

At the next iteration of for(;;) loop exit_sem() will try to remove B.

The only way to break from for(;;) is for (&amp;un-&gt;list_proc ==
&amp;ulp-&gt;list_proc) to be true which is not. Then exit_sem() will check if
B-&gt;semid=-1 which is and will continue looping in for(;;) until the
memory for B is reallocated and the value at B-&gt;semid is changed.

At that point, exit_sem() will crash attempting to unlink B from the
lists (this can be easily triggered by running the reproducer [1] a
second time).

To prove this scenario instrumentation was added to keep information
about each sem_undo (un) struct that is removed per process and per
semaphore set (sma).

          CPU0                                CPU1
  [caller holds sem_lock(sma for A)]      ...
  freeary()                               exit_sem()
  ...                                     ...
  ...                                     sem_lock(sma for B)
  spin_lock(A-&gt;ulp-&gt;lock)                 ...
  list_del_rcu(un_A-&gt;list_proc)           list_del_rcu(un_B-&gt;list_proc)

Undo structures A and B have different semid and sem_lock() operations
proceed.  However they belong to the same list_proc list and they are
removed at the same time.  This results into ulp-&gt;list_proc.next
pointing to the address of B which is already removed.

After reverting commit a97955844807 ("ipc,sem: remove uneeded
sem_undo_list lock usage in exit_sem()") the issue was no longer
reproducible.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1694779

Link: http://lkml.kernel.org/r/20191211191318.11860-1-ioanna-maria.alifieraki@canonical.com
Fixes: a97955844807 ("ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem()")
Signed-off-by: Ioanna Alifieraki &lt;ioanna-maria.alifieraki@canonical.com&gt;
Acked-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Acked-by: Herton R. Krzesinski &lt;herton@redhat.com&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: &lt;malat@debian.org&gt;
Cc: Joel Fernandes (Google) &lt;joel@joelfernandes.org&gt;
Cc: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Cc: Jay Vosburgh &lt;jay.vosburgh@canonical.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>ipc/sem.c: document and update memory barriers</title>
<updated>2020-02-04T03:05:24Z</updated>
<author>
<name>Manfred Spraul</name>
<email>manfred@colorfullife.com</email>
</author>
<published>2020-02-04T01:34:42Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8116b54e7e23ef948ecac0e0ab78d10888265cab'/>
<id>urn:sha1:8116b54e7e23ef948ecac0e0ab78d10888265cab</id>
<content type='text'>
Document and update the memory barriers in ipc/sem.c:

- Add smp_store_release() to wake_up_sem_queue_prepare() and
  document why it is needed.

- Read q-&gt;status using READ_ONCE+smp_acquire__after_ctrl_dep().
  as the pair for the barrier inside wake_up_sem_queue_prepare().

- Add comments to all barriers, and mention the rules in the block
  regarding locking.

- Switch to using wake_q_add_safe().

Link: http://lkml.kernel.org/r/20191020123305.14715-6-manfred@colorfullife.com
Signed-off-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Cc: Waiman Long &lt;longman@redhat.com&gt;
Cc: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Cc: &lt;1vier1@web.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Will Deacon &lt;will.deacon@arm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>ipc/sem.c: convert to use built-in RCU list checking</title>
<updated>2019-09-26T00:51:41Z</updated>
<author>
<name>Joel Fernandes (Google)</name>
<email>joel@joelfernandes.org</email>
</author>
<published>2019-09-25T23:48:20Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=984035ad7b247ccc62b06e113eea3fc673f114cc'/>
<id>urn:sha1:984035ad7b247ccc62b06e113eea3fc673f114cc</id>
<content type='text'>
CONFIG_PROVE_RCU_LIST requires list_for_each_entry_rcu() to pass a lockdep
expression if using srcu or locking for protection.  It can only check
regular RCU protection, all other protection needs to be passed as lockdep
expression.

Link: http://lkml.kernel.org/r/20190830231817.76862-2-joel@joelfernandes.org
Signed-off-by: Joel Fernandes (Google) &lt;joel@joelfernandes.org&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Bjorn Helgaas &lt;bhelgaas@google.com&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: "Gustavo A. R. Silva" &lt;gustavo@embeddedor.com&gt;
Cc: Jonathan Derrick &lt;jonathan.derrick@intel.com&gt;
Cc: Keith Busch &lt;keith.busch@intel.com&gt;
Cc: Lorenzo Pieralisi &lt;lorenzo.pieralisi@arm.com&gt;
Cc: "Paul E. McKenney" &lt;paulmck@linux.vnet.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
