<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/signal.c, branch v5.6</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.6</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.6'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2020-02-26T17:54:03Z</updated>
<entry>
<title>signal: avoid double atomic counter increments for user accounting</title>
<updated>2020-02-26T17:54:03Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2020-02-24T20:47:14Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=fda31c50292a5062332fa0343c084bd9f46604d9'/>
<id>urn:sha1:fda31c50292a5062332fa0343c084bd9f46604d9</id>
<content type='text'>
When queueing a signal, we increment both the users count of pending
signals (for RLIMIT_SIGPENDING tracking) and we increment the refcount
of the user struct itself (because we keep a reference to the user in
the signal structure in order to correctly account for it when freeing).

That turns out to be fairly expensive, because both of them are atomic
updates, and particularly under extreme signal handling pressure on big
machines, you can get a lot of cache contention on the user struct.
That can then cause horrid cacheline ping-pong when you do these
multiple accesses.

So change the reference counting to only pin the user for the _first_
pending signal, and to unpin it when the last pending signal is
dequeued.  That means that when a user sees a lot of concurrent signal
queuing - which is the only situation when this matters - the only
atomic access needed is generally the 'sigpending' count update.

This was noticed because of a particularly odd timing artifact on a
dual-socket 96C/192T Cascade Lake platform: when you get into bad
contention, on that machine for some reason seems to be much worse when
the contention happens in the upper 32-byte half of the cacheline.

As a result, the kernel test robot will-it-scale 'signal1' benchmark had
an odd performance regression simply due to random alignment of the
'struct user_struct' (and pointed to a completely unrelated and
apparently nonsensical commit for the regression).

Avoiding the double increments (and decrements on the dequeueing side,
of course) makes for much less contention and hugely improved
performance on that will-it-scale microbenchmark.

Quoting Feng Tang:

 "It makes a big difference, that the performance score is tripled! bump
  from original 17000 to 54000. Also the gap between 5.0-rc6 and
  5.0-rc6+Jiri's patch is reduced to around 2%"

[ The "2% gap" is the odd cacheline placement difference on that
  platform: under the extreme contention case, the effect of which half
  of the cacheline was hot was 5%, so with the reduced contention the
  odd timing artifact is reduced too ]

It does help in the non-contended case too, but is not nearly as
noticeable.

Reported-and-tested-by: Feng Tang &lt;feng.tang@intel.com&gt;
Cc: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Cc: Huang, Ying &lt;ying.huang@intel.com&gt;
Cc: Philip Li &lt;philip.li@intel.com&gt;
Cc: Andi Kleen &lt;andi.kleen@intel.com&gt;
Cc: Jiri Olsa &lt;jolsa@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>sched.h: Annotate sighand_struct with __rcu</title>
<updated>2020-01-26T09:54:47Z</updated>
<author>
<name>Madhuparna Bhowmik</name>
<email>madhuparnabhowmik10@gmail.com</email>
</author>
<published>2020-01-24T04:59:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=913292c97d750fe4188b4f5aa770e5e0ca1e5a91'/>
<id>urn:sha1:913292c97d750fe4188b4f5aa770e5e0ca1e5a91</id>
<content type='text'>
This patch fixes the following sparse errors by annotating the
sighand_struct with __rcu

kernel/fork.c:1511:9: error: incompatible types in comparison expression
kernel/exit.c:100:19: error: incompatible types in comparison expression
kernel/signal.c:1370:27: error: incompatible types in comparison expression

This fix introduces the following sparse error in signal.c due to
checking the sighand pointer without rcu primitives:

kernel/signal.c:1386:21: error: incompatible types in comparison expression

This new sparse error is also fixed in this patch.

Signed-off-by: Madhuparna Bhowmik &lt;madhuparnabhowmik10@gmail.com&gt;
Acked-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
Link: https://lore.kernel.org/r/20200124045908.26389-1-madhuparnabhowmik10@gmail.com
Signed-off-by: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
</content>
</entry>
<entry>
<title>cgroup: freezer: call cgroup_enter_frozen() with preemption disabled in ptrace_stop()</title>
<updated>2019-10-11T15:39:57Z</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2019-10-09T15:02:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=937c6b27c73e02cd4114f95f5c37ba2c29fadba1'/>
<id>urn:sha1:937c6b27c73e02cd4114f95f5c37ba2c29fadba1</id>
<content type='text'>
ptrace_stop() does preempt_enable_no_resched() to avoid the preemption,
but after that cgroup_enter_frozen() does spin_lock/unlock and this adds
another preemption point.

Reported-and-tested-by: Bruce Ashfield &lt;bruce.ashfield@gmail.com&gt;
Fixes: 76f969e8948d ("cgroup: cgroup v2 freezer")
Cc: stable@vger.kernel.org # v5.2+
Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Roman Gushchin &lt;guro@fb.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'core-process-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux</title>
<updated>2019-09-16T16:28:19Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2019-09-16T16:28:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c17112a5c413f20188da276c138484e7127cdc84'/>
<id>urn:sha1:c17112a5c413f20188da276c138484e7127cdc84</id>
<content type='text'>
Pull pidfd/waitid updates from Christian Brauner:
 "This contains two features and various tests.

  First, it adds support for waiting on process through pidfds by adding
  the P_PIDFD type to the waitid() syscall. This completes the basic
  functionality of the pidfd api (cf. [1]). In the meantime we also have
  a new adition to the userspace projects that make use of the pidfd
  api. The qt project was nice enough to send a mail pointing out that
  they have a pr up to switch to the pidfd api (cf. [2]).

  Second, this tag contains an extension to the waitid() syscall to make
  it possible to wait on the current process group in a race free manner
  (even though the actual problem is very unlikely) by specifing 0
  together with the P_PGID type. This extension traces back to a
  discussion on the glibc development mailing list.

  There are also a range of tests for the features above. Additionally,
  the test-suite which detected the pidfd-polling race we fixed in [3]
  is included in this tag"

[1] https://lwn.net/Articles/794707/
[2] https://codereview.qt-project.org/c/qt/qtbase/+/108456
[3] commit b191d6491be6 ("pidfd: fix a poll race when setting exit_state")

* tag 'core-process-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  waitid: Add support for waiting for the current process group
  tests: add pidfd poll tests
  tests: move common definitions and functions into pidfd.h
  pidfd: add pidfd_wait tests
  pidfd: add P_PIDFD to waitid()
</content>
</entry>
<entry>
<title>signal: Allow cifs and drbd to receive their terminating signals</title>
<updated>2019-08-19T11:34:13Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2019-08-16T17:33:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=33da8e7c814f77310250bb54a9db36a44c5de784'/>
<id>urn:sha1:33da8e7c814f77310250bb54a9db36a44c5de784</id>
<content type='text'>
My recent to change to only use force_sig for a synchronous events
wound up breaking signal reception cifs and drbd.  I had overlooked
the fact that by default kthreads start out with all signals set to
SIG_IGN.  So a change I thought was safe turned out to have made it
impossible for those kernel thread to catch their signals.

Reverting the work on force_sig is a bad idea because what the code
was doing was very much a misuse of force_sig.  As the way force_sig
ultimately allowed the signal to happen was to change the signal
handler to SIG_DFL.  Which after the first signal will allow userspace
to send signals to these kernel threads.  At least for
wake_ack_receiver in drbd that does not appear actively wrong.

So correct this problem by adding allow_kernel_signal that will allow
signals whose siginfo reports they were sent by the kernel through,
but will not allow userspace generated signals, and update cifs and
drbd to call allow_kernel_signal in an appropriate place so that their
thread can receive this signal.

Fixing things this way ensures that userspace won't be able to send
signals and cause problems, that it is clear which signals the
threads are expecting to receive, and it guarantees that nothing
else in the system will be affected.

This change was partly inspired by similar cifs and drbd patches that
added allow_signal.

Reported-by: ronnie sahlberg &lt;ronniesahlberg@gmail.com&gt;
Reported-by: Christoph Böhmwalder &lt;christoph.boehmwalder@linbit.com&gt;
Tested-by: Christoph Böhmwalder &lt;christoph.boehmwalder@linbit.com&gt;
Cc: Steve French &lt;smfrench@gmail.com&gt;
Cc: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Cc: David Laight &lt;David.Laight@ACULAB.COM&gt;
Fixes: 247bc9470b1e ("cifs: fix rmmod regression in cifs.ko caused by force_sig changes")
Fixes: 72abe3bcf091 ("signal/cifs: Fix cifs_put_tcp_session to call send_sig instead of force_sig")
Fixes: fee109901f39 ("signal/drbd: Use send_sig not force_sig")
Fixes: 3cf5d076fb4d ("signal: Remove task parameter from force_sig")
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>kernel/signal.c: fix a kernel-doc markup</title>
<updated>2019-08-03T14:02:00Z</updated>
<author>
<name>Mauro Carvalho Chehab</name>
<email>mchehab+samsung@kernel.org</email>
</author>
<published>2019-08-03T04:48:33Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=68d8681e97bd1c90259f341c1695af05002070ef'/>
<id>urn:sha1:68d8681e97bd1c90259f341c1695af05002070ef</id>
<content type='text'>
The kernel-doc parser doesn't handle expressions with %foo*.  Instead,
when an asterisk should be part of a constant, it uses an alternative
notation: `foo*`.

Link: http://lkml.kernel.org/r/7f18c2e0b5e39e6b7eb55ddeb043b8b260b49f2d.1563361575.git.mchehab+samsung@kernel.org
Signed-off-by: Mauro Carvalho Chehab &lt;mchehab+samsung@kernel.org&gt;
Cc: Deepa Dinamani &lt;deepa.kernel@gmail.com&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>pidfd: add P_PIDFD to waitid()</title>
<updated>2019-08-01T19:49:46Z</updated>
<author>
<name>Christian Brauner</name>
<email>christian.brauner@ubuntu.com</email>
</author>
<published>2019-07-27T22:22:29Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3695eae5fee0605f316fbaad0b9e3de791d7dfaf'/>
<id>urn:sha1:3695eae5fee0605f316fbaad0b9e3de791d7dfaf</id>
<content type='text'>
This adds the P_PIDFD type to waitid().
One of the last remaining bits for the pidfd api is to make it possible
to wait on pidfds. With P_PIDFD added to waitid() the parts of userspace
that want to use the pidfd api to exclusively manage processes can do so
now.

One of the things this will unblock in the future is the ability to make
it possible to retrieve the exit status via waitid(P_PIDFD) for
non-parent processes if handed a _suitable_ pidfd that has this feature
set. This is similar to what you can do on FreeBSD with kqueue(). It
might even end up being possible to wait on a process as a non-parent if
an appropriate property is enabled on the pidfd.

With P_PIDFD no scoping of the process identified by the pidfd is
possible, i.e. it explicitly blocks things such as wait4(-1), wait4(0),
waitid(P_ALL), waitid(P_PGID) etc. It only allows for semantics
equivalent to wait4(pid), waitid(P_PID). Users that need scoping should
rely on pid-based wait*() syscalls for now.

Signed-off-by: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
Reviewed-by: Kees Cook &lt;keescook@chromium.org&gt;
Reviewed-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Cc: Joel Fernandes (Google) &lt;joel@joelfernandes.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: David Howells &lt;dhowells@redhat.com&gt;
Cc: Jann Horn &lt;jannh@google.com&gt;
Cc: Andy Lutomirsky &lt;luto@kernel.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Aleksa Sarai &lt;cyphar@cyphar.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Link: https://lore.kernel.org/r/20190727222229.6516-2-christian@brauner.io
</content>
</entry>
<entry>
<title>pidfd: Add warning if exit_state is 0 during notification</title>
<updated>2019-07-29T15:20:19Z</updated>
<author>
<name>Joel Fernandes (Google)</name>
<email>joel@joelfernandes.org</email>
</author>
<published>2019-07-24T16:48:16Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1caf7d50f46bd0388e38e653b146aa81700e8eb8'/>
<id>urn:sha1:1caf7d50f46bd0388e38e653b146aa81700e8eb8</id>
<content type='text'>
Previously a condition got missed where the pidfd waiters are awakened
before the exit_state gets set. This can result in a missed notification
[1] and the polling thread waiting forever.

It is fixed now, however it would be nice to avoid this kind of issue
going unnoticed in the future. So just add a warning to catch it in the
future.

/* References */
[1]: https://lore.kernel.org/lkml/20190717172100.261204-1-joel@joelfernandes.org/

Signed-off-by: Joel Fernandes (Google) &lt;joel@joelfernandes.org&gt;
Link: https://lore.kernel.org/r/20190724164816.201099-1-joel@joelfernandes.org
Signed-off-by: Christian Brauner &lt;christian@brauner.io&gt;
</content>
</entry>
<entry>
<title>signal: simplify set_user_sigmask/restore_user_sigmask</title>
<updated>2019-07-17T02:23:24Z</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2019-07-16T23:29:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b772434be0891ed1081a08ae7cfd4666728f8e82'/>
<id>urn:sha1:b772434be0891ed1081a08ae7cfd4666728f8e82</id>
<content type='text'>
task-&gt;saved_sigmask and -&gt;restore_sigmask are only used in the ret-from-
syscall paths.  This means that set_user_sigmask() can save -&gt;blocked in
-&gt;saved_sigmask and do set_restore_sigmask() to indicate that -&gt;blocked
was modified.

This way the callers do not need 2 sigset_t's passed to set/restore and
restore_user_sigmask() renamed to restore_saved_sigmask_unless() turns
into the trivial helper which just calls restore_saved_sigmask().

Link: http://lkml.kernel.org/r/20190606113206.GA9464@redhat.com
Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Deepa Dinamani &lt;deepa.kernel@gmail.com&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Cc: Eric Wong &lt;e@80x24.org&gt;
Cc: Jason Baron &lt;jbaron@akamai.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Al Viro &lt;viro@ZenIV.linux.org.uk&gt;
Cc: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Cc: David Laight &lt;David.Laight@aculab.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'pidfd-updates-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux</title>
<updated>2019-07-11T05:17:21Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2019-07-11T05:17:21Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5450e8a316a64cddcbc15f90733ebc78aa736545'/>
<id>urn:sha1:5450e8a316a64cddcbc15f90733ebc78aa736545</id>
<content type='text'>
Pull pidfd updates from Christian Brauner:
 "This adds two main features.

   - First, it adds polling support for pidfds. This allows process
     managers to know when a (non-parent) process dies in a race-free
     way.

     The notification mechanism used follows the same logic that is
     currently used when the parent of a task is notified of a child's
     death. With this patchset it is possible to put pidfds in an
     {e}poll loop and get reliable notifications for process (i.e.
     thread-group) exit.

   - The second feature compliments the first one by making it possible
     to retrieve pollable pidfds for processes that were not created
     using CLONE_PIDFD.

     A lot of processes get created with traditional PID-based calls
     such as fork() or clone() (without CLONE_PIDFD). For these
     processes a caller can currently not create a pollable pidfd. This
     is a problem for Android's low memory killer (LMK) and service
     managers such as systemd.

  Both patchsets are accompanied by selftests.

  It's perhaps worth noting that the work done so far and the work done
  in this branch for pidfd_open() and polling support do already see
  some adoption:

   - Android is in the process of backporting this work to all their LTS
     kernels [1]

   - Service managers make use of pidfd_send_signal but will need to
     wait until we enable waiting on pidfds for full adoption.

   - And projects I maintain make use of both pidfd_send_signal and
     CLONE_PIDFD [2] and will use polling support and pidfd_open() too"

[1] https://android-review.googlesource.com/q/topic:%22pidfd+polling+support+4.9+backport%22
    https://android-review.googlesource.com/q/topic:%22pidfd+polling+support+4.14+backport%22
    https://android-review.googlesource.com/q/topic:%22pidfd+polling+support+4.19+backport%22

[2] https://github.com/lxc/lxc/blob/aab6e3eb73c343231cdde775db938994fc6f2803/src/lxc/start.c#L1753

* tag 'pidfd-updates-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  tests: add pidfd_open() tests
  arch: wire-up pidfd_open()
  pid: add pidfd_open()
  pidfd: add polling selftests
  pidfd: add polling support
</content>
</entry>
</feed>
