<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/pid.c, branch v5.5</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.5</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.5'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2019-11-15T22:49:22Z</updated>
<entry>
<title>fork: extend clone3() to support setting a PID</title>
<updated>2019-11-15T22:49:22Z</updated>
<author>
<name>Adrian Reber</name>
<email>areber@redhat.com</email>
</author>
<published>2019-11-15T12:36:20Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=49cb2fc42ce4b7a656ee605e30c302efaa39c1a7'/>
<id>urn:sha1:49cb2fc42ce4b7a656ee605e30c302efaa39c1a7</id>
<content type='text'>
The main motivation to add set_tid to clone3() is CRIU.

To restore a process with the same PID/TID CRIU currently uses
/proc/sys/kernel/ns_last_pid. It writes the desired (PID - 1) to
ns_last_pid and then (quickly) does a clone(). This works most of the
time, but it is racy. It is also slow as it requires multiple syscalls.

Extending clone3() to support *set_tid makes it possible restore a
process using CRIU without accessing /proc/sys/kernel/ns_last_pid and
race free (as long as the desired PID/TID is available).

This clone3() extension places the same restrictions (CAP_SYS_ADMIN)
on clone3() with *set_tid as they are currently in place for ns_last_pid.

The original version of this change was using a single value for
set_tid. At the 2019 LPC, after presenting set_tid, it was, however,
decided to change set_tid to an array to enable setting the PID of a
process in multiple PID namespaces at the same time. If a process is
created in a PID namespace it is possible to influence the PID inside
and outside of the PID namespace. Details also in the corresponding
selftest.

To create a process with the following PIDs:

      PID NS level         Requested PID
        0 (host)              31496
        1                        42
        2                         1

For that example the two newly introduced parameters to struct
clone_args (set_tid and set_tid_size) would need to be:

  set_tid[0] = 1;
  set_tid[1] = 42;
  set_tid[2] = 31496;
  set_tid_size = 3;

If only the PIDs of the two innermost nested PID namespaces should be
defined it would look like this:

  set_tid[0] = 1;
  set_tid[1] = 42;
  set_tid_size = 2;

The PID of the newly created process would then be the next available
free PID in the PID namespace level 0 (host) and 42 in the PID namespace
at level 1 and the PID of the process in the innermost PID namespace
would be 1.

The set_tid array is used to specify the PID of a process starting
from the innermost nested PID namespaces up to set_tid_size PID namespaces.

set_tid_size cannot be larger then the current PID namespace level.

Signed-off-by: Adrian Reber &lt;areber@redhat.com&gt;
Reviewed-by: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
Reviewed-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Reviewed-by: Dmitry Safonov &lt;0x7f454c46@gmail.com&gt;
Acked-by: Andrei Vagin &lt;avagin@gmail.com&gt;
Link: https://lore.kernel.org/r/20191115123621.142252-1-areber@redhat.com
Signed-off-by: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
</content>
</entry>
<entry>
<title>pid: use pid_has_task() in pidfd_open()</title>
<updated>2019-10-17T13:37:00Z</updated>
<author>
<name>Christian Brauner</name>
<email>christian.brauner@ubuntu.com</email>
</author>
<published>2019-10-17T10:18:32Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1e1d0f0b1a3e3533ea4cd4021eb251e53827c70b'/>
<id>urn:sha1:1e1d0f0b1a3e3533ea4cd4021eb251e53827c70b</id>
<content type='text'>
Use the new pid_has_task() helper in pidfd_open(). This simplifies the
code and avoids taking rcu_read_{lock,unlock}() and leads to overall
nicer code.

Signed-off-by: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
Reviewed-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Link: https://lore.kernel.org/r/20191017101832.5985-5-christian.brauner@ubuntu.com
</content>
</entry>
<entry>
<title>pid: use pid_has_task() in __change_pid()</title>
<updated>2019-10-17T13:36:56Z</updated>
<author>
<name>Christian Brauner</name>
<email>christian.brauner@ubuntu.com</email>
</author>
<published>2019-10-17T10:18:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1d416a113f0c0da7a3608ffe4d44bd8e9c4859fa'/>
<id>urn:sha1:1d416a113f0c0da7a3608ffe4d44bd8e9c4859fa</id>
<content type='text'>
Replace hlist_empty() with the new pid_has_task() helper which is more
idiomatic, easier to grep for, and unifies how callers perform this
check.

Signed-off-by: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
Reviewed-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Link: https://lore.kernel.org/r/20191017101832.5985-3-christian.brauner@ubuntu.com
</content>
</entry>
<entry>
<title>kernel/pid.c: convert struct pid count to refcount_t</title>
<updated>2019-07-17T02:23:24Z</updated>
<author>
<name>Joel Fernandes (Google)</name>
<email>joel@joelfernandes.org</email>
</author>
<published>2019-07-16T23:30:06Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f57e515a1b56325a28a0972c632a623a9c84590c'/>
<id>urn:sha1:f57e515a1b56325a28a0972c632a623a9c84590c</id>
<content type='text'>
struct pid's count is an atomic_t field used as a refcount.  Use
refcount_t for it which is basically atomic_t but does additional
checking to prevent use-after-free bugs.

For memory ordering, the only change is with the following:

 -	if ((atomic_read(&amp;pid-&gt;count) == 1) ||
 -	     atomic_dec_and_test(&amp;pid-&gt;count)) {
 +	if (refcount_dec_and_test(&amp;pid-&gt;count)) {
 		kmem_cache_free(ns-&gt;pid_cachep, pid);

Here the change is from: Fully ordered --&gt; RELEASE + ACQUIRE (as per
refcount-vs-atomic.rst) This ACQUIRE should take care of making sure the
free happens after the refcount_dec_and_test().

The above hunk also removes atomic_read() since it is not needed for the
code to work and it is unclear how beneficial it is.  The removal lets
refcount_dec_and_test() check for cases where get_pid() happened before
the object was freed.

Link: http://lkml.kernel.org/r/20190701183826.191936-1-joel@joelfernandes.org
Signed-off-by: Joel Fernandes (Google) &lt;joel@joelfernandes.org&gt;
Reviewed-by: Andrea Parri &lt;andrea.parri@amarulasolutions.com&gt;
Reviewed-by: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Will Deacon &lt;will.deacon@arm.com&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Elena Reshetova &lt;elena.reshetova@intel.com&gt;
Cc: Jann Horn &lt;jannh@google.com&gt;
Cc: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Cc: KJ Tsanaktsidis &lt;ktsanaktsidis@zendesk.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'pidfd-updates-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux</title>
<updated>2019-07-11T05:17:21Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2019-07-11T05:17:21Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5450e8a316a64cddcbc15f90733ebc78aa736545'/>
<id>urn:sha1:5450e8a316a64cddcbc15f90733ebc78aa736545</id>
<content type='text'>
Pull pidfd updates from Christian Brauner:
 "This adds two main features.

   - First, it adds polling support for pidfds. This allows process
     managers to know when a (non-parent) process dies in a race-free
     way.

     The notification mechanism used follows the same logic that is
     currently used when the parent of a task is notified of a child's
     death. With this patchset it is possible to put pidfds in an
     {e}poll loop and get reliable notifications for process (i.e.
     thread-group) exit.

   - The second feature compliments the first one by making it possible
     to retrieve pollable pidfds for processes that were not created
     using CLONE_PIDFD.

     A lot of processes get created with traditional PID-based calls
     such as fork() or clone() (without CLONE_PIDFD). For these
     processes a caller can currently not create a pollable pidfd. This
     is a problem for Android's low memory killer (LMK) and service
     managers such as systemd.

  Both patchsets are accompanied by selftests.

  It's perhaps worth noting that the work done so far and the work done
  in this branch for pidfd_open() and polling support do already see
  some adoption:

   - Android is in the process of backporting this work to all their LTS
     kernels [1]

   - Service managers make use of pidfd_send_signal but will need to
     wait until we enable waiting on pidfds for full adoption.

   - And projects I maintain make use of both pidfd_send_signal and
     CLONE_PIDFD [2] and will use polling support and pidfd_open() too"

[1] https://android-review.googlesource.com/q/topic:%22pidfd+polling+support+4.9+backport%22
    https://android-review.googlesource.com/q/topic:%22pidfd+polling+support+4.14+backport%22
    https://android-review.googlesource.com/q/topic:%22pidfd+polling+support+4.19+backport%22

[2] https://github.com/lxc/lxc/blob/aab6e3eb73c343231cdde775db938994fc6f2803/src/lxc/start.c#L1753

* tag 'pidfd-updates-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  tests: add pidfd_open() tests
  arch: wire-up pidfd_open()
  pid: add pidfd_open()
  pidfd: add polling selftests
  pidfd: add polling support
</content>
</entry>
<entry>
<title>pid: add pidfd_open()</title>
<updated>2019-06-28T10:17:55Z</updated>
<author>
<name>Christian Brauner</name>
<email>christian@brauner.io</email>
</author>
<published>2019-05-24T10:43:51Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=32fcb426ec001cb6d5a4a195091a8486ea77e2df'/>
<id>urn:sha1:32fcb426ec001cb6d5a4a195091a8486ea77e2df</id>
<content type='text'>
This adds the pidfd_open() syscall. It allows a caller to retrieve pollable
pidfds for a process which did not get created via CLONE_PIDFD, i.e. for a
process that is created via traditional fork()/clone() calls that is only
referenced by a PID:

int pidfd = pidfd_open(1234, 0);
ret = pidfd_send_signal(pidfd, SIGSTOP, NULL, 0);

With the introduction of pidfds through CLONE_PIDFD it is possible to
created pidfds at process creation time.
However, a lot of processes get created with traditional PID-based calls
such as fork() or clone() (without CLONE_PIDFD). For these processes a
caller can currently not create a pollable pidfd. This is a problem for
Android's low memory killer (LMK) and service managers such as systemd.
Both are examples of tools that want to make use of pidfds to get reliable
notification of process exit for non-parents (pidfd polling) and race-free
signal sending (pidfd_send_signal()). They intend to switch to this API for
process supervision/management as soon as possible. Having no way to get
pollable pidfds from PID-only processes is one of the biggest blockers for
them in adopting this api. With pidfd_open() making it possible to retrieve
pidfds for PID-based processes we enable them to adopt this api.

In line with Arnd's recent changes to consolidate syscall numbers across
architectures, I have added the pidfd_open() syscall to all architectures
at the same time.

Signed-off-by: Christian Brauner &lt;christian@brauner.io&gt;
Reviewed-by: David Howells &lt;dhowells@redhat.com&gt;
Reviewed-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Joel Fernandes (Google) &lt;joel@joelfernandes.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Jann Horn &lt;jannh@google.com&gt;
Cc: Andy Lutomirsky &lt;luto@kernel.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Aleksa Sarai &lt;cyphar@cyphar.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: linux-api@vger.kernel.org
</content>
</entry>
<entry>
<title>pidfd: add polling support</title>
<updated>2019-06-28T10:17:55Z</updated>
<author>
<name>Joel Fernandes (Google)</name>
<email>joel@joelfernandes.org</email>
</author>
<published>2019-04-30T16:21:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b53b0b9d9a613c418057f6cb921c2f40a6f78c24'/>
<id>urn:sha1:b53b0b9d9a613c418057f6cb921c2f40a6f78c24</id>
<content type='text'>
This patch adds polling support to pidfd.

Android low memory killer (LMK) needs to know when a process dies once
it is sent the kill signal. It does so by checking for the existence of
/proc/pid which is both racy and slow. For example, if a PID is reused
between when LMK sends a kill signal and checks for existence of the
PID, since the wrong PID is now possibly checked for existence.
Using the polling support, LMK will be able to get notified when a process
exists in race-free and fast way, and allows the LMK to do other things
(such as by polling on other fds) while awaiting the process being killed
to die.

For notification to polling processes, we follow the same existing
mechanism in the kernel used when the parent of the task group is to be
notified of a child's death (do_notify_parent). This is precisely when the
tasks waiting on a poll of pidfd are also awakened in this patch.

We have decided to include the waitqueue in struct pid for the following
reasons:
1. The wait queue has to survive for the lifetime of the poll. Including
   it in task_struct would not be option in this case because the task can
   be reaped and destroyed before the poll returns.

2. By including the struct pid for the waitqueue means that during
   de_thread(), the new thread group leader automatically gets the new
   waitqueue/pid even though its task_struct is different.

Appropriate test cases are added in the second patch to provide coverage of
all the cases the patch is handling.

Cc: Andy Lutomirski &lt;luto@amacapital.net&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Daniel Colascione &lt;dancol@google.com&gt;
Cc: Jann Horn &lt;jannh@google.com&gt;
Cc: Tim Murray &lt;timmurray@google.com&gt;
Cc: Jonathan Kowalski &lt;bl0pbl33p@gmail.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: David Howells &lt;dhowells@redhat.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: kernel-team@android.com
Reviewed-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Co-developed-by: Daniel Colascione &lt;dancol@google.com&gt;
Signed-off-by: Daniel Colascione &lt;dancol@google.com&gt;
Signed-off-by: Joel Fernandes (Google) &lt;joel@joelfernandes.org&gt;
Signed-off-by: Christian Brauner &lt;christian@brauner.io&gt;
</content>
</entry>
<entry>
<title>treewide: Add SPDX license identifier for missed files</title>
<updated>2019-05-21T08:50:45Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2019-05-19T12:08:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=457c89965399115e5cd8bf38f9c597293405703d'/>
<id>urn:sha1:457c89965399115e5cd8bf38f9c597293405703d</id>
<content type='text'>
Add SPDX license identifiers to all files which:

 - Have no license information of any form

 - Have EXPORT_.*_SYMBOL_GPL inside which was used in the
   initial scan/conversion to ignore the file

These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:

  GPL-2.0-only

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>kernel/pid.c: remove unneeded hash header file</title>
<updated>2019-05-15T02:52:51Z</updated>
<author>
<name>Timmy Li</name>
<email>scuttimmy@gmail.com</email>
</author>
<published>2019-05-14T22:45:16Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1fd402df4586bcc239298081449ce58a78211626'/>
<id>urn:sha1:1fd402df4586bcc239298081449ce58a78211626</id>
<content type='text'>
Hash functions are not needed since idr is used now.  Let's remove hash
header file for cleanup.

Link: http://lkml.kernel.org/r/20190430053319.95913-1-scuttimmy@gmail.com
Signed-off-by: Timmy Li &lt;scuttimmy@gmail.com&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Mike Rapoport &lt;rppt@linux.vnet.ibm.com&gt;
Cc: KJ Tsanaktsidis &lt;ktsanaktsidis@zendesk.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Fix failure path in alloc_pid()</title>
<updated>2018-12-28T20:42:30Z</updated>
<author>
<name>Matthew Wilcox</name>
<email>willy@infradead.org</email>
</author>
<published>2018-12-28T15:22:26Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1a80dade010c7a7f4885a4c4c2a7ac22cc7b34df'/>
<id>urn:sha1:1a80dade010c7a7f4885a4c4c2a7ac22cc7b34df</id>
<content type='text'>
The failure path removes the allocated PIDs from the wrong namespace.
This could lead to us inadvertently reusing PIDs in the leaf namespace
and leaking PIDs in parent namespaces.

Fixes: 95846ecf9dac ("pid: replace pid bitmap implementation with IDR API")
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Matthew Wilcox &lt;willy@infradead.org&gt;
Acked-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Reviewed-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
