<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/arch/alpha/kernel/syscalls, branch v5.8</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.8</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.8'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2020-05-14T14:44:25Z</updated>
<entry>
<title>vfs: add faccessat2 syscall</title>
<updated>2020-05-14T14:44:25Z</updated>
<author>
<name>Miklos Szeredi</name>
<email>mszeredi@redhat.com</email>
</author>
<published>2020-05-14T14:44:25Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c8ffd8bcdd28296a198f237cc595148a8d4adfbe'/>
<id>urn:sha1:c8ffd8bcdd28296a198f237cc595148a8d4adfbe</id>
<content type='text'>
POSIX defines faccessat() as having a fourth "flags" argument, while the
linux syscall doesn't have it.  Glibc tries to emulate AT_EACCESS and
AT_SYMLINK_NOFOLLOW, but AT_EACCESS emulation is broken.

Add a new faccessat(2) syscall with the added flags argument and implement
both flags.

The value of AT_EACCESS is defined in glibc headers to be the same as
AT_REMOVEDIR.  Use this value for the kernel interface as well, together
with the explanatory comment.

Also add AT_EMPTY_PATH support, which is not documented by POSIX, but can
be useful and is trivial to implement.

Signed-off-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
</content>
</entry>
<entry>
<title>asm-generic: fix unistd_32.h generation format</title>
<updated>2020-04-07T17:43:42Z</updated>
<author>
<name>Michal Simek</name>
<email>michal.simek@xilinx.com</email>
</author>
<published>2020-04-07T03:09:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=06e85c7e9a1c1356038936566fc23f7c0d363b96'/>
<id>urn:sha1:06e85c7e9a1c1356038936566fc23f7c0d363b96</id>
<content type='text'>
Generated files are also checked by sparse that's why add newline to
remove sparse (C=1) warning.

The issue was found on Microblaze and reported like this:
./arch/microblaze/include/generated/uapi/asm/unistd_32.h:438:45: warning:
no newline at end of file

Mips and PowerPC have it already but let's align with style used by m68k.

Signed-off-by: Michal Simek &lt;michal.simek@xilinx.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Stefan Asserhall &lt;stefan.asserhall@xilinx.com&gt;
Acked-by: Max Filippov &lt;jcmvbkbc@gmail.com&gt; (xtensa)
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Max Filippov &lt;jcmvbkbc@gmail.com&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Chris Zankel &lt;chris@zankel.net&gt;
Cc: David S. Miller &lt;davem@davemloft.net&gt;
Cc: Fenghua Yu &lt;fenghua.yu@intel.com&gt;
Cc: Helge Deller &lt;deller@gmx.de&gt;
Cc: Ivan Kokshaysky &lt;ink@jurassic.park.msu.ru&gt;
Cc: James Bottomley &lt;James.Bottomley@HansenPartnership.com&gt;
Cc: Matt Turner &lt;mattst88@gmail.com&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Paul Burton &lt;paulburton@kernel.org&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Ralf Baechle &lt;ralf@linux-mips.org&gt;
Cc: Rich Felker &lt;dalias@libc.org&gt;
Cc: Richard Henderson &lt;rth@twiddle.net&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Cc: Yoshinori Sato &lt;ysato@users.sourceforge.jp&gt;
Link: http://lkml.kernel.org/r/4d32ab4e1fb2edb691d2e1687e8fb303c09fd023.1581504803.git.michal.simek@xilinx.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'threads-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux</title>
<updated>2020-01-30T03:38:34Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2020-01-30T03:38:34Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=83fa805bcbfc53ae82eedd65132794ae324798e5'/>
<id>urn:sha1:83fa805bcbfc53ae82eedd65132794ae324798e5</id>
<content type='text'>
Pull thread management updates from Christian Brauner:
 "Sargun Dhillon over the last cycle has worked on the pidfd_getfd()
  syscall.

  This syscall allows for the retrieval of file descriptors of a process
  based on its pidfd. A task needs to have ptrace_may_access()
  permissions with PTRACE_MODE_ATTACH_REALCREDS (suggested by Oleg and
  Andy) on the target.

  One of the main use-cases is in combination with seccomp's user
  notification feature. As a reminder, seccomp's user notification
  feature was made available in v5.0. It allows a task to retrieve a
  file descriptor for its seccomp filter. The file descriptor is usually
  handed of to a more privileged supervising process. The supervisor can
  then listen for syscall events caught by the seccomp filter of the
  supervisee and perform actions in lieu of the supervisee, usually
  emulating syscalls. pidfd_getfd() is needed to expand its uses.

  There are currently two major users that wait on pidfd_getfd() and one
  future user:

   - Netflix, Sargun said, is working on a service mesh where users
     should be able to connect to a dns-based VIP. When a user connects
     to e.g. 1.2.3.4:80 that runs e.g. service "foo" they will be
     redirected to an envoy process. This service mesh uses seccomp user
     notifications and pidfd to intercept all connect calls and instead
     of connecting them to 1.2.3.4:80 connects them to e.g.
     127.0.0.1:8080.

   - LXD uses the seccomp notifier heavily to intercept and emulate
     mknod() and mount() syscalls for unprivileged containers/processes.
     With pidfd_getfd() more uses-cases e.g. bridging socket connections
     will be possible.

   - The patchset has also seen some interest from the browser corner.
     Right now, Firefox is using a SECCOMP_RET_TRAP sandbox managed by a
     broker process. In the future glibc will start blocking all signals
     during dlopen() rendering this type of sandbox impossible. Hence,
     in the future Firefox will switch to a seccomp-user-nofication
     based sandbox which also makes use of file descriptor retrieval.
     The thread for this can be found at
     https://sourceware.org/ml/libc-alpha/2019-12/msg00079.html

  With pidfd_getfd() it is e.g. possible to bridge socket connections
  for the supervisee (binding to a privileged port) and taking actions
  on file descriptors on behalf of the supervisee in general.

  Sargun's first version was using an ioctl on pidfds but various people
  pushed for it to be a proper syscall which he duely implemented as
  well over various review cycles. Selftests are of course included.
  I've also added instructions how to deal with merge conflicts below.

  There's also a small fix coming from the kernel mentee project to
  correctly annotate struct sighand_struct with __rcu to fix various
  sparse warnings. We've received a few more such fixes and even though
  they are mostly trivial I've decided to postpone them until after -rc1
  since they came in rather late and I don't want to risk introducing
  build warnings.

  Finally, there's a new prctl() command PR_{G,S}ET_IO_FLUSHER which is
  needed to avoid allocation recursions triggerable by storage drivers
  that have userspace parts that run in the IO path (e.g. dm-multipath,
  iscsi, etc). These allocation recursions deadlock the device.

  The new prctl() allows such privileged userspace components to avoid
  allocation recursions by setting the PF_MEMALLOC_NOIO and
  PF_LESS_THROTTLE flags. The patch carries the necessary acks from the
  relevant maintainers and is routed here as part of prctl()
  thread-management."

* tag 'threads-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  prctl: PR_{G,S}ET_IO_FLUSHER to support controlling memory reclaim
  sched.h: Annotate sighand_struct with __rcu
  test: Add test for pidfd getfd
  arch: wire up pidfd_getfd syscall
  pid: Implement pidfd_getfd syscall
  vfs, fdtable: Add fget_task helper
</content>
</entry>
<entry>
<title>open: introduce openat2(2) syscall</title>
<updated>2020-01-18T14:19:18Z</updated>
<author>
<name>Aleksa Sarai</name>
<email>cyphar@cyphar.com</email>
</author>
<published>2020-01-18T12:07:59Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=fddb5d430ad9fa91b49b1d34d0202ffe2fa0e179'/>
<id>urn:sha1:fddb5d430ad9fa91b49b1d34d0202ffe2fa0e179</id>
<content type='text'>
/* Background. */
For a very long time, extending openat(2) with new features has been
incredibly frustrating. This stems from the fact that openat(2) is
possibly the most famous counter-example to the mantra "don't silently
accept garbage from userspace" -- it doesn't check whether unknown flags
are present[1].

This means that (generally) the addition of new flags to openat(2) has
been fraught with backwards-compatibility issues (O_TMPFILE has to be
defined as __O_TMPFILE|O_DIRECTORY|[O_RDWR or O_WRONLY] to ensure old
kernels gave errors, since it's insecure to silently ignore the
flag[2]). All new security-related flags therefore have a tough road to
being added to openat(2).

Userspace also has a hard time figuring out whether a particular flag is
supported on a particular kernel. While it is now possible with
contemporary kernels (thanks to [3]), older kernels will expose unknown
flag bits through fcntl(F_GETFL). Giving a clear -EINVAL during
openat(2) time matches modern syscall designs and is far more
fool-proof.

In addition, the newly-added path resolution restriction LOOKUP flags
(which we would like to expose to user-space) don't feel related to the
pre-existing O_* flag set -- they affect all components of path lookup.
We'd therefore like to add a new flag argument.

Adding a new syscall allows us to finally fix the flag-ignoring problem,
and we can make it extensible enough so that we will hopefully never
need an openat3(2).

/* Syscall Prototype. */
  /*
   * open_how is an extensible structure (similar in interface to
   * clone3(2) or sched_setattr(2)). The size parameter must be set to
   * sizeof(struct open_how), to allow for future extensions. All future
   * extensions will be appended to open_how, with their zero value
   * acting as a no-op default.
   */
  struct open_how { /* ... */ };

  int openat2(int dfd, const char *pathname,
              struct open_how *how, size_t size);

/* Description. */
The initial version of 'struct open_how' contains the following fields:

  flags
    Used to specify openat(2)-style flags. However, any unknown flag
    bits or otherwise incorrect flag combinations (like O_PATH|O_RDWR)
    will result in -EINVAL. In addition, this field is 64-bits wide to
    allow for more O_ flags than currently permitted with openat(2).

  mode
    The file mode for O_CREAT or O_TMPFILE.

    Must be set to zero if flags does not contain O_CREAT or O_TMPFILE.

  resolve
    Restrict path resolution (in contrast to O_* flags they affect all
    path components). The current set of flags are as follows (at the
    moment, all of the RESOLVE_ flags are implemented as just passing
    the corresponding LOOKUP_ flag).

    RESOLVE_NO_XDEV       =&gt; LOOKUP_NO_XDEV
    RESOLVE_NO_SYMLINKS   =&gt; LOOKUP_NO_SYMLINKS
    RESOLVE_NO_MAGICLINKS =&gt; LOOKUP_NO_MAGICLINKS
    RESOLVE_BENEATH       =&gt; LOOKUP_BENEATH
    RESOLVE_IN_ROOT       =&gt; LOOKUP_IN_ROOT

open_how does not contain an embedded size field, because it is of
little benefit (userspace can figure out the kernel open_how size at
runtime fairly easily without it). It also only contains u64s (even
though -&gt;mode arguably should be a u16) to avoid having padding fields
which are never used in the future.

Note that as a result of the new how-&gt;flags handling, O_PATH|O_TMPFILE
is no longer permitted for openat(2). As far as I can tell, this has
always been a bug and appears to not be used by userspace (and I've not
seen any problems on my machines by disallowing it). If it turns out
this breaks something, we can special-case it and only permit it for
openat(2) but not openat2(2).

After input from Florian Weimer, the new open_how and flag definitions
are inside a separate header from uapi/linux/fcntl.h, to avoid problems
that glibc has with importing that header.

/* Testing. */
In a follow-up patch there are over 200 selftests which ensure that this
syscall has the correct semantics and will correctly handle several
attack scenarios.

In addition, I've written a userspace library[4] which provides
convenient wrappers around openat2(RESOLVE_IN_ROOT) (this is necessary
because no other syscalls support RESOLVE_IN_ROOT, and thus lots of care
must be taken when using RESOLVE_IN_ROOT'd file descriptors with other
syscalls). During the development of this patch, I've run numerous
verification tests using libpathrs (showing that the API is reasonably
usable by userspace).

/* Future Work. */
Additional RESOLVE_ flags have been suggested during the review period.
These can be easily implemented separately (such as blocking auto-mount
during resolution).

Furthermore, there are some other proposed changes to the openat(2)
interface (the most obvious example is magic-link hardening[5]) which
would be a good opportunity to add a way for userspace to restrict how
O_PATH file descriptors can be re-opened.

Another possible avenue of future work would be some kind of
CHECK_FIELDS[6] flag which causes the kernel to indicate to userspace
which openat2(2) flags and fields are supported by the current kernel
(to avoid userspace having to go through several guesses to figure it
out).

[1]: https://lwn.net/Articles/588444/
[2]: https://lore.kernel.org/lkml/CA+55aFyyxJL1LyXZeBsf2ypriraj5ut1XkNDsunRBqgVjZU_6Q@mail.gmail.com
[3]: commit 629e014bb834 ("fs: completely ignore unknown open flags")
[4]: https://sourceware.org/bugzilla/show_bug.cgi?id=17523
[5]: https://lore.kernel.org/lkml/20190930183316.10190-2-cyphar@cyphar.com/
[6]: https://youtu.be/ggD-eb3yPVs

Suggested-by: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
Signed-off-by: Aleksa Sarai &lt;cyphar@cyphar.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>arch: wire up pidfd_getfd syscall</title>
<updated>2020-01-13T20:49:47Z</updated>
<author>
<name>Sargun Dhillon</name>
<email>sargun@sargun.me</email>
</author>
<published>2020-01-07T17:59:26Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9a2cef09c801de54feecd912303ace5c27237f12'/>
<id>urn:sha1:9a2cef09c801de54feecd912303ace5c27237f12</id>
<content type='text'>
This wires up the pidfd_getfd syscall for all architectures.

Signed-off-by: Sargun Dhillon &lt;sargun@sargun.me&gt;
Acked-by: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
Reviewed-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Link: https://lore.kernel.org/r/20200107175927.4558-4-sargun@sargun.me
Signed-off-by: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
</content>
</entry>
<entry>
<title>y2038: use compat_{get,set}_itimer on alpha</title>
<updated>2019-11-15T13:38:30Z</updated>
<author>
<name>Arnd Bergmann</name>
<email>arnd@arndb.de</email>
</author>
<published>2019-10-25T14:59:39Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4c22ea2b91203564fdf392b3d3cae249b652a8ae'/>
<id>urn:sha1:4c22ea2b91203564fdf392b3d3cae249b652a8ae</id>
<content type='text'>
The itimer handling for the old alpha osf_setitimer/osf_getitimer
system calls is identical to the compat version of getitimer/setitimer,
so just use those directly.

Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
</content>
</entry>
<entry>
<title>arch: mark syscall number 435 reserved for clone3</title>
<updated>2019-07-14T22:39:33Z</updated>
<author>
<name>Christian Brauner</name>
<email>christian@brauner.io</email>
</author>
<published>2019-07-14T19:22:04Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1a271a68e030f3e134de12087117574a883e20f0'/>
<id>urn:sha1:1a271a68e030f3e134de12087117574a883e20f0</id>
<content type='text'>
A while ago Arnd made it possible to give new system calls the same
syscall number on all architectures (except alpha). To not break this
nice new feature let's mark 435 for clone3 as reserved on all
architectures that do not yet implement it.
Even if an architecture does not plan to implement it this ensures that
new system calls coming after clone3 will have the same number on all
architectures.

Signed-off-by: Christian Brauner &lt;christian@brauner.io&gt;
Cc: linux-arch@vger.kernel.org
Cc: linux-alpha@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-m68k@lists.linux-m68k.org
Cc: linux-mips@vger.kernel.org
Cc: linux-parisc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Link: https://lore.kernel.org/r/20190714192205.27190-2-christian@brauner.io
Reviewed-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Signed-off-by: Christian Brauner &lt;christian@brauner.io&gt;
</content>
</entry>
<entry>
<title>arch: wire-up pidfd_open()</title>
<updated>2019-06-28T10:17:55Z</updated>
<author>
<name>Christian Brauner</name>
<email>christian@brauner.io</email>
</author>
<published>2019-05-24T10:44:59Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7615d9e1780e26e0178c93c55b73309a5dc093d7'/>
<id>urn:sha1:7615d9e1780e26e0178c93c55b73309a5dc093d7</id>
<content type='text'>
This wires up the pidfd_open() syscall into all arches at once.

Signed-off-by: Christian Brauner &lt;christian@brauner.io&gt;
Reviewed-by: David Howells &lt;dhowells@redhat.com&gt;
Reviewed-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Joel Fernandes (Google) &lt;joel@joelfernandes.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Jann Horn &lt;jannh@google.com&gt;
Cc: Andy Lutomirsky &lt;luto@kernel.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Aleksa Sarai &lt;cyphar@cyphar.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: linux-api@vger.kernel.org
Cc: linux-alpha@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-m68k@lists.linux-m68k.org
Cc: linux-mips@vger.kernel.org
Cc: linux-parisc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: linux-arch@vger.kernel.org
Cc: x86@kernel.org
</content>
</entry>
<entry>
<title>uapi: Wire up the mount API syscalls on non-x86 arches [ver #2]</title>
<updated>2019-05-16T16:23:45Z</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2019-05-16T11:52:34Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d8076bdb56af5e5918376cd1573a6b0007fc1a89'/>
<id>urn:sha1:d8076bdb56af5e5918376cd1573a6b0007fc1a89</id>
<content type='text'>
Wire up the mount API syscalls on non-x86 arches.

Reported-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Reviewed-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>arch: add pidfd and io_uring syscalls everywhere</title>
<updated>2019-04-15T14:31:17Z</updated>
<author>
<name>Arnd Bergmann</name>
<email>arnd@arndb.de</email>
</author>
<published>2019-02-28T12:59:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=39036cd2727395c3369b1051005da74059a85317'/>
<id>urn:sha1:39036cd2727395c3369b1051005da74059a85317</id>
<content type='text'>
Add the io_uring and pidfd_send_signal system calls to all architectures.

These system calls are designed to handle both native and compat tasks,
so all entries are the same across architectures, only arm-compat and
the generic tale still use an old format.

Acked-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt; (powerpc)
Acked-by: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt; (s390)
Acked-by: Geert Uytterhoeven &lt;geert@linux-m68k.org&gt;
Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
</content>
</entry>
</feed>
