<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/fs/proc/root.c, branch v5.9</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.9</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.9'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2020-06-10T19:54:54Z</updated>
<entry>
<title>proc: s_fs_info may be NULL when proc_kill_sb is called</title>
<updated>2020-06-10T19:54:54Z</updated>
<author>
<name>Alexey Gladkov</name>
<email>gladkov.alexey@gmail.com</email>
</author>
<published>2020-06-10T18:35:49Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=058f2e4da79b23afb56ce3d03d907d6cdd36f2b8'/>
<id>urn:sha1:058f2e4da79b23afb56ce3d03d907d6cdd36f2b8</id>
<content type='text'>
syzbot found that if proc_fill_super() fails before filling in sb-&gt;s_fs_info,
deactivate_locked_super() is called while sb-&gt;s_fs_info is still NULL.
proc_kill_sb() does not expect fs_info to be NULL, which is wrong.
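
For illustration only, the failure mode can be modelled outside the kernel.
The Python sketch below is not the kernel code: SuperBlock, ProcFsInfo,
kill_sb_unguarded and kill_sb_guarded are hypothetical names that mimic a
teardown routine running after the fill step failed before attaching its
private data.

```python
class ProcFsInfo:
    """Hypothetical stand-in for the per-mount private data."""

    def __init__(self):
        self.released = False

    def release(self):
        self.released = True


class SuperBlock:
    """Hypothetical stand-in for a superblock; s_fs_info starts unset."""

    def __init__(self):
        self.s_fs_info = None


def kill_sb_unguarded(sb):
    # Mirrors the buggy assumption: private data is always present.
    sb.s_fs_info.release()


def kill_sb_guarded(sb):
    # Tolerates a superblock whose fill step failed early.
    info = sb.s_fs_info
    if info is None:
        return False
    info.release()
    sb.s_fs_info = None
    return True
```

In this model the unguarded variant raises as soon as it touches the missing
private data, while the guarded variant simply has nothing to release.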

Link: https://lore.kernel.org/lkml/0000000000002d7ca605a7b8b1c5@google.com
Reported-by: syzbot+4abac52934a48af5ff19@syzkaller.appspotmail.com
Fixes: fa10fed30f25 ("proc: allow to mount many instances of proc in one pid namespace")
Signed-off-by: Alexey Gladkov &lt;gladkov.alexey@gmail.com&gt;
Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>proc: use named enums for better readability</title>
<updated>2020-04-22T15:51:22Z</updated>
<author>
<name>Alexey Gladkov</name>
<email>gladkov.alexey@gmail.com</email>
</author>
<published>2020-04-19T14:10:57Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e61bb8b36a287dddc71bdf30be775e7abcaa595c'/>
<id>urn:sha1:e61bb8b36a287dddc71bdf30be775e7abcaa595c</id>
<content type='text'>
Signed-off-by: Alexey Gladkov &lt;gladkov.alexey@gmail.com&gt;
Reviewed-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Reviewed-by: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>proc: use human-readable values for hidepid</title>
<updated>2020-04-22T15:51:22Z</updated>
<author>
<name>Alexey Gladkov</name>
<email>gladkov.alexey@gmail.com</email>
</author>
<published>2020-04-19T14:10:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1c6c4d112e81a919d4ea83ec6cbc2f55203217fd'/>
<id>urn:sha1:1c6c4d112e81a919d4ea83ec6cbc2f55203217fd</id>
<content type='text'>
The number of hidepid parameter values keeps growing, and it is becoming
difficult to remember what each new magic number means.

Backward compatibility is preserved since a numerical value can still be
specified for the hidepid parameter. This does not break fsconfig, since
a numerical value cannot be passed through it. All numeric values are
converted to strings. The type FSCONFIG_SET_BINARY cannot be used to
indicate a numerical value.

Selftest has been added to verify this behavior.
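
As a rough sketch of the intended behavior, the Python helper below accepts
either a readable name or the legacy number. It is not the kernel parser:
parse_hidepid and HIDEPID_NAMES are hypothetical names, and the name-to-value
table is assumed from the procfs documentation (off, noaccess, invisible,
ptraceable).

```python
# Name-to-value table, assumed from the procfs documentation.
HIDEPID_NAMES = {
    "off": 0,
    "noaccess": 1,
    "invisible": 2,
    "ptraceable": 4,
}


def parse_hidepid(value):
    """Return the numeric hidepid level for a name or a numeric string.

    Hypothetical helper; raises ValueError for anything unrecognised.
    """
    text = str(value).strip()
    if text in HIDEPID_NAMES:
        return HIDEPID_NAMES[text]
    # Legacy form: a plain decimal number that maps to a known level.
    if text.isascii() and text.isdigit():
        number = int(text)
        if number in HIDEPID_NAMES.values():
            return number
    raise ValueError("invalid hidepid value: " + repr(value))
```

With this helper, "invisible" and "2" resolve to the same level, while an
unassigned number such as "3" is rejected.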

Suggested-by: Andy Lutomirski &lt;luto@kernel.org&gt;
Signed-off-by: Alexey Gladkov &lt;gladkov.alexey@gmail.com&gt;
Reviewed-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Reviewed-by: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>proc: add option to mount only a pids subset</title>
<updated>2020-04-22T15:51:22Z</updated>
<author>
<name>Alexey Gladkov</name>
<email>gladkov.alexey@gmail.com</email>
</author>
<published>2020-04-19T14:10:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=6814ef2d992af09451bbeda4770daa204461329e'/>
<id>urn:sha1:6814ef2d992af09451bbeda4770daa204461329e</id>
<content type='text'>
This allows hiding all files and directories in procfs that are not
related to tasks.

Signed-off-by: Alexey Gladkov &lt;gladkov.alexey@gmail.com&gt;
Reviewed-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Reviewed-by: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>proc: instantiate only pids that we can ptrace on 'hidepid=4' mount option</title>
<updated>2020-04-22T15:51:21Z</updated>
<author>
<name>Alexey Gladkov</name>
<email>gladkov.alexey@gmail.com</email>
</author>
<published>2020-04-19T14:10:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=24a71ce5c47f6b1b3cdacf544cb24220f5c3b7ef'/>
<id>urn:sha1:24a71ce5c47f6b1b3cdacf544cb24220f5c3b7ef</id>
<content type='text'>
If the "hidepid=4" mount option is set, do not instantiate pids that
we cannot ptrace. "hidepid=4" means that procfs should only contain
pids that the caller can ptrace.

Signed-off-by: Djalal Harouni &lt;tixxdz@gmail.com&gt;
Signed-off-by: Alexey Gladkov &lt;gladkov.alexey@gmail.com&gt;
Reviewed-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Reviewed-by: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>proc: allow to mount many instances of proc in one pid namespace</title>
<updated>2020-04-22T15:51:21Z</updated>
<author>
<name>Alexey Gladkov</name>
<email>gladkov.alexey@gmail.com</email>
</author>
<published>2020-04-19T14:10:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=fa10fed30f2550313a8284365b3e2398526eb42c'/>
<id>urn:sha1:fa10fed30f2550313a8284365b3e2398526eb42c</id>
<content type='text'>
This patch allows multiple procfs instances inside the same pid
namespace. The aim here is lightweight sandboxes, and to allow
that we have to modernize procfs internals.

1) The main aim of this work is to have one supervisor for apps on
embedded systems. Right now we have some lightweight sandbox support;
however, if we create pid namespaces we have to manage all the
processes inside them too, whereas our goal is to be able to run a bunch
of apps, each one inside its own mount namespace, without being able to
notice each other. We only want to use mount namespaces, and we want
procfs to behave more like a real mount point.

2) Linux Security Modules have multiple ptrace paths inside some
subsystems. Inside procfs, however, the implementation does not guarantee
that the ptrace() check which triggers the security_ptrace_check() hook
will always run. We have the 'hidepid' mount option that can be used to
force the ptrace_may_access() check inside has_pid_permissions() to run.
The problem is that 'hidepid' is per pid namespace and not attached to
the mount point, so any remount or modification of 'hidepid' will
propagate to all other procfs mounts.

This also makes it hard to support the Yama LSM in desktop and user
sessions. The Yama ptrace scope, which restricts ptrace and some other
syscalls to be allowed only on inferiors, can be updated to have a
per-task context, where the context will be inherited during fork() and
clone() and preserved across execve(). If we support multiple private
procfs instances, then we may force the ptrace_may_access() check on
/proc/&lt;pids&gt;/ to always run inside each new procfs instance. This will
allow user sessions to specify whether procfs should be populated with
pids that the user can ptrace or not.

By using the Yama ptrace scope, some restricted users will only be able
to see inferiors inside /proc; they won't even be able to see their other
processes. Some software, like Chromium, Firefox's crash handler, Wine
and others, already uses Yama to restrict which processes can be
ptraced. This change makes it possible to restrict /proc/&lt;pids&gt;/, but
more importantly it gives desktop users a generic and usable way to
specify which users should see all processes and which users cannot.

Side notes:
* This covers a gap in seccomp, which is not able to parse arguments.
It is easy to install a seccomp filter on direct syscalls that operate
on pids; however, /proc/&lt;pid&gt;/ is a Linux ABI that uses filesystem
syscalls. With this change LSMs should be able to analyze
open/read/write/close...

In the new patch set version I removed the 'newinstance' option
as suggested by Eric W. Biederman.

Selftest has been added to verify new behavior.

Signed-off-by: Alexey Gladkov &lt;gladkov.alexey@gmail.com&gt;
Reviewed-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Reviewed-by: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>proc: Handle umounts cleanly</title>
<updated>2020-04-16T04:52:29Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2020-04-15T17:37:27Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4fa3b1c417377c352208ee9f487e17cfcee32348'/>
<id>urn:sha1:4fa3b1c417377c352208ee9f487e17cfcee32348</id>
<content type='text'>
syzbot writes:
&gt; KASAN: use-after-free Read in dput (2)
&gt;
&gt; proc_fill_super: allocate dentry failed
&gt; ==================================================================
&gt; BUG: KASAN: use-after-free in fast_dput fs/dcache.c:727 [inline]
&gt; BUG: KASAN: use-after-free in dput+0x53e/0xdf0 fs/dcache.c:846
&gt; Read of size 4 at addr ffff88808a618cf0 by task syz-executor.0/8426
&gt;
&gt; CPU: 0 PID: 8426 Comm: syz-executor.0 Not tainted 5.6.0-next-20200412-syzkaller #0
&gt; Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
&gt; Call Trace:
&gt;  __dump_stack lib/dump_stack.c:77 [inline]
&gt;  dump_stack+0x188/0x20d lib/dump_stack.c:118
&gt;  print_address_description.constprop.0.cold+0xd3/0x315 mm/kasan/report.c:382
&gt;  __kasan_report.cold+0x35/0x4d mm/kasan/report.c:511
&gt;  kasan_report+0x33/0x50 mm/kasan/common.c:625
&gt;  fast_dput fs/dcache.c:727 [inline]
&gt;  dput+0x53e/0xdf0 fs/dcache.c:846
&gt;  proc_kill_sb+0x73/0xf0 fs/proc/root.c:195
&gt;  deactivate_locked_super+0x8c/0xf0 fs/super.c:335
&gt;  vfs_get_super+0x258/0x2d0 fs/super.c:1212
&gt;  vfs_get_tree+0x89/0x2f0 fs/super.c:1547
&gt;  do_new_mount fs/namespace.c:2813 [inline]
&gt;  do_mount+0x1306/0x1b30 fs/namespace.c:3138
&gt;  __do_sys_mount fs/namespace.c:3347 [inline]
&gt;  __se_sys_mount fs/namespace.c:3324 [inline]
&gt;  __x64_sys_mount+0x18f/0x230 fs/namespace.c:3324
&gt;  do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295
&gt;  entry_SYSCALL_64_after_hwframe+0x49/0xb3
&gt; RIP: 0033:0x45c889
&gt; Code: ad b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 &lt;48&gt; 3d 01 f0 ff ff 0f 83 7b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00
&gt; RSP: 002b:00007ffc1930ec48 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
&gt; RAX: ffffffffffffffda RBX: 0000000001324914 RCX: 000000000045c889
&gt; RDX: 0000000020000140 RSI: 0000000020000040 RDI: 0000000000000000
&gt; RBP: 000000000076bf00 R08: 0000000000000000 R09: 0000000000000000
&gt; R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
&gt; R13: 0000000000000749 R14: 00000000004ca15a R15: 0000000000000013

Looking at the code, now that the internal mount of proc is no
longer used it is possible to unmount proc. If proc is unmounted,
the fields of the pid namespace that were used for filesystem-specific
state are not reinitialized.

This means that proc_self and proc_thread_self can be pointers to
already freed dentries.

The reported use after free appears to come from mounting and
unmounting proc, followed by mounting proc again and using error
injection to cause the new root dentry allocation to fail. This in
turn results in proc_kill_sb running with proc_self and
proc_thread_self still retaining their values from the previous mount
of proc. Calling dput on either proc_self or proc_thread_self then
results in a double put, which KASAN sees as a use after free.

Solve this by always reinitializing the filesystem state stored
in the struct pid_namespace, when proc is unmounted.
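
The sequence described above can be sketched as a small Python model. This
is an illustration under stated assumptions, not the kernel implementation:
Dentry, PidNamespace, mount_proc and kill_sb are hypothetical stand-ins, and
the reference counting is reduced to a single counter per object.

```python
class Dentry:
    """Hypothetical reference-counted object."""

    def __init__(self):
        self.refs = 1

    def put(self):
        if self.refs == 0:
            raise RuntimeError("put on an already freed dentry")
        self.refs -= 1


class PidNamespace:
    """Hypothetical holder of filesystem-specific state."""

    def __init__(self):
        self.proc_self = None
        self.proc_thread_self = None


def mount_proc(ns, fail_early=False):
    # With fail_early the cached pointers are never refreshed,
    # modelling the failed root dentry allocation.
    if fail_early:
        return False
    ns.proc_self = Dentry()
    ns.proc_thread_self = Dentry()
    return True


def kill_sb(ns, reinitialize):
    # Drop each cached reference; optionally clear the stale pointer.
    for name in ("proc_self", "proc_thread_self"):
        dentry = getattr(ns, name)
        if dentry is not None:
            dentry.put()
        if reinitialize:
            setattr(ns, name, None)
```

Running mount, unmount, failed mount, unmount with reinitialize=False hits
the double put in this model; with reinitialize=True the second teardown
finds nothing left to release.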

Reported-by: syzbot+72868dd424eb66c6b95f@syzkaller.appspotmail.com
Acked-by: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
Fixes: 69879c01a0c3 ("proc: Remove the now unnecessary internal mount of proc")
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>proc: Remove the now unnecessary internal mount of proc</title>
<updated>2020-02-28T18:06:14Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2020-02-20T14:08:20Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=69879c01a0c3f70e0887cfb4d9ff439814361e46'/>
<id>urn:sha1:69879c01a0c3f70e0887cfb4d9ff439814361e46</id>
<content type='text'>
There remains no more code in the kernel using pids_ns-&gt;proc_mnt,
therefore remove it from the kernel.

The big benefit of this change is that one of the most error-prone and
tricky parts of the pid namespace implementation, maintaining kernel
mounts of proc, is removed.

In addition removing the unnecessary complexity of the kernel mount
fixes a regression that caused the proc mount options to be ignored.
Now that the initial mount of proc comes from userspace, those mount
options are again honored.  This fixes Android's usage of the proc
hidepid option.

Reported-by: Alistair Strachan &lt;astrachan@google.com&gt;
Fixes: e94591d0d90c ("proc: Convert proc_mount to use mount_ns.")
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'merge.nfs-fs_parse.1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs</title>
<updated>2020-02-08T21:26:41Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2020-02-08T21:26:41Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c9d35ee049b40f1d73e890bf88dd55f83b1e9be8'/>
<id>urn:sha1:c9d35ee049b40f1d73e890bf88dd55f83b1e9be8</id>
<content type='text'>
Pull vfs file system parameter updates from Al Viro:
 "Saner fs_parser.c guts and data structures. The system-wide registry
  of syntax types (string/enum/int32/oct32/.../etc.) is gone and so is
  the horror switch() in fs_parse() that would have to grow another case
  every time something got added to that system-wide registry.

  New syntax types can be added by filesystems easily now, and their
  namespace is that of functions - not of system-wide enum members. IOW,
  they can be shared or kept private and if some turn out to be widely
  useful, we can make them common library helpers, etc., without having
  to do anything whatsoever to fs_parse() itself.

  And we already get that kind of requests - the thing that finally
  pushed me into doing that was "oh, and let's add one for timeouts -
  things like 15s or 2h". If some filesystem really wants that, let them
  do it. Without somebody having to play gatekeeper for the variants
  blessed by direct support in fs_parse(), TYVM.

  Quite a bit of boilerplate is gone. And IMO the data structures make a
  lot more sense now. -200LoC, while we are at it"
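
The idea of syntax types being plain functions, rather than members of a
central enum handled by one switch, can be sketched in Python. This is only
an analogy and not the kernel API: PARAM_SPECS, fs_parse_sketch, parse_u32
and parse_timeout are hypothetical names, and the timeout syntax follows the
"15s or 2h" example quoted above.

```python
def parse_string(text):
    return text


def parse_u32(text):
    value = int(text, 10)
    if value not in range(0, 2 ** 32):
        raise ValueError("out of range: " + text)
    return value


def parse_timeout(text):
    """Filesystem-private syntax type: '15s' or '2h' to seconds."""
    units = {"s": 1, "m": 60, "h": 3600}
    number, unit = text[:-1], text[-1:]
    if unit not in units or not number.isdigit():
        raise ValueError("bad timeout: " + text)
    return int(number) * units[unit]


# Each entry names its own parser function, so adding parse_timeout
# required no change to fs_parse_sketch() below.
PARAM_SPECS = {
    "source": parse_string,
    "retries": parse_u32,
    "timeout": parse_timeout,
}


def fs_parse_sketch(specs, key, value):
    # Generic dispatcher: it knows nothing about individual syntax types.
    if key not in specs:
        raise KeyError("unknown parameter: " + key)
    return specs[key](value)
```

A filesystem that wants a new syntax type only adds a function and a table
entry; the dispatcher itself stays untouched.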

* 'merge.nfs-fs_parse.1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (25 commits)
  tmpfs: switch to use of invalfc()
  cgroup1: switch to use of errorfc() et.al.
  procfs: switch to use of invalfc()
  hugetlbfs: switch to use of invalfc()
  cramfs: switch to use of errofc() et.al.
  gfs2: switch to use of errorfc() et.al.
  fuse: switch to use errorfc() et.al.
  ceph: use errorfc() and friends instead of spelling the prefix out
  prefix-handling analogues of errorf() and friends
  turn fs_param_is_... into functions
  fs_parse: handle optional arguments sanely
  fs_parse: fold fs_parameter_desc/fs_parameter_spec
  fs_parser: remove fs_parameter_description name field
  add prefix to fs_context-&gt;log
  ceph_parse_param(), ceph_parse_mon_ips(): switch to passing fc_log
  new primitive: __fs_parse()
  switch rbd and libceph to p_log-based primitives
  struct p_log, variants of warnf() et.al. taking that one instead
  teach logfc() to handle prefices, give it saner calling conventions
  get rid of cg_invalf()
  ...
</content>
</entry>
<entry>
<title>procfs: switch to use of invalfc()</title>
<updated>2020-02-07T19:48:42Z</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2019-12-22T02:34:41Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=bf45f7fcc4003a8347a172354e2b8b59a259822c'/>
<id>urn:sha1:bf45f7fcc4003a8347a172354e2b8b59a259822c</id>
<content type='text'>
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
</feed>
