<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/fs/proc/base.c, branch v4.4</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v4.4</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v4.4'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2015-12-18T22:25:40Z</updated>
<entry>
<title>proc: fix -ESRCH error when writing to /proc/$pid/coredump_filter</title>
<updated>2015-12-18T22:25:40Z</updated>
<author>
<name>Colin Ian King</name>
<email>colin.king@canonical.com</email>
</author>
<published>2015-12-18T22:22:01Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=41a0c249cb8706a2efa1ab3d59466b23a27d0c8b'/>
<id>urn:sha1:41a0c249cb8706a2efa1ab3d59466b23a27d0c8b</id>
<content type='text'>
Writing to /proc/$pid/coredump_filter always returns -ESRCH because commit
774636e19ed51 ("proc: convert to kstrto*()/kstrto*_from_user()") removed
the setting of ret after the get_proc_task call and incorrectly left it as
-ESRCH.  Instead, return 0 when successful.

Example breakage:

  echo 0 &gt; /proc/self/coredump_filter
  bash: echo: write error: No such process

Fixes: 774636e19ed51 ("proc: convert to kstrto*()/kstrto*_from_user()")
Signed-off-by: Colin Ian King &lt;colin.king@canonical.com&gt;
Acked-by: Kees Cook &lt;keescook@chromium.org&gt;
Cc: &lt;stable@vger.kernel.org&gt; [4.3+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm, oom: add comment for why oom_adj exists</title>
<updated>2015-11-06T03:34:48Z</updated>
<author>
<name>David Rientjes</name>
<email>rientjes@google.com</email>
</author>
<published>2015-11-06T02:50:32Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b72bdfa73603f2f81fbac651b9ae36807e877752'/>
<id>urn:sha1:b72bdfa73603f2f81fbac651b9ae36807e877752</id>
<content type='text'>
/proc/pid/oom_adj exists solely to avoid breaking existing userspace
binaries that write to the tunable.

Add a comment in the only possible location within the kernel tree to
describe the situation and motivation for keeping it around.

Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>fs/proc, core/debug: Don't expose absolute kernel addresses via wchan</title>
<updated>2015-10-01T10:55:34Z</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@kernel.org</email>
</author>
<published>2015-09-30T13:59:17Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b2f73922d119686323f14fbbe46587f863852328'/>
<id>urn:sha1:b2f73922d119686323f14fbbe46587f863852328</id>
<content type='text'>
So the /proc/PID/stat 'wchan' field (the 30th field, which contains
the absolute kernel address of the kernel function a task is blocked in)
leaks absolute kernel addresses to unprivileged user-space:

        seq_put_decimal_ull(m, ' ', wchan);

The absolute address might also leak via /proc/PID/wchan as well, if
KALLSYMS is turned off or if the symbol lookup fails for some reason:

static int proc_pid_wchan(struct seq_file *m, struct pid_namespace *ns,
                          struct pid *pid, struct task_struct *task)
{
        unsigned long wchan;
        char symname[KSYM_NAME_LEN];

        wchan = get_wchan(task);

        if (lookup_symbol_name(wchan, symname) &lt; 0) {
                if (!ptrace_may_access(task, PTRACE_MODE_READ))
                        return 0;
                seq_printf(m, "%lu", wchan);
        } else {
                seq_printf(m, "%s", symname);
        }

        return 0;
}

This isn't ideal, because for example it trivially leaks the KASLR offset
to any local attacker:

  fomalhaut:~&gt; printf "%016lx\n" $(cat /proc/$$/stat | cut -d' ' -f35)
  ffffffff8123b380

Most real-life uses of wchan are symbolic:

  ps -eo pid:10,tid:10,wchan:30,comm

and procps uses /proc/PID/wchan, not the absolute address in /proc/PID/stat:

  triton:~/tip&gt; strace -f ps -eo pid:10,tid:10,wchan:30,comm 2&gt;&amp;1 | grep wchan | tail -1
  open("/proc/30833/wchan", O_RDONLY)     = 6

There's one compatibility quirk here: procps relies on whether the
absolute value is non-zero - and we can provide that functionality
by outputing "0" or "1" depending on whether the task is blocked
(whether there's a wchan address).

These days there appears to be very little legitimate reason
user-space would be interested in  the absolute address. The
absolute address is mostly historic: from the days when we
didn't have kallsyms and user-space procps had to do the
decoding itself via the System.map.

So this patch sets all numeric output to "0" or "1" and keeps only
symbolic output, in /proc/PID/wchan.

( The absolute sleep address can generally still be profiled via
  perf, by tasks with sufficient privileges. )

Reviewed-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: Kees Cook &lt;keescook@chromium.org&gt;
Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Alexander Potapenko &lt;glider@google.com&gt;
Cc: Andrey Konovalov &lt;andreyknvl@google.com&gt;
Cc: Andrey Ryabinin &lt;ryabinin.a.a@gmail.com&gt;
Cc: Andy Lutomirski &lt;luto@amacapital.net&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Denys Vlasenko &lt;dvlasenk@redhat.com&gt;
Cc: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Cc: Kostya Serebryany &lt;kcc@google.com&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Sasha Levin &lt;sasha.levin@oracle.com&gt;
Cc: kasan-dev &lt;kasan-dev@googlegroups.com&gt;
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20150930135917.GA3285@gmail.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>proc: convert to kstrto*()/kstrto*_from_user()</title>
<updated>2015-09-10T20:29:01Z</updated>
<author>
<name>Alexey Dobriyan</name>
<email>adobriyan@gmail.com</email>
</author>
<published>2015-09-09T22:36:59Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=774636e19ed514cdf560006813c0473409616de8'/>
<id>urn:sha1:774636e19ed514cdf560006813c0473409616de8</id>
<content type='text'>
Convert from manual allocation/copy_from_user/...  to kstrto*() family
which were designed for exactly that.

One case can not be converted to kstrto*_from_user() to make code even
more simpler because of whitespace stripping, oh well...

Signed-off-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>procfs: always expose /proc/&lt;pid&gt;/map_files/ and make it readable</title>
<updated>2015-09-10T20:29:01Z</updated>
<author>
<name>Calvin Owens</name>
<email>calvinowens@fb.com</email>
</author>
<published>2015-09-09T22:35:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=bdb4d100afe9818aebd1d98ced575c5ef143456c'/>
<id>urn:sha1:bdb4d100afe9818aebd1d98ced575c5ef143456c</id>
<content type='text'>
Currently, /proc/&lt;pid&gt;/map_files/ is restricted to CAP_SYS_ADMIN, and is
only exposed if CONFIG_CHECKPOINT_RESTORE is set.

Each mapped file region gets a symlink in /proc/&lt;pid&gt;/map_files/
corresponding to the virtual address range at which it is mapped.  The
symlinks work like the symlinks in /proc/&lt;pid&gt;/fd/, so you can follow them
to the backing file even if that backing file has been unlinked.

Currently, files which are mapped, unlinked, and closed are impossible to
stat() from userspace.  Exposing /proc/&lt;pid&gt;/map_files/ closes this
functionality "hole".

Not being able to stat() such files makes noticing and explicitly
accounting for the space they use on the filesystem impossible.  You can
work around this by summing up the space used by every file in the
filesystem and subtracting that total from what statfs() tells you, but
that obviously isn't great, and it becomes unworkable once your filesystem
becomes large enough.

This patch moves map_files/ out from behind CONFIG_CHECKPOINT_RESTORE, and
adjusts the permissions enforced on it as follows:

* proc_map_files_lookup()
* proc_map_files_readdir()
* map_files_d_revalidate()

	Remove the CAP_SYS_ADMIN restriction, leaving only the current
	restriction requiring PTRACE_MODE_READ. The information made
	available to userspace by these three functions is already
	available in /proc/PID/maps with MODE_READ, so I don't see any
	reason to limit them any further (see below for more detail).

* proc_map_files_follow_link()

	This stub has been added, and requires that the user have
	CAP_SYS_ADMIN in order to follow the links in map_files/,
	since there was concern on LKML both about the potential for
	bypassing permissions on ancestor directories in the path to
	files pointed to, and about what happens with more exotic
	memory mappings created by some drivers (ie dma-buf).

In older versions of this patch, I changed every permission check in
the four functions above to enforce MODE_ATTACH instead of MODE_READ.
This was an oversight on my part, and after revisiting the discussion
it seems that nobody was concerned about anything outside of what is
made possible by -&gt;follow_link(). So in this version, I've left the
checks for PTRACE_MODE_READ as-is.

[akpm@linux-foundation.org: catch up with concurrent proc_pid_follow_link() changes]
Signed-off-by: Calvin Owens &lt;calvinowens@fb.com&gt;
Reviewed-by: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Andy Lutomirski &lt;luto@amacapital.net&gt;
Cc: Cyrill Gorcunov &lt;gorcunov@openvz.org&gt;
Cc: Joe Perches &lt;joe@perches.com&gt;
Cc: Kirill A. Shutemov &lt;kirill@shutemov.name&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>/proc/$PID/cmdline: fixup empty ARGV case</title>
<updated>2015-07-17T23:39:54Z</updated>
<author>
<name>Alexey Dobriyan</name>
<email>adobriyan@gmail.com</email>
</author>
<published>2015-07-17T23:24:09Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3581d458c39bc5e58ceeeaf51796625bc978d2eb'/>
<id>urn:sha1:3581d458c39bc5e58ceeeaf51796625bc978d2eb</id>
<content type='text'>
/proc/*/cmdline code checks if it should look at ENVP area by checking
last byte of ARGV area:

	rv = access_remote_vm(mm, arg_end - 1, &amp;c, 1, 0);
	if (rv &lt;= 0)
		goto out_free_page;

If ARGV is somehow made empty (by doing execve(..., NULL, ...) or
manually setting -&gt;arg_start and -&gt;arg_end to equal values), the decision
will be based on byte which doesn't even belong to ARGV/ENVP.

So, quickly check if ARGV area is empty and report 0 to match previous
behaviour.

Signed-off-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2015-07-04T15:56:53Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2015-07-04T15:56:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=22a093b2fb52fb656658a32adc80c24ddc200ca4'/>
<id>urn:sha1:22a093b2fb52fb656658a32adc80c24ddc200ca4</id>
<content type='text'>
Pull scheduler fixes from Ingo Molnar:
 "Debug info and other statistics fixes and related enhancements"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/numa: Fix numa balancing stats in /proc/pid/sched
  sched/numa: Show numa_group ID in /proc/sched_debug task listings
  sched/debug: Move print_cfs_rq() declaration to kernel/sched/sched.h
  sched/stat: Expose /proc/pid/schedstat if CONFIG_SCHED_INFO=y
  sched/stat: Simplify the sched_info accounting dependency
</content>
</entry>
<entry>
<title>sched/stat: Expose /proc/pid/schedstat if CONFIG_SCHED_INFO=y</title>
<updated>2015-07-04T08:04:31Z</updated>
<author>
<name>Naveen N. Rao</name>
<email>naveen.n.rao@linux.vnet.ibm.com</email>
</author>
<published>2015-06-30T09:06:03Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5968cecedd7a09f23e9fcb5f9fb4e893712f35ba'/>
<id>urn:sha1:5968cecedd7a09f23e9fcb5f9fb4e893712f35ba</id>
<content type='text'>
Expand /proc/pid/schedstat output:

 - enable it on CONFIG_TASK_DELAY_ACCT=y &amp;&amp; !CONFIG_SCHEDSTATS kernels.

 - dump all zeroes on kernels that are booted with the 'nodelayacct'
   option, which boot option disables delay accounting on
   CONFIG_TASK_DELAY_ACCT=y kernels.

Signed-off-by: Naveen N. Rao &lt;naveen.n.rao@linux.vnet.ibm.com&gt;
Cc: Balbir Singh &lt;bsingharora@gmail.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Srikar Dronamraju &lt;srikar@linux.vnet.ibm.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: a.p.zijlstra@chello.nl
Cc: ricklind@us.ibm.com
Link: http://lkml.kernel.org/r/5ccbef17d4bc841084ea6e6421d4e4a23b7b806f.1435654789.git.naveen.n.rao@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>fs, proc: introduce CONFIG_PROC_CHILDREN</title>
<updated>2015-06-26T00:00:37Z</updated>
<author>
<name>Iago López Galeiras</name>
<email>iago@endocode.com</email>
</author>
<published>2015-06-25T22:00:57Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2e13ba54a2682eea24918b87ad3edf70c2cf085b'/>
<id>urn:sha1:2e13ba54a2682eea24918b87ad3edf70c2cf085b</id>
<content type='text'>
Commit 818411616baf ("fs, proc: introduce /proc/&lt;pid&gt;/task/&lt;tid&gt;/children
entry") introduced the children entry for checkpoint restore and the
file is only available on kernels configured with CONFIG_EXPERT and
CONFIG_CHECKPOINT_RESTORE.

This is available in most distributions (Fedora, Debian, Ubuntu, CoreOS)
because they usually enable CONFIG_EXPERT and CONFIG_CHECKPOINT_RESTORE.
But Arch does not enable CONFIG_EXPERT or CONFIG_CHECKPOINT_RESTORE.

However, the children proc file is useful outside of checkpoint restore.
I would like to use it in rkt.  The rkt process exec() another program
it does not control, and that other program will fork()+exec() a child
process.  I would like to find the pid of the child process from an
external tool without iterating in /proc over all processes to find
which one has a parent pid equal to rkt.

This commit introduces CONFIG_PROC_CHILDREN and makes
CONFIG_CHECKPOINT_RESTORE select it.  This allows enabling
/proc/&lt;pid&gt;/task/&lt;tid&gt;/children without needing to enable
CONFIG_CHECKPOINT_RESTORE and CONFIG_EXPERT.

Alban tested that /proc/&lt;pid&gt;/task/&lt;tid&gt;/children is present when the
kernel is configured with CONFIG_PROC_CHILDREN=y but without
CONFIG_CHECKPOINT_RESTORE

Signed-off-by: Iago López Galeiras &lt;iago@endocode.com&gt;
Tested-by: Alban Crequy &lt;alban@endocode.com&gt;
Reviewed-by: Cyrill Gorcunov &lt;gorcunov@openvz.org&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Pavel Emelyanov &lt;xemul@parallels.com&gt;
Cc: Serge Hallyn &lt;serge.hallyn@canonical.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Andy Lutomirski &lt;luto@amacapital.net&gt;
Cc: Djalal Harouni &lt;djalal@endocode.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>proc: fix PAGE_SIZE limit of /proc/$PID/cmdline</title>
<updated>2015-06-26T00:00:37Z</updated>
<author>
<name>Alexey Dobriyan</name>
<email>adobriyan@gmail.com</email>
</author>
<published>2015-06-25T22:00:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c2c0bb44620dece7ec97e28167e32c343da22867'/>
<id>urn:sha1:c2c0bb44620dece7ec97e28167e32c343da22867</id>
<content type='text'>
/proc/$PID/cmdline truncates output at PAGE_SIZE. It is easy to see with

	$ cat /proc/self/cmdline $(seq 1037) 2&gt;/dev/null

However, command line size was never limited to PAGE_SIZE but to 128 KB
and relatively recently limitation was removed altogether.

People noticed and ask questions:
http://stackoverflow.com/questions/199130/how-do-i-increase-the-proc-pid-cmdline-4096-byte-limit

seq file interface is not OK, because it kmalloc's for whole output and
open + read(, 1) + sleep will pin arbitrary amounts of kernel memory.  To
not do that, limit must be imposed which is incompatible with arbitrary
sized command lines.

I apologize for hairy code, but this it direct consequence of command line
layout in memory and hacks to support things like "init [3]".

The loops are "unrolled" otherwise it is either macros which hide control
flow or functions with 7-8 arguments with equal line count.

There should be real setproctitle(2) or something.

[akpm@linux-foundation.org: fix a billion min() warnings]
Signed-off-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Tested-by: Jarod Wilson &lt;jarod@redhat.com&gt;
Acked-by: Jarod Wilson &lt;jarod@redhat.com&gt;
Cc: Cyrill Gorcunov &lt;gorcunov@openvz.org&gt;
Cc: Jan Stancek &lt;jstancek@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
