<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/fs/exec.c, branch v3.5</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v3.5</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v3.5'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2012-06-20T21:39:36Z</updated>
<entry>
<title>mm: correctly synchronize rss-counters at exit/exec</title>
<updated>2012-06-20T21:39:36Z</updated>
<author>
<name>Konstantin Khlebnikov</name>
<email>khlebnikov@openvz.org</email>
</author>
<published>2012-06-20T19:53:01Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4fe7efdbdfb1c7e7a7f31decfd831c0f31d37091'/>
<id>urn:sha1:4fe7efdbdfb1c7e7a7f31decfd831c0f31d37091</id>
<content type='text'>
do_exit() and exec_mmap() call sync_mm_rss() before mm_release() does
put_user(clear_child_tid) which can update task-&gt;rss_stat and thus make
mm-&gt;rss_stat inconsistent.  This triggers the "BUG:" printk in check_mm().

Let's fix this bug in the safest way, and optimize/cleanup this later.

Reported-by: Markus Trippelsdorf &lt;markus@trippelsdorf.de&gt;
Signed-off-by: Konstantin Khlebnikov &lt;khlebnikov@openvz.org&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Revert "mm: correctly synchronize rss-counters at exit/exec"</title>
<updated>2012-06-08T00:54:07Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2012-06-08T00:54:07Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=48d212a2eecaca2e1875925837ad27b2f43f48a3'/>
<id>urn:sha1:48d212a2eecaca2e1875925837ad27b2f43f48a3</id>
<content type='text'>
This reverts commit 40af1bbdca47e5c8a2044039bb78ca8fd8b20f94.

It's horribly and utterly broken for at least the following reasons:

 - calling sync_mm_rss() from mmput() is fundamentally wrong, because
   there's absolutely no reason to believe that the task that does the
   mmput() always does it on its own VM.  Example: fork, ptrace, /proc -
   you name it.

 - calling it *after* having done mmdrop() on it is doubly insane, since
   the mm struct may well be gone now.

 - testing mm against NULL before you call it is insane too, since a
NULL mm there would have caused oopses long before.

.. and those are just the three bugs I found before I decided to give up
looking for me and revert it asap.  I should have caught it before I
even took it, but I trusted Andrew too much.

Cc: Konstantin Khlebnikov &lt;khlebnikov@openvz.org&gt;
Cc: Markus Trippelsdorf &lt;markus@trippelsdorf.de&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: correctly synchronize rss-counters at exit/exec</title>
<updated>2012-06-07T21:43:55Z</updated>
<author>
<name>Konstantin Khlebnikov</name>
<email>khlebnikov@openvz.org</email>
</author>
<published>2012-06-07T21:21:14Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=40af1bbdca47e5c8a2044039bb78ca8fd8b20f94'/>
<id>urn:sha1:40af1bbdca47e5c8a2044039bb78ca8fd8b20f94</id>
<content type='text'>
mm-&gt;rss_stat counters have per-task delta: task-&gt;rss_stat.  Before
changing task-&gt;mm pointer the kernel must flush this delta with
sync_mm_rss().

do_exit() already calls sync_mm_rss() to flush the rss-counters before
committing the rss statistics into task-&gt;signal-&gt;maxrss, taskstats,
audit and other stuff.  Unfortunately the kernel does this before
calling mm_release(), which can call put_user() for processing
task-&gt;clear_child_tid.  So at this point we can trigger page-faults and
task-&gt;rss_stat becomes non-zero again.  As a result mm-&gt;rss_stat becomes
inconsistent and check_mm() will print something like this:

| BUG: Bad rss-counter state mm:ffff88020813c380 idx:1 val:-1
| BUG: Bad rss-counter state mm:ffff88020813c380 idx:2 val:1

This patch moves sync_mm_rss() into mm_release(), and moves mm_release()
out of do_exit() and calls it earlier.  After mm_release() there should
be no pagefaults.

[akpm@linux-foundation.org: tweak comment]
Signed-off-by: Konstantin Khlebnikov &lt;khlebnikov@openvz.org&gt;
Reported-by: Markus Trippelsdorf &lt;markus@trippelsdorf.de&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;		[3.4.x]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>split -&gt;file_mmap() into -&gt;mmap_addr()/-&gt;mmap_file()</title>
<updated>2012-05-31T17:11:54Z</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2012-05-30T17:30:51Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e5467859f7f79b69fc49004403009dfdba3bec53'/>
<id>urn:sha1:e5467859f7f79b69fc49004403009dfdba3bec53</id>
<content type='text'>
... i.e. file-dependent and address-dependent checks.

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace</title>
<updated>2012-05-24T00:42:39Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2012-05-24T00:42:39Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=644473e9c60c1ff4f6351fed637a6e5551e3dce7'/>
<id>urn:sha1:644473e9c60c1ff4f6351fed637a6e5551e3dce7</id>
<content type='text'>
Pull user namespace enhancements from Eric Biederman:
 "This is a course correction for the user namespace, so that we can
  reach an inexpensive, maintainable, and reasonably complete
  implementation.

  Highlights:
   - Config guards make it impossible to enable the user namespace and
     code that has not been converted to be user namespace safe.

   - Use of the new kuid_t type ensures the if you somehow get past the
     config guards the kernel will encounter type errors if you enable
     user namespaces and attempt to compile in code whose permission
     checks have not been updated to be user namespace safe.

   - All uids from child user namespaces are mapped into the initial
     user namespace before they are processed.  Removing the need to add
     an additional check to see if the user namespace of the compared
     uids remains the same.

   - With the user namespaces compiled out the performance is as good or
     better than it is today.

   - For most operations absolutely nothing changes performance or
     operationally with the user namespace enabled.

   - The worst case performance I could come up with was timing 1
     billion cache cold stat operations with the user namespace code
     enabled.  This went from 156s to 164s on my laptop (or 156ns to
     164ns per stat operation).

   - (uid_t)-1 and (gid_t)-1 are reserved as an internal error value.
     Most uid/gid setting system calls treat these value specially
     anyway so attempting to use -1 as a uid would likely cause
     entertaining failures in userspace.

   - If setuid is called with a uid that can not be mapped setuid fails.
     I have looked at sendmail, login, ssh and every other program I
     could think of that would call setuid and they all check for and
     handle the case where setuid fails.

   - If stat or a similar system call is called from a context in which
     we can not map a uid we lie and return overflowuid.  The LFS
     experience suggests not lying and returning an error code might be
     better, but the historical precedent with uids is different and I
     can not think of anything that would break by lying about a uid we
     can't map.

   - Capabilities are localized to the current user namespace making it
     safe to give the initial user in a user namespace all capabilities.

  My git tree covers all of the modifications needed to convert the core
  kernel and enough changes to make a system bootable to runlevel 1."

Fix up trivial conflicts due to nearby independent changes in fs/stat.c

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (46 commits)
  userns:  Silence silly gcc warning.
  cred: use correct cred accessor with regards to rcu read lock
  userns: Convert the move_pages, and migrate_pages permission checks to use uid_eq
  userns: Convert cgroup permission checks to use uid_eq
  userns: Convert tmpfs to use kuid and kgid where appropriate
  userns: Convert sysfs to use kgid/kuid where appropriate
  userns: Convert sysctl permission checks to use kuid and kgids.
  userns: Convert proc to use kuid/kgid where appropriate
  userns: Convert ext4 to user kuid/kgid where appropriate
  userns: Convert ext3 to use kuid/kgid where appropriate
  userns: Convert ext2 to use kuid/kgid where appropriate.
  userns: Convert devpts to use kuid/kgid where appropriate
  userns: Convert binary formats to use kuid/kgid where appropriate
  userns: Add negative depends on entries to avoid building code that is userns unsafe
  userns: signal remove unnecessary map_cred_ns
  userns: Teach inode_capable to understand inodes whose uids map to other namespaces.
  userns: Fail exec for suid and sgid binaries with ids outside our user namespace.
  userns: Convert stat to return values mapped from kuids and kgids
  userns: Convert user specfied uids and gids in chown into kuids and kgid
  userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs
  ...
</content>
</entry>
<entry>
<title>Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2012-05-23T17:59:07Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2012-05-23T17:59:07Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ec0d7f18ab7b5097d7c0c8f3d909ca1031b9d5cd'/>
<id>urn:sha1:ec0d7f18ab7b5097d7c0c8f3d909ca1031b9d5cd</id>
<content type='text'>
Pull fpu state cleanups from Ingo Molnar:
 "This tree streamlines further aspects of FPU handling by eliminating
  the prepare_to_copy() complication and moving that logic to
  arch_dup_task_struct().

  It also fixes the FPU dumps in threaded core dumps, removes and old
  (and now invalid) assumption plus micro-optimizes the exit path by
  avoiding an FPU save for dead tasks."

Fixed up trivial add-add conflict in arch/sh/kernel/process.c that came
in because we now do the FPU handling in arch_dup_task_struct() rather
than the legacy (and now gone) prepare_to_copy().

* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, fpu: drop the fpu state during thread exit
  x86, xsave: remove thread_has_fpu() bug check in __sanitize_i387_state()
  coredump: ensure the fpu state is flushed for proper multi-threaded core dump
  fork: move the real prepare_to_copy() users to arch_dup_task_struct()
</content>
</entry>
<entry>
<title>coredump: ensure the fpu state is flushed for proper multi-threaded core dump</title>
<updated>2012-05-16T22:16:48Z</updated>
<author>
<name>Suresh Siddha</name>
<email>suresh.b.siddha@intel.com</email>
</author>
<published>2012-05-16T22:03:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=11aeca0b3a083a457f5c34fe8c677d5e86a0c6b3'/>
<id>urn:sha1:11aeca0b3a083a457f5c34fe8c677d5e86a0c6b3</id>
<content type='text'>
Nalluru reported hitting the BUG_ON(__thread_has_fpu(tsk)) in
arch/x86/kernel/xsave.c:__sanitize_i387_state() during the coredump
of a multi-threaded application.

A look at the exit seqeuence shows that other threads can still be on the
runqueue potentially at the below shown exit_mm() code snippet:

		if (atomic_dec_and_test(&amp;core_state-&gt;nr_threads))
			complete(&amp;core_state-&gt;startup);

===&gt; other threads can still be active here, but we notify the thread
===&gt; dumping core to wakeup from the coredump_wait() after the last thread
===&gt; joins this point. Core dumping thread will continue dumping
===&gt; all the threads state to the core file.

		for (;;) {
			set_task_state(tsk, TASK_UNINTERRUPTIBLE);
			if (!self.task) /* see coredump_finish() */
				break;
			schedule();
		}

As some of those threads are on the runqueue and didn't call schedule() yet,
their fpu state is still active in the live registers and the thread
proceeding with the coredump will hit the above mentioned BUG_ON while
trying to dump other threads fpustate to the coredump file.

BUG_ON() in arch/x86/kernel/xsave.c:__sanitize_i387_state() is
in the code paths for processors supporting xsaveopt. With or without
xsaveopt, multi-threaded coredump is broken and maynot contain
the correct fpustate at the time of exit.

In coredump_wait(), wait for all the threads to be come inactive, so
that we are sure all the extended register state is flushed to
the memory, so that it can be reliably copied to the core file.

Reported-by: Suresh Nalluru &lt;suresh@aristanetworks.com&gt;
Signed-off-by: Suresh Siddha &lt;suresh.b.siddha@intel.com&gt;
Link: http://lkml.kernel.org/r/1336692811-30576-2-git-send-email-suresh.b.siddha@intel.com
Acked-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: H. Peter Anvin &lt;hpa@linux.intel.com&gt;
</content>
</entry>
<entry>
<title>userns: Fail exec for suid and sgid binaries with ids outside our user namespace.</title>
<updated>2012-05-15T21:59:23Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2011-11-17T07:37:59Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9e4a36ece652908276bc4abb4324ec56292453e1'/>
<id>urn:sha1:9e4a36ece652908276bc4abb4324ec56292453e1</id>
<content type='text'>
Acked-by: Serge Hallyn &lt;serge.hallyn@canonical.com&gt;
Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs</title>
<updated>2012-05-03T10:29:34Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2012-03-04T05:17:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8e96e3b7b8407be794ab1fd8e4b332818a358e78'/>
<id>urn:sha1:8e96e3b7b8407be794ab1fd8e4b332818a358e78</id>
<content type='text'>
Acked-by: Serge Hallyn &lt;serge.hallyn@canonical.com&gt;
Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>Add PR_{GET,SET}_NO_NEW_PRIVS to prevent execve from granting privs</title>
<updated>2012-04-14T01:13:18Z</updated>
<author>
<name>Andy Lutomirski</name>
<email>luto@amacapital.net</email>
</author>
<published>2012-04-12T21:47:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=259e5e6c75a910f3b5e656151dc602f53f9d7548'/>
<id>urn:sha1:259e5e6c75a910f3b5e656151dc602f53f9d7548</id>
<content type='text'>
With this change, calling
  prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)
disables privilege granting operations at execve-time.  For example, a
process will not be able to execute a setuid binary to change their uid
or gid if this bit is set.  The same is true for file capabilities.

Additionally, LSM_UNSAFE_NO_NEW_PRIVS is defined to ensure that
LSMs respect the requested behavior.

To determine if the NO_NEW_PRIVS bit is set, a task may call
  prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
It returns 1 if set and 0 if it is not set. If any of the arguments are
non-zero, it will return -1 and set errno to -EINVAL.
(PR_SET_NO_NEW_PRIVS behaves similarly.)

This functionality is desired for the proposed seccomp filter patch
series.  By using PR_SET_NO_NEW_PRIVS, it allows a task to modify the
system call behavior for itself and its child tasks without being
able to impact the behavior of a more privileged task.

Another potential use is making certain privileged operations
unprivileged.  For example, chroot may be considered "safe" if it cannot
affect privileged tasks.

Note, this patch causes execve to fail when PR_SET_NO_NEW_PRIVS is
set and AppArmor is in use.  It is fixed in a subsequent patch.

Signed-off-by: Andy Lutomirski &lt;luto@amacapital.net&gt;
Signed-off-by: Will Drewry &lt;wad@chromium.org&gt;
Acked-by: Eric Paris &lt;eparis@redhat.com&gt;
Acked-by: Kees Cook &lt;keescook@chromium.org&gt;

v18: updated change desc
v17: using new define values as per 3.4
Signed-off-by: James Morris &lt;james.l.morris@oracle.com&gt;
</content>
</entry>
</feed>
