<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/exit.c, branch v2.6.37</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v2.6.37</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v2.6.37'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2010-12-02T22:51:16Z</updated>
<entry>
<title>do_exit(): make sure that we run with get_fs() == USER_DS</title>
<updated>2010-12-02T22:51:16Z</updated>
<author>
<name>Nelson Elhage</name>
<email>nelhage@ksplice.com</email>
</author>
<published>2010-12-02T22:31:21Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=33dd94ae1ccbfb7bf0fb6c692bc3d1c4269e6177'/>
<id>urn:sha1:33dd94ae1ccbfb7bf0fb6c692bc3d1c4269e6177</id>
<content type='text'>
If a user manages to trigger an oops with fs set to KERNEL_DS, fs is not
otherwise reset before do_exit().  do_exit may later (via mm_release in
fork.c) do a put_user to a user-controlled address, potentially allowing
a user to leverage an oops into a controlled write into kernel memory.

This is only triggerable in the presence of another bug, but this
potentially turns a lot of DoS bugs into privilege escalations, so it's
worth fixing.  I have proof-of-concept code which uses this bug along
with CVE-2010-3849 to write a zero to an arbitrary kernel address, so
I've tested that this is not theoretical.

A more logical place to put this fix might be when we know an oops has
occurred, before we call do_exit(), but that would involve changing
every architecture, in multiple places.

Let's just stick it in do_exit instead.

[akpm@linux-foundation.org: update code comment]
Signed-off-by: Nelson Elhage &lt;nelhage@ksplice.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: &lt;stable@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>posix-cpu-timers: workaround to suppress the problems with mt exec</title>
<updated>2010-11-05T21:16:03Z</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2010-11-05T15:53:42Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e0a70217107e6f9844628120412cb27bb4cea194'/>
<id>urn:sha1:e0a70217107e6f9844628120412cb27bb4cea194</id>
<content type='text'>
posix-cpu-timers.c correctly assumes that the dying process does
posix_cpu_timers_exit_group() and removes all !CPUCLOCK_PERTHREAD
timers from signal-&gt;cpu_timers list.

But, it also assumes that timer-&gt;it.cpu.task is always the group
leader, and thus the dead -&gt;task means the dead thread group.

This is obviously not true after de_thread() changes the leader.
After that almost every posix_cpu_timer_ method has problems.

It is not simple to fix this bug correctly. First of all, I think
that timer-&gt;it.cpu should use struct pid instead of task_struct.
Also, the locking should be reworked completely. In particular,
tasklist_lock should not be used at all. This all needs a lot of
nontrivial and hard-to-test changes.

Change __exit_signal() to do posix_cpu_timers_exit_group() when
the old leader dies during exec. This is not a fix, just a
temporary hack to hide the problem for 2.6.37 and stable. IOW,
this is obviously wrong but this is what we currently have anyway:
cpu timers do not work after mt exec.

In theory this change adds another race. The exiting leader can
detach the timers which were attached to the new leader. However,
the window between de_thread() and release_task() is small, so we
can pretend that sys_timer_create() was called before de_thread().

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>exit: add lock context annotation on find_new_reaper()</title>
<updated>2010-10-28T01:03:13Z</updated>
<author>
<name>Namhyung Kim</name>
<email>namhyung@gmail.com</email>
</author>
<published>2010-10-27T22:34:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d16e15f5b029fc7d03540ba0e5fb23b0abb0ebe0'/>
<id>urn:sha1:d16e15f5b029fc7d03540ba0e5fb23b0abb0ebe0</id>
<content type='text'>
find_new_reaper() releases and regrabs tasklist_lock but was missing
the proper annotations.  Add them.  This removes the following sparse warning:

 warning: context imbalance in 'find_new_reaper' - unexpected unlock

Signed-off-by: Namhyung Kim &lt;namhyung@gmail.com&gt;
Acked-by: Roland McGrath &lt;roland@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>oom: add per-mm oom disable count</title>
<updated>2010-10-26T23:52:05Z</updated>
<author>
<name>Ying Han</name>
<email>yinghan@google.com</email>
</author>
<published>2010-10-26T21:21:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3d5992d2ac7dc09aed8ab537cba074589f0f0a52'/>
<id>urn:sha1:3d5992d2ac7dc09aed8ab537cba074589f0f0a52</id>
<content type='text'>
It's pointless to kill a task if another thread sharing its mm cannot be
killed to allow future memory freeing.  A subsequent patch will prevent
kills in such cases, but first it's necessary to have a way to flag a task
that shares memory with an OOM_DISABLE task that doesn't incur an
additional tasklist scan, which would make select_bad_process() an O(n^2)
function.

This patch adds an atomic counter to struct mm_struct that follows how
many threads attached to it have an oom_score_adj of OOM_SCORE_ADJ_MIN.
They cannot be killed by the kernel, so their memory cannot be freed in
oom conditions.

This only requires task_lock() on the task that we're operating on; it
does not require mm-&gt;mmap_sem since task_lock() pins the mm and the
operation is atomic.

[rientjes@google.com: changelog and sys_unshare() code]
[rientjes@google.com: protect oom_disable_count with task_lock in fork]
[rientjes@google.com: use old_mm for oom_disable_count in exec]
Signed-off-by: Ying Han &lt;yinghan@google.com&gt;
Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>perf: Fix up delayed_put_task_struct()</title>
<updated>2010-09-09T19:07:09Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>a.p.zijlstra@chello.nl</email>
</author>
<published>2010-09-09T19:01:59Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4e231c7962ce711c7d8c2a4dc23ecd1e8fc28363'/>
<id>urn:sha1:4e231c7962ce711c7d8c2a4dc23ecd1e8fc28363</id>
<content type='text'>
I missed a perf_event_ctxp user when converting it to an array. Pull this
last user into perf_event.c as well and fix it up.

Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
LKML-Reference: &lt;new-submission&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
</entry>
<entry>
<title>Fix unprotected access to task credentials in waitid()</title>
<updated>2010-08-18T01:07:43Z</updated>
<author>
<name>Daniel J Blueman</name>
<email>daniel.blueman@gmail.com</email>
</author>
<published>2010-08-17T22:56:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f362b73244fb16ea4ae127ced1467dd8adaa7733'/>
<id>urn:sha1:f362b73244fb16ea4ae127ced1467dd8adaa7733</id>
<content type='text'>
Using a program like the following:

	#include &lt;stdlib.h&gt;
	#include &lt;unistd.h&gt;
	#include &lt;sys/types.h&gt;
	#include &lt;sys/wait.h&gt;

	int main() {
		id_t id;
		siginfo_t infop;
		pid_t res;

		id = fork();
		if (id == 0) { sleep(1); exit(0); }
		kill(id, SIGSTOP);
		alarm(1);
		waitid(P_PID, id, &amp;infop, WCONTINUED);
		return 0;
	}

to call waitid() on a stopped process results in access to the child task's
credentials, which may be replaced in the meantime, without the RCU read lock
being held. This elicits the following warning:

	===================================================
	[ INFO: suspicious rcu_dereference_check() usage. ]
	---------------------------------------------------
	kernel/exit.c:1460 invoked rcu_dereference_check() without protection!

	other info that might help us debug this:

	rcu_scheduler_active = 1, debug_locks = 1
	2 locks held by waitid02/22252:
	 #0:  (tasklist_lock){.?.?..}, at: [&lt;ffffffff81061ce5&gt;] do_wait+0xc5/0x310
	 #1:  (&amp;(&amp;sighand-&gt;siglock)-&gt;rlock){-.-...}, at: [&lt;ffffffff810611da&gt;] wait_consider_task+0x19a/0xbe0

	stack backtrace:
	Pid: 22252, comm: waitid02 Not tainted 2.6.35-323cd+ #3
	Call Trace:
	 [&lt;ffffffff81095da4&gt;] lockdep_rcu_dereference+0xa4/0xc0
	 [&lt;ffffffff81061b31&gt;] wait_consider_task+0xaf1/0xbe0
	 [&lt;ffffffff81061d15&gt;] do_wait+0xf5/0x310
	 [&lt;ffffffff810620b6&gt;] sys_waitid+0x86/0x1f0
	 [&lt;ffffffff8105fce0&gt;] ? child_wait_callback+0x0/0x70
	 [&lt;ffffffff81003282&gt;] system_call_fastpath+0x16/0x1b

This is fixed by holding the RCU read lock in wait_task_continued() to ensure
that the task's current credentials aren't destroyed between us reading the
cred pointer and us reading the UID from those credentials.

Furthermore, protect wait_task_stopped() in the same way.

We don't need to keep holding the RCU read lock once we've read the UID from
the credentials as holding the RCU read lock doesn't stop the target task from
changing its creds under us - so the credentials may be outdated immediately
after we've read the pointer, lock or no lock.

Signed-off-by: Daniel J Blueman &lt;daniel.blueman@gmail.com&gt;
Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Acked-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Acked-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>ptrace: optimize exit_ptrace() for the likely case</title>
<updated>2010-08-11T15:59:19Z</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2010-08-11T01:03:07Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c7e49c1488ab20342eaaf38f1ca35a207f4c051d'/>
<id>urn:sha1:c7e49c1488ab20342eaaf38f1ca35a207f4c051d</id>
<content type='text'>
exit_ptrace() takes tasklist_lock unconditionally.  We need this lock to
avoid the race with ptrace_traceme(); it acts as a barrier.

Change its caller, forget_original_parent(), to call exit_ptrace() under
tasklist_lock.  Change exit_ptrace() to drop and reacquire this lock if
needed.

This allows us to add the fastpath list_empty(ptraced) check.  In the
likely no-tracees case exit_ptrace() just returns and we avoid the lock()
+ unlock() sequence.

"Zhang, Yanmin" &lt;yanmin_zhang@linux.intel.com&gt; suggested adding this
check, and he reports that this change gives about an 11% improvement in some
tests.

Suggested-and-tested-by: "Zhang, Yanmin" &lt;yanmin_zhang@linux.intel.com&gt;
Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Roland McGrath &lt;roland@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>proc: turn signal_struct-&gt;count into "int nr_threads"</title>
<updated>2010-05-27T16:12:47Z</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2010-05-26T21:43:24Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b3ac022cb9dc5883505a88b159d1b240ad1ef405'/>
<id>urn:sha1:b3ac022cb9dc5883505a88b159d1b240ad1ef405</id>
<content type='text'>
No functional changes, just s/atomic_t count/int nr_threads/.

With the recent changes this counter has a single user, get_nr_threads().
None of its callers needs a really accurate number of threads, not
to mention that each caller obviously races with fork/exit.  It is only used to
report this value to user-space, except that first_tid() uses it to avoid
the unnecessary while_each_thread() loop in the unlikely case.

It is a bit sad we need a word in struct signal_struct for this, perhaps
we can change get_nr_threads() to approximate the number of threads using
signal-&gt;live and kill -&gt;nr_threads later.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Acked-by: Roland McGrath &lt;roland@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>exit: move taskstats_tgid_free() from __exit_signal() to free_signal_struct()</title>
<updated>2010-05-27T16:12:46Z</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2010-05-26T21:43:20Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=97101eb41d0d3c97543878ce40e0b8a8b2747ed7'/>
<id>urn:sha1:97101eb41d0d3c97543878ce40e0b8a8b2747ed7</id>
<content type='text'>
Move taskstats_tgid_free() from __exit_signal() to free_signal_struct().

This way signal-&gt;stats is never left dangling and we can read -&gt;stats
locklessly.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Balbir Singh &lt;balbir@linux.vnet.ibm.com&gt;
Cc: Roland McGrath &lt;roland@redhat.com&gt;
Cc: Veaceslav Falico &lt;vfalico@redhat.com&gt;
Cc: Stanislaw Gruszka &lt;sgruszka@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>exit: __exit_signal: use thread_group_leader() consistently</title>
<updated>2010-05-27T16:12:46Z</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2010-05-26T21:43:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d40e48e02f3785b9342ee4eb3d7cc9f12981b7f5'/>
<id>urn:sha1:d40e48e02f3785b9342ee4eb3d7cc9f12981b7f5</id>
<content type='text'>
Cleanup:

- Add the boolean, group_dead = thread_group_leader(), for clarity.

- Do not test/set sig == NULL to detect the all-dead case, use this
  boolean.

- Pass this boolean to __unhash_process() and use it instead of another
  thread_group_leader() call which needs -&gt;group_leader.

  This can be considered a micro-optimization, but hopefully this also
  allows us to do other cleanups later.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Balbir Singh &lt;balbir@linux.vnet.ibm.com&gt;
Cc: Roland McGrath &lt;roland@redhat.com&gt;
Cc: Veaceslav Falico &lt;vfalico@redhat.com&gt;
Cc: Stanislaw Gruszka &lt;sgruszka@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
