<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/futex.c, branch v2.6.24</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v2.6.24</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v2.6.24'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2008-01-09T00:21:39Z</updated>
<entry>
<title>futex: Prevent stale futex owner when interrupted/timeout</title>
<updated>2008-01-09T00:21:39Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2008-01-08T18:47:38Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=cdf71a10c7b6432d9b48e292cca2c62a0b9fa6cf'/>
<id>urn:sha1:cdf71a10c7b6432d9b48e292cca2c62a0b9fa6cf</id>
<content type='text'>
Roland Westrelin did a great analysis of a long standing thinko in the
return path of futex_lock_pi.

While we fixed the lock steal case long ago, which was easy to trigger,
we never had a test case which exposed this problem and stupidly never
thought about the reverse lock stealing scenario and the return to user
space with a stale state.

When a blocked tasks returns from rt_mutex_timed_locked without holding
the rt_mutex (due to a signal or timeout) and at the same time the task
holding the futex is releasing the futex and assigning the ownership of
the futex to the returning task, then it might happen that a third task
acquires the rt_mutex before the final rt_mutex_trylock() of the
returning task happens under the futex hash bucket lock. The returning
task returns to user space with ETIMEOUT or EINTR, but the user space
futex value is assigned to this task. The task which acquired the
rt_mutex fixes the user space futex value right after the hash bucket
lock has been released by the returning task, but for a short period of
time the user space value is wrong.

Detailed description is available at:

   https://bugzilla.redhat.com/show_bug.cgi?id=400541

The fix for this is the same as we do when the rt_mutex was acquired by
a higher priority task via lock stealing from the designated new owner.
In that case we already fix the user space value and the internal
pi_state up before we return. This mechanism can be used to fixup the
above corner case as well. When the returning task, which failed to
acquire the rt_mutex, notices that it is the designated owner of the
futex, then it fixes up the stale user space value and the pi_state,
before returning to user space. This happens with the futex hash bucket
lock held, so the task which acquired the rt_mutex is guaranteed to be
blocked on the hash bucket lock. We can access the rt_mutex owner, which
gives us the pid of the new owner, safely here as the owner is not able
to modify (release) it while waiting on the hash bucket lock.

Rename the "curr" argument of fixup_pi_state_owner() to "newowner" to
avoid confusion with current and add the check for the stale state into
the failure path of rt_mutex_trylock() in the return path of
unlock_futex_pi(). If the situation is detected use
fixup_pi_state_owner() to assign everything to the owner of the
rt_mutex.

Pointed-out-and-tested-by: Roland Westrelin &lt;roland.westrelin@sun.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>futex: correctly return -EFAULT not -EINVAL</title>
<updated>2007-12-05T14:46:09Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2007-12-05T14:46:09Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=cde898fa80a45bb23eab2a060fc79d0913081409'/>
<id>urn:sha1:cde898fa80a45bb23eab2a060fc79d0913081409</id>
<content type='text'>
return -EFAULT not -EINVAL. Found by review.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
</entry>
<entry>
<title>futex: fix for futex_wait signal stack corruption</title>
<updated>2007-12-05T14:46:09Z</updated>
<author>
<name>Steven Rostedt</name>
<email>srostedt@redhat.com</email>
</author>
<published>2007-12-05T14:46:09Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ce6bd420f43b28038a2c6e8fbb86ad24014727b6'/>
<id>urn:sha1:ce6bd420f43b28038a2c6e8fbb86ad24014727b6</id>
<content type='text'>
David Holmes found a bug in the -rt tree with respect to
pthread_cond_timedwait. After trying his test program on the latest git
from mainline, I found the bug was there too.  The bug he was seeing
that his test program showed, was that if one were to do a "Ctrl-Z" on a
process that was in the pthread_cond_timedwait, and then did a "bg" on
that process, it would return with a "-ETIMEDOUT" but early. That is,
the timer would go off early.

Looking into this, I found the source of the problem. And it is a rather
nasty bug at that.

Here's the relevant code from kernel/futex.c: (not in order in the file)

[...]
smlinkage long sys_futex(u32 __user *uaddr, int op, u32 val,
                          struct timespec __user *utime, u32 __user *uaddr2,
                          u32 val3)
{
        struct timespec ts;
        ktime_t t, *tp = NULL;
        u32 val2 = 0;
        int cmd = op &amp; FUTEX_CMD_MASK;

        if (utime &amp;&amp; (cmd == FUTEX_WAIT || cmd == FUTEX_LOCK_PI)) {
                if (copy_from_user(&amp;ts, utime, sizeof(ts)) != 0)
                        return -EFAULT;
                if (!timespec_valid(&amp;ts))
                        return -EINVAL;

                t = timespec_to_ktime(ts);
                if (cmd == FUTEX_WAIT)
                        t = ktime_add(ktime_get(), t);
                tp = &amp;t;
        }
[...]
        return do_futex(uaddr, op, val, tp, uaddr2, val2, val3);
}

[...]

long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
                u32 __user *uaddr2, u32 val2, u32 val3)
{
        int ret;
        int cmd = op &amp; FUTEX_CMD_MASK;
        struct rw_semaphore *fshared = NULL;

        if (!(op &amp; FUTEX_PRIVATE_FLAG))
                fshared = &amp;current-&gt;mm-&gt;mmap_sem;

        switch (cmd) {
        case FUTEX_WAIT:
                ret = futex_wait(uaddr, fshared, val, timeout);

[...]

static int futex_wait(u32 __user *uaddr, struct rw_semaphore *fshared,
                      u32 val, ktime_t *abs_time)
{
[...]
               struct restart_block *restart;
                restart = &amp;current_thread_info()-&gt;restart_block;
                restart-&gt;fn = futex_wait_restart;
                restart-&gt;arg0 = (unsigned long)uaddr;
                restart-&gt;arg1 = (unsigned long)val;
                restart-&gt;arg2 = (unsigned long)abs_time;
                restart-&gt;arg3 = 0;
                if (fshared)
                        restart-&gt;arg3 |= ARG3_SHARED;
                return -ERESTART_RESTARTBLOCK;
[...]

static long futex_wait_restart(struct restart_block *restart)
{
        u32 __user *uaddr = (u32 __user *)restart-&gt;arg0;
        u32 val = (u32)restart-&gt;arg1;
        ktime_t *abs_time = (ktime_t *)restart-&gt;arg2;
        struct rw_semaphore *fshared = NULL;

        restart-&gt;fn = do_no_restart_syscall;
        if (restart-&gt;arg3 &amp; ARG3_SHARED)
                fshared = &amp;current-&gt;mm-&gt;mmap_sem;
        return (long)futex_wait(uaddr, fshared, val, abs_time);
}

So when the futex_wait is interrupt by a signal we break out of the
hrtimer code and set up or return from signal. This code does not return
back to userspace, so we set up a RESTARTBLOCK.  The bug here is that we
save the "abs_time" which is a pointer to the stack variable "ktime_t t"
from sys_futex.

This returns and unwinds the stack before we get to call our signal. On
return from the signal we go to futex_wait_restart, where we update all
the parameters for futex_wait and call it. But here we have a problem
where abs_time is no longer valid.

I verified this with print statements, and sure enough, what abs_time
was set to ends up being garbage when we get to futex_wait_restart.

The solution I did to solve this (with input from Linus Torvalds)
was to add unions to the restart_block to allow system calls to
use the restart with specific parameters.  This way the futex code now
saves the time in a 64bit value in the restart block instead of storing
it on the stack.

Note: I'm a bit nervious to add "linux/types.h" and use u32 and u64
in thread_info.h, when there's a #ifdef __KERNEL__ just below that.
Not sure what that is there for.  If this turns out to be a problem, I've
tested this with using "unsigned int" for u32 and "unsigned long long" for
u64 and it worked just the same. I'm using u32 and u64 just to be
consistent with what the futex code uses.

Signed-off-by: Steven Rostedt &lt;srostedt@redhat.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>kernel/futex.c: make 3 functions static</title>
<updated>2007-11-05T10:53:46Z</updated>
<author>
<name>Adrian Bunk</name>
<email>bunk@kernel.org</email>
</author>
<published>2007-11-02T15:43:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=fad23fc78b959dae89768e523c3a6f5edb83bbe9'/>
<id>urn:sha1:fad23fc78b959dae89768e523c3a6f5edb83bbe9</id>
<content type='text'>
The following functions can now become static again:
- get_futex_key()
- get_futex_key_refs()
- drop_futex_key_refs()

Signed-off-by: Adrian Bunk &lt;bunk@kernel.org&gt;
Signed-off-by: Rusty Russell &lt;rusty@rustcorp.com.au&gt;
</content>
</entry>
<entry>
<title>Uninline find_task_by_xxx set of functions</title>
<updated>2007-10-19T18:53:40Z</updated>
<author>
<name>Pavel Emelyanov</name>
<email>xemul@openvz.org</email>
</author>
<published>2007-10-19T06:40:16Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=228ebcbe634a30aec35132ea4375721bcc41bec0'/>
<id>urn:sha1:228ebcbe634a30aec35132ea4375721bcc41bec0</id>
<content type='text'>
The find_task_by_something is a set of macros are used to find task by pid
depending on what kind of pid is proposed - global or virtual one.  All of
them are wrappers above the most generic one - find_task_by_pid_type_ns() -
and just substitute some args for it.

It turned out, that dereferencing the current-&gt;nsproxy-&gt;pid_ns construction
and pushing one more argument on the stack inline cause kernel text size to
grow.

This patch moves all this stuff out-of-line into kernel/pid.c.  Together
with the next patch it saves a bit less than 400 bytes from the .text
section.

Signed-off-by: Pavel Emelyanov &lt;xemul@openvz.org&gt;
Cc: Sukadev Bhattiprolu &lt;sukadev@us.ibm.com&gt;
Cc: Oleg Nesterov &lt;oleg@tv-sign.ru&gt;
Cc: Paul Menage &lt;menage@google.com&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Acked-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>pid namespaces: changes to show virtual ids to user</title>
<updated>2007-10-19T18:53:40Z</updated>
<author>
<name>Pavel Emelyanov</name>
<email>xemul@openvz.org</email>
</author>
<published>2007-10-19T06:40:14Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b488893a390edfe027bae7a46e9af8083e740668'/>
<id>urn:sha1:b488893a390edfe027bae7a46e9af8083e740668</id>
<content type='text'>
This is the largest patch in the set. Make all (I hope) the places where
the pid is shown to or get from user operate on the virtual pids.

The idea is:
 - all in-kernel data structures must store either struct pid itself
   or the pid's global nr, obtained with pid_nr() call;
 - when seeking the task from kernel code with the stored id one
   should use find_task_by_pid() call that works with global pids;
 - when showing pid's numerical value to the user the virtual one
   should be used, but however when one shows task's pid outside this
   task's namespace the global one is to be used;
 - when getting the pid from userspace one need to consider this as
   the virtual one and use appropriate task/pid-searching functions.

[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: nuther build fix]
[akpm@linux-foundation.org: yet nuther build fix]
[akpm@linux-foundation.org: remove unneeded casts]
Signed-off-by: Pavel Emelyanov &lt;xemul@openvz.org&gt;
Signed-off-by: Alexey Dobriyan &lt;adobriyan@openvz.org&gt;
Cc: Sukadev Bhattiprolu &lt;sukadev@us.ibm.com&gt;
Cc: Oleg Nesterov &lt;oleg@tv-sign.ru&gt;
Cc: Paul Menage &lt;menage@google.com&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>sparse pointer use of zero as null</title>
<updated>2007-10-18T21:37:31Z</updated>
<author>
<name>Stephen Hemminger</name>
<email>shemminger@linux-foundation.org</email>
</author>
<published>2007-10-18T10:07:05Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c80544dc0b87bb65038355e7aafdc30be16b26ab'/>
<id>urn:sha1:c80544dc0b87bb65038355e7aafdc30be16b26ab</id>
<content type='text'>
Get rid of sparse related warnings from places that use integer as NULL
pointer.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Stephen Hemminger &lt;shemminger@linux-foundation.org&gt;
Cc: Andi Kleen &lt;ak@suse.de&gt;
Cc: Jeff Garzik &lt;jeff@garzik.org&gt;
Cc: Matt Mackall &lt;mpm@selenic.com&gt;
Cc: Ian Kent &lt;raven@themaw.net&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Davide Libenzi &lt;davidel@xmailserver.org&gt;
Cc: Stephen Smalley &lt;sds@tycho.nsa.gov&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>change inotifyfs magic as the same magic is used for futexfs</title>
<updated>2007-10-17T15:43:00Z</updated>
<author>
<name>Andrey Mirkin</name>
<email>major@openvz.org</email>
</author>
<published>2007-10-17T06:30:13Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=fd5eea4214f72bd7ac77c1c5346a9c096319131a'/>
<id>urn:sha1:fd5eea4214f72bd7ac77c1c5346a9c096319131a</id>
<content type='text'>
Right now futexfs and inotifyfs have one magic 0xBAD1DEA, that looks a
little bit confusing.  Use 0xBAD1DEA as magic for futexfs and 0x2BAD1DEA as
magic for inotifyfs.

Signed-off-by: Andrey Mirkin &lt;major@openvz.org&gt;
Acked-by: Randy Dunlap &lt;randy.dunlap@oracle.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>robust futex thread exit race</title>
<updated>2007-10-01T14:52:23Z</updated>
<author>
<name>Martin Schwidefsky</name>
<email>schwidefsky@de.ibm.com</email>
</author>
<published>2007-10-01T08:20:13Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9f96cb1e8bca179a92afa40dfc3c49990f1cfc71'/>
<id>urn:sha1:9f96cb1e8bca179a92afa40dfc3c49990f1cfc71</id>
<content type='text'>
Calling handle_futex_death in exit_robust_list for the different robust
mutexes of a thread basically frees the mutex.  Another thread might grab
the lock immediately which updates the next pointer of the mutex.
fetch_robust_entry over the next pointer might therefore branch into the
robust mutex list of a different thread.  This can cause two problems: 1)
some mutexes held by the dead thread are not getting freed and 2) some
mutexs held by a different thread are freed.

The next point need to be read before calling handle_futex_death.

Signed-off-by: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Acked-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Acked-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>futex_unlock_pi() hurts my brain and may cause application deadlock</title>
<updated>2007-08-23T02:52:44Z</updated>
<author>
<name>john stultz</name>
<email>johnstul@us.ibm.com</email>
</author>
<published>2007-08-22T21:01:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=187226f57f1381cfc63216979b4375f30e593795'/>
<id>urn:sha1:187226f57f1381cfc63216979b4375f30e593795</id>
<content type='text'>
Avoid futex_unlock_pi returning -EFAULT (which results in deadlock), by
clearing uval before jumping to retry_locked.

Signed-off-by: John Stultz &lt;johnstul@us.ibm.com&gt;
Acked-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
