<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/futex.c, branch v3.14</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v3.14</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v3.14'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2014-03-21T05:11:17Z</updated>
<entry>
<title>futex: revert back to the explicit waiter counting code</title>
<updated>2014-03-21T05:11:17Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2014-03-21T05:11:17Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=11d4616bd07f38d496bd489ed8fad1dc4d928823'/>
<id>urn:sha1:11d4616bd07f38d496bd489ed8fad1dc4d928823</id>
<content type='text'>
Srikar Dronamraju reports that commit b0c29f79ecea ("futexes: Avoid
taking the hb-&gt;lock if there's nothing to wake up") causes java threads
getting stuck on futexes when runing specjbb on a power7 numa box.

The cause appears to be that the powerpc spinlocks aren't using the same
ticket lock model that we use on x86 (and other) architectures, which in
turn result in the "spin_is_locked()" test in hb_waiters_pending()
occasionally reporting an unlocked spinlock even when there are pending
waiters.

So this reinstates Davidlohr Bueso's original explicit waiter counting
code, which I had convinced Davidlohr to drop in favor of figuring out
the pending waiters by just using the existing state of the spinlock and
the wait queue.

Reported-and-tested-by: Srikar Dronamraju &lt;srikar@linux.vnet.ibm.com&gt;
Original-code-by: Davidlohr Bueso &lt;davidlohr@hp.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2014-01-20T18:42:08Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2014-01-20T18:42:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a0fa1dd3cdbccec9597fe53b6177a9aa6e20f2f8'/>
<id>urn:sha1:a0fa1dd3cdbccec9597fe53b6177a9aa6e20f2f8</id>
<content type='text'>
Pull scheduler changes from Ingo Molnar:

 - Add the initial implementation of SCHED_DEADLINE support: a real-time
   scheduling policy where tasks that meet their deadlines and
   periodically execute their instances in less than their runtime quota
   see real-time scheduling and won't miss any of their deadlines.
   Tasks that go over their quota get delayed (Available to privileged
   users for now)

 - Clean up and fix preempt_enable_no_resched() abuse all around the
   tree

 - Do sched_clock() performance optimizations on x86 and elsewhere

 - Fix and improve auto-NUMA balancing

 - Fix and clean up the idle loop

 - Apply various cleanups and fixes

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
  sched: Fix __sched_setscheduler() nice test
  sched: Move SCHED_RESET_ON_FORK into attr::sched_flags
  sched: Fix up attr::sched_priority warning
  sched: Fix up scheduler syscall LTP fails
  sched: Preserve the nice level over sched_setscheduler() and sched_setparam() calls
  sched/core: Fix htmldocs warnings
  sched/deadline: No need to check p if dl_se is valid
  sched/deadline: Remove unused variables
  sched/deadline: Fix sparse static warnings
  m68k: Fix build warning in mac_via.h
  sched, thermal: Clean up preempt_enable_no_resched() abuse
  sched, net: Fixup busy_loop_us_clock()
  sched, net: Clean up preempt_enable_no_resched() abuse
  sched/preempt: Fix up missed PREEMPT_NEED_RESCHED folding
  sched/preempt, locking: Rework local_bh_{dis,en}able()
  sched/clock, x86: Avoid a runtime condition in native_sched_clock()
  sched/clock: Fix up clear_sched_clock_stable()
  sched/clock, x86: Use a static_key for sched_clock_stable
  sched/clock: Remove local_irq_disable() from the clocks
  sched/clock, x86: Rewrite cyc2ns() to avoid the need to disable IRQs
  ...
</content>
</entry>
<entry>
<title>futexes: Fix futex_hashsize initialization</title>
<updated>2014-01-16T14:14:32Z</updated>
<author>
<name>Heiko Carstens</name>
<email>heiko.carstens@de.ibm.com</email>
</author>
<published>2014-01-16T13:54:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=63b1a81699c2a45c9f737419b1ec1da0ecf92812'/>
<id>urn:sha1:63b1a81699c2a45c9f737419b1ec1da0ecf92812</id>
<content type='text'>
"futexes: Increase hash table size for better performance"
introduces a new alloc_large_system_hash() call.

alloc_large_system_hash() however may allocate less memory than
requested, e.g. limited by MAX_ORDER.

Hence pass a pointer to alloc_large_system_hash() which will
contain the hash shift when the function returns. Afterwards
correctly set futex_hashsize.

Fixes a crash on s390 where the requested allocation size was
4MB but only 1MB was allocated.

Signed-off-by: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: Darren Hart &lt;dvhart@linux.intel.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Waiman Long &lt;Waiman.Long@hp.com&gt;
Cc: Jason Low &lt;jason.low2@hp.com&gt;
Cc: Davidlohr Bueso &lt;davidlohr@hp.com&gt;
Link: http://lkml.kernel.org/r/20140116135450.GA4345@osiris
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>rtmutex: Turn the plist into an rb-tree</title>
<updated>2014-01-13T12:41:50Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2013-11-07T13:43:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=fb00aca474405f4fa8a8519c3179fed722eabd83'/>
<id>urn:sha1:fb00aca474405f4fa8a8519c3179fed722eabd83</id>
<content type='text'>
Turn the pi-chains from plist to rb-tree, in the rt_mutex code,
and provide a proper comparison function for -deadline and
-priority tasks.

This is done mainly because:
 - classical prio field of the plist is just an int, which might
   not be enough for representing a deadline;
 - manipulating such a list would become O(nr_deadline_tasks),
   which might be to much, as the number of -deadline task increases.

Therefore, an rb-tree is used, and tasks are queued in it according
to the following logic:
 - among two -priority (i.e., SCHED_BATCH/OTHER/RR/FIFO) tasks, the
   one with the higher (lower, actually!) prio wins;
 - among a -priority and a -deadline task, the latter always wins;
 - among two -deadline tasks, the one with the earliest deadline
   wins.

Queueing and dequeueing functions are changed accordingly, for both
the list of a task's pi-waiters and the list of tasks blocked on
a pi-lock.

Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Signed-off-by: Dario Faggioli &lt;raistlin@linux.it&gt;
Signed-off-by: Juri Lelli &lt;juri.lelli@gmail.com&gt;
Signed-off-again-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: http://lkml.kernel.org/r/1383831828-15501-10-git-send-email-juri.lelli@gmail.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>futexes: Avoid taking the hb-&gt;lock if there's nothing to wake up</title>
<updated>2014-01-13T10:45:21Z</updated>
<author>
<name>Davidlohr Bueso</name>
<email>davidlohr@hp.com</email>
</author>
<published>2014-01-12T23:31:25Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b0c29f79ecea0b6fbcefc999e70f2843ae8306db'/>
<id>urn:sha1:b0c29f79ecea0b6fbcefc999e70f2843ae8306db</id>
<content type='text'>
In futex_wake() there is clearly no point in taking the hb-&gt;lock
if we know beforehand that there are no tasks to be woken. While
the hash bucket's plist head is a cheap way of knowing this, we
cannot rely 100% on it as there is a racy window between the
futex_wait call and when the task is actually added to the
plist. To this end, we couple it with the spinlock check as
tasks trying to enter the critical region are most likely
potential waiters that will be added to the plist, thus
preventing tasks sleeping forever if wakers don't acknowledge
all possible waiters.

Furthermore, the futex ordering guarantees are preserved,
ensuring that waiters either observe the changed user space
value before blocking or is woken by a concurrent waker. For
wakers, this is done by relying on the barriers in
get_futex_key_refs() -- for archs that do not have implicit mb
in atomic_inc(), we explicitly add them through a new
futex_get_mm function. For waiters we rely on the fact that
spin_lock calls already update the head counter, so spinners
are visible even if the lock hasn't been acquired yet.

For more details please refer to the updated comments in the
code and related discussion:

  https://lkml.org/lkml/2013/11/26/556

Special thanks to tglx for careful review and feedback.

Suggested-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Reviewed-by: Darren Hart &lt;dvhart@linux.intel.com&gt;
Reviewed-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Signed-off-by: Davidlohr Bueso &lt;davidlohr@hp.com&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Jeff Mahoney &lt;jeffm@suse.com&gt;
Cc: Scott Norton &lt;scott.norton@hp.com&gt;
Cc: Tom Vaden &lt;tom.vaden@hp.com&gt;
Cc: Aswin Chandramouleeswaran &lt;aswin@hp.com&gt;
Cc: Waiman Long &lt;Waiman.Long@hp.com&gt;
Cc: Jason Low &lt;jason.low2@hp.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Link: http://lkml.kernel.org/r/1389569486-25487-5-git-send-email-davidlohr@hp.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>futexes: Document multiprocessor ordering guarantees</title>
<updated>2014-01-13T10:45:19Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2014-01-12T23:31:24Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=99b60ce69734dfeda58c6184a326b9475ce1dba3'/>
<id>urn:sha1:99b60ce69734dfeda58c6184a326b9475ce1dba3</id>
<content type='text'>
That's essential, if you want to hack on futexes.

Reviewed-by: Darren Hart &lt;dvhart@linux.intel.com&gt;
Reviewed-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Reviewed-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Davidlohr Bueso &lt;davidlohr@hp.com&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Jeff Mahoney &lt;jeffm@suse.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Cc: Scott Norton &lt;scott.norton@hp.com&gt;
Cc: Tom Vaden &lt;tom.vaden@hp.com&gt;
Cc: Aswin Chandramouleeswaran &lt;aswin@hp.com&gt;
Cc: Waiman Long &lt;Waiman.Long@hp.com&gt;
Cc: Jason Low &lt;jason.low2@hp.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Link: http://lkml.kernel.org/r/1389569486-25487-4-git-send-email-davidlohr@hp.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>futexes: Increase hash table size for better performance</title>
<updated>2014-01-13T10:45:18Z</updated>
<author>
<name>Davidlohr Bueso</name>
<email>davidlohr@hp.com</email>
</author>
<published>2014-01-12T23:31:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a52b89ebb6d4499be38780db8d176c5d3a6fbc17'/>
<id>urn:sha1:a52b89ebb6d4499be38780db8d176c5d3a6fbc17</id>
<content type='text'>
Currently, the futex global hash table suffers from its fixed,
smallish (for today's standards) size of 256 entries, as well as
its lack of NUMA awareness. Large systems, using many futexes,
can be prone to high amounts of collisions; where these futexes
hash to the same bucket and lead to extra contention on the same
hb-&gt;lock. Furthermore, cacheline bouncing is a reality when we
have multiple hb-&gt;locks residing on the same cacheline and
different futexes hash to adjacent buckets.

This patch keeps the current static size of 16 entries for small
systems, or otherwise, 256 * ncpus (or larger as we need to
round the number to a power of 2). Note that this number of CPUs
accounts for all CPUs that can ever be available in the system,
taking into consideration things like hotpluging. While we do
impose extra overhead at bootup by making the hash table larger,
this is a one time thing, and does not shadow the benefits of
this patch.

Furthermore, as suggested by tglx, by cache aligning the hash
buckets we can avoid access across cacheline boundaries and also
avoid massive cache line bouncing if multiple cpus are hammering
away at different hash buckets which happen to reside in the
same cache line.

Also, similar to other core kernel components (pid, dcache,
tcp), by using alloc_large_system_hash() we benefit from its
NUMA awareness and thus the table is distributed among the nodes
instead of in a single one.

For a custom microbenchmark that pounds on the uaddr hashing --
making the wait path fail at futex_wait_setup() returning
-EWOULDBLOCK for large amounts of futexes, we can see the
following benefits on a 80-core, 8-socket 1Tb server:

 +---------+--------------------+------------------------+-----------------------+-------------------------------+
 | threads | baseline (ops/sec) | aligned-only (ops/sec) | large table (ops/sec) | large table+aligned (ops/sec) |
 +---------+--------------------+------------------------+-----------------------+-------------------------------+
 |     512 |              32426 | 50531  (+55.8%)        | 255274  (+687.2%)     | 292553  (+802.2%)             |
 |     256 |              65360 | 99588  (+52.3%)        | 443563  (+578.6%)     | 508088  (+677.3%)             |
 |     128 |             125635 | 200075 (+59.2%)        | 742613  (+491.1%)     | 835452  (+564.9%)             |
 |      80 |             193559 | 323425 (+67.1%)        | 1028147 (+431.1%)     | 1130304 (+483.9%)             |
 |      64 |             247667 | 443740 (+79.1%)        | 997300  (+302.6%)     | 1145494 (+362.5%)             |
 |      32 |             628412 | 721401 (+14.7%)        | 965996  (+53.7%)      | 1122115 (+78.5%)              |
 +---------+--------------------+------------------------+-----------------------+-------------------------------+

Reviewed-by: Darren Hart &lt;dvhart@linux.intel.com&gt;
Reviewed-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Reviewed-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Reviewed-by: Waiman Long &lt;Waiman.Long@hp.com&gt;
Reviewed-and-tested-by: Jason Low &lt;jason.low2@hp.com&gt;
Reviewed-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Davidlohr Bueso &lt;davidlohr@hp.com&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Jeff Mahoney &lt;jeffm@suse.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Scott Norton &lt;scott.norton@hp.com&gt;
Cc: Tom Vaden &lt;tom.vaden@hp.com&gt;
Cc: Aswin Chandramouleeswaran &lt;aswin@hp.com&gt;
Link: http://lkml.kernel.org/r/1389569486-25487-3-git-send-email-davidlohr@hp.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>futexes: Clean up various details</title>
<updated>2014-01-13T10:45:17Z</updated>
<author>
<name>Jason Low</name>
<email>jason.low2@hp.com</email>
</author>
<published>2014-01-12T23:31:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0d00c7b20c7716ce08399570ea48813ecf001aa8'/>
<id>urn:sha1:0d00c7b20c7716ce08399570ea48813ecf001aa8</id>
<content type='text'>
- Remove unnecessary head variables.
- Delete unused parameter in queue_unlock().

Reviewed-by: Darren Hart &lt;dvhart@linux.intel.com&gt;
Reviewed-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Reviewed-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Reviewed-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Jason Low &lt;jason.low2@hp.com&gt;
Signed-off-by: Davidlohr Bueso &lt;davidlohr@hp.com&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Jeff Mahoney &lt;jeffm@suse.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Scott Norton &lt;scott.norton@hp.com&gt;
Cc: Tom Vaden &lt;tom.vaden@hp.com&gt;
Cc: Aswin Chandramouleeswaran &lt;aswin@hp.com&gt;
Cc: Waiman Long &lt;Waiman.Long@hp.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Link: http://lkml.kernel.org/r/1389569486-25487-2-git-send-email-davidlohr@hp.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>futex: move user address verification up to common code</title>
<updated>2013-12-12T17:53:51Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2013-12-12T17:53:51Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5cdec2d833748fbd27d3682f7209225c504c79c5'/>
<id>urn:sha1:5cdec2d833748fbd27d3682f7209225c504c79c5</id>
<content type='text'>
When debugging the read-only hugepage case, I was confused by the fact
that get_futex_key() did an access_ok() only for the non-shared futex
case, since the user address checking really isn't in any way specific
to the private key handling.

Now, it turns out that the shared key handling does effectively do the
equivalent checks inside get_user_pages_fast() (it doesn't actually
check the address range on x86, but does check the page protections for
being a user page).  So it wasn't actually a bug, but the fact that we
treat the address differently for private and shared futexes threw me
for a loop.

Just move the check up, so that it gets done for both cases.  Also, use
the 'rw' parameter for the type, even if it doesn't actually matter any
more (it's a historical artifact of the old racy i386 "page faults from
kernel space don't check write protections").

Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>futex: fix handling of read-only-mapped hugepages</title>
<updated>2013-12-12T17:38:42Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2013-12-12T17:38:42Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f12d5bfceb7e1f9051563381ec047f7f13956c3c'/>
<id>urn:sha1:f12d5bfceb7e1f9051563381ec047f7f13956c3c</id>
<content type='text'>
The hugepage code had the exact same bug that regular pages had in
commit 7485d0d3758e ("futexes: Remove rw parameter from
get_futex_key()").

The regular page case was fixed by commit 9ea71503a8ed ("futex: Fix
regression with read only mappings"), but the transparent hugepage case
(added in a5b338f2b0b1: "thp: update futex compound knowledge") case
remained broken.

Found by Dave Jones and his trinity tool.

Reported-and-tested-by: Dave Jones &lt;davej@fedoraproject.org&gt;
Cc: stable@kernel.org # v2.6.38+
Acked-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Darren Hart &lt;dvhart@linux.intel.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
