<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/locking, branch v4.5</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v4.5</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v4.5'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2016-02-14T20:02:05Z</updated>
<entry>
<title>Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2016-02-14T20:02:05Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2016-02-14T20:02:05Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=cb490d632b9c7a3f08a9c73b88956132b6ad299c'/>
<id>urn:sha1:cb490d632b9c7a3f08a9c73b88956132b6ad299c</id>
<content type='text'>
Pull lockdep fix from Thomas Gleixner:
 "A single fix for the stack trace caching logic in lockdep, where the
  duplicate avoidance managed to store no back trace at all"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/lockdep: Fix stack trace caching logic
</content>
</entry>
<entry>
<title>kernel/locking/lockdep.c: convert hash tables to hlists</title>
<updated>2016-02-12T02:35:48Z</updated>
<author>
<name>Andrew Morton</name>
<email>akpm@linux-foundation.org</email>
</author>
<published>2016-02-12T00:13:14Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4a389810bc3cb0e73443104f0827e81e23cb1e12'/>
<id>urn:sha1:4a389810bc3cb0e73443104f0827e81e23cb1e12</id>
<content type='text'>
Mike said:

: CONFIG_UBSAN_ALIGNMENT breaks the x86-64 kernel with lockdep enabled,
: i.e. a kernel with CONFIG_UBSAN_ALIGNMENT fails to load without
: printing any error message.
:
: The problem is that the ubsan callbacks use spinlocks and might be
: called before lockdep is initialized.  In particular, this line in the
: reserve_ebda_region function causes the problem:
:
: lowmem = *(unsigned short *)__va(BIOS_LOWMEM_KILOBYTES);
:
: If I put lockdep_init() before the reserve_ebda_region call in
: x86_64_start_reservations, the kernel loads fine.

Fix this ordering issue permanently: change lockdep so that it uses
hlists for the hash tables.  Unlike a list_head, an hlist_head is in its
initialized state when it is all-zeroes, so lockdep is ready for
operation immediately upon boot - lockdep_init() need not have run.

The patch also saves some memory, since an hlist_head is a single
pointer where a list_head is two.

lockdep_init() and lockdep_initialized can be done away with now - a 4.6
patch has been prepared to do this.

Reported-by: Mike Krinkin &lt;krinkin.m.u@gmail.com&gt;
Suggested-by: Mike Krinkin &lt;krinkin.m.u@gmail.com&gt;
Cc: Andrey Ryabinin &lt;aryabinin@virtuozzo.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>locking/lockdep: Fix stack trace caching logic</title>
<updated>2016-02-09T10:06:08Z</updated>
<author>
<name>Dmitry Vyukov</name>
<email>dvyukov@google.com</email>
</author>
<published>2016-02-04T13:40:40Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8a5fd56431fe1682e870bd6ab0c276e74befbeb9'/>
<id>urn:sha1:8a5fd56431fe1682e870bd6ab0c276e74befbeb9</id>
<content type='text'>
check_prev_add() caches the saved stack trace in a static trace variable
to avoid duplicate save_trace() calls in dependencies involving trylocks.
But that caching logic contains a bug: we may not save the trace on the
first iteration due to an early return from check_prev_add(), and then on
the second iteration, when we actually need the trace, we don't save it
because we think we've already saved it.

Let check_prev_add() itself control when the stack trace is saved.

There is another bug: the trace variable is protected by the graph lock,
but we can temporarily release the graph lock during printing.

Fix this by invalidating the cached stack trace when we release the
graph lock, as sketched below.
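
In condensed sketch form (control flow simplified and the threading of
the flag from the caller omitted), the two fixes look like this:

	static int check_prev_add(..., int *stack_saved)
	{
		...
		if (!*stack_saved) {
			if (!save_trace(&amp;trace))
				return 0;
			*stack_saved = 1;
		}
		...
		/*
		 * Printing drops the graph lock, so another thread may
		 * clobber the static trace: invalidate the cache.
		 */
		*stack_saved = 0;
		...
	}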

Signed-off-by: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: glider@google.com
Cc: kcc@google.com
Cc: peter@hurleysoftware.com
Cc: sasha.levin@oracle.com
Link: http://lkml.kernel.org/r/1454593240-121647-1-git-send-email-dvyukov@google.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>rtmutex: Make wait_lock irq safe</title>
<updated>2016-01-26T10:08:35Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2016-01-13T10:25:38Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b4abf91047cf054f203dcfac97e1038388826937'/>
<id>urn:sha1:b4abf91047cf054f203dcfac97e1038388826937</id>
<content type='text'>
Sasha reported a lockdep splat about a potential deadlock between the
rtmutex used for RCU boosting and the posix timer it_lock:

CPU0					CPU1

rtmutex_lock(&amp;rcu-&gt;rt_mutex)
  spin_lock(&amp;rcu-&gt;rt_mutex.wait_lock)
					local_irq_disable()
					spin_lock(&amp;timer-&gt;it_lock)
					spin_lock(&amp;rcu-&gt;rt_mutex.wait_lock)
--&gt; Interrupt
    spin_lock(&amp;timer-&gt;it_lock)

This is caused by the following code sequence on CPU1:

     rcu_read_lock()
     x = lookup();
     if (x)
     	spin_lock_irqsave(&amp;x-&gt;it_lock, flags);
     rcu_read_unlock();
     return x;

We could fix that in the posix timer code by keeping the RCU read lock
held across the spinlocked, irq-disabled section, but the above sequence
is common and there is no reason not to support it.

Making rt_mutex.wait_lock irq safe prevents the deadlock.
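
In sketch form, the conversion is the usual one (illustrative, not the
complete patch):

	-	raw_spin_lock(&amp;lock-&gt;wait_lock);
	+	raw_spin_lock_irqsave(&amp;lock-&gt;wait_lock, flags);
	...
	-	raw_spin_unlock(&amp;lock-&gt;wait_lock);
	+	raw_spin_unlock_irqrestore(&amp;lock-&gt;wait_lock, flags);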

Reported-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Paul McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2016-01-11T22:18:38Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2016-01-11T22:18:38Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=24af98c4cf5f5e69266e270c7f3fb34b82ff6656'/>
<id>urn:sha1:24af98c4cf5f5e69266e270c7f3fb34b82ff6656</id>
<content type='text'>
Pull locking updates from Ingo Molnar:
 "So we have a laundry list of locking subsystem changes:

   - continuing barrier API and code improvements

   - futex enhancements

   - atomics API improvements

   - pvqspinlock enhancements: in particular lock stealing and adaptive
     spinning

   - qspinlock micro-enhancements"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  futex: Allow FUTEX_CLOCK_REALTIME with FUTEX_WAIT op
  futex: Cleanup the goto confusion in requeue_pi()
  futex: Remove pointless put_pi_state calls in requeue()
  futex: Document pi_state refcounting in requeue code
  futex: Rename free_pi_state() to put_pi_state()
  futex: Drop refcount if requeue_pi() acquired the rtmutex
  locking/barriers, arch: Remove ambiguous statement in the smp_store_mb() documentation
  locking/barriers, arch: Use smp barriers in smp_store_release()
  locking/cmpxchg, arch: Remove tas() definitions
  locking/pvqspinlock: Queue node adaptive spinning
  locking/pvqspinlock: Allow limited lock stealing
  locking/pvqspinlock: Collect slowpath lock statistics
  sched/core, locking: Document Program-Order guarantees
  locking, sched: Introduce smp_cond_acquire() and use it
  locking/pvqspinlock, x86: Optimize the PV unlock code path
  locking/qspinlock: Avoid redundant read of next pointer
  locking/qspinlock: Prefetch the next node cacheline
  locking/qspinlock: Use _acquire/_release() versions of cmpxchg() &amp; xchg()
  atomics: Add test for atomic operations with _relaxed variants
</content>
</entry>
<entry>
<title>locking/osq: Fix ordering of node initialisation in osq_lock</title>
<updated>2015-12-17T19:40:29Z</updated>
<author>
<name>Will Deacon</name>
<email>will.deacon@arm.com</email>
</author>
<published>2015-12-11T17:46:41Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b4b29f94856ad68329132c2306e9a114920643e3'/>
<id>urn:sha1:b4b29f94856ad68329132c2306e9a114920643e3</id>
<content type='text'>
The Cavium guys reported a soft lockup on their arm64 machine, caused by
commit c55a6ffa6285 ("locking/osq: Relax atomic semantics"):

    mutex_optimistic_spin+0x9c/0x1d0
    __mutex_lock_slowpath+0x44/0x158
    mutex_lock+0x54/0x58
    kernfs_iop_permission+0x38/0x70
    __inode_permission+0x88/0xd8
    inode_permission+0x30/0x6c
    link_path_walk+0x68/0x4d4
    path_openat+0xb4/0x2bc
    do_filp_open+0x74/0xd0
    do_sys_open+0x14c/0x228
    SyS_openat+0x3c/0x48
    el0_svc_naked+0x24/0x28

This is because in osq_lock we initialise the node for the current CPU:

    node-&gt;locked = 0;
    node-&gt;next = NULL;
    node-&gt;cpu = curr;

and then publish the current CPU in the lock tail:

    old = atomic_xchg_acquire(&amp;lock-&gt;tail, curr);

Once the update to lock-&gt;tail is visible to another CPU, the node is
then live and can be both read and updated by concurrent lockers.

Unfortunately, the ACQUIRE semantics of the xchg operation mean that
there is no guarantee the contents of the node will be visible before
the lock tail is updated.  This can lead to lock corruption when, for
example, a concurrent locker races to set the next field.
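
The fix, in sketch form, is to publish the node with a fully ordered
xchg instead, whose RELEASE half guarantees the initialisation above is
visible before the new tail:

	old = atomic_xchg(&amp;lock-&gt;tail, curr);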

Fixes: c55a6ffa6285 ("locking/osq: Relax atomic semantics")
Reported-by: David Daney &lt;ddaney@caviumnetworks.com&gt;
Reported-by: Andrew Pinski &lt;andrew.pinski@caviumnetworks.com&gt;
Tested-by: Andrew Pinski &lt;andrew.pinski@caviumnetworks.com&gt;
Acked-by: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Signed-off-by: Will Deacon &lt;will.deacon@arm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: http://lkml.kernel.org/r/1449856001-21177-1-git-send-email-will.deacon@arm.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>locking/pvqspinlock: Queue node adaptive spinning</title>
<updated>2015-12-04T10:39:51Z</updated>
<author>
<name>Waiman Long</name>
<email>Waiman.Long@hpe.com</email>
</author>
<published>2015-11-10T00:09:27Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=cd0272fab785077c121aa91ec2401090965bbc37'/>
<id>urn:sha1:cd0272fab785077c121aa91ec2401090965bbc37</id>
<content type='text'>
In an overcommitted guest, where some vCPUs have to be halted to make
forward progress in other areas, it is highly likely that a vCPU later
in the spinlock queue will be spinning while the ones earlier in the
queue have been halted. The spinning in the later vCPUs is then just
a waste of precious CPU cycles, because they are not going to get the
lock any time soon: the earlier ones have to be woken up and take their
turn first.

This patch implements an adaptive spinning mechanism where the vCPU
will call pv_wait() if the previous vCPU is not running.
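
A sketch of the core check (modelled on the patch; the spin loops call
it only every few iterations to bound the overhead of peeking at the
previous node):

	static inline bool pv_wait_early(struct pv_node *prev, int loop)
	{
		if ((loop &amp; PV_PREV_CHECK_MASK) != 0)
			return false;	/* only check periodically */

		return READ_ONCE(prev-&gt;state) != vcpu_running;
	}

When this returns true, the waiter stops spinning and halts itself via
pv_wait().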

Linux kernel builds were run in a KVM guest on an 8-socket, 4
cores/socket Westmere-EX system and a 4-socket, 8 cores/socket
Haswell-EX system. Both systems were configured to have 32 physical
CPUs. The kernel build times before and after the patch were:

                    Westmere                    Haswell
  Patch         32 vCPUs    48 vCPUs    32 vCPUs    48 vCPUs
  -----         --------    --------    --------    --------
  Before patch   3m02.3s     5m00.2s     1m43.7s     3m03.5s
  After patch    3m03.0s     4m37.5s     1m43.0s     2m47.2s

For 32 vCPUs, this patch doesn't cause any noticeable change in
performance. For 48 vCPUs (overcommitted), there is about an 8%
performance improvement.

Signed-off-by: Waiman Long &lt;Waiman.Long@hpe.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Cc: Douglas Hatch &lt;doug.hatch@hpe.com&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Scott J Norton &lt;scott.norton@hpe.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: http://lkml.kernel.org/r/1447114167-47185-8-git-send-email-Waiman.Long@hpe.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>locking/pvqspinlock: Allow limited lock stealing</title>
<updated>2015-12-04T10:39:51Z</updated>
<author>
<name>Waiman Long</name>
<email>Waiman.Long@hpe.com</email>
</author>
<published>2015-11-10T21:18:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1c4941fd53afb46ab15826628e4819866d008a28'/>
<id>urn:sha1:1c4941fd53afb46ab15826628e4819866d008a28</id>
<content type='text'>
This patch allows one attempt for the lock waiter to steal the lock
when entering the PV slowpath. To prevent lock starvation, the pending
bit will be set by the queue head vCPU when it is in the active lock
spinning loop to disable any lock stealing attempt.  This helps to
reduce the performance penalty caused by lock waiter preemption while
not having much of the downsides of a real unfair lock.
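
In sketch form (modelled on the patch), the steal attempt is a single
trylock that backs off when the lock is held or the pending bit is set:

	static inline bool pv_queued_spin_steal_lock(struct qspinlock *lock)
	{
		struct __qspinlock *l = (void *)lock;

		return !(atomic_read(&amp;lock-&gt;val) &amp; _Q_LOCKED_PENDING_MASK) &amp;&amp;
		       (cmpxchg(&amp;l-&gt;locked, 0, _Q_LOCKED_VAL) == 0);
	}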

The pv_wait_head() function was renamed to pv_wait_head_or_lock(), as
it was modified to acquire the lock before returning. This is necessary
because of possible lock stealing attempts from other tasks.

Linux kernel builds were run in a KVM guest on an 8-socket, 4
cores/socket Westmere-EX system and a 4-socket, 8 cores/socket
Haswell-EX system. Both systems were configured to have 32 physical
CPUs. The kernel build times before and after the patch were:

                    Westmere                    Haswell
  Patch         32 vCPUs    48 vCPUs    32 vCPUs    48 vCPUs
  -----         --------    --------    --------    --------
  Before patch   3m15.6s    10m56.1s     1m44.1s     5m29.1s
  After patch    3m02.3s     5m00.2s     1m43.7s     3m03.5s

For the overcommitted case (48 vCPUs), this patch is able to reduce
kernel build time by more than 54% for Westmere and 44% for Haswell.

Signed-off-by: Waiman Long &lt;Waiman.Long@hpe.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Cc: Douglas Hatch &lt;doug.hatch@hpe.com&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Scott J Norton &lt;scott.norton@hpe.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: http://lkml.kernel.org/r/1447190336-53317-1-git-send-email-Waiman.Long@hpe.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>locking/pvqspinlock: Collect slowpath lock statistics</title>
<updated>2015-12-04T10:39:50Z</updated>
<author>
<name>Waiman Long</name>
<email>Waiman.Long@hpe.com</email>
</author>
<published>2015-11-10T00:09:25Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=45e898b735620f426eddf105fc886d2966593a58'/>
<id>urn:sha1:45e898b735620f426eddf105fc886d2966593a58</id>
<content type='text'>
This patch enables the accumulation of kicking- and waiting-related
PV qspinlock statistics when the new QUEUED_LOCK_STAT configuration
option is selected. It also enables the collection of data that lets
us calculate the kicking and wakeup latencies, which depend heavily
on the CPUs being used.

The statistical counters are per-cpu variables to minimize the
performance overhead of updating them. These counters are exported
via the debugfs filesystem under the qlockstat directory.  When the
corresponding debugfs files are read, the per-cpu values are summed
and the required data computed.
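
The update side, in sketch form (modelled on the patch):

	static DEFINE_PER_CPU(unsigned long, qstats[qstat_num]);

	static inline void qstat_inc(enum qlock_stats stat, bool cond)
	{
		if (cond)
			this_cpu_inc(qstats[stat]);	/* no shared cacheline */
	}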

The measured latencies for different CPUs are:

	CPU		Wakeup		Kicking
	---		------		-------
	Haswell-EX	63.6us		 7.4us
	Westmere-EX	67.6us		 9.3us

The measured latencies varied a bit from run-to-run. The wakeup
latency is much higher than the kicking latency.

A sample of statistical counters after system bootup (with vCPU
overcommit) was:

	pv_hash_hops=1.00
	pv_kick_unlock=1148
	pv_kick_wake=1146
	pv_latency_kick=11040
	pv_latency_wake=194840
	pv_spurious_wakeup=7
	pv_wait_again=4
	pv_wait_head=23
	pv_wait_node=1129

Signed-off-by: Waiman Long &lt;Waiman.Long@hpe.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Cc: Douglas Hatch &lt;doug.hatch@hpe.com&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Scott J Norton &lt;scott.norton@hpe.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: http://lkml.kernel.org/r/1447114167-47185-6-git-send-email-Waiman.Long@hpe.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>locking, sched: Introduce smp_cond_acquire() and use it</title>
<updated>2015-12-04T09:33:41Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2015-10-16T12:39:38Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b3e0b1b6d841a4b2f64fc09ea728913da8218424'/>
<id>urn:sha1:b3e0b1b6d841a4b2f64fc09ea728913da8218424</id>
<content type='text'>
Introduce smp_cond_acquire(), which combines a control dependency and a
read barrier to form acquire semantics.

This primitive has two benefits:

 - it documents control dependencies,
 - it's typically cheaper than using smp_load_acquire() in a loop.
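
A sketch of the definition, matching the description above (spin on the
condition, then upgrade the control dependency with a read barrier):

	#define smp_cond_acquire(cond)	do {		\
		while (!(cond))				\
			cpu_relax();			\
		smp_rmb(); /* ctrl + rmb := acquire */	\
	} while (0)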

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
</feed>
