<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/locking/rwsem.c, branch v5.11</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.11</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.11'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2020-12-09T16:08:48Z</updated>
<entry>
<title>locking/rwsem: Remove reader optimistic spinning</title>
<updated>2020-12-09T16:08:48Z</updated>
<author>
<name>Waiman Long</name>
<email>longman@redhat.com</email>
</author>
<published>2020-11-21T04:14:16Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=617f3ef95177840c77f59c2aec1029d27d5547d6'/>
<id>urn:sha1:617f3ef95177840c77f59c2aec1029d27d5547d6</id>
<content type='text'>
Reader optimistic spinning is helpful when the reader critical section
is short and there aren't that many readers around. It also improves
the chance that a reader can get the lock, as writer optimistic spinning
disproportionately favors writers over readers.

Since commit d3681e269fff ("locking/rwsem: Wake up almost all readers
in wait queue"), all the waiting readers are woken up so that they can
all get the read lock and run in parallel. When the number of contending
readers is large, allowing reader optimistic spinning will likely cause
reader fragmentation, where multiple smaller groups of readers get the
read lock sequentially, separated by writers. That reduces reader
parallelism.

One possible way to address that drawback is to limit the number of
readers (preferably one) that can do optimistic spinning. These readers
act as representatives of all the waiting readers in the wait queue as
they will wake up all those waiting readers once they get the lock.

Alternatively, as reader optimistic lock stealing has already enhanced
fairness to readers, it may be easier to just remove reader optimistic
spinning and simplify the optimistic spinning code as a result.

Performance measurements (locking throughput in kops/s) using a locking
microbenchmark with a 50/50 reader/writer distribution and turbo-boost
disabled were done on a 2-socket Cascade Lake system (48-core, 96-thread)
to see the impact of these changes:

  1) Vanilla     - 5.10-rc3 kernel
  2) Before      - 5.10-rc3 kernel with previous patches in this series
  3) limit-rspin - 5.10-rc3 kernel with limited reader spinning patch
  4) no-rspin    - 5.10-rc3 kernel with reader spinning disabled

  # of threads  CS Load   Vanilla  Before   limit-rspin   no-rspin
  ------------  -------   -------  ------   -----------   --------
       2            1      5,185    5,662      5,214       5,077
       4            1      5,107    4,983      5,188       4,760
       8            1      4,782    4,564      4,720       4,628
      16            1      4,680    4,053      4,567       3,402
      32            1      4,299    1,115      1,118       1,098
      64            1      3,218      983      1,001         957
      96            1      1,938      944        957         930

       2           20      2,008    2,128      2,264       1,665
       4           20      1,390    1,033      1,046       1,101
       8           20      1,472    1,155      1,098       1,213
      16           20      1,332    1,077      1,089       1,122
      32           20        967      914        917         980
      64           20        787      874        891         858
      96           20        730      836        847         844

       2          100        372      356        360         355
       4          100        492      425        434         392
       8          100        533      537        529         538
      16          100        548      572        568         598
      32          100        499      520        527         537
      64          100        466      517        526         512
      96          100        406      497        506         509

The column "CS Load" represents the number of pause instructions issued
in the locking critical section. A CS load of 1 is extremely short and
is not likey in real situations. A load of 20 (moderate) and 100 (long)
are more realistic.

It can be seen that the previous patches in this series have reduced
performance in general, except in highly contended cases with moderate
or long critical sections, where performance improves a bit. This change
is mostly caused by the "Prevent potential lock starvation" patch, which
reduces reader optimistic spinning and hence reader fragmentation.

The patch that further limits reader optimistic spinning doesn't seem
to have much impact on overall performance, as shown in the benchmark
data.

The patch that disables reader optimistic spinning shows reduced
performance in lightly loaded cases, but comparable or slightly better
performance under heavier contention.

This patch just removes reader optimistic spinning for now. As readers
are no longer doing optimistic spinning, we don't need to consider
whether the OSQ is empty when doing lock stealing.
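
A hedged before/after sketch of what this means for the lock-stealing
condition (flag and helper names follow kernel/locking/rwsem.c, but
this is an illustration of the description above, not the verbatim
diff):

  /* before: steal only when no optimistic spinners are queued */
  if (!(count &amp; (RWSEM_WRITER_LOCKED | RWSEM_FLAG_HANDOFF)) &amp;&amp;
      !osq_is_locked(&amp;sem-&gt;osq)) {
          /* steal the read lock */
  }

  /* after: readers never spin, so the OSQ check can simply go */
  if (!(count &amp; (RWSEM_WRITER_LOCKED | RWSEM_FLAG_HANDOFF))) {
          /* steal the read lock */
  }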

Signed-off-by: Waiman Long &lt;longman@redhat.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Link: https://lkml.kernel.org/r/20201121041416.12285-6-longman@redhat.com
</content>
</entry>
<entry>
<title>locking/rwsem: Enable reader optimistic lock stealing</title>
<updated>2020-12-09T16:08:48Z</updated>
<author>
<name>Waiman Long</name>
<email>longman@redhat.com</email>
</author>
<published>2020-11-21T04:14:14Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1a728dff855a318bb58bcc1259b1826a7ad9f0bd'/>
<id>urn:sha1:1a728dff855a318bb58bcc1259b1826a7ad9f0bd</id>
<content type='text'>
If the optimistic spinning queue is empty and the rwsem does not have
the handoff or write-lock bits set, it is actually not necessary to
call rwsem_optimistic_spin() to spin on it. Instead, it can steal the
lock directly as its reader bias is in the count already.  If it is
the first reader in this state, it will try to wake up other readers
in the wait queue.
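
An illustrative sketch of the stealing condition described above
(the RWSEM_* flags and osq_is_locked() are real rwsem internals; the
wakeup helper shown is a hypothetical stand-in for the actual wakeup
path):

  /* no writer, no handoff, no spinners: our reader bias already counts */
  if (!(count &amp; (RWSEM_WRITER_LOCKED | RWSEM_FLAG_HANDOFF)) &amp;&amp;
      !osq_is_locked(&amp;sem-&gt;osq)) {
          rwsem_set_reader_owned(sem);
          lockevent_inc(rwsem_rlock_steal);

          /* first reader in: try to wake the other waiting readers */
          if ((count &amp; RWSEM_READER_MASK) == RWSEM_READER_BIAS)
                  rwsem_wake_waiting_readers(sem);  /* hypothetical */
          return sem;
  }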

With this patch applied, the following were the lock event counts
after rebooting a 2-socket system and a "make -j96" kernel rebuild.

  rwsem_opt_rlock=4437
  rwsem_rlock=29
  rwsem_rlock_steal=19

So lock stealing represents about 0.4% of all the read locks acquired
in the slow path.

Signed-off-by: Waiman Long &lt;longman@redhat.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Link: https://lkml.kernel.org/r/20201121041416.12285-4-longman@redhat.com
</content>
</entry>
<entry>
<title>locking/rwsem: Prevent potential lock starvation</title>
<updated>2020-12-09T16:08:48Z</updated>
<author>
<name>Waiman Long</name>
<email>longman@redhat.com</email>
</author>
<published>2020-11-21T04:14:13Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2f06f702925b512a95b95dca3855549c047eef58'/>
<id>urn:sha1:2f06f702925b512a95b95dca3855549c047eef58</id>
<content type='text'>
The lock handoff bit was added in commit 4f23dbc1e657 ("locking/rwsem:
Implement lock handoff to prevent lock starvation") to avoid lock
starvation. However, allowing readers to do optimistic spinning
introduces an unlikely scenario where lock starvation can still happen.

The lock handoff bit may only be set when a waiter is being woken up.
In the case of a reader unlock, a wakeup happens only when the reader
count reaches 0. If there is a continuous stream of incoming readers
acquiring the read lock via optimistic spinning, the reader count may
never reach 0, and so the handoff bit will never be asserted.

One way to prevent this scenario from happening is to disallow optimistic
spinning if the rwsem is currently owned by readers. If the previous
or current owner is a writer, optimistic spinning will be allowed.

If the previous owner is a reader but the reader count has since
reached 0, a wakeup should already have been issued, so the handoff
mechanism will kick in to prevent lock starvation. As a result, it
should be OK to do optimistic spinning in this case.
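
A condensed sketch of the spinning policy this describes (a plausible
reconstruction using the RWSEM_READER_OWNED owner flag from rwsem.c;
the helper name is hypothetical and this is not the exact patch):

  /* spin only when the previous/current owner is not a reader */
  static inline bool reader_may_spin(struct rw_semaphore *sem)
  {
          long owner = atomic_long_read(&amp;sem-&gt;owner);

          return !(owner &amp; RWSEM_READER_OWNED);
  }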

This patch may have some impact on reader performance, as it reduces
reader optimistic spinning, especially if the lock critical sections
are short and the number of contending readers is small.

Signed-off-by: Waiman Long &lt;longman@redhat.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Link: https://lkml.kernel.org/r/20201121041416.12285-3-longman@redhat.com
</content>
</entry>
<entry>
<title>locking/rwsem: Pass the current atomic count to rwsem_down_read_slowpath()</title>
<updated>2020-12-09T16:08:47Z</updated>
<author>
<name>Waiman Long</name>
<email>longman@redhat.com</email>
</author>
<published>2020-11-21T04:14:12Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c8fe8b0564388f41147326f31e4587171aacccd4'/>
<id>urn:sha1:c8fe8b0564388f41147326f31e4587171aacccd4</id>
<content type='text'>
The atomic count value right after reader count increment can be useful
to determine the rwsem state at trylock time. So the count value is
passed down to rwsem_down_read_slowpath() to be used when appropriate.
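
A minimal sketch of the resulting shape (the RWSEM_READER_BIAS
fast-path increment is standard rwsem code; the signatures here are an
approximation, with owner tracking and return-value handling elided):

  static struct rw_semaphore *
  rwsem_down_read_slowpath(struct rw_semaphore *sem, long count, int state);

  static inline void __down_read(struct rw_semaphore *sem)
  {
          long count = atomic_long_add_return_acquire(RWSEM_READER_BIAS,
                                                      &amp;sem-&gt;count);

          /* hand the post-increment snapshot down instead of re-reading */
          if (count &amp; RWSEM_READ_FAILED_MASK)
                  rwsem_down_read_slowpath(sem, count, TASK_UNINTERRUPTIBLE);
  }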

Signed-off-by: Waiman Long &lt;longman@redhat.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Link: https://lkml.kernel.org/r/20201121041416.12285-2-longman@redhat.com
</content>
</entry>
<entry>
<title>locking/rwsem: Fold __down_{read,write}*()</title>
<updated>2020-12-09T16:08:47Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2020-12-08T09:27:41Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c995e638ccbbc65a76d1713c4fdcf927e7e2cb83'/>
<id>urn:sha1:c995e638ccbbc65a76d1713c4fdcf927e7e2cb83</id>
<content type='text'>
There's a lot of needless duplication in __down_{read,write}*(); cure
that with a helper.
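
A sketch of the folded form for the read side, following the
__down_*_common naming pattern the patch title implies (a plausible
reconstruction, not the verbatim diff; the write side folds the same
way, and argument lists are abridged):

  static inline int __down_read_common(struct rw_semaphore *sem, int state)
  {
          if (!rwsem_read_trylock(sem)) {
                  if (IS_ERR(rwsem_down_read_slowpath(sem, state)))
                          return -EINTR;
          }
          return 0;
  }

  static inline void __down_read(struct rw_semaphore *sem)
  {
          __down_read_common(sem, TASK_UNINTERRUPTIBLE);
  }

  static inline int __down_read_killable(struct rw_semaphore *sem)
  {
          return __down_read_common(sem, TASK_KILLABLE);
  }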

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20201207090243.GE3040@hirez.programming.kicks-ass.net
</content>
</entry>
<entry>
<title>locking/rwsem: Introduce rwsem_write_trylock()</title>
<updated>2020-12-09T16:08:47Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2020-12-08T09:25:06Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=285c61aedf6bc5d81b37e4dc48c19012e8ff9836'/>
<id>urn:sha1:285c61aedf6bc5d81b37e4dc48c19012e8ff9836</id>
<content type='text'>
One copy of this logic is better than three.
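
A sketch of the shared trylock (modelled on the cmpxchg pattern rwsem
already uses elsewhere; a plausible reconstruction rather than the
exact diff):

  static inline bool rwsem_write_trylock(struct rw_semaphore *sem)
  {
          long tmp = RWSEM_UNLOCKED_VALUE;

          /* one place to attempt the unlocked -&gt; write-locked transition */
          if (atomic_long_try_cmpxchg_acquire(&amp;sem-&gt;count, &amp;tmp,
                                              RWSEM_WRITER_LOCKED)) {
                  rwsem_set_owner(sem);
                  return true;
          }
          return false;
  }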

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20201207090243.GE3040@hirez.programming.kicks-ass.net
</content>
</entry>
<entry>
<title>locking/rwsem: Better collate rwsem_read_trylock()</title>
<updated>2020-12-09T16:08:47Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2020-12-08T09:22:16Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3379116a0ca965b00e6522c7ea3f16c9dbd8f9f9'/>
<id>urn:sha1:3379116a0ca965b00e6522c7ea3f16c9dbd8f9f9</id>
<content type='text'>
All users of rwsem_read_trylock() do rwsem_set_reader_owned(sem) on
success; move it into rwsem_read_trylock() proper.
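
A sketch of the collated helper (a plausible reconstruction from the
description above; overflow handling is abridged):

  static inline bool rwsem_read_trylock(struct rw_semaphore *sem)
  {
          long cnt = atomic_long_add_return_acquire(RWSEM_READER_BIAS,
                                                    &amp;sem-&gt;count);

          if (!(cnt &amp; RWSEM_READ_FAILED_MASK)) {
                  /* previously done by every caller on success */
                  rwsem_set_reader_owned(sem);
                  return true;
          }
          return false;
  }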

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20201207090243.GE3040@hirez.programming.kicks-ass.net
</content>
</entry>
<entry>
<title>rwsem: Implement down_read_interruptible</title>
<updated>2020-12-09T16:08:42Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2020-12-03T20:11:13Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=31784cff7ee073b34d6eddabb95e3be2880a425c'/>
<id>urn:sha1:31784cff7ee073b34d6eddabb95e3be2880a425c</id>
<content type='text'>
In preparation for converting exec_update_mutex to a rwsem so that
multiple readers can execute in parallel and not deadlock, add
down_read_interruptible.  This is needed for perf_event_open to be
converted (with no semantic changes) from working on a mutex to
working on an rwsem.
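
Its shape can be sketched by analogy with the existing
down_read_killable() (assuming the same LOCK_CONTENDED_RETURN pattern;
a sketch, not the verbatim patch):

  int __sched down_read_interruptible(struct rw_semaphore *sem)
  {
          might_sleep();
          rwsem_acquire_read(&amp;sem-&gt;dep_map, 0, 0, _RET_IP_);

          if (LOCK_CONTENDED_RETURN(sem, __down_read_trylock,
                                    __down_read_interruptible)) {
                  rwsem_release(&amp;sem-&gt;dep_map, _RET_IP_);
                  return -EINTR;
          }
          return 0;
  }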

Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/87k0tybqfy.fsf@x220.int.ebiederm.org
</content>
</entry>
<entry>
<title>rwsem: Implement down_read_killable_nested</title>
<updated>2020-12-09T16:08:41Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2020-12-03T20:10:32Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0f9368b5bf6db0c04afc5454b1be79022a681615'/>
<id>urn:sha1:0f9368b5bf6db0c04afc5454b1be79022a681615</id>
<content type='text'>
In preparation for converting exec_update_mutex to a rwsem so that
multiple readers can execute in parallel and not deadlock, add
down_read_killable_nested.  This is needed so that kcmp_lock
can be converted from working on mutexes to working on rw_semaphores.
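
By analogy with the existing down_read_killable() and the _nested
lockdep variants, a hedged sketch of the new API (not the verbatim
patch):

  int __sched down_read_killable_nested(struct rw_semaphore *sem, int subclass)
  {
          might_sleep();
          rwsem_acquire_read(&amp;sem-&gt;dep_map, subclass, 0, _RET_IP_);

          if (LOCK_CONTENDED_RETURN(sem, __down_read_trylock,
                                    __down_read_killable)) {
                  rwsem_release(&amp;sem-&gt;dep_map, _RET_IP_);
                  return -EINTR;
          }
          return 0;
  }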

Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/87o8jabqh3.fsf@x220.int.ebiederm.org
</content>
</entry>
<entry>
<title>lockdep: Introduce wait-type checks</title>
<updated>2020-03-21T15:00:24Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2020-03-21T11:26:01Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=de8f5e4f2dc1f032b46afda0a78cab5456974f89'/>
<id>urn:sha1:de8f5e4f2dc1f032b46afda0a78cab5456974f89</id>
<content type='text'>
Extend lockdep to validate lock wait-type context.

The current wait-types are:

	LD_WAIT_FREE,		/* wait free, rcu etc.. */
	LD_WAIT_SPIN,		/* spin loops, raw_spinlock_t etc.. */
	LD_WAIT_CONFIG,		/* CONFIG_PREEMPT_LOCK, spinlock_t etc.. */
	LD_WAIT_SLEEP,		/* sleeping locks, mutex_t etc.. */

Where lockdep validates that the current lock (the one being acquired)
fits in the current wait-context (as generated by the held stack).

This ensures that there is no attempt to acquire mutexes while holding
spinlocks, to acquire spinlocks while holding raw_spinlocks, and so on.
In other words, it's a fancier might_sleep().

Obviously RCU made the entire ordeal more complex than a simple single
value test, because RCU can be acquired in (pretty much) any context,
and while it presents a context to nested locks, that context is not
the same as the one it was acquired in.

Therefore it's necessary to split the wait_type into two values, one
representing the acquire (outer) and one representing the nested context
(inner). For most 'normal' locks these two are the same.

[ To make static initialization easier we have the rule that:
  .outer == INV means .outer == .inner; because INV == 0. ]

It further means that it's necessary to find the minimal .inner of the
held stack to compare against the .outer of the new lock, because while
'normal' RCU presents a CONFIG type to nested locks, if it is taken
while already holding a SPIN type it obviously doesn't relax the rules.
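
A condensed sketch of that validation (the real check also has to deal
with irq contexts, trylocks, and the .outer == INV rule above;
LD_WAIT_MAX is a hypothetical "most permissive" starting value):

  static int check_wait_context(struct task_struct *curr, struct held_lock *next)
  {
          short curr_inner = LD_WAIT_MAX;   /* hypothetical */
          short next_outer = hlock_class(next)-&gt;wait_type_outer;
          int depth;

          /* the strictest (smallest) .inner on the held stack wins */
          for (depth = 0; depth &lt; curr-&gt;lockdep_depth; depth++) {
                  struct held_lock *prev = curr-&gt;held_locks + depth;
                  short prev_inner = hlock_class(prev)-&gt;wait_type_inner;

                  if (prev_inner)
                          curr_inner = min(curr_inner, prev_inner);
          }

          /* the new lock's .outer must still fit in that context */
          if (next_outer &gt; curr_inner)
                  return print_lock_invalid_wait_context(curr, next);

          return 0;
  }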

Below is an example output generated by the trivial test code:

  raw_spin_lock(&amp;foo);
  spin_lock(&amp;bar);
  spin_unlock(&amp;bar);
  raw_spin_unlock(&amp;foo);

 [ BUG: Invalid wait context ]
 -----------------------------
 swapper/0/1 is trying to lock:
 ffffc90000013f20 (&amp;bar){....}-{3:3}, at: kernel_init+0xdb/0x187
 other info that might help us debug this:
 1 lock held by swapper/0/1:
  #0: ffffc90000013ee0 (&amp;foo){+.+.}-{2:2}, at: kernel_init+0xd1/0x187

The way to read it is to look at the new -{n:m} part in the lock
description; -{3:3} for the attempted lock, and try and match that up to
the held locks, which in this case is the one: -{2:2}.

This tells us that the lock being acquired requires a more relaxed
environment than the one presented by the held-lock stack.

Currently only the normal locks and RCU are converted; the rest of the
lockdep users default to .inner = INV, which is ignored. More conversions
can be done when desired.

The check for spinlock_t nesting is not enabled by default. It's a
separate config option for now, as there are known problems which are
currently being addressed. The config option makes it possible to
identify these problems and to verify that the solutions found indeed
solve them.

The config switch will be removed and the checks will be permanently
enabled once the vast majority of issues have been addressed.

[ bigeasy: Move LD_WAIT_FREE,… out of CONFIG_LOCKDEP to avoid compile
	   failure with CONFIG_DEBUG_SPINLOCK + !CONFIG_LOCKDEP]
[ tglx: Add the config option ]

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20200321113242.427089655@linutronix.de
</content>
</entry>
</feed>
