<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/locking/rwsem-xadd.c, branch v4.9</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v4.9</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v4.9'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2016-08-18T13:37:11Z</updated>
<entry>
<title>locking/rwsem: Scan the wait_list for readers only once</title>
<updated>2016-08-18T13:37:11Z</updated>
<author>
<name>Davidlohr Bueso</name>
<email>dave@stgolabs.net</email>
</author>
<published>2016-08-05T08:04:45Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=70800c3c0cc525baa38fd0fe4660f2c27f1bfeeb'/>
<id>urn:sha1:70800c3c0cc525baa38fd0fe4660f2c27f1bfeeb</id>
<content type='text'>
When wanting to wakeup readers, __rwsem_mark_wakeup() currently
iterates the wait_list twice while looking to wakeup the first N
queued reader-tasks. While this can be quite inefficient, it was
there such that a awoken reader would be first and foremost
acknowledged by the lock counter.

Keeping the same logic, we can further benefit from the use of
wake_qs and avoid entirely the first wait_list iteration that sets
the counter as wake_up_process() isn't going to occur right away,
and therefore we maintain the counter-&gt;list order of going about
things.

Other than saving cycles with O(n) "scanning", this change also
nicely cleans up a good chunk of __rwsem_mark_wakeup(); both
visually and less tedious to read.

For example, the following improvements where seen on some will
it scale microbenchmarks, on a 48-core Haswell:

                                       v4.7              v4.7-rwsem-v1
  Hmean    signal1-processes-8    5792691.42 (  0.00%)  5771971.04 ( -0.36%)
  Hmean    signal1-processes-12   6081199.96 (  0.00%)  6072174.38 ( -0.15%)
  Hmean    signal1-processes-21   3071137.71 (  0.00%)  3041336.72 ( -0.97%)
  Hmean    signal1-processes-48   3712039.98 (  0.00%)  3708113.59 ( -0.11%)
  Hmean    signal1-processes-79   4464573.45 (  0.00%)  4682798.66 (  4.89%)
  Hmean    signal1-processes-110  4486842.01 (  0.00%)  4633781.71 (  3.27%)
  Hmean    signal1-processes-141  4611816.83 (  0.00%)  4692725.38 (  1.75%)
  Hmean    signal1-processes-172  4638157.05 (  0.00%)  4714387.86 (  1.64%)
  Hmean    signal1-processes-203  4465077.80 (  0.00%)  4690348.07 (  5.05%)
  Hmean    signal1-processes-224  4410433.74 (  0.00%)  4687534.43 (  6.28%)

  Stddev   signal1-processes-8       6360.47 (  0.00%)     8455.31 ( 32.94%)
  Stddev   signal1-processes-12      4004.98 (  0.00%)     9156.13 (128.62%)
  Stddev   signal1-processes-21      3273.14 (  0.00%)     5016.80 ( 53.27%)
  Stddev   signal1-processes-48     28420.25 (  0.00%)    26576.22 ( -6.49%)
  Stddev   signal1-processes-79     22038.34 (  0.00%)    18992.70 (-13.82%)
  Stddev   signal1-processes-110    23226.93 (  0.00%)    17245.79 (-25.75%)
  Stddev   signal1-processes-141     6358.98 (  0.00%)     7636.14 ( 20.08%)
  Stddev   signal1-processes-172     9523.70 (  0.00%)     4824.75 (-49.34%)
  Stddev   signal1-processes-203    13915.33 (  0.00%)     9326.33 (-32.98%)
  Stddev   signal1-processes-224    15573.94 (  0.00%)    10613.82 (-31.85%)

Other runs that saw improvements include context_switch and pipe; and
as expected, this is particularly highlighted on larger thread counts
as it becomes more expensive to walk the list twice.

No change in wakeup ordering or semantics.

Signed-off-by: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Waiman.Long@hp.com
Cc: dave@stgolabs.net
Cc: jason.low2@hpe.com
Cc: wanpeng.li@hotmail.com
Link: http://lkml.kernel.org/r/1470384285-32163-4-git-send-email-dave@stgolabs.net
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>locking/rwsem: Remove a few useless comments</title>
<updated>2016-08-18T13:37:07Z</updated>
<author>
<name>Davidlohr Bueso</name>
<email>dave@stgolabs.net</email>
</author>
<published>2016-08-05T08:04:44Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c2867bbaf5d8f1534cae15175a389c5cbf58fec1'/>
<id>urn:sha1:c2867bbaf5d8f1534cae15175a389c5cbf58fec1</id>
<content type='text'>
Our rwsem code (xadd, at least) is rather well documented, but
there are a few really annoying comments in there that serve
no purpose and we shouldn't bother with them.

Signed-off-by: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Waiman.Long@hp.com
Cc: dave@stgolabs.net
Cc: jason.low2@hpe.com
Cc: wanpeng.li@hotmail.com
Link: http://lkml.kernel.org/r/1470384285-32163-3-git-send-email-dave@stgolabs.net
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>locking/rwsem: Return void in __rwsem_mark_wake()</title>
<updated>2016-08-18T13:37:03Z</updated>
<author>
<name>Davidlohr Bueso</name>
<email>dave@stgolabs.net</email>
</author>
<published>2016-08-05T08:04:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=84b23f9b58687a11ced66cc4be9b0219e8ecab84'/>
<id>urn:sha1:84b23f9b58687a11ced66cc4be9b0219e8ecab84</id>
<content type='text'>
We currently return a rw_semaphore structure, which is the
same lock we passed to the function's argument in the first
place. While there are several functions that choose this
return value, the callers use it, for example, for things
like ERR_PTR. This is not the case for __rwsem_mark_wake(),
and in addition this function is really about the lock
waiters (which we know there are at this point), so its
somewhat odd to be returning the sem structure.

Signed-off-by: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Waiman.Long@hp.com
Cc: dave@stgolabs.net
Cc: jason.low2@hpe.com
Cc: wanpeng.li@hotmail.com
Link: http://lkml.kernel.org/r/1470384285-32163-2-git-send-email-dave@stgolabs.net
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>locking/atomic, arch/rwsem: Employ atomic_long_fetch_add()</title>
<updated>2016-06-16T08:48:35Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2016-05-18T10:42:21Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=86a3b5f34fc1fb307abef4fde76bebd3edce0324'/>
<id>urn:sha1:86a3b5f34fc1fb307abef4fde76bebd3edce0324</id>
<content type='text'>
Now that we have fetch_add() we can stop using add_return() - val.

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Jason Low &lt;jason.low2@hpe.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Waiman Long &lt;waiman.long@hpe.com&gt;
Cc: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>locking/rwsem: Streamline the rwsem_optimistic_spin() code</title>
<updated>2016-06-08T13:17:00Z</updated>
<author>
<name>Waiman Long</name>
<email>Waiman.Long@hpe.com</email>
</author>
<published>2016-05-18T01:26:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ddd0fa73c2b71c35de4fe7ae60a5f1a6cddc2cf0'/>
<id>urn:sha1:ddd0fa73c2b71c35de4fe7ae60a5f1a6cddc2cf0</id>
<content type='text'>
This patch moves the owner loading and checking code entirely inside of
rwsem_spin_on_owner() to simplify the logic of rwsem_optimistic_spin()
loop.

Suggested-by: Peter Hurley &lt;peter@hurleysoftware.com&gt;
Signed-off-by: Waiman Long &lt;Waiman.Long@hpe.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Peter Hurley &lt;peter@hurleysoftware.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Cc: Douglas Hatch &lt;doug.hatch@hpe.com&gt;
Cc: Jason Low &lt;jason.low2@hp.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Scott J Norton &lt;scott.norton@hpe.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: http://lkml.kernel.org/r/1463534783-38814-6-git-send-email-Waiman.Long@hpe.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>locking/rwsem: Improve reader wakeup code</title>
<updated>2016-06-08T13:17:00Z</updated>
<author>
<name>Waiman Long</name>
<email>Waiman.Long@hpe.com</email>
</author>
<published>2016-05-18T01:26:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=bf7b4c472db44413251bcef79ca1f6bf1ec81475'/>
<id>urn:sha1:bf7b4c472db44413251bcef79ca1f6bf1ec81475</id>
<content type='text'>
In __rwsem_do_wake(), the reader wakeup code will assume a writer
has stolen the lock if the active reader/writer count is not 0.
However, this is not as reliable an indicator as the original
"&lt; RWSEM_WAITING_BIAS" check. If another reader is present, the code
will still break out and exit even if the writer is gone. This patch
changes it to check the same "&lt; RWSEM_WAITING_BIAS" condition to
reduce the chance of false positive.

Signed-off-by: Waiman Long &lt;Waiman.Long@hpe.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Peter Hurley &lt;peter@hurleysoftware.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Cc: Douglas Hatch &lt;doug.hatch@hpe.com&gt;
Cc: Jason Low &lt;jason.low2@hp.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Scott J Norton &lt;scott.norton@hpe.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: http://lkml.kernel.org/r/1463534783-38814-5-git-send-email-Waiman.Long@hpe.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>locking/rwsem: Add reader-owned state to the owner field</title>
<updated>2016-06-08T13:16:59Z</updated>
<author>
<name>Waiman Long</name>
<email>Waiman.Long@hpe.com</email>
</author>
<published>2016-05-18T01:26:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=19c5d690e41697fcdd19379ab9d10d8d37818414'/>
<id>urn:sha1:19c5d690e41697fcdd19379ab9d10d8d37818414</id>
<content type='text'>
Currently, it is not possible to determine for sure if a reader
owns a rwsem by looking at the content of the rwsem data structure.
This patch adds a new state RWSEM_READER_OWNED to the owner field
to indicate that readers currently own the lock. This enables us to
address the following 2 issues in the rwsem optimistic spinning code:

 1) rwsem_can_spin_on_owner() will disallow optimistic spinning if
    the owner field is NULL which can mean either the readers own
    the lock or the owning writer hasn't set the owner field yet.
    In the latter case, we miss the chance to do optimistic spinning.

 2) While a writer is waiting in the OSQ and a reader takes the lock,
    the writer will continue to spin when out of the OSQ in the main
    rwsem_optimistic_spin() loop as the owner field is NULL wasting
    CPU cycles if some of readers are sleeping.

Adding the new state will allow optimistic spinning to go forward as
long as the owner field is not RWSEM_READER_OWNED and the owner is
running, if set, but stop immediately when that state has been reached.

On a 4-socket Haswell machine running on a 4.6-rc1 based kernel, the
fio test with multithreaded randrw and randwrite tests on the same
file on a XFS partition on top of a NVDIMM were run, the aggregated
bandwidths before and after the patch were as follows:

  Test      BW before patch     BW after patch  % change
  ----      ---------------     --------------  --------
  randrw         988 MB/s          1192 MB/s      +21%
  randwrite     1513 MB/s          1623 MB/s      +7.3%

The perf profile of the rwsem_down_write_failed() function in randrw
before and after the patch were:

   19.95%  5.88%  fio  [kernel.vmlinux]  [k] rwsem_down_write_failed
   14.20%  1.52%  fio  [kernel.vmlinux]  [k] rwsem_down_write_failed

The actual CPU cycles spend in rwsem_down_write_failed() dropped from
5.88% to 1.52% after the patch.

The xfstests was also run and no regression was observed.

Signed-off-by: Waiman Long &lt;Waiman.Long@hpe.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Jason Low &lt;jason.low2@hp.com&gt;
Acked-by: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Douglas Hatch &lt;doug.hatch@hpe.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Peter Hurley &lt;peter@hurleysoftware.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Scott J Norton &lt;scott.norton@hpe.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: http://lkml.kernel.org/r/1463534783-38814-2-git-send-email-Waiman.Long@hpe.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>locking/rwsem: Convert sem-&gt;count to 'atomic_long_t'</title>
<updated>2016-06-08T13:16:42Z</updated>
<author>
<name>Jason Low</name>
<email>jason.low2@hpe.com</email>
</author>
<published>2016-06-04T05:26:02Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8ee62b1870be8e630158701632a533d0378e15b8'/>
<id>urn:sha1:8ee62b1870be8e630158701632a533d0378e15b8</id>
<content type='text'>
Convert the rwsem count variable to an atomic_long_t since we use it
as an atomic variable. This also allows us to remove the
rwsem_atomic_{add,update}() "abstraction" which would now be an unnecesary
level of indirection. In follow up patches, we also remove the
rwsem_atomic_{add,update}() definitions across the various architectures.

Suggested-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Signed-off-by: Jason Low &lt;jason.low2@hpe.com&gt;
[ Build warning fixes on various architectures. ]
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Cc: Fenghua Yu &lt;fenghua.yu@intel.com&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: Jason Low &lt;jason.low2@hp.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Peter Hurley &lt;peter@hurleysoftware.com&gt;
Cc: Terry Rudd &lt;terry.rudd@hpe.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Tim Chen &lt;tim.c.chen@linux.intel.com&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Cc: Waiman Long &lt;Waiman.Long@hpe.com&gt;
Link: http://lkml.kernel.org/r/1465017963-4839-2-git-send-email-jason.low2@hpe.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>locking/rwsem: Optimize write lock by reducing operations in slowpath</title>
<updated>2016-06-03T07:47:13Z</updated>
<author>
<name>Jason Low</name>
<email>jason.low2@hpe.com</email>
</author>
<published>2016-05-17T00:38:00Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c0fcb6c2d332041256dc55d8a1ec3c0a2d0befb8'/>
<id>urn:sha1:c0fcb6c2d332041256dc55d8a1ec3c0a2d0befb8</id>
<content type='text'>
When acquiring the rwsem write lock in the slowpath, we first try
to set count to RWSEM_WAITING_BIAS. When that is successful,
we then atomically add the RWSEM_WAITING_BIAS in cases where
there are other tasks on the wait list. This causes write lock
operations to often issue multiple atomic operations.

We can instead make the list_is_singular() check first, and then
set the count accordingly, so that we issue at most 1 atomic
operation when acquiring the write lock and reduce unnecessary
cacheline contention.

Signed-off-by: Jason Low &lt;jason.low2@hpe.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Waiman Long&lt;Waiman.Long@hpe.com&gt;
Acked-by: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Fenghua Yu &lt;fenghua.yu@intel.com&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: Ivan Kokshaysky &lt;ink@jurassic.park.msu.ru&gt;
Cc: Jason Low &lt;jason.low2@hp.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: Matt Turner &lt;mattst88@gmail.com&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Peter Hurley &lt;peter@hurleysoftware.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Richard Henderson &lt;rth@twiddle.net&gt;
Cc: Terry Rudd &lt;terry.rudd@hpe.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Tim Chen &lt;tim.c.chen@linux.intel.com&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Link: http://lkml.kernel.org/r/1463445486-16078-2-git-send-email-jason.low2@hpe.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>locking/rwsem: Rework zeroing reader waiter-&gt;task</title>
<updated>2016-06-03T07:47:12Z</updated>
<author>
<name>Davidlohr Bueso</name>
<email>dave@stgolabs.net</email>
</author>
<published>2016-05-13T18:56:27Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e38513905eeaae59056eac2c9ac55a43b1fc41b2'/>
<id>urn:sha1:e38513905eeaae59056eac2c9ac55a43b1fc41b2</id>
<content type='text'>
Readers that are awoken will expect a nil -&gt;task indicating
that a wakeup has occurred. Because of the way readers are
implemented, there's a small chance that the waiter will never
block in the slowpath (rwsem_down_read_failed), and therefore
requires some form of reference counting to avoid the following
scenario:

rwsem_down_read_failed()		rwsem_wake()
  get_task_struct();
  spin_lock_irq(&amp;wait_lock);
  list_add_tail(&amp;waiter.list)
  spin_unlock_irq(&amp;wait_lock);
					  raw_spin_lock_irqsave(&amp;wait_lock)
					  __rwsem_do_wake()
  while (1) {
    set_task_state(TASK_UNINTERRUPTIBLE);
					    waiter-&gt;task = NULL
    if (!waiter.task) // true
      break;
    schedule() // never reached

   __set_task_state(TASK_RUNNING);
 do_exit();
					    wake_up_process(tsk); // boom

... and therefore race with do_exit() when the caller returns.

There is also a mismatch between the smp_mb() and its documentation,
in that the serialization is done between reading the task and the
nil store. Furthermore, in addition to having the overlapping of
loads and stores to waiter-&gt;task guaranteed to be ordered within
that CPU, both wake_up_process() originally and now wake_q_add()
already imply barriers upon successful calls, which serves the
comment.

Now, as an alternative to perhaps inverting the checks in the blocker
side (which has its own penalty in that schedule is unavoidable),
with lockless wakeups this situation is naturally addressed and we
can just use the refcount held by wake_q_add(), instead doing so
explicitly. Of course, we must guarantee that the nil store is done
as the _last_ operation in that the task must already be marked for
deletion to not fall into the race above. Spurious wakeups are also
handled transparently in that the task's reference is only removed
when wake_up_q() is actually called _after_ the nil store.

Signed-off-by: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Waiman.Long@hpe.com
Cc: dave@stgolabs.net
Cc: jason.low2@hp.com
Cc: peter@hurleysoftware.com
Link: http://lkml.kernel.org/r/1463165787-25937-3-git-send-email-dave@stgolabs.net
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
</feed>
