<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/rcu/tree.c, branch v3.17</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v3.17</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v3.17'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2014-07-09T16:15:31Z</updated>
<entry>
<title>rcu: Remove CONFIG_PROVE_RCU_DELAY</title>
<updated>2014-07-09T16:15:31Z</updated>
<author>
<name>Paul E. McKenney</name>
<email>paulmck@linux.vnet.ibm.com</email>
</author>
<published>2014-06-23T19:09:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=11992c703a1c7d95f5d8759498d7617d4a504819'/>
<id>urn:sha1:11992c703a1c7d95f5d8759498d7617d4a504819</id>
<content type='text'>
The CONFIG_PROVE_RCU_DELAY Kconfig parameter doesn't appear to be very
effective at finding race conditions, so this commit removes it.

Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
[ paulmck: Remove definition and uses as noted by Paul Bolle. ]
</content>
</entry>
<entry>
<title>rcu: Use __this_cpu_read() instead of per_cpu_ptr()</title>
<updated>2014-07-09T16:15:21Z</updated>
<author>
<name>Shan Wei</name>
<email>davidshan@tencent.com</email>
</author>
<published>2014-06-19T21:12:44Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d860d40327dde251d508a234fa00bd0d90fbb656'/>
<id>urn:sha1:d860d40327dde251d508a234fa00bd0d90fbb656</id>
<content type='text'>
The __this_cpu_read() function produces better code than does
per_cpu_ptr() on both ARM and x86.  For example, gcc (Ubuntu/Linaro
4.7.3-12ubuntu1) 4.7.3 produces the following:

ARMv7 per_cpu_ptr():

force_quiescent_state:
    mov    r3, sp    @,
    bic    r1, r3, #8128    @ tmp171,,
    ldr    r2, .L98    @ tmp169,
    bic    r1, r1, #63    @ tmp170, tmp171,
    ldr    r3, [r0, #220]    @ __ptr, rsp_6(D)-&gt;rda
    ldr    r1, [r1, #20]    @ D.35903_68-&gt;cpu, D.35903_68-&gt;cpu
    mov    r6, r0    @ rsp, rsp
    ldr    r2, [r2, r1, asl #2]    @ tmp173, __per_cpu_offset
    add    r3, r3, r2    @ tmp175, __ptr, tmp173
    ldr    r5, [r3, #12]    @ rnp_old, D.29162_13-&gt;mynode

ARMv7 __this_cpu_read():

force_quiescent_state:
    ldr    r3, [r0, #220]    @ rsp_7(D)-&gt;rda, rsp_7(D)-&gt;rda
    mov    r6, r0    @ rsp, rsp
    add    r3, r3, #12    @ __ptr, rsp_7(D)-&gt;rda,
    ldr    r5, [r2, r3]    @ rnp_old, *D.29176_13

Using gcc 4.8.2:

x86_64 per_cpu_ptr():

    movl %gs:cpu_number,%edx    # cpu_number, pscr_ret__
    movslq    %edx, %rdx    # pscr_ret__, pscr_ret__
    movq    __per_cpu_offset(,%rdx,8), %rdx    # __per_cpu_offset, tmp93
    movq    %rdi, %r13    # rsp, rsp
    movq    1000(%rdi), %rax    # rsp_9(D)-&gt;rda, __ptr
    movq    24(%rdx,%rax), %r12    # _15-&gt;mynode, rnp_old

x86_64 __this_cpu_read():

    movq    %rdi, %r13    # rsp, rsp
    movq    1000(%rdi), %rax    # rsp_9(D)-&gt;rda, rsp_9(D)-&gt;rda
    movq %gs:24(%rax),%r12    # _10-&gt;mynode, rnp_old

Because it produces significant benefits on these two very diverse
architectures, this commit makes this change.
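
A sketch of the corresponding source-level change in
force_quiescent_state() (simplified, details from memory; the
surrounding code is omitted):

	/* Before: index the per-CPU offset table with this CPU's number. */
	rnp_old = per_cpu_ptr(rsp-&gt;rda, raw_smp_processor_id())-&gt;mynode;

	/* After: a single this-CPU load, so the compiler can use the
	 * per-CPU base (%gs on x86_64) directly. */
	rnp_old = __this_cpu_read(rsp-&gt;rda-&gt;mynode);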

Signed-off-by: Shan Wei &lt;davidshan@tencent.com&gt;
Acked-by: Christoph Lameter &lt;cl@linux.com&gt;
Signed-off-by: Pranith Kumar &lt;bobby.prani@gmail.com&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Reviewed-by: Josh Triplett &lt;josh@joshtriplett.org&gt;
Reviewed-by: Lai Jiangshan &lt;laijs@cn.fujitsu.com&gt;
</content>
</entry>
<entry>
<title>rcu: Don't use NMIs to dump other CPUs' stacks</title>
<updated>2014-07-09T16:15:04Z</updated>
<author>
<name>Paul E. McKenney</name>
<email>paulmck@linux.vnet.ibm.com</email>
</author>
<published>2014-06-18T16:18:31Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=bc1dce514e9b29b64df28a533015885862f47814'/>
<id>urn:sha1:bc1dce514e9b29b64df28a533015885862f47814</id>
<content type='text'>
Although NMI-based stack dumps are in principle more accurate, they are
also more likely to trigger deadlocks.  This commit therefore replaces
all uses of trigger_all_cpu_backtrace() with rcu_dump_cpu_stacks(), so
that the CPU detecting an RCU CPU stall does the stack dumping.
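
As a hedged sketch, the call-site change in the stall-warning path
looks like this (argument list from memory):

	/* Before: NMI-based backtraces, which can deadlock. */
	trigger_all_cpu_backtrace();

	/* After: the CPU that detected the stall dumps the other CPUs'
	 * stacks itself. */
	rcu_dump_cpu_stacks(rsp);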

Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Reviewed-by: Lai Jiangshan &lt;laijs@cn.fujitsu.com&gt;
</content>
</entry>
<entry>
<title>rcu: Check both root and current rcu_node when setting up future grace period</title>
<updated>2014-07-09T16:15:01Z</updated>
<author>
<name>Pranith Kumar</name>
<email>bobby.prani@gmail.com</email>
</author>
<published>2014-06-11T17:32:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=48bd8e9b82a750b983823f391c67e70553757afa'/>
<id>urn:sha1:48bd8e9b82a750b983823f391c67e70553757afa</id>
<content type='text'>
The rcu_start_future_gp() function checks the current rcu_node's -&gt;gpnum
and -&gt;completed twice, once without ACCESS_ONCE() and once with it.
This is pointless because we hold that rcu_node's -&gt;lock at that point.
The intent was to check the current rcu_node structure and the root
rcu_node structure, the latter locklessly with ACCESS_ONCE().  This
commit therefore makes that change.

The reason that it is safe to locklessly check the root rcu_node's
-&gt;gpnum and -&gt;completed fields is that we hold the current rcu_node's
-&gt;lock, which constrains the root rcu_node's ability to change its
-&gt;gpnum and -&gt;completed fields.  Of course, if there is a single rcu_node
structure, then rnp_root==rnp, and holding the lock prevents all changes.
If there is more than one rcu_node structure, then the code updates the
fields in the following order:

1.	Increment rnp_root-&gt;gpnum to start new grace period.
2.	Increment rnp-&gt;gpnum to initialize the current rcu_node,
	continuing initialization for the new grace period.
3.	Increment rnp_root-&gt;completed to end the current grace period.
4.	Increment rnp-&gt;completed to continue cleaning up after the
	old grace period.

So there are four possible combinations of relative values of these
four fields:

N   N   N   N:  RCU idle, new grace period must be initiated.
		Although rnp_root-&gt;gpnum might be incremented immediately
		after we check, that will just result in unnecessary work:
		the grace period will already have started, and our attempt
		to start it will simply be redundant.

N+1 N   N   N:  RCU grace period just started.  No further change is
		possible because we hold rnp-&gt;lock, so the checks of
		rnp_root-&gt;gpnum and rnp_root-&gt;completed are stable.
		We know that our request for a future grace period will
		be seen during grace-period cleanup.

N+1 N   N+1 N:  RCU grace period is ongoing.  Because rnp-&gt;gpnum is
		different than rnp-&gt;completed, we won't even look at
		rnp_root-&gt;gpnum and rnp_root-&gt;completed, so the possible
		concurrent change to rnp_root-&gt;completed does not matter.
		We know that our request for a future grace period will
		be seen during grace-period cleanup, which cannot pass
		this rcu_node because we hold its -&gt;lock.

N+1 N+1 N+1 N:  RCU grace period has ended, but not yet been cleaned up.
		Because rnp-&gt;gpnum is different than rnp-&gt;completed, we
		won't look at rnp_root-&gt;gpnum and rnp_root-&gt;completed, so
		the possible concurrent change to rnp_root-&gt;completed does
		not matter.  We know that our request for a future grace
		period will be seen during grace-period cleanup, which
		cannot pass this rcu_node because we hold its -&gt;lock.

Therefore, despite initial appearances, the lockless check is safe.
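
A minimal sketch of the corrected check (simplified from
rcu_start_future_gp(); we hold rnp-&gt;lock, and rnp_root is examined
locklessly):

	if (rnp-&gt;gpnum != rnp-&gt;completed ||
	    ACCESS_ONCE(rnp_root-&gt;gpnum) != ACCESS_ONCE(rnp_root-&gt;completed)) {
		/* A grace period is in progress or awaiting cleanup, so
		 * our request must target a later grace period. */
	}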

Signed-off-by: Pranith Kumar &lt;bobby.prani@gmail.com&gt;
[ paulmck: Update comment to say why the lockless check is safe. ]
Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
</content>
</entry>
<entry>
<title>rcu: Loosen __call_rcu()'s rcu_head alignment constraint</title>
<updated>2014-07-09T16:14:50Z</updated>
<author>
<name>Paul E. McKenney</name>
<email>paulmck@linux.vnet.ibm.com</email>
</author>
<published>2014-06-09T15:24:17Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1146edcbef3789228454c4aa42c08ddc2c275990'/>
<id>urn:sha1:1146edcbef3789228454c4aa42c08ddc2c275990</id>
<content type='text'>
The m68k architecture aligns only to 16-bit boundaries, which can cause
the align-to-32-bits check in __call_rcu() to trigger.  Because there is
currently no known potential need for more than one low-order bit, this
commit loosens the check to 16-bit boundaries.
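
A sketch of the loosened check in __call_rcu() (the exact WARN form is
from memory):

	/* Before: require 32-bit alignment of the rcu_head pointer. */
	WARN_ON_ONCE((unsigned long)head &amp; 0x3);

	/* After: 16-bit alignment suffices, because only one low-order
	 * bit is currently needed. */
	WARN_ON_ONCE((unsigned long)head &amp; 0x1);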

Reported-by: Greg Ungerer &lt;gerg@uclinux.org&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Reviewed-by: Lai Jiangshan &lt;laijs@cn.fujitsu.com&gt;
</content>
</entry>
<entry>
<title>rcu: Eliminate read-modify-write ACCESS_ONCE() calls</title>
<updated>2014-07-09T16:14:49Z</updated>
<author>
<name>Paul E. McKenney</name>
<email>paulmck@linux.vnet.ibm.com</email>
</author>
<published>2014-06-02T21:54:34Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a792563bd47632d85158c72e2acf4484eed0ec32'/>
<id>urn:sha1:a792563bd47632d85158c72e2acf4484eed0ec32</id>
<content type='text'>
RCU contains code of the following forms:

	ACCESS_ONCE(x)++;
	ACCESS_ONCE(x) += y;
	ACCESS_ONCE(x) -= y;

Now these constructs do operate correctly, but they really result in a
pair of volatile accesses, one to do the load and another to do the store.
This can be confusing, as the casual reader might well assume that (for
example) gcc might generate a memory-to-memory add instruction for each
of these three cases.  In fact, gcc will do no such thing.  Also, there
is a good chance that the kernel will move to separate load and store
variants of ACCESS_ONCE(), and constructs like the above could easily
confuse both people and scripts attempting to make that sort of change.
Finally, most of RCU's read-modify-write uses of ACCESS_ONCE() really
only need the store to be volatile, so the read-modify-write form
can be misleading.

This commit therefore changes the above forms in RCU so that each instance
of ACCESS_ONCE() either does a load or a store, but not both.  In a few
cases, ACCESS_ONCE() was not critical, for example, for maintaining
statistics.  In these cases, ACCESS_ONCE() has been dispensed with
entirely.
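
As a sketch, the three forms above become volatile stores of plain
(non-volatile) loads, and the non-critical uses lose ACCESS_ONCE()
entirely:

	ACCESS_ONCE(x) = x + 1;	/* was ACCESS_ONCE(x)++; */
	ACCESS_ONCE(x) = x + y;	/* was ACCESS_ONCE(x) += y; */
	ACCESS_ONCE(x) = x - y;	/* was ACCESS_ONCE(x) -= y; */
	x++;			/* statistics only: no ACCESS_ONCE() at all */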

Suggested-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
</content>
</entry>
<entry>
<title>rcu: Make rcu node arrays static const char * const</title>
<updated>2014-07-09T16:14:34Z</updated>
<author>
<name>Fabian Frederick</name>
<email>fabf@skynet.be</email>
</author>
<published>2014-05-06T17:21:14Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b4426b49c65e0d266f8a9181ca51d5bf11407714'/>
<id>urn:sha1:b4426b49c65e0d266f8a9181ca51d5bf11407714</id>
<content type='text'>
Those two arrays are being passed to lockdep_init_map(), which expects
const char *, and are stored in lockdep_map the same way.
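
As a hedged sketch (the array contents are illustrative; the real
arrays live in rcu_init_one()):

	/* Before: mutable pointers to mutable strings. */
	static char *buf[] = { "rcu_node_0", "rcu_node_1" };

	/* After: both the array and the pointed-to strings are const,
	 * matching what lockdep_init_map() expects. */
	static const char * const buf[] = { "rcu_node_0", "rcu_node_1" };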

Cc: Dipankar Sarma &lt;dipankar@in.ibm.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Fabian Frederick &lt;fabf@skynet.be&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
</content>
</entry>
<entry>
<title>rcu: Reduce overhead of cond_resched() checks for RCU</title>
<updated>2014-06-23T18:19:32Z</updated>
<author>
<name>Paul E. McKenney</name>
<email>paulmck@linux.vnet.ibm.com</email>
</author>
<published>2014-06-20T23:49:01Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4a81e8328d3791a4f99bf5b436d050f6dc5ffea3'/>
<id>urn:sha1:4a81e8328d3791a4f99bf5b436d050f6dc5ffea3</id>
<content type='text'>
Commit ac1bea85781e (Make cond_resched() report RCU quiescent states)
fixed a problem where a CPU looping in the kernel with but one runnable
task would give RCU CPU stall warnings, even if the in-kernel loop
contained cond_resched() calls.  Unfortunately, in so doing, it introduced
performance regressions in Anton Blanchard's will-it-scale "open1" test.
The problem appears to be not so much the increased cond_resched() path
length as an increase in the rate at which grace periods complete, which
increased per-update grace-period overhead.

This commit takes a different approach to fixing this bug, mainly by
moving the RCU-visible quiescent state from cond_resched() to
rcu_note_context_switch(), and by further reducing the check to a
simple non-zero test of a single per-CPU variable.  However, this
approach requires that the force-quiescent-state processing send
resched IPIs to the offending CPUs.  These will be sent only once
the grace period has reached an age specified by the boot/sysfs
parameter rcutree.jiffies_till_sched_qs, or once the grace period
reaches an age halfway to the point at which RCU CPU stall warnings
will be emitted, whichever comes first.
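
A minimal sketch of the resulting fast path (names from memory; the
rest of the function is elided):

	void rcu_note_context_switch(int cpu)
	{
		/* ... existing lightweight quiescent-state reporting ... */

		/* Heavyweight quiescent state only when force-quiescent-state
		 * processing has asked this CPU for one. */
		if (unlikely(raw_cpu_read(rcu_sched_qs_mask)))
			rcu_momentary_dyntick_idle();
	}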

Reported-by: Dave Hansen &lt;dave.hansen@intel.com&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Christoph Lameter &lt;cl@gentwo.org&gt;
Cc: Mike Galbraith &lt;umgwanakikbuti@gmail.com&gt;
Cc: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Reviewed-by: Josh Triplett &lt;josh@joshtriplett.org&gt;
[ paulmck: Made rcu_momentary_dyntick_idle() as suggested by the
  ktest build robot.  Also fixed smp_mb() comment as noted by
  Oleg Nesterov. ]

Merge with e552592e (Reduce overhead of cond_resched() checks for RCU)

Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next</title>
<updated>2014-06-03T19:57:53Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2014-06-03T19:57:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=776edb59317ada867dfcddde40b55648beeb0078'/>
<id>urn:sha1:776edb59317ada867dfcddde40b55648beeb0078</id>
<content type='text'>
Pull core locking updates from Ingo Molnar:
 "The main changes in this cycle were:

   - reduced/streamlined smp_mb__*() interface that allows more usecases
     and makes the existing ones less buggy, especially in rarer
     architectures

   - add rwsem implementation comments

   - bump up lockdep limits"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
  rwsem: Add comments to explain the meaning of the rwsem's count field
  lockdep: Increase static allocations
  arch: Mass conversion of smp_mb__*()
  arch,doc: Convert smp_mb__*()
  arch,xtensa: Convert smp_mb__*()
  arch,x86: Convert smp_mb__*()
  arch,tile: Convert smp_mb__*()
  arch,sparc: Convert smp_mb__*()
  arch,sh: Convert smp_mb__*()
  arch,score: Convert smp_mb__*()
  arch,s390: Convert smp_mb__*()
  arch,powerpc: Convert smp_mb__*()
  arch,parisc: Convert smp_mb__*()
  arch,openrisc: Convert smp_mb__*()
  arch,mn10300: Convert smp_mb__*()
  arch,mips: Convert smp_mb__*()
  arch,metag: Convert smp_mb__*()
  arch,m68k: Convert smp_mb__*()
  arch,m32r: Convert smp_mb__*()
  arch,ia64: Convert smp_mb__*()
  ...
</content>
</entry>
<entry>
<title>rcu: Variable name changed in tree_plugin.h and used in tree.c</title>
<updated>2014-05-14T18:41:04Z</updated>
<author>
<name>Uma Sharma</name>
<email>uma.sharma523@gmail.com</email>
</author>
<published>2014-03-24T05:32:09Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e534165bbf6a04d001748c573c7d6a7bae3713a5'/>
<id>urn:sha1:e534165bbf6a04d001748c573c7d6a7bae3713a5</id>
<content type='text'>
The variable and struct both having the name "rcu_state" confuses
sparse in some situations, so this commit changes the variable to
"rcu_state_p" in order to avoid this confusion.  This also makes
things easier for human readers.
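
The rename itself, sketched (the initializer varies with the RCU
flavor selected by the build):

	/* Before: the variable shares the struct tag's name. */
	static struct rcu_state *rcu_state = &amp;rcu_preempt_state;

	/* After: a distinct name keeps sparse (and readers) from
	 * confusing the pointer with struct rcu_state itself. */
	static struct rcu_state *rcu_state_p = &amp;rcu_preempt_state;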

Signed-off-by: Uma Sharma &lt;uma.sharma523@gmail.com&gt;
[ paulmck: Changed the declaration and several additional uses. ]
Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
</content>
</entry>
</feed>
