<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/smp.c, branch v2.6.38</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v2.6.38</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v2.6.38'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2011-01-21T02:30:37Z</updated>
<entry>
<title>Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip</title>
<updated>2011-01-21T02:30:37Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2011-01-21T02:30:37Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2b1caf6ed7b888c95a1909d343799672731651a5'/>
<id>urn:sha1:2b1caf6ed7b888c95a1909d343799672731651a5</id>
<content type='text'>
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  smp: Allow on_each_cpu() to be called while early_boot_irqs_disabled status to init/main.c
  lockdep: Move early boot local IRQ enable/disable status to init/main.c
</content>
</entry>
<entry>
<title>kernel/smp.c: consolidate writes in smp_call_function_interrupt()</title>
<updated>2011-01-21T01:02:06Z</updated>
<author>
<name>Milton Miller</name>
<email>miltonm@bga.com</email>
</author>
<published>2011-01-20T22:44:34Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=225c8e010f2d17a62aef131e24c6e7c111f36f9b'/>
<id>urn:sha1:225c8e010f2d17a62aef131e24c6e7c111f36f9b</id>
<content type='text'>
We have to test the cpu mask in the interrupt handler before checking the
refs, otherwise we can start to follow an entry before its deleted and
find it partially initailzed for the next trip.  Presently we also clear
the cpumask bit before executing the called function, which implies
getting write access to the line.  After the function is called we then
decrement refs, and if they go to zero we then unlock the structure.

However, this implies getting write access to the call function data
before and after another the function is called.  If we can assert that no
smp_call_function execution function is allowed to enable interrupts, then
we can move both writes to after the function is called, hopfully allowing
both writes with one cache line bounce.

On a 256 thread system with a kernel compiled for 1024 threads, the time
to execute testcase in the "smp_call_function_many race" changelog was
reduced by about 30-40ms out of about 545 ms.

I decided to keep this as WARN because its now a buggy function, even
though the stack trace is of no value -- a simple printk would give us the
information needed.

Raw data:

Without patch:
  ipi_test startup took 1219366ns complete 539819014ns total 541038380ns
  ipi_test startup took 1695754ns complete 543439872ns total 545135626ns
  ipi_test startup took 7513568ns complete 539606362ns total 547119930ns
  ipi_test startup took 13304064ns complete 533898562ns total 547202626ns
  ipi_test startup took 8668192ns complete 544264074ns total 552932266ns
  ipi_test startup took 4977626ns complete 548862684ns total 553840310ns
  ipi_test startup took 2144486ns complete 541292318ns total 543436804ns
  ipi_test startup took 21245824ns complete 530280180ns total 551526004ns

With patch:
  ipi_test startup took 5961748ns complete 500859628ns total 506821376ns
  ipi_test startup took 8975996ns complete 495098924ns total 504074920ns
  ipi_test startup took 19797750ns complete 492204740ns total 512002490ns
  ipi_test startup took 14824796ns complete 487495878ns total 502320674ns
  ipi_test startup took 11514882ns complete 494439372ns total 505954254ns
  ipi_test startup took 8288084ns complete 502570774ns total 510858858ns
  ipi_test startup took 6789954ns complete 493388112ns total 500178066ns

	#include &lt;linux/module.h&gt;
	#include &lt;linux/init.h&gt;
	#include &lt;linux/sched.h&gt; /* sched clock */

	#define ITERATIONS 100

	static void do_nothing_ipi(void *dummy)
	{
	}

	static void do_ipis(struct work_struct *dummy)
	{
		int i;

		for (i = 0; i &lt; ITERATIONS; i++)
			smp_call_function(do_nothing_ipi, NULL, 1);

		printk(KERN_DEBUG "cpu %d finished\n", smp_processor_id());
	}

	static struct work_struct work[NR_CPUS];

	static int __init testcase_init(void)
	{
		int cpu;
		u64 start, started, done;

		start = local_clock();
		for_each_online_cpu(cpu) {
			INIT_WORK(&amp;work[cpu], do_ipis);
			schedule_work_on(cpu, &amp;work[cpu]);
		}
		started = local_clock();
		for_each_online_cpu(cpu)
			flush_work(&amp;work[cpu]);
		done = local_clock();
		pr_info("ipi_test startup took %lldns complete %lldns total %lldns\n",
			started-start, done-started, done-start);

		return 0;
	}

	static void __exit testcase_exit(void)
	{
	}

	module_init(testcase_init)
	module_exit(testcase_exit)
	MODULE_LICENSE("GPL");
	MODULE_AUTHOR("Anton Blanchard");

Signed-off-by: Milton Miller &lt;miltonm@bga.com&gt;
Cc: Anton Blanchard &lt;anton@samba.org&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: "Paul E. McKenney" &lt;paulmck@linux.vnet.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>kernel/smp.c: fix smp_call_function_many() SMP race</title>
<updated>2011-01-21T01:02:06Z</updated>
<author>
<name>Anton Blanchard</name>
<email>anton@samba.org</email>
</author>
<published>2011-01-20T22:44:33Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=6dc19899958e420a931274b94019e267e2396d3e'/>
<id>urn:sha1:6dc19899958e420a931274b94019e267e2396d3e</id>
<content type='text'>
I noticed a failure where we hit the following WARN_ON in
generic_smp_call_function_interrupt:

                if (!cpumask_test_and_clear_cpu(cpu, data-&gt;cpumask))
                        continue;

                data-&gt;csd.func(data-&gt;csd.info);

                refs = atomic_dec_return(&amp;data-&gt;refs);
                WARN_ON(refs &lt; 0);      &lt;-------------------------

We atomically tested and cleared our bit in the cpumask, and yet the
number of cpus left (ie refs) was 0.  How can this be?

It turns out commit 54fdade1c3332391948ec43530c02c4794a38172
("generic-ipi: make struct call_function_data lockless") is at fault.  It
removes locking from smp_call_function_many and in doing so creates a
rather complicated race.

The problem comes about because:

 - The smp_call_function_many interrupt handler walks call_function.queue
   without any locking.
 - We reuse a percpu data structure in smp_call_function_many.
 - We do not wait for any RCU grace period before starting the next
   smp_call_function_many.

Imagine a scenario where CPU A does two smp_call_functions back to back,
and CPU B does an smp_call_function in between.  We concentrate on how CPU
C handles the calls:

CPU A            CPU B                  CPU C              CPU D

smp_call_function
                                        smp_call_function_interrupt
                                            walks
					call_function.queue sees
					data from CPU A on list

                 smp_call_function

                                        smp_call_function_interrupt
                                            walks

                                        call_function.queue sees
                                          (stale) CPU A on list
							   smp_call_function int
							   clears last ref on A
							   list_del_rcu, unlock
smp_call_function reuses
percpu *data A
                                         data-&gt;cpumask sees and
                                         clears bit in cpumask
                                         might be using old or new fn!
                                         decrements refs below 0

set data-&gt;refs (too late!)

The important thing to note is since the interrupt handler walks a
potentially stale call_function.queue without any locking, then another
cpu can view the percpu *data structure at any time, even when the owner
is in the process of initialising it.

The following test case hits the WARN_ON 100% of the time on my PowerPC
box (having 128 threads does help :)

#include &lt;linux/module.h&gt;
#include &lt;linux/init.h&gt;

#define ITERATIONS 100

static void do_nothing_ipi(void *dummy)
{
}

static void do_ipis(struct work_struct *dummy)
{
	int i;

	for (i = 0; i &lt; ITERATIONS; i++)
		smp_call_function(do_nothing_ipi, NULL, 1);

	printk(KERN_DEBUG "cpu %d finished\n", smp_processor_id());
}

static struct work_struct work[NR_CPUS];

static int __init testcase_init(void)
{
	int cpu;

	for_each_online_cpu(cpu) {
		INIT_WORK(&amp;work[cpu], do_ipis);
		schedule_work_on(cpu, &amp;work[cpu]);
	}

	return 0;
}

static void __exit testcase_exit(void)
{
}

module_init(testcase_init)
module_exit(testcase_exit)
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Anton Blanchard");

I tried to fix it by ordering the read and the write of -&gt;cpumask and
-&gt;refs.  In doing so I missed a critical case but Paul McKenney was able
to spot my bug thankfully :) To ensure we arent viewing previous
iterations the interrupt handler needs to read -&gt;refs then -&gt;cpumask then
-&gt;refs _again_.

Thanks to Milton Miller and Paul McKenney for helping to debug this issue.

[miltonm@bga.com: add WARN_ON and BUG_ON, remove extra read of refs before initial read of mask that doesn't help (also noted by Peter Zijlstra), adjust comments, hopefully clarify scenario ]
[miltonm@bga.com: remove excess tests]
Signed-off-by: Anton Blanchard &lt;anton@samba.org&gt;
Signed-off-by: Milton Miller &lt;miltonm@bga.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: "Paul E. McKenney" &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: &lt;stable@kernel.org&gt; [2.6.32+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>smp: Allow on_each_cpu() to be called while early_boot_irqs_disabled status to init/main.c</title>
<updated>2011-01-20T12:32:34Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2011-01-20T11:07:13Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=bd924e8cbd4b73ffb7d707a774c04f7e2cae88ed'/>
<id>urn:sha1:bd924e8cbd4b73ffb7d707a774c04f7e2cae88ed</id>
<content type='text'>
percpu may end up calling vfree() during early boot which in
turn may call on_each_cpu() for TLB flushes.  The function of
on_each_cpu() can be done safely while IRQ is disabled during
early boot but it assumed that the function is always called
with local IRQ enabled which ended up enabling local IRQ
prematurely during boot and triggering a couple of warnings.

This patch updates on_each_cpu() and smp_call_function_many()
such on_each_cpu() can be used safely while
early_boot_irqs_disabled is set.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Acked-by: Pekka Enberg &lt;penberg@kernel.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
LKML-Reference: &lt;20110120110713.GC6036@htj.dyndns.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Reported-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
</entry>
<entry>
<title>kernel: clean up USE_GENERIC_SMP_HELPERS</title>
<updated>2011-01-13T16:03:08Z</updated>
<author>
<name>Amerigo Wang</name>
<email>amwang@redhat.com</email>
</author>
<published>2011-01-13T00:59:39Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=351f8f8e6499ae4fff40f5e3a8fe16d9e1903646'/>
<id>urn:sha1:351f8f8e6499ae4fff40f5e3a8fe16d9e1903646</id>
<content type='text'>
For arch which needs USE_GENERIC_SMP_HELPERS, it has to select
USE_GENERIC_SMP_HELPERS, rather than leaving a choice to user, since they
don't provide their own implementions.

Also, move on_each_cpu() to kernel/smp.c, it is strange to put it in
kernel/softirq.c.

For arch which doesn't use USE_GENERIC_SMP_HELPERS, e.g.  blackfin, only
on_each_cpu() is compiled.

Signed-off-by: Amerigo Wang &lt;amwang@redhat.com&gt;
Cc: David Howells &lt;dhowells@redhat.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Yinghai Lu &lt;yinghai@kernel.org&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Randy Dunlap &lt;randy.dunlap@oracle.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Typedef SMP call function pointer</title>
<updated>2010-10-27T16:28:36Z</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2010-10-27T16:28:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3a5f65df5a0fcbaa35e5417c0420d691fee4ac56'/>
<id>urn:sha1:3a5f65df5a0fcbaa35e5417c0420d691fee4ac56</id>
<content type='text'>
Typedef the pointer to the function to be called by smp_call_function() and
friends:

	typedef void (*smp_call_func_t)(void *info);

as it is used in a fair number of places.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
cc: linux-arch@vger.kernel.org
</content>
</entry>
<entry>
<title>generic-ipi: Fix deadlock in __smp_call_function_single</title>
<updated>2010-09-10T14:48:40Z</updated>
<author>
<name>Heiko Carstens</name>
<email>heiko.carstens@de.ibm.com</email>
</author>
<published>2010-09-10T11:47:29Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=27c379f7f89a4d558c685b5d89b5ba2fe79ae701'/>
<id>urn:sha1:27c379f7f89a4d558c685b5d89b5ba2fe79ae701</id>
<content type='text'>
Just got my 6 way machine to a state where cpu 0 is in an
endless loop within __smp_call_function_single.
All other cpus are idle.

The call trace on cpu 0 looks like this:

 __smp_call_function_single
 scheduler_tick
 update_process_times
 tick_sched_timer
 __run_hrtimer
 hrtimer_interrupt
 clock_comparator_work
 do_extint
 ext_int_handler
 ----&gt; timer irq
 cpu_idle

__smp_call_function_single() got called from nohz_balancer_kick()
(inlined) with the remote cpu being 1, wait being 0 and the per
cpu variable remote_sched_softirq_cb (call_single_data) of the
current cpu (0).

Then it loops forever when it tries to grab the lock of the
call_single_data, since it is already locked and enqueued on cpu 0.

My theory how this could have happened: for some reason the
scheduler decided to call __smp_call_function_single() on it's own
cpu, and sends an IPI to itself. The interrupt stays pending
since IRQs are disabled. If then the hypervisor schedules the
cpu away it might happen that upon rescheduling both the IPI and
the timer IRQ are pending. If then interrupts are enabled again
it depends which one gets scheduled first.
If the timer interrupt gets delivered first we end up with the
local deadlock as seen in the calltrace above.

Let's make __smp_call_function_single() check if the target cpu is
the current cpu and execute the function immediately just like
smp_call_function_single does. That should prevent at least the
scenario described here.

It might also be that the scheduler is not supposed to call
__smp_call_function_single with the remote cpu being the current
cpu, but that is a different issue.

Signed-off-by: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Acked-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Acked-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
Cc: Venkatesh Pallipadi &lt;venki@google.com&gt;
Cc: Suresh Siddha &lt;suresh.b.siddha@intel.com&gt;
LKML-Reference: &lt;20100910114729.GB2827@osiris.boeblingen.de.ibm.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
</entry>
<entry>
<title>kernel/: convert cpu notifier to return encapsulate errno value</title>
<updated>2010-05-27T16:12:48Z</updated>
<author>
<name>Akinobu Mita</name>
<email>akinobu.mita@gmail.com</email>
</author>
<published>2010-05-26T21:43:32Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=80b5184cc537718122e036afe7e62d202b70d077'/>
<id>urn:sha1:80b5184cc537718122e036afe7e62d202b70d077</id>
<content type='text'>
By the previous modification, the cpu notifier can return encapsulate
errno value.  This converts the cpu notifiers for kernel/*.c

Signed-off-by: Akinobu Mita &lt;akinobu.mita@gmail.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h</title>
<updated>2010-03-30T13:02:32Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2010-03-24T08:04:11Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5a0e3ad6af8660be21ca98a971cd00f331318c05'/>
<id>urn:sha1:5a0e3ad6af8660be21ca98a971cd00f331318c05</id>
<content type='text'>
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -&gt; slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and try to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   wildly available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Guess-its-ok-by: Christoph Lameter &lt;cl@linux-foundation.org&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Lee Schermerhorn &lt;Lee.Schermerhorn@hp.com&gt;
</content>
</entry>
<entry>
<title>generic-ipi: Optimize accesses by using DEFINE_PER_CPU_SHARED_ALIGNED for IPI data</title>
<updated>2010-01-18T08:02:59Z</updated>
<author>
<name>Milton Miller</name>
<email>miltonm@bga.com</email>
</author>
<published>2010-01-18T02:00:51Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e03bcb68629c7f0728c95f1afe06ce48565c7713'/>
<id>urn:sha1:e03bcb68629c7f0728c95f1afe06ce48565c7713</id>
<content type='text'>
The smp ipi data is passed around and given write access by
other cpus and should be separated from per-cpu data consumed by
this cpu.

Looking for hot lines, I saw call_function_data shared with
tick_cpu_sched.

Signed-off-by: Milton Miller &lt;miltonm@bga.com&gt;
Acked-by: Anton Blanchard &lt;anton@samba.org&gt;
Acked-by: Jens Axboe &lt;jens.axboe@oracle.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: : Nick Piggin &lt;npiggin@suse.de&gt;
LKML-Reference: &lt;20100118020051.GR12666@kryten&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
</entry>
</feed>
