linux/kernel/smp.c, branch v2.6.30

cpumask: alloc zeroed cpumask for static cpumask_var_ts

2009-06-09T13:00:27Z

These are defined as static cpumask_var_t so if MAXSMP is not used, they are cleared already. Avoid surprises when MAXSMP is enabled.

Signed-off-by: Yinghai Lu
Signed-off-by: Rusty Russell

generic-ipi: eliminate WARN_ON()s during oops/panic

2009-03-13T09:47:34Z

Do not output smp-call related warnings in the oops/panic codepath.

Reported-by: Jan Beulich
Acked-by: Peter Zijlstra
LKML-Reference: <49B91A7E.76E4.0078.0@novell.com>
Signed-off-by: Ingo Molnar

generic-ipi: cleanups

2009-02-25T15:52:50Z

Andrew pointed out that there's some small amount of style rot in kernel/smp.c. Clean it up.

Reported-by: Andrew Morton
Cc: Nick Piggin
Cc: Jens Axboe
Cc: Peter Zijlstra
Signed-off-by: Ingo Molnar

generic-ipi: remove CSD_FLAG_WAIT

2009-02-25T13:13:44Z

Oleg noticed that we don't strictly need CSD_FLAG_WAIT, rework the code so that we can use CSD_FLAG_LOCK for both purposes.

Signed-off-by: Peter Zijlstra
Cc: Oleg Nesterov
Cc: Linus Torvalds
Cc: Nick Piggin
Cc: Jens Axboe
Cc: "Paul E. McKenney"
Cc: Rusty Russell
Signed-off-by: Ingo Molnar

generic-ipi: remove kmalloc()

2009-02-25T13:13:43Z

Remove the use of kmalloc() from the smp_call_function_*() calls.

Steven's generic-ipi patch (d7240b98: generic-ipi: use per cpu data for single cpu ipi calls) started the discussion on the use of kmalloc() in this code and fixed the smp_call_function_single(.wait=0) fallback case.

In this patch we complete this by also providing means for the _many() call, which fully removes the need for kmalloc() in this code.

The problem with the _many() call is that other cpus might still be observing our entry when we're done with it. The old code solved this by dynamically allocating data elements and RCU-freeing them.

We solve it by using a single per-cpu entry which provides static storage and solves one half of the problem (avoiding referencing freed data).

The other half, ensuring the queue iteration is still possible, is done by placing re-used entries at the head of the list. This means that if someone was still iterating that entry when it got moved, he will now re-visit the entries on the list he had already seen, but this avoids skipping over entries, as would have happened had we placed the new entry at the end.

Furthermore, visiting entries twice is not a problem, since we remove our cpu from the entry's cpumask once it is called.

Many thanks to Oleg for his suggestions and for poking holes in my earlier attempts.

Signed-off-by: Peter Zijlstra
Cc: Oleg Nesterov
Cc: Linus Torvalds
Cc: Nick Piggin
Cc: Jens Axboe
Cc: "Paul E. McKenney"
Cc: Rusty Russell
Signed-off-by: Ingo Molnar
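Why re-visiting an entry is harmless can be illustrated with a small userspace sketch. This is not the kernel code: every name ending in `_sketch` is hypothetical, the cpumask is reduced to one 32-bit atomic word, and C11 atomics stand in for the kernel's own primitives. The only idea taken from the message above is that a CPU removes itself from the entry's mask when it runs the function, so a second visit finds its bit already clear.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a queued cross-CPU call (at most 32 CPUs). */
struct call_entry_sketch {
	void (*func)(void *info);
	void *info;
	atomic_uint cpumask;	/* CPUs that still have to run func */
};

void entry_init_sketch(struct call_entry_sketch *e,
		       void (*func)(void *), void *info, unsigned int mask)
{
	assert(func != NULL);
	e->func = func;
	e->info = info;
	atomic_init(&e->cpumask, mask);
}

/*
 * Called by "cpu" for every entry it sees while walking the queue.
 * Atomically clears this CPU's bit; the function runs only if the bit
 * was still set, so seeing the same entry twice does nothing extra.
 * Returns true if func ran on this visit.
 */
bool visit_entry_sketch(struct call_entry_sketch *e, int cpu)
{
	assert(cpu >= 0 && cpu < 32);

	unsigned int bit = 1u << cpu;
	unsigned int old = atomic_fetch_and(&e->cpumask, ~bit);

	if (!(old & bit))
		return false;	/* not for us, or already handled */

	e->func(e->info);
	return true;
}

unsigned int entry_pending_sketch(struct call_entry_sketch *e)
{
	return atomic_load(&e->cpumask);
}

/* Example callback: counts how often it was invoked. */
void count_call_sketch(void *info)
{
	++*(int *)info;
}
```

With an entry whose mask names CPUs 1 and 3, a first visit by CPU 1 runs the callback and a repeated visit by CPU 1 returns false without running it again.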

generic IPI: simplify barriers and locking

2009-02-25T11:27:08Z

Simplify the barriers in generic remote function call interrupt code.

Firstly, just unconditionally take the lock and check the list in the generic_call_function_single_interrupt IPI handler. As we've just taken an IPI here, the chances are fairly high that there will be work on the list for us, so do the locking unconditionally. This removes the tricky lockless list_empty check and dubious barriers. The change looks bigger than it is because it is just removing an outer loop.

Secondly, clarify architecture specific IPI locking rules. Generic code has no tools to impose any sane ordering on IPIs if they go outside normal cache coherency, ergo the arch code must make them appear to obey cache coherency as a "memory operation" to initiate an IPI, and a "memory operation" to receive one. This way at least they can be reasoned about in generic code, and smp_mb used to provide ordering.

The combination of these two changes means that explicit barriers can be taken out of queue handling for the single case: shared data is explicitly locked, and ipi ordering must conform to that, so no barriers are needed. An extra barrier is needed in the many handler, so as to ensure we load the list element after the IPI is received.

Does any architecture actually *need* these barriers? For the initiator I could see it, but for the handler I would be surprised. So the other thing we could do for simplicity is to require, rather than just matching with cache coherency, a full barrier before generating an IPI and after receiving an IPI. In which case, the smp_mb()s can go away. But just for now, we'll be on the safe side and use the barriers (they're in the slow case anyway).

Signed-off-by: Nick Piggin
Acked-by: Peter Zijlstra
Cc: linux-arch@vger.kernel.org
Cc: Andrew Morton
Cc: Linus Torvalds
Cc: Jens Axboe
Cc: Oleg Nesterov
Cc: Suresh Siddha
Signed-off-by: Ingo Molnar
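The first change (take the lock unconditionally in the single-call handler, with no lockless emptiness check) might look roughly like this in userspace C11. It is a sketch under assumptions, not the kernel implementation: all names ending in `_sketch` are hypothetical, an `atomic_flag` spinlock stands in for the kernel's lock, and a minimal singly linked list replaces the kernel's list type. Detaching the whole list under the lock and running the callbacks afterwards is this sketch's own design choice.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* One pending call; hypothetical simplification of a queued request. */
struct work_sketch {
	struct work_sketch *next;
	void (*func)(void *info);
	void *info;
};

/* Per-CPU queue protected by a simple test-and-set spinlock. */
struct call_queue_sketch {
	atomic_flag lock;
	struct work_sketch *head;
};

void queue_lock_sketch(struct call_queue_sketch *q)
{
	while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
		;	/* spin until the holder releases the lock */
}

void queue_unlock_sketch(struct call_queue_sketch *q)
{
	atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

/* Sender side: add work at the head of the target queue. */
void queue_push_sketch(struct call_queue_sketch *q, struct work_sketch *w)
{
	assert(w->func != NULL);
	queue_lock_sketch(q);
	w->next = q->head;
	q->head = w;
	queue_unlock_sketch(q);
}

/*
 * Handler side: always take the lock, detach everything that is queued,
 * drop the lock, then run the callbacks. Returns the number of calls run.
 */
int handle_ipi_sketch(struct call_queue_sketch *q)
{
	struct work_sketch *list;
	int ran = 0;

	queue_lock_sketch(q);
	list = q->head;
	q->head = NULL;
	queue_unlock_sketch(q);

	while (list) {
		struct work_sketch *next = list->next;

		list->func(list->info);
		list = next;
		ran++;
	}
	return ran;
}

/* Example callback: increments an int counter. */
void bump_counter_sketch(void *info)
{
	++*(int *)info;
}
```

Because the shared list is only touched under the lock, the acquire and release on the lock provide the ordering, and the handler needs no separate barrier around the list access in this sketch.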

generic-ipi: use per cpu data for single cpu ipi calls

2009-01-30T17:31:08Z

The smp_call_function can be passed a wait parameter telling it to wait for all the functions running on other CPUs to complete before returning, or to return without waiting. Unfortunately, this is currently just a suggestion and not mandatory. That is, the smp_call_function can decide not to return and wait instead.

The reason is that it uses kmalloc to allocate storage to send to the called CPU, and that CPU will free it when it is done. But if we fail to allocate the storage, the stack is used instead. This means we must wait for the called CPU to finish before continuing.

Unfortunately, some callers do not abide by this hint and act as if the non-wait option is mandatory. The MTRR code for instance will deadlock if the smp_call_function is set to wait. This is because the smp_call_function will wait for the other CPUs to finish their called functions, but those functions are waiting on the caller to continue.

This patch changes the generic smp_call_function code to use per cpu variables if the allocation of the data fails for a single CPU call. The smp_call_function_many will fall back to the smp_call_function_single if it fails its alloc. The smp_call_function_single is modified to not force the wait state.

Since we are now using a single data entry per cpu, we must synchronize the callers to prevent a second caller modifying the data before the first called IPI functions complete. To do so, I added a flag to the call_single_data called CSD_FLAG_LOCK. When the single CPU is called (which can be called when a many call fails an alloc), we set the LOCK bit on this per cpu data. When the caller finishes it clears the LOCK bit.

The caller must wait till the LOCK bit is cleared before setting it. When it is cleared, there is no IPI function using it.

Signed-off-by: Steven Rostedt
Signed-off-by: Peter Zijlstra
Acked-by: Jens Axboe
Acked-by: Linus Torvalds
Signed-off-by: Ingo Molnar
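The LOCK-bit protocol described above can be approximated in userspace as follows. This is a sketch under stated assumptions, not the patch itself: only the name `CSD_FLAG_LOCK` comes from the commit message, its value and every `_sketch` identifier are made up for illustration, and an atomic test-and-set replaces whatever wait-then-set sequence the kernel uses. Clearing the bit right after the function has run is likewise this sketch's choice.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define CSD_FLAG_LOCK 0x01u	/* name from the commit; value is assumed */

/* Simplified stand-in for one statically allocated per-CPU call slot. */
struct csd_sketch {
	void (*func)(void *info);
	void *info;
	atomic_uint flags;
};

void csd_init_sketch(struct csd_sketch *csd, void (*func)(void *), void *info)
{
	assert(func != NULL);
	csd->func = func;
	csd->info = info;
	atomic_init(&csd->flags, 0u);
}

/* Try to claim the slot; returns false if a previous call still owns it. */
bool csd_trylock_sketch(struct csd_sketch *csd)
{
	unsigned int old = atomic_fetch_or_explicit(&csd->flags, CSD_FLAG_LOCK,
						    memory_order_acquire);
	return !(old & CSD_FLAG_LOCK);
}

/* Wait until the LOCK bit is clear, then set it. */
void csd_lock_sketch(struct csd_sketch *csd)
{
	while (!csd_trylock_sketch(csd))
		;	/* spin: an earlier call is still using the slot */
}

/* Run the queued function, then release the slot for the next caller. */
void csd_run_and_unlock_sketch(struct csd_sketch *csd)
{
	csd->func(csd->info);
	atomic_fetch_and_explicit(&csd->flags, ~CSD_FLAG_LOCK,
				  memory_order_release);
}

/* Example callback: increments an int counter. */
void bump_sketch(void *info)
{
	++*(int *)info;
}
```

While the bit is set, a second attempt to claim the slot fails (or spins in `csd_lock_sketch`), which is the property that lets one static slot replace a per-call allocation.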

cpumask: prepare for iterators to only go to nr_cpu_ids/nr_cpumask_bits.: core

2008-12-31T23:42:15Z

Impact: cleanup

In future, all cpumask ops will only be valid (in general) for bit numbers < nr_cpu_ids. So use that instead of NR_CPUS in iterators and other comparisons.

This is always safe: no cpu number can be >= nr_cpu_ids, and nr_cpu_ids is initialized to NR_CPUS at boot.

Signed-off-by: Rusty Russell
Signed-off-by: Mike Travis
Acked-by: Ingo Molnar
Acked-by: James Morris
Cc: Eric Biederman

cpumask: arch_send_call_function_ipi_mask: core

2008-12-29T22:35:17Z

Impact: new API to reduce stack usage

We're weaning the core code off handing cpumasks around on-stack. This introduces arch_send_call_function_ipi_mask().

Signed-off-by: Rusty Russell

cpumask: smp_call_function_many()

2008-12-29T22:35:16Z

Impact: Implementation change to remove cpumask_t from stack.

Actually change smp_call_function_mask() to smp_call_function_many(). We avoid cpumasks on the stack in this version.

(S390 has its own version, but that's going away apparently).

We have to do some dancing to figure out if 0 or 1 other cpus are in the mask supplied and the online mask without allocating a tmp cpumask. It's still fairly cheap.

We allocate the cpumask at the end of the call_function_data structure: if allocation fails we fall back to smp_call_function_single rather than using the baroque quiescing code (which needs a cpumask on stack).

(Thanks to Hiroshi Shimamoto for spotting several bugs in previous versions!)

Signed-off-by: Rusty Russell
Signed-off-by: Mike Travis
Cc: Hiroshi Shimamoto
Cc: npiggin@suse.de
Cc: axboe@kernel.dk
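The "dancing" to tell zero, one, or several other CPUs apart without a temporary mask can be sketched like this. It is an illustration under assumptions, not the kernel source: masks are reduced to a single 64-bit word, and `next_and_sketch`, `count_other_cpus_sketch` and `NR_CPU_IDS_SKETCH` are hypothetical names. The helper tests each bit in both masks rather than building their intersection, mirroring the goal of not materialising a combined mask.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPU_IDS_SKETCH 64	/* assumed limit for this one-word sketch */

/*
 * Return the lowest CPU number greater than prev that is set in both
 * masks, or NR_CPU_IDS_SKETCH if there is none. Pass prev = -1 to start.
 */
int next_and_sketch(int prev, uint64_t a, uint64_t b)
{
	for (int cpu = prev + 1; cpu < NR_CPU_IDS_SKETCH; cpu++)
		if ((a >> cpu) & (b >> cpu) & 1u)
			return cpu;
	return NR_CPU_IDS_SKETCH;
}

/*
 * Classify how many CPUs other than this_cpu are in (mask AND online):
 * returns 0 for none, 1 for exactly one (stored in *single), and 2 for
 * "more than one". At most two candidates are ever looked up.
 */
int count_other_cpus_sketch(uint64_t mask, uint64_t online, int this_cpu,
			    int *single)
{
	assert(this_cpu >= 0 && this_cpu < NR_CPU_IDS_SKETCH);

	/* First candidate, skipping ourselves. */
	int cpu = next_and_sketch(-1, mask, online);
	if (cpu == this_cpu)
		cpu = next_and_sketch(cpu, mask, online);
	if (cpu >= NR_CPU_IDS_SKETCH)
		return 0;

	/* Second candidate, again skipping ourselves. */
	int next = next_and_sketch(cpu, mask, online);
	if (next == this_cpu)
		next = next_and_sketch(next, mask, online);
	if (next >= NR_CPU_IDS_SKETCH) {
		*single = cpu;	/* exactly one target: the single-CPU path fits */
		return 1;
	}
	return 2;
}
```

A result of 1 corresponds to the case where a single-CPU call suffices; a result of 2 means the full multi-CPU path is needed.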