<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/sched.c, branch v2.6.28</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/</subtitle>
<id>https://git.shady.money/linux/atom?h=v2.6.28</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v2.6.28'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2008-12-09T18:27:03Z</updated>
<entry>
<title>sched: CPU remove deadlock fix</title>
<updated>2008-12-09T18:27:03Z</updated>
<author>
<name>Brian King</name>
<email>brking@linux.vnet.ibm.com</email>
</author>
<published>2008-12-09T14:47:00Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9a2bd244e18ffbb96c8b783210fda4eded7c7e6f'/>
<id>urn:sha1:9a2bd244e18ffbb96c8b783210fda4eded7c7e6f</id>
<content type='text'>
Impact: fix possible deadlock in CPU hot-remove path

This patch fixes a possible deadlock scenario in the CPU remove path.
migration_call() grabs rq-&gt;lock, then wakes up everything on
rq-&gt;migration_queue with that lock still held. One of the woken tasks
ends up calling tg_shares_up(), which also tries to acquire the same
rq-&gt;lock and deadlocks.

[c000000058eab2e0] c000000000502078 ._spin_lock_irqsave+0x98/0xf0
[c000000058eab370] c00000000008011c .tg_shares_up+0x10c/0x20c
[c000000058eab430] c00000000007867c .walk_tg_tree+0xc4/0xfc
[c000000058eab4d0] c0000000000840c8 .try_to_wake_up+0xb0/0x3c4
[c000000058eab590] c0000000000799a0 .__wake_up_common+0x6c/0xe0
[c000000058eab640] c00000000007ada4 .complete+0x54/0x80
[c000000058eab6e0] c000000000509fa8 .migration_call+0x5fc/0x6f8
[c000000058eab7c0] c000000000504074 .notifier_call_chain+0x68/0xe0
[c000000058eab860] c000000000506568 ._cpu_down+0x2b0/0x3f4
[c000000058eaba60] c000000000506750 .cpu_down+0xa4/0x108
[c000000058eabb10] c000000000507e54 .store_online+0x44/0xa8
[c000000058eabba0] c000000000396260 .sysdev_store+0x3c/0x50
[c000000058eabc10] c0000000001a39b8 .sysfs_write_file+0x124/0x18c
[c000000058eabcd0] c00000000013061c .vfs_write+0xd0/0x1bc
[c000000058eabd70] c0000000001308a4 .sys_write+0x68/0x114
[c000000058eabe30] c0000000000086b4 syscall_exit+0x0/0x40
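The fixed ordering can be sketched as a userspace analogy (Python with
threading; the names lock, migration_queue, queue_waiter and drain_queue
are illustrative, not the kernel C):

```python
import threading

lock = threading.Lock()            # stands in for the rq lock
migration_queue = []               # stands in for the rq migration queue

def queue_waiter(done_event):
    with lock:
        migration_queue.append(done_event)

def drain_queue():
    # The buggy ordering would call ev.set() inside the locked region;
    # a woken waiter that re-takes the same lock could then deadlock.
    with lock:
        pending = list(migration_queue)
        migration_queue.clear()
    for ev in pending:             # wakeups happen with the lock dropped
        ev.set()
```

The point is only the ordering: snapshot the queue under the lock, then
issue the wakeups after the lock is released.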

Signed-off-by: Brian King &lt;brking@linux.vnet.ibm.com&gt;
Acked-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
</entry>
<entry>
<title>sched: prevent divide by zero error in cpu_avg_load_per_task, update</title>
<updated>2008-11-29T19:45:15Z</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@elte.hu</email>
</author>
<published>2008-11-29T19:45:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=af6d596fd603219b054c1c90fb16672a9fd441bd'/>
<id>urn:sha1:af6d596fd603219b054c1c90fb16672a9fd441bd</id>
<content type='text'>
Regarding the bug addressed in:

  4cd4262: sched: prevent divide by zero error in cpu_avg_load_per_task

Linus points out that the fix is not complete:

&gt; There's nothing that keeps gcc from deciding not to reload
&gt; rq-&gt;nr_running.
&gt;
&gt; Of course, in _practice_, I don't think gcc ever will (if it decides
&gt; that it will spill, gcc is likely going to decide that it will
&gt; literally spill the local variable to the stack rather than decide to
&gt; reload off the pointer), but it's a valid compiler optimization, and
&gt; it even has a name (rematerialization).
&gt;
&gt; So I suspect that your patch does fix the bug, but it still leaves the
&gt; fairly unlikely _potential_ for it to re-appear at some point.
&gt;
&gt; We have ACCESS_ONCE() as a macro to guarantee that the compiler
&gt; doesn't rematerialize a pointer access. That also would clarify
&gt; the fact that we access something unsafe outside a lock.

So make sure our nr_running value is read only once, so it cannot
change after we check it for nonzero.

Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
</entry>
<entry>
<title>sched: prevent divide by zero error in cpu_avg_load_per_task</title>
<updated>2008-11-27T09:29:52Z</updated>
<author>
<name>Steven Rostedt</name>
<email>rostedt@goodmis.org</email>
</author>
<published>2008-11-27T02:04:24Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4cd4262034849da01eb88659af677b69f8169f06'/>
<id>urn:sha1:4cd4262034849da01eb88659af677b69f8169f06</id>
<content type='text'>
Impact: fix divide by zero crash in scheduler rebalance irq

While testing the branch profiler, I hit this crash:

divide error: 0000 [#1] PREEMPT SMP
[...]
RIP: 0010:[&lt;ffffffff8024a008&gt;]  [&lt;ffffffff8024a008&gt;] cpu_avg_load_per_task+0x50/0x7f
[...]
Call Trace:
 &lt;IRQ&gt; &lt;0&gt; [&lt;ffffffff8024fd43&gt;] find_busiest_group+0x3e5/0xcaa
 [&lt;ffffffff8025da75&gt;] rebalance_domains+0x2da/0xa21
 [&lt;ffffffff80478769&gt;] ? find_next_bit+0x1b2/0x1e6
 [&lt;ffffffff8025e2ce&gt;] run_rebalance_domains+0x112/0x19f
 [&lt;ffffffff8026d7c2&gt;] __do_softirq+0xa8/0x232
 [&lt;ffffffff8020ea7c&gt;] call_softirq+0x1c/0x3e
 [&lt;ffffffff8021047a&gt;] do_softirq+0x94/0x1cd
 [&lt;ffffffff8026d5eb&gt;] irq_exit+0x6b/0x10e
 [&lt;ffffffff8022e6ec&gt;] smp_apic_timer_interrupt+0xd3/0xff
 [&lt;ffffffff8020e4b3&gt;] apic_timer_interrupt+0x13/0x20

The code for cpu_avg_load_per_task has:

	if (rq-&gt;nr_running)
		rq-&gt;avg_load_per_task = rq-&gt;load.weight / rq-&gt;nr_running;

The runqueue lock is not held here, and there is nothing that prevents
the rq-&gt;nr_running from going to zero after it passes the if condition.

The branch profiler simply made the race window bigger.

This patch saves off rq-&gt;nr_running to a local variable and uses that
local for both the condition and the division.
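The pattern can be sketched in Python (a dict stands in for struct rq;
this does not reproduce the compiler/concurrency behaviour, only the
snapshot idiom the patch introduces):

```python
def cpu_avg_load_per_task(rq):
    # Snapshot once; use the same value for the zero check and the divide,
    # so the divisor cannot change between the two.
    nr_running = rq['nr_running']
    if nr_running:
        rq['avg_load_per_task'] = rq['load_weight'] / nr_running
    return rq['avg_load_per_task']
```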

Signed-off-by: Steven Rostedt &lt;srostedt@redhat.com&gt;
Acked-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
</entry>
<entry>
<title>cpuset: fix regression when failed to generate sched domains</title>
<updated>2008-11-18T07:44:51Z</updated>
<author>
<name>Li Zefan</name>
<email>lizf@cn.fujitsu.com</email>
</author>
<published>2008-11-18T06:02:03Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=700018e0a77b4113172257fcdaa1c58e27a5074f'/>
<id>urn:sha1:700018e0a77b4113172257fcdaa1c58e27a5074f</id>
<content type='text'>
Impact: properly rebuild sched-domains on kmalloc() failure

When cpuset fails to generate sched domains due to a kmalloc()
failure, the scheduler should fall back to the single partition
'fallback_doms' and rebuild the sched domains; currently it only
destroys them without rebuilding.

The regression was introduced by:

| commit dfb512ec4834116124da61d6c1ee10fd0aa32bd6
| Author: Max Krasnyansky &lt;maxk@qualcomm.com&gt;
| Date:   Fri Aug 29 13:11:41 2008 -0700
|
|    sched: arch_reinit_sched_domains() must destroy domains to force rebuild

After the above commit, partition_sched_domains(0, NULL, NULL) will
only destroy sched domains and partition_sched_domains(1, NULL, NULL)
will create the default sched domain.
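The intended semantics can be sketched as follows (a hypothetical
Python model; FALLBACK_DOMS, current_doms and the argument convention
are illustrative, not the kernel API):

```python
FALLBACK_DOMS = [('span_all_cpus',)]   # the single default partition
current_doms = []                      # the domains currently attached

def partition_sched_domains(doms_new):
    current_doms.clear()               # destroy the existing domains
    if doms_new is None:               # the kmalloc() analogue failed
        doms_new = FALLBACK_DOMS       # fall back instead of staying empty
    current_doms.extend(doms_new)      # rebuild from the chosen partition
```

The fix is the fallback branch: an allocation failure must still leave a
usable (default) partition attached rather than no domains at all.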

Signed-off-by: Li Zefan &lt;lizf@cn.fujitsu.com&gt;
Cc: Max Krasnyansky &lt;maxk@qualcomm.com&gt;
Cc: &lt;stable@kernel.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
</entry>
<entry>
<title>sched: fix init_idle()'s use of sched_clock()</title>
<updated>2008-11-12T19:05:50Z</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@elte.hu</email>
</author>
<published>2008-11-12T19:05:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5cbd54ef470d880fc37fbe4b21eb514806d51e0d'/>
<id>urn:sha1:5cbd54ef470d880fc37fbe4b21eb514806d51e0d</id>
<content type='text'>
Maciej Rutecki reported:

&gt; I have this bug during suspend to disk:
&gt;
&gt; [  188.592151] Enabling non-boot CPUs ...
&gt; [  188.592151] SMP alternatives: switching to SMP code
&gt; [  188.666058] BUG: using smp_processor_id() in preemptible
&gt; [00000000]
&gt; code: suspend_to_disk/2934
&gt; [  188.666064] caller is native_sched_clock+0x2b/0x80

Which, as noted by Linus, was caused by me, via:

  7cbaef9c "sched: optimize sched_clock() a bit"

Move the rq locking a bit earlier in the initialization sequence; that
makes the sched_clock() call in init_idle() non-preemptible.

Reported-by: Maciej Rutecki &lt;maciej.rutecki@gmail.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
</entry>
<entry>
<title>sched: fix stale value in average load per task</title>
<updated>2008-11-12T11:33:50Z</updated>
<author>
<name>Balbir Singh</name>
<email>balbir@linux.vnet.ibm.com</email>
</author>
<published>2008-11-12T10:49:00Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a2d477778e82a60a0b7114cefdb70aa43af28782'/>
<id>urn:sha1:a2d477778e82a60a0b7114cefdb70aa43af28782</id>
<content type='text'>
Impact: fix load balancer load average calculation accuracy

cpu_avg_load_per_task() returns a stale value when nr_running is 0:
it reports an average calculated earlier, when nr_running was still
non-zero.

This patch sets rq-&gt;avg_load_per_task to zero, and returns zero, when
nr_running is 0.
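The behaviour change can be sketched in Python (a dict stands in for
struct rq; the field names are illustrative):

```python
def cpu_avg_load_per_task(rq):
    nr_running = rq['nr_running']
    if nr_running:
        rq['avg_load_per_task'] = rq['load_weight'] / nr_running
    else:
        rq['avg_load_per_task'] = 0    # never report a stale average
    return rq['avg_load_per_task']
```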

Compile and boot tested on an x86_64 box.

Signed-off-by: Balbir Singh &lt;balbir@linux.vnet.ibm.com&gt;
Acked-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
</entry>
<entry>
<title>fix for account_group_exec_runtime(), make sure -&gt;signal can't be freed under rq-&gt;lock</title>
<updated>2008-11-11T07:01:43Z</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2008-11-10T14:39:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ad474caca3e2a0550b7ce0706527ad5ab389a4d4'/>
<id>urn:sha1:ad474caca3e2a0550b7ce0706527ad5ab389a4d4</id>
<content type='text'>
Impact: fix hang/crash on ia64 under high load

This is ugly, but the simplest patch by far.

Unlike other similar routines, account_group_exec_runtime() can be
called "implicitly" from within the scheduler after exit_notify(). This
means we can race with the parent doing release_task(), so we can't
just check -&gt;signal != NULL.

Change __exit_signal() to do spin_unlock_wait(&amp;task_rq(tsk)-&gt;lock)
before __cleanup_signal() to make sure -&gt;signal can't be freed under
task_rq(tsk)-&gt;lock. Note that task_rq_unlock_wait() doesn't care
about the case when tsk changes cpu/rq under us; this should be OK.
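The acquire-then-release idiom behind spin_unlock_wait() can be shown
with a userspace analogy (Python threading; an analogy only, not the
kernel primitive, which does not actually take the lock):

```python
import threading

def task_rq_unlock_wait(lock):
    # Acquire then immediately release: returns only once any current
    # holder of the lock has dropped it.
    lock.acquire()
    lock.release()
```

After this returns, the caller knows no critical section that was in
progress when it was called is still running under that lock.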

Thanks to Ingo who nacked my previous buggy patch.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Reported-by: Doug Chapman &lt;doug.chapman@hp.com&gt;
</content>
</entry>
<entry>
<title>sched: clean up debug info</title>
<updated>2008-11-10T09:51:51Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>a.p.zijlstra@chello.nl</email>
</author>
<published>2008-11-10T09:46:32Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5ac5c4d604bf894ef672a7971d03fefdc7ea7e49'/>
<id>urn:sha1:5ac5c4d604bf894ef672a7971d03fefdc7ea7e49</id>
<content type='text'>
Impact: clean up and fix debug info printout

While looking over the sched_debug code I noticed that we printed the rq
schedstats for every cfs_rq; amend this.

Also change nr_spread_over into an int, and fix a little buglet in
min_vruntime printing.

Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
</entry>
<entry>
<title>sched: fix memory leak in a failure path</title>
<updated>2008-11-07T07:29:58Z</updated>
<author>
<name>Li Zefan</name>
<email>lizf@cn.fujitsu.com</email>
</author>
<published>2008-11-07T06:47:21Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ca3273f9646694e0419cfb9d6c12deb1c9aff27c'/>
<id>urn:sha1:ca3273f9646694e0419cfb9d6c12deb1c9aff27c</id>
<content type='text'>
Impact: fix rare memory leak in the sched-domains manual reconfiguration code

In the failure path, rd is not attached to a sched domain,
so it causes a leak.

Signed-off-by: Li Zefan &lt;lizf@cn.fujitsu.com&gt;
Acked-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
</entry>
<entry>
<title>sched: fix a bug in sched domain degeneration</title>
<updated>2008-11-07T07:29:57Z</updated>
<author>
<name>Li Zefan</name>
<email>lizf@cn.fujitsu.com</email>
</author>
<published>2008-11-06T01:45:16Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f29c9b1ccb52904ee442a933cf3dee628f9f4e62'/>
<id>urn:sha1:f29c9b1ccb52904ee442a933cf3dee628f9f4e62</id>
<content type='text'>
Impact: re-add incorrectly eliminated sched domain layers

(1) on i386 with SCHED_SMT and SCHED_MC enabled
	# mount -t cgroup -o cpuset xxx /mnt
	# echo 0 &gt; /mnt/cpuset.sched_load_balance
	# mkdir /mnt/0
	# echo 0 &gt; /mnt/0/cpuset.cpus
	# dmesg
	CPU0 attaching sched-domain:
	 domain 0: span 0 level CPU
	  groups: 0

(2) on i386 with SCHED_MC enabled but SCHED_SMT disabled
	# same with (1)
	# dmesg
	CPU0 attaching NULL sched-domain.

The bug is that some sched domains may be skipped unintentionally when
degenerating (optimizing) sched domains.

Signed-off-by: Li Zefan &lt;lizf@cn.fujitsu.com&gt;
Acked-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
</entry>
</feed>
