<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/sched, branch v3.13</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/</subtitle>
<id>https://git.shady.money/linux/atom?h=v3.13</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v3.13'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2014-01-12T08:22:15Z</updated>
<entry>
<title>sched: Calculate effective load even if local weight is 0</title>
<updated>2014-01-12T08:22:15Z</updated>
<author>
<name>Rik van Riel</name>
<email>riel@redhat.com</email>
</author>
<published>2014-01-06T11:39:12Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9722c2dac708e9468cc0dc30218ef76946ffbc9d'/>
<id>urn:sha1:9722c2dac708e9468cc0dc30218ef76946ffbc9d</id>
<content type='text'>
Thomas Hellstrom bisected a regression where erratic 3D performance is
experienced on virtual machines as measured by glxgears. The bisection
identified commit 58d081b5 ("sched/numa: Avoid overloading CPUs on a
preferred NUMA node") as the problem; it had modified the behaviour of
effective_load().

effective_load() calculates the difference in system-wide load if a
scheduling entity were moved to another CPU. The task group does not get
heavier as a result of the move, but overall system load can increase or
decrease as a result of the change. Commit 58d081b5 ("sched/numa: Avoid
overloading CPUs on a preferred NUMA node") changed effective_load() to
make it suitable for calculating whether a particular NUMA node was
compute overloaded. To reduce the cost of the function, it assumed that a
current sched entity weight of 0 was uninteresting, but that is not the case.

wake_affine() uses a weight of 0 for sync wakeups on the grounds that the
waking task is expected to sleep and not contribute to load in the near
future. In this case, we still want to calculate the effective load of
the sched entity hierarchy. As effective_load() is no longer used by
task_numa_compare() since commit fb13c7ee ("sched/numa: Use a system-wide
search to find swap/migration candidates"), this patch simply restores the
historical behaviour.
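
A sketch of the shape of the fix in kernel/sched/fair.c (paraphrased,
not the verbatim hunk): the zero-weight early return is dropped so that
a wl == 0 (sync) wakeup still walks the hierarchy.

	static long effective_load(struct task_group *tg, int cpu,
				   long wl, long wg)
	{
		struct sched_entity *se = tg-&gt;se[cpu];

		/*
		 * Only the trivial, non-cgroup case bails out early;
		 * the old "|| !wl" test is gone, so a sync wakeup is
		 * calculated again.
		 */
		if (!tg-&gt;parent)
			return wl;

		for_each_sched_entity(se) {
			/* ... hierarchy walk, unchanged ... */
		}

		return wl;
	}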

Reported-and-tested-by: Thomas Hellstrom &lt;thellstrom@vmware.com&gt;
Signed-off-by: Rik van Riel &lt;riel@redhat.com&gt;
[ Wrote changelog. ]
Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: http://lkml.kernel.org/r/20140106113912.GC6178@suse.de
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2013-12-19T17:11:22Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2013-12-19T17:11:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f7556698a36995a755a4ce154953dcf438145b3b'/>
<id>urn:sha1:f7556698a36995a755a4ce154953dcf438145b3b</id>
<content type='text'>
Pull scheduler fixes from Ingo Molnar:
 "An RT group-scheduling fix and the sched-domains topology setup fix
  from Mel"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/rt: Fix rq's cpupri leak while enqueue/dequeue child RT entities
  sched: Assign correct scheduling domain to 'sd_llc'
</content>
</entry>
<entry>
<title>sched: numa: skip inaccessible VMAs</title>
<updated>2013-12-19T03:04:51Z</updated>
<author>
<name>Mel Gorman</name>
<email>mgorman@suse.de</email>
</author>
<published>2013-12-19T01:08:40Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3c67f474558748b604e247d92b55dfe89654c81d'/>
<id>urn:sha1:3c67f474558748b604e247d92b55dfe89654c81d</id>
<content type='text'>
Inaccessible VMAs should not be trapping NUMA hint faults. Skip them.
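
The shape of the check added to the VMA loop in task_numa_work()
(a sketch, not the verbatim hunk):

	/*
	 * Skip inaccessible VMAs to avoid any confusion between
	 * PROT_NONE and NUMA hinting ptes.
	 */
	if (!(vma-&gt;vm_flags &amp; (VM_READ | VM_EXEC | VM_WRITE)))
		continue;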

Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Reviewed-by: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Alex Thorlton &lt;athorlton@sgi.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>sched/rt: Fix rq's cpupri leak while enqueue/dequeue child RT entities</title>
<updated>2013-12-17T14:08:44Z</updated>
<author>
<name>Kirill Tkhai</name>
<email>tkhai@yandex.ru</email>
</author>
<published>2013-11-27T15:59:13Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=757dfcaa41844595964f1220f1d33182dae49976'/>
<id>urn:sha1:757dfcaa41844595964f1220f1d33182dae49976</id>
<content type='text'>
This patch touches the RT group scheduling case.

Functions inc_rt_prio_smp() and dec_rt_prio_smp() change the (global)
rq's priority, while the rt_rq passed to them may not be the top-level
rt_rq. This is wrong, because changing the priority at a child level
does not guarantee that it is the highest across the whole rq. So, this
leak makes RT balancing unusable.

A short example: the task with the highest priority among all of the
rq's RT tasks (no other task has the same priority) is woken on a
throttled rt_rq.  The rq's cpupri is set to the equivalent of that
task's priority, but the real rq-&gt;rt.highest_prio.curr is lower.

The patch below fixes the problem.
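
The shape of the fix (a sketch, not the verbatim hunk;
dec_rt_prio_smp() grows the same guard):

	static void
	inc_rt_prio_smp(struct rt_rq *rt_rq, int prio, int prev_prio)
	{
		struct rq *rq = rq_of_rt_rq(rt_rq);

	#ifdef CONFIG_RT_GROUP_SCHED
		/*
		 * Change rq's cpupri only if rt_rq is the top queue.
		 */
		if (&amp;rq-&gt;rt != rt_rq)
			return;
	#endif
		if (rq-&gt;online &amp;&amp; prio &lt; prev_prio)
			cpupri_set(&amp;rq-&gt;rd-&gt;cpupri, rq-&gt;cpu, prio);
	}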

Signed-off-by: Kirill Tkhai &lt;tkhai@yandex.ru&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
CC: Steven Rostedt &lt;rostedt@goodmis.org&gt;
CC: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/49231385567953@web4m.yandex.ru
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched: Assign correct scheduling domain to 'sd_llc'</title>
<updated>2013-12-17T14:08:43Z</updated>
<author>
<name>Mel Gorman</name>
<email>mgorman@suse.de</email>
</author>
<published>2013-12-17T09:21:25Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5d4cf996cf134e8ddb4f906b8197feb9267c2b77'/>
<id>urn:sha1:5d4cf996cf134e8ddb4f906b8197feb9267c2b77</id>
<content type='text'>
Commit 42eb088e ("sched: Avoid NULL dereference on sd_busy") corrected a
NULL dereference on sd_busy, but the fix also altered which scheduling
domain is used for the 'sd_llc' percpu variable.

One impact of this is that a task selecting a runqueue may consider
idle CPUs that are not cache siblings as candidates for running.
Tasks then end up running on CPUs that are not cache hot.
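
The shape of the fix in update_top_cache_domain() (a sketch, not the
verbatim hunk): a separate busy_sd pointer now feeds 'sd_busy', so
'sd_llc' is again assigned the highest SD_SHARE_PKG_RESOURCES domain
rather than its parent:

	struct sched_domain *busy_sd = NULL;

	sd = highest_flag_domain(cpu, SD_SHARE_PKG_RESOURCES);
	if (sd) {
		id = cpumask_first(sched_domain_span(sd));
		size = cpumask_weight(sched_domain_span(sd));
		busy_sd = sd-&gt;parent; /* sd_busy */
	}
	rcu_assign_pointer(per_cpu(sd_busy, cpu), busy_sd);

	rcu_assign_pointer(per_cpu(sd_llc, cpu), sd);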

This was found through bisection: ebizzy threads were not seeing equal
performance and it looked like a scheduling fairness issue. This patch
mitigates, but does not completely fix, the problem on all machines
tested, implying there may be an additional bug or a common root cause.
Below is the average range of performance seen by individual ebizzy
threads. It was tested on top of candidate patches related to x86 TLB
range flushing.

	4-core machine
			    3.13.0-rc3            3.13.0-rc3
			       vanilla            fixsd-v3r3
	Mean   1        0.00 (  0.00%)        0.00 (  0.00%)
	Mean   2        0.34 (  0.00%)        0.10 ( 70.59%)
	Mean   3        1.29 (  0.00%)        0.93 ( 27.91%)
	Mean   4        7.08 (  0.00%)        0.77 ( 89.12%)
	Mean   5      193.54 (  0.00%)        2.14 ( 98.89%)
	Mean   6      151.12 (  0.00%)        2.06 ( 98.64%)
	Mean   7      115.38 (  0.00%)        2.04 ( 98.23%)
	Mean   8      108.65 (  0.00%)        1.92 ( 98.23%)

	8-core machine
			    3.13.0-rc3            3.13.0-rc3
			       vanilla            fixsd-v3r3
	Mean   1         0.00 (  0.00%)        0.00 (  0.00%)
	Mean   2         0.40 (  0.00%)        0.21 ( 47.50%)
	Mean   3        23.73 (  0.00%)        0.89 ( 96.25%)
	Mean   4        12.79 (  0.00%)        1.04 ( 91.87%)
	Mean   5        13.08 (  0.00%)        2.42 ( 81.50%)
	Mean   6        23.21 (  0.00%)       69.46 (-199.27%)
	Mean   7        15.85 (  0.00%)      101.72 (-541.77%)
	Mean   8       109.37 (  0.00%)       19.13 ( 82.51%)
	Mean   12      124.84 (  0.00%)       28.62 ( 77.07%)
	Mean   16      113.50 (  0.00%)       24.16 ( 78.71%)

The variance is eliminated on one machine and reduced on the other.

Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Alex Shi &lt;alex.shi@linaro.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Fengguang Wu &lt;fengguang.wu@intel.com&gt;
Cc: H Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Link: http://lkml.kernel.org/r/20131217092124.GV11295@suse.de
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/fair: Rework sched_fair time accounting</title>
<updated>2013-12-11T14:52:35Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2013-11-18T17:27:06Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9dbdb155532395ba000c5d5d187658b0e17e529f'/>
<id>urn:sha1:9dbdb155532395ba000c5d5d187658b0e17e529f</id>
<content type='text'>
Christian suffers from a bad BIOS that wrecks his i5's TSC sync. This
results in him occasionally seeing time going backwards - which
crashes the scheduler ...

Most of our time accounting can actually handle that except the most
common one; the tick time update of sched_fair.

There is a further problem with that code: previously we assumed that,
because we get a tick every TICK_NSEC, our time delta could never
exceed 32 bits, so the math was simpler.

However, ever since Frederic managed to get NO_HZ_FULL merged, this is
no longer the case: a task can now run for a long time indeed without
getting a tick. It takes only about 4.3 seconds (2^32 ns) to overflow
our u32 of nanoseconds.

This means we not only need to deal better with time going backwards,
but also need to be able to handle large deltas.

This patch reworks the entire code and uses mul_u64_u32_shr() as
proposed by Andy a long while ago.

We express our virtual time scale factor as a u32 multiplier and shift
right, and the 32-bit mul_u64_u32_shr() implementation reduces to a
single 32x32-&gt;64 multiply if the time delta is still short (the common
case).

On 64-bit, a 64x64-&gt;128 multiply can be used if ARCH_SUPPORTS_INT128 is set.
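
For reference, the generic fallback has roughly this shape (a sketch of
the non-INT128 variant; the INT128 variant is a single unsigned
__int128 multiply and shift):

	static inline u64 mul_u64_u32_shr(u64 a, u32 mul, unsigned int shift)
	{
		u32 ah, al;
		u64 ret;

		al = a;
		ah = a &gt;&gt; 32;

		/* the low 32x32-&gt;64 multiply covers the short-delta case */
		ret = ((u64)al * mul) &gt;&gt; shift;
		if (ah)
			ret += ((u64)ah * mul) &lt;&lt; (32 - shift);

		return ret;
	}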

Reported-and-Tested-by: Christian Engelmayer &lt;cengelma@gmx.at&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: fweisbec@gmail.com
Cc: Paul Turner &lt;pjt@google.com&gt;
Cc: Stanislaw Gruszka &lt;sgruszka@redhat.com&gt;
Cc: Andy Lutomirski &lt;luto@amacapital.net&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Link: http://lkml.kernel.org/r/20131118172706.GI3866@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched: Initialize power_orig for overlapping groups</title>
<updated>2013-12-11T14:47:59Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2013-12-11T10:09:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8e8339a3a1069141985daaa2521ba304509ddecd'/>
<id>urn:sha1:8e8339a3a1069141985daaa2521ba304509ddecd</id>
<content type='text'>
Yinghai reported that he saw a divide-by-zero in sg_capacity() on his EX
parts. Make sure to always initialize power_orig now that we actually
use it.

Ideally build_sched_domains() -&gt; init_sched_groups_power() would also
initialize this; but for some as yet unexplained reason some setups seem
to miss the updates there.
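
The shape of the fix in build_overlap_sched_groups() (a sketch, not the
verbatim hunk):

	sg-&gt;sgp-&gt;power = SCHED_POWER_SCALE * cpumask_weight(sg_span);
	/* keep power_orig valid so sg_capacity() never divides by 0 */
	sg-&gt;sgp-&gt;power_orig = sg-&gt;sgp-&gt;power;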

Reported-by: Yinghai Lu &lt;yinghai@kernel.org&gt;
Tested-by: Yinghai Lu &lt;yinghai@kernel.org&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: http://lkml.kernel.org/n/tip-l8ng2m9uml6fhibln8wqpom7@git.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched: Expose preempt_schedule_irq()</title>
<updated>2013-11-27T10:04:53Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2013-11-21T11:41:44Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=32e475d76a3e40879cd9ee4f69b19615062280d7'/>
<id>urn:sha1:32e475d76a3e40879cd9ee4f69b19615062280d7</id>
<content type='text'>
Tony reported that aa0d53260596 ("ia64: Use preempt_schedule_irq")
broke PREEMPT=n builds on ia64.

Ok, wrapped my brain around it. I tripped over the magic asm foo which
has a single need_resched check and schedule point for both syscall
return and interrupt return.

So you need preempt_schedule_irq() for kernel preemption from interrupt
return, while for normal syscall-return preemption a plain schedule()
would be sufficient. But using preempt_schedule_irq() is not harmful
here in any way. It just sets the PREEMPT_ACTIVE bit also in cases
where it would not be required.

Even on PREEMPT=n kernels, adding the PREEMPT_ACTIVE bit is completely
harmless. So instead of having an extra function, moving the existing
one out of the #ifdef CONFIG_PREEMPT block looks like the sanest thing
to do.

It would also allow getting rid of various other sti/schedule/cli asm
magic in other archs.
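
The resulting arrangement in kernel/sched/core.c, roughly (a sketch;
the body is unchanged, it merely moves out of the #ifdef CONFIG_PREEMPT
section so that PREEMPT=n builds also get it):

	/* now built unconditionally */
	asmlinkage void __sched preempt_schedule_irq(void)
	{
		enum ctx_state prev_state;

		/* catch callers which need to be fixed */
		BUG_ON(preempt_count() || !irqs_disabled());

		prev_state = exception_enter();

		do {
			__preempt_count_add(PREEMPT_ACTIVE);
			local_irq_enable();
			__schedule();
			local_irq_disable();
			__preempt_count_sub(PREEMPT_ACTIVE);

			/*
			 * Check again in case we missed a preemption
			 * opportunity between schedule and now.
			 */
			barrier();
		} while (need_resched());

		exception_exit(prev_state);
	}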

Reported-and-Tested-by: Tony Luck &lt;tony.luck@gmail.com&gt;
Fixes: aa0d53260596 ("ia64: Use preempt_schedule_irq")
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
[slightly edited Changelog]
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1311211230030.30673@ionos.tec.linutronix.de
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched: Fix a trivial typo in comments</title>
<updated>2013-11-19T16:01:18Z</updated>
<author>
<name>Shigeru Yoshida</name>
<email>shigeru.yoshida@gmail.com</email>
</author>
<published>2013-11-17T03:12:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0515973ffb16c2852a1bb1df2ca1456556faaaa5'/>
<id>urn:sha1:0515973ffb16c2852a1bb1df2ca1456556faaaa5</id>
<content type='text'>
Fix a trivial typo in rq_attach_root().

Signed-off-by: Shigeru Yoshida &lt;shigeru.yoshida@gmail.com&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: http://lkml.kernel.org/r/20131117.121236.1990617639803941055.shigeru.yoshida@gmail.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched: Avoid NULL dereference on sd_busy</title>
<updated>2013-11-19T16:01:16Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2013-11-19T15:41:49Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=42eb088ed246a5a817bb45a8b32fe234cf1c0f8b'/>
<id>urn:sha1:42eb088ed246a5a817bb45a8b32fe234cf1c0f8b</id>
<content type='text'>
Commit 37dc6b50cee9 ("sched: Remove unnecessary iteration over sched
domains to update nr_busy_cpus") forgot to clear 'sd_busy' under some
conditions leading to a possible NULL deref in set_cpu_sd_state_idle().
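
The shape of the fix in update_top_cache_domain() (a sketch, not the
verbatim hunk): assign 'sd_busy' unconditionally, so that it is set to
NULL whenever no SD_SHARE_PKG_RESOURCES domain exists. (The "sched:
Assign correct scheduling domain to 'sd_llc'" entry above later reworks
this to use a separate busy_sd pointer so 'sd_llc' is unaffected.)

	sd = highest_flag_domain(cpu, SD_SHARE_PKG_RESOURCES);
	if (sd) {
		id = cpumask_first(sched_domain_span(sd));
		size = cpumask_weight(sched_domain_span(sd));
		sd = sd-&gt;parent; /* sd_busy */
	}
	rcu_assign_pointer(per_cpu(sd_busy, cpu), sd);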

Reported-by: Anton Blanchard &lt;anton@samba.org&gt;
Cc: Preeti U Murthy &lt;preeti@linux.vnet.ibm.com&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: http://lkml.kernel.org/r/20131118113701.GF3866@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
</feed>
