<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/sched/core.c, branch v3.13</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v3.13</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v3.13'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2013-12-17T14:08:43Z</updated>
<entry>
<title>sched: Assign correct scheduling domain to 'sd_llc'</title>
<updated>2013-12-17T14:08:43Z</updated>
<author>
<name>Mel Gorman</name>
<email>mgorman@suse.de</email>
</author>
<published>2013-12-17T09:21:25Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5d4cf996cf134e8ddb4f906b8197feb9267c2b77'/>
<id>urn:sha1:5d4cf996cf134e8ddb4f906b8197feb9267c2b77</id>
<content type='text'>
Commit 42eb088e (sched: Avoid NULL dereference on sd_busy) corrected a NULL
dereference on sd_busy but the fix also altered what scheduling domain it
used for the 'sd_llc' percpu variable.

One impact of this is that a task selecting a runqueue may consider
idle CPUs that are not cache siblings as candidates for running.
Tasks are then running on CPUs that are not cache hot.

This was found through bisection where ebizzy threads were not seeing equal
performance and it looked like a scheduling fairness issue. This patch
mitigates but does not completely fix the problem on all machines tested
implying there may be an additional bug or a common root cause. Here are
the average range of performance seen by individual ebizzy threads. It
was tested on top of candidate patches related to x86 TLB range flushing.

	4-core machine
			    3.13.0-rc3            3.13.0-rc3
			       vanilla            fixsd-v3r3
	Mean   1        0.00 (  0.00%)        0.00 (  0.00%)
	Mean   2        0.34 (  0.00%)        0.10 ( 70.59%)
	Mean   3        1.29 (  0.00%)        0.93 ( 27.91%)
	Mean   4        7.08 (  0.00%)        0.77 ( 89.12%)
	Mean   5      193.54 (  0.00%)        2.14 ( 98.89%)
	Mean   6      151.12 (  0.00%)        2.06 ( 98.64%)
	Mean   7      115.38 (  0.00%)        2.04 ( 98.23%)
	Mean   8      108.65 (  0.00%)        1.92 ( 98.23%)

	8-core machine
	Mean   1         0.00 (  0.00%)        0.00 (  0.00%)
	Mean   2         0.40 (  0.00%)        0.21 ( 47.50%)
	Mean   3        23.73 (  0.00%)        0.89 ( 96.25%)
	Mean   4        12.79 (  0.00%)        1.04 ( 91.87%)
	Mean   5        13.08 (  0.00%)        2.42 ( 81.50%)
	Mean   6        23.21 (  0.00%)       69.46 (-199.27%)
	Mean   7        15.85 (  0.00%)      101.72 (-541.77%)
	Mean   8       109.37 (  0.00%)       19.13 ( 82.51%)
	Mean   12      124.84 (  0.00%)       28.62 ( 77.07%)
	Mean   16      113.50 (  0.00%)       24.16 ( 78.71%)

It's eliminated for one machine and reduced for another.

Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Alex Shi &lt;alex.shi@linaro.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Fengguang Wu &lt;fengguang.wu@intel.com&gt;
Cc: H Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Link: http://lkml.kernel.org/r/20131217092124.GV11295@suse.de
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched: Initialize power_orig for overlapping groups</title>
<updated>2013-12-11T14:47:59Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2013-12-11T10:09:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8e8339a3a1069141985daaa2521ba304509ddecd'/>
<id>urn:sha1:8e8339a3a1069141985daaa2521ba304509ddecd</id>
<content type='text'>
Yinghai reported that he saw a /0 in sg_capacity on his EX parts.
Make sure to always initialize power_orig now that we actually use it.

Ideally build_sched_domains() -&gt; init_sched_groups_power() would also
initialize this; but for some yet unexplained reason some setups seem
to miss updates there.

Reported-by: Yinghai Lu &lt;yinghai@kernel.org&gt;
Tested-by: Yinghai Lu &lt;yinghai@kernel.org&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: http://lkml.kernel.org/n/tip-l8ng2m9uml6fhibln8wqpom7@git.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched: Expose preempt_schedule_irq()</title>
<updated>2013-11-27T10:04:53Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2013-11-21T11:41:44Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=32e475d76a3e40879cd9ee4f69b19615062280d7'/>
<id>urn:sha1:32e475d76a3e40879cd9ee4f69b19615062280d7</id>
<content type='text'>
Tony reported that aa0d53260596 ("ia64: Use preempt_schedule_irq")
broke PREEMPT=n builds on ia64.

Ok, wrapped my brain around it. I tripped over the magic asm foo which
has a single need_resched check and schedule point for both sys call
return and interrupt return.

So you need the schedule_preempt_irq() for kernel preemption from
interrupt return while on a normal syscall preemption a schedule would
be sufficient. But using schedule_preempt_irq() is not harmful here in
any way. It just sets the preempt_active bit also in cases where it
would not be required.

Even on preempt=n kernels adding the preempt_active bit is completely
harmless. So instead of having an extra function, moving the existing
one out of the ifdef PREEMPT looks like the sanest thing to do.

It would also allow getting rid of various other sti/schedule/cli asm
magic in other archs.

Reported-and-Tested-by: Tony Luck &lt;tony.luck@gmail.com&gt;
Fixes: aa0d53260596 ("ia64: Use preempt_schedule_irq")
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
[slightly edited Changelog]
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1311211230030.30673@ionos.tec.linutronix.de
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched: Fix a trivial typo in comments</title>
<updated>2013-11-19T16:01:18Z</updated>
<author>
<name>Shigeru Yoshida</name>
<email>shigeru.yoshida@gmail.com</email>
</author>
<published>2013-11-17T03:12:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0515973ffb16c2852a1bb1df2ca1456556faaaa5'/>
<id>urn:sha1:0515973ffb16c2852a1bb1df2ca1456556faaaa5</id>
<content type='text'>
Fix a trivial typo in rq_attach_root().

Signed-off-by: Shigeru Yoshida &lt;shigeru.yoshida@gmail.com&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: http://lkml.kernel.org/r/20131117.121236.1990617639803941055.shigeru.yoshida@gmail.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched: Avoid NULL dereference on sd_busy</title>
<updated>2013-11-19T16:01:16Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2013-11-19T15:41:49Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=42eb088ed246a5a817bb45a8b32fe234cf1c0f8b'/>
<id>urn:sha1:42eb088ed246a5a817bb45a8b32fe234cf1c0f8b</id>
<content type='text'>
Commit 37dc6b50cee9 ("sched: Remove unnecessary iteration over sched
domains to update nr_busy_cpus") forgot to clear 'sd_busy' under some
conditions leading to a possible NULL deref in set_cpu_sd_state_idle().

Reported-by: Anton Blanchard &lt;anton@samba.org&gt;
Cc: Preeti U Murthy &lt;preeti@linux.vnet.ibm.com&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: http://lkml.kernel.org/r/20131118113701.GF3866@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched: Optimize task_sched_runtime()</title>
<updated>2013-11-13T12:33:54Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2013-11-11T17:21:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=911b2898b3c9fe0048e9485ad1629ed4fce330fd'/>
<id>urn:sha1:911b2898b3c9fe0048e9485ad1629ed4fce330fd</id>
<content type='text'>
Large multi-threaded apps like to hit this using do_sys_times() and
then queue up on the rq-&gt;lock.

Avoid when possible.

Larry reported ~20% performance increase his test case.

Reported-by: Larry Woodman &lt;lwoodman@redhat.com&gt;
Suggested-by: Paul Turner &lt;pjt@google.com&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Link: http://lkml.kernel.org/r/20131111172925.GG26898@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched: Remove unnecessary iteration over sched domains to update nr_busy_cpus</title>
<updated>2013-11-06T11:37:55Z</updated>
<author>
<name>Preeti U Murthy</name>
<email>preeti@linux.vnet.ibm.com</email>
</author>
<published>2013-10-30T03:12:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=37dc6b50cee97954c4e6edcd5b1fa614b76038ee'/>
<id>urn:sha1:37dc6b50cee97954c4e6edcd5b1fa614b76038ee</id>
<content type='text'>
nr_busy_cpus parameter is used by nohz_kick_needed() to find out the
number of busy cpus in a sched domain which has SD_SHARE_PKG_RESOURCES
flag set.  Therefore instead of updating nr_busy_cpus at every level
of sched domain, since it is irrelevant, we can update this parameter
only at the parent domain of the sd which has this flag set. Introduce
a per-cpu parameter sd_busy which represents this parent domain.

In nohz_kick_needed() we directly query the nr_busy_cpus parameter
associated with the groups of sd_busy.

By associating sd_busy with the highest domain which has
SD_SHARE_PKG_RESOURCES flag set, we cover all lower level domains
which could have this flag set and trigger nohz_idle_balancing if any
of the levels have more than one busy cpu.

sd_busy is irrelevant for asymmetric load balancing. However sd_asym
has been introduced to represent the highest sched domain which has
SD_ASYM_PACKING flag set so that it can be queried directly when
required.

While we are at it, we might as well change the nohz_idle parameter to
be updated at the sd_busy domain level alone and not the base domain
level of a CPU.  This will unify the concept of busy cpus at just one
level of sched domain where it is currently used.

Signed-off-by: Preeti U Murthy&lt;preeti@linux.vnet.ibm.com&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: svaidy@linux.vnet.ibm.com
Cc: vincent.guittot@linaro.org
Cc: bitbucket@online.de
Cc: benh@kernel.crashing.org
Cc: anton@samba.org
Cc: Morten.Rasmussen@arm.com
Cc: pjt@google.com
Cc: peterz@infradead.org
Cc: mikey@neuling.org
Link: http://lkml.kernel.org/r/20131030031252.23426.4417.stgit@preeti.in.ibm.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched: Move completion code from core.c to completion.c</title>
<updated>2013-11-06T06:49:19Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2013-10-04T20:06:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b8a216269ec0ce2e961d32e6d640d7010b8a818e'/>
<id>urn:sha1:b8a216269ec0ce2e961d32e6d640d7010b8a818e</id>
<content type='text'>
Completions already have their own header file: linux/completion.h
Move the implementation out of kernel/sched/core.c and into its own
file: kernel/sched/completion.c.

Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Link: http://lkml.kernel.org/n/tip-x2y49rmxu5dljt66ai2lcfuw@git.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched: Move wait code from core.c to wait.c</title>
<updated>2013-11-06T06:49:18Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2013-10-04T15:24:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b4145872f7049e429718b40b86e1b46659988398'/>
<id>urn:sha1:b4145872f7049e429718b40b86e1b46659988398</id>
<content type='text'>
For some reason only the wait part of the wait api lives in
kernel/sched/wait.c and the wake part still lives in kernel/sched/core.c;
ammend this.

Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Link: http://lkml.kernel.org/n/tip-ftycee88naznulqk7ei5mbci@git.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched: Fix race on toggling cfs_bandwidth_used</title>
<updated>2013-10-29T11:02:19Z</updated>
<author>
<name>Ben Segall</name>
<email>bsegall@google.com</email>
</author>
<published>2013-10-16T18:16:12Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1ee14e6c8cddeeb8a490d7b54cd9016e4bb900b4'/>
<id>urn:sha1:1ee14e6c8cddeeb8a490d7b54cd9016e4bb900b4</id>
<content type='text'>
When we transition cfs_bandwidth_used to false, any currently
throttled groups will incorrectly return false from cfs_rq_throttled.
While tg_set_cfs_bandwidth will unthrottle them eventually, currently
running code (including at least dequeue_task_fair and
distribute_cfs_runtime) will cause errors.

Fix this by turning off cfs_bandwidth_used only after unthrottling all
cfs_rqs.

Tested: toggle bandwidth back and forth on a loaded cgroup. Caused
crashes in minutes without the patch, hasn't crashed with it.

Signed-off-by: Ben Segall &lt;bsegall@google.com&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: pjt@google.com
Link: http://lkml.kernel.org/r/20131016181611.22647.80365.stgit@sword-of-the-dawn.mtv.corp.google.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
</feed>
