<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/sched, branch v3.12</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v3.12</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v3.12'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2013-09-20T09:59:39Z</updated>
<entry>
<title>sched/balancing: Fix cfs_rq-&gt;task_h_load calculation</title>
<updated>2013-09-20T09:59:39Z</updated>
<author>
<name>Vladimir Davydov</name>
<email>vdavydov@parallels.com</email>
</author>
<published>2013-09-14T15:39:46Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7e3115ef5149fc502e3a2e80719dba54a8e7409d'/>
<id>urn:sha1:7e3115ef5149fc502e3a2e80719dba54a8e7409d</id>
<content type='text'>
Patch a003a2 (sched: Consider runnable load average in move_tasks())
sets all top-level cfs_rqs' h_load to rq-&gt;avg.load_avg_contrib, which is
always 0. This typo leads to all tasks having weight 0 when load
balancing in a cpu-cgroup enabled setup. The value should instead be the
sum of the weights of all runnable tasks. Fix it.

Signed-off-by: Vladimir Davydov &lt;vdavydov@parallels.com&gt;
Reviewed-by: Paul Turner &lt;pjt@google.com&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: http://lkml.kernel.org/r/1379173186-11944-1-git-send-email-vdavydov@parallels.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/balancing: Fix 'local-&gt;avg_load &gt; busiest-&gt;avg_load' case in fix_small_imbalance()</title>
<updated>2013-09-20T09:59:38Z</updated>
<author>
<name>Vladimir Davydov</name>
<email>vdavydov@parallels.com</email>
</author>
<published>2013-09-15T13:49:14Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3029ede39373c368f402a76896600d85a4f7121b'/>
<id>urn:sha1:3029ede39373c368f402a76896600d85a4f7121b</id>
<content type='text'>
In the busiest-&gt;group_imb case we can reach fix_small_imbalance() with
local-&gt;avg_load &gt; busiest-&gt;avg_load. This can result in a wrong imbalance
fix-up, because the function contains the following check, in which all
the operands are unsigned:

if (busiest-&gt;avg_load - local-&gt;avg_load + scaled_busy_load_per_task &gt;=
    (scaled_busy_load_per_task * imbn)) {
	env-&gt;imbalance = busiest-&gt;load_per_task;
	return;
}

As a result we can end up constantly bouncing tasks from one cpu to
another if there are pinned tasks.

Fix it by replacing the subtraction with an equivalent addition in
the check.
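
[ Illustration only, not part of the patch: a minimal Python sketch of why
  the subtraction form misfires, assuming 64-bit unsigned arithmetic. The
  function names and load values are made up, and the addition form shown
  is one possible rewrite of the check, not necessarily the patch's. ]

```python
U64 = 2**64  # assumption: the operands behave like 64-bit unsigned longs


def check_with_subtraction(busiest_avg, local_avg, scaled, imbn):
    # Emulates the unsigned expression: the left-hand side wraps modulo 2**64
    # when local_avg exceeds busiest_avg + scaled.
    lhs = (busiest_avg - local_avg + scaled) % U64
    return lhs >= (scaled * imbn) % U64


def check_with_addition(busiest_avg, local_avg, scaled, imbn):
    # Same inequality with local_avg moved to the right-hand side,
    # so no intermediate value can go below zero.
    return (busiest_avg + scaled) % U64 >= (local_avg + scaled * imbn) % U64


# Made-up loads: local exceeds busiest by more than one scaled task load.
print(check_with_subtraction(100, 300, 50, 2))  # True  (wrapped, spurious)
print(check_with_addition(100, 300, 50, 2))     # False

# Ordinary case (busiest above local): both forms agree.
print(check_with_subtraction(300, 100, 50, 2))  # True
print(check_with_addition(300, 100, 50, 2))     # True
```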

[ The bug can be caught by running 2*N cpuhogs pinned to two logical cpus
  belonging to different cores on an HT-enabled machine with N logical
  cpus: just look at se.nr_migrations growth. ]

Signed-off-by: Vladimir Davydov &lt;vdavydov@parallels.com&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: http://lkml.kernel.org/r/ef167822e5c5b2d96cf5b0e3e4f4bdff3f0414a2.1379252740.git.vdavydov@parallels.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/balancing: Fix 'local-&gt;avg_load &gt; sds-&gt;avg_load' case in calculate_imbalance()</title>
<updated>2013-09-20T09:59:36Z</updated>
<author>
<name>Vladimir Davydov</name>
<email>vdavydov@parallels.com</email>
</author>
<published>2013-09-15T13:49:13Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b18855500fc40da050512d9df82d2f1471e59642'/>
<id>urn:sha1:b18855500fc40da050512d9df82d2f1471e59642</id>
<content type='text'>
In the busiest-&gt;group_imb case we can reach calculate_imbalance() with
local-&gt;avg_load &gt;= busiest-&gt;avg_load &gt;= sds-&gt;avg_load. This can result
in imbalance overflow, because it is calculated as follows:

env-&gt;imbalance = min(
	max_pull * busiest-&gt;group_power,
	(sds-&gt;avg_load - local-&gt;avg_load) * local-&gt;group_power) / SCHED_POWER_SCALE;

As a result we can end up constantly bouncing tasks from one cpu to
another if there are pinned tasks.

Fix this by skipping the assignment and assuming imbalance=0 in case
local-&gt;avg_load &gt; sds-&gt;avg_load.
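
[ Illustration only, not part of the patch: a minimal Python sketch of the
  overflow and of the guard described above, assuming 64-bit unsigned
  arithmetic and SCHED_POWER_SCALE = 1024. The function names and the load
  and power values are made up. ]

```python
U64 = 2**64               # assumption: 64-bit unsigned long arithmetic
SCHED_POWER_SCALE = 1024  # assumption: scale value used for illustration


def imbalance_unguarded(max_pull, busiest_power, sds_avg, local_avg, local_power):
    # Emulates the unsigned min() expression; the difference wraps modulo
    # 2**64 when local_avg exceeds sds_avg.
    pull = (max_pull * busiest_power) % U64
    gap = (((sds_avg - local_avg) % U64) * local_power) % U64
    return min(pull, gap) // SCHED_POWER_SCALE


def imbalance_guarded(max_pull, busiest_power, sds_avg, local_avg, local_power):
    # Skip the assignment and assume imbalance = 0 when the local group
    # is already above the domain average.
    if local_avg > sds_avg:
        return 0
    return imbalance_unguarded(max_pull, busiest_power, sds_avg, local_avg, local_power)


# Made-up values: the local group is already above the domain average.
print(imbalance_unguarded(200, 1024, 500, 600, 1024))  # 200 (a full max_pull)
print(imbalance_guarded(200, 1024, 500, 600, 1024))    # 0
```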

[ The bug can be caught by running 2*N cpuhogs pinned to two logical cpus
  belonging to different cores on an HT-enabled machine with N logical
  cpus: just look at se.nr_migrations growth. ]

Signed-off-by: Vladimir Davydov &lt;vdavydov@parallels.com&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: http://lkml.kernel.org/r/8f596cc6bc0e5e655119dc892c9bfcad26e971f4.1379252740.git.vdavydov@parallels.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2013-09-18T16:23:32Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2013-09-18T16:23:32Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7e28b2712e5ebd8d73d25561585bc2ae77da5c30'/>
<id>urn:sha1:7e28b2712e5ebd8d73d25561585bc2ae77da5c30</id>
<content type='text'>
Pull scheduler fixes from Ingo Molnar:
 "Misc fixes"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Fix comment for sched_info_depart
  sched/Documentation: Update sched-design-CFS.txt documentation
  sched/debug: Take PID namespace into account
  sched/fair: Fix small race where child-&gt;se.parent,cfs_rq might point to invalid ones
</content>
</entry>
<entry>
<title>sched: Fix comment for sched_info_depart</title>
<updated>2013-09-16T09:18:34Z</updated>
<author>
<name>Michael S. Tsirkin</name>
<email>mst@redhat.com</email>
</author>
<published>2013-09-16T08:30:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=13b62e46d5407c7d619aea1dc9c3e0991b631b57'/>
<id>urn:sha1:13b62e46d5407c7d619aea1dc9c3e0991b631b57</id>
<content type='text'>
sched_info_depart seems to be only called from
sched_info_switch(), so only on involuntary task switch.

Fix the comment to match.

Signed-off-by: Michael S. Tsirkin &lt;mst@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Link: http://lkml.kernel.org/r/20130916083036.GA1113@redhat.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2013-09-12T17:44:13Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2013-09-12T17:44:13Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b55ee2816ed6d8f8a00d4badab0e3642ffbac19f'/>
<id>urn:sha1:b55ee2816ed6d8f8a00d4badab0e3642ffbac19f</id>
<content type='text'>
Pull scheduler fix from Ingo Molnar:
 "Performance regression fix"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Fix load balancing performance regression in should_we_balance()
</content>
</entry>
<entry>
<title>sched/debug: Take PID namespace into account</title>
<updated>2013-09-12T17:14:16Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2013-09-09T11:01:41Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=fc840914e9b07ab4685c195e1e54e58de4f84c03'/>
<id>urn:sha1:fc840914e9b07ab4685c195e1e54e58de4f84c03</id>
<content type='text'>
Emmanuel reported that /proc/sched_debug didn't report the right PIDs
when using namespaces, cure this.

Reported-by: Emmanuel Deloget &lt;emmanuel.deloget@efixo.com&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: http://lkml.kernel.org/r/20130909110141.GM31370@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/fair: Fix small race where child-&gt;se.parent,cfs_rq might point to invalid ones</title>
<updated>2013-09-12T17:14:14Z</updated>
<author>
<name>Daisuke Nishimura</name>
<email>nishimura@mxp.nes.nec.co.jp</email>
</author>
<published>2013-09-10T09:16:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=6c9a27f5da9609fca46cb2b183724531b48f71ad'/>
<id>urn:sha1:6c9a27f5da9609fca46cb2b183724531b48f71ad</id>
<content type='text'>
There is a small race between copy_process() and cgroup_attach_task()
where child-&gt;se.parent,cfs_rq points to invalid (old) ones.

        parent doing fork()      | someone moving the parent to another cgroup
  -------------------------------+---------------------------------------------
    copy_process()
      + dup_task_struct()
        -&gt; parent-&gt;se is copied to child-&gt;se.
           se.parent,cfs_rq of them point to old ones.

                                     cgroup_attach_task()
                                       + cgroup_task_migrate()
                                         -&gt; parent-&gt;cgroup is updated.
                                       + cpu_cgroup_attach()
                                         + sched_move_task()
                                           + task_move_group_fair()
                                             +- set_task_rq()
                                                -&gt; se.parent,cfs_rq of parent
                                                   are updated.

      + cgroup_fork()
        -&gt; parent-&gt;cgroup is copied to child-&gt;cgroup. (*1)
      + sched_fork()
        + task_fork_fair()
          -&gt; se.parent,cfs_rq of child are accessed
             while they point to old ones. (*2)

In the worst case, this bug can lead to a use-after-free and cause a panic,
because it is the new cgroup's refcount that is incremented at (*1),
so the old cgroup (and related data) can be freed before (*2).

In fact, a panic caused by this bug was originally caught in RHEL6.4.

    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: [&lt;ffffffff81051e3e&gt;] sched_slice+0x6e/0xa0
    [...]
    Call Trace:
     [&lt;ffffffff81051f25&gt;] place_entity+0x75/0xa0
     [&lt;ffffffff81056a3a&gt;] task_fork_fair+0xaa/0x160
     [&lt;ffffffff81063c0b&gt;] sched_fork+0x6b/0x140
     [&lt;ffffffff8106c3c2&gt;] copy_process+0x5b2/0x1450
     [&lt;ffffffff81063b49&gt;] ? wake_up_new_task+0xd9/0x130
     [&lt;ffffffff8106d2f4&gt;] do_fork+0x94/0x460
     [&lt;ffffffff81072a9e&gt;] ? sys_wait4+0xae/0x100
     [&lt;ffffffff81009598&gt;] sys_clone+0x28/0x30
     [&lt;ffffffff8100b393&gt;] stub_clone+0x13/0x20
     [&lt;ffffffff8100b072&gt;] ? system_call_fastpath+0x16/0x1b

Signed-off-by: Daisuke Nishimura &lt;nishimura@mxp.nes.nec.co.jp&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Link: http://lkml.kernel.org/r/039601ceae06$733d3130$59b79390$@mxp.nes.nec.co.jp
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched: Fix load balancing performance regression in should_we_balance()</title>
<updated>2013-09-10T07:20:42Z</updated>
<author>
<name>Joonsoo Kim</name>
<email>iamjoonsoo.kim@lge.com</email>
</author>
<published>2013-09-10T06:54:49Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b0cff9d88ce2f3030f73138078c5b1019f17e1cc'/>
<id>urn:sha1:b0cff9d88ce2f3030f73138078c5b1019f17e1cc</id>
<content type='text'>
Commit 23f0d20 ("sched: Factor out code to should_we_balance()")
introduces the should_we_balance() function.  This function should
return 1 if this cpu is appropriate for balancing. But the newly
introduced code doesn't do so: it returns 0 instead of 1.

This introduces a performance regression, reported by Dave Chinner:

                        v4 filesystem           v5 filesystem
3.11+xfsdev:            220k files/s            225k files/s
3.12-git                180k files/s            185k files/s
3.12-git-revert         245k files/s            247k files/s

You can find more detailed information at:

  https://lkml.org/lkml/2013/9/10/1

This patch corrects the return value of the should_we_balance()
function as originally intended.

With this patch, Dave Chinner reports that the regression is gone:

                        v4 filesystem           v5 filesystem
3.11+xfsdev:            220k files/s            225k files/s
3.12-git                180k files/s            185k files/s
3.12-git-revert         245k files/s            247k files/s
3.12-git-fix            249k files/s            248k files/s

Reported-by: Dave Chinner &lt;dchinner@redhat.com&gt;
Tested-by: Dave Chinner &lt;dchinner@redhat.com&gt;
Signed-off-by: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Paul Turner &lt;pjt@google.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Link: http://lkml.kernel.org/r/20130910065448.GA20368@lge.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2013-09-05T19:36:46Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2013-09-05T19:36:46Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=57d730924d5cc2c3e280af16a9306587c3a511db'/>
<id>urn:sha1:57d730924d5cc2c3e280af16a9306587c3a511db</id>
<content type='text'>
Pull cputime fix from Ingo Molnar:
 "This fixes a longer-standing cputime accounting bug that Stanislaw
  Gruszka finally managed to track down"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/cputime: Do not scale when utime == 0
</content>
</entry>
</feed>
