<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/sched, branch v3.6</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v3.6</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v3.6'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2012-09-16T19:29:43Z</updated>
<entry>
<title>Revert "sched: Improve scalability via 'CPU buddies', which withstand random perturbations"</title>
<updated>2012-09-16T19:29:43Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2012-09-16T19:29:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=37407ea7f93864c2cfc03edf8f37872ec539ea2b'/>
<id>urn:sha1:37407ea7f93864c2cfc03edf8f37872ec539ea2b</id>
<content type='text'>
This reverts commit 970e178985cadbca660feb02f4d2ee3a09f7fdda.

Nikolay Ulyanitsky reported thatthe 3.6-rc5 kernel has a 15-20%
performance drop on PostgreSQL 9.2 on his machine (running "pgbench").

Borislav Petkov was able to reproduce this, and bisected it to this
commit 970e178985ca ("sched: Improve scalability via 'CPU buddies' ...")
apparently because the new single-idle-buddy model simply doesn't find
idle CPU's to reschedule on aggressively enough.

Mike Galbraith suspects that it is likely due to the user-mode spinlocks
in PostgreSQL not reacting well to preemption, but we don't really know
the details - I'll just revert the commit for now.

There are hopefully other approaches to improve scheduler scalability
without it causing these kinds of downsides.

Reported-by: Nikolay Ulyanitsky &lt;lystor@gmail.com&gt;
Bisected-by: Borislav Petkov &lt;bp@alien8.de&gt;
Acked-by: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>sched: Fix kernel-doc warnings in kernel/sched/fair.c</title>
<updated>2012-09-04T12:30:49Z</updated>
<author>
<name>Randy Dunlap</name>
<email>rdunlap@xenotime.net</email>
</author>
<published>2012-08-19T00:45:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9450d57eab5cad36774c297da123062744472588'/>
<id>urn:sha1:9450d57eab5cad36774c297da123062744472588</id>
<content type='text'>
Fix two kernel-doc warnings in kernel/sched/fair.c:

  Warning(kernel/sched/fair.c:3660): Excess function parameter 'cpus' description in 'update_sg_lb_stats'
  Warning(kernel/sched/fair.c:3806): Excess function parameter 'cpus' description in 'update_sd_lb_stats'

Signed-off-by: Randy Dunlap &lt;rdunlap@xenotime.net&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Link: http://lkml.kernel.org/r/50303714.3090204@xenotime.net
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched: Unthrottle rt runqueues in __disable_runtime()</title>
<updated>2012-09-04T12:30:30Z</updated>
<author>
<name>Peter Boonstoppel</name>
<email>pboonstoppel@nvidia.com</email>
</author>
<published>2012-08-09T22:34:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a4c96ae319b8047f62dedbe1eac79e321c185749'/>
<id>urn:sha1:a4c96ae319b8047f62dedbe1eac79e321c185749</id>
<content type='text'>
migrate_tasks() uses _pick_next_task_rt() to get tasks from the
real-time runqueues to be migrated. When rt_rq is throttled
_pick_next_task_rt() won't return anything, in which case
migrate_tasks() can't move all threads over and gets stuck in an
infinite loop.

Instead unthrottle rt runqueues before migrating tasks.

Additionally: move unthrottle_offline_cfs_rqs() to rq_offline_fair()

Signed-off-by: Peter Boonstoppel &lt;pboonstoppel@nvidia.com&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Paul Turner &lt;pjt@google.com&gt;
Link: http://lkml.kernel.org/r/5FBF8E85CA34454794F0F7ECBA79798F379D3648B7@HQMAIL04.nvidia.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched: Fix load avg vs cpu-hotplug</title>
<updated>2012-09-04T12:30:18Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2012-08-20T09:26:57Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f319da0c6894fcf55e21320e40506418a2aad629'/>
<id>urn:sha1:f319da0c6894fcf55e21320e40506418a2aad629</id>
<content type='text'>
Rabik and Paul reported two different issues related to the same few
lines of code.

Rabik's issue is that the nr_uninterruptible migration code is wrong in
that he sees artifacts due to this (Rabik please do expand in more
detail).

Paul's issue is that this code as it stands relies on us using
stop_machine() for unplug, we all would like to remove this assumption
so that eventually we can remove this stop_machine() usage altogether.

The only reason we'd have to migrate nr_uninterruptible is so that we
could use for_each_online_cpu() loops in favour of
for_each_possible_cpu() loops, however since nr_uninterruptible() is the
only such loop and its using possible lets not bother at all.

The problem Rabik sees is (probably) caused by the fact that by
migrating nr_uninterruptible we screw rq-&gt;calc_load_active for both rqs
involved.

So don't bother with fancy migration schemes (meaning we now have to
keep using for_each_possible_cpu()) and instead fold any nr_active delta
after we migrate all tasks away to make sure we don't have any skewed
nr_active accounting.

Reported-by: Rakib Mullick &lt;rakib.mullick@gmail.com&gt;
Reported-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Link: http://lkml.kernel.org/r/1345454817.23018.27.camel@twins
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2012-08-20T17:35:05Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2012-08-20T17:35:05Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=53795ced6e270fbb5cef7b527a71ffbb69657c78'/>
<id>urn:sha1:53795ced6e270fbb5cef7b527a71ffbb69657c78</id>
<content type='text'>
Pull scheduler fixes from Ingo Molnar.

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Fix migration thread runtime bogosity
  sched,rt: fix isolated CPUs leaving root_task_group indefinitely throttled
  sched,cgroup: Fix up task_groups list
  sched: fix divide by zero at {thread_group,task}_times
  sched, cgroup: Reduce rq-&gt;lock hold times for large cgroup hierarchies
</content>
</entry>
<entry>
<title>sched: Fix migration thread runtime bogosity</title>
<updated>2012-08-13T16:41:55Z</updated>
<author>
<name>Mike Galbraith</name>
<email>mgalbraith@suse.de</email>
</author>
<published>2012-08-04T03:44:14Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8f6189684eb4e85e6c593cd710693f09c944450a'/>
<id>urn:sha1:8f6189684eb4e85e6c593cd710693f09c944450a</id>
<content type='text'>
Make stop scheduler class do the same accounting as other classes,

Migration threads can be caught in the act while doing exec balancing,
leading to the below due to use of unmaintained -&gt;se.exec_start.  The
load that triggered this particular instance was an apparently out of
control heavily threaded application that does system monitoring in
what equated to an exec bomb, with one of the VERY frequently migrated
tasks being ps.

%CPU   PID USER     CMD
99.3    45 root     [migration/10]
97.7    53 root     [migration/12]
97.0    57 root     [migration/13]
90.1    49 root     [migration/11]
89.6    65 root     [migration/15]
88.7    17 root     [migration/3]
80.4    37 root     [migration/8]
78.1    41 root     [migration/9]
44.2    13 root     [migration/2]

Signed-off-by: Mike Galbraith &lt;mgalbraith@suse.de&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Link: http://lkml.kernel.org/r/1344051854.6739.19.camel@marge.simpson.net
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
</content>
</entry>
<entry>
<title>sched,rt: fix isolated CPUs leaving root_task_group indefinitely throttled</title>
<updated>2012-08-13T16:41:55Z</updated>
<author>
<name>Mike Galbraith</name>
<email>efault@gmx.de</email>
</author>
<published>2012-08-07T08:02:38Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e221d028bb08b47e624c5f0a31732c642db9d19a'/>
<id>urn:sha1:e221d028bb08b47e624c5f0a31732c642db9d19a</id>
<content type='text'>
Root task group bandwidth replenishment must service all CPUs, regardless of
where the timer was last started, and regardless of the isolation mechanism,
lest 'Quoth the Raven, "Nevermore"' become rt scheduling policy.

Signed-off-by: Mike Galbraith &lt;efault@gmx.de&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Link: http://lkml.kernel.org/r/1344326558.6968.25.camel@marge.simpson.net
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
</content>
</entry>
<entry>
<title>sched,cgroup: Fix up task_groups list</title>
<updated>2012-08-13T16:41:54Z</updated>
<author>
<name>Mike Galbraith</name>
<email>efault@gmx.de</email>
</author>
<published>2012-08-07T03:00:13Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=35cf4e50b16331def6cfcbee11e49270b6db07f5'/>
<id>urn:sha1:35cf4e50b16331def6cfcbee11e49270b6db07f5</id>
<content type='text'>
With multiple instances of task_groups, for_each_rt_rq() is a noop,
no task groups having been added to the rt.c list instance.  This
renders __enable/disable_runtime() and print_rt_stats() noop, the
user (non) visible effect being that rt task groups are missing in
/proc/sched_debug.

Signed-off-by: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: stable@kernel.org # v3.3+
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Link: http://lkml.kernel.org/r/1344308413.6846.7.camel@marge.simpson.net
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
</content>
</entry>
<entry>
<title>sched: fix divide by zero at {thread_group,task}_times</title>
<updated>2012-08-13T16:41:54Z</updated>
<author>
<name>Stanislaw Gruszka</name>
<email>sgruszka@redhat.com</email>
</author>
<published>2012-08-08T09:27:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=bea6832cc8c4a0a9a65dd17da6aaa657fe27bc3e'/>
<id>urn:sha1:bea6832cc8c4a0a9a65dd17da6aaa657fe27bc3e</id>
<content type='text'>
On architectures where cputime_t is 64 bit type, is possible to trigger
divide by zero on do_div(temp, (__force u32) total) line, if total is a
non zero number but has lower 32 bit's zeroed. Removing casting is not
a good solution since some do_div() implementations do cast to u32
internally.

This problem can be triggered in practice on very long lived processes:

  PID: 2331   TASK: ffff880472814b00  CPU: 2   COMMAND: "oraagent.bin"
   #0 [ffff880472a51b70] machine_kexec at ffffffff8103214b
   #1 [ffff880472a51bd0] crash_kexec at ffffffff810b91c2
   #2 [ffff880472a51ca0] oops_end at ffffffff814f0b00
   #3 [ffff880472a51cd0] die at ffffffff8100f26b
   #4 [ffff880472a51d00] do_trap at ffffffff814f03f4
   #5 [ffff880472a51d60] do_divide_error at ffffffff8100cfff
   #6 [ffff880472a51e00] divide_error at ffffffff8100be7b
      [exception RIP: thread_group_times+0x56]
      RIP: ffffffff81056a16  RSP: ffff880472a51eb8  RFLAGS: 00010046
      RAX: bc3572c9fe12d194  RBX: ffff880874150800  RCX: 0000000110266fad
      RDX: 0000000000000000  RSI: ffff880472a51eb8  RDI: 001038ae7d9633dc
      RBP: ffff880472a51ef8   R8: 00000000b10a3a64   R9: ffff880874150800
      R10: 00007fcba27ab680  R11: 0000000000000202  R12: ffff880472a51f08
      R13: ffff880472a51f10  R14: 0000000000000000  R15: 0000000000000007
      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
   #7 [ffff880472a51f00] do_sys_times at ffffffff8108845d
   #8 [ffff880472a51f40] sys_times at ffffffff81088524
   #9 [ffff880472a51f80] system_call_fastpath at ffffffff8100b0f2
      RIP: 0000003808caac3a  RSP: 00007fcba27ab6d8  RFLAGS: 00000202
      RAX: 0000000000000064  RBX: ffffffff8100b0f2  RCX: 0000000000000000
      RDX: 00007fcba27ab6e0  RSI: 000000000076d58e  RDI: 00007fcba27ab6e0
      RBP: 00007fcba27ab700   R8: 0000000000000020   R9: 000000000000091b
      R10: 00007fcba27ab680  R11: 0000000000000202  R12: 00007fff9ca41940
      R13: 0000000000000000  R14: 00007fcba27ac9c0  R15: 00007fff9ca41940
      ORIG_RAX: 0000000000000064  CS: 0033  SS: 002b

Cc: stable@vger.kernel.org
Signed-off-by: Stanislaw Gruszka &lt;sgruszka@redhat.com&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Link: http://lkml.kernel.org/r/20120808092714.GA3580@redhat.com
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
</content>
</entry>
<entry>
<title>sched, cgroup: Reduce rq-&gt;lock hold times for large cgroup hierarchies</title>
<updated>2012-08-13T16:41:54Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>a.p.zijlstra@chello.nl</email>
</author>
<published>2012-08-08T19:46:40Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a35b6466aabb051568b844e8c63f87a356d3d129'/>
<id>urn:sha1:a35b6466aabb051568b844e8c63f87a356d3d129</id>
<content type='text'>
Peter Portante reported that for large cgroup hierarchies (and or on
large CPU counts) we get immense lock contention on rq-&gt;lock and stuff
stops working properly.

His workload was a ton of processes, each in their own cgroup,
everybody idling except for a sporadic wakeup once every so often.

It was found that:

  schedule()
    idle_balance()
      load_balance()
        local_irq_save()
        double_rq_lock()
        update_h_load()
          walk_tg_tree(tg_load_down)
            tg_load_down()

Results in an entire cgroup hierarchy walk under rq-&gt;lock for every
new-idle balance and since new-idle balance isn't throttled this
results in a lot of work while holding the rq-&gt;lock.

This patch does two things, it removes the work from under rq-&gt;lock
based on the good principle of race and pray which is widely employed
in the load-balancer as a whole. And secondly it throttles the
update_h_load() calculation to max once per jiffy.

I considered excluding update_h_load() for new-idle balance
all-together, but purely relying on regular balance passes to update
this data might not work out under some rare circumstances where the
new-idle busiest isn't the regular busiest for a while (unlikely, but
a nightmare to debug if someone hits it and suffers).

Cc: pjt@google.com
Cc: Larry Woodman &lt;lwoodman@redhat.com&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Reported-by: Peter Portante &lt;pportant@redhat.com&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Link: http://lkml.kernel.org/n/tip-aaarrzfpnaam7pqrekofu8a6@git.kernel.org
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
</content>
</entry>
</feed>
