<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/sched, branch v3.19</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v3.19</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v3.19'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2015-02-06T21:34:26Z</updated>
<entry>
<title>Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2015-02-06T21:34:26Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2015-02-06T21:34:26Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=396e9099ea5b4bc442995c807a779025d06663e8'/>
<id>urn:sha1:396e9099ea5b4bc442995c807a779025d06663e8</id>
<content type='text'>
Pull scheduler fixes from Ingo Molnar:
 "Misc fixes"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/deadline: Fix deadline parameter modification handling
  sched/wait: Remove might_sleep() from wait_event_cmd()
  sched: Fix crash if cpuset_cpumask_can_shrink() is passed an empty cpumask
  sched/fair: Avoid using uninitialized variable in preferred_group_nid()
</content>
</entry>
<entry>
<title>sched/deadline: Fix deadline parameter modification handling</title>
<updated>2015-02-04T06:42:48Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2015-01-28T14:08:03Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=40767b0dc768060266d261b4a330164b4be53f7c'/>
<id>urn:sha1:40767b0dc768060266d261b4a330164b4be53f7c</id>
<content type='text'>
Commit 67dfa1b756f2 ("sched/deadline: Implement cancel_dl_timer() to
use in switched_from_dl()") removed the hrtimer_try_cancel() function
call out from init_dl_task_timer(), which gets called from
__setparam_dl().

The result is that we can now re-init the timer while its active --
this is bad and corrupts timer state.

Furthermore; changing the parameters of an active deadline task is
tricky in that you want to maintain guarantees, while immediately
effective change would allow one to circumvent the CBS guarantees --
this too is bad, as one (bad) task should not be able to affect the
others.

Rework things to avoid both problems. We only need to initialize the
timer once, so move that to __sched_fork() for new tasks.

Then make sure __setparam_dl() doesn't affect the current running
state but only updates the parameters used to calculate the next
scheduling period -- this guarantees the CBS functions as expected
(albeit slightly pessimistic).

This however means we need to make sure __dl_clear_params() needs to
reset the active state otherwise new (and tasks flipping between
classes) will not properly (re)compute their first instance.

Todo: close class flipping CBS hole.
Todo: implement delayed BW release.

Reported-by: Luca Abeni &lt;luca.abeni@unitn.it&gt;
Acked-by: Juri Lelli &lt;juri.lelli@arm.com&gt;
Tested-by: Luca Abeni &lt;luca.abeni@unitn.it&gt;
Fixes: 67dfa1b756f2 ("sched/deadline: Implement cancel_dl_timer() to use in switched_from_dl()")
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Cc: Kirill Tkhai &lt;tkhai@yandex.ru&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Link: http://lkml.kernel.org/r/20150128140803.GF23038@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched: don't cause task state changes in nested sleep debugging</title>
<updated>2015-02-01T20:23:32Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2015-02-01T20:23:32Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=00845eb968ead28007338b2bb852b8beef816583'/>
<id>urn:sha1:00845eb968ead28007338b2bb852b8beef816583</id>
<content type='text'>
Commit 8eb23b9f35aa ("sched: Debug nested sleeps") added code to report
on nested sleep conditions, which we generally want to avoid because the
inner sleeping operation can re-set the thread state to TASK_RUNNING,
but that will then cause the outer sleep loop not actually sleep when it
calls schedule.

However, that's actually valid traditional behavior, with the inner
sleep being some fairly rare case (like taking a sleeping lock that
normally doesn't actually need to sleep).

And the debug code would actually change the state of the task to
TASK_RUNNING internally, which makes that kind of traditional and
working code not work at all, because now the nested sleep doesn't just
sometimes cause the outer one to not block, but will cause it to happen
every time.

In particular, it will cause the cardbus kernel daemon (pccardd) to
basically busy-loop doing scheduling, converting a laptop into a heater,
as reported by Bruno Prémont.  But there may be other legacy uses of
that nested sleep model in other drivers that are also likely to never
get converted to the new model.

This fixes both cases:

 - don't set TASK_RUNNING when the nested condition happens (note: even
   if WARN_ONCE() only _warns_ once, the return value isn't whether the
   warning happened, but whether the condition for the warning was true.
   So despite the warning only happening once, the "if (WARN_ON(..))"
   would trigger for every nested sleep.

 - in the cases where we knowingly disable the warning by using
   "sched_annotate_sleep()", don't change the task state (that is used
   for all core scheduling decisions), instead use '-&gt;task_state_change'
   that is used for the debugging decision itself.

(Credit for the second part of the fix goes to Oleg Nesterov: "Can't we
avoid this subtle change in behaviour DEBUG_ATOMIC_SLEEP adds?" with the
suggested change to use 'task_state_change' as part of the test)

Reported-and-bisected-by: Bruno Prémont &lt;bonbons@linux-vserver.org&gt;
Tested-by: Rafael J Wysocki &lt;rjw@rjwysocki.net&gt;
Acked-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;,
Cc: Ilya Dryomov &lt;ilya.dryomov@inktank.com&gt;,
Cc: Mike Galbraith &lt;umgwanakikbuti@gmail.com&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Peter Hurley &lt;peter@hurleysoftware.com&gt;,
Cc: Davidlohr Bueso &lt;dave@stgolabs.net&gt;,
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>sched: Fix crash if cpuset_cpumask_can_shrink() is passed an empty cpumask</title>
<updated>2015-01-28T14:28:15Z</updated>
<author>
<name>Mike Galbraith</name>
<email>umgwanakikbuti@gmail.com</email>
</author>
<published>2015-01-28T03:53:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=bb2bc55a694d45cdeda91b6f28ab2adec28125ef'/>
<id>urn:sha1:bb2bc55a694d45cdeda91b6f28ab2adec28125ef</id>
<content type='text'>
While creating an exclusive cpuset, we passed cpuset_cpumask_can_shrink()
an empty cpumask (cur), and dl_bw_of(cpumask_any(cur)) made boom with it:

 CPU: 0 PID: 6942 Comm: shield.sh Not tainted 3.19.0-master #19
 Hardware name: MEDIONPC MS-7502/MS-7502, BIOS 6.00 PG 12/26/2007
 task: ffff880224552450 ti: ffff8800caab8000 task.ti: ffff8800caab8000
 RIP: 0010:[&lt;ffffffff81073846&gt;]  [&lt;ffffffff81073846&gt;] cpuset_cpumask_can_shrink+0x56/0xb0
 [...]
 Call Trace:
  [&lt;ffffffff810cb82a&gt;] validate_change+0x18a/0x200
  [&lt;ffffffff810cc877&gt;] cpuset_write_resmask+0x3b7/0x720
  [&lt;ffffffff810c4d58&gt;] cgroup_file_write+0x38/0x100
  [&lt;ffffffff811d953a&gt;] kernfs_fop_write+0x12a/0x180
  [&lt;ffffffff8116e1a3&gt;] vfs_write+0xb3/0x1d0
  [&lt;ffffffff8116ed06&gt;] SyS_write+0x46/0xb0
  [&lt;ffffffff8159ced6&gt;] system_call_fastpath+0x16/0x1b

Signed-off-by: Mike Galbraith &lt;umgwanakikbuti@gmail.com&gt;
Acked-by: Zefan Li &lt;lizefan@huawei.com&gt;
Fixes: f82f80426f7a ("sched/deadline: Ensure that updates to exclusive cpusets don't break AC")
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Link: http://lkml.kernel.org/r/1422417235.5716.5.camel@marge.simpson.net
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/fair: Avoid using uninitialized variable in preferred_group_nid()</title>
<updated>2015-01-28T12:14:12Z</updated>
<author>
<name>Jan Beulich</name>
<email>JBeulich@suse.com</email>
</author>
<published>2015-01-23T08:25:38Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=81907478c4311a679849216abf723999184ab984'/>
<id>urn:sha1:81907478c4311a679849216abf723999184ab984</id>
<content type='text'>
At least some gcc versions - validly afaict - warn about potentially
using max_group uninitialized: There's no way the compiler can prove
that the body of the conditional where it and max_faults get set/
updated gets executed; in fact, without knowing all the details of
other scheduler code, I can't prove this either.

Generally the necessary change would appear to be to clear max_group
prior to entering the inner loop, and break out of the outer loop when
it ends up being all clear after the inner one. This, however, seems
inefficient, and afaict the same effect can be achieved by exiting the
outer loop when max_faults is still zero after the inner loop.

[ mingo: changed the solution to zero initialization: uninitialized_var()
  needs to die, as it's an actively dangerous construct: if in the future
  a known-proven-good piece of code is changed to have a true, buggy
  uninitialized variable, the compiler warning is then supressed...

  The better long term solution is to clean up the code flow, so that
  even simple minded compilers (and humans!) are able to read it without
  getting a headache.  ]

Signed-off-by: Jan Beulich &lt;jbeulich@suse.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Link: http://lkml.kernel.org/r/54C2139202000078000588F7@mail.emea.novell.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/fair: Fix RCU stall upon -ENOMEM in sched_create_group()</title>
<updated>2015-01-09T10:19:00Z</updated>
<author>
<name>Tetsuo Handa</name>
<email>penguin-kernel@I-love.SAKURA.ne.jp</email>
</author>
<published>2014-12-25T06:51:21Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7f1a169b88f513e32a432ca0f85bfd282d117bd6'/>
<id>urn:sha1:7f1a169b88f513e32a432ca0f85bfd282d117bd6</id>
<content type='text'>
When alloc_fair_sched_group() in sched_create_group() fails,
free_sched_group() is called, and free_fair_sched_group() is called by
free_sched_group(). Since destroy_cfs_bandwidth() is called by
free_fair_sched_group() without calling init_cfs_bandwidth(),
RCU stall occurs at hrtimer_cancel():

  INFO: rcu_sched self-detected stall on CPU { 1}  (t=60000 jiffies g=13074 c=13073 q=0)
  Task dump for CPU 1:
  (fprintd)       R  running task        0  6249      1 0x00000088
  ...
  Call Trace:
   &lt;IRQ&gt;  [&lt;ffffffff81094988&gt;] sched_show_task+0xa8/0x110
   [&lt;ffffffff81097acd&gt;] dump_cpu_task+0x3d/0x50
   [&lt;ffffffff810c3a80&gt;] rcu_dump_cpu_stacks+0x90/0xd0
   [&lt;ffffffff810c7751&gt;] rcu_check_callbacks+0x491/0x700
   [&lt;ffffffff810cbf2b&gt;] update_process_times+0x4b/0x80
   [&lt;ffffffff810db046&gt;] tick_sched_handle.isra.20+0x36/0x50
   [&lt;ffffffff810db0a2&gt;] tick_sched_timer+0x42/0x70
   [&lt;ffffffff810ccb19&gt;] __run_hrtimer+0x69/0x1a0
   [&lt;ffffffff810db060&gt;] ? tick_sched_handle.isra.20+0x50/0x50
   [&lt;ffffffff810ccedf&gt;] hrtimer_interrupt+0xef/0x230
   [&lt;ffffffff810452cb&gt;] local_apic_timer_interrupt+0x3b/0x70
   [&lt;ffffffff8164a465&gt;] smp_apic_timer_interrupt+0x45/0x60
   [&lt;ffffffff816485bd&gt;] apic_timer_interrupt+0x6d/0x80
   &lt;EOI&gt;  [&lt;ffffffff810cc588&gt;] ? lock_hrtimer_base.isra.23+0x18/0x50
   [&lt;ffffffff81193cf1&gt;] ? __kmalloc+0x211/0x230
   [&lt;ffffffff810cc9d2&gt;] hrtimer_try_to_cancel+0x22/0xd0
   [&lt;ffffffff81193cf1&gt;] ? __kmalloc+0x211/0x230
   [&lt;ffffffff810ccaa2&gt;] hrtimer_cancel+0x22/0x30
   [&lt;ffffffff810a3cb5&gt;] free_fair_sched_group+0x25/0xd0
   [&lt;ffffffff8108df46&gt;] free_sched_group+0x16/0x40
   [&lt;ffffffff810971bb&gt;] sched_create_group+0x4b/0x80
   [&lt;ffffffff810aa383&gt;] sched_autogroup_create_attach+0x43/0x1c0
   [&lt;ffffffff8107dc9c&gt;] sys_setsid+0x7c/0x110
   [&lt;ffffffff81647729&gt;] system_call_fastpath+0x12/0x17

Check whether init_cfs_bandwidth() was called before calling
destroy_cfs_bandwidth().

Signed-off-by: Tetsuo Handa &lt;penguin-kernel@I-love.SAKURA.ne.jp&gt;
[ Move the check into destroy_cfs_bandwidth() to aid compilability. ]
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Paul Turner &lt;pjt@google.com&gt;
Cc: Ben Segall &lt;bsegall@google.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Link: http://lkml.kernel.org/r/201412252210.GCC30204.SOMVFFOtQJFLOH@I-love.SAKURA.ne.jp
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/deadline: Avoid double-accounting in case of missed deadlines</title>
<updated>2015-01-09T10:18:57Z</updated>
<author>
<name>Luca Abeni</name>
<email>luca.abeni@unitn.it</email>
</author>
<published>2014-12-17T10:50:32Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=269ad8015a6b2bb1cf9e684da4921eb6fa0a0c88'/>
<id>urn:sha1:269ad8015a6b2bb1cf9e684da4921eb6fa0a0c88</id>
<content type='text'>
The dl_runtime_exceeded() function is supposed to ckeck if
a SCHED_DEADLINE task must be throttled, by checking if its
current runtime is &lt;= 0. However, it also checks if the
scheduling deadline has been missed (the current time is
larger than the current scheduling deadline), further
decreasing the runtime if this happens.
This "double accounting" is wrong:

- In case of partitioned scheduling (or single CPU), this
  happens if task_tick_dl() has been called later than expected
  (due to small HZ values). In this case, the current runtime is
  also negative, and replenish_dl_entity() can take care of the
  deadline miss by recharging the current runtime to a value smaller
  than dl_runtime

- In case of global scheduling on multiple CPUs, scheduling
  deadlines can be missed even if the task did not consume more
  runtime than expected, hence penalizing the task is wrong

This patch fix this problem by throttling a SCHED_DEADLINE task
only when its runtime becomes negative, and not modifying the runtime

Signed-off-by: Luca Abeni &lt;luca.abeni@unitn.it&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Juri Lelli &lt;juri.lelli@gmail.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Cc: Dario Faggioli &lt;raistlin@linux.it&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Link: http://lkml.kernel.org/r/1418813432-20797-3-git-send-email-luca.abeni@unitn.it
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/deadline: Fix migration of SCHED_DEADLINE tasks</title>
<updated>2015-01-09T10:18:56Z</updated>
<author>
<name>Luca Abeni</name>
<email>luca.abeni@unitn.it</email>
</author>
<published>2014-12-17T10:50:31Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=6a503c3be937d275113b702e0421e5b0720abe8a'/>
<id>urn:sha1:6a503c3be937d275113b702e0421e5b0720abe8a</id>
<content type='text'>
According to global EDF, tasks should be migrated between runqueues
without checking if their scheduling deadlines and runtimes are valid.
However, SCHED_DEADLINE currently performs such a check:
a migration happens doing:

	deactivate_task(rq, next_task, 0);
	set_task_cpu(next_task, later_rq-&gt;cpu);
	activate_task(later_rq, next_task, 0);

which ends up calling dequeue_task_dl(), setting the new CPU, and then
calling enqueue_task_dl().

enqueue_task_dl() then calls enqueue_dl_entity(), which calls
update_dl_entity(), which can modify scheduling deadline and runtime,
breaking global EDF scheduling.

As a result, some of the properties of global EDF are not respected:
for example, a taskset {(30, 80), (40, 80), (120, 170)} scheduled on
two cores can have unbounded response times for the third task even
if 30/80+40/80+120/170 = 1.5809 &lt; 2

This can be fixed by invoking update_dl_entity() only in case of
wakeup, or if this is a new SCHED_DEADLINE task.

Signed-off-by: Luca Abeni &lt;luca.abeni@unitn.it&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Juri Lelli &lt;juri.lelli@gmail.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Cc: Dario Faggioli &lt;raistlin@linux.it&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Link: http://lkml.kernel.org/r/1418813432-20797-2-git-send-email-luca.abeni@unitn.it
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched: Fix odd values in effective_load() calculations</title>
<updated>2015-01-09T10:18:54Z</updated>
<author>
<name>Yuyang Du</name>
<email>yuyang.du@intel.com</email>
</author>
<published>2014-12-19T00:29:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=32a8df4e0b33fccc9715213b382160415b5c4008'/>
<id>urn:sha1:32a8df4e0b33fccc9715213b382160415b5c4008</id>
<content type='text'>
In effective_load, we have (long w * unsigned long tg-&gt;shares) / long W,
when w is negative, it is cast to unsigned long and hence the product is
insanely large. Fix this by casting tg-&gt;shares to long.

Reported-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
Signed-off-by: Yuyang Du &lt;yuyang.du@intel.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Dave Jones &lt;davej@redhat.com&gt;
Cc: Andrey Ryabinin &lt;a.ryabinin@samsung.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Link: http://lkml.kernel.org/r/20141219002956.GA25405@intel.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched: Fix KMALLOC_MAX_SIZE overflow during cpumask allocation</title>
<updated>2014-12-23T10:43:48Z</updated>
<author>
<name>Alex Thorlton</name>
<email>athorlton@sgi.com</email>
</author>
<published>2014-12-18T18:44:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b74e6278fd6db5848163ccdc6e9d8eb6efdee9bd'/>
<id>urn:sha1:b74e6278fd6db5848163ccdc6e9d8eb6efdee9bd</id>
<content type='text'>
When allocating space for load_balance_mask, in sched_init, when
CPUMASK_OFFSTACK is set, we've managed to spill over
KMALLOC_MAX_SIZE on our 6144 core machine.  The patch below
breaks up the allocations so that they don't overflow the max
alloc size.  It also allocates the masks on the the node from
which they'll most commonly be accessed, to minimize remote
accesses on NUMA machines.

Suggested-by: George Beshers &lt;gbeshers@sgi.com&gt;
Signed-off-by: Alex Thorlton &lt;athorlton@sgi.com&gt;
Cc: George Beshers &lt;gbeshers@sgi.com&gt;
Cc: Russ Anderson &lt;rja@sgi.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Link: http://lkml.kernel.org/r/1418928270-148543-1-git-send-email-athorlton@sgi.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
</feed>
