<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/sched_rt.c, branch v2.6.37</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v2.6.37</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v2.6.37'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2010-10-18T18:52:26Z</updated>
<entry>
<title>sched: Do not account irq time to current task</title>
<updated>2010-10-18T18:52:26Z</updated>
<author>
<name>Venkatesh Pallipadi</name>
<email>venki@google.com</email>
</author>
<published>2010-10-05T00:03:21Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=305e6835e05513406fa12820e40e4a8ecb63743c'/>
<id>urn:sha1:305e6835e05513406fa12820e40e4a8ecb63743c</id>
<content type='text'>
Scheduler accounts both softirq and interrupt processing times to the
currently running task. This means, if the interrupt processing was
for some other task in the system, then the current task ends up being
penalized as it gets shorter runtime than otherwise.

Change sched task accounting to acoount only actual task time from
currently running task. Now update_curr(), modifies the delta_exec to
depend on rq-&gt;clock_task.

Note that this change only handles CONFIG_IRQ_TIME_ACCOUNTING case. We can
extend this to CONFIG_VIRT_CPU_ACCOUNTING with minimal effort. But, thats
for later.

This change will impact scheduling behavior in interrupt heavy conditions.

Tested on a 4-way system with eth0 handled by CPU 2 and a network heavy
task (nc) running on CPU 3 (and no RSS/RFS). With that I have CPU 2
spending 75%+ of its time in irq processing. CPU 3 spending around 35%
time running nc task.

Now, if I run another CPU intensive task on CPU 2, without this change
/proc/&lt;pid&gt;/schedstat shows 100% of time accounted to this task. With this
change, it rightly shows less than 25% accounted to this task as remaining
time is actually spent on irq processing.

Signed-off-by: Venkatesh Pallipadi &lt;venki@google.com&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
LKML-Reference: &lt;1286237003-12406-7-git-send-email-venki@google.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
</entry>
<entry>
<title>sched: Unindent labels</title>
<updated>2010-10-18T16:41:56Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>a.p.zijlstra@chello.nl</email>
</author>
<published>2010-10-17T19:46:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4924627423d5e286136ad2520f5be536345ae590'/>
<id>urn:sha1:4924627423d5e286136ad2520f5be536345ae590</id>
<content type='text'>
Labels should be on column 0.

Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
LKML-Reference: &lt;new-submission&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
</entry>
<entry>
<title>sched: Give CPU bound RT tasks preference</title>
<updated>2010-09-21T11:57:12Z</updated>
<author>
<name>Steven Rostedt</name>
<email>srostedt@redhat.com</email>
</author>
<published>2010-09-21T02:40:04Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b3bc211cfe7d5fe94b310480d78e00bea96fbf2a'/>
<id>urn:sha1:b3bc211cfe7d5fe94b310480d78e00bea96fbf2a</id>
<content type='text'>
If a high priority task is waking up on a CPU that is running a
lower priority task that is bound to a CPU, see if we can move the
high RT task to another CPU first. Note, if all other CPUs are
running higher priority tasks than the CPU bounded current task,
then it will be preempted regardless.

Signed-off-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Gregory Haskins &lt;ghaskins@novell.com&gt;
LKML-Reference: &lt;20100921024138.888922071@goodmis.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
</entry>
<entry>
<title>sched: Try not to migrate higher priority RT tasks</title>
<updated>2010-09-21T11:57:12Z</updated>
<author>
<name>Steven Rostedt</name>
<email>srostedt@redhat.com</email>
</author>
<published>2010-09-21T02:40:03Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=43fa5460fe60dea5c610490a1d263415419c60f6'/>
<id>urn:sha1:43fa5460fe60dea5c610490a1d263415419c60f6</id>
<content type='text'>
When first working on the RT scheduler design, we concentrated on
keeping all CPUs running RT tasks instead of having multiple RT
tasks on a single CPU waiting for the migration thread to move
them. Instead we take a more proactive stance and push or pull RT
tasks from one CPU to another on wakeup or scheduling.

When an RT task wakes up on a CPU that is running another RT task,
instead of preempting it and killing the cache of the running RT
task, we look to see if we can migrate the RT task that is waking
up, even if the RT task waking up is of higher priority.

This may sound a bit odd, but RT tasks should be limited in
migration by the user anyway. But in practice, people do not do
this, which causes high prio RT tasks to bounce around the CPUs.
This becomes even worse when we have priority inheritance, because
a high prio task can block on a lower prio task and boost its
priority. When the lower prio task wakes up the high prio task, if
it happens to be on the same CPU it will migrate off of it.

But in reality, the above does not happen much either, because the
wake up of the lower prio task, which has already been boosted, if
it was on the same CPU as the higher prio task, it would then
migrate off of it. But anyway, we do not want to migrate them
either.

To examine the scheduling, I created a test program and examined it
under kernelshark. The test program created CPU * 2 threads, where
each thread had a different priority. The program takes different
options. The options used in this change log was to have priority
inheritance mutexes or not.

All threads did the following loop:

static void grab_lock(long id, int iter, int l)
{
	ftrace_write("thread %ld iter %d, taking lock %d\n",
		     id, iter, l);
	pthread_mutex_lock(&amp;locks[l]);
	ftrace_write("thread %ld iter %d, took lock %d\n",
		     id, iter, l);
	busy_loop(nr_tasks - id);
	ftrace_write("thread %ld iter %d, unlock lock %d\n",
		     id, iter, l);
	pthread_mutex_unlock(&amp;locks[l]);
}

void *start_task(void *id)
{
	[...]
	while (!done) {
		for (l = 0; l &lt; nr_locks; l++) {
			grab_lock(id, i, l);
			ftrace_write("thread %ld iter %d sleeping\n",
				     id, i);
			ms_sleep(id);
		}
		i++;
	}
	[...]
}

The busy_loop(ms) keeps the CPU spinning for ms milliseconds. The
ms_sleep(ms) sleeps for ms milliseconds. The ftrace_write() writes
to the ftrace buffer to help analyze via ftrace.

The higher the id, the higher the prio, the shorter it does the
busy loop, but the longer it spins. This is usually the case with
RT tasks, the lower priority tasks usually run longer than higher
priority tasks.

At the end of the test, it records the number of loops each thread
took, as well as the number of voluntary preemptions, non-voluntary
preemptions, and number of migrations each thread took, taking the
information from /proc/$$/sched and /proc/$$/status.

Running this on a 4 CPU processor, the results without changes to
the kernel looked like this:

Task        vol    nonvol   migrated     iterations
----        ---    ------   --------     ----------
  0:         53      3220       1470             98
  1:        562       773        724             98
  2:        752       933       1375             98
  3:        749        39        697             98
  4:        758         5        515             98
  5:        764         2        679             99
  6:        761         2        535             99
  7:        757         3        346             99

total:     5156       4977      6341            787

Each thread regardless of priority migrated a few hundred times.
The higher priority tasks, were a little better but still took
quite an impact.

By letting higher priority tasks bump the lower prio task from the
CPU, things changed a bit:

Task        vol    nonvol   migrated     iterations
----        ---    ------   --------     ----------
  0:         37      2835       1937             98
  1:        666      1821       1865             98
  2:        654      1003       1385             98
  3:        664       635        973             99
  4:        698       197        352             99
  5:        703       101        159             99
  6:        708         1         75             99
  7:        713         1          2             99

total:     4843       6594      6748            789

The total # of migrations did not change (several runs showed the
difference all within the noise). But we now see a dramatic
improvement to the higher priority tasks. (kernelshark showed that
the watchdog timer bumped the highest priority task to give it the
2 count. This was actually consistent with every run).

Notice that the # of iterations did not change either.

The above was with priority inheritance mutexes. That is, when the
higher prority task blocked on a lower priority task, the lower
priority task would inherit the higher priority task (which shows
why task 6 was bumped so many times). When not using priority
inheritance mutexes, the current kernel shows this:

Task        vol    nonvol   migrated     iterations
----        ---    ------   --------     ----------
  0:         56      3101       1892             95
  1:        594       713        937             95
  2:        625       188        618             95
  3:        628         4        491             96
  4:        640         7        468             96
  5:        631         2        501             96
  6:        641         1        466             96
  7:        643         2        497             96

total:     4458       4018      5870            765

Not much changed with or without priority inheritance mutexes. But
if we let the high priority task bump lower priority tasks on
wakeup we see:

Task        vol    nonvol   migrated     iterations
----        ---    ------   --------     ----------
  0:        115      3439       2782             98
  1:        633      1354       1583             99
  2:        652       919       1218             99
  3:        645       713        934             99
  4:        690         3          3             99
  5:        694         1          4             99
  6:        720         3          4             99
  7:        747         0          1            100

Which shows a even bigger change. The big difference between task 3
and task 4 is because we have only 4 CPUs on the machine, causing
the 4 highest prio tasks to always have preference.

Although I did not measure cache misses, and I'm sure there would
be little to measure since the test was not data intensive, I could
imagine large improvements for higher priority tasks when dealing
with lower priority tasks. Thus, I'm satisfied with making the
change and agreeing with what Gregory Haskins argued a few years
ago when we first had this discussion.

One final note. All tasks in the above tests were RT tasks. Any RT
task will always preempt a non RT task that is running on the CPU
the RT task wants to run on.

Signed-off-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Gregory Haskins &lt;ghaskins@novell.com&gt;
LKML-Reference: &lt;20100921024138.605460343@goodmis.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
</entry>
<entry>
<title>sched: task_tick_rt: Remove the obsolete -&gt;signal != NULL check</title>
<updated>2010-06-18T08:46:56Z</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2010-06-10T23:09:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c32b4fce799d3a6157df9048d03e429956c58818'/>
<id>urn:sha1:c32b4fce799d3a6157df9048d03e429956c58818</id>
<content type='text'>
Remove the obsolete -&gt;signal != NULL check in watchdog().
Since ea6d290c -&gt;signal can't be NULL.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
LKML-Reference: &lt;20100610230948.GA25911@redhat.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
</entry>
<entry>
<title>sched: Add enqueue/dequeue flags</title>
<updated>2010-04-02T18:12:05Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>a.p.zijlstra@chello.nl</email>
</author>
<published>2010-03-24T15:38:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=371fd7e7a56a5c136d31aa980011bd2f131c3ef5'/>
<id>urn:sha1:371fd7e7a56a5c136d31aa980011bd2f131c3ef5</id>
<content type='text'>
In order to reduce the dependency on TASK_WAKING rework the enqueue
interface to support a proper flags field.

Replace the int wakeup, bool head arguments with an int flags argument
and create the following flags:

  ENQUEUE_WAKEUP - the enqueue is a wakeup of a sleeping task,
  ENQUEUE_WAKING - the enqueue has relative vruntime due to
                   having sched_class::task_waking() called,
  ENQUEUE_HEAD - the waking task should be places on the head
                 of the priority queue (where appropriate).

For symmetry also convert sched_class::dequeue() to a flags scheme.

Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
LKML-Reference: &lt;new-submission&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
</entry>
<entry>
<title>sched: Fix TASK_WAKING vs fork deadlock</title>
<updated>2010-04-02T18:12:03Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>a.p.zijlstra@chello.nl</email>
</author>
<published>2010-03-24T17:34:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0017d735092844118bef006696a750a0e4ef6ebd'/>
<id>urn:sha1:0017d735092844118bef006696a750a0e4ef6ebd</id>
<content type='text'>
Oleg noticed a few races with the TASK_WAKING usage on fork.

 - since TASK_WAKING is basically a spinlock, it should be IRQ safe
 - since we set TASK_WAKING (*) without holding rq-&gt;lock it could
   be there still is a rq-&gt;lock holder, thereby not actually
   providing full serialization.

(*) in fact we clear PF_STARTING, which in effect enables TASK_WAKING.

Cure the second issue by not setting TASK_WAKING in sched_fork(), but
only temporarily in wake_up_new_task() while calling select_task_rq().

Cure the first by holding rq-&gt;lock around the select_task_rq() call,
this will disable IRQs, this however requires that we push down the
rq-&gt;lock release into select_task_rq_fair()'s cgroup stuff.

Because select_task_rq_fair() still needs to drop the rq-&gt;lock we
cannot fully get rid of TASK_WAKING.

Reported-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
LKML-Reference: &lt;new-submission&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
</entry>
<entry>
<title>Merge branch 'linus' into sched/core</title>
<updated>2010-04-02T18:03:08Z</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@elte.hu</email>
</author>
<published>2010-04-02T18:02:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c9494727cf293ae2ec66af57547a3e79c724fec2'/>
<id>urn:sha1:c9494727cf293ae2ec66af57547a3e79c724fec2</id>
<content type='text'>
Merge reason: update to latest upstream

Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
</entry>
<entry>
<title>Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip</title>
<updated>2010-03-13T22:46:18Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2010-03-13T22:46:18Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=80a186074e72e2cd61f6716d90cf32ce54981a56'/>
<id>urn:sha1:80a186074e72e2cd61f6716d90cf32ce54981a56</id>
<content type='text'>
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: Fix pick_next_highest_task_rt() for cgroups
  sched: Cleanup: remove unused variable in try_to_wake_up()
  x86: Fix sched_clock_cpu for systems with unsynchronized TSC
</content>
</entry>
<entry>
<title>sched: Implement group scheduler statistics in one struct</title>
<updated>2010-03-11T14:22:28Z</updated>
<author>
<name>Lucas De Marchi</name>
<email>lucas.de.marchi@gmail.com</email>
</author>
<published>2010-03-11T02:37:45Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=41acab8851a0408c1d5ad6c21a07456f88b54d40'/>
<id>urn:sha1:41acab8851a0408c1d5ad6c21a07456f88b54d40</id>
<content type='text'>
Put all statistic fields of sched_entity in one struct, sched_statistics,
and embed it into sched_entity.

This change allows to memset the sched_statistics to 0 when needed (for
instance when forking), avoiding bugs of non initialized fields.

Signed-off-by: Lucas De Marchi &lt;lucas.de.marchi@gmail.com&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
LKML-Reference: &lt;1268275065-18542-1-git-send-email-lucas.de.marchi@gmail.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
</entry>
</feed>
