<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/sched/debug.c, branch v4.20</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v4.20</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v4.20'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2018-09-10T08:13:45Z</updated>
<entry>
<title>sched/debug: Fix potential deadlock when writing to sched_features</title>
<updated>2018-09-10T08:13:45Z</updated>
<author>
<name>Jiada Wang</name>
<email>jiada_wang@mentor.com</email>
</author>
<published>2018-07-31T12:12:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e73e81975f2447e6f556100cada64a18ec631cbb'/>
<id>urn:sha1:e73e81975f2447e6f556100cada64a18ec631cbb</id>
<content type='text'>
The following lockdep report can be triggered by writing to /sys/kernel/debug/sched_features:

  ======================================================
  WARNING: possible circular locking dependency detected
  4.18.0-rc6-00152-gcd3f77d74ac3-dirty #18 Not tainted
  ------------------------------------------------------
  sh/3358 is trying to acquire lock:
  000000004ad3989d (cpu_hotplug_lock.rw_sem){++++}, at: static_key_enable+0x14/0x30
  but task is already holding lock:
  00000000c1b31a88 (&amp;sb-&gt;s_type-&gt;i_mutex_key#3){+.+.}, at: sched_feat_write+0x160/0x428
  which lock already depends on the new lock.
  the existing dependency chain (in reverse order) is:
  -&gt; #3 (&amp;sb-&gt;s_type-&gt;i_mutex_key#3){+.+.}:
         lock_acquire+0xb8/0x148
         down_write+0xac/0x140
         start_creating+0x5c/0x168
         debugfs_create_dir+0x18/0x220
         opp_debug_register+0x8c/0x120
         _add_opp_dev+0x104/0x1f8
         dev_pm_opp_get_opp_table+0x174/0x340
         _of_add_opp_table_v2+0x110/0x760
         dev_pm_opp_of_add_table+0x5c/0x240
         dev_pm_opp_of_cpumask_add_table+0x5c/0x100
         cpufreq_init+0x160/0x430
         cpufreq_online+0x1cc/0xe30
         cpufreq_add_dev+0x78/0x198
         subsys_interface_register+0x168/0x270
         cpufreq_register_driver+0x1c8/0x278
         dt_cpufreq_probe+0xdc/0x1b8
         platform_drv_probe+0xb4/0x168
         driver_probe_device+0x318/0x4b0
         __device_attach_driver+0xfc/0x1f0
         bus_for_each_drv+0xf8/0x180
         __device_attach+0x164/0x200
         device_initial_probe+0x10/0x18
         bus_probe_device+0x110/0x178
         device_add+0x6d8/0x908
         platform_device_add+0x138/0x3d8
         platform_device_register_full+0x1cc/0x1f8
         cpufreq_dt_platdev_init+0x174/0x1bc
         do_one_initcall+0xb8/0x310
         kernel_init_freeable+0x4b8/0x56c
         kernel_init+0x10/0x138
         ret_from_fork+0x10/0x18
  -&gt; #2 (opp_table_lock){+.+.}:
         lock_acquire+0xb8/0x148
         __mutex_lock+0x104/0xf50
         mutex_lock_nested+0x1c/0x28
         _of_add_opp_table_v2+0xb4/0x760
         dev_pm_opp_of_add_table+0x5c/0x240
         dev_pm_opp_of_cpumask_add_table+0x5c/0x100
         cpufreq_init+0x160/0x430
         cpufreq_online+0x1cc/0xe30
         cpufreq_add_dev+0x78/0x198
         subsys_interface_register+0x168/0x270
         cpufreq_register_driver+0x1c8/0x278
         dt_cpufreq_probe+0xdc/0x1b8
         platform_drv_probe+0xb4/0x168
         driver_probe_device+0x318/0x4b0
         __device_attach_driver+0xfc/0x1f0
         bus_for_each_drv+0xf8/0x180
         __device_attach+0x164/0x200
         device_initial_probe+0x10/0x18
         bus_probe_device+0x110/0x178
         device_add+0x6d8/0x908
         platform_device_add+0x138/0x3d8
         platform_device_register_full+0x1cc/0x1f8
         cpufreq_dt_platdev_init+0x174/0x1bc
         do_one_initcall+0xb8/0x310
         kernel_init_freeable+0x4b8/0x56c
         kernel_init+0x10/0x138
         ret_from_fork+0x10/0x18
  -&gt; #1 (subsys mutex#6){+.+.}:
         lock_acquire+0xb8/0x148
         __mutex_lock+0x104/0xf50
         mutex_lock_nested+0x1c/0x28
         subsys_interface_register+0xd8/0x270
         cpufreq_register_driver+0x1c8/0x278
         dt_cpufreq_probe+0xdc/0x1b8
         platform_drv_probe+0xb4/0x168
         driver_probe_device+0x318/0x4b0
         __device_attach_driver+0xfc/0x1f0
         bus_for_each_drv+0xf8/0x180
         __device_attach+0x164/0x200
         device_initial_probe+0x10/0x18
         bus_probe_device+0x110/0x178
         device_add+0x6d8/0x908
         platform_device_add+0x138/0x3d8
         platform_device_register_full+0x1cc/0x1f8
         cpufreq_dt_platdev_init+0x174/0x1bc
         do_one_initcall+0xb8/0x310
         kernel_init_freeable+0x4b8/0x56c
         kernel_init+0x10/0x138
         ret_from_fork+0x10/0x18
  -&gt; #0 (cpu_hotplug_lock.rw_sem){++++}:
         __lock_acquire+0x203c/0x21d0
         lock_acquire+0xb8/0x148
         cpus_read_lock+0x58/0x1c8
         static_key_enable+0x14/0x30
         sched_feat_write+0x314/0x428
         full_proxy_write+0xa0/0x138
         __vfs_write+0xd8/0x388
         vfs_write+0xdc/0x318
         ksys_write+0xb4/0x138
         sys_write+0xc/0x18
         __sys_trace_return+0x0/0x4
  other info that might help us debug this:
  Chain exists of:
    cpu_hotplug_lock.rw_sem --&gt; opp_table_lock --&gt; &amp;sb-&gt;s_type-&gt;i_mutex_key#3
   Possible unsafe locking scenario:
         CPU0                    CPU1
         ----                    ----
    lock(&amp;sb-&gt;s_type-&gt;i_mutex_key#3);
                                 lock(opp_table_lock);
                                 lock(&amp;sb-&gt;s_type-&gt;i_mutex_key#3);
    lock(cpu_hotplug_lock.rw_sem);
   *** DEADLOCK ***
  2 locks held by sh/3358:
   #0: 00000000a8c4b363 (sb_writers#10){.+.+}, at: vfs_write+0x238/0x318
   #1: 00000000c1b31a88 (&amp;sb-&gt;s_type-&gt;i_mutex_key#3){+.+.}, at: sched_feat_write+0x160/0x428
  stack backtrace:
  CPU: 5 PID: 3358 Comm: sh Not tainted 4.18.0-rc6-00152-gcd3f77d74ac3-dirty #18
  Hardware name: Renesas H3ULCB Kingfisher board based on r8a7795 ES2.0+ (DT)
  Call trace:
   dump_backtrace+0x0/0x288
   show_stack+0x14/0x20
   dump_stack+0x13c/0x1ac
   print_circular_bug.isra.10+0x270/0x438
   check_prev_add.constprop.16+0x4dc/0xb98
   __lock_acquire+0x203c/0x21d0
   lock_acquire+0xb8/0x148
   cpus_read_lock+0x58/0x1c8
   static_key_enable+0x14/0x30
   sched_feat_write+0x314/0x428
   full_proxy_write+0xa0/0x138
   __vfs_write+0xd8/0x388
   vfs_write+0xdc/0x318
   ksys_write+0xb4/0x138
   sys_write+0xc/0x18
   __sys_trace_return+0x0/0x4

This is because when loading the cpufreq_dt module we first acquire
cpu_hotplug_lock.rw_sem lock, then in cpufreq_init(), we are taking
the &amp;sb-&gt;s_type-&gt;i_mutex_key lock.

But when writing to /sys/kernel/debug/sched_features, the
cpu_hotplug_lock.rw_sem lock depends on the &amp;sb-&gt;s_type-&gt;i_mutex_key lock.

To fix this bug, reverse the lock acquisition order when writing to
sched_features, this way cpu_hotplug_lock.rw_sem no longer depends on
&amp;sb-&gt;s_type-&gt;i_mutex_key.

Tested-by: Dietmar Eggemann &lt;dietmar.eggemann@arm.com&gt;
Signed-off-by: Jiada Wang &lt;jiada_wang@mentor.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Eugeniu Rosca &lt;erosca@de.adit-jv.com&gt;
Cc: George G. Davis &lt;george_davis@mentor.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: http://lkml.kernel.org/r/20180731121222.26195-1-jiada_wang@mentor.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge branch 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2018-08-14T01:28:19Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2018-08-14T01:28:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=13e091b6dd0e78a518a7d8756607d3acb8215768'/>
<id>urn:sha1:13e091b6dd0e78a518a7d8756607d3acb8215768</id>
<content type='text'>
Pull x86 timer updates from Thomas Gleixner:
 "Early TSC based time stamping to allow better boot time analysis.

  This comes with a general cleanup of the TSC calibration code which
  grew warts and duct taping over the years and removes 250 lines of
  code. Initiated and mostly implemented by Pavel with help from various
  folks"

* 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
  x86/kvmclock: Mark kvm_get_preset_lpj() as __init
  x86/tsc: Consolidate init code
  sched/clock: Disable interrupts when calling generic_sched_clock_init()
  timekeeping: Prevent false warning when persistent clock is not available
  sched/clock: Close a hole in sched_clock_init()
  x86/tsc: Make use of tsc_calibrate_cpu_early()
  x86/tsc: Split native_calibrate_cpu() into early and late parts
  sched/clock: Use static key for sched_clock_running
  sched/clock: Enable sched clock early
  sched/clock: Move sched clock initialization and merge with generic clock
  x86/tsc: Use TSC as sched clock early
  x86/tsc: Initialize cyc2ns when tsc frequency is determined
  x86/tsc: Calibrate tsc only once
  ARM/time: Remove read_boot_clock64()
  s390/time: Remove read_boot_clock64()
  timekeeping: Default boot time offset to local_clock()
  timekeeping: Replace read_boot_clock64() with read_persistent_wall_and_boot_offset()
  s390/time: Add read_persistent_wall_and_boot_offset()
  x86/xen/time: Output xen sched_clock time from 0
  x86/xen/time: Initialize pv xen time in init_hypervisor_platform()
  ...
</content>
</entry>
<entry>
<title>sched/debug: Reverse the order of printing faults</title>
<updated>2018-07-25T09:41:07Z</updated>
<author>
<name>Srikar Dronamraju</name>
<email>srikar@linux.vnet.ibm.com</email>
</author>
<published>2018-06-20T17:02:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=67d9f6c256cd66e15f85c92670f52a7ad4689cff'/>
<id>urn:sha1:67d9f6c256cd66e15f85c92670f52a7ad4689cff</id>
<content type='text'>
Fix the order in which the private and shared numa faults are getting
printed.

No functional changes.

Running SPECjbb2005 on a 4 node machine and comparing bops/JVM
JVMS  LAST_PATCH  WITH_PATCH  %CHANGE
16    25215.7     25375.3     0.63
1     72107       72617       0.70

Signed-off-by: Srikar Dronamraju &lt;srikar@linux.vnet.ibm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Rik van Riel &lt;riel@surriel.com&gt;
Acked-by: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: http://lkml.kernel.org/r/1529514181-9842-7-git-send-email-srikar@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/clock: Use static key for sched_clock_running</title>
<updated>2018-07-19T22:02:43Z</updated>
<author>
<name>Pavel Tatashin</name>
<email>pasha.tatashin@oracle.com</email>
</author>
<published>2018-07-19T20:55:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=46457ea464f5341d1f9dad8dd213805d45f7f117'/>
<id>urn:sha1:46457ea464f5341d1f9dad8dd213805d45f7f117</id>
<content type='text'>
sched_clock_running may be read every time sched_clock_cpu() is called.
Yet, this variable is updated only twice during boot, and never changes
again, therefore it is better to make it a static key.

Signed-off-by: Pavel Tatashin &lt;pasha.tatashin@oracle.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: steven.sistare@oracle.com
Cc: daniel.m.jordan@oracle.com
Cc: linux@armlinux.org.uk
Cc: schwidefsky@de.ibm.com
Cc: heiko.carstens@de.ibm.com
Cc: john.stultz@linaro.org
Cc: sboyd@codeaurora.org
Cc: hpa@zytor.com
Cc: douly.fnst@cn.fujitsu.com
Cc: prarit@redhat.com
Cc: feng.tang@intel.com
Cc: pmladek@suse.com
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: linux-s390@vger.kernel.org
Cc: boris.ostrovsky@oracle.com
Cc: jgross@suse.com
Cc: pbonzini@redhat.com
Link: https://lkml.kernel.org/r/20180719205545.16512-25-pasha.tatashin@oracle.com

</content>
</entry>
<entry>
<title>sched/debug: Use match_string() helper instead of open-coded logic</title>
<updated>2018-06-21T13:45:31Z</updated>
<author>
<name>Yisheng Xie</name>
<email>xieyisheng1@huawei.com</email>
</author>
<published>2018-05-31T11:11:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8f894bf47dc9e8b77166125a084a7217693a28cd'/>
<id>urn:sha1:8f894bf47dc9e8b77166125a084a7217693a28cd</id>
<content type='text'>
match_string() returns the index of an array for a matching string,
which can be used instead of the open coded variant.

Signed-off-by: Yisheng Xie &lt;xieyisheng1@huawei.com&gt;
Reviewed-by: Andy Shevchenko &lt;andy.shevchenko@gmail.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: https://lore.kernel.org/lkml/1527765086-19873-15-git-send-email-xieyisheng1@huawei.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>proc: introduce proc_create_seq{,_data}</title>
<updated>2018-05-16T05:23:35Z</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2018-04-13T17:44:18Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=fddda2b7b521185f3aa018f9559eb33b0aee53a9'/>
<id>urn:sha1:fddda2b7b521185f3aa018f9559eb33b0aee53a9</id>
<content type='text'>
Variants of proc_create{,_data} that directly take a struct seq_operations
argument and drastically reduces the boilerplate code in the callers.

All trivial callers converted over.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
</entry>
<entry>
<title>Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2018-04-02T18:49:41Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2018-04-02T18:49:41Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=46e0d28bdb8e6d00e27a0fe9e1d15df6098f0ffb'/>
<id>urn:sha1:46e0d28bdb8e6d00e27a0fe9e1d15df6098f0ffb</id>
<content type='text'>
Pull scheduler updates from Ingo Molnar:
 "The main scheduler changes in this cycle were:

   - NUMA balancing improvements (Mel Gorman)

   - Further load tracking improvements (Patrick Bellasi)

   - Various NOHZ balancing cleanups and optimizations (Peter Zijlstra)

   - Improve blocked load handling, in particular we can now reduce and
     eventually stop periodic load updates on 'very idle' CPUs. (Vincent
     Guittot)

   - On isolated CPUs offload the final 1Hz scheduler tick as well, plus
     related cleanups and reorganization. (Frederic Weisbecker)

   - Core scheduler code cleanups (Ingo Molnar)"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits)
  sched/core: Update preempt_notifier_key to modern API
  sched/cpufreq: Rate limits for SCHED_DEADLINE
  sched/fair: Update util_est only on util_avg updates
  sched/cpufreq/schedutil: Use util_est for OPP selection
  sched/fair: Use util_est in LB and WU paths
  sched/fair: Add util_est on top of PELT
  sched/core: Remove TASK_ALL
  sched/completions: Use bool in try_wait_for_completion()
  sched/fair: Update blocked load when newly idle
  sched/fair: Move idle_balance()
  sched/nohz: Merge CONFIG_NO_HZ_COMMON blocks
  sched/fair: Move rebalance_domains()
  sched/nohz: Optimize nohz_idle_balance()
  sched/fair: Reduce the periodic update duration
  sched/nohz: Stop NOHZ stats when decayed
  sched/cpufreq: Provide migration hint
  sched/nohz: Clean up nohz enter/exit
  sched/fair: Update blocked load from NEWIDLE
  sched/fair: Add NOHZ stats balancing
  sched/fair: Restructure nohz_balance_kick()
  ...
</content>
</entry>
<entry>
<title>sched/debug: Adjust newlines for better alignment</title>
<updated>2018-03-20T08:30:09Z</updated>
<author>
<name>Joe Lawrence</name>
<email>joe.lawrence@redhat.com</email>
</author>
<published>2018-03-19T18:35:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e9ca267096674eadd1fd479279bcb58df1486049'/>
<id>urn:sha1:e9ca267096674eadd1fd479279bcb58df1486049</id>
<content type='text'>
Scheduler debug stats include newlines that display out of alignment
when prefixed by timestamps.  For example, the dmesg utility:

  % echo t &gt; /proc/sysrq-trigger
  % dmesg
  ...
  [   83.124251]
  runnable tasks:
   S           task   PID         tree-key  switches  prio     wait-time
  sum-exec        sum-sleep
  -----------------------------------------------------------------------------------------------------------

At the same time, some syslog utilities (like rsyslog by default) don't
like the additional newlines control characters, saving lines like this
to /var/log/messages:

  Mar 16 16:02:29 localhost kernel: #012runnable tasks:#012 S           task   PID         tree-key ...
                                    ^^^^               ^^^^
Clean these up by moving newline characters to their own SEQ_printf
invocation.  This leaves the /proc/sched_debug unchanged, but brings the
entire output into alignment when prefixed:

  % echo t &gt; /proc/sysrq-trigger
  % dmesg
  ...
  [   62.410368] runnable tasks:
  [   62.410368]  S           task   PID         tree-key  switches  prio     wait-time             sum-exec        sum-sleep
  [   62.410369] -----------------------------------------------------------------------------------------------------------
  [   62.410369]  I  kworker/u12:0     5      1932.215593       332   120         0.000000         3.621252         0.000000 0 0 /

and no escaped control characters from rsyslog in /var/log/messages:

  Mar 16 16:15:06 localhost kernel: runnable tasks:
  Mar 16 16:15:06 localhost kernel: S           task   PID         tree-key  ...

Signed-off-by: Joe Lawrence &lt;joe.lawrence@redhat.com&gt;
Acked-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: http://lkml.kernel.org/r/1521484555-8620-3-git-send-email-joe.lawrence@redhat.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/debug: Fix per-task line continuation for console output</title>
<updated>2018-03-20T08:30:09Z</updated>
<author>
<name>Joe Lawrence</name>
<email>joe.lawrence@redhat.com</email>
</author>
<published>2018-03-19T18:35:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a8c024cd9b9683d25ae1f459525dd2c6bec75e79'/>
<id>urn:sha1:a8c024cd9b9683d25ae1f459525dd2c6bec75e79</id>
<content type='text'>
When the SEQ_printf() macro prints to the console, it runs a simple
printk() without KERN_CONT "continued" line printing.  The result of
this is oddly wrapped task info, for example:

  % echo t &gt; /proc/sysrq-trigger
  % dmesg
  ...
  runnable tasks:
  ...
  [   29.608611]  I
  [   29.608613]       rcu_sched     8      3252.013846      4087   120
  [   29.608614]         0.000000        29.090111         0.000000
  [   29.608615]  0 0
  [   29.608616]  /

Modify SEQ_printf to use pr_cont() for expected one-line results:

  % echo t &gt; /proc/sysrq-trigger
  % dmesg
  ...
  runnable tasks:
  ...
  [  106.716329]  S        cpuhp/5    37      2006.315026        14   120         0.000000         0.496893         0.000000 0 0 /

Signed-off-by: Joe Lawrence &lt;joe.lawrence@redhat.com&gt;
Acked-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: http://lkml.kernel.org/r/1521484555-8620-2-git-send-email-joe.lawrence@redhat.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/fair: Add util_est on top of PELT</title>
<updated>2018-03-20T07:11:06Z</updated>
<author>
<name>Patrick Bellasi</name>
<email>patrick.bellasi@arm.com</email>
</author>
<published>2018-03-09T09:52:42Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7f65ea42eb00bc902f1c37a71e984e4f4064cfa9'/>
<id>urn:sha1:7f65ea42eb00bc902f1c37a71e984e4f4064cfa9</id>
<content type='text'>
The util_avg signal computed by PELT is too variable for some use-cases.
For example, a big task waking up after a long sleep period will have its
utilization almost completely decayed. This introduces some latency before
schedutil will be able to pick the best frequency to run a task.

The same issue can affect task placement. Indeed, since the task
utilization is already decayed at wakeup, when the task is enqueued in a
CPU, this can result in a CPU running a big task as being temporarily
represented as being almost empty. This leads to a race condition where
other tasks can be potentially allocated on a CPU which just started to run
a big task which slept for a relatively long period.

Moreover, the PELT utilization of a task can be updated every [ms], thus
making it a continuously changing value for certain longer running
tasks. This means that the instantaneous PELT utilization of a RUNNING
task is not really meaningful to properly support scheduler decisions.

For all these reasons, a more stable signal can do a better job of
representing the expected/estimated utilization of a task/cfs_rq.
Such a signal can be easily created on top of PELT by still using it as
an estimator which produces values to be aggregated on meaningful
events.

This patch adds a simple implementation of util_est, a new signal built on
top of PELT's util_avg where:

    util_est(task) = max(task::util_avg, f(task::util_avg@dequeue))

This allows to remember how big a task has been reported by PELT in its
previous activations via f(task::util_avg@dequeue), which is the new
_task_util_est(struct task_struct*) function added by this patch.

If a task should change its behavior and it runs longer in a new
activation, after a certain time its util_est will just track the
original PELT signal (i.e. task::util_avg).

The estimated utilization of cfs_rq is defined only for root ones.
That's because the only sensible consumer of this signal are the
scheduler and schedutil when looking for the overall CPU utilization
due to FAIR tasks.

For this reason, the estimated utilization of a root cfs_rq is simply
defined as:

    util_est(cfs_rq) = max(cfs_rq::util_avg, cfs_rq::util_est::enqueued)

where:

    cfs_rq::util_est::enqueued = sum(_task_util_est(task))
                                 for each RUNNABLE task on that root cfs_rq

It's worth noting that the estimated utilization is tracked only for
objects of interests, specifically:

 - Tasks: to better support tasks placement decisions
 - root cfs_rqs: to better support both tasks placement decisions as
                 well as frequencies selection

Signed-off-by: Patrick Bellasi &lt;patrick.bellasi@arm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Dietmar Eggemann &lt;dietmar.eggemann@arm.com&gt;
Cc: Joel Fernandes &lt;joelaf@google.com&gt;
Cc: Juri Lelli &lt;juri.lelli@redhat.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Morten Rasmussen &lt;morten.rasmussen@arm.com&gt;
Cc: Paul Turner &lt;pjt@google.com&gt;
Cc: Rafael J . Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Cc: Steve Muckle &lt;smuckle@google.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Todd Kjos &lt;tkjos@android.com&gt;
Cc: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Cc: Viresh Kumar &lt;viresh.kumar@linaro.org&gt;
Link: http://lkml.kernel.org/r/20180309095245.11071-2-patrick.bellasi@arm.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
</feed>
