<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/sched, branch v5.3</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.3</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.3'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2019-09-04T17:51:30Z</updated>
<entry>
<title>sched/core: Fix uclamp ABI bug, clean up and robustify sched_read_attr() ABI logic and code</title>
<updated>2019-09-04T17:51:30Z</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@kernel.org</email>
</author>
<published>2019-09-04T07:55:32Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1251201c0d34fadf69d56efa675c2b7dd0a90eca'/>
<id>urn:sha1:1251201c0d34fadf69d56efa675c2b7dd0a90eca</id>
<content type='text'>
Thadeu Lima de Souza Cascardo reported that 'chrt' broke on recent kernels:

  $ chrt -p $$
  chrt: failed to get pid 26306's policy: Argument list too long

and he has root-caused the bug to the following commit increasing sched_attr
size and breaking sched_read_attr() into returning -EFBIG:

  a509a7cd7974 ("sched/uclamp: Extend sched_setattr() to support utilization clamping")

The other, bigger bug is that the whole sched_getattr() and sched_read_attr()
logic of checking non-zero bits in new ABI components is arguably broken,
and pretty much any extension of the ABI will spuriously break the ABI.
That's way too fragile.

Instead implement the perf syscall's extensible ABI instead, which we
already implement on the sched_setattr() side:

 - if user-attributes have the same size as kernel attributes then the
   logic is unchanged.

 - if user-attributes are larger than the kernel knows about then simply
   skip the extra bits, but set attr-&gt;size to the (smaller) kernel size
   so that tooling can (in principle) handle older kernel as well.

 - if user-attributes are smaller than the kernel knows about then just
   copy whatever user-space can accept.

Also clean up the whole logic:

 - Simplify the code flow - there's no need for 'ret' for example.

 - Standardize on 'kattr/uattr' and 'ksize/usize' naming to make sure we
   always know which side we are dealing with.

 - Why is it called 'read' when what it does is to copy to user? This
   code is so far away from VFS read() semantics that the naming is
   actively confusing. Name it sched_attr_copy_to_user() instead, which
   mirrors other copy_to_user() functionality.

 - Move the attr-&gt;size assignment from the head of sched_getattr() to the
   sched_attr_copy_to_user() function. Nothing else within the kernel
   should care about the size of the structure.

With these fixes the sched_getattr() syscall now nicely supports an
extensible ABI in both a forward and backward compatible fashion, and
will also fix the chrt bug.

As an added bonus the bogus -EFBIG return is removed as well, which as
Thadeu noted should have been -E2BIG to begin with.

Reported-by: Thadeu Lima de Souza Cascardo &lt;cascardo@canonical.com&gt;
Tested-by: Dietmar Eggemann &lt;dietmar.eggemann@arm.com&gt;
Tested-by: Thadeu Lima de Souza Cascardo &lt;cascardo@canonical.com&gt;
Acked-by: Thadeu Lima de Souza Cascardo &lt;cascardo@canonical.com&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@infradead.org&gt;
Cc: Jiri Olsa &lt;jolsa@redhat.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Patrick Bellasi &lt;patrick.bellasi@arm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Fixes: a509a7cd7974 ("sched/uclamp: Extend sched_setattr() to support utilization clamping")
Link: https://lkml.kernel.org/r/20190904075532.GA26751@gmail.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/fair: Don't assign runtime for throttled cfs_rq</title>
<updated>2019-09-03T06:55:07Z</updated>
<author>
<name>Liangyan</name>
<email>liangyan.peng@linux.alibaba.com</email>
</author>
<published>2019-08-26T12:16:33Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5e2d2cc2588bd3307ce3937acbc2ed03c830a861'/>
<id>urn:sha1:5e2d2cc2588bd3307ce3937acbc2ed03c830a861</id>
<content type='text'>
do_sched_cfs_period_timer() will refill cfs_b runtime and call
distribute_cfs_runtime to unthrottle cfs_rq, sometimes cfs_b-&gt;runtime
will allocate all quota to one cfs_rq incorrectly, then other cfs_rqs
attached to this cfs_b can't get runtime and will be throttled.

We find that one throttled cfs_rq has non-negative
cfs_rq-&gt;runtime_remaining and cause an unexpetced cast from s64 to u64
in snippet:

  distribute_cfs_runtime() {
    runtime = -cfs_rq-&gt;runtime_remaining + 1;
  }

The runtime here will change to a large number and consume all
cfs_b-&gt;runtime in this cfs_b period.

According to Ben Segall, the throttled cfs_rq can have
account_cfs_rq_runtime called on it because it is throttled before
idle_balance, and the idle_balance calls update_rq_clock to add time
that is accounted to the task.

This commit prevents cfs_rq to be assgined new runtime if it has been
throttled until that distribute_cfs_runtime is called.

Signed-off-by: Liangyan &lt;liangyan.peng@linux.alibaba.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Valentin Schneider &lt;valentin.schneider@arm.com&gt;
Reviewed-by: Ben Segall &lt;bsegall@google.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: shanpeic@linux.alibaba.com
Cc: stable@vger.kernel.org
Cc: xlpang@linux.alibaba.com
Fixes: d3d9dc330236 ("sched: Throttle entities exceeding their allowed bandwidth")
Link: https://lkml.kernel.org/r/20190826121633.6538-1-liangyan.peng@linux.alibaba.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2019-08-25T17:06:12Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2019-08-25T17:06:12Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8a04c2ee62a4020cf1b7818c300626819d62ff5e'/>
<id>urn:sha1:8a04c2ee62a4020cf1b7818c300626819d62ff5e</id>
<content type='text'>
Pull scheduler fix from Thomas Gleixner:
 "Handle the worker management in situations where a task is scheduled
  out on a PI lock contention correctly and schedule a new worker if
  possible"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/core: Schedule new worker even if PI-blocked
</content>
</entry>
<entry>
<title>psi: get poll_work to run when calling poll syscall next time</title>
<updated>2019-08-25T02:48:42Z</updated>
<author>
<name>Jason Xing</name>
<email>kerneljasonxing@linux.alibaba.com</email>
</author>
<published>2019-08-25T00:54:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7b2b55da1db10a5525460633ae4b6fb0be060c41'/>
<id>urn:sha1:7b2b55da1db10a5525460633ae4b6fb0be060c41</id>
<content type='text'>
Only when calling the poll syscall the first time can user receive
POLLPRI correctly.  After that, user always fails to acquire the event
signal.

Reproduce case:
 1. Get the monitor code in Documentation/accounting/psi.txt
 2. Run it, and wait for the event triggered.
 3. Kill and restart the process.

The question is why we can end up with poll_scheduled = 1 but the work
not running (which would reset it to 0).  And the answer is because the
scheduling side sees group-&gt;poll_kworker under RCU protection and then
schedules it, but here we cancel the work and destroy the worker.  The
cancel needs to pair with resetting the poll_scheduled flag.

Link: http://lkml.kernel.org/r/1566357985-97781-1-git-send-email-joseph.qi@linux.alibaba.com
Signed-off-by: Jason Xing &lt;kerneljasonxing@linux.alibaba.com&gt;
Signed-off-by: Joseph Qi &lt;joseph.qi@linux.alibaba.com&gt;
Reviewed-by: Caspar Zhang &lt;caspar@linux.alibaba.com&gt;
Reviewed-by: Suren Baghdasaryan &lt;surenb@google.com&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>sched/core: Schedule new worker even if PI-blocked</title>
<updated>2019-08-19T08:57:26Z</updated>
<author>
<name>Sebastian Andrzej Siewior</name>
<email>bigeasy@linutronix.de</email>
</author>
<published>2019-08-16T16:06:26Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b0fdc01354f45d43f082025636ef808968a27b36'/>
<id>urn:sha1:b0fdc01354f45d43f082025636ef808968a27b36</id>
<content type='text'>
If a task is PI-blocked (blocking on sleeping spinlock) then we don't want to
schedule a new kworker if we schedule out due to lock contention because !RT
does not do that as well. A spinning spinlock disables preemption and a worker
does not schedule out on lock contention (but spin).

On RT the RW-semaphore implementation uses an rtmutex so
tsk_is_pi_blocked() will return true if a task blocks on it. In this case we
will now start a new worker which may deadlock if one worker is waiting on
progress from another worker. Since a RW-semaphore starts a new worker on !RT,
we should do the same on RT.

XFS is able to trigger this deadlock.

Allow to schedule new worker if the current worker is PI-blocked.

Signed-off-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: http://lkml.kernel.org/r/20190816160626.12742-1-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge branch 'pm-cpufreq'</title>
<updated>2019-08-16T12:24:51Z</updated>
<author>
<name>Rafael J. Wysocki</name>
<email>rafael.j.wysocki@intel.com</email>
</author>
<published>2019-08-16T12:24:51Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a3ee2477c45f73184a64d9c6cf97855a52732dc6'/>
<id>urn:sha1:a3ee2477c45f73184a64d9c6cf97855a52732dc6</id>
<content type='text'>
* pm-cpufreq:
  cpufreq: schedutil: Don't skip freq update when limits change
  cpufreq: dev_pm_qos_update_request() can return 1 on success
</content>
</entry>
<entry>
<title>cpufreq: schedutil: Don't skip freq update when limits change</title>
<updated>2019-08-10T11:53:19Z</updated>
<author>
<name>Viresh Kumar</name>
<email>viresh.kumar@linaro.org</email>
</author>
<published>2019-08-07T07:06:01Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=600f5badb78c316146d062cfd7af4a2cfb655baa'/>
<id>urn:sha1:600f5badb78c316146d062cfd7af4a2cfb655baa</id>
<content type='text'>
To avoid reducing the frequency of a CPU prematurely, we skip reducing
the frequency if the CPU had been busy recently.

This should not be done when the limits of the policy are changed, for
example due to thermal throttling. We should always get the frequency
within the new limits as soon as possible.

Trying to fix this by using only one flag, i.e. need_freq_update, can
lead to a race condition where the flag gets cleared without forcing us
to change the frequency at least once. And so this patch introduces
another flag to avoid that race condition.

Fixes: ecd288429126 ("cpufreq: schedutil: Don't set next_freq to UINT_MAX")
Cc: v4.18+ &lt;stable@vger.kernel.org&gt; # v4.18+
Reported-by: Doug Smythies &lt;dsmythies@telus.net&gt;
Tested-by: Doug Smythies &lt;dsmythies@telus.net&gt;
Signed-off-by: Viresh Kumar &lt;viresh.kumar@linaro.org&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</content>
</entry>
<entry>
<title>sched/psi: Do not require setsched permission from the trigger creator</title>
<updated>2019-08-06T10:49:18Z</updated>
<author>
<name>Suren Baghdasaryan</name>
<email>surenb@google.com</email>
</author>
<published>2019-07-30T01:33:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=04e048cf09d7b5fc995817cdc5ae1acd4482429c'/>
<id>urn:sha1:04e048cf09d7b5fc995817cdc5ae1acd4482429c</id>
<content type='text'>
When a process creates a new trigger by writing into /proc/pressure/*
files, permissions to write such a file should be used to determine whether
the process is allowed to do so or not. Current implementation would also
require such a process to have setsched capability. Setting of psi trigger
thread's scheduling policy is an implementation detail and should not be
exposed to the user level. Remove the permission check by using _nocheck
version of the function.

Suggested-by: Nick Kralevich &lt;nnk@google.com&gt;
Signed-off-by: Suren Baghdasaryan &lt;surenb@google.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: lizefan@huawei.com
Cc: mingo@redhat.com
Cc: akpm@linux-foundation.org
Cc: kernel-team@android.com
Cc: dennisszhou@gmail.com
Cc: dennis@kernel.org
Cc: hannes@cmpxchg.org
Cc: axboe@kernel.dk
Link: https://lkml.kernel.org/r/20190730013310.162367-1-surenb@google.com
</content>
</entry>
<entry>
<title>sched/psi: Reduce psimon FIFO priority</title>
<updated>2019-08-06T10:49:18Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2019-08-01T10:41:31Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=14f5c7b46a41a595fc61db37f55721714729e59e'/>
<id>urn:sha1:14f5c7b46a41a595fc61db37f55721714729e59e</id>
<content type='text'>
PSI defaults to a FIFO-99 thread, reduce this to FIFO-1.

FIFO-99 is the very highest priority available to SCHED_FIFO and
it not a suitable default; it would indicate the psi work is the
most important work on the machine.

Since Real-Time tasks will have pre-allocated memory and locked it in
place, Real-Time tasks do not care about PSI. All it needs is to be
above OTHER.

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Tested-by: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
</content>
</entry>
<entry>
<title>sched/deadline: Fix double accounting of rq/running bw in push &amp; pull</title>
<updated>2019-08-06T10:49:18Z</updated>
<author>
<name>Dietmar Eggemann</name>
<email>dietmar.eggemann@arm.com</email>
</author>
<published>2019-08-02T14:59:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f4904815f97a934258445a8f763f6b6c48f007e7'/>
<id>urn:sha1:f4904815f97a934258445a8f763f6b6c48f007e7</id>
<content type='text'>
{push,pull}_dl_task() always calls {de,}activate_task() with .flags=0
which sets p-&gt;on_rq=TASK_ON_RQ_MIGRATING.

{push,pull}_dl_task()-&gt;{de,}activate_task()-&gt;{de,en}queue_task()-&gt;
{de,en}queue_task_dl() calls {sub,add}_{running,rq}_bw() since
p-&gt;on_rq==TASK_ON_RQ_MIGRATING.
So {sub,add}_{running,rq}_bw() in {push,pull}_dl_task() is
double-accounting for that task.

Fix it by removing rq/running bw accounting in [push/pull]_dl_task().

Fixes: 7dd778841164 ("sched/core: Unify p-&gt;on_rq updates")
Signed-off-by: Dietmar Eggemann &lt;dietmar.eggemann@arm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Valentin Schneider &lt;valentin.schneider@arm.com&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Luca Abeni &lt;luca.abeni@santannapisa.it&gt;
Cc: Daniel Bristot de Oliveira &lt;bristot@redhat.com&gt;
Cc: Juri Lelli &lt;juri.lelli@redhat.com&gt;
Cc: Qais Yousef &lt;qais.yousef@arm.com&gt;
Link: https://lkml.kernel.org/r/20190802145945.18702-2-dietmar.eggemann@arm.com
</content>
</entry>
</feed>
