<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/cgroup, branch v5.4</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.4</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.4'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2019-11-15T16:44:08Z</updated>
<entry>
<title>Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs</title>
<updated>2019-11-15T16:44:08Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2019-11-15T16:44:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b4c0800e4285f96900b7aa4a13ae724fc1205f65'/>
<id>urn:sha1:b4c0800e4285f96900b7aa4a13ae724fc1205f65</id>
<content type='text'>
Pull misc vfs fixes from Al Viro:
 "Assorted fixes all over the place; some of that is -stable fodder,
  some regressions from the last window"

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  ecryptfs_lookup_interpose(): lower_dentry-&gt;d_parent is not stable either
  ecryptfs_lookup_interpose(): lower_dentry-&gt;d_inode is not stable
  ecryptfs: fix unlink and rmdir in face of underlying fs modifications
  audit_get_nd(): don't unlock parent too early
  exportfs_decode_fh(): negative pinned may become positive without the parent locked
  cgroup: don't put ERR_PTR() into fc-&gt;root
  autofs: fix a leak in autofs_expire_indirect()
  aio: Fix io_pgetevents() struct __compat_aio_sigset layout
  fs/namespace.c: fix use-after-free of mount in mnt_warn_timestamp_expiry()
</content>
</entry>
<entry>
<title>cgroup: don't put ERR_PTR() into fc-&gt;root</title>
<updated>2019-11-10T16:53:27Z</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2019-11-10T16:53:27Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=630faf81b3e61bcc90dc6d8b497800657d2752a5'/>
<id>urn:sha1:630faf81b3e61bcc90dc6d8b497800657d2752a5</id>
<content type='text'>
the caller of -&gt;get_tree() expects NULL left there on error...

Reported-by: Thibaut Sautereau &lt;thibaut@sautereau.fr&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>sched/topology: Don't try to build empty sched domains</title>
<updated>2019-10-29T08:58:45Z</updated>
<author>
<name>Valentin Schneider</name>
<email>valentin.schneider@arm.com</email>
</author>
<published>2019-10-23T15:37:44Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=cd1cb3350561d2bf544ddfef76fbf0b1c9c7178f'/>
<id>urn:sha1:cd1cb3350561d2bf544ddfef76fbf0b1c9c7178f</id>
<content type='text'>
Turns out hotplugging CPUs that are in exclusive cpusets can lead to the
cpuset code feeding empty cpumasks to the sched domain rebuild machinery.

This leads to the following splat:

    Internal error: Oops: 96000004 [#1] PREEMPT SMP
    Modules linked in:
    CPU: 0 PID: 235 Comm: kworker/5:2 Not tainted 5.4.0-rc1-00005-g8d495477d62e #23
    Hardware name: ARM Juno development board (r0) (DT)
    Workqueue: events cpuset_hotplug_workfn
    pstate: 60000005 (nZCv daif -PAN -UAO)
    pc : build_sched_domains (./include/linux/arch_topology.h:23 kernel/sched/topology.c:1898 kernel/sched/topology.c:1969)
    lr : build_sched_domains (kernel/sched/topology.c:1966)
    Call trace:
    build_sched_domains (./include/linux/arch_topology.h:23 kernel/sched/topology.c:1898 kernel/sched/topology.c:1969)
    partition_sched_domains_locked (kernel/sched/topology.c:2250)
    rebuild_sched_domains_locked (./include/linux/bitmap.h:370 ./include/linux/cpumask.h:538 kernel/cgroup/cpuset.c:955 kernel/cgroup/cpuset.c:978 kernel/cgroup/cpuset.c:1019)
    rebuild_sched_domains (kernel/cgroup/cpuset.c:1032)
    cpuset_hotplug_workfn (kernel/cgroup/cpuset.c:3205 (discriminator 2))
    process_one_work (./arch/arm64/include/asm/jump_label.h:21 ./include/linux/jump_label.h:200 ./include/trace/events/workqueue.h:114 kernel/workqueue.c:2274)
    worker_thread (./include/linux/compiler.h:199 ./include/linux/list.h:268 kernel/workqueue.c:2416)
    kthread (kernel/kthread.c:255)
    ret_from_fork (arch/arm64/kernel/entry.S:1167)
    Code: f860dae2 912802d6 aa1603e1 12800000 (f8616853)

The faulty line in question is:

  cap = arch_scale_cpu_capacity(cpumask_first(cpu_map));

and we're not checking the return value against nr_cpu_ids (we shouldn't
have to!), which leads to the above.

Prevent generate_sched_domains() from returning empty cpumasks, and add
some assertion in build_sched_domains() to scream bloody murder if it
happens again.

The above splat was obtained on my Juno r0 with the following reproducer:

  $ cgcreate -g cpuset:asym
  $ cgset -r cpuset.cpus=0-3 asym
  $ cgset -r cpuset.mems=0 asym
  $ cgset -r cpuset.cpu_exclusive=1 asym

  $ cgcreate -g cpuset:smp
  $ cgset -r cpuset.cpus=4-5 smp
  $ cgset -r cpuset.mems=0 smp
  $ cgset -r cpuset.cpu_exclusive=1 smp

  $ cgset -r cpuset.sched_load_balance=0 .

  $ echo 0 &gt; /sys/devices/system/cpu/cpu4/online
  $ echo 0 &gt; /sys/devices/system/cpu/cpu5/online

Signed-off-by: Valentin Schneider &lt;valentin.schneider@arm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Dietmar.Eggemann@arm.com
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: hannes@cmpxchg.org
Cc: lizefan@huawei.com
Cc: morten.rasmussen@arm.com
Cc: qperret@google.com
Cc: tj@kernel.org
Cc: vincent.guittot@linaro.org
Fixes: 05484e098448 ("sched/topology: Add SD_ASYM_CPUCAPACITY flag detection")
Link: https://lkml.kernel.org/r/20191023153745.19515-2-valentin.schneider@arm.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge branch 'for-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup</title>
<updated>2019-09-17T22:57:22Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2019-09-17T22:57:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3ee8d6c592dc7fb240574b84e9f9a7f9db4d4b42'/>
<id>urn:sha1:3ee8d6c592dc7fb240574b84e9f9a7f9db4d4b42</id>
<content type='text'>
Pull cgroup updates from Tejun Heo:
 "Three minor cleanup patches"

* 'for-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  Use kvmalloc in cgroups-v1
  cgroup: minor tweak for logic to get cgroup css
  cgroup: Replace a seq_printf() call by seq_puts() in cgroup_print_ss_mask()
</content>
</entry>
<entry>
<title>Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2019-09-17T00:25:49Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2019-09-17T00:25:49Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7e67a859997aad47727aff9c5a32e160da079ce3'/>
<id>urn:sha1:7e67a859997aad47727aff9c5a32e160da079ce3</id>
<content type='text'>
Pull scheduler updates from Ingo Molnar:

 - MAINTAINERS: Add Mark Rutland as perf submaintainer, Juri Lelli and
   Vincent Guittot as scheduler submaintainers. Add Dietmar Eggemann,
   Steven Rostedt, Ben Segall and Mel Gorman as scheduler reviewers.

   As perf and the scheduler is getting bigger and more complex,
   document the status quo of current responsibilities and interests,
   and spread the review pain^H^H^H^H fun via an increase in the Cc:
   linecount generated by scripts/get_maintainer.pl. :-)

 - Add another series of patches that brings the -rt (PREEMPT_RT) tree
   closer to mainline: split the monolithic CONFIG_PREEMPT dependencies
   into a new CONFIG_PREEMPTION category that will allow the eventual
   introduction of CONFIG_PREEMPT_RT. Still a few more hundred patches
   to go though.

 - Extend the CPU cgroup controller with uclamp.min and uclamp.max to
   allow the finer shaping of CPU bandwidth usage.

 - Micro-optimize energy-aware wake-ups from O(CPUS^2) to O(CPUS).

 - Improve the behavior of high CPU count, high thread count
   applications running under cpu.cfs_quota_us constraints.

 - Improve balancing with SCHED_IDLE (SCHED_BATCH) tasks present.

 - Improve CPU isolation housekeeping CPU allocation NUMA locality.

 - Fix deadline scheduler bandwidth calculations and logic when cpusets
   rebuilds the topology, or when it gets deadline-throttled while it's
   being offlined.

 - Convert the cpuset_mutex to percpu_rwsem, to allow it to be used from
   setscheduler() system calls without creating global serialization.
   Add new synchronization between cpuset topology-changing events and
   the deadline acceptance tests in setscheduler(), which were broken
   before.

 - Rework the active_mm state machine to be less confusing and more
   optimal.

 - Rework (simplify) the pick_next_task() slowpath.

 - Improve load-balancing on AMD EPYC systems.

 - ... and misc cleanups, smaller fixes and improvements - please see
   the Git log for more details.

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits)
  sched/psi: Correct overly pessimistic size calculation
  sched/fair: Speed-up energy-aware wake-ups
  sched/uclamp: Always use 'enum uclamp_id' for clamp_id values
  sched/uclamp: Update CPU's refcount on TG's clamp changes
  sched/uclamp: Use TG's clamps to restrict TASK's clamps
  sched/uclamp: Propagate system defaults to the root group
  sched/uclamp: Propagate parent clamps
  sched/uclamp: Extend CPU's cgroup controller
  sched/topology: Improve load balancing on AMD EPYC systems
  arch, ia64: Make NUMA select SMP
  sched, perf: MAINTAINERS update, add submaintainers and reviewers
  sched/fair: Use rq_lock/unlock in online_fair_sched_group
  cpufreq: schedutil: fix equation in comment
  sched: Rework pick_next_task() slow-path
  sched: Allow put_prev_task() to drop rq-&gt;lock
  sched/fair: Expose newidle_balance()
  sched: Add task_struct pointer to sched_class::set_curr_task
  sched: Rework CPU hotplug task selection
  sched/{rt,deadline}: Fix set_next_task vs pick_next_task
  sched: Fix kerneldoc comment for ia64_set_curr_task
  ...
</content>
</entry>
<entry>
<title>cgroup: freezer: fix frozen state inheritance</title>
<updated>2019-09-12T21:04:45Z</updated>
<author>
<name>Roman Gushchin</name>
<email>guro@fb.com</email>
</author>
<published>2019-09-12T17:56:45Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=97a61369830ab085df5aed0ff9256f35b07d425a'/>
<id>urn:sha1:97a61369830ab085df5aed0ff9256f35b07d425a</id>
<content type='text'>
If a new child cgroup is created in the frozen cgroup hierarchy
(one or more of ancestor cgroups is frozen), the CGRP_FREEZE cgroup
flag should be set. Otherwise if a process will be attached to the
child cgroup, it won't become frozen.

The problem can be reproduced with the test_cgfreezer_mkdir test.

This is the output before this patch:
  ~/test_freezer
  ok 1 test_cgfreezer_simple
  ok 2 test_cgfreezer_tree
  ok 3 test_cgfreezer_forkbomb
  Cgroup /sys/fs/cgroup/cg_test_mkdir_A/cg_test_mkdir_B isn't frozen
  not ok 4 test_cgfreezer_mkdir
  ok 5 test_cgfreezer_rmdir
  ok 6 test_cgfreezer_migrate
  ok 7 test_cgfreezer_ptrace
  ok 8 test_cgfreezer_stopped
  ok 9 test_cgfreezer_ptraced
  ok 10 test_cgfreezer_vfork

And with this patch:
  ~/test_freezer
  ok 1 test_cgfreezer_simple
  ok 2 test_cgfreezer_tree
  ok 3 test_cgfreezer_forkbomb
  ok 4 test_cgfreezer_mkdir
  ok 5 test_cgfreezer_rmdir
  ok 6 test_cgfreezer_migrate
  ok 7 test_cgfreezer_ptrace
  ok 8 test_cgfreezer_stopped
  ok 9 test_cgfreezer_ptraced
  ok 10 test_cgfreezer_vfork

Reported-by: Mark Crossen &lt;mcrossen@fb.com&gt;
Signed-off-by: Roman Gushchin &lt;guro@fb.com&gt;
Fixes: 76f969e8948d ("cgroup: cgroup v2 freezer")
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: stable@vger.kernel.org # v5.2+
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>Use kvmalloc in cgroups-v1</title>
<updated>2019-08-07T18:37:58Z</updated>
<author>
<name>Marc Koderer</name>
<email>marc@koderer.com</email>
</author>
<published>2019-08-06T13:24:12Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=653a23ca7e1e1ae907654e91d028bc6dfc7fce0c'/>
<id>urn:sha1:653a23ca7e1e1ae907654e91d028bc6dfc7fce0c</id>
<content type='text'>
Instead of using its own logic for k-/vmalloc rely on
kvmalloc which is actually doing quite the same.

Signed-off-by: Marc Koderer &lt;marc@koderer.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/core: Prevent race condition between cpuset and __sched_setscheduler()</title>
<updated>2019-07-25T13:55:04Z</updated>
<author>
<name>Juri Lelli</name>
<email>juri.lelli@redhat.com</email>
</author>
<published>2019-07-19T14:00:00Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=710da3c8ea7dfbd327920afd3831d8c82c42789d'/>
<id>urn:sha1:710da3c8ea7dfbd327920afd3831d8c82c42789d</id>
<content type='text'>
No synchronisation mechanism exists between the cpuset subsystem and
calls to function __sched_setscheduler(). As such, it is possible that
new root domains are created on the cpuset side while a deadline
acceptance test is carried out in __sched_setscheduler(), leading to a
potential oversell of CPU bandwidth.

Grab cpuset_rwsem read lock from core scheduler, so to prevent
situations such as the one described above from happening.

The only exception is normalize_rt_tasks() which needs to work under
tasklist_lock and can't therefore grab cpuset_rwsem. We are fine with
this, as this function is only called by sysrq and, if that gets
triggered, DEADLINE guarantees are already gone out of the window
anyway.

Tested-by: Dietmar Eggemann &lt;dietmar.eggemann@arm.com&gt;
Signed-off-by: Juri Lelli &lt;juri.lelli@redhat.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: bristot@redhat.com
Cc: claudio@evidence.eu.com
Cc: lizefan@huawei.com
Cc: longman@redhat.com
Cc: luca.abeni@santannapisa.it
Cc: mathieu.poirier@linaro.org
Cc: rostedt@goodmis.org
Cc: tj@kernel.org
Cc: tommaso.cucinotta@santannapisa.it
Link: https://lkml.kernel.org/r/20190719140000.31694-9-juri.lelli@redhat.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup/cpuset: Change cpuset_rwsem and hotplug lock order</title>
<updated>2019-07-25T13:55:03Z</updated>
<author>
<name>Juri Lelli</name>
<email>juri.lelli@redhat.com</email>
</author>
<published>2019-07-19T13:59:58Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d74b27d63a8bebe2fe634944e4ebdc7b10db7a39'/>
<id>urn:sha1:d74b27d63a8bebe2fe634944e4ebdc7b10db7a39</id>
<content type='text'>
cpuset_rwsem is going to be acquired from sched_setscheduler() with a
following patch. There are however paths (e.g., spawn_ksoftirqd) in
which sched_scheduler() is eventually called while holding hotplug lock;
this creates a dependecy between hotplug lock (to be always acquired
first) and cpuset_rwsem (to be always acquired after hotplug lock).

Fix paths which currently take the two locks in the wrong order (after
a following patch is applied).

Tested-by: Dietmar Eggemann &lt;dietmar.eggemann@arm.com&gt;
Signed-off-by: Juri Lelli &lt;juri.lelli@redhat.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: bristot@redhat.com
Cc: claudio@evidence.eu.com
Cc: lizefan@huawei.com
Cc: longman@redhat.com
Cc: luca.abeni@santannapisa.it
Cc: mathieu.poirier@linaro.org
Cc: rostedt@goodmis.org
Cc: tj@kernel.org
Cc: tommaso.cucinotta@santannapisa.it
Link: https://lkml.kernel.org/r/20190719140000.31694-7-juri.lelli@redhat.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup/cpuset: Convert cpuset_mutex to percpu_rwsem</title>
<updated>2019-07-25T13:55:02Z</updated>
<author>
<name>Juri Lelli</name>
<email>juri.lelli@redhat.com</email>
</author>
<published>2019-07-19T13:59:57Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1243dc518c9da467da6635313a2dbb41b8ffc275'/>
<id>urn:sha1:1243dc518c9da467da6635313a2dbb41b8ffc275</id>
<content type='text'>
Holding cpuset_mutex means that cpusets are stable (only the holder can
make changes) and this is required for fixing a synchronization issue
between cpusets and scheduler core.  However, grabbing cpuset_mutex from
setscheduler() hotpath (as implemented in a later patch) is a no-go, as
it would create a bottleneck for tasks concurrently calling
setscheduler().

Convert cpuset_mutex to be a percpu_rwsem (cpuset_rwsem), so that
setscheduler() will then be able to read lock it and avoid concurrency
issues.

Tested-by: Dietmar Eggemann &lt;dietmar.eggemann@arm.com&gt;
Signed-off-by: Juri Lelli &lt;juri.lelli@redhat.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: bristot@redhat.com
Cc: claudio@evidence.eu.com
Cc: lizefan@huawei.com
Cc: longman@redhat.com
Cc: luca.abeni@santannapisa.it
Cc: mathieu.poirier@linaro.org
Cc: rostedt@goodmis.org
Cc: tj@kernel.org
Cc: tommaso.cucinotta@santannapisa.it
Link: https://lkml.kernel.org/r/20190719140000.31694-6-juri.lelli@redhat.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
</feed>
