<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/locking, branch v6.3</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.3</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.3'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2023-02-21T18:45:51Z</updated>
<entry>
<title>Merge tag 'rcu.2023.02.10a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu</title>
<updated>2023-02-21T18:45:51Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-02-21T18:45:51Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8cc01d43f882fa1f44d8aa6727a6ea783d8fbe3f'/>
<id>urn:sha1:8cc01d43f882fa1f44d8aa6727a6ea783d8fbe3f</id>
<content type='text'>
Pull RCU updates from Paul McKenney:

 - Documentation updates

 - Miscellaneous fixes, perhaps most notably:

      - Throttling callback invocation based on the number of callbacks
        that are now ready to invoke instead of on the total number of
        callbacks

      - Several patches that suppress false-positive boot-time
        diagnostics, for example, due to lockdep not yet being
        initialized

      - Make expedited RCU CPU stall warnings dump stacks of any tasks
        that are blocking the stalled grace period. (Normal RCU CPU
        stall warnings have done this for many years)

      - Lazy-callback fixes to avoid delays during boot, suspend, and
        resume. (Note that lazy callbacks must be explicitly enabled, so
        this should not (yet) affect production use cases)

 - Make kfree_rcu() and friends take advantage of polled grace periods,
   thus reducing memory footprint by almost two orders of magnitude,
   admittedly on a microbenchmark

   This also begins the transition from kfree_rcu(p) to
   kfree_rcu_mightsleep(p). This transition was motivated by bugs where
   kfree_rcu(p), which can block, was typed instead of the intended
   kfree_rcu(p, rh). (Both forms appear in the first sketch after this
   list)

 - SRCU updates, perhaps most notably fixing a bug that causes SRCU to
   fail when booted on a system with a non-zero boot CPU. This
   surprising situation actually happens for kdump kernels on the
   powerpc architecture

   This also adds an srcu_down_read() and srcu_up_read(), which act like
   srcu_read_lock() and srcu_read_unlock(), but allow an SRCU read-side
   critical section to be handed off from one task to another (see the
   second sketch after this list)

 - Clean up the now-useless SRCU Kconfig option

   There are a few more commits that are not yet acked or pulled into
   maintainer trees, and these will be in a pull request for a later
   merge window

 - RCU-tasks updates, perhaps most notably these fixes:

      - A strange interaction between PID-namespace unshare and the
        RCU-tasks grace period that results in a low-probability but
        very real hang

      - A race between an RCU tasks rude grace period on a single-CPU
        system and CPU-hotplug addition of the second CPU that can
        result in a too-short grace period

      - A race between shrinking RCU tasks down to a single callback
        list and queuing a new callback to some other CPU, but where
        that queuing is delayed for more than an RCU grace period. This
        can result in that callback being stranded on the non-boot CPU

 - Torture-test updates and fixes

 - Torture-test scripting updates and fixes

 - Provide additional RCU CPU stall-warning information in kernels built
   with CONFIG_RCU_CPU_STALL_CPUTIME=y, and restore the full five-minute
   timeout limit for expedited RCU CPU stall warnings
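
Two brief illustrations of APIs named above. First, the kfree_rcu()
forms (a sketch: the struct, field, and pointer names are hypothetical;
only kfree_rcu() and kfree_rcu_mightsleep() are from this log):

	struct foo {
		struct rcu_head rh;	/* hypothetical embedded rcu_head */
		int data;
	};
	struct foo *p;

	/* Two-argument form: never blocks; queues p-&gt;rh so that p is
	 * freed after a grace period. */
	kfree_rcu(p, rh);

	/* Single-argument form, being renamed: may sleep waiting for a
	 * grace period, hence the explicit _mightsleep in the name. */
	kfree_rcu_mightsleep(p);

Second, the SRCU hand-off pair, which returns an index that a different
task may later pass to srcu_up_read() (my_srcu is a placeholder
srcu_struct):

	int idx;

	idx = srcu_down_read(&amp;my_srcu);	/* task A enters the section */
	/* ... hand the section off; task B later does: ... */
	srcu_up_read(&amp;my_srcu, idx);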

* tag 'rcu.2023.02.10a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (80 commits)
  rcu/kvfree: Add kvfree_rcu_mightsleep() and kfree_rcu_mightsleep()
  kernel/notifier: Remove CONFIG_SRCU
  init: Remove "select SRCU"
  fs/quota: Remove "select SRCU"
  fs/notify: Remove "select SRCU"
  fs/btrfs: Remove "select SRCU"
  fs: Remove CONFIG_SRCU
  drivers/pci/controller: Remove "select SRCU"
  drivers/net: Remove "select SRCU"
  drivers/md: Remove "select SRCU"
  drivers/hwtracing/stm: Remove "select SRCU"
  drivers/dax: Remove "select SRCU"
  drivers/base: Remove CONFIG_SRCU
  rcu: Disable laziness if lazy-tracking says so
  rcu: Track laziness during boot and suspend
  rcu: Remove redundant call to rcu_boost_kthread_setaffinity()
  rcu: Allow up to five minutes expedited RCU CPU stall-warning timeouts
  rcu: Align the output of RCU CPU stall warning messages
  rcu: Add RCU stall diagnosis information
  sched: Add helper nr_context_switches_cpu()
  ...
</content>
</entry>
<entry>
<title>Merge tag 'sched-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2023-02-21T01:41:08Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-02-21T01:41:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1f2d9ffc7a5f916935749ffc6e93fb33bfe94d2f'/>
<id>urn:sha1:1f2d9ffc7a5f916935749ffc6e93fb33bfe94d2f</id>
<content type='text'>
Pull scheduler updates from Ingo Molnar:

 - Improve the scalability of the CFS bandwidth unthrottling logic with
   a large number of CPUs.

 - Fix &amp; rework various cpuidle routines, simplify interaction with the
   generic scheduler code. Add __cpuidle methods as noinstr to objtool's
   noinstr detection and fix boatloads of cpuidle bugs &amp; quirks.

 - Add new ABI: introduce MEMBARRIER_CMD_GET_REGISTRATIONS, to query
   previously issued registrations (a usage sketch follows this list).

 - Limit scheduler slice duration to the sysctl_sched_latency period, to
   improve scheduling granularity with a large number of SCHED_IDLE
   tasks.

 - Debuggability enhancement on sys_exit(): warn about disabled IRQs,
   but also enable them to prevent a cascade of followup problems and
   repeat warnings.

 - Fix the rescheduling logic in prio_changed_dl().

 - Micro-optimize cpufreq and sched-util methods.

 - Micro-optimize ttwu_runnable()

 - Micro-optimize the idle-scanning in update_numa_stats(),
   select_idle_capacity() and steal_cookie_task().

 - Update the RSEQ code &amp; self-tests

 - Constify various scheduler methods

 - Remove unused methods

 - Refine __init tags

 - Documentation updates

 - Misc other cleanups, fixes
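
A userspace sketch of the new membarrier query (only
MEMBARRIER_CMD_GET_REGISTRATIONS is taken from this log; the rest is
standard syscall(2) usage, and the return value is a bitmask of the
caller's prior registrations):

	#include &lt;linux/membarrier.h&gt;
	#include &lt;sys/syscall.h&gt;
	#include &lt;unistd.h&gt;
	#include &lt;stdio.h&gt;

	int main(void)
	{
		/* arguments are cmd, flags, cpu_id */
		long mask = syscall(__NR_membarrier,
				    MEMBARRIER_CMD_GET_REGISTRATIONS, 0, 0);

		if (mask &lt; 0)
			perror("membarrier");
		else
			printf("registration bitmask: %#lx\n", mask);
		return 0;
	}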

* tag 'sched-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (110 commits)
  sched/rt: pick_next_rt_entity(): check list_entry
  sched/deadline: Add more reschedule cases to prio_changed_dl()
  sched/fair: sanitize vruntime of entity being placed
  sched/fair: Remove capacity inversion detection
  sched/fair: unlink misfit task from cpu overutilized
  objtool: mem*() are not uaccess safe
  cpuidle: Fix poll_idle() noinstr annotation
  sched/clock: Make local_clock() noinstr
  sched/clock/x86: Mark sched_clock() noinstr
  x86/pvclock: Improve atomic update of last_value in pvclock_clocksource_read()
  x86/atomics: Always inline arch_atomic64*()
  cpuidle: tracing, preempt: Squash _rcuidle tracing
  cpuidle: tracing: Warn about !rcu_is_watching()
  cpuidle: lib/bug: Disable rcu_is_watching() during WARN/BUG
  cpuidle: drivers: firmware: psci: Dont instrument suspend code
  KVM: selftests: Fix build of rseq test
  exit: Detect and fix irq disabled state in oops
  cpuidle, arm64: Fix the ARM64 cpuidle logic
  cpuidle: mvebu: Fix duplicate flags assignment
  sched/fair: Limit sched slice duration
  ...
</content>
</entry>
<entry>
<title>Merge tag 'locking-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2023-02-21T01:18:23Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-02-21T01:18:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=6e649d08568220ee88deef0a1ad8b3a935420cf2'/>
<id>urn:sha1:6e649d08568220ee88deef0a1ad8b3a935420cf2</id>
<content type='text'>
Pull locking updates from Ingo Molnar:

 - rwsem micro-optimizations

 - spinlock micro-optimizations

 - cleanups, simplifications

* tag 'locking-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  vduse: Remove include of rwlock.h
  locking/lockdep: Remove lockdep_init_map_crosslock.
  x86/ACPI/boot: Use try_cmpxchg() in __acpi_{acquire,release}_global_lock()
  x86/PAT: Use try_cmpxchg() in set_page_memtype()
  locking/rwsem: Disable preemption in all down_write*() and up_write() code paths
  locking/rwsem: Disable preemption in all down_read*() and up_read() code paths
  locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath
  locking/qspinlock: Micro-optimize pending state waiting for unlock
</content>
</entry>
<entry>
<title>rtmutex: Ensure that the top waiter is always woken up</title>
<updated>2023-02-06T13:49:13Z</updated>
<author>
<name>Wander Lairson Costa</name>
<email>wander@redhat.com</email>
</author>
<published>2023-02-02T12:30:20Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=db370a8b9f67ae5f17e3d5482493294467784504'/>
<id>urn:sha1:db370a8b9f67ae5f17e3d5482493294467784504</id>
<content type='text'>
Let L1 and L2 be two spinlocks.

Let T1 be a task holding L1 and blocked on L2. T1, currently, is the top
waiter of L2.

Let T2 be the task holding L2.

Let T3 be a task trying to acquire L1.

The following events will lead to a state in which the wait queue of L2
isn't empty, but no task actually holds the lock.

T1                T2                                  T3
==                ==                                  ==

                                                      spin_lock(L1)
                                                      | raw_spin_lock(L1-&gt;wait_lock)
                                                      | rtlock_slowlock_locked(L1)
                                                      | | task_blocks_on_rt_mutex(L1, T3)
                                                      | | | orig_waiter-&gt;lock = L1
                                                      | | | orig_waiter-&gt;task = T3
                                                      | | | raw_spin_unlock(L1-&gt;wait_lock)
                                                      | | | rt_mutex_adjust_prio_chain(T1, L1, L2, orig_waiter, T3)
                  spin_unlock(L2)                     | | | |
                  | rt_mutex_slowunlock(L2)           | | | |
                  | | raw_spin_lock(L2-&gt;wait_lock)    | | | |
                  | | wakeup(T1)                      | | | |
                  | | raw_spin_unlock(L2-&gt;wait_lock)  | | | |
                                                      | | | | waiter = T1-&gt;pi_blocked_on
                                                      | | | | waiter == rt_mutex_top_waiter(L2)
                                                      | | | | waiter-&gt;task == T1
                                                      | | | | raw_spin_lock(L2-&gt;wait_lock)
                                                      | | | | dequeue(L2, waiter)
                                                      | | | | update_prio(waiter, T1)
                                                      | | | | enqueue(L2, waiter)
                                                      | | | | waiter != rt_mutex_top_waiter(L2)
                                                      | | | | L2-&gt;owner == NULL
                                                      | | | | wakeup(T1)
                                                      | | | | raw_spin_unlock(L2-&gt;wait_lock)
T1 wakes up
T1 != top_waiter(L2)
schedule_rtlock()

If the deadline of T1 is updated before the call to update_prio(), and
the new deadline is greater than the deadline of the second top waiter,
then after the requeue T1 is no longer the top waiter, and the wrong
task is woken up. That task will then go back to sleep because it is
not the top waiter.

This can be reproduced in PREEMPT_RT with stress-ng:

while true; do
    stress-ng --sched deadline --sched-period 1000000000 \
              --sched-runtime 800000000 --sched-deadline \
              1000000000 --mmapfork 23 -t 20
done

A similar issue was pointed out by Thomas regarding the cases where the
top waiter drops out early due to a signal or timeout, which is a
general issue for all regular rtmutex use cases, e.g. futex.

The problematic code is in rt_mutex_adjust_prio_chain():

	// Save the top waiter before dequeue/enqueue
	prerequeue_top_waiter = rt_mutex_top_waiter(lock);

	rt_mutex_dequeue(lock, waiter);
	waiter_update_prio(waiter, task);
	rt_mutex_enqueue(lock, waiter);

	// Lock has no owner?
	if (!rt_mutex_owner(lock)) {
		// Top waiter changed
  ----&gt;		if (prerequeue_top_waiter != rt_mutex_top_waiter(lock))
  ----&gt;			wake_up_state(waiter-&gt;task, waiter-&gt;wake_state);

This only takes into account the case where @waiter is the new top
waiter due to the requeue operation.

But it fails to handle the case where @waiter is no longer the top
waiter due to the requeue operation.

Ensure that the new top waiter is woken up in all cases so it can take
over the ownerless lock.
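
A sketch of the corrected wakeup (assuming the fix simply re-reads the
top waiter after the requeue; variable naming follows the snippet
above):

	if (!rt_mutex_owner(lock)) {
		/* Wake whichever task is the top waiter *after* the
		 * requeue, which is not necessarily @waiter itself. */
		top_waiter = rt_mutex_top_waiter(lock);
		if (prerequeue_top_waiter != top_waiter)
			wake_up_state(top_waiter-&gt;task, top_waiter-&gt;wake_state);
	}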

[ tglx: Amend changelog, add Fixes tag ]

Fixes: c014ef69b3ac ("locking/rtmutex: Add wake_state to rt_mutex_waiter")
Signed-off-by: Wander Lairson Costa &lt;wander@redhat.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230117172649.52465-1-wander@redhat.com
Link: https://lore.kernel.org/r/20230202123020.14844-1-wander@redhat.com
</content>
</entry>
<entry>
<title>cpuidle: lib/bug: Disable rcu_is_watching() during WARN/BUG</title>
<updated>2023-01-31T14:01:45Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2023-01-26T15:08:31Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5a5d7e9badd2cb8065db171961bd30bd3595e4b6'/>
<id>urn:sha1:5a5d7e9badd2cb8065db171961bd30bd3595e4b6</id>
<content type='text'>
In order to prevent WARN/BUG from generating nested or even recursive
warnings, force rcu_is_watching() true during
WARN/lockdep_rcu_suspicious().

Notably, things like unwinding the stack can trigger rcu_dereference()
warnings, which then trigger more unwinding, which then triggers more
warnings, and so on.
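
The guard pattern, roughly (a sketch; the helper names are an
assumption, not quoted from this message):

	void __warn(/* ... */)
	{
		bool rcu = warn_rcu_enter();	/* force rcu_is_watching() true */

		/* print the warning and dump the stack; the unwinder may
		 * use rcu_dereference() without recursing back here */

		warn_rcu_exit(rcu);		/* restore the previous RCU state */
	}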

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Link: https://lore.kernel.org/r/20230126151323.408156109@infradead.org
</content>
</entry>
<entry>
<title>locking/rwsem: Disable preemption in all down_write*() and up_write() code paths</title>
<updated>2023-01-26T10:46:46Z</updated>
<author>
<name>Waiman Long</name>
<email>longman@redhat.com</email>
</author>
<published>2023-01-26T00:36:27Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1d61659ced6bd8881cf2fb5cbcb28f9541fc7430'/>
<id>urn:sha1:1d61659ced6bd8881cf2fb5cbcb28f9541fc7430</id>
<content type='text'>
The previous patch has disabled preemption in all the down_read() and
up_read() code paths. For symmetry, this patch extends commit:

  48dfb5d2560d ("locking/rwsem: Disable preemption while trying for rwsem lock")

... to have preemption disabled in all the down_write() and up_write()
code paths, including downgrade_write().
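
The resulting shape, roughly (a minimal sketch of the write-side common
path, under the assumption that the preempt_disable()/preempt_enable()
pair simply brackets the existing trylock fast path and the slowpath):

	static __always_inline int __down_write_common(struct rw_semaphore *sem,
						       int state)
	{
		int ret = 0;

		preempt_disable();
		if (unlikely(!rwsem_write_trylock(sem))) {
			if (IS_ERR(rwsem_down_write_slowpath(sem, state)))
				ret = -EINTR;
		}
		preempt_enable();
		return ret;
	}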

Suggested-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Signed-off-by: Waiman Long &lt;longman@redhat.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Link: https://lore.kernel.org/r/20230126003628.365092-4-longman@redhat.com
</content>
</entry>
<entry>
<title>locking/rwsem: Disable preemption in all down_read*() and up_read() code paths</title>
<updated>2023-01-26T10:46:46Z</updated>
<author>
<name>Waiman Long</name>
<email>longman@redhat.com</email>
</author>
<published>2023-01-26T00:36:26Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3f5245538a1964ae186ab7e1636020a41aa63143'/>
<id>urn:sha1:3f5245538a1964ae186ab7e1636020a41aa63143</id>
<content type='text'>
Commit:

  91d2a812dfb9 ("locking/rwsem: Make handoff writer optimistically spin on owner")

... assumes that when the owner field is changed to NULL, the lock will
become free soon. But commit:

  48dfb5d2560d ("locking/rwsem: Disable preemption while trying for rwsem lock")

... disabled preemption when acquiring rwsem for write.

However, preemption has not yet been disabled when acquiring a read lock
on an rwsem. So a reader can add RWSEM_READER_BIAS to the count without
setting the owner field to signal a reader, then get preempted by an RT
task, which spins in the writer slowpath because the owner remains NULL,
leading to a livelock.

One easy way to fix this problem is to disable preemption at all the
down_read*() and up_read() code paths as implemented in this patch.
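
A sketch of the read-side change (the exact shape is assumed, not
quoted from the patch; the point is that preemption stays disabled
across the window between adding RWSEM_READER_BIAS and setting the
owner field):

	static __always_inline int __down_read_common(struct rw_semaphore *sem,
						      int state)
	{
		int ret = 0;
		long count;

		preempt_disable();
		if (!rwsem_read_trylock(sem, &amp;count)) {
			if (IS_ERR(rwsem_down_read_slowpath(sem, count, state)))
				ret = -EINTR;
		}
		preempt_enable();
		return ret;
	}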

Fixes: 91d2a812dfb9 ("locking/rwsem: Make handoff writer optimistically spin on owner")
Reported-by: Mukesh Ojha &lt;quic_mojha@quicinc.com&gt;
Suggested-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Signed-off-by: Waiman Long &lt;longman@redhat.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Link: https://lore.kernel.org/r/20230126003628.365092-3-longman@redhat.com
</content>
</entry>
<entry>
<title>locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath</title>
<updated>2023-01-26T10:46:45Z</updated>
<author>
<name>Waiman Long</name>
<email>longman@redhat.com</email>
</author>
<published>2023-01-26T00:36:25Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b613c7f31476c44316bfac1af7cac714b7d6bef9'/>
<id>urn:sha1:b613c7f31476c44316bfac1af7cac714b7d6bef9</id>
<content type='text'>
A non-first waiter can potentially spin in the for loop of
rwsem_down_write_slowpath() without sleeping, yet fail to acquire the
lock even when the rwsem is free, if the following sequence happens:

  Non-first RT waiter    First waiter      Lock holder
  -------------------    ------------      -----------
  Acquire wait_lock
  rwsem_try_write_lock():
    Set handoff bit if RT or
      wait too long
    Set waiter-&gt;handoff_set
  Release wait_lock
                         Acquire wait_lock
                         Inherit waiter-&gt;handoff_set
                         Release wait_lock
					   Clear owner
                                           Release lock
  if (waiter.handoff_set) {
    rwsem_spin_on_owner();
    if (OWNER_NULL)
      goto trylock_again;
  }
  trylock_again:
  Acquire wait_lock
  rwsem_try_write_lock():
     if (first-&gt;handoff_set &amp;&amp; (waiter != first))
	return false;
  Release wait_lock

A non-first waiter cannot really acquire the rwsem even if it mistakenly
believes that it can spin on the OWNER_NULL value. If that waiter happens
to be an RT task running on the same CPU as the first waiter, it can
block the first waiter from acquiring the rwsem, leading to live lock.
Fix this problem by making sure that a non-first waiter cannot spin in
the slowpath loop without sleeping.

Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more consistent")
Signed-off-by: Waiman Long &lt;longman@redhat.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Tested-by: Mukesh Ojha &lt;quic_mojha@quicinc.com&gt;
Reviewed-by: Mukesh Ojha &lt;quic_mojha@quicinc.com&gt;
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230126003628.365092-2-longman@redhat.com
</content>
</entry>
<entry>
<title>locktorture: Make the rt_boost factor a tunable</title>
<updated>2023-01-05T20:10:35Z</updated>
<author>
<name>Joel Fernandes (Google)</name>
<email>joel@joelfernandes.org</email>
</author>
<published>2022-12-13T20:48:39Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c24501b240741907ddfce52ffc186792db5ad3a5'/>
<id>urn:sha1:c24501b240741907ddfce52ffc186792db5ad3a5</id>
<content type='text'>
The rt boosting in locktorture has a factor variable that is currently
large enough that boosting only happens once every minute or so. Add a
tunable to reduce the factor so that boosting happens more often, to
test paths and arrive at failure modes earlier. With this change, I can
set the factor to 50 and have boosting happen every 10 seconds or so
(see the sketch after the boot parameters below).

Tested with boot parameters:
locktorture.torture_type=mutex_lock
locktorture.onoff_interval=1
locktorture.nwriters_stress=8
locktorture.stutter=0
locktorture.rt_boost=1
locktorture.rt_boost_factor=50
locktorture.nlocks=3
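
A sketch of how such a factor gates boosting (the surrounding function
and variable names are assumptions; only rt_boost_factor is from this
log). A smaller factor shrinks the modulus, so the boost path fires
more often:

	/* Default preserves the historical once-a-minute-ish rate. */
	static int rt_boost_factor = 50000;
	module_param(rt_boost_factor, int, 0644);

	/* In the writer loop: boost roughly once every
	 * nrealwriters_stress * rt_boost_factor lock operations. */
	if (!(torture_random(trsp) %
	      (cxt.nrealwriters_stress * rt_boost_factor))) {
		/* raise to an RT priority for a while ... */
	}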

Signed-off-by: Joel Fernandes (Google) &lt;joel@joelfernandes.org&gt;
Reviewed-by: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
</content>
</entry>
<entry>
<title>locktorture: Allow non-rtmutex lock types to be boosted</title>
<updated>2023-01-05T20:10:35Z</updated>
<author>
<name>Joel Fernandes (Google)</name>
<email>joel@joelfernandes.org</email>
</author>
<published>2022-12-13T20:48:38Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e01f3a1a589e314cf27bd2cb27d9c2c58e105a27'/>
<id>urn:sha1:e01f3a1a589e314cf27bd2cb27d9c2c58e105a27</id>
<content type='text'>
Currently, RT boosting is only done for rtmutex_lock; however, with
proxy execution we also have mutex_lock participating in priorities. To
exercise the testing better, add RT boosting to the other lock testing
types as well, using a new knob (rt_boost).

Tested with boot parameters:
locktorture.torture_type=mutex_lock
locktorture.onoff_interval=1
locktorture.nwriters_stress=8
locktorture.stutter=0
locktorture.rt_boost=1
locktorture.rt_boost_factor=1
locktorture.nlocks=3

Signed-off-by: Joel Fernandes (Google) &lt;joel@joelfernandes.org&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
</content>
</entry>
</feed>
