From 95bcade33a8af38755c9b0636e36a36ad3789fe6 Mon Sep 17 00:00:00 2001
From: Will Deacon
Date: Tue, 13 Feb 2018 13:22:56 +0000
Subject: locking/qspinlock: Ensure node is initialised before updating prev->next

When a locker ends up queuing on the qspinlock locking slowpath, we
initialise the relevant mcs node and publish it indirectly by updating
the tail portion of the lock word using xchg_tail. If we find that
there was a pre-existing locker in the queue, we subsequently update
their ->next field to point at our node so that we are notified when
it's our turn to take the lock. This can be roughly illustrated as
follows:

  /* Initialise the fields in node and encode a pointer to node in tail */
  tail = initialise_node(node);

  /*
   * Exchange tail into the lockword using an atomic read-modify-write
   * operation with release semantics
   */
  old = xchg_tail(lock, tail);

  /* If there was a pre-existing waiter ... */
  if (old & _Q_TAIL_MASK) {
	prev = decode_tail(old);
	smp_read_barrier_depends();

	/* ... then update their ->next field to point to node */
	WRITE_ONCE(prev->next, node);
  }

The conditional update of prev->next therefore relies on the address
dependency from the result of xchg_tail ensuring order against the
prior initialisation of node. However, since the release semantics of
the xchg_tail operation apply only to the write portion of the RmW,
then this ordering is not guaranteed and it is possible for the CPU to
return old before the writes to node have been published, consequently
allowing us to point prev->next to an uninitialised node.

This patch fixes the problem by making the update of prev->next a
RELEASE operation, which also removes the reliance on dependency
ordering.

Signed-off-by: Will Deacon
Acked-by: Peter Zijlstra (Intel)
Cc: Linus Torvalds
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/1518528177-19169-2-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar
---
 kernel/locking/qspinlock.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

(limited to 'kernel/locking')

diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index 38ece035039e..348c8cec1042 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -408,14 +408,15 @@ queue:
 	 */
 	if (old & _Q_TAIL_MASK) {
 		prev = decode_tail(old);
+
 		/*
-		 * The above xchg_tail() is also a load of @lock which
-		 * generates, through decode_tail(), a pointer. The address
-		 * dependency matches the RELEASE of xchg_tail() such that
-		 * the subsequent access to @prev happens after.
+		 * We must ensure that the stores to @node are observed before
+		 * the write to prev->next. The address dependency from
+		 * xchg_tail is not sufficient to ensure this because the read
+		 * component of xchg_tail is unordered with respect to the
+		 * initialisation of @node.
 		 */
-
-		WRITE_ONCE(prev->next, node);
+		smp_store_release(&prev->next, node);

 		pv_wait_node(node, prev);
 		arch_mcs_spin_lock_contended(&node->locked);
--
cgit v1.2.3

From 11dc13224c975efcec96647a4768a6f1bb7a19a8 Mon Sep 17 00:00:00 2001
From: Will Deacon
Date: Tue, 13 Feb 2018 13:22:57 +0000
Subject: locking/qspinlock: Ensure node->count is updated before initialising node

When queuing on the qspinlock, the count field for the current CPU's
head node is incremented. This needn't be atomic because locking in
e.g. IRQ context is balanced and so an IRQ will return with node->count
as it found it.
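As a rough sketch (simplified from the actual slowpath; local variable
names may differ from the real code), the queuing path does:

  node = this_cpu_ptr(&mcs_nodes[0]);
  idx = node->count++;			/* plain, non-atomic increment */
  tail = encode_tail(smp_processor_id(), idx);

  node += idx;				/* node[idx] is ours to initialise */
  node->locked = 0;
  node->next = NULL;
  pv_init_node(node);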
However, the compiler could in theory reorder the initialisation of
node[idx] before the increment of the head node->count, causing an IRQ
to overwrite the initialised node and potentially corrupt the lock
state.

Avoid the potential for this harmful compiler reordering by placing a
barrier() between the increment of the head node->count and the
subsequent node initialisation.

Signed-off-by: Will Deacon
Acked-by: Peter Zijlstra (Intel)
Cc: Linus Torvalds
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/1518528177-19169-3-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar
---
 kernel/locking/qspinlock.c | 8 ++++++++
 1 file changed, 8 insertions(+)

(limited to 'kernel/locking')

diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index 348c8cec1042..d880296245c5 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -379,6 +379,14 @@ queue:
 	tail = encode_tail(smp_processor_id(), idx);

 	node += idx;
+
+	/*
+	 * Ensure that we increment the head node->count before initialising
+	 * the actual node. If the compiler is kind enough to reorder these
+	 * stores, then an IRQ could overwrite our assignments.
+	 */
+	barrier();
+
 	node->locked = 0;
 	node->next = NULL;
 	pv_init_node(node);
--
cgit v1.2.3

From 6b0ef92fee2a3189eba6d6b827b247cb4f6da7e9 Mon Sep 17 00:00:00 2001
From: Boqun Feng
Date: Fri, 9 Mar 2018 14:56:28 +0800
Subject: rtmutex: Make rt_mutex_futex_unlock() safe for irq-off callsites

When running rcutorture with TREE03 config, CONFIG_PROVE_LOCKING=y, and
kernel cmdline argument "rcutorture.gp_exp=1", lockdep reports a
HARDIRQ-safe->HARDIRQ-unsafe deadlock:

 ================================
 WARNING: inconsistent lock state
 4.16.0-rc4+ #1 Not tainted
 --------------------------------
 inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
 takes:
  __schedule+0xbe/0xaf0
 {IN-HARDIRQ-W} state was registered at:
  _raw_spin_lock+0x2a/0x40
  scheduler_tick+0x47/0xf0
 ...

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&rq->lock);
   <Interrupt>
     lock(&rq->lock);

  *** DEADLOCK ***

 1 lock held by rcu_torture_rea/724:
  rcu_torture_read_lock+0x0/0x70

 stack backtrace:
 CPU: 2 PID: 724 Comm: rcu_torture_rea Not tainted 4.16.0-rc4+ #1
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
 Call Trace:
  lock_acquire+0x90/0x200
  ? __schedule+0xbe/0xaf0
  _raw_spin_lock+0x2a/0x40
  ? __schedule+0xbe/0xaf0
  __schedule+0xbe/0xaf0
  preempt_schedule_irq+0x2f/0x60
  retint_kernel+0x1b/0x2d
 RIP: 0010:rcu_read_unlock_special+0x0/0x680
  ? rcu_torture_read_unlock+0x60/0x60
  __rcu_read_unlock+0x64/0x70
  rcu_torture_read_unlock+0x17/0x60
  rcu_torture_reader+0x275/0x450
  ? rcutorture_booster_init+0x110/0x110
  ? rcu_torture_stall+0x230/0x230
  ? kthread+0x10e/0x130
  kthread+0x10e/0x130
  ? kthread_create_worker_on_cpu+0x70/0x70
  ? call_usermodehelper_exec_async+0x11a/0x150
  ret_from_fork+0x3a/0x50

This happens with the following event sequence:

  preempt_schedule_irq();
    local_irq_enable();
    __schedule():
      local_irq_disable();				// irq off
      ...
      rcu_note_context_switch():
        rcu_note_preempt_context_switch():
          rcu_read_unlock_special():
            local_irq_save(flags);
            ...
            raw_spin_unlock_irqrestore(...,flags);	// irq remains off
            rt_mutex_futex_unlock():
              raw_spin_lock_irq();
              ...
              raw_spin_unlock_irq();			// accidentally set irq on

      rq_lock():
        raw_spin_lock();				// acquiring rq->lock with irq on

which means rq->lock becomes a HARDIRQ-unsafe lock, which can cause
deadlocks in scheduler code.
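The crux is how the two unlock primitives leave the interrupt state
(simplified illustration, not the actual kernel implementations):

  raw_spin_lock_irq(lock);		/* local_irq_disable(), then acquire   */
  raw_spin_unlock_irq(lock);		/* release, then local_irq_enable()    */
					/* -> always returns with irqs enabled */

  raw_spin_lock_irqsave(lock, flags);	/* save irq state, disable, acquire    */
  raw_spin_unlock_irqrestore(lock, flags); /* release, restore saved state     */
					/* -> preserves the caller's irq state */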
This problem was introduced by commit 02a7c234e540 ("rcu: Suppress
lockdep false-positive ->boost_mtx complaints"), which added a caller
of rt_mutex_futex_unlock() that runs with irqs disabled.

To fix this, replace the *lock_irq() in rt_mutex_futex_unlock() with
*lock_irq{save,restore}() to make it safe to call
rt_mutex_futex_unlock() with irqs off.

Fixes: 02a7c234e540 ("rcu: Suppress lockdep false-positive ->boost_mtx complaints")
Signed-off-by: Boqun Feng
Signed-off-by: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Lai Jiangshan
Cc: Steven Rostedt
Cc: Josh Triplett
Cc: Mathieu Desnoyers
Cc: "Paul E. McKenney"
Link: https://lkml.kernel.org/r/20180309065630.8283-1-boqun.feng@gmail.com
---
 kernel/locking/rtmutex.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

(limited to 'kernel/locking')

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 65cc0cb984e6..940633c63254 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1616,11 +1616,12 @@ bool __sched __rt_mutex_futex_unlock(struct rt_mutex *lock,
 void __sched rt_mutex_futex_unlock(struct rt_mutex *lock)
 {
 	DEFINE_WAKE_Q(wake_q);
+	unsigned long flags;
 	bool postunlock;

-	raw_spin_lock_irq(&lock->wait_lock);
+	raw_spin_lock_irqsave(&lock->wait_lock, flags);
 	postunlock = __rt_mutex_futex_unlock(lock, &wake_q);
-	raw_spin_unlock_irq(&lock->wait_lock);
+	raw_spin_unlock_irqrestore(&lock->wait_lock, flags);

 	if (postunlock)
 		rt_mutex_postunlock(&wake_q);
--
cgit v1.2.3

From 45dbac0e288350f9a4226a5b4b651ed434dd9f85 Mon Sep 17 00:00:00 2001
From: Matthew Wilcox
Date: Thu, 15 Mar 2018 04:58:12 -0700
Subject: locking/mutex: Improve documentation

On Wed, Mar 14, 2018 at 01:56:31PM -0700, Andrew Morton wrote:

> My memory is weak and our documentation is awful. What does
> mutex_lock_killable() actually do and how does it differ from
> mutex_lock_interruptible()?

Add kernel-doc for mutex_lock_killable() and mutex_lock_io(). Reword
the kernel-doc for mutex_lock_interruptible().

Signed-off-by: Matthew Wilcox
Signed-off-by: Peter Zijlstra (Intel)
Cc: Andrew Morton
Cc: Jonathan Corbet
Cc: Kirill Tkhai
Cc: Linus Torvalds
Cc: Mauro Carvalho Chehab
Cc: Paul E. McKenney
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: cl@linux.com
Cc: tj@kernel.org
Link: http://lkml.kernel.org/r/20180315115812.GA9949@bombadil.infradead.org
Signed-off-by: Ingo Molnar
---
 kernel/locking/mutex.c | 37 ++++++++++++++++++++++++++++++-------
 1 file changed, 30 insertions(+), 7 deletions(-)

(limited to 'kernel/locking')

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 858a07590e39..2048359f33d2 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -1082,15 +1082,16 @@ static noinline int __sched
 __mutex_lock_interruptible_slowpath(struct mutex *lock);

 /**
- * mutex_lock_interruptible - acquire the mutex, interruptible
- * @lock: the mutex to be acquired
+ * mutex_lock_interruptible() - Acquire the mutex, interruptible by signals.
+ * @lock: The mutex to be acquired.
  *
- * Lock the mutex like mutex_lock(), and return 0 if the mutex has
- * been acquired or sleep until the mutex becomes available. If a
- * signal arrives while waiting for the lock then this function
- * returns -EINTR.
+ * Lock the mutex like mutex_lock(). If a signal is delivered while the
+ * process is sleeping, this function will return without acquiring the
+ * mutex.
  *
- * This function is similar to (but not equivalent to) down_interruptible().
+ * Context: Process context.
+ * Return: 0 if the lock was successfully acquired or %-EINTR if a
+ * signal arrived.
  */
 int __sched mutex_lock_interruptible(struct mutex *lock)
 {
@@ -1104,6 +1105,18 @@ int __sched mutex_lock_interruptible(struct mutex *lock)

 EXPORT_SYMBOL(mutex_lock_interruptible);

+/**
+ * mutex_lock_killable() - Acquire the mutex, interruptible by fatal signals.
+ * @lock: The mutex to be acquired.
+ *
+ * Lock the mutex like mutex_lock(). If a signal which will be fatal to
+ * the current process is delivered while the process is sleeping, this
+ * function will return without acquiring the mutex.
+ *
+ * Context: Process context.
+ * Return: 0 if the lock was successfully acquired or %-EINTR if a
+ * fatal signal arrived.
+ */
 int __sched mutex_lock_killable(struct mutex *lock)
 {
 	might_sleep();
@@ -1115,6 +1128,16 @@ int __sched mutex_lock_killable(struct mutex *lock)
 }
 EXPORT_SYMBOL(mutex_lock_killable);

+/**
+ * mutex_lock_io() - Acquire the mutex and mark the process as waiting for I/O
+ * @lock: The mutex to be acquired.
+ *
+ * Lock the mutex like mutex_lock(). While the task is waiting for this
+ * mutex, it will be accounted as being in the IO wait state by the
+ * scheduler.
+ *
+ * Context: Process context.
+ */
 void __sched mutex_lock_io(struct mutex *lock)
 {
 	int token;
--
cgit v1.2.3
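As a usage illustration of the return convention documented above, a
minimal hypothetical caller (the my_dev structure and my_dev_op()
function are made up for this sketch):

  #include <linux/mutex.h>
  #include <linux/errno.h>

  struct my_dev {
	struct mutex lock;
	/* ... device state ... */
  };

  static int my_dev_op(struct my_dev *dev)
  {
	/*
	 * mutex_lock_interruptible() returns 0 with the mutex held, or
	 * -EINTR if a signal arrived while sleeping; drivers commonly
	 * translate that into -ERESTARTSYS for userspace.
	 * mutex_lock_killable() follows the same pattern, but only a
	 * fatal signal interrupts the wait.
	 */
	if (mutex_lock_interruptible(&dev->lock))
		return -ERESTARTSYS;

	/* ... critical section ... */

	mutex_unlock(&dev->lock);
	return 0;
  }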