From e76946110137703c16423baf6ee177b751a34b7e Mon Sep 17 00:00:00 2001 From: Lai Jiangshan Date: Thu, 23 Jan 2025 16:25:35 +0800 Subject: workqueue: Put the pwq after detaching the rescuer from the pool MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The commit 68f83057b913("workqueue: Reap workers via kthread_stop() and remove detach_completion") adds code to reap the normal workers but mistakenly does not handle the rescuer and also removes the code waiting for the rescuer in put_unbound_pool(), which caused a use-after-free bug reported by Cheung Wall. To avoid the use-after-free bug, the pool’s reference must be held until the detachment is complete. Therefore, move the code that puts the pwq after detaching the rescuer from the pool. Reported-by: cheung wall Cc: cheung wall Link: https://lore.kernel.org/lkml/CAKHoSAvP3iQW+GwmKzWjEAOoPvzeWeoMO0Gz7Pp3_4kxt-RMoA@mail.gmail.com/ Fixes: 68f83057b913("workqueue: Reap workers via kthread_stop() and remove detach_completion") Signed-off-by: Lai Jiangshan Signed-off-by: Tejun Heo --- kernel/workqueue.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) (limited to 'kernel/workqueue.c') diff --git a/kernel/workqueue.c b/kernel/workqueue.c index 33a23c7b2274..ccad33001c58 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -3516,12 +3516,6 @@ repeat: } } - /* - * Put the reference grabbed by send_mayday(). @pool won't - * go away while we're still attached to it. - */ - put_pwq(pwq); - /* * Leave this pool. Notify regular workers; otherwise, we end up * with 0 concurrency and stalling the execution. @@ -3532,6 +3526,12 @@ repeat: worker_detach_from_pool(rescuer); + /* + * Put the reference grabbed by send_mayday(). @pool might + * go away any time after it. + */ + put_pwq_unlocked(pwq); + raw_spin_lock_irq(&wq_mayday_lock); } -- cgit v1.2.3 From 8221fd1a73044adef712a5c9346a23c2447f629c Mon Sep 17 00:00:00 2001 From: Will Deacon Date: Fri, 14 Feb 2025 16:43:49 +0000 Subject: workqueue: Log additional details when rejecting work Syzbot regularly runs into the following warning on arm64: | WARNING: CPU: 1 PID: 6023 at kernel/workqueue.c:2257 current_wq_worker kernel/workqueue_internal.h:69 [inline] | WARNING: CPU: 1 PID: 6023 at kernel/workqueue.c:2257 is_chained_work kernel/workqueue.c:2199 [inline] | WARNING: CPU: 1 PID: 6023 at kernel/workqueue.c:2257 __queue_work+0xe50/0x1308 kernel/workqueue.c:2256 | Modules linked in: | CPU: 1 UID: 0 PID: 6023 Comm: klogd Not tainted 6.13.0-rc2-syzkaller-g2e7aff49b5da #0 | Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024 | pstate: 404000c5 (nZcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--) | pc : __queue_work+0xe50/0x1308 kernel/workqueue_internal.h:69 | lr : current_wq_worker kernel/workqueue_internal.h:69 [inline] | lr : is_chained_work kernel/workqueue.c:2199 [inline] | lr : __queue_work+0xe50/0x1308 kernel/workqueue.c:2256 [...] | __queue_work+0xe50/0x1308 kernel/workqueue.c:2256 (L) | delayed_work_timer_fn+0x74/0x90 kernel/workqueue.c:2485 | call_timer_fn+0x1b4/0x8b8 kernel/time/timer.c:1793 | expire_timers kernel/time/timer.c:1839 [inline] | __run_timers kernel/time/timer.c:2418 [inline] | __run_timer_base+0x59c/0x7b4 kernel/time/timer.c:2430 | run_timer_base kernel/time/timer.c:2439 [inline] | run_timer_softirq+0xcc/0x194 kernel/time/timer.c:2449 The warning is probably because we are trying to queue work into a destroyed workqueue, but the softirq context makes it hard to pinpoint the problematic caller. Extend the warning diagnostics to print both the function we are trying to queue as well as the name of the workqueue. Cc: Tejun Heo Cc: Catalin Marinas Cc: Marc Zyngier Cc: Lai Jiangshan Link: https://syzkaller.appspot.com/bug?extid=e13e654d315d4da1277c Signed-off-by: Will Deacon Signed-off-by: Tejun Heo --- kernel/workqueue.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'kernel/workqueue.c') diff --git a/kernel/workqueue.c b/kernel/workqueue.c index ccad33001c58..902df3253598 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -2254,8 +2254,10 @@ static void __queue_work(int cpu, struct workqueue_struct *wq, * queues a new work item to a wq after destroy_workqueue(wq). */ if (unlikely(wq->flags & (__WQ_DESTROYING | __WQ_DRAINING) && - WARN_ON_ONCE(!is_chained_work(wq)))) + WARN_ONCE(!is_chained_work(wq), "workqueue: cannot queue %ps on wq %s\n", + work->func, wq->name))) { return; + } rcu_read_lock(); retry: /* pwq which will be used unless @work is executing elsewhere */ -- cgit v1.2.3