<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/workqueue.c, branch v6.4</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.4</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.4'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2023-06-23T19:08:14Z</updated>
<entry>
<title>workqueue: clean up WORK_* constant types, clarify masking</title>
<updated>2023-06-23T19:08:14Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-06-23T19:08:14Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=afa4bb778e48d79e4a642ed41e3b4e0de7489a6c'/>
<id>urn:sha1:afa4bb778e48d79e4a642ed41e3b4e0de7489a6c</id>
<content type='text'>
Dave Airlie reports that gcc-13.1.1 has started complaining about some
of the workqueue code in 32-bit arm builds:

  kernel/workqueue.c: In function ‘get_work_pwq’:
  kernel/workqueue.c:713:24: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
    713 |                 return (void *)(data &amp; WORK_STRUCT_WQ_DATA_MASK);
        |                        ^
  [ ... a couple of other cases ... ]

and while it's not immediately clear exactly why gcc started complaining
about it now, I suspect it's some C23-induced enum type handlign fixup in
gcc-13 is the cause.

Whatever the reason for starting to complain, the code and data types
are indeed disgusting enough that the complaint is warranted.

The wq code ends up creating various "helper constants" (like that
WORK_STRUCT_WQ_DATA_MASK) using an enum type, which is all kinds of
confused.  The mask needs to be 'unsigned long', not some unspecified
enum type.

To make matters worse, the actual "mask and cast to a pointer" is
repeated a couple of times, and the cast isn't even always done to the
right pointer, but - as the error case above - to a 'void *' with then
the compiler finishing the job.

That's now how we roll in the kernel.

So create the masks using the proper types rather than some ambiguous
enumeration, and use a nice helper that actually does the type
conversion in one well-defined place.

Incidentally, this magically makes clang generate better code.  That,
admittedly, is really just a sign of clang having been seriously
confused before, and cleaning up the typing unconfuses the compiler too.

Reported-by: Dave Airlie &lt;airlied@gmail.com&gt;
Link: https://lore.kernel.org/lkml/CAPM=9twNnV4zMCvrPkw3H-ajZOH-01JVh_kDrxdPYQErz8ZTdA@mail.gmail.com/
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Nick Desaulniers &lt;ndesaulniers@google.com&gt;
Cc: Nathan Chancellor &lt;nathan@kernel.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'wq-for-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq</title>
<updated>2023-04-29T16:48:52Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-04-29T16:48:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=cd546fa325161fbe374480a5081b6ebb7d1bec95'/>
<id>urn:sha1:cd546fa325161fbe374480a5081b6ebb7d1bec95</id>
<content type='text'>
Pull workqueue updates from Tejun Heo:
 "Mostly changes from Petr to improve warning and error reporting.

  Workqueue now reports more of the relevant failures with better
  context which should help debugging"

* tag 'wq-for-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: Introduce show_freezable_workqueues
  workqueue: Print backtraces from CPUs with hung CPU bound workqueues
  workqueue: Warn when a rescuer could not be created
  workqueue: Interrupted create_worker() is not a repeated event
  workqueue: Warn when a new worker could not be created
  workqueue: Fix hung time report of worker pools
  workqueue: Simplify a pr_warn() call in wq_select_unbound_cpu()
  MAINTAINERS: Add workqueue_internal.h to the WORKQUEUE entry
</content>
</entry>
<entry>
<title>workqueue: Introduce show_freezable_workqueues</title>
<updated>2023-03-24T01:55:38Z</updated>
<author>
<name>Jungseung Lee</name>
<email>js07.lee@samsung.com</email>
</author>
<published>2023-03-20T03:29:05Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=704bc669e1dda3eb8f6d5cb462b21e85558a3912'/>
<id>urn:sha1:704bc669e1dda3eb8f6d5cb462b21e85558a3912</id>
<content type='text'>
Currently show_all_workqueue is called if freeze fails at the time of
freeze the workqueues, which shows the status of all workqueues and of
all worker pools. In this cases we may only need to dump state of only
workqueues that are freezable and busy.

This patch defines show_freezable_workqueues, which uses
show_one_workqueue, a granular function that shows the state of individual
workqueues, so that dump only the state of freezable workqueues
at that time.

tj: Minor message adjustment.

Signed-off-by: Jungseung Lee &lt;js07.lee@samsung.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>workqueue: Print backtraces from CPUs with hung CPU bound workqueues</title>
<updated>2023-03-17T22:03:47Z</updated>
<author>
<name>Petr Mladek</name>
<email>pmladek@suse.com</email>
</author>
<published>2023-03-07T12:53:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=cd2440d66fec7d1bdb4f605b64c27c63c9141989'/>
<id>urn:sha1:cd2440d66fec7d1bdb4f605b64c27c63c9141989</id>
<content type='text'>
The workqueue watchdog reports a lockup when there was not any progress
in the worker pool for a long time. The progress means that a pending
work item starts being proceed.

Worker pools for unbound workqueues always wake up an idle worker and
try to process the work immediately. The last idle worker has to create
new worker first. The stall might happen only when a new worker could
not be created in which case an error should get printed. Another problem
might be too high load. In this case, workers are victims of a global
system problem.

Worker pools for CPU bound workqueues are designed for lightweight
work items that do not need much CPU time. They are proceed one by
one on a single worker. New worker is used only when a work is sleeping.
It creates one additional scenario. The stall might happen when
the CPU-bound workqueue is used for CPU-intensive work.

More precisely, the stall is detected when a CPU-bound worker is in
the TASK_RUNNING state for too long. In this case, it might be useful
to see the backtrace from the problematic worker.

The information how long a worker is in the running state is not available.
But the CPU-bound worker pools do not have many workers in the running
state by definition. And only few pools are typically blocked.

It should be acceptable to print backtraces from all workers in
TASK_RUNNING state in the stalled worker pools. The number of false
positives should be very low.

Signed-off-by: Petr Mladek &lt;pmladek@suse.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>workqueue: Warn when a rescuer could not be created</title>
<updated>2023-03-17T22:03:46Z</updated>
<author>
<name>Petr Mladek</name>
<email>pmladek@suse.com</email>
</author>
<published>2023-03-07T12:53:34Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4c0736a76a186e5df2cd2afda3e7a04d2a427d1b'/>
<id>urn:sha1:4c0736a76a186e5df2cd2afda3e7a04d2a427d1b</id>
<content type='text'>
Rescuers are created when a workqueue with WQ_MEM_RECLAIM is allocated.
It typically happens during the system boot.

systemd switches the root filesystem from initrd to the booted system
during boot. It kills processes that block the switch for too long.
One of the process might be modprobe that tries to create a workqueue.

These problems are hard to reproduce. Also alloc_workqueue() does not
pass the error code. Make the debugging easier by printing an error,
similar to create_worker().

Signed-off-by: Petr Mladek &lt;pmladek@suse.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>workqueue: Interrupted create_worker() is not a repeated event</title>
<updated>2023-03-17T22:03:46Z</updated>
<author>
<name>Petr Mladek</name>
<email>pmladek@suse.com</email>
</author>
<published>2023-03-07T12:53:33Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=60f540389a5d2df25ddc7ad511b4fa2880dea521'/>
<id>urn:sha1:60f540389a5d2df25ddc7ad511b4fa2880dea521</id>
<content type='text'>
kthread_create_on_node() might get interrupted(). It is rare but realistic.
For example, when an unbound workqueue is allocated in module_init()
callback. It is done in the context of the "modprobe" process. And,
for example, systemd might kill pending processes when switching root
from initrd to the booted system.

The interrupt is a one-off event and the race might be hard to reproduce.
It is always worth printing.

Signed-off-by: Petr Mladek &lt;pmladek@suse.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>workqueue: Warn when a new worker could not be created</title>
<updated>2023-03-17T22:03:46Z</updated>
<author>
<name>Petr Mladek</name>
<email>pmladek@suse.com</email>
</author>
<published>2023-03-07T12:53:32Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3f0ea0b864562c6bd1cee892026067eaea7be242'/>
<id>urn:sha1:3f0ea0b864562c6bd1cee892026067eaea7be242</id>
<content type='text'>
The workqueue watchdog reports a lockup when there was not any progress
in the worker pool for a long time. The progress means that a pending
work item starts being proceed.

The progress is guaranteed by using idle workers or creating new workers
for pending work items.

There are several reasons why a new worker could not be created:

   + there is not enough memory

   + there is no free pool ID (IDR API)

   + the system reached PID limit

   + the process creating the new worker was interrupted

   + the last idle worker (manager) has not been scheduled for a long
     time. It was not able to even start creating the kthread.

None of these failures is reported at the moment. The only clue is that
show_one_worker_pool() prints that there is a manager. It is the last
idle worker that is responsible for creating a new one. But it is not
clear if create_worker() is failing and why.

Make the debugging easier by printing errors in create_worker().

The error code is important, especially from kthread_create_on_node().
It helps to distinguish the various reasons. For example, reaching
memory limit (-ENOMEM), other system limits (-EAGAIN), or process
interrupted (-EINTR).

Use pr_once() to avoid repeating the same error every CREATE_COOLDOWN
for each stuck worker pool.

Ratelimited printk() might be better. It would help to know if the problem
remains. It would be more clear if the create_worker() errors and workqueue
stalls are related. Also old messages might get lost when the internal log
buffer is full. The problem is that printk() might touch the watchdog.
For example, see touch_nmi_watchdog() in serial8250_console_write().
It would require synchronization of the begin and length of the ratelimit
interval with the workqueue watchdog. Otherwise, the error messages
might break the watchdog. This does not look worth the complexity.

Signed-off-by: Petr Mladek &lt;pmladek@suse.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>workqueue: Fix hung time report of worker pools</title>
<updated>2023-03-17T22:03:46Z</updated>
<author>
<name>Petr Mladek</name>
<email>pmladek@suse.com</email>
</author>
<published>2023-03-07T12:53:31Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=335a42ebb0ca8ee9997a1731aaaae6dcd704c113'/>
<id>urn:sha1:335a42ebb0ca8ee9997a1731aaaae6dcd704c113</id>
<content type='text'>
The workqueue watchdog prints a warning when there is no progress in
a worker pool. Where the progress means that the pool started processing
a pending work item.

Note that it is perfectly fine to process work items much longer.
The progress should be guaranteed by waking up or creating idle
workers.

show_one_worker_pool() prints state of non-idle worker pool. It shows
a delay since the last pool-&gt;watchdog_ts.

The timestamp is updated when a first pending work is queued in
__queue_work(). Also it is updated when a work is dequeued for
processing in worker_thread() and rescuer_thread().

The delay is misleading when there is no pending work item. In this
case it shows how long the last work item is being proceed. Show
zero instead. There is no stall if there is no pending work.

Fixes: 82607adcf9cdf40fb7b ("workqueue: implement lockup detector")
Signed-off-by: Petr Mladek &lt;pmladek@suse.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>workqueue: Simplify a pr_warn() call in wq_select_unbound_cpu()</title>
<updated>2023-03-17T21:49:03Z</updated>
<author>
<name>Ammar Faizi</name>
<email>ammarfaizi2@gnuweeb.org</email>
</author>
<published>2023-02-26T16:53:20Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a8ec5880bd82b834717770cba4596381ffd50545'/>
<id>urn:sha1:a8ec5880bd82b834717770cba4596381ffd50545</id>
<content type='text'>
Use pr_warn_once() to achieve the same thing. It's simpler.

Signed-off-by: Ammar Faizi &lt;ammarfaizi2@gnuweeb.org&gt;
Reviewed-by: Lai Jiangshan &lt;jiangshanlai@gmail.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>workqueue: move to use bus_get_dev_root()</title>
<updated>2023-03-17T14:29:23Z</updated>
<author>
<name>Greg Kroah-Hartman</name>
<email>gregkh@linuxfoundation.org</email>
</author>
<published>2023-03-13T18:28:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=686f669780276da534e93ba769e02bdcf1f89f8d'/>
<id>urn:sha1:686f669780276da534e93ba769e02bdcf1f89f8d</id>
<content type='text'>
Direct access to the struct bus_type dev_root pointer is going away soon
so replace that with a call to bus_get_dev_root() instead, which is what
it is there for.

Cc: Lai Jiangshan &lt;jiangshanlai@gmail.com&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Link: https://lore.kernel.org/r/20230313182918.1312597-8-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
</feed>
