linux/kernel/smp.c, branch v6.12

smp: print only local CPU info when sched_clock goes backward

2024-08-14T18:36:48Z

About 40% of all csd_lock warnings observed in our fleet appear to be due to sched_clock() going backward in time (usually only a little bit), resulting in ts0 being larger than ts2. When the local CPU is at fault, we should print out a message reflecting that, rather than trying to get the remote CPU's stack trace. Signed-off-by: Rik van Riel Tested-by: "Paul E. McKenney" Signed-off-by: Neeraj Upadhyay

locking/csd-lock: Use backoff for repeated reports of same incident

2024-08-14T18:36:48Z

Currently, the CSD-lock diagnostics in CONFIG_CSD_LOCK_WAIT_DEBUG=y kernels are emitted at five-second intervals. Although this has proven to be a good time interval for the first diagnostic, if the target CPU keeps interrupts disabled for way longer than five seconds, the ratio of useful new information to pointless repetition increases considerably. Therefore, back off the time period for repeated reports of the same incident, increasing linearly with the number of reports and logarithmicly with the number of online CPUs. [ paulmck: Apply Dan Carpenter feedback. ] Signed-off-by: Paul E. McKenney Cc: Imran Khan Cc: Ingo Molnar Cc: Leonardo Bras Cc: "Peter Zijlstra (Intel)" Cc: Rik van Riel Reviewed-by: Rik van Riel Signed-off-by: Neeraj Upadhyay

locking/csd_lock: Provide an indication of ongoing CSD-lock stall

2024-08-14T18:35:39Z

If a CSD-lock stall goes on long enough, it will cause an RCU CPU stall warning. This additional warning provides much additional console-log traffic and little additional information. Therefore, provide a new csd_lock_is_stuck() function that returns true if there is an ongoing CSD-lock stall. This function will be used by the RCU CPU stall warnings to provide a one-line indication of the stall when this function returns true. [ neeraj.upadhyay: Apply Rik van Riel feedback. ] [ neeraj.upadhyay: Apply kernel test robot feedback. ] Signed-off-by: Paul E. McKenney Cc: Imran Khan Cc: Ingo Molnar Cc: Leonardo Bras Cc: "Peter Zijlstra (Intel)" Cc: Rik van Riel Signed-off-by: Neeraj Upadhyay

locking/csd_lock: Print large numbers as negatives

2024-07-29T02:14:38Z

The CSD-lock-hold diagnostics from CONFIG_CSD_LOCK_WAIT_DEBUG are printed in nanoseconds as unsigned long longs, which is a bit obtuse for human readers when timing bugs result in negative CSD-lock hold times. Yes, there are some people to whom it is immediately obvious that 18446744073709551615 is really -1, but for the rest of us... Therefore, print these numbers as signed long longs, making the negative hold times immediately apparent. Reported-by: Rik van Riel Signed-off-by: Paul E. McKenney Cc: Imran Khan Cc: Ingo Molnar Cc: Leonardo Bras Cc: "Peter Zijlstra (Intel)" Cc: Rik van Riel Reviewed-by: Rik van Riel Signed-off-by: Neeraj Upadhyay

smp: Add missing destroy_work_on_stack() call in smp_call_on_cpu()

2024-07-10T20:40:39Z

For CONFIG_DEBUG_OBJECTS_WORK=y kernels sscs.work defined by INIT_WORK_ONSTACK() is initialized by debug_object_init_on_stack() for the debug check in __init_work() to work correctly. But this lacks the counterpart to remove the tracked object from debug objects again, which will cause a debug object warning once the stack is freed. Add the missing destroy_work_on_stack() invocation to cure that. [ tglx: Massaged changelog ] Signed-off-by: Zqiang Signed-off-by: Thomas Gleixner Tested-by: Paul E. McKenney Link: https://lore.kernel.org/r/20240704065213.13559-1-qiang.zhang1211@gmail.com

smp: Use str_plural() to fix Coccinelle warnings

2024-06-17T13:17:44Z

Fixes the following two Coccinelle/coccicheck warnings reported by string_choices.cocci: opportunity for str_plural(num_cpus) opportunity for str_plural(num_nodes) Signed-off-by: Thorsten Blum Signed-off-by: Thomas Gleixner Acked-by: Paul E. McKenney Link: https://lore.kernel.org/r/20240508154225.309703-2-thorsten.blum@toblux.com

Merge tag 'csd-lock.2023.10.23a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu

2023-10-31T03:56:53Z

Pull CSD lock update from Paul McKenney: "This adds a kernel boot parameter that causes the kernel to panic if one of the call_smp_function() APIs is stalled for more than the specified duration. This is useful in deployments in which a clean panic is preferable to an indefinite stall" * tag 'csd-lock.2023.10.23a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: smp,csd: Throw an error if a CSD lock is stuck for too long

Merge branch 'linus' into smp/core

2023-10-17T19:40:46Z

Pull in upstream to get the fixes so depending changes can be applied.

smp,csd: Throw an error if a CSD lock is stuck for too long

2023-10-16T23:06:37Z

The CSD lock seems to get stuck in 2 "modes". When it gets stuck temporarily, it usually gets released in a few seconds, and sometimes up to one or two minutes. If the CSD lock stays stuck for more than several minutes, it never seems to get unstuck, and gradually more and more things in the system end up also getting stuck. In the latter case, we should just give up, so the system can dump out a little more information about what went wrong, and, with panic_on_oops and a kdump kernel loaded, dump a whole bunch more information about what might have gone wrong. In addition, there is an smp.panic_on_ipistall kernel boot parameter that by default retains the old behavior, but when set enables the panic after the CSD lock has been stuck for more than the specified number of milliseconds, as in 300,000 for five minutes. [ paulmck: Apply Imran Khan feedback. ] [ paulmck: Apply Leonardo Bras feedback. ] Link: https://lore.kernel.org/lkml/bc7cc8b0-f587-4451-8bcd-0daae627bcc7@paulmck-laptop/ Signed-off-by: Rik van Riel Signed-off-by: Paul E. McKenney Reviewed-by: Imran Khan Reviewed-by: Leonardo Bras Cc: Peter Zijlstra Cc: Valentin Schneider Cc: Juergen Gross Cc: Jonathan Corbet Cc: Randy Dunlap

smp: Change function signatures to use call_single_data_t

2023-09-13T12:59:24Z

call_single_data_t is a size-aligned typedef of struct __call_single_data. This alignment is desirable in order to have smp_call_function*() avoid bouncing an extra cacheline in case of an unaligned csd, given this would hurt performance. Since the removal of struct request->csd in commit 660e802c76c8 ("blk-mq: use percpu csd to remote complete instead of per-rq csd") there are no current users of smp_call_function*() with unaligned csd. Change every 'struct __call_single_data' function parameter to 'call_single_data_t', so we have warnings if any new code tries to introduce an smp_call_function*() call with unaligned csd. Signed-off-by: Leonardo Bras Reviewed-by: Guo Ren Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Ingo Molnar Link: https://lore.kernel.org/r/20230831063129.335425-1-leobras@redhat.com