linux/include/asm-generic/bitops, branch v5.14

lib: add fast path for find_first_*_bit() and find_last_bit()

2021-05-07T02:24:12Z

Similarly to bitmap functions, users would benefit if we'll handle a case of small-size bitmaps that fit into a single word. While here, move the find_last_bit() declaration to bitops/find.h where other find_*_bit() functions sit. Link: https://lkml.kernel.org/r/20210401003153.97325-11-yury.norov@gmail.com Signed-off-by: Yury Norov Acked-by: Rasmus Villemoes Acked-by: Andy Shevchenko Cc: Alexey Klimov Cc: Arnd Bergmann Cc: David Sterba Cc: Dennis Zhou Cc: Geert Uytterhoeven Cc: Jianpeng Ma Cc: Joe Perches Cc: John Paul Adrian Glaubitz Cc: Josh Poimboeuf Cc: Rich Felker Cc: Stefano Brivio Cc: Wei Yang Cc: Wolfram Sang Cc: Yoshinori Sato Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

lib: add fast path for find_next_*_bit()

2021-05-07T02:24:12Z

Similarly to bitmap functions, find_next_*_bit() users will benefit if we'll handle a case of bitmaps that fit into a single word inline. In the very best case, the compiler may replace a function call with a few instructions. This is the quite typical find_next_bit() user: unsigned int cpumask_next(int n, const struct cpumask *srcp) { /* -1 is a legal arg here. */ if (n != -1) cpumask_check(n); return find_next_bit(cpumask_bits(srcp), nr_cpumask_bits, n + 1); } EXPORT_SYMBOL(cpumask_next); Currently, on ARM64 the generated code looks like this: 0000000000000000 : 0: a9bf7bfd stp x29, x30, [sp, #-16]! 4: 11000402 add w2, w0, #0x1 8: aa0103e0 mov x0, x1 c: d2800401 mov x1, #0x40 // #64 10: 910003fd mov x29, sp 14: 93407c42 sxtw x2, w2 18: 94000000 bl 0 1c: a8c17bfd ldp x29, x30, [sp], #16 20: d65f03c0 ret 24: d503201f nop After applying this patch: 0000000000000140 : 140: 11000400 add w0, w0, #0x1 144: 93407c00 sxtw x0, w0 148: f100fc1f cmp x0, #0x3f 14c: 54000168 b.hi 178 // b.pmore 150: f9400023 ldr x3, [x1] 154: 92800001 mov x1, #0xffffffffffffffff // #-1 158: 9ac02020 lsl x0, x1, x0 15c: 52800802 mov w2, #0x40 // #64 160: 8a030001 and x1, x0, x3 164: dac00020 rbit x0, x1 168: f100003f cmp x1, #0x0 16c: dac01000 clz x0, x0 170: 1a800040 csel w0, w2, w0, eq // eq = none 174: d65f03c0 ret 178: 52800800 mov w0, #0x40 // #64 17c: d65f03c0 ret find_next_bit() call is replaced with 6 instructions. find_next_bit() itself is 41 instructions plus function call overhead. Despite inlining, the scripts/bloat-o-meter report smaller .text size after applying the series: add/remove: 11/9 grow/shrink: 233/176 up/down: 5780/-6768 (-988) Link: https://lkml.kernel.org/r/20210401003153.97325-10-yury.norov@gmail.com Signed-off-by: Yury Norov Acked-by: Rasmus Villemoes Acked-by: Andy Shevchenko Cc: Alexey Klimov Cc: Arnd Bergmann Cc: David Sterba Cc: Dennis Zhou Cc: Geert Uytterhoeven Cc: Jianpeng Ma Cc: Joe Perches Cc: John Paul Adrian Glaubitz Cc: Josh Poimboeuf Cc: Rich Felker Cc: Stefano Brivio Cc: Wei Yang Cc: Wolfram Sang Cc: Yoshinori Sato Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

lib: inline _find_next_bit() wrappers

2021-05-07T02:24:12Z

lib/find_bit.c declares five single-line wrappers for _find_next_bit(). We may turn those wrappers to inline functions. It eliminates unneeded function calls and opens room for compile-time optimizations. Link: https://lkml.kernel.org/r/20210401003153.97325-8-yury.norov@gmail.com Signed-off-by: Yury Norov Acked-by: Rasmus Villemoes Acked-by: Andy Shevchenko Cc: Alexey Klimov Cc: Arnd Bergmann Cc: David Sterba Cc: Dennis Zhou Cc: Geert Uytterhoeven Cc: Jianpeng Ma Cc: Joe Perches Cc: John Paul Adrian Glaubitz Cc: Josh Poimboeuf Cc: Rich Felker Cc: Stefano Brivio Cc: Wei Yang Cc: Wolfram Sang Cc: Yoshinori Sato Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

arm64: make atomic helpers __always_inline

2021-01-13T15:09:06Z

With UBSAN enabled and building with clang, there are occasionally warnings like WARNING: modpost: vmlinux.o(.text+0xc533ec): Section mismatch in reference from the function arch_atomic64_or() to the variable .init.data:numa_nodes_parsed The function arch_atomic64_or() references the variable __initdata numa_nodes_parsed. This is often because arch_atomic64_or lacks a __initdata annotation or the annotation of numa_nodes_parsed is wrong. for functions that end up not being inlined as intended but operating on __initdata variables. Mark these as __always_inline, along with the corresponding asm-generic wrappers. Signed-off-by: Arnd Bergmann Acked-by: Will Deacon Link: https://lore.kernel.org/r/20210108092024.4034860-1-arnd@kernel.org Signed-off-by: Catalin Marinas

asm-generic: fix ffs -Wshadow warning

2020-10-26T16:00:29Z

gcc -Wshadow warns about the ffs() definition that has the same name as the global ffs() built-in: include/asm-generic/bitops/builtin-ffs.h:13:28: warning: declaration of 'ffs' shadows a built-in function [-Wshadow] This is annoying because 'make W=2' warns every time this header gets included. Change it to use a #define instead, making callers directly reference the builtin. Signed-off-by: Arnd Bergmann

bitops, kcsan: Partially revert instrumentation for non-atomic bitops

2020-08-24T22:10:24Z

Previous to the change to distinguish read-write accesses, when CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=y is set, KCSAN would consider the non-atomic bitops as atomic. We want to partially revert to this behaviour, but with one important distinction: report racing modifications, since lost bits due to non-atomicity are certainly possible. Given the operations here only modify a single bit, assuming non-atomicity of the writer is sufficient may be reasonable for certain usage (and follows the permissible nature of the "assume plain writes atomic" rule). In other words: 1. We want non-atomic read-modify-write races to be reported; this is accomplished by kcsan_check_read(), where any concurrent write (atomic or not) will generate a report. 2. We do not want to report races with marked readers, but -do- want to report races with unmarked readers; this is accomplished by the instrument_write() ("assume atomic write" with Kconfig option set). With the above rules, when KCSAN_ASSUME_PLAIN_WRITES_ATOMIC is selected, it is hoped that KCSAN's reporting behaviour is better aligned with current expected permissible usage for non-atomic bitops. Note that, a side-effect of not telling KCSAN that the accesses are read-writes, is that this information is not displayed in the access summary in the report. It is, however, visible in inline-expanded stack traces. For now, it does not make sense to introduce yet another special case to KCSAN's runtime, only to cater to the case here. Cc: Dmitry Vyukov Cc: Paul E. McKenney Cc: Will Deacon Cc: Arnd Bergmann Cc: Daniel Axtens Cc: Michael Ellerman Cc: Signed-off-by: Marco Elver Signed-off-by: Paul E. McKenney

asm-generic/bitops: Use instrument_read_write() where appropriate

2020-08-24T22:09:59Z

Use the new instrument_read_write() where appropriate. Acked-by: Peter Zijlstra (Intel) Signed-off-by: Marco Elver Signed-off-by: Paul E. McKenney

asm-generic, kcsan: Add KCSAN instrumentation for bitops

2020-03-21T08:41:46Z

Add explicit KCSAN checks for bitops. Acked-by: Arnd Bergmann Signed-off-by: Marco Elver Signed-off-by: Paul E. McKenney Signed-off-by: Ingo Molnar

Merge tag 'powerpc-5.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

2019-12-06T21:36:31Z

Pull more powerpc updates from Michael Ellerman: "A few commits splitting the KASAN instrumented bitops header in three, to match the split of the asm-generic bitops headers. This is needed on powerpc because we use the generic bitops for the non-atomic case only, whereas the existing KASAN instrumented bitops assume all the underlying operations are provided by the arch as arch_foo() versions. Thanks to: Daniel Axtens & Christophe Leroy" * tag 'powerpc-5.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: docs/core-api: Remove possibly confusing sub-headings from Bit Operations powerpc: support KASAN instrumentation of bitops kasan: support instrumented bitops combined with generic bitops

bitops: introduce the for_each_set_clump8 macro

2019-12-05T03:44:12Z

Pach series "Introduce the for_each_set_clump8 macro", v18. While adding GPIO get_multiple/set_multiple callback support for various drivers, I noticed a pattern of looping manifesting that would be useful standardized as a macro. This patchset introduces the for_each_set_clump8 macro and utilizes it in several GPIO drivers. The for_each_set_clump macro8 facilitates a for-loop syntax that iterates over a memory region entire groups of set bits at a time. For example, suppose you would like to iterate over a 32-bit integer 8 bits at a time, skipping over 8-bit groups with no set bit, where XXXXXXXX represents the current 8-bit group: Example: 10111110 00000000 11111111 00110011 First loop: 10111110 00000000 11111111 XXXXXXXX Second loop: 10111110 00000000 XXXXXXXX 00110011 Third loop: XXXXXXXX 00000000 11111111 00110011 Each iteration of the loop returns the next 8-bit group that has at least one set bit. The for_each_set_clump8 macro has four parameters: * start: set to the bit offset of the current clump * clump: set to the current clump value * bits: bitmap to search within * size: bitmap size in number of bits In this version of the patchset, the for_each_set_clump macro has been reimplemented and simplified based on the suggestions provided by Rasmus Villemoes and Andy Shevchenko in the version 4 submission. In particular, the function of the for_each_set_clump macro has been restricted to handle only 8-bit clumps; the drivers that use the for_each_set_clump macro only handle 8-bit ports so a generic for_each_set_clump implementation is not necessary. Thus, a solution for large clumps (i.e. those larger than the width of a bitmap word) can be postponed until a driver appears that actually requires such a generic for_each_set_clump implementation. For what it's worth, a semi-generic for_each_set_clump (i.e. for clumps smaller than the width of a bitmap word) can be implemented by simply replacing the hardcoded '8' and '0xFF' instances with respective variables. I have not yet had a need for such an implementation, and since it falls short of a true generic for_each_set_clump function, I have decided to forgo such an implementation for now. In addition, the bitmap_get_value8 and bitmap_set_value8 functions are introduced to get and set 8-bit values respectively. Their use is based on the behavior suggested in the patchset version 4 review. This patch (of 14): This macro iterates for each 8-bit group of bits (clump) with set bits, within a bitmap memory region. For each iteration, "start" is set to the bit offset of the found clump, while the respective clump value is stored to the location pointed by "clump". Additionally, the bitmap_get_value8 and bitmap_set_value8 functions are introduced to respectively get and set an 8-bit value in a bitmap memory region. [gustavo@embeddedor.com: fix potential sign-extension overflow] Link: http://lkml.kernel.org/r/20191015184657.GA26541@embeddedor [akpm@linux-foundation.org: s/ULL/UL/, per Joe] [vilhelm.gray@gmail.com: add for_each_set_clump8 documentation] Link: http://lkml.kernel.org/r/20191016161825.301082-1-vilhelm.gray@gmail.com Link: http://lkml.kernel.org/r/893c3b4f03266c9496137cc98ac2b1bd27f92c73.1570641097.git.vilhelm.gray@gmail.com Signed-off-by: William Breathitt Gray Signed-off-by: Gustavo A. R. Silva Suggested-by: Andy Shevchenko Suggested-by: Rasmus Villemoes Suggested-by: Lukas Wunner Tested-by: Andy Shevchenko Cc: Arnd Bergmann Cc: Linus Walleij Cc: Bartosz Golaszewski Cc: Masahiro Yamada Cc: Geert Uytterhoeven Cc: Phil Reid Cc: Geert Uytterhoeven Cc: Mathias Duckeck Cc: Morten Hein Tiljeset Cc: Sean Nyekjaer Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds