summaryrefslogtreecommitdiffstats
path: root/lib/crc/arm64
AgeCommit message (Collapse)AuthorLines
2026-04-02lib/crc: arm64: Simplify intrinsics implementationArd Biesheuvel-45/+32
NEON intrinsics are useful because they remove the need for manual register allocation, and the resulting code can be re-compiled and optimized for different micro-architectures, and shared between arm64 and 32-bit ARM. However, the strong typing of the vector variables can lead to incomprehensible gibberish, as is the case with the new CRC64 implementation. To address this, let's repaint all variables as uint64x2_t to minimize the number of vreinterpretq_xxx() calls, and to be able to rely on the ^ operator for exclusive OR operations. This makes the code much more concise and readable. While at it, wrap the calls to vmull_p64() et al in order to have a more consistent calling convention, and encapsulate any remaining vreinterpret() calls that are still needed. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260330144630.33026-11-ardb@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-04-02lib/crc: arm64: Drop unnecessary chunking logic from crc64Ard Biesheuvel-7/+5
On arm64, kernel mode NEON executes with preemption enabled, so there is no need to chunk the input by hand. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260330144630.33026-8-ardb@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-04-02lib/crc: arm64: Assume a little-endian kernelEric Biggers-35/+30
Since support for big-endian arm64 kernels was removed, the CPU_LE() macro now unconditionally emits the code it is passed, and the CPU_BE() macro now unconditionally discards the code it is passed. Simplify the assembly code in lib/crc/arm64/ accordingly. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260401004431.151432-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-29lib/crc: arm64: add NEON accelerated CRC64-NVMe implementationDemian Shulhan-0/+108
Implement an optimized CRC64 (NVMe) algorithm for ARM64 using NEON Polynomial Multiply Long (PMULL) instructions. The generic shift-and-XOR software implementation is slow, which creates a bottleneck in NVMe and other storage subsystems. The acceleration is implemented using C intrinsics (<arm_neon.h>) rather than raw assembly for better readability and maintainability. Key highlights of this implementation: - Uses 4KB chunking inside scoped_ksimd() to avoid preemption latency spikes on large buffers. - Pre-calculates and loads fold constants via vld1q_u64() to minimize register spilling. - Benchmarks show the break-even point against the generic implementation is around 128 bytes. The PMULL path is enabled only for len >= 128. Performance results (kunit crc_benchmark on Cortex-A72): - Generic (len=4096): ~268 MB/s - PMULL (len=4096): ~1556 MB/s (nearly 6x improvement) Signed-off-by: Demian Shulhan <demyansh@gmail.com> Link: https://lore.kernel.org/r/20260329074338.1053550-1-demyansh@gmail.com Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-12lib/crc: Switch ARM and arm64 to 'ksimd' scoped guard APIArd Biesheuvel-23/+12
Before modifying the prototypes of kernel_neon_begin() and kernel_neon_end() to accommodate kernel mode FP/SIMD state buffers allocated on the stack, move arm64 to the new 'ksimd' scoped guard API, which encapsulates the calls to those functions. For symmetry, do the same for 32-bit ARM too. Reviewed-by: Eric Biggers <ebiggers@kernel.org> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-08-15lib/crc: Drop inline from all *_mod_init_arch() functionsEric Biggers-1/+1
Drop 'inline' from all the *_mod_init_arch() functions so that the compiler will warn about any bugs where they are unused due to not being wired up properly. (There are no such bugs currently, so this just establishes a more robust convention for the future. Of course, these functions also tend to get inlined anyway, regardless of the keyword.) Link: https://lore.kernel.org/r/20250816020240.431545-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-08-11lib/crc: Use underlying functions instead of crypto_simd_usable()Eric Biggers-9/+8
Since crc_kunit now tests the fallback code paths without using crypto_simd_disabled_for_test, make the CRC code just use the underlying may_use_simd() and irq_fpu_usable() functions directly instead of crypto_simd_usable(). This eliminates an unnecessary layer. Take the opportunity to add likely() and unlikely() annotations as well. Link: https://lore.kernel.org/r/20250811182631.376302-4-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-06-30lib/crc: arm64: Migrate optimized CRC code into lib/crc/Eric Biggers-0/+976
Move the arm64-optimized CRC code from arch/arm64/lib/crc* into its new location in lib/crc/arm64/, and wire it up in the new way. This new way of organizing the CRC code eliminates the need to artificially split the code for each CRC variant into separate arch and generic modules, enabling better inlining and dead code elimination. For more details, see "lib/crc: Prepare for arch-optimized code in subdirs of lib/crc/". Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: "Jason A. Donenfeld" <Jason@zx2c4.com> Link: https://lore.kernel.org/r/20250607200454.73587-5-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>