<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/lib/crc/arm64, branch master</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/</subtitle>
<id>https://git.shady.money/linux/atom?h=master</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=master'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2026-04-02T23:14:53Z</updated>
<entry>
<title>lib/crc: arm64: Simplify intrinsics implementation</title>
<updated>2026-04-02T23:14:53Z</updated>
<author>
<name>Ard Biesheuvel</name>
<email>ardb@kernel.org</email>
</author>
<published>2026-03-30T14:46:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8fdef85d601db670e9c178314eedffe7bbb07e52'/>
<id>urn:sha1:8fdef85d601db670e9c178314eedffe7bbb07e52</id>
<content type='text'>
NEON intrinsics are useful because they remove the need for manual
register allocation, and because the resulting code can be recompiled and
optimized for different micro-architectures, as well as shared between
arm64 and 32-bit ARM.

However, the strong typing of the vector variables can lead to
incomprehensible gibberish, as is the case with the new CRC64
implementation. To address this, let's repaint all variables as
uint64x2_t to minimize the number of vreinterpretq_xxx() calls, and to
be able to rely on the ^ operator for exclusive OR operations. This
makes the code much more concise and readable.

While at it, wrap the calls to vmull_p64() et al. to give them a more
consistent calling convention, and encapsulate the vreinterpret() calls
that are still needed.
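
As an illustration of the intended style, a minimal sketch (the wrapper
name clmul_lo() is made up for this example and is not the exact helper
from the patch):

  #include &lt;arm_neon.h&gt;

  /* Carryless-multiply the low 64-bit lanes of a and b, returning the
   * 128-bit product as uint64x2_t so callers can combine results with ^. */
  static inline uint64x2_t clmul_lo(uint64x2_t a, uint64x2_t b)
  {
          return vreinterpretq_u64_p128(
                  vmull_p64((poly64_t)vgetq_lane_u64(a, 0),
                            (poly64_t)vgetq_lane_u64(b, 0)));
  }

  /* e.g.:  acc = clmul_lo(x, fold_consts) ^ acc;  */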

Signed-off-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Link: https://lore.kernel.org/r/20260330144630.33026-11-ardb@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
</content>
</entry>
<entry>
<title>lib/crc: arm64: Drop unnecessary chunking logic from crc64</title>
<updated>2026-04-02T23:14:53Z</updated>
<author>
<name>Ard Biesheuvel</name>
<email>ardb@kernel.org</email>
</author>
<published>2026-03-30T14:46:32Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e0718ed60d60299840cfc2a408eb26042a20d186'/>
<id>urn:sha1:e0718ed60d60299840cfc2a408eb26042a20d186</id>
<content type='text'>
On arm64, kernel mode NEON executes with preemption enabled, so there is
no need to chunk the input by hand.
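
Roughly, the simplification looks like this (illustrative sketch; the
helper name crc64_nvme_pmull() is a stand-in, not the exact symbol):

  /* before: enter and leave the SIMD guard once per 4 KiB chunk */
  do {
          size_t chunk = min(len, SZ_4K);

          scoped_ksimd()
                  crc = crc64_nvme_pmull(crc, p, chunk);
          p += chunk;
          len -= chunk;
  } while (len);

  /* after: preemption stays enabled, so one guard covers the whole buffer */
  scoped_ksimd()
          crc = crc64_nvme_pmull(crc, p, len);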

Signed-off-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Link: https://lore.kernel.org/r/20260330144630.33026-8-ardb@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
</content>
</entry>
<entry>
<title>lib/crc: arm64: Assume a little-endian kernel</title>
<updated>2026-04-02T23:13:18Z</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@kernel.org</email>
</author>
<published>2026-04-01T00:44:31Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5276ea17a23c829d4e4417569abff71a1c8342d9'/>
<id>urn:sha1:5276ea17a23c829d4e4417569abff71a1c8342d9</id>
<content type='text'>
Since support for big-endian arm64 kernels was removed, the CPU_LE()
macro now unconditionally emits the code it is passed, and the CPU_BE()
macro now unconditionally discards the code it is passed.

Simplify the assembly code in lib/crc/arm64/ accordingly.
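
For context, the two macros now behave roughly as follows (simplified
sketch of their little-endian-only form, not the verbatim definitions):

  #define CPU_LE(code...)  code  /* always emitted */
  #define CPU_BE(code...)        /* always discarded */

  /* so CPU_LE( rev64 v0.16b, v0.16b ) can be written as plain
   * rev64 v0.16b, v0.16b, and CPU_BE(...) lines can be dropped. */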

Reviewed-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Link: https://lore.kernel.org/r/20260401004431.151432-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
</content>
</entry>
<entry>
<title>lib/crc: arm64: add NEON accelerated CRC64-NVMe implementation</title>
<updated>2026-03-29T20:22:13Z</updated>
<author>
<name>Demian Shulhan</name>
<email>demyansh@gmail.com</email>
</author>
<published>2026-03-29T07:43:38Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=63432fd625372a0e79fb00a4009af204f4edc013'/>
<id>urn:sha1:63432fd625372a0e79fb00a4009af204f4edc013</id>
<content type='text'>
Implement an optimized CRC64 (NVMe) algorithm for arm64 using NEON
Polynomial Multiply Long (PMULL) instructions. The generic software
implementation is slow, which creates a bottleneck in NVMe and other
storage subsystems.

The acceleration is implemented using C intrinsics (&lt;arm_neon.h&gt;) rather
than raw assembly for better readability and maintainability.

Key highlights of this implementation:
- Uses 4KB chunking inside scoped_ksimd() to avoid preemption latency
  spikes on large buffers.
- Pre-calculates and loads fold constants via vld1q_u64() to minimize
  register spilling.
- Benchmarks show the break-even point against the generic implementation
  is around 128 bytes. The PMULL path is enabled only for len &gt;= 128.

Performance results (kunit crc_benchmark on Cortex-A72):
- Generic (len=4096): ~268 MB/s
- PMULL (len=4096): ~1556 MB/s (nearly 6x improvement)
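
For reference, the dispatch described above boils down to roughly this
shape (names such as crc64_nvme_pmull() are illustrative, not taken from
the patch):

  static inline u64 crc64_nvme_arch(u64 crc, const u8 *p, size_t len)
  {
          if (len &lt; 128 || !may_use_simd())
                  return crc64_nvme_generic(crc, p, len);

          /* crc64_nvme_pmull() handles the 4 KiB chunking and the
           * scoped_ksimd() sections described above, and loads the
           * precomputed fold constants with vld1q_u64() on entry. */
          return crc64_nvme_pmull(crc, p, len);
  }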

Signed-off-by: Demian Shulhan &lt;demyansh@gmail.com&gt;
Link: https://lore.kernel.org/r/20260329074338.1053550-1-demyansh@gmail.com
Signed-off-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
</content>
</entry>
<entry>
<title>lib/crc: Switch ARM and arm64 to 'ksimd' scoped guard API</title>
<updated>2025-11-12T08:52:01Z</updated>
<author>
<name>Ard Biesheuvel</name>
<email>ardb@kernel.org</email>
</author>
<published>2025-10-01T11:27:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4fb623074ea537524d06598acbb5517f027f3b53'/>
<id>urn:sha1:4fb623074ea537524d06598acbb5517f027f3b53</id>
<content type='text'>
Before modifying the prototypes of kernel_neon_begin() and
kernel_neon_end() to accommodate kernel mode FP/SIMD state buffers
allocated on the stack, move arm64 to the new 'ksimd' scoped guard API,
which encapsulates the calls to those functions.

For symmetry, do the same for 32-bit ARM too.
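
The conversion itself follows one simple pattern; a minimal sketch (the
PMULL helper name is illustrative):

  /* before */
  kernel_neon_begin();
  crc = crc_t10dif_pmull(crc, data, length);
  kernel_neon_end();

  /* after */
  scoped_ksimd()
          crc = crc_t10dif_pmull(crc, data, length);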

Reviewed-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
Reviewed-by: Jonathan Cameron &lt;jonathan.cameron@huawei.com&gt;
Acked-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Signed-off-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
</content>
</entry>
<entry>
<title>lib/crc: Drop inline from all *_mod_init_arch() functions</title>
<updated>2025-08-16T02:06:08Z</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@kernel.org</email>
</author>
<published>2025-08-16T02:02:40Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5ff74f5f71f83cce3c920cd17940df0fe0401865'/>
<id>urn:sha1:5ff74f5f71f83cce3c920cd17940df0fe0401865</id>
<content type='text'>
Drop 'inline' from all the *_mod_init_arch() functions so that the
compiler will warn about any bugs where they are unused due to not being
wired up properly.  (There are no such bugs currently, so this just
establishes a more robust convention for the future.  Of course, these
functions also tend to get inlined anyway, regardless of the keyword.)
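
To illustrate the effect (sketch only; the function name and empty body
are placeholders):

  /* 'static inline': an unused definition is silently discarded */
  static inline void crc32_mod_init_arch(void) { }

  /* plain 'static': the same unused definition now triggers
   * -Wunused-function, flagging a missing call from module init */
  static void crc32_mod_init_arch(void) { }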

Link: https://lore.kernel.org/r/20250816020240.431545-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
</content>
</entry>
<entry>
<title>lib/crc: Use underlying functions instead of crypto_simd_usable()</title>
<updated>2025-08-11T18:28:00Z</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@kernel.org</email>
</author>
<published>2025-08-11T18:26:31Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c2a0c5156a40c40edb0cce80ce11c97ab39c67e3'/>
<id>urn:sha1:c2a0c5156a40c40edb0cce80ce11c97ab39c67e3</id>
<content type='text'>
Since crc_kunit now tests the fallback code paths without using
crypto_simd_disabled_for_test, make the CRC code use the underlying
may_use_simd() and irq_fpu_usable() functions directly instead of
crypto_simd_usable().  This eliminates an unnecessary layer of
indirection.

Take the opportunity to add likely() and unlikely() annotations as well.
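
On arm64, the resulting pattern is roughly (sketch; PMULL_MIN_LEN and the
helper names are illustrative):

  if (likely(length &gt;= PMULL_MIN_LEN &amp;&amp; may_use_simd())) {
          scoped_ksimd()
                  crc = crc_t10dif_pmull(crc, data, length);
  } else {
          crc = crc_t10dif_generic(crc, data, length);
  }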

Link: https://lore.kernel.org/r/20250811182631.376302-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
</content>
</entry>
<entry>
<title>lib/crc: arm64: Migrate optimized CRC code into lib/crc/</title>
<updated>2025-06-30T16:31:57Z</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@kernel.org</email>
</author>
<published>2025-06-07T20:04:46Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2b7531b2a2037959ac81ff2c95e4557b30cfd253'/>
<id>urn:sha1:2b7531b2a2037959ac81ff2c95e4557b30cfd253</id>
<content type='text'>
Move the arm64-optimized CRC code from arch/arm64/lib/crc* into its new
location in lib/crc/arm64/, and wire it up in the new way.  This new way
of organizing the CRC code eliminates the need to artificially split the
code for each CRC variant into separate arch and generic modules,
enabling better inlining and dead code elimination.  For more details,
see "lib/crc: Prepare for arch-optimized code in subdirs of lib/crc/".

Reviewed-by: "Martin K. Petersen" &lt;martin.petersen@oracle.com&gt;
Acked-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Acked-by: "Jason A. Donenfeld" &lt;Jason@zx2c4.com&gt;
Link: https://lore.kernel.org/r/20250607200454.73587-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
</content>
</entry>
</feed>
