path: root/src/wc_neon.c
Age | Commit message | Author | Lines
8 days | wc: improve aarch64 Neon optimization for 'wc -l' | Collin Funk | -56/+56
    $ yes abcdefghijklmnopqrstuvwxyz | head -n 200000000 > input
    $ time ./src/wc-prev -l input
    200000000 input

    real    0m1.240s
    user    0m0.456s
    sys     0m0.784s
    $ time ./src/wc -l input
    200000000 input

    real    0m0.936s
    user    0m0.141s
    sys     0m0.795s

* configure.ac: Use unsigned char for the buffer to avoid potential
compiler warnings. Check for the functions used in src/wc_neon.c
after this patch.
* src/wc_neon.c (wc_lines_neon): Use vreinterpretq_s8_u8 to convert
0xff into -1 instead of using bitwise AND instructions to convert it
into 1. Perform the pairwise addition and lane extraction once every
8192 bytes instead of once every 64 bytes.

Thanks to Lasse Collin for spotting this and reviewing a draft of this
patch.
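The commit message describes the trick only briefly. Below is a minimal portable sketch of the idea, assuming a 16-byte vector emulated with plain arrays. A byte comparison yields 0xff for a match (as Neon's vceqq_u8 does). Reinterpreting 0xff as a signed byte gives -1, so the mask can be added straight into signed 8-bit lane counters with no AND step. Each lane then holds minus its count, and the costly horizontal sum is deferred until a lane could overflow. The names count_lines, lane_step and lane_flush, and the 2048-byte flush interval, are hypothetical choices for this sketch. It is not the code in src/wc_neon.c; that file's 8192-byte interval depends on details the commit message does not show.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { LANES = 16 };            /* bytes per 128-bit Neon register */
enum { FLUSH_VECTORS = 128 };   /* -1 added 128 times reaches INT8_MIN */

/* Emulate one vector step: compare 16 bytes with '\n' (a match gives
   0xff, as vceqq_u8 would), reinterpret each 0xff as -1 (the role of
   vreinterpretq_s8_u8), and add it straight into the signed lanes.  */
static void
lane_step (int8_t acc[LANES], const unsigned char *p)
{
  for (int i = 0; i < LANES; i++)
    {
      uint8_t mask = p[i] == '\n' ? 0xff : 0x00;
      int8_t as_signed;
      memcpy (&as_signed, &mask, 1);        /* 0xff -> -1, 0x00 -> 0 */
      acc[i] = (int8_t) (acc[i] + as_signed);
    }
}

/* Horizontal reduction, done rarely: sum the lanes, negate (each lane
   holds minus its count), and reset the accumulators.  */
static uintmax_t
lane_flush (int8_t acc[LANES])
{
  int sum = 0;
  for (int i = 0; i < LANES; i++)
    {
      sum += acc[i];
      acc[i] = 0;
    }
  assert (-LANES * FLUSH_VECTORS <= sum && sum <= 0);
  return (uintmax_t) (-sum);
}

/* Count '\n' bytes in BUF, reducing only every FLUSH_VECTORS vectors.  */
uintmax_t
count_lines (const unsigned char *buf, size_t len)
{
  int8_t acc[LANES] = { 0 };
  uintmax_t lines = 0;
  int vectors = 0;
  size_t i = 0;

  for (; i + LANES <= len; i += LANES)
    {
      lane_step (acc, buf + i);
      if (++vectors == FLUSH_VECTORS)
        {
          lines += lane_flush (acc);
          vectors = 0;
        }
    }
  lines += lane_flush (acc);

  /* Scalar loop for the final len % LANES bytes.  */
  for (; i < len; i++)
    lines += buf[i] == '\n';
  return lines;
}
```

For a 10000-byte buffer made entirely of newlines, count_lines returns 10000 and runs the reduction only 5 times rather than once per 16-byte vector. That saving is the kind the commit gets by moving the pairwise addition and lane extraction out of the 64-byte loop.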
2026-02-18 | wc: add aarch64 Neon optimization for wc -l | Collin Funk | -0/+109
Here is an example of the performance improvement:

    $ yes abcdefghijklmnopqrstuvwxyz | head -n 100000000 > input
    $ time ./src/wc-prev -l < input
    100000000

    real    0m0.793s
    user    0m0.630s
    sys     0m0.162s
    $ time ./src/wc -l < input
    100000000

    real    0m0.230s
    user    0m0.065s
    sys     0m0.164s

* NEWS: Mention the performance improvement.
* gnulib: Update to the latest commit.
* configure.ac: Check for the necessary intrinsics and functions.
* src/local.mk (noinst_LIBRARIES) [USE_NEON_WC_LINECOUNT]: Add
src/libwc_neon.a.
(src_libwc_neon_a_SOURCES, wc_neon_ldadd, src_libwc_neon_a_CFLAGS)
[USE_NEON_WC_LINECOUNT]: New variables.
(src_wc_LDADD) [USE_NEON_WC_LINECOUNT]: Add $(wc_neon_ldadd).
* src/wc.c [USE_NEON_WC_LINECOUNT]: Include sys/auxv.h and asm/hwcap.h.
(neon_supported) [USE_NEON_WC_LINECOUNT]: New function.
(wc_lines) [USE_NEON_WC_LINECOUNT]: Use neon_supported and
wc_lines_neon.
* src/wc.h (wc_lines_neon): Add declaration.
* src/wc_neon.c: New file.
* doc/coreutils.texi (Hardware Acceleration): Document the "-ASIMD"
hwcap and the variable used in ./configure to override detection of
Neon instructions.
* tests/wc/wc-cpu.sh: Also add "-ASIMD" to disable the use of Neon
instructions.
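The ChangeLog above describes a runtime check (neon_supported) that guards the call to wc_lines_neon. Below is a rough sketch of that dispatch pattern. It assumes Linux on aarch64, where getauxval (AT_HWCAP) from sys/auxv.h reports the HWCAP_ASIMD bit defined in asm/hwcap.h. On any other platform only the scalar fallback is compiled. The functions lines_scalar, lines_neon, neon_supported_sketch and count_lines_dispatch, and the macro HAVE_NEON_SKETCH, are all hypothetical. The Neon kernel is an illustration of compare-and-accumulate with vreinterpretq_s8_u8, not the contents of src/wc_neon.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#if defined __aarch64__ && defined __linux__
# include <arm_neon.h>
# include <sys/auxv.h>
# include <asm/hwcap.h>
# define HAVE_NEON_SKETCH 1   /* hypothetical; real builds use configure */
#endif

/* Portable fallback: one comparison per byte.  */
static uintmax_t
lines_scalar (const unsigned char *buf, size_t len)
{
  uintmax_t lines = 0;
  for (size_t i = 0; i < len; i++)
    lines += buf[i] == '\n';
  return lines;
}

#ifdef HAVE_NEON_SKETCH
/* Ask the kernel whether Advanced SIMD (ASIMD, i.e. Neon) is usable.  */
static bool
neon_supported_sketch (void)
{
  return (getauxval (AT_HWCAP) & HWCAP_ASIMD) != 0;
}

static uintmax_t
lines_neon (const unsigned char *buf, size_t len)
{
  uint8x16_t nl = vdupq_n_u8 ('\n');
  uintmax_t lines = 0;
  size_t i = 0;
  while (i + 16 <= len)
    {
      /* At most 128 vectors per block, so no int8 lane passes -128.  */
      int8x16_t acc = vdupq_n_s8 (0);
      for (int v = 0; v < 128 && i + 16 <= len; v++, i += 16)
        {
          uint8x16_t eq = vceqq_u8 (vld1q_u8 (buf + i), nl);
          acc = vaddq_s8 (acc, vreinterpretq_s8_u8 (eq)); /* 0xff -> -1 */
        }
      /* Widening sum across all lanes, then negate.  */
      lines += (uintmax_t) (-vaddlvq_s8 (acc));
    }
  return lines + lines_scalar (buf + i, len - i);
}
#endif

/* Choose an implementation at run time, falling back to scalar code.  */
uintmax_t
count_lines_dispatch (const unsigned char *buf, size_t len)
{
  assert (buf != NULL || len == 0);
#ifdef HAVE_NEON_SKETCH
  if (neon_supported_sketch ())
    return lines_neon (buf, len);
#endif
  return lines_scalar (buf, len);
}
```

Keeping the detection separate from the kernel mirrors the ChangeLog's split between src/wc.c (neon_supported, wc_lines) and the separately built src/wc_neon.c. A build or a machine without Neon still gets a working wc -l.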