coreutils - Mirror of https://https.git.savannah.gnu.org/git/coreutils.git/

Age	Commit message	Author	Lines
14 days	yes: make operation independent of pipe size	Pádraig Brady	-18/+18
	* src/yes.c (splice_write): Always drain what we've written to an internal pipe, so there is no possibility of vmsplice() blocking. I.e., be defensive in the case that fcntl() fails, and our default buffer size (currently 16kiB) is larger than the pipe. https://github.com/coreutils/coreutils/issues/253
2026-04-16	build: fix build failure on AIX	Pádraig Brady	-1/+6
	* m4/jm-macros.m4: AIX has a splice() function for TCP, so check for vmsplice() instead. * src/splice.h: Define HAVE_SPLICE if vmsplice available. Reported by Bruno Haible.
2026-04-16	cksum: fix --length validation on 32 bit platforms	Pádraig Brady	-1/+1
	Fix an unreleased issue due to the recent change to using idx_t in commit v9.10-91-g02983e493.
	* src/cksum.c (main): Limit the possible return to the range supported by idx_t.
	Reported by Bruno Haible.
2026-04-15	df: improve detection of duplicate entries	Lukáš Zaoral	-2/+5
	Do not compare only with the latest entry for given device id but also all previously saved entries with the same id. * src/df.c (struct devlist): Add next_same_dev struct member. (filter_mount_list): Iterate over next_same_dev to find duplicates. * tests/df/skip-duplicates.sh: Add test cases. * NEWS: Mention the improvement. https://redhat.atlassian.net/browse/RHEL-5649
2026-04-11	mktemp: prefer rmdir and unlink to remove	Collin Funk	-1/+1
	This avoids the following behavior:
	  $ strace -e silence=exit -e trace=unlink,rmdir \
	      mktemp -d > /dev/full
	  unlink("/tmp/tmp.ZBuPmS9ZGD") = -1 EISDIR (Is a directory)
	  rmdir("/tmp/tmp.ZBuPmS9ZGD") = 0
	  mktemp: write error: No space left on device
	In the above invocation we know that we created a directory, so we should not remove a regular file that must have been created by another process:
	  $ strace -e silence=exit -e trace=unlink,rmdir \
	      ./src/mktemp -d > /dev/full
	  rmdir("/tmp/tmp.hGbME1HmJr") = 0
	  mktemp: write error: No space left on device
	* src/mktemp.c (main): Prefer rmdir and unlink depending on whether we created a directory or regular file.
	* bootstrap.conf (gnulib_modules): Remove the remove module.
2026-04-10	cat: avoid redundant pipe creation and resizing	Collin Funk	-20/+28
	* src/cat.c (splice_cat): Don't bother resizing input as it generally doesn't help perf, and also save an fstat per input. Don't close the intermediate pipe once created, unless there is an error reading from it. Co-authored-by: Pádraig Brady <P@draigBrady.com>
2026-04-09	env: avoid locking standard output for each printed variable	Collin Funk	-3/+5
	* src/env.c (main): Use fputs and putchar instead of printf.
2026-04-09	printenv: avoid locking standard output for each printed variable	Collin Funk	-3/+6
	* src/printenv.c (main): Use fputs and putchar instead of printf.
2026-04-09	maint: remove last remaining assert()	Pádraig Brady	-1/+1
	* src/split.c (bytes_chunk_extract): Prefer affirm to assert, as it allows for better static checking when compiling with -DNDEBUG.
2026-04-09	maint: cat: avoid coverity NULL dereference warning	Pádraig Brady	-0/+3
	* src/cat.c (ensure_buf_size): Affirm we won't return NULL.
2026-04-09	cat: avoid memory allocation per file	Pádraig Brady	-18/+32
	* src/cat.c (main): Only resize the allocated buffer when needed, which avoids per file heap manipulation and mmap/munmap syscalls.
2026-04-09	cat: fix splice() from empty input	Pádraig Brady	-0/+2
	* src/cat.c (splice_cat): Ensure we don't retry a read() after splice() completes, as this is significant on a tty.
2026-04-08	maint: touch: prefer timespec_cmp	Collin Funk	-4/+2
	* src/touch.c (main): Use timespec_cmp instead of comparing each member of the timespec.
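For readers unfamiliar with the helper, the idea of comparing two timestamps with a single call rather than member by member can be sketched as below. `ts_cmp` is a hypothetical stand-in written for illustration, not gnulib's `timespec_cmp` itself, and it assumes both `tv_nsec` values are already normalized to the range 0 to 999999999.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical stand-in for a timespec comparison helper.
   Return a negative, zero, or positive value as A is earlier than,
   equal to, or later than B.  Assumes normalized tv_nsec fields.  */
static int
ts_cmp (struct timespec a, struct timespec b)
{
  assert (0 <= a.tv_nsec && a.tv_nsec < 1000000000);
  assert (0 <= b.tv_nsec && b.tv_nsec < 1000000000);

  /* Each term is -1, 0, or 1; seconds dominate nanoseconds.  */
  int sec = (a.tv_sec > b.tv_sec) - (a.tv_sec < b.tv_sec);
  int nsec = (a.tv_nsec > b.tv_nsec) - (a.tv_nsec < b.tv_nsec);
  return 2 * sec + nsec;
}
```

A caller can then write `ts_cmp (x, y) == 0` in place of testing `tv_sec` and `tv_nsec` separately.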
2026-04-07	maint: cut: avoid discarded-qualifiers warnings	Pádraig Brady	-5/+5
	Seen on GCC 15.2.1 with GLIBC 2.43 on Arch. Not seen on GCC 15.2.1 with GLIBC 2.42 on Fedora.
	* src/cut.c (search_bytes): Cast the return from memchr() to avoid const propagation.
	(find_field_delim): Adjust the return from strstr() similarly.
	https://github.com/coreutils/coreutils/issues/244
2026-04-06	cat: use splice if operating on pipes or if copy_file_range fails	Collin Funk	-16/+166
	On an AMD Ryzen 7 3700X system:
	  $ timeout 10 taskset 1 ./src/cat-prev /dev/zero \
	      | taskset 2 pv -r > /dev/null
	  [1.67GiB/s]
	  $ timeout 10 taskset 1 ./src/cat /dev/zero \
	      | taskset 2 pv -r > /dev/null
	  [9.03GiB/s]
	On a Power10 system:
	  $ taskset 1 ./src/yes | timeout 10 taskset 2 ./src/cat-prev \
	      | taskset 3 pv -r > /dev/null
	  [12.9GiB/s]
	  $ taskset 1 ./src/yes | timeout 10 taskset 2 ./src/cat \
	      | taskset 3 pv -r > /dev/null
	  [81.8GiB/s]
	* NEWS: Mention the improvement.
	* src/cat.c: Include isapipe.h, splice.h, and unistd--.h.
	(splice_cat): New function.
	(main): Use it.
	* src/local.mk (noinst_HEADERS): Add src/splice.h.
	* src/splice.h: New file, based on definitions from src/yes.c.
	* src/yes.c: Include splice.h.
	(pipe_splice_size): Use increase_pipe_size from src/splice.h.
	(SPLICE_PIPE_SIZE): Remove definition, moved to src/splice.h.
	* tests/cat/splice.sh: New file, based on some tests in tests/misc/yes.sh.
	* tests/local.mk (all_tests): Add the new test.
2026-04-06	build: cut: fix compilation error on non C23 compilers	Pádraig Brady	-8/+10
	* src/cut.c (main): Add curly brackets around variable declaration in case label. Reported by Bruno Haible.
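Before C23, a `case` label had to be followed by a statement, and a declaration is not a statement, so a declaration placed directly after the label fails to compile on older compilers. The sketch below shows the brace workaround in isolation; `option_weight` and its values are hypothetical and unrelated to the actual code in src/cut.c.

```c
/* Hypothetical example: return a small weight for option letter C.  */
static int
option_weight (int c)
{
  int weight = 0;
  switch (c)
    {
    case 'b':
      {
        /* The braces form a compound statement, so declaring a
           variable here is also valid before C23.  */
        int base = 2;
        weight = base * 2;
      }
      break;
    case 'f':
      weight = 1;
      break;
    default:
      break;
    }
  return weight;
}
```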
2026-04-06	maint: simplify c32issep	Paul Eggert	-1/+1
	* src/system.h (c32issep): Avoid unnecessary ‘!!’.
2026-04-06	maint: cut: refactor delimiter handling	Paul Eggert	-44/+14
	* src/cut.c: Use mcel_scanz() to parse in all cases, and avoid redundant storage of delimiter_length and the single byte delim.
2026-04-06	cut: -f: fix handling of multi-byte delimiters that span buffers	Pádraig Brady	-0/+25
	* src/cut.c (cut_fields_bytesearch): Ensure up to delim_bytes - 1 bytes are left for the next refill.
	* tests/cut/cut.pl: Add a test case.
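The general technique of holding back a short tail so that a delimiter split across two reads is still found can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in src/cut.c: `count_delims_chunked`, `BUF_CAP`, and the simulated chunked input are all hypothetical, and a naive memcmp scan stands in for the real search.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { BUF_CAP = 64 };  /* Hypothetical buffer size for the sketch.  */

/* Count non-overlapping occurrences of DELIM (DLEN bytes) in DATA,
   while only ever looking at CHUNK new bytes at a time, as if each
   chunk came from a separate read().  */
static size_t
count_delims_chunked (const char *data, size_t len, size_t chunk,
                      const char *delim, size_t dlen)
{
  char buf[BUF_CAP];
  size_t held = 0;   /* Unscanned bytes carried over from the last chunk.  */
  size_t count = 0;

  assert (0 < dlen && dlen <= BUF_CAP / 2);
  assert (0 < chunk && chunk <= BUF_CAP / 2);

  while (len > 0)
    {
      /* "Refill": append the next chunk after the carried-over tail.  */
      size_t n = len < chunk ? len : chunk;
      memcpy (buf + held, data, n);
      data += n;
      len -= n;
      size_t avail = held + n;

      /* Scan only positions where a whole delimiter fits.  */
      size_t pos = 0;
      while (pos + dlen <= avail)
        {
          if (memcmp (buf + pos, delim, dlen) == 0)
            {
              count++;
              pos += dlen;
            }
          else
            pos++;
        }

      /* At most dlen - 1 bytes remain; keep them for the next refill
         so a delimiter spanning the boundary is not missed.  */
      held = avail - pos;
      memmove (buf, buf + pos, held);
    }
  return count;
}
```

Without the carried-over tail, the input `a::b::c` read two bytes at a time would split the first `::` across reads and lose it.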
2026-04-06	cut,fold,expand,unexpand: ensure we process all available characters	Pádraig Brady	-1/+1
	* gl/lib/mbbuf.h: Adjust mbbuf_fill() to process full characters in the slop at the end of a read(). Previously valid characters in the last MCEL_LEN_MAX bytes were ignored until the next read(). * src/cut.c (cut_fields_bytesearch): Adjust to the new naming. * NEWS: Mention the fold(1) responsiveness fix, which was improved with the change from fread() to read(), and completed with this patch.
2026-04-05	cut: -b: avoid function calls in hot loop	Pádraig Brady	-5/+19
	  $ time LC_ALL=C src/cut-before -b1 sl.in >/dev/null
	  real 0m0.115s
	  $ time LC_ALL=C src/cut-after -b1 sl.in >/dev/null
	  real 0m0.076s
	* src/cut.c (cut_bytes): Hoist the fileno() invariant outside the loop. Avoid memchr for very short lines.
	(search_bytes): Similar to copy_bytes() and write_bytes() helpers.
	Note adding code to probe 3 or 4 bytes resulted in worse register allocation, i.e., slower operation even if the input was only 2 bytes.
2026-04-05	cut: fix logic issue with field delim in last byte of buffer	Pádraig Brady	-89/+108
	With field delimiter = line delimiter we need to know if there is any more data to be read, as field delimiter in the last byte of the file is treated differently. So reiterate the loop to ensure enough read()s to make the appropriate determination.
2026-04-05	cut: ensure responsive input processing	Pádraig Brady	-57/+20
	* gl/lib/mbbuf.h (fill_buf): Switch from fread() to read() as the former retries read() internally to fill the buffer. * src/cut.c: Adjust accordingly, and avoid getc() interface entirely. * bootstrap.h: Depend explicitly on fseterr. This is already depended on transitively, so should not introduce new build portability issues.
2026-04-05	maint: cut: rename line_in to bytes_in	Pádraig Brady	-9/+9
	* src/cut.c: We're not reading a line, rather a buffer of bytes. Suggested by Collin Funk.
2026-04-05	cut: make the dependency on memchr2 explicit	Pádraig Brady	-1/+0
	* bootstrap.conf: Remove now unused getndelim2, add memchr2. * src/cut.c: Remove now unused getndelim2.h.
2026-04-05	cut: combine cut_bytes_no_split and cut_characters	Pádraig Brady	-37/+22
	Both are per character based, so merge them.
2026-04-05	maint: cut: various code cleanups and comments	Pádraig Brady	-28/+26
	* src/cut.c: Document some functions, and remove extraneous abstractions.
2026-04-05	cut: support no delimiter match fast path with -s	Pádraig Brady	-11/+16
	* src/cut.c (cut_fields_bytesearch): Just skip the data with -s.
2026-04-05	maint: cut: cleanup context management for byte search	Pádraig Brady	-18/+22
	* src/cut.c: Hoist at_eof into context so we're not querying it multiple times. Also add a helper to explicitly init bytesearch_context.
2026-04-05	cut: optimize UTF-8 input with 0xF5-0xFF delimiters	Pádraig Brady	-1/+5
	* src/cut.c (bytesearch_field_delim_ok): Expand the range of bytes that can be simply searched for. 0xF5-0xFF can't appear in valid UTF-8 characters, and so may be used as delimiters in UTF-8 input, so it's worth optimizing for. * tests/cut/cut.pl: Add a test case (mainly as documentation).
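The byte-level reasoning can be sketched as a small predicate. This is an illustration only: `utf8_delim_byte_searchable` is a hypothetical name, and whether the real `bytesearch_field_delim_ok` accepts exactly this set of bytes is an assumption. It relies on two properties of UTF-8: ASCII bytes (below 0x80) never occur inside a multi-byte sequence, and, as the commit notes, 0xF5-0xFF never occur in valid UTF-8 at all.

```c
#include <stdbool.h>

/* Hypothetical predicate: can single-byte delimiter B be located in
   UTF-8 input with a plain byte search, with no risk of matching in
   the middle of a valid multi-byte character?  */
static bool
utf8_delim_byte_searchable (unsigned char b)
{
  return b < 0x80      /* ASCII: never part of a multi-byte sequence.  */
         || 0xF5 <= b; /* 0xF5-0xFF: never present in valid UTF-8.  */
}
```

Bytes such as 0xC3 are rejected because they are lead bytes of valid two-byte characters, so a raw byte search for them could split a character.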
2026-04-05	maint: cut: simplify mbbuf_fill	Pádraig Brady	-20/+2
	We can only byte search with uni-byte or utf-8. utf-8 implicitly can't false match a delimiter at a buffer boundary. So don't worry about finding the exact utf-8 boundary at the end of the buffer; rather just ensure the buffer always starts with a valid character (by ensuring MCEL_LEN_MAX-1 bytes are moved to the start of the buffer on each refill).
2026-04-05	maint: refactor is_utf8_charset helper to system.h	Pádraig Brady	-28/+14
	* src/cut.c: Move from here.
	* src/numfmt.c: Likewise.
	* src/system.h: To here.
2026-04-05	cut: optimize per character memcpy	Pádraig Brady	-3/+16
	  $ time src/cut-before -f10 -w ll.in >/dev/null
	  real 0m4.309s
	  $ time src/cut-after -f10 -w ll.in >/dev/null
	  real 0m3.136s
	* src/cut.c (cut_bytes): Add a new helper that avoids the memcpy call in the common case of adding characters to a buffer.
2026-04-05	cut: refactor skip_line_remainder logic	Pádraig Brady	-25/+9
	Simplify and optimize field exhaustion logic:
	  $ time LC_ALL=C src/cut-before -f1 -dc as.in >/dev/null
	  real 0m0.057s
	  $ time LC_ALL=C src/cut-after -f1 -dc as.in >/dev/null
	  real 0m0.023s
	* src/cut.c (cut_fields_bytesearch): Refactor.
2026-04-05	maint: cut simplify cut_fields_bytesearch	Pádraig Brady	-50/+38
	Refactor line delimiter output, and resetting of parse record state.
2026-04-05	cut: enable fast path for all delimiter lengths	Pádraig Brady	-2/+8
	1. Removed !have_pending_line from the fast path condition. This is safe because:
	   - field_1_n_bytes == 0 already ensures we haven't started buffering field 1 content
	   - The fast path correctly continues any pending partial line by writing raw bytes including the completing \n
	2. Added have_pending_line = false after the fast path write, since all lines up to the last \n are now complete.
	  $ time src/cut.before -f1 -dç sl.in >/dev/null
	  real 0m0.081s
	  $ time src/cut.after -f1 -dç sl.in >/dev/null
	  real 0m0.012s
	  $ time src/cut.before -f10 -dç sl.in >/dev/null
	  real 0m0.081s
	  $ time src/cut.after -f10 -dç sl.in >/dev/null
	  real 0m0.012s
2026-04-05	cut: optimize -f with -d longer than lines	Pádraig Brady	-0/+3
	  $ time src/cut.before -f1 -dç sl.in >/dev/null
	  real 0m0.157s
	  $ time src/cut.after -f1 -dç sl.in >/dev/null
	  real 0m0.084s
2026-04-05	cut: optimize -b for short lines	Pádraig Brady	-18/+62
	For a 40% performance increase it's worth reinstating the simple original cut_bytes() which avoids data copying and function calls. Once a longer line is encountered we defer to the buffered variant.
	  $ time src/cut.before -b2 sl.in >/dev/null
	  real 0m0.101s
	  $ time src/cut.after -b2 sl.in >/dev/null
	  real 0m0.060s
2026-04-05	cut: optimize per character field scanning	Pádraig Brady	-33/+47
	Use specialized loops rather than branching per character, giving a 28% increase. $ time src/cut -f1 -w ll.in >/dev/null real 0m7.199s $ time src/cut -f1 -w ll.in >/dev/null real 0m5.204s
2026-04-05	cut: prefer c_isblank() to c32issep()	Pádraig Brady	-2/+10
	12% perf increase with: $ time src/cut -f2 -w ll.in >/dev/null real 0m6.469s $ time src/cut -f2 -w ll.in >/dev/null real 0m5.689s
2026-04-05	cut: avoid fwrite calls for smaller amounts of data	Pádraig Brady	-0/+10
	This is quite significant:
	  $ yes abcdfeg | head -n1MB > big-file
	  $ time src/cut-before -b1,3 big-file >/dev/null
	  real 0m0.050s
	  $ time src/cut-after -b1,3 big-file >/dev/null
	  real 0m0.029s
2026-04-05	cut: optimize -b by avoiding per byte iteration	Pádraig Brady	-19/+49
	Always memchr(line_delim) which is fast and allows:
	- skipping whole segments when the next selected byte is beyond them
	- skipping unselected prefixes in bulk
	- writing contiguous selected spans in bulk
	This wins for lines >= 4 characters, but is slower for lines <= 3 characters, especially if selecting bytes 1-3. That is unusual though.
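The bulk approach described in this commit can be illustrated with a reduced sketch that selects one contiguous byte range per line. It is a simplification under stated assumptions rather than the cut implementation: `cut_byte_range` is a hypothetical name, the input is an in-memory buffer instead of a stream, and only a single `FIRST-LAST` range is supported.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy bytes FIRST..LAST (1-based, inclusive) of each line of IN to
   OUT, terminating every output line with '\n'.  OUT must have room
   for LEN + 1 bytes.  Return the number of bytes written.  */
static size_t
cut_byte_range (const char *in, size_t len, size_t first, size_t last,
                char *out)
{
  assert (1 <= first && first <= last);

  size_t n_out = 0;
  const char *p = in;
  const char *end = in + len;

  while (p < end)
    {
      /* One memchr finds the line end; no per-byte loop is needed.  */
      const char *nl = memchr (p, '\n', end - p);
      const char *line_end = nl ? nl : end;
      size_t line_len = line_end - p;

      /* Lines shorter than FIRST are skipped in bulk.  */
      if (first <= line_len)
        {
          /* Skip the unselected prefix, then copy the span at once.  */
          size_t stop = last < line_len ? last : line_len;
          size_t span = stop - (first - 1);
          memcpy (out + n_out, p + (first - 1), span);
          n_out += span;
        }
      out[n_out++] = '\n';
      p = nl ? nl + 1 : end;
    }
  return n_out;
}
```

The cost per line is one memchr and at most one memcpy, which matches the commit's observation that the fixed overhead only pays off once lines are more than a few bytes long.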
2026-04-05	cut: optimize when no delimiter in input	Pádraig Brady	-0/+25
	This is about 20x faster. Note we only do the delimiter search once per chunk, and it's usually quick as delimiters wouldn't be too far into a chunk if present, so we don't bother to cache the found delimiter.
2026-04-05	cut: optimize -w for uni-byte case	Pádraig Brady	-5/+31
	* src/cut.c: Limit search to SPACE and TAB.
2026-04-05	doc: cut: document the -w option	Pádraig Brady	-2/+2
	* src/cut.c (usage): Mention blank characters are used to separate. * doc/coreutils.texi (cut invocation): Likewise. Also describe the 'trimmed' argument and the relation to -F.
2026-04-05	cut: refactor find_bytesearch_field_terminator to be stateful	Pádraig Brady	-100/+43
	Allows better/simpler avoidance of repeated line/delim scans.
	TODO: speed up our really slow cut_fields_mb_any. Compare for example:
	  time src/cut -w -f1 ll.in >/dev/null  # 14s
	  time src/cut -d, -f1 ll.in >/dev/null  # .1s
	Could adjust so that LC_ALL=C does memchr2(space,tab)?
2026-04-05	cut: avoid repeated searches for line_delim in the multi-byte delim case	Pádraig Brady	-15/+53
	TODO: Refactor all this into find_bytesearch_field_terminator. Also handle in the delim_length==1 case.
2026-04-05	cut: refactor all byte search to find_bytesearch_field_terminator	Pádraig Brady	-7/+27
	TODO: Perhaps also add search only fields mode to avoid rescans of very long lines
2026-04-05	cut: optimize -f when finished processing fields for a line	Pádraig Brady	-0/+53
	TODO: simplify and compare perf
2026-04-05	cut: optimize -f for the common case of single byte delimiters	Pádraig Brady	-0/+44
	* TODO: perf comparison