| Age | Commit message | Author | Lines |
|
* src/yes.c (splice_write): Always drain what we've written
to an internal pipe, so there is no possibility of vmsplice() blocking.
I.e., be defensive in the case that fcntl() fails, and
our default buffer size (currently 16KiB) is larger than the pipe.
https://github.com/coreutils/coreutils/issues/253
|
|
* m4/jm-macros.m4: AIX has a splice() function for TCP,
so check for vmsplice() instead.
* src/splice.h: Define HAVE_SPLICE if vmsplice available.
Reported by Bruno Haible.
|
|
Fix an unreleased issue due to the recent change
to using idx_t in commit v9.10-91-g02983e493
* src/cksum.c (main): Limit the possible return to
the range supported by idx_t.
Reported by Bruno Haible.
|
|
Do not compare only with the latest entry for a given device ID, but also
with all previously saved entries with the same ID.
* src/df.c (struct devlist): Add next_same_dev struct member.
(filter_mount_list): Iterate over next_same_dev to find duplicates.
* tests/df/skip-duplicates.sh: Add test cases.
* NEWS: Mention the improvement.
https://redhat.atlassian.net/browse/RHEL-5649
|
|
This avoids the following behavior:
$ strace -e silence=exit -e trace=unlink,rmdir \
mktemp -d > /dev/full
unlink("/tmp/tmp.ZBuPmS9ZGD") = -1 EISDIR (Is a directory)
rmdir("/tmp/tmp.ZBuPmS9ZGD") = 0
mktemp: write error: No space left on device
In the above invocation we know that we created a directory, so we
should not remove a regular file that must have been created by another
process:
$ strace -e silence=exit -e trace=unlink,rmdir \
./src/mktemp -d > /dev/full
rmdir("/tmp/tmp.hGbME1HmJr") = 0
mktemp: write error: No space left on device
* src/mktemp.c (main): Use rmdir or unlink depending on whether we
created a directory or a regular file.
* bootstrap.conf (gnulib_modules): Remove the remove module.
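A minimal sketch of the idea, not the code in src/mktemp.c: the names
create_target and remove_created are hypothetical. The cleanup call is
chosen from what we know we created, so a regular file that another
process put at the same name is never unlinked by mistake.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create NAME as a directory or a regular file.
   Return 0 on success, -1 with errno set on failure.  */
static int
create_target (char const *name, bool want_dir)
{
  if (want_dir)
    return mkdir (name, 0700);
  int fd = open (name, O_CREAT | O_EXCL | O_WRONLY, 0600);
  if (fd < 0)
    return -1;
  return close (fd);
}

/* Hypothetical helper: remove NAME with the call matching what we
   created, rather than trying unlink first and rmdir on failure.  */
static int
remove_created (char const *name, bool created_dir)
{
  return created_dir ? rmdir (name) : unlink (name);
}
```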
|
|
* src/cat.c (splice_cat): Don't bother resizing input as it generally
doesn't help perf, and also save an fstat per input. Don't close the
intermediate pipe once created, unless there is an error reading from
it.
Co-authored-by: Pádraig Brady <P@draigBrady.com>
|
|
* src/env.c (main): Use fputs and putchar instead of printf.
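A rough sketch of the pattern, assuming each entry is printed followed
by a single terminator byte. print_entry is a hypothetical name, not a
function in src/env.c. With no format string to parse, the two calls
map directly onto the bytes written.

```c
#include <stdio.h>

/* Hypothetical helper: write ENTRY followed by TERMINATOR to OUT.
   Return 0 on success, -1 on a write error.  */
static int
print_entry (FILE *out, char const *entry, char terminator)
{
  if (fputs (entry, out) == EOF || putc (terminator, out) == EOF)
    return -1;
  return 0;
}
```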
|
|
* src/printenv.c (main): Use fputs and putchar instead of printf.
|
|
* src/split.c (bytes_chunk_extract): Prefer affirm to assert,
as it allows for better static checking when compiling with -DNDEBUG.
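A sketch of the difference, assuming Gnulib's affirm behaves like
assert without NDEBUG and becomes a compiler assumption with it.
MY_AFFIRM and last_byte are hypothetical names used only for this
illustration, and the fallback branch is just a placeholder for
compilers without __builtin_unreachable.

```c
#include <assert.h>

/* Hypothetical stand-in for affirm.  Unlike assert, the condition
   stays visible to the compiler when NDEBUG is defined.  */
#ifndef NDEBUG
# define MY_AFFIRM(e) assert (e)
#elif defined __GNUC__
# define MY_AFFIRM(e) ((e) ? (void) 0 : __builtin_unreachable ())
#else
# define MY_AFFIRM(e) ((void) (0 && (e)))
#endif

/* Return the last byte of BUF, which must hold LEN > 0 bytes.  */
static int
last_byte (unsigned char const *buf, int len)
{
  MY_AFFIRM (0 < len);
  return buf[len - 1];
}
```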
|
|
* src/cat.c (ensure_buf_size): Affirm we won't return NULL.
|
|
* src/cat.c (main): Only resize the allocated buffer when needed,
which avoids per file heap manipulation and mmap/munmap syscalls.
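A minimal sketch of a grow-only buffer, under the assumption that the
buffer is reused across input files. struct grow_buf and ensure_size
are hypothetical names, not the ones used in src/cat.c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical reusable buffer.  */
struct grow_buf
{
  char *data;
  size_t size;
};

/* Ensure B holds at least NEEDED bytes.  Reallocate only when the
   current allocation is too small, so repeated calls with the same
   or a smaller size do no heap work.  Return false on allocation
   failure, leaving B unchanged.  */
static bool
ensure_size (struct grow_buf *b, size_t needed)
{
  if (needed <= b->size)
    return true;
  char *p = realloc (b->data, needed);
  if (!p)
    return false;
  b->data = p;
  b->size = needed;
  return true;
}
```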
|
|
* src/cat.c (splice_cat): Ensure we don't retry a read() after
splice() completes, as this is significant on a tty.
|
|
* src/touch.c (main): Use timespec_cmp instead of comparing each member
of the timespec.
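For illustration, a comparison of this kind can be sketched as below.
ts_cmp is a hypothetical stand-in, not Gnulib's timespec_cmp, and it
assumes both values are normalized (tv_nsec in 0..999999999).

```c
#include <time.h>

/* Hypothetical stand-in: return a negative, zero, or positive value
   if A is earlier than, equal to, or later than B.  */
static int
ts_cmp (struct timespec a, struct timespec b)
{
  if (a.tv_sec != b.tv_sec)
    return a.tv_sec < b.tv_sec ? -1 : 1;
  return (a.tv_nsec > b.tv_nsec) - (a.tv_nsec < b.tv_nsec);
}
```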
|
|
Seen with GCC 15.2.1 and GLIBC 2.43 on Arch.
Not seen with GCC 15.2.1 and GLIBC 2.42 on Fedora.
* src/cut.c (search_bytes): Cast the return from memchr()
to avoid const propagation.
(find_field_delim): Adjust the return from strstr() similarly.
https://github.com/coreutils/coreutils/issues/244
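A sketch of the casting pattern, assuming the diagnostic arises because
a newer C library follows C23 in returning a const-qualified pointer
from memchr() when given a const-qualified argument. find_byte is a
hypothetical name, not the search_bytes function in src/cut.c.

```c
#include <string.h>

/* Hypothetical helper: find C in the first N bytes of BUF.
   The explicit cast gives the result a fixed type whether or not
   memchr preserves the const qualifier of its argument.  */
static char *
find_byte (char const *buf, int c, size_t n)
{
  return (char *) memchr (buf, c, n);
}
```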
|
|
On an AMD Ryzen 7 3700X system:
$ timeout 10 taskset 1 ./src/cat-prev /dev/zero \
| taskset 2 pv -r > /dev/null
[1.67GiB/s]
$ timeout 10 taskset 1 ./src/cat /dev/zero \
| taskset 2 pv -r > /dev/null
[9.03GiB/s]
On a Power10 system:
$ taskset 1 ./src/yes | timeout 10 taskset 2 ./src/cat-prev \
| taskset 3 pv -r > /dev/null
[12.9GiB/s]
$ taskset 1 ./src/yes | timeout 10 taskset 2 ./src/cat \
| taskset 3 pv -r > /dev/null
[81.8GiB/s]
* NEWS: Mention the improvement.
* src/cat.c: Include isapipe.h, splice.h, and unistd--.h.
(splice_cat): New function.
(main): Use it.
* src/local.mk (noinst_HEADERS): Add src/splice.h.
* src/splice.h: New file, based on definitions from src/yes.c.
* src/yes.c: Include splice.h.
(pipe_splice_size): Use increase_pipe_size from src/splice.h.
(SPLICE_PIPE_SIZE): Remove definition, moved to src/splice.h.
* tests/cat/splice.sh: New file, based on some tests in
tests/misc/yes.sh.
* tests/local.mk (all_tests): Add the new test.
|
|
* src/cut.c (main): Add curly brackets around variable
declaration in case label.
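For illustration, before C23 a label must be followed by a statement,
and a declaration is not a statement, so some compilers reject a
declaration directly after a case label. A minimal sketch of the
portable form, with the hypothetical function parse_opt:

```c
/* Hypothetical example: the braces open a compound statement, so the
   declaration of DOUBLED is valid in every C dialect.  */
static int
parse_opt (int opt, int arg)
{
  int total = 0;
  switch (opt)
    {
    case 'a':
      {
        int doubled = 2 * arg;
        total = doubled;
      }
      break;

    default:
      total = -1;
      break;
    }
  return total;
}
```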
Reported by Bruno Haible.
|
|
* src/system.h (c32issep): Avoid unnecessary ‘!!’.
|
|
* src/cut.c: Use mcel_scanz() to parse in all cases,
and avoid redundant storage of delimiter_length and
the single byte delim.
|
|
* src/cut.c (cut_fields_bytesearch): Ensure up to delim_bytes - 1
bytes are left for the next refill.
* tests/cut/cut.pl: Add a test case.
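A simplified sketch of why the tail must be kept, not the code in
cut_fields_bytesearch: count_delims and BUF_SIZE are hypothetical, and
the input is a memory block standing in for successive reads. A
delimiter split across two refills is only found if the unsearched
tail, at most DLEN - 1 bytes, is moved to the start of the buffer.

```c
#include <stddef.h>
#include <string.h>

enum { BUF_SIZE = 8 };   /* tiny on purpose, to force refills */

/* Count non-overlapping occurrences of DELIM (DLEN bytes, with
   1 <= DLEN <= BUF_SIZE / 2) in the N bytes at SRC, looking at the
   input only through a small buffer that is refilled repeatedly.  */
static size_t
count_delims (char const *src, size_t n, char const *delim, size_t dlen)
{
  char buf[BUF_SIZE];
  size_t have = 0;   /* bytes currently in BUF */
  size_t count = 0;

  for (;;)
    {
      /* Refill after any bytes carried over from the last round.  */
      size_t room = BUF_SIZE - have;
      size_t take = room < n ? room : n;
      memcpy (buf + have, src, take);
      src += take;
      n -= take;
      have += take;

      /* Test every position where a whole delimiter still fits.  */
      size_t i = 0;
      while (i + dlen <= have)
        if (memcmp (buf + i, delim, dlen) == 0)
          {
            count++;
            i += dlen;
          }
        else
          i++;

      if (n == 0)
        break;

      /* Carry the unsearched tail (fewer than DLEN bytes) forward.  */
      have -= i;
      memmove (buf, buf + i, have);
    }
  return count;
}
```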
|
|
* gl/lib/mbbuf.h: Adjust mbbuf_fill() to process full characters
in the slop at the end of a read(). Previously valid characters
in the last MCEL_LEN_MAX bytes were ignored until the next read().
* src/cut.c (cut_fields_bytesearch): Adjust to the new naming.
* NEWS: Mention the fold(1) responsiveness fix, which was
improved with the change from fread() to read(),
and completed with this patch.
|
|
$ time LC_ALL=C src/cut-before -b1 sl.in >/dev/null
real 0m0.115s
$ time LC_ALL=C src/cut-after -b1 sl.in >/dev/null
real 0m0.076s
* src/cut.c (cut_bytes): Hoist the fileno() invariant outside the loop.
Avoid memchr for very short lines.
(search_bytes): Similar to copy_bytes() and write_bytes() helpers.
Note adding code to probe 3 or 4 bytes resulted in worse register
allocation. I.e. slower operation even if the input was only 2 bytes.
|
|
With field delimiter = line delimiter we need to know
if there is any more data to be read, as field delimiter
in the last byte of the file is treated differently.
So reiterate the loop to ensure enough read()s to make
the appropriate determination.
|
|
* gl/lib/mbbuf.h (fill_buf): Switch from fread() to read()
as the former retries read() internally to fill the buffer.
* src/cut.c: Adjust accordingly, and avoid getc() interface entirely.
* bootstrap.h: Depend explicitly on fseterr. This is already depended
on transitively, so should not introduce new build portability issues.
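A minimal sketch of the behavior relied on here: a single read() may
return fewer bytes than requested, for example whatever is currently
in a pipe, whereas fread() keeps reading until the buffer is full or
EOF is reached. read_once is a hypothetical name, not fill_buf itself.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: issue one read(), retrying only when it was
   interrupted by a signal.  A short count is returned as is, so the
   caller can process available data without waiting for more.  */
static ssize_t
read_once (int fd, void *buf, size_t size)
{
  ssize_t n;
  do
    n = read (fd, buf, size);
  while (n < 0 && errno == EINTR);
  return n;
}
```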
|
|
* src/cut.c: We're not reading a line, rather a buffer of bytes.
Suggested by Collin Funk.
|
|
* bootstrap.conf: Remove now unused getndelim2, add memchr2.
* src/cut.c: Remove now unused getndelim2.h.
|
|
per character based so merge.
|
|
* src/cut.c: Document some functions, and remove extraneous
abstractions.
|
|
* src/cut.c (cut_fields_bytesearch): Just skip the data with -s.
|
|
* src/cut.c: Hoist at_eof into context so we're not
querying it multiple times. Also add a helper
to explicitly init bytesearch_context.
|
|
* src/cut.c (bytesearch_field_delim_ok): Expand the range
of bytes that can be simply searched for. 0xF5-0xFF can't
appear in valid UTF-8 characters, and so may be used as
delimiters in UTF-8 input, so it's worth optimizing for.
* tests/cut/cut.pl: Add a test case (mainly as documentation).
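A sketch of such a check, assuming ASCII bytes were already eligible
before this change. byte_is_simple_delim is a hypothetical name, not
bytesearch_field_delim_ok. Neither range can occur inside a valid
multi-byte UTF-8 sequence, so a plain byte search cannot match in the
middle of a character.

```c
#include <stdbool.h>

/* Hypothetical check: true if byte B can be searched for directly in
   UTF-8 input.  ASCII bytes only ever encode themselves, and
   0xF5..0xFF never appear in valid UTF-8.  */
static bool
byte_is_simple_delim (unsigned char b)
{
  return b < 0x80 || 0xF5 <= b;
}
```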
|
|
We can only byte search with unibyte or UTF-8 charsets.
UTF-8 implicitly can't false match a delimiter at a buffer boundary.
So don't worry about finding the exact UTF-8 boundary at the end of the
buffer; rather, just ensure the buffer always starts with a valid character
(by moving MCEL_LEN_MAX-1 bytes to the start of the buffer on each refill).
|
|
* src/cut.c: Move from here.
* src/numfmt.c: Likewise.
* src/system.h: To here.
|
|
$ time src/cut-before -f10 -w ll.in >/dev/null
real 0m4.309s
$ time src/cut-after -f10 -w ll.in >/dev/null
real 0m3.136s
* src/cut.c (cut_bytes): Add a new helper that avoids
the memcpy call in the common case of adding characters to a buffer.
|
|
Simplify and optimize field exhaustion logic:
$ time LC_ALL=C src/cut-before -f1 -dc as.in >/dev/null
real 0m0.057s
$ time LC_ALL=C src/cut-after -f1 -dc as.in >/dev/null
real 0m0.023s
* src/cut.c (cut_fields_bytesearch): Refactor.
|
|
Refactor line delimiter output,
and resetting of parse record state.
|
|
1. Removed !have_pending_line from the fast path condition.
This is safe because:
- field_1_n_bytes == 0 already ensures we haven't started
buffering field 1 content
- The fast path correctly continues any pending partial line
by writing raw bytes including the completing \n
2. Added have_pending_line = false after the fast path write,
since all lines up to the last \n are now complete.
$ time src/cut.before -f1 -dç sl.in >/dev/null
real 0m0.081s
$ time src/cut.after -f1 -dç sl.in >/dev/null
real 0m0.012s
$ time src/cut.before -f10 -dç sl.in >/dev/null
real 0m0.081s
$ time src/cut.after -f10 -dç sl.in >/dev/null
real 0m0.012s
|
|
$ time src/cut.before -f1 -dç sl.in >/dev/null
real 0m0.157s
$ time src/cut.after -f1 -dç sl.in >/dev/null
real 0m0.084s
|
|
For a 40% performance increase it's worth reinstating the simple
original cut_bytes() which avoids data copying and function calls.
Once a longer line is encountered we defer to the buffered variant.
$ time src/cut.before -b2 sl.in >/dev/null
real 0m0.101s
$ time src/cut.after -b2 sl.in >/dev/null
real 0m0.060s
|
|
Use specialized loops rather than branching per character,
giving a 28% increase.
$ time src/cut -f1 -w ll.in >/dev/null
real 0m7.199s
$ time src/cut -f1 -w ll.in >/dev/null
real 0m5.204s
|
|
12% perf increase with:
$ time src/cut -f2 -w ll.in >/dev/null
real 0m6.469s
$ time src/cut -f2 -w ll.in >/dev/null
real 0m5.689s
|
|
This is quite significant:
$ yes abcdfeg | head -n1MB > big-file
$ time src/cut-before -b1,3 big-file >/dev/null
real 0m0.050s
$ time src/cut-after -b1,3 big-file >/dev/null
real 0m0.029s
|
|
Always memchr(line_delim) which is fast and allows:
- skipping whole segments when the next selected byte is beyond them
- skipping unselected prefixes in bulk
- writing contiguous selected spans in bulk
This wins for lines >= 4 characters,
but is slower for lines <= 3 characters, especially if selecting bytes 1-3.
That is unusual though.
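A simplified sketch of the approach for a single byte range, not the
cut_bytes implementation: select_span is a hypothetical name, and
unlike cut it does not add a newline to an unterminated last line.
Each line costs one memchr plus at most one memcpy.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: for each line in the N bytes at BUF, copy
   bytes FIRST..LAST (1-based, inclusive, 1 <= FIRST <= LAST) and the
   line's newline to OUT, which must have room for N bytes.
   Return the number of bytes written.  */
static size_t
select_span (char const *buf, size_t n, size_t first, size_t last,
             char *out)
{
  size_t used = 0;
  char const *p = buf;
  char const *end = buf + n;

  while (p < end)
    {
      char const *nl = memchr (p, '\n', end - p);
      size_t len = (nl ? nl : end) - p;   /* length without newline */

      /* Lines shorter than FIRST are skipped without inspection.  */
      if (first <= len)
        {
          size_t span = (last < len ? last : len) - (first - 1);
          memcpy (out + used, p + first - 1, span);   /* bulk copy */
          used += span;
        }
      if (nl)
        out[used++] = '\n';
      p += len + (nl != NULL);
    }
  return used;
}
```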
|
|
This is about 20x faster.
Note we only do the delimiter search once per chunk,
and it's usually quick as delimiters wouldn't be too far
into a chunk if present, so we don't bother
to cache the found delimiter.
|
|
* src/cut.c: Limit search to SPACE and TAB.
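A portable sketch of a two-byte search of this kind. find_blank is a
hypothetical name; an optimized routine that scans for two bytes at
once would do the same job faster.

```c
#include <stddef.h>

/* Hypothetical helper: return a pointer to the first SPACE or TAB in
   the N bytes at S, or NULL if there is none.  */
static char const *
find_blank (char const *s, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (s[i] == ' ' || s[i] == '\t')
      return s + i;
  return NULL;
}
```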
|
|
* src/cut.c (usage): Mention blank characters are used to separate.
* doc/coreutils.texi (cut invocation): Likewise. Also describe
the 'trimmed' argument and the relation to -F.
|
|
Allows better/simpler avoidance of repeated line/delim scans.
TODO: speed up our really slow cut_fields_mb_any.
Compare for example:
$ time src/cut -w -f1 ll.in >/dev/null #14s
$ time src/cut -d, -f1 ll.in >/dev/null #.1s
Could adjust so that LC_ALL=C does memchr2(space, tab)?
|
|
TODO: Refactor all this into find_bytesearch_field_terminator.
Also handle in the delim_length==1 case.
|
|
TODO: Perhaps also add search only fields mode
to avoid rescans of very long lines
|
|
TODO: simplify and compare perf
|
|
* TODO: perf comparison
|