path: root/xdiff
Age | Commit message | Author | Lines
2026-02-20 | Merge branch 'pw/diff-anchored-optim' | Junio C Hamano | -12/+6
"git diff --anchored=<text>" has been optimized.
* pw/diff-anchored-optim:
  diff --anchored: avoid checking unmatched lines
2026-02-20 | Merge branch 'pw/xdiff-cleanups' | Junio C Hamano | -9/+12
Small clean-up of xdiff library to remove unnecessary data duplication.
* pw/xdiff-cleanups:
  xdiff: remove unused data from xdlclass_t
  xdiff: remove "line_hash" field from xrecord_t
2026-02-12 | diff --anchored: avoid checking unmatched lines | Phillip Wood | -12/+6
For a line to be an anchor it has to appear in each of the files being diffed exactly once. With that in mind let's delay checking whether a line is an anchor until we know there is exactly one instance of the line in each file. As each line is checked at most once, there is no need to cache the result of is_anchor() and we can drop that field from the hashmap entries.

When diffing 5000 recent commits in git.git this gives a modest speedup of ~2%. In the (rather extreme) example below that consists largely of deletions the speedup is ~16%.

    seq 0 10000000 >old
    printf '%s\n' 300000 100000 200000 >new
    git diff --no-index --anchored=300000 old new

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
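The ordering described above can be sketched as follows. This is a minimal illustration, not the code in xpatience.c: `struct line_entry`, `prefix_checks`, `starts_with_anchor()` and `is_usable_anchor()` are hypothetical names, and the per-line counts are assumed to have been collected already.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-line bookkeeping; not the real hashmap entry. */
struct line_entry {
	const char *text;
	size_t count_a; /* occurrences in the first file */
	size_t count_b; /* occurrences in the second file */
};

/* Counts how often the prefix comparison actually runs. */
static size_t prefix_checks;

/* A line matches an anchor if it starts with the anchor text. */
static bool starts_with_anchor(const char *line, const char *anchor)
{
	prefix_checks++;
	return strncmp(line, anchor, strlen(anchor)) == 0;
}

/*
 * Only lines that occur exactly once in each file can be anchors, so
 * test uniqueness first and skip the prefix comparison otherwise.
 */
static bool is_usable_anchor(const struct line_entry *e, const char *anchor)
{
	if (e->count_a != 1 || e->count_b != 1)
		return false;
	return starts_with_anchor(e->text, anchor);
}
```

Because each unique line reaches the prefix comparison at most once, there is nothing left to cache, which is the point the commit message makes.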
2026-01-26 | xdiff: remove unused data from xdlclass_t | Phillip Wood | -3/+5
Prior to commit 6d507bd41a (xdiff: delete fields ha, line, size in xdlclass_t in favor of an xrecord_t, 2025-09-26) xdlclass_t carried a copy of all the fields in xrecord_t. That commit embedded xrecord_t in xdlclass_t to make it easier to change the types of the fields in xrecord_t. However commit 6a26019c81 (xdiff: split xrecord_t.ha into line_hash and minimal_perfect_hash, 2025-11-18) added the "minimal_perfect_hash" field to xrecord_t which is not used by xdlclass_t. To avoid wasting space stop copying the whole of xrecord_t and just copy the pointer and length that we need to intern the line. Together with the previous commit this effectively reverts 6d507bd41a. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-26 | xdiff: remove "line_hash" field from xrecord_t | Phillip Wood | -6/+7
Prior to commit 6a26019c81 (xdiff: split xrecord_t.ha into line_hash and minimal_perfect_hash, 2025-11-18) the "ha" field of xrecord_t initially held the "line_hash" value and once the line had been interned that field was updated to hold the "minimal_perfect_hash". The "line_hash" is only used to intern the line so there is no point in storing it after all the input lines have been interned.

Removing the "line_hash" field from xrecord_t and storing it in xdlclass_t where it is actually used makes it clearer that it is a temporary value and it should not be used once we've calculated the "minimal_perfect_hash". This also reduces the size of xrecord_t by 25% on 64-bit platforms and 40% on 32-bit platforms. While the struct is small we create one instance per input line so any saving is welcome.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
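The size saving can be checked with a small sketch. The two layouts below are illustrative only: the field names follow the commit messages, but the field order and the struct names `record_before` and `record_after` are assumptions, not the actual definitions in xdiff.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative layout with the hash still stored per record. */
struct record_before {
	const uint8_t *ptr;
	size_t size;
	uint64_t line_hash;
	size_t minimal_perfect_hash;
};

/* Illustrative layout after moving line_hash elsewhere. */
struct record_after {
	const uint8_t *ptr;
	size_t size;
	size_t minimal_perfect_hash;
};

/* Bytes saved per input line by dropping line_hash. */
static size_t bytes_saved_per_line(void)
{
	return sizeof(struct record_before) - sizeof(struct record_after);
}
```

On a typical LP64 target the two structs occupy 32 and 24 bytes, which matches the 25% figure. On a 32-bit target where uint64_t only needs 4-byte alignment they would occupy 20 and 12 bytes, matching the 40% figure; other ABIs may pad differently.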
2025-12-09 | Merge branch 'yc/xdiff-patience-optim' | Junio C Hamano | -1/+4
The way patience diff finds LCS has been optimized.
* yc/xdiff-patience-optim:
  xdiff: optimize patience diff's LCS search
2025-12-05 | Merge branch 'en/xdiff-cleanup-2' | Junio C Hamano | -109/+109
Code clean-up.
* en/xdiff-cleanup-2:
  xdiff: rename rindex -> reference_index
  xdiff: change rindex from long to size_t in xdfile_t
  xdiff: make xdfile_t.nreff a size_t instead of long
  xdiff: make xdfile_t.nrec a size_t instead of long
  xdiff: split xrecord_t.ha into line_hash and minimal_perfect_hash
  xdiff: use unambiguous types in xdl_hash_record()
  xdiff: use size_t for xrecord_t.size
  xdiff: make xrecord_t.ptr a uint8_t instead of char
  xdiff: use ptrdiff_t for dstart/dend
  doc: define unambiguous type mappings across C and Rust
2025-11-27 | xdiff: optimize patience diff's LCS search | Yee Cheng Chin | -1/+4
The find_longest_common_sequence() function in patience diff is inefficient as it calls binary_search() for every unique line it encounters when deciding where to put it in the sequence. From instrumentation (using xctrace) on popular repositories, binary_search() takes up 50-60% of the run time within patience_diff() when performing a diff.

To optimize this, add a boundary condition check before binary_search() is called to see if the encountered unique line is located after the entire currently tracked longest subsequence. If so, skip the unnecessary binary search and simply append the entry to the end of sequence. Given that most files compared in a diff are usually quite similar to each other, this condition is very common, and should be hit much more frequently than the binary search.

Below are some end-to-end performance results by timing `git log --shortstat --oneline -500 --patience` on different repositories with the old and new code. Generally speaking this seems to give at least 8-10% speed up. The "binary search hit %" column describes how often the algorithm enters the binary search path instead of the new faster path. Even in the WebKit case we can see that it's quite rare (1.46%).

| Repo     | Speed difference | binary search hit % |
|----------|------------------|---------------------|
| vim      | 1.27x            | 0.01%               |
| pytorch  | 1.16x            | 0.02%               |
| cpython  | 1.14x            | 0.06%               |
| ripgrep  | 1.14x            | 0.03%               |
| git      | 1.13x            | 0.12%               |
| vscode   | 1.09x            | 0.10%               |
| WebKit   | 1.08x            | 1.46%               |

The benchmarks were done using hyperfine, on an Apple M1 Max laptop, with git compiled with `-O3 -flto`.

Signed-off-by: Yee Cheng Chin <ychin.git@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
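The boundary check can be illustrated with a generic patience-style longest increasing subsequence routine. This is a sketch of the idea only, not the code in xpatience.c: `lis_length()`, `tails` and `slow_path_hits` are hypothetical names, and the real function tracks hashmap entries rather than plain integers.

```c
#include <stddef.h>

/* Number of times the slow path (binary search) was taken. */
static size_t slow_path_hits;

/*
 * Length of the longest strictly increasing subsequence of pos[0..n-1].
 * tails[k] holds the smallest value that ends an increasing subsequence
 * of length k + 1; the caller provides room for n entries.
 */
static size_t lis_length(const long *pos, size_t n, long *tails)
{
	size_t len = 0;

	for (size_t i = 0; i < n; i++) {
		size_t lo = 0, hi = len;

		/* Fast path: the value extends the longest sequence so far. */
		if (len == 0 || pos[i] > tails[len - 1]) {
			tails[len++] = pos[i];
			continue;
		}

		/* Slow path: find the first tail that is >= pos[i]. */
		slow_path_hits++;
		while (lo < hi) {
			size_t mid = lo + (hi - lo) / 2;
			if (tails[mid] < pos[i])
				lo = mid + 1;
			else
				hi = mid;
		}
		tails[lo] = pos[i];
	}
	return len;
}
```

For an already sorted input the slow path is never entered, which mirrors the low "binary search hit %" values in the table for files that are mostly similar.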
2025-11-18 | xdiff: rename rindex -> reference_index | Ezekiel Newren | -9/+9
The classic diff adds only the lines that it's going to consider, during the diff, to an array. A mapping between the compacted array, and the lines of the file that they reference, is facilitated by this array. Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-18 | xdiff: change rindex from long to size_t in xdfile_t | Ezekiel Newren | -1/+1
The field rindex describes an index offset for other arrays. Change it to size_t. Changing the type of rindex from long to size_t has no cascading refactor impact because it is only ever used to directly index other arrays. Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-18 | xdiff: make xdfile_t.nreff a size_t instead of long | Ezekiel Newren | -8/+8
size_t is used because nreff describes the number of elements in memory for rindex. Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-18 | xdiff: make xdfile_t.nrec a size_t instead of long | Ezekiel Newren | -26/+26
size_t is used because nrec describes the number of elements for both recs, and for 'changed' + 2. Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-18 | xdiff: split xrecord_t.ha into line_hash and minimal_perfect_hash | Ezekiel Newren | -20/+21
The ha field is serving two different purposes, which makes the code harder to read. At first glance, it looks like many places assume there could never be hash collisions between lines of the two input files. In reality, line_hash is used together with xdl_recmatch() to ensure correct comparisons of lines, even when collisions occur.

To make this clearer, the old ha field has been split:

  * line_hash: a straightforward hash of a line, independent of any external context. Its type is uint64_t, as it comes from a fixed width hash function.
  * minimal_perfect_hash: Not a new concept, but now a separate field. It comes from the classifier's general-purpose hash table, which assigns each line a unique and minimal hash across the two files. A size_t is used here because it's meant to be used to index an array. This also avoids ` as usize` casts on the Rust side when using it to index a slice.

Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
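The relationship between the two values can be sketched with a toy classifier. Everything here is an assumption made for illustration: `intern_line()`, `hash_line()`, `struct line_class` and the fixed-size table are hypothetical, and a linear scan stands in for the real hash table.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_CLASSES 64 /* arbitrary capacity for this sketch */

struct line_class {
	const char *ptr;
	size_t size;
	uint64_t line_hash;
};

static struct line_class classes[MAX_CLASSES];
static size_t nr_classes;

/* Any fixed-width hash will do here; this is a plain djb2 variant. */
static uint64_t hash_line(const char *ptr, size_t size)
{
	uint64_t h = 5381;

	for (size_t i = 0; i < size; i++)
		h = h * 33 + (uint8_t)ptr[i];
	return h;
}

/*
 * Return a dense id for the line, equal for equal lines across both
 * files. The hash only narrows the candidates; memcmp() decides, so a
 * hash collision cannot make two different lines share an id.
 * Returns SIZE_MAX when the table is full.
 */
static size_t intern_line(const char *ptr, size_t size)
{
	uint64_t h = hash_line(ptr, size);

	for (size_t id = 0; id < nr_classes; id++) {
		const struct line_class *c = &classes[id];

		if (c->line_hash == h && c->size == size &&
		    !memcmp(c->ptr, ptr, size))
			return id;
	}
	if (nr_classes == MAX_CLASSES)
		return SIZE_MAX;
	classes[nr_classes] = (struct line_class){ ptr, size, h };
	return nr_classes++;
}
```

The returned id plays the role of minimal_perfect_hash: it is small, dense and directly usable as an array index, while the 64-bit hash is only needed during interning.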
2025-11-18 | xdiff: use unambiguous types in xdl_hash_record() | Ezekiel Newren | -20/+20
Convert the function signature and body to use unambiguous types. char is changed to uint8_t because this function processes bytes in memory. unsigned long is changed to uint64_t so that the hash output is consistent across platforms. `flags` was changed from long to uint64_t to ensure the high order bits are not dropped on platforms that treat long as 32 bits.

Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-18 | xdiff: use size_t for xrecord_t.size | Ezekiel Newren | -20/+19
size_t is the appropriate type because size is describing the number of elements, bytes in this case, in memory. Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-18 | xdiff: make xrecord_t.ptr a uint8_t instead of char | Ezekiel Newren | -21/+21
Make xrecord_t.ptr uint8_t because it's referring to bytes in memory. In order to avoid a refactor avalanche, many uses of this field were cast to char* or similar.

Places where casting was unnecessary:

    xemit.c:156
    xmerge.c:124
    xmerge.c:127
    xmerge.c:164
    xmerge.c:169
    xmerge.c:172
    xmerge.c:178

Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-18 | xdiff: use ptrdiff_t for dstart/dend | Ezekiel Newren | -1/+1
ptrdiff_t is appropriate for dstart and dend because they both describe positive or negative offsets relative to a pointer. Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-17 | xdiff: add 'minimal' to XDF_DIFF_ALGORITHM_MASK | Antonin Delpeuch | -1/+1
The XDF_DIFF_ALGORITHM_MASK bit mask only includes bits for the patience and histogram diffs, not for the minimal one. This means that when resetting the diff algorithm to the default one, one needs to separately clear the bit for the minimal diff. There are places in the code that fail to do that: merge-ort.c and builtin/merge-file.c.

Add the XDF_NEED_MINIMAL bit to the bit mask, and remove the separate clearing of this bit in the places where it hasn't been forgotten.

Signed-off-by: Antonin Delpeuch <antonin@delpeuch.eu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
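The effect of widening the mask can be shown with a small sketch. The `EX_*` names and bit positions below are hypothetical stand-ins chosen for illustration; the real flag values live in xdiff.h.

```c
#include <stdint.h>

/* Hypothetical flag values; the real ones are defined in xdiff.h. */
#define EX_NEED_MINIMAL   (UINT64_C(1) << 0)
#define EX_PATIENCE_DIFF  (UINT64_C(1) << 1)
#define EX_HISTOGRAM_DIFF (UINT64_C(1) << 2)
#define EX_IGNORE_SPACE   (UINT64_C(1) << 3) /* unrelated option bit */

/* With the minimal bit in the mask, one operation resets the algorithm. */
#define EX_ALGORITHM_MASK \
	(EX_NEED_MINIMAL | EX_PATIENCE_DIFF | EX_HISTOGRAM_DIFF)

/* Clear every algorithm-selection bit and leave other options alone. */
static uint64_t reset_algorithm(uint64_t flags)
{
	return flags & ~EX_ALGORITHM_MASK;
}
```

A caller that resets through the mask no longer has to remember a separate clear for the minimal bit, which is the class of omission the commit fixes.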
2025-10-14 | Merge branch 'en/xdiff-cleanup' | Junio C Hamano | -337/+269
A lot of code clean-up of xdiff. Split out of a larger topic.
* en/xdiff-cleanup:
  xdiff: change type of xdfile_t.changed from char to bool
  xdiff: add macros DISCARD(0), KEEP(1), INVESTIGATE(2) in xprepare.c
  xdiff: rename rchg -> changed in xdfile_t
  xdiff: delete chastore from xdfile_t
  xdiff: delete fields ha, line, size in xdlclass_t in favor of an xrecord_t
  xdiff: delete redundant array xdfile_t.ha
  xdiff: delete struct diffdata_t
  xdiff: delete local variables that alias fields in xrecord_t
  xdiff: delete superfluous function xdl_get_rec() in xemit
  xdiff: delete unnecessary fields from xrecord_t and xdfile_t
  xdiff: delete local variables and initialize/free xdfile_t directly
  xdiff: delete static forward declarations in xprepare
2025-10-03 | xdiff: change type of xdfile_t.changed from char to bool | Ezekiel Newren | -22/+22
The only values possible for 'changed' are 1 and 0, which map exactly to a bool type. It might not look like this because action1 and action2 (which used to be dis1 and dis2) were also of type char and were assigned numerical values within a few lines of 'changed' (what used to be rchg).

Using DISCARD/KEEP/INVESTIGATE for action1[i]/action2[j], and true/false for changed[k] makes it clear to future readers that these are logically separate concepts.

Best-viewed-with: --color-words
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-03 | xdiff: add macros DISCARD(0), KEEP(1), INVESTIGATE(2) in xprepare.c | Ezekiel Newren | -37/+69
This commit is refactor-only; no behavior is changed. A future commit will use bool literals for changed[i].

The functions xdl_clean_mmatch() and xdl_cleanup_records() will be cleaned up more in a future patch series. The changes to xdl_cleanup_records(), in this patch, are just to make it clear why `char rchg` is refactored to `bool changed`.

Rename dis* to action* and replace numeric literals with macros. The old names came from when dis* (which I think was short for discard) was treated like a boolean, but over time it grew into a ternary state machine. The result was confusing because dis* and rchg* both used 0/1 values with different meanings.

The new names and macros make the states explicit. nm is short for number of matches, and mlim is a heuristic limit:

    nm == 0        -> action[i] = DISCARD     -> changed[i] = true
    0 < nm < mlim  -> action[i] = KEEP        -> changed[i] = false
    nm >= mlim     -> action[i] = INVESTIGATE -> changed[i] = xdl_clean_mmatch()

When need_min is true, only DISCARD and KEEP occur because the limit is effectively infinite.

Best-viewed-with: --color-words
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
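The three-way mapping from the commit message can be written as a tiny classifier. This is a sketch under stated assumptions: `classify()` is a hypothetical helper, an enum is used here where the commit adds macros, and passing SIZE_MAX as the limit stands in for the "effectively infinite" limit used when need_min is set.

```c
#include <stddef.h>
#include <stdint.h>

/* Same values as the macros named in the commit subject. */
enum action { DISCARD = 0, KEEP = 1, INVESTIGATE = 2 };

/* nm: number of matches in the other file; mlim: heuristic limit. */
static enum action classify(size_t nm, size_t mlim)
{
	if (nm == 0)
		return DISCARD;     /* no match anywhere: the line changed */
	if (nm >= mlim)
		return INVESTIGATE; /* too many matches: decide later */
	return KEEP;                /* hand the line to the diff algorithm */
}
```

With a limit of SIZE_MAX no realistic match count reaches the INVESTIGATE branch, so only DISCARD and KEEP occur.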
2025-09-30 | xdiff: rename rchg -> changed in xdfile_t | Ezekiel Newren | -32/+32
The field rchg (now 'changed') declares if a line in a file is changed or not. A later commit will change its type from 'char' to 'bool' to make its purpose even more clear.

Best-viewed-with: --color-words
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-30 | xdiff: delete chastore from xdfile_t | Ezekiel Newren | -69/+63
xdfile_t currently uses chastore_t which is an arena allocator. I think that xrecord_t used to be a linked list and recs didn't exist originally. When recs was added, I think removing xdfile_t.next was overlooked. This dual data structure setup makes the code somewhat confusing.

Additionally the C type chastore_t isn't FFI friendly, and provides little to no performance benefit over using realloc to grow an array.

Performance impact of deleting fields from xdfile_t: Deleting ha is about 5% slower. Deleting cha is about 5% faster.

Delete ha, but keep cha

    time hyperfine --warmup 3 -L exe build_v2.51.0/git,build_delete_ha/git '{exe} log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null'

    Benchmark 1: build_v2.51.0/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null
      Time (mean ± σ): 1.269 s ± 0.017 s [User: 1.135 s, System: 0.128 s]
      Range (min … max): 1.249 s … 1.286 s 10 runs

    Benchmark 2: build_delete_ha/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null
      Time (mean ± σ): 1.339 s ± 0.017 s [User: 1.234 s, System: 0.099 s]
      Range (min … max): 1.320 s … 1.358 s 10 runs

    Summary
      build_v2.51.0/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null ran
        1.06 ± 0.02 times faster than build_delete_ha/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null

Delete cha, but keep ha

    time hyperfine --warmup 3 -L exe build_v2.51.0/git,build_delete_chastore/git '{exe} log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null'

    Benchmark 1: build_v2.51.0/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null
      Time (mean ± σ): 1.290 s ± 0.001 s [User: 1.154 s, System: 0.130 s]
      Range (min … max): 1.288 s … 1.292 s 10 runs

    Benchmark 2: build_delete_chastore/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null
      Time (mean ± σ): 1.232 s ± 0.017 s [User: 1.105 s, System: 0.121 s]
      Range (min … max): 1.205 s … 1.249 s 10 runs

    Summary
      build_delete_chastore/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null ran
        1.05 ± 0.01 times faster than build_v2.51.0/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null

Delete ha AND chastore

    time hyperfine --warmup 3 -L exe build_v2.51.0/git,build_delete_ha_and_chastore/git '{exe} log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null'

    Benchmark 1: build_v2.51.0/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null
      Time (mean ± σ): 1.291 s ± 0.002 s [User: 1.156 s, System: 0.129 s]
      Range (min … max): 1.287 s … 1.295 s 10 runs

    Benchmark 2: build_delete_ha_and_chastore/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null
      Time (mean ± σ): 1.306 s ± 0.001 s [User: 1.195 s, System: 0.105 s]
      Range (min … max): 1.305 s … 1.308 s 10 runs

    Summary
      build_v2.51.0/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null ran
        1.01 ± 0.00 times faster than build_delete_ha_and_chastore/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null

Best-viewed-with: --color-words
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-30 | xdiff: delete fields ha, line, size in xdlclass_t in favor of an xrecord_t | Ezekiel Newren | -10/+4
The fields from xdlclass_t are aliases of xrecord_t:

    xdlclass_t.line -> xrecord_t.ptr
    xdlclass_t.size -> xrecord_t.size
    xdlclass_t.ha   -> xrecord_t.ha

xdlclass_t carries a copy of the data in xrecord_t, but instead of embedding xrecord_t it duplicates the individual fields. A future commit will change the types used in xrecord_t so embed it in xdlclass_t first, so we don't have to remember to change the types here as well.

Best-viewed-with: --color-words
Helped-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-30 | xdiff: delete redundant array xdfile_t.ha | Ezekiel Newren | -21/+16
When 0 <= i < xdfile_t.nreff the following is true:

    xdfile_t.ha[i] == xdfile_t.recs[xdfile_t.rindex[i]]

Deleting the array makes the code about 5% slower. The fields rindex and ha are specific to the classic diff (myers and minimal). I plan on creating a struct for classic diff, but there's a lot of cleanup that needs to be done before that can happen and leaving ha in would make those cleanups harder to follow.

A subsequent commit will delete the chastore cha from xdfile_t. That later commit will investigate deleting ha and cha independently and together.

Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-30 | xdiff: delete struct diffdata_t | Ezekiel Newren | -33/+10
Every field in this struct is an alias for a certain field in xdfile_t.

    diffdata_t.nrec   -> xdfile_t.nreff
    diffdata_t.ha     -> xdfile_t.ha
    diffdata_t.rindex -> xdfile_t.rindex
    diffdata_t.rchg   -> xdfile_t.rchg

I think this struct existed before xdfile_t, and was kept for backward compatibility reasons. I think xdiffi should have been refactored to use the new (xdfile_t) struct, but was easier to alias it instead.

The local variables rchg* and rindex* don't shorten the lines by much, nor do they really need to be there to make the code more readable. Delete them.

Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-30 | xdiff: delete local variables that alias fields in xrecord_t | Ezekiel Newren | -16/+13
Use the type xrecord_t as the local variable for the functions in the file xdiff/xemit.c. Most places directly reference the fields inside of this struct, doing that here makes it more consistent with the rest of the code. Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-30 | xdiff: delete superfluous function xdl_get_rec() in xemit | Ezekiel Newren | -16/+7
When xrecord_t was a linked list, and recs didn't exist, I assume this function walked the list until it found the right record. Accessing a contiguous array is so trivial that this function is now superfluous. Delete it. Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-26 | xdiff: delete unnecessary fields from xrecord_t and xdfile_t | Ezekiel Newren | -16/+2
xrecord_t.next, xdfile_t.hbits, xdfile_t.rhash are initialized, but never used for anything by the code. Remove them. Best-viewed-with: --color-words Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-26 | xdiff: delete local variables and initialize/free xdfile_t directly | Ezekiel Newren | -48/+30
These local variables are essentially a hand-rolled additional implementation of xdl_free_ctx() inlined into xdl_prepare_ctx(). Modify the code to use the existing xdl_free_ctx() function so there aren't two ways to free such variables. Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-26 | xdiff: delete static forward declarations in xprepare | Ezekiel Newren | -66/+50
Move xdl_prepare_env() later in the file to avoid the need for static forward declarations. Best-viewed-with: --color-moved Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-18 | xdiff: optimize xdl_hash_record_verbatim | Alexander Monakov | -4/+55
xdl_hash_record_verbatim uses modified djb2 hash with XOR instead of ADD for combining. The ADD-based variant is used as the basis of the modern ("GNU") symbol lookup scheme in ELF. Glibc dynamic loader received an optimized version of this hash function thanks to Noah Goldstein [1].

Switch xdl_hash_record_verbatim to additive hashing and implement an optimized loop following the scheme suggested by Noah.

Timing 'git log --oneline --shortstat v2.0.0..v2.5.0' under perf, I got

    version | cycles, bn | instructions, bn
    ---------------------------------------
    A         6.38         11.3
    B         6.21         10.89
    C         5.80         9.95
    D         5.83         8.74
    ---------------------------------------

    A: baseline (git master at e4ef0485fd78)
    B: plus 'xdiff: refactor xdl_hash_record()'
    C: and plus this patch
    D: with 'xdiff: use xxhash' by Phillip Wood

The resulting speedup for xdl_hash_record_verbatim itself is about 1.5x.

[1] https://inbox.sourceware.org/libc-alpha/20220519221803.57957-6-goldstein.w.n@gmail.com/

Signed-off-by: Alexander Monakov <amonakov@ispras.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
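The difference between the two combining steps can be sketched as below. These are textbook djb2 loops written for illustration, not the code from this commit, and the two-bytes-per-step variant is only one way additive hashing can be restructured; it is not claimed to be the scheme referenced in [1]. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* XOR-combining variant: h = (h * 33) ^ c. */
static uint64_t djb2_xor(const uint8_t *p, size_t n)
{
	uint64_t h = 5381;

	for (size_t i = 0; i < n; i++)
		h = (h * 33) ^ p[i];
	return h;
}

/* Additive variant: h = h * 33 + c. */
static uint64_t djb2_add(const uint8_t *p, size_t n)
{
	uint64_t h = 5381;

	for (size_t i = 0; i < n; i++)
		h = h * 33 + p[i];
	return h;
}

/*
 * The additive form is a polynomial in 33, so two steps fold into one:
 * (h * 33 + a) * 33 + b == h * 1089 + a * 33 + b (modulo 2^64).
 * This shortens the dependency chain between loop iterations.
 */
static uint64_t djb2_add_by2(const uint8_t *p, size_t n)
{
	uint64_t h = 5381;
	size_t i = 0;

	for (; i + 2 <= n; i += 2)
		h = h * (33 * 33) + (uint64_t)p[i] * 33 + p[i + 1];
	if (i < n)
		h = h * 33 + p[i];
	return h;
}
```

The XOR variant has no such algebraic identity, because XOR does not distribute over the multiplication, which is why switching to addition opens the door to this kind of loop restructuring.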
2025-07-28 | xdiff: refactor xdl_hash_record() | Phillip Wood | -6/+11
Inline the check for whitespace flags so that the compiler can hoist it out of the loop in xdl_prepare_ctx(). This improves the performance by 8%.

    $ hyperfine --warmup=1 -L rev HEAD,HEAD^ --setup='git checkout {rev} -- :/ && make git' ': {rev}; GIT_CONFIG_GLOBAL=/dev/null ./git log --oneline --shortstat v2.0.0..v2.5.0'

    Benchmark 1: : HEAD; GIT_CONFIG_GLOBAL=/dev/null ./git log --oneline --shortstat v2.0.0..v2.5.0
      Time (mean ± σ): 1.670 s ± 0.044 s [User: 1.473 s, System: 0.196 s]
      Range (min … max): 1.619 s … 1.754 s 10 runs

    Benchmark 2: : HEAD^; GIT_CONFIG_GLOBAL=/dev/null ./git log --oneline --shortstat v2.0.0..v2.5.0
      Time (mean ± σ): 1.801 s ± 0.021 s [User: 1.605 s, System: 0.192 s]
      Range (min … max): 1.766 s … 1.831 s 10 runs

    Summary
      ': HEAD^; GIT_CONFIG_GLOBAL=/dev/null ./git log --oneline --shortstat v2.0.0..v2.5.0' ran
        1.08 ± 0.03 times faster than ': HEAD^^; GIT_CONFIG_GLOBAL=/dev/null ./git log --oneline --shortstat v2.0.0..v2.5.0'

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-29 | xdiff: disable cleanup_records heuristic with --minimal | Niels Glodny | -2/+3
The cleanup_records function marks some lines as changed before running the actual diff algorithm. For most lines, this is a good performance optimization, but it also marks lines that are surrounded by many changed lines as changed as well. This can cause redundant changes and longer-than-necessary diffs. Whether this results in better-looking diffs is subjective. However, the --minimal flag explicitly requests the shortest possible diff. The change results in shorter diffs in about 1.3% of all diffs in Git's history. Performance wise, I have measured the impact on "git log -p -3000 --minimal > /dev/null". With this change, I get Time (mean ± σ): 2.363 s ± 0.023 s (25 runs) and without this patch I measured Time (mean ± σ): 2.362 s ± 0.035 s (25 runs). As the difference is well within the margin of error, this does not seem to have an impact on performance. Signed-off-by: Niels Glodny <n.glodny@campus.lmu.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-15 | Merge branch 'js/comma-semicolon-confusion' | Junio C Hamano | -4/+8
Code clean-up.
* js/comma-semicolon-confusion:
  detect-compiler: detect clang even if it found CUDA
  clang: warn when the comma operator is used
  compat/regex: explicitly mark intentional use of the comma operator
  wildmatch: avoid using of the comma operator
  diff-delta: avoid using the comma operator
  xdiff: avoid using the comma operator unnecessarily
  clar: avoid using the comma operator unnecessarily
  kwset: avoid using the comma operator unnecessarily
  rebase: avoid using the comma operator unnecessarily
  remote-curl: avoid using the comma operator unnecessarily
2025-03-29 | Merge branch 'rs/xdiff-context-length-fix' | Junio C Hamano | -1/+7
The xdiff code on 32-bit platforms misbehaved when an insanely large context size was given, which has been corrected.
* rs/xdiff-context-length-fix:
  xdiff: avoid arithmetic overflow in xdl_get_hunk()
2025-03-28 | xdiff: avoid using the comma operator unnecessarily | Johannes Schindelin | -4/+8
The comma operator is a somewhat obscure C feature that is often used by mistake and can even cause unintentional code flow. While the code in this patch used the comma operator intentionally (to avoid curly brackets around two statements, each, that want to be guarded by a condition), it is better to surround it with curly brackets and to use a semicolon instead. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
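The two styles can be compared side by side. The functions below are hypothetical examples written for this illustration, not the code touched by the patch; both clamp an upper bound and shift the lower bound by the same amount.

```c
/* Comma form: both assignments are guarded, but it is easy to misread. */
static void clamp_comma(long *lo, long *hi, long limit)
{
	if (*hi > limit)
		*lo -= *hi - limit, *hi = limit;
}

/* Preferred form: braces and semicolons make the guarded block explicit. */
static void clamp_braces(long *lo, long *hi, long limit)
{
	if (*hi > limit) {
		*lo -= *hi - limit;
		*hi = limit;
	}
}
```

Both functions behave identically. The risk with the first form is that a later edit turns the comma into a semicolon, which would silently move the second assignment out of the condition.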
2025-03-14 | xdiff: avoid arithmetic overflow in xdl_get_hunk() | René Scharfe | -1/+7
xdl_get_hunk() calculates the maximum number of common lines between two changes that would fit into the same hunk for the given context options. It involves doubling and addition and thus can overflow if the terms are huge.

The type of ctxlen and interhunkctxlen in xdemitconf_t is long, while the type of the corresponding context and interhunkcontext in struct diff_options is int. On many platforms longs are bigger than ints, which prevents the overflow. On Windows they have the same range and the overflow manifests as hunks that are split erroneously and lines being repeated between them.

Fix the overflow by checking and not going beyond LONG_MAX. This allows specifying a huge context line count and getting all lines of a changed file in a single hunk, as expected.

Reported-by: Jason Cho <jason11choca@proton.me>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
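A clamped version of such a computation can be sketched as follows. The helper `max_common_lines()` is hypothetical, and the formula (twice the context length plus the inter-hunk context) is an assumption about which term is doubled; the sketch only shows the pattern of checking before each step instead of letting the arithmetic overflow.

```c
#include <limits.h>

/*
 * Compute 2 * ctxlen + interhunkctxlen, clamped to LONG_MAX.
 * Assumes both inputs are non-negative.
 */
static long max_common_lines(long ctxlen, long interhunkctxlen)
{
	long doubled;

	if (ctxlen > LONG_MAX / 2)
		return LONG_MAX; /* doubling would overflow */
	doubled = 2 * ctxlen;
	if (interhunkctxlen > LONG_MAX - doubled)
		return LONG_MAX; /* the addition would overflow */
	return doubled + interhunkctxlen;
}
```

Signed overflow is undefined behavior in C, so the checks have to happen before the multiplication and the addition rather than by inspecting the result afterwards.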
2025-03-03 | xdiff: *.txt -> *.adoc fixes | Todd Zullinger | -2/+2
Signed-off-by: Todd Zullinger <tmz@pobox.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-12 | xdiff: avoid signed vs. unsigned comparisons in xutils.c | David Aguilar | -4/+2
The comparisons all involve comparisons against unsigned values. Signed-off-by: David Aguilar <davvid@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-12 | xdiff: avoid signed vs. unsigned comparisons in xpatience.c | David Aguilar | -3/+1
The loop iteration variable is non-negative and used in comparisons against a size_t value. Use size_t to eliminate the mismatch. Signed-off-by: David Aguilar <davvid@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-12 | xdiff: avoid signed vs. unsigned comparisons in xhistogram.c | David Aguilar | -6/+4
The comparisons all involve unsigned variables. Cast the comparison to unsigned to eliminate the mismatch. Signed-off-by: David Aguilar <davvid@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-12 | xdiff: avoid signed vs. unsigned comparisons in xemit.c | David Aguilar | -3/+1
The unsigned `ignored` variable causes expressions to promote to unsigned. Use a signed value to make comparisons use the same types. Signed-off-by: David Aguilar <davvid@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
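Why mixed comparisons are worth eliminating can be seen in a short sketch. The helpers below are hypothetical and unrelated to the actual `ignored` variable; the explicit cast in the first one spells out the conversion a mixed comparison would otherwise perform implicitly.

```c
#include <stdbool.h>
#include <stddef.h>

/* Mimics a mixed comparison: i is converted to size_t before comparing. */
static bool below_mixed(long i, size_t n)
{
	return (size_t)i < n; /* -1 becomes SIZE_MAX here */
}

/* Checks the sign first, so the conversion can no longer change the value. */
static bool below_checked(long i, size_t n)
{
	return i >= 0 && (size_t)i < n;
}
```

For non-negative inputs the two agree. They differ for a negative index paired with a very large bound such as SIZE_MAX, where the mixed form reports the index as in range; using one signedness on both sides, as these commits do, removes that trap.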
2025-02-12 | xdiff: avoid signed vs. unsigned comparisons in xdiffi.c | David Aguilar | -2/+1
The loop iteration variable is non-negative and only used in comparisons against other size_t values. Signed-off-by: David Aguilar <davvid@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-12 | xdiff: move sign comparison warning guard into each file | David Aguilar | -2/+9
Allow each file to fix the warnings guarded by the macro separately by moving the definition from the shared xinclude.h into each file that needs it. xmerge.c and xprepare.c do not contain any signed vs. unsigned comparisons so the definition was not included in these files. Signed-off-by: David Aguilar <davvid@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-06 | global: mark code units that generate warnings with `-Wsign-compare` | Patrick Steinhardt | -0/+3
Mark code units that generate warnings with `-Wsign-compare`. This allows for a structured approach to get rid of all such warnings over time in a way that can be easily measured. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-13 | xdiff: mark unused parameter in xdl_call_hunk_func() | Jeff King | -1/+1
This function is used interchangeably with xdl_emit via a function pointer, so we can't just drop the unused parameter. Mark it to silence -Wunused-parameter. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-13 | xdiff: drop unused parameter in def_ff() | Jeff King | -2/+2
The def_ff() function is the default "find_func" for finding hunk headers. It has never used its "priv" argument since it was introduced in f258475a6e (Per-path attribute based hunk header selection., 2007-07-06). But back then we used a function pointer to switch between a caller-provided function and the default, so the two had to conform to the same interface. In ff2981f724 (xdiff: factor out match_func_rec(), 2016-05-28), that pointer indirection went away in favor of code which directly calls either of the two functions. So there's no need for def_ff() to retain this unused parameter. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-20 | xdiff: drop unused mmfile parameters from xdl_do_patience_diff() | Jeff King | -19/+9
The entry point to the patience-diff algorithm takes two mmfile_t structs with the original file contents, but it doesn't actually do anything useful with them. This is similar to the case recently cleaned up in the histogram code via f1d019071e (xdiff: drop unused mmfile parameters from xdl_do_histogram_diff(), 2022-08-19), but there's a bit more subtlety going on.

We pass them into the recursive patience_diff(), which in turn passes them into fill_hashmap(), which stuffs the pointers into a struct. But the only thing which reads the struct fields is our recursion into patience_diff()!

So it's unlikely that something like -Wunused-parameter could find this case: it would have to detect the circular dependency caused by the recursion (not to mention tracing across struct field assignments).

But once found, it's easy to have the compiler confirm what's going on:

  1. Drop the "file1" and "file2" fields from the hashmap struct definition. Remove the assignments in fill_hashmap(), and temporarily substitute NULL in the recursive call to patience_diff(). Compiling shows that no other code touched those fields.

  2. Now fill_hashmap() will trigger -Wunused-parameter. Drop "file1" and "file2" from its definition and callsite.

  3. Now patience_diff() will trigger -Wunused-parameter. Drop them there, too. One of the callsites is the recursion with our NULL values, so those temporary values go away.

  4. Now xdl_do_patience_diff() will trigger -Wunused-parameter. Drop them there. And we're done.

Suggested-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-19 | xdiff: drop unused mmfile parameters from xdl_do_histogram_diff() | Jeff King | -5/+3
These are no longer used since 9df0fc3d57 (xdiff: fix a memory leak, 2022-02-16), as the caller is expected to call xdl_prepare_env() itself. After that change the histogram code only examines the prepared xdfenv_t, not the original buffers. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>