<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/xdiff, branch master</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=master</id>
<link rel='self' href='https://git.shady.money/git/atom?h=master'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2026-02-20T19:36:18Z</updated>
<entry>
<title>Merge branch 'pw/diff-anchored-optim'</title>
<updated>2026-02-20T19:36:18Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2026-02-20T19:36:18Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=8ceb69f85d0b68f5558c8c72ca104036523644fa'/>
<id>urn:sha1:8ceb69f85d0b68f5558c8c72ca104036523644fa</id>
<content type='text'>
"git diff --anchored=&lt;text&gt;" has been optimized.

* pw/diff-anchored-optim:
  diff --anchored: avoid checking unmatched lines
</content>
</entry>
<entry>
<title>Merge branch 'pw/xdiff-cleanups'</title>
<updated>2026-02-20T19:36:17Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2026-02-20T19:36:17Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=5465d3683a97a950358152925204f16b98739fad'/>
<id>urn:sha1:5465d3683a97a950358152925204f16b98739fad</id>
<content type='text'>
Small clean-up of xdiff library to remove unnecessary data
duplication.

* pw/xdiff-cleanups:
  xdiff: remove unused data from xdlclass_t
  xdiff: remove "line_hash" field from xrecord_t
</content>
</entry>
<entry>
<title>diff --anchored: avoid checking unmatched lines</title>
<updated>2026-02-12T17:28:49Z</updated>
<author>
<name>Phillip Wood</name>
<email>phillip.wood@dunelm.org.uk</email>
</author>
<published>2026-02-12T15:53:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=dd2a4c0c7acb588abbf3c3a39ca755ce8aeee3b0'/>
<id>urn:sha1:dd2a4c0c7acb588abbf3c3a39ca755ce8aeee3b0</id>
<content type='text'>
For a line to be an anchor it has to appear in each of the files being
diffed exactly once. With that in mind, let's delay checking whether
a line is an anchor until we know there is exactly one instance of
the line in each file. As each line is checked at most once, there
is no need to cache the result of is_anchor() and we can drop that
field from the hashmap entries. When diffing 5000 recent commits in
git.git this gives a modest speedup of ~2%. In the (rather extreme)
example below that consists largely of deletions the speedup is ~16%.

    seq 0 10000000 &gt;old
    printf '%s\n' 300000 100000 200000 &gt;new
    git diff --no-index --anchored=300000 old new
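
The delayed check described above can be sketched roughly as follows.
This is an illustration only, not the xdiff code: find_anchors(), its
arguments and the "checks" counter are hypothetical names, and the
anchor test is modelled as a simple prefix match.

```python
from collections import Counter


def find_anchors(old_lines, new_lines, anchor_prefixes):
    """Return (anchors, checks): anchor lines and how many prefix checks ran."""
    old_counts = Counter(old_lines)
    new_counts = Counter(new_lines)
    checks = 0
    anchors = []
    for line in new_lines:
        # Cheap test first: skip lines that are not unique in both files.
        if old_counts[line] != 1 or new_counts[line] != 1:
            continue
        # A unique line is visited once, so the result needs no caching.
        checks += 1
        if any(line.startswith(prefix) for prefix in anchor_prefixes):
            anchors.append(line)
    return anchors, checks
```

In this model, lines that are missing from or repeated in either file
never reach the prefix check at all.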

Signed-off-by: Phillip Wood &lt;phillip.wood@dunelm.org.uk&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>xdiff: remove unused data from xdlclass_t</title>
<updated>2026-01-26T16:38:29Z</updated>
<author>
<name>Phillip Wood</name>
<email>phillip.wood@dunelm.org.uk</email>
</author>
<published>2026-01-26T10:48:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=5086213bd2f44fdc793fd8a081fd1c40a3267c44'/>
<id>urn:sha1:5086213bd2f44fdc793fd8a081fd1c40a3267c44</id>
<content type='text'>
Prior to commit 6d507bd41a (xdiff: delete fields ha, line, size
in xdlclass_t in favor of an xrecord_t, 2025-09-26) xdlclass_t
carried a copy of all the fields in xrecord_t. That commit embedded
xrecord_t in xdlclass_t to make it easier to change the types of
the fields in xrecord_t. However, commit 6a26019c81 (xdiff: split
xrecord_t.ha into line_hash and minimal_perfect_hash, 2025-11-18)
added the "minimal_perfect_hash" field to xrecord_t which is not
used by xdlclass_t. To avoid wasting space, stop copying the whole
of xrecord_t and just copy the pointer and length that we need to
intern the line. Together with the previous commit this effectively
reverts 6d507bd41a.

Signed-off-by: Phillip Wood &lt;phillip.wood@dunelm.org.uk&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>xdiff: remove "line_hash" field from xrecord_t</title>
<updated>2026-01-26T16:38:29Z</updated>
<author>
<name>Phillip Wood</name>
<email>phillip.wood@dunelm.org.uk</email>
</author>
<published>2026-01-26T10:48:51Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=c27afcbfd0f440f410758432e2fe11a16fb2b360'/>
<id>urn:sha1:c27afcbfd0f440f410758432e2fe11a16fb2b360</id>
<content type='text'>
Prior to commit 6a26019c81 (xdiff: split xrecord_t.ha into line_hash
and minimal_perfect_hash, 2025-11-18) the "ha" field of xrecord_t
initially held the "line_hash" value and once the line had been
interned that field was updated to hold the "minimal_perfect_hash". The
"line_hash" is only used to intern the line so there is no point in
storing it after all the input lines have been interned.

Removing the "line_hash" field from xrecord_t and storing it in
xdlclass_t where it is actually used makes it clearer that it is a
temporary value and it should not be used once we've calculated the
"minimal_perfect_hash". This also reduces the size of xrecord_t by 25%
on 64-bit platforms and 40% on 32-bit platforms. While the struct is
small we create one instance per input line so any saving is welcome.

Signed-off-by: Phillip Wood &lt;phillip.wood@dunelm.org.uk&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'yc/xdiff-patience-optim'</title>
<updated>2025-12-08T22:54:55Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2025-12-08T22:54:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=7fc0b33b5d9ced74819132a094528154f83f4a6a'/>
<id>urn:sha1:7fc0b33b5d9ced74819132a094528154f83f4a6a</id>
<content type='text'>
The way patience diff finds LCS has been optimized.

* yc/xdiff-patience-optim:
  xdiff: optimize patience diff's LCS search
</content>
</entry>
<entry>
<title>Merge branch 'en/xdiff-cleanup-2'</title>
<updated>2025-12-05T05:49:56Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2025-12-05T05:49:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=0c6707687f818fa43dca7c9381c55e611ba6c51e'/>
<id>urn:sha1:0c6707687f818fa43dca7c9381c55e611ba6c51e</id>
<content type='text'>
Code clean-up.

* en/xdiff-cleanup-2:
  xdiff: rename rindex -&gt; reference_index
  xdiff: change rindex from long to size_t in xdfile_t
  xdiff: make xdfile_t.nreff a size_t instead of long
  xdiff: make xdfile_t.nrec a size_t instead of long
  xdiff: split xrecord_t.ha into line_hash and minimal_perfect_hash
  xdiff: use unambiguous types in xdl_hash_record()
  xdiff: use size_t for xrecord_t.size
  xdiff: make xrecord_t.ptr a uint8_t instead of char
  xdiff: use ptrdiff_t for dstart/dend
  doc: define unambiguous type mappings across C and Rust
</content>
</entry>
<entry>
<title>xdiff: optimize patience diff's LCS search</title>
<updated>2025-11-28T03:11:41Z</updated>
<author>
<name>Yee Cheng Chin</name>
<email>ychin.git@gmail.com</email>
</author>
<published>2025-11-27T02:16:06Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=c7e3b8085bb2f74371f5017f42c58b0acf01b915'/>
<id>urn:sha1:c7e3b8085bb2f74371f5017f42c58b0acf01b915</id>
<content type='text'>
The find_longest_common_sequence() function in patience diff is
inefficient as it calls binary_search() for every unique line it
encounters when deciding where to put it in the sequence. From
instrumentation (using xctrace) on popular repositories, binary_search()
takes up 50-60% of the run time within patience_diff() when performing a
diff.

To optimize this, add a boundary condition check before binary_search()
is called to see if the encountered unique line is located after the
entire currently tracked longest subsequence. If so, skip the
unnecessary binary search and simply append the entry to the end of the
sequence. Given that most files compared in a diff are usually quite
similar to each other, this condition is very common, and should be hit
much more frequently than the binary search.
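
As a rough illustration (not the xdiff code itself), the fast path can
be modelled on a plain list of unique integers. The function name, the
`searches` counter and the use of Python's `bisect` module are
assumptions made for this sketch; the real
find_longest_common_sequence() works on its own entry structures.

```python
from bisect import bisect_left
from operator import gt  # gt(a, b) is the "a greater than b" comparison


def longest_increasing_run(positions):
    """Return (length of longest increasing subsequence, binary searches)."""
    tails = []    # tails[k]: smallest last value of an increasing run of length k + 1
    searches = 0  # how often the slow path was taken
    for value in positions:
        # Fast path: value lies after the whole tracked sequence, so append.
        if not tails or gt(value, tails[-1]):
            tails.append(value)
            continue
        # Slow path: binary search for the slot that value replaces.
        searches += 1
        tails[bisect_left(tails, value)] = value
    return len(tails), searches
```

For input that is already in order the slow path is never taken, which
mirrors the low "binary search hit %" figures reported below.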

Below are some end-to-end performance results by timing `git log
--shortstat --oneline -500 --patience` on different repositories with
the old and new code. Generally speaking this seems to give a speedup of
at least 8-10%. The "binary search hit %" column describes how often the
algorithm enters the binary search path instead of the new faster path.
Even in the WebKit case we can see that it's quite rare (1.46%).

| Repo     | Speed difference | binary search hit % |
|----------|------------------|---------------------|
| vim      | 1.27x            | 0.01%               |
| pytorch  | 1.16x            | 0.02%               |
| cpython  | 1.14x            | 0.06%               |
| ripgrep  | 1.14x            | 0.03%               |
| git      | 1.13x            | 0.12%               |
| vscode   | 1.09x            | 0.10%               |
| WebKit   | 1.08x            | 1.46%               |

The benchmarks were done using hyperfine, on an Apple M1 Max laptop,
with git compiled with `-O3 -flto`.

Signed-off-by: Yee Cheng Chin &lt;ychin.git@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>xdiff: rename rindex -&gt; reference_index</title>
<updated>2025-11-18T22:53:11Z</updated>
<author>
<name>Ezekiel Newren</name>
<email>ezekielnewren@gmail.com</email>
</author>
<published>2025-11-18T22:34:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=22ce0cb6397d3d15c21c217696f262c4b8eb44b3'/>
<id>urn:sha1:22ce0cb6397d3d15c21c217696f262c4b8eb44b3</id>
<content type='text'>
The classic diff adds only the lines that it is going to consider
during the diff to an array. This array provides the mapping between
the compacted array and the lines of the file that its entries
reference.

Signed-off-by: Ezekiel Newren &lt;ezekielnewren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>xdiff: change rindex from long to size_t in xdfile_t</title>
<updated>2025-11-18T22:53:11Z</updated>
<author>
<name>Ezekiel Newren</name>
<email>ezekielnewren@gmail.com</email>
</author>
<published>2025-11-18T22:34:21Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=5004a8da14e2aa80b5697b0a3a60e594af1c8292'/>
<id>urn:sha1:5004a8da14e2aa80b5697b0a3a60e594af1c8292</id>
<content type='text'>
The field rindex describes an index offset for other arrays. Change it
to size_t.

Changing the type of rindex from long to size_t has no cascading
refactor impact because it is only ever used to directly index other
arrays.

Signed-off-by: Ezekiel Newren &lt;ezekielnewren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
