<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/diff.c, branch v2.34.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.34.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.34.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2021-09-27T21:47:59Z</updated>
<entry>
<title>*.[ch] *_INIT macros: use { 0 } for a "zero out" idiom</title>
<updated>2021-09-27T21:47:59Z</updated>
<author>
<name>Ævar Arnfjörð Bjarmason</name>
<email>avarab@gmail.com</email>
</author>
<published>2021-09-27T12:54:25Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=9865b6e6a4ca1e895fd473c827cf1822f3bd8249'/>
<id>urn:sha1:9865b6e6a4ca1e895fd473c827cf1822f3bd8249</id>
<content type='text'>
In C it isn't required to specify that all members of a struct are
zero'd out to 0, NULL or '\0', just providing a "{ 0 }" will
accomplish that.

Let's also change code that provided N zero'd fields to just
provide one, and change e.g. "{ NULL }" to "{ 0 }" for
consistency. I.e. even if the first member is a pointer let's use "0"
instead of "NULL". The point of using "0" consistently is to pick one,
and to not have the reader wonder why we're not using the same pattern
everywhere.

Signed-off-by: Ævar Arnfjörð Bjarmason &lt;avarab@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>diff: ignore sparse paths in diffstat</title>
<updated>2021-09-09T22:49:04Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2021-09-08T11:23:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=ad90da73513d7b0055e47c6918e1d75c6165fa69'/>
<id>urn:sha1:ad90da73513d7b0055e47c6918e1d75c6165fa69</id>
<content type='text'>
The diff_populate_filespec() method is used to describe the diff after a
merge operation is complete. In order to avoid expanding a sparse index,
the reuse_worktree_file() needs to be adapted to ignore files that are
outside of the sparse-checkout cone. The file names and OIDs used for
this check come from the merged tree in the case of the ORT strategy,
not the index, hence the ability to look into these paths without having
already expanded the index.

The work done by reuse_worktree_file() is only an optimization, and
requires the file being on disk for it to be of any value. Thus, it is
safe to exit the method early if we do not expect the file on disk.

Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Reviewed-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'ab/pickaxe-pcre2'</title>
<updated>2021-08-06T19:52:15Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2021-08-06T19:52:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=55194925e62b34a3f62b31034f73a6bcfb063bc5'/>
<id>urn:sha1:55194925e62b34a3f62b31034f73a6bcfb063bc5</id>
<content type='text'>
* ab/pickaxe-pcre2:
  diff: --pickaxe-all typofix
</content>
</entry>
<entry>
<title>diff: --pickaxe-all typofix</title>
<updated>2021-08-04T17:34:47Z</updated>
<author>
<name>Bagas Sanjaya</name>
<email>bagasdotme@gmail.com</email>
</author>
<published>2021-08-04T12:24:20Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=11c649b8911eb641a41169957dba525efe50eaa0'/>
<id>urn:sha1:11c649b8911eb641a41169957dba525efe50eaa0</id>
<content type='text'>
When I was fixing fuzzies as I updating po/id.po for 2.33.0 l10n round,
I noticed a triple-dash typo (--pickaxe-all) at diff.c, which according
to git-diff(1) manpage, the correct option name should be --pickaxe-all.

Fix the typo.

Signed-off-by: Bagas Sanjaya &lt;bagasdotme@gmail.com&gt;
Acked-by: Ævar Arnfjörð Bjarmason &lt;avarab@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'en/rename-limits-doc'</title>
<updated>2021-07-28T20:18:03Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2021-07-28T20:18:02Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=268055bfdec5cc2228ade6f3ff7094632568c82a'/>
<id>urn:sha1:268055bfdec5cc2228ade6f3ff7094632568c82a</id>
<content type='text'>
Documentation on "git diff -l&lt;n&gt;" and diff.renameLimit have been
updated, and the defaults for these limits have been raised.

* en/rename-limits-doc:
  rename: bump limit defaults yet again
  diffcore-rename: treat a rename_limit of 0 as unlimited
  doc: clarify documentation for rename/copy limits
  diff: correct warning message when renameLimit exceeded
</content>
</entry>
<entry>
<title>rename: bump limit defaults yet again</title>
<updated>2021-07-15T23:54:34Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2021-07-15T00:45:24Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=94b82d56866793018a7a9bcbe20c1c061fa41aa8'/>
<id>urn:sha1:94b82d56866793018a7a9bcbe20c1c061fa41aa8</id>
<content type='text'>
These were last bumped in commit 92c57e5c1d29 (bump rename limit
defaults (again), 2011-02-19), and were bumped both because processors
had gotten faster, and because people were getting ugly merges that
caused problems and reporting it to the mailing list (suggesting that
folks were willing to spend more time waiting).

Since that time:
  * Linus has continued recommending kernel folks to set
    diff.renameLimit=0 (maps to 32767, currently)
  * Folks with repositories with lots of renames were happy to set
    merge.renameLimit above 32767, once the code supported that, to
    get correct cherry-picks
  * Processors have gotten faster
  * It has been discovered that the timing methodology used last time
    probably used too large example files.

The last point is probably worth explaining a bit more:

  * The "average" file size used appears to have been average blob size
    in the linux kernel history at the time (probably v2.6.25 or
    something close to it).
  * Since bigger files are modified more frequently, such a computation
    weights towards larger files.
  * Larger files may be more likely to be modified over time, but are
    not more likely to be renamed -- the mean and median blob size
    within a tree are a bit higher than the mean and median of blob
    sizes in the history leading up to that version for the linux
    kernel.
  * The mean blob size in v2.6.25 was half the average blob size in
    history leading to that point
  * The median blob size in v2.6.25 was about 40% of the mean blob size
    in v2.6.25.
  * Since the mean blob size is more than double the median blob size,
    any file as big as the mean will not be compared to any files of
    median size or less (because they'd be more than 50% dissimilar).
  * Since it is the number of files compared that provides the O(n^2)
    behavior, median-sized files should matter more than mean-sized
    ones.

The combined effect of the above is that the file size used in past
calculations was likely about 5x too large.  Combine that with a CPU
performance improvement of ~30%, and we can increase the limits by
a factor of sqrt(5/(1-.3)) = 2.67, while keeping the original stated
time limits.

Keeping the same approximate time limit probably makes sense for
diff.renameLimit (there is no progress feedback in e.g. git log -p),
but the experience above suggests merge.renameLimit could be extended
significantly.  In fact, it probably would make sense to have an
unlimited default setting for merge.renameLimit, but that would
likely need to be coupled with changes to how progress is displayed.
(See https://lore.kernel.org/git/YOx+Ok%2FEYvLqRMzJ@coredump.intra.peff.net/
for details in that area.)  For now, let's just bump the approximate
time limit from 10s to 1m.

(Note: We do not want to use actual time limits, because getting results
that depend on how loaded your system is that day feels bad, and because
we don't discover that we won't get all the renames until after we've
put in a lot of work rather than just upfront telling the user there are
too many files involved.)

Using the original time limit of 2s for diff.renameLimit, and bumping
merge.renameLimit from 10s to 60s, I found the following timings using
the simple script at the end of this commit message (on an AWS c5.xlarge
which reports as "Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz"):

      N   Timing
   1300    1.995s
   7100   59.973s

So let's round down to nice even numbers and bump the limits from
400-&gt;1000, and from 1000-&gt;7000.

Here is the measure_rename_perf script (adapted from
https://lore.kernel.org/git/20080211113516.GB6344@coredump.intra.peff.net/
in particular to avoid triggering the linear handling from
basename-guided rename detection):

    #!/bin/bash

    n=$1; shift

    rm -rf repo
    mkdir repo &amp;&amp; cd repo
    git init -q -b main

    mkdata() {
      mkdir $1
      for i in `seq 1 $2`; do
        (sed "s/^/$i /" &lt;../sample
         echo tag: $1
        ) &gt;$1/$i
      done
    }

    mkdata initial $n
    git add .
    git commit -q -m initial

    mkdata new $n
    git add .
    cd new
    for i in *; do git mv $i $i.renamed; done
    cd ..
    git rm -q -rf initial
    git commit -q -m new

    time git diff-tree -M -l0 --summary HEAD^ HEAD

Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>diff: correct warning message when renameLimit exceeded</title>
<updated>2021-07-15T23:54:24Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2021-07-15T00:45:21Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=05d2c61c6744212cdef6085832a84b49da77591c'/>
<id>urn:sha1:05d2c61c6744212cdef6085832a84b49da77591c</id>
<content type='text'>
The warning when quadratic rename detection was skipped referred to
"inexact rename detection".  For years, the only linear portion of
rename detection was looking for exact renames, so "inexact rename
detection" was an accurate way to refer to the quadratic portion of
rename detection.  However, that changed with commit bd24aa2f97a0
(diffcore-rename: guide inexact rename detection based on basenames,
2021-02-14).  Let's instead use the term "exhaustive rename detection"
to refer to the quadratic portion.

Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'ab/pickaxe-pcre2'</title>
<updated>2021-07-13T23:52:50Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2021-07-13T23:52:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=4da281e84d87b099ba8d0a255534dc1251be968e'/>
<id>urn:sha1:4da281e84d87b099ba8d0a255534dc1251be968e</id>
<content type='text'>
Rewrite the backend for "diff -G/-S" to use pcre2 engine when
available.

* ab/pickaxe-pcre2: (22 commits)
  xdiff-interface: replace discard_hunk_line() with a flag
  xdiff users: use designated initializers for out_line
  pickaxe -G: don't special-case create/delete
  pickaxe -G: terminate early on matching lines
  xdiff-interface: allow early return from xdiff_emit_line_fn
  xdiff-interface: prepare for allowing early return
  pickaxe -S: slightly optimize contains()
  pickaxe: rename variables in has_changes() for brevity
  pickaxe -S: support content with NULs under --pickaxe-regex
  pickaxe: assert that we must have a needle under -G or -S
  pickaxe: refactor function selection in diffcore-pickaxe()
  perf: add performance test for pickaxe
  pickaxe/style: consolidate declarations and assignments
  diff.h: move pickaxe fields together again
  pickaxe: die when --find-object and --pickaxe-all are combined
  pickaxe: die when -G and --pickaxe-regex are combined
  pickaxe tests: add missing test for --no-pickaxe-regex being an error
  pickaxe tests: test for -G, -S and --find-object incompatibility
  pickaxe tests: add test for "log -S" not being a regex
  pickaxe tests: add test for diffgrep_consume() internals
  ...
</content>
</entry>
<entry>
<title>Merge branch 'pw/word-diff-zero-width-matches'</title>
<updated>2021-05-13T23:26:06Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2021-05-13T23:26:06Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=65c18913deeba4c5f0ca9488b1d8b6757ffc56e1'/>
<id>urn:sha1:65c18913deeba4c5f0ca9488b1d8b6757ffc56e1</id>
<content type='text'>
The word-diff mode has been taught to work better with a word
regexp that can match an empty string.

* pw/word-diff-zero-width-matches:
  word diff: handle zero length matches
</content>
</entry>
<entry>
<title>xdiff-interface: replace discard_hunk_line() with a flag</title>
<updated>2021-05-11T03:47:31Z</updated>
<author>
<name>Ævar Arnfjörð Bjarmason</name>
<email>avarab@gmail.com</email>
</author>
<published>2021-04-12T17:15:29Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=5d93460024541909337d6b08a8bec10b71caaf73'/>
<id>urn:sha1:5d93460024541909337d6b08a8bec10b71caaf73</id>
<content type='text'>
Remove the dummy discard_hunk_line() function added in
3b40a090fd4 (diff: avoid generating unused hunk header lines,
2018-11-02) in favor of having a new XDL_EMIT_NO_HUNK_HDR flag, for
use along with the two existing and similar XDL_EMIT_* flags.

Unlike the recently amended xdiff_emit_line_fn interface which'll be
called in a loop in xdl_emit_diff(), the hunk header is only emitted
once.

It makes more sense to pass this as a flag than provide a dummy
callback because that function may be able to skip doing certain work
if it knows the caller is doing nothing with the hunk header.

It would be possible to do so in the case of -U0 now, but the benefit
of doing so is so small that I haven't bothered. But this leaves the
door open to that, and more importantly makes the API use more
intuitive.

The reason we're putting a flag in the gap between 1&lt;&lt;0 and 1&lt;&lt;2 is
that the old 1&lt;&lt;1 flag was removed in 907681e940d (xdiff: drop
XDL_EMIT_COMMON, 2016-02-23) without re-ordering the remaining flags.

Signed-off-by: Ævar Arnfjörð Bjarmason &lt;avarab@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
