<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/diff.h, branch v2.30.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.30.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.30.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2020-11-02T21:17:44Z</updated>
<entry>
<title>Merge branch 'mk/diff-ignore-regex'</title>
<updated>2020-11-02T21:17:44Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2020-11-02T21:17:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=1ae0949a036864de1f5caa7b375b875284d1c69e'/>
<id>urn:sha1:1ae0949a036864de1f5caa7b375b875284d1c69e</id>
<content type='text'>
"git diff" family of commands learned the "-I&lt;regex&gt;" option to
ignore hunks whose changed lines all match the given pattern.

* mk/diff-ignore-regex:
  diff: add -I&lt;regex&gt; that ignores matching changes
  merge-base, xdiff: zero out xpparam_t structures
</content>
</entry>
<entry>
<title>Merge branch 'dl/diff-merge-base'</title>
<updated>2020-11-02T21:17:39Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2020-11-02T21:17:39Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=b6fb70c985a6fc113e971a6696f328abf8315dce'/>
<id>urn:sha1:b6fb70c985a6fc113e971a6696f328abf8315dce</id>
<content type='text'>
"git diff A...B" learned "git diff --merge-base A B", which is a
longer short-hand to say the same thing.

* dl/diff-merge-base:
  contrib/completion: complete `git diff --merge-base`
  builtin/diff-tree: learn --merge-base
  builtin/diff-index: learn --merge-base
  t4068: add --merge-base tests
  diff-lib: define diff_get_merge_base()
  diff-lib: accept option flags in run_diff_index()
  contrib/completion: extract common diff/difftool options
  git-diff.txt: backtick quote command text
  git-diff-index.txt: make --cached description a proper sentence
  t4068: remove unnecessary &gt;tmp
</content>
</entry>
<entry>
<title>diff: add -I&lt;regex&gt; that ignores matching changes</title>
<updated>2020-10-20T19:53:26Z</updated>
<author>
<name>Michał Kępień</name>
<email>michal@isc.org</email>
</author>
<published>2020-10-20T06:48:09Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=296d4a94e7231a1d57356889f51bff57a1a3c5a1'/>
<id>urn:sha1:296d4a94e7231a1d57356889f51bff57a1a3c5a1</id>
<content type='text'>
Add a new diff option that enables ignoring changes whose all lines
(changed, removed, and added) match a given regular expression.  This is
similar to the -I/--ignore-matching-lines option in standalone diff
utilities and can be used e.g. to ignore changes which only affect code
comments or to look for unrelated changes in commits containing a large
number of automatically applied modifications (e.g. a tree-wide string
replacement).  The difference between -G/-S and the new -I option is
that the latter filters output on a per-change basis.

Use the 'ignore' field of xdchange_t for marking a change as ignored or
not.  Since the same field is used by --ignore-blank-lines, identical
hunk emitting rules apply for --ignore-blank-lines and -I.  These two
options can also be used together in the same git invocation (they are
complementary to each other).

Rename xdl_mark_ignorable() to xdl_mark_ignorable_lines(), to indicate
that it is logically a "sibling" of xdl_mark_ignorable_regex() rather
than its "parent".

Signed-off-by: Michał Kępień &lt;michal@isc.org&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'so/combine-diff-simplify'</title>
<updated>2020-10-05T21:01:51Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2020-10-05T21:01:51Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=34415c76c8f67c8692090ef4911d1626689e8f38'/>
<id>urn:sha1:34415c76c8f67c8692090ef4911d1626689e8f38</id>
<content type='text'>
Code simplification.

* so/combine-diff-simplify:
  diff: get rid of redundant 'dense' argument
</content>
</entry>
<entry>
<title>Merge branch 'tb/bloom-improvements'</title>
<updated>2020-09-29T21:01:20Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2020-09-29T21:01:20Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=288ed98bf768f4df9b569d51a52c233a1402c0f5'/>
<id>urn:sha1:288ed98bf768f4df9b569d51a52c233a1402c0f5</id>
<content type='text'>
"git commit-graph write" learned to limit the number of bloom
filters that are computed from scratch with the --max-new-filters
option.

* tb/bloom-improvements:
  commit-graph: introduce 'commitGraph.maxNewFilters'
  builtin/commit-graph.c: introduce '--max-new-filters=&lt;n&gt;'
  commit-graph: rename 'split_commit_graph_opts'
  bloom: encode out-of-bounds filters as non-empty
  bloom/diff: properly short-circuit on max_changes
  bloom: use provided 'struct bloom_filter_settings'
  bloom: split 'get_bloom_filter()' in two
  commit-graph.c: store maximum changed paths
  commit-graph: respect 'commitGraph.readChangedPaths'
  t/helper/test-read-graph.c: prepare repo settings
  commit-graph: pass a 'struct repository *' in more places
  t4216: use an '&amp;&amp;'-chain
  commit-graph: introduce 'get_bloom_filter_settings()'
</content>
</entry>
<entry>
<title>diff: get rid of redundant 'dense' argument</title>
<updated>2020-09-29T18:54:53Z</updated>
<author>
<name>Sergey Organov</name>
<email>sorganov@gmail.com</email>
</author>
<published>2020-09-29T11:31:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=d01141de5ab02cf4a156183ef4dc5ee8bf2638a3'/>
<id>urn:sha1:d01141de5ab02cf4a156183ef4dc5ee8bf2638a3</id>
<content type='text'>
Get rid of 'dense' argument that is redundant for every function that has
'struct rev_info *rev' argument as well, as the value of 'dense' passed is
always taken from 'rev-&gt;dense_combined_merges' field.

The only place where this was not the case is in 'submodule.c' where
'diff_tree_combined_merge()' was called with '1' for 'dense' argument. However,
at that call the 'revs' instance used is local to the function, and we now just
set 'revs-&gt;dense_combined_merges' to 1 in this local instance.

Signed-off-by: Sergey Organov &lt;sorganov@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>builtin/diff-index: learn --merge-base</title>
<updated>2020-09-21T04:30:26Z</updated>
<author>
<name>Denton Liu</name>
<email>liu.denton@gmail.com</email>
</author>
<published>2020-09-20T11:22:25Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=0f5a1d449b9538c2765de9d6683abbb83a7fb4e2'/>
<id>urn:sha1:0f5a1d449b9538c2765de9d6683abbb83a7fb4e2</id>
<content type='text'>
There is currently no easy way to take the diff between the working tree
or index and the merge base between an arbitrary commit and HEAD. Even
diff's `...` notation doesn't allow this because it only works between
commits. However, the ability to do this would be desirable to a user
who would like to see all the changes they've made on a branch plus
uncommitted changes without taking into account changes made in the
upstream branch.

Teach diff-index and diff (with one commit) the --merge-base option
which allows a user to use the merge base of a commit and HEAD as the
"before" side.

Signed-off-by: Denton Liu &lt;liu.denton@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>diff-lib: define diff_get_merge_base()</title>
<updated>2020-09-21T04:30:26Z</updated>
<author>
<name>Denton Liu</name>
<email>liu.denton@gmail.com</email>
</author>
<published>2020-09-20T11:22:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=177a8302683da62a62b5b03e8192fcfa897db750'/>
<id>urn:sha1:177a8302683da62a62b5b03e8192fcfa897db750</id>
<content type='text'>
In a future commit, we will be using this function to implement
--merge-base functionality in various diff commands.

Signed-off-by: Denton Liu &lt;liu.denton@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>diff-lib: accept option flags in run_diff_index()</title>
<updated>2020-09-21T04:30:26Z</updated>
<author>
<name>Denton Liu</name>
<email>liu.denton@gmail.com</email>
</author>
<published>2020-09-20T11:22:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=4c3fe82ef17a6636c742b95cd292b83b02876e08'/>
<id>urn:sha1:4c3fe82ef17a6636c742b95cd292b83b02876e08</id>
<content type='text'>
In a future commit, we will teach run_diff_index() to accept more
options via flag bits. For now, change `cached` into a flag in the
`option` bitfield. The behaviour should remain exactly the same.

Signed-off-by: Denton Liu &lt;liu.denton@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>bloom/diff: properly short-circuit on max_changes</title>
<updated>2020-09-17T16:31:25Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2020-09-16T18:07:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=b16a8277644e4b1d21c08d97de757105039dc7ae'/>
<id>urn:sha1:b16a8277644e4b1d21c08d97de757105039dc7ae</id>
<content type='text'>
Commit e3696980 (diff: halt tree-diff early after max_changes,
2020-03-30) intended to create a mechanism to short-circuit a diff
calculation after a certain number of paths were modified. By
incrementing a "num_changes" counter throughout the recursive
ll_diff_tree_paths(), this was supposed to match the number of changes
that would be written into the changed-path Bloom filters.
Unfortunately, this was not implemented correctly and instead misses
simple cases like file modifications. This then does not stop very
large changed-path filters from being written (unless they add or remove
many files).

To start, change the implementation in ll_diff_tree_paths() to instead
use the global diff_queue_diff struct's 'nr' member as the count. This
is a way to simplify the logic instead of making more mistakes in the
complicated diff code.

This has a drawback: the diff_queue_diff struct only lists the paths
corresponding to blob changes, not their leading directories. Thus,
get_or_compute_bloom_filter() needs an additional check to see if the
hashmap with the leading directories becomes too large.

One reason why this was not caught by test cases was that the test in
t4216-log-bloom.sh that was supposed to check this "too many changes"
condition only checked this on the initial commit of a repository. The
old logic counted these values correctly. Update this test in a few
ways:

1. Use GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS to reduce the limit,
   allowing smaller commits to engage with this logic.

2. Create several interesting cases of edits, adds, removes, and mode
   changes (in the second commit). By testing both sides of the
   inequality with the *_MAX_CHANGED_PATHS variable, we can see that
   the count is exactly correct, so none of these changes are missed
   or over-counted.

3. Use the trace2 data value filter_found_large to verify that these
   commits are on the correct side of the limit.

Another way to verify the behavior is correct is through performance
tests. By testing on my local copies of the Git repository and the Linux
kernel repository, I could measure the effect of these short-circuits
when computing a fresh commit-graph file with changed-path Bloom filters
using the command

  GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS=N time \
    git commit-graph write --reachable --changed-paths

and reporting the wall time and resulting commit-graph size.

For Git, the results are

|        |      N=1       |       N=10     |      N=512     |
|--------|----------------|----------------|----------------|
| HEAD~1 | 10.90s  9.18MB | 11.11s  9.34MB | 11.31s  9.35MB |
| HEAD   |  9.21s  8.62MB | 11.11s  9.29MB | 11.29s  9.34MB |

For Linux, the results are

|        |       N=1      |     N=20      |     N=512     |
|--------|----------------|---------------|---------------|
| HEAD~1 | 61.28s  64.3MB | 76.9s  72.6MB | 77.6s  72.6MB |
| HEAD   | 49.44s  56.3MB | 68.7s  65.9MB | 69.2s  65.9MB |

Naturally, the improvement becomes much less as the limit grows, as
fewer commits satisfy the short-circuit.

Reported-by: SZEDER Gábor &lt;szeder.dev@gmail.com&gt;
Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
