<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/combine-diff.c, branch v2.30.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.30.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.30.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2020-10-05T21:01:55Z</updated>
<entry>
<title>Merge branch 'jk/diff-cc-oidfind-fix'</title>
<updated>2020-10-05T21:01:55Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2020-10-05T21:01:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=7da656f1e083b065400c4ab92405c6332481eb7a'/>
<id>urn:sha1:7da656f1e083b065400c4ab92405c6332481eb7a</id>
<content type='text'>
"log -c --find-object=X" did not work well to find a merge that
involves a change to an object X from only one parent.

* jk/diff-cc-oidfind-fix:
  combine-diff: handle --find-object in multitree code path
</content>
</entry>
<entry>
<title>combine-diff: handle --find-object in multitree code path</title>
<updated>2020-09-30T20:35:24Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2020-09-30T11:52:40Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=957876f17d75b5cd0e94b85d3df843bb1e9ae110'/>
<id>urn:sha1:957876f17d75b5cd0e94b85d3df843bb1e9ae110</id>
<content type='text'>
When doing combined diffs, we have two possible code paths:

  - a slower one which independently diffs against each parent, applies
    any filters, and then intersects the resulting paths

  - a faster one which walks all trees simultaneously

When the diff options specify that we must do certain filters, like
pickaxe, then we always use the slow path, since the pickaxe code only
knows how to handle filepairs, not the n-parent entries generated for
combined diffs.

But there are two problems with the slow path:

 1. It's slow. Running:

      git rev-list HEAD | git diff-tree --stdin -r -c

    in git.git takes ~3s on my machine. But adding "--find-object" to
    that increases it to ~6s, even though find-object itself should
    incur only a few extra oid comparisons. On linux.git, it's even
    worse: 35s versus 215s.

 2. It doesn't catch all cases where a particular path is interesting.
    Consider a merge with parent blobs X and Y for a particular path,
    and end result Z. That should be interesting according to "-c",
    because the result doesn't match either parent. And it should be
    interesting even with "--find-object=X", because "X" went away in
    the merge.

    But because we perform each pairwise diff independently, this
    confuses the intersection code. The change from X to Z is still
    interesting according to --find-object. But in the other parent we
    went from Y to Z, so the diff appears empty! That causes the
    intersection code to think that parent didn't change the path, and
    thus it's not interesting for "-c".
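
A minimal Python sketch of the X/Y/Z case, assuming a toy model where each
tree is a dict mapping a path to a blob name. The function name
pairwise_paths and the data layout are hypothetical and only mimic the
"diff each parent, filter, intersect" behavior described above; this is not
the code in combine-diff.c.

```python
def pairwise_paths(parents, result, wanted=None):
    """Slow path: diff each parent on its own, filter, then intersect."""
    kept = None
    for parent in parents:
        changed = set()
        for path, blob in result.items():
            old = parent.get(path)
            if old == blob:
                continue  # no filepair: unchanged against this parent
            if wanted is not None and wanted not in (old, blob):
                continue  # filepair dropped by the --find-object filter
            changed.add(path)
        kept = changed if kept is None else kept.intersection(changed)
    return kept if kept is not None else set()


parents = [{"file": "X"}, {"file": "Y"}]
result = {"file": "Z"}

plain = pairwise_paths(parents, result)          # like plain "-c"
filtered = pairwise_paths(parents, result, "X")  # like "-c --find-object=X"
print(plain, filtered)
```

In this model the path survives without the filter but is lost with it,
because the Y-to-Z pair is filtered out before the intersection.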

This patch fixes both by implementing --find-object for the multitree
code. It's a bit unfortunate that we have to duplicate some logic from
diffcore-pickaxe, but this is the best we can do for now. In an ideal
world, all of the diffcore code would stop thinking about filepairs and
start thinking about n-parent sets, and we could use the multitree walk
with all of it.
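
As a rough illustration of the multitree idea (again a toy model, not the
actual implementation), the sketch below looks at all parents of a path at
once and applies the object filter to the whole n-parent entry. The name
multitree_paths and the dict-based trees are assumptions made for this
example only.

```python
def multitree_paths(parents, result, wanted=None):
    """Walk all parents together and filter each n-parent entry once."""
    kept = set()
    for path, blob in result.items():
        olds = [parent.get(path) for parent in parents]
        if any(old == blob for old in olds):
            continue  # result matches a parent: not interesting for "-c"
        if wanted is not None and wanted != blob and wanted not in olds:
            continue  # the wanted object is on no side of this entry
        kept.add(path)
    return kept


parents = [{"file": "X"}, {"file": "Y"}]
result = {"file": "Z"}

found = multitree_paths(parents, result, "X")
missed = multitree_paths(parents, result, "W")
print(found, missed)
```

Because the entry is judged as a whole, the X/Y/Z merge is reported for
"X", while an unrelated object "W" still filters the path out.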

Until then, there are some leftover warts:

  - other pickaxe operations, like -S or -G, still suffer from both
    problems. These would be hard to adapt because they rely on having
    a struct diff_filespec for each path to look at content. And we'd need to
    define what an n-way "change" means in each case (probably easy for
    "-S", which can compare counts, but not so clear for -G, which is
    about grepping diffs).

  - other options besides --find-object may cause us to use the slow
    pairwise path, in which case we'll go back to producing a different
    (wrong) answer for the X/Y/Z case above.

We may be able to hack around these, but I think the ultimate solution
will be a larger rewrite of the diffcore code. For now, this patch
improves one specific case but leaves the rest.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>diff: get rid of redundant 'dense' argument</title>
<updated>2020-09-29T18:54:53Z</updated>
<author>
<name>Sergey Organov</name>
<email>sorganov@gmail.com</email>
</author>
<published>2020-09-29T11:31:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=d01141de5ab02cf4a156183ef4dc5ee8bf2638a3'/>
<id>urn:sha1:d01141de5ab02cf4a156183ef4dc5ee8bf2638a3</id>
<content type='text'>
Get rid of the 'dense' argument, which is redundant for every function that
also takes a 'struct rev_info *rev' argument: the value of 'dense' passed is
always taken from the 'rev-&gt;dense_combined_merges' field.

The only place where this was not the case is in 'submodule.c', where
'diff_tree_combined_merge()' was called with '1' for the 'dense' argument. However,
at that call the 'revs' instance used is local to the function, so we now just
set 'revs-&gt;dense_combined_merges' to 1 in this local instance.

Signed-off-by: Sergey Organov &lt;sorganov@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>oid_array: rename source file from sha1-array</title>
<updated>2020-03-30T17:59:08Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2020-03-30T14:03:46Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=fe299ec5ae7b419990bbc3efd4e6bfa3f0773b45'/>
<id>urn:sha1:fe299ec5ae7b419990bbc3efd4e6bfa3f0773b45</id>
<content type='text'>
We renamed the actual data structure in 910650d2f8 (Rename sha1_array to
oid_array, 2017-03-31), but the file is still called sha1-array. Besides
being slightly confusing, it makes it more annoying to grep for leftover
occurrences of "sha1" in various files, because the header is included
in so many places.

Let's complete the transition by renaming the source and header files
(and fixing up a few comment references).

I kept the "-" in the name, as that seems to be our style; cf.
fc1395f4a4 (sha1_file.c: rename to use dash in file name, 2018-04-10).
We also have oidmap.h and oidset.h without any punctuation, but those
are "struct oidmap" and "struct oidset" in the code. We _could_ make
this "oidarray" to match, but somehow it looks uglier to me because of
the length of "array" (plus it would be a very invasive patch for little
gain).

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>combine-diff: replace GIT_SHA1_HEXSZ with the_hash_algo</title>
<updated>2019-08-19T22:04:58Z</updated>
<author>
<name>brian m. carlson</name>
<email>sandals@crustytoothpaste.net</email>
</author>
<published>2019-08-18T20:04:12Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=976ff7e49d722d1b72effb6d547ec2b00cd69fc0'/>
<id>urn:sha1:976ff7e49d722d1b72effb6d547ec2b00cd69fc0</id>
<content type='text'>
Signed-off-by: brian m. carlson &lt;sandals@crustytoothpaste.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'en/combined-all-paths'</title>
<updated>2019-03-07T00:59:54Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2019-03-07T00:59:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=c425d361f5bf46d99cd96d7eac3488ebb2e92b60'/>
<id>urn:sha1:c425d361f5bf46d99cd96d7eac3488ebb2e92b60</id>
<content type='text'>
Output from "diff --cc" did not show the original paths when the
merge involved renames.  A new option adds the paths in the
original trees to the output.

* en/combined-all-paths:
  log,diff-tree: add --combined-all-paths option
</content>
</entry>
<entry>
<title>log,diff-tree: add --combined-all-paths option</title>
<updated>2019-02-08T04:15:25Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2019-02-08T01:12:46Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=d76ce4f734634f47b467b7f6eea11d6bf8c81f22'/>
<id>urn:sha1:d76ce4f734634f47b467b7f6eea11d6bf8c81f22</id>
<content type='text'>
The combined diff format for merges will only list one filename, even if
rename or copy detection is active.  For example, with raw format one
might see:

  ::100644 100644 100644 fabadb8 cc95eb0 4866510 MM	desc.c
  ::100755 100755 100755 52b7a2d 6d1ac04 d2ac7d7 RM	bar.sh
  ::100644 100644 100644 e07d6c5 9042e82 ee91881 RR	phooey.c

This doesn't let us know what the original name of bar.sh was in the
first parent, and doesn't let us know what the original names
of phooey.c were in either of the parents.  In contrast, for non-merge
commits, raw format does provide original filenames (and a rename score
to boot).  In order to also provide original filenames for merge
commits, add a --combined-all-paths option (which must be used with
either -c or --cc, and is likely only useful with rename or copy
detection active) so that we can print tab-separated filenames when
renames are involved.  This transforms the above output to:

  ::100644 100644 100644 fabadb8 cc95eb0 4866510 MM	desc.c	desc.c	desc.c
  ::100755 100755 100755 52b7a2d 6d1ac04 d2ac7d7 RM	foo.sh	bar.sh	bar.sh
  ::100644 100644 100644 e07d6c5 9042e82 ee91881 RR	fooey.c	fuey.c	phooey.c
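
For consumers of this output, a small Python sketch of how such a raw line
might be split is shown here. The helper name parse_combined_raw and the
returned dict keys are hypothetical; the sketch assumes one leading colon
per parent, abbreviated object names, and paths that need no quoting.

```python
def parse_combined_raw(line):
    """Split one raw combined-diff line into its fields."""
    meta, *paths = line.rstrip("\n").split("\t")
    nparents = len(meta) - len(meta.lstrip(":"))  # one colon per parent
    fields = meta.lstrip(":").split()
    nsides = nparents + 1                         # parents plus the result
    return {
        "modes": fields[:nsides],
        "oids": fields[nsides:2 * nsides],
        "status": fields[-1],
        "parent_paths": paths[:-1],  # empty without --combined-all-paths
        "path": paths[-1],
    }


line = "::100755 100755 100755 52b7a2d 6d1ac04 d2ac7d7 RM\tfoo.sh\tbar.sh\tbar.sh"
entry = parse_combined_raw(line)
print(entry["parent_paths"], entry["path"])
```

With the option off, the same helper sees a single path and leaves
parent_paths empty.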

Further, in patch format, this changes the from/to headers so that
instead of just having one "from" header, we get one for each parent.
For example, instead of having

  --- a/phooey.c
  +++ b/phooey.c

we would see

  --- a/fooey.c
  --- a/fuey.c
  +++ b/phooey.c

Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'jk/diff-cc-stat-fixes'</title>
<updated>2019-02-05T22:26:17Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2019-02-05T22:26:17Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=5d2710bd3c706c3375991972b630f9419551f24e'/>
<id>urn:sha1:5d2710bd3c706c3375991972b630f9419551f24e</id>
<content type='text'>
"git diff --color-moved --cc --stat -p" did not work well due to
funny interaction between a bug in color-moved and the rest, which
has been fixed.

* jk/diff-cc-stat-fixes:
  combine-diff: treat --dirstat like --stat
  combine-diff: treat --summary like --stat
  combine-diff: treat --shortstat like --stat
  combine-diff: factor out stat-format mask
  diff: clear emitted_symbols flag after use
  t4006: resurrect commented-out tests
</content>
</entry>
<entry>
<title>combine-diff: treat --dirstat like --stat</title>
<updated>2019-01-24T20:18:53Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2019-01-24T12:36:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=dac03b5518a053e52ed7ee20925970728d3ae4c3'/>
<id>urn:sha1:dac03b5518a053e52ed7ee20925970728d3ae4c3</id>
<content type='text'>
Currently "--cc --dirstat" will show nothing for a merge.  Like
--shortstat and --summary in the previous two patches, it probably makes
sense to treat it like we do --stat, and show a stat against the
first-parent.

This case is less obviously correct than for --shortstat and --summary,
as those are basically variants of --stat themselves. It's possible we
could develop a multi-parent combined dirstat format, in which case we
might regret defining this first-parent behavior. But the same could be
said for --stat, and in the 12+ years of it showing first-parent stats,
nobody has complained.

So showing the first-parent dirstat is at least _useful_, and if we
later develop a clever multi-parent stat format, we'd probably have to
deal with --stat anyway.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>combine-diff: treat --summary like --stat</title>
<updated>2019-01-24T20:18:53Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2019-01-24T12:35:40Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=04b19fcafd734245e61fd1c090cd7de8c2993eaa'/>
<id>urn:sha1:04b19fcafd734245e61fd1c090cd7de8c2993eaa</id>
<content type='text'>
Currently "--cc --summary" on a merge shows nothing. Since we show "--cc
--stat" as a stat against the first parent, and because --summary is
typically used in combination with --stat, it makes sense to treat them
both the same way.

Note that we have to tweak t4013's setup a bit to test this case, as the
existing merges do not have any --summary results against their first
parent. But since the merge at the tip of 'master' does add and remove
files with respect to the second parent, we can just make a reversed
doppelganger merge where the parents are swapped.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
