<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/diffcore.h, branch v2.32.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.32.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.32.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2021-04-16T20:53:33Z</updated>
<entry>
<title>Merge branch 'en/ort-perf-batch-10'</title>
<updated>2021-04-16T20:53:33Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2021-04-16T20:53:33Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=e2e1a03f6b9c0869e64710c94fc43df82a06f0c8'/>
<id>urn:sha1:e2e1a03f6b9c0869e64710c94fc43df82a06f0c8</id>
<content type='text'>
Various rename detection optimization to help "ort" merge strategy
backend.

* en/ort-perf-batch-10:
  diffcore-rename: determine which relevant_sources are no longer relevant
  merge-ort: record the reason that we want a rename for a file
  diffcore-rename: add computation of number of unknown renames
  diffcore-rename: check if we have enough renames for directories early on
  diffcore-rename: only compute dir_rename_count for relevant directories
  merge-ort: record the reason that we want a rename for a directory
  merge-ort, diffcore-rename: tweak dirs_removed and relevant_source type
  diffcore-rename: take advantage of "majority rules" to skip more renames
</content>
</entry>
<entry>
<title>Merge branch 'en/ort-perf-batch-9'</title>
<updated>2021-04-08T20:23:26Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2021-04-08T20:23:26Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=1b31224e59750f515f7ceb7adab2a7609371327d'/>
<id>urn:sha1:1b31224e59750f515f7ceb7adab2a7609371327d</id>
<content type='text'>
The ort merge backend has been optimized by skipping irrelevant
renames.

* en/ort-perf-batch-9:
  diffcore-rename: avoid doing basename comparisons for irrelevant sources
  merge-ort: skip rename detection entirely if possible
  merge-ort: use relevant_sources to filter possible rename sources
  merge-ort: precompute whether directory rename detection is needed
  merge-ort: introduce wrappers for alternate tree traversal
  merge-ort: add data structures for an alternate tree traversal
  merge-ort: precompute subset of sources for which we need rename detection
  diffcore-rename: enable filtering possible rename sources
</content>
</entry>
<entry>
<title>Merge branch 'en/ort-perf-batch-8'</title>
<updated>2021-03-22T21:00:24Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2021-03-22T21:00:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=dd4048d1c76f067ad7b339dfb3a5f4765f5ae979'/>
<id>urn:sha1:dd4048d1c76f067ad7b339dfb3a5f4765f5ae979</id>
<content type='text'>
Rename detection rework continues.

* en/ort-perf-batch-8:
  diffcore-rename: compute dir_rename_guess from dir_rename_counts
  diffcore-rename: limit dir_rename_counts computation to relevant dirs
  diffcore-rename: compute dir_rename_counts in stages
  diffcore-rename: extend cleanup_dir_rename_info()
  diffcore-rename: move dir_rename_counts into dir_rename_info struct
  diffcore-rename: add function for clearing dir_rename_count
  Move computation of dir_rename_count from merge-ort to diffcore-rename
  diffcore-rename: add a mapping of destination names to their indices
  diffcore-rename: provide basic implementation of idx_possible_rename()
  diffcore-rename: use directory rename guided basename comparisons
</content>
</entry>
<entry>
<title>merge-ort: record the reason that we want a rename for a file</title>
<updated>2021-03-18T21:32:56Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2021-03-13T22:22:07Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=ec59da6015c457e84953a802fc6484ae2da2d774'/>
<id>urn:sha1:ec59da6015c457e84953a802fc6484ae2da2d774</id>
<content type='text'>
There are two different reasons we might want a rename for a file -- for
three-way content merging or as part of directory rename detection.
Record the reason.  diffcore-rename will potentially be able to filter
some of the ones marked as needed only for directory rename detection,
if it can determine those directory renames based solely on renames
found via exact rename detection and basename-guided rename detection.

Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>merge-ort: record the reason that we want a rename for a directory</title>
<updated>2021-03-18T21:32:55Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2021-03-13T22:22:03Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=fb52938eec1cf6ec3169152362fe774849f5ac9b'/>
<id>urn:sha1:fb52938eec1cf6ec3169152362fe774849f5ac9b</id>
<content type='text'>
When one side of history renames a directory, and the other side of
history added files to the old directory, directory rename detection is
used to warn about the location of the added files so the user can
move them to the old directory or keep them with the new one.

This sets up three different types of directories:
  * directories that had new files added to them
  * directories underneath a directory that had new files added to them
  * directories where no new files were added to it or any leading path

Save this information in dirs_removed; the next several commits will
make use of this information.

Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>merge-ort, diffcore-rename: tweak dirs_removed and relevant_source type</title>
<updated>2021-03-18T21:32:55Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2021-03-13T22:22:02Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=a49b55d52e7dcfd628b6328c9098d734ebe7a97d'/>
<id>urn:sha1:a49b55d52e7dcfd628b6328c9098d734ebe7a97d</id>
<content type='text'>
As noted in the previous commit, we want to be able to take advantage of
the "majority rules" portion of directory rename detection to avoid
detecting more renames than necessary.  However, for diffcore-rename to
take advantage of that, it needs to know whether a rename source file
was needed for just directory rename detection reasons, or if it is
wanted for potential three-way content merging.  Modify relevant_sources
from a strset to a strintmap, so we can encode additional information.

We also modify dirs_removed from a strset to a strintmap at the same
time because trying to determine what files are needed for directory
rename detection will require us tracking a bit more information for
each directory.

This commit only changes the types of the two variables from strset to
strintmap; it does not actually store any special values yet and for now
only checks for presence of entries in the strintmap.  Thus, the code is
functionally identical to how it behaved before.  Future commits will
start associating values with each key for these two maps.

Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>diffcore-rename: enable filtering possible rename sources</title>
<updated>2021-03-11T06:18:04Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2021-03-11T00:38:24Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=9799889f2e2caddee00d59aec3e4ff605a275f98'/>
<id>urn:sha1:9799889f2e2caddee00d59aec3e4ff605a275f98</id>
<content type='text'>
Add the ability to diffcore_rename_extended() to allow external callers
to declare that they only need renames detected for a subset of source
files, and use that information to skip detecting renames for them.

There are two important pieces to this optimization that may not be
obvious at first glance:

  * We do not require callers to just filter the filepairs out
    to remove the non-relevant sources, because exact rename detection
    is fast and when it finds a match it can remove both a source and a
    destination whereas the relevant_sources filter can only remove a
    source.

  * We need to filter out the source pairs in a preliminary pass instead
    of adding a
       strset_contains(relevant_sources, one-&gt;path)
    check within the nested matrix loop.  The reason for that is if we
    have 30k renames, doing 30k * 30k = 900M strset_contains() calls
    becomes extraordinarily expensive and defeats the performance gains
    from this change; we only want to do 30k such calls instead.

If callers pass NULL for relevant_sources, that is special cases to
treat all sources as relevant.  Since all callers currently pass NULL,
this optimization does not yet have any effect.  Subsequent commits will
have merge-ort compute a set of relevant_sources to restrict which
sources we detect renames for, and have merge-ort pass that set of
relevant_sources to diffcore_rename_extended().

A note about filtering order:

Some may be curious why we don't filter out irrelevant sources at the
same time we filter out exact renames.  While that technically could be
done at this point, there are two reasons to defer it:

First, was to reinforce a lesson that was too easy to forget.  As I
mentioned above, in the past I filtered irrelevant sources out before
exact rename checking, and then discovered that exact renames' ability
to remove both sources and destinations was an important consideration
and thus doing the filtering after exact rename checking would speed
things up.  Then at some point I realized that basename matching could
also remove both sources and destinations, and decided to put irrelevant
source filtering after basename filtering.  That slowed things down a
lot.  But, despite learning about this important ordering, in later
restructuring I forgot and made the same mistake of putting the
filtering after basename guided rename detection again.  So, I have this
series of patches structured to do the irrelevant filtering last to
start to show how much extra it costs, and then add relevant filtering
in to find_basename_matches() to show how much it speeds things up.
Basically, it's a way to reinforce something that apparently was too
easy to forget, and make sure the commit messages record this lesson.

Second, the items in the "relevant_sources" in this patch series will
include all sources that *might be* relevant.  It has to be conservative
and catch anything that might need a rename, but in the patch series
after this one we'll find ways to weed out more of the *might be*
relevant ones.  Unfortunately, merge-ort does not have sufficient
information to weed those ones out, and there isn't enough information
at the time of filtering exact renames out to remove the extra ones
either.  It has to be deferred.  So the deferral is in part to simplify
some later additions.

Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>diffcore-rename: add function for clearing dir_rename_count</title>
<updated>2021-02-27T01:53:11Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2021-02-27T00:30:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=cd52e0050f0aa18bb35cda08f2dcbe94943df2cf'/>
<id>urn:sha1:cd52e0050f0aa18bb35cda08f2dcbe94943df2cf</id>
<content type='text'>
As we adjust the usage of dir_rename_count we want to have a function
for clearing, or partially clearing it out.  Add a
partial_clear_dir_rename_count() function for this purpose.

Reviewed-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Move computation of dir_rename_count from merge-ort to diffcore-rename</title>
<updated>2021-02-27T01:53:11Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2021-02-27T00:30:42Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=0c4fd732f043be570e08b51c475ff9f2e2066912'/>
<id>urn:sha1:0c4fd732f043be570e08b51c475ff9f2e2066912</id>
<content type='text'>
Move the computation of dir_rename_count from merge-ort.c to
diffcore-rename.c, making slight adjustments to the data structures
based on the move.  While the diffstat looks large, viewing this commit
with --color-moved makes it clear that only about 20 lines changed.

With this patch, the computation of dir_rename_count is still only done
after inexact rename detection, but subsequent commits will add a
preliminary computation of dir_rename_count after exact rename
detection, followed by some updates after inexact rename detection.

Reviewed-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>diff: --{rotate,skip}-to=&lt;path&gt;</title>
<updated>2021-02-16T17:30:42Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2021-02-11T19:57:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=1eb4136ac2a24764257567b930535fcece01719f'/>
<id>urn:sha1:1eb4136ac2a24764257567b930535fcece01719f</id>
<content type='text'>
In the implementation of "git difftool", there is a case where the
user wants to start viewing the diffs at a specific path and
continue on to the rest, optionally wrapping around to the
beginning.  Since it is somewhat cumbersome to implement such a
feature as a post-processing step of "git diff" output, let's
support it internally with two new options.

 - "git diff --rotate-to=C", when the resulting patch would show
   paths A B C D E without the option, would "rotate" the paths to
   shows patch to C D E A B instead.  It is an error when there is
   no patch for C is shown.

 - "git diff --skip-to=C" would instead "skip" the paths before C,
   and shows patch to C D E.  Again, it is an error when there is no
   patch for C is shown.

 - "git log [-p]" also accepts these two options, but it is not an
   error if there is no change to the specified path.  Instead, the
   set of output paths are rotated or skipped to the specified path
   or the first path that sorts after the specified path.

Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
