<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/diffcore.h, branch v2.45.3</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.45.3</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.45.3'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2023-04-24T19:47:32Z</updated>
<entry>
<title>hash-ll.h: split out of hash.h to remove dependency on repository.h</title>
<updated>2023-04-24T19:47:32Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2023-04-22T20:17:20Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=d1cbe1e6d8a9cab2b4ffe8a17d34db214dce1e49'/>
<id>urn:sha1:d1cbe1e6d8a9cab2b4ffe8a17d34db214dce1e49</id>
<content type='text'>
hash.h depends upon and includes repository.h, due to the definition and
use of the_hash_algo (defined as the_repository-&gt;hash_algo).  However,
most headers trying to include hash.h are only interested in the layout
of the structs like object_id.  Move the parts of hash.h that do not
depend upon repository.h into a new file hash-ll.h (the "low level"
parts of hash.h), and adjust other files to use this new header where
the convenience inline functions aren't needed.

This allows hash.h and object.h to be fairly small, minimal headers.  It
also exposes a lot of hidden dependencies on both path.h (which was
brought in by repository.h) and repository.h (which was previously
implicitly brought in by object.h), so also adjust other files to be
more explicit about what they depend upon.

Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>hash.h: move some oid-related declarations from cache.h</title>
<updated>2023-02-24T01:25:28Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2023-02-24T00:09:25Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=41227cb138c91fbd369ac6ee4877f253b39260cc'/>
<id>urn:sha1:41227cb138c91fbd369ac6ee4877f253b39260cc</id>
<content type='text'>
These defines and enum are all oid-related and as such seem to make
more sense being included in hash.h.  Further, moving them there
allows us to remove some includes of cache.h in other files.

The change to line-log.h might look unrelated, but line-log.h includes
diffcore.h, which previously included cache.h, which included the
kitchen sink.  Since this patch makes diffcore.h no longer include
cache.h, the compiler complains about the 'struct string_list *'
function parameter.  Add a forward declaration for struct string_list to
address this.

Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>line-log: free diff queue when processing non-merge commits</title>
<updated>2022-11-03T00:16:34Z</updated>
<author>
<name>SZEDER Gábor</name>
<email>szeder.dev@gmail.com</email>
</author>
<published>2022-11-02T22:01:40Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=04ae00062d2ec301828e8df9931817be4b2f8653'/>
<id>urn:sha1:04ae00062d2ec301828e8df9931817be4b2f8653</id>
<content type='text'>
When processing a non-merge commit, the line-level log first asks the
tree-diff machinery whether any of the files in the given line ranges
were modified between the current commit and its parent, and if some
of them were, then it loads the contents of those files from both
commits to see whether their line ranges were modified and/or need to
be adjusted.  Alas, it doesn't free() the diff queue holding the
results of that query and the contents of those files once its done.
This can add up to a substantial amount of leaked memory, especially
when the file in question is big and is frequently modified: a user
reported "Out of memory, malloc failed" errors with a 2MB text file
that was modified ~2800 times [1] (I estimate the leak would use up
almost 11GB memory in that case).

Free that diff queue to plug this memory leak.  However, instead of
simply open-coding the necessary three lines, add them as a helper
function to the diff API, because it will be useful elsewhere as well.

[1] https://public-inbox.org/git/CAFOPqVXz2XwzX8vGU7wLuqb2ZuwTuOFAzBLRM_QPk+NJa=eC-g@mail.gmail.com/

Signed-off-by: SZEDER Gábor &lt;szeder.dev@gmail.com&gt;
Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
</content>
</entry>
<entry>
<title>merge-ort: store filepairs and filespecs in our mem_pool</title>
<updated>2021-07-30T16:01:19Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2021-07-30T11:47:42Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=f239fff4c183b231134115bc3b38b4f8e3982d3e'/>
<id>urn:sha1:f239fff4c183b231134115bc3b38b4f8e3982d3e</id>
<content type='text'>
For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:

                            Before                  After
    no-renames:       198.1 ms ±  2.6 ms     198.5 ms ±  3.4 ms
    mega-renames:     715.8 ms ±  4.0 ms     679.1 ms ±  5.6 ms
    just-one-mega:    276.8 ms ±  4.2 ms     271.9 ms ±  2.8 ms

Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>diffcore-rename, merge-ort: add wrapper functions for filepair alloc/dealloc</title>
<updated>2021-07-30T16:01:19Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2021-07-30T11:47:41Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=a8791ef6492bf702ba70352009cbd28fb581c6e2'/>
<id>urn:sha1:a8791ef6492bf702ba70352009cbd28fb581c6e2</id>
<content type='text'>
We want to be able to allocate filespecs and filepairs using a mem_pool.
However, filespec data will still remain outside the pool (perhaps in
the future we could plumb the pool through the various diff APIs to
allocate the filespec data too, but for now we are limiting the scope).
Add some extra functions to allocate these appropriately based on the
non-NULL-ness of opt-&gt;priv-&gt;pool, as well as some extra functions to
handle correctly deallocating the relevant parts of them.  A future
commit will make use of these new functions.

Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'en/ort-perf-batch-11'</title>
<updated>2021-06-14T04:33:27Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2021-06-14T04:33:26Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=169914ede2aa205c5500c7be0501889a8962dc24'/>
<id>urn:sha1:169914ede2aa205c5500c7be0501889a8962dc24</id>
<content type='text'>
Optimize out repeated rename detection in a sequence of mergy
operations.

* en/ort-perf-batch-11:
  merge-ort, diffcore-rename: employ cached renames when possible
  merge-ort: handle interactions of caching and rename/rename(1to1) cases
  merge-ort: add helper functions for using cached renames
  merge-ort: preserve cached renames for the appropriate side
  merge-ort: avoid accidental API mis-use
  merge-ort: add code to check for whether cached renames can be reused
  merge-ort: populate caches of rename detection results
  merge-ort: add data structures for in-memory caching of rename detection
  t6429: testcases for remembering renames
  fast-rebase: write conflict state to working tree, index, and HEAD
  fast-rebase: change assert() to BUG()
  Documentation/technical: describe remembering renames optimization
  t6423: rename file within directory that other side renamed
</content>
</entry>
<entry>
<title>merge-ort, diffcore-rename: employ cached renames when possible</title>
<updated>2021-05-20T06:40:39Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2021-05-20T06:09:41Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=25e65b6dd52c987056f1cac00fe6073fbf8ea237'/>
<id>urn:sha1:25e65b6dd52c987056f1cac00fe6073fbf8ea237</id>
<content type='text'>
When there are many renames between the old base of a series of commits
and the new base, the way sequencer.c, merge-recursive.c, and
diffcore-rename.c have traditionally split the work resulted in
redetecting the same renames with each and every commit being
transplanted.  To address this, the last several commits have been
creating a cache of rename detection results, determining when it was
safe to use such a cache in subsequent merge operations, adding helper
functions, and so on.  See the previous half dozen commit messages for
additional discussion of this optimization, particularly the message a
few commits ago entitled "add code to check for whether cached renames
can be reused".  This commit finally ties all of that work together,
modifying the merge algorithm to make use of these cached renames.

For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:

                            Before                  After
    no-renames:        5.665 s ±  0.129 s     5.622 s ±  0.059 s
    mega-renames:     11.435 s ±  0.158 s    10.127 s ±  0.073 s
    just-one-mega:   494.2  ms ±  6.1  ms   500.3  ms ±  3.8  ms

That's a fairly small improvement, but mostly because the previous
optimizations were so effective for these particular testcases; this
optimization only kicks in when the others don't.  If we undid the
basename-guided rename detection and skip-irrelevant-renames
optimizations, then we'd see that this series by itself improved
performance as follows:

                   Before Basename Series   After Just This Series
    no-renames:      13.815 s ±  0.062 s      5.697 s ±  0.080 s
    mega-renames:  1799.937 s ±  0.493 s    205.709 s ±  0.457 s

Since this optimization kicks in to help accelerate cases where the
previous optimizations do not apply, this last comparison shows that
this cached-renames optimization has the potential to help signficantly
in cases that don't meet the requirements for the other optimizations to
be effective.

The changes made in this optimization also lay some important groundwork
for a future optimization around having collect_merge_info() avoid
recursing into subtrees in more cases.

However, for this optimization to be effective, merge_switch_to_result()
should only be called when the rebase or cherry-pick operation has
either completed or hit a case where the user needs to resolve a
conflict or edit the result.  If it is called after every commit, as
sequencer.c does, then the working tree and index are needlessly updated
with every commit and the cached metadata is tossed, defeating this
optimization.  Refactoring sequencer.c to only call
merge_switch_to_result() at the end of the operation is a bigger
undertaking, and the practical benefits of this optimization will not be
realized until that work is performed.  Since `test-tool fast-rebase`
only updates at the end of the operation, it was used to obtain the
timings above.

Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'en/ort-perf-batch-10'</title>
<updated>2021-04-16T20:53:33Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2021-04-16T20:53:33Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=e2e1a03f6b9c0869e64710c94fc43df82a06f0c8'/>
<id>urn:sha1:e2e1a03f6b9c0869e64710c94fc43df82a06f0c8</id>
<content type='text'>
Various rename detection optimization to help "ort" merge strategy
backend.

* en/ort-perf-batch-10:
  diffcore-rename: determine which relevant_sources are no longer relevant
  merge-ort: record the reason that we want a rename for a file
  diffcore-rename: add computation of number of unknown renames
  diffcore-rename: check if we have enough renames for directories early on
  diffcore-rename: only compute dir_rename_count for relevant directories
  merge-ort: record the reason that we want a rename for a directory
  merge-ort, diffcore-rename: tweak dirs_removed and relevant_source type
  diffcore-rename: take advantage of "majority rules" to skip more renames
</content>
</entry>
<entry>
<title>Merge branch 'en/ort-perf-batch-9'</title>
<updated>2021-04-08T20:23:26Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2021-04-08T20:23:26Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=1b31224e59750f515f7ceb7adab2a7609371327d'/>
<id>urn:sha1:1b31224e59750f515f7ceb7adab2a7609371327d</id>
<content type='text'>
The ort merge backend has been optimized by skipping irrelevant
renames.

* en/ort-perf-batch-9:
  diffcore-rename: avoid doing basename comparisons for irrelevant sources
  merge-ort: skip rename detection entirely if possible
  merge-ort: use relevant_sources to filter possible rename sources
  merge-ort: precompute whether directory rename detection is needed
  merge-ort: introduce wrappers for alternate tree traversal
  merge-ort: add data structures for an alternate tree traversal
  merge-ort: precompute subset of sources for which we need rename detection
  diffcore-rename: enable filtering possible rename sources
</content>
</entry>
<entry>
<title>Merge branch 'en/ort-perf-batch-8'</title>
<updated>2021-03-22T21:00:24Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2021-03-22T21:00:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=dd4048d1c76f067ad7b339dfb3a5f4765f5ae979'/>
<id>urn:sha1:dd4048d1c76f067ad7b339dfb3a5f4765f5ae979</id>
<content type='text'>
Rename detection rework continues.

* en/ort-perf-batch-8:
  diffcore-rename: compute dir_rename_guess from dir_rename_counts
  diffcore-rename: limit dir_rename_counts computation to relevant dirs
  diffcore-rename: compute dir_rename_counts in stages
  diffcore-rename: extend cleanup_dir_rename_info()
  diffcore-rename: move dir_rename_counts into dir_rename_info struct
  diffcore-rename: add function for clearing dir_rename_count
  Move computation of dir_rename_count from merge-ort to diffcore-rename
  diffcore-rename: add a mapping of destination names to their indices
  diffcore-rename: provide basic implementation of idx_possible_rename()
  diffcore-rename: use directory rename guided basename comparisons
</content>
</entry>
</feed>
