<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/cache-tree.c, branch v2.32.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.32.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.32.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2021-04-30T04:50:26Z</updated>
<entry>
<title>Merge branch 'ds/sparse-index-protections'</title>
<updated>2021-04-30T04:50:26Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2021-04-30T04:50:26Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=8e97852919fa422bc5fe57bc7e71826cf2b5224d'/>
<id>urn:sha1:8e97852919fa422bc5fe57bc7e71826cf2b5224d</id>
<content type='text'>
Builds on top of the sparse-index infrastructure to mark operations
that are not ready to mark with the sparse index, causing them to
fall back on fully-populated index that they always have worked with.

* ds/sparse-index-protections: (47 commits)
  name-hash: use expand_to_path()
  sparse-index: expand_to_path()
  name-hash: don't add directories to name_hash
  revision: ensure full index
  resolve-undo: ensure full index
  read-cache: ensure full index
  pathspec: ensure full index
  merge-recursive: ensure full index
  entry: ensure full index
  dir: ensure full index
  update-index: ensure full index
  stash: ensure full index
  rm: ensure full index
  merge-index: ensure full index
  ls-files: ensure full index
  grep: ensure full index
  fsck: ensure full index
  difftool: ensure full index
  commit: ensure full index
  checkout: ensure full index
  ...
</content>
</entry>
<entry>
<title>sparse-index: loose integration with cache_tree_verify()</title>
<updated>2021-03-30T19:57:48Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2021-03-30T13:11:03Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=9ad2d5ea71d1470c2fd68b76c36b2ced3612cde3'/>
<id>urn:sha1:9ad2d5ea71d1470c2fd68b76c36b2ced3612cde3</id>
<content type='text'>
The cache_tree_verify() method is run when GIT_TEST_CHECK_CACHE_TREE
is enabled, which it is by default in the test suite. The logic must
be adjusted for the presence of these directory entries.

For now, leave the test as a simple check for whether the directory
entry is sparse. Do not go any further until needed.

This allows us to re-enable GIT_TEST_CHECK_CACHE_TREE in
t1092-sparse-checkout-compatibility.sh. Further,
p2000-sparse-operations.sh uses the test suite and hence this is enabled
for all tests. We need to integrate with it before we run our
performance tests with a sparse-index.

Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>cache-tree: integrate with sparse directory entries</title>
<updated>2021-03-30T19:57:48Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2021-03-30T13:11:02Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=2de37c536d54a28a032491ad4ef97632ef6ab836'/>
<id>urn:sha1:2de37c536d54a28a032491ad4ef97632ef6ab836</id>
<content type='text'>
The cache-tree extension was previously disabled with sparse indexes.
However, the cache-tree is an important performance feature for commands
like 'git status' and 'git add'. Integrate it with sparse directory
entries.

When writing a sparse index, completely clear and recalculate the cache
tree. By starting from scratch, the only integration necessary is to
check if we hit a sparse directory entry and create a leaf of the
cache-tree that has an entry_count of one and no subtrees.

Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>sparse-index: convert from full to sparse</title>
<updated>2021-03-30T19:57:47Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2021-03-30T13:10:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=6e773527b6b03976cefbb0f9571bd40dd5995e6c'/>
<id>urn:sha1:6e773527b6b03976cefbb0f9571bd40dd5995e6c</id>
<content type='text'>
If we have a full index, then we can convert it to a sparse index by
replacing directories outside of the sparse cone with sparse directory
entries. The convert_to_sparse() method does this, when the situation is
appropriate.

For now, we avoid converting the index to a sparse index if:

 1. the index is split.
 2. the index is already sparse.
 3. sparse-checkout is disabled.
 4. sparse-checkout does not use cone mode.

Finally, we currently limit the conversion to when the
GIT_TEST_SPARSE_INDEX environment variable is enabled. A mode using Git
config will be added in a later change.

The trickiest thing about this conversion is that we might not be able
to mark a directory as a sparse directory just because it is outside the
sparse cone. There might be unmerged files within that directory, so we
need to look for those. Also, if there is some strange reason why a file
is not marked with CE_SKIP_WORKTREE, then we should give up on
converting that directory. There is still hope that some of its
subdirectories might be able to convert to sparse, so we keep looking
deeper.

The conversion process is assisted by the cache-tree extension. This is
calculated from the full index if it does not already exist. We then
abandon the cache-tree as it no longer applies to the newly-sparse
index. Thus, this cache-tree will be recalculated in every
sparse-full-sparse round-trip until we integrate the cache-tree
extension with the sparse index.

Some Git commands use the index after writing it. For example, 'git add'
will update the index, then write it to disk, then read its entries to
report information. To keep the in-memory index in a full state after
writing, we re-expand it to a full one after the write. This is wasteful
for commands that only write the index and do not read from it again,
but that is only the case until we make those commands "sparse aware."

We can compare the behavior of the sparse-index in
t1092-sparse-checkout-compability.sh by using GIT_TEST_SPARSE_INDEX=1
when operating on the 'sparse-index' repo. We can also compare the two
sparse repos directly, such as comparing their indexes (when expanded to
full in the case of the 'sparse-index' repo). We also verify that the
index is actually populated with sparse directory entries.

The 'checkout and reset (mixed)' test is marked for failure when
comparing a sparse repo to a full repo, but we can compare the two
sparse-checkout cases directly to ensure that we are not changing the
behavior when using a sparse index.

Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>use CALLOC_ARRAY</title>
<updated>2021-03-14T00:00:09Z</updated>
<author>
<name>René Scharfe</name>
<email>l.s.r@web.de</email>
</author>
<published>2021-03-13T16:17:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=ca56dadb4b65ccaeab809d80db80a312dc00941a'/>
<id>urn:sha1:ca56dadb4b65ccaeab809d80db80a312dc00941a</id>
<content type='text'>
Add and apply a semantic patch for converting code that open-codes
CALLOC_ARRAY to use it instead.  It shortens the code and infers the
element size automatically.

Signed-off-by: René Scharfe &lt;l.s.r@web.de&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>cache-tree: extract subtree_pos()</title>
<updated>2021-01-24T01:14:07Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2021-01-23T19:58:13Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=c80dd3967f28527dab49c8e9525524c7f33baa22'/>
<id>urn:sha1:c80dd3967f28527dab49c8e9525524c7f33baa22</id>
<content type='text'>
This method will be helpful to use outside of cache-tree.c in a later
feature. The implementation is subtle due to subtree_name_cmp() sorting
by length and then lexicographically.

Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>cache-tree: simplify verify_cache() prototype</title>
<updated>2021-01-24T01:14:07Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2021-01-23T19:58:12Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=8d87e338e16e022545638a0e9a3e15c6bdb56111'/>
<id>urn:sha1:8d87e338e16e022545638a0e9a3e15c6bdb56111</id>
<content type='text'>
The verify_cache() method takes an array of cache entries and a count,
but these are always provided directly from a struct index_state. Use
a pointer to the full structure instead.

There is a subtle point when istate-&gt;cache_nr is zero that subtracting
one will underflow. This triggers a failure in t0000-basic.sh, among
others. Use "i + 1 &lt; istate-&gt;cache_nr" to avoid these strange
comparisons. Convert i to be unsigned as well, which also removes the
potential signed overflow in the unlikely case that cache_nr is over 2.1
billion entries. The 'funny' variable has a maximum value of 11, so
making it unsigned does not change anything of importance.

Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>cache-tree: clean up cache_tree_update()</title>
<updated>2021-01-24T01:14:07Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2021-01-23T19:58:11Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=fb0882648e0d624f825974aa7030e56daf6de2af'/>
<id>urn:sha1:fb0882648e0d624f825974aa7030e56daf6de2af</id>
<content type='text'>
Make the method safer by allocating a cache_tree member for the given
index_state if it is not already present. This is preferrable to a
BUG() statement or returning with an error because future callers will
want to populate an empty cache-tree using this method.

Callers can also remove their conditional allocations of cache_tree.

Also drop local variables that can be found directly from the 'istate'
parameter.

Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>cache-tree: speed up consecutive path comparisons</title>
<updated>2021-01-16T07:05:13Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2021-01-07T16:32:11Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=a4b6d202caad83c6dc29abe9b17e53a1b3fb54a0'/>
<id>urn:sha1:a4b6d202caad83c6dc29abe9b17e53a1b3fb54a0</id>
<content type='text'>
The previous change reduced time spent in strlen() while comparing
consecutive paths in verify_cache(), but we can do better. The
conditional checks the existence of a directory separator at the correct
location, but only after doing a string comparison. Swap the order to be
logically equivalent but perform fewer string comparisons.

To test the effect on performance, I used a repository with over three
million paths in the index. I then ran the following command on repeat:

  git -c index.threads=1 commit --amend --allow-empty --no-edit

Here are the measurements over 10 runs after a 5-run warmup:

  Benchmark #1: v2.30.0
    Time (mean ± σ):     854.5 ms ±  18.2 ms
    Range (min … max):   825.0 ms … 892.8 ms

  Benchmark #2: Previous change
    Time (mean ± σ):     833.2 ms ±  10.3 ms
    Range (min … max):   815.8 ms … 849.7 ms

  Benchmark #3: This change
    Time (mean ± σ):     815.5 ms ±  18.1 ms
    Range (min … max):   795.4 ms … 849.5 ms

This change is 2% faster than the previous change and 5% faster than
v2.30.0.

Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>cache-tree: use ce_namelen() instead of strlen()</title>
<updated>2021-01-16T07:05:13Z</updated>
<author>
<name>René Scharfe</name>
<email>l.s.r@web.de</email>
</author>
<published>2021-01-07T16:32:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=0b72536a0b6128d2bfd05d633cd2228d7515b53d'/>
<id>urn:sha1:0b72536a0b6128d2bfd05d633cd2228d7515b53d</id>
<content type='text'>
Use the name length field of cache entries instead of calculating its
value anew.

Signed-off-by: René Scharfe &lt;l.s.r@web.de&gt;
Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
