<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/sparse-index.c, branch v2.40.3</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.40.3</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.40.3'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2023-01-17T22:32:06Z</updated>
<entry>
<title>treewide: always have a valid "index_state.repo" member</title>
<updated>2023-01-17T22:32:06Z</updated>
<author>
<name>Ævar Arnfjörð Bjarmason</name>
<email>avarab@gmail.com</email>
</author>
<published>2023-01-17T13:57:00Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=6269f8eaad054a02517f5e03873726e84a032d8e'/>
<id>urn:sha1:6269f8eaad054a02517f5e03873726e84a032d8e</id>
<content type='text'>
When the "repo" member was added to "the_index" in [1] the
repo_read_index() was made to populate it, but the unpopulated
"the_index" variable didn't get the same treatment.

Let's do that in initialize_the_repository() when we set it up, and
likewise for all of the current callers initialized an empty "struct
index_state".

This simplifies code that needs to deal with "the_index" or a custom
"struct index_state", we no longer need to second-guess this part of
the "index_state" deep in the stack. A recent example of such
second-guessing is the "istate-&gt;repo ? istate-&gt;repo : the_repository"
code in [2]. We can now simply use "istate-&gt;repo".

We're doing this by making use of the INDEX_STATE_INIT() macro (and
corresponding function) added in [3], which now have mandatory "repo"
arguments.

Because we now call index_state_init() in repository.c's
initialize_the_repository() we don't need to handle the case where we
have a "repo-&gt;index" whose "repo" member doesn't match the "repo"
we're setting up, i.e. the "Complete the double-reference" code in
repo_read_index() being altered here. That logic was originally added
in [1], and was working around the lack of what we now have in
initialize_the_repository().

For "fsmonitor-settings.c" we can remove the initialization of a NULL
"r" argument to "the_repository". This was added back in [4], and was
needed at the time for callers that would pass us the "r" from an
"istate-&gt;repo". Before this change such a change to
"fsmonitor-settings.c" would segfault all over the test suite (e.g. in
t0002-gitfile.sh).

This change has wider eventual implications for
"fsmonitor-settings.c". The reason the other lazy loading behavior in
it is required (starting with "if (!r-&gt;settings.fsmonitor) ..." is
because of the previously passed "r" being "NULL".

I have other local changes on top of this which move its configuration
reading to "prepare_repo_settings()" in "repo-settings.c", as we could
now start to rely on it being called for our "r". But let's leave all
of that for now, and narrowly remove this particular part of the
lazy-loading.

1. 1fd9ae517c4 (repository: add repo reference to index_state,
   2021-01-23)
2. ee1f0c242ef (read-cache: add index.skipHash config option,
   2023-01-06)
3. 2f6b1eb794e (cache API: add a "INDEX_STATE_INIT" macro/function,
   add release_index(), 2023-01-12)
4. 1e0ea5c4316 (fsmonitor: config settings are repository-specific,
   2022-03-25)

Signed-off-by: Ævar Arnfjörð Bjarmason &lt;avarab@gmail.com&gt;
Acked-by: Derrick Stolee &lt;derrickstolee@github.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>sparse-index API: BUG() out on NULL ensure_full_index()</title>
<updated>2023-01-13T18:36:57Z</updated>
<author>
<name>Ævar Arnfjörð Bjarmason</name>
<email>avarab@gmail.com</email>
</author>
<published>2023-01-12T12:55:25Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=29fefafcba0a3608465491835d84b646ce1e1b6e'/>
<id>urn:sha1:29fefafcba0a3608465491835d84b646ce1e1b6e</id>
<content type='text'>
Make the ensure_full_index() function stricter, and have it only
accept a non-NULL "struct index_state". This function (and this
behavior) was added in [1].

The only reason it needed to be this lax was due to interaction with
repo_index_has_changes(). See the addition of that code in [2].

The other reason for why this was needed dates back to interaction
with code added in [3]. In [4] we started calling ensure_full_index()
in unpack_trees(), but the caller added in 34110cd4e39 wants to pass
us a NULL "dst_index". Let's instead do the NULL check in
unpack_trees() itself.

1. 4300f8442a2 (sparse-index: implement ensure_full_index(), 2021-03-30)
2. 0c18c059a15 (read-cache: ensure full index, 2021-04-01)
3. 34110cd4e39 (Make 'unpack_trees()' have a separate source and
   destination index, 2008-03-06)
4. 6863df35503 (unpack-trees: ensure full index, 2021-03-30)

Signed-off-by: Ævar Arnfjörð Bjarmason &lt;avarab@gmail.com&gt;
Acked-by: Derrick Stolee &lt;derrickstolee@github.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>sparse-index.c: expand_to_path() can assume non-NULL "istate"</title>
<updated>2023-01-13T18:36:57Z</updated>
<author>
<name>Ævar Arnfjörð Bjarmason</name>
<email>avarab@gmail.com</email>
</author>
<published>2023-01-12T12:55:24Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=d2cdf2c285b091f44e0bdc154ee39c0071f8934e'/>
<id>urn:sha1:d2cdf2c285b091f44e0bdc154ee39c0071f8934e</id>
<content type='text'>
This function added in [1] was subsequently used in [2]. All of the
calls to it are in name-hash.c, and come after calls to
lazy_init_name_hash(istate). The first thing that function does is:

	if (istate-&gt;name_hash_initialized)
		return;

So we can already assume that we have a non-NULL "istate" here, or
we'd be segfaulting. Let's not confuse matters by making it appear
that's not the case.

1. 71f82d032f3 (sparse-index: expand_to_path(), 2021-04-12)
2. 4589bca829a (name-hash: use expand_to_path(), 2021-04-12)

Signed-off-by: Ævar Arnfjörð Bjarmason &lt;avarab@gmail.com&gt;
Acked-by: Derrick Stolee &lt;derrickstolee@github.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>index: raise a bug if the index is materialised more than once</title>
<updated>2022-11-05T00:28:28Z</updated>
<author>
<name>Anh Le</name>
<email>anh@canva.com</email>
</author>
<published>2022-11-03T23:05:01Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=8c7abdc596d572bee5d001d4e889c793f7020588'/>
<id>urn:sha1:8c7abdc596d572bee5d001d4e889c793f7020588</id>
<content type='text'>
If clear_skip_worktree_from_present_files() encounter a sparse directory,
it fully materialise the index which should expand any sparse directories
and start going through each entries again. If this happens more than once,
raise it with a BUG.

Signed-off-by: Anh Le &lt;anh@canva.com&gt;
Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
</content>
</entry>
<entry>
<title>index: add trace2 region for clear skip worktree</title>
<updated>2022-11-05T00:28:28Z</updated>
<author>
<name>Anh Le</name>
<email>anh@canva.com</email>
</author>
<published>2022-11-03T23:05:00Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=89aaab11a34d9b4a7421fbd10a0e399135b2dc2c'/>
<id>urn:sha1:89aaab11a34d9b4a7421fbd10a0e399135b2dc2c</id>
<content type='text'>
When using sparse checkout, clear_skip_worktree_from_present_files() must
enumerate index entries to find ones with the SKIP_WORKTREE bit to
determine whether those index entries exist on disk (in which case their
SKIP_WORKTREE bit should be removed).

In a large repository, this may take considerable time depending on the
size of the index.

Add a trace2 region to surface this information, keeping a count of how
many paths have been checked. Separately, keep counts after a full index is
materialized.

Signed-off-by: Anh Le &lt;anh@canva.com&gt;
Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'ds/sparse-sparse-checkout'</title>
<updated>2022-06-03T21:30:35Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2022-06-03T21:30:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=c276c21da6d055060b1c216de1b1c04fb058425f'/>
<id>urn:sha1:c276c21da6d055060b1c216de1b1c04fb058425f</id>
<content type='text'>
"sparse-checkout" learns to work well with the sparse-index
feature.

* ds/sparse-sparse-checkout:
  sparse-checkout: integrate with sparse index
  p2000: add test for 'git sparse-checkout [add|set]'
  sparse-index: complete partial expansion
  sparse-index: partially expand directories
  sparse-checkout: --no-sparse-index needs a full index
  cache-tree: implement cache_tree_find_path()
  sparse-index: introduce partially-sparse indexes
  sparse-index: create expand_index()
  t1092: stress test 'git sparse-checkout set'
  t1092: refactor 'sparse-index contents' test
</content>
</entry>
<entry>
<title>sparse-index: complete partial expansion</title>
<updated>2022-05-23T18:08:21Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2022-05-23T13:48:44Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=ac8acb4f2c70dd95c582bd5d4fb4f689f82ff3c6'/>
<id>urn:sha1:ac8acb4f2c70dd95c582bd5d4fb4f689f82ff3c6</id>
<content type='text'>
To complete the implementation of expand_to_pattern_list(), we need to
detect when a sparse directory entry should remain sparse. This avoids a
full expansion, so we now need to use the PARTIALLY_SPARSE mode to
indicate this state.

There still are no callers to this method, but we will add one in the
next change.

Signed-off-by: Derrick Stolee &lt;derrickstolee@github.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>sparse-index: partially expand directories</title>
<updated>2022-05-23T18:08:21Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2022-05-23T13:48:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=0243930af40f1ede50598c7de0965bdbe6fcbe30'/>
<id>urn:sha1:0243930af40f1ede50598c7de0965bdbe6fcbe30</id>
<content type='text'>
The expand_to_pattern_list() method expands sparse directory entries
to their list of contained files when either the pattern list is NULL or
the directory is contained in the new pattern list's cone mode patterns.

It is possible that the pattern list has a recursive match with a
directory 'A/B/C/' and so an existing sparse directory 'A/B/' would need
to be expanded. If there exists a directory 'A/B/D/', then that
directory should not be expanded and instead we can create a sparse
directory.

To implement this, we plug into the add_path_to_index() callback for the
call to read_tree_at(). Since we now need access to both the index we
are writing and the pattern list we are comparing, create a 'struct
modify_index_context' to use as a data transfer object. It is important
that we use the given pattern list since we will use this pattern list
to change the sparse-checkout patterns and cannot use
istate-&gt;sparse_checkout_patterns.

Signed-off-by: Derrick Stolee &lt;derrickstolee@github.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>sparse-index: introduce partially-sparse indexes</title>
<updated>2022-05-23T18:08:21Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2022-05-23T13:48:40Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=9fadb373dd4a670e761776560d3c40f6fcc80360'/>
<id>urn:sha1:9fadb373dd4a670e761776560d3c40f6fcc80360</id>
<content type='text'>
A future change will present a temporary, in-memory mode where the index
can both contain sparse directory entries but also not be completely
collapsed to the smallest possible sparse directories. This will be
necessary for modifying the sparse-checkout definition while using a
sparse index.

For now, convert the single-bit member 'sparse_index' in 'struct
index_state' to be a an 'enum sparse_index_mode' with three modes:

* INDEX_EXPANDED (0): No sparse directories exist. This is always the
  case for repositories that do not use cone-mode sparse-checkout.

* INDEX_COLLAPSED: Sparse directories may exist. Files outside the
  sparse-checkout cone are reduced to sparse directory entries whenever
  possible.

* INDEX_PARTIALLY_SPARSE: Sparse directories may exist. Some file
  entries outside the sparse-checkout cone may exist. Running
  convert_to_sparse() may further reduce those files to sparse directory
  entries.

The main reason to store this extra information is to allow
convert_to_sparse() to short-circuit when the index is already in
INDEX_EXPANDED mode but to actually do the necessary work when in
INDEX_PARTIALLY_SPARSE mode.

The INDEX_PARTIALLY_SPARSE mode will be used in an upcoming change.

Signed-off-by: Derrick Stolee &lt;derrickstolee@github.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>sparse-index: create expand_index()</title>
<updated>2022-05-23T18:08:21Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2022-05-23T13:48:39Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=dce241b020cf32c9485c7ef23247f0b003731afa'/>
<id>urn:sha1:dce241b020cf32c9485c7ef23247f0b003731afa</id>
<content type='text'>
This is the first change in a series to allow modifying the
sparse-checkout pattern set without expanding a sparse index to a full
one in the process. Here, we focus on the problem of expanding the
pattern set through a command like 'git sparse-checkout add &lt;path&gt;'
which needs to create new index entries for the paths now being written
to the worktree.

To achieve this, we need to be able to replace sparse directory entries
with their contained files and subdirectories. Once this is complete,
other code paths can discover those cache entries and write the
corresponding files to disk before committing the index.

We already have logic in ensure_full_index() that expands the index
entries, so we will use that as our base. Create a new method,
expand_index(), which takes a pattern list, but for now mostly ignores
it. The current implementation is only correct when the pattern list is
NULL as that does the same as ensure_full_index(). In fact,
ensure_full_index() is converted to a shim over expand_index().

A future update will actually implement expand_index() to its full
capabilities. For now, it is created and documented.

Signed-off-by: Derrick Stolee &lt;derrickstolee@github.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
