<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/pack-bitmap.c, branch v2.48.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.48.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.48.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2024-12-23T17:32:22Z</updated>
<entry>
<title>Merge branch 'tb/bitmap-fix-pack-reuse'</title>
<updated>2024-12-23T17:32:22Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2024-12-23T17:32:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=a08ebf8b3ef2601eb84f67b74d2f6012f78ed949'/>
<id>urn:sha1:a08ebf8b3ef2601eb84f67b74d2f6012f78ed949</id>
<content type='text'>
Code to reuse objects based on bitmap contents have been tightened
to avoid race condition even when multiple packs are involved.

* tb/bitmap-fix-pack-reuse:
  pack-bitmap.c: ensure pack validity for all reuse packs
</content>
</entry>
<entry>
<title>Merge branch 'ps/build-sign-compare'</title>
<updated>2024-12-23T17:32:11Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2024-12-23T17:32:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=4156b6a741c7fb15a4eccb320612fb6e453f439c'/>
<id>urn:sha1:4156b6a741c7fb15a4eccb320612fb6e453f439c</id>
<content type='text'>
Start working to make the codebase buildable with -Wsign-compare.

* ps/build-sign-compare:
  t/helper: don't depend on implicit wraparound
  scalar: address -Wsign-compare warnings
  builtin/patch-id: fix type of `get_one_patchid()`
  builtin/blame: fix type of `length` variable when emitting object ID
  gpg-interface: address -Wsign-comparison warnings
  daemon: fix type of `max_connections`
  daemon: fix loops that have mismatching integer types
  global: trivial conversions to fix `-Wsign-compare` warnings
  pkt-line: fix -Wsign-compare warning on 32 bit platform
  csum-file: fix -Wsign-compare warning on 32-bit platform
  diff.h: fix index used to loop through unsigned integer
  config.mak.dev: drop `-Wno-sign-compare`
  global: mark code units that generate warnings with `-Wsign-compare`
  compat/win32: fix -Wsign-compare warning in "wWinMain()"
  compat/regex: explicitly ignore "-Wsign-compare" warnings
  git-compat-util: introduce macros to disable "-Wsign-compare" warnings
</content>
</entry>
<entry>
<title>pack-bitmap.c: ensure pack validity for all reuse packs</title>
<updated>2024-12-18T17:51:09Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2024-12-13T19:20:02Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=62b3ec8a3f160cc9f1949b28644fc3947252db73'/>
<id>urn:sha1:62b3ec8a3f160cc9f1949b28644fc3947252db73</id>
<content type='text'>
Commit 44f9fd6496 (pack-bitmap.c: check preferred pack validity when
opening MIDX bitmap, 2022-05-24) prevents a race condition whereby the
preferred pack disappears between opening the MIDX bitmap and attempting
verbatim reuse out of its packs.

That commit forces open_midx_bitmap_1() to ensure the validity of the
MIDX's preferred pack, meaning that we have an open file handle on the
*.pack, ensuring that we can reuse bytes out of verbatim later on in the
process[^1].

But 44f9fd6496 was not extended to cover multi-pack reuse, meaning that
this same race condition exists for non-preferred packs during verbatim
reuse. Work around that race in the same way by only marking valid packs
as reuse-able. For packs that aren't reusable, skip over them but
include the number of objects they have to ensure we allocate a large
enough 'reuse' bitmap (e.g. if a pack in the middle of the MIDX
disappeared but we still want to reuse later packs).

Since we're ensuring the validity of these packs within the verbatim
reuse code, we no longer have to special-case the preferred pack and
open it within the open_midx_bitmap_1() function.

An alternative approach to the one taken here would be to open all
MIDX'd packs from within open_midx_bitmap_1(). But that would be both
slower and make the bitmaps less useful, since we can still perform some
pack reuse among the packs that still exist when the *.bitmap is opened.

After applying this patch, we can simulate the new behavior after
instrumenting Git like so:

    diff --git a/packfile.c b/packfile.c
    index 9560f0a33c..aedce72524 100644
    --- a/packfile.c
    +++ b/packfile.c
    @@ -557,6 +557,11 @@ static int open_packed_git_1(struct packed_git *p)
     		; /* nothing */

     	p-&gt;pack_fd = git_open(p-&gt;pack_name);
    +	{
    +		const char *delete = getenv("GIT_RACILY_DELETE");
    +		if (delete &amp;&amp; !strcmp(delete, pack_basename(p)))
    +			return -1;
    +	}
     	if (p-&gt;pack_fd &lt; 0 || fstat(p-&gt;pack_fd, &amp;st))
     		return -1;
     	pack_open_fds++;

and adding the following test:

    test_expect_success 'disappearing packs' '
            git init disappearing-packs &amp;&amp;
            (
                    cd disappearing-packs &amp;&amp;

                    git config pack.allowPackReuse multi &amp;&amp;

                    test_commit A &amp;&amp;
                    test_commit B &amp;&amp;
                    test_commit C &amp;&amp;

                    A="$(echo "A" | git pack-objects --revs $packdir/pack-A)" &amp;&amp;
                    B="$(echo "A..B" | git pack-objects --revs $packdir/pack-B)" &amp;&amp;
                    C="$(echo "B..C" | git pack-objects --revs $packdir/pack-C)" &amp;&amp;

                    git multi-pack-index write --bitmap --preferred-pack=pack-A-$A.idx &amp;&amp;

                    test_pack_objects_reused_all 9 3 &amp;&amp;

                    test_env GIT_RACILY_DELETE=pack-A-$A.pack \
                            test_pack_objects_reused_all 6 2 &amp;&amp;
                    test_env GIT_RACILY_DELETE=pack-B-$B.pack \
                            test_pack_objects_reused_all 6 2 &amp;&amp;
                    test_env GIT_RACILY_DELETE=pack-C-$C.pack \
                            test_pack_objects_reused_all 6 2

            )
    '

Note that we could relax the single-pack version of this which was most
recently addressed in dc1daacdcc (pack-bitmap: check pack validity when
opening bitmap, 2021-07-23), but only partially. Because we still need
to know the object count in the pack, we'd still have to open the pack's
*.idx, so the savings there are marginal.

Note likewise that we add a new "if (!packs_nr)" early return in the
pack reuse code to avoid a potentially expensive allocation on the
'reuse' bitmap in the case that no packs are available for reuse.

[^1]: Unless we run out of open file handles. If that happens and we are
  forced to close the only open file handle of a file that has been
  removed from underneath us, there is nothing we can do.

Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'kn/midx-wo-the-repository'</title>
<updated>2024-12-13T15:33:44Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2024-12-13T15:33:44Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=ca43bd2562fb85b34c846c1bbc2175e1ef145bf1'/>
<id>urn:sha1:ca43bd2562fb85b34c846c1bbc2175e1ef145bf1</id>
<content type='text'>
Yet another "pass the repository through the callchain" topic.

* kn/midx-wo-the-repository:
  midx: inline the `MIDX_MIN_SIZE` definition
  midx: pass down `hash_algo` to functions using global variables
  midx: pass `repository` to `load_multi_pack_index`
  midx: cleanup internal usage of `the_repository` and `the_hash_algo`
  midx-write: pass down repository to `write_midx_file[_only]`
  write-midx: add repository field to `write_midx_context`
  midx-write: use `revs-&gt;repo` inside `read_refs_snapshot`
  midx-write: pass down repository to static functions
  packfile.c: remove unnecessary prepare_packed_git() call
  midx: add repository to `multi_pack_index` struct
  config: make `packed_git_(limit|window_size)` non-global variables
  config: make `delta_base_cache_limit` a non-global variable
  packfile: pass down repository to `for_each_packed_object`
  packfile: pass down repository to `has_object[_kept]_pack`
  packfile: pass down repository to `odb_pack_name`
  packfile: pass `repository` to static function in the file
  packfile: use `repository` from `packed_git` directly
  packfile: add repository to struct `packed_git`
</content>
</entry>
<entry>
<title>global: mark code units that generate warnings with `-Wsign-compare`</title>
<updated>2024-12-06T11:20:02Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-12-06T10:27:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=41f43b8243f42b9df2e98be8460646d4c0100ad3'/>
<id>urn:sha1:41f43b8243f42b9df2e98be8460646d4c0100ad3</id>
<content type='text'>
Mark code units that generate warnings with `-Wsign-compare`. This
allows for a structured approach to get rid of all such warnings over
time in a way that can be easily measured.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>midx: pass down `hash_algo` to functions using global variables</title>
<updated>2024-12-04T01:32:21Z</updated>
<author>
<name>Karthik Nayak</name>
<email>karthik.188@gmail.com</email>
</author>
<published>2024-11-27T16:28:32Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=f59de71cf700f9f8da27023ec8b5df117f99d9c8'/>
<id>urn:sha1:f59de71cf700f9f8da27023ec8b5df117f99d9c8</id>
<content type='text'>
The functions `get_split_midx_filename_ext()`, `get_midx_filename()` and
`get_midx_filename_ext()` use `hash_to_hex()` which internally uses the
`the_hash_algo` global variable.

Remove this dependency on global variables by passing down the
`hash_algo` through to the functions mentioned and instead calling
`hash_to_hex_algop()` along with the obtained `hash_algo`.

Signed-off-by: Karthik Nayak &lt;karthik.188@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'tb/boundary-traversal-fix'</title>
<updated>2024-12-04T01:14:44Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2024-12-04T01:14:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=0a0712e05f12d95478671f314a4296ae18dbb664'/>
<id>urn:sha1:0a0712e05f12d95478671f314a4296ae18dbb664</id>
<content type='text'>
A trivial "correctness" fix that does not yet matter in practice.

* tb/boundary-traversal-fix:
  pack-bitmap.c: typofix in `find_boundary_objects()`
</content>
</entry>
<entry>
<title>midx: add repository to `multi_pack_index` struct</title>
<updated>2024-12-03T23:21:55Z</updated>
<author>
<name>Karthik Nayak</name>
<email>karthik.188@gmail.com</email>
</author>
<published>2024-12-03T14:44:03Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=e106040722c746346c3c40d3790789d9e196f60d'/>
<id>urn:sha1:e106040722c746346c3c40d3790789d9e196f60d</id>
<content type='text'>
The `multi_pack_index` struct represents the MIDX for a repository.
Here, we add a pointer to the repository in this struct, allowing direct
use of the repository variable without relying on the global
`the_repository` struct.

With this addition, we can determine the repository associated with a
`bitmap_index` struct. A `bitmap_index` points to either a `packed_git`
or a `multi_pack_index`, both of which have direct repository
references. To support this, we introduce a static helper function,
`bitmap_repo`, in `pack-bitmap.c`, which retrieves a repository given a
`bitmap_index`.

With this, we clear up all usages of `the_repository` within
`pack-bitmap.c` and also remove the `USE_THE_REPOSITORY_VARIABLE`
definition. Bringing us another step closer to remove all global
variable usage.

Although this change also opens up the potential to clean up `midx.c`,
doing so would require additional refactoring to pass the repository
struct to functions where the MIDX struct is created: a task better
suited for future patches.

Signed-off-by: Karthik Nayak &lt;karthik.188@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>packfile: pass down repository to `has_object[_kept]_pack`</title>
<updated>2024-12-03T23:21:54Z</updated>
<author>
<name>Karthik Nayak</name>
<email>karthik.188@gmail.com</email>
</author>
<published>2024-12-03T14:43:59Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=cc656f4eb2b7b10bc530c96844909c869bdd1fdf'/>
<id>urn:sha1:cc656f4eb2b7b10bc530c96844909c869bdd1fdf</id>
<content type='text'>
The functions `has_object[_kept]_pack` currently rely on the global
variable `the_repository`. To eliminate global variable usage in
`packfile.c`, we should progressively shift the dependency on
the_repository to higher layers. Let's remove its usage from these
functions and any related ones.

Signed-off-by: Karthik Nayak &lt;karthik.188@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>pack-bitmap.c: typofix in `find_boundary_objects()`</title>
<updated>2024-11-21T23:57:18Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2024-11-21T22:50:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=91f88f76e631a60b51a4814d938f18587f5e2210'/>
<id>urn:sha1:91f88f76e631a60b51a4814d938f18587f5e2210</id>
<content type='text'>
In the boundary-based bitmap traversal, we use the given 'rev_info'
structure to first do a commit-only walk in order to determine the
boundary between interesting and uninteresting objects.

That walk only looks at commit objects, regardless of the state of
revs-&gt;blob_objects, revs-&gt;tree_objects, and so on. In order to do this,
we store the state of these variables in temporary fields before
setting them back to zero, performing the traversal, and then setting
them back.

But there is a typo here that dates back to b0afdce5da (pack-bitmap.c:
use commit boundary during bitmap traversal, 2023-05-08), where we
incorrectly store the value of the "tags" field as "revs-&gt;blob_objects".

This could lead to problems later on if, say, the caller wants tag
objects but *not* blob objects. In the pre-image behavior, we'd set
revs-&gt;tag_objects back to the old value of revs-&gt;blob_objects, thus
emitting fewer objects than expected back to the caller.

Fix that by correctly assigning the value of 'revs-&gt;tag_objects' to the
'tmp_tags' field.

Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
