<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/midx-write.c, branch v2.46.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.46.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.46.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2024-07-23T23:54:33Z</updated>
<entry>
<title>Merge branch 'ds/midx-write-repack-fix'</title>
<updated>2024-07-23T23:54:33Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2024-07-23T23:54:33Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=ec9d46588ec063685b4abfbe13dfd9da39c77d7c'/>
<id>urn:sha1:ec9d46588ec063685b4abfbe13dfd9da39c77d7c</id>
<content type='text'>
Repacking a repository with multi-pack index started making stupid
pack selections in Git 2.45, which has been corrected.

* ds/midx-write-repack-fix:
  midx-write: revert use of --stdin-packs
  t5319: add failing test case for repack/expire
</content>
</entry>
<entry>
<title>midx-write: revert use of --stdin-packs</title>
<updated>2024-07-19T14:19:01Z</updated>
<author>
<name>Derrick Stolee</name>
<email>stolee@gmail.com</email>
</author>
<published>2024-07-18T19:55:46Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=8fb6d11fade8301f0af1074fe250b9d79544cb18'/>
<id>urn:sha1:8fb6d11fade8301f0af1074fe250b9d79544cb18</id>
<content type='text'>
This reverts b7d6f23a171 (midx-write.c: use `--stdin-packs` when
repacking, 2024-04-01) and then marks the test created in the previous
change as passing.

The fundamental issue with the reverted change is that the focus on
pack-files separates the object selection from how the multi-pack-index
selects a single pack-file for an object ID with multiple copies among
the tracked pack-files.

The change was made with the intention of improving delta compression in
the resulting pack-file, but that can be resolved with the existing
object list mechanism. There are other potential pitfalls of doing an
object walk at this time if the repository is a blobless partial clone,
and that will require additional testing on top of the one that changes
here.

Signed-off-by: Derrick Stolee &lt;stolee@gmail.com&gt;
Acked-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'ps/use-the-repository'</title>
<updated>2024-07-02T16:59:00Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2024-07-02T16:59:00Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=7b472da91541d672ee220896a3a7fd4508c378f3'/>
<id>urn:sha1:7b472da91541d672ee220896a3a7fd4508c378f3</id>
<content type='text'>
A CPP macro USE_THE_REPOSITORY_VARIABLE is introduced to help
transition the codebase to rely less on the availability of the
singleton the_repository instance.

* ps/use-the-repository:
  hex: guard declarations with `USE_THE_REPOSITORY_VARIABLE`
  t/helper: remove dependency on `the_repository` in "proc-receive"
  t/helper: fix segfault in "oid-array" command without repository
  t/helper: use correct object hash in partial-clone helper
  compat/fsmonitor: fix socket path in networked SHA256 repos
  replace-object: use hash algorithm from passed-in repository
  protocol-caps: use hash algorithm from passed-in repository
  oidset: pass hash algorithm when parsing file
  http-fetch: don't crash when parsing packfile without a repo
  hash-ll: merge with "hash.h"
  refs: avoid include cycle with "repository.h"
  global: introduce `USE_THE_REPOSITORY_VARIABLE` macro
  hash: require hash algorithm in `empty_tree_oid_hex()`
  hash: require hash algorithm in `is_empty_{blob,tree}_oid()`
  hash: make `is_null_oid()` independent of `the_repository`
  hash: convert `oidcmp()` and `oideq()` to compare whole hash
  global: ensure that object IDs are always padded
  hash: require hash algorithm in `oidread()` and `oidclr()`
  hash: require hash algorithm in `hasheq()`, `hashcmp()` and `hashclr()`
  hash: drop (mostly) unused `is_empty_{blob,tree}_sha1()` functions
</content>
</entry>
<entry>
<title>Merge branch 'tb/pseudo-merge-reachability-bitmap'</title>
<updated>2024-06-24T23:39:13Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2024-06-24T23:39:13Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=ffa47b75cf9032c1655985f8ba934d2ba5c60e81'/>
<id>urn:sha1:ffa47b75cf9032c1655985f8ba934d2ba5c60e81</id>
<content type='text'>
The pseudo-merge reachability bitmap to help more efficient storage
of the reachability bitmap in a repository with too many refs has
been added.

* tb/pseudo-merge-reachability-bitmap: (26 commits)
  pack-bitmap.c: ensure pseudo-merge offset reads are bounded
  Documentation/technical/bitmap-format.txt: add missing position table
  t/perf: implement performance tests for pseudo-merge bitmaps
  pseudo-merge: implement support for finding existing merges
  ewah: `bitmap_equals_ewah()`
  pack-bitmap: extra trace2 information
  pack-bitmap.c: use pseudo-merges during traversal
  t/test-lib-functions.sh: support `--notick` in `test_commit_bulk()`
  pack-bitmap: implement test helpers for pseudo-merge
  ewah: implement `ewah_bitmap_popcount()`
  pseudo-merge: implement support for reading pseudo-merge commits
  pack-bitmap.c: read pseudo-merge extension
  pseudo-merge: scaffolding for reads
  pack-bitmap: extract `read_bitmap()` function
  pack-bitmap-write.c: write pseudo-merge table
  pseudo-merge: implement support for selecting pseudo-merge commits
  config: introduce `git_config_double()`
  pack-bitmap: make `bitmap_writer_push_bitmapped_commit()` public
  pack-bitmap: implement `bitmap_writer_has_bitmapped_object_id()`
  pack-bitmap-write: support storing pseudo-merge commits
  ...
</content>
</entry>
<entry>
<title>global: introduce `USE_THE_REPOSITORY_VARIABLE` macro</title>
<updated>2024-06-14T17:26:33Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-06-14T06:50:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=e7da9385708accf518a80a1e17969020fb361048'/>
<id>urn:sha1:e7da9385708accf518a80a1e17969020fb361048</id>
<content type='text'>
Use of the `the_repository` variable is deprecated nowadays, and we
slowly but steadily convert the codebase to not use it anymore. Instead,
callers should be passing down the repository to work on via parameters.

It is hard though to prove that a given code unit does not use this
variable anymore. The most trivial case, merely demonstrating that there
is no direct use of `the_repository`, is already a bit of a pain during
code reviews as the reviewer needs to manually verify claims made by the
patch author. The bigger problem though is that we have many interfaces
that implicitly rely on `the_repository`.

Introduce a new `USE_THE_REPOSITORY_VARIABLE` macro that allows code
units to opt into usage of `the_repository`. The intent of this macro is
to demonstrate that a certain code unit does not use this variable
anymore, and to keep it from new dependencies on it in future changes,
be it explicit or implicit

For now, the macro only guards `the_repository` itself as well as
`the_hash_algo`. There are many more known interfaces where we have an
implicit dependency on `the_repository`, but those are not guarded at
the current point in time. Over time though, we should start to add
guards as required (or even better, just remove them).

Define the macro as required in our code units. As expected, most of our
code still relies on the global variable. Nearly all of our builtins
rely on the variable as there is no way yet to pass `the_repository` to
their entry point. For now, declare the macro in "biultin.h" to keep the
required changes at least a little bit more contained.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>midx-write.c: do not read existing MIDX with `packs_to_include`</title>
<updated>2024-06-11T23:08:28Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2024-06-11T17:28:17Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=0c5a62f14bc1f8b240a2d650ccc3f1ce82e917f1'/>
<id>urn:sha1:0c5a62f14bc1f8b240a2d650ccc3f1ce82e917f1</id>
<content type='text'>
Commit d6a8c58675 (midx-write.c: support reading an existing MIDX with
`packs_to_include`, 2024-05-29) changed the MIDX generation machinery to
support reading from an existing MIDX when writing a new one.

Unfortunately, the rest of the MIDX generation machinery is not prepared
to deal with such a change. For instance, the function responsible for
adding to the object ID fanout table from a MIDX source
(midx_fanout_add_midx_fanout()) will gladly add objects from an existing
MIDX for some fanout level regardless of whether or not those objects
came from packs that are to be included in the subsequent MIDX write.

This results in broken pseudo-pack object order (leading to incorrect
object traversal results) and segmentation faults, like so (generated by
running the added test prior to the changes in midx-write.c):

    #0  0x000055ee31393f47 in midx_pack_order (ctx=0x7ffdde205c70) at midx-write.c:590
    #1  0x000055ee31395a69 in write_midx_internal (object_dir=0x55ee32570440 ".git/objects",
        packs_to_include=0x7ffdde205e20, packs_to_drop=0x0, preferred_pack_name=0x0,
        refs_snapshot=0x0, flags=15) at midx-write.c:1171
    #2  0x000055ee31395f38 in write_midx_file_only (object_dir=0x55ee32570440 ".git/objects",
        packs_to_include=0x7ffdde205e20, preferred_pack_name=0x0, refs_snapshot=0x0, flags=15)
        at midx-write.c:1274
    [...]

In stack frame #0, the code on midx-write.c:590 is using the new pack ID
corresponding to some object which was added from the existing MIDX.
Importantly, the pack from which that object was selected in the
existing MIDX does not appear in the new MIDX as it was excluded via
`--stdin-packs`.

In this instance, the pack in question had pack ID "1" in the existing
MIDX, but since it was excluded from the new MIDX, we never filled in
that entry in the pack_perm table, resulting in:

    (gdb) p *ctx-&gt;pack_perm@2
    $1 = {0, 1515870810}

Which is what causes the segfault above when we try and read:

    struct pack_info *pack = &amp;ctx-&gt;info[ctx-&gt;pack_perm[i]];
    if (pack-&gt;bitmap_pos == BITMAP_POS_UNKNOWN)
        pack-&gt;bitmap_pos = 0;

Fundamentally, we should be able to read information from an existing
MIDX when generating a new one. But in practice the midx-write.c code
assumes that we won't run into issues like the above with incongruent
pack IDs, and often makes those assumptions in extremely subtle and
fragile ways.

Instead, let's avoid reading from an existing MIDX altogether, and stick
with the pre-d6a8c58675 implementation. Harden against any regressions
in this area by adding a test which demonstrates these issues.

Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'tb/midx-write-cleanup'</title>
<updated>2024-06-07T17:57:23Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2024-06-07T17:57:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=1b76f065085811104b5f4adff001956d7e5c5d1c'/>
<id>urn:sha1:1b76f065085811104b5f4adff001956d7e5c5d1c</id>
<content type='text'>
Code clean-up around writing the .midx files.

* tb/midx-write-cleanup:
  pack-bitmap.c: reimplement `midx_bitmap_filename()` with helper
  midx: replace `get_midx_rev_filename()` with a generic helper
  midx-write.c: support reading an existing MIDX with `packs_to_include`
  midx-write.c: extract `fill_packs_from_midx()`
  midx-write.c: extract `should_include_pack()`
  midx-write.c: pass `start_pack` to `compute_sorted_entries()`
  midx-write.c: reduce argument count for `get_sorted_entries()`
  midx-write.c: tolerate `--preferred-pack` without bitmaps
</content>
</entry>
<entry>
<title>Merge branch 'ps/refs-without-the-repository-updates'</title>
<updated>2024-05-30T21:15:13Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2024-05-30T21:15:12Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=988499e2955f052fa5f58434e13d12285cb8a361'/>
<id>urn:sha1:988499e2955f052fa5f58434e13d12285cb8a361</id>
<content type='text'>
Further clean-up the refs subsystem to stop relying on
the_repository, and instead use the repository associated to the
ref_store object.

* ps/refs-without-the-repository-updates:
  refs/packed: remove references to `the_hash_algo`
  refs/files: remove references to `the_hash_algo`
  refs/files: use correct repository
  refs: remove `dwim_log()`
  refs: drop `git_default_branch_name()`
  refs: pass repo when peeling objects
  refs: move object peeling into "object.c"
  refs: pass ref store when detecting dangling symrefs
  refs: convert iteration over replace refs to accept ref store
  refs: retrieve worktree ref stores via associated repository
  refs: refactor `resolve_gitlink_ref()` to accept a repository
  refs: pass repo when retrieving submodule ref store
  refs: track ref stores via strmap
  refs: implement releasing ref storages
  refs: rename `init_db` callback to avoid confusion
  refs: adjust names for `init` and `init_db` callbacks
</content>
</entry>
<entry>
<title>midx-write.c: support reading an existing MIDX with `packs_to_include`</title>
<updated>2024-05-30T20:43:51Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2024-05-29T22:55:39Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=d6a8c58675a04abe2597179e3332bf031bd6e48a'/>
<id>urn:sha1:d6a8c58675a04abe2597179e3332bf031bd6e48a</id>
<content type='text'>
Avoid unconditionally copying all packs from an existing MIDX into a new
MIDX by checking that packs added via `fill_packs_from_midx()` don't
appear in the `to_include` set, if one was provided.

Do so by calling `should_include_pack()` from both `add_pack_to_midx()`
and `fill_packs_from_midx()`.

In order to make this work, teach `should_include_pack()` a new
"exclude_from_midx" parameter, which allows skipping the first check.
This is done so that the caller in `fill_packs_from_midx()` doesn't
reject all of the packs it provided since they appear in an existing
MIDX by definition.

The sum total of this change is that we are now able to read and
reference objects in an existing MIDX even when given a non-NULL
`packs_to_include`. This is a prerequisite step for incremental MIDXs,
which need to load any existing MIDX (if one is present) in order to
determine whether or not an object already appears in an earlier portion
of the MIDX to avoid duplicating it across multiple portions.

Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>midx-write.c: extract `fill_packs_from_midx()`</title>
<updated>2024-05-30T20:43:51Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2024-05-29T22:55:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=c5e204af1f395f38724aff08c3440aa86fdd9657'/>
<id>urn:sha1:c5e204af1f395f38724aff08c3440aa86fdd9657</id>
<content type='text'>
When write_midx_internal() loads an existing MIDX, all packs are copied
forward into the new MIDX. Improve the readability of
write_midx_internal() by extracting this functionality out into a
separate function.

Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
