<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/midx.c, branch jch</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=jch</id>
<link rel='self' href='https://git.shady.money/git/atom?h=jch'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2026-03-25T19:58:04Z</updated>
<entry>
<title>Merge branch 'tb/incremental-midx-part-3.2'</title>
<updated>2026-03-25T19:58:04Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2026-03-25T19:58:04Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=105a22cf692a08604bc294ec2c164528889837b0'/>
<id>urn:sha1:105a22cf692a08604bc294ec2c164528889837b0</id>
<content type='text'>
Further work on incremental repacking using MIDX/bitmap

* tb/incremental-midx-part-3.2:
  midx: enable reachability bitmaps during MIDX compaction
  midx: implement MIDX compaction
  t/helper/test-read-midx.c: plug memory leak when selecting layer
  midx-write.c: factor fanout layering from `compute_sorted_entries()`
  midx-write.c: enumerate `pack_int_id` values directly
  midx-write.c: extract `fill_pack_from_midx()`
  midx-write.c: introduce `midx_pack_perm()` helper
  midx: do not require packs to be sorted in lexicographic order
  midx-write.c: introduce `struct write_midx_opts`
  midx-write.c: don't use `pack_perm` when assigning `bitmap_pos`
  t/t5319-multi-pack-index.sh: fix copy-and-paste error in t5319.39
  git-multi-pack-index(1): align SYNOPSIS with 'git multi-pack-index -h'
  git-multi-pack-index(1): remove non-existent incompatibility
  builtin/multi-pack-index.c: make '--progress' a common option
  midx: introduce `midx_get_checksum_hex()`
  midx: rename `get_midx_checksum()` to `midx_get_checksum_hash()`
  midx: mark `get_midx_checksum()` arguments as const
</content>
</entry>
<entry>
<title>odb: embed base source in the "files" backend</title>
<updated>2026-03-05T19:45:15Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2026-03-05T14:19:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=d9ecf268ef3f69130fa269012318470d908978f6'/>
<id>urn:sha1:d9ecf268ef3f69130fa269012318470d908978f6</id>
<content type='text'>
The "files" backend is implemented as a pointer in the `struct
odb_source`. This contradicts our typical pattern for pluggable backends
like we use it for example in the ref store or for object database
streams, where we typically embed the generic base structure in the
specialized implementation. This pattern has a couple of small benefits:

  - We avoid an extra allocation.

  - We hide implementation details in the generic structure.

  - We can easily downcast from a generic backend to the specialized
    structure and vice versa because the offsets are known at compile
    time.

  - It becomes trivial to identify locations where we depend on backend
    specific logic because the cast needs to be explicit.

Refactor our "files" object database source to do the same and embed the
`struct odb_source` in the `struct odb_source_files`.

There are still a bunch of sites in our code base where we do have to
access internals of the "files" backend. The intent is that those will
go away over time, but this will certainly take a while. Meanwhile,
provide a `odb_source_files_downcast()` function that can convert a
generic source into a "files" source.

As we only have a single source the downcast succeeds unconditionally
for now. Eventually though the intent is to make the cast `BUG()` in
case the caller requests to downcast a non-"files" backend to a "files"
backend.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>odb: introduce "files" source</title>
<updated>2026-03-05T19:45:14Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2026-03-05T14:19:42Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=cb506a8a69c953f7b87bb3ae099e0bed8218d3ab'/>
<id>urn:sha1:cb506a8a69c953f7b87bb3ae099e0bed8218d3ab</id>
<content type='text'>
Introduce a new "files" object database source. This source encapsulates
access to both loose object files and the packfile store, similar to how
the "files" backend for refs encapsulates access to loose refs and the
packed-refs file.

Note that for now the "files" source is still a direct member of a
`struct odb_source`. This architecture will be reversed in the next
commit so that the files source contains a `struct odb_source`.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>midx: do not require packs to be sorted in lexicographic order</title>
<updated>2026-02-24T19:16:33Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2026-02-24T19:00:13Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=b2ec8e90c201d2395b67f89f3bf38bb472e43460'/>
<id>urn:sha1:b2ec8e90c201d2395b67f89f3bf38bb472e43460</id>
<content type='text'>
The MIDX file format currently requires that pack files be identified by
the lexicographic ordering of their names (that is, a pack having a
checksum beginning with "abc" would have a numeric pack_int_id which is
smaller than the same value for a pack beginning with "bcd").

As a result, it is impossible to combine adjacent MIDX layers together
without permuting bits from bitmaps that are in more recent layer(s).

To see why, consider the following example:

          | packs       | preferred pack
  --------+-------------+---------------
  MIDX #0 | { X, Y, Z } | Y
  MIDX #1 | { A, B, C } | B
  MIDX #2 | { D, E, F } | D

, where MIDX #2's base MIDX is MIDX #1, and so on. Suppose that we want
to combine MIDX layers #0 and #1, to create a new layer #0' containing
the packs from both layers. With the original three MIDX layers, objects
are laid out in the bitmap in the order they appear in their source
pack, and the packs themselves are arranged according to the pseudo-pack
order. In this case, that ordering is Y, X, Z, B, A, C.

But recall that the pseudo-pack ordering is defined by the order that
packs appear in the MIDX, with the exception of the preferred pack,
which sorts ahead of all other packs regardless of its position within
the MIDX. In the above example, that means that pack 'Y' could be placed
anywhere (so long as it is designated as preferred), however, all other
packs must be placed in the location listed above.

Because that ordering isn't sorted lexicographically, it is impossible
to compact MIDX layers in the above configuration without permuting the
object-to-bit-position mapping. Changing this mapping would affect all
bitmaps belonging to newer layers, rendering the bitmaps associated with
MIDX #2 unreadable.

One of the goals of MIDX compaction is that we are able to shrink the
length of the MIDX chain *without* invalidating bitmaps that belong to
newer layers, and the lexicographic ordering constraint is at odds with
this goal.

However, packs do not *need* to be lexicographically ordered within the
MIDX. As far as I can gather, the only reason they are sorted lexically
is to make it possible to perform a binary search over the pack names in
a MIDX, necessary to make `midx_contains_pack()`'s performance
logarithmic in the number of packs rather than linear.

Relax this constraint by allowing MIDX writes to proceed with packs that
are not arranged in lexicographic order. `midx_contains_pack()` will
lazily instantiate a `pack_names_sorted` array on the MIDX, which will
be used to implement the binary search over pack names.

This change produces MIDXs which may not be correctly read with external
tools or older versions of Git. Though older versions of Git know how to
gracefully degrade and ignore any MIDX(s) they consider corrupt,
external tools may not be as robust. To avoid unintentionally breaking
any such tools, guard this change behind a version bump in the MIDX's
on-disk format.

Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>midx: introduce `midx_get_checksum_hex()`</title>
<updated>2026-02-24T19:16:32Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2026-02-24T18:59:44Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=6e86f679248580bec5c105b37217862f3022b504'/>
<id>urn:sha1:6e86f679248580bec5c105b37217862f3022b504</id>
<content type='text'>
When trying to print out, say, the hexadecimal representation of a
MIDX's hash, our code will do something like:

    hash_to_hex_algop(midx_get_checksum_hash(m),
                      m-&gt;source-&gt;odb-&gt;repo-&gt;hash_algo);

, which is both cumbersome and repetitive. In fact, all but a handful of
callers to `midx_get_checksum_hash()` do exactly the above. Reduce the
repetitive nature of calling `midx_get_checksum_hash()` by having it
return a pointer into a static buffer containing the above result.

For the handful of callers that do need to compare the raw bytes and
don't want to deal with an encoded copy (e.g., because they are passing
it to hasheq() or similar), they may still rely on
`midx_get_checksum_hash()` which returns the raw bytes.

Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>midx: rename `get_midx_checksum()` to `midx_get_checksum_hash()`</title>
<updated>2026-02-24T19:16:32Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2026-02-24T18:59:39Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=de811c26bb97ac324b60883ae4f4db84a83be2f1'/>
<id>urn:sha1:de811c26bb97ac324b60883ae4f4db84a83be2f1</id>
<content type='text'>
Since 541204aabea (Documentation: document naming schema for structs and
their functions, 2024-07-30), we have adopted a naming convention for
functions that would prefer a name like, say, `midx_get_checksum()` over
`get_midx_checksum()`.

Adopt this convention throughout the midx.h API. Since this function
returns a raw (that is, non-hex encoded) hash, let's suffix the function
with "_hash()" to make this clear. As a side effect, this prepares us
for the subsequent change which will introduce a "_hex()" variant that
encodes the checksum itself.

Suggested-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>midx: mark `get_midx_checksum()` arguments as const</title>
<updated>2026-02-24T19:16:31Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2026-02-24T18:59:34Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=00c0d84e6f0cdb91b0f261e12ed058ad3baa49bf'/>
<id>urn:sha1:00c0d84e6f0cdb91b0f261e12ed058ad3baa49bf</id>
<content type='text'>
To make clear that the function `get_midx_checksum()` does not do
anything to modify its argument, mark the MIDX pointer as const.

The following commit will rename this function altogether to make clear
that it returns the raw bytes of the checksum, not a hex-encoded copy of
it.

Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'ps/packfile-store-in-odb-source'</title>
<updated>2026-01-21T16:28:59Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2026-01-21T16:28:58Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=d627023d80667b8975fee0a8876ac42df20bf7d2'/>
<id>urn:sha1:d627023d80667b8975fee0a8876ac42df20bf7d2</id>
<content type='text'>
The packfile_store data structure is moved from object store to odb
source.

* ps/packfile-store-in-odb-source:
  packfile: move MIDX into packfile store
  packfile: refactor `find_pack_entry()` to work on the packfile store
  packfile: inline `find_kept_pack_entry()`
  packfile: only prepare owning store in `packfile_store_prepare()`
  packfile: only prepare owning store in `packfile_store_get_packs()`
  packfile: move packfile store into object source
  packfile: refactor misleading code when unusing pack windows
  packfile: refactor kept-pack cache to work with packfile stores
  packfile: pass source to `prepare_pack()`
  packfile: create store via its owning source
</content>
</entry>
<entry>
<title>packfile: move MIDX into packfile store</title>
<updated>2026-01-09T14:40:08Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2026-01-09T08:33:18Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=a282a8f163fa70f9eacc880e6188141cef917058'/>
<id>urn:sha1:a282a8f163fa70f9eacc880e6188141cef917058</id>
<content type='text'>
The multi-pack index still is tracked as a member of the object database
source, but ultimately the MIDX is always tied to one specific packfile
store.

Move the structure into `struct packfile_store` accordingly. This
ensures that the packfile store now keeps track of all data related to
packfiles.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>packfile: move packfile store into object source</title>
<updated>2026-01-09T14:40:07Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2026-01-09T08:33:13Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=84f0e60b28de69d1ccb7a51b729af6202b6cf4c8'/>
<id>urn:sha1:84f0e60b28de69d1ccb7a51b729af6202b6cf4c8</id>
<content type='text'>
The packfile store is a member of `struct object_database`, which means
that we have a single store per database. This doesn't really make much
sense though: each source connected to the database has its own set of
packfiles, so there is a conceptual mismatch here. This hasn't really
caused much of a problem in the past, but with the advent of pluggable
object databases this is becoming more of a problem because some of the
sources may not even use packfiles in the first place.

Move the packfile store down by one level from the object database into
the object database source. This ensures that each source now has its
own packfile store, and we can eventually start to abstract it away
entirely so that the caller doesn't even know what kind of store it
uses.

Note that we only need to adjust a relatively small number of callers,
way less than one might expect. This is because most callers are using
`repo_for_each_pack()`, which handles enumeration of all packfiles that
exist in the repository. So for now, none of these callers need to be
adapted. The remaining callers that iterate through the packfiles
directly and that need adjustment are those that are a bit more tangled
with packfiles. These will be adjusted over time.

Note that this patch only moves the packfile store, and there is still a
bunch of functions that seemingly operate on a packfile store but that
end up iterating over all sources. These will be adjusted in subsequent
commits.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
