git/midx.c, branch jch

Merge branch 'tb/incremental-midx-part-3.2'

2026-03-25T19:58:04Z

Further work on incremental repacking using MIDX/bitmap * tb/incremental-midx-part-3.2: midx: enable reachability bitmaps during MIDX compaction midx: implement MIDX compaction t/helper/test-read-midx.c: plug memory leak when selecting layer midx-write.c: factor fanout layering from `compute_sorted_entries()` midx-write.c: enumerate `pack_int_id` values directly midx-write.c: extract `fill_pack_from_midx()` midx-write.c: introduce `midx_pack_perm()` helper midx: do not require packs to be sorted in lexicographic order midx-write.c: introduce `struct write_midx_opts` midx-write.c: don't use `pack_perm` when assigning `bitmap_pos` t/t5319-multi-pack-index.sh: fix copy-and-paste error in t5319.39 git-multi-pack-index(1): align SYNOPSIS with 'git multi-pack-index -h' git-multi-pack-index(1): remove non-existent incompatibility builtin/multi-pack-index.c: make '--progress' a common option midx: introduce `midx_get_checksum_hex()` midx: rename `get_midx_checksum()` to `midx_get_checksum_hash()` midx: mark `get_midx_checksum()` arguments as const

odb: embed base source in the "files" backend

2026-03-05T19:45:15Z

The "files" backend is implemented as a pointer in the `struct odb_source`. This contradicts our typical pattern for pluggable backends like we use it for example in the ref store or for object database streams, where we typically embed the generic base structure in the specialized implementation. This pattern has a couple of small benefits: - We avoid an extra allocation. - We hide implementation details in the generic structure. - We can easily downcast from a generic backend to the specialized structure and vice versa because the offsets are known at compile time. - It becomes trivial to identify locations where we depend on backend specific logic because the cast needs to be explicit. Refactor our "files" object database source to do the same and embed the `struct odb_source` in the `struct odb_source_files`. There are still a bunch of sites in our code base where we do have to access internals of the "files" backend. The intent is that those will go away over time, but this will certainly take a while. Meanwhile, provide a `odb_source_files_downcast()` function that can convert a generic source into a "files" source. As we only have a single source the downcast succeeds unconditionally for now. Eventually though the intent is to make the cast `BUG()` in case the caller requests to downcast a non-"files" backend to a "files" backend. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano

odb: introduce "files" source

2026-03-05T19:45:14Z

Introduce a new "files" object database source. This source encapsulates access to both loose object files and the packfile store, similar to how the "files" backend for refs encapsulates access to loose refs and the packed-refs file. Note that for now the "files" source is still a direct member of a `struct odb_source`. This architecture will be reversed in the next commit so that the files source contains a `struct odb_source`. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano

midx: do not require packs to be sorted in lexicographic order

2026-02-24T19:16:33Z

The MIDX file format currently requires that pack files be identified by the lexicographic ordering of their names (that is, a pack having a checksum beginning with "abc" would have a numeric pack_int_id which is smaller than the same value for a pack beginning with "bcd"). As a result, it is impossible to combine adjacent MIDX layers together without permuting bits from bitmaps that are in more recent layer(s). To see why, consider the following example: | packs | preferred pack --------+-------------+--------------- MIDX #0 | { X, Y, Z } | Y MIDX #1 | { A, B, C } | B MIDX #2 | { D, E, F } | D , where MIDX #2's base MIDX is MIDX #1, and so on. Suppose that we want to combine MIDX layers #0 and #1, to create a new layer #0' containing the packs from both layers. With the original three MIDX layers, objects are laid out in the bitmap in the order they appear in their source pack, and the packs themselves are arranged according to the pseudo-pack order. In this case, that ordering is Y, X, Z, B, A, C. But recall that the pseudo-pack ordering is defined by the order that packs appear in the MIDX, with the exception of the preferred pack, which sorts ahead of all other packs regardless of its position within the MIDX. In the above example, that means that pack 'Y' could be placed anywhere (so long as it is designated as preferred), however, all other packs must be placed in the location listed above. Because that ordering isn't sorted lexicographically, it is impossible to compact MIDX layers in the above configuration without permuting the object-to-bit-position mapping. Changing this mapping would affect all bitmaps belonging to newer layers, rendering the bitmaps associated with MIDX #2 unreadable. One of the goals of MIDX compaction is that we are able to shrink the length of the MIDX chain *without* invalidating bitmaps that belong to newer layers, and the lexicographic ordering constraint is at odds with this goal. However, packs do not *need* to be lexicographically ordered within the MIDX. As far as I can gather, the only reason they are sorted lexically is to make it possible to perform a binary search over the pack names in a MIDX, necessary to make `midx_contains_pack()`'s performance logarithmic in the number of packs rather than linear. Relax this constraint by allowing MIDX writes to proceed with packs that are not arranged in lexicographic order. `midx_contains_pack()` will lazily instantiate a `pack_names_sorted` array on the MIDX, which will be used to implement the binary search over pack names. This change produces MIDXs which may not be correctly read with external tools or older versions of Git. Though older versions of Git know how to gracefully degrade and ignore any MIDX(s) they consider corrupt, external tools may not be as robust. To avoid unintentionally breaking any such tools, guard this change behind a version bump in the MIDX's on-disk format. Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano

midx: introduce `midx_get_checksum_hex()`

2026-02-24T19:16:32Z

When trying to print out, say, the hexadecimal representation of a MIDX's hash, our code will do something like: hash_to_hex_algop(midx_get_checksum_hash(m), m->source->odb->repo->hash_algo); , which is both cumbersome and repetitive. In fact, all but a handful of callers to `midx_get_checksum_hash()` do exactly the above. Reduce the repetitive nature of calling `midx_get_checksum_hash()` by having it return a pointer into a static buffer containing the above result. For the handful of callers that do need to compare the raw bytes and don't want to deal with an encoded copy (e.g., because they are passing it to hasheq() or similar), they may still rely on `midx_get_checksum_hash()` which returns the raw bytes. Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano

midx: rename `get_midx_checksum()` to `midx_get_checksum_hash()`

2026-02-24T19:16:32Z

Since 541204aabea (Documentation: document naming schema for structs and their functions, 2024-07-30), we have adopted a naming convention for functions that would prefer a name like, say, `midx_get_checksum()` over `get_midx_checksum()`. Adopt this convention throughout the midx.h API. Since this function returns a raw (that is, non-hex encoded) hash, let's suffix the function with "_hash()" to make this clear. As a side effect, this prepares us for the subsequent change which will introduce a "_hex()" variant that encodes the checksum itself. Suggested-by: Patrick Steinhardt Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano

midx: mark `get_midx_checksum()` arguments as const

2026-02-24T19:16:31Z

To make clear that the function `get_midx_checksum()` does not do anything to modify its argument, mark the MIDX pointer as const. The following commit will rename this function altogether to make clear that it returns the raw bytes of the checksum, not a hex-encoded copy of it. Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano

Merge branch 'ps/packfile-store-in-odb-source'

2026-01-21T16:28:59Z

The packfile_store data structure is moved from object store to odb source. * ps/packfile-store-in-odb-source: packfile: move MIDX into packfile store packfile: refactor `find_pack_entry()` to work on the packfile store packfile: inline `find_kept_pack_entry()` packfile: only prepare owning store in `packfile_store_prepare()` packfile: only prepare owning store in `packfile_store_get_packs()` packfile: move packfile store into object source packfile: refactor misleading code when unusing pack windows packfile: refactor kept-pack cache to work with packfile stores packfile: pass source to `prepare_pack()` packfile: create store via its owning source

packfile: move MIDX into packfile store

2026-01-09T14:40:08Z

The multi-pack index still is tracked as a member of the object database source, but ultimately the MIDX is always tied to one specific packfile store. Move the structure into `struct packfile_store` accordingly. This ensures that the packfile store now keeps track of all data related to packfiles. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano

packfile: move packfile store into object source

2026-01-09T14:40:07Z

The packfile store is a member of `struct object_database`, which means that we have a single store per database. This doesn't really make much sense though: each source connected to the database has its own set of packfiles, so there is a conceptual mismatch here. This hasn't really caused much of a problem in the past, but with the advent of pluggable object databases this is becoming more of a problem because some of the sources may not even use packfiles in the first place. Move the packfile store down by one level from the object database into the object database source. This ensures that each source now has its own packfile store, and we can eventually start to abstract it away entirely so that the caller doesn't even know what kind of store it uses. Note that we only need to adjust a relatively small number of callers, way less than one might expect. This is because most callers are using `repo_for_each_pack()`, which handles enumeration of all packfiles that exist in the repository. So for now, none of these callers need to be adapted. The remaining callers that iterate through the packfiles directly and that need adjustment are those that are a bit more tangled with packfiles. These will be adjusted over time. Note that this patch only moves the packfile store, and there is still a bunch of functions that seemingly operate on a packfile store but that end up iterating over all sources. These will be adjusted in subsequent commits. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano