<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/pack-bitmap.c, branch jch</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=jch</id>
<link rel='self' href='https://git.shady.money/git/atom?h=jch'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2026-03-25T19:58:04Z</updated>
<entry>
<title>Merge branch 'tb/incremental-midx-part-3.2'</title>
<updated>2026-03-25T19:58:04Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2026-03-25T19:58:04Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=105a22cf692a08604bc294ec2c164528889837b0'/>
<id>urn:sha1:105a22cf692a08604bc294ec2c164528889837b0</id>
<content type='text'>
Further work on incremental repacking using MIDX/bitmap

* tb/incremental-midx-part-3.2:
  midx: enable reachability bitmaps during MIDX compaction
  midx: implement MIDX compaction
  t/helper/test-read-midx.c: plug memory leak when selecting layer
  midx-write.c: factor fanout layering from `compute_sorted_entries()`
  midx-write.c: enumerate `pack_int_id` values directly
  midx-write.c: extract `fill_pack_from_midx()`
  midx-write.c: introduce `midx_pack_perm()` helper
  midx: do not require packs to be sorted in lexicographic order
  midx-write.c: introduce `struct write_midx_opts`
  midx-write.c: don't use `pack_perm` when assigning `bitmap_pos`
  t/t5319-multi-pack-index.sh: fix copy-and-paste error in t5319.39
  git-multi-pack-index(1): align SYNOPSIS with 'git multi-pack-index -h'
  git-multi-pack-index(1): remove non-existent incompatibility
  builtin/multi-pack-index.c: make '--progress' a common option
  midx: introduce `midx_get_checksum_hex()`
  midx: rename `get_midx_checksum()` to `midx_get_checksum_hash()`
  midx: mark `get_midx_checksum()` arguments as const
</content>
</entry>
<entry>
<title>midx: introduce `midx_get_checksum_hex()`</title>
<updated>2026-02-24T19:16:32Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2026-02-24T18:59:44Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=6e86f679248580bec5c105b37217862f3022b504'/>
<id>urn:sha1:6e86f679248580bec5c105b37217862f3022b504</id>
<content type='text'>
When trying to print out, say, the hexadecimal representation of a
MIDX's hash, our code will do something like:

    hash_to_hex_algop(midx_get_checksum_hash(m),
                      m-&gt;source-&gt;odb-&gt;repo-&gt;hash_algo);

, which is both cumbersome and repetitive. In fact, all but a handful of
callers to `midx_get_checksum_hash()` do exactly the above. Reduce the
repetitive nature of calling `midx_get_checksum_hash()` by having it
return a pointer into a static buffer containing the above result.

For the handful of callers that do need to compare the raw bytes and
don't want to deal with an encoded copy (e.g., because they are passing
it to hasheq() or similar), they may still rely on
`midx_get_checksum_hash()` which returns the raw bytes.

Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>midx: rename `get_midx_checksum()` to `midx_get_checksum_hash()`</title>
<updated>2026-02-24T19:16:32Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2026-02-24T18:59:39Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=de811c26bb97ac324b60883ae4f4db84a83be2f1'/>
<id>urn:sha1:de811c26bb97ac324b60883ae4f4db84a83be2f1</id>
<content type='text'>
Since 541204aabea (Documentation: document naming schema for structs and
their functions, 2024-07-30), we have adopted a naming convention for
functions that would prefer a name like, say, `midx_get_checksum()` over
`get_midx_checksum()`.

Adopt this convention throughout the midx.h API. Since this function
returns a raw (that is, non-hex encoded) hash, let's suffix the function
with "_hash()" to make this clear. As a side effect, this prepares us
for the subsequent change which will introduce a "_hex()" variant that
encodes the checksum itself.

Suggested-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>refs: replace `refs_for_each_ref_in()`</title>
<updated>2026-02-23T21:21:18Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2026-02-23T11:59:45Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=00be226f1f2a1036ea3920f8700b23b7cc55bf57'/>
<id>urn:sha1:00be226f1f2a1036ea3920f8700b23b7cc55bf57</id>
<content type='text'>
Replace calls to `refs_for_each_ref_in()` with the newly introduced
`refs_for_each_ref_ext()` function.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>refs: rename `each_ref_fn`</title>
<updated>2026-02-23T21:21:18Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2026-02-23T11:59:39Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=635f08b7394b9dda013a0b78f4db11348dc7717b'/>
<id>urn:sha1:635f08b7394b9dda013a0b78f4db11348dc7717b</id>
<content type='text'>
Similar to the preceding commit, rename `each_ref_fn` to better match
our current best practices around how we name things.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>pack-bitmap: fix bug with exact ref match in "pack.preferBitmapTips"</title>
<updated>2026-02-19T18:41:18Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2026-02-19T07:57:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=7174098834e30d2da1834bbfd0aaf575f6d6bb1a'/>
<id>urn:sha1:7174098834e30d2da1834bbfd0aaf575f6d6bb1a</id>
<content type='text'>
The "pack.preferBitmapTips" configuration allows the user to specify
which references should be preferred when generating bitmaps. This
option is typically expected to be set to a reference prefix, like for
example "refs/heads/".

It's not unreasonable though for a user to configure one specific
reference as preferred. But if they do, they'll hit a `BUG()`:

    $ git -c pack.preferBitmapTips=refs/heads/main repack -adb
    BUG: ../refs/iterator.c:366: attempt to trim too many characters
    error: pack-objects died of signal 6

The root cause for this bug is how we enumerate these references. We
call `refs_for_each_ref_in()`, which will:

  - Yield all references that have a user-specified prefix.

  - Trim each of these references so that the prefix is removed.

Typically, this function is called with a trailing slash, like
"refs/heads/", and in that case things work alright. But if the function
is called with the name of an existing reference then we'll try to trim
the full reference name, which would leave us with an empty name. And as
this would not really leave us with anything sensible, we call `BUG()`
instead of yielding this reference.

One could argue that this is a bug in `refs_for_each_ref_in()`. But the
question then becomes what the correct behaviour would be:

  - Do we want to skip exact matches? In our case we certainly don't
    want that, as the user has asked us to generate a bitmap for it.

  - Do we want to yield the reference with the empty refname? That would
    lead to a somewhat weird result.

Neither of these feel like viable options, so calling `BUG()` feels like
a sensible way out. The root cause ultimately is that we even try to
trim the whole refname in the first place. There are two possible ways
to fix this issue:

  - We can fix the bug by using `refs_for_each_fullref_in()` instead,
    which does not strip the prefix at all. Consequently, we would now
    start to accept all references that start with the configured
    prefix, including exact matches. So if we had "refs/heads/main", we
    would both match "refs/heads/main" and "refs/heads/main-branch".

  - Or we can fix the bug by appending a slash to the prefix if it
    doesn't already have one. This would mean that we only match
    ref hierarchies that start with this prefix.

While the first fix leaves the user with strictly _more_ configuration
options, we have already fixed a similar case in 10e8a9352b (refs.c:
stop matching non-directory prefixes in exclude patterns, 2025-03-06) by
using the second option. So for the sake of consistency, let's apply the
same fix here.

Clarify the documentation accordingly.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>pack-bitmap: deduplicate logic to iterate over preferred bitmap tips</title>
<updated>2026-02-19T18:41:18Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2026-02-19T07:57:49Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=ed693078e988adf66c969feb6de9f83c7abe7d24'/>
<id>urn:sha1:ed693078e988adf66c969feb6de9f83c7abe7d24</id>
<content type='text'>
We have two locations that iterate over the preferred bitmap tips as
configured by the user via "pack.preferBitmapTips". Both of these
callsites are subtly wrong: when the preferred bitmap tips contain an
exact refname match, then we will hit a `BUG()`.

Prepare for the fix by unifying the two callsites into a new
`for_each_preferred_bitmap_tip()` function.

This removes the last callsite of `bitmap_preferred_tips()` outside of
"pack-bitmap.c". As such, convert the function to be local to that file
only. Note that the function is still used by a second caller, so we
cannot just inline it.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>packfile: drop repository parameter from `packed_object_info()`</title>
<updated>2026-01-12T14:51:15Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2026-01-12T09:00:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=12d3b58b5552750f351ded7166b347446d9543f3'/>
<id>urn:sha1:12d3b58b5552750f351ded7166b347446d9543f3</id>
<content type='text'>
The function `packed_object_info()` takes a packfile and offset and
returns the object info for the corresponding object. Despite these two
parameters though it also takes a repository pointer. This is redundant
information though, as `struct packed_git` already has a repository
pointer that is always populated.

Drop the redundant parameter.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'jk/asan-bonanza'</title>
<updated>2025-12-01T02:31:41Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2025-12-01T02:31:41Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=aea8cc3a10c325a22a75e2d4f582db959d3854ae'/>
<id>urn:sha1:aea8cc3a10c325a22a75e2d4f582db959d3854ae</id>
<content type='text'>
Various issues detected by Asan have been corrected.

* jk/asan-bonanza:
  t: enable ASan's strict_string_checks option
  fsck: avoid parse_timestamp() on buffer that isn't NUL-terminated
  fsck: remove redundant date timestamp check
  fsck: avoid strcspn() in fsck_ident()
  fsck: assert newline presence in fsck_ident()
  cache-tree: avoid strtol() on non-string buffer
  Makefile: turn on NO_MMAP when building with ASan
  pack-bitmap: handle name-hash lookups in incremental bitmaps
  compat/mmap: mark unused argument in git_munmap()
</content>
</entry>
<entry>
<title>pack-bitmap: handle name-hash lookups in incremental bitmaps</title>
<updated>2025-11-18T17:36:06Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2025-11-18T09:12:06Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=4deb882e54152c31bef23f8b33ad38b7bc26d398'/>
<id>urn:sha1:4deb882e54152c31bef23f8b33ad38b7bc26d398</id>
<content type='text'>
If a bitmap has a name-hash cache, it is an array of 32-bit integers,
one per entry in the bitmap, which we've mmap'd from the .bitmap file.
We access it directly like this:

    if (bitmap_git-&gt;hashes)
            hash = get_be32(bitmap_git-&gt;hashes + index_pos);

That works for both regular pack bitmaps and for non-incremental midx
bitmaps. There is one bitmap_index with one "hashes" array, and
index_pos is within its bounds (we do the bounds-checking when we load
the bitmap).

But for an incremental midx bitmap, we have a linked list of
bitmap_index structs, and each one has only its own small slice of the
name-hash array. If index_pos refers to an object that is not in the
first bitmap_git of the chain, then we'll access memory outside of the
bounds of its "hashes" array, and often outside of the mmap.

Instead, we should walk through the list until we find the bitmap_index
which serves our index_pos, and use its hash (after adjusting index_pos
to make it relative to the slice we found). This is exactly what we do
elsewhere for incremental midx lookups (like the pack_pos_to_midx() call
a few lines above). But we can't use existing helpers like
midx_for_object() here, because we're walking through the chain of
bitmap_index structs (each of which refers to a midx), not the chain of
incremental multi_pack_index structs themselves.

The problem is triggered in the test suite, but we don't get a segfault
because the out-of-bounds index is too small. The OS typically rounds
our mmap up to the nearest page size, so we just end up accessing some
extra zero'd memory. Nor do we catch it with ASan, since it doesn't seem
to instrument mmaps at all. But if we build with NO_MMAP, then our maps
are replaced with heap allocations, which ASan does check. And so:

  make NO_MMAP=1 SANITIZE=address
  cd t
  ./t5334-incremental-multi-pack-index.sh

does show the problem (and this patch makes it go away).

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
