<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/midx.c, branch v2.50.1</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.50.1</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.50.1'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2025-05-28T14:56:29Z</updated>
<entry>
<title>midx: stop repeatedly looking up nonexistent packfiles</title>
<updated>2025-05-28T14:56:29Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2025-05-28T12:24:11Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=1f34bf3e082741e053d25b76a0ffe31d9d967594'/>
<id>urn:sha1:1f34bf3e082741e053d25b76a0ffe31d9d967594</id>
<content type='text'>
The multi-pack index acts as a cache across a set of packfiles so that
we can quickly look up which of those packfiles contains a given object.
As such, the multi-pack index naturally needs to be updated every time
one of the packfiles goes away, or otherwise the multi-pack index has
grown stale.

A stale multi-pack index should be handled gracefully by Git though, and
in fact it is: if the indexed pack cannot be found we simply ignore it
and eventually we fall back to doing the object lookup by just iterating
through all packs, even if those aren't indexed.

But while this fallback works, it has one significant downside: we don't
cache the fact that a pack has vanished. This leads to us repeatedly
trying to look up the same pack only to realize that it (still) doesn't
exist.

This issue can be easily demonstrated by creating a repository with a
stale multi-pack index and a couple of objects. We do so by creating a
repository with two packfiles, both of which are indexed by the
multi-pack index, and then repack those two packfiles. Note that we have
to move the multi-pack-index before doing the final repack, as Git knows
to delete it otherwise.

    $ git init repo
    $ cd repo/
    $ git config set maintenance.auto false
    $ for i in $(seq 1000); do printf "%d-original" $i &gt;file-$i; done
    $ git add .
    $ git commit -moriginal
    $ git repack -dl
    $ for i in $(seq 1000); do printf "%d-modified" $i &gt;file-$i; done
    $ git commit -a -mmodified
    $ git repack -dl
    $ git multi-pack-index write
    $ mv .git/objects/pack/multi-pack-index .
    $ git repack -Adl
    $ mv multi-pack-index .git/objects/pack/

Commands that cause a lot of objects lookups will now repeatedly invoke
`add_packed_git()`, which leads to three failed access(3p) calls as well
as one failed stat(3p) call. The following strace for example is done
for `git log --patch` in the above repository:

    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     74.67    0.024693           1     18038     18031 access
     25.33    0.008378           1      6045      6017 newfstatat
    ------ ----------- ----------- --------- --------- ----------------
    100.00    0.033071           1     24083     24048 total

Fix the issue by introducing a negative lookup cache for indexed packs.
This cache works by simply storing an invalid pointer for a missing pack
when `prepare_midx_pack()` fails to look up the pack. Most users of the
`packs` array don't need to be adjusted, either, as they all know to
call `prepare_midx_pack()` before accessing the array.

With this change in place we can now see a significantly reduced number
of syscalls:

    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     73.58    0.000323           5        60        28 newfstatat
     26.42    0.000116           5        23        16 access
    ------ ----------- ----------- --------- --------- ----------------
    100.00    0.000439           5        83        44 total

Furthermore, this change also results in a speedup:

    Benchmark 1: git log --patch (revision = HEAD~)
      Time (mean ± σ):      50.4 ms ±   2.5 ms    [User: 22.0 ms, System: 24.4 ms]
      Range (min … max):    45.4 ms …  54.9 ms    53 runs

    Benchmark 2: git log --patch (revision = HEAD)
      Time (mean ± σ):      12.7 ms ±   0.4 ms    [User: 11.1 ms, System: 1.6 ms]
      Range (min … max):    12.4 ms …  15.0 ms    191 runs

    Summary
      git log --patch (revision = HEAD) ran
        3.96 ± 0.22 times faster than git log --patch (revision = HEAD~)

In the end, it should in theory never be necessary to have this negative
lookup cache given that we know to update the multi-pack index together
with repacks. But as the change is quite contained and as the speedup
can be significant as demonstrated above, it does feel sensible to have
the negative lookup cache regardless.

Based-on-patch-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>object-file: move `git_open_cloexec()` to "compat/open.c"</title>
<updated>2025-04-15T15:24:35Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2025-04-15T09:38:16Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=97dc141fd676e7079c2fd51e3bea2681a5b9f824'/>
<id>urn:sha1:97dc141fd676e7079c2fd51e3bea2681a5b9f824</id>
<content type='text'>
The `git_open_cloexec()` wrapper function provides the ability to open a
file with `O_CLOEXEC` in a platform-agnostic way. This function is
provided by "object-file.c" even though it is not specific to the object
subsystem at all.

Move the file into "compat/open.c". This file already exists before this
commit, but has only been compiled conditionally depending on whether or
not open(3p) may return EINTR. With this change we now unconditionally
compile the object, but wrap `git_open_with_retry()` in an ifdef.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>csum-file: stop depending on `the_repository`</title>
<updated>2025-03-10T20:16:18Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2025-03-10T07:13:20Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=228457c9d9f32f000f5c04c36fcce9002f72965a'/>
<id>urn:sha1:228457c9d9f32f000f5c04c36fcce9002f72965a</id>
<content type='text'>
There are multiple sites in "csum-file.c" where we use the global
`the_repository` variable, either explicitly or implicitly by using
`the_hash_algo`.

Refactor the code to stop using `the_repository` by adapting functions
to receive required data as parameters. Adapt callsites accordingly by
either using `the_repository-&gt;hash_algo`, or by using a context-provided
hash algorithm in case the subsystem already got rid of its dependency
on `the_repository`.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>progress: stop using `the_repository`</title>
<updated>2024-12-18T18:44:30Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-12-17T06:43:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=1f7e6478dcd9e7462c70a5784ae0d41ab25ced11'/>
<id>urn:sha1:1f7e6478dcd9e7462c70a5784ae0d41ab25ced11</id>
<content type='text'>
Stop using `the_repository` in the "progress" subsystem by passing in a
repository when initializing `struct progress`. Furthermore, store a
pointer to the repository in that struct so that we can pass it to the
trace2 API when logging information.

Adjust callers accordingly by using `the_repository`. While there may be
some callers that have a repository available in their context, this
trivial conversion allows for easier verification and bubbles up the use
of `the_repository` by one level.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'ps/build-sign-compare' into ps/the-repository</title>
<updated>2024-12-18T18:43:16Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2024-12-18T18:43:16Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=913a1e157c326ee2e0c9e2074090b6b7f2a62722'/>
<id>urn:sha1:913a1e157c326ee2e0c9e2074090b6b7f2a62722</id>
<content type='text'>
* ps/build-sign-compare:
  t/helper: don't depend on implicit wraparound
  scalar: address -Wsign-compare warnings
  builtin/patch-id: fix type of `get_one_patchid()`
  builtin/blame: fix type of `length` variable when emitting object ID
  gpg-interface: address -Wsign-comparison warnings
  daemon: fix type of `max_connections`
  daemon: fix loops that have mismatching integer types
  global: trivial conversions to fix `-Wsign-compare` warnings
  pkt-line: fix -Wsign-compare warning on 32 bit platform
  csum-file: fix -Wsign-compare warning on 32-bit platform
  diff.h: fix index used to loop through unsigned integer
  config.mak.dev: drop `-Wno-sign-compare`
  global: mark code units that generate warnings with `-Wsign-compare`
  compat/win32: fix -Wsign-compare warning in "wWinMain()"
  compat/regex: explicitly ignore "-Wsign-compare" warnings
  git-compat-util: introduce macros to disable "-Wsign-compare" warnings
</content>
</entry>
<entry>
<title>global: mark code units that generate warnings with `-Wsign-compare`</title>
<updated>2024-12-06T11:20:02Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-12-06T10:27:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=41f43b8243f42b9df2e98be8460646d4c0100ad3'/>
<id>urn:sha1:41f43b8243f42b9df2e98be8460646d4c0100ad3</id>
<content type='text'>
Mark code units that generate warnings with `-Wsign-compare`. This
allows for a structured approach to get rid of all such warnings over
time in a way that can be easily measured.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>midx: inline the `MIDX_MIN_SIZE` definition</title>
<updated>2024-12-04T01:32:21Z</updated>
<author>
<name>Karthik Nayak</name>
<email>karthik.188@gmail.com</email>
</author>
<published>2024-11-27T16:28:33Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=24d3dd79e4810ae7bafb3180664ac435fecd567f'/>
<id>urn:sha1:24d3dd79e4810ae7bafb3180664ac435fecd567f</id>
<content type='text'>
The `MIDX_MIN_SIZE` definition is used to check the midx_size in
`local_multi_pack_index_one`. This definition relies on the
`the_hash_algo` global variable. Inline this and remove the global
variable usage.

With this, remove `USE_THE_REPOSITORY_VARIABLE` usage from `midx.c`.

Signed-off-by: Karthik Nayak &lt;karthik.188@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>midx: pass down `hash_algo` to functions using global variables</title>
<updated>2024-12-04T01:32:21Z</updated>
<author>
<name>Karthik Nayak</name>
<email>karthik.188@gmail.com</email>
</author>
<published>2024-11-27T16:28:32Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=f59de71cf700f9f8da27023ec8b5df117f99d9c8'/>
<id>urn:sha1:f59de71cf700f9f8da27023ec8b5df117f99d9c8</id>
<content type='text'>
The functions `get_split_midx_filename_ext()`, `get_midx_filename()` and
`get_midx_filename_ext()` use `hash_to_hex()` which internally uses the
`the_hash_algo` global variable.

Remove this dependency on global variables by passing down the
`hash_algo` through to the functions mentioned and instead calling
`hash_to_hex_algop()` along with the obtained `hash_algo`.

Signed-off-by: Karthik Nayak &lt;karthik.188@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>midx: pass `repository` to `load_multi_pack_index`</title>
<updated>2024-12-04T01:32:20Z</updated>
<author>
<name>Karthik Nayak</name>
<email>karthik.188@gmail.com</email>
</author>
<published>2024-11-27T16:28:31Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=d5c2ca576a47480b03a83821041955a21a645d1a'/>
<id>urn:sha1:d5c2ca576a47480b03a83821041955a21a645d1a</id>
<content type='text'>
The `load_multi_pack_index` function in midx uses `the_repository`
variable to access the `repository` struct. Modify the function and its
callee's to send the `repository` field.

This moves usage of `the_repository` to the `test-read-midx.c` file.
While that is not optimal, it is okay, since the upcoming commits will
slowly move the usage of `the_repository` up the layers and remove it
eventually.

Signed-off-by: Karthik Nayak &lt;karthik.188@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>midx: cleanup internal usage of `the_repository` and `the_hash_algo`</title>
<updated>2024-12-04T01:32:20Z</updated>
<author>
<name>Karthik Nayak</name>
<email>karthik.188@gmail.com</email>
</author>
<published>2024-11-27T16:28:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=fae9bae7091958255fa7729f6cedb8cc4f9a3387'/>
<id>urn:sha1:fae9bae7091958255fa7729f6cedb8cc4f9a3387</id>
<content type='text'>
In the `midx.c` file, there are multiple usages of `the_repository` and
`the_hash_algo` within static functions of the file. Some of the usages
can be simply swapped out with the available `repository` struct. While
some of them can be swapped out by passing the repository to the
required functions.

This leaves out only some other usages of `the_repository` and
`the_hash_algo` in the file in non-static functions, which we'll tackle
in upcoming commits.

Signed-off-by: Karthik Nayak &lt;karthik.188@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
