<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/t/t5318-commit-graph.sh, branch jch</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=jch</id>
<link rel='self' href='https://git.shady.money/git/atom?h=jch'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2026-03-24T15:39:37Z</updated>
<entry>
<title>commit-graph: fix writing generations with dates exceeding 34 bits</title>
<updated>2026-03-24T15:39:37Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2026-03-24T06:18:26Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=04c9c5e8d2d99050d260149cad9dde1302a02ff4'/>
<id>urn:sha1:04c9c5e8d2d99050d260149cad9dde1302a02ff4</id>
<content type='text'>
The `timestamp_t` type is declared as `uintmax_t` and thus typically has
64 bits of precision. Usually, the full precision of such dates is not
required: it would be comforting to know that Git is still around in
millions of years, but all in all the chance is rather low.

We abuse this fact in the commit-graph: instead of storing the full 64
bits of precision, committer dates only store 34 bits. This is still
plenty of headroom, as it means that we can represent dates until year
2514. Commits which are dated beyond that year will simply get a date
whose remaining bits are masked.

The result of this is somewhat curious: the committer date will be
different depending on whether a commit gets parsed via the commit-graph
or via the object database. This isn't really too much of an issue in
general though, as we don't typically use the date parsed from the
commit-graph in user-facing output.

But with 024b4c9697 (commit: make `repo_parse_commit_no_graph()` more
robust, 2026-02-16) it started to become a problem when writing the
commit-graph itself. This commit changed `repo_parse_commit_no_graph()`
so that we re-parse the commit via the object database in case it was
already parsed beforehand via the commit-graph.

The consequence is that we may now act with two different commit dates
at different stages:

  - Initially, we use the 34-bit precision timestamp when writing the
    chunk generation data. We thus correctly compute the offsets
    relative to the on-disk timestamp here.

  - Later, when writing the overflow data, we may end up with the
    full-precision timestamp. When the date is larger than 34 bits the
    result of this is an underflow when computing the offset.

This causes a mismatch in the number of generation data overflow records
we want to write, and that ultimately causes Git to die.

Introduce a new helper function that computes the generation offset for
a commit while correctly masking the date to 34 bits. This makes the
previously-implicit assumptions about the commit date precision explicit
and thus hopefully less fragile going forward.

Adapt sites that compute the offset to use the function.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>commit-graph: add new config for changed-paths &amp; recommend it in scalar</title>
<updated>2025-10-22T17:40:11Z</updated>
<author>
<name>Emily Yang</name>
<email>emilyyang.git@gmail.com</email>
</author>
<published>2025-10-17T20:58:59Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=fafdf23b2f57bdf6a74513b3cc03902f0a8e954d'/>
<id>urn:sha1:fafdf23b2f57bdf6a74513b3cc03902f0a8e954d</id>
<content type='text'>
The changed-path Bloom filters feature has proven stable and reliable
over several years of use, delivering significant performance
improvement for file history computation in large monorepos. Currently
a user can opt-in to writing the changed-path Bloom filters using the
"--changed-paths" option to "git commit-graph write". The filters will
be persisted until the user drops the filters using the
"--no-changed-paths" option. For this functionality, refer to 0087a87ba8
(commit-graph: persist existence of changed-paths, 2020-07-01).

Large monorepos using Git's background maintenance to build and update
commit-graph files could use an easy switch to enable this feature
without a foreground computation. In this commit, we're proposing a new
config option "commitGraph.changedPaths":

* If "true", "git commit-graph write" will write Bloom filters,
  equivalent to passing "--changed-paths";
* If "false" or "unset", Bloom filters will be written during "git
  commit-graph write" only if the filters already exist in the current
  commit-graph file. This matches the default behaviour of "git
  commit-graph write" without any "--[no-]changed-paths" option. Note
  "false" can disable a previous "true" config value but doesn't imply
  "--no-changed-paths".

This config will always respect the precedence of command line option
"--[no-]changed-paths".

We also set this new config as optional recommended config in scalar to
turn on this feature for large repos.

Helped-by: Derrick Stolee &lt;stolee@gmail.com&gt;
Signed-off-by: Emily Yang &lt;emilyyang.git@gmail.com&gt;
Acked-by: Derrick Stolee &lt;stolee@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>t: introduce PERL_TEST_HELPERS prerequisite</title>
<updated>2025-04-07T21:47:37Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2025-04-03T05:05:57Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=23e21a58d5c7b5ae7b4b5532933e0f82e24024fe'/>
<id>urn:sha1:23e21a58d5c7b5ae7b4b5532933e0f82e24024fe</id>
<content type='text'>
In the early days of Git, Perl was used quite prominently throughout the
project. This has changed significantly as almost all of the executables
we ship nowadays have eventually been rewritten in C. Only a handful of
subsystems remain that require Perl:

  - gitweb, a read-only web interface.

  - A couple of scripts that allow importing repositories from GNU Arch,
    CVS and Subversion.

  - git-send-email(1), which can be used to send mails.

  - git-request-pull(1), which is used to request somebody to pull from
    a URL by sending an email.

  - git-filter-branch(1), which uses Perl with the `--state-branch`
    option. This command is typically recommended against nowadays in
    favor of git-filter-repo(1).

  - Our Perl bindings for Git.

  - The netrc Git credential helper.

None of these subsystems can really be considered to be part of the
"core" of Git, and an installation without them is fully functional.
It is more likely than not that an end user wouldn't even notice that
any features are missing if those tools weren't installed. But while
Perl nowadays very much is an optional dependency of Git, there is a
significant limitation when Perl isn't available: developers cannot run
our test suite.

Preceding commits have started to lift this restriction by removing the
strict dependency on Perl in many central parts of the test library. But
there are still many tests that rely on small Perl helpers to do various
different things.

Introduce a new PERL_TEST_HELPERS prerequisite that guards all tests
that require Perl. This prerequisite is explicitly different than the
preexisting PERL prerequisite:

  - PERL records whether or not features depending on the Perl
    interpreter are built.

  - PERL_TEST_HELPERS records whether or not a Perl interpreter is
    available for our tests.

By having these two separate prerequisites we can thus distinguish
between tests that inherently depend on Perl because the underlying
feature does, and those tests that depend on Perl because the test
itself is using Perl.

Adapt all tests to set the PERL_TEST_HELPERS prerequisite as needed.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>t: remove TEST_PASSES_SANITIZE_LEAK annotations</title>
<updated>2024-11-20T23:23:48Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-11-20T13:39:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=fc1ddf42af6742fae7e770cae20e30d7902014c0'/>
<id>urn:sha1:fc1ddf42af6742fae7e770cae20e30d7902014c0</id>
<content type='text'>
Now that the default value for TEST_PASSES_SANITIZE_LEAK is `true` there
is no longer a need to have that variable declared in all of our tests.
Drop it.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>t/test-repository: fix leaking repository</title>
<updated>2024-08-01T15:47:37Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-08-01T10:40:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=11f841c1cc9c7ffadf5d462d25a378fcab5bb6e1'/>
<id>urn:sha1:11f841c1cc9c7ffadf5d462d25a378fcab5bb6e1</id>
<content type='text'>
The test-repository test helper zeroes out `the_repository` such that it
can be sure that our codebase only ends up using the supplied repository
that we initialize in the respective helper functions. This does cause
memory leaks though as the data that `the_repository` has been holding
onto is not referenced anymore.

Fix this by calling `repo_clear()` instead.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'ps/commit-graph-less-paranoid'</title>
<updated>2023-12-18T22:10:11Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2023-12-18T22:10:11Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=66685e85556ad1890d896bc30ba3a7a99bf4dd78'/>
<id>urn:sha1:66685e85556ad1890d896bc30ba3a7a99bf4dd78</id>
<content type='text'>
Earlier we stopped relying on commit-graph that (still) records
information about commits that are lost from the object store,
which has negative performance implications.  The default has been
flipped to disable this pessimization.

* ps/commit-graph-less-paranoid:
  commit-graph: disable GIT_COMMIT_GRAPH_PARANOIA by default
</content>
</entry>
<entry>
<title>Merge branch 'jk/chunk-bounds-more'</title>
<updated>2023-12-10T00:37:48Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2023-12-10T00:37:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=34401b7ddbd79c8511e864048bc52d896eac2f22'/>
<id>urn:sha1:34401b7ddbd79c8511e864048bc52d896eac2f22</id>
<content type='text'>
Code clean-up for jk/chunk-bounds topic.

* jk/chunk-bounds-more:
  commit-graph: mark chunk error messages for translation
  commit-graph: drop verify_commit_graph_lite()
  commit-graph: check order while reading fanout chunk
  commit-graph: use fanout value for graph size
  commit-graph: abort as soon as we see a bogus chunk
  commit-graph: clarify missing-chunk error messages
  commit-graph: drop redundant call to "lite" verification
  midx: check consistency of fanout table
  commit-graph: handle overflow in chunk_size checks
</content>
</entry>
<entry>
<title>commit-graph: disable GIT_COMMIT_GRAPH_PARANOIA by default</title>
<updated>2023-11-26T01:10:00Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2023-11-24T11:08:21Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=b1df3b3867e351913887121063cbd69de24e83fc'/>
<id>urn:sha1:b1df3b3867e351913887121063cbd69de24e83fc</id>
<content type='text'>
In 7a5d604443 (commit: detect commits that exist in commit-graph but not
in the ODB, 2023-10-31), we have introduced a new object existence check
into `repo_parse_commit_internal()` so that we do not parse commits via
the commit-graph that don't have a corresponding object in the object
database. This new check of course comes with a performance penalty,
which the commit put at around 30% for `git rev-list --topo-order`. But
there are in fact scenarios where the performance regression is even
higher. The following benchmark against linux.git with a fully-build
commit-graph:

  Benchmark 1: git.v2.42.1 rev-list --count HEAD
    Time (mean ± σ):     658.0 ms ±   5.2 ms    [User: 613.5 ms, System: 44.4 ms]
    Range (min … max):   650.2 ms … 666.0 ms    10 runs

  Benchmark 2: git.v2.43.0-rc1 rev-list --count HEAD
    Time (mean ± σ):      1.333 s ±  0.019 s    [User: 1.263 s, System: 0.069 s]
    Range (min … max):    1.302 s …  1.361 s    10 runs

  Summary
    git.v2.42.1 rev-list --count HEAD ran
      2.03 ± 0.03 times faster than git.v2.43.0-rc1 rev-list --count HEAD

While it's a noble goal to ensure that results are the same regardless
of whether or not we have a potentially stale commit-graph, taking twice
as much time is a tough sell. Furthermore, we can generally assume that
the commit-graph will be updated by git-gc(1) or git-maintenance(1) as
required so that the case where the commit-graph is stale should not at
all be common.

With that in mind, default-disable GIT_COMMIT_GRAPH_PARANOIA and restore
the behaviour and thus performance previous to the mentioned commit. In
order to not be inconsistent, also disable this behaviour by default in
`lookup_commit_in_graph()`, where the object existence check has been
introduced right at its inception via f559d6d45e (revision: avoid
hitting packfiles when commits are in commit-graph, 2021-08-09).

This results in another speedup in commands that end up calling this
function, even though it's less pronounced compared to the above
benchmark. The following has been executed in linux.git with ~1.2
million references:

  Benchmark 1: GIT_COMMIT_GRAPH_PARANOIA=true git rev-list --all --no-walk=unsorted
    Time (mean ± σ):      2.947 s ±  0.003 s    [User: 2.412 s, System: 0.534 s]
    Range (min … max):    2.943 s …  2.949 s    3 runs

  Benchmark 2: GIT_COMMIT_GRAPH_PARANOIA=false git rev-list --all --no-walk=unsorted
    Time (mean ± σ):      2.724 s ±  0.030 s    [User: 2.207 s, System: 0.514 s]
    Range (min … max):    2.704 s …  2.759 s    3 runs

  Summary
    GIT_COMMIT_GRAPH_PARANOIA=false git rev-list --all --no-walk=unsorted ran
      1.08 ± 0.01 times faster than GIT_COMMIT_GRAPH_PARANOIA=true git rev-list --all --no-walk=unsorted

So whereas 7a5d604443 initially introduced the logic to start doing an
object existence check in `repo_parse_commit_internal()` by default, the
updated logic will now instead cause `lookup_commit_in_graph()` to stop
doing the check by default. This behaviour continues to be tweakable by
the user via the GIT_COMMIT_GRAPH_PARANOIA environment variable.

Note that this requires us to amend some tests to manually turn on the
paranoid checks again. This is because we cause repository corruption by
manually deleting objects which are part of the commit graph already.
These circumstances shouldn't usually happen in repositories.

Reported-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>commit-graph: check order while reading fanout chunk</title>
<updated>2023-11-09T10:07:53Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2023-11-09T07:25:07Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=06fb135f8eddc64071a719fe309c771883c07775'/>
<id>urn:sha1:06fb135f8eddc64071a719fe309c771883c07775</id>
<content type='text'>
We read the fanout chunk, storing a pointer to it, but only confirm that
the entries are monotonic in a final "lite" verification step. Let's
move that into the actual OIDF chunk callback, so that we can report
problems immediately (for all the reasons given in the previous
"commit-graph: abort as soon as we see a bogus chunk" commit).

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>commit-graph: use fanout value for graph size</title>
<updated>2023-11-09T10:07:53Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2023-11-09T07:24:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=d3b6f6c63137b72df5055b71721825e786bcbd6e'/>
<id>urn:sha1:d3b6f6c63137b72df5055b71721825e786bcbd6e</id>
<content type='text'>
Commit-graph, midx, and pack idx files all have both a lookup table of
oids and an oid fanout table. In midx and pack idx files, we take the
final entry of the fanout table as the source of truth for the number of
entries, and then verify that the size of the lookup table matches that.
But for commit-graph files, we do the opposite: we use the size of the
lookup table as the source of truth, and then check the final fanout
entry against it.

As noted in 4169d89645 (commit-graph: check consistency of fanout
table, 2023-10-09), either is correct. But there are a few reasons to
prefer the fanout table as the source of truth:

  1. The fanout entries are 32-bits on disk, and that defines the
     maximum number of entries we can store. But since the size of the
     lookup table is only bounded by the filesystem, it can be much
     larger. And hence computing it as the commit-graph does means that
     we may truncate the result when storing it in a uint32_t.

  2. We read the fanout first, then the lookup table. If we're verifying
     the chunks as we read them, then we'd want to take the fanout as
     truth (we have nothing yet to check it against) and then we can
     check that the lookup table matches what we already know.

  3. It is pointlessly inconsistent with the midx and pack idx code.
     Since the three have to do similar size and bounds checks, it is
     easier to reason about all three if they use the same approach.

So this patch moves the assignment of g-&gt;num_commits to the fanout
parser, and then we can check the size of the lookup chunk as soon as we
try to load it.

There's already a test covering this situation, which munges the final
fanout entry to 2^32-1. In the current code we complain that it does not
agree with the table size. But now that we treat the munged value as the
source of truth, we'll complain that the lookup table is the wrong size
(again, either is correct). So we'll have to update the message we
expect (and likewise for an earlier test which does similar munging).

There's a similar test for this situation on the midx side, but rather
than making a very-large fanout value, it just truncates the lookup
table. We could do that here, too, but the very-large fanout value
actually shows an interesting corner case. On a 32-bit system,
multiplying to find the expected table size would cause an integer
overflow. Using st_mult() would detect that, but cause us to die()
rather than falling back to the non-graph code path. Checking the size
using division (as we do with existing chunk-size checks) avoids the
overflow entirely, and the test demonstrates this when run on a 32-bit
system.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
