<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/t/t5324-split-commit-graph.sh, branch jch</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=jch</id>
<link rel='self' href='https://git.shady.money/git/atom?h=jch'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2025-04-07T21:47:37Z</updated>
<entry>
<title>t: introduce PERL_TEST_HELPERS prerequisite</title>
<updated>2025-04-07T21:47:37Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2025-04-03T05:05:57Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=23e21a58d5c7b5ae7b4b5532933e0f82e24024fe'/>
<id>urn:sha1:23e21a58d5c7b5ae7b4b5532933e0f82e24024fe</id>
<content type='text'>
In the early days of Git, Perl was used quite prominently throughout the
project. This has changed significantly as almost all of the executables
we ship nowadays have eventually been rewritten in C. Only a handful of
subsystems remain that require Perl:

  - gitweb, a read-only web interface.

  - A couple of scripts that allow importing repositories from GNU Arch,
    CVS and Subversion.

  - git-send-email(1), which can be used to send mails.

  - git-request-pull(1), which is used to request somebody to pull from
    a URL by sending an email.

  - git-filter-branch(1), which uses Perl with the `--state-branch`
    option. This command is typically recommended against nowadays in
    favor of git-filter-repo(1).

  - Our Perl bindings for Git.

  - The netrc Git credential helper.

None of these subsystems can really be considered to be part of the
"core" of Git, and an installation without them is fully functional.
It is more likely than not that an end user wouldn't even notice that
any features are missing if those tools weren't installed. But while
Perl nowadays very much is an optional dependency of Git, there is a
significant limitation when Perl isn't available: developers cannot run
our test suite.

Preceding commits have started to lift this restriction by removing the
strict dependency on Perl in many central parts of the test library. But
there are still many tests that rely on small Perl helpers to do various
different things.

Introduce a new PERL_TEST_HELPERS prerequisite that guards all tests
that require Perl. This prerequisite is explicitly different than the
preexisting PERL prerequisite:

  - PERL records whether or not features depending on the Perl
    interpreter are built.

  - PERL_TEST_HELPERS records whether or not a Perl interpreter is
    available for our tests.

By having these two separate prerequisites we can thus distinguish
between tests that inherently depend on Perl because the underlying
feature does, and those tests that depend on Perl because the test
itself is using Perl.

Adapt all tests to set the PERL_TEST_HELPERS prerequisite as needed.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>t: remove TEST_PASSES_SANITIZE_LEAK annotations</title>
<updated>2024-11-20T23:23:48Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-11-20T13:39:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=fc1ddf42af6742fae7e770cae20e30d7902014c0'/>
<id>urn:sha1:fc1ddf42af6742fae7e770cae20e30d7902014c0</id>
<content type='text'>
Now that the default value for TEST_PASSES_SANITIZE_LEAK is `true` there
is no longer a need to have that variable declared in all of our tests.
Drop it.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>t: fix typos</title>
<updated>2024-10-24T16:45:53Z</updated>
<author>
<name>Andrew Kreimer</name>
<email>algonell@gmail.com</email>
</author>
<published>2024-10-24T11:47:20Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=f56f9d6c0b83399c0c95b7630b7e3d3d57b495f5'/>
<id>urn:sha1:f56f9d6c0b83399c0c95b7630b7e3d3d57b495f5</id>
<content type='text'>
Fix typos and grammar in documentation, comments, etc.

Via codespell.

Reported-by: Kristoffer Haugsbakk &lt;kristofferhaugsbakk@fastmail.com&gt;
Signed-off-by: Andrew Kreimer &lt;algonell@gmail.com&gt;
Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
</content>
</entry>
<entry>
<title>commit-graph.c: remove temporary graph layers on exit</title>
<updated>2024-06-07T15:40:48Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2024-06-06T22:19:25Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=c195ecda7703098f28c160cc63306ca0e1488236'/>
<id>urn:sha1:c195ecda7703098f28c160cc63306ca0e1488236</id>
<content type='text'>
Since the introduction of split commit graph layers in 92b1ea66b9a
(Merge branch 'ds/commit-graph-incremental', 2019-07-19), the function
write_commit_graph_file() has done the following when writing an
incremental commit-graph layer:

  - used a lock_file to control access to the commit-graph-chain file

  - used an auxiliary file (whose descriptor was stored in 'fd') to
    write the new commit-graph layer itself

Using a lock_file to control access to the commit-graph-chain is
sensible, since only one writer may modify it at a time. Likewise, when
the commit-graph machinery is writing out a single layer, the lock_file
structure is used to modify the commit-graph itself. This is also
sensible, since the non-incremental commit-graph may also have at most
one writer.

However, using an auxiliary temporary file without using the tempfile.h
API means that writes that fail after the temporary graph layer has been
created will leave around a file in

    $GIT_DIR/objects/info/commit-graphs/tmp_graph_XXXXXX

The commit-graph-chain file and non-incremental commit-graph do not
suffer from this problem as the lockfile.h API uses the tempfile.h API
transparently, so processes that died before moving those finals into
their final location cleaned up after themselves.

Ensure that the temporary file used to write incremental commit-graphs
is also managed with the tempfile.h API, to ensure that we do not ever
leave tmp_graph_XXXXXX files laying around.

Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'jc/test-i18ngrep'</title>
<updated>2023-11-08T02:04:02Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2023-11-08T02:04:02Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=a8e2394704d0543f4e1f1ac6ea532d098316d97e'/>
<id>urn:sha1:a8e2394704d0543f4e1f1ac6ea532d098316d97e</id>
<content type='text'>
Another step to deprecate test_i18ngrep.

* jc/test-i18ngrep:
  tests: teach callers of test_i18ngrep to use test_grep
  test framework: further deprecate test_i18ngrep
</content>
</entry>
<entry>
<title>tests: teach callers of test_i18ngrep to use test_grep</title>
<updated>2023-11-02T08:13:44Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2023-10-31T05:23:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=6789275d3780bcb950e6be8557aeedf160d4ad6d'/>
<id>urn:sha1:6789275d3780bcb950e6be8557aeedf160d4ad6d</id>
<content type='text'>
They are equivalents and the former still exists, so as long as the
only change this commit makes are to rewrite test_i18ngrep to
test_grep, there won't be any new bug, even if there still are
callers of test_i18ngrep remaining in the tree, or when merged to
other topics that add new uses of test_i18ngrep.

This patch was produced more or less with

    git grep -l -e 'test_i18ngrep ' 't/t[0-9][0-9][0-9][0-9]-*.sh' |
    xargs perl -p -i -e 's/test_i18ngrep /test_grep /'

and a good way to sanity check the result yourself is to run the
above in a checkout of c4603c1c (test framework: further deprecate
test_i18ngrep, 2023-10-31) and compare the resulting working tree
contents with the result of applying this patch to the same commit.
You'll see that test_i18ngrep in a few t/lib-*.sh files corrected,
in addition to the manual reproduction.

Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'jk/chunk-bounds'</title>
<updated>2023-10-23T20:56:36Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2023-10-23T20:56:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=f32af12ceec1c19d8a8a7874523d3a7ceef6eebf'/>
<id>urn:sha1:f32af12ceec1c19d8a8a7874523d3a7ceef6eebf</id>
<content type='text'>
The codepaths that read "chunk" formatted files have been corrected
to pay attention to the chunk size and notice broken files.

* jk/chunk-bounds: (21 commits)
  t5319: make corrupted large-offset test more robust
  chunk-format: drop pair_chunk_unsafe()
  commit-graph: detect out-of-order BIDX offsets
  commit-graph: check bounds when accessing BIDX chunk
  commit-graph: check bounds when accessing BDAT chunk
  commit-graph: bounds-check generation overflow chunk
  commit-graph: check size of generations chunk
  commit-graph: bounds-check base graphs chunk
  commit-graph: detect out-of-bounds extra-edges pointers
  commit-graph: check size of commit data chunk
  midx: check size of revindex chunk
  midx: bounds-check large offset chunk
  midx: check size of object offset chunk
  midx: enforce chunk alignment on reading
  midx: check size of pack names chunk
  commit-graph: check consistency of fanout table
  midx: check size of oid lookup chunk
  commit-graph: check size of oid fanout chunk
  midx: stop ignoring malformed oid fanout chunk
  t: add library for munging chunk-format files
  ...
</content>
</entry>
<entry>
<title>Merge branch 'jk/commit-graph-leak-fixes'</title>
<updated>2023-10-13T21:18:28Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2023-10-13T21:18:28Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=a45eddec40b7fd0b9a86ccd0e889a785829d55d7'/>
<id>urn:sha1:a45eddec40b7fd0b9a86ccd0e889a785829d55d7</id>
<content type='text'>
Leakfix.

* jk/commit-graph-leak-fixes:
  commit-graph: clear oidset after finishing write
  commit-graph: free write-context base_graph_name during cleanup
  commit-graph: free write-context entries before overwriting
  commit-graph: free graph struct that was not added to chain
  commit-graph: delay base_graph assignment in add_graph_to_chain()
  commit-graph: free all elements of graph chain
  commit-graph: move slab-clearing to close_commit_graph()
  merge: free result of repo_get_merge_bases()
  commit-reach: free temporary list in get_octopus_merge_bases()
  t6700: mark test as leak-free
</content>
</entry>
<entry>
<title>commit-graph: bounds-check base graphs chunk</title>
<updated>2023-10-09T22:55:01Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2023-10-09T21:05:41Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=6cf61d0db55291c3b8406a6ba8f20fdfb9a4a344'/>
<id>urn:sha1:6cf61d0db55291c3b8406a6ba8f20fdfb9a4a344</id>
<content type='text'>
When we are loading a commit-graph chain, we check that each slice of the
chain points to the appropriate set of base graphs via its BASE chunk.
But since we don't record the size of the chunk, we may access
out-of-bounds memory if the file is corrupted.

Since we know the number of entries we expect to find (based on the
position within the commit-graph-chain file), we can just check the size
up front.

In theory this would also let us drop the st_mult() call a few lines
later when we actually access the memory, since we know that the
computed offset will fit in a size_t. But because the operands
"g-&gt;hash_len" and "n" have types "unsigned char" and "int", we'd have to
cast to size_t first. Leaving the st_mult() does that cast, and makes it
more obvious that we don't have an overflow problem.

Note that the test does not actually segfault before this patch, since
it just reads garbage from the chunk after BASE (and indeed, it even
rejects the file because that garbage does not have the expected hash
value). You could construct a file with BASE at the end that did
segfault, but corrupting the existing one is easy, and we can check
stderr for the expected message.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>commit-graph: check consistency of fanout table</title>
<updated>2023-10-09T22:55:00Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2023-10-09T21:04:58Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=4169d8964523198ca89f507824c07b70ba833732'/>
<id>urn:sha1:4169d8964523198ca89f507824c07b70ba833732</id>
<content type='text'>
We use bsearch_hash() to look up items in the oid index of a
commit-graph. It also has a fanout table to reduce the initial range in
which we'll search. But since the fanout comes from the on-disk file, a
corrupted or malicious file can cause us to look outside of the
allocated index memory.

One solution here would be to pass the total table size to
bsearch_hash(), which could then bounds check the values it reads from
the fanout. But there's an inexpensive up-front check we can do, and
it's the same one used by the midx and pack idx code (both of which
likewise have fanout tables and use bsearch_hash(), but are not affected
by this bug):

  1. We can check the value of the final fanout entry against the size
     of the table we got from the index chunk. These must always match,
     since the fanout is just slicing up the index.

       As a side note, the midx and pack idx code compute it the other
       way around: they use the final fanout value as the object count, and
       check the index size against it. Either is valid; if they
       disagree we cannot know which is wrong (a corrupted fanout value,
       or a too-small table of oids).

  2. We can quickly scan the fanout table to make sure it is
     monotonically increasing. If it is, then we know that every value
     is less than or equal to the final value, and therefore less than
     or equal to the table size.

     It would also be sufficient to just check that each fanout value is
     smaller than the final one, but the midx and pack idx code both do
     a full monotonicity check. It's the same cost, and it catches some
     other corruptions (though not all; the checks done by "commit-graph
     verify" are more complete but more expensive, and our goal here is
     to be fast and memory-safe).

There are two new tests. One just checks the final fanout value (this is
the mirror image of the "too small oid lookup" case added for the midx
in the previous commit; it's flipped here because commit-graph considers
the oid lookup chunk to be the source of truth).

The other actually creates a fanout with many out-of-bounds entries, and
prior to this patch, it does cause the segfault you'd expect. But note
that the error is not "your fanout entry is out-of-bounds", but rather
"fanout value out of order". That's because we leave the final fanout
value in place (to get past the table size check), making the index
non-monotonic (the second-to-last entry is big, but the last one must
remain small to match the actual table).

We need adjustments to a few existing tests, as well:

  - an earlier test in t5318 corrupts the fanout and runs "commit-graph
    verify". Its message is now changed, since we catch the problem
    earlier (during the load step, rather than the careful validation
    step).

  - in t5324, we test that "commit-graph verify --shallow" does not do
    expensive verification on the base file of the chain. But the
    corruption it uses (munging a byte at offset 1000) happens to be in
    the middle of the fanout table. And now we detect that problem in
    the cheaper checks that are performed for every part of the graph.
    We'll push this back to offset 1500, which is only caught by the
    more expensive checksum validation.

    Likewise, there's a later test in t5324 which munges an offset 100
    bytes into a file (also in the fanout table) that is referenced by
    an alternates file. So we now find that corruption during the load
    step, rather than the verification step. At the very least we need
    to change the error message (like the case above in t5318). But it
    is probably good to make sure we handle all parts of the
    verification even for alternate graph files. So let's likewise
    corrupt byte 1500 and make sure we found the invalid checksum.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
