<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/t/t5324-split-commit-graph.sh, branch v2.45.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.45.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.45.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2023-11-08T02:04:02Z</updated>
<entry>
<title>Merge branch 'jc/test-i18ngrep'</title>
<updated>2023-11-08T02:04:02Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2023-11-08T02:04:02Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=a8e2394704d0543f4e1f1ac6ea532d098316d97e'/>
<id>urn:sha1:a8e2394704d0543f4e1f1ac6ea532d098316d97e</id>
<content type='text'>
Another step to deprecate test_i18ngrep.

* jc/test-i18ngrep:
  tests: teach callers of test_i18ngrep to use test_grep
  test framework: further deprecate test_i18ngrep
</content>
</entry>
<entry>
<title>tests: teach callers of test_i18ngrep to use test_grep</title>
<updated>2023-11-02T08:13:44Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2023-10-31T05:23:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=6789275d3780bcb950e6be8557aeedf160d4ad6d'/>
<id>urn:sha1:6789275d3780bcb950e6be8557aeedf160d4ad6d</id>
<content type='text'>
They are equivalents and the former still exists, so as long as the
only change this commit makes are to rewrite test_i18ngrep to
test_grep, there won't be any new bug, even if there still are
callers of test_i18ngrep remaining in the tree, or when merged to
other topics that add new uses of test_i18ngrep.

This patch was produced more or less with

    git grep -l -e 'test_i18ngrep ' 't/t[0-9][0-9][0-9][0-9]-*.sh' |
    xargs perl -p -i -e 's/test_i18ngrep /test_grep /'

and a good way to sanity check the result yourself is to run the
above in a checkout of c4603c1c (test framework: further deprecate
test_i18ngrep, 2023-10-31) and compare the resulting working tree
contents with the result of applying this patch to the same commit.
You'll see that test_i18ngrep in a few t/lib-*.sh files corrected,
in addition to the manual reproduction.

Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'jk/chunk-bounds'</title>
<updated>2023-10-23T20:56:36Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2023-10-23T20:56:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=f32af12ceec1c19d8a8a7874523d3a7ceef6eebf'/>
<id>urn:sha1:f32af12ceec1c19d8a8a7874523d3a7ceef6eebf</id>
<content type='text'>
The codepaths that read "chunk" formatted files have been corrected
to pay attention to the chunk size and notice broken files.

* jk/chunk-bounds: (21 commits)
  t5319: make corrupted large-offset test more robust
  chunk-format: drop pair_chunk_unsafe()
  commit-graph: detect out-of-order BIDX offsets
  commit-graph: check bounds when accessing BIDX chunk
  commit-graph: check bounds when accessing BDAT chunk
  commit-graph: bounds-check generation overflow chunk
  commit-graph: check size of generations chunk
  commit-graph: bounds-check base graphs chunk
  commit-graph: detect out-of-bounds extra-edges pointers
  commit-graph: check size of commit data chunk
  midx: check size of revindex chunk
  midx: bounds-check large offset chunk
  midx: check size of object offset chunk
  midx: enforce chunk alignment on reading
  midx: check size of pack names chunk
  commit-graph: check consistency of fanout table
  midx: check size of oid lookup chunk
  commit-graph: check size of oid fanout chunk
  midx: stop ignoring malformed oid fanout chunk
  t: add library for munging chunk-format files
  ...
</content>
</entry>
<entry>
<title>Merge branch 'jk/commit-graph-leak-fixes'</title>
<updated>2023-10-13T21:18:28Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2023-10-13T21:18:28Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=a45eddec40b7fd0b9a86ccd0e889a785829d55d7'/>
<id>urn:sha1:a45eddec40b7fd0b9a86ccd0e889a785829d55d7</id>
<content type='text'>
Leakfix.

* jk/commit-graph-leak-fixes:
  commit-graph: clear oidset after finishing write
  commit-graph: free write-context base_graph_name during cleanup
  commit-graph: free write-context entries before overwriting
  commit-graph: free graph struct that was not added to chain
  commit-graph: delay base_graph assignment in add_graph_to_chain()
  commit-graph: free all elements of graph chain
  commit-graph: move slab-clearing to close_commit_graph()
  merge: free result of repo_get_merge_bases()
  commit-reach: free temporary list in get_octopus_merge_bases()
  t6700: mark test as leak-free
</content>
</entry>
<entry>
<title>commit-graph: bounds-check base graphs chunk</title>
<updated>2023-10-09T22:55:01Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2023-10-09T21:05:41Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=6cf61d0db55291c3b8406a6ba8f20fdfb9a4a344'/>
<id>urn:sha1:6cf61d0db55291c3b8406a6ba8f20fdfb9a4a344</id>
<content type='text'>
When we are loading a commit-graph chain, we check that each slice of the
chain points to the appropriate set of base graphs via its BASE chunk.
But since we don't record the size of the chunk, we may access
out-of-bounds memory if the file is corrupted.

Since we know the number of entries we expect to find (based on the
position within the commit-graph-chain file), we can just check the size
up front.

In theory this would also let us drop the st_mult() call a few lines
later when we actually access the memory, since we know that the
computed offset will fit in a size_t. But because the operands
"g-&gt;hash_len" and "n" have types "unsigned char" and "int", we'd have to
cast to size_t first. Leaving the st_mult() does that cast, and makes it
more obvious that we don't have an overflow problem.

Note that the test does not actually segfault before this patch, since
it just reads garbage from the chunk after BASE (and indeed, it even
rejects the file because that garbage does not have the expected hash
value). You could construct a file with BASE at the end that did
segfault, but corrupting the existing one is easy, and we can check
stderr for the expected message.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>commit-graph: check consistency of fanout table</title>
<updated>2023-10-09T22:55:00Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2023-10-09T21:04:58Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=4169d8964523198ca89f507824c07b70ba833732'/>
<id>urn:sha1:4169d8964523198ca89f507824c07b70ba833732</id>
<content type='text'>
We use bsearch_hash() to look up items in the oid index of a
commit-graph. It also has a fanout table to reduce the initial range in
which we'll search. But since the fanout comes from the on-disk file, a
corrupted or malicious file can cause us to look outside of the
allocated index memory.

One solution here would be to pass the total table size to
bsearch_hash(), which could then bounds check the values it reads from
the fanout. But there's an inexpensive up-front check we can do, and
it's the same one used by the midx and pack idx code (both of which
likewise have fanout tables and use bsearch_hash(), but are not affected
by this bug):

  1. We can check the value of the final fanout entry against the size
     of the table we got from the index chunk. These must always match,
     since the fanout is just slicing up the index.

       As a side note, the midx and pack idx code compute it the other
       way around: they use the final fanout value as the object count, and
       check the index size against it. Either is valid; if they
       disagree we cannot know which is wrong (a corrupted fanout value,
       or a too-small table of oids).

  2. We can quickly scan the fanout table to make sure it is
     monotonically increasing. If it is, then we know that every value
     is less than or equal to the final value, and therefore less than
     or equal to the table size.

     It would also be sufficient to just check that each fanout value is
     smaller than the final one, but the midx and pack idx code both do
     a full monotonicity check. It's the same cost, and it catches some
     other corruptions (though not all; the checks done by "commit-graph
     verify" are more complete but more expensive, and our goal here is
     to be fast and memory-safe).

There are two new tests. One just checks the final fanout value (this is
the mirror image of the "too small oid lookup" case added for the midx
in the previous commit; it's flipped here because commit-graph considers
the oid lookup chunk to be the source of truth).

The other actually creates a fanout with many out-of-bounds entries, and
prior to this patch, it does cause the segfault you'd expect. But note
that the error is not "your fanout entry is out-of-bounds", but rather
"fanout value out of order". That's because we leave the final fanout
value in place (to get past the table size check), making the index
non-monotonic (the second-to-last entry is big, but the last one must
remain small to match the actual table).

We need adjustments to a few existing tests, as well:

  - an earlier test in t5318 corrupts the fanout and runs "commit-graph
    verify". Its message is now changed, since we catch the problem
    earlier (during the load step, rather than the careful validation
    step).

  - in t5324, we test that "commit-graph verify --shallow" does not do
    expensive verification on the base file of the chain. But the
    corruption it uses (munging a byte at offset 1000) happens to be in
    the middle of the fanout table. And now we detect that problem in
    the cheaper checks that are performed for every part of the graph.
    We'll push this back to offset 1500, which is only caught by the
    more expensive checksum validation.

    Likewise, there's a later test in t5324 which munges an offset 100
    bytes into a file (also in the fanout table) that is referenced by
    an alternates file. So we now find that corruption during the load
    step, rather than the verification step. At the very least we need
    to change the error message (like the case above in t5318). But it
    is probably good to make sure we handle all parts of the
    verification even for alternate graph files. So let's likewise
    corrupt byte 1500 and make sure we found the invalid checksum.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>commit-graph: clear oidset after finishing write</title>
<updated>2023-10-03T21:28:24Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2023-10-03T20:31:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=da09e7af68247519e2b19fc8dff113896c39ac3c'/>
<id>urn:sha1:da09e7af68247519e2b19fc8dff113896c39ac3c</id>
<content type='text'>
In graph_write() we store commits in an oidset, but never clean it up,
leaking the contents. We should clear it in the cleanup section.

The oidset comes from 6830c36077 (commit-graph.h: replace 'commit_hex'
with 'commits', 2020-04-13), but it was just replacing a string_list
that was also leaked. Curiously, we fixed the leak of some adjacent
variables in commit fa8953cb40 (builtin/commit-graph.c: extract
'read_one_commit()', 2020-05-18), but the oidset wasn't included for
some reason.

In combination with the preceding commits, this lets us mark t5324 as
leak-free.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>commit-graph: report incomplete chains during verification</title>
<updated>2023-09-28T14:00:43Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2023-09-28T04:39:51Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=5f259197eea0a3acc48f46748778f33c935476cb'/>
<id>urn:sha1:5f259197eea0a3acc48f46748778f33c935476cb</id>
<content type='text'>
The load_commit_graph_chain_fd_st() function will stop loading chains
when it sees an error. But if it has loaded any graph slice at all, it
will return it. This is a good thing for normal use (we use what data we
can, and this is just an optimization). But it's a bad thing for
"commit-graph verify", which should be careful about finding any
irregularities. We do complain to stderr with a warning(), but the
verify command still exits with a successful return code.

The new tests here cover corruption of both the base and tip slices of
the chain. The corruption of the base file already works (it is the
first file we look at, so when we see the error we return NULL). The
"tip" case is what is fixed by this patch (it complains to stderr but
still returns the base slice).

Likewise the existing tests for corruption of the commit-graph-chain
file itself need to be updated. We already exited non-zero correctly for
the "base" case, but the "tip" case can now do so, too.

Note that this also causes us to adjust a test later in the file that
similarly corrupts a tip (though confusingly the test script calls this
"base"). It checks stderr but erroneously expects the whole "verify"
command to exit with a successful code.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>commit-graph: tighten chain size check</title>
<updated>2023-09-28T14:00:43Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2023-09-28T04:39:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=7754a565e2e78e4163dbf597bba5fc729cc3bbc7'/>
<id>urn:sha1:7754a565e2e78e4163dbf597bba5fc729cc3bbc7</id>
<content type='text'>
When we open a commit-graph-chain file, if it's smaller than a single
entry, we just quietly treat that as ENOENT. That make some sense if the
file is truly zero bytes, but it means that "commit-graph verify" will
quietly ignore a file that contains garbage if that garbage happens to
be short.

Instead, let's only simulate ENOENT when the file is truly empty, and
otherwise return EINVAL. The normal graph-loading routines don't care,
but "commit-graph verify" will notice and complain about the difference.

It's not entirely clear to me that the 0-is-ENOENT case actually happens
in real life, so we could perhaps just eliminate this special-case
altogether. But this is how we've always behaved, so I'm preserving it
in the name of backwards compatibility (though again, it really only
matters for "verify", as the regular routines are happy to load what
they can).

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>commit-graph: detect read errors when verifying graph chain</title>
<updated>2023-09-28T14:00:43Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2023-09-28T04:38:59Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=47d06bb010867cd122f732be93e55a797371570c'/>
<id>urn:sha1:47d06bb010867cd122f732be93e55a797371570c</id>
<content type='text'>
Because it's OK to not have a graph file at all, the graph_verify()
function needs to tell the difference between a missing file and a real
error.  So when loading a traditional graph file, we call
open_commit_graph() separately from load_commit_graph_chain_fd_st(), and
don't complain if the first one fails with ENOENT.

When the function learned about chain files in 3da4b609bb (commit-graph:
verify chains with --shallow mode, 2019-06-18), we couldn't be as
careful, since the only way to load a chain was with
read_commit_graph_one(), which did both the open/load as a single unit.
So we'll miss errors in chain files we load, thinking instead that there
was just no chain file at all.

Note that we do still report some of these problems to stderr, as the
loading function calls error() and warning(). But we'd exit with a
successful exit code, which is wrong.

We can fix that by using the recently split open/load functions for
chains. That lets us treat the chain file just like a single file with
respect to error handling here.

An existing test (from 3da4b609bb) shows off the problem; we were
expecting "commit-graph verify" to report success, but that makes no
sense. We did not even verify the contents of the graph data, because we
couldn't load it! I don't think this was an intentional exception, but
rather just the test covering what happened to occur.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
