aboutsummaryrefslogtreecommitdiffstats
path: root/t/t5318-commit-graph.sh (follow)
AgeCommit message (Collapse)AuthorFilesLines
2025-10-22commit-graph: add new config for changed-paths & recommend it in scalarEmily Yang1-0/+44
The changed-path Bloom filters feature has proven stable and reliable over several years of use, delivering significant performance improvement for file history computation in large monorepos. Currently a user can opt-in to writing the changed-path Bloom filters using the "--changed-paths" option to "git commit-graph write". The filters will be persisted until the user drops the filters using the "--no-changed-paths" option. For this functionality, refer to 0087a87ba8 (commit-graph: persist existence of changed-paths, 2020-07-01). Large monorepos using Git's background maintenance to build and update commit-graph files could use an easy switch to enable this feature without a foreground computation. In this commit, we're proposing a new config option "commitGraph.changedPaths": * If "true", "git commit-graph write" will write Bloom filters, equivalent to passing "--changed-paths"; * If "false" or "unset", Bloom filters will be written during "git commit-graph write" only if the filters already exist in the current commit-graph file. This matches the default behaviour of "git commit-graph write" without any "--[no-]changed-paths" option. Note "false" can disable a previous "true" config value but doesn't imply "--no-changed-paths". This config will always respect the precedence of command line option "--[no-]changed-paths". We also set this new config as optional recommended config in scalar to turn on this feature for large repos. Helped-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Emily Yang <emilyyang.git@gmail.com> Acked-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07t: introduce PERL_TEST_HELPERS prerequisitePatrick Steinhardt1-6/+6
In the early days of Git, Perl was used quite prominently throughout the project. This has changed significantly as almost all of the executables we ship nowadays have eventually been rewritten in C. Only a handful of subsystems remain that require Perl: - gitweb, a read-only web interface. - A couple of scripts that allow importing repositories from GNU Arch, CVS and Subversion. - git-send-email(1), which can be used to send mails. - git-request-pull(1), which is used to request somebody to pull from a URL by sending an email. - git-filter-branch(1), which uses Perl with the `--state-branch` option. This command is typically recommended against nowadays in favor of git-filter-repo(1). - Our Perl bindings for Git. - The netrc Git credential helper. None of these subsystems can really be considered to be part of the "core" of Git, and an installation without them is fully functional. It is more likely than not that an end user wouldn't even notice that any features are missing if those tools weren't installed. But while Perl nowadays very much is an optional dependency of Git, there is a significant limitation when Perl isn't available: developers cannot run our test suite. Preceding commits have started to lift this restriction by removing the strict dependency on Perl in many central parts of the test library. But there are still many tests that rely on small Perl helpers to do various different things. Introduce a new PERL_TEST_HELPERS prerequisite that guards all tests that require Perl. This prerequisite is explicitly different than the preexisting PERL prerequisite: - PERL records whether or not features depending on the Perl interpreter are built. - PERL_TEST_HELPERS records whether or not a Perl interpreter is available for our tests. By having these two separate prerequisites we can thus distinguish between tests that inherently depend on Perl because the underlying feature does, and those tests that depend on Perl because the test itself is using Perl. Adapt all tests to set the PERL_TEST_HELPERS prerequisite as needed. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21t: remove TEST_PASSES_SANITIZE_LEAK annotationsPatrick Steinhardt1-1/+0
Now that the default value for TEST_PASSES_SANITIZE_LEAK is `true` there is no longer a need to have that variable declared in all of our tests. Drop it. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-01t/test-repository: fix leaking repositoryPatrick Steinhardt1-0/+2
The test-repository test helper zeroes out `the_repository` such that it can be sure that our codebase only ends up using the supplied repository that we initialize in the respective helper functions. This does cause memory leaks though as the data that `the_repository` has been holding onto is not referenced anymore. Fix this by calling `repo_clear()` instead. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-18Merge branch 'ps/commit-graph-less-paranoid'Junio C Hamano1-4/+4
Earlier we stopped relying on commit-graph that (still) records information about commits that are lost from the object store, which has negative performance implications. The default has been flipped to disable this pessimization. * ps/commit-graph-less-paranoid: commit-graph: disable GIT_COMMIT_GRAPH_PARANOIA by default
2023-12-09Merge branch 'jk/chunk-bounds-more'Junio C Hamano1-7/+9
Code clean-up for jk/chunk-bounds topic. * jk/chunk-bounds-more: commit-graph: mark chunk error messages for translation commit-graph: drop verify_commit_graph_lite() commit-graph: check order while reading fanout chunk commit-graph: use fanout value for graph size commit-graph: abort as soon as we see a bogus chunk commit-graph: clarify missing-chunk error messages commit-graph: drop redundant call to "lite" verification midx: check consistency of fanout table commit-graph: handle overflow in chunk_size checks
2023-11-26commit-graph: disable GIT_COMMIT_GRAPH_PARANOIA by defaultPatrick Steinhardt1-4/+4
In 7a5d604443 (commit: detect commits that exist in commit-graph but not in the ODB, 2023-10-31), we have introduced a new object existence check into `repo_parse_commit_internal()` so that we do not parse commits via the commit-graph that don't have a corresponding object in the object database. This new check of course comes with a performance penalty, which the commit put at around 30% for `git rev-list --topo-order`. But there are in fact scenarios where the performance regression is even higher. The following benchmark against linux.git with a fully-build commit-graph: Benchmark 1: git.v2.42.1 rev-list --count HEAD Time (mean ± σ): 658.0 ms ± 5.2 ms [User: 613.5 ms, System: 44.4 ms] Range (min … max): 650.2 ms … 666.0 ms 10 runs Benchmark 2: git.v2.43.0-rc1 rev-list --count HEAD Time (mean ± σ): 1.333 s ± 0.019 s [User: 1.263 s, System: 0.069 s] Range (min … max): 1.302 s … 1.361 s 10 runs Summary git.v2.42.1 rev-list --count HEAD ran 2.03 ± 0.03 times faster than git.v2.43.0-rc1 rev-list --count HEAD While it's a noble goal to ensure that results are the same regardless of whether or not we have a potentially stale commit-graph, taking twice as much time is a tough sell. Furthermore, we can generally assume that the commit-graph will be updated by git-gc(1) or git-maintenance(1) as required so that the case where the commit-graph is stale should not at all be common. With that in mind, default-disable GIT_COMMIT_GRAPH_PARANOIA and restore the behaviour and thus performance previous to the mentioned commit. In order to not be inconsistent, also disable this behaviour by default in `lookup_commit_in_graph()`, where the object existence check has been introduced right at its inception via f559d6d45e (revision: avoid hitting packfiles when commits are in commit-graph, 2021-08-09). This results in another speedup in commands that end up calling this function, even though it's less pronounced compared to the above benchmark. The following has been executed in linux.git with ~1.2 million references: Benchmark 1: GIT_COMMIT_GRAPH_PARANOIA=true git rev-list --all --no-walk=unsorted Time (mean ± σ): 2.947 s ± 0.003 s [User: 2.412 s, System: 0.534 s] Range (min … max): 2.943 s … 2.949 s 3 runs Benchmark 2: GIT_COMMIT_GRAPH_PARANOIA=false git rev-list --all --no-walk=unsorted Time (mean ± σ): 2.724 s ± 0.030 s [User: 2.207 s, System: 0.514 s] Range (min … max): 2.704 s … 2.759 s 3 runs Summary GIT_COMMIT_GRAPH_PARANOIA=false git rev-list --all --no-walk=unsorted ran 1.08 ± 0.01 times faster than GIT_COMMIT_GRAPH_PARANOIA=true git rev-list --all --no-walk=unsorted So whereas 7a5d604443 initially introduced the logic to start doing an object existence check in `repo_parse_commit_internal()` by default, the updated logic will now instead cause `lookup_commit_in_graph()` to stop doing the check by default. This behaviour continues to be tweakable by the user via the GIT_COMMIT_GRAPH_PARANOIA environment variable. Note that this requires us to amend some tests to manually turn on the paranoid checks again. This is because we cause repository corruption by manually deleting objects which are part of the commit graph already. These circumstances shouldn't usually happen in repositories. Reported-by: Jeff King <peff@peff.net> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-09commit-graph: check order while reading fanout chunkJeff King1-0/+1
We read the fanout chunk, storing a pointer to it, but only confirm that the entries are monotonic in a final "lite" verification step. Let's move that into the actual OIDF chunk callback, so that we can report problems immediately (for all the reasons given in the previous "commit-graph: abort as soon as we see a bogus chunk" commit). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-09commit-graph: use fanout value for graph sizeJeff King1-2/+3
Commit-graph, midx, and pack idx files all have both a lookup table of oids and an oid fanout table. In midx and pack idx files, we take the final entry of the fanout table as the source of truth for the number of entries, and then verify that the size of the lookup table matches that. But for commit-graph files, we do the opposite: we use the size of the lookup table as the source of truth, and then check the final fanout entry against it. As noted in 4169d89645 (commit-graph: check consistency of fanout table, 2023-10-09), either is correct. But there are a few reasons to prefer the fanout table as the source of truth: 1. The fanout entries are 32-bits on disk, and that defines the maximum number of entries we can store. But since the size of the lookup table is only bounded by the filesystem, it can be much larger. And hence computing it as the commit-graph does means that we may truncate the result when storing it in a uint32_t. 2. We read the fanout first, then the lookup table. If we're verifying the chunks as we read them, then we'd want to take the fanout as truth (we have nothing yet to check it against) and then we can check that the lookup table matches what we already know. 3. It is pointlessly inconsistent with the midx and pack idx code. Since the three have to do similar size and bounds checks, it is easier to reason about all three if they use the same approach. So this patch moves the assignment of g->num_commits to the fanout parser, and then we can check the size of the lookup chunk as soon as we try to load it. There's already a test covering this situation, which munges the final fanout entry to 2^32-1. In the current code we complain that it does not agree with the table size. But now that we treat the munged value as the source of truth, we'll complain that the lookup table is the wrong size (again, either is correct). So we'll have to update the message we expect (and likewise for an earlier test which does similar munging). There's a similar test for this situation on the midx side, but rather than making a very-large fanout value, it just truncates the lookup table. We could do that here, too, but the very-large fanout value actually shows an interesting corner case. On a 32-bit system, multiplying to find the expected table size would cause an integer overflow. Using st_mult() would detect that, but cause us to die() rather than falling back to the non-graph code path. Checking the size using division (as we do with existing chunk-size checks) avoids the overflow entirely, and the test demonstrates this when run on a 32-bit system. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-09commit-graph: clarify missing-chunk error messagesJeff King1-5/+5
When a required commit-graph chunk cannot be loaded, we leave its entry in the struct NULL, and then later complain that it is missing. But that's just one reason we might not have loaded it, as we also do some data quality checks. Let's switch these messages to say "missing or corrupted", which is exactly what the midx code says for the same cases. Likewise, we'll use the same phrasing and capitalization as those for consistency. And while we're here, we can mark them for translation (just like the midx ones). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-08Merge branch 'jc/test-i18ngrep'Junio C Hamano1-9/+9
Another step to deprecate test_i18ngrep. * jc/test-i18ngrep: tests: teach callers of test_i18ngrep to use test_grep test framework: further deprecate test_i18ngrep
2023-11-08Merge branch 'ps/do-not-trust-commit-graph-blindly-for-existence'Junio C Hamano1-0/+48
The codepath to traverse the commit-graph learned to notice that a commit is missing (e.g., corrupt repository lost an object), even though it knows something about the commit (like its parents) from what is in commit-graph. * ps/do-not-trust-commit-graph-blindly-for-existence: commit: detect commits that exist in commit-graph but not in the ODB commit-graph: introduce envvar to disable commit existence checks
2023-11-02tests: teach callers of test_i18ngrep to use test_grepJunio C Hamano1-9/+9
They are equivalents and the former still exists, so as long as the only change this commit makes are to rewrite test_i18ngrep to test_grep, there won't be any new bug, even if there still are callers of test_i18ngrep remaining in the tree, or when merged to other topics that add new uses of test_i18ngrep. This patch was produced more or less with git grep -l -e 'test_i18ngrep ' 't/t[0-9][0-9][0-9][0-9]-*.sh' | xargs perl -p -i -e 's/test_i18ngrep /test_grep /' and a good way to sanity check the result yourself is to run the above in a checkout of c4603c1c (test framework: further deprecate test_i18ngrep, 2023-10-31) and compare the resulting working tree contents with the result of applying this patch to the same commit. You'll see that test_i18ngrep in a few t/lib-*.sh files corrected, in addition to the manual reproduction. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-01commit: detect commits that exist in commit-graph but not in the ODBPatrick Steinhardt1-0/+27
Commit graphs can become stale and contain references to commits that do not exist in the object database anymore. Theoretically, this can lead to a scenario where we are able to successfully look up any such commit via the commit graph even though such a lookup would fail if done via the object database directly. As the commit graph is mostly intended as a sort of cache to speed up parsing of commits we do not want to have diverging behaviour in a repository with and a repository without commit graphs, no matter whether they are stale or not. As commits are otherwise immutable, the only thing that we really need to care about is thus the presence or absence of a commit. To address potentially stale commit data that may exist in the graph, our `lookup_commit_in_graph()` function will check for the commit's existence in both the commit graph, but also in the object database. So even if we were able to look up the commit's data in the graph, we would still pretend as if the commit didn't exist if it is missing in the object database. We don't have the same safety net in `parse_commit_in_graph_one()` though. This function is mostly used internally in "commit-graph.c" itself to validate the commit graph, and this usage is fine. We do expose its functionality via `parse_commit_in_graph()` though, which gets called by `repo_parse_commit_internal()`, and that function is in turn used in many places in our codebase. For all I can see this function is never used to directly turn an object ID into a commit object without additional safety checks before or after this lookup. What it is being used for though is to walk history via the parent chain of commits. So when commits in the parent chain of a graph walk are missing it is possible that we wouldn't notice if that missing commit was part of the commit graph. Thus, a query like `git rev-parse HEAD~2` can succeed even if the intermittent commit is missing. It's unclear whether there are additional ways in which such stale commit graphs can lead to problems. In any case, it feels like this is a bigger bug waiting to happen when we gain additional direct or indirect callers of `repo_parse_commit_internal()`. So let's fix the inconsistent behaviour by checking for object existence via the object database, as well. This check of course comes with a performance penalty. The following benchmarks have been executed in a clone of linux.git with stable tags added: Benchmark 1: git -c core.commitGraph=true rev-list --topo-order --all (git = master) Time (mean ± σ): 2.913 s ± 0.018 s [User: 2.363 s, System: 0.548 s] Range (min … max): 2.894 s … 2.950 s 10 runs Benchmark 2: git -c core.commitGraph=true rev-list --topo-order --all (git = pks-commit-graph-inconsistency) Time (mean ± σ): 3.834 s ± 0.052 s [User: 3.276 s, System: 0.556 s] Range (min … max): 3.780 s … 3.961 s 10 runs Benchmark 3: git -c core.commitGraph=false rev-list --topo-order --all (git = master) Time (mean ± σ): 13.841 s ± 0.084 s [User: 13.152 s, System: 0.687 s] Range (min … max): 13.714 s … 13.995 s 10 runs Benchmark 4: git -c core.commitGraph=false rev-list --topo-order --all (git = pks-commit-graph-inconsistency) Time (mean ± σ): 13.762 s ± 0.116 s [User: 13.094 s, System: 0.667 s] Range (min … max): 13.645 s … 14.038 s 10 runs Summary git -c core.commitGraph=true rev-list --topo-order --all (git = master) ran 1.32 ± 0.02 times faster than git -c core.commitGraph=true rev-list --topo-order --all (git = pks-commit-graph-inconsistency) 4.72 ± 0.05 times faster than git -c core.commitGraph=false rev-list --topo-order --all (git = pks-commit-graph-inconsistency) 4.75 ± 0.04 times faster than git -c core.commitGraph=false rev-list --topo-order --all (git = master) We look at a ~30% regression in general, but in general we're still a whole lot faster than without the commit graph. To counteract this, the new check can be turned off with the `GIT_COMMIT_GRAPH_PARANOIA` envvar. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-01commit-graph: introduce envvar to disable commit existence checksPatrick Steinhardt1-0/+21
Our `lookup_commit_in_graph()` helper tries to look up commits from the commit graph and, if it doesn't exist there, falls back to parsing it from the object database instead. This is intended to speed up the lookup of any such commit that exists in the database. There is an edge case though where the commit exists in the graph, but not in the object database. To avoid returning such stale commits the helper function thus double checks that any such commit parsed from the graph also exists in the object database. This makes the function safe to use even when commit graphs aren't updated regularly. We're about to introduce the same pattern into other parts of our code base though, namely `repo_parse_commit_internal()`. Here the extra sanity check is a bit of a tougher sell: `lookup_commit_in_graph()` was a newly introduced helper, and as such there was no performance hit by adding this sanity check. If we added `repo_parse_commit_internal()` with that sanity check right from the beginning as well, this would probably never have been an issue to begin with. But by retrofitting it with this sanity check now we do add a performance regression to preexisting code, and thus there is a desire to avoid this or at least give an escape hatch. In practice, there is no inherent reason why either of those functions should have the sanity check whereas the other one does not: either both of them are able to detect this issue or none of them should be. This also means that the default of whether we do the check should likely be the same for both. To err on the side of caution, we thus rather want to make `repo_parse_commit_internal()` stricter than to loosen the checks that we already have in `lookup_commit_in_graph()`. The escape hatch is added in the form of a new GIT_COMMIT_GRAPH_PARANOIA environment variable that mirrors GIT_REF_PARANOIA. If enabled, which is the default, we will double check that commits looked up in the commit graph via `lookup_commit_in_graph()` also exist in the object database. This same check will also be added in `repo_parse_commit_internal()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-09commit-graph: check size of generations chunkJeff King1-0/+8
We neither check nor record the size of the generations chunk we parse from a commit-graph file. This should have one uint32_t for each commit in the file; if it is smaller (due to corruption, etc), we may read outside the mapped memory. The included test segfaults without this patch, as it shrinks the size considerably (and the chunk is near the end of the file, so we read off the end of the array rather than accidentally reading another chunk). We can fix this by checking the size up front (like we do for other fixed-size chunks, like CDAT). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-09commit-graph: detect out-of-bounds extra-edges pointersJeff King1-0/+8
If an entry in a commit-graph file has more than 2 parents, the fixed-size parent fields instead point to an offset within an "extra edges" chunk. We blindly follow these, assuming that the chunk is present and sufficiently large; this can lead to an out-of-bounds read for a corrupt or malicious file. We can fix this by recording the size of the chunk and adding a bounds-check in fill_commit_in_graph(). There are a few tricky bits: 1. We'll switch from working with a pointer to an offset. This makes some corner cases just fall out naturally: a. If we did not find an EDGE chunk at all, our size will correctly be zero (so everything is "out of bounds"). b. Comparing "size / 4" lets us make sure we have at least 4 bytes to read, and we never compute a pointer more than one element past the end of the array (computing a larger pointer is probably OK in practice, but is technically undefined behavior). c. The current code casts to "uint32_t *". Replacing it with an offset avoids any comparison between different types of pointer (since the chunk is stored as "unsigned char *"). 2. This is the first case in which fill_commit_in_graph() may return anything but success. We need to make sure to roll back the "parsed" flag (and any parents we might have added before running out of buffer) so that the caller can cleanly fall back to loading the commit object itself. It's a little non-trivial to do this, and we might benefit from factoring it out. But we can wait on that until we actually see a second case where we return an error. As a bonus, this lets us drop the st_mult() call. Since we've already done a bounds check, we know there won't be any integer overflow (it would imply our buffer is larger than a size_t can hold). The included test does not actually segfault before this patch (though you could construct a case where it does). Instead, it reads garbage from the next chunk which results in it complaining about a bogus parent id. This is sufficient for our needs, though (we care that the fallback succeeds, and that stderr mentions the out-of-bounds read). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-09commit-graph: check size of commit data chunkJeff King1-0/+9
We expect a commit-graph file to have a fixed-size data record for each commit in the file (and we know the number of commits to expct from the size of the lookup table). If we encounter a file where this is too small, we'll look past the end of the chunk (and possibly even off the mapped memory). We can fix this by checking the size up front when we record the pointer. The included test doesn't segfault, since it ends up reading bytes from another chunk. But it produces nonsense results, since the values it reads are garbage. Our test notices this by comparing the output to a non-corrupted run of the same command (and of course we also check that the expected error is printed to stderr). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-09commit-graph: check consistency of fanout tableJeff King1-1/+24
We use bsearch_hash() to look up items in the oid index of a commit-graph. It also has a fanout table to reduce the initial range in which we'll search. But since the fanout comes from the on-disk file, a corrupted or malicious file can cause us to look outside of the allocated index memory. One solution here would be to pass the total table size to bsearch_hash(), which could then bounds check the values it reads from the fanout. But there's an inexpensive up-front check we can do, and it's the same one used by the midx and pack idx code (both of which likewise have fanout tables and use bsearch_hash(), but are not affected by this bug): 1. We can check the value of the final fanout entry against the size of the table we got from the index chunk. These must always match, since the fanout is just slicing up the index. As a side note, the midx and pack idx code compute it the other way around: they use the final fanout value as the object count, and check the index size against it. Either is valid; if they disagree we cannot know which is wrong (a corrupted fanout value, or a too-small table of oids). 2. We can quickly scan the fanout table to make sure it is monotonically increasing. If it is, then we know that every value is less than or equal to the final value, and therefore less than or equal to the table size. It would also be sufficient to just check that each fanout value is smaller than the final one, but the midx and pack idx code both do a full monotonicity check. It's the same cost, and it catches some other corruptions (though not all; the checks done by "commit-graph verify" are more complete but more expensive, and our goal here is to be fast and memory-safe). There are two new tests. One just checks the final fanout value (this is the mirror image of the "too small oid lookup" case added for the midx in the previous commit; it's flipped here because commit-graph considers the oid lookup chunk to be the source of truth). The other actually creates a fanout with many out-of-bounds entries, and prior to this patch, it does cause the segfault you'd expect. But note that the error is not "your fanout entry is out-of-bounds", but rather "fanout value out of order". That's because we leave the final fanout value in place (to get past the table size check), making the index non-monotonic (the second-to-last entry is big, but the last one must remain small to match the actual table). We need adjustments to a few existing tests, as well: - an earlier test in t5318 corrupts the fanout and runs "commit-graph verify". Its message is now changed, since we catch the problem earlier (during the load step, rather than the careful validation step). - in t5324, we test that "commit-graph verify --shallow" does not do expensive verification on the base file of the chain. But the corruption it uses (munging a byte at offset 1000) happens to be in the middle of the fanout table. And now we detect that problem in the cheaper checks that are performed for every part of the graph. We'll push this back to offset 1500, which is only caught by the more expensive checksum validation. Likewise, there's a later test in t5324 which munges an offset 100 bytes into a file (also in the fanout table) that is referenced by an alternates file. So we now find that corruption during the load step, rather than the verification step. At the very least we need to change the error message (like the case above in t5318). But it is probably good to make sure we handle all parts of the verification even for alternate graph files. So let's likewise corrupt byte 1500 and make sure we found the invalid checksum. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-09commit-graph: check size of oid fanout chunkJeff King1-0/+26
We load the oid fanout chunk with pair_chunk(), which means we never see the size of the chunk. We just assume the on-disk file uses the appropriate size, and if it's too small we'll access random memory. It's easy to check this up-front; the fanout always consists of 256 uint32's, since it is a fanout of the first byte of the hash pointing into the oid index. These parameters can't be changed without introducing a new chunk type. This matches the similar check in the midx OIDF chunk (but note that rather than checking for the error immediately, the graph code just leaves parts of the struct NULL and checks for required fields later). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-08-21commit-graph: avoid repeated mixed generation number warningsTaylor Blau1-2/+2
When validating that a commit-graph has either all zero, or all non-zero generation numbers, we emit a warning on both the rising and falling edge of transitioning between the two. So if we are unfortunate enough to see a commit-graph which has a repeating sequence of zero, then non-zero generation numbers, we'll generate many warnings that contain more or less the same information. Avoid this by keeping track of a single example for a commit with zero- and non-zero generation, and emit a single warning at the end of verification if both are non-NULL. Co-authored-by: Jeff King <peff@peff.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-08-21t/t5318-commit-graph.sh: test generation zero transitions during fsckTaylor Blau1-6/+12
The second test called "detect incorrect generation number" asserts that we correctly warn during an fsck when we see a non-zero generation number after seeing a zero beforehand. The other transition (going from non-zero to zero) was previously untested. Test both directions, and rename the existing test to make clear which direction it is exercising. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-08-21commit-graph: introduce `commit_graph_generation_from_graph()`Taylor Blau1-1/+1
In 2ee11f7261 (commit-graph: return generation from memory, 2023-03-20), the `commit_graph_generation()` function stopped returning zeros when asked to locate the generation number of a given commit. This was done at the time to prepare for a later change which set generation values in memory, meaning that we could no longer rely on `graph_pos` alone to tell us whether or not to trust the generation number returned by this function. In 2ee11f7261, it was noted that this change only impacted very old commit-graphs, which were written with all commits having generation number 0. Indeed, zero is not a valid generation number, so we should never expect to see that value outside of the aforementioned case. The test fallout in 2ee11f7261 indicated that we were no longer able to fsck a specific old case of commit-graph corruption, where we see a non-zero generation number after having seen a generation number of 0 earlier. Introduce a variant of `commit_graph_generation()` which behaves like that function did prior to 2ee11f7261, known as `commit_graph_generation_from_graph()`. Then use this function in the context of `verify_one_commit_graph()`, where we only want to trust the values from the graph. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-24t5318: avoid top-level directory changesTaylor Blau1-206/+172
Avoid changing the current working directory from outside of a sub-shell during the tests in t5318. Each test has mostly straightforward changes, either: - Removing the top-level `cd "$TRASH_DIRECTORY/full"`, which is unnecessary after ensuring that other tests don't change their working directory outside of a sub-shell. - Changing any Git invocations which want to be in a sub-directory by either (a) adding a "-C $DIR" argument, or (b) moving the whole test into a sub-shell. While we're here, remove any explicit "git config core.commitGraph true" invocations which were designed to enable use of the commit-graph. These are unnecessary following 31b1de6a09b (commit-graph: turn on commit-graph by default, 2019-08-13). Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-10fsck: suppress commit-graph output with `--no-progress`Taylor Blau1-0/+10
Since e0fd51e1d7 (fsck: verify commit-graph, 2018-06-27), `fsck` runs `git commit-graph verify` to check the integrity of any commit-graph(s). Originally, the `git commit-graph verify` step would always print to stdout/stderr, regardless of whether or not `fsck` was invoked with `--[no-]progress` or not. But in 7371612255 (commit-graph: add --[no-]progress to write and verify, 2019-08-26), the commit-graph machinery learned the `--[no-]progress` option, though `fsck` was not updated to pass this new flag (or not). This led to seeing output from running `git fsck`, even with `--no-progress` on repositories that have a commit-graph: $ git.compile fsck --connectivity-only --no-progress --no-dangling Verifying commits in commit graph: 100% (4356/4356), done. Verifying commits in commit graph: 100% (131912/131912), done. Ensure that `fsck` passes `--[no-]progress` as appropriate when calling `git commit-graph verify`. Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-20commit-graph: return generation from memoryDerrick Stolee1-1/+1
The commit_graph_generation() method used to report a value of GENERATION_NUMBER_INFINITY if the commit_graph_data_slab had an instance for the given commit but the graph_pos indicated the commit was not in the commit-graph file. However, an upcoming change will introduce the ability to set generation values in-memory without writing the commit-graph file. Thus, we can no longer trust 'graph_pos' to indicate whether or not the generation member can be trusted. Instead, trust the 'generation' member if the commit has a value in the slab _and_ the 'generation' member is non-zero. Otherwise, treat it as GENERATION_NUMBER_INFINITY. This only makes a difference for a very old case for the commit-graph: the very first Git release to write commit-graph files wrote zeroes in the topological level positions. If we are parsing a commit-graph with all zeroes, those commits will now appear to have GENERATION_NUMBER_INFINITY (as if they were not parsed from the commit-graph). I attempted several variations to work around the need for providing an uninitialized 'generation' member, but this was the best one I found. It does require a change to a verification test in t5318 because it reports a different error than the one about non-zero generation numbers. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-01Merge branch 'sg/parse-options-subcommand'Junio C Hamano1-2/+2
Introduce the "subcommand" mode to parse-options API and update the command line parser of Git commands with subcommands. * sg/parse-options-subcommand: (23 commits) remote: run "remote rm" argv through parse_options() maintenance: add parse-options boilerplate for subcommands pass subcommand "prefix" arguments to parse_options() builtin/worktree.c: let parse-options parse subcommands builtin/stash.c: let parse-options parse subcommands builtin/sparse-checkout.c: let parse-options parse subcommands builtin/remote.c: let parse-options parse subcommands builtin/reflog.c: let parse-options parse subcommands builtin/notes.c: let parse-options parse subcommands builtin/multi-pack-index.c: let parse-options parse subcommands builtin/hook.c: let parse-options parse subcommands builtin/gc.c: let parse-options parse 'git maintenance's subcommands builtin/commit-graph.c: let parse-options parse subcommands builtin/bundle.c: let parse-options parse subcommands parse-options: add support for parsing subcommands parse-options: drop leading space from '--git-completion-helper' output parse-options: clarify the limitations of PARSE_OPT_NODASH parse-options: PARSE_OPT_KEEP_UNKNOWN only applies to --options api-parse-options.txt: fix description of OPT_CMDMODE t0040-parse-options: test parse_options() with various 'parse_opt_flags' ...
2022-08-19builtin/commit-graph.c: let parse-options parse subcommandsSZEDER Gábor1-2/+2
'git commit-graph' parses its subcommands with an if-else if statement. parse-options has just learned to parse subcommands, so let's use that facility instead, with the benefits of shorter code, handling missing or unknown subcommands, and listing subcommands for Bash completion. Note that the functions implementing each subcommand only accept the 'argc' and '**argv' parameters, so add a (unused) '*prefix' parameter to make them match the type expected by parse-options, and thus avoid casting function pointers. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-03Merge branch 'tb/commit-graph-genv2-upgrade-fix'Junio C Hamano1-0/+27
There was a bug in the codepath to upgrade generation information in commit-graph from v1 to v2 format, which has been corrected. * tb/commit-graph-genv2-upgrade-fix: commit-graph: fix corrupt upgrade from generation v1 to v2 commit-graph: introduce `repo_find_commit_pos_in_graph()` t5318: demonstrate commit-graph generation v2 corruption
2022-07-15commit-graph: fix corrupt upgrade from generation v1 to v2Taylor Blau1-1/+1
The previous commit demonstrates a bug where a commit-graph using generation v2 could enter a state where one of the GDA2 values has its most-significant bit set (indicating that its value should be read from the extended offset table in the GDO2 chunk) without having a GDO2 chunk to read from. This results in the following error message being displayed to the caller: fatal: commit-graph requires overflow generation data but has none This bug arises in the following scenario: - We decide to write a commit-graph using generation number v2, and decide (correctly) that no GDO2 chunk is necessary (e.g., because all of the commiter date offsets are no larger than 2^31-1). - The v2 generation numbers are stored in the `->generation` member of the commit slab holding `struct commit_graph_data`'s. - Later on, `load_commit_graph_info()` is called, overwriting the v2 generation data in the aforementioned slab with any existing v1 generation data. Then, when the commit-graph code goes to write the GDA2 chunk via `write_graph_chunk_generation_data()`, we use the overwritten generation v1 data in a place where we expect to use a v2 generation number: offset = commit_graph_data_at(c)->generation - c->date; ...because `commit_graph_data_at(c)->generation` used to hold the v2 generation data, but it was overwritten to contain the v1 generation number via `load_commit_graph_info()`. If the `offset` computation above overflows the v2 generation number max, then `write_graph_chunk_generation_data()` will update its count of large offsets and write the marker accordingly: if (offset > GENERATION_NUMBER_V2_OFFSET_MAX) { offset = CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW | num_generation_data_overflows; num_generation_data_overflows++; } and reads will look for the GDO2 chunk containing the overflowing v2 generation number, *after* the commit-graph code decided that no such chunk was necessary. The main problem is that the slab containing `struct commit_graph_data` has a dual purpose. It is used to hold data that we are about to write to disk while generating a commit-graph, as well as hold data that was read from an existing commit-graph. When the two mix, namely when the result of reading the commit-graph has a side-effect that mixes poorly with an in-progress commit-graph write, we end up with corrupt data. A complete fix might be to introduce a new slab that is used exclusively for writing, and gate access between the two slabs based on context provided by the caller (e.g., whether this computation is part of a "read" or "write" operation). But a more minimal fix addresses the only known path which overwrites the slab data, which is `compute_bloom_filters()` -> `get_or_compute_bloom_filter()` -> `load_commit_graph_info()` -> `fill_commit_graph_info()` by avoiding the last call which clobbers the data altogether. This path only needs to learn the graph position of a given commit so that it can be used in `load_bloom_filter_from_graph()`. By replacing the last steps of the above with one that records the graph position into a temporary variable which is then used to load the existing Bloom data, we eliminate the clobbering, removing the corruption. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-15t5318: demonstrate commit-graph generation v2 corruptionTaylor Blau1-0/+27
When upgrading a commit-graph using generation v1 to one using generation v2, it is possible to force Git into a corrupt state where it (incorrectly) believes that a GDO2 chunk is necessary, *after* deciding not to write one. This makes subsequent reads using the commit-graph produce the following error message: fatal: commit-graph requires overflow generation data but has none Demonstrate this bug by increasing our test coverage to include a minimal example of upgrading a commit-graph from generation v1 to v2. The only notable components of this test are: - The committer date of the commit is chosen carefully so that the offset underflows when computed using a v1 generation number, but would not overflow when using v2 generation numbers. - The upgrade to generation number v2 must read in the v1 generation numbers, which we can do by passing `--changed-paths`, which will force the commit-graph internals to call `fill_commit_graph_info()`. A future patch will squash this bug. Reported-by: Jeff King <peff@peff.net> Reproduced-by: Will Chandler <wfc@wfchandler.org> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-06tests: don't assume a .git/info for .git/info/graftsÆvar Arnfjörð Bjarmason1-1/+2
Change those tests that assumed that a .git/info directory would be created for them when writing .git/info/grafts to explicitly create the directory. Do this using the new "TEST_CREATE_REPO_NO_TEMPLATE" facility, and use "mkdir" instead of "mkdir -p" to assert that we don't have the .git/info already. An exception to this is the "with grafts" test in "t6001-rev-list-graft.sh". There we're modifying our ".git" state in a for-loop, in lieu of refactoring that more extensively let's use "mkdir -p" there. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-01commit-graph: start parsing generation v2 (again)Derrick Stolee1-1/+1
The 'read_generation_data' member of 'struct commit_graph' was introduced by 1fdc383c5 (commit-graph: use generation v2 only if entire chain does, 2021-01-16). The intention was to avoid using corrected commit dates if not all layers of a commit-graph had that data stored. The logic in validate_mixed_generation_chain() at that point incorrectly initialized read_generation_data to 1 if and only if the tip commit-graph contained the Corrected Commit Date chunk. This was "fixed" in 448a39e65 (commit-graph: validate layers for generation data, 2021-02-02) to validate that read_generation_data was either non-zero for all layers, or it would set read_generation_data to zero for all layers. The problem here is that read_generation_data is not initialized to be non-zero anywhere! This change initializes read_generation_data immediately after the chunk is parsed, so each layer will have its value present as soon as possible. The read_generation_data member is used in fill_commit_graph_info() to determine if we should use the corrected commit date or the topological levels stored in the Commit Data chunk. Due to this bug, all previous versions of Git were defaulting to topological levels in all cases! This can be measured with some performance tests. Using the Linux kernel as a testbed, I generated a complete commit-graph containing corrected commit dates and tested the 'new' version against the previous, 'old' version. First, rev-list with --topo-order demonstrates a 26% improvement using corrected commit dates: hyperfine \ -n "old" "$OLD_GIT rev-list --topo-order -1000 v3.6" \ -n "new" "$NEW_GIT rev-list --topo-order -1000 v3.6" \ --warmup=10 Benchmark 1: old Time (mean ± σ): 57.1 ms ± 3.1 ms Range (min … max): 52.9 ms … 62.0 ms 55 runs Benchmark 2: new Time (mean ± σ): 45.5 ms ± 3.3 ms Range (min … max): 39.9 ms … 51.7 ms 59 runs Summary 'new' ran 1.26 ± 0.11 times faster than 'old' These performance improvements are due to the algorithmic improvements given by walking fewer commits due to the higher cutoffs from corrected commit dates. However, this comes at a cost. The additional I/O cost of parsing the corrected commit dates is visible in case of merge-base commands that do not reduce the overall number of walked commits. hyperfine \ -n "old" "$OLD_GIT merge-base v4.8 v4.9" \ -n "new" "$NEW_GIT merge-base v4.8 v4.9" \ --warmup=10 Benchmark 1: old Time (mean ± σ): 110.4 ms ± 6.4 ms Range (min … max): 96.0 ms … 118.3 ms 25 runs Benchmark 2: new Time (mean ± σ): 150.7 ms ± 1.1 ms Range (min … max): 149.3 ms … 153.4 ms 19 runs Summary 'old' ran 1.36 ± 0.08 times faster than 'new' Performance issues like this are what motivated 702110aac (commit-graph: use config to specify generation type, 2021-02-25). In the future, we could fix this performance problem by inserting the corrected commit date offsets into the Commit Date chunk instead of having that data in an extra chunk. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-01commit-graph: fix ordering bug in generation numbersDerrick Stolee1-2/+2
When computing the generation numbers for a commit-graph, we compute the corrected commit dates and then check if their offsets from the actual dates is too large to fit in the 32-bit Generation Data chunk. However, there is a problem with this approach: if we have parsed the generation data from the previous commit-graph, then we continue the loop because the corrected commit date is already computed. This causes an under-count in the number of overflow values. It is incorrect to add an increment to num_generation_data_overflows next to this 'continue' statement, because we might start double-counting commits that are computed because of the depth-first search walk from a commit with an earlier OID. Instead, iterate over the full commit list at the end, checking the offsets to see how many grow beyond the maximum value. Create a new t5328-commit-graph-64-bit-time.sh test script to handle special cases of testing 64-bit timestamps. This helps demonstrate this bug in more cases. It still won't hit all potential cases until the next change, which reenables reading generation numbers. Use the skip_all trick from 0a2bfccb9c8 (t0051: use "skip_all" under !MINGW in single-test file, 2022-02-04) to make the output clean when run on a 32-bit system. Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-01t5318: extract helpers to lib-commit-graph.shDerrick Stolee1-48/+2
The graph_git_behavior helper is useful for testing that certain Git commands behave the same when using the commit-graph and when not using the commit-graph. Extract it to a new lib-commit-graph.sh file for use in new test scripts that will split out from t5318. While doing this extraction, also extract graph_read_expect and the logic for priming the test_oid_cache. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-01test-read-graph: include extra post-parse infoDerrick Stolee1-0/+1
It can be helpful to verify that the 'struct commit_graph' that results from parsing a commit-graph is correctly structured. The existence of different chunks is not enough to verify that all of the optional features are correctly enabled. Update 'test-tool read-graph' to output an "options:" line that includes information for different parts of the struct commit_graph. In particular, this change demonstrates that the read_generation_data option is never being enabled, which will be fixed in a later change. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-13t5000-t5999: detect and signal failure within loopEric Sunshine1-3/+3
Failures within `for` and `while` loops can go unnoticed if not detected and signaled manually since the loop itself does not abort when a contained command fails, nor will a failure necessarily be detected when the loop finishes since the loop returns the exit code of the last command it ran on the final iteration, which may not be the command which failed. Therefore, detect and signal failures manually within loops using the idiom `|| return 1` (or `|| exit 1` within subshells). Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-15fsck: verify commit graph when implicitly enabledGlen Choo1-1/+22
Change fsck to check the "core_commit_graph" variable set in "repo-settings.c" instead of reading the "core.commitGraph" variable. This fixes a bug where we wouldn't verify the commit-graph if the config key was missing. This bug was introduced in 31b1de6a09 (commit-graph: turn on commit-graph by default, 2019-08-13), where core.commitGraph was turned on by default. Add tests to "t5318-commit-graph.sh" to verify that fsck checks the commit-graph as expected for the 3 values of core.commitGraph. Also, disable GIT_TEST_COMMIT_GRAPH in t/t0410-partial-clone.sh because some test cases use fsck in ways that assume that commit-graph checking is disabled. Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-15Merge branch 'ab/ignore-replace-while-working-on-commit-graph' into ↵Junio C Hamano1-2/+3
gc/use-repo-settings * ab/ignore-replace-while-working-on-commit-graph: commit-graph: don't consider "replace" objects with "verify" commit-graph tests: fix another graph_git_two_modes() helper commit-graph tests: fix error-hiding graph_git_two_modes() helper
2021-10-15commit-graph: don't consider "replace" objects with "verify"Ævar Arnfjörð Bjarmason1-0/+1
Extend the code added in d6538246d3d (commit-graph: not compatible with replace objects, 2018-08-20) which ignored replace objects in the "write" command to ignore it in the "verify" command too. We can just move this assignment to the cmd_commit_graph(), it dispatches to "write" and "verify", and we're unlikely to ever get a sub-command that would like to consider replace refs. This will make tests added in eddc1f556cd (mktag tests: test update-ref and reachable fsck, 2021-06-17) pass in combination with the "GIT_TEST_COMMIT_GRAPH" mode added in 859fdc0c3cf (commit-graph: define GIT_TEST_COMMIT_GRAPH, 2018-08-29), except that mode is currently broken (but is being fixed concurrently). See the discussion starting at [1]. 1. https://lore.kernel.org/git/87wnmihswp.fsf@evledraar.gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-15commit-graph tests: fix error-hiding graph_git_two_modes() helperÆvar Arnfjörð Bjarmason1-2/+2
The graph_git_two_modes() helper added in 177722b3442 (commit: integrate commit graph with commit parsing, 2018-04-10) didn't &&-chain its "git commit-graph" invocations, which as can be seen with SANITIZE=leak will happily mark tests as passing if both of these commands die, since test_cmp() will be comparing two empty files. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-30commit-graph: show "unexpected subcommand" errorÆvar Arnfjörð Bjarmason1-1/+15
Bring the "commit-graph" command in line with the error output and general pattern in cmd_multi_pack_index(). Let's test for that output, and also cover the same potential bug as was fixed in the multi-pack-index command in 88617d11f9d (multi-pack-index: fix potential segfault without sub-command, 2021-07-19). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-30commit-graph: show usage on "commit-graph [write|verify] garbage"Ævar Arnfjörð Bjarmason1-0/+5
Change the parse_options() invocation in the commit-graph code to error on unknown leftover argv elements, in addition to the existing and implicit erroring via parse_options() on unknown options. We'd already error in cmd_commit_graph() on e.g.: git commit-graph unknown verify git commit-graph --unknown verify But here we're calling parse_options() twice more for the "write" and "verify" subcommands. We did not do the same checking for leftover argv elements there. As a result we'd silently accept garbage in these subcommands, let's not do that. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-22Merge branch 'ds/commit-graph-generation-config'Junio C Hamano1-1/+1
A new configuration variable has been introduced to allow choosing which version of the generation number gets used in the commit-graph file. * ds/commit-graph-generation-config: commit-graph: use config to specify generation type commit-graph: create local repository pointer
2021-03-01Merge branch 'ds/chunked-file-api'Junio C Hamano1-1/+1
The common code to deal with "chunked file format" that is shared by the multi-pack-index and commit-graph files have been factored out, to help codepaths for both filetypes to become more robust. * ds/chunked-file-api: commit-graph.c: display correct number of chunks when writing chunk-format: add technical docs chunk-format: restore duplicate chunk checks midx: use 64-bit multiplication for chunk sizes midx: use chunk-format read API commit-graph: use chunk-format read API chunk-format: create read chunk API midx: use chunk-format API in write_midx_internal() midx: drop chunk progress during write midx: return success/failure in chunk write methods midx: add num_large_offsets to write_midx_context midx: add pack_perm to write_midx_context midx: add entries to write_midx_context midx: use context in write_midx_pack_names() midx: rename pack_info to write_midx_context commit-graph: use chunk-format write API chunk-format: create chunk format write API commit-graph: anonymize data in chunk_write_fn
2021-02-25commit-graph: use config to specify generation typeDerrick Stolee1-1/+1
We have two established generation number versions: 1: topological levels 2: corrected commit dates The corrected commit dates are enabled by default, but they also write extra data in the GDAT and GDOV chunks. Services that host Git data might want to have more control over when this feature rolls out than just updating the Git binaries. Add a new "commitGraph.generationVersion" config option that specifies the intended generation number version. If this value is less than 2, then the GDAT chunk is never written _or read_ from an existing file. This can replace our use of the GIT_TEST_COMMIT_GRAPH_NO_GDAT environment variable in the test suite. Remove it. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-22Merge branch 'ab/test-lib'Junio C Hamano1-1/+1
Test framework clean-up. * ab/test-lib: test-lib-functions: assert correct parameter count test-lib-functions: remove bug-inducing "diagnostics" helper param test libs: rename "diff-lib" to "lib-diff" t/.gitattributes: sort lines test-lib-functions: move function to lib-bitmap.sh test libs: rename gitweb-lib.sh to lib-gitweb.sh test libs: rename bundle helper to "lib-bundle.sh" test-lib-functions: remove generate_zero_bytes() wrapper test-lib-functions: move test_set_index_version() to its user test lib: change "error" to "BUG" as appropriate test-lib: remove check_var_migration
2021-02-18commit-graph: use chunk-format read APIDerrick Stolee1-1/+1
Instead of parsing the table of contents directly, use the chunk-format API methods read_table_of_contents() and pair_chunk(). While the current implementation loses the duplicate-chunk detection, that will be added in a future change. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-10test-lib-functions: remove generate_zero_bytes() wrapperÆvar Arnfjörð Bjarmason1-1/+1
Since d5cfd142ec1 (tests: teach the test-tool to generate NUL bytes and use it, 2019-02-14) the generate_zero_bytes() functions has been a thin wrapper for "test-tool genzeros". Let's have its only user call that directly instead. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-01commit-graph: prepare commit graphDerrick Stolee1-1/+1
Before checking if the repository has a commit-graph loaded, be sure to run prepare_commit_graph(). This is necessary because otherwise the topo_levels slab is not initialized. As we compute topo_levels for the new commits, we iterate further into the lower layers since the first visit to each commit looks as though the topo_level is not populated. By properly initializing the topo_slab, we fix the previously broken case of a split commit graph where a base layer has the generation_data_overflow chunk. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>