From 61fd31a17975ba27ef1d4a3f25bf55b92f5e7738 Mon Sep 17 00:00:00 2001
From: Taylor Blau
Date: Tue, 25 Jan 2022 17:41:01 -0500
Subject: t5326: demonstrate bitmap corruption after permutation

This patch demonstrates a cause of bitmap corruption that can occur when
the contents of the multi-pack index do not change, but the underlying
object order does.

In this example, we have a MIDX containing two packs, each with a
distinct set of objects (pack A corresponds to the tree, blob, and
commit from the first patch, and pack B corresponds to the second
patch).

First, a MIDX is written where the 'A' pack is preferred. As expected,
the bitmaps generated there are intact. But then, we generate an
identical MIDX with a different object order: this time preferring pack
'B'.

Due to a bug which will be explained and fixed in the following commit,
the MIDX is updated, but the .rev file is not, causing the .bitmap file
to be read incorrectly. Specifically, the .bitmap file will contain
correct data, but the auxiliary object order in the .rev file is stale,
causing readers to get confused by reading the new bitmaps using the old
object order.

Signed-off-by: Taylor Blau
Reviewed-by: Derrick Stolee
Reviewed-by: Jonathan Tan
Signed-off-by: Junio C Hamano
---
 t/t5326-multi-pack-bitmaps.sh | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index e187f90f29..0ca2868b0b 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -395,4 +395,35 @@ test_expect_success 'hash-cache values are propagated from pack bitmaps' '
 	)
 '
+test_expect_failure 'changing the preferred pack does not corrupt bitmaps' '
+	rm -fr repo &&
+	git init repo &&
+	test_when_finished "rm -fr repo" &&
+	(
+		cd repo &&
+
+		test_commit A &&
+		test_commit B &&
+
+		git rev-list --objects --no-object-names HEAD^ >A.objects &&
+		git rev-list --objects --no-object-names HEAD^..
>B.objects &&
+
+		A=$(git pack-objects $objdir/pack/pack indexes <<-EOF &&
+		pack-$A.idx
+		pack-$B.idx
+		EOF
+
+		git multi-pack-index write --bitmap --stdin-packs \
+			--preferred-pack=pack-$A.pack
Date: Tue, 25 Jan 2022 17:41:03 -0500
Subject: midx.c: make changing the preferred pack safe

The previous patch demonstrates a bug where a MIDX's auxiliary object
order can become out of sync with a MIDX bitmap.

This is because of two confounding factors:

  - First, the object order is stored in a file which is named according
    to the multi-pack index's checksum, and the MIDX does not store the
    object order. This means that the object order can change without
    altering the checksum.

  - But the .rev file is moved into place with finalize_object_file(),
    which link(2)'s the file into place instead of renaming it. For us,
    that means that a modified .rev file will not be moved into place if
    the MIDX's checksum was unchanged.

The fix is to force the MIDX's checksum to change when the preferred
pack changes but the set of packs contained in the MIDX does not. In
other words, when the object order changes, the MIDX's checksum needs to
change with it (regardless of whether the MIDX is tracking the same or
different packs).

This prevents a race whereby changing the object order (but not the
packs themselves) enables a reader to see the new .rev file with the old
MIDX, or similarly to see the new bitmap with the old object order.

But why can't we just stop hardlinking the .rev into place instead of
adding additional data to the MIDX? Suppose that's what we did. Then
when we go to generate the new bitmap, we'll load the old MIDX bitmap,
along with the MIDX that it references. That's fine, since the new MIDX
isn't moved into place until after the new bitmap is generated. But the
new object order *has* been moved into place.
So we'll read the old bitmaps in the new order when generating the new
bitmap file, meaning that without this secondary change, bitmap
generation itself would become a victim of the race described here.

This can all be prevented by forcing the MIDX's checksum to change when
the object order does. By embedding the entire object order into the
MIDX, we do just that. That is, the MIDX's checksum will change in
response to any perturbation of the underlying object order. In t5326,
this will cause the MIDX's checksum to update (even without changing the
set of packs in the MIDX), preventing the stale read problem.

Note that this makes it safe to continue to link(2) the MIDX .rev file
into place, since it is now impossible to have a .rev file that is
out-of-sync with the MIDX whose checksum it references. (But we will do
away with MIDX .rev files later in this series anyway, so this is
somewhat of a moot point).

In theory, it is possible to store a "fingerprint" of the full object
order here, so long as that fingerprint changes at least as often as the
full object order does. Some possibilities here include storing the
identity of the preferred pack, along with the mtimes of the
non-preferred packs in a consistent order. But storing a limited part of
the information makes it difficult to reason about whether or not there
are gaps between the two that would cause us to get bitten by this bug
again.
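
The reasoning above can be illustrated with a toy model outside of Git.
This is only a sketch under stated assumptions: `cksum` stands in for
the MIDX's trailing hash, and `toy_midx_checksum`, the pack names, and
the object positions are hypothetical and not part of Git.

```shell
# Toy model (not Git code): a "MIDX" is the list of pack names plus,
# optionally, the object order. cksum stands in for the trailing hash.
toy_midx_checksum () {
	printf '%s\n%s\n' "$1" "$2" | cksum | cut -d' ' -f1
}

packs="pack-A pack-B"

# Without the object order in the file, preferring pack A or pack B
# hashes the same bytes, so the checksum (and with it the name of the
# .rev file) does not change.
old_a=$(toy_midx_checksum "$packs" "")
old_b=$(toy_midx_checksum "$packs" "")

# With the order embedded, the same set of packs in a different object
# order yields a different checksum.
new_a=$(toy_midx_checksum "$packs" "0 1 2 3 4 5")
new_b=$(toy_midx_checksum "$packs" "3 4 5 0 1 2")
```

In this model, `old_a` and `old_b` are equal while `new_a` and `new_b`
differ, which is the property the new chunk is meant to provide for the
real checksum.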

Signed-off-by: Taylor Blau
Reviewed-by: Derrick Stolee
Reviewed-by: Jonathan Tan
Signed-off-by: Junio C Hamano
---
 Documentation/technical/multi-pack-index.txt |  1 +
 Documentation/technical/pack-format.txt      | 13 +++++++------
 midx.c                                       | 23 ++++++++++++++++++++---
 t/t5326-multi-pack-bitmaps.sh                |  2 +-
 4 files changed, 29 insertions(+), 10 deletions(-)

diff --git a/Documentation/technical/multi-pack-index.txt b/Documentation/technical/multi-pack-index.txt
index b39c69da8c..f2221d2b44 100644
--- a/Documentation/technical/multi-pack-index.txt
+++ b/Documentation/technical/multi-pack-index.txt
@@ -24,6 +24,7 @@ and their offsets into multiple packfiles. It contains:
 ** An offset within the jth packfile for the object.
 * If large offsets are required, we use another list of large offsets similar to version 2 pack-indexes.
+- An optional list of objects in pseudo-pack order (used with MIDX bitmaps).
 Thus, we can provide O(log N) lookup time for any number of packfiles.
diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
index 8d2f42f29e..6d3efb7d16 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -376,6 +376,11 @@ CHUNK DATA:
 [Optional] Object Large Offsets (ID: {'L', 'O', 'F', 'F'})
 8-byte offsets into large packfiles.
+ [Optional] Bitmap pack order (ID: {'R', 'I', 'D', 'X'})
+ A list of MIDX positions (one per object in the MIDX, num_objects in
+ total, each a 4-byte unsigned integer in network byte order), sorted
+ according to their relative bitmap/pseudo-pack positions.
+
 TRAILER:
 Index checksum of the above contents.
@@ -456,9 +461,5 @@ In short, a MIDX's pseudo-pack is the de-duplicated concatenation of objects
 in packs stored by the MIDX, laid out in pack order, and the packs
 arranged in MIDX order (with the preferred pack coming first).
-Finally, note that the MIDX's reverse index is not stored as a chunk in
-the multi-pack-index itself.
This is done because the reverse index
-includes the checksum of the pack or MIDX to which it belongs, which
-makes it impossible to write in the MIDX. To avoid races when rewriting
-the MIDX, a MIDX reverse index includes the MIDX's checksum in its
-filename (e.g., `multi-pack-index-xyz.rev`).
+The MIDX's reverse index is stored in the optional 'RIDX' chunk within
+the MIDX itself.
diff --git a/midx.c b/midx.c
index 837b46b2af..0248c4577c 100644
--- a/midx.c
+++ b/midx.c
@@ -33,6 +33,7 @@
 #define MIDX_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */
 #define MIDX_CHUNKID_OBJECTOFFSETS 0x4f4f4646 /* "OOFF" */
 #define MIDX_CHUNKID_LARGEOFFSETS 0x4c4f4646 /* "LOFF" */
+#define MIDX_CHUNKID_REVINDEX 0x52494458 /* "RIDX" */
 #define MIDX_CHUNK_FANOUT_SIZE (sizeof(uint32_t) * 256)
 #define MIDX_CHUNK_OFFSET_WIDTH (2 * sizeof(uint32_t))
 #define MIDX_CHUNK_LARGE_OFFSET_WIDTH (sizeof(uint64_t))
@@ -833,6 +834,18 @@ static int write_midx_large_offsets(struct hashfile *f,
 	return 0;
 }
+static int write_midx_revindex(struct hashfile *f,
+			       void *data)
+{
+	struct write_midx_context *ctx = data;
+	uint32_t i;
+
+	for (i = 0; i < ctx->entries_nr; i++)
+		hashwrite_be32(f, ctx->pack_order[i]);
+
+	return 0;
+}
+
 struct midx_pack_order_data {
 	uint32_t nr;
 	uint32_t pack;
@@ -1403,15 +1416,19 @@ static int write_midx_internal(const char *object_dir,
 			(size_t)ctx.num_large_offsets * MIDX_CHUNK_LARGE_OFFSET_WIDTH,
 			write_midx_large_offsets);
+	if (flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP)) {
+		ctx.pack_order = midx_pack_order(&ctx);
+		add_chunk(cf, MIDX_CHUNKID_REVINDEX,
+			  ctx.entries_nr * sizeof(uint32_t),
+			  write_midx_revindex);
+	}
+
 	write_midx_header(f, get_num_chunks(cf), ctx.nr - dropped_packs);
 	write_chunkfile(cf, &ctx);
 	finalize_hashfile(f, midx_hash, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
 	free_chunkfile(cf);
-	if (flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP))
-		ctx.pack_order = midx_pack_order(&ctx);
-
 	if (flags & MIDX_WRITE_REV_INDEX)
 		write_midx_reverse_index(midx_name.buf,
midx_hash, &ctx);
 	if (flags & MIDX_WRITE_BITMAP) {
diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index 0ca2868b0b..353282310d 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -395,7 +395,7 @@ test_expect_success 'hash-cache values are propagated from pack bitmaps' '
 	)
 '
-test_expect_failure 'changing the preferred pack does not corrupt bitmaps' '
+test_expect_success 'changing the preferred pack does not corrupt bitmaps' '
 	rm -fr repo &&
 	git init repo &&
 	test_when_finished "rm -fr repo" &&
-- 
cgit v1.2.3

From 09a77999e787911244e2a8d09fdce7a7428429f4 Mon Sep 17 00:00:00 2001
From: Taylor Blau
Date: Tue, 25 Jan 2022 17:41:05 -0500
Subject: pack-revindex.c: instrument loading on-disk reverse index

In a subsequent commit, we'll use the MIDX's new 'RIDX' chunk as a
source for the reverse index's data. But it will be useful for tests to
be able to determine whether the reverse index was loaded from the
separate .rev file, or from a chunk within the MIDX.

To instrument this, add a trace2 event which the tests can look for in
order to determine the reverse index's source.
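
As a rough illustration of how a test might look for such an event, the
sketch below greps a hand-written sample line. The sample is an
assumption and not real Git output: only the category, key, and value
strings are taken from the trace2_data_string() call in this patch, and
`sample_event` and `revindex_source_is` are hypothetical names.

```shell
# Hand-written sample event (not real Git output). Only the category,
# key, and value fields mirror the trace2_data_string() call in this
# patch; the "event" field is an assumption about the surrounding JSON.
sample_event='{"event":"data","category":"load_midx_revindex","key":"source","value":"rev"}'

# Succeeds if the sample event reports the reverse index source "$1".
revindex_source_is () {
	printf '%s\n' "$sample_event" |
	grep "\"category\":\"load_midx_revindex\",\"key\":\"source\",\"value\":\"$1\"" >/dev/null
}

if revindex_source_is rev
then
	echo "reverse index was loaded from the .rev file"
fi
```

A real test would presumably capture the events from a Git command run
with GIT_TRACE2_EVENT pointing at a file, and grep that file instead of
a fixed string.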

Signed-off-by: Taylor Blau
Reviewed-by: Derrick Stolee
Reviewed-by: Jonathan Tan
Signed-off-by: Junio C Hamano
---
 pack-revindex.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/pack-revindex.c b/pack-revindex.c
index 70d0fbafcb..bd15ebad03 100644
--- a/pack-revindex.c
+++ b/pack-revindex.c
@@ -301,6 +301,9 @@ int load_midx_revindex(struct multi_pack_index *m)
 	if (m->revindex_data)
 		return 0;
+	trace2_data_string("load_midx_revindex", the_repository,
+			   "source", "rev");
+
 	get_midx_rev_filename(&revindex_name, m);
 	ret = load_revindex_from_disk(revindex_name.buf,
-- 
cgit v1.2.3

From 90a8ea47d88b38ececc07205f512297d8794e877 Mon Sep 17 00:00:00 2001
From: Taylor Blau
Date: Tue, 25 Jan 2022 17:41:08 -0500
Subject: t5326: drop unnecessary setup

The core.multiPackIndex config became true by default back in 18e449f86b
(midx: enable core.multiPackIndex by default, 2020-09-25), so it is no
longer necessary to enable it explicitly.

Signed-off-by: Taylor Blau
Reviewed-by: Derrick Stolee
Reviewed-by: Jonathan Tan
Signed-off-by: Junio C Hamano
---
 t/t5326-multi-pack-bitmaps.sh | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index 353282310d..1a9581af30 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -19,10 +19,6 @@ midx_pack_source () {
 setup_bitmap_history
-test_expect_success 'enable core.multiPackIndex' '
-	git config core.multiPackIndex true
-'
-
 test_expect_success 'create single-pack midx with bitmaps' '
 	git repack -ad &&
 	git multi-pack-index write --bitmap &&
-- 
cgit v1.2.3

From f0ed59afcce8e71efe7b7b32b66e6e896455bb1d Mon Sep 17 00:00:00 2001
From: Taylor Blau
Date: Tue, 25 Jan 2022 17:41:10 -0500
Subject: t5326: extract `test_rev_exists`

To determine which source of data is used for the MIDX's reverse index
cache, introduce a helper which forces loading the reverse index, and
then looks for the special trace2 event introduced in a previous commit.
For now, this helper just looks for when the legacy MIDX .rev file was
loaded, but in a subsequent commit will become parameterized over the
reverse index's source.

This function replaces checking for the existence of the .rev file. We
could write a similar helper to ensure that the .rev file is cleaned up
after repacking, but it will make subsequent tests more difficult to
write, and provides marginal value since we already check that the MIDX
.bitmap file is removed.

Signed-off-by: Taylor Blau
Reviewed-by: Derrick Stolee
Reviewed-by: Jonathan Tan
Signed-off-by: Junio C Hamano
---
 t/t5326-multi-pack-bitmaps.sh | 37 ++++++++++++++++++++++---------------
 1 file changed, 22 insertions(+), 15 deletions(-)

diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index 1a9581af30..999740f8c7 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -17,16 +17,29 @@ midx_pack_source () {
 	test-tool read-midx --show-objects .git/objects | grep "^$1 " | cut -f2
 }
+test_rev_exists () {
+	commit="$1"
+
+	test_expect_success 'reverse index exists' '
+		GIT_TRACE2_EVENT=$(pwd)/event.trace \
+			git rev-list --test-bitmap "$commit" &&
+
+		test_path_is_file $midx-$(midx_checksum $objdir).rev &&
+		grep "\"category\":\"load_midx_revindex\",\"key\":\"source\",\"value\":\"rev\"" event.trace
+	'
+}
+
 setup_bitmap_history
 test_expect_success 'create single-pack midx with bitmaps' '
 	git repack -ad &&
 	git multi-pack-index write --bitmap &&
 	test_path_is_file $midx &&
-	test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
-	test_path_is_file $midx-$(midx_checksum $objdir).rev
+	test_path_is_file $midx-$(midx_checksum $objdir).bitmap
 '
+test_rev_exists HEAD
+
 basic_bitmap_tests
 test_expect_success 'create new additional packs' '
@@ -52,10 +65,11 @@ test_expect_success 'create multi-pack midx with bitmaps' '
 	test_line_count = 25 packs &&
 	test_path_is_file $midx &&
-	test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
-	test_path_is_file
$midx-$(midx_checksum $objdir).rev + test_path_is_file $midx-$(midx_checksum $objdir).bitmap ' +test_rev_exists HEAD + basic_bitmap_tests test_expect_success '--no-bitmap is respected when bitmaps exist' ' @@ -66,7 +80,6 @@ test_expect_success '--no-bitmap is respected when bitmaps exist' ' test_path_is_file $midx && test_path_is_file $midx-$(midx_checksum $objdir).bitmap && - test_path_is_file $midx-$(midx_checksum $objdir).rev && git multi-pack-index write --no-bitmap && @@ -206,10 +219,11 @@ test_expect_success 'setup partial bitmaps' ' test_commit loose && git multi-pack-index write --bitmap 2>err && test_path_is_file $midx && - test_path_is_file $midx-$(midx_checksum $objdir).bitmap && - test_path_is_file $midx-$(midx_checksum $objdir).rev + test_path_is_file $midx-$(midx_checksum $objdir).bitmap ' +test_rev_exists HEAD~ + basic_bitmap_tests HEAD~ test_expect_success 'removing a MIDX clears stale bitmaps' ' @@ -224,7 +238,6 @@ test_expect_success 'removing a MIDX clears stale bitmaps' ' # Write a MIDX and bitmap; remove the MIDX but leave the bitmap. stale_bitmap=$midx-$(midx_checksum $objdir).bitmap && - stale_rev=$midx-$(midx_checksum $objdir).rev && rm $midx && # Then write a new MIDX. 
@@ -234,9 +247,7 @@ test_expect_success 'removing a MIDX clears stale bitmaps' ' test_path_is_file $midx && test_path_is_file $midx-$(midx_checksum $objdir).bitmap && - test_path_is_file $midx-$(midx_checksum $objdir).rev && - test_path_is_missing $stale_bitmap && - test_path_is_missing $stale_rev + test_path_is_missing $stale_bitmap ) ' @@ -257,7 +268,6 @@ test_expect_success 'pack.preferBitmapTips' ' git multi-pack-index write --bitmap && test_path_is_file $midx && test_path_is_file $midx-$(midx_checksum $objdir).bitmap && - test_path_is_file $midx-$(midx_checksum $objdir).rev && test-tool bitmap list-commits | sort >bitmaps && comm -13 bitmaps commits >before && @@ -267,7 +277,6 @@ test_expect_success 'pack.preferBitmapTips' ' snapshot && rm -fr $midx-$(midx_checksum $objdir).bitmap && - rm -fr $midx-$(midx_checksum $objdir).rev && rm -fr $midx && git multi-pack-index write --bitmap --refs-snapshot=snapshot && -- cgit v1.2.3 From 791170fa2b23cfc49ae0e5949b1f301431a6058b Mon Sep 17 00:00:00 2001 From: Taylor Blau Date: Tue, 25 Jan 2022 17:41:13 -0500 Subject: t5326: move tests to t/lib-bitmap.sh In t5326, we have a handful of tests that we would like to run twice: once using the MIDX's new `RIDX` chunk as the source of the reverse-index cache, and once using the separate `.rev` file. But because these tests mutate the state of the underlying repository, and then make assumptions about those mutations occurring in a certain sequence, simply running the tests twice in the same repository is awkward. Instead, extract the core of interesting tests into t/lib-bitmap.sh to prepare for them to be run twice, each in a separate test script. This means that they can each operate on a separate repository, removing any concerns about mutating state. For now, this patch is a strict cut-and-paste of some tests from t5326. The tests which did not move are not interesting with respect to the source of their reverse index data. 
Signed-off-by: Taylor Blau Reviewed-by: Derrick Stolee Reviewed-by: Jonathan Tan Signed-off-by: Junio C Hamano --- t/lib-bitmap.sh | 177 ++++++++++++++++++++++++++++++++++++++++++ t/t5326-multi-pack-bitmaps.sh | 173 +---------------------------------------- 2 files changed, 179 insertions(+), 171 deletions(-) diff --git a/t/lib-bitmap.sh b/t/lib-bitmap.sh index 21d0392dda..48a8730a13 100644 --- a/t/lib-bitmap.sh +++ b/t/lib-bitmap.sh @@ -1,6 +1,9 @@ # Helpers for scripts testing bitmap functionality; see t5310 for # example usage. +objdir=.git/objects +midx=$objdir/pack/multi-pack-index + # Compare a file containing rev-list bitmap traversal output to its non-bitmap # counterpart. You can't just use test_cmp for this, because the two produce # subtly different output: @@ -264,3 +267,177 @@ have_delta () { midx_checksum () { test-tool read-midx --checksum "$1" } + +# midx_pack_source +midx_pack_source () { + test-tool read-midx --show-objects .git/objects | grep "^$1 " | cut -f2 +} + +test_rev_exists () { + commit="$1" + + test_expect_success 'reverse index exists' ' + GIT_TRACE2_EVENT=$(pwd)/event.trace \ + git rev-list --test-bitmap "$commit" && + + test_path_is_file $midx-$(midx_checksum $objdir).rev && + grep "\"category\":\"load_midx_revindex\",\"key\":\"source\",\"value\":\"rev\"" event.trace + ' +} + +midx_bitmap_core () { + setup_bitmap_history + + test_expect_success 'create single-pack midx with bitmaps' ' + git repack -ad && + git multi-pack-index write --bitmap && + test_path_is_file $midx && + test_path_is_file $midx-$(midx_checksum $objdir).bitmap + ' + + test_rev_exists HEAD + + basic_bitmap_tests + + test_expect_success 'create new additional packs' ' + for i in $(test_seq 1 16) + do + test_commit "$i" && + git repack -d || return 1 + done && + + git checkout -b other2 HEAD~8 && + for i in $(test_seq 1 8) + do + test_commit "side-$i" && + git repack -d || return 1 + done && + git checkout second + ' + + test_expect_success 'create multi-pack midx 
with bitmaps' ' + git multi-pack-index write --bitmap && + + ls $objdir/pack/pack-*.pack >packs && + test_line_count = 25 packs && + + test_path_is_file $midx && + test_path_is_file $midx-$(midx_checksum $objdir).bitmap + ' + + test_rev_exists HEAD + + basic_bitmap_tests + + test_expect_success '--no-bitmap is respected when bitmaps exist' ' + git multi-pack-index write --bitmap && + + test_commit respect--no-bitmap && + git repack -d && + + test_path_is_file $midx && + test_path_is_file $midx-$(midx_checksum $objdir).bitmap && + + git multi-pack-index write --no-bitmap && + + test_path_is_file $midx && + test_path_is_missing $midx-$(midx_checksum $objdir).bitmap && + test_path_is_missing $midx-$(midx_checksum $objdir).rev + ' + + test_expect_success 'setup midx with base from later pack' ' + # Write a and b so that "a" is a delta on top of base "b", since Git + # prefers to delete contents out of a base rather than add to a shorter + # object. + test_seq 1 128 >a && + test_seq 1 130 >b && + + git add a b && + git commit -m "initial commit" && + + a=$(git rev-parse HEAD:a) && + b=$(git rev-parse HEAD:b) && + + # In the first pack, "a" is stored as a delta to "b". + p1=$(git pack-objects .git/objects/pack/pack <<-EOF + $a + $b + EOF + ) && + + # In the second pack, "a" is missing, and "b" is not a delta nor base to + # any other object. + p2=$(git pack-objects .git/objects/pack/pack <<-EOF + $b + $(git rev-parse HEAD) + $(git rev-parse HEAD^{tree}) + EOF + ) && + + git prune-packed && + # Use the second pack as the preferred source, so that "b" occurs + # earlier in the MIDX object order, rendering "a" unusable for pack + # reuse. + git multi-pack-index write --bitmap --preferred-pack=pack-$p2.idx && + + have_delta $a $b && + test $(midx_pack_source $a) != $(midx_pack_source $b) + ' + + rev_list_tests 'full bitmap with backwards delta' + + test_expect_success 'clone with bitmaps enabled' ' + git clone --no-local --bare . 
clone-reverse-delta.git && + test_when_finished "rm -fr clone-reverse-delta.git" && + + git rev-parse HEAD >expect && + git --git-dir=clone-reverse-delta.git rev-parse HEAD >actual && + test_cmp expect actual + ' + + test_expect_success 'changing the preferred pack does not corrupt bitmaps' ' + rm -fr repo && + git init repo && + test_when_finished "rm -fr repo" && + ( + cd repo && + + test_commit A && + test_commit B && + + git rev-list --objects --no-object-names HEAD^ >A.objects && + git rev-list --objects --no-object-names HEAD^.. >B.objects && + + A=$(git pack-objects $objdir/pack/pack indexes <<-EOF && + pack-$A.idx + pack-$B.idx + EOF + + git multi-pack-index write --bitmap --stdin-packs \ + --preferred-pack=pack-$A.pack err && + test_path_is_file $midx && + test_path_is_file $midx-$(midx_checksum $objdir).bitmap + ' + + test_rev_exists HEAD~ + + basic_bitmap_tests HEAD~ +} diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh index 999740f8c7..100ac90d15 100755 --- a/t/t5326-multi-pack-bitmaps.sh +++ b/t/t5326-multi-pack-bitmaps.sh @@ -9,134 +9,7 @@ test_description='exercise basic multi-pack bitmap functionality' GIT_TEST_MULTI_PACK_INDEX=0 GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 -objdir=.git/objects -midx=$objdir/pack/multi-pack-index - -# midx_pack_source -midx_pack_source () { - test-tool read-midx --show-objects .git/objects | grep "^$1 " | cut -f2 -} - -test_rev_exists () { - commit="$1" - - test_expect_success 'reverse index exists' ' - GIT_TRACE2_EVENT=$(pwd)/event.trace \ - git rev-list --test-bitmap "$commit" && - - test_path_is_file $midx-$(midx_checksum $objdir).rev && - grep "\"category\":\"load_midx_revindex\",\"key\":\"source\",\"value\":\"rev\"" event.trace - ' -} - -setup_bitmap_history - -test_expect_success 'create single-pack midx with bitmaps' ' - git repack -ad && - git multi-pack-index write --bitmap && - test_path_is_file $midx && - test_path_is_file $midx-$(midx_checksum $objdir).bitmap -' - -test_rev_exists 
HEAD - -basic_bitmap_tests - -test_expect_success 'create new additional packs' ' - for i in $(test_seq 1 16) - do - test_commit "$i" && - git repack -d || return 1 - done && - - git checkout -b other2 HEAD~8 && - for i in $(test_seq 1 8) - do - test_commit "side-$i" && - git repack -d || return 1 - done && - git checkout second -' - -test_expect_success 'create multi-pack midx with bitmaps' ' - git multi-pack-index write --bitmap && - - ls $objdir/pack/pack-*.pack >packs && - test_line_count = 25 packs && - - test_path_is_file $midx && - test_path_is_file $midx-$(midx_checksum $objdir).bitmap -' - -test_rev_exists HEAD - -basic_bitmap_tests - -test_expect_success '--no-bitmap is respected when bitmaps exist' ' - git multi-pack-index write --bitmap && - - test_commit respect--no-bitmap && - git repack -d && - - test_path_is_file $midx && - test_path_is_file $midx-$(midx_checksum $objdir).bitmap && - - git multi-pack-index write --no-bitmap && - - test_path_is_file $midx && - test_path_is_missing $midx-$(midx_checksum $objdir).bitmap && - test_path_is_missing $midx-$(midx_checksum $objdir).rev -' - -test_expect_success 'setup midx with base from later pack' ' - # Write a and b so that "a" is a delta on top of base "b", since Git - # prefers to delete contents out of a base rather than add to a shorter - # object. - test_seq 1 128 >a && - test_seq 1 130 >b && - - git add a b && - git commit -m "initial commit" && - - a=$(git rev-parse HEAD:a) && - b=$(git rev-parse HEAD:b) && - - # In the first pack, "a" is stored as a delta to "b". - p1=$(git pack-objects .git/objects/pack/pack <<-EOF - $a - $b - EOF - ) && - - # In the second pack, "a" is missing, and "b" is not a delta nor base to - # any other object. 
- p2=$(git pack-objects .git/objects/pack/pack <<-EOF - $b - $(git rev-parse HEAD) - $(git rev-parse HEAD^{tree}) - EOF - ) && - - git prune-packed && - # Use the second pack as the preferred source, so that "b" occurs - # earlier in the MIDX object order, rendering "a" unusable for pack - # reuse. - git multi-pack-index write --bitmap --preferred-pack=pack-$p2.idx && - - have_delta $a $b && - test $(midx_pack_source $a) != $(midx_pack_source $b) -' - -rev_list_tests 'full bitmap with backwards delta' - -test_expect_success 'clone with bitmaps enabled' ' - git clone --no-local --bare . clone-reverse-delta.git && - test_when_finished "rm -fr clone-reverse-delta.git" && - - git rev-parse HEAD >expect && - git --git-dir=clone-reverse-delta.git rev-parse HEAD >actual && - test_cmp expect actual -' +midx_bitmap_core bitmap_reuse_tests() { from=$1 @@ -213,18 +86,7 @@ test_expect_success 'missing object closure fails gracefully' ' ) ' -test_expect_success 'setup partial bitmaps' ' - test_commit packed && - git repack && - test_commit loose && - git multi-pack-index write --bitmap 2>err && - test_path_is_file $midx && - test_path_is_file $midx-$(midx_checksum $objdir).bitmap -' - -test_rev_exists HEAD~ - -basic_bitmap_tests HEAD~ +midx_bitmap_partial_tests test_expect_success 'removing a MIDX clears stale bitmaps' ' rm -fr repo && @@ -398,35 +260,4 @@ test_expect_success 'hash-cache values are propagated from pack bitmaps' ' ) ' -test_expect_success 'changing the preferred pack does not corrupt bitmaps' ' - rm -fr repo && - git init repo && - test_when_finished "rm -fr repo" && - ( - cd repo && - - test_commit A && - test_commit B && - - git rev-list --objects --no-object-names HEAD^ >A.objects && - git rev-list --objects --no-object-names HEAD^.. 
>B.objects && - - A=$(git pack-objects $objdir/pack/pack indexes <<-EOF && - pack-$A.idx - pack-$B.idx - EOF - - git multi-pack-index write --bitmap --stdin-packs \ - --preferred-pack=pack-$A.pack Date: Tue, 25 Jan 2022 17:41:15 -0500 Subject: t/lib-bitmap.sh: parameterize tests over reverse index source To prepare for reading the reverse index data out of the MIDX itself, teach the `test_rev_exists` function to take an expected "source" for the reverse index data. When given "rev", it asserts that the MIDX's `.rev` file exists, and is loaded when verifying the integrity of its bitmaps. Otherwise, it ensures that trace2 reports the source of the reverse index data as the same string which was given to test_rev_exists(). The following patch will implement reading the reverse index data from the MIDX itself. Signed-off-by: Taylor Blau Reviewed-by: Derrick Stolee Reviewed-by: Jonathan Tan Signed-off-by: Junio C Hamano --- t/lib-bitmap.sh | 20 ++++++++++++++------ 1 file changed, 14 insertions(+), 6 deletions(-) diff --git a/t/lib-bitmap.sh b/t/lib-bitmap.sh index 48a8730a13..253895c04e 100644 --- a/t/lib-bitmap.sh +++ b/t/lib-bitmap.sh @@ -275,17 +275,23 @@ midx_pack_source () { test_rev_exists () { commit="$1" + kind="$2" - test_expect_success 'reverse index exists' ' + test_expect_success "reverse index exists ($kind)" ' GIT_TRACE2_EVENT=$(pwd)/event.trace \ git rev-list --test-bitmap "$commit" && - test_path_is_file $midx-$(midx_checksum $objdir).rev && - grep "\"category\":\"load_midx_revindex\",\"key\":\"source\",\"value\":\"rev\"" event.trace + if test "rev" = "$kind" + then + test_path_is_file $midx-$(midx_checksum $objdir).rev + fi && + grep "\"category\":\"load_midx_revindex\",\"key\":\"source\",\"value\":\"$kind\"" event.trace ' } midx_bitmap_core () { + rev_kind="${1:-rev}" + setup_bitmap_history test_expect_success 'create single-pack midx with bitmaps' ' @@ -295,7 +301,7 @@ midx_bitmap_core () { test_path_is_file $midx-$(midx_checksum $objdir).bitmap ' - 
test_rev_exists HEAD + test_rev_exists HEAD "$rev_kind" basic_bitmap_tests @@ -325,7 +331,7 @@ midx_bitmap_core () { test_path_is_file $midx-$(midx_checksum $objdir).bitmap ' - test_rev_exists HEAD + test_rev_exists HEAD "$rev_kind" basic_bitmap_tests @@ -428,6 +434,8 @@ midx_bitmap_core () { } midx_bitmap_partial_tests () { + rev_kind="${1:-rev}" + test_expect_success 'setup partial bitmaps' ' test_commit packed && git repack && @@ -437,7 +445,7 @@ midx_bitmap_partial_tests () { test_path_is_file $midx-$(midx_checksum $objdir).bitmap ' - test_rev_exists HEAD~ + test_rev_exists HEAD~ "$rev_kind" basic_bitmap_tests HEAD~ } -- cgit v1.2.3 From 7f514b7a5e775cd0b7a9543b02bf9dd38b164d02 Mon Sep 17 00:00:00 2001 From: Taylor Blau Date: Tue, 25 Jan 2022 17:41:17 -0500 Subject: midx: read `RIDX` chunk when present When a MIDX contains the new `RIDX` chunk, ensure that the reverse index is read from it instead of the on-disk .rev file. Since we need to encode the object order in the MIDX itself for correctness reasons, there is no point in storing the same data again outside of the MIDX. So, this patch stops writing separate .rev files, and reads it out of the MIDX itself. This is possible to do with relatively little new code, since the format of the RIDX chunk is identical to the data in the .rev file. In other words, we can implement this by pointing the `revindex_data` field at the reverse index chunk of the MIDX instead of the .rev file without any other changes. Note that we have two knobs that are adjusted for the new tests: GIT_TEST_MIDX_WRITE_REV and GIT_TEST_MIDX_READ_RIDX. The former controls whether the MIDX .rev is written at all, and the latter controls whether we read the MIDX's RIDX chunk. Both are necessary to ensure that the test added at the beginning of this series continues to work. 
This is because we always need to write the RIDX chunk in the MIDX in
order to change its checksum, but we want to make sure reading the
existing .rev file still works (since the RIDX chunk takes precedence by
default).

Arguably this isn't a very interesting mode to test, because the
precedence rules mean that we'll always read the RIDX chunk over the
.rev file. But it makes it impossible for a user to induce corruption in
their repository by adjusting the test knobs (since if we had an
either/or knob they could stop writing the RIDX chunk, allowing them to
tweak the MIDX's object order without changing its checksum).

Signed-off-by: Taylor Blau
Reviewed-by: Derrick Stolee
Reviewed-by: Jonathan Tan
Signed-off-by: Junio C Hamano
---
 midx.c                            |  6 +++++-
 midx.h                            |  1 +
 pack-revindex.c                   | 17 +++++++++++++++++
 t/lib-bitmap.sh                   |  4 ++--
 t/t5326-multi-pack-bitmaps.sh     |  6 ++++++
 t/t5327-multi-pack-bitmaps-rev.sh | 23 +++++++++++++++++++++++
 t/t7700-repack.sh                 |  4 ----
 7 files changed, 54 insertions(+), 7 deletions(-)
 create mode 100755 t/t5327-multi-pack-bitmaps-rev.sh

diff --git a/midx.c b/midx.c
index 0248c4577c..6e6cb0eb04 100644
--- a/midx.c
+++ b/midx.c
@@ -162,6 +162,9 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
 
 	pair_chunk(cf, MIDX_CHUNKID_LARGEOFFSETS, &m->chunk_large_offsets);
 
+	if (git_env_bool("GIT_TEST_MIDX_READ_RIDX", 1))
+		pair_chunk(cf, MIDX_CHUNKID_REVINDEX, &m->chunk_revindex);
+
 	m->num_objects = ntohl(m->chunk_oid_fanout[255]);
 
 	CALLOC_ARRAY(m->pack_names, m->num_packs);
@@ -1429,7 +1432,8 @@ static int write_midx_internal(const char *object_dir,
 	finalize_hashfile(f, midx_hash, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
 	free_chunkfile(cf);
 
-	if (flags & MIDX_WRITE_REV_INDEX)
+	if (flags & MIDX_WRITE_REV_INDEX &&
+	    git_env_bool("GIT_TEST_MIDX_WRITE_REV", 0))
 		write_midx_reverse_index(midx_name.buf, midx_hash, &ctx);
 	if (flags & MIDX_WRITE_BITMAP) {
 		if (write_midx_bitmap(midx_name.buf, midx_hash, &ctx,
diff --git a/midx.h b/midx.h
index b7d79a515c..22e8e53288 100644
--- a/midx.h
+++ b/midx.h
@@ -36,6 +36,7 @@ struct multi_pack_index {
 	const unsigned char *chunk_oid_lookup;
 	const unsigned char *chunk_object_offsets;
 	const unsigned char *chunk_large_offsets;
+	const unsigned char *chunk_revindex;
 
 	const char **pack_names;
 	struct packed_git **packs;
diff --git a/pack-revindex.c b/pack-revindex.c
index bd15ebad03..08dc160167 100644
--- a/pack-revindex.c
+++ b/pack-revindex.c
@@ -298,9 +298,26 @@ int load_midx_revindex(struct multi_pack_index *m)
 {
 	struct strbuf revindex_name = STRBUF_INIT;
 	int ret;
+
 	if (m->revindex_data)
 		return 0;
 
+	if (m->chunk_revindex) {
+		/*
+		 * If the MIDX `m` has a `RIDX` chunk, then use its contents for
+		 * the reverse index instead of trying to load a separate `.rev`
+		 * file.
+		 *
+		 * Note that we do *not* set `m->revindex_map` here, since we do
+		 * not want to accidentally call munmap() in the middle of the
+		 * MIDX.
+		 */
+		trace2_data_string("load_midx_revindex", the_repository,
+				   "source", "midx");
+		m->revindex_data = (const uint32_t *)m->chunk_revindex;
+		return 0;
+	}
+
 	trace2_data_string("load_midx_revindex", the_repository,
 			   "source", "rev");
 
diff --git a/t/lib-bitmap.sh b/t/lib-bitmap.sh
index 253895c04e..a95537e759 100644
--- a/t/lib-bitmap.sh
+++ b/t/lib-bitmap.sh
@@ -290,7 +290,7 @@ test_rev_exists () {
 }
 
 midx_bitmap_core () {
-	rev_kind="${1:-rev}"
+	rev_kind="${1:-midx}"
 
 	setup_bitmap_history
 
@@ -434,7 +434,7 @@ midx_bitmap_core () {
 }
 
 midx_bitmap_partial_tests () {
-	rev_kind="${1:-rev}"
+	rev_kind="${1:-midx}"
 
 	test_expect_success 'setup partial bitmaps' '
 		test_commit packed &&
diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index 100ac90d15..c0924074c4 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -9,6 +9,12 @@ test_description='exercise basic multi-pack bitmap functionality'
 GIT_TEST_MULTI_PACK_INDEX=0
 GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
 
+# This test exercise multi-pack bitmap functionality where the object order is
+# stored and read from a special chunk within the MIDX, so use the default
+# behavior here.
+sane_unset GIT_TEST_MIDX_WRITE_REV
+sane_unset GIT_TEST_MIDX_READ_RIDX
+
 midx_bitmap_core
 
 bitmap_reuse_tests() {
diff --git a/t/t5327-multi-pack-bitmaps-rev.sh b/t/t5327-multi-pack-bitmaps-rev.sh
new file mode 100755
index 0000000000..d30ba632c8
--- /dev/null
+++ b/t/t5327-multi-pack-bitmaps-rev.sh
@@ -0,0 +1,23 @@
+#!/bin/sh
+
+test_description='exercise basic multi-pack bitmap functionality (.rev files)'
+
+. ./test-lib.sh
+. "${TEST_DIRECTORY}/lib-bitmap.sh"
+
+# We'll be writing our own midx and bitmaps, so avoid getting confused by the
+# automatic ones.
+GIT_TEST_MULTI_PACK_INDEX=0
+GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
+
+# Unlike t5326, this test exercise multi-pack bitmap functionality where the
+# object order is stored in a separate .rev file.
+GIT_TEST_MIDX_WRITE_REV=1
+GIT_TEST_MIDX_READ_RIDX=0
+export GIT_TEST_MIDX_WRITE_REV
+export GIT_TEST_MIDX_READ_RIDX
+
+midx_bitmap_core rev
+midx_bitmap_partial_tests rev
+
+test_done
diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index 4693f8dc2b..02a6633a16 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -311,16 +311,13 @@ test_expect_success 'cleans up MIDX when appropriate' '
 		checksum=$(midx_checksum $objdir) &&
 		test_path_is_file $midx &&
 		test_path_is_file $midx-$checksum.bitmap &&
-		test_path_is_file $midx-$checksum.rev &&
 
 		test_commit repack-3 &&
 		GIT_TEST_MULTI_PACK_INDEX=0 git repack -Adb --write-midx &&
 
 		test_path_is_file $midx &&
 		test_path_is_missing $midx-$checksum.bitmap &&
-		test_path_is_missing $midx-$checksum.rev &&
 		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
-		test_path_is_file $midx-$(midx_checksum $objdir).rev &&
 
 		test_commit repack-4 &&
 		GIT_TEST_MULTI_PACK_INDEX=0 git repack -Adb &&
@@ -353,7 +350,6 @@ test_expect_success '--write-midx with preferred bitmap tips' '
 		test_line_count = 1 before &&
 
 		rm -fr $midx-$(midx_checksum $objdir).bitmap &&
-		rm -fr $midx-$(midx_checksum $objdir).rev &&
 		rm -fr $midx &&
 
 		# instead of constructing the snapshot ourselves (c.f., the test
--
cgit v1.2.3

From f8b60cf99b0ab10d915c7bfd4802a1af45d0d439 Mon Sep 17 00:00:00 2001
From: Taylor Blau
Date: Tue, 25 Jan 2022 17:41:20 -0500
Subject: pack-bitmap.c: gracefully fallback after opening pack/MIDX

When opening a MIDX/pack-bitmap, we call open_midx_bitmap_1() or
open_pack_bitmap_1() respectively in a loop over the set of MIDXs/packs.
By design, these functions are supposed to be called over every pack and
MIDX, since only one of them should have a valid bitmap.

Ordinarily we return '0' from these two functions in order to indicate
that we successfully loaded a bitmap. To signal that we couldn't load a
bitmap corresponding to the MIDX/pack (either because one doesn't exist,
or because there was an error with loading it), we can return '-1'. In
either case, the callers each enumerate all MIDXs/packs to ensure that
at most one bitmap per-kind is present.

But when we fail to load a bitmap that does exist (for example, loading
a MIDX bitmap without finding a corresponding reverse index), we'll
return -1 but leave the 'midx' field non-NULL. So when we fall back to
loading a pack bitmap, we'll complain that the bitmap we're trying to
populate already is "opened", even though it isn't.

Rectify this by setting the '->pack' and '->midx' fields back to NULL as
appropriate.

Two tests are added: one to ensure that the MIDX-to-pack bitmap fallback
works, and another to ensure we still complain when there are multiple
pack bitmaps in a repository.

Signed-off-by: Taylor Blau
Reviewed-by: Derrick Stolee
Reviewed-by: Jonathan Tan
Signed-off-by: Junio C Hamano
---
 pack-bitmap.c                 |  4 ++++
 t/t5310-pack-bitmaps.sh       | 28 ++++++++++++++++++++++++++++
 t/t5326-multi-pack-bitmaps.sh | 19 +++++++++++++++++++
 3 files changed, 51 insertions(+)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index f772d3cb7f..9c666cdb8b 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -358,7 +358,9 @@ static int open_midx_bitmap_1(struct bitmap_index *bitmap_git,
 cleanup:
 	munmap(bitmap_git->map, bitmap_git->map_size);
 	bitmap_git->map_size = 0;
+	bitmap_git->map_pos = 0;
 	bitmap_git->map = NULL;
+	bitmap_git->midx = NULL;
 	return -1;
 }
 
@@ -405,6 +407,8 @@ static int open_pack_bitmap_1(struct bitmap_index *bitmap_git, struct packed_git
 		munmap(bitmap_git->map, bitmap_git->map_size);
 		bitmap_git->map = NULL;
 		bitmap_git->map_size = 0;
+		bitmap_git->map_pos = 0;
+		bitmap_git->pack = NULL;
 		return -1;
 	}
 
diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index d05ab716f6..f775fc1ce6 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -397,4 +397,32 @@ test_expect_success 'pack.preferBitmapTips' '
 	)
 '
 
+test_expect_success 'complains about multiple pack bitmaps' '
+	rm -fr repo &&
+	git init repo &&
+	test_when_finished "rm -fr repo" &&
+	(
+		cd repo &&
+
+		test_commit base &&
+
+		git repack -adb &&
+		bitmap="$(ls .git/objects/pack/pack-*.bitmap)" &&
+		mv "$bitmap" "$bitmap.bak" &&
+
+		test_commit other &&
+		git repack -ab &&
+
+		mv "$bitmap.bak" "$bitmap" &&
+
+		find .git/objects/pack -type f -name "*.pack" >packs &&
+		find .git/objects/pack -type f -name "*.bitmap" >bitmaps &&
+		test_line_count = 2 packs &&
+		test_line_count = 2 bitmaps &&
+
+		git rev-list --use-bitmap-index HEAD 2>err &&
+		grep "ignoring extra bitmap file" err
+	)
+'
+
 test_done
diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index c0924074c4..3c1ecc7e25 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -266,4 +266,23 @@ test_expect_success 'hash-cache values are propagated from pack bitmaps' '
 	)
 '
 
+test_expect_success 'graceful fallback when missing reverse index' '
+	rm -fr repo &&
+	git init repo &&
+	test_when_finished "rm -fr repo" &&
+	(
+		cd repo &&
+
+		test_commit base &&
+
+		# write a pack and MIDX bitmap containing base
+		git repack -adb &&
+		git multi-pack-index write --bitmap &&
+
+		GIT_TEST_MIDX_READ_RIDX=0 \
+			git rev-list --use-bitmap-index HEAD 2>err &&
+		! grep "ignoring extra bitmap file" err
+	)
+'
+
 test_done
--
cgit v1.2.3
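
Editorial note, not part of the series: the 'graceful fallback when missing reverse index' test relies on the lookup order that load_midx_revindex() follows once the RIDX chunk exists. The sketch below restates that order as a standalone shell function. The name revindex_source and its three arguments are hypothetical and exist only for illustration. It also simplifies git_env_bool() by treating only the literal value 0 as false.

```shell
# Hypothetical helper (not in git): reports where the MIDX object order
# would be read from, following the precedence described in the patches.
#   $1: "yes" if the MIDX file contains a RIDX chunk
#   $2: "yes" if a matching multi-pack-index-<checksum>.rev file exists
#   $3: value of GIT_TEST_MIDX_READ_RIDX (assumed default: 1)
revindex_source () {
	has_ridx=$1
	has_rev=$2
	read_ridx=${3:-1}

	if test "$read_ridx" != 0 && test "$has_ridx" = yes
	then
		# The chunk is only paired when the knob is on, and it
		# takes precedence over any .rev file.
		echo midx
	elif test "$has_rev" = yes
	then
		echo rev
	else
		# No usable object order: opening the MIDX bitmap fails
		# and the caller falls back to a pack bitmap.
		echo none
	fi
}

revindex_source yes yes
revindex_source yes yes 0
revindex_source yes no 0
```

The last call corresponds to the new t5326 test: the MIDX was written with a RIDX chunk and no .rev file, so reading with GIT_TEST_MIDX_READ_RIDX=0 leaves no object order available and exercises the fallback to the pack bitmap.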