aboutsummaryrefslogtreecommitdiffstats
path: root/t/t5300-pack-object.sh (follow)
AgeCommit message (Collapse)AuthorFilesLines
2025-07-21Merge branch 'bc/use-sha256-by-default-in-3.0'Junio C Hamano1-3/+3
Prepare to flip the default hash function to SHA-256. * bc/use-sha256-by-default-in-3.0: Enable SHA-256 by default in breaking changes mode help: add a build option for default hash t5300: choose the built-in hash outside of a repo t4042: choose the built-in hash outside of a repo t1007: choose the built-in hash outside of a repo t: default to compile-time default hash if not set setup: use the default algorithm to initialize repo format Use legacy hash for legacy formats builtin: use default hash when outside a repository hash: add a constant for the legacy hash algorithm hash: add a constant for the default hash algorithm
2025-07-01t5300: choose the built-in hash outside of a repobrian m. carlson1-3/+3
Right now, the built-in default hash is always SHA-1, but that will change in a future commit. Instead of assuming that operating outside of a repository will always use SHA-1, look up the default hash algorithm for operating outside of a repository using an appropriate environment variable, which will always be correct. Additionally, for operations outside of a repository, use the DEFAULT_HASH_ALGORITHM prerequisite rather than SHA1. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-17Merge branch 'ds/path-walk-2'Junio C Hamano1-0/+19
"git pack-objects" learns to find delta bases from blobs at the same path, using the --path-walk API. * ds/path-walk-2: pack-objects: allow --shallow and --path-walk path-walk: add new 'edge_aggressive' option pack-objects: thread the path-based compression pack-objects: refactor path-walk delta phase scalar: enable path-walk during push via config pack-objects: enable --path-walk via config repack: add --path-walk option t5538: add tests to confirm deltas in shallow pushes pack-objects: introduce GIT_TEST_PACK_PATH_WALK p5313: add performance tests for --path-walk pack-objects: update usage to match docs pack-objects: add --path-walk option pack-objects: extract should_attempt_deltas()
2025-05-16pack-objects: refactor path-walk delta phaseDerrick Stolee1-2/+6
Previously, the --path-walk option to 'git pack-objects' would compute deltas inline with the path-walk logic. This would make the progress indicator look like it is taking a long time to enumerate objects, and then very quickly computed deltas. Instead of computing deltas on each region of objects organized by tree, store a list of regions corresponding to these groups. These can later be pulled from the list for delta compression before doing the "global" delta search. This presents a new progress indicator that can be used in tests to verify that this stage is happening. The current implementation is not integrated with threads, but we are setting it up to arrive in the next change. Since we do not attempt to sort objects by size until after exploring all trees, we can remove the previous change to t5530 due to a different error message appearing first. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16pack-objects: add --path-walk optionDerrick Stolee1-0/+15
In order to more easily compute delta bases among objects that appear at the exact same path, add a --path-walk option to 'git pack-objects'. This option will use the path-walk API instead of the object walk given by the revision machinery. Since objects will be provided in batches representing a common path, those objects can be tested for delta bases immediately instead of waiting for a sort of the full object list by name-hash. This has multiple benefits, including avoiding collisions by name-hash. The objects marked as UNINTERESTING are included in these batches, so we are guaranteeing some locality to find good delta bases. After the individual passes are done on a per-path basis, the default name-hash is used to find other opportunistic delta bases that did not match exactly by the full path name. The current implementation performs delta calculations while walking objects, which is not ideal for a few reasons. First, this will cause the "Enumerating objects" phase to be much longer than usual. Second, it does not take advantage of threading during the path-scoped delta calculations. Even with this lack of threading, the path-walk option is sometimes faster than the usual approach. Future changes will refactor this code to allow for threading, but that complexity is deferred until later to keep this patch as simple as possible. This new walk is incompatible with some features and is ignored by others: * Object filters are not currently integrated with the path-walk API, such as sparse-checkout or tree depth. A blobless packfile could be integrated easily, but that is deferred for later. * Server-focused features such as delta islands, shallow packs, and using a bitmap index are incompatible with the path-walk API. * The path walk API is only compatible with the --revs option, not taking object lists or pack lists over stdin. These alternative ways to specify the objects currently ignores the --path-walk option without even a warning. Future changes will create performance tests that demonstrate the power of this approach. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07t: refactor tests depending on Perl to print dataPatrick Steinhardt1-11/+5
A bunch of tests rely on Perl to print data in various different ways. These usages fall into the following categories: - Print data conditionally by matching patterns. These usecases can be converted to use awk(1) rather easily. - Print data repeatedly. These usecases can typically be converted to use a combination of `test-tool genzeros` and sed(1). - Print data in reverse. These usecases can be converted to use awk(1) or `sort -r`. Refactor the tests accordingly so that we can drop a couple of PERL_TEST_HELPERS prerequisites. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07t: introduce PERL_TEST_HELPERS prerequisitePatrick Steinhardt1-0/+6
In the early days of Git, Perl was used quite prominently throughout the project. This has changed significantly as almost all of the executables we ship nowadays have eventually been rewritten in C. Only a handful of subsystems remain that require Perl: - gitweb, a read-only web interface. - A couple of scripts that allow importing repositories from GNU Arch, CVS and Subversion. - git-send-email(1), which can be used to send mails. - git-request-pull(1), which is used to request somebody to pull from a URL by sending an email. - git-filter-branch(1), which uses Perl with the `--state-branch` option. This command is typically recommended against nowadays in favor of git-filter-repo(1). - Our Perl bindings for Git. - The netrc Git credential helper. None of these subsystems can really be considered to be part of the "core" of Git, and an installation without them is fully functional. It is more likely than not that an end user wouldn't even notice that any features are missing if those tools weren't installed. But while Perl nowadays very much is an optional dependency of Git, there is a significant limitation when Perl isn't available: developers cannot run our test suite. Preceding commits have started to lift this restriction by removing the strict dependency on Perl in many central parts of the test library. But there are still many tests that rely on small Perl helpers to do various different things. Introduce a new PERL_TEST_HELPERS prerequisite that guards all tests that require Perl. This prerequisite is explicitly different than the preexisting PERL prerequisite: - PERL records whether or not features depending on the Perl interpreter are built. - PERL_TEST_HELPERS records whether or not a Perl interpreter is available for our tests. By having these two separate prerequisites we can thus distinguish between tests that inherently depend on Perl because the underlying feature does, and those tests that depend on Perl because the test itself is using Perl. Adapt all tests to set the PERL_TEST_HELPERS prerequisite as needed. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-12Merge branch 'ds/name-hash-tweaks'Junio C Hamano1-0/+34
"git pack-objects" and its wrapper "git repack" learned an option to use an alternative path-hash function to improve delta-base selection to produce a packfile with deeper history than window size. * ds/name-hash-tweaks: pack-objects: prevent name hash version change test-tool: add helper for name-hash values p5313: add size comparison test pack-objects: add GIT_TEST_NAME_HASH_VERSION repack: add --name-hash-version option pack-objects: add --name-hash-version option pack-objects: create new name-hash function version
2025-01-27pack-objects: add GIT_TEST_NAME_HASH_VERSIONDerrick Stolee1-2/+5
Add a new environment variable to opt-in to different values of the --name-hash-version=<n> option in 'git pack-objects'. This allows for extra testing of the feature without repeating all of the test scenarios. Unlike many GIT_TEST_* variables, we are choosing to not add this to the linux-TEST-vars CI build as that test run is already overloaded. The behavior exposed by this test variable is of low risk and should be sufficient to allow manual testing when an issue arises. But this option isn't free. There are a few tests that change behavior with the variable enabled. First, there are a few tests that are very sensitive to certain delta bases being picked. These are both involving the generation of thin bundles and then counting their objects via 'git index-pack --fix-thin' which pulls the delta base into the new packfile. For these tests, disable the option as a decent long-term option. Second, there are some tests that compare the exact output of a 'git pack-objects' process when using bitmaps. The warning that ignores the --name-hash-version=2 and forces version 1 causes these tests to fail. Disable the environment variable to get around this issue. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-27pack-objects: add --name-hash-version optionDerrick Stolee1-0/+31
The previous change introduced a new pack_name_hash_v2() function that intends to satisfy much of the hash locality features of the existing pack_name_hash() function while also distinguishing paths with similar final components of their paths. This change adds a new --name-hash-version option for 'git pack-objects' to allow users to select their preferred function version. This use of an integer version allows for future expansion and a direct way to later store a name hash version in the .bitmap format. For now, let's consider how effective this mechanism is when repacking a repository with different name hash versions. Specifically, we will execute 'git pack-objects' the same way a 'git repack -adf' process would, except we include --name-hash-version=<n> for testing. On the Git repository, we do not expect much difference. All path names are short. This is backed by our results: | Stage | Pack Size | Repack Time | |-----------------------|-----------|-------------| | After clone | 260 MB | N/A | | --name-hash-version=1 | 127 MB | 129s | | --name-hash-version=2 | 127 MB | 112s | This example demonstrates how there is some natural overhead coming from the cloned copy because the server is hosting many forks and has not optimized for exactly this set of reachable objects. But the full repack has similar characteristics for both versions. Let's consider some repositories that are hitting too many collisions with version 1. First, let's explore the kinds of paths that are commonly causing these collisions: * "/CHANGELOG.json" is 15 characters, and is created by the beachball [1] tool. Only the final character of the parent directory can differentiate different versions of this file, but also only the two most-significant digits. If that character is a letter, then this is always a collision. Similar issues occur with the similar "/CHANGELOG.md" path, though there is more opportunity for differences In the parent directory. * Localization files frequently have common filenames but differentiates via parent directories. In C#, the name "/strings.resx.lcl" is used for these localization files and they will all collide in name-hash. [1] https://github.com/microsoft/beachball I've come across many other examples where some internal tool uses a common name across multiple directories and is causing Git to repack poorly due to name-hash collisions. One open-source example is the fluentui [2] repo, which uses beachball to generate CHANGELOG.json and CHANGELOG.md files, and these files have very poor delta characteristics when comparing against versions across parent directories. | Stage | Pack Size | Repack Time | |-----------------------|-----------|-------------| | After clone | 694 MB | N/A | | --name-hash-version=1 | 438 MB | 728s | | --name-hash-version=2 | 168 MB | 142s | [2] https://github.com/microsoft/fluentui In this example, we see significant gains in the compressed packfile size as well as the time taken to compute the packfile. Using a collection of repositories that use the beachball tool, I was able to make similar comparisions with dramatic results. While the fluentui repo is public, the others are private so cannot be shared for reproduction. The results are so significant that I find it important to share here: | Repo | --name-hash-version=1 | --name-hash-version=2 | |----------|-----------------------|-----------------------| | fluentui | 440 MB | 161 MB | | Repo B | 6,248 MB | 856 MB | | Repo C | 37,278 MB | 6,755 MB | | Repo D | 131,204 MB | 7,463 MB | Future changes could include making --name-hash-version implied by a config value or even implied by default during a full repack. It is important to point out that the name hash value is stored in the .bitmap file format, so we must force --name-hash-version=1 when bitmaps are being read or written. Later, the bitmap format could be updated to be aware of the name hash version so deltas can be quickly computed across the bitmapped/not-bitmapped boundary. To promote the safety of this parameter, the validate_name_hash_version() method will die() if the given name-hash version is incorrect and will disable newer versions if not yet compatible with other features, such as --write-bitmap-index. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-23Merge branch 'as/show-index-uninitialized-hash'Junio C Hamano1-0/+18
Regression fix for 'show-index' when run outside of a repository. * as/show-index-uninitialized-hash: t5300: add test for 'show-index --object-format' show-index: fix uninitialized hash function
2024-12-04Merge branch 'ps/leakfixes-part-10'Junio C Hamano1-1/+0
Leakfixes. * ps/leakfixes-part-10: (27 commits) t: remove TEST_PASSES_SANITIZE_LEAK annotations test-lib: unconditionally enable leak checking t: remove unneeded !SANITIZE_LEAK prerequisites t: mark some tests as leak free t5601: work around leak sanitizer issue git-compat-util: drop now-unused `UNLEAK()` macro global: drop `UNLEAK()` annotation t/helper: fix leaking commit graph in "read-graph" subcommand builtin/branch: fix leaking sorting options builtin/init-db: fix leaking directory paths builtin/help: fix leaks in `check_git_cmd()` help: fix leaking return value from `help_unknown_cmd()` help: fix leaking `struct cmdnames` help: refactor to not use globals for reading config builtin/sparse-checkout: fix leaking sanitized patterns split-index: fix memory leak in `move_cache_to_base_index()` git: refactor builtin handling to use a `struct strvec` git: refactor alias handling to use a `struct strvec` strvec: introduce new `strvec_splice()` function line-log: fix leak when rewriting commit parents ...
2024-11-21t: remove TEST_PASSES_SANITIZE_LEAK annotationsPatrick Steinhardt1-1/+0
Now that the default value for TEST_PASSES_SANITIZE_LEAK is `true` there is no longer a need to have that variable declared in all of our tests. Drop it. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-20index-pack: teach --promisor to forbid pack nameJonathan Tan1-3/+1
Currently, - Running "index-pack --promisor" outside a repo segfaults. - It may be confusing to a user that running "index-pack --promisor" within a repo may make changes to the repo's object DB, especially since the packs indexed by the index-pack invocation may not even be related to the repo. As discussed in [1] and [2], teaching --promisor to forbid a packfile name solves both these problems. This combination of arguments requires a repo (since we are writing the resulting .pack and .idx to it) and it is clear that the files are related to the repo. Currently, Git uses "index-pack --promisor" only when fetching into a repo, so it could be argued that we should teach "index-pack" a new argument (say, "--fetching-mode") instead of tying --promisor to a generic argument like the packfile name. However, this --promisor feature could conceivably be used whenever we have a packfile that is known to come from the promisor remote (whether obtained through Git's fetch protocol or through other means) so not using a new argument seems reasonable - one could envision a user-made script obtaining a packfile and then running "index-pack --promisor --stdin", for example. In fact, it might be possible to relax the restriction further (say, by also allowing --promisor when indexing a packfile that is in the object DB), but relaxing the restriction is backwards-compatible so we can revisit that later. One thing to watch out for is the possibility of a future Git feature that indexes a pack in the context of a repo, but does not necessarily write the resulting pack to it (and does not necessarily desire to make any changes to the object DB). One such feature would be fetch quarantine, which might need the repo context in order to detect hash collisions, but would also need to ensure that the object DB is undisturbed in case the fetch fails for whatever reason, even if the reason occurs only after the indexing is complete. It may not be obvious to the implementer of such a feature that "index-pack" could sometimes write packs other than the indexed pack to the object DB, but there are already other ways that "fetch" could write to the object DB (in particular, packfile URIs and bundle URIs), so hopefully the implementation of this future feature would already include a test that the object DB be undisturbed. This change requires the change to t5300 by 1f52cdfacb (index-pack: document and test the --promisor option, 2022-03-09) to be undone. (--promisor is already tested indirectly, so we don't need the explicit test here any more.) [1] https://lore.kernel.org/git/20241114005652.GC1140565@coredump.intra.peff.net/ [2] https://lore.kernel.org/git/20241119185345.GB15723@coredump.intra.peff.net/ Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-11t5300: add test for 'show-index --object-format'Abhijeet Sonar1-0/+14
In 88a09a557c (builtin/show-index: provide options to determine hash algo), the flag --object-format was added to show-index builtin as a way to provide a hash algorithm explicitly. However, we do not have tests in place for that functionality. Add them. Signed-off-by: Abhijeet Sonar <abhijeet.nkt@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-11show-index: fix uninitialized hash functionAbhijeet Sonar1-0/+4
In c8aed5e8da (repository: stop setting SHA1 as the default object hash), we got rid of the default hash algorithm for the_repository. Due to this change, it is now the responsibility of the callers to set their own default when this is not present. As stated in the docs, show-index should use SHA1 as the default hash algorithm when run outside a repository. Make sure this promise is met by falling back to SHA1 when the_hash_algo is not present (i.e. when the command is run outside a repository). Also add a test that verifies this behavior. Signed-off-by: Abhijeet Sonar <abhijeet.nkt@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-02t5300: move --window clamp test next to unclampedJonathan Tan1-5/+5
A subsequent commit will change the behavior of "git index-pack --promisor", which is exercised in "build pack index for an existing pack", causing the unclamped and clamped versions of the --window test to exhibit different behavior. Move the clamp test closer to the unclamped test that it references. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-04builtin/index-pack: fix segfaults when running outside of a repoPatrick Steinhardt1-0/+39
It was reported that git-verify-pack(1) has started to crash with Git v2.46.0 when run outside of a repository. This is another fallout from c8aed5e8da (repository: stop setting SHA1 as the default object hash, 2024-05-07), where we have stopped setting the default hash algorithm for `the_repository`. Consequently, code that relies on `the_hash_algo` will now crash when it hasn't explicitly been initialized, which may be the case when running outside of a Git repository. The crash is not in git-verify-pack(1) but instead in git-index-pack(1), which gets called by the former. Ideally, both of these programs should be able to identify the hash algorithm used by the packfile and index without having to rely on external information. But unfortunately, the format for neither of them is completely self-describing, so it is not possible to derive that information. This is a design issue that we should address by introducing a new packfile version that encodes its object hash. For now though the more important fix is to not make either of these programs crash anymore, which we do by falling back to SHA1 when the object hash is unconfigured. This pessimizes reading packfiles which use a different hash than SHA1, but restores previous behaviour. Reported-by: Ilya K <me@0upti.me> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-11builtin/clone: plug leaking HEAD ref in `wanted_peer_refs()`Patrick Steinhardt1-2/+2
In `wanted_peer_refs()` we first create a copy of the "HEAD" ref. This copy may not actually be passed back to the caller, but is not getting freed in this case. Fix this. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-15t5300: fix test_with_bad_commit()John Cai1-1/+1
0f8edf7317 (index-pack: --fsck-objects to take an optional argument for fsck msgs, 2024-02-01) added a test function test_with_bad_commit() that contained two bugs. test_expect_fail was used instead of test_must_fail, and a && was not included at the end of the line. Fix these two issues in the test. Signed-off-by: John Cai <johncai86@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-01index-pack: --fsck-objects to take an optional argument for fsck msgsJohn Cai1-5/+24
git-index-pack has a --strict option that can take an optional argument to provide a list of fsck issues to change their severity. --fsck-objects does not have such a utility, which would be useful if one would like to be more lenient or strict on data integrity in a repository. Like --strict, allow --fsck-objects to also take a list of fsck msgs to change the severity. Remove the "For internal use only" note for --fsck-objects, and document the option. This won't often be used by the normal end user, but it turns out it is useful for Git forges like GitLab. Reviewed-by: Christian Couder <christian.couder@gmail.com> Signed-off-by: John Cai <johncai86@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-01index-pack: test and document --strict=<msg-id>=<severity>...John Cai1-0/+22
5d477a334a (fsck (receive-pack): allow demoting errors to warnings, 2015-06-22) allowed a list of fsck msg to downgrade to be passed to --strict. However this is a hidden argument that was not documented nor tested. Though it is true that most users would not call this option directly, (nor use index-pack for that matter) it is still useful to document and test this feature. Reviewed-by: Christian Couder <christian.couder@gmail.com> Signed-off-by: John Cai <johncai86@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-02tests: teach callers of test_i18ngrep to use test_grepJunio C Hamano1-2/+2
They are equivalents and the former still exists, so as long as the only change this commit makes are to rewrite test_i18ngrep to test_grep, there won't be any new bug, even if there still are callers of test_i18ngrep remaining in the tree, or when merged to other topics that add new uses of test_i18ngrep. This patch was produced more or less with git grep -l -e 'test_i18ngrep ' 't/t[0-9][0-9][0-9][0-9]-*.sh' | xargs perl -p -i -e 's/test_i18ngrep /test_grep /' and a good way to sanity check the result yourself is to run the above in a checkout of c4603c1c (test framework: further deprecate test_i18ngrep, 2023-10-31) and compare the resulting working tree contents with the result of applying this patch to the same commit. You'll see that test_i18ngrep in a few t/lib-*.sh files corrected, in addition to the manual reproduction. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-19t5300-pack-object: modernize test formatJohn Cai1-92/+92
Some tests still use the old format with four spaces indentation. Standardize the tests to the new format with tab indentation. Signed-off-by: John Cai <johncai86@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-14pack-objects: split out `--stdin-packs` tests into separate filePatrick Steinhardt1-135/+0
The test suite for git-pack-objects(1) is quite huge, and we're about to add more tests that relate to the `--stdin-packs` option. Split out all tests related to this option into a standalone file so that it becomes easier to test the feature in isolation. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-03Merge branch 'ns/batch-fsync'Junio C Hamano1-14/+27
Introduce a filesystem-dependent mechanism to optimize the way the bits for many loose object files are ensured to hit the disk platter. * ns/batch-fsync: core.fsyncmethod: performance tests for batch mode t/perf: add iteration setup mechanism to perf-lib core.fsyncmethod: tests for batch mode test-lib-functions: add parsing helpers for ls-files and ls-tree core.fsync: use batch mode and sync loose objects by default on Windows unpack-objects: use the bulk-checkin infrastructure update-index: use the bulk-checkin infrastructure builtin/add: add ODB transaction around add_files_to_cache cache-tree: use ODB transaction around writing a tree core.fsyncmethod: batched disk flushes for loose-objects bulk-checkin: rebrand plug/unplug APIs as 'odb transactions' bulk-checkin: rename 'state' variable and separate 'plugged' boolean
2022-04-06core.fsyncmethod: tests for batch modeNeeraj Singh1-14/+27
Add test cases to exercise batch mode for: * 'git add' * 'git stash' * 'git update-index' * 'git unpack-objects' These tests ensure that the added data winds up in the object database. In this change we introduce a new test helper lib-unique-files.sh. The goal of this library is to create a tree of files that have different oids from any other files that may have been created in the current test repo. This helps us avoid missing validation of an object being added due to it already being in the repo. Signed-off-by: Neeraj Singh <neerajsi@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-09index-pack: document and test the --promisor optionDerrick Stolee1-1/+3
The --promisor option of 'git index-pack' was created in 88e2f9e (introduce fetch-object: fetch one promisor object, 2017-12-05) but was untested. It is currently unused within the Git codebase, but that will change in an upcoming change to 'git bundle unbundle' when there is a filter capability. For now, add documentation about the option and add a test to ensure it is working as expected. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-13t5000-t5999: detect and signal failure within loopEric Sunshine1-3/+3
Failures within `for` and `while` loops can go unnoticed if not detected and signaled manually since the loop itself does not abort when a contained command fails, nor will a failure necessarily be detected when the loop finishes since the loop returns the exit code of the last command it ran on the final iteration, which may not be the command which failed. Therefore, detect and signal failures manually within loops using the idiom `|| return 1` (or `|| exit 1` within subshells). Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-13tests: use test_write_lines() to generate line-oriented outputEric Sunshine1-10/+2
Take advantage of test_write_lines() to generate line-oriented output rather than using for-loops or a series of `echo` commands. Not only is test_write_lines() a natural fit for such a task, but there is less opportunity for a broken &&-chain. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-09pack-objects: fix segfault in --stdin-packs optionÆvar Arnfjörð Bjarmason1-0/+19
Fix a segfault in the --stdin-packs option added in 339bce27f4f (builtin/pack-objects.c: add '--stdin-packs' option, 2021-02-22). The read_packs_list_from_stdin() function didn't check that the lines it was reading were valid packs, and thus when doing the QSORT() with pack_mtime_cmp() we'd have a NULL "util" field. The "util" field is used to associate the names of included/excluded packs with the packed_git structs they correspond to. The logic error was in assuming that we could iterate all packs and annotate the excluded and included packs we got, as opposed to checking the lines we got on stdin. There was a check for excluded packs, but included packs were simply assumed to be valid. As noted in the test we'll not report the first bad line, but whatever line sorted first according to the string-list.c API. In this case I think that's fine. There was further discussion of alternate approaches in [1]. Even though we're being lazy let's assert the line we do expect to get in the test, since whoever changes this code in the future might miss this case, and would want to update the test and comments. 1. http://lore.kernel.org/git/YND3h2l10PlnSNGJ@nand.local Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-06-28pack-objects tests: cover blindspots in stdin handlingÆvar Arnfjörð Bjarmason1-0/+85
Cover blindspots in the testing of stdin handling, including the "!len" condition added in b5d97e6b0a0 (pack-objects: run rev-list equivalent internally., 2006-09-04). The codepath taken with --revs and read_object_list_from_stdin() acts differently in some of these common cases, let's test for those. The "--stdin --revs" test being added here stresses the combination of --stdin-packs and the revision.c --stdin argument, some of this was covered in a test added in 339bce27f4f (builtin/pack-objects.c: add '--stdin-packs' option, 2021-02-22), but let's make sure that GIT_TEST_DISALLOW_ABBREVIATED_OPTIONS=true keeps erroring out about --stdin, and it isn't picked up by the revision.c API's handling of that option. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-03pack-objects: clamp negative window size to 0Jeff King1-0/+5
A negative window size makes no sense, and the code in find_deltas() is not prepared to handle it. If you pass "-1", for example, we end up generate a 0-length array of "struct unpacked", but our loop assumes it has at least one entry in it (and we end up reading garbage memory). We could complain to the user about this, but it's more forgiving to just clamp it to 0, which means "do not find any deltas at all". The 0-case is already tested earlier in the script, so we'll make sure this does the same thing. Reported-by: Yiyuan guo <yguoaz@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-03t5300: check that we produced expected number of deltasJeff King1-3/+20
We pack a set of objects both with and without --window=0, assuming that the 0-length window will cause us not to produce any deltas. Let's confirm that this is the case. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-03t5300: modernize basic testsJeff King1-158/+85
The first set of tests in t5300 goes back to 2005, and doesn't use some of our customary style and tools these days. In preparation for touching them, let's modernize a few things: - titles go on the line with test_expect_success, with a hanging open-quote to start the test body - test bodies should be indented with tabs - opening braces for shell blocks in &&-chains go on their own line - no space between redirect operators and files (">foo", not "> foo") - avoid doing work outside of test blocks; in this case, we can stick the setup of ".git2" into the appropriate blocks - avoid modifying and then cleaning up the environment or current directory by using subshells and "git -C" - this test does a curious thing when testing the unpacking: it sets GIT_OBJECT_DIRECTORY, and then does a "git init" in the _original_ directory, creating a weird mixed situation. Instead, it's much simpler to just "git init --bare" a new repository to unpack into, and check the results there. I renamed this "git2" instead of ".git2" to make it more clear it's a separate repo. - we can observe that the bodies of the no-delta, ref_delta, and ofs_delta cases are all virtually identical except for the pack creation, and factor out shared helper functions. I collapsed "do the unpack" and "check the results of the unpack" into a single test, since that makes the expected lifetime of the "git2" temporary directory more clear (that also lets us use test_when_finished to clean it up). This does make the "-v" output slightly less useful, but the improvement in reading the actual test code makes it worth it. - I dropped the "pwd" calls from some tests. These don't do anything functional, and I suspect may have been an aid for debugging when the script was more cavalier about leaving the working directory changed between tests. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-24Merge branch 'jk/fail-prereq-testfix'Junio C Hamano1-2/+4
GIT_TEST_FAIL_PREREQS is a mechanism to skip test pieces with prerequisites to catch broken tests that depend on the side effects of optional pieces, but did not work at all when negative prerequisites were involved. * jk/fail-prereq-testfix: t: annotate !PTHREADS tests with !FAIL_PREREQS
2021-03-24Merge branch 'tb/geometric-repack'Junio C Hamano1-0/+135
"git repack" so far has been only capable of repacking everything under the sun into a single pack (or split by size). A cleverer strategy to reduce the cost of repacking a repository has been introduced. * tb/geometric-repack: builtin/pack-objects.c: ignore missing links with --stdin-packs builtin/repack.c: reword comment around pack-objects flags builtin/repack.c: be more conservative with unsigned overflows builtin/repack.c: assign pack split later t7703: test --geometric repack with loose objects builtin/repack.c: do not repack single packs with --geometric builtin/repack.c: add '--geometric' option packfile: add kept-pack cache for find_kept_pack_entry() builtin/pack-objects.c: rewrite honor-pack-keep logic p5303: measure time to repack with keep p5303: add missing &&-chains builtin/pack-objects.c: add '--stdin-packs' option revision: learn '--no-kept-objects' packfile: introduce 'find_kept_pack_entry()'
2021-03-19builtin/pack-objects.c: ignore missing links with --stdin-packsTaylor Blau1-0/+38
When 'git pack-objects --stdin-packs' encounters a commit in a pack, it marks it as a starting point of a best-effort reachability traversal that is used to populate the name-hash of the objects listed in the given packs. The traversal expects that it should be able to walk the ancestors of all commits in a pack without issue. Ordinarily this is the case, but it is possible to having missing parents from an unreachable part of the repository. In that case, we'd consider any missing objects in the unreachable portion of the graph to be junk. This should be handled gracefully: since the traversal is best-effort (i.e., we don't strictly need to fill in all of the name-hash fields), we should simply ignore any missing links. This patch does that (by setting the 'ignore_missing_links' bit on the rev_info struct), and ensures we don't regress in the future by adding a test which demonstrates this case. It is a little over-eager, since it will also ignore missing links in reachable parts of the packs (which would indicate a corrupted repository), but '--stdin-packs' is explicitly *not* about reachability. So this step isn't making anything worse for a repository which contains packs missing reachable objects (since we never drop objects with '--stdin-packs'). Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-18t: annotate !PTHREADS tests with !FAIL_PREREQSJeff King1-2/+4
Some tests in t5300 and t7810 expect us to complain about a "--threads" argument when Git is compiled without pthread support. Running these under GIT_TEST_FAIL_PREREQS produces a confusing failure: we pretend to the tests that there is no pthread support, so they expect the warning, but of course the actual build is perfectly happy to respect the --threads argument. We never noticed before the recent a926c4b904 (tests: remove most uses of C_LOCALE_OUTPUT, 2021-02-11), because the tests also were marked as requiring the C_LOCALE_OUTPUT prerequisite. Which means they'd never have run in FAIL_PREREQS mode, since it would always pretend that the locale prereq was not satisfied. These tests can't possibly work in this mode; it is a mismatch between what the tests expect and what the build was told to do. So let's just mark them to be skipped, using the special prereq introduced by dfe1a17df9 (tests: add a special setup where prerequisites fail, 2019-05-13). Reported-by: Son Luong Ngoc <sluongng@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-22builtin/pack-objects.c: add '--stdin-packs' optionTaylor Blau1-0/+97
In an upcoming commit, 'git repack' will want to create a pack comprised of all of the objects in some packs (the included packs) excluding any objects in some other packs (the excluded packs). This caller could iterate those packs themselves and feed the objects it finds to 'git pack-objects' directly over stdin, but this approach has a few downsides: - It requires every caller that wants to drive 'git pack-objects' in this way to implement pack iteration themselves. This forces the caller to think about details like what order objects are fed to pack-objects, which callers would likely rather not do. - If the set of objects in included packs is large, it requires sending a lot of data over a pipe, which is inefficient. - The caller is forced to keep track of the excluded objects, too, and make sure that it doesn't send any objects that appear in both included and excluded packs. But the biggest downside is the lack of a reachability traversal. Because the caller passes in a list of objects directly, those objects don't get a namehash assigned to them, which can have a negative impact on the delta selection process, causing 'git pack-objects' to fail to find good deltas even when they exist. The caller could formulate a reachability traversal themselves, but the only way to drive 'git pack-objects' in this way is to do a full traversal, and then remove objects in the excluded packs after the traversal is complete. This can be detrimental to callers who care about performance, especially in repositories with many objects. Introduce 'git pack-objects --stdin-packs' which remedies these four concerns. 'git pack-objects --stdin-packs' expects a list of pack names on stdin, where 'pack-xyz.pack' denotes that pack as included, and '^pack-xyz.pack' denotes it as excluded. The resulting pack includes all objects that are present in at least one included pack, and aren't present in any excluded pack. To address the delta selection problem, 'git pack-objects --stdin-packs' works as follows. First, it assembles a list of objects that it is going to pack, as above. Then, a reachability traversal is started, whose tips are any commits mentioned in included packs. Upon visiting an object, we find its corresponding object_entry in the to_pack list, and set its namehash parameter appropriately. To avoid the traversal visiting more objects than it needs to, the traversal is halted upon encountering an object which can be found in an excluded pack (by marking the excluded packs as kept in-core, and passing --no-kept-objects=in-core to the revision machinery). This can cause the traversal to halt early, for example if an object in an included pack is an ancestor of ones in excluded packs. But stopping early is OK, since filling in the namehash fields of objects in the to_pack list is only additive (i.e., having it helps the delta selection process, but leaving it blank doesn't impact the correctness of the resulting pack). Even still, it is unlikely that this hurts us much in practice, since the 'git repack --geometric' caller (which is introduced in a later commit) marks small packs as included, and large ones as excluded. During ordinary use, the small packs usually represent pushes after a large repack, and so are unlikely to be ancestors of objects that already exist in the repository. (I found it convenient while developing this patch to have 'git pack-objects' report the number of objects which were visited and got their namehash fields filled in during traversal. This is also included in the below patch via trace2 data lines). Suggested-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-10tests: remove most uses of C_LOCALE_OUTPUTÆvar Arnfjörð Bjarmason1-2/+2
As a follow-up to d162b25f956 (tests: remove support for GIT_TEST_GETTEXT_POISON, 2021-01-20) remove those uses of the now always true C_LOCALE_OUTPUT prerequisite from those tests which declare it as an argument to test_expect_{success,failure}. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-18promisor-remote: lazy-fetch objects in subprocessJonathan Tan1-1/+1
Teach Git to lazy-fetch missing objects in a subprocess instead of doing it in-process. This allows any fatal errors that occur during the fetch to be isolated and converted into an error return value, instead of causing the current command being run to terminate. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-11Merge branch 'bc/sha-256-part-3'Junio C Hamano1-2/+1
The final leg of SHA-256 transition. * bc/sha-256-part-3: (39 commits) t: remove test_oid_init in tests docs: add documentation for extensions.objectFormat ci: run tests with SHA-256 t: make SHA1 prerequisite depend on default hash t: allow testing different hash algorithms via environment t: add test_oid option to select hash algorithm repository: enable SHA-256 support by default setup: add support for reading extensions.objectformat bundle: add new version for use with SHA-256 builtin/verify-pack: implement an --object-format option http-fetch: set up git directory before parsing pack hashes t0410: mark test with SHA1 prerequisite t5308: make test work with SHA-256 t9700: make hash size independent t9500: ensure that algorithm info is preserved in config t9350: make hash size independent t9301: make hash size independent t9300: use $ZERO_OID instead of hard-coded object ID t9300: abstract away SHA-1-specific constants t8011: make hash size independent ...
2020-07-30t: remove test_oid_init in testsbrian m. carlson1-2/+1
Now that we call test_oid_init in the setup for all test scripts, there's no point in calling it individually. Remove all of the places where we've done so to help keep tests tidy. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-21pack-objects: prefetch objects to be packedJonathan Tan1-0/+36
When an object to be packed is noticed to be missing, prefetch all to-be-packed objects in one batch. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-19t5300: pass --object-format to git index-packbrian m. carlson1-4/+5
git index-pack by default reads the repository to determine the object format. However, when outside of a repository, it's necessary to specify the hash algorithm in use so that the pack can be properly indexed. Add an --object-format argument when invoking git index-pack outside of a repository. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-31pack-objects tests: don't leave test .git corrupt at endÆvar Arnfjörð Bjarmason1-17/+20
Change the pack-objects tests to not leave their .git directory corrupt and the end. In 2fca19fbb5 ("fix multiple issues with t5300", 2010-02-03) a comment was added warning against adding any subsequent tests, but since 4614043c8f ("index-pack: use streaming interface for collision test on large blobs", 2012-05-24) the comment has drifted away from the code, mentioning two test, when we actually have three. Instead of having this warning let's just create a new .git directory specifically for these tests. As an aside, it would be interesting to instrument the test suite to run a "git fsck" at the very end (in "test_done"). That would have errored before this change, and may find other issues #leftoverbits. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-31pack-objects test: modernize styleÆvar Arnfjörð Bjarmason1-15/+15
Modernize the quoting and indentation style of two tests added in 8685da4256 ("don't ever allow SHA1 collisions to exist by fetching a pack", 2007-03-20), and of a subsequent one added in 4614043c8f ("index-pack: use streaming interface for collision test on large blobs", 2012-05-24) which had copied the style of the first two. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-16t5000-t5999: fix broken &&-chainsEric Sunshine1-1/+1
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-13Merge branch 'jk/index-pack-maint'Junio C Hamano1-0/+6
"index-pack --strict" has been taught to make sure that it runs the final object integrity checks after making the freshly indexed packfile available to itself. * jk/index-pack-maint: index-pack: correct install_packed_git() args index-pack: handle --strict checks of non-repo packs prepare_commit_graft: treat non-repository as a noop