path: root/builtin (follow)
Age | Commit message | Author | Files | Lines
2025-02-21 | builtin/refs: add '--no-reflog' flag to drop reflogs | Karthik Nayak | 1 | -0/+3
The "git refs migrate" subcommand converts the backend used for ref storage. It always migrates reflog data as well as refs. Introduce an option to exclude reflogs from migration, allowing them to be discarded when they are unnecessary. This is particularly useful in server-side repositories, where reflogs are typically not expected. However, some repositories may still have them due to historical reasons, such as bugs, misconfigurations, or administrative decisions to enable reflogs for debugging. In such repositories, it would be optimal to drop reflogs during the migration. To address this, introduce the '--no-reflog' flag, which prevents reflog migration. When this flag is used, reflogs from the original reference backend are not migrated. Since only the new reference backend remains in the repository, all previous reflogs are permanently discarded. Helped-by: Junio C Hamano <gitster@pobox.com> Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18 | Merge branch 'da/difftool-sans-the-repository' | Junio C Hamano | 1 | -39/+56
"git difftool" code clean-up. * da/difftool-sans-the-repository: difftool: eliminate use of USE_THE_REPOSITORY_VARIABLE difftool: eliminate use of the_repository difftool: eliminate use of global variables
2025-02-18 | Merge branch 'jt/rev-list-missing-print-info' | Junio C Hamano | 1 | -17/+89
"git rev-list --missing=" learned to accept "print-info" that gives known details expected of the missing objects, like path and type. * jt/rev-list-missing-print-info: rev-list: extend print-info to print missing object type rev-list: add print-info action to print missing object path
2025-02-18 | Merge branch 'ds/backfill' | Junio C Hamano | 1 | -0/+147
Lazy-loading missing files in a blobless clone on demand is costly as it tends to be one-blob-at-a-time. "git backfill" is introduced to help bulk-download necessary files beforehand. * ds/backfill: backfill: assume --sparse when sparse-checkout is enabled backfill: add --sparse option backfill: add --min-batch-size=<n> option backfill: basic functionality and tests backfill: add builtin boilerplate
2025-02-18 | merge-tree: only use basic merge config | Phillip Wood | 1 | -1/+1
Commit 9c93ba4d0ae (merge-recursive: honor diff.algorithm, 2024-07-13) replaced init_merge_options() with init_basic_merge_config() for use in plumbing commands and init_ui_merge_config() for use in porcelain commands. As "git merge-tree" is a plumbing command it should call init_basic_merge_config() rather than init_ui_merge_config(). The merge ort machinery ignores "diff.algorithm" so the behavior is unchanged by this commit but it future proofs us against any future changes to init_ui_merge_config(). Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18 | merge-tree: remove redundant code | Phillip Wood | 1 | -5/+2
real_merge() only ever returns "0" or "1" as it dies if the merge status is less than zero. Therefore the check for "result < 0" is redundant and the result variable is not needed. The return value of real_merge() is ignored because exit status of "git merge-tree --stdin" is "0" for both successful and conflicted merges (the status of each merge is written to stdout). The return type of real_merge() is not changed as it is used for the program's exit status when "--stdin" is not given. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18 | merge-tree --stdin: flush stdout to avoid deadlock | Phillip Wood | 1 | -0/+2
If a process tries to read the output from "git merge-tree --stdin" before it closes merge-tree's stdin then it deadlocks. This happens because merge-tree does not flush its output before trying to read another line of input and means that it is not possible to cherry-pick a sequence of commits using "git merge-tree --stdin". Fix this by calling maybe_flush_or_die() before trying to read the next line of input.

Flushing the output after each merge does not seem to affect the performance; any difference is lost in the noise even after increasing the number of runs.

    $ git rev-list --merges --parents -n100 origin/master | sed 's/^[^ ]* //' >/tmp/merges
    $ hyperfine -L flush 0,1 --warmup 1 --runs 30 \
        'GIT_FLUSH={flush} ./git merge-tree --stdin </tmp/merges'

    Benchmark 1: GIT_FLUSH=0 ./git merge-tree --stdin </tmp/merges
      Time (mean ± σ):     546.6 ms ± 11.7 ms    [User: 503.2 ms, System: 40.9 ms]
      Range (min … max):   535.9 ms … 567.7 ms    30 runs

    Benchmark 2: GIT_FLUSH=1 ./git merge-tree --stdin </tmp/merges
      Time (mean ± σ):     546.9 ms ± 12.0 ms    [User: 505.9 ms, System: 38.9 ms]
      Range (min … max):   529.8 ms … 570.0 ms    30 runs

    Summary
      'GIT_FLUSH=0 ./git merge-tree --stdin </tmp/merges' ran
        1.00 ± 0.03 times faster than 'GIT_FLUSH=1 ./git merge-tree --stdin </tmp/merges'

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
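The request/reply pattern that this fix enables can be sketched in Python. This is only an illustration under stated assumptions: `CHILD` is a hypothetical stand-in for a line-oriented tool such as "git merge-tree --stdin" (it is not Git itself), and `converse()` is a made-up helper name. The point is that the child flushes after every reply, so the parent can read each answer before sending the next request.

```python
import subprocess
import sys

# Hypothetical stand-in for a line-oriented tool: it answers every input
# line with exactly one output line and flushes right away.
CHILD = r"""
import sys
for line in sys.stdin:
    sys.stdout.write("merged " + line)
    sys.stdout.flush()  # without this, the reply may sit in the child's buffer
"""


def converse(requests):
    """Send one request at a time and read its reply before sending the next."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    replies = []
    for request in requests:
        proc.stdin.write(request + "\n")
        proc.stdin.flush()  # the parent has to flush its side as well
        replies.append(proc.stdout.readline().rstrip("\n"))
    proc.stdin.close()  # end of input lets the child exit
    proc.wait()
    proc.stdout.close()
    return replies
```

If the child did not flush, the `readline()` call in the parent could block forever while the child waits for more input, which is the deadlock the commit message describes.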
2025-02-18 | version: extend get_uname_info() to hide system details | Usman Akinyemi | 1 | -1/+1
Currently, get_uname_info() function provides the full OS information. In a following commit, we will need it to provide only the OS name. Let's extend it to accept a "full" flag that makes it switch between providing full OS information and providing only the OS name. We may need to refactor this function in the future if an `osVersion.format` is added. Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18 | version: refactor get_uname_info() | Usman Akinyemi | 1 | -11/+2
Some code from "builtin/bugreport.c" uses uname(2) to get system information. Let's refactor this code into a new get_uname_info() function, so that we can reuse it in a following commit. Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-14 | Merge branch 'bf/fetch-set-head-fix' | Junio C Hamano | 1 | -13/+12
Fetching into a bare repository incorrectly assumed it always used a mirror layout when deciding to update remote-tracking HEAD, which has been corrected. * bf/fetch-set-head-fix: fetch set_head: fix non-mirror remotes in bare repositories fetch set_head: refactor to use remote directly
2025-02-14 | Merge branch 'tc/clone-single-revision' | Junio C Hamano | 2 | -157/+200
"git clone" learned to make a shallow clone for a single commit that is not necessarily at the tip of any branch. * tc/clone-single-revision: builtin/clone: teach git-clone(1) the --revision= option parse-options: introduce die_for_incompatible_opt2() clone: introduce struct clone_opts in builtin/clone.c clone: add tags refspec earlier to fetch refspec clone: refactor wanted_peer_refs() clone: make it possible to specify --tags clone: cut down on global variables in clone.c
2025-02-12 | Merge branch 'ms/refspec-cleanup' | Junio C Hamano | 2 | -2/+2
Code clean-up. cf. <Z6G-toOJjMmK8iJG@pks.im> * ms/refspec-cleanup: refspec: relocate apply_refspecs and related funtions refspec: relocate matching related functions remote: rename query_refspecs functions refspec: relocate refname_matches_negative_refspec_item remote: rename function omit_name_by_refspec
2025-02-12 | Merge branch 'zh/gc-expire-to' | Junio C Hamano | 1 | -2/+7
"git gc" learned the "--expire-to" option and passes it down to underlying "git repack". * zh/gc-expire-to: gc: add `--expire-to` option
2025-02-12 | Merge branch 'ps/repack-keep-unreachable-in-unpacked-repo' | Junio C Hamano | 1 | -1/+4
"git repack --keep-unreachable" to send unreachable objects to the main pack "git repack -ad" produces did not work when there are no existing packs, which has been corrected. * ps/repack-keep-unreachable-in-unpacked-repo: builtin/repack: fix `--keep-unreachable` when there are no packs
2025-02-12 | Merge branch 'ds/name-hash-tweaks' | Junio C Hamano | 2 | -6/+66
"git pack-objects" and its wrapper "git repack" learned an option to use an alternative path-hash function to improve delta-base selection to produce a packfile with deeper history than window size. * ds/name-hash-tweaks: pack-objects: prevent name hash version change test-tool: add helper for name-hash values p5313: add size comparison test pack-objects: add GIT_TEST_NAME_HASH_VERSION repack: add --name-hash-version option pack-objects: add --name-hash-version option pack-objects: create new name-hash function version
2025-02-10 | builtin/update-server-info: remove the_repository global variable | Usman Akinyemi | 1 | -4/+4
Remove the_repository global variable in favor of the repository argument that gets passed in "builtin/update-server-info.c". When `-h` is passed to the command outside a Git repository, `run_builtin()` will call the `cmd_update_server_info()` function with `repo` set to NULL and then early in the function, the "parse_options()" call will give the options help and exit, without having to consult much of the configuration file. So it is safe to omit reading the config when the `repo` argument the caller gave us is NULL. Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-10 | Merge branch 'ps/hash-cleanup' | Junio C Hamano | 5 | -41/+39
Further code clean-up on the use of hash functions. Now the context object knows what hash function it is working with. * ps/hash-cleanup: global: adapt callers to use generic hash context helpers hash: provide generic wrappers to update hash contexts hash: stop typedeffing the hash context hash: convert hashing context to a structure
2025-02-07 | path: drop `git_common_path()` in favor of `repo_common_path()` | Patrick Steinhardt | 1 | -4/+12
Remove `git_common_path()` in favor of the `repo_common_path()` family of functions, which makes the implicit dependency on `the_repository` go away. Note that `git_common_path()` used to return a string allocated via `get_pathname()`, which uses a rotating set of statically allocated buffers. Consequently, callers didn't have to free the returned string. The same isn't true for `repo_common_path()`, so we also have to add logic to free the returned strings. This refactoring also allows us to remove `repo_common_pathv()` from the public interface. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-07 | worktree: return allocated string from `get_worktree_git_dir()` | Patrick Steinhardt | 3 | -5/+17
The `get_worktree_git_dir()` function returns a string constant that does not need to be free'd by the caller. This string is computed for three different cases: - If we don't have a worktree we return a path into the Git directory. The returned string is owned by `the_repository`, so there is no need for the caller to free it. - If we have a worktree, but no worktree ID then the caller requests the main worktree. In this case we return a path into the common directory, which again is owned by `the_repository` and thus does not need to be free'd. - In the third case, where we have an actual worktree, we compute the path relative to "$GIT_COMMON_DIR/worktrees/". This string does not need to be released either, even though `git_common_path()` ends up allocating memory. But this doesn't result in a memory leak either because we write into a buffer returned by `get_pathname()`, which returns one out of four static buffers. We're about to drop `git_common_path()` in favor of `repo_common_path()`, which doesn't use the same mechanism but instead returns an allocated string owned by the caller. While we could adapt `get_worktree_git_dir()` to also use `get_pathname()` and print the derived common path into that buffer, the whole schema feels a lot like premature optimization in this context. There are some callsites where we call `get_worktree_git_dir()` in a loop that iterates through all worktrees. But none of these loops seem to be even remotely in the hot path, so saving a single allocation there does not feel worth it. Refactor the function to instead consistently return an allocated path so that we can start using `repo_common_path()` in a subsequent commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
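The ownership difference described above can be illustrated with a small Python model. It is only a sketch: `RotatingBuffers`, `static_path()` and `allocated_path()` are hypothetical names, and the `.git/worktrees/` prefix is chosen purely for illustration. The pool of four buffers mirrors the four static buffers mentioned for `get_pathname()`.

```python
class RotatingBuffers:
    """Hand out one of a fixed number of reusable buffers in round-robin order."""

    def __init__(self, count=4):
        self._buffers = [bytearray() for _ in range(count)]
        self._next = 0

    def get(self):
        buf = self._buffers[self._next]
        self._next = (self._next + 1) % len(self._buffers)
        del buf[:]  # reuse: whatever was stored here before is discarded
        return buf


_pool = RotatingBuffers()


def static_path(name):
    """Return a path held in a shared buffer; nothing to free, but short-lived."""
    buf = _pool.get()
    buf += b".git/worktrees/" + name.encode()
    return buf


def allocated_path(name):
    """Return a fresh object that belongs to the caller alone."""
    return b".git/worktrees/" + name.encode()
```

A value from `static_path()` is silently overwritten once four more paths have been requested, whereas a value from `allocated_path()` stays valid until the caller drops it. In C the second style means the caller has to free the string, which is the trade-off the commit accepts.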
2025-02-07 | path: drop `git_path_buf()` in favor of `repo_git_path_replace()` | Patrick Steinhardt | 1 | -1/+1
Remove `git_path_buf()` in favor of `repo_git_path_replace()`. The latter does essentially the same, with the only exception that it does not rely on `the_repository` but takes the repo as separate parameter. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-07 | path: drop `git_pathdup()` in favor of `repo_git_path()` | Patrick Steinhardt | 10 | -16/+16
Remove `git_pathdup()` in favor of `repo_git_path()`. The latter does essentially the same, with the only exception that it does not rely on `the_repository` but takes the repo as separate parameter. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-07 | path: refactor `repo_submodule_path()` family of functions | Patrick Steinhardt | 1 | -1/+1
As explained in an earlier commit, we're refactoring path-related functions to provide a consistent interface for computing paths into the commondir, gitdir and worktree. Refactor the "submodule" family of functions accordingly. Note that in contrast to the other `repo_*_path()` families, we have to pass in the repository as a non-constant pointer. This is because we end up calling `repo_read_gitmodules()` deep down in the callstack, which may end up modifying the repository. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-07 | submodule: refactor `submodule_to_gitdir()` to accept a repo | Patrick Steinhardt | 1 | -1/+1
The `submodule_to_gitdir()` function implicitly uses `the_repository` to resolve submodule paths. Refactor the function to instead accept a repo as parameter to remove the dependency on global state. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-06 | difftool: eliminate use of USE_THE_REPOSITORY_VARIABLE | David Aguilar | 1 | -2/+0
Remove the USE_THE_REPOSITORY_VARIABLE #define now that all state is passed to each function from callers. Signed-off-by: David Aguilar <davvid@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-06 | difftool: eliminate use of the_repository | David Aguilar | 1 | -25/+29
Make callers pass a repository struct into each function instead of relying on the global the_repository variable. Signed-off-by: David Aguilar <davvid@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-06 | difftool: eliminate use of global variables | David Aguilar | 1 | -18/+33
Move difftool's global variables into a difftools_option struct in preparation for removal of USE_THE_REPOSITORY_VARIABLE. Signed-off-by: David Aguilar <davvid@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-06 | builtin/clone: teach git-clone(1) the --revision= option | Toon Claes | 1 | -11/+46
The git-clone(1) command has the option `--branch` that allows the user to select the branch they want HEAD to point to. In a non-bare repository this also checks out that branch. Option `--branch` also accepts a tag. When a tag name is provided, the commit this tag points to is checked out and HEAD is detached. Thus `--branch` can be used to clone a repository and check out a ref kept under `refs/heads` or `refs/tags`. But some other refs might be in use as well. For example Git forges might use refs like `refs/pull/<id>` and `refs/merge-requests/<id>` to track pull/merge requests. These refs cannot be selected upon git-clone(1). Add option `--revision` to git-clone(1). This option accepts a fully qualified reference, or a hexadecimal commit ID. This enables the user to clone and check out any revision they want. `--revision` can be used in conjunction with `--depth` to do a minimal clone that only contains the blob and tree for a single revision. This can be useful for automated tests running in CI systems. Using option `--branch` and `--single-branch` together is a similar scenario, but serves a different purpose. Using these two options, a single remote-tracking branch is created and the fetch refspec is set up so git-fetch(1) will receive updates on that branch from the remote. This allows the user to work on that single branch. Option `--revision`, on the contrary, detaches HEAD, creates no tracking branches, and writes no fetch refspec. Signed-off-by: Toon Claes <toon@iotcl.com> Acked-by: Patrick Steinhardt <ps@pks.im> [jc: removed unnecessary TEST_PASSES_SANITIZE_LEAK from the test] Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-06 | parse-options: introduce die_for_incompatible_opt2() | Toon Claes | 1 | -3/+4
The functions die_for_incompatible_opt3() and die_for_incompatible_opt4() already exist to die whenever a user specifies three or four options respectively that are not compatible. Introduce die_for_incompatible_opt2() which dies when two options that are incompatible are set. Signed-off-by: Toon Claes <toon@iotcl.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
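The idea behind such a helper can be modelled in a few lines of Python. This is a sketch, not the C implementation: the parameter layout (a flag followed by its name, twice) and the wording of the error message are assumptions, and `--alpha` and `--beta` in the usage note are made-up option names.

```python
import sys


def die_for_incompatible_opt2(opt1, opt1_name, opt2, opt2_name):
    """Exit with an error if both of two mutually exclusive options are set."""
    if opt1 and opt2:
        # Assumed message wording, for illustration only.
        sys.exit(
            f"fatal: options '{opt1_name}' and '{opt2_name}' "
            "cannot be used together"
        )
```

Calling `die_for_incompatible_opt2(True, "--alpha", False, "--beta")` returns normally, while passing `True` for both flags ends the program with the error message. A dedicated two-option helper avoids padding a three-option call with a dummy argument.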
2025-02-06 | clone: introduce struct clone_opts in builtin/clone.c | Toon Claes | 1 | -15/+29
There is a lot of state stored in global variables in builtin/clone.c. In the long run we'd like to remove many of those. Introduce `struct clone_opts` in this file. This struct will be used to contain all details needed to perform the clone. The struct object can be thrown around to all the functions that need these details. The first field we're adding is `wants_head`. In some scenarios (specifically when both `--single-branch` and `--branch` are given) we are not interested in `HEAD` on the remote. The field `wants_head` in `struct clone_opts` will hold this information. We could have put `option_branch` and `option_single_branch` into that struct instead, but in a following commit we'll be using `wants_head` as well. Signed-off-by: Toon Claes <toon@iotcl.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-06 | clone: add tags refspec earlier to fetch refspec | Toon Claes | 1 | -16/+11
In clone.c we call refspec_ref_prefixes() to copy the fetch refspecs from the `remote->fetch` refspec into `ref_prefixes` of `transport_ls_refs_options`. Afterwards we add the tags prefix `refs/tags/` as well. At a later point, in wanted_peer_refs() we process refs using both `remote->fetch` and `TAG_REFSPEC`. Simplify the code by appending `TAG_REFSPEC` to `remote->fetch` before calling refspec_ref_prefixes(). To be able to do this, we set `option_tags` to 0 when --mirror is given. This is because --mirror mirrors (hence the name) all the refs, including tags and they do not need to be treated separately. Signed-off-by: Toon Claes <toon@iotcl.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-06 | clone: refactor wanted_peer_refs() | Toon Claes | 1 | -24/+15
The function wanted_peer_refs() is used to map the refs returned by the server to refs we will save in our clone. Over time this function has grown to be very complex. Refactor it. Previously, there was a separate code path for when `option_single_branch` was set. It resulted in duplicated code and deeper nested conditions. After this refactor the code path for when `option_single_branch` is truthy modifies `refs` and then falls through to the common code path. This approach relies on the `refspec` being set correctly and thus only mapping refs that are relevant. Signed-off-by: Toon Claes <toon@iotcl.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-06 | clone: make it possible to specify --tags | Toon Claes | 1 | -7/+7
Option --no-tags was added in 0dab2468ee (clone: add a --no-tags option to clone without tags, 2017-04-26). At the time there was no need to support --tags as well, although there was some conversation about it[1]. To simplify the code and to prepare for future commits, invert the flag internally. Functionally there is no change: because the flag is enabled by default, passing `--tags` has no effect, so there's no need to add tests for this. [1]: https://lore.kernel.org/git/CAGZ79kbHuMpiavJ90kQLEL_AR0BEyArcZoEWAjPPhOFacN16YQ@mail.gmail.com/ Signed-off-by: Toon Claes <toon@iotcl.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-06 | clone: cut down on global variables in clone.c | Toon Claes | 1 | -94/+101
In clone.c the `struct option` which is used to parse the input options for git-clone(1) is a global variable. Due to this, many variables that are used to parse the value into, are also global. Make `builtin_clone_options` a local variable in cmd_clone() and carry along all variables that are only used in that function. Signed-off-by: Toon Claes <toon@iotcl.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-05 | rev-list: extend print-info to print missing object type | Justin Tobler | 1 | -3/+8
Additional information about missing objects found in git-rev-list(1) can be printed by specifying the `print-info` missing action for the `--missing` option. Extend this action to also print missing object type information inferred from its containing object. This token follows the form `type=<type>` and specifies the expected object type of the missing object. Signed-off-by: Justin Tobler <jltobler@gmail.com> Acked-by: Christian Couder <christian.couder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-05 | rev-list: add print-info action to print missing object path | Justin Tobler | 1 | -17/+84
Missing objects identified through git-rev-list(1) can be printed by setting the `--missing=print` option. Additional information about the missing object, such as its path and type, may be present in its containing object. Add the `print-info` missing action for the `--missing` option that, when set, prints additional insight about the missing object inferred from its containing object. Each line of output for a missing object is in the form: `?<oid> [<token>=<value>]...`. The `<token>=<value>` pairs containing additional information are separated from each other by a SP. The value is encoded in a token specific fashion, but SP or LF contained in value are always expected to be represented in such a way that the resulting encoded value does not have either of these two problematic bytes. This format is kept generic so it can be extended in the future to support additional information. For now, only a missing object path info is implemented. It follows the form `path=<path>` and specifies the full path to the object from the top-level tree. A path containing SP or special characters is enclosed in double-quotes in the C style as needed. In a subsequent commit, missing object type info will also be added. Signed-off-by: Justin Tobler <jltobler@gmail.com> Acked-by: Christian Couder <christian.couder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
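A consumer of this output format could split each line as sketched below. The function name `parse_missing_line()` is hypothetical, and the sketch relies on the stated rule that encoded values never contain SP or LF. It deliberately leaves values undecoded: C-style quoted paths are returned exactly as printed, and unquoting them is out of scope here.

```python
def parse_missing_line(line):
    """Parse one "?<oid> [<token>=<value>]..." line into (oid, info dict)."""
    line = line.rstrip("\n")
    if not line.startswith("?"):
        raise ValueError(f"not a missing-object line: {line!r}")
    fields = line[1:].split(" ")
    oid = fields[0]
    info = {}
    for field in fields[1:]:
        # Split at the first "=" only, so values may contain "=" themselves.
        token, sep, value = field.partition("=")
        if not sep:
            raise ValueError(f"malformed token in line: {line!r}")
        info[token] = value
    return oid, info
```

Because unknown tokens simply end up in the dictionary, a parser written this way keeps working if further `<token>=<value>` pairs are added later, which is the extensibility the generic format is meant to allow.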
2025-02-04 | builtin/repack: fix `--keep-unreachable` when there are no packs | Patrick Steinhardt | 1 | -1/+4
The "--keep-unreachable" flag is supposed to append any unreachable objects to the newly written pack. This flag is explicitly documented as appending both packed and loose unreachable objects to the new packfile. And while this works alright when repacking with preexisting packfiles, it stops working when the repository does not have any packfiles at all.

The root cause is the set of conditions used to decide whether or not we want to append "--pack-loose-unreachable" to git-pack-objects(1). There are a couple of conditions here:

  - `has_existing_non_kept_packs()` checks whether there are existing packfiles. This condition makes sense to guard "--keep-pack=", "--unpack-unreachable" and "--keep-unreachable", because all of these flags only make sense in combination with existing packfiles. But it does not make sense to disable `--pack-loose-unreachable` when there aren't any preexisting packfiles, as loose objects can be packed into the new packfile regardless of that.

  - `delete_redundant` checks whether we want to delete any objects or packs that are about to become redundant. The documentation of `--keep-unreachable` explicitly says that `git repack -ad` needs to be executed for the flag to have an effect. It is not immediately obvious why such redundant objects need to be deleted in order for "--pack-unreachable-objects" to be effective. But as things are working as documented this is nothing we'll change for now.

  - `pack_everything & PACK_CRUFT` checks that we're not creating a cruft pack. This condition makes sense in the context of "--pack-loose-unreachable", as unreachable objects would end up in the cruft pack anyway.

So while the second and third condition are sensible, it does not make any sense to condition `--pack-loose-unreachable` on the existence of packfiles. Fix the bug by splitting out the "--pack-loose-unreachable" and only making it depend on the second and third condition. Like this, loose unreachable objects will be packed regardless of any preexisting packfiles.

Signed-off-by: Patrick Steinhardt <ps@pks.im> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
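The change in the guarding conditions can be summarised as two boolean predicates. This is a simplified Python model and not the code in builtin/repack.c: the function names are hypothetical, the bit value given to `PACK_CRUFT` is an assumption, and the sketch assumes the flag is only considered when `--keep-unreachable` was requested.

```python
PACK_CRUFT = 1 << 0  # assumed bit value, for illustration only


def wants_pack_loose_unreachable_before(
    keep_unreachable, has_existing_non_kept_packs, delete_redundant, pack_everything
):
    """Old behaviour: all three conditions had to hold."""
    return bool(
        keep_unreachable
        and has_existing_non_kept_packs
        and delete_redundant
        and not (pack_everything & PACK_CRUFT)
    )


def wants_pack_loose_unreachable_after(
    keep_unreachable, delete_redundant, pack_everything
):
    """New behaviour: the existence of packfiles no longer matters."""
    return bool(
        keep_unreachable
        and delete_redundant
        and not (pack_everything & PACK_CRUFT)
    )
```

For a repository without any packfiles, the old predicate is false even with `-ad --keep-unreachable`, while the new one is true, so loose unreachable objects are packed as the documentation promises.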
2025-02-04 | remote: rename query_refspecs functions | Meet Soni | 1 | -1/+1
Rename functions related to handling refspecs in preparation for their move from `remote.c` to `refspec.c`. Update their names to better reflect their intent: - `query_refspecs()` -> `refspec_find_match()` for clarity, as it finds a single matching refspec. - `query_refspecs_multiple()` -> `refspec_find_all_matches()` to better reflect that it collects all matching refspecs instead of returning just the first match. - `query_matches_negative_refspec()` -> `refspec_find_negative_match()` for consistency with the updated naming convention, even though this static function didn't strictly require renaming. Signed-off-by: Meet Soni <meetsoni3017@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-04 | remote: rename function omit_name_by_refspec | Meet Soni | 1 | -1/+1
Rename the function `omit_name_by_refspec()` to `refname_matches_negative_refspec_item()` to provide clearer intent. The previous function name was vague and did not accurately describe its purpose. By using `refname_matches_negative_refspec_item`, make the function's purpose more intuitive, clarifying that it checks if a reference name matches any negative refspec. Rename function parameters for consistency with existing naming conventions. Use `refname` instead of `name` to align with terminology in `refs.h`. Remove the redundant doc comment since the function name is now self-explanatory. Signed-off-by: Meet Soni <meetsoni3017@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03 | backfill: assume --sparse when sparse-checkout is enabled | Derrick Stolee | 1 | -0/+7
The previous change introduced the '--[no-]sparse' option for the 'git backfill' command, but did not assume it as enabled by default. However, this is likely the behavior that users will most often want to happen. Without this default, users with a small sparse-checkout may be confused when 'git backfill' downloads every version of every object in the full history. However, this is left as a separate change so this decision can be reviewed independently of the value of the '--[no-]sparse' option. Add a test of adding the '--sparse' option to a repo without sparse-checkout to make it clear that supplying it without a sparse-checkout is an error. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03 | backfill: add --sparse option | Derrick Stolee | 1 | -1/+14
One way to significantly reduce the cost of a Git clone and later fetches is to use a blobless partial clone and combine that with a sparse-checkout that reduces the paths that need to be populated in the working directory. Not only does this reduce the cost of clones and fetches, the sparse-checkout reduces the number of objects needed to download from a promisor remote.

However, history investigations can be expensive as computing blob diffs will trigger promisor remote requests for one object at a time. This can be avoided by downloading the blobs needed for the given sparse-checkout using 'git backfill' and its new '--sparse' mode, at a time that the user is willing to pay that extra cost.

Note that this is distinctly different from the '--filter=sparse:<oid>' option, as this assumes that the partial clone has all reachable trees and we are using client-side logic to avoid downloading blobs outside of the sparse-checkout cone. This avoids the server-side cost of walking trees while also achieving a similar goal. It also downloads in batches based on similar path names, presenting a resumable download if things are interrupted.

This augments the path-walk API to have a possibly-NULL 'pl' member that may point to a 'struct pattern_list'. This could be more general than the sparse-checkout definition at HEAD, but 'git backfill --sparse' is currently the only consumer.

Be sure to test this in both cone mode and not cone mode. Cone mode has the benefit that the path-walk can skip certain paths once they would expand beyond the sparse-checkout. Non-cone mode can describe the included files using both positive and negative patterns, which changes the possible return values of path_matches_pattern_list(). Test both kinds of matches for increased coverage.

To test this, we can create a blobless sparse clone, expand the sparse-checkout slightly, and then run 'git backfill --sparse' to see how much data is downloaded. The general steps are

 1. git clone --filter=blob:none --sparse <url>
 2. git sparse-checkout set <dir1> ... <dirN>
 3. git backfill --sparse

For the Git repository with the 'builtin' directory in the sparse-checkout, we get these results for various batch sizes:

| Batch Size      | Pack Count | Pack Size | Time  |
|-----------------|------------|-----------|-------|
| (Initial clone) | 3          | 110 MB    |       |
| 10K             | 12         | 192 MB    | 17.2s |
| 15K             | 9          | 192 MB    | 15.5s |
| 20K             | 8          | 192 MB    | 15.5s |
| 25K             | 7          | 192 MB    | 14.7s |

This case matters less because a full clone of the Git repository from GitHub is currently at 277 MB.

Using a copy of the Linux repository with the 'kernel/' directory in the sparse-checkout, we get these results:

| Batch Size      | Pack Count | Pack Size | Time |
|-----------------|------------|-----------|------|
| (Initial clone) | 2          | 1,876 MB  |      |
| 10K             | 11         | 2,187 MB  | 46s  |
| 25K             | 7          | 2,188 MB  | 43s  |
| 50K             | 5          | 2,194 MB  | 44s  |
| 100K            | 4          | 2,194 MB  | 48s  |

This case is more meaningful because a full clone of the Linux repository is currently over 6 GB, so this is a valuable way to download a fraction of the repository and no longer need network access for all reachable objects within the sparse-checkout.

Choosing a batch size will depend on a lot of factors, including the user's network speed or reliability, the repository's file structure, and how many versions there are of the file within the sparse-checkout scope. There will not be a one-size-fits-all solution.

Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03  backfill: add --min-batch-size=<n> option  Derrick Stolee  1  -1/+3
Users may want to specify a minimum batch size for their needs. This is only a minimum: the path-walk API provides a list of OIDs that correspond to the same path, and thus it is optimal to allow delta compression across those objects in a single server request.

We could consider limiting the request to have a maximum batch size in the future. For now, we let the path-walk API batches determine the boundaries.

To get a feeling for the value of specifying the --min-batch-size parameter, I tested a number of open source repositories available on GitHub. The procedure was generally:

1. git clone --filter=blob:none <url>
2. git backfill

Checking the number of packfiles and the size of the .git/objects/pack directory helps to identify the effects of different batch sizes.

For the Git repository, we get these results:

| Batch Size      | Pack Count | Pack Size | Time  |
|-----------------|------------|-----------|-------|
| (Initial clone) | 2          | 119 MB    |       |
| 25K             | 8          | 290 MB    | 24s   |
| 50K             | 5          | 290 MB    | 24s   |
| 100K            | 4          | 290 MB    | 29s   |

Other than the packfile counts decreasing as we need fewer batches, the size and time required do not change much for this small example.

For the nodejs/node repository, we see these results:

| Batch Size      | Pack Count | Pack Size | Time   |
|-----------------|------------|-----------|--------|
| (Initial clone) | 2          | 330 MB    |        |
| 25K             | 19         | 1,222 MB  | 1m 22s |
| 50K             | 11         | 1,221 MB  | 1m 24s |
| 100K            | 7          | 1,223 MB  | 1m 40s |
| 250K            | 4          | 1,224 MB  | 2m 23s |
| 500K            | 3          | 1,216 MB  | 4m 38s |

Here, we don't have much difference in the size of the repo, though the 500K batch size is a few MB smaller. That comes at the cost of a much longer time. This extra time is due to server-side delta compression happening as the on-disk deltas don't appear to be reusable all the time. But for smaller batch sizes, the server is able to find reasonable deltas partly because we are asking for objects that appear in the same region of the directory tree and include all versions of a file at a specific path.

To contrast this example, I tested the microsoft/fluentui repo, which has been known to have inefficient packing due to name hash collisions. These results were collected before GitHub had the opportunity to repack the server with more advanced name hash versions:

| Batch Size      | Pack Count | Pack Size | Time   |
|-----------------|------------|-----------|--------|
| (Initial clone) | 2          | 105 MB    |        |
| 5K              | 53         | 348 MB    | 2m 26s |
| 10K             | 28         | 365 MB    | 2m 22s |
| 15K             | 19         | 407 MB    | 2m 21s |
| 20K             | 15         | 393 MB    | 2m 28s |
| 25K             | 13         | 417 MB    | 2m 06s |
| 50K             | 8          | 509 MB    | 1m 34s |
| 100K            | 5          | 535 MB    | 1m 56s |
| 250K            | 4          | 698 MB    | 1m 33s |
| 500K            | 3          | 696 MB    | 1m 42s |

Here, a larger variety of batch sizes was chosen because of the great variation in results. By asking the server for small batches corresponding to fewer paths at a time, the server is able to provide better compression for these batches than it would for a regular clone. A typical full clone for this repository would require 738 MB. This example justifies the choice to batch requests by path name, leading to improved communication with a server that is not optimally packed.

Finally, the same experiment for the Linux repository had these results:

| Batch Size      | Pack Count | Pack Size | Time    |
|-----------------|------------|-----------|---------|
| (Initial clone) | 2          | 2,153 MB  |         |
| 25K             | 63         | 6,380 MB  | 14m 08s |
| 50K             | 58         | 6,126 MB  | 15m 11s |
| 100K            | 30         | 6,135 MB  | 18m 11s |
| 250K            | 14         | 6,146 MB  | 18m 22s |
| 500K            | 8          | 6,143 MB  | 33m 29s |

Even in this example, where the default name hash algorithm leads to decent compression of the Linux kernel repository, there is value in selecting a smaller batch size, up to a point. The 25K batch size has the fastest time, but uses 250 MB more than the 50K batch size. The 500K batch size took much more time due to server compression time and thus we should avoid large batch sizes like this.

Based on these experiments, a batch size of 50,000 was chosen as the default value.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
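To make the size and time trade-off in the Linux results above easier to compare, here is a small Python sketch. The figures are copied from the Linux table in this message; the names `LINUX_RESULTS`, `parse_size`, `parse_time` and `compare` are hypothetical and exist only for this illustration.

```python
from typing import Dict, Tuple

# (pack size, time) per batch size, copied from the Linux table above.
LINUX_RESULTS: Dict[str, Tuple[str, str]] = {
    "25K": ("6,380 MB", "14m 08s"),
    "50K": ("6,126 MB", "15m 11s"),
    "100K": ("6,135 MB", "18m 11s"),
    "250K": ("6,146 MB", "18m 22s"),
    "500K": ("6,143 MB", "33m 29s"),
}


def parse_size(text: str) -> int:
    """Convert a string such as '6,380 MB' to an integer number of MB."""
    return int(text.split()[0].replace(",", ""))


def parse_time(text: str) -> int:
    """Convert a string such as '14m 08s' to seconds."""
    minutes, seconds = text.split()
    return int(minutes.rstrip("m")) * 60 + int(seconds.rstrip("s"))


def compare(a: str, b: str) -> Tuple[int, int]:
    """Return (extra MB, extra seconds) of batch size a relative to b."""
    size_a, time_a = LINUX_RESULTS[a]
    size_b, time_b = LINUX_RESULTS[b]
    return (
        parse_size(size_a) - parse_size(size_b),
        parse_time(time_a) - parse_time(time_b),
    )


if __name__ == "__main__":
    print(compare("25K", "50K"))
    print(compare("500K", "50K"))
```

Under these numbers, 25K costs 254 MB more than 50K while finishing about a minute sooner, and 500K adds more than 18 minutes over 50K for a 17 MB difference in pack size.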
2025-02-03  backfill: basic functionality and tests  Derrick Stolee  1  -3/+99
The default behavior of 'git backfill' is to fetch all missing blobs that are reachable from HEAD. Document and test this behavior.

The implementation is a very simple use of the path-walk API, initializing the revision walk at HEAD to start the path-walk from all commits reachable from HEAD. Ignore the object arrays that correspond to tree entries, assuming that they are all present already.

The path-walk API provides lists of objects in batches according to a common path, but that list could be very small. We want to balance the number of requests to the server with the ability to have the process interrupted with minimal repeated work to catch up in the next run. Based on some experiments (detailed in the next change), a minimum batch size of 50,000 is selected for the default.

This batch size is a _minimum_. As the path-walk API emits lists of blob IDs, they are collected into a list of objects for a request to the server. When that list is at least the minimum batch size, the request is sent to the server for the new objects. However, the list of blob IDs from the path-walk API could be much longer than the batch size. At this moment, it is unclear if there is a benefit to splitting the list when there are too many objects at the same path.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
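The minimum-batch accumulation described above can be sketched as follows. This is a minimal Python model, not the C implementation of the builtin: the names `batch_requests`, `path_batches` and the `fetch` callback are hypothetical, and the sketch assumes each incoming list already contains only the missing blob IDs for one path.

```python
from typing import Callable, Iterable, List


def batch_requests(
    path_batches: Iterable[List[str]],
    fetch: Callable[[List[str]], None],
    min_batch_size: int = 50_000,
) -> None:
    """Group per-path lists of blob IDs into server requests.

    Lists are never split: a request is sent as soon as the pending
    list holds at least min_batch_size IDs, so one request can exceed
    the minimum when a single path contributes many IDs.
    """
    pending: List[str] = []
    for blob_ids in path_batches:
        pending.extend(blob_ids)
        if len(pending) >= min_batch_size:
            fetch(pending)
            pending = []
    # Send whatever is left once the walk finishes.
    if pending:
        fetch(pending)


if __name__ == "__main__":
    request_sizes: List[int] = []
    demo = [["a"] * 2, ["b"] * 1, ["c"] * 5, ["d"] * 1]
    batch_requests(demo, lambda ids: request_sizes.append(len(ids)), min_batch_size=3)
    print(request_sizes)
```

Because a path's list is appended whole before the size check, an interrupted run loses at most the pending list, and every request keeps all versions of a path together, which is the property the message relies on for delta compression.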
2025-02-03  backfill: add builtin boilerplate  Derrick Stolee  1  -0/+28
In anticipation of implementing 'git backfill', populate the necessary files with the boilerplate of a new builtin. Mark the builtin as experimental at this time, allowing breaking changes in the near future, if necessary.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03  Merge branch 'kn/pack-write-with-reduced-globals'  Junio C Hamano  4  -16/+20
Code clean-up.

* kn/pack-write-with-reduced-globals:
  pack-write: pass hash_algo to internal functions
  pack-write: pass hash_algo to `write_rev_file()`
  pack-write: pass hash_algo to `write_idx_file()`
  pack-write: pass repository to `index_pack_lockfile()`
  pack-write: pass hash_algo to `fixup_pack_header_footer()`
2025-02-03  Merge branch 'ps/3.0-remote-deprecation'  Junio C Hamano  1  -0/+2
Following the procedure we established to introduce breaking changes for Git 3.0, allow an early opt-in for removing support of $GIT_DIR/branches/ and $GIT_DIR/remotes/ directories to configure remotes.

* ps/3.0-remote-deprecation:
  remote: announce removal of "branches/" and "remotes/"
  builtin/pack-redundant: remove subcommand with breaking changes
  ci: repurpose "linux-gcc" job for deprecations
  ci: merge linux-gcc-default into linux-gcc
  Makefile: wire up build option for deprecated features
2025-02-03  Merge branch 'tb/unsafe-hash-cleanup'  Junio C Hamano  1  -1/+1
The API around choosing to use the unsafe variant of the SHA-1 implementation has been updated in an attempt to make it harder to abuse.

* tb/unsafe-hash-cleanup:
  hash.h: drop unsafe_ function variants
  csum-file: introduce hashfile_checkpoint_init()
  t/helper/test-hash.c: use unsafe_hash_algo()
  csum-file.c: use unsafe_hash_algo()
  hash.h: introduce `unsafe_hash_algo()`
  csum-file.c: extract algop from hashfile_checksum_valid()
  csum-file: store the hash algorithm as a struct field
  t/helper/test-tool: implement sha1-unsafe helper
2025-01-31  global: adapt callers to use generic hash context helpers  Patrick Steinhardt  5  -32/+30
Adapt callers to use generic hash context helpers instead of going through the hash algorithm to update the context. This makes the callsites easier to reason about and removes the possibility that the wrong hash algorithm is used to update the hash context's state. As a nice side effect, this also gets rid of a bunch of users of `the_hash_algo`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
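The idea behind this change can be illustrated with a small model: once the context records which algorithm initialized it, callers no longer name the algorithm on each update, so they cannot pick the wrong one. The sketch below uses Python's `hashlib` rather than Git's C API; the `HashCtx` class, its methods and `hash_buffer` are hypothetical stand-ins, not the names of Git's helpers.

```python
import hashlib
from typing import List


class HashCtx:
    """Hypothetical hash context that remembers its own algorithm."""

    def __init__(self, algo_name: str) -> None:
        self.algo_name = algo_name
        self._state = hashlib.new(algo_name)

    def update(self, data: bytes) -> None:
        # No algorithm argument: the context already knows it.
        self._state.update(data)

    def final(self) -> bytes:
        return self._state.digest()


def hash_buffer(ctx: HashCtx, chunks: List[bytes]) -> bytes:
    """Caller code that works for any algorithm without naming one."""
    for chunk in chunks:
        ctx.update(chunk)
    return ctx.final()
```

A caller such as `hash_buffer(HashCtx("sha256"), [b"ab", b"c"])` only chooses the algorithm once, at initialization; every later call goes through the context.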
2025-01-31  hash: stop typedeffing the hash context  Patrick Steinhardt  5  -9/+9
We generally avoid using `typedef` in the Git codebase. One exception though is the `git_hash_ctx`, likely because it used to be a union rather than a struct until the preceding commit refactored it. But now that it is a normal `struct` there isn't really a need for a typedef anymore.

Drop the typedef and adapt all callers accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-31  Merge branch 'tb/unsafe-hash-cleanup' into ps/hash-cleanup  Junio C Hamano  1  -1/+1
* tb/unsafe-hash-cleanup:
  hash.h: drop unsafe_ function variants
  csum-file: introduce hashfile_checkpoint_init()
  t/helper/test-hash.c: use unsafe_hash_algo()
  csum-file.c: use unsafe_hash_algo()
  hash.h: introduce `unsafe_hash_algo()`
  csum-file.c: extract algop from hashfile_checksum_valid()
  csum-file: store the hash algorithm as a struct field
  t/helper/test-tool: implement sha1-unsafe helper
2025-01-31  Merge branch 'jc/show-index-h-update'  Junio C Hamano  1  -1/+1
Doc and short-help text for "show-index" has been clarified to stress that the command reads its data from the standard input.

* jc/show-index-h-update:
  show-index: the short help should say the command reads from its input