aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation (follow)
AgeCommit message (Collapse)AuthorFilesLines
2025-02-26Git 2.49-rc0v2.49.0-rc0Junio C Hamano1-0/+6
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-25merge-strategies.adoc: detail submodule mergeLucas Seiki Oshiro1-0/+10
Submodule merges are, in general, similar to other merges based on oid three-way-merge. When a conflict happens, however, Git has two special cases (introduced in 68d03e4a6e44) on handling the conflict before yielding it to the user. From the merge-ort and merge-recursive sources: - "Case #1: a is contained in b or vice versa": both strategies try to perform a fast-forward in the submodules if the commit referred by the conflicted submodule is descendant of another; - "Case #2: There are one or more merges that contain a and b in the submodule. If there is only one, then present it as a suggestion to the user, but leave it marked unmerged so the user needs to confirm the resolution." Add a small paragraph on merge-strategies.adoc describing this behavior. Helped-by: Junio C Hamano <gitster@pobox.com> Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-25BreakingChanges: clarify branches/ and remotes/Junio C Hamano1-2/+2
As we have created an empty .git/branches/ hierarchy until fairly recently, these directories may be found in modern repositories, but it is highly unlikely that they are being used. Reported-by: Jakub Wilk <jwilk@jwilk.net> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-25The fourteenth batchJunio C Hamano1-0/+16
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-25Merge branch 'ad/set-default-target-in-makefiles'Junio C Hamano1-1/+4
Correct the default target in Documentation/Makefile, and future-proof all Makefiles from similar breakages by declaring the default target (which happens to be "all") upfront. * ad/set-default-target-in-makefiles: Makefile: set default goals in makefiles
2025-02-25Merge branch 'pw/merge-tree-stdin-deadlock-fix'Junio C Hamano1-3/+8
"git merge-tree --stdin" has been improved (including a workaround for a deadlock). * pw/merge-tree-stdin-deadlock-fix: merge-tree: fix link formatting in html docs merge-tree: improve docs for --stdin merge-tree: only use basic merge config merge-tree: remove redundant code merge-tree --stdin: flush stdout to avoid deadlock
2025-02-25Merge branch 'mh/doc-commit-title-not-subject'Junio C Hamano2-8/+8
The documentation of "git commit" and "git rebase" now refer to commit titles as such, not "subject". * mh/doc-commit-title-not-subject: doc: use 'title' consistently
2025-02-21The thirteenth batchJunio C Hamano1-0/+5
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-21Merge branch 'ac/doc-http-ssl-type-config'Junio C Hamano1-0/+15
Two configuration variables about SSL authentication material that weren't mentioned in the documentations are now mentioned. * ac/doc-http-ssl-type-config: docs: indicate http.sslCertType and sslKeyType
2025-02-21Merge branch 'en/doc-renormalize'Junio C Hamano3-4/+5
Doc updates. * en/doc-renormalize: doc: clarify the intent of the renormalize option in the merge machinery
2025-02-21Merge branch 'jc/doc-boolean-synonyms'Junio C Hamano3-7/+10
Doc updates. * jc/doc-boolean-synonyms: doc: centrally document various ways tospell `true` and `false`
2025-02-21builtin/refs: add '--no-reflog' flag to drop reflogsKarthik Nayak1-3/+8
The "git refs migrate" subcommand converts the backend used for ref storage. It always migrates reflog data as well as refs. Introduce an option to exclude reflogs from migration, allowing them to be discarded when they are unnecessary. This is particularly useful in server-side repositories, where reflogs are typically not expected. However, some repositories may still have them due to historical reasons, such as bugs, misconfigurations, or administrative decisions to enable reflogs for debugging. In such repositories, it would be optimal to drop reflogs during the migration. To address this, introduce the '--no-reflog' flag, which prevents reflog migration. When this flag is used, reflogs from the original reference backend are migrated. Since only the new reference backend remains in the repository, all previous reflogs are permanently discarded. Helped-by: Junio C Hamano <gitster@pobox.com> Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-19agent: advertise OS name via agent capabilityUsman Akinyemi1-3/+7
As some issues that can happen with a Git client can be operating system specific, it can be useful for a server to know which OS a client is using. In the same way it can be useful for a client to know which OS a server is using. Our current agent capability is in the form of "package/version" (e.g., "git/1.8.3.1"). Let's extend it to include the operating system name (os) i.e in the form "package/version-os" (e.g., "git/1.8.3.1-Linux"). Including OS details in the agent capability simplifies implementation, maintains backward compatibility, avoids introducing a new capability, encourages adoption across Git-compatible software, and enhances debugging by providing complete environment information without affecting functionality. The operating system name is retrieved using the 'sysname' field of the `uname(2)` system call or its equivalent. However, there are differences between `uname(1)` (command-line utility) and `uname(2)` (system call) outputs on Windows. These discrepancies complicate testing on Windows platforms. For example: - `uname(1)` output: MINGW64_NT-10.0-20348.3.4.10-87d57229.x86_64\ .2024-02-14.20:17.UTC.x86_64 - `uname(2)` output: Windows.10.0.20348 On Windows, uname(2) is not actually system-supplied but is instead already faked up by Git itself. We could have overcome the test issue on Windows by implementing a new `uname` subcommand in `test-tool` using uname(2), but except uname(2), which would be tested against itself, there would be nothing platform specific, so it's just simpler to disable the tests on Windows. Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18The twelfth batchJunio C Hamano1-0/+17
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18Merge branch 'jt/rev-list-missing-print-info'Junio C Hamano1-0/+19
"git rev-list --missing=" learned to accept "print-info" that gives known details expected of the missing objects, like path and type. * jt/rev-list-missing-print-info: rev-list: extend print-info to print missing object type rev-list: add print-info action to print missing object path
2025-02-18Merge branch 'ds/backfill'Junio C Hamano3-1/+82
Lazy-loading missing files in a blobless clone on demand is costly as it tends to be one-blob-at-a-time. "git backfill" is introduced to help bulk-download necessary files beforehand. * ds/backfill: backfill: assume --sparse when sparse-checkout is enabled backfill: add --sparse option backfill: add --min-batch-size=<n> option backfill: basic functionality and tests backfill: add builtin boilerplate
2025-02-18promisor-remote: check advertised name or URLChristian Couder1-6/+16
A previous commit introduced a "promisor.acceptFromServer" configuration variable with only "None" or "All" as valid values. Let's introduce "KnownName" and "KnownUrl" as valid values for this configuration option to give more choice to a client about which promisor remotes it might accept among those that the server advertised. In case of "KnownName", the client will accept promisor remotes which are already configured on the client and have the same name as those advertised by the client. This could be useful in a corporate setup where servers and clients are trusted to not switch names and URLs, but where some kind of control is still useful. In case of "KnownUrl", the client will accept promisor remotes which have both the same name and the same URL configured on the client as the name and URL advertised by the server. This is the most secure option, so it should be used if possible. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18Add 'promisor-remote' capability to protocol v2Christian Couder2-0/+71
When a server S knows that some objects from a repository are available from a promisor remote X, S might want to suggest to a client C cloning or fetching the repo from S that C may use X directly instead of S for these objects. Note that this could happen both in the case S itself doesn't have the objects and borrows them from X, and in the case S has the objects but knows that X is better connected to the world (e.g., it is in a $LARGEINTERNETCOMPANY datacenter with petabit/s backbone connections) than S. Implementation of the latter case, which would require S to omit in its response the objects available on X, is left for future improvement though. Then C might or might not, want to get the objects from X. If S and C can agree on C using X directly, S can then omit objects that can be obtained from X when answering C's request. To allow S and C to agree and let each other know about C using X or not, let's introduce a new "promisor-remote" capability in the protocol v2, as well as a few new configuration variables: - "promisor.advertise" on the server side, and: - "promisor.acceptFromServer" on the client side. By default, or if "promisor.advertise" is set to 'false', a server S will not advertise the "promisor-remote" capability. If S doesn't advertise the "promisor-remote" capability, then a client C replying to S shouldn't advertise the "promisor-remote" capability either. If "promisor.advertise" is set to 'true', S will advertise its promisor remotes with a string like: promisor-remote=<pr-info>[;<pr-info>]... where each <pr-info> element contains information about a single promisor remote in the form: name=<pr-name>[,url=<pr-url>] where <pr-name> is the urlencoded name of a promisor remote and <pr-url> is the urlencoded URL of the promisor remote named <pr-name>. For now, the URL is passed in addition to the name. In the future, it might be possible to pass other information like a filter-spec that the client may use when cloning from S, or a token that the client may use when retrieving objects from X. It is C's responsibility to arrange how it can reach X though, so pieces of information that are usually outside Git's concern, like proxy configuration, must not be distributed over this protocol. It might also be possible in the future for "promisor.advertise" to have other values. For example a value like "onlyName" could prevent S from advertising URLs, which could help in case C should use a different URL for X than the URL S is using. (The URL S is using might be an internal one on the server side for example.) By default or if "promisor.acceptFromServer" is set to "None", C will not accept to use the promisor remotes that might have been advertised by S. In this case, C will not advertise any "promisor-remote" capability in its reply to S. If "promisor.acceptFromServer" is set to "All" and S advertised some promisor remotes, then on the contrary, C will accept to use all the promisor remotes that S advertised and C will reply with a string like: promisor-remote=<pr-name>[;<pr-name>]... where the <pr-name> elements are the urlencoded names of all the promisor remotes S advertised. In a following commit, other values for "promisor.acceptFromServer" will be implemented, so that C will be able to decide the promisor remotes it accepts depending on the name and URL it received from S. So even if that name and URL information is not used much right now, it will be needed soon. Helped-by: Taylor Blau <me@ttaylorr.com> Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18doc: use 'title' consistentlyM Hickford2-8/+8
The first line of a commit message is variously called 'title' or 'subject'. Prefer 'title' unless discussing email. Signed-off-by: M Hickford <mirth.hickford@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18merge-tree: fix link formatting in html docsPhillip Wood1-1/+2
In the html documentation the link to the "OUTPUT" section is surrounded by square brackets. Fix this by adding explicit link text to the cross reference. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18merge-tree: improve docs for --stdinPhillip Wood1-2/+6
Add a section for --stdin in the list of options and document that it implies -z so readers know how to parse the output. Also correct the merge status documentation for --stdin as if the status is less than zero "git merge-tree" dies before printing it. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18Makefile: set default goals in makefilesAdam Dinwoodie1-1/+4
Explicitly set the default goal at the very top of various makefiles. This is already present in some makefiles, but not all of them. In particular, this corrects a regression introduced in a38edab7c8 (Makefile: generate doc versions via GIT-VERSION-GEN, 2024-12-06). That commit added some config files as build targets for the Documentation directory, and put the target configuration in a sensible place. Unfortunately, that sensible place was above any other build target definitions, meaning the default goal changed to being those configuration files only, rather than the HTML and man page documentation. Signed-off-by: Adam Dinwoodie <adam@dinwoodie.org> Helped-by: Junio C Hamano <gitster@pobox.com> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-14The eleventh batchJunio C Hamano1-0/+14
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-14Merge branch 'ps/doc-http-upload-archive-service'Junio C Hamano1-0/+4
Doc update. * ps/doc-http-upload-archive-service: doc: documentation for http.uploadarchive config option
2025-02-14Merge branch 'tc/clone-single-revision'Junio C Hamano1-7/+19
"git clone" learned to make a shallow clone for a single commit that is not necessarily be at the tip of any branch. * tc/clone-single-revision: builtin/clone: teach git-clone(1) the --revision= option parse-options: introduce die_for_incompatible_opt2() clone: introduce struct clone_opts in builtin/clone.c clone: add tags refspec earlier to fetch refspec clone: refactor wanted_peer_refs() clone: make it possible to specify --tags clone: cut down on global variables in clone.c
2025-02-14Merge branch 'bc/doc-adoc-not-txt'Junio C Hamano917-616/+617
All the documentation .txt files have been renamed to .adoc to help content aware editors. * bc/doc-adoc-not-txt: Remove obsolete ".txt" extensions for AsciiDoc files doc: use .adoc extension for AsciiDoc files gitattributes: mark AsciiDoc files as LF-only editorconfig: add .adoc extension doc: update gitignore for .adoc extension
2025-02-14config/remote.txt: improve wording for 'remote.<name>.followRemoteHEAD'Philippe Blain1-6/+6
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-14config/remote.txt: reunite 'severOption' description paragraphsPhilippe Blain1-5/+5
When 'remote.<name>.followRemoteHEAD' was added in b7f7d16562 (fetch: add configuration for set_head behaviour, 2024-11-29), its description was added to remote.txt in between the two paragraphs describing 'remote.<name>.serverOption'. Reunite these two paragraphs. Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-12The tenth batchJunio C Hamano1-0/+22
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-12Merge branch 'da/help-autocorrect-one-fix'Junio C Hamano1-2/+2
"git -c help.autocorrect=0 psuh" shows the suggested typofix, unlike the previous attempt in the base topic. * da/help-autocorrect-one-fix: help: add "show" as a valid configuration value help: show the suggested command when help.autocorrect is false
2025-02-12Merge branch 'sc/help-autocorrect-one'Junio C Hamano1-4/+5
"[help] autocorrect = 1" used to be a way to say "please wait for 0.1 second after suggesting a typofix of the command name before running that command"; now it means "yes, if there is a plausible typofix for the command name, please run it immediately". * sc/help-autocorrect-one: help: interpret boolean string values for help.autocorrect
2025-02-12Merge branch 'jp/doc-trailer-config'Junio C Hamano3-135/+140
Documentaiton updates. * jp/doc-trailer-config: config.txt: add trailer.* variables
2025-02-12Merge branch 'zh/gc-expire-to'Junio C Hamano1-0/+7
"git gc" learned the "--expire-to" option and passes it down to underlying "git repack". * zh/gc-expire-to: gc: add `--expire-to` option
2025-02-12Merge branch 'ds/name-hash-tweaks'Junio C Hamano2-2/+39
"git pack-objects" and its wrapper "git repack" learned an option to use an alternative path-hash function to improve delta-base selection to produce a packfile with deeper history than window size. * ds/name-hash-tweaks: pack-objects: prevent name hash version change test-tool: add helper for name-hash values p5313: add size comparison test pack-objects: add GIT_TEST_NAME_HASH_VERSION repack: add --name-hash-version option pack-objects: add --name-hash-version option pack-objects: create new name-hash function version
2025-02-11doc: clarify the intent of the renormalize option in the merge machineryElijah Newren3-4/+5
The -X renormalize (or merge.renormalize config) option is intended to reduce conflicts due to normalization of newer versions of history. It does so by renormalizing files that it is about to do a three-way content merge on. Some folks thought it would renormalize all files throughout the tree, and the previous wording wasn't clear enough to dispell that misconception. Update the docs to make it clear that the merge machinery will only apply renormalization to files which need a three-way content merge. (Technically, the merge machinery also does renormalization on modify/delete conflicts, in order to see if the modification was merely a normalization; if so, it can accept the delete and not report a conflict. But it's not clear that this piece needs to be explained to users, and trying to distinguish it might feel like splitting hairs and overcomplicating the explanation, so we leave it out.) Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-11doc: centrally document various ways tospell `true` and `false`Junio C Hamano3-7/+10
We do not seem to centrally document exhaustively ways to spell Boolean values. The description in the Environment Variables of git(1) section assumes that the reader is already familiar with how "Boolean valued configuration variables" are specified, without referring to anything, so there is no way for the readers to find out more. The description of `bool` in the section on "--type <type>" in "git config --help" might be the place to do so, but it is not telling us all that much. The description of Boolean valued placeholders in the pretty formats section of "git log --help" enumerates the possible values with "etc." implying there may be other synonyms; shrink the list of samples and instead refer to the canonical and authoritative source of truth, which now is git-config(1). Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-10The ninth batchJunio C Hamano1-0/+16
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-06The eighth batchJunio C Hamano1-0/+10
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-06doc: documentation for http.uploadarchive config optionPiotr Szlazak1-0/+4
In Git v2.44.0 support for 'git archive' over HTTP protocol was added, but it was nowhere documented how it should be enabled in git-http-backend. Add missing documentation. Signed-off-by: Piotr Szlazak <piotr.szlazak@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-06builtin/clone: teach git-clone(1) the --revision= optionToon Claes1-0/+9
The git-clone(1) command has the option `--branch` that allows the user to select the branch they want HEAD to point to. In a non-bare repository this also checks out that branch. Option `--branch` also accepts a tag. When a tag name is provided, the commit this tag points to is checked out and HEAD is detached. Thus `--branch` can be used to clone a repository and check out a ref kept under `refs/heads` or `refs/tags`. But some other refs might be in use as well. For example Git forges might use refs like `refs/pull/<id>` and `refs/merge-requests/<id>` to track pull/merge requests. These refs cannot be selected upon git-clone(1). Add option `--revision` to git-clone(1). This option accepts a fully qualified reference, or a hexadecimal commit ID. This enables the user to clone and check out any revision they want. `--revision` can be used in conjunction with `--depth` to do a minimal clone that only contains the blob and tree for a single revision. This can be useful for automated tests running in CI systems. Using option `--branch` and `--single-branch` together is a similar scenario, but serves a different purpose. Using these two options, a singlet remote tracking branch is created and the fetch refspec is set up so git-fetch(1) will receive updates on that branch from the remote. This allows the user work on that single branch. Option `--revision` on contrary detaches HEAD, creates no tracking branches, and writes no fetch refspec. Signed-off-by: Toon Claes <toon@iotcl.com> Acked-by: Patrick Steinhardt <ps@pks.im> [jc: removed unnecessary TEST_PASSES_SANITIZE_LEAK from the test] Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-06clone: make it possible to specify --tagsToon Claes1-7/+10
Option --no-tags was added in 0dab2468ee (clone: add a --no-tags option to clone without tags, 2017-04-26). At the time there was no need to support --tags as well, although there was some conversation about it[1]. To simplify the code and to prepare for future commits, invert the flag internally. Functionally there is no change, because the flag is default-enabled passing `--tags` has no effect, so there's no need to add tests for this. [1]: https://lore.kernel.org/git/CAGZ79kbHuMpiavJ90kQLEL_AR0BEyArcZoEWAjPPhOFacN16YQ@mail.gmail.com/ Signed-off-by: Toon Claes <toon@iotcl.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-05docs: indicate http.sslCertType and sslKeyTypeAndrew Carter1-0/+15
0a01d41ee4 (http: add support for different sslcert and sslkey types., 2023-03-20) added useful SSL config options, but did not document them. Signed-off-by: Andrew Carter <andrew@emailcarter.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-05rev-list: extend print-info to print missing object typeJustin Tobler1-0/+3
Additional information about missing objects found in git-rev-list(1) can be printed by specifying the `print-info` missing action for the `--missing` option. Extend this action to also print missing object type information inferred from its containing object. This token follows the form `type=<type>` and specifies the expected object type of the missing object. Signed-off-by: Justin Tobler <jltobler@gmail.com> Acked-by: Christian Couder <christian.couder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-05rev-list: add print-info action to print missing object pathJustin Tobler1-0/+16
Missing objects identified through git-rev-list(1) can be printed by setting the `--missing=print` option. Additional information about the missing object, such as its path and type, may be present in its containing object. Add the `print-info` missing action for the `--missing` option that, when set, prints additional insight about the missing object inferred from its containing object. Each line of output for a missing object is in the form: `?<oid> [<token>=<value>]...`. The `<token>=<value>` pairs containing additional information are separated from each other by a SP. The value is encoded in a token specific fashion, but SP or LF contained in value are always expected to be represented in such a way that the resulting encoded value does not have either of these two problematic bytes. This format is kept generic so it can be extended in the future to support additional information. For now, only a missing object path info is implemented. It follows the form `path=<path>` and specifies the full path to the object from the top-level tree. A path containing SP or special characters is enclosed in double-quotes in the C style as needed. In a subsequent commit, missing object type info will also be added. Signed-off-by: Justin Tobler <jltobler@gmail.com> Acked-by: Christian Couder <christian.couder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03backfill: assume --sparse when sparse-checkout is enabledDerrick Stolee1-1/+2
The previous change introduced the '--[no-]sparse' option for the 'git backfill' command, but did not assume it as enabled by default. However, this is likely the behavior that users will most often want to happen. Without this default, users with a small sparse-checkout may be confused when 'git backfill' downloads every version of every object in the full history. However, this is left as a separate change so this decision can be reviewed independently of the value of the '--[no-]sparse' option. Add a test of adding the '--sparse' option to a repo without sparse-checkout to make it clear that supplying it without a sparse-checkout is an error. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03backfill: add --sparse optionDerrick Stolee2-1/+13
One way to significantly reduce the cost of a Git clone and later fetches is to use a blobless partial clone and combine that with a sparse-checkout that reduces the paths that need to be populated in the working directory. Not only does this reduce the cost of clones and fetches, the sparse-checkout reduces the number of objects needed to download from a promisor remote. However, history investigations can be expensive as computing blob diffs will trigger promisor remote requests for one object at a time. This can be avoided by downloading the blobs needed for the given sparse-checkout using 'git backfill' and its new '--sparse' mode, at a time that the user is willing to pay that extra cost. Note that this is distinctly different from the '--filter=sparse:<oid>' option, as this assumes that the partial clone has all reachable trees and we are using client-side logic to avoid downloading blobs outside of the sparse-checkout cone. This avoids the server-side cost of walking trees while also achieving a similar goal. It also downloads in batches based on similar path names, presenting a resumable download if things are interrupted. This augments the path-walk API to have a possibly-NULL 'pl' member that may point to a 'struct pattern_list'. This could be more general than the sparse-checkout definition at HEAD, but 'git backfill --sparse' is currently the only consumer. Be sure to test this in both cone mode and not cone mode. Cone mode has the benefit that the path-walk can skip certain paths once they would expand beyond the sparse-checkout. Non-cone mode can describe the included files using both positive and negative patterns, which changes the possible return values of path_matches_pattern_list(). Test both kinds of matches for increased coverage. To test this, we can create a blobless sparse clone, expand the sparse-checkout slightly, and then run 'git backfill --sparse' to see how much data is downloaded. The general steps are 1. git clone --filter=blob:none --sparse <url> 2. git sparse-checkout set <dir1> ... <dirN> 3. git backfill --sparse For the Git repository with the 'builtin' directory in the sparse-checkout, we get these results for various batch sizes: | Batch Size | Pack Count | Pack Size | Time | |-----------------|------------|-----------|-------| | (Initial clone) | 3 | 110 MB | | | 10K | 12 | 192 MB | 17.2s | | 15K | 9 | 192 MB | 15.5s | | 20K | 8 | 192 MB | 15.5s | | 25K | 7 | 192 MB | 14.7s | This case matters less because a full clone of the Git repository from GitHub is currently at 277 MB. Using a copy of the Linux repository with the 'kernel/' directory in the sparse-checkout, we get these results: | Batch Size | Pack Count | Pack Size | Time | |-----------------|------------|-----------|------| | (Initial clone) | 2 | 1,876 MB | | | 10K | 11 | 2,187 MB | 46s | | 25K | 7 | 2,188 MB | 43s | | 50K | 5 | 2,194 MB | 44s | | 100K | 4 | 2,194 MB | 48s | This case is more meaningful because a full clone of the Linux repository is currently over 6 GB, so this is a valuable way to download a fraction of the repository and no longer need network access for all reachable objects within the sparse-checkout. Choosing a batch size will depend on a lot of factors, including the user's network speed or reliability, the repository's file structure, and how many versions there are of the file within the sparse-checkout scope. There will not be a one-size-fits-all solution. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03backfill: add --min-batch-size=<n> optionDerrick Stolee1-1/+11
Users may want to specify a minimum batch size for their needs. This is only a minimum: the path-walk API provides a list of OIDs that correspond to the same path, and thus it is optimal to allow delta compression across those objects in a single server request. We could consider limiting the request to have a maximum batch size in the future. For now, we let the path-walk API batches determine the boundaries. To get a feeling for the value of specifying the --min-batch-size parameter, I tested a number of open source repositories available on GitHub. The procedure was generally: 1. git clone --filter=blob:none <url> 2. git backfill Checking the number of packfiles and the size of the .git/objects/pack directory helps to identify the effects of different batch sizes. For the Git repository, we get these results: | Batch Size | Pack Count | Pack Size | Time | |-----------------|------------|-----------|-------| | (Initial clone) | 2 | 119 MB | | | 25K | 8 | 290 MB | 24s | | 50K | 5 | 290 MB | 24s | | 100K | 4 | 290 MB | 29s | Other than the packfile counts decreasing as we need fewer batches, the size and time required is not changing much for this small example. For the nodejs/node repository, we see these results: | Batch Size | Pack Count | Pack Size | Time | |-----------------|------------|-----------|--------| | (Initial clone) | 2 | 330 MB | | | 25K | 19 | 1,222 MB | 1m 22s | | 50K | 11 | 1,221 MB | 1m 24s | | 100K | 7 | 1,223 MB | 1m 40s | | 250K | 4 | 1,224 MB | 2m 23s | | 500K | 3 | 1,216 MB | 4m 38s | Here, we don't have much difference in the size of the repo, though the 500K batch size results in a few MB gained. That comes at a cost of a much longer time. This extra time is due to server-side delta compression happening as the on-disk deltas don't appear to be reusable all the time. But for smaller batch sizes, the server is able to find reasonable deltas partly because we are asking for objects that appear in the same region of the directory tree and include all versions of a file at a specific path. To contrast this example, I tested the microsoft/fluentui repo, which has been known to have inefficient packing due to name hash collisions. These results are found before GitHub had the opportunity to repack the server with more advanced name hash versions: | Batch Size | Pack Count | Pack Size | Time | |-----------------|------------|-----------|--------| | (Initial clone) | 2 | 105 MB | | | 5K | 53 | 348 MB | 2m 26s | | 10K | 28 | 365 MB | 2m 22s | | 15K | 19 | 407 MB | 2m 21s | | 20K | 15 | 393 MB | 2m 28s | | 25K | 13 | 417 MB | 2m 06s | | 50K | 8 | 509 MB | 1m 34s | | 100K | 5 | 535 MB | 1m 56s | | 250K | 4 | 698 MB | 1m 33s | | 500K | 3 | 696 MB | 1m 42s | Here, a larger variety of batch sizes were chosen because of the great variation in results. By asking the server to download small batches corresponding to fewer paths at a time, the server is able to provide better compression for these batches than it would for a regular clone. A typical full clone for this repository would require 738 MB. This example justifies the choice to batch requests by path name, leading to improved communication with a server that is not optimally packed. Finally, the same experiment for the Linux repository had these results: | Batch Size | Pack Count | Pack Size | Time | |-----------------|------------|-----------|---------| | (Initial clone) | 2 | 2,153 MB | | | 25K | 63 | 6,380 MB | 14m 08s | | 50K | 58 | 6,126 MB | 15m 11s | | 100K | 30 | 6,135 MB | 18m 11s | | 250K | 14 | 6,146 MB | 18m 22s | | 500K | 8 | 6,143 MB | 33m 29s | Even in this example, where the default name hash algorithm leads to decent compression of the Linux kernel repository, there is value for selecting a smaller batch size, to a limit. The 25K batch size has the fastest time, but uses 250 MB more than the 50K batch size. The 500K batch size took much more time due to server compression time and thus we should avoid large batch sizes like this. Based on these experiments, a batch size of 50,000 was chosen as the default value. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03backfill: basic functionality and testsDerrick Stolee2-1/+33
The default behavior of 'git backfill' is to fetch all missing blobs that are reachable from HEAD. Document and test this behavior. The implementation is a very simple use of the path-walk API, initializing the revision walk at HEAD to start the path-walk from all commits reachable from HEAD. Ignore the object arrays that correspond to tree entries, assuming that they are all present already. The path-walk API provides lists of objects in batches according to a common path, but that list could be very small. We want to balance the number of requests to the server with the ability to have the process interrupted with minimal repeated work to catch up in the next run. Based on some experiments (detailed in the next change) a minimum batch size of 50,000 is selected for the default. This batch size is a _minimum_. As the path-walk API emits lists of blob IDs, they are collected into a list of objects for a request to the server. When that list is at least the minimum batch size, then the request is sent to the server for the new objects. However, the list of blob IDs from the path-walk API could be much longer than the batch size. At this moment, it is unclear if there is a benefit to split the list when there are too many objects at the same path. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03backfill: add builtin boilerplateDerrick Stolee2-0/+26
In anticipation of implementing 'git backfill', populate the necessary files with the boilerplate of a new builtin. Mark the builtin as experimental at this time, allowing breaking changes in the near future, if necessary. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03help: add "show" as a valid configuration valueDavid Aguilar1-1/+1
Add a literal value for showing the suggested autocorrection for consistency with the rest of the help.autocorrect options. Signed-off-by: David Aguilar <davvid@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>