aboutsummaryrefslogtreecommitdiffstats
path: root/cache-tree.c (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2025-11-08doc: add an explanation of Git's data modelJulia Evans4-2/+306
Git very often uses the terms "object", "reference", or "index" in its documentation. However, it's hard to find a clear explanation of these terms and how they relate to each other in the documentation. The closest candidates currently are: 1. `gitglossary`. This makes a good effort, but it's an alphabetically ordered dictionary and a dictionary is not a good way to learn concepts. You have to jump around too much and it's not possible to present the concepts in the order that they should be explained. 2. `gitcore-tutorial`. This explains how to use the "core" Git commands. This is a nice document to have, but it's not necessary to learn how `update-index` works to understand Git's data model, and we should not be requiring users to learn how to use the "plumbing" commands if they want to learn what the term "index" or "object" means. 3. `gitrepository-layout`. This is a great resource, but it includes a lot of information about configuration and internal implementation details which are not related to the data model. It also does not explain how commits work. The result of this is that Git users (even users who have been using Git for 15+ years) struggle to read the documentation because they don't know what the core terms mean, and it's not possible to add links to help them learn more. Add an explanation of Git's data model. Some choices I've made in deciding what "core data model" means: 1. Omit pseudorefs like `FETCH_HEAD`, because it's not clear to me if those are intended to be user facing or if they're more like internal implementation details. 2. Don't talk about submodules other than by mentioning how they relate to trees. This is because Git has a lot of special features, and explaining how they all work exhaustively could quickly go down a rabbit hole which would make this document less useful for understanding Git's core behaviour. 3. Don't discuss the structure of a commit message (first line, trailers etc). 4. Don't mention configuration. 5. Don't mention the `.git` directory, to avoid getting too much into implementation details Signed-off-by: Julia Evans <julia@jvns.ca> [jc: fixed up xml validation error in the documentation part] Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-07submodule: fix case-folding gitdir filesystem colisionsAdrian Ratiu4-2/+73
Add a new check in validate_submodule_git_dir() to detect and prevent case-folding filesystem colisions. When this new check is triggered, a stricter casefolding aware URI encoding is used to percent-encode uppercase characters, e.g. Foo becomes %46oo. By using this check/retry mechanism the uppercase encoding is only applied when necessary, so case-sensitive filesystems are not affected. Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-07submodule: add extension to encode gitdir pathsAdrian Ratiu12-26/+294
Add a submoduleEncoding extension which fixes filesystem collisions by encoding gitdir paths. At a high level, this implements a mechanism to encode -> validate -> retry until a working gitdir path is found. Credit goes to Junio for coming up with this design: encoding is only applied when necessary, e.g. uppercase characters are encoded only on case-folding filesystems and only if a real conflict is detected. To make this work, we rely on the submodule.<name>.gitdir config as the single source of truth for gitidir paths: the config is always set when the extension is enabled. Users who care about gitdir paths are expected to get/set the config and not the underlying encoding implementation. This commit adds the basic encoding logic which addresses nested gitdirs. The next commit fixes case-folding, the next commit fixes names longer than NAME_MAX. The idea is the encoding can be improved over time in a way which is transparent to users. Suggested-by: Junio C Hamano <gitster@pobox.com> Suggested-by: Phillip Wood <phillip.wood123@gmail.com> Suggested-by: Patrick Steinhardt <ps@pks.im> Based-on-patch-by: Brandon Williams <bwilliams.eng@gmail.com> Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-07builtin/credential-store: move is_rfc3986_unreserved to url.[ch]Adrian Ratiu3-6/+14
is_rfc3986_unreserved() was moved to credential-store.c and was made static by f89854362c (credential-store: move related functions to credential-store file, 2023-06-06) under a correct assumption, at the time, that it was the only place using it. However now we need it to apply URL-encoding to submodule names when constructing gitdir paths, to avoid conflicts, so bring it back as a public function exposed via url.h, instead of the old helper path (strbuf), which has nothing to do with 3986 encoding/decoding anymore. This function will be used by submodule.c in the next commit. Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-07submodule--helper: use submodule_name_to_gitdir in add_submoduleAdrian Ratiu1-6/+7
While testing submodule gitdir path encoding, I noticed submodule--helper is still using a hardcoded modules gitdir path leading to test failures. Call the submodule_name_to_gitdir() helper instead, which was invented exactly for this purpose and is already used by all the other locations which work on gitdirs. Also narrow the scope of the submod_gitdir_path variable which is not used anymore in the updated "else" branch. Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-07blame: make diff algorithm configurableAntonin Delpeuch6-21/+278
The diff algorithm used in 'git-blame(1)' is set to 'myers', without the possibility to change it aside from the `--minimal` option. There has been long-standing interest in changing the default diff algorithm to "histogram", and Git 3.0 was floated as a possible occasion for taking some steps towards that: https://lore.kernel.org/git/xmqqed873vgn.fsf@gitster.g/ As a preparation for this move, it is worth making sure that the diff algorithm is configurable where useful. Make it configurable in the `git-blame(1)` command by introducing the `--diff-algorithm` option and make honor the `diff.algorithm` config variable. Keep Myers diff as the default. Signed-off-by: Antonin Delpeuch <antonin@delpeuch.eu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-07xdiff: add 'minimal' to XDF_DIFF_ALGORITHM_MASKAntonin Delpeuch3-5/+1
The XDF_DIFF_ALGORITHM_MASK bit mask only includes bits for the patience and histogram diffs, not for the minimal one. This means that when reseting the diff algorithm to the default one, one needs to separately clear the bit for the minimal diff. There are places in the code that fail to do that: merge-ort.c and builtin/merge-file.c. Add the XDF_NEED_MINIMAL bit to the bit mask, and remove the separate clearing of this bit in the places where it hasn't been forgotten. Signed-off-by: Antonin Delpeuch <antonin@delpeuch.eu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-06object: fix performance regression when peeling tagsPatrick Steinhardt5-11/+11
Our Bencher dashboards [1] have recently alerted us about a bunch of performance regressions when writing references, specifically with the reftable backend. There is a 3x regression when writing many refs with preexisting refs in the reftable format, and a 10x regression when migrating refs between backends in either of the formats. Bisecting the issue lands us at 6ec4c0b45b (refs: don't store peeled object IDs for invalid tags, 2025-10-23). The gist of the commit is that we may end up storing peeled objects in both reftables and packed-refs for corrupted tags, where the claimed tagged object type is different than the actual tagged object type. This will then cause us to create the `struct object *` with a wrong type, as well, and obviously nothing good comes out of that. The fix for this issue was to introduce a new flag to `peel_object()` that causes us to verify the tagged object's type before writing it into the refdb -- if the tag is corrupt, we skip writing the peeled value. To verify whether the peeled value is correct we have to look up the object type via the ODB and compare the actual type with the claimed type, and that additional object lookup is costly. This also explains why we see the regression only when writing refs with the reftable backend, but we see the regression with both backends when migrating refs: - The reftable backend knows to store peeled values in the new table immediately, so it has to try and peel each ref it's about to write to the transaction. So the performance regression is visible for all writes. - The files backend only stores peeled values when writing the packed-refs file, so it wouldn't hit the performance regression for normal writes. But on ref migrations we know to write all new values into the packed-refs file immediately, and that's why we see the regression for both backends there. Taking a step back though reveals an oddity in the new verification logic: we not only verify the _tagged_ object's type, but we also verify the type of the tag itself. But this isn't really needed, as we wouldn't hit the bug in such a case anyway, as we only hit the issue with corrupt tags claiming an invalid type for the tagged object. The consequence of this is that we now started to look up the target object of every single reference we're about to write, regardless of whether it even is a tag or not. And that is of course quite costly. Fix the issue by only verifying the type of the tagged objects. This means that we of course still have a performance hit for actual tags. But this only happens for writes anyway, and I'd claim it's preferable to not store corrupted data in the refdb than to be fast here. Rename the flag accordingly to clarify that we only verify the tagged object's type. This fix brings performance back to previous levels: Benchmark 1: baseline Time (mean ± σ): 46.0 ms ± 0.4 ms [User: 40.0 ms, System: 5.7 ms] Range (min … max): 45.0 ms … 47.1 ms 54 runs Benchmark 2: regression Time (mean ± σ): 140.2 ms ± 1.3 ms [User: 77.5 ms, System: 60.5 ms] Range (min … max): 138.0 ms … 142.7 ms 20 runs Benchmark 3: fix Time (mean ± σ): 46.2 ms ± 0.4 ms [User: 40.2 ms, System: 5.7 ms] Range (min … max): 45.0 ms … 47.3 ms 55 runs Summary update-ref: baseline 1.00 ± 0.01 times faster than fix 3.05 ± 0.04 times faster than regression [1]: https://bencher.dev/perf/git/plots Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-06ci: update {download,upload}-artifact Action versionsJohannes Schindelin1-10/+10
Bumps `actions/upload-artifact` from 4 to 5. - [Release notes](https://github.com/actions/upload-artifact/releases) - [Commits](https://github.com/actions/upload-artifact/compare/v4...v5) Bumps `actions/download-artifact` from 5 to 6. - [Release notes](https://github.com/actions/download-artifact/releases) - [Commits](https://github.com/actions/download-artifact/compare/v5...v6) Originally-authored-by: dependabot[bot] <support@github.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-06maintenance: add 'is-needed' subcommandKarthik Nayak3-17/+105
The 'git-maintenance(1)' command provides tooling to run maintenance tasks over Git repositories. The 'run' subcommand, as the name suggests, runs the maintenance tasks. When used with the '--auto' flag, it uses heuristics to determine if the required thresholds are met for running said maintenance tasks. There is however a lack of insight into these heuristics. Meaning, the checks are linked to the execution. Add a new 'is-needed' subcommand to 'git-maintenance(1)' which allows users to simply check if it is needed to run maintenance without performing it. This subcommand can check if it is needed to run maintenance without actually running it. Ideally it should be used with the '--auto' flag, which would allow users to check if the thresholds required are met. The subcommand also supports the '--task' flag which can be used to check specific maintenance tasks. While adding the respective tests in 't/t7900-maintenance.sh', remove a duplicate of the test: 'worktree-prune task with --auto honors maintenance.worktree-prune.auto'. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-06maintenance: add checking logic in `pack_refs_condition()`Karthik Nayak2-10/+21
The 'git-maintenance(1)' command supports an '--auto' flag. Usage of the flag ensures to run maintenance tasks only if certain thresholds are met. The heuristic is defined on a task level, wherein each task defines an 'auto_condition', which states if the task should be run. The 'pack-refs' task is hard-coded to return 1 as: 1. There was never a way to check if the reference backend needs to be optimized without actually performing the optimization. 2. We can pass in the '--auto' flag to 'git-pack-refs(1)' which would optimize based on heuristics. The previous commit added a `refs_optimize_required()` function, which can be used to check if a reference backend required optimization. Use this within `pack_refs_condition()`. This allows us to add a 'git maintenance is-needed' subcommand which can notify the user if maintenance is needed without actually performing the optimization. Without this change, the reference backend would always state that optimization is needed. Since we import 'revision.h', we need to remove the definition for 'SEEN' which is duplicated in the included header. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-06refs: add a `optimize_required` field to `struct ref_storage_be`Karthik Nayak7-0/+82
To allow users of the refs namespace to check if the reference backend requires optimization, add a new field `optimize_required` field to `struct ref_storage_be`. This field is of type `optimize_required_fn` which is also introduced in this commit. Modify the debug, files, packed and reftable backend to implement this field. A following commit will expose this via 'git pack-refs' and 'git refs optimize'. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-06reftable/stack: add function to check if optimization is requiredKarthik Nayak3-7/+58
The reftable backend performs auto-compaction as part of its regular flow, which is required to keep the number of tables part of a stack at bay. This allows it to stay optimized. Compaction can also be triggered voluntarily by the user via the 'git pack-refs' or the 'git refs optimize' command. However, currently there is no way for the user to check if optimization is required without actually performing it. Extract out the heuristics logic from 'reftable_stack_auto_compact()' into an internal function 'update_segment_if_compaction_required()'. Then use this to add and expose `reftable_stack_compaction_required()` which will allow users to check if the reftable backend can be optimized. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-06reftable/stack: return stack segments directlyKarthik Nayak1-11/+12
The `stack_table_sizes_for_compaction()` function returns individual sizes of each reftable table. This function is only called by `reftable_stack_auto_compact()` to decide which tables need to be compacted, if any. Modify the function to directly return the segments, which avoids the extra step of receiving the sizes only to pass it to `suggest_compaction_segment()`. A future commit will also add functionality for checking whether auto-compaction is necessary without performing it. This change allows code re-usability in that context. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-06meson: make GIT_HTML_PATH configurableD. Ben Knoble7-13/+20
Makefile-based builds can configure Git's internal HTML_PATH by defining htmldir, which is useful for packagers that put documentation in different locations. Gentoo, for example, uses version-suffixed directories like ${prefix}/share/doc/git-2.51 and puts the HTML documentation in an 'html' subdirectory of the same. Propagate the same configuration knob to Meson-based builds so that "git --html-path" on such systems can be configured to output the correct directory. Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-06perl: also mark git-contacts executableD. Ben Knoble1-1/+1
When installing git-contacts with Meson via -Dcontrib=contacts, the default Perl generation fails to mark it executable. As a result, "git contacts" reports "'contacts' is not a git command." Unlike generate-script.sh, we aren't testing the basename here; so, glob the script name in the case arm to match wherever the input comes from. Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-06wincred: align Makefile with other Makefiles in contribThomas Uhle1-8/+10
* Replace $(LOADLIBES) because it is deprecated since long and it is used nowhere else in the git project. * Use $(gitexecdir) instead of $(libexecdir) because config.mak defines $(libexecdir) as $(prefix)/libexec, not as $(prefix)/libexec/git-core. * Similar to other Makefiles, let install target rule create $(gitexecdir) to make sure the directory exists before copying the executable and also let it respect $(DESTDIR). * Shuffle the lines for the default settings to align them with the other Makefiles in contrib/credential. * Define .PHONY for all special targets (all, install, clean). Signed-off-by: Thomas Uhle <thomas.uhle@mailbox.tu-dresden.de> Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-06doc: clarify server behavior for invalid 'want' lines in HTTP protocolQueen Ediri Jessa1-1/+2
Update the documentation to clearly describe how the server responds when a client sends an invalid or malformed `want` line during the HTTP protocol exchange. The server includes the offending object name in its error message. Signed-off-by: Queen Ediri Jessa <qjessa662@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-06rebase: support --trailerLi Chen8-4/+262
Implement a new `--trailer <text>` option for `git rebase` (support merge backend only now), which appends arbitrary trailer lines to each rebased commit message. Reject it if the user passes an option that requires the apply backend (git am) since it lacks message‑filter/trailer hook. otherwise we can just use the merge backend. Automatically set REBASE_FORCE when any trailer is supplied. And reject invalid input before user edits the interactive file. Signed-off-by: Li Chen <chenl311@chinatelecom.cn> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-06trailer: append trailers in-process and drop the fork to `interpret-trailers`Li Chen7-56/+90
Route all trailer insertion through trailer_process() and make builtin/interpret-trailers just do file I/O before calling into it. amend_file_with_trailers() now shares the same code path. This removes the fork/exec and tempfile juggling, cutting overhead and simplifying error handling. No functional change. It also centralizes logic to prepare for follow-up rebase --trailer patch. Signed-off-by: Li Chen <chenl311@chinatelecom.cn> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-06trailer: move process_trailers to trailer.hLi Chen3-36/+39
This function would be used by trailer_process in following commits. Signed-off-by: Li Chen <chenl311@chinatelecom.cn> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-06interpret-trailers: factor out buffer-based processing to process_trailers()Li Chen1-22/+29
Extracted trailer processing into a helper that accumulates output in a strbuf before writing. Updated interpret_trailers() to reuse the helper, buffer output, and clean up both input and output buffers after writing. Signed-off-by: Li Chen <chenl311@chinatelecom.cn> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-05refs: add missing space in messagesPeter Krefting2-2/+2
Signed-off-by: Peter Krefting <peter@softwolves.pp.se> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-05builtin/history: implement "split" subcommandPatrick Steinhardt4-0/+713
It is quite a common use case that one wants to split up one commit into multiple commits by moving parts of the changes of the original commit out into a separate commit. This is quite an involved operation though: 1. Identify the commit in question that is to be dropped. 2. Perform an interactive rebase on top of that commit's parent. 3. Modify the instruction sheet to "edit" the commit that is to be split up. 4. Drop the commit via "git reset HEAD~". 5. Stage changes that should go into the first commit and commit it. 6. Stage changes that should go into the second commit and commit it. 7. Finalize the rebase. This is quite complex, and overall I would claim that most people who are not experts in Git would struggle with this flow. Introduce a new "split" subcommand for git-history(1) to make this way easier. All the user needs to do is to say `git history split $COMMIT`. From hereon, Git asks the user which parts of the commit shall be moved out into a separate commit and, once done, asks the user for the commit message. Git then creates that split-out commit and applies the original commit on top of it. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-05cache-tree: allow writing in-memory index as treePatrick Steinhardt3-5/+6
The function `write_in_core_index_as_tree()` takes a repository and writes its index into a tree object. What this function cannot do though is to take an _arbitrary_ in-memory index. Introduce a new `struct index_state` parameter so that the caller can pass a different index than the one belonging to the repository. This will be used in a subsequent commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-05add-patch: add support for in-memory index patchingPatrick Steinhardt2-4/+116
With `run_add_p()` callers have the ability to apply changes from a specific revision to a repository's index. This infra supports several different modes, like for example applying changes to the index, working tree or both. One feature that is missing though is the ability to apply changes to an in-memory index different from the repository's index. Add a new function `run_add_p_index()` to plug this gap. This new function will be used in a subsequent commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-05add-patch: remove dependency on "add-interactive" subsystemPatrick Steinhardt1-33/+37
With the preceding commit we have split out interactive configuration that is used by both "git add -p" and "git add -i". But we still initialize that configuration in the "add -p" subsystem by calling `init_add_i_state()`, even though we only do so to initialize the interactive configuration as well as a repository pointer. Stop doing so and instead store and initialize the interactive configuration in `struct add_p_state` directly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-05add-patch: split out `struct interactive_options`Patrick Steinhardt10-239/+270
The `struct add_p_opt` is reused both by our infra for "git add -p" and "git add -i". Users of `run_add_i()` for example are expected to pass `struct add_p_opt`. This is somewhat confusing and raises the question of which options apply to what part of the stack. But things are even more confusing than that: while callers are expected to pass in `struct add_p_opt`, these options ultimately get used to initialize a `struct add_i_state` that is used by both subsystems. So we are basically going full circle here. Refactor the code and split out a new `struct interactive_options` that hosts common options used by both. These options are then applied to a `struct interactive_config` that hosts common configuration. This refactoring doesn't yet fully detangle the two subsystems from one another, as we still end up calling `init_add_i_state()` in the "git add -p" subsystem. This will be fixed in a subsequent commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-05add-patch: split out header from "add-interactive.h"Patrick Steinhardt3-20/+30
While we have a "add-patch.c" code file, its declarations are part of "add-interactive.h". This makes it somewhat harder than necessary to find relevant code and to identify clear boundaries between the two subsystems. Split up concerns and move declarations that relate to "add-patch.c" into a new "add-patch.h" header. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-05builtin/history: implement "reword" subcommandPatrick Steinhardt5-9/+573
Implement a new "reword" subcommand for git-history(1). This subcommand is essentially the same as if a user performed an interactive rebase with a single commit changed to use the "reword" verb. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-05builtin: add new "history" commandPatrick Steinhardt11-0/+91
When rewriting history via git-rebase(1) there are a couple of very common use cases: - The ordering of two commits should be reversed. - A commit should be split up into two commits. - A commit should be dropped from the history completely. - Multiple commits should be squashed into one. While these operations are all doable, it often feels needlessly kludgey to do so by doing an interactive rebase, using the editor to say what one wants, and then perform the actions. Furthermore, some operations like splitting up a commit into two are way more involved than that and require a whole series of commands. Add a new "history" command to plug this gap. This command will have several different subcommands to imperatively rewrite history for common use cases like the above. These subcommands will be implemented in subsequent commits. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-05replay: stop using `the_repository`Patrick Steinhardt1-1/+1
In `create_commit()` we're using `the_repository` even though we already have a repository passed to use as an argument. Fix this. Note that we still cannot get rid of `USE_THE_REPOSITORY_VARIABLE`. This is because we use `DEFAULT_ABBREV and `get_commit_output_encoding()`, both of which are stored as global variables that can be modified via the Git configuration. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-05replay: extract logic to pick commitsPatrick Steinhardt5-107/+143
We're about to add a new git-history(1) command that will reuse some of the same infrastructure as git-replay(1). To prepare for this, extract the logic to pick a commit into a new "replay.c" file so that it can be shared between both commands. Rename the function to have a "replay_" prefix to clearly indicate its subsystem. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-05wt-status: provide function to expose status for treesPatrick Steinhardt2-0/+33
The "wt-status" subsystem is responsible for printing status information around the current state of the working tree. This most importantly includes information around whether the working tree or the index have any changes. We're about to introduce a new command where the changes in neither of them are actually relevant to us. Instead, what we want is to format the changes between two different trees. While it is a little bit of a stretch to add this as functionality to _working tree_ status, it doesn't make any sense to open-code this functionality, either. Implement a new function `wt_status_collect_changes_trees()` that diffs two trees and formats the status accordingly. This function is not yet used, but will be in a subsequent commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-05Git 2.52-rc1v2.52.0-rc1Junio C Hamano2-1/+8
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-05attr: enable incomplete-line whitespace error for this projectJunio C Hamano1-2/+2
Now "git diff --check" and "git apply --whitespace=warn/fix" learned incomplete line is a whitespace error, enable them for this project to prevent patches to add new incomplete lines to our sources. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-05diff: highlight and error out on incomplete linesJunio C Hamano2-6/+86
Teach "git diff" to highlight "\ No newline at end of file" message as a whitespace error when incomplete-line whitespace error class is in effect. Thanks to the previous refactoring of complete rewrite code path, we can do this at a single place. Unlike whitespace errors in the payload where we need to annotate in line, possibly using colors, the line that has whitespace problems, we have a dedicated line already that can serve as the error message, so paint it as a whitespace error message. Also teach "git diff --check" to notice incomplete lines as whitespace errors and report when incomplete-line whitespace error class is in effect. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-05apply: check and fix incomplete linesJunio C Hamano3-1/+213
The final line of a file that lacks the terminating newline at its end is called an incomplete line. In general they are frowned upon for many reasons (imagine concatenating two files with "cat A B" and what happens when A ends in an incomplete line, for example), and text-oriented tools often mishandle such a line. Implement checks in "git apply" for incomplete lines, which is off by default for backward compatibility's sake, so that "git apply --whitespace={fix,warn,error}" can notice, warn against, and fix them. As one of the new test shows, if you modify contents on an incomplete line in the original and leave the resulting line incomplete, it is still considered a whitespace error, the reasoning being that "you'd better fix it while at it if you are making a change on an incomplete line anyway", which may controversial. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-05whitespace: allocate a few more bits and define WS_INCOMPLETE_LINEJunio C Hamano5-12/+21
Reserve a few more bits in the diff flags word to be used for future whitespace rules. Add WS_INCOMPLETE_LINE without implementing the behaviour (yet). Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-05apply: revamp the parsing of incomplete linesJunio C Hamano1-21/+49
A patch file represents the incomplete line at the end of the file with two lines, one that is the usual "context" with " " as the first letter, "added" with "+" as the first letter, or "removed" with "-" as the first letter that shows the content of the line, plus an extra "\ No newline at the end of file" line that comes immediately after it. Ever since the apply machinery was written, the "git apply" machinery parses "\ No newline at the end of file" line independently, without even knowing what line the incomplete-ness applies to, simply because it does not even remember what the previous line was. This poses a problem if we want to check and warn on an incomplete line. Revamp the code that parses a fragment, to actually drop the '\n' at the end of the incoming patch file that terminates a line, so that check_whitespace() calls made from the code path actually sees an incomplete as incomplete. Note that the result of this parsing is not directly used by the code path that applies the patch. apply_one_fragment() function already checks if each of the patch text it handles is followed by a line that begins with a backslash to drop the newline at the end of the current line it is looking at. In a sense, this patch harmonizes the behaviour of the parsing side to what is already done in the application side. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-05diff: update the way rewrite diff handles incomplete linesJunio C Hamano1-14/+19
The diff_symbol based output framework uses one DIFF_SYMBOL_* enum value per the kind of output lines of "git diff", which corresponds to one output line from the xdiff machinery used internally. Most notably, DIFF_SYMBOL_PLUS and DIFF_SYMBOL_MINUS that correspond to "+" and "-" lines are designed to always take a complete line, even if the output from xdiff machinery may produce "\ No newline at the end of file" immediately after them. But this is not true in the rewrite-diff codepath, which completely bypasses the xdiff machinery. Since the code path feeds the bytes directly from the payload to the output routines, the output layer has to deal with an incomplete line with DIFF_SYMBOL_PLUS and DIFF_SYMBOL_MINUS, which never would see an incomplete line in the normal code paths. This lack of final newline is compensated by an ugly hack for a fabricated DIFF_SYMBOL_NO_LF_EOF token to inject an extra newline to the output to simulate output coming from the xdiff machinery. Revamp the way the complete-rewrite code path feeds the lines to the output layer by treating the last line of the pre/post image when it is an incomplete line specially. This lets us remove the DIFF_SYMBOL_NO_LF_EOF hack and use the usual DIFF_SYMBOL_CONTEXT_INCOMPLETE code path, which will later learn how to handle whitespace errors. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-05diff: call emit_callback ecbdata everywhereJunio C Hamano1-6/+6
Everybody else, except for emit_rewrite_lines(), calls the emit_callback data ecbdata. Make sure we call the same thing by the same name for consistency. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-05diff: refactor output of incomplete lineJunio C Hamano1-2/+12
Create a helper function that reacts to "\ No newline at the end of file" in preparation for unifying the incomplete line handling in the code path that handles xdiff output and the code path that bypasses xdiff and produces complete rewrite patch. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-05diff: fix incorrect counting of line numbersJunio C Hamano1-1/+17
The "\ No newline at the end of the file" can come after any of the "-" (deleted preimage line), " " (unchanged line), or "+" (added postimage line). Incrementing only the preimage line number upon seeing it does not make any sense. We can keep track of what the previous line was, and increment lno_in_{pre,post}image variables properly, like this patch does. I do not think it matters, as these numbers are used only to compare them with blank_at_eof_in_{pre,post}image to issue the warning every time we see an added line, but by definition, after we see "\ No newline at the end of the file" for an added line, we will not see an added line for the file. Keeping track of what the last line was (in other words, "is it that the file used to end in an incomplete line? The file ends in an incomplete line after the change? Both the file before and after the change ends in an incomplete line that did not change?") will be independently useful. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-05diff: correct suppress_blank_empty hackJunio C Hamano1-16/+11
The suppress-blank-empty feature abused the CONTEXT_INCOMPLETE symbol that was meant to be used only for "\ No newline at the end of file" code path. The intent of the feature was to turn a context line we receive from xdiff machinery (which always uses ' ' for context lines, even an empty one) and spit it out as a truly empty line. Perform such a conversion very locally at where a line from xdiff that begins with ' ' is handled for output; there are many checks before the control reaches such place that checks the first letter of the diff output line to see if it is a context line, and having to check for '\n' and treat it as a special case is error prone. In order to catch similar hacks in the future, make sure the code path that is meant for "\ No newline" case checks the first byte is indeed a backslash. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-05diff: emit_line_ws_markup() if/else style fixJunio C Hamano1-4/+4
Apply the simple rule: if you need {} in one arm of the if/else if/else... cascade, have {} in all of them. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-05whitespace: correct bit assignment commentsJunio C Hamano3-16/+22
A comment in diff.c claimed that bits up to 12th (counting from 0th) are whitespace rules, and 13th thru 15th are for new/old/context, but it turns out it was miscounting. Correct them, and clarify where the whitespace rule bits come from in the comment. Extend bit assignment comments to cover bits used for color-moved, which weren't described. Also update the way these bit constants are defined to use (1 << N) notation, instead of octal constants, as it tends to make it easier to notice a breakage like this. Sprinkle a few blank lines between logically distinct groups of CPP macro definitions to make them easier to read. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-05replay: add replay.refAction config optionSiddharth Asthana4-4/+79
Add a configuration variable to control the default behavior of git replay for updating references. This allows users who prefer the traditional pipeline output to set it once in their config instead of passing --ref-action=print with every command. The config variable uses string values that mirror the behavior modes: * replay.refAction = update (default): atomic ref updates * replay.refAction = print: output commands for pipeline Helped-by: Junio C Hamano <gitster@pobox.com> Helped-by: Elijah Newren <newren@gmail.com> Helped-by: Christian Couder <christian.couder@gmail.com> Helped-by: Phillip Wood <phillip.wood123@gmail.com> Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-05replay: make atomic ref updates the default behaviorSiddharth Asthana3-40/+199
The git replay command currently outputs update commands that can be piped to update-ref to achieve a rebase, e.g. git replay --onto main topic1..topic2 | git update-ref --stdin This separation had advantages for three special cases: * it made testing easy (when state isn't modified from one step to the next, you don't need to make temporary branches or have undo commands, or try to track the changes) * it provided a natural can-it-rebase-cleanly (and what would it rebase to) capability without automatically updating refs, similar to a --dry-run * it provided a natural low-level tool for the suite of hash-object, mktree, commit-tree, mktag, merge-tree, and update-ref, allowing users to have another building block for experimentation and making new tools However, it should be noted that all three of these are somewhat special cases; users, whether on the client or server side, would almost certainly find it more ergonomic to simply have the updating of refs be the default. For server-side operations in particular, the pipeline architecture creates process coordination overhead. Server implementations that need to perform rebases atomically must maintain additional code to: 1. Spawn and manage a pipeline between git-replay and git-update-ref 2. Coordinate stdout/stderr streams across the pipe boundary 3. Handle partial failure states if the pipeline breaks mid-execution 4. Parse and validate the update-ref command output Change the default behavior to update refs directly, and atomically (at least to the extent supported by the refs backend in use). This eliminates the process coordination overhead for the common case. For users needing the traditional pipeline workflow, add a new --ref-action=<mode> option that preserves the original behavior: git replay --ref-action=print --onto main topic1..topic2 | git update-ref --stdin The mode can be: * update (default): Update refs directly using an atomic transaction * print: Output update-ref commands for pipeline use Test suite changes: All existing tests that expected command output now use --ref-action=print to preserve their original behavior. This keeps the tests valid while allowing them to verify that the pipeline workflow still works correctly. New tests were added to verify: - Default atomic behavior (no output, refs updated directly) - Bare repository support (server-side use case) - Equivalence between traditional pipeline and atomic updates - Real atomicity using a lock file to verify all-or-nothing guarantee - Test isolation using test_when_finished to clean up state - Reflog messages include replay mode and target A following commit will add a replay.refAction configuration option for users who prefer the traditional pipeline output as their default behavior. Helped-by: Elijah Newren <newren@gmail.com> Helped-by: Patrick Steinhardt <ps@pks.im> Helped-by: Christian Couder <christian.couder@gmail.com> Helped-by: Phillip Wood <phillip.wood123@gmail.com> Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-05replay: use die_for_incompatible_opt2() for option validationSiddharth Asthana1-3/+3
In preparation for adding the --ref-action option, convert option validation to use die_for_incompatible_opt2(). This helper provides standardized error messages for mutually exclusive options. The following commit introduces --ref-action which will be incompatible with certain other options. Using die_for_incompatible_opt2() now means that commit can cleanly add its validation using the same pattern, keeping the validation logic consistent and maintainable. This also aligns git-replay's option handling with how other Git commands manage option conflicts, using the established die_for_incompatible_opt*() helper family. Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>