2025-11-06  meson: make GIT_HTML_PATH configurable  (D. Ben Knoble; 7 files, -13/+20)

Makefile-based builds can configure Git's internal HTML_PATH by defining htmldir, which is useful for packagers that put documentation in different locations. Gentoo, for example, uses version-suffixed directories like ${prefix}/share/doc/git-2.51 and puts the HTML documentation in an 'html' subdirectory of the same.

Propagate the same configuration knob to Meson-based builds so that "git --html-path" on such systems can be configured to output the correct directory.

Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-06  perl: also mark git-contacts executable  (D. Ben Knoble; 1 file, -1/+1)

When installing git-contacts with Meson via -Dcontrib=contacts, the default Perl generation fails to mark it executable. As a result, "git contacts" reports "'contacts' is not a git command."

Unlike generate-script.sh, we aren't testing the basename here, so glob the script name in the case arm to match wherever the input comes from.

Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-06  wincred: align Makefile with other Makefiles in contrib  (Thomas Uhle; 1 file, -8/+10)

* Replace $(LOADLIBES), because it has long been deprecated and is used nowhere else in the Git project.
* Use $(gitexecdir) instead of $(libexecdir), because config.mak defines $(libexecdir) as $(prefix)/libexec, not as $(prefix)/libexec/git-core.
* Similar to other Makefiles, let the install target rule create $(gitexecdir) to make sure the directory exists before copying the executable, and also let it respect $(DESTDIR).
* Shuffle the lines for the default settings to align them with the other Makefiles in contrib/credential.
* Define .PHONY for all special targets (all, install, clean).

Signed-off-by: Thomas Uhle <thomas.uhle@mailbox.tu-dresden.de>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-05  refs: add missing space in messages  (Peter Krefting; 2 files, -2/+2)

Signed-off-by: Peter Krefting <peter@softwolves.pp.se>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-05  Git 2.52-rc1 (tag: v2.52.0-rc1)  (Junio C Hamano; 2 files, -1/+8)
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04  GitHub CI: macos-13 images are no more  (Junio C Hamano; 1 file, -4/+4)

As this image was deprecated on Sep 22nd and will be dropped on Dec 4th, update these jobs to use macos-14 images instead.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04  parseopt: remove unreachable code  (Junio C Hamano; 1 file, -2/+0)

At this point in the code, after running skip_prefix() on the variable and receiving the result in the same variable, the contents of the variable can never be NULL. The function either (1) updates the variable to point at a later part of the string it originally pointed at, or (2) leaves it intact if the string does not have the prefix.

(1) will never make the variable NULL, and (2) cannot be the source of NULL either, because the variable cannot be NULL before calling skip_prefix(), which would otherwise die immediately by dereferencing the NULL pointer.

Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
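A minimal model of skip_prefix() illustrates the two outcomes (an illustrative sketch; the real implementation in git-compat-util.h differs in detail but behaves the same way):

    #include <string.h>

    /*
     * On a match, *out points into the original string just past the
     * prefix; otherwise *out is left untouched. Neither outcome can
     * turn a non-NULL input into a NULL result.
     */
    static int skip_prefix(const char *str, const char *prefix,
                           const char **out)
    {
            size_t len = strlen(prefix);
            if (strncmp(str, prefix, len))
                    return 0;          /* case (2): variable unchanged */
            *out = str + len;          /* case (1): same string, later part */
            return 1;
    }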
2025-11-04  parseopt: restore const qualifier to parsed filename  (D. Ben Knoble; 1 file, -1/+1)

This was unintentionally dropped in ccfcaf399f (parseopt: values of pathname type can be prefixed with :(optional), 2025-09-28). Notably, continue dropping the const qualifier when freeing the value; see 4049b9cfc0 (fix const issues with some functions, 2007-10-16) or 83838d5c1b (cast variable in call to free() in builtin/diff.c and submodule.c, 2011-11-06) for more details on why.

Suggested-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04  config: use boolean type for a simple flag  (D. Ben Knoble; 1 file, -1/+1)

Suggested-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04  parseopt: use boolean type for a simple flag  (D. Ben Knoble; 1 file, -2/+2)

Suggested-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04  doc: clarify command equivalence comment  (D. Ben Knoble; 1 file, -1/+1)

Documentation of command parsing for :(optional) includes a terse comment; expand it to be clearer to readers.

Suggested-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04  parseopt: fix :(optional) at command line to only ignore missing files  (D. Ben Knoble; 1 file, -1/+1)

Unlike the configuration option magic, the parseopt code also ignores empty files: compare the implementations from ccfcaf399f (parseopt: values of pathname type can be prefixed with :(optional), 2025-09-28) and 749d6d166d (config: values of pathname type can be prefixed with :(optional), 2025-09-28).

Unify the two by not ignoring empty files, which is less surprising and matches the intended semantics from the first patch for config.

Suggested-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
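A small sketch of the unified semantics described above (illustrative only; the helper name and error handling are stand-ins, not git's actual parseopt code):

    #include <errno.h>
    #include <stdio.h>

    /*
     * ":(optional)" now means: a missing file is silently skipped,
     * but an empty file is consumed as a real, empty value.
     */
    static int read_optional_path(const char *path, int optional)
    {
            FILE *fp = fopen(path, "r");
            if (!fp) {
                    if (optional && errno == ENOENT)
                            return 0;   /* missing: quietly ignore */
                    perror(path);
                    return -1;          /* missing without the magic: error */
            }
            /* An empty file falls through to here and is read as-is. */
            /* ... read contents ... */
            fclose(fp);
            return 1;
    }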
2025-11-04  A bit more before rc1  (Junio C Hamano; 1 file, -0/+25)
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04  t/pack-refs-tests: move the 'test_done' to callees  (Karthik Nayak; 3 files, -2/+4)

In ac0bad0af4 (t0601: refactor tests to be shareable, 2025-09-19), we refactored 't/t0601-reffiles-pack-refs.sh' to move all of the tests to 't/pack-refs-tests.sh', which became a common test suite that is also used by 't/t1463-refs-optimize.sh'. This also moved the 'test_done' directive into 't/pack-refs-tests.sh', which inhibits additional tests from being added to either test script.

Let's move the directive out to both test scripts, so that we can add additional, script-specific tests to them. Besides, test flow logic shouldn't be part of a test suite that can be embedded in other test scripts.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04  refs: rename 'pack_refs_opts' to 'refs_optimize_opts'  (Karthik Nayak; 8 files, -30/+30)

The previous commit removed all references to 'pack_refs()' within the refs subsystem. Continue this cleanup by also renaming 'pack_refs_opts' to 'refs_optimize_opts' and the respective flags accordingly. Keeping the naming consistent will make the code easier to maintain.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04  refs: move to using the '.optimize' functions  (Karthik Nayak; 7 files, -44/+12)

The `struct ref_store` interface exposes two ways to optimize a ref backend:

  1. pack_refs
  2. optimize

The former was specific to the 'files' + 'packed' refs backend. The latter is more generic and covers all backends. While the naming is different, both of these functions perform the same functionality.

Consolidate this code to only maintain the 'optimize' functions, by modifying the backends so that they exclusively implement the `optimize` callback. All users of the refs subsystem already use the 'optimize' function, so there are no changes needed on the caller side.

Finally, clean up all references to the 'pack_refs' field of the structure and the code around it.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04  t7004: do not chdir around in the main process  (Junio C Hamano; 1 file, -18/+20)

Move down to the no-contains subdirectory inside a subshell, just like the previous step that created and used it does.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04  ref-filter: fix stale parsed objects  (Patrick Steinhardt; 2 files, -0/+22)

In 054f5f457e (ref-filter: parse objects on demand, 2025-10-23) we have started to skip parsing some objects in case we don't need to access their values in the first place. This was done by introducing a new member `struct expand_data::maybe_object` that gets populated on demand via `get_or_parse_object()`.

This has led to a regression though where the object now gets reused because we don't reset it properly. The `oi` structure is declared in global scope, and there is no single place where we reset it before invoking `get_object()`. The consequence is that the `maybe_object` member doesn't get reset across calls, so subsequent calls will end up reusing the same object.

This is only an issue for a subset of retrieved values, as not all of the infrastructure ends up calling `get_or_parse_object()`. So the effect is limited, which is probably why the issue wasn't detected earlier.

Fix the issue by resetting `maybe_object` in `get_object()`.

Reported-by: Junio C Hamano <gitster@pobox.com>
Based-on-patch-by: Jeff King <peff@peff.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
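The shape of the fix, as a sketch (the member and function names come from the commit message; the surrounding details are assumed):

    /* `oi` is declared in global scope and reused for every reference. */
    struct expand_data {
            struct object *maybe_object;    /* populated on demand */
            /* ... other per-reference values ... */
    };
    static struct expand_data oi;

    static void get_object(void /* arguments omitted */)
    {
            /*
             * The fix: clear the cached object up front, so a value
             * parsed for the previous reference can never leak into
             * this one.
             */
            oi.maybe_object = NULL;
            /* ... populate values for the current reference ... */
    }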
2025-11-04  ref-filter: parse objects on demand  (Patrick Steinhardt; 1 file, -36/+106)

When formatting an arbitrary object we parse that object regardless of whether or not we actually need any parsed data. In fact, many of the atoms we have don't require any.

Refactor the code so that we parse the data on demand when we see an atom that wants to access the objects. This leads to a small speedup, for example in the Chromium repository with around 40000 refs:

    Benchmark 1: for-each-ref --format='%(raw)' (HEAD~)
      Time (mean ± σ):     388.7 ms ±   1.1 ms    [User: 322.2 ms, System: 65.0 ms]
      Range (min … max):   387.3 ms … 390.8 ms    10 runs

    Benchmark 2: for-each-ref --format='%(raw)' (HEAD)
      Time (mean ± σ):     344.7 ms ±   0.7 ms    [User: 287.8 ms, System: 55.1 ms]
      Range (min … max):   343.9 ms … 345.7 ms    10 runs

    Summary
      for-each-ref --format='%(raw)' (HEAD) ran
        1.13 ± 0.00 times faster than for-each-ref --format='%(raw)' (HEAD~)

With this change, we now spend ~90% of the time decompressing objects, which is almost as good as it gets regarding git-for-each-ref(1)'s own infrastructure.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
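The on-demand parsing boils down to the classic lazy-initialization idiom; a sketch with names following the commit messages (exact signatures assumed):

    static struct object *get_or_parse_object(struct expand_data *data,
                                              const struct object_id *oid)
    {
            /* Parse at most once, and only for atoms that need it. */
            if (!data->maybe_object)
                    data->maybe_object = parse_object(the_repository, oid);
            return data->maybe_object;
    }

Atoms that only read raw object data never trigger the parse, which is presumably where the speedup above comes from.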
2025-11-04  ref-filter: detect broken tags when dereferencing them  (Patrick Steinhardt; 2 files, -2/+5)

Users can ask git-for-each-ref(1) to peel tags and return information about the tagged object by adding an asterisk to the format, for example "%(*objectname)". If so, git-for-each-ref(1) peels that object to the first non-tag object and then returns its values.

As mentioned in preceding commits, it can happen that the tagged object type and the claimed object type differ, effectively resulting in a corrupt tag. git-for-each-ref(1) would notice this mismatch, print an error and then bail out when trying to peel the tag. But we only notice this corruption in some very specific edge cases!

While we have a test in "t/for-each-ref-tests.sh" that verifies the above scenario, this test is specifically crafted to detect the issue at hand. Namely, we create two tags:

  - One tag points to a specific object with the correct type.
  - The other tag points to the *same* object with a different type.

The fact that both tags point to the same object is important here: `peel_object()` wouldn't notice the corruption if the tagged objects were different.

The root cause is that `peel_object()` eventually calls `lookup_${type}()`, where the type is the same type declared in the tag object. Consequently, when we have two tags pointing to the same object but with different declared types we'll call two different lookup functions. The first lookup will store the object with an unverified type A, whereas the second lookup will try to look up the object with a different unverified type B. And it is only now that we notice the discrepancy in object types, even though type A could've already been the wrong type.

Fix the issue by verifying the object type in `populate_value()`. With this change we'll also notice type mismatches when only dereferencing a tag once.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04  refs: don't store peeled object IDs for invalid tags  (Patrick Steinhardt; 4 files, -2/+63)

Both the "files" and "reftable" backends store peeled object IDs for references that point to tags:

  - The "files" backend stores the value when packing refs, where each peeled object ID is prefixed with "^".
  - The "reftable" backend stores the value whenever writing a new reference that points to a tag, via a special ref record type.

Both of these backends use `peel_object()` to find the peeled object ID. But as explained in the preceding commit, that function does not detect the case where the tag's tagged object and its claimed type mismatch. The consequence of storing these bogus peeled object IDs is that we're less likely to detect such corruption in other parts of Git. git-for-each-ref(1), for example, does not notice anymore that the tag is broken when using "--format=%(*objectname)" to dereference tags.

One could claim that this is good, because it still allows us to mostly use the tag as intended. But the biggest problem here is that we now have different behaviour for such a broken tag depending on whether or not we have its peeled value in the refdb.

Fix the issue by verifying the object type when peeling the object. If that verification fails we simply skip storing the peeled value in either of the reference formats.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04  object: add flag to `peel_object()` to verify object type  (Patrick Steinhardt; 9 files, -25/+38)

When peeling a tag to a non-tag object we repeatedly call `parse_object()` on the tagged object until we find the first object that isn't a tag. While this feels sensible at first, there is a big catch here: `parse_object()` doesn't actually verify the type of the tagged object.

The relevant code path here eventually ends up in `parse_tag_buffer()`. Here, we parse the various fields of the tag, including the "type". Once we've figured out the type and the tagged object ID, we call one of the `lookup_${type}()` functions for whatever type we have found. There are two possible outcomes in the successful case:

  1. The object is already part of our cached objects. In that case we double-check whether the type we're trying to look up matches the type that was cached.
  2. The object is _not_ part of our cached objects. In that case, we simply create a new object with the expected type, but we don't parse that object.

In the first case we might notice type mismatches, but only in the case where our cache has the object with the correct type. In the second case, we'll blindly assume that the type is correct and then go with it. We'll only notice that the type might be wrong when we try to parse the object at a later point.

Now arguably, we could change `parse_tag_buffer()` to verify the tagged object's type for us. But that would have the effect that such a tag cannot be parsed at all anymore, and we have a small bunch of tests for exactly this case that assert we can still open such tags. So this change does not feel like something we can retroactively tighten, even though one shouldn't ever hit such corrupted tags.

Instead, add a new `flags` field to `peel_object()` that allows the caller to opt in to strict object verification. This will be wired up at a subset of callsites over the next few commits.

Note that this change also inlines `deref_tag_noverify()`. There were only two callsites of that function, the one we're changing and one in our test helpers. The latter callsite can trivially use `deref_tag()` instead, so by inlining the function we avoid having to pass down the flag.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
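A sketch of peeling with the new opt-in verification (the flag name and helper shape are assumptions modeled on the description, not the exact implementation):

    enum peel_flags {
            PEEL_VERIFY_OBJECT_TYPE = (1 << 0),     /* assumed name */
    };

    static struct object *peel_to_non_tag(struct repository *r,
                                          struct object *obj,
                                          unsigned flags)
    {
            while (obj && obj->type == OBJ_TAG) {
                    struct tag *tag = (struct tag *)obj;
                    /* Check the claimed type against the real one on
                     * disk instead of blindly trusting the tag. */
                    if (flags & PEEL_VERIFY_OBJECT_TYPE &&
                        oid_object_info(r, &tag->tagged->oid, NULL) !=
                                tag->tagged->type)
                            return NULL;
                    obj = parse_object(r, &tag->tagged->oid);
            }
            return obj;
    }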
2025-11-04  refs: drop infrastructure to peel via iterators  (Patrick Steinhardt; 8 files, -141/+1)

Now that the peeled object ID gets propagated via the `struct reference` there is no need anymore to call into the reference iterator itself to dereference an object. Remove this infrastructure.

Most of the changes are straightforward deletions of code. There is one exception though in `refs/packed-backend.c::write_with_updates()`. Here we stop peeling the iterator and instead just pass the peeled object ID of that iterator directly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04  refs: drop `current_ref_iter` hack  (Patrick Steinhardt; 3 files, -28/+0)

In preceding commits we have refactored all callers of `peel_iterated_oid()` to instead use `reference_get_peeled_oid()`. This thus allows us to get rid of the former function.

Getting rid of that function is nice, but even nicer is that this also allows us to get rid of the `current_ref_iter` hack. This global variable tracked the currently-active ref iterator so that we can use it to peel an object ID. Now that the peeled object ID is propagated via `struct reference` though, we don't have to depend on this hack anymore, which makes for a more robust and easier-to-understand infrastructure.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04  builtin/show-ref: convert to use `reference_get_peeled_oid()`  (Patrick Steinhardt; 1 file, -13/+19)

The git-show-ref(1) command has multiple different modes:

  - It knows to show all references matching a pattern.
  - It knows to list all references that are an exact match to whatever the user has provided.
  - It knows to check for reference existence.

The first two modes use mostly the same infrastructure to print the references via `show_one()`. But while the former mode uses a proper iterator and thus has a `struct reference` available in its context, the latter calls `refs_read_ref()` and thus doesn't. Consequently, we cannot easily use `reference_get_peeled_oid()` to print the peeled value.

Adapt the code so that we manually construct a `struct reference` when verifying refs. We wouldn't ever have the peeled value available anyway, as we're not using an iterator here, so we can simply plug in the values we _do_ have.

With this change we now have a `struct reference` available at both callsites of `show_one()` and can thus pass it, which allows us to use `reference_get_peeled_oid()` instead of `peel_iterated_oid()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04  ref-filter: propagate peeled object ID  (Patrick Steinhardt; 5 files, -32/+45)

When queueing a reference in the "ref-filter" subsystem we end up creating a new ref array item that contains the reference's info. One bit of info that we always discard though is the peeled object ID, and because of that we are forced to use `peel_iterated_oid()`.

Refactor the code to propagate the peeled object ID via the ref array, if available. This allows us to manually peel tags without having to go through the object database.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04  upload-pack: convert to use `reference_get_peeled_oid()`  (Patrick Steinhardt; 1 file, -9/+13)

The `write_v0_ref()` callback is invoked from two callsites:

  - Once via `send_ref()`, which is a callback passed to `for_each_namespaced_ref_1()` and `refs_head_ref_namespaced()`.
  - Once manually, to announce capabilities.

When sending references to the client we also send the peeled value of tags. As we don't have a `struct reference` available in the second case, we cannot easily peel by calling `reference_get_peeled_oid()`, but instead have to depend on global state via `peel_iterated_oid()`.

We do have a reference available in the first case though; it's only the second case that keeps us from using `reference_get_peeled_oid()`. But that second case only announces capabilities anyway, so we're not really handling a reference at all here. Adapt that case to construct a reference manually and pass that to `write_v0_ref()`.

Start to use `reference_get_peeled_oid()` now that we always have a `struct reference` available.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04  refs: expose peeled object ID via the iterator  (Patrick Steinhardt; 12 files, -10/+48)

Both the "files" and "reftable" backends are able to store peeled values for tags in their respective formats. This allows for a more efficient lookup of the target object of such a tag without having to manually peel via the object database.

The infrastructure to access these peeled object IDs is somewhat funky though. When iterating through objects, we store a pointer reference to the current iterator in a global variable. The callbacks invoked by that iterator are then expected to call `peel_iterated_oid()`, which checks whether the globally-stored iterator's current reference refers to the one handed into that function. If so, we ask the iterator to peel the object; otherwise we manually peel the object via the object database. Depending on global state like this is somewhat weird and also quite fragile.

Introduce a new `struct reference::peeled_oid` field that can be populated by the reference backends. This field can be accessed via a new function `reference_get_peeled_oid()` that either uses that value, if set, or alternatively peels via the ODB. With this change we don't have to rely on global state anymore, but make the peeled object ID available to the callback functions directly.

Adjust trivial callers that already have a `struct reference` available. Remaining callers will be adjusted in subsequent commits.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04  refs: refactor reference status flags  (Patrick Steinhardt; 1 file, -20/+21)

The reference flags encode information like whether or not a reference is a symbolic reference or whether it may be broken. This information is stored in an `int flags` bitfield, which is in conflict with our modern best practices; we tend to use an unsigned integer to store flags.

Change the type of the field to be `unsigned`. While at it, refactor the individual flags to be part of an `enum` instead of using preprocessor defines.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
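In a nutshell (the flag names here are illustrative, not the exact set from the refs code):

    /*
     * Before: preprocessor defines, stored in an "int flags":
     *
     *     #define REF_ISSYMREF 0x01
     *     #define REF_ISBROKEN 0x02
     */

    /* After: an enum documents that these are single flag bits, and
     * the field holding them becomes unsigned. */
    enum ref_flags {
            REF_ISSYMREF = (1 << 0),
            REF_ISBROKEN = (1 << 1),
    };

    struct reference {
            unsigned flags;
            /* ... */
    };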
2025-11-04  refs: fully reset `struct ref_iterator::ref` on iteration  (Patrick Steinhardt; 3 files, -1/+4)

With the introduction of the `struct ref_iterator::ref` field it is now a whole lot easier to introduce new fields that become accessible to the caller, without having to adapt every single callsite. But there's a downside: when a new field is introduced we always have to adapt all backends to set that field.

This isn't something we can avoid in the general case: when the new field is expected to be populated by all backends we of course cannot avoid doing so. But new fields may be entirely optional, in which case we'd still have such churn. And furthermore, it is very easy right now to leak state from a previous iteration into the next iteration.

Address this issue by ensuring that the reference backends all fully reset the field on every single iteration. This ensures that no state from previous iterations can leak into the next one. And it ensures that any newly introduced fields will be zeroed out by default.

Note that we don't have to explicitly adapt the "files" backend, as it uses the `cache_ref_iterator` internally. Furthermore, other "wrapping" iterators like for example the `prefix_ref_iterator` copy around the whole reference, so these don't need to be adapted either.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04  refs: introduce `.ref` field for the base iterator  (Patrick Steinhardt; 8 files, -100/+75)

The base iterator has a couple of fields that track the name, target, object ID and flags for the current reference. Due to this design we have to create a new `struct reference` whenever we want to hand over that reference to the callback function, which is tedious and not very efficient.

Convert the structure to instead contain a `struct reference` as a member. This member is expected to be populated by the implementations of the iterator and is handed over to the callback directly.

While at it, simplify `should_pack_ref()` to take a `struct reference` directly instead of passing its respective fields.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04  refs: introduce wrapper struct for `each_ref_fn`  (Patrick Steinhardt; 49 files, -462/+392)

The `each_ref_fn` callback function type is used across our code base for several different functions that iterate through references. There's a bunch of callbacks implementing this type, which makes any changes to the callback signature extremely noisy. An example of the required churn is e8207717f1 (refs: add referent to each_ref_fn, 2024-08-09): adding a single argument required us to change 48 files.

It was already proposed back then [1] that we might want to introduce a wrapper structure to alleviate the pain going forward. While this of course requires the same kind of global refactoring as just introducing a new parameter, it at least allows us to more easily change the callback type afterwards by just extending the wrapper structure.

One counterargument to this refactoring is that it makes the structure more opaque. While it is obvious which callsites need to be fixed up when we change the function type, it's not obvious anymore once we use a structure. That being said, we only have a handful of sites that actually need to populate this wrapper structure: our ref backends, "refs/iterator.c", as well as very few sites that invoke the iterator callback functions directly.

Introduce this wrapper structure so that we can adapt the iterator interfaces more readily.

[1]: <ZmarVcF5JjsZx0dl@tanuki>

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
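A sketch of the wrapper idea (field names are assumptions modeled on the description, not the exact refs.h definition):

    struct reference {
            const char *name;               /* refname being iterated */
            const char *referent;           /* symref target, if any */
            const struct object_id *oid;    /* resolved object ID */
            unsigned flags;
    };

    /*
     * Before, every new argument (like "referent" in e8207717f1)
     * meant touching every callback:
     *
     *     typedef int each_ref_fn(const char *refname,
     *                             const char *referent,
     *                             const struct object_id *oid,
     *                             int flags, void *cb_data);
     *
     * After, new data only extends the struct:
     */
    typedef int each_ref_fn(const struct reference *ref, void *cb_data);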
2025-11-04Revert "Merge branch 'ps/ref-peeled-tags' into next"Junio C Hamano66-783/+834
This reverts commit 8ac48a10de61267858d66383c34833e55a5e9d02, reversing changes made to 9ab444edfb825dfbc555b3d7916df03071db94c3.
2025-11-04Revert "Merge branch 'kn/refs-optim-cleanup' into next"Junio C Hamano11-42/+72
This reverts commit 8c2d7a4413436c0f91af0c0ab978cba8af905696, reversing changes made to 8ac48a10de61267858d66383c34833e55a5e9d02.
2025-11-03  last-modified: implement faster algorithm  (Toon Claes; 3 files, -16/+237)

The current implementation of git-last-modified(1) works by doing a revision walk, and inspecting the diff at each level of that walk to annotate entries remaining in the hashmap of paths. In other words, if the diff at some level touches a path which has not yet been associated with a commit, then that commit becomes associated with the path.

While a perfectly reasonable implementation, it can perform poorly in either one of two scenarios:

  1. There are many entries of interest, in which case there is simply a lot of work to do.
  2. Or, there are (even a few) entries which have not been updated in a long time, and so we must walk through a lot of history in order to find a commit that touches that path.

This patch rewrites the last-modified implementation to address the second point. The idea behind the algorithm is to propagate a set of 'active' paths (a path is 'active' if it does not yet belong to a commit) up to parents and do a truncated revision walk. The walk is truncated because it does not produce a revision for every change in the original pathspec, but rather only for active paths.

More specifically, consider a priority queue of commits sorted by generation number. First, enqueue the set of boundary commits with all paths in the original spec marked as interesting. Then, while the queue is not empty, do the following (a compact sketch follows this message):

  1. Pop an element, say, 'c', off of the queue, making sure that 'c' isn't reachable by anything in the '--not' set.
  2. For each parent 'p' (with index 'parent_i') of 'c', do the following:
     a. Compute the diff between 'c' and 'p'.
     b. Pass any active paths that are TREESAME from 'c' to 'p'.
     c. If 'p' has any active paths, push it onto the queue.
  3. Any path that remains active on 'c' is associated to that commit.

This ends up being equivalent to doing something like 'git log -1 -- $path' for each path simultaneously. But it allows us to go much faster than the original implementation by limiting the number of diffs we compute, since we can avoid parts of history that would have been considered by the revision walk in the original implementation, but are known to be uninteresting to us because we have already marked all paths in that area as inactive.

To avoid computing many first-parent diffs, add another trick on top of this and check if all paths active in 'c' are DEFINITELY NOT in c's Bloom filter. Since the commit-graph only stores first-parent diffs in the Bloom filters, we can only apply this trick to first-parent diffs.

Comparing the performance of this new algorithm shows about a 2.5x improvement on git.git:

    Benchmark 1: master no bloom
      Time (mean ± σ):      2.868 s ±  0.023 s    [User: 2.811 s, System: 0.051 s]
      Range (min … max):    2.847 s …  2.926 s    10 runs

    Benchmark 2: master with bloom
      Time (mean ± σ):     949.9 ms ±  15.2 ms    [User: 907.6 ms, System: 39.5 ms]
      Range (min … max):   933.3 ms … 971.2 ms    10 runs

    Benchmark 3: HEAD no bloom
      Time (mean ± σ):     782.0 ms ±   6.3 ms    [User: 740.7 ms, System: 39.2 ms]
      Range (min … max):   776.4 ms … 798.2 ms    10 runs

    Benchmark 4: HEAD with bloom
      Time (mean ± σ):     307.1 ms ±   1.7 ms    [User: 276.4 ms, System: 29.9 ms]
      Range (min … max):   303.7 ms … 309.5 ms    10 runs

    Summary
      HEAD with bloom ran
        2.55 ± 0.02 times faster than HEAD no bloom
        3.09 ± 0.05 times faster than master with bloom
        9.34 ± 0.09 times faster than master no bloom

In short, the existing implementation is comparably fast *with* Bloom filters as the new implementation is *without* Bloom filters. So, most repositories should get a dramatic speed-up by just deploying this (even without computing Bloom filters), and all repositories should get faster still when computing Bloom filters.

When comparing a more extreme example of `git last-modified -- COPYING t`, the difference is even bigger, with about a 5x speedup:

    Benchmark 1: master
      Time (mean ± σ):      4.372 s ±  0.057 s    [User: 4.286 s, System: 0.062 s]
      Range (min … max):    4.308 s …  4.509 s    10 runs

    Benchmark 2: HEAD
      Time (mean ± σ):     826.3 ms ±  22.3 ms    [User: 784.1 ms, System: 39.2 ms]
      Range (min … max):   810.6 ms … 881.2 ms    10 runs

    Summary
      HEAD ran
        5.29 ± 0.16 times faster than master

As an added benefit, results are more consistent now. For example, the implementation in 'master' gives:

    $ git log --max-count=1 --format=%H -- pkt-line.h
    15df15fe07ef66b51302bb77e393f3c5502629de

    $ git last-modified -- pkt-line.h
    15df15fe07ef66b51302bb77e393f3c5502629de	pkt-line.h

    $ git last-modified | grep pkt-line.h
    5b49c1af03e600c286f63d9d9c9fb01403230b9f	pkt-line.h

With the changes in this patch the results of git-last-modified(1) always match those of `git log --max-count=1`. One thing to note though: the results might be output in a different order than before. This is not considered to be an issue, because the order is nowhere documented to be guaranteed.

Based-on-patches-by: Derrick Stolee <stolee@gmail.com>
Based-on-patches-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Toon Claes <toon@iotcl.com>
Acked-by: Taylor Blau <me@ttaylorr.com>
[jc: tweaked use of xcalloc() to unbreak coccicheck]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
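A compact sketch of the truncated walk (the queue functions match git's prio-queue.h; all pathset helpers and predicates are illustrative stand-ins, not git's actual API):

    void last_modified_walk(struct prio_queue *queue)
    {
            /*
             * The queue starts out holding the boundary commits, each
             * carrying the full pathspec as its active set, ordered by
             * generation number.
             */
            struct commit *c;
            while ((c = prio_queue_get(queue))) {
                    if (reachable_from_not_set(c))          /* step 1 */
                            continue;
                    for (int i = 0; i < parent_count(c); i++) {
                            struct commit *p = parent(c, i);
                            /*
                             * Bloom shortcut, first-parent only: if the
                             * filter definitely excludes every active
                             * path, skip computing the diff entirely.
                             */
                            if (i == 0 && bloom_excludes_all_active(c))
                                    pass_all_active(c, p);
                            else
                                    pass_treesame_active(c, p); /* 2a+2b */
                            if (has_active_paths(p))
                                    prio_queue_put(queue, p);   /* 2c */
                    }
                    /* Step 3: still-active paths were last modified here. */
                    attribute_active_paths_to(c);
            }
    }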
2025-11-03  Git 2.52-rc0 (tag: v2.52.0-rc0)  (Junio C Hamano; 2 files, -1/+17)
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-30  The 27th batch  (Junio C Hamano; 1 file, -0/+23)
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-30  packfile: track packs via the MRU list exclusively  (Patrick Steinhardt; 3 files, -32/+26)

We track packfiles via two different lists:

  - `struct packfile_store::packs` is a list that sorts local packs first. In addition, these packs are sorted so that younger packs are sorted towards the front.
  - `struct packfile_store::mru` is a list that sorts packs so that most-recently used packs are at the front.

The reasoning behind the ordering in the `packs` list is that younger objects stored in the local object store tend to be accessed more frequently, and that is certainly true for some cases. But there are going to be lots of cases where that isn't true. Especially when traversing history it is likely that one needs to access many older objects, and due to our housekeeping it is very likely that almost all of those older objects will be contained in one large pack that is oldest.

So whether or not the ordering makes sense really depends on the use case at hand. A flexible approach like our MRU list addresses that need, as it will sort packs towards the front that are accessed all the time. Intuitively, this approach is thus able to satisfy more use cases more efficiently.

This reasoning casts some doubt on whether or not it really makes sense to track packs via two different lists. It causes confusion, and it is not clear whether there are use cases where the `packs` list really is such an obvious choice. Merge these two lists into one most-recently-used list.

Note that there is one important edge case: `for_each_packed_object()` uses the MRU list to iterate through packs, and then it lists each object in those packs. This would have the effect that we now sort the current pack towards the front, thus modifying the list of packfiles we are iterating over, with the consequence that we'll see an infinite loop. This edge case is worked around by introducing a new field that allows us to skip updating the MRU.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
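A sketch of the workaround for that edge case (the field and function names here are illustrative):

    struct packfile_store {
            struct packfile_list mru;       /* most-recently-used order */
            unsigned skip_mru_update;       /* set while iterating the list */
    };

    static void mark_pack_used(struct packfile_store *store,
                               struct packed_git *pack)
    {
            /*
             * Moving the current pack to the front while walking the
             * very same list would make the walk revisit it forever,
             * so the reordering is suppressed during iteration.
             */
            if (store->skip_mru_update)
                    return;
            packfile_list_move_to_front(&store->mru, pack);
    }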
2025-10-30  packfile: always add packfiles to MRU when adding a pack  (Patrick Steinhardt; 2 files, -2/+1)

When preparing the packfile store we know to also prepare the MRU list of packfiles with all packs that are currently loaded in the store via `packfile_store_prepare_mru()`. So we know that the list of packs in the MRU list should match the list of packs in the non-MRU list.

But there are some direct or indirect callsites that add a packfile to the store via `packfile_store_add_pack()` without adding the pack to the MRU. And while functions that access the MRU (e.g. `find_pack_entry()`) know to call `packfile_store_prepare()`, which knows to prepare the MRU via `packfile_store_prepare_mru()`, that operation will be turned into a no-op because the packfile store is already prepared. So this will not cause us to add the packfile to the MRU, and consequently we won't be able to find the packfile in our MRU list.

There are only a handful of callers outside of "packfile.c" that add a packfile to the store:

  - "builtin/fast-import.c" adds multiple packs of imported objects, but it knows to look up objects via `packfile_store_get_packs()`. This function does not use the MRU, so we're good.
  - "builtin/index-pack.c" adds the indexed pack to the store in case it needs to perform consistency checks on its objects.
  - "http.c" adds the fetched pack to the store so that we can access its objects.

In all of these cases we actually want to access the contained objects. And luckily, reading these objects works as expected:

  1. We eventually end up in `do_oid_object_info_extended()`.
  2. Calling `find_pack_entry()` fails because the MRU list doesn't contain the newly added packfile.
  3. The callers don't pass `OBJECT_INFO_QUICK`, so we end up repreparing the object database. This will also cause us to reprepare the MRU list.
  4. We now retry reading the object via `find_pack_entry()`, and now we succeed because the MRU list got populated.

This logic feels quite fragile: we intentionally add the packfile to the store, but we then ultimately rely on repreparing the entire store only to make the packfile accessible. While we do the correct thing in `do_oid_object_info_extended()`, other sites that access the MRU may not know to reprepare. But besides being fragile it's also a waste of resources: repreparing the object database requires us to re-read the alternates file and discard any caches.

Refactor the code so that we unconditionally add packfiles to the MRU when adding them to a packfile store. This makes the logic less fragile and ensures that we don't have to reprepare the store to make the pack accessible.

Note that this does not allow us to drop `packfile_store_prepare_mru()` just yet: while the MRU list is already populated with all packs now, the order in which we add these packs is indeterministic for the most part. So by first calling `sort_pack()` on the other packfile list and then re-preparing the MRU list we inherit its sorting.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-30  packfile: move list of packs into the packfile store  (Patrick Steinhardt; 3 files, -60/+43)

Move the list of packs into the packfile store. This follows the same logic as in a previous commit, where we moved the most-recently-used list of packs as well.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-30  builtin/pack-objects: simplify logic to find kept or nonlocal objects  (Patrick Steinhardt; 1 file, -14/+14)

The function `has_sha1_pack_kept_or_nonlocal()` takes an object ID and then searches through packed objects to figure out whether the object exists in a kept or non-local pack. As a performance optimization we remember the packfile that contained a given object ID, so that the next call to the function checks that same packfile first.

The way this is written is rather hard to follow though, as the caching mechanism is intertwined with the loop that iterates through the packs. Consequently, we need to do some gymnastics to re-start the iteration if the cached pack does not contain the object.

Refactor this so that we check the cached packfile at the beginning. We don't have to re-verify whether the packfile meets the required properties, as we have already verified those when storing the pack in `last_found` in the first place. So all we need to do is to use `find_pack_entry_one()` to check whether the pack contains the object ID, and to skip the cached pack in the loop so that we don't search it twice.

Furthermore, stop using the `(void *)1` sentinel value and instead use a simple `NULL` pointer to indicate that we don't have a last-found pack yet.

This refactoring significantly simplifies the logic and makes it much easier to follow.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
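The resulting shape, sketched (helper names mirror the surrounding commit messages where possible; the kept/non-local predicate and the exact pack-iteration form are assumed stand-ins):

    static struct packed_git *last_found;  /* NULL: no last-found pack yet */

    static int has_object_kept_or_nonlocal(struct repository *r,
                                           const struct object_id *oid)
    {
            struct packed_git *p;

            /*
             * The cached pack already passed the kept/non-local checks
             * when it was stored, so only the membership test repeats.
             */
            if (last_found && find_pack_entry_one(oid, last_found))
                    return 1;

            repo_for_each_pack(r, p) {
                    if (p == last_found)
                            continue;       /* already checked above */
                    if (!pack_is_kept_or_nonlocal(p))
                            continue;
                    if (find_pack_entry_one(oid, p)) {
                            last_found = p;
                            return 1;
                    }
            }
            return 0;
    }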
2025-10-30  packfile: fix approximation of object counts  (Patrick Steinhardt; 1 file, -4/+4)

When approximating the number of objects in a repository we only take into account two data sources, the multi-pack index and the packfile indices, as both of these data structures allow us to easily figure out how many objects they contain.

But the way we currently approximate the number of objects is broken in the presence of a multi-pack index. This is due to two separate reasons:

  - We have recently introduced initial infrastructure for incremental multi-pack indices. Starting with that series, `num_objects` only counts the number of objects of a specific layer of the MIDX chain, so we do not take into account objects from parent layers. This issue is fixed by adding `num_objects_in_base`, which contains the sum of all objects in previous layers.
  - When using the multi-pack index we may count objects contained in packfiles twice: once via the multi-pack index, and then again via the packfile itself. This issue is fixed by skipping any packfiles that have an MIDX.

Overall, given that we _always_ count the packs, we can only end up overestimating the number of objects, and the overestimation is limited to a factor of two at most. The consequences of those issues are very limited though, as we only approximate object counts in a small number of cases:

  - When writing a commit-graph we use the approximate object count to display the upper limit of a progress display.
  - In `repo_find_unique_abbrev_r()` we use it to pick a lower limit for how many hex digits we want to abbreviate to. Given that we use powers of two here to derive the lower limit, we may end up with an abbreviated hash that is one digit longer than required.
  - In `estimate_repack_memory()` we may end up overestimating how much memory a repack needs to pack objects. Consequently, we may end up dropping some packfiles from a repack.

None of these are really game-changing, but it's nice to fix those issues regardless.

While at it, convert the code to use `repo_for_each_pack()`. Furthermore, use `odb_prepare_alternates()` instead of explicitly preparing the packfile store. We really only want to prepare the object database sources, and `get_multi_pack_index()` already knows to prepare the packfile store for us.

Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
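Sketched out, the corrected arithmetic looks something like this (`num_objects_in_base` and `repo_for_each_pack()` are named in the message above; the rest of the scaffolding is assumed):

    static size_t approximate_object_count(struct repository *r)
    {
            struct multi_pack_index *m = get_multi_pack_index(r);
            struct packed_git *p;
            size_t count = 0;

            /* Count the whole MIDX chain, not just the newest layer. */
            if (m)
                    count += m->num_objects + m->num_objects_in_base;

            repo_for_each_pack(r, p) {
                    if (p->multi_pack_index)
                            continue;  /* covered by the MIDX already */
                    count += p->num_objects;
            }
            return count;
    }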
2025-10-30  http: refactor subsystem to use `packfile_list`s  (Patrick Steinhardt; 6 files, -35/+40)

The dumb HTTP protocol directly fetches packfiles from the remote server and temporarily stores them in a list of packfiles. Those packfiles are not yet added to the repository's packfile store until we finalize the whole fetch.

Refactor the code to instead use a `struct packfile_list` to store those packs. This prepares us for a subsequent change where the `->next` pointer of `struct packed_git` will go away.

Note that this refactoring creates some temporary duplication of code, as we now have both `packfile_list_find_oid()` and `find_oid_pack()`. The latter function will be removed in a subsequent commit though.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-30  packfile: move the MRU list into the packfile store  (Patrick Steinhardt; 4 files, -18/+104)

Packfiles have two lists associated with them:

  - A list that keeps track of packfiles in the order that they were added to a packfile store.
  - A list that keeps track of packfiles in most-recently-used order, so that packfiles that are more likely to contain a specific object are ordered towards the front.

Both of these lists are hosted by `struct packed_git` itself, so to identify all packfiles in a repository you simply need to grab the first packfile and then iterate the `->next` pointers or the MRU list.

This pattern has the problem that all packfiles are part of the same list, regardless of whether or not they belong to the same object source. With the upcoming pluggable object database effort this needs to change: packfiles should be contained by a single object source, and reading an object from any such packfile should use that source to look up the object. Consequently, we need to break up the global lists of packfiles into per-object-source lists.

A first step towards this goal is to move those lists out of `struct packed_git` and into the packfile store. While the packfile store is currently sitting at the `struct object_database` level, the intent is to push it down one level into the `struct odb_source` in a subsequent patch series.

Introduce a new `struct packfile_list` that is used to manage lists of packfiles, and use it to store the list of most-recently-used packfiles in `struct packfile_store`. For now, the new list type is only used in a single spot, but we'll expand its usage in subsequent patches.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
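A sketch of what such a list type might look like (the commit message names `struct packfile_list`; the layout here is an assumption, with a doubly-linked list keeping MRU move-to-front cheap):

    struct packfile_list_entry {
            struct packed_git *pack;
            struct packfile_list_entry *next, *prev;
    };

    struct packfile_list {
            struct packfile_list_entry *head, *tail;
    };

    struct packfile_store {
            /* The MRU order now lives here, not in packed_git. */
            struct packfile_list mru;
    };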
2025-10-30  packfile: use a `strmap` to store packs by name  (Patrick Steinhardt; 2 files, -22/+6)

To allow fast lookups of a packfile by name we use a hashmap that has the packfile name as key and the pack itself as value. But while this is the perfect use case for a `strmap`, we instead use `struct hashmap` and store the hashmap entry in the packfile itself.

Simplify the code by using a `strmap` instead.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
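A sketch of the simplified lookup using git's strmap API from strmap.h (the surrounding store layout is illustrative):

    #include "strmap.h"

    struct packfile_store {
            struct strmap packs_by_name;  /* pack name -> struct packed_git * */
    };

    static void store_init(struct packfile_store *store)
    {
            strmap_init(&store->packs_by_name);
    }

    static void store_add_pack(struct packfile_store *store,
                               const char *name, struct packed_git *pack)
    {
            /* strmap owns a copy of the key; no hashmap entry embedded
             * in packed_git itself is needed anymore. */
            strmap_put(&store->packs_by_name, name, pack);
    }

    static struct packed_git *store_find_pack(struct packfile_store *store,
                                              const char *name)
    {
            return strmap_get(&store->packs_by_name, name);
    }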
2025-10-30  gpg-interface: mark a string for translation  (Christian Couder; 1 file, -1/+1)

Previous commits have marked a number of error and warning messages in "builtin/fast-export.c" and "builtin/fast-import.c" for translation. As "gpg-interface.c" code is used by the fast-export and fast-import code, we should make sure that error and warning messages are also all marked for translation in "gpg-interface.c".

To ensure that, let's mark for translation an error message in a die() call. With this, all the error and warning messages emitted by fast-export and fast-import can be properly translated.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-30  fast-import: mark strings for translation  (Christian Couder; 2 files, -150/+150)

Some error or warning messages in "builtin/fast-import.c" are marked for translation, but many are not. To be more consistent and provide a better experience to people using a translated version, let's mark all the remaining error or warning messages for translation.

While at it, let's make the following small changes:

  - replace "GIT" or "git" in a few error messages with just "Git",
  - replace "Expected from command, got %s" with "expected 'from' command, got '%s'", which makes it clearer that "from" is a command and should not be translated,
  - downcase error and warning messages that start with an uppercase letter,
  - fix test cases in "t9300-fast-import.sh" that broke because an error or warning message was downcased,
  - split error and warning messages that are too long,
  - adjust the indentation of some arguments of the error functions.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-30  fast-export: mark strings for translation  (Christian Couder; 1 file, -39/+40)

Some error or warning messages in "builtin/fast-export.c" are marked for translation, but many are not. To be more consistent and provide a better experience to people using a translated version, let's mark all the remaining error or warning messages for translation.

While at it:

  - improve how some arguments to some error functions are indented,
  - remove "Error:" at the start of an error message,
  - downcase error and warning messages that start with an uppercase letter.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-30  gpg-interface: use left shift to define GPG_VERIFY_*  (Christian Couder; 1 file, -3/+3)

In "gpg-interface.h", the definitions of the GPG_VERIFY_* boolean flags currently use 1, 2 and 4, while we often prefer the bitwise left shift operator, `<<`, for that purpose to make it clearer that they are boolean. Let's use the left shift operator here too.

Let's also fix an indentation issue with "4" while at it.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-30  gpg-interface: simplify ssh fingerprint parsing  (Christian Couder; 1 file, -1/+1)

In "gpg-interface.c", the 'parse_ssh_output()' function takes a 'struct signature_check *sigc' argument and populates many members of this 'sigc' using information parsed from 'sigc->output', which contains the output of an `ssh-keygen -Y ...` command that was used to verify an SSH signature.

When it populates 'sigc->fingerprint' though, it uses `xstrdup(strstr(line, "key ") + 4)`, while `strstr(line, "key ")` has already been computed a few lines above and is already available in the `key` variable. Let's simplify this.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>