git/object-name.c, branch v2.46.2

Merge branch 'ps/leakfixes-more'

2024-07-08T21:53:10Z

More memory leaks have been plugged. * ps/leakfixes-more: (29 commits) builtin/blame: fix leaking ignore revs files builtin/blame: fix leaking prefixed paths blame: fix leaking data for blame scoreboards line-range: plug leaking find functions merge: fix leaking merge bases builtin/merge: fix leaking `struct cmdnames` in `get_strategy()` sequencer: fix memory leaks in `make_script_with_merges()` builtin/clone: plug leaking HEAD ref in `wanted_peer_refs()` apply: fix leaking string in `match_fragment()` sequencer: fix leaking string buffer in `commit_staged_changes()` commit: fix leaking parents when calling `commit_tree_extended()` config: fix leaking "core.notesref" variable rerere: fix various trivial leaks builtin/stash: fix leak in `show_stash()` revision: free diff options builtin/log: fix leaking commit list in git-cherry(1) merge-recursive: fix memory leak when finalizing merge builtin/merge-recursive: fix leaking object ID bases builtin/difftool: plug memory leaks in `run_dir_diff()` object-name: free leaking object contexts ...

Merge branch 'ps/use-the-repository'

2024-07-02T16:59:00Z

A CPP macro USE_THE_REPOSITORY_VARIABLE is introduced to help transition the codebase to rely less on the availability of the singleton the_repository instance. * ps/use-the-repository: hex: guard declarations with `USE_THE_REPOSITORY_VARIABLE` t/helper: remove dependency on `the_repository` in "proc-receive" t/helper: fix segfault in "oid-array" command without repository t/helper: use correct object hash in partial-clone helper compat/fsmonitor: fix socket path in networked SHA256 repos replace-object: use hash algorithm from passed-in repository protocol-caps: use hash algorithm from passed-in repository oidset: pass hash algorithm when parsing file http-fetch: don't crash when parsing packfile without a repo hash-ll: merge with "hash.h" refs: avoid include cycle with "repository.h" global: introduce `USE_THE_REPOSITORY_VARIABLE` macro hash: require hash algorithm in `empty_tree_oid_hex()` hash: require hash algorithm in `is_empty_{blob,tree}_oid()` hash: make `is_null_oid()` independent of `the_repository` hash: convert `oidcmp()` and `oideq()` to compare whole hash global: ensure that object IDs are always padded hash: require hash algorithm in `oidread()` and `oidclr()` hash: require hash algorithm in `hasheq()`, `hashcmp()` and `hashclr()` hash: drop (mostly) unused `is_empty_{blob,tree}_sha1()` functions

global: introduce `USE_THE_REPOSITORY_VARIABLE` macro

2024-06-14T17:26:33Z

Use of the `the_repository` variable is deprecated nowadays, and we slowly but steadily convert the codebase to not use it anymore. Instead, callers should be passing down the repository to work on via parameters. It is hard though to prove that a given code unit does not use this variable anymore. The most trivial case, merely demonstrating that there is no direct use of `the_repository`, is already a bit of a pain during code reviews as the reviewer needs to manually verify claims made by the patch author. The bigger problem though is that we have many interfaces that implicitly rely on `the_repository`. Introduce a new `USE_THE_REPOSITORY_VARIABLE` macro that allows code units to opt into usage of `the_repository`. The intent of this macro is to demonstrate that a certain code unit does not use this variable anymore, and to keep it from new dependencies on it in future changes, be it explicit or implicit For now, the macro only guards `the_repository` itself as well as `the_hash_algo`. There are many more known interfaces where we have an implicit dependency on `the_repository`, but those are not guarded at the current point in time. Over time though, we should start to add guards as required (or even better, just remove them). Define the macro as required in our code units. As expected, most of our code still relies on the global variable. Nearly all of our builtins rely on the variable as there is no way yet to pass `the_repository` to their entry point. For now, declare the macro in "biultin.h" to keep the required changes at least a little bit more contained. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano

object-name: don't try to abbreviate to lengths greater than hexsz

2024-06-12T19:57:18Z

When given a length that equals the current hash algorithm's hex size, then `repo_find_unique_abbrev_r()` exits early without trying to find an abbreviation. This is only sensible because there is nothing to abbreviate in the first place, so searching through objects to find a unique prefix would be a waste of compute. What we don't handle though is the case where the user passes a length greater than the hash length. This is fine in practice as we still compute the correct result. But at the very least, this is a waste of resources as we try to abbreviate a value that cannot be abbreviated, which causes us to hit the object database. Start to explicitly handle values larger than hexsz to avoid this performance penalty, which leads to a measureable speedup. The following benchmark has been executed in linux.git: Benchmark 1: git -c core.abbrev=9000 log --abbrev-commit (revision = HEAD~) Time (mean ± σ): 12.812 s ± 0.040 s [User: 12.225 s, System: 0.554 s] Range (min … max): 12.723 s … 12.857 s 10 runs Benchmark 2: git -c core.abbrev=9000 log --abbrev-commit (revision = HEAD) Time (mean ± σ): 11.095 s ± 0.029 s [User: 10.546 s, System: 0.521 s] Range (min … max): 11.037 s … 11.122 s 10 runs Summary git -c core.abbrev=9000 log --abbrev-commit HEAD (revision = HEAD) ran 1.15 ± 0.00 times faster than git -c core.abbrev=9000 log --abbrev-commit HEAD (revision = HEAD~) Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano

object-name: free leaking object contexts

2024-06-11T20:15:05Z

While it is documented in `struct object_context::path` that this variable needs to be released by the caller, this fact is rather easy to miss given that we do not ever provide a function to release the object context. And of course, while some callers dutifully release the path, many others don't. Introduce a new `object_context_release()` function that releases the path. Convert callsites that used to free the path to use that new function and add missing calls to callsites that were leaking memory. Refactor those callsites as required to have a single return path, only. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano

Merge branch 'eb/hash-transition'

2024-03-28T21:13:50Z

Work to support a repository that work with both SHA-1 and SHA-256 hash algorithms has started. * eb/hash-transition: (30 commits) t1016-compatObjectFormat: add tests to verify the conversion between objects t1006: test oid compatibility with cat-file t1006: rename sha1 to oid test-lib: compute the compatibility hash so tests may use it builtin/ls-tree: let the oid determine the output algorithm object-file: handle compat objects in check_object_signature tree-walk: init_tree_desc take an oid to get the hash algorithm builtin/cat-file: let the oid determine the output algorithm rev-parse: add an --output-object-format parameter repository: implement extensions.compatObjectFormat object-file: update object_info_extended to reencode objects object-file-convert: convert commits that embed signed tags object-file-convert: convert commit objects when writing object-file-convert: don't leak when converting tag objects object-file-convert: convert tag objects when writing object-file-convert: add a function to convert trees between algorithms object: factor out parse_mode out of fast-import and tree-walk into in object.h cache: add a function to read an OID of a specific algorithm tag: sign both hashes commit: export add_header_signature to support handling signatures on tags ...

Merge branch 'js/merge-base-with-missing-commit'

2024-03-11T21:12:30Z

Make sure failure return from merge_bases_many() is properly caught. * js/merge-base-with-missing-commit: merge-ort/merge-recursive: do report errors in `merge_submodule()` merge-recursive: prepare for `merge_submodule()` to report errors commit-reach(repo_get_merge_bases_many_dirty): pass on errors commit-reach(repo_get_merge_bases_many): pass on "missing commits" errors commit-reach(get_octopus_merge_bases): pass on "missing commits" errors commit-reach(repo_get_merge_bases): pass on "missing commits" errors commit-reach(get_merge_bases_many_0): pass on "missing commits" errors commit-reach(merge_bases_many): pass on "missing commits" errors commit-reach(paint_down_to_common): start reporting errors commit-reach(paint_down_to_common): prepare for handling shallow commits commit-reach(repo_in_merge_bases_many): report missing commits commit-reach(repo_in_merge_bases_many): optionally expect missing commits commit-reach(paint_down_to_common): plug two memory leaks

commit-reach(repo_get_merge_bases): pass on "missing commits" errors

2024-02-29T16:06:01Z

The `merge_bases_many()` function was just taught to indicate parsing errors, and now the `repo_get_merge_bases()` function (which is also surfaced via the `repo_get_merge_bases()` macro) is aware of that, too. Naturally, there are a lot of callers that need to be adjusted now, too. Next step: adjust the callers of `get_octopus_merge_bases()`. Signed-off-by: Johannes Schindelin Signed-off-by: Junio C Hamano

get_oid_basic(): special-case ref@{n} for oldest reflog entry

2024-02-26T18:05:32Z

The goal of 6436a20284 (refs: allow @{n} to work with n-sized reflog, 2021-01-07) was that if we have "n" entries in a reflog, we should still be able to resolve ref@{n} by looking at the "old" value of the oldest entry. Commit 6436a20284 tried to put the logic into read_ref_at() by shifting its idea of "n" by one. But we reverted that in the previous commit, since it led to bugs in other callers which cared about the details of the reflog entry we found. Instead, let's put the special case into the caller that resolves @{n}, as it cares only about the oid. read_ref_at() is even kind enough to return the "old" value from the final reflog; it just returns "1" to signal to us that we ran off the end of the reflog. But we can notice in the caller that we read just enough records for that "old" value to be the one we're looking for, and use it. Note that read_ref_at() could notice this case, too, and just return 0. But we don't want to do that, because the caller must be made aware that we only found the oid, not an actual reflog entry (and the call sites in show-branch do care about this). There is one complication, though. When read_ref_at() hits a truncated reflog, it will return the "old" value of the oldest entry only if it is not the null oid. Otherwise, it actually returns the "new" value from that entry! This bit of fudging is due to d1a4489a56 (avoid null SHA1 in oldest reflog, 2008-07-08), where asking for "ref@{20.years.ago}" for a ref created recently will produce the initial value as a convenience (even though technically it did not exist 20 years ago). But this convenience is only useful for time-based cutoffs. For count-based cutoffs, get_oid_basic() has always simply complained about going too far back: $ git rev-parse HEAD@{20} fatal: log for 'HEAD' only has 16 entries and we should continue to do so, rather than returning a nonsense value (there's even a test in t1508 already which covers this). So let's have the d1a4489a56 code kick in only when doing timestamp-based cutoffs. Signed-off-by: Jeff King Signed-off-by: Junio C Hamano

treewide: remove unnecessary includes in source files

2023-12-26T20:04:31Z

Each of these were checked with gcc -E -I. ${SOURCE_FILE} | grep ${HEADER_FILE} to ensure that removing the direct inclusion of the header actually resulted in that header no longer being included at all (i.e. that no other header pulled it in transitively). ...except for a few cases where we verified that although the header was brought in transitively, nothing from it was directly used in that source file. These cases were: * builtin/credential-cache.c * builtin/pull.c * builtin/send-pack.c Signed-off-by: Elijah Newren Signed-off-by: Junio C Hamano