aboutsummaryrefslogtreecommitdiffstats
path: root/refs/packed-backend.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2025-08-04Merge branch 'ps/config-wo-the-repository'Junio C Hamano1-1/+1
The config API had a set of convenience wrapper functions that implicitly use the_repository instance; they have been removed and inlined at the calling sites. * ps/config-wo-the-repository: (21 commits) config: fix sign comparison warnings config: move Git config parsing into "environment.c" config: remove unused `the_repository` wrappers config: drop `git_config_set_multivar()` wrapper config: drop `git_config_get_multivar_gently()` wrapper config: drop `git_config_set_multivar_in_file_gently()` wrapper config: drop `git_config_set_in_file_gently()` wrapper config: drop `git_config_set()` wrapper config: drop `git_config_set_gently()` wrapper config: drop `git_config_set_in_file()` wrapper config: drop `git_config_get_bool()` wrapper config: drop `git_config_get_ulong()` wrapper config: drop `git_config_get_int()` wrapper config: drop `git_config_get_string()` wrapper config: drop `git_config_get_string()` wrapper config: drop `git_config_get_string_multi()` wrapper config: drop `git_config_get_value()` wrapper config: drop `git_config_get_value()` wrapper config: drop `git_config_get()` wrapper config: drop `git_config_clear()` wrapper ...
2025-07-23config: drop `git_config_get_int()` wrapperPatrick Steinhardt1-1/+1
In 036876a1067 (config: hide functions using `the_repository` by default, 2024-08-13) we have moved around a bunch of functions in the config subsystem that depend on `the_repository`. Those function have been converted into mere wrappers around their equivalent function that takes in a repository as parameter, and the intent was that we'll eventually remove those wrappers to make the dependency on the global repository variable explicit at the callsite. Follow through with that intent and remove `git_config_get_int()`. All callsites are adjusted so that they use `repo_config_get_int(the_repository, ...)` instead. While some callsites might already have a repository available, this mechanical conversion is the exact same as the current situation and thus cannot cause any regression. Those sites should eventually be cleaned up in a later patch series. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-15refs: selectively set prefix in the seek functionsKarthik Nayak1-6/+11
The ref iterator exposes a `ref_iterator_seek()` function. The name suggests that this would seek the iterator to a specific reference in some ways similar to how `fseek()` works for the filesystem. However, the function actually sets the prefix for refs iteration. So further iteration would only yield references which match the particular prefix. This is a bit confusing. Let's add a 'flags' field to the function, which when set with the 'REF_ITERATOR_SEEK_SET_PREFIX' flag, will set the prefix for the iteration in-line with the existing behavior. Otherwise, the reference backends will simply seek to the specified reference and clears any previously set prefix. This allows users to start iteration from a specific reference. In the packed and reftable backend, since references are available in a sorted list, the changes are simply setting the prefix if needed. The changes on the files-backend are a little more involved, since the files backend uses the 'ref-cache' mechanism. We move out the existing logic within `cache_ref_iterator_seek()` to `cache_ref_iterator_set_prefix()` which is called when the 'REF_ITERATOR_SEEK_SET_PREFIX' flag is set. We then parse the provided seek string and set the required levels and their indexes to ensure that seeking is possible. Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-14packed-backend: mmap large "packed-refs" file during fsckshejialuo1-12/+7
During fsck, we use "strbuf_read" to read the content of "packed-refs" without using mmap mechanism. This is a bad practice which would consume more memory than using mmap mechanism. Besides, as all code paths in "packed-backend.c" use this way, we should make "fsck" align with the current codebase. As we have introduced the helper function "allocate_snapshot_buffer", we can simply use this function to use mmap mechanism. Suggested-by: Jeff King <peff@peff.net> Suggested-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-14packed-backend: extract snapshot allocation in `load_contents`shejialuo1-22/+31
"load_contents" would choose which way to load the content of the "packed-refs". However, we cannot directly use this function when checking the consistency due to we don't want to open the file. And we also need to reuse the logic to avoid causing repetition. Let's create a new helper function "allocate_snapshot_buffer" to extract the snapshot allocation logic in "load_contents" and update the "load_contents" to align with the behavior. Suggested-by: Jeff King <peff@peff.net> Suggested-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-14packed-backend: fsck should warn when "packed-refs" file is emptyshejialuo1-0/+9
We assume the "packed-refs" won't be empty and instead has at least one line in it (even when there are no refs packed, there is the file header line). Because there is no terminating LF in the empty file, we will report "packedRefEntryNotTerminated(ERROR)" to the user. However, the runtime code paths would accept an empty "packed-refs" file, for example, "create_snapshot" would simply return the "snapshot" without checking the content of "packed-refs". So, we should skip checking the content of "packed-refs" when it is empty during fsck. After 694b7a1999 (repack_without_ref(): write peeled refs in the rewritten file, 2013-04-22), we would always write a header into the "packed-refs" file. So, versions of Git that are not too ancient never write such an empty "packed-refs" file. As an empty file often indicates a sign of a filesystem-level issue, the way we want to resolve this inconsistency is not make everybody totally silent but notice and report the anomaly. Let's create a "FSCK_INFO" message id "EMPTY_PACKED_REFS_FILE" to report to the users that "packed-refs" is empty. Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-17Merge branch 'ps/refname-avail-check-optim'Junio C Hamano1-2/+2
Incorrect sorting of refs with bytes with high-bit set on platforms with signed char led to a BUG, which has been corrected. * ps/refname-avail-check-optim: refs/packed: fix BUG when seeking refs with UTF-8 characters
2025-04-16Merge branch 'kn/non-transactional-batch-updates'Junio C Hamano1-32/+37
Updating multiple references have only been possible in all-or-none fashion with transactions, but it can be more efficient to batch multiple updates even when some of them are allowed to fail in a best-effort manner. A new "best effort batches of updates" mode has been introduced. * kn/non-transactional-batch-updates: update-ref: add --batch-updates flag for stdin mode refs: support rejection in batch updates during F/D checks refs: implement batch reference update support refs: introduce enum-based transaction error types refs/reftable: extract code from the transaction preparation refs/files: remove duplicate duplicates check refs: move duplicate refname update check to generic layer refs/files: remove redundant check in split_symref_update()
2025-04-09refs/packed: fix BUG when seeking refs with UTF-8 charactersPatrick Steinhardt1-2/+2
It was reported that using git-pull(1) in a repository whose remote contains branches with emojis leads to the following bug: $ git pull remote: Enumerating objects: 161255, done. remote: Counting objects: 100% (55884/55884), done. remote: Compressing objects: 100% (5518/5518), done. remote: Total 161255 (delta 54253), reused 50509 (delta 50364), pack-reused 105371 (from 4) Receiving objects: 100% (161255/161255), 309.90 MiB | 16.87 MiB/s, done. Resolving deltas: 100% (118048/118048), completed with 13416 local objects. From github.com:github/github 97ab7ae3f3745..8fb2f9fa180ed master -> origin/master [...snip many screenfuls of updates to origin remotes...] BUG: refs/packed-backend.c:984: packed-refs backend yielded reference preceding its prefix error: fetch died of signal 6 This issue bisects to 22600c04529 (refs/iterator: implement seeking for packed-ref iterators, 2025-03-12) where we have implemented seeking for the packed-ref iterator. As part of that change we introduced a check that verifies that the iterator only returns refnames bigger than the prefix. In theory, this check should always hold: when a prefix is set we know that we would've seeked that prefix first, so we should never see a reference sorting before that prefix. But in practice the check itself is misbehaving when handling unicode characters. The particular issue triggered with a branch that got the "shaved ice" unicode character in its name, which is composed of the bytes "0xEE 0x90 0xBF". The bug triggers when we compare the refname "refs/heads/<shaved-ice>" to something like "refs/heads/z", and it specifically hits when comparing the first byte, "0xEE". The root cause is that the most-significant bit of 0xEE is set. The `refname` and `prefix` pointers that we use to compare bytes with one another are both pointers to signed characters. As such, when we dereference the 0xEE byte the result is a _negative_ value, and this value will of course compare smaller than "z". We can see that this issue is avoided in `cmp_packed_refname()`, where we explicitly cast each byte to its unsigned form. Fix the bug by doing the same in `packed_ref_iterator_advance()`. Reported-by: Elijah Newren <newren@gmail.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-08refs: implement batch reference update supportKarthik Nayak1-2/+25
Git supports making reference updates with or without transactions. Updates with transactions are generally better optimized. But transactions are all or nothing. This means, if a user wants to batch updates to take advantage of the optimizations without the hard requirement that all updates must succeed, there is no way currently to do so. Particularly with the reftable backend where batching multiple reference updates is more efficient than performing them sequentially. Introduce batched update support with a new flag, 'REF_TRANSACTION_ALLOW_FAILURE'. Batched updates while different from transactions, use the transaction infrastructure under the hood. When enabled, this flag allows individual reference updates that would typically cause the entire transaction to fail due to non-system-related errors to be marked as rejected while permitting other updates to proceed. System errors referred by 'REF_TRANSACTION_ERROR_GENERIC' continue to result in the entire transaction failing. This approach enhances flexibility while preserving transactional integrity where necessary. The implementation introduces several key components: - Add 'rejection_err' field to struct `ref_update` to track failed updates with failure reason. - Add a new struct `ref_transaction_rejections` and a field within `ref_transaction` to this struct to allow quick iteration over rejected updates. - Modify reference backends (files, packed, reftable) to handle partial transactions by using `ref_transaction_set_rejected()` instead of failing the entire transaction when `REF_TRANSACTION_ALLOW_FAILURE` is set. - Add `ref_transaction_for_each_rejected_update()` to let callers examine which updates were rejected and why. This foundational change enables batched update support throughout the reference subsystem. A following commit will expose this capability to users by adding a `--batch-updates` flag to 'git-update-ref(1)', providing both a user-facing feature and a testable implementation. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-08refs: introduce enum-based transaction error typesKarthik Nayak1-9/+14
Replace preprocessor-defined transaction errors with a strongly-typed enum `ref_transaction_error`. This change: - Improves type safety and function signature clarity. - Makes error handling more explicit and discoverable. - Maintains existing error cases, while adding new error cases for common scenarios. This refactoring paves the way for more comprehensive error handling which we will utilize in the upcoming commits to add batch reference update support. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-08refs: move duplicate refname update check to generic layerKarthik Nayak1-24/+1
Move the tracking of refnames in `affected_refnames` from individual backends into the generic layer in 'refs.c'. This centralizes the duplicate refname detection that was previously handled separately by each backend. Make some changes to accommodate this move: - Add a `string_list` field `refnames` to `ref_transaction` to contain all the references in a transaction. This field is updated whenever a new update is added via `ref_transaction_add_update`, so manual additions in reference backends are dropped. - Modify the backends to use this field internally as needed. The backends need to check if an update for refname already exists when splitting symrefs or adding an update for 'HEAD'. - In the reftable backend, within `reftable_be_transaction_prepare()`, move the `string_list_has_string()` check above `ref_transaction_add_update()`. Since `ref_transaction_add_update()` automatically adds the refname to `transaction->refnames`, performing the check after will always return true, so we perform the check before adding the update. This helps reduce duplication of functionality between the backends and makes it easier to make changes in a more centralized manner. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-29Merge branch 'ps/refname-avail-check-optim'Junio C Hamano1-37/+55
The code paths to check whether a refname X is available (by seeing if another ref X/Y exists, etc.) have been optimized. * ps/refname-avail-check-optim: refs: reuse iterators when determining refname availability refs/iterator: implement seeking for files iterators refs/iterator: implement seeking for packed-ref iterators refs/iterator: implement seeking for ref-cache iterators refs/iterator: implement seeking for reftable iterators refs/iterator: implement seeking for merged iterators refs/iterator: provide infrastructure to re-seek iterators refs/iterator: separate lifecycle from iteration refs: stop re-verifying common prefixes for availability refs/files: batch refname availability checks for initial transactions refs/files: batch refname availability checks for normal transactions refs/reftable: batch refname availability checks refs: introduce function to batch refname availability checks builtin/update-ref: skip ambiguity checks when parsing object IDs object-name: allow skipping ambiguity checks in `get_oid()` family object-name: introduce `repo_get_oid_with_flags()`
2025-03-12refs/iterator: implement seeking for packed-ref iteratorsPatrick Steinhardt1-22/+43
Implement seeking of `packed-ref` iterators. The implementation is again straight forward, except that we cannot continue to use the prefix iterator as we would otherwise not be able to reseek the iterator anymore in case one first asks for an empty and then for a non-empty prefix. Instead, we open-code the logic to in `advance()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-12refs/iterator: separate lifecycle from iterationPatrick Steinhardt1-15/+12
The ref and reflog iterators have their lifecycle attached to iteration: once the iterator reaches its end, it is automatically released and the caller doesn't have to care about that anymore. When the iterator should be released before it has been exhausted, callers must explicitly abort the iterator via `ref_iterator_abort()`. This lifecycle is somewhat unusual in the Git codebase and creates two problems: - Callsites need to be very careful about when exactly they call `ref_iterator_abort()`, as calling the function is only valid when the iterator itself still is. This leads to somewhat awkward calling patterns in some situations. - It is impossible to reuse iterators and re-seek them to a different prefix. This feature isn't supported by any iterator implementation except for the reftable iterators anyway, but if it was implemented it would allow us to optimize cases where we need to search for specific references repeatedly by reusing internal state. Detangle the lifecycle from iteration so that we don't deallocate the iterator anymore once it is exhausted. Instead, callers are now expected to always call a newly introduce `ref_iterator_free()` function that deallocates the iterator and its internal state. Note that the `dir_iterator` is somewhat special because it does not implement the `ref_iterator` interface, but is only used to implement other iterators. Consequently, we have to provide `dir_iterator_free()` instead of `dir_iterator_release()` as the allocated structure itself is managed by the `dir_iterator` interfaces, as well, and not freed by `ref_iterator_free()` like in all the other cases. While at it, drop the return value of `ref_iterator_abort()`, which wasn't really required by any of the iterator implementations anyway. Furthermore, stop calling `base_ref_iterator_free()` in any of the backends, but instead call it in `ref_iterator_free()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-27packed-backend: check whether the "packed-refs" is sortedshejialuo1-16/+100
When there is a "sorted" trait in the header of the "packed-refs" file, it means that each entry is sorted increasingly by comparing the refname. We should add checks to verify whether the "packed-refs" is sorted in this case. Update the "packed_fsck_ref_header" to know whether there is a "sorted" trail in the header. It may seem that we could record all refnames during the parsing process and then compare later. However, this is not a good design due to the following reasons: 1. Because we need to store the state across the whole checking lifetime, we would consume a lot of memory if there are many entries in the "packed-refs" file. 2. We cannot reuse the existing compare function "cmp_packed_ref_records" which cause repetition. Because "cmp_packed_ref_records" needs an extra parameter "struct snaphost", extract the common part into a new function "cmp_packed_ref_records" to reuse this function to compare. Then, create a new function "packed_fsck_ref_sorted" to parse the file again and user the new fsck message "packedRefUnsorted(ERROR)" to report to the user if the file is not sorted. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-27packed-backend: add "packed-refs" entry consistency checkshejialuo1-1/+121
"packed-backend.c::next_record" will parse the ref entry to check the consistency. This function has already checked the following things: 1. Parse the main line of the ref entry to inspect whether the oid is not correct. Then, check whether the next character is oid. Then check the refname. 2. If the next line starts with '^', it would continue to parse the peeled oid and check whether the last character is '\n'. As we decide to implement the ref consistency check for "packed-refs", let's port these two checks and update the test to exercise the code. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-27packed-backend: check whether the refname contains NUL charactersshejialuo1-0/+18
"packed-backend.c::next_record" will use "check_refname_format" to check the consistency of the refname. If it is not OK, the program will die. However, it is reported in [1], we cannot catch some corruption. But we already have the code path and we must miss out something. We use the following code to get the refname: strbuf_add(&iter->refname_buf, p, eol - p); iter->base.refname = iter->refname_buf.buf In the above code, `p` is the start pointer of the refname and `eol` is the next newline pointer. We calculate the length of the refname by subtracting the two pointers. Then we add the memory range between `p` and `eol` to get the refname. However, if there are some NUL characters in the memory range between `p` and `eol`, we will see the refname as a valid ref name as long as the memory range between `p` and first occurred NUL character is valid. In order to catch above corruption, create a new function "refname_contains_nul" by searching the first NUL character. If it is not at the end of the string, there must be some NUL characters in the refname. Use this function in "next_record" function to die the program if "refname_contains_nul" returns true. [1] https://lore.kernel.org/git/6cfee0e4-3285-4f18-91ff-d097da9de737@rd10.de/ Reported-by: R. Diez <rdiez-temp3@rd10.de> Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-27packed-backend: add "packed-refs" header consistency checkshejialuo1-0/+73
In "packed-backend.c::create_snapshot", if there is a header (the line which starts with '#'), we will check whether the line starts with "# pack-refs with: ". However, we need to consider other situations and discuss whether we need to add checks. 1. If the header does not exist, we should not report an error to the user. This is because in older Git version, we never write header in the "packed-refs" file. Also, we do allow no header in "packed-refs" in runtime. 2. If the header content does not start with "# packed-ref with: ", we should report an error just like what "create_snapshot" does. So, create a new fsck message "badPackedRefHeader(ERROR)" for this. 3. If the header content is not the same as the constant string "PACKED_REFS_HEADER". This is expected because we make it extensible intentionally and runtime "create_snapshot" won't complain about unknown traits. In order to align with the runtime behavior. There is no need to report. As we have analyzed, we only need to check the case 2 in the above. In order to do this, use "open_nofollow" function to get the file descriptor and then read the "packed-refs" file via "strbuf_read". Like what "create_snapshot" and other functions do, we could split the line by finding the next newline in the buffer. When we cannot find a newline, we could report an error. So, create a function "packed_fsck_ref_next_line" to find the next newline and if there is no such newline, use "packedRefEntryNotTerminated(ERROR)" to report an error to the user. Then, parse the first line to apply the checks. Update the test to exercise the code. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-27packed-backend: check if header starts with "# pack-refs with: "shejialuo1-1/+1
We always write a space after "# pack-refs with:" but we don't align with this rule in the "create_snapshot" method where we would check whether header starts with "# pack-refs with:". It might seem that we should undoubtedly tighten this rule, however, we don't have any technical documentation about this and there is a possibility that we would break the compatibility for other third-party libraries. By investigating influential third-party libraries, we could conclude how these libraries handle the header of "packed-refs" file: 1. libgit2 is fine and always writes the space. It also expects the whitespace to exist. 2. JGit does not expect th header to have a trailing space, but expects the "peeled" capability to have a leading space, which is mostly equivalent because that capability is typically the first one we write. It always writes the space. 3. gitoxide expects the space t exist and writes it. 4. go-git doesn't create the header by default. As many third-party libraries expect a single space after "# pack-refs with:", if we forget to write the space after the colon, "create_snapshot" won't catch this. And we would break other re-implementations. So, we'd better tighten the rule by checking whether the header starts with "# pack-refs with: ". Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-27packed-backend: check whether the "packed-refs" is regular fileshejialuo1-4/+48
Although "git-fsck(1)" and "packed-backend.c" will check some consistency and correctness of "packed-refs" file, they never check the filetype of the "packed-refs". Let's verify that the "packed-refs" has the expected filetype, confirming it is created by "git pack-refs" command. We could use "open_nofollow" wrapper to open the raw "packed-refs" file. If the returned "fd" value is less than 0, we could check whether the "errno" is "ELOOP" to report an error to the user. And then we use "fstat" to check whether the "packed-refs" file is a regular file. Reuse "FSCK_MSG_BAD_REF_FILETYPE" fsck message id to report the error to the user if "packed-refs" is not a regular file. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-06global: mark code units that generate warnings with `-Wsign-compare`Patrick Steinhardt1-0/+1
Mark code units that generate warnings with `-Wsign-compare`. This allows for a structured approach to get rid of all such warnings over time in a way that can be easily measured. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-04Merge branch 'sj/ref-contents-check'Junio C Hamano1-1/+7
"git fsck" learned to issue warnings on "curiously formatted" ref contents that have always been taken valid but something Git wouldn't have written itself (e.g., missing terminating end-of-line after the full object name). * sj/ref-contents-check: ref: add symlink ref content check for files backend ref: check whether the target of the symref is a ref ref: add basic symref content check for files backend ref: add more strict checks for regular refs ref: port git-fsck(1) regular refs check for files backend ref: support multiple worktrees check for refs ref: initialize ref name outside of check functions ref: check the full refname instead of basename ref: initialize "fsck_ref_report" with zero
2024-11-21ref: support multiple worktrees check for refsshejialuo1-1/+7
We have already set up the infrastructure to check the consistency for refs, but we do not support multiple worktrees. However, "git-fsck(1)" will check the refs of worktrees. As we decide to get feature parity with "git-fsck(1)", we need to set up support for multiple worktrees. Because each worktree has its own specific refs, instead of just showing the users "refs/worktree/foo", we need to display the full name such as "worktrees/<id>/refs/worktree/foo". So we should know the id of the worktree to get the full name. Add a new parameter "struct worktree *" for "refs-internal.h::fsck_fn". Then change the related functions to follow this new interface. The "packed-refs" only exists in the main worktree, so we should only check "packed-refs" in the main worktree. Use "is_main_worktree" method to skip checking "packed-refs" in "packed_fsck" function. Then, enhance the "files-backend.c::files_fsck_refs_dir" function to add "worktree/<id>/" prefix when we are not in the main worktree. Last, add a new test to check the refname when there are multiple worktrees to exercise the code. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21refs: introduce "initial" transaction flagPatrick Steinhardt1-8/+0
There are two different ways to commit a transaction: - `ref_transaction_commit()` can be used to commit a regular transaction and is what almost every caller wants. - `initial_ref_transaction_commit()` can be used when it is known that the ref store that the transaction is committed for is empty and when there are no concurrent processes. This is used when cloning a new repository. Implementing this via two separate functions has a couple of downsides. First, every reference backend needs to implement a separate callback even in the case where they don't special-case the initial transaction. Second, backends are basically forced to reimplement the whole logic for how to commit the transaction like the "files" backend does, even though backends may wish to only tweak certain behaviour of a "normal" commit. Third, it is awkward that callers must never prepare the transaction as this is somewhat different than how a transaction typically works. Refactor the code such that we instead mark initial transactions via a separate flag when starting the transaction. This addresses all of the mentioned painpoints, where the most important part is that it will allow backends to have way more leeway in how exactly they want to handle the initial transaction. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-12Merge branch 'ps/pack-refs-auto-heuristics'Junio C Hamano1-0/+18
"git pack-refs --auto" for the files backend was too aggressive, which has been a bit tamed. * ps/pack-refs-auto-heuristics: refs/files: use heuristic to decide whether to repack with `--auto` t0601: merge tests for auto-packing of refs wrapper: introduce `log2u()`
2024-09-04refs/files: use heuristic to decide whether to repack with `--auto`Patrick Steinhardt1-0/+18
The `--auto` flag for git-pack-refs(1) allows the ref backend to decide whether or not a repack is in order. This switch has been introduced mostly with the "reftable" backend in mind, which already knows to auto-compact its tables during normal operations. When the flag is set, then it will use the same auto-compaction mechanism and thus end up doing nothing in most cases. The "files" backend does not have any such heuristic yet and instead packs any loose references unconditionally. So we rewrite the complete "packed-refs" file even if there's only a single loose reference to be packed. Even worse, starting with 9f6714ab3e (builtin/gc: pack refs when using `git maintenance run --auto`, 2024-03-25), `git pack-refs --auto` is unconditionally executed via our auto maintenance, so we end up repacking references every single time auto maintenance kicks in. And while that commit already mentioned that the "files" backend unconditionally packs refs now, the author obviously didn't quite think about the consequences thereof. So while the idea was sound, we really should have added a heuristic to the "files" backend before implementing it. Introduce a heuristic that decides whether or not it is worth to pack loose references. The important factors to decide here are the number of loose references in comparison to the overall size of the "packed-refs" file. The bigger the "packed-refs" file, the longer it takes to rewrite it and thus we scale up the limit of allowed loose references before we repack. As is the nature of heuristics, this mechansim isn't obviously "correct", but should rather be seen as a tradeoff between how much resources we spend packing refs and how inefficient the ref store becomes. For all I can say, we have successfully been using the exact same heuristic in Gitaly for several years by now. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-26Merge branch 'jk/mark-unused-parameters'Junio C Hamano1-2/+2
Mark unused parameters as UNUSED to squelch -Wunused warnings. * jk/mark-unused-parameters: t-hashmap: stop calling setup() for t_intern() test scalar: mark unused parameters in dummy function daemon: mark unused parameters in non-posix fallbacks setup: mark unused parameter in config callback test-mergesort: mark unused parameters in trivial callback t-hashmap: mark unused parameters in callback function reftable: mark unused parameters in virtual functions reftable: drop obsolete test function declarations reftable: ignore unused argc/argv in test functions unit-tests: ignore unused argc/argv t/helper: mark more unused argv/argc arguments oss-fuzz: mark unused argv/argc argument refs: mark unused parameters in do_for_each_reflog_helper() refs: mark unused parameters in ref_store fsck callbacks update-ref: mark more unused parameters in parser callbacks imap-send: mark unused parameter in ssl_socket_connect() fallback
2024-08-23Merge branch 'ps/config-wo-the-repository'Junio C Hamano1-0/+2
Use of API functions that implicitly depend on the_repository object in the config subsystem has been rewritten to pass a repository object through the callchain. * ps/config-wo-the-repository: config: hide functions using `the_repository` by default global: prepare for hiding away repo-less config functions config: don't depend on `the_repository` with branch conditions config: don't have setters depend on `the_repository` config: pass repo to functions that rename or copy sections config: pass repo to `git_die_config()` config: pass repo to `git_config_get_expiry_in_days()` config: pass repo to `git_config_get_expiry()` config: pass repo to `git_config_get_max_percent_split_change()` config: pass repo to `git_config_get_split_index()` config: pass repo to `git_config_get_index_threads()` config: expose `repo_config_clear()` config: introduce missing setters that take repo as parameter path: hide functions using `the_repository` by default path: stop relying on `the_repository` in `worktree_git_path()` path: stop relying on `the_repository` when reporting garbage hooks: remove implicit dependency on `the_repository` editor: do not rely on `the_repository` for interactive edits path: expose `do_git_common_path()` as `repo_common_pathv()` path: expose `do_git_path()` as `repo_git_pathv()`
2024-08-17refs: mark unused parameters in ref_store fsck callbacksJeff King1-2/+2
Commit ab6f79d8df (refs: set up ref consistency check infrastructure, 2024-08-08) added virtual functions to the ref store for doing fsck checks. But the packed and reftable backends do not yet do anything. Let's annotate them to silence -Wunused-parameter. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-16Merge branch 'sj/ref-fsck'Junio C Hamano1-0/+8
"git fsck" infrastructure has been taught to also check the sanity of the ref database, in addition to the object database. * sj/ref-fsck: fsck: add ref name check for files backend files-backend: add unified interface for refs scanning builtin/refs: add verify subcommand refs: set up ref consistency check infrastructure fsck: add refs report function fsck: add a unified interface for reporting fsck messages fsck: make "fsck_error" callback generic fsck: rename objects-related fsck error functions fsck: rename "skiplist" to "skip_oids"
2024-08-13global: prepare for hiding away repo-less config functionsPatrick Steinhardt1-0/+2
We're about to hide config functions that implicitly depend on `the_repository` behind the `USE_THE_REPOSITORY_VARIABLE` macro. This will uncover a bunch of dependents that transitively relied on the global variable, but didn't define the macro yet. Adapt them such that we define the macro to prepare for this change. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-08refs: set up ref consistency check infrastructureshejialuo1-0/+8
The "struct ref_store" is the base class which contains the "be" pointer which provides backend-specific functions whose interfaces are defined in the "ref_storage_be". We could reuse this polymorphism to define only one interface. For every backend, we need to provide its own function pointer. The interfaces defined in the `ref_storage_be` are carefully structured in semantic. It's organized as the five parts: 1. The name and the initialization interfaces. 2. The ref transaction interfaces. 3. The ref internal interfaces (pack, rename and copy). 4. The ref filesystem interfaces. 5. The reflog related interfaces. To keep consistent with the git-fsck(1), add a new interface named "fsck_refs_fn" to the end of "ref_storage_be". This semantic cannot be grouped into any above five categories. Explicitly add blank line to make it different from others. Last, implement placeholder functions for each ref backends. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-07-30refs/packed: stop using `the_repository`Patrick Steinhardt1-8/+6
Convert the packed ref backend to stop using `the_repository` in favor of the repo that gets passed in via `struct ref_store`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14global: introduce `USE_THE_REPOSITORY_VARIABLE` macroPatrick Steinhardt1-0/+2
Use of the `the_repository` variable is deprecated nowadays, and we slowly but steadily convert the codebase to not use it anymore. Instead, callers should be passing down the repository to work on via parameters. It is hard though to prove that a given code unit does not use this variable anymore. The most trivial case, merely demonstrating that there is no direct use of `the_repository`, is already a bit of a pain during code reviews as the reviewer needs to manually verify claims made by the patch author. The bigger problem though is that we have many interfaces that implicitly rely on `the_repository`. Introduce a new `USE_THE_REPOSITORY_VARIABLE` macro that allows code units to opt into usage of `the_repository`. The intent of this macro is to demonstrate that a certain code unit does not use this variable anymore, and to keep it from new dependencies on it in future changes, be it explicit or implicit For now, the macro only guards `the_repository` itself as well as `the_hash_algo`. There are many more known interfaces where we have an implicit dependency on `the_repository`, but those are not guarded at the current point in time. Over time though, we should start to add guards as required (or even better, just remove them). Define the macro as required in our code units. As expected, most of our code still relies on the global variable. Nearly all of our builtins rely on the variable as there is no way yet to pass `the_repository` to their entry point. For now, declare the macro in "biultin.h" to keep the required changes at least a little bit more contained. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14hash: require hash algorithm in `oidread()` and `oidclr()`Patrick Steinhardt1-3/+3
Both `oidread()` and `oidclr()` use `the_repository` to derive the hash function that shall be used. Require callers to pass in the hash algorithm to get rid of this implicit dependency. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-06refs: implement removal of ref storagesPatrick Steinhardt1-0/+15
We're about to introduce logic to migrate ref storages. One part of the migration will be to delete the files that are part of the old ref storage format. We don't yet have a way to delete such data generically across ref backends though. Implement a new `delete` callback and expose it via a new `ref_storage_delete()` function. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-23Merge branch 'ps/refs-without-the-repository-updates' into ↵Junio C Hamano1-25/+42
ps/ref-storage-migration * ps/refs-without-the-repository-updates: refs/packed: remove references to `the_hash_algo` refs/files: remove references to `the_hash_algo` refs/files: use correct repository refs: remove `dwim_log()` refs: drop `git_default_branch_name()` refs: pass repo when peeling objects refs: move object peeling into "object.c" refs: pass ref store when detecting dangling symrefs refs: convert iteration over replace refs to accept ref store refs: retrieve worktree ref stores via associated repository refs: refactor `resolve_gitlink_ref()` to accept a repository refs: pass repo when retrieving submodule ref store refs: track ref stores via strmap refs: implement releasing ref storages refs: rename `init_db` callback to avoid confusion refs: adjust names for `init` and `init_db` callbacks
2024-05-17refs/packed: remove references to `the_hash_algo`Patrick Steinhardt1-12/+20
Remove references to `the_hash_algo` in favor of the hash algo specified by the repository associated with the packed ref store. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-17refs: pass repo when peeling objectsPatrick Steinhardt1-5/+3
Both `peel_object()` and `peel_iterated_oid()` implicitly rely on `the_repository` to look up objects. Despite the fact that we want to get rid of `the_repository`, it also leads to some restrictions in our ref iterators when trying to retrieve the peeled value for a repository other than `the_repository`. Refactor these functions such that both take a repository as argument and remove the now-unnecessary restrictions. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-17refs: implement releasing ref storagesPatrick Steinhardt1-0/+11
Ref storages are typically only initialized once for `the_repository` and then never released. Until now we got away with that without causing memory leaks because `the_repository` stays reachable, and because the ref backend is reachable via `the_repository` its memory basically never leaks. This is about to change though because of the upcoming migration logic, which will create a secondary ref storage. In that case, we will either have to release the old or new ref storage to avoid leaks. Implement a new `release` callback and expose it via a new `ref_storage_release()` function. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-17refs: rename `init_db` callback to avoid confusionPatrick Steinhardt1-4/+4
Reference backends have two callbacks `init` and `init_db`. The similarity of these two callbacks has repeatedly confused me whenever I was looking at them, where I always had to look up which of them does what. Rename the `init_db` callback to `create_on_disk`, which should hopefully be clearer. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-17refs: adjust names for `init` and `init_db` callbacksPatrick Steinhardt1-8/+8
The names of the functions that implement the `init` and `init_db` callbacks in the "files" and "packed" backends do not match the names of the callbacks, which is inconsistent. Rename them so that they match, which makes it easier to discover their respective implementations. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-07refs: remove `create_symref` and associated dead codeKarthik Nayak1-1/+0
In the previous commits, we converted `refs_create_symref()` to utilize transactions to perform symref updates. Earlier `refs_create_symref()` used `create_symref()` to do the same. We can now remove `create_symref()` and any code associated with it which is no longer used. We remove `create_symref()` code from all the reference backends and also remove it entirely from the `ref_storage_be` struct. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-21refs: always treat iterators as orderedPatrick Steinhardt1-1/+1
In the preceding commit we have converted the reflog iterator of the "files" backend to be ordered, which was the only remaining ref iterator that wasn't ordered. Refactor the ref iterator infrastructure so that we always assume iterators to be ordered, thus simplifying the code. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-01-26Merge branch 'ps/worktree-refdb-initialization'Junio C Hamano1-0/+1
Instead of manually creating refs/ hierarchy on disk upon a creation of a secondary worktree, which is only usable via the files backend, use the refs API to populate it. * ps/worktree-refdb-initialization: builtin/worktree: create refdb via ref backend worktree: expose interface to look up worktree by name builtin/worktree: move setup of commondir file earlier refs/files: skip creation of "refs/{heads,tags}" for worktrees setup: move creation of "refs/" into the files backend refs: prepare `refs_init_db()` for initializing worktree refs
2024-01-16Merge branch 'ps/refstorage-extension'Junio C Hamano1-1/+0
Introduce a new extension "refstorage" so that we can mark a repository that uses a non-default ref backend, like reftable. * ps/refstorage-extension: t9500: write "extensions.refstorage" into config builtin/clone: introduce `--ref-format=` value flag builtin/init: introduce `--ref-format=` value flag builtin/rev-parse: introduce `--show-ref-format` flag t: introduce GIT_TEST_DEFAULT_REF_FORMAT envvar setup: introduce GIT_DEFAULT_REF_FORMAT envvar setup: introduce "extensions.refStorage" extension setup: set repository's formats on init setup: start tracking ref storage format refs: refactor logic to look up storage backends worktree: skip reading HEAD when repairing worktrees t: introduce DEFAULT_REPO_FORMAT prereq
2024-01-08Merge branch 'en/header-cleanup'Junio C Hamano1-1/+0
Remove unused header "#include". * en/header-cleanup: treewide: remove unnecessary includes in source files treewide: add direct includes currently only pulled in transitively trace2/tr2_tls.h: remove unnecessary include submodule-config.h: remove unnecessary include pkt-line.h: remove unnecessary include line-log.h: remove unnecessary include http.h: remove unnecessary include fsmonitor--daemon.h: remove unnecessary includes blame.h: remove unnecessary includes archive.h: remove unnecessary include treewide: remove unnecessary includes in source files treewide: remove unnecessary includes from header files
2024-01-08refs: prepare `refs_init_db()` for initializing worktree refsPatrick Steinhardt1-0/+1
The purpose of `refs_init_db()` is to initialize the on-disk files of a new ref database. The function is quite inflexible right now though, as callers can neither specify the `struct ref_store` nor can they pass any flags. Refactor the interface to accept both of these. This will be required so that we can start initializing per-worktree ref databases via the ref backend instead of open-coding the initialization in "worktree.c". Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-01-02refs: refactor logic to look up storage backendsPatrick Steinhardt1-1/+0
In order to look up ref storage backends, we're currently using a linked list of backends, where each backend is expected to set up its `next` pointer to the next ref storage backend. This is kind of a weird setup as backends need to be aware of other backends without much of a reason. Refactor the code so that the array of backends is centrally defined in "refs.c", where each backend is now identified by an integer constant. Expose functions to translate from those integer constants to the name and vice versa, which will be required by subsequent patches. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>