git/hex.c, branch jch

refs/files-backend: drop const to fix strchr() warning

2026-04-02T05:08:53Z

In show_one_reflog_ent(), we're fed a writable strbuf buffer, which we parse into the various reflog components. We write a NUL over email_end to tie off one of the fields, and thus email_end must be non-const. But with a C23 implementation of libc, strchr() will now complain when assigning the result to a non-const pointer from a const one. So we can fix this by making the source pointer non-const. But there's a catch. We derive that source pointer by parsing the line with parse_oid_hex_algop(), which requires a const pointer for its out-parameter. We can work around that by teaching it to use our CONST_OUTPARAM() trick, just like skip_prefix(). Signed-off-by: Jeff King Signed-off-by: Junio C Hamano
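The in-place parsing pattern described above can be sketched as follows. This is a minimal illustration, not the code in refs/files-backend.c: the helper name tie_off_email() and the sample line layout are assumptions made for the example. It shows only why the source pointer has to be non-const when the strchr() result is later written through.

```c
#include <string.h>

/*
 * Hypothetical helper, not git's code: terminate the e-mail field of a
 * "Name <email> rest" line in place and return a pointer to the text
 * that follows the closing '>'. Returns NULL when no '>' is present.
 */
char *tie_off_email(char *line)
{
	/*
	 * "line" is non-const, so the strchr() result can be stored in a
	 * non-const pointer and written through without a cast.
	 */
	char *email_end = strchr(line, '>');

	if (!email_end)
		return NULL;
	*email_end = '\0';	/* overwrite '>' to end the field */
	return email_end + 1;
}
```

If "line" were declared const char *, a libc with const-preserving strchr() would return a const pointer, and the assignment to email_end would be diagnosed.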

global: trivial conversions to fix `-Wsign-compare` warnings

2024-12-06T11:20:04Z

We have a bunch of loops which iterate up to an unsigned boundary using a signed index, which generates warnings because we compare a signed and unsigned value in the loop condition. Address these sites for trivial cases and enable `-Wsign-compare` warnings for these code units. This patch only adapts those code units where we can drop the `DISABLE_SIGN_COMPARE_WARNINGS` macro in the same step. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano
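A trivial conversion of this kind might look like the sketch below. The function sum_values() is a made-up example, not one of the sites touched by the patch; it only shows the index taking the same unsigned type as the loop bound.

```c
#include <stddef.h>

/* Hypothetical example: add up "nr" byte values. */
unsigned long sum_values(const unsigned char *values, size_t nr)
{
	unsigned long total = 0;

	/*
	 * With "int i" here, the condition "i < nr" would compare a signed
	 * and an unsigned value and trigger -Wsign-compare. Using size_t
	 * for the index makes both operands unsigned.
	 */
	for (size_t i = 0; i < nr; i++)
		total += values[i];
	return total;
}
```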

global: mark code units that generate warnings with `-Wsign-compare`

2024-12-06T11:20:02Z

Mark code units that generate warnings with `-Wsign-compare`. This allows for a structured approach to get rid of all such warnings over time in a way that can be easily measured. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano

global: introduce `USE_THE_REPOSITORY_VARIABLE` macro

2024-06-14T17:26:33Z

Use of the `the_repository` variable is deprecated nowadays, and we slowly but steadily convert the codebase to not use it anymore. Instead, callers should be passing down the repository to work on via parameters. It is hard though to prove that a given code unit does not use this variable anymore. The most trivial case, merely demonstrating that there is no direct use of `the_repository`, is already a bit of a pain during code reviews as the reviewer needs to manually verify claims made by the patch author. The bigger problem though is that we have many interfaces that implicitly rely on `the_repository`. Introduce a new `USE_THE_REPOSITORY_VARIABLE` macro that allows code units to opt into usage of `the_repository`. The intent of this macro is to demonstrate that a certain code unit does not use this variable anymore, and to keep it from new dependencies on it in future changes, be it explicit or implicit. For now, the macro only guards `the_repository` itself as well as `the_hash_algo`. There are many more known interfaces where we have an implicit dependency on `the_repository`, but those are not guarded at the current point in time. Over time though, we should start to add guards as required (or even better, just remove them). Define the macro as required in our code units. As expected, most of our code still relies on the global variable. Nearly all of our builtins rely on the variable as there is no way yet to pass `the_repository` to their entry point. For now, declare the macro in "builtin.h" to keep the required changes at least a little bit more contained. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano
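The opt-in guard can be pictured roughly like this. Every name in the sketch (repository_sketch, USE_GLOBAL_REPO_SKETCH, global_repo_sketch, repo_gitdir) is hypothetical and chosen for illustration; git's real headers are laid out differently. The point is only that the global becomes visible when a code unit defines the macro, while converted code receives the repository as a parameter.

```c
/* Hypothetical stand-in for a repository handle. */
struct repository_sketch {
	const char *gitdir;
};

/* A code unit opts in by defining the macro before the guarded part. */
#define USE_GLOBAL_REPO_SKETCH

#ifdef USE_GLOBAL_REPO_SKETCH
/* Only visible to code units that opted in above. */
struct repository_sketch *global_repo_sketch;
#endif

/* Preferred style: the repository arrives as an explicit parameter. */
const char *repo_gitdir(const struct repository_sketch *repo)
{
	return repo->gitdir;
}
```

A code unit that omits the define and still names the global fails to compile, which gives reviewers a mechanical check in place of a manual one.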

global: ensure that object IDs are always padded

2024-06-14T17:26:32Z

The `oidcmp()` and `oideq()` functions only compare the prefix length as specified by the given hash algorithm. This mandates that the object IDs have a valid hash algorithm set, or otherwise we wouldn't be able to figure out that prefix. As we do not have a hash algorithm in many cases, for example when handling null object IDs, this assumption cannot always be fulfilled. We thus have a fallback in place that instead uses `the_repository` to derive the hash function. This implicit dependency is hidden away from callers and can be quite surprising, especially in contexts where there may be no repository. In theory, we can adapt those functions to always memcmp(3P) the whole length of their hash arrays. But there exist a couple of sites where we populate `struct object_id`s such that only the prefix of its hash that is actually used by the hash algorithm is populated. The remaining bytes are left uninitialized. The fact that those bytes are uninitialized also leads to warnings under Valgrind in some places where we copy those bytes. Refactor callsites where we populate object IDs to always initialize all bytes. This also allows us to get rid of `oidcpy_with_padding()`, for one because the input is now fully initialized, and because `oidcpy()` will now always copy the whole hash array. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano
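The idea of padding every object ID so that the whole array can be compared might be sketched as follows. The type oid_sketch, the constant MAX_RAWSZ_SKETCH, and both functions are assumptions for illustration and do not reproduce git's `struct object_id` or its helpers.

```c
#include <string.h>

#define MAX_RAWSZ_SKETCH 32	/* room for the largest supported hash */

/* Hypothetical object ID holding only the raw hash bytes. */
struct oid_sketch {
	unsigned char hash[MAX_RAWSZ_SKETCH];
};

/*
 * Copy "len" hash bytes (len must not exceed MAX_RAWSZ_SKETCH) and zero
 * the remainder, so that no byte of the array is left uninitialized.
 */
void oid_set_padded(struct oid_sketch *oid, const unsigned char *raw, size_t len)
{
	memcpy(oid->hash, raw, len);
	memset(oid->hash + len, 0, MAX_RAWSZ_SKETCH - len);
}

/* With padding guaranteed, equality can compare the whole array. */
int oid_equal(const struct oid_sketch *a, const struct oid_sketch *b)
{
	return !memcmp(a->hash, b->hash, MAX_RAWSZ_SKETCH);
}
```

Because the comparison no longer needs to know the prefix length, it does not have to consult any hash algorithm, global or otherwise.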

hex-ll: separate out non-hash-algo functions

2023-09-29T22:14:56Z

In order to further reduce all-in-one headers, separate out functions in hex.h that do not operate on object hashes into its own file, hex-ll.h, and update the include directives in the .c files that need only such functions accordingly. Signed-off-by: Calvin Wan Signed-off-by: Jonathan Tan Signed-off-by: Junio C Hamano

hex: retire get_sha1_hex()

2023-07-24T23:11:23Z

The naming convention around get_sha1_hex() and its friends is awkward these days, after "struct object_id" was introduced. There are three public functions around this area:
 * get_sha1_hex() - use the implied the_hash_algo, fill uchar *
 * get_oid_hex() - use the implied the_hash_algo, fill oid *
 * get_oid_hex_algop() - use the passed algop, fill oid *
Between the latter two, the "_algop" suffix signals whether the_hash_algo is used as the implied algorithm or the caller should pass an algorithm explicitly. That is very much understandable and is a good convention. Between the former two, however, the "SHA1" vs "OID" in the names differentiates the type of variable in which the result is stored. We could argue that it makes sense to use "SHA1" to mean "flat byte buffer" to honor the historical practice in the days before "struct object_id" was invented, but the natural fourth friend of the above group would take an algop and fill a flat byte buffer, and it would be strange to name it get_sha1_hex_algop(). Do we use the passed in algo, or are we limited to SHA-1 ;-)? In fact, such a function exists, albeit as a private helper function used by the implementation of these functions, and is named a lot more sensibly: get_hash_hex_algop(). Correct the misnomer of get_sha1_hex() and use "hash", instead of "sha1", as "flat byte buffer that stores binary (as opposed to hexadecimal) representation of the hash". The four (2x2) friends now become:
 * get_hash_hex() - use the implied the_hash_algo, fill uchar *
 * get_oid_hex() - use the implied the_hash_algo, fill oid *
 * get_hash_hex_algop() - use the passed algop, fill uchar *
 * get_oid_hex_algop() - use the passed algop, fill oid *
As there are only two remaining calls to get_sha1_hex() in the codebase right now, the blast radius of this change is fairly small. Signed-off-by: Junio C Hamano
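The split between an "implied algorithm" variant and an "_algop" variant that fills a flat byte buffer can be illustrated with the sketch below. The names algo_sketch, parse_hash_hex() and parse_hash_hex_algop() are hypothetical, as is the 20-byte default; this is not the implementation in hex.c.

```c
#include <stddef.h>

/* Hypothetical algorithm descriptor: only the raw hash size in bytes. */
struct algo_sketch {
	size_t rawsz;
};

/* Return the value of one hex digit, or -1 if "c" is not a hex digit. */
static int hexval_sketch(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Explicit-algorithm variant: decode 2 * rawsz hex digits into the flat
 * byte buffer "hash". Returns 0 on success, -1 on malformed or short input.
 */
int parse_hash_hex_algop(const char *hex, unsigned char *hash,
			 const struct algo_sketch *algop)
{
	for (size_t i = 0; i < algop->rawsz; i++) {
		int hi = hexval_sketch(hex[2 * i]);
		int lo;

		if (hi < 0)
			return -1;	/* also stops at the terminating NUL */
		lo = hexval_sketch(hex[2 * i + 1]);
		if (lo < 0)
			return -1;
		hash[i] = (unsigned char)((hi << 4) | lo);
	}
	return 0;
}

/* Implied-algorithm variant: a thin wrapper around the explicit one. */
int parse_hash_hex(const char *hex, unsigned char *hash)
{
	static const struct algo_sketch default_algo = { 20 };

	return parse_hash_hex_algop(hex, hash, &default_algo);
}
```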

hash-ll.h: split out of hash.h to remove dependency on repository.h

2023-04-24T19:47:32Z

hash.h depends upon and includes repository.h, due to the definition and use of the_hash_algo (defined as the_repository->hash_algo). However, most headers trying to include hash.h are only interested in the layout of the structs like object_id. Move the parts of hash.h that do not depend upon repository.h into a new file hash-ll.h (the "low level" parts of hash.h), and adjust other files to use this new header where the convenience inline functions aren't needed. This allows hash.h and object.h to be fairly small, minimal headers. It also exposes a lot of hidden dependencies on both path.h (which was brought in by repository.h) and repository.h (which was previously implicitly brought in by object.h), so also adjust other files to be more explicit about what they depend upon. Signed-off-by: Elijah Newren Signed-off-by: Junio C Hamano

hex.h: move some hex-related declarations from cache.h

2023-02-24T01:25:28Z

hex.c contains code for hex-related functions, but for some reason these functions were declared in the catch-all cache.h. Move the function declarations into a hex.h header instead. This also allows us to remove includes of cache.h from a few C files. For now, we make cache.h include hex.h, so that it is easier to review the direct changes being made by this patch. In the next patch, we will remove that, and add the necessary direct '#include "hex.h"' in the hundreds of C files that need it. Note that reviewing the header changes in this commit might be simplified via `git log --no-walk -p --color-moved $COMMIT -- '*.h'`. In particular, it highlights the simple movement of code in .h files rather nicely. Signed-off-by: Elijah Newren Signed-off-by: Junio C Hamano

hex: print objects using the hash algorithm member

2021-04-27T07:31:39Z

Now that all code paths correctly set the hash algorithm member of struct object_id, write an object's hex representation using the hash algorithm member embedded in it. Signed-off-by: brian m. carlson Signed-off-by: Junio C Hamano
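Printing an object ID according to the algorithm recorded in it can be sketched like this. The struct oid_with_algo_sketch, the algorithm numbering, and both functions are assumptions for illustration only and are not git's actual definitions.

```c
#include <stddef.h>

/* Hypothetical object ID that records which algorithm produced it. */
struct oid_with_algo_sketch {
	unsigned char hash[32];
	int algo;	/* hypothetical numbering: 2 = 32-byte hash, else 20-byte */
};

/* Map the embedded algorithm identifier to a raw hash size in bytes. */
size_t rawsz_for_algo(int algo)
{
	return algo == 2 ? 32 : 20;
}

/*
 * Write the hex form of "oid" into "out", which must have room for
 * 2 * rawsz + 1 bytes. The length comes from the ID's own algorithm
 * member, not from any global state. Returns "out".
 */
char *oid_to_hex_sketch(char *out, const struct oid_with_algo_sketch *oid)
{
	static const char digits[] = "0123456789abcdef";
	size_t n = rawsz_for_algo(oid->algo);

	for (size_t i = 0; i < n; i++) {
		out[2 * i] = digits[oid->hash[i] >> 4];
		out[2 * i + 1] = digits[oid->hash[i] & 0xf];
	}
	out[2 * n] = '\0';
	return out;
}
```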