git/hex.c, branch v2.45.4

hex-ll: separate out non-hash-algo functions

2023-09-29T22:14:56Z

In order to further reduce all-in-one headers, separate out functions in hex.h that do not operate on object hashes into their own file, hex-ll.h, and update the include directives in the .c files that need only such functions accordingly.

Signed-off-by: Calvin Wan
Signed-off-by: Jonathan Tan
Signed-off-by: Junio C Hamano
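To illustrate what "functions that do not operate on object hashes" can look like, here is a minimal standalone sketch of hash-independent hex helpers. The names hex_digit_value(), hex_pair_to_byte() and hex_decode() are hypothetical and this is not git's code; the point is only that none of them needs to know which hash algorithm is in use, which is what lets such helpers live in a header with no hash dependencies.

```c
#include <stddef.h>

/* Hypothetical helpers: none of them depends on a hash algorithm. */

/* Value of a single hex digit, or -1 if c is not a hex digit. */
static int hex_digit_value(unsigned char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Combine two hex digits into one byte value, or return -1 on bad input.
 * The second digit is only read if the first one was valid, so a string
 * that ends early (NUL) is never read past its terminator.
 */
static int hex_pair_to_byte(const char *s)
{
	int hi = hex_digit_value((unsigned char)s[0]);
	int lo;

	if (hi < 0)
		return -1;
	lo = hex_digit_value((unsigned char)s[1]);
	if (lo < 0)
		return -1;
	return (hi << 4) | lo;
}

/*
 * Decode exactly len bytes (2 * len hex digits) from hex into out.
 * Returns 0 on success, -1 as soon as a non-hex character is found.
 */
static int hex_decode(unsigned char *out, const char *hex, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		int v = hex_pair_to_byte(hex + 2 * i);

		if (v < 0)
			return -1;
		out[i] = (unsigned char)v;
	}
	return 0;
}
```

The caller supplies the length, so the same helper serves any digest size; choosing that length is the job of the hash-aware layer above it.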

hex: retire get_sha1_hex()

2023-07-24T23:11:23Z

The naming convention around get_sha1_hex() and its friends is awkward these days, after "struct object_id" was introduced. There are three public functions around this area:

 * get_sha1_hex() - use the implied the_hash_algo, fill uchar *
 * get_oid_hex() - use the implied the_hash_algo, fill oid *
 * get_oid_hex_algop() - use the passed algop, fill oid *

Between the latter two, the "_algop" suffix signals whether the_hash_algo is used as the implied algorithm or the caller should pass an algorithm explicitly. That is very much understandable and is a good convention.

Between the former two, however, the "SHA1" vs "OID" in the names differentiates the type of variable in which the result is stored. We could argue that it makes sense to use "SHA1" to mean "flat byte buffer" to honor the historical practice in the days before "struct object_id" was invented, but the natural fourth friend of the above group would take an algop and fill a flat byte buffer, and it would be strange to name it get_sha1_hex_algop(). Do we use the passed-in algo, or are we limited to SHA-1 ;-)?

In fact, such a function exists, albeit as a private helper function used by the implementation of these functions, and is named a lot more sensibly: get_hash_hex_algop().

Correct the misnomer of get_sha1_hex() and use "hash", instead of "sha1", to mean "a flat byte buffer that stores the binary (as opposed to hexadecimal) representation of the hash". The four (2x2) friends now become:

 * get_hash_hex() - use the implied the_hash_algo, fill uchar *
 * get_oid_hex() - use the implied the_hash_algo, fill oid *
 * get_hash_hex_algop() - use the passed algop, fill uchar *
 * get_oid_hex_algop() - use the passed algop, fill oid *

As there are only two remaining calls to get_sha1_hex() in the codebase right now, the blast radius of this change is fairly small.

Signed-off-by: Junio C Hamano
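The 2x2 naming grid above can be sketched as four small functions, where one axis is "implied default vs. explicit algop" and the other is "flat byte buffer vs. object-ID struct". Everything here is a hypothetical, simplified stand-in (struct hash_algo_sketch, struct oid_sketch, sketch_default_algo and the sketch_get_* names), not git's real types or signatures; the implied-default variants simply forward to the _algop ones, which is one way to avoid duplicating the parsing logic.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a hash-algorithm descriptor. */
struct hash_algo_sketch {
	const char *name;
	size_t rawsz; /* bytes in the binary form */
};

static const struct hash_algo_sketch sketch_sha1 = { "sha1", 20 };
static const struct hash_algo_sketch sketch_sha256 = { "sha256", 32 };

/* Stand-in for the implied default ("the_hash_algo" in the text). */
static const struct hash_algo_sketch *sketch_default_algo = &sketch_sha1;

/* Hypothetical object-ID struct: buffer sized for the longest algorithm. */
struct oid_sketch {
	unsigned char hash[32];
	const struct hash_algo_sketch *algop;
};

static int sketch_hexval(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/* Passed algop, fill uchar *: the one function that actually parses. */
static int sketch_get_hash_hex_algop(const char *hex, unsigned char *hash,
				     const struct hash_algo_sketch *algop)
{
	size_t i;

	for (i = 0; i < algop->rawsz; i++) {
		int hi = sketch_hexval(hex[2 * i]);
		int lo;

		if (hi < 0)
			return -1;
		lo = sketch_hexval(hex[2 * i + 1]);
		if (lo < 0)
			return -1;
		hash[i] = (unsigned char)((hi << 4) | lo);
	}
	return 0;
}

/* Implied default, fill uchar *. */
static int sketch_get_hash_hex(const char *hex, unsigned char *hash)
{
	return sketch_get_hash_hex_algop(hex, hash, sketch_default_algo);
}

/* Passed algop, fill oid *: also records the algorithm and zeroes the tail. */
static int sketch_get_oid_hex_algop(const char *hex, struct oid_sketch *oid,
				    const struct hash_algo_sketch *algop)
{
	if (sketch_get_hash_hex_algop(hex, oid->hash, algop))
		return -1;
	memset(oid->hash + algop->rawsz, 0, sizeof(oid->hash) - algop->rawsz);
	oid->algop = algop;
	return 0;
}

/* Implied default, fill oid *. */
static int sketch_get_oid_hex(const char *hex, struct oid_sketch *oid)
{
	return sketch_get_oid_hex_algop(hex, oid, sketch_default_algo);
}
```

With names like these, the suffix answers "who picks the algorithm?" and the hash/oid part answers "where does the result go?", which is the consistency the rename aims for.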

hash-ll.h: split out of hash.h to remove dependency on repository.h

2023-04-24T19:47:32Z

hash.h depends upon and includes repository.h, due to the definition and use of the_hash_algo (defined as the_repository->hash_algo). However, most headers trying to include hash.h are only interested in the layout of the structs like object_id. Move the parts of hash.h that do not depend upon repository.h into a new file hash-ll.h (the "low level" parts of hash.h), and adjust other files to use this new header where the convenience inline functions aren't needed.

This allows hash.h and object.h to be fairly small, minimal headers. It also exposes a lot of hidden dependencies on both path.h (which was brought in by repository.h) and repository.h (which was previously implicitly brought in by object.h), so also adjust other files to be more explicit about what they depend upon.

Signed-off-by: Elijah Newren
Signed-off-by: Junio C Hamano

hex.h: move some hex-related declarations from cache.h

2023-02-24T01:25:28Z

hex.c contains code for hex-related functions, but for some reason these functions were declared in the catch-all cache.h. Move the function declarations into a hex.h header instead.

This also allows us to remove includes of cache.h from a few C files. For now, we make cache.h include hex.h, so that it is easier to review the direct changes being made by this patch. In the next patch, we will remove that, and add the necessary direct '#include "hex.h"' in the hundreds of C files that need it.

Note that reviewing the header changes in this commit might be simplified via

    git log --no-walk -p --color-moved $COMMIT -- '*.h'

In particular, it highlights the simple movement of code in .h files rather nicely.

Signed-off-by: Elijah Newren
Signed-off-by: Junio C Hamano

hex: print objects using the hash algorithm member

2021-04-27T07:31:39Z

Now that all code paths correctly set the hash algorithm member of struct object_id, write an object's hex representation using the hash algorithm member embedded in it.

Signed-off-by: brian m. carlson
Signed-off-by: Junio C Hamano

hex: default to the_hash_algo on zero algorithm value

2021-04-27T07:31:39Z

There are numerous places in the codebase where we assume we can initialize data by zeroing all its bytes. However, when we do that with a struct object_id, it leaves the structure with a zero value for the algorithm, which is invalid.

We could forbid this pattern and require that all struct object_id instances be initialized using oidclr, but this seems burdensome and it's unnatural to most C programmers. Instead, if the algorithm is zero, assume we wanted to use the default hash algorithm.

Signed-off-by: brian m. carlson
Signed-off-by: Junio C Hamano
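The "zero means default" rule can be sketched as a single lookup that every consumer goes through. The identifiers below (enum sketch_hash_id, sketch_default_hash, struct zoid_sketch, zoid_algo(), zoid_hexsz()) are hypothetical stand-ins, not git's definitions; the sketch only assumes that 0 is never a valid algorithm ID, so a zero-initialized struct can be recognized and mapped to the repository default.

```c
#include <stddef.h>

/* Hypothetical algorithm identifiers; 0 means "never set". */
enum sketch_hash_id {
	SKETCH_HASH_UNKNOWN = 0,
	SKETCH_HASH_SHA1 = 1,
	SKETCH_HASH_SHA256 = 2
};

/* Stand-in for the repository-wide default ("the_hash_algo"). */
static enum sketch_hash_id sketch_default_hash = SKETCH_HASH_SHA1;

struct zoid_sketch {
	unsigned char hash[32];
	int algo; /* left as 0 by memset() or "= { 0 }"-style initialization */
};

/* Effective algorithm: a zero field falls back to the current default. */
static enum sketch_hash_id zoid_algo(const struct zoid_sketch *oid)
{
	if (oid->algo == SKETCH_HASH_UNKNOWN)
		return sketch_default_hash;
	return (enum sketch_hash_id)oid->algo;
}

/* Hex digits needed to print this object ID, based on its effective algorithm. */
static size_t zoid_hexsz(const struct zoid_sketch *oid)
{
	return zoid_algo(oid) == SKETCH_HASH_SHA256 ? 64 : 40;
}
```

Because the fallback is resolved at use time rather than at initialization time, a zeroed struct follows whatever the default is when it is read, and existing zero-initialization sites need no changes.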

hash: set, copy, and use algo field in struct object_id

2021-04-27T07:31:38Z

Now that struct object_id has an algorithm field, we should populate it. This will allow us to handle object IDs in any supported algorithm and distinguish between them. Ensure that the field is written whenever we write an object ID by storing it explicitly every time we write an object. Set the algorithm for the empty blob and empty tree values as well. In addition, use the algorithm field to compare object IDs.

Note that because we zero-initialize struct object_id in many places throughout the codebase, we fall back to the default algorithm in cases where the algorithm field is zero rather than explicitly initializing all of those locations. This leads to a branch on every comparison, but the alternative is to compare the entire buffer each time and pad the buffer for SHA-1. That alternative performs up to 3.9% worse than this approach on the perf tests t0001, t1450, and t1451.

Signed-off-by: brian m. carlson
Signed-off-by: Junio C Hamano
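The trade-off described above (one branch per comparison vs. comparing the whole buffer) can be sketched as follows. The names (struct cmp_oid, cmp_rawsz(), cmp_oidcmp(), cmp_oideq()) are hypothetical, and the sketch assumes both operands use the same algorithm, so it takes the significant length from the first operand only; it is not meant to reproduce git's comparison functions exactly.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical identifiers; 0 is what zero-initialization leaves behind. */
enum { CMP_HASH_UNKNOWN = 0, CMP_HASH_SHA1 = 1, CMP_HASH_SHA256 = 2 };

static int cmp_default_algo = CMP_HASH_SHA1;

struct cmp_oid {
	unsigned char hash[32]; /* SHA-1 uses only the first 20 bytes */
	int algo;
};

/* Number of significant bytes, treating algo == 0 as the default. */
static size_t cmp_rawsz(const struct cmp_oid *oid)
{
	int algo = oid->algo ? oid->algo : cmp_default_algo;

	return algo == CMP_HASH_SHA256 ? 32 : 20;
}

/*
 * Compare only the bytes that are significant for the first operand's
 * algorithm (both operands are assumed to use the same one). The branch
 * in cmp_rawsz() runs on every call; the alternative would be comparing
 * all 32 bytes, which only works if SHA-1 IDs keep their unused tail
 * zero-padded.
 */
static int cmp_oidcmp(const struct cmp_oid *a, const struct cmp_oid *b)
{
	return memcmp(a->hash, b->hash, cmp_rawsz(a));
}

static int cmp_oideq(const struct cmp_oid *a, const struct cmp_oid *b)
{
	return cmp_oidcmp(a, b) == 0;
}
```

In this sketch, garbage past byte 20 of a SHA-1 ID does not affect equality, which is exactly what the full-buffer alternative would need padding to guarantee.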

hex: add functions to parse hex object IDs in any algorithm

2020-02-24T17:33:21Z

There are some places where we need to parse a hex object ID in any algorithm without knowing beforehand which algorithm is in use. An example is when parsing fast-import marks.

Add get_oid_hex_any to parse an object ID and return the algorithm it belongs to, and additionally add parse_oid_hex_any, which is the equivalent change for parse_oid_hex. If the object ID cannot be parsed, we return GIT_HASH_UNKNOWN.

Signed-off-by: brian m. carlson
Signed-off-by: Junio C Hamano
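A "parse in any algorithm" helper can be sketched as probing a table of known algorithms and reporting which one matched. The names (struct any_algo, any_algos[], sketch_get_oid_hex_any(), ANY_HASH_*) are hypothetical; this sketch tries the longest digest first, on the assumption that otherwise the first 40 digits of a 64-digit SHA-256 name would be accepted as a SHA-1 name, and it returns an "unknown" value of 0 when nothing fits.

```c
#include <stddef.h>

/* Hypothetical identifiers; 0 plays the role of GIT_HASH_UNKNOWN. */
enum { ANY_HASH_UNKNOWN = 0, ANY_HASH_SHA1 = 1, ANY_HASH_SHA256 = 2 };

struct any_algo {
	int id;
	size_t rawsz;
};

/* Ordered from shortest to longest; probed from the end of the table. */
static const struct any_algo any_algos[] = {
	{ ANY_HASH_SHA1, 20 },
	{ ANY_HASH_SHA256, 32 },
};

static int any_hexval(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/* Decode rawsz bytes; stops at the first non-hex character (e.g. NUL). */
static int any_decode(const char *hex, unsigned char *out, size_t rawsz)
{
	size_t i;

	for (i = 0; i < rawsz; i++) {
		int hi = any_hexval(hex[2 * i]);
		int lo;

		if (hi < 0)
			return -1;
		lo = any_hexval(hex[2 * i + 1]);
		if (lo < 0)
			return -1;
		out[i] = (unsigned char)((hi << 4) | lo);
	}
	return 0;
}

/*
 * Parse hex into out (at least 32 bytes) and return the id of the
 * algorithm that matched, or ANY_HASH_UNKNOWN if none did.
 */
static int sketch_get_oid_hex_any(const char *hex, unsigned char *out)
{
	size_t n = sizeof(any_algos) / sizeof(any_algos[0]);

	while (n-- > 0) {
		if (!any_decode(hex, out, any_algos[n].rawsz))
			return any_algos[n].id;
	}
	return ANY_HASH_UNKNOWN;
}
```

A caller such as a marks-file reader can then branch on the returned id instead of assuming the repository's default algorithm.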

hex: introduce parsing variants taking hash algorithms

2020-02-24T17:33:21Z

Introduce variants of get_oid_hex and parse_oid_hex that parse object IDs in an arbitrary hash algorithm, implementing internal functions to avoid duplication. These functions can be used in the transport code to parse refs properly.

Signed-off-by: brian m. carlson
Signed-off-by: Junio C Hamano

hex: drop sha1_to_hex()

2019-11-13T01:09:10Z

There's only a single caller left of sha1_to_hex(), since everybody that has an object name in "unsigned char[]" now uses hash_to_hex() instead.

This case is in the sha1dc wrapper, where we print a hex sha1 when we find a collision. This one will always be sha1, regardless of the current hash algorithm, so we can't use hash_to_hex() here. In practice we'd probably not be running sha1 at all if it isn't the current algorithm, but it's possible we might still occasionally need to compute a sha1 in a post-sha256 world.

Since sha1_to_hex() is just a wrapper for hash_to_hex_algop(), let's call that ourselves. There's value in getting rid of the sha1-specific wrapper to de-clutter the global namespace, and to make sure nobody uses it (and as with sha1_to_hex_r() in the previous patch, we'll drop the coccinelle transformations, too).

The sha1_to_hex() function is mentioned in a comment; we can easily swap that out for oid_to_hex() to give a better example. Also update the comment that was left stale when we added "struct object_id *" as a way to name an object and added functions to convert it to hex.

The function is also mentioned in some test vectors in t4100, but that's not runnable code, so there's no point in trying to clean it up.

Signed-off-by: Jeff King
Signed-off-by: Junio C Hamano
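The sha1dc case boils down to "format this digest with an algorithm I name explicitly, whatever the default is". A minimal sketch of such an explicit-algorithm formatter follows; struct hex_algo, hex_sha1, hex_sha256 and sketch_hash_to_hex_algop_r() are hypothetical names chosen for illustration, and the output length is taken from the passed descriptor rather than from any global default.

```c
#include <stddef.h>

/* Hypothetical, simplified algorithm descriptor. */
struct hex_algo {
	const char *name;
	size_t rawsz;
};

static const struct hex_algo hex_sha1 = { "sha1", 20 };
static const struct hex_algo hex_sha256 = { "sha256", 32 };

/*
 * Write the lowercase hex form of hash into buf, which must hold
 * 2 * algop->rawsz + 1 bytes, using the algorithm the caller names
 * explicitly. Returns buf for convenience.
 */
static char *sketch_hash_to_hex_algop_r(char *buf, const unsigned char *hash,
					 const struct hex_algo *algop)
{
	static const char digits[] = "0123456789abcdef";
	size_t i;

	for (i = 0; i < algop->rawsz; i++) {
		buf[2 * i] = digits[hash[i] >> 4];
		buf[2 * i + 1] = digits[hash[i] & 0xf];
	}
	buf[2 * algop->rawsz] = '\0';
	return buf;
}
```

A collision reporter in this style would always pass &hex_sha1, so its output stays 40 digits even in a repository whose default is SHA-256.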