git/hash.h, branch v2.35.2

hash.h: provide constants for the hash IDs

2021-09-07T17:47:43Z

This will simplify referencing them from code that is not deeply integrated with Git, in particular, the reftable library. Signed-off-by: Han-Wen Nienhuys Signed-off-by: Junio C Hamano

oidtree: avoid unaligned access to crit-bit tree

2021-08-15T20:13:50Z

The flexible array member "k" of struct cb_node is used to store the key of the crit-bit tree node. It offers no alignment guarantees -- in fact the current struct layout puts it one byte after a 4-byte aligned address, i.e. guaranteed to be misaligned. oidtree uses a struct object_id as cb_node key. Since cf0983213c (hash: add an algo member to struct object_id, 2021-04-26) it requires 4-byte alignment. The mismatch is reported by UndefinedBehaviorSanitizer at runtime like this: hash.h:277:2: runtime error: member access within misaligned address 0x00015000802d for type 'struct object_id', which requires 4 byte alignment 0x00015000802d: note: pointer points here 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ^ SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior hash.h:277:2 in We can fix that by: 1. eliminating the alignment requirement of struct object_id, 2. providing the alignment in struct cb_node, or 3. avoiding the issue by only using memcpy to access "k". Currently we only store one of two values in "algo" in struct object_id. We could use a uint8_t for that instead and widen it only once we add support for our twohundredth algorithm or so. That would not only avoid alignment issues, but also reduce the memory requirements for each instance of struct object_id by ca. 9%. Supporting keys with alignment requirements might be useful to spread the use of crit-bit trees. It can be achieved by using a wider type for "k" (e.g. uintmax_t), using different types for the members "byte" and "otherbits" (e.g. uint16_t or uint32_t for each), or by avoiding the use of flexible arrays like khash.h does. This patch implements the third option, though, because it has the least potential for causing side-effects and we're close to the next release. If one of the other options is implemented later as well to get their additional benefits we can get rid of the extra copies introduced here. Reported-by: Andrzej Hunt Signed-off-by: René Scharfe Signed-off-by: Junio C Hamano

oidcpy_with_padding: constify `src' arg

2021-07-08T04:28:02Z

As with `oidcpy', the source struct will not be modified and this will allow an upcoming const-correct caller to use it. Signed-off-by: Eric Wong Signed-off-by: Junio C Hamano

parallel-checkout: send the new object_id algo field to the workers

2021-05-17T20:38:54Z

An object_id storing a SHA-1 name has some unused bytes at the end of the hash array. Since these bytes are not used, they are usually not initialized to any value either. However, at parallel_checkout.c:send_one_item() the object_id of a cache entry is copied into a buffer which is later sent to a checkout worker through a pipe write(). This makes Valgrind complain about passing uninitialized bytes to a syscall. The worker won't use these uninitialized bytes either, but the warning could confuse someone trying to debug this code; So instead of using oidcpy(), send_one_item() uses hashcpy() to only copy the used/initialized bytes of the object_id, and leave the remaining part with zeros. However, since cf0983213c ("hash: add an algo member to struct object_id", 2021-04-26), using hashcpy() is no longer sufficient here as it won't copy the new algo field from the object_id. Let's add and use a new function which meets both our requirements of copying all the important object_id data while still avoiding the uninitialized bytes, by padding the end of the hash array in the destination object_id. With this change, we also no longer need the destination buffer from send_one_item() to be initialized with zeros, so let's switch from xcalloc() to xmalloc() to make this clear. Signed-off-by: Matheus Tavares Signed-off-by: Junio C Hamano

hash: provide per-algorithm null OIDs

2021-04-27T07:31:39Z

Up until recently, object IDs did not have an algorithm member, only a hash. Consequently, it was possible to share one null (all-zeros) object ID among all hash algorithms. Now that we're going to be handling objects from multiple hash algorithms, it's important to make sure that all object IDs have a correct algorithm field. Introduce a per-algorithm null OID, and add it to struct hash_algo. Introduce a wrapper function as well, and use it everywhere we used to use the null_oid constant. Signed-off-by: brian m. carlson Signed-off-by: Junio C Hamano

hash: set, copy, and use algo field in struct object_id

2021-04-27T07:31:38Z

Now that struct object_id has an algorithm field, we should populate it. This will allow us to handle object IDs in any supported algorithm and distinguish between them. Ensure that the field is written whenever we write an object ID by storing it explicitly every time we write an object. Set values for the empty blob and tree values as well. In addition, use the algorithm field to compare object IDs. Note that because we zero-initialize struct object_id in many places throughout the codebase, we default to the default algorithm in cases where the algorithm field is zero rather than explicitly initialize all of those locations. This leads to a branch on every comparison, but the alternative is to compare the entire buffer each time and padding the buffer for SHA-1. That alternative ranges up to 3.9% worse than this approach on the perf t0001, t1450, and t1451. Signed-off-by: brian m. carlson Signed-off-by: Junio C Hamano

hash: add a function to finalize object IDs

2021-04-27T07:31:38Z

To avoid the penalty of having to branch in hash comparison functions, we'll want to always compare the full hash member in a struct object_id, which will require that SHA-1 object IDs be zero-padded. To do so, add a function which finalizes a hash context and writes it into an object ID that performs this padding. Move the definition of struct object_id and the constant definitions higher up so we they are available for us to use. Signed-off-by: brian m. carlson Signed-off-by: Junio C Hamano

hash: add an algo member to struct object_id

2021-04-27T07:31:38Z

Now that we're working with multiple hash algorithms in the same repo, it's best if we label each object ID with its algorithm so we can determine how to format a given object ID. Add a member called algo to struct object_id. Performance testing on object ID-heavy workloads doesn't reveal a clear change in performance. Out of performance tests t0001 and t1450, there are slight variations in performance both up and down, but all measurements are within the margin of error. Signed-off-by: brian m. carlson Signed-off-by: Junio C Hamano

cache.h: move hash/oid functions to hash.h

2020-12-04T21:55:14Z

We define git_hash_algo and object_id in hash.h, but most of the utility functions are declared in the main cache.h. Let's move them to hash.h along with their struct definitions. This cleans up cache.h a bit, but also avoids circular dependencies when other headers need to know about these functions (e.g., if oid-array.h were to have an inline that used oideq(), it couldn't include cache.h because it is itself included by cache.h). No including C files should be affected, because hash.h is always included in cache.h already. We do have to mention repository.h at the top of hash.h, though, since we depend on the_repository in some of our inline functions. Signed-off-by: Jeff King Signed-off-by: Junio C Hamano

hash: implement and use a context cloning function

2020-02-24T17:33:21Z

For all of our SHA-1 implementations and most of our SHA-256 implementations, the hash context we use is a real struct. For these implementations, it's possible to copy a hash context by making a copy of the struct. However, for our libgcrypt implementation, our hash context is a pointer. Consequently, copying it does not lead to an independent hash context like we intended. Fortunately, however, libgcrypt provides us with a handy function to copy hash contexts. Let's add a cloning function to the hash algorithm API, and use it in the one place we need to make a hash context copy. With this change, our libgcrypt SHA-256 implementation is fully functional with all of our other hash implementations. Signed-off-by: brian m. carlson Signed-off-by: Junio C Hamano