<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/hash.h, branch v2.35.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.35.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.35.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2021-09-07T17:47:43Z</updated>
<entry>
<title>hash.h: provide constants for the hash IDs</title>
<updated>2021-09-07T17:47:43Z</updated>
<author>
<name>Han-Wen Nienhuys</name>
<email>hanwen@google.com</email>
</author>
<published>2021-08-30T14:57:37Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=27f3796ac1f9b711fae506f63186034ed178690c'/>
<id>urn:sha1:27f3796ac1f9b711fae506f63186034ed178690c</id>
<content type='text'>
This will simplify referencing them from code that is not deeply integrated with
Git, in particular, the reftable library.

Signed-off-by: Han-Wen Nienhuys &lt;hanwen@google.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>oidtree: avoid unaligned access to crit-bit tree</title>
<updated>2021-08-15T20:13:50Z</updated>
<author>
<name>René Scharfe</name>
<email>l.s.r@web.de</email>
</author>
<published>2021-08-14T20:00:38Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=8bcda98da5db9003145674d8a7cbe7b81cab0d24'/>
<id>urn:sha1:8bcda98da5db9003145674d8a7cbe7b81cab0d24</id>
<content type='text'>
The flexible array member "k" of struct cb_node is used to store the key
of the crit-bit tree node.  It offers no alignment guarantees -- in fact
the current struct layout puts it one byte after a 4-byte aligned
address, i.e. guaranteed to be misaligned.

oidtree uses a struct object_id as cb_node key.  Since cf0983213c (hash:
add an algo member to struct object_id, 2021-04-26) it requires 4-byte
alignment.  The mismatch is reported by UndefinedBehaviorSanitizer at
runtime like this:

hash.h:277:2: runtime error: member access within misaligned address 0x00015000802d for type 'struct object_id', which requires 4 byte alignment
0x00015000802d: note: pointer points here
 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00
             ^
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior hash.h:277:2 in

We can fix that by:

1. eliminating the alignment requirement of struct object_id,
2. providing the alignment in struct cb_node, or
3. avoiding the issue by only using memcpy to access "k".

Currently we only store one of two values in "algo" in struct object_id.
We could use a uint8_t for that instead and widen it only once we add
support for our twohundredth algorithm or so.  That would not only avoid
alignment issues, but also reduce the memory requirements for each
instance of struct object_id by ca. 9%.

Supporting keys with alignment requirements might be useful to spread
the use of crit-bit trees.  It can be achieved by using a wider type for
"k" (e.g. uintmax_t), using different types for the members "byte" and
"otherbits" (e.g. uint16_t or uint32_t for each), or by avoiding the use
of flexible arrays like khash.h does.

This patch implements the third option, though, because it has the least
potential for causing side-effects and we're close to the next release.
If one of the other options is implemented later as well to get their
additional benefits we can get rid of the extra copies introduced here.

Reported-by: Andrzej Hunt &lt;andrzej@ahunt.org&gt;
Signed-off-by: René Scharfe &lt;l.s.r@web.de&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>oidcpy_with_padding: constify `src' arg</title>
<updated>2021-07-08T04:28:02Z</updated>
<author>
<name>Eric Wong</name>
<email>e@80x24.org</email>
</author>
<published>2021-07-07T23:10:18Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=90e07f0a342df836a33b92e179eb105243dba88d'/>
<id>urn:sha1:90e07f0a342df836a33b92e179eb105243dba88d</id>
<content type='text'>
As with `oidcpy', the source struct will not be modified and
this will allow an upcoming const-correct caller to use it.

Signed-off-by: Eric Wong &lt;e@80x24.org&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>parallel-checkout: send the new object_id algo field to the workers</title>
<updated>2021-05-17T20:38:54Z</updated>
<author>
<name>Matheus Tavares</name>
<email>matheus.bernardino@usp.br</email>
</author>
<published>2021-05-17T19:49:03Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=3d20ed27b8bdac5c82f4f78af802f9afa499f651'/>
<id>urn:sha1:3d20ed27b8bdac5c82f4f78af802f9afa499f651</id>
<content type='text'>
An object_id storing a SHA-1 name has some unused bytes at the end of
the hash array. Since these bytes are not used, they are usually not
initialized to any value either. However, at
parallel_checkout.c:send_one_item() the object_id of a cache entry is
copied into a buffer which is later sent to a checkout worker through a
pipe write(). This makes Valgrind complain about passing uninitialized
bytes to a syscall. The worker won't use these uninitialized bytes
either, but the warning could confuse someone trying to debug this code;
So instead of using oidcpy(), send_one_item() uses hashcpy() to only
copy the used/initialized bytes of the object_id, and leave the
remaining part with zeros.

However, since cf0983213c ("hash: add an algo member to struct
object_id", 2021-04-26), using hashcpy() is no longer sufficient here as
it won't copy the new algo field from the object_id. Let's add and use a
new function which meets both our requirements of copying all the
important object_id data while still avoiding the uninitialized bytes,
by padding the end of the hash array in the destination object_id. With
this change, we also no longer need the destination buffer from
send_one_item() to be initialized with zeros, so let's switch from
xcalloc() to xmalloc() to make this clear.

Signed-off-by: Matheus Tavares &lt;matheus.bernardino@usp.br&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>hash: provide per-algorithm null OIDs</title>
<updated>2021-04-27T07:31:39Z</updated>
<author>
<name>brian m. carlson</name>
<email>sandals@crustytoothpaste.net</email>
</author>
<published>2021-04-26T01:02:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=14228447c9ce664a4e9c31ba10344ec5e4ea4ba5'/>
<id>urn:sha1:14228447c9ce664a4e9c31ba10344ec5e4ea4ba5</id>
<content type='text'>
Up until recently, object IDs did not have an algorithm member, only a
hash.  Consequently, it was possible to share one null (all-zeros)
object ID among all hash algorithms.  Now that we're going to be
handling objects from multiple hash algorithms, it's important to make
sure that all object IDs have a correct algorithm field.

Introduce a per-algorithm null OID, and add it to struct hash_algo.
Introduce a wrapper function as well, and use it everywhere we used to
use the null_oid constant.

Signed-off-by: brian m. carlson &lt;sandals@crustytoothpaste.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>hash: set, copy, and use algo field in struct object_id</title>
<updated>2021-04-27T07:31:38Z</updated>
<author>
<name>brian m. carlson</name>
<email>sandals@crustytoothpaste.net</email>
</author>
<published>2021-04-26T01:02:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=5a6dce70d7bb12ee2bc7926254c5b6741b91ac5f'/>
<id>urn:sha1:5a6dce70d7bb12ee2bc7926254c5b6741b91ac5f</id>
<content type='text'>
Now that struct object_id has an algorithm field, we should populate it.
This will allow us to handle object IDs in any supported algorithm and
distinguish between them.  Ensure that the field is written whenever we
write an object ID by storing it explicitly every time we write an
object.  Set values for the empty blob and tree values as well.

In addition, use the algorithm field to compare object IDs.  Note that
because we zero-initialize struct object_id in many places throughout
the codebase, we default to the default algorithm in cases where the
algorithm field is zero rather than explicitly initialize all of those
locations.

This leads to a branch on every comparison, but the alternative is to
compare the entire buffer each time and padding the buffer for SHA-1.
That alternative ranges up to 3.9% worse than this approach on the perf
t0001, t1450, and t1451.

Signed-off-by: brian m. carlson &lt;sandals@crustytoothpaste.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>hash: add a function to finalize object IDs</title>
<updated>2021-04-27T07:31:38Z</updated>
<author>
<name>brian m. carlson</name>
<email>sandals@crustytoothpaste.net</email>
</author>
<published>2021-04-26T01:02:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=ab795f0d77aec52496557eb69a5fc9e4f60ea1fd'/>
<id>urn:sha1:ab795f0d77aec52496557eb69a5fc9e4f60ea1fd</id>
<content type='text'>
To avoid the penalty of having to branch in hash comparison functions,
we'll want to always compare the full hash member in a struct object_id,
which will require that SHA-1 object IDs be zero-padded.  To do so, add
a function which finalizes a hash context and writes it into an object
ID that performs this padding.

Move the definition of struct object_id and the constant definitions
higher up so we they are available for us to use.

Signed-off-by: brian m. carlson &lt;sandals@crustytoothpaste.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>hash: add an algo member to struct object_id</title>
<updated>2021-04-27T07:31:38Z</updated>
<author>
<name>brian m. carlson</name>
<email>sandals@crustytoothpaste.net</email>
</author>
<published>2021-04-26T01:02:49Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=cf0983213c72727c4c7dea51ed9b3b38a2968fbe'/>
<id>urn:sha1:cf0983213c72727c4c7dea51ed9b3b38a2968fbe</id>
<content type='text'>
Now that we're working with multiple hash algorithms in the same repo,
it's best if we label each object ID with its algorithm so we can
determine how to format a given object ID. Add a member called algo to
struct object_id.

Performance testing on object ID-heavy workloads doesn't reveal a clear
change in performance.  Out of performance tests t0001 and t1450, there
are slight variations in performance both up and down, but all
measurements are within the margin of error.

Signed-off-by: brian m. carlson &lt;sandals@crustytoothpaste.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>cache.h: move hash/oid functions to hash.h</title>
<updated>2020-12-04T21:55:14Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2020-12-04T18:51:39Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=3fa6f2aa57e997ff2e665d83335a8f1078b94cb8'/>
<id>urn:sha1:3fa6f2aa57e997ff2e665d83335a8f1078b94cb8</id>
<content type='text'>
We define git_hash_algo and object_id in hash.h, but most of the utility
functions are declared in the main cache.h. Let's move them to hash.h
along with their struct definitions. This cleans up cache.h a bit, but
also avoids circular dependencies when other headers need to know about
these functions (e.g., if oid-array.h were to have an inline that used
oideq(), it couldn't include cache.h because it is itself included by
cache.h).

No including C files should be affected, because hash.h is always
included in cache.h already.

We do have to mention repository.h at the top of hash.h, though, since
we depend on the_repository in some of our inline functions.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>hash: implement and use a context cloning function</title>
<updated>2020-02-24T17:33:21Z</updated>
<author>
<name>brian m. carlson</name>
<email>sandals@crustytoothpaste.net</email>
</author>
<published>2020-02-22T20:17:27Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=768e30ea27c58aa69893e10b96ba5ba5680dc3cf'/>
<id>urn:sha1:768e30ea27c58aa69893e10b96ba5ba5680dc3cf</id>
<content type='text'>
For all of our SHA-1 implementations and most of our SHA-256
implementations, the hash context we use is a real struct.  For these
implementations, it's possible to copy a hash context by making a copy
of the struct.

However, for our libgcrypt implementation, our hash context is a
pointer.  Consequently, copying it does not lead to an independent hash
context like we intended.

Fortunately, however, libgcrypt provides us with a handy function to
copy hash contexts.  Let's add a cloning function to the hash algorithm
API, and use it in the one place we need to make a hash context copy.
With this change, our libgcrypt SHA-256 implementation is fully
functional with all of our other hash implementations.

Signed-off-by: brian m. carlson &lt;sandals@crustytoothpaste.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
