<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/object-file.c, branch v2.48.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.48.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.48.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2024-12-06T11:20:02Z</updated>
<entry>
<title>global: mark code units that generate warnings with `-Wsign-compare`</title>
<updated>2024-12-06T11:20:02Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-12-06T10:27:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=41f43b8243f42b9df2e98be8460646d4c0100ad3'/>
<id>urn:sha1:41f43b8243f42b9df2e98be8460646d4c0100ad3</id>
<content type='text'>
Mark code units that generate warnings with `-Wsign-compare`. This
allows for a structured approach to get rid of all such warnings over
time in a way that can be easily measured.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>object-file: inline empty tree and blob literals</title>
<updated>2024-11-18T12:48:48Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2024-11-18T09:55:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=2af8ead52be9b72f3db76c9016cc4444eea33544'/>
<id>urn:sha1:2af8ead52be9b72f3db76c9016cc4444eea33544</id>
<content type='text'>
We define macros with the bytes of the empty trees and blobs for sha1
and sha256. But since e1ccd7e2b1 (sha1_file: only expose empty object
constants through git_hash_algo, 2018-05-02), those are used only for
initializing the git_hash_algo entries. Any other code using the macros
directly would be suspicious, since a hash_algo pointer is the level of
indirection we use to make everything work with both sha1 and sha256.

So let's future-proof against code doing the wrong thing by dropping the
macros entirely and just initializing the structs directly.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>object-file: treat cached_object values as const</title>
<updated>2024-11-18T12:48:48Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2024-11-18T09:55:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=e37feea00b2b81c0295fddb4f5137d12ea1825c0'/>
<id>urn:sha1:e37feea00b2b81c0295fddb4f5137d12ea1825c0</id>
<content type='text'>
The cached-object API maps oids to in-memory entries. Once inserted,
these entries should be immutable. Let's return them from the
find_cached_object() call with a const tag to make this clear.

Suggested-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>object-file: drop oid field from find_cached_object() return value</title>
<updated>2024-11-18T12:48:48Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2024-11-18T09:55:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=9202ffcf1064f883aacc4aba8016918e1d8d8243'/>
<id>urn:sha1:9202ffcf1064f883aacc4aba8016918e1d8d8243</id>
<content type='text'>
The pretend_object_file() function adds to an array mapping oids to
object contents, which are later retrieved with find_cached_object().
We naturally need to store the oid for each entry, since it's the lookup
key.

But find_cached_object() also returns a hard-coded empty_tree object.
There we don't care about its oid field and instead compare against
the_hash_algo-&gt;empty_tree. The oid field is left as all-zeroes.

This all works, but it means that the cached_object struct we return
from find_cached_object() may or may not have a valid oid field,
depending on whether it is the hard-coded tree or came from
pretend_object_file().

Nobody looks at the field, so there's no bug. But let's future-proof it
by returning only the object contents themselves, not the oid. We'll
continue to call this "struct cached_object", and the array entry
mapping the key to those contents will be a "cached_object_entry".

This would also let us swap out the array for a better data structure
(like a hashmap) if we chose, but there's not much point. The only code
that adds an entry is git-blame, which adds at most a single entry per
process.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>object-file: move empty_tree struct into find_cached_object()</title>
<updated>2024-11-18T12:48:47Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2024-11-18T09:55:11Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=b2a95dfd63e812dc4abe5750371f2f0596d2d063'/>
<id>urn:sha1:b2a95dfd63e812dc4abe5750371f2f0596d2d063</id>
<content type='text'>
The fake empty_tree struct is a static global, but the only code that
looks at it is find_cached_object(). The struct itself is a little odd,
with an invalid "oid" field that is handled specially by that function.

Since it's really just an implementation detail, let's move it to a
static within the function. That future-proofs against other code trying
to use it and seeing the weird oid value.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>object-file: drop confusing oid initializer of empty_tree struct</title>
<updated>2024-11-18T12:48:47Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2024-11-18T09:55:07Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=2911f9ed1eccf92c4a98c50c3a88abb2c03a8126'/>
<id>urn:sha1:2911f9ed1eccf92c4a98c50c3a88abb2c03a8126</id>
<content type='text'>
We treat the empty tree specially, providing an in-memory "cached" copy,
which allows you to diff against it even if the object doesn't exist in
the repository. This is implemented as part of the larger cached_object
subsystem, but we use a stand-alone empty_tree struct.

We initialize the oid of that struct using EMPTY_TREE_SHA1_BIN_LITERAL.
At first glance, that seems like a bug; how could this ever work for
sha256 repositories?

The answer is that we never look at the oid field! The oid field is used
to look up entries added by pretend_object_file() to the cached_objects
array. But for our stand-alone entry, we look for it independently using
the_hash_algo-&gt;empty_tree, which will point to the correct algo struct
for the repository.

This has been true since 62ba93eaa9 (sha1_file: convert cached object code to
struct object_id, 2018-05-02), which even mentions that this field is
never used. Let's reduce confusion for anybody reading this code by
replacing the sha1 initializer with a comment. The resulting field will
be all-zeroes, so any violation of our assumption that the oid field is
not used will break equally for sha1 and sha256.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>object-file: prefer array-of-bytes initializer for hash literals</title>
<updated>2024-11-18T12:48:47Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2024-11-18T09:54:40Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=e770f36307202b1e87e57a3f355dcdac89d4f5aa'/>
<id>urn:sha1:e770f36307202b1e87e57a3f355dcdac89d4f5aa</id>
<content type='text'>
We hard-code a few well-known hash values for empty trees and blobs in
both sha1 and sha256 formats. We do so with string literals like this:

  #define EMPTY_TREE_SHA256_BIN_LITERAL \
         "\x6e\xf1\x9b\x41\x22\x5c\x53\x69\xf1\xc1" \
         "\x04\xd4\x5d\x8d\x85\xef\xa9\xb0\x57\xb5" \
         "\x3b\x14\xb4\xb9\xb9\x39\xdd\x74\xde\xcc" \
         "\x53\x21"

and then use it to initialize the hash field of an object_id struct.
That hash field is exactly 32 bytes long (the size we need for sha256).
But the string literal above is actually 33 bytes long due to the NUL
terminator. This is legal in C, and the NUL is ignored.

  Side note on legality: in general excess initializer elements are
  forbidden, and gcc will warn on both of these:

    char foo[3] = { 'h', 'u', 'g', 'e' };
    char bar[3] = "VeryLongString";

  I couldn't find specific language in the standard allowing
  initialization from a string literal where _just_ the NUL is ignored,
  but C99 section 6.7.8 (Initialization), paragraph 32 shows this exact
  case as "example 8".

However, the upcoming gcc 15 will start warning for this case (when
compiled with -Wextra via DEVELOPER=1):

      CC object-file.o
  object-file.c:52:9: warning: initializer-string for array of ‘unsigned char’ is too long [-Wunterminated-string-initialization]
     52 |         "\x6e\xf1\x9b\x41\x22\x5c\x53\x69\xf1\xc1" \
        |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  object-file.c:79:17: note: in expansion of macro ‘EMPTY_TREE_SHA256_BIN_LITERAL’

which is understandable. Even though this is not a bug for us, since we
do not care about the NUL terminator (and are just using the literal as
a convenient format), it would be easy to accidentally create an array
that was mistakenly unterminated.

We can avoid this warning by switching the initializer to an actual
array of unsigned values. That arguably demonstrates our intent more
clearly anyway.
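
As a rough illustration of the array-of-bytes form (a minimal sketch, not
the actual patch: the struct and function names below are hypothetical
stand-ins, and only the byte values come from the macro quoted above):

```c
/* Hypothetical stand-in for git's object_id; only the hash field is sketched. */
struct oid_sketch {
	unsigned char hash[32];
};

/*
 * Array-of-bytes initializer: exactly 32 elements, so there is no
 * trailing NUL that the compiler has to drop silently.
 */
static const struct oid_sketch empty_tree_sha256 = {
	.hash = {
		0x6e, 0xf1, 0x9b, 0x41, 0x22, 0x5c, 0x53, 0x69, 0xf1, 0xc1,
		0x04, 0xd4, 0x5d, 0x8d, 0x85, 0xef, 0xa9, 0xb0, 0x57, 0xb5,
		0x3b, 0x14, 0xb4, 0xb9, 0xb9, 0x39, 0xdd, 0x74, 0xde, 0xcc,
		0x53, 0x21
	}
};

/* Number of hash bytes stored in the struct. */
int empty_tree_hash_len(void)
{
	return (int)sizeof(empty_tree_sha256.hash);
}

/* First and last byte, useful for spot-checking the literal. */
int empty_tree_first_byte(void)
{
	return empty_tree_sha256.hash[0];
}

int empty_tree_last_byte(void)
{
	return empty_tree_sha256.hash[31];
}
```

Because every element is listed explicitly, an extra or missing byte would
show up as an ordinary excess-initializer diagnostic or a zero-filled tail
rather than being hidden inside a string literal.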

Reported-by: Sam James &lt;sam@gentoo.org&gt;
Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'tb/weak-sha1-for-tail-sum'</title>
<updated>2024-10-02T14:46:27Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2024-10-02T14:46:27Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=ead0a050e2eddf8c67ee3404e165bffd42c6fd42'/>
<id>urn:sha1:ead0a050e2eddf8c67ee3404e165bffd42c6fd42</id>
<content type='text'>
The checksums at the tail of files are now computed without
collision detection protection.  This is safe because the consumer of
the information checks for hash collisions independently to protect
itself from replay attacks.

* tb/weak-sha1-for-tail-sum:
  csum-file.c: use unsafe SHA-1 implementation when available
  Makefile: allow specifying a SHA-1 for non-cryptographic uses
  hash.h: scaffolding for _unsafe hashing variants
  sha1: do not redefine `platform_SHA_CTX` and friends
  pack-objects: use finalize_object_file() to rename pack/idx/etc
  finalize_object_file(): implement collision check
  finalize_object_file(): refactor unlink_or_warn() placement
  finalize_object_file(): check for name collision before renaming
</content>
</entry>
<entry>
<title>hash.h: scaffolding for _unsafe hashing variants</title>
<updated>2024-09-27T18:27:47Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2024-09-26T15:22:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=253ed9ecfffa3e50b95e08bb513fdf9efcc5a85f'/>
<id>urn:sha1:253ed9ecfffa3e50b95e08bb513fdf9efcc5a85f</id>
<content type='text'>
Git's default SHA-1 implementation is collision-detecting, which hardens
us against known SHA-1 attacks against Git objects. This makes Git
object writes safer at the expense of some speed when hashing through
the collision-detecting implementation, which is slower than
non-collision-detecting alternatives.

Prepare for loading a separate "unsafe" SHA-1 implementation that can be
used for non-cryptographic purposes, like computing the checksum of
files that use the hashwrite() API.

This commit does not actually introduce any new compile-time knobs to
control which implementation is used as the unsafe SHA-1 variant, but
does add scaffolding so that the "git_hash_algo" structure has five new
function pointers which are "unsafe" variants of the five existing
hashing-related function pointers:

  - git_hash_init_fn unsafe_init_fn
  - git_hash_clone_fn unsafe_clone_fn
  - git_hash_update_fn unsafe_update_fn
  - git_hash_final_fn unsafe_final_fn
  - git_hash_final_oid_fn unsafe_final_oid_fn

The following commit will introduce compile-time knobs to specify which
SHA-1 implementation is used for non-cryptographic uses.
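
The shape of that scaffolding might look roughly like the sketch below.
Everything in it is a toy under stated assumptions: the struct is a
trimmed-down, hypothetical analogue of git_hash_algo with three of the
five pointer pairs, the "hash" is a trivial multiplicative checksum rather
than SHA-1, and the unsafe pointers are assumed to share the regular
implementation until a separate backend is selected.

```c
/* Toy context: a real one would hold SHA-1 state, not a single integer. */
typedef unsigned long toy_ctx[1];

typedef void (*toy_init_fn)(unsigned long *ctx);
typedef void (*toy_update_fn)(unsigned long *ctx, const unsigned char *buf,
			      unsigned long len);
typedef unsigned long (*toy_final_fn)(unsigned long *ctx);

/* Hypothetical, trimmed-down analogue of git_hash_algo. */
struct hash_algo_sketch {
	toy_init_fn init_fn;
	toy_update_fn update_fn;
	toy_final_fn final_fn;
	/* "unsafe" variants intended for non-cryptographic checksums */
	toy_init_fn unsafe_init_fn;
	toy_update_fn unsafe_update_fn;
	toy_final_fn unsafe_final_fn;
};

static void toy_init(unsigned long *ctx)
{
	ctx[0] = 0;
}

static void toy_update(unsigned long *ctx, const unsigned char *buf,
		       unsigned long len)
{
	unsigned long i;

	for (i = 0; i != len; i++)
		ctx[0] = ctx[0] * 31 + buf[i];
}

static unsigned long toy_final(unsigned long *ctx)
{
	return ctx[0];
}

/* Assumption: with no separate backend chosen, both sets share one implementation. */
static const struct hash_algo_sketch toy_algo = {
	toy_init, toy_update, toy_final,
	toy_init, toy_update, toy_final
};

/* Checksum a buffer through either the regular or the unsafe pointers. */
unsigned long toy_checksum(const unsigned char *buf, unsigned long len,
			   int use_unsafe)
{
	toy_ctx ctx;

	if (use_unsafe) {
		toy_algo.unsafe_init_fn(ctx);
		toy_algo.unsafe_update_fn(ctx, buf, len);
		return toy_algo.unsafe_final_fn(ctx);
	}
	toy_algo.init_fn(ctx);
	toy_algo.update_fn(ctx, buf, len);
	return toy_algo.final_fn(ctx);
}
```

Callers that only need a checksum would go through the unsafe pointers,
so swapping in a faster backend later would not touch the call sites.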

Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>finalize_object_file(): implement collision check</title>
<updated>2024-09-27T18:27:47Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2024-09-26T15:22:38Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=b1b8dfde6929ec9463eca0a858c4adb9786d7c93'/>
<id>urn:sha1:b1b8dfde6929ec9463eca0a858c4adb9786d7c93</id>
<content type='text'>
We've had "FIXME!!! Collision check here ?" in finalize_object_file()
since aac1794132 (Improve sha1 object file writing., 2005-05-03). That
is, when we try to write a file with the same name, we assume the
on-disk contents are the same and blindly throw away the new copy.

One of the reasons we never implemented this is because the files it
moves are all named after the cryptographic hash of their contents
(either loose objects, or packs which have their hash in the name these
days). So we are unlikely to see such a collision by accident. And even
though there are weaknesses in sha1, we assume they are mitigated by our
use of sha1dc.

So while it's a theoretical concern now, it hasn't been a priority.
However, if we start using weaker hashes for pack checksums and names,
this will become a practical concern. So in preparation, let's actually
implement a byte-for-byte collision check.

The new check will cause the write of new differing content to be a
failure, rather than a silent noop, and we'll retain the temporary file
on disk. If there's no collision present, we'll clean up the temporary
file as usual after either rename()-ing or link()-ing it into place.

Note that this may cause some extra computation when the files are in
fact identical, but this should happen rarely.

Loose objects are exempt from this check, and the collision check may be
skipped by calling the _flags variant of this function with the
FOF_SKIP_COLLISION_CHECK bit set. This is done for a couple of reasons:

  - We don't treat the hash of the loose object file's contents as a
    checksum, since the same loose object can be stored using different
    bytes on disk (e.g., when adjusting core.compression, using a
    different version of zlib, etc.).

    This is fundamentally different from cases where
    finalize_object_file() is operating over a file which uses the hash
    value as a checksum of the contents. In other words, a pair of
    identical loose objects can be stored using different bytes on disk,
    and that should not be treated as a collision.

  - We already use the path of the loose object as its hash value /
    object name, so checking for collisions at the content level doesn't
    add anything.

    Adding a content-level collision check would have to happen at a
    higher level than in finalize_object_file(), since (avoiding race
    conditions) writing an object loose which already exists in the
    repository will prevent us from even reaching finalize_object_file()
    via the object freshening code.

    There is a collision check in index-pack via its `check_collision()`
    function, but there isn't an analogous function in unpack-objects,
    which just feeds the result to write_object_file().

    So skipping the collision check here does not change for better or
    worse the hardness of loose object writes.

As a small note related to the latter bullet point above, we must teach
the tmp-objdir routines to similarly skip the content-level collision
checks when calling migrate_one() on a loose object file, which we do by
setting the FOF_SKIP_COLLISION_CHECK bit when we are inside of a loose
object shard.
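
The behavior described above can be sketched as follows. This is only an
illustration under assumptions: the real check compares files on disk,
whereas this version compares in-memory buffers, and every function and
parameter name here is hypothetical (the skip parameter merely stands in
for the FOF_SKIP_COLLISION_CHECK bit).

```c
/*
 * Byte-for-byte comparison of two buffers.
 * Returns 1 if the contents differ (a collision), 0 if they are identical.
 */
int contents_collide_sketch(const unsigned char *a, unsigned long a_len,
			    const unsigned char *b, unsigned long b_len)
{
	unsigned long i;

	if (a_len != b_len)
		return 1;
	for (i = 0; i != a_len; i++)
		if (a[i] != b[i])
			return 1;
	return 0;
}

/*
 * Hypothetical finalize step for the case where a destination with the
 * same name already exists. Identical contents are fine; differing
 * contents are reported as a failure instead of being silently dropped.
 * Returns 0 on success, -1 on collision.
 */
int finalize_sketch(const unsigned char *tmp, unsigned long tmp_len,
		    const unsigned char *existing, unsigned long existing_len,
		    int skip_collision_check)
{
	/* Loose objects would take this path, since their on-disk bytes may differ. */
	if (skip_collision_check)
		return 0;
	if (contents_collide_sketch(tmp, tmp_len, existing, existing_len))
		return -1;
	return 0;
}
```

The extra cost is paid only when a same-named destination already exists,
which matches the expectation that identical rewrites are rare.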

Co-authored-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Helped-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
