<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/object-file.c, branch v2.47.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.47.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.47.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2024-10-02T14:46:27Z</updated>
<entry>
<title>Merge branch 'tb/weak-sha1-for-tail-sum'</title>
<updated>2024-10-02T14:46:27Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2024-10-02T14:46:27Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=ead0a050e2eddf8c67ee3404e165bffd42c6fd42'/>
<id>urn:sha1:ead0a050e2eddf8c67ee3404e165bffd42c6fd42</id>
<content type='text'>
The checksum at the tail of files are now computed without
collision detection protection.  This is safe as the consumer of
the information to protect itself from replay attacks checks for
hash collisions independently.

* tb/weak-sha1-for-tail-sum:
  csum-file.c: use unsafe SHA-1 implementation when available
  Makefile: allow specifying a SHA-1 for non-cryptographic uses
  hash.h: scaffolding for _unsafe hashing variants
  sha1: do not redefine `platform_SHA_CTX` and friends
  pack-objects: use finalize_object_file() to rename pack/idx/etc
  finalize_object_file(): implement collision check
  finalize_object_file(): refactor unlink_or_warn() placement
  finalize_object_file(): check for name collision before renaming
</content>
</entry>
<entry>
<title>hash.h: scaffolding for _unsafe hashing variants</title>
<updated>2024-09-27T18:27:47Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2024-09-26T15:22:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=253ed9ecfffa3e50b95e08bb513fdf9efcc5a85f'/>
<id>urn:sha1:253ed9ecfffa3e50b95e08bb513fdf9efcc5a85f</id>
<content type='text'>
Git's default SHA-1 implementation is collision-detecting, which hardens
us against known SHA-1 attacks against Git objects. This makes Git
object writes safer at the expense of some speed when hashing through
the collision-detecting implementation, which is slower than
non-collision detecting alternatives.

Prepare for loading a separate "unsafe" SHA-1 implementation that can be
used for non-cryptographic purposes, like computing the checksum of
files that use the hashwrite() API.

This commit does not actually introduce any new compile-time knobs to
control which implementation is used as the unsafe SHA-1 variant, but
does add scaffolding so that the "git_hash_algo" structure has five new
function pointers which are "unsafe" variants of the five existing
hashing-related function pointers:

  - git_hash_init_fn unsafe_init_fn
  - git_hash_clone_fn unsafe_clone_fn
  - git_hash_update_fn unsafe_update_fn
  - git_hash_final_fn unsafe_final_fn
  - git_hash_final_oid_fn unsafe_final_oid_fn

The following commit will introduce compile-time knobs to specify which
SHA-1 implementation is used for non-cryptographic uses.

Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>finalize_object_file(): implement collision check</title>
<updated>2024-09-27T18:27:47Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2024-09-26T15:22:38Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=b1b8dfde6929ec9463eca0a858c4adb9786d7c93'/>
<id>urn:sha1:b1b8dfde6929ec9463eca0a858c4adb9786d7c93</id>
<content type='text'>
We've had "FIXME!!! Collision check here ?" in finalize_object_file()
since aac1794132 (Improve sha1 object file writing., 2005-05-03). That
is, when we try to write a file with the same name, we assume the
on-disk contents are the same and blindly throw away the new copy.

One of the reasons we never implemented this is because the files it
moves are all named after the cryptographic hash of their contents
(either loose objects, or packs which have their hash in the name these
days). So we are unlikely to see such a collision by accident. And even
though there are weaknesses in sha1, we assume they are mitigated by our
use of sha1dc.

So while it's a theoretical concern now, it hasn't been a priority.
However, if we start using weaker hashes for pack checksums and names,
this will become a practical concern. So in preparation, let's actually
implement a byte-for-byte collision check.

The new check will cause the write of new differing content to be a
failure, rather than a silent noop, and we'll retain the temporary file
on disk. If there's no collision present, we'll clean up the temporary
file as usual after either rename()-ing or link()-ing it into place.

Note that this may cause some extra computation when the files are in
fact identical, but this should happen rarely.

Loose objects are exempt from this check, and the collision check may be
skipped by calling the _flags variant of this function with the
FOF_SKIP_COLLISION_CHECK bit set. This is done for a couple of reasons:

  - We don't treat the hash of the loose object file's contents as a
    checksum, since the same loose object can be stored using different
    bytes on disk (e.g., when adjusting core.compression, using a
    different version of zlib, etc.).

    This is fundamentally different from cases where
    finalize_object_file() is operating over a file which uses the hash
    value as a checksum of the contents. In other words, a pair of
    identical loose objects can be stored using different bytes on disk,
    and that should not be treated as a collision.

  - We already use the path of the loose object as its hash value /
    object name, so checking for collisions at the content level doesn't
    add anything.

    Adding a content-level collision check would have to happen at a
    higher level than in finalize_object_file(), since (avoiding race
    conditions) writing an object loose which already exists in the
    repository will prevent us from even reaching finalize_object_file()
    via the object freshening code.

    There is a collision check in index-pack via its `check_collision()`
    function, but there isn't an analogous function in unpack-objects,
    which just feeds the result to write_object_file().

    So skipping the collision check here does not change for better or
    worse the hardness of loose object writes.

As a small note related to the latter bullet point above, we must teach
the tmp-objdir routines to similarly skip the content-level collision
checks when calling migrate_one() on a loose object file, which we do by
setting the FOF_SKIP_COLLISION_CHECK bit when we are inside of a loose
object shard.

Co-authored-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Helped-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>finalize_object_file(): refactor unlink_or_warn() placement</title>
<updated>2024-09-27T18:27:46Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2024-09-26T15:22:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=9ca7c2c13bc26afe673172a9b417a43519381968'/>
<id>urn:sha1:9ca7c2c13bc26afe673172a9b417a43519381968</id>
<content type='text'>
As soon as we've tried to link() a temporary object into place, we then
unlink() the tempfile immediately, whether we were successful or not.

For the success case, this is because we no longer need the old file
(it's now linked into place).

For the error case, there are two outcomes. Either we got EEXIST, in
which case we consider the collision to be a noop. Or we got a system
error, in which we case we are just cleaning up after ourselves.

Using a single line for all of these cases has some problems:

  - in the error case, our unlink() may clobber errno, which we use in
    the error message

  - for the collision case, there's a FIXME that indicates we should do
    a collision check. In preparation for implementing that, we'll need
    to actually hold on to the file.

Split these three cases into their own calls to unlink_or_warn(). This
is more verbose, but lets us do the right thing in each case.

Co-authored-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>finalize_object_file(): check for name collision before renaming</title>
<updated>2024-09-27T18:27:46Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2024-09-26T15:22:32Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=d1b44bb76442431eca44279da32deed249dcf213'/>
<id>urn:sha1:d1b44bb76442431eca44279da32deed249dcf213</id>
<content type='text'>
We prefer link()/unlink() to rename() for object files, with the idea
that we should prefer the data that is already on disk to what is
incoming. But we may fall back to rename() if the user has configured us
to do so, or if the filesystem seems not to support cross-directory
links. This loses the "prefer what is on disk" property.

We can mitigate this somewhat by trying to stat() the destination
filename before doing the rename. This is racy, since the object could
be created between the stat() and rename() calls. But in practice it is
expanding the definition of "what is already on disk" to be the point
that the function is called. That is enough to deal with any potential
attacks where an attacker is trying to collide hashes with what's
already in the repository.

Co-authored-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'ak/typofix-2.46-maint'</title>
<updated>2024-09-25T17:37:12Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2024-09-25T17:37:11Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=78ce6660bba5fd2faf03efd79912725dcf5be481'/>
<id>urn:sha1:78ce6660bba5fd2faf03efd79912725dcf5be481</id>
<content type='text'>
Typofix.

* ak/typofix-2.46-maint:
  upload-pack: fix a typo
  sideband: fix a typo
  setup: fix a typo
  run-command: fix a typo
  revision: fix a typo
  refs: fix typos
  rebase: fix a typo
  read-cache-ll: fix a typo
  pretty: fix a typo
  object-file: fix a typo
  merge-ort: fix typos
  merge-ll: fix a typo
  http: fix a typo
  gpg-interface: fix a typo
  git-p4: fix typos
  git-instaweb: fix a typo
  fsmonitor-settings: fix a typo
  diffcore-rename: fix typos
  config.mak.dev: fix a typo
</content>
</entry>
<entry>
<title>object-file: fix a typo</title>
<updated>2024-09-19T20:46:00Z</updated>
<author>
<name>Andrew Kreimer</name>
<email>algonell@gmail.com</email>
</author>
<published>2024-09-19T18:34:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=28012b915c1e6c535dcd42fc1460ea070e5f978f'/>
<id>urn:sha1:28012b915c1e6c535dcd42fc1460ea070e5f978f</id>
<content type='text'>
Fix a typo in comments.

Signed-off-by: Andrew Kreimer &lt;algonell@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>environment: move object database functions into object layer</title>
<updated>2024-09-12T17:15:40Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-09-12T11:29:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=26b4df907bca4829df44dc7eee23ccf7720898f7'/>
<id>urn:sha1:26b4df907bca4829df44dc7eee23ccf7720898f7</id>
<content type='text'>
The `odb_mkstemp()` and `odb_pack_keep()` functions are quite clearly
tied to the object store, but regardless of that they are located in
"environment.c". Move them over, which also helps to get rid of
dependencies on `the_repository` in the environment subsystem.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>environment: make `get_object_directory()` accept a repository</title>
<updated>2024-09-12T17:15:39Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-09-12T11:29:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=a3673f48986cf990006e56a57e4ad3c7134161e7'/>
<id>urn:sha1:a3673f48986cf990006e56a57e4ad3c7134161e7</id>
<content type='text'>
The `get_object_directory()` function retrieves the path to the object
directory for `the_repository`. Make it accept a `struct repository`
such that it can work on arbitrary repositories and make it part of the
repository subsystem. This reduces our reliance on `the_repository` and
clarifies scope.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'ps/leakfixes-part-4'</title>
<updated>2024-08-23T16:02:33Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2024-08-23T16:02:33Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=1b6b2bfae582e1e5a9b9ad2ef15985f69ade0fce'/>
<id>urn:sha1:1b6b2bfae582e1e5a9b9ad2ef15985f69ade0fce</id>
<content type='text'>
More leak fixes.

* ps/leakfixes-part-4: (22 commits)
  builtin/diff: free symmetric diff members
  diff: free state populated via options
  builtin/log: fix leak when showing converted blob contents
  userdiff: fix leaking memory for configured diff drivers
  builtin/format-patch: fix various trivial memory leaks
  diff: fix leak when parsing invalid ignore regex option
  unpack-trees: clear index when not propagating it
  sequencer: release todo list on error paths
  merge-ort: unconditionally release attributes index
  builtin/fast-export: plug leaking tag names
  builtin/fast-export: fix leaking diff options
  builtin/fast-import: plug trivial memory leaks
  builtin/notes: fix leaking `struct notes_tree` when merging notes
  builtin/rebase: fix leaking `commit.gpgsign` value
  config: fix leaking comment character config
  submodule-config: fix leaking name entry when traversing submodules
  read-cache: fix leaking hashfile when writing index fails
  bulk-checkin: fix leaking state TODO
  object-name: fix leaking symlink paths in object context
  object-file: fix memory leak when reading corrupted headers
  ...
</content>
</entry>
</feed>
