<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/pack-write.c, branch v2.47.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.47.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.47.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2024-10-02T14:46:27Z</updated>
<entry>
<title>Merge branch 'tb/weak-sha1-for-tail-sum'</title>
<updated>2024-10-02T14:46:27Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2024-10-02T14:46:27Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=ead0a050e2eddf8c67ee3404e165bffd42c6fd42'/>
<id>urn:sha1:ead0a050e2eddf8c67ee3404e165bffd42c6fd42</id>
<content type='text'>
The checksum at the tail of files are now computed without
collision detection protection.  This is safe as the consumer of
the information to protect itself from replay attacks checks for
hash collisions independently.

* tb/weak-sha1-for-tail-sum:
  csum-file.c: use unsafe SHA-1 implementation when available
  Makefile: allow specifying a SHA-1 for non-cryptographic uses
  hash.h: scaffolding for _unsafe hashing variants
  sha1: do not redefine `platform_SHA_CTX` and friends
  pack-objects: use finalize_object_file() to rename pack/idx/etc
  finalize_object_file(): implement collision check
  finalize_object_file(): refactor unlink_or_warn() placement
  finalize_object_file(): check for name collision before renaming
</content>
</entry>
<entry>
<title>pack-objects: use finalize_object_file() to rename pack/idx/etc</title>
<updated>2024-09-27T18:27:47Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2024-09-26T15:22:41Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=c177d3dc50d59042b1756e352e19f2dd8b01c25a'/>
<id>urn:sha1:c177d3dc50d59042b1756e352e19f2dd8b01c25a</id>
<content type='text'>
In most places that write files to the object database (even packfiles
via index-pack or fast-import), we use finalize_object_file(). This
prefers link()/unlink() over rename(), because it means we will prefer
data that is already in the repository to data that we are newly
writing.

We should do the same thing in pack-objects. Even though we don't think
of it as accepting outside data (and thus not being susceptible to
collision attacks), in theory a determined attacker could present just
the right set of objects to cause an incremental repack to generate
a pack with their desired hash.

This has some test and real-world fallout, as seen in the adjustment to
t5303 below. That test script assumes that we can "fix" corruption by
repacking into a good state, including when the pack generated by that
repack operation collides with a (corrupted) pack with the same hash.
This violates our assumption from the previous adjustments to
finalize_object_file() that if we're moving a new file over an existing
one, that since their checksums match, so too must their contents.

This makes "fixing" corruption like this a more explicit operation,
since the test (and users, who may fix real-life corruption using a
similar technique) must first move the broken contents out of the way.

Note also that we now call adjust_shared_perm() twice. We already call
adjust_shared_perm() in stage_tmp_packfiles(), and now call it again in
finalize_object_file(). This is somewhat wasteful, but cleaning up the
existing calls to adjust_shared_perm() is tricky (because sometimes
we're writing to a tmpfile, and sometimes we're writing directly into
the final destination), so let's tolerate some minor waste until we can
more carefully clean up the now-redundant calls.

Co-authored-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>environment: make `get_object_directory()` accept a repository</title>
<updated>2024-09-12T17:15:39Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-09-12T11:29:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=a3673f48986cf990006e56a57e4ad3c7134161e7'/>
<id>urn:sha1:a3673f48986cf990006e56a57e4ad3c7134161e7</id>
<content type='text'>
The `get_object_directory()` function retrieves the path to the object
directory for `the_repository`. Make it accept a `struct repository`
such that it can work on arbitrary repositories and make it part of the
repository subsystem. This reduces our reliance on `the_repository` and
clarifies scope.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>global: introduce `USE_THE_REPOSITORY_VARIABLE` macro</title>
<updated>2024-06-14T17:26:33Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-06-14T06:50:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=e7da9385708accf518a80a1e17969020fb361048'/>
<id>urn:sha1:e7da9385708accf518a80a1e17969020fb361048</id>
<content type='text'>
Use of the `the_repository` variable is deprecated nowadays, and we
slowly but steadily convert the codebase to not use it anymore. Instead,
callers should be passing down the repository to work on via parameters.

It is hard though to prove that a given code unit does not use this
variable anymore. The most trivial case, merely demonstrating that there
is no direct use of `the_repository`, is already a bit of a pain during
code reviews as the reviewer needs to manually verify claims made by the
patch author. The bigger problem though is that we have many interfaces
that implicitly rely on `the_repository`.

Introduce a new `USE_THE_REPOSITORY_VARIABLE` macro that allows code
units to opt into usage of `the_repository`. The intent of this macro is
to demonstrate that a certain code unit does not use this variable
anymore, and to keep it from new dependencies on it in future changes,
be it explicit or implicit

For now, the macro only guards `the_repository` itself as well as
`the_hash_algo`. There are many more known interfaces where we have an
implicit dependency on `the_repository`, but those are not guarded at
the current point in time. Over time though, we should start to add
guards as required (or even better, just remove them).

Define the macro as required in our code units. As expected, most of our
code still relies on the global variable. Nearly all of our builtins
rely on the variable as there is no way yet to pass `the_repository` to
their entry point. For now, declare the macro in "biultin.h" to keep the
required changes at least a little bit more contained.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>hash: require hash algorithm in `hasheq()`, `hashcmp()` and `hashclr()`</title>
<updated>2024-06-14T17:26:32Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-06-14T06:49:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=f4836570a7adbd8c70ad7a8edf6ae5a977647c06'/>
<id>urn:sha1:f4836570a7adbd8c70ad7a8edf6ae5a977647c06</id>
<content type='text'>
Many of our hash functions have two variants, one receiving a `struct
git_hash_algo` and one that derives it via `the_repository`. Adapt all
of those functions to always require the hash algorithm as input and
drop the variants that do not accept one.

As those functions are now independent of `the_repository`, we can move
them from "hash.h" to "hash-ll.h".

Note that both in this and subsequent commits in this series we always
just pass `the_repository-&gt;hash_algo` as input even if it is obvious
that there is a repository in the context that we should be using the
hash from instead. This is done to be on the safe side and not introduce
any regressions. All callsites should eventually be amended to use a
repo passed via parameters, but this is outside the scope of this patch
series.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>treewide: remove unnecessary includes in source files</title>
<updated>2023-12-26T20:04:31Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2023-12-23T17:14:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=eea0e59ffbed6e33d171ace5be13cde9faa41639'/>
<id>urn:sha1:eea0e59ffbed6e33d171ace5be13cde9faa41639</id>
<content type='text'>
Each of these were checked with
   gcc -E -I. ${SOURCE_FILE} | grep ${HEADER_FILE}
to ensure that removing the direct inclusion of the header actually
resulted in that header no longer being included at all (i.e. that
no other header pulled it in transitively).

...except for a few cases where we verified that although the header
was brought in transitively, nothing from it was directly used in
that source file.  These cases were:
  * builtin/credential-cache.c
  * builtin/pull.c
  * builtin/send-pack.c

Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>treewide: remove unnecessary includes for wrapper.h</title>
<updated>2023-07-05T18:41:59Z</updated>
<author>
<name>Calvin Wan</name>
<email>calvinwan@google.com</email>
</author>
<published>2023-07-05T17:09:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=da9502ff4dc495471a1a080dc297cd6e4628c10c'/>
<id>urn:sha1:da9502ff4dc495471a1a080dc297cd6e4628c10c</id>
<content type='text'>
Signed-off-by: Calvin Wan &lt;calvinwan@google.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>object-store-ll.h: split this header out of object-store.h</title>
<updated>2023-06-21T20:39:54Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2023-05-16T06:34:06Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=a034e9106ff1a4cb6fcb6f2ea3a1a47b4d2ba173'/>
<id>urn:sha1:a034e9106ff1a4cb6fcb6f2ea3a1a47b4d2ba173</id>
<content type='text'>
The vast majority of files including object-store.h did not need dir.h
nor khash.h.  Split the header into two files, and let most just depend
upon object-store-ll.h, while letting the two callers that need it
depend on the full object-store.h.

After this patch:
    $ git grep -h include..object-store | sort | uniq -c
          2 #include "object-store.h"
        129 #include "object-store-ll.h"

Diff best viewed with `--color-moved`.

Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>repository: remove unnecessary include of path.h</title>
<updated>2023-06-21T20:39:53Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2023-05-16T06:33:59Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=c339932bd858e84490c8690d393307a58764d6ed'/>
<id>urn:sha1:c339932bd858e84490c8690d393307a58764d6ed</id>
<content type='text'>
This also made it clear that several .c files that depended upon path.h
were missing a #include for it; add the missing includes while at it.

Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'tb/enable-cruft-packs-by-default'</title>
<updated>2023-04-28T23:03:03Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2023-04-28T23:03:03Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=fc23c397c7088a397a693bc5cb483a5894f59f53'/>
<id>urn:sha1:fc23c397c7088a397a693bc5cb483a5894f59f53</id>
<content type='text'>
When "gc" needs to retain unreachable objects, packing them into
cruft packs (instead of exploding them into loose object files) has
been offered as a more efficient option for some time.  Now the use
of cruft packs has been made the default and no longer considered
an experimental feature.

* tb/enable-cruft-packs-by-default:
  repository.h: drop unused `gc_cruft_packs`
  builtin/gc.c: make `gc.cruftPacks` enabled by default
  t/t9300-fast-import.sh: prepare for `gc --cruft` by default
  t/t6500-gc.sh: add additional test cases
  t/t6500-gc.sh: refactor cruft pack tests
  t/t6501-freshen-objects.sh: prepare for `gc --cruft` by default
  t/t5304-prune.sh: prepare for `gc --cruft` by default
  builtin/gc.c: ignore cruft packs with `--keep-largest-pack`
  builtin/repack.c: fix incorrect reference to '-C'
  pack-write.c: plug a leak in stage_tmp_packfiles()
</content>
</entry>
</feed>
