<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/pack-write.c, branch v2.48.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.48.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.48.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2024-12-03T23:21:55Z</updated>
<entry>
<title>config: make `delta_base_cache_limit` a non-global variable</title>
<updated>2024-12-03T23:21:55Z</updated>
<author>
<name>Karthik Nayak</name>
<email>karthik.188@gmail.com</email>
</author>
<published>2024-12-03T14:44:01Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=d6b2d21fbf269db7a6be56d28a62cb65a7d7a660'/>
<id>urn:sha1:d6b2d21fbf269db7a6be56d28a62cb65a7d7a660</id>
<content type='text'>
The `delta_base_cache_limit` variable is a global config variable used
by multiple subsystems. Let's make this non-global, by adding this
variable independently to the subsystems where it is used.

First, add the setting to the `repo_settings` struct, this provides
access to the config in places where the repository is available. Use
this in `packfile.c`.

In `index-pack.c` we add it to the `pack_idx_option` struct and its
constructor. While the repository struct is available here, it may not
be set  because `git index-pack` can be used without a repository.

In `gc.c` add it to the `gc_config` struct and also the constructor
function. The gc functions currently do not have direct access to a
repository struct.

These changes are made to remove the usage of `delta_base_cache_limit`
as a global variable in `packfile.c`. This brings us one step closer to
removing the `USE_THE_REPOSITORY_VARIABLE` definition in `packfile.c`
which we complete in the next patch.

Signed-off-by: Karthik Nayak &lt;karthik.188@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'ps/leakfixes-part-8'</title>
<updated>2024-10-10T21:22:29Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2024-10-10T21:22:27Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=31bc4454de66c22bc8570fd3af52a99843ac69b0'/>
<id>urn:sha1:31bc4454de66c22bc8570fd3af52a99843ac69b0</id>
<content type='text'>
More leakfixes.

* ps/leakfixes-part-8: (23 commits)
  builtin/send-pack: fix leaking list of push options
  remote: fix leaking push reports
  t/helper: fix leaks in proc-receive helper
  pack-write: fix return parameter of `write_rev_file_order()`
  revision: fix leaking saved parents
  revision: fix memory leaks when rewriting parents
  midx-write: fix leaking buffer
  pack-bitmap-write: fix leaking OID array
  pseudo-merge: fix leaking strmap keys
  pseudo-merge: fix various memory leaks
  line-log: fix several memory leaks
  diff: improve lifecycle management of diff queues
  builtin/revert: fix leaking `gpg_sign` and `strategy` config
  t/helper: fix leaking repository in partial-clone helper
  builtin/clone: fix leaking repo state when cloning with bundle URIs
  builtin/pack-redundant: fix various memory leaks
  builtin/stash: fix leaking `pathspec_from_file`
  submodule: fix leaking submodule entry list
  wt-status: fix leaking buffer with sparse directories
  shell: fix leaking strings
  ...
</content>
</entry>
<entry>
<title>Merge branch 'tb/weak-sha1-for-tail-sum'</title>
<updated>2024-10-02T14:46:27Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2024-10-02T14:46:27Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=ead0a050e2eddf8c67ee3404e165bffd42c6fd42'/>
<id>urn:sha1:ead0a050e2eddf8c67ee3404e165bffd42c6fd42</id>
<content type='text'>
The checksum at the tail of files are now computed without
collision detection protection.  This is safe as the consumer of
the information to protect itself from replay attacks checks for
hash collisions independently.

* tb/weak-sha1-for-tail-sum:
  csum-file.c: use unsafe SHA-1 implementation when available
  Makefile: allow specifying a SHA-1 for non-cryptographic uses
  hash.h: scaffolding for _unsafe hashing variants
  sha1: do not redefine `platform_SHA_CTX` and friends
  pack-objects: use finalize_object_file() to rename pack/idx/etc
  finalize_object_file(): implement collision check
  finalize_object_file(): refactor unlink_or_warn() placement
  finalize_object_file(): check for name collision before renaming
</content>
</entry>
<entry>
<title>pack-write: fix return parameter of `write_rev_file_order()`</title>
<updated>2024-09-30T18:23:08Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-09-30T09:14:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=2f0ee051ddaf880daac06773b56f077c4012a1c7'/>
<id>urn:sha1:2f0ee051ddaf880daac06773b56f077c4012a1c7</id>
<content type='text'>
While the return parameter of `write_rev_file_order()` is a string
constant, the function may indeed return an allocated string when its
first parameter is a `NULL` pointer. This makes for a confusing calling
convention, where callers need to be aware of these intricate ownership
rules and cast away the constness to free the string in some cases.

Adapt the function and its caller `write_rev_file()` to always return an
allocated string and adapt callers to always free the return value.

Note that this requires us to also adapt `rename_tmp_packfile()`, which
compares the pointers to packfile data with each other. Now that the
path of the reverse index file gets allocated unconditionally the check
will always fail. This is fixed by using strcmp(3P) instead, which also
feels way less fragile.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>pack-objects: use finalize_object_file() to rename pack/idx/etc</title>
<updated>2024-09-27T18:27:47Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2024-09-26T15:22:41Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=c177d3dc50d59042b1756e352e19f2dd8b01c25a'/>
<id>urn:sha1:c177d3dc50d59042b1756e352e19f2dd8b01c25a</id>
<content type='text'>
In most places that write files to the object database (even packfiles
via index-pack or fast-import), we use finalize_object_file(). This
prefers link()/unlink() over rename(), because it means we will prefer
data that is already in the repository to data that we are newly
writing.

We should do the same thing in pack-objects. Even though we don't think
of it as accepting outside data (and thus not being susceptible to
collision attacks), in theory a determined attacker could present just
the right set of objects to cause an incremental repack to generate
a pack with their desired hash.

This has some test and real-world fallout, as seen in the adjustment to
t5303 below. That test script assumes that we can "fix" corruption by
repacking into a good state, including when the pack generated by that
repack operation collides with a (corrupted) pack with the same hash.
This violates our assumption from the previous adjustments to
finalize_object_file() that if we're moving a new file over an existing
one, that since their checksums match, so too must their contents.

This makes "fixing" corruption like this a more explicit operation,
since the test (and users, who may fix real-life corruption using a
similar technique) must first move the broken contents out of the way.

Note also that we now call adjust_shared_perm() twice. We already call
adjust_shared_perm() in stage_tmp_packfiles(), and now call it again in
finalize_object_file(). This is somewhat wasteful, but cleaning up the
existing calls to adjust_shared_perm() is tricky (because sometimes
we're writing to a tmpfile, and sometimes we're writing directly into
the final destination), so let's tolerate some minor waste until we can
more carefully clean up the now-redundant calls.

Co-authored-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>environment: make `get_object_directory()` accept a repository</title>
<updated>2024-09-12T17:15:39Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-09-12T11:29:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=a3673f48986cf990006e56a57e4ad3c7134161e7'/>
<id>urn:sha1:a3673f48986cf990006e56a57e4ad3c7134161e7</id>
<content type='text'>
The `get_object_directory()` function retrieves the path to the object
directory for `the_repository`. Make it accept a `struct repository`
such that it can work on arbitrary repositories and make it part of the
repository subsystem. This reduces our reliance on `the_repository` and
clarifies scope.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>global: introduce `USE_THE_REPOSITORY_VARIABLE` macro</title>
<updated>2024-06-14T17:26:33Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-06-14T06:50:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=e7da9385708accf518a80a1e17969020fb361048'/>
<id>urn:sha1:e7da9385708accf518a80a1e17969020fb361048</id>
<content type='text'>
Use of the `the_repository` variable is deprecated nowadays, and we
slowly but steadily convert the codebase to not use it anymore. Instead,
callers should be passing down the repository to work on via parameters.

It is hard though to prove that a given code unit does not use this
variable anymore. The most trivial case, merely demonstrating that there
is no direct use of `the_repository`, is already a bit of a pain during
code reviews as the reviewer needs to manually verify claims made by the
patch author. The bigger problem though is that we have many interfaces
that implicitly rely on `the_repository`.

Introduce a new `USE_THE_REPOSITORY_VARIABLE` macro that allows code
units to opt into usage of `the_repository`. The intent of this macro is
to demonstrate that a certain code unit does not use this variable
anymore, and to keep it from new dependencies on it in future changes,
be it explicit or implicit

For now, the macro only guards `the_repository` itself as well as
`the_hash_algo`. There are many more known interfaces where we have an
implicit dependency on `the_repository`, but those are not guarded at
the current point in time. Over time though, we should start to add
guards as required (or even better, just remove them).

Define the macro as required in our code units. As expected, most of our
code still relies on the global variable. Nearly all of our builtins
rely on the variable as there is no way yet to pass `the_repository` to
their entry point. For now, declare the macro in "biultin.h" to keep the
required changes at least a little bit more contained.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>hash: require hash algorithm in `hasheq()`, `hashcmp()` and `hashclr()`</title>
<updated>2024-06-14T17:26:32Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-06-14T06:49:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=f4836570a7adbd8c70ad7a8edf6ae5a977647c06'/>
<id>urn:sha1:f4836570a7adbd8c70ad7a8edf6ae5a977647c06</id>
<content type='text'>
Many of our hash functions have two variants, one receiving a `struct
git_hash_algo` and one that derives it via `the_repository`. Adapt all
of those functions to always require the hash algorithm as input and
drop the variants that do not accept one.

As those functions are now independent of `the_repository`, we can move
them from "hash.h" to "hash-ll.h".

Note that both in this and subsequent commits in this series we always
just pass `the_repository-&gt;hash_algo` as input even if it is obvious
that there is a repository in the context that we should be using the
hash from instead. This is done to be on the safe side and not introduce
any regressions. All callsites should eventually be amended to use a
repo passed via parameters, but this is outside the scope of this patch
series.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>treewide: remove unnecessary includes in source files</title>
<updated>2023-12-26T20:04:31Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2023-12-23T17:14:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=eea0e59ffbed6e33d171ace5be13cde9faa41639'/>
<id>urn:sha1:eea0e59ffbed6e33d171ace5be13cde9faa41639</id>
<content type='text'>
Each of these were checked with
   gcc -E -I. ${SOURCE_FILE} | grep ${HEADER_FILE}
to ensure that removing the direct inclusion of the header actually
resulted in that header no longer being included at all (i.e. that
no other header pulled it in transitively).

...except for a few cases where we verified that although the header
was brought in transitively, nothing from it was directly used in
that source file.  These cases were:
  * builtin/credential-cache.c
  * builtin/pull.c
  * builtin/send-pack.c

Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>treewide: remove unnecessary includes for wrapper.h</title>
<updated>2023-07-05T18:41:59Z</updated>
<author>
<name>Calvin Wan</name>
<email>calvinwan@google.com</email>
</author>
<published>2023-07-05T17:09:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=da9502ff4dc495471a1a080dc297cd6e4628c10c'/>
<id>urn:sha1:da9502ff4dc495471a1a080dc297cd6e4628c10c</id>
<content type='text'>
Signed-off-by: Calvin Wan &lt;calvinwan@google.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
