<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/upload-pack.c, branch v2.51.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.51.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.51.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2025-10-15T17:29:30Z</updated>
<entry>
<title>Merge branch 'ps/upload-pack-oom-protection' into maint-2.51</title>
<updated>2025-10-15T17:29:30Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2025-10-15T17:29:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=caba7e3d864cbb19ab19dfa239c9aa641a9ffbbb'/>
<id>urn:sha1:caba7e3d864cbb19ab19dfa239c9aa641a9ffbbb</id>
<content type='text'>
A broken or malicious "git fetch" can say that it has the same
object for many many times, and the upload-pack serving it can
exhaust memory storing them redundantly, which has been corrected.

* ps/upload-pack-oom-protection:
  upload-pack: don't ACK non-commits repeatedly in protocol v2
  t5530: modernize tests
</content>
</entry>
<entry>
<title>upload-pack: don't ACK non-commits repeatedly in protocol v2</title>
<updated>2025-09-05T21:35:53Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2025-09-05T06:18:02Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=88a2dc68c8c9a2cf04c6d6d52e4ac3b26788e273'/>
<id>urn:sha1:88a2dc68c8c9a2cf04c6d6d52e4ac3b26788e273</id>
<content type='text'>
When a client performs a fetch or clone they can optionally send "have"
lines to tell the server which objects they already have available
locally. These object IDs are stored by the server in an object array so
that it can remember any objects it doesn't have to include in the pack
sent to the client.

While there isn't any reason to do so, clients are free to send the same
"have" line repeatedly. git-upload-pack(1) already knows to handle this
well: every commit it has seen via a "have" line gets marked with the
`THEY_HAVE` flag, and if such a commit is seen repeatedly we know to not
process it another time. This also has the effect that we only store the
object ID once, only, in the `have_obj` array.

There is an edge case though: if the client sends an object ID that does
not refer to a commit we neither store nor check the `THEY_HAVE` flag.
This means that we repeatedly store the same object ID in our `have_obj`
array, with two consequences:

  - In protocol v2 we deduplicate ACKs for commits, but not for any
    other objects as we send ACKs for every object ID in the `have_obj`
    array.

  - The `have_obj` array can grow in size indefinitely with both
    protocols.

The potentially-more-serious issue is the second one, as we basically
have a way for an adversary to allocate arbitrarily large buffers now.
Ultimately, this doesn't seem to be all that serious though: on my
machine, the growth of that array is at around 4MB/s, and after roughly
five minutes I was only at 1GB RSS. So this is concerning, but only
mildly so.

Fix this bug by storing the `THEY_HAVE` flag independent of the object
type so that we don't store duplicate object IDs in `have_obj` anymore.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>odb: rename `has_object()`</title>
<updated>2025-07-01T21:46:38Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2025-07-01T12:22:27Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=fcf8e3e111ec46705f91151baee40f2c0a3637da'/>
<id>urn:sha1:fcf8e3e111ec46705f91151baee40f2c0a3637da</id>
<content type='text'>
Rename `has_object()` to `odb_has_object()` to match other functions
related to the object database and our modern coding guidelines.

Introduce a compatibility wrapper so that any in-flight topics will
continue to compile.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>object-store: rename files to "odb.{c,h}"</title>
<updated>2025-07-01T21:46:34Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2025-07-01T12:22:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=8f49151763cb81adf4bcec53c1ae67057081b02d'/>
<id>urn:sha1:8f49151763cb81adf4bcec53c1ae67057081b02d</id>
<content type='text'>
In the preceding commits we have renamed the structures contained in
"object-store.h" to `struct object_database` and `struct odb_backend`.
As such, the code files "object-store.{c,h}" are confusingly named now.
Rename them to "odb.{c,h}" accordingly.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>upload-pack: rename `enum` to reflect the operation</title>
<updated>2025-05-15T20:46:46Z</updated>
<author>
<name>Johannes Schindelin</name>
<email>johannes.schindelin@gmx.de</email>
</author>
<published>2025-05-15T13:11:42Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=bf0468e2ba64ac358a61cb01a675b7c5919d64fd'/>
<id>urn:sha1:bf0468e2ba64ac358a61cb01a675b7c5919d64fd</id>
<content type='text'>
While 3145ea957d (upload-pack: introduce fetch server command,
2018-03-15) added support for the `fetch` command, from the server's
point of view it is an upload, and hence the `enum` should really be
called `upload_state` instead of `fetch_state`. Likewise, rename its
values.

This also helps unconfuse CodeQL which would otherwise be at sixes or
sevens about having _two_ non-local definitions of the same `enum` with
the same values.

Signed-off-by: Johannes Schindelin &lt;johannes.schindelin@gmx.de&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>treewide: convert users of `repo_has_object_file()` to `has_object()`</title>
<updated>2025-04-29T17:08:13Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2025-04-29T07:52:20Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=062b914c841329a003f74e1340ea5178391274a6'/>
<id>urn:sha1:062b914c841329a003f74e1340ea5178391274a6</id>
<content type='text'>
As the comment of `repo_has_object_file()` and its `_with_flags()`
variant tells us, these functions are considered to be deprecated in
favor of `has_object()`. There are a couple of slight benefits in favor
of the replacement:

  - The new function has a short-and-sweet name.

  - More explicit defaults: `has_object()` doesn't fetch missing objects
    via promisor remotes, and neither does it reload packfiles if an
    object wasn't found by default. This ensures that it becomes
    immediately obvious when a simple object existence check may result
    in expensive actions.

Most importantly though, it is confusing that we have two sets of
functions that ultimately do the same thing, but with different
defaults.

Start sunsetting `repo_has_object_file()` and its `_with_flags()`
sibling by replacing all callsites with `has_object()`:

  - `repo_has_object_file(...)` is equivalent to
    `has_object(..., HAS_OBJECT_RECHECK_PACKED | HAS_OBJECT_FETCH_PROMISOR)`.

  - `repo_has_object_file_with_flags(..., OBJECT_INFO_QUICK | OBJECT_INFO_SKIP_FETCH_OBJECT)`
    is equivalent to `has_object(..., 0)`.

  - `repo_has_object_file_with_flags(..., OBJECT_INFO_SKIP_FETCH_OBJECT)`
    is equivalent to `has_object(..., HAS_OBJECT_RECHECK_PACKED)`.

  - `repo_has_object_file_with_flags(..., OBJECT_INFO_QUICK)`
    is equivalent to `has_object(..., HAS_OBJECT_FETCH_PROMISOR)`.

The replacements should be functionally equivalent.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>object-store: merge "object-store-ll.h" and "object-store.h"</title>
<updated>2025-04-15T15:24:37Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2025-04-15T09:38:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=68cd492a3e662c75dec364986c81e94716d4ac56'/>
<id>urn:sha1:68cd492a3e662c75dec364986c81e94716d4ac56</id>
<content type='text'>
The "object-store-ll.h" header has been introduced to keep transitive
header dependendcies and compile times at bay. Now that we have created
a new "object-store.c" file though we can easily move the last remaining
additional bit of "object-store.h", the `odb_path_map`, out of the
header.

Do so. As the "object-store.h" header is now equivalent to its low-level
alternative we drop the latter and inline it into the former.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>hash: stop depending on `the_repository` in `null_oid()`</title>
<updated>2025-03-10T20:16:20Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2025-03-10T07:13:31Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=7d70b29c4f0b2fd3c6698956d9fb4026632d9c6e'/>
<id>urn:sha1:7d70b29c4f0b2fd3c6698956d9fb4026632d9c6e</id>
<content type='text'>
The `null_oid()` function returns the object ID that only consists of
zeroes. Naturally, this ID also depends on the hash algorithm used, as
the number of zeroes is different between SHA1 and SHA256. Consequently,
the function returns the hash-algorithm-specific null object ID.

This is currently done by depending on `the_hash_algo`, which implicitly
makes us depend on `the_repository`. Refactor the function to instead
pass in the hash algorithm for which we want to retrieve the null object
ID. Adapt callsites accordingly by passing in `the_repository`, thus
bubbling up the dependency on that global variable by one layer.

There are a couple of trivial exceptions for subsystems that already got
rid of `the_repository`. These subsystems instead use the repository
that is available via the calling context:

  - "builtin/grep.c"
  - "grep.c"
  - "refs/debug.c"

There are also two non-trivial exceptions:

  - "diff-no-index.c": Here we know that we may not have a repository
    initialized at all, so we cannot rely on `the_repository`. Instead,
    we adapt `diff_no_index()` to get a `struct git_hash_algo` as
    parameter. The only caller is located in "builtin/diff.c", where we
    know to call `repo_set_hash_algo()` in case we're running outside of
    a Git repository. Consequently, it is fine to continue passing
    `the_repository-&gt;hash_algo` even in this case.

  - "builtin/ls-files.c": There is an in-flight patch series that drops
    `USE_THE_REPOSITORY_VARIABLE` in this file, which causes a semantic
    conflict because we use `null_oid()` in `show_submodule()`. The
    value is passed to `repo_submodule_init()`, which may use the object
    ID to resolve a tree-ish in the superproject from which we want to
    read the submodule config. As such, the object ID should refer to an
    object in the superproject, and consequently we need to use its hash
    algorithm.

    This means that we could in theory just not bother about this edge
    case at all and just use `the_repository` in "diff-no-index.c". But
    doing so would feel misdesigned.

Remove the `USE_THE_REPOSITORY_VARIABLE` preprocessor define in
"hash.c".

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>object: stop depending on `the_repository`</title>
<updated>2025-03-10T20:16:18Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2025-03-10T07:13:21Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=74d414c9f14a91a3b7bd04972bf3eb9bbe6fd81b'/>
<id>urn:sha1:74d414c9f14a91a3b7bd04972bf3eb9bbe6fd81b</id>
<content type='text'>
There are a couple of functions exposed by "object.c" that implicitly
depend on `the_repository`. Remove this dependency by injecting the
repository via a parameter. Adapt callers accordingly by simply using
`the_repository`, except in cases where the subsystem is already free of
the repository. In that case, we instead pass the repository provided by
the caller's context.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Add 'promisor-remote' capability to protocol v2</title>
<updated>2025-02-18T19:05:37Z</updated>
<author>
<name>Christian Couder</name>
<email>christian.couder@gmail.com</email>
</author>
<published>2025-02-18T11:32:02Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=d460267613da14eba959eb225e2cbf6a1e132eb1'/>
<id>urn:sha1:d460267613da14eba959eb225e2cbf6a1e132eb1</id>
<content type='text'>
When a server S knows that some objects from a repository are available
from a promisor remote X, S might want to suggest to a client C cloning
or fetching the repo from S that C may use X directly instead of S for
these objects.

Note that this could happen both in the case S itself doesn't have the
objects and borrows them from X, and in the case S has the objects but
knows that X is better connected to the world (e.g., it is in a
$LARGEINTERNETCOMPANY datacenter with petabit/s backbone connections)
than S. Implementation of the latter case, which would require S to
omit in its response the objects available on X, is left for future
improvement though.

Then C might or might not, want to get the objects from X. If S and C
can agree on C using X directly, S can then omit objects that can be
obtained from X when answering C's request.

To allow S and C to agree and let each other know about C using X or
not, let's introduce a new "promisor-remote" capability in the
protocol v2, as well as a few new configuration variables:

  - "promisor.advertise" on the server side, and:
  - "promisor.acceptFromServer" on the client side.

By default, or if "promisor.advertise" is set to 'false', a server S will
not advertise the "promisor-remote" capability.

If S doesn't advertise the "promisor-remote" capability, then a client C
replying to S shouldn't advertise the "promisor-remote" capability
either.

If "promisor.advertise" is set to 'true', S will advertise its promisor
remotes with a string like:

  promisor-remote=&lt;pr-info&gt;[;&lt;pr-info&gt;]...

where each &lt;pr-info&gt; element contains information about a single
promisor remote in the form:

  name=&lt;pr-name&gt;[,url=&lt;pr-url&gt;]

where &lt;pr-name&gt; is the urlencoded name of a promisor remote and
&lt;pr-url&gt; is the urlencoded URL of the promisor remote named &lt;pr-name&gt;.

For now, the URL is passed in addition to the name. In the future, it
might be possible to pass other information like a filter-spec that the
client may use when cloning from S, or a token that the client may use
when retrieving objects from X.

It is C's responsibility to arrange how it can reach X though, so pieces
of information that are usually outside Git's concern, like proxy
configuration, must not be distributed over this protocol.

It might also be possible in the future for "promisor.advertise" to have
other values. For example a value like "onlyName" could prevent S from
advertising URLs, which could help in case C should use a different URL
for X than the URL S is using. (The URL S is using might be an internal
one on the server side for example.)

By default or if "promisor.acceptFromServer" is set to "None", C will
not accept to use the promisor remotes that might have been advertised
by S. In this case, C will not advertise any "promisor-remote"
capability in its reply to S.

If "promisor.acceptFromServer" is set to "All" and S advertised some
promisor remotes, then on the contrary, C will accept to use all the
promisor remotes that S advertised and C will reply with a string like:

  promisor-remote=&lt;pr-name&gt;[;&lt;pr-name&gt;]...

where the &lt;pr-name&gt; elements are the urlencoded names of all the
promisor remotes S advertised.

In a following commit, other values for "promisor.acceptFromServer" will
be implemented, so that C will be able to decide the promisor remotes it
accepts depending on the name and URL it received from S. So even if
that name and URL information is not used much right now, it will be
needed soon.

Helped-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Helped-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Christian Couder &lt;chriscool@tuxfamily.org&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
