<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/bundle-uri.c, branch v2.40.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.40.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.40.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2023-02-16T01:11:52Z</updated>
<entry>
<title>Merge branch 'ds/bundle-uri-5'</title>
<updated>2023-02-16T01:11:52Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2023-02-16T01:11:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=4f598364519f5bb871caf12a482ed117a270588b'/>
<id>urn:sha1:4f598364519f5bb871caf12a482ed117a270588b</id>
<content type='text'>
The bundle-URI subsystem adds support for creation-token heuristics
to help incremental fetches.

* ds/bundle-uri-5:
  bundle-uri: test missing bundles with heuristic
  bundle-uri: store fetch.bundleCreationToken
  fetch: fetch from an external bundle URI
  bundle-uri: drop bundle.flag from design doc
  clone: set fetch.bundleURI if appropriate
  bundle-uri: download in creationToken order
  bundle-uri: parse bundle.&lt;id&gt;.creationToken values
  bundle-uri: parse bundle.heuristic=creationToken
  t5558: add tests for creationToken heuristic
  bundle: verify using check_connected()
  bundle: test unbundling with incomplete history
</content>
</entry>
<entry>
<title>bundle-uri: store fetch.bundleCreationToken</title>
<updated>2023-01-31T16:57:48Z</updated>
<author>
<name>Derrick Stolee</name>
<email>derrickstolee@github.com</email>
</author>
<published>2023-01-31T13:29:18Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=c429bed10223135a13f1f2f854e367762c37af7a'/>
<id>urn:sha1:c429bed10223135a13f1f2f854e367762c37af7a</id>
<content type='text'>
When a bundle list specifies the "creationToken" heuristic, the Git
client downloads the list and then starts downloading bundles in
descending creationToken order. This process stops as soon as all
downloaded bundles can be applied to the repository (because all
required commits are present in the repository or in the downloaded
bundles).

When checking the same bundle list twice, this strategy requires
downloading the bundle with the maximum creationToken again, which is
wasteful. The creationToken heuristic promises that the client will not
have a use for that bundle if its creationToken value is at most the
previous creationToken value.

To prevent these wasteful downloads, create a fetch.bundleCreationToken
config setting that the Git client sets after downloading bundles. This
value allows skipping that maximum bundle download when this config
value is the same value (or larger).

To test that this works correctly, we can insert some "duplicate"
fetches into existing tests and demonstrate that only the bundle list is
downloaded.

The previous logic for downloading bundles by creationToken worked even
if the bundle list was empty, but now we have logic that depends on the
first entry of the list. Terminate early in the (non-sensical) case of
an empty bundle list.

Signed-off-by: Derrick Stolee &lt;derrickstolee@github.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>clone: set fetch.bundleURI if appropriate</title>
<updated>2023-01-31T16:57:48Z</updated>
<author>
<name>Derrick Stolee</name>
<email>derrickstolee@github.com</email>
</author>
<published>2023-01-31T13:29:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=4074d3c7e1a3b44c5c02235799d798f8b2613333'/>
<id>urn:sha1:4074d3c7e1a3b44c5c02235799d798f8b2613333</id>
<content type='text'>
Bundle providers may organize their bundle lists in a way that is
intended to improve incremental fetches, not just initial clones.
However, they do need to state that they have organized with that in
mind, or else the client will not expect to save time by downloading
bundles after the initial clone. This is done by specifying a
bundle.heuristic value.

There are two types of bundle lists: those at a static URI and those
that are advertised from a Git remote over protocol v2.

The new fetch.bundleURI config value applies for static bundle URIs that
are not advertised over protocol v2. If the user specifies a static URI
via 'git clone --bundle-uri', then Git can set this config as a reminder
for future 'git fetch' operations to check the bundle list before
connecting to the remote(s).

For lists provided over protocol v2, we will want to take a different
approach and create a property of the remote itself by creating a
remote.&lt;id&gt;.* type config key. That is not implemented in this change.

Later changes will update 'git fetch' to consume this option.

Signed-off-by: Derrick Stolee &lt;derrickstolee@github.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>bundle-uri: download in creationToken order</title>
<updated>2023-01-31T16:57:48Z</updated>
<author>
<name>Derrick Stolee</name>
<email>derrickstolee@github.com</email>
</author>
<published>2023-01-31T13:29:14Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=7903efb7170ca9704e80ef9188bef37ecf55f2d6'/>
<id>urn:sha1:7903efb7170ca9704e80ef9188bef37ecf55f2d6</id>
<content type='text'>
The creationToken heuristic provides an ordering on the bundles
advertised by a bundle list. Teach the Git client to download bundles
differently when this heuristic is advertised.

The bundles in the list are sorted by their advertised creationToken
values, then downloaded in decreasing order. This avoids the previous
strategy of downloading bundles in an arbitrary order and attempting
to apply them (likely failing in the case of required commits) until
discovering the order through attempted unbundling.

During a fresh 'git clone', it may make sense to download the bundles in
increasing order, since that would prevent the need to attempt
unbundling a bundle with required commits that do not exist in our empty
object store. The cost of testing an unbundle is quite low, and instead
the chosen order is optimizing for a future bundle download during a
'git fetch' operation with a non-empty object store.

Since the Git client continues fetching from the Git remote after
downloading and unbundling bundles, the client's object store can be
ahead of the bundle provider's object store. The next time it attempts
to download from the bundle list, it makes most sense to download only
the most-recent bundles until all tips successfully unbundle. The
strategy implemented here provides that short-circuit where the client
downloads a minimal set of bundles.

However, we are not satisfied by the naive approach of downloading
bundles until one successfully unbundles, expecting the earlier bundles
to successfully unbundle now. The example repository in t5558
demonstrates this well:

 ---------------- bundle-4

       4
      / \
 ----|---|------- bundle-3
     |   |
     |   3
     |   |
 ----|---|------- bundle-2
     |   |
     2   |
     |   |
 ----|---|------- bundle-1
      \ /
       1
       |
 (previous commits)

In this repository, if we already have the objects for bundle-1 and then
try to fetch from this list, the naive approach will fail. bundle-4
requires both bundle-3 and bundle-2, though bundle-3 will successfully
unbundle without bundle-2. Thus, the algorithm needs to keep this in
mind.

A later implementation detail will store the maximum creationToken seen
during such a bundle download, and the client will avoid downloading a
bundle unless its creationToken is strictly greater than that stored
value. For now, if the client seeks to download from an identical
bundle list since its previous download, it will download the
most-recent bundle then stop since its required commits are already in
the object store.

Add tests that exercise this behavior, but we will expand upon these
tests when incremental downloads during 'git fetch' make use of
creationToken values.

Signed-off-by: Derrick Stolee &lt;derrickstolee@github.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>bundle-uri: parse bundle.&lt;id&gt;.creationToken values</title>
<updated>2023-01-31T16:57:48Z</updated>
<author>
<name>Derrick Stolee</name>
<email>derrickstolee@github.com</email>
</author>
<published>2023-01-31T13:29:13Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=512fccf8a503bd8617fe46cb62c77480b83fbaea'/>
<id>urn:sha1:512fccf8a503bd8617fe46cb62c77480b83fbaea</id>
<content type='text'>
The previous change taught Git to parse the bundle.heuristic value,
especially when its value is "creationToken". Now, teach Git to parse
the bundle.&lt;id&gt;.creationToken values on each bundle in a bundle list.

Before implementing any logic based on creationToken values for the
creationToken heuristic, parse and print these values for testing
purposes.

Signed-off-by: Derrick Stolee &lt;derrickstolee@github.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>bundle-uri: parse bundle.heuristic=creationToken</title>
<updated>2023-01-31T16:57:48Z</updated>
<author>
<name>Derrick Stolee</name>
<email>derrickstolee@github.com</email>
</author>
<published>2023-01-31T13:29:12Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=c93c3d2fa42bec64948441cb339d78e2044ff9a3'/>
<id>urn:sha1:c93c3d2fa42bec64948441cb339d78e2044ff9a3</id>
<content type='text'>
The bundle.heuristic value communicates that the bundle list is
organized to make use of the bundle.&lt;id&gt;.creationToken values that may
be provided in the bundle list. Those values will create a total order
on the bundles, allowing the Git client to download them in a specific
order and even remember previously-downloaded bundles by storing the
maximum creation token value.

Before implementing any logic that parses or uses the
bundle.&lt;id&gt;.creationToken values, teach Git to parse the
bundle.heuristic value from a bundle list. We can use 'test-tool
bundle-uri' to print the heuristic value and verify that the parsing
works correctly.

As an extra precaution, create the internal 'heuristics' array to be a
list of (enum, string) pairs so we can iterate through the array entries
carefully, regardless of the enum values.

Signed-off-by: Derrick Stolee &lt;derrickstolee@github.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>convert trivial uses of strncmp() to starts_with()</title>
<updated>2023-01-08T01:34:35Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2023-01-07T13:26:18Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=20869d1a1d30e9a64c66953a0f4c7245089009cf'/>
<id>urn:sha1:20869d1a1d30e9a64c66953a0f4c7245089009cf</id>
<content type='text'>
It's more readable to use starts_with() instead of strncmp() to match a
prefix, as the latter requires a manually-computed length, and has the
funny "matching is zero" return value common to cmp functions.  This
patch converts several cases which were found with:

  git grep 'strncmp(.*, [0-9]*)'

But note that it doesn't convert all such cases. There are several where
the magic length number is repeated elsewhere in the code, like:

  /* handle "buf" which isn't NUL-terminated and might be too small */
  if (len &gt;= 3 &amp;&amp; !strncmp(buf, "foo", 3))

or:

  /* exact match for "foo", but within a larger string */
  if (end - buf == 3 &amp;&amp; !strncmp(buf, "foo", 3))

While it would not produce the wrong outcome to use starts_with() in
these cases, we'd still be left with one instance of "3". We're better
to leave them for now, as the repeated "3" makes it clear that the two
are linked (there may be other refactorings that handle both, but
they're out of scope for this patch).

A few things to note while reading the patch:

  - all cases but one are trying to match, and so lose the extra "!".
    The case in the first hunk of urlmatch.c is not-matching, and hence
    gains a "!".

  - the case in remote-fd.c is matching the beginning of "connect foo",
    but we never look at str+8 to parse the "foo" part (which would make
    this a candidate for skip_prefix(), not starts_with()). This seems
    at first glance like a bug, but is a limitation of how remote-fd
    works.

  - the second hunk in urlmatch.c shows some cases adjacent to other
    strncmp() calls that are left. These are of the "exact match within
    a larger string" type, as described above.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>bundle-uri: download bundles from an advertised list</title>
<updated>2022-12-25T07:24:24Z</updated>
<author>
<name>Derrick Stolee</name>
<email>derrickstolee@github.com</email>
</author>
<published>2022-12-22T15:14:16Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=12b0a14b9e5f140a9a4d5a3e3c47fae00a53c6fc'/>
<id>urn:sha1:12b0a14b9e5f140a9a4d5a3e3c47fae00a53c6fc</id>
<content type='text'>
The logic in fetch_bundle_uri() is useful for the --bundle-uri option of
'git clone', but is not helpful when the clone operation discovers a
list of URIs from the bundle-uri protocol v2 command. To actually
download and unbundle the advertised bundles, we need a different
mechanism.

Create the new fetch_bundle_list() method which is very similar to
fetch_bundle_uri() except that it relies on download_bundle_list()
instead of fetch_bundle_uri_internal(). The download_bundle_list()
method will recursively call fetch_bundle_uri_internal() if any of the
advertised URIs serve a bundle list instead of a bundle. This will also
follow the bundle.list.mode setting from the input list: "any" will
download only one such URI while "all" will download data from all of
the URIs.

In an identical way to fetch_bundle_uri(), the bundles are unbundled
after all of the bundle lists have been expanded and all necessary URIs.

Signed-off-by: Derrick Stolee &lt;derrickstolee@github.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>bundle-uri: allow relative URLs in bundle lists</title>
<updated>2022-12-25T07:24:24Z</updated>
<author>
<name>Derrick Stolee</name>
<email>derrickstolee@github.com</email>
</author>
<published>2022-12-22T15:14:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=ebc39479558fb4893c72a35065e6792515acef86'/>
<id>urn:sha1:ebc39479558fb4893c72a35065e6792515acef86</id>
<content type='text'>
Bundle providers may want to distribute that data across multiple CDNs.
This might require a change in the base URI, all the way to the domain
name. If all bundles require an absolute URI in their 'uri' value, then
every push to a CDN would require altering the table of contents to
match the expected domain and exact location within it.

Allow a bundle list to specify a relative URI for the bundles. This URI
is based on where the client received the bundle list. For a list
provided in the 'bundle-uri' protocol v2 command, the Git remote URI is
the base URI. Otherwise, the bundle list was provided from an HTTP URI
not using the Git protocol, and that URI is the base URI. This allows
easier distribution of bundle data.

Signed-off-by: Derrick Stolee &lt;derrickstolee@github.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>bundle-uri: serve bundle.* keys from config</title>
<updated>2022-12-25T07:24:24Z</updated>
<author>
<name>Derrick Stolee</name>
<email>derrickstolee@github.com</email>
</author>
<published>2022-12-22T15:14:13Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=738dc7d4a5f42376bc748008dd8792aa4eb25627'/>
<id>urn:sha1:738dc7d4a5f42376bc748008dd8792aa4eb25627</id>
<content type='text'>
Implement the "bundle-uri" protocol v2 capability by populating the
key=value packet lines from the local Git config. The list of bundles is
provided from the keys beginning with "bundle.".

In the future, we may want to filter this list to be more specific to
the exact known keys that the server intends to share, but for
flexibility at the moment we will assume that the config values are
well-formed.

Signed-off-by: Derrick Stolee &lt;derrickstolee@github.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
