<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/fetch-pack.c, branch v2.35.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.35.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.35.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2022-01-05T21:31:00Z</updated>
<entry>
<title>i18n: factorize "--foo requires --bar" and the like</title>
<updated>2022-01-05T21:31:00Z</updated>
<author>
<name>Jean-Noël Avila</name>
<email>jn.avila@free.fr</email>
</author>
<published>2022-01-05T20:02:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=6fa00ee843cb6c8e720180b4642590f5bcc9c8cf'/>
<id>urn:sha1:6fa00ee843cb6c8e720180b4642590f5bcc9c8cf</id>
<content type='text'>
They are all replaced by "the option '%s' requires '%s'", which is a
new string but replaces 17 previous unique strings.

Signed-off-by: Jean-Noël Avila &lt;jn.avila@free.fr&gt;
Reviewed-by: Johannes Sixt &lt;j6t@kdbg.org&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'jk/fetch-pack-avoid-sigpipe-to-index-pack'</title>
<updated>2021-12-10T22:35:12Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2021-12-10T22:35:12Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=353a27ad95b726e650b5782a1056c4ca81992efb'/>
<id>urn:sha1:353a27ad95b726e650b5782a1056c4ca81992efb</id>
<content type='text'>
"git fetch", when received a bad packfile, can fail with SIGPIPE.
This wasn't wrong per-se, but we now detect the situation and fail
in a more predictable way.

* jk/fetch-pack-avoid-sigpipe-to-index-pack:
  fetch-pack: ignore SIGPIPE when writing to index-pack
</content>
</entry>
<entry>
<title>fetch-pack: ignore SIGPIPE when writing to index-pack</title>
<updated>2021-11-19T22:22:16Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2021-11-19T20:58:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=2a4aed42ecf6ee4ba8824423bd40010c3572aa0c'/>
<id>urn:sha1:2a4aed42ecf6ee4ba8824423bd40010c3572aa0c</id>
<content type='text'>
When fetching, we send the incoming pack to index-pack (or
unpack-objects) via the sideband demuxer. If index-pack hits an error
(e.g., because an object fails fsck), then it will die immediately. This
may cause us to get SIGPIPE on the fetch, as we're still trying to write
pack contents from the sideband demuxer (which is typically a thread,
and thus takes down the whole fetch process).

You can see this in action with:

  ./t5702-protocol-v2.sh --stress --run=59

which ends with (wrapped for readability):

  test_must_fail: died by signal 13: git -c protocol.version=2 \
    -c transfer.fsckobjects=1 -c fetch.uriprotocols=http,https \
    clone http://127.0.0.1:5708/smart/http_parent http_child
  not ok 59 - packfile-uri with transfer.fsckobjects fails on bad object

This is mostly cosmetic. The actual error of interest (in this case, the
object that failed the fsck check) comes from index-pack straight to
stderr, so the user still sees it. They _might_ even see fetch-pack
complaining about index-pack failing, because the main thread is racing
with the sideband-demuxer. But they'll definitely see the signal death
in the exit code, which is what the test is complaining about.

We can make this more predictable by just ignoring SIGPIPE. The sideband
demuxer uses write_or_die(), so it will notice and stop (gracefully,
because we hook die_routine() to exit just the thread). And during this
section we're not writing anywhere else where we'd be concerned about
SIGPIPE preventing us from wasting effort writing to nowhere.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>fetch-pack: redact packfile urls in traces</title>
<updated>2021-11-11T18:07:43Z</updated>
<author>
<name>Ivan Frade</name>
<email>ifrade@google.com</email>
</author>
<published>2021-11-10T23:51:28Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=88e9b1e3fcbd3a8edcf1d54528c49f8237906aba'/>
<id>urn:sha1:88e9b1e3fcbd3a8edcf1d54528c49f8237906aba</id>
<content type='text'>
In some setups, packfile uris act as bearer token. It is not
recommended to expose them plainly in logs, although in special
circunstances (e.g. debug) it makes sense to write them.

Redact the packfile URL paths by default, unless the GIT_TRACE_REDACT
variable is set to false. This mimics the redacting of the Authorization
header in HTTP.

Signed-off-by: Ivan Frade &lt;ifrade@google.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>fetch-pack: optimize loading of refs via commit graph</title>
<updated>2021-09-01T19:43:56Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2021-09-01T13:09:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=62b5a35a33ad6a4537e2ae75a49036e4173fcc87'/>
<id>urn:sha1:62b5a35a33ad6a4537e2ae75a49036e4173fcc87</id>
<content type='text'>
In order to negotiate a packfile, we need to dereference refs to see
which commits we have in common with the remote. To do so, we first look
up the object's type -- if it's a tag, we peel until we hit a non-tag
object. If we hit a commit eventually, then we return that commit.

In case the object ID points to a commit directly, we can avoid the
initial lookup of the object type by opportunistically looking up the
commit via the commit-graph, if available, which gives us a slight speed
bump of about 2% in a huge repository with about 2.3M refs:

    Benchmark #1: HEAD~: git-fetch
      Time (mean ± σ):     31.634 s ±  0.258 s    [User: 28.400 s, System: 5.090 s]
      Range (min … max):   31.280 s … 31.896 s    5 runs

    Benchmark #2: HEAD: git-fetch
      Time (mean ± σ):     31.129 s ±  0.543 s    [User: 27.976 s, System: 5.056 s]
      Range (min … max):   30.172 s … 31.479 s    5 runs

    Summary
      'HEAD: git-fetch' ran
        1.02 ± 0.02 times faster than 'HEAD~: git-fetch'

In case this fails, we fall back to the old code which peels the
objects to a commit.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>connected: refactor iterator to return next object ID directly</title>
<updated>2021-09-01T19:43:56Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2021-09-01T13:09:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=9fec7b213045135655354e864d15894175428d5a'/>
<id>urn:sha1:9fec7b213045135655354e864d15894175428d5a</id>
<content type='text'>
The object ID iterator used by the connectivity checks returns the next
object ID via an out-parameter and then uses a return code to indicate
whether an item was found. This is a bit roundabout: instead of a
separate error code, we can just return the next object ID directly and
use `NULL` pointers as indicator that the iterator got no items left.
Furthermore, this avoids a copy of the object ID.

Refactor the iterator and all its implementations to return object IDs
directly. This brings a tiny performance improvement when doing a mirror-fetch of a repository with about 2.3M refs:

    Benchmark #1: 328dc58b49919c43897240f2eabfa30be2ce32a4~: git-fetch
      Time (mean ± σ):     30.110 s ±  0.148 s    [User: 27.161 s, System: 5.075 s]
      Range (min … max):   29.934 s … 30.406 s    10 runs

    Benchmark #2: 328dc58b49919c43897240f2eabfa30be2ce32a4: git-fetch
      Time (mean ± σ):     29.899 s ±  0.109 s    [User: 26.916 s, System: 5.104 s]
      Range (min … max):   29.696 s … 29.996 s    10 runs

    Summary
      '328dc58b49919c43897240f2eabfa30be2ce32a4: git-fetch' ran
        1.01 ± 0.01 times faster than '328dc58b49919c43897240f2eabfa30be2ce32a4~: git-fetch'

While this 1% speedup could be labelled as statistically insignificant,
the speedup is consistent on my machine. Furthermore, this is an end to
end test, so it is expected that the improvement in the connectivity
check itself is more significant.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'ps/fetch-pack-load-refs-optim'</title>
<updated>2021-08-24T22:32:41Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2021-08-24T22:32:41Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=1b2be06e04c5dba2fc25d2eb0cc2bd85d469e350'/>
<id>urn:sha1:1b2be06e04c5dba2fc25d2eb0cc2bd85d469e350</id>
<content type='text'>
Loading of ref tips to prepare for common ancestry negotiation in
"git fetch-pack" has been optimized by taking advantage of the
commit graph when available.

* ps/fetch-pack-load-refs-optim:
  fetch-pack: speed up loading of refs via commit graph
</content>
</entry>
<entry>
<title>fetch-pack: speed up loading of refs via commit graph</title>
<updated>2021-08-04T17:45:32Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2021-08-04T13:56:11Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=3e5e6c6e94f09973d3a72049aad09fd2131a3648'/>
<id>urn:sha1:3e5e6c6e94f09973d3a72049aad09fd2131a3648</id>
<content type='text'>
When doing reference negotiation, git-fetch-pack(1) is loading all refs
from disk in order to determine which commits it has in common with the
remote repository. This can be quite expensive in repositories with many
references though: in a real-world repository with around 2.2 million
refs, fetching a single commit by its ID takes around 44 seconds.

Dominating the loading time is decompression and parsing of the objects
which are referenced by commits. Given the fact that we only care about
commits (or tags which can be peeled to one) in this context, there is
thus an easy performance win by switching the parsing logic to make use
of the commit graph in case we have one available. Like this, we avoid
hitting the object database to parse these commits but instead only load
them from the commit-graph. This results in a significant performance
boost when executing git-fetch in said repository with 2.2 million refs:

    Benchmark #1: HEAD~: git fetch $remote $commit
      Time (mean ± σ):     44.168 s ±  0.341 s    [User: 42.985 s, System: 1.106 s]
      Range (min … max):   43.565 s … 44.577 s    10 runs

    Benchmark #2: HEAD: git fetch $remote $commit
      Time (mean ± σ):     19.498 s ±  0.724 s    [User: 18.751 s, System: 0.690 s]
      Range (min … max):   18.629 s … 20.454 s    10 runs

    Summary
      'HEAD: git fetch $remote $commit' ran
        2.27 ± 0.09 times faster than 'HEAD~: git fetch $remote $commit'

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>fetch-pack: signal v2 server that we are done making requests</title>
<updated>2021-05-19T22:38:40Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2021-05-19T16:11:05Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=ae1a7eefffe60425e6bf6a2065e042ae051cfb6c'/>
<id>urn:sha1:ae1a7eefffe60425e6bf6a2065e042ae051cfb6c</id>
<content type='text'>
When fetching with the v0 protocol over ssh (or a local upload-pack with
pipes), the server closes the connection as soon as it is finished
sending the pack. So even though the client may still be operating on
the data via index-pack (e.g., resolving deltas, checking connectivity,
etc), the server has released all resources.

With the v2 protocol, however, the server considers the ssh session only
as a transport, with individual requests coming over it. After sending
the pack, it goes back to its main loop, waiting for another request to
come from the client. As a result, the ssh session hangs around until
the client process ends, which may be much later (because resolving
deltas, etc, may consume a lot of CPU).

This is bad for two reasons:

  - it's consuming resources on the server to leave open a connection
    that won't see any more use

  - if something bad happens to the ssh connection in the meantime (say,
    it gets killed by the network because it's idle, as happened in a
    real-world report), then ssh will exit non-zero, and we'll propagate
    the error up the stack.

The server is correct here not to hang up after serving the pack. The v2
protocol's design is meant to allow multiple requests like this, and
hanging up would be the wrong thing for a hypothetical client which was
planning to make more requests (though in practice, the git.git client
never would, and I doubt any other implementations would either).

The right thing is instead for the client to signal to the server that
it's not interested in making more requests. We can do that by closing
the pipe descriptor we use to write to ssh. This will propagate to the
server upload-pack as an EOF when it tries to read the next request (and
then it will close its half, and the whole connection will go away).

It's important to do this "half duplex" shutdown, because we have to do
it _before_ we actually receive the pack. This is an artifact of the way
fetch-pack and index-pack (or unpack-objects) interact. We hand the
connection off to index-pack (really, a sideband demuxer which feeds
it), and then wait until it returns. And it doesn't do that until it has
resolved all of the deltas in the pack, even though it was done reading
from the server long before.

So just closing the connection fully after index-pack returns would be
too late; we'd have held it open much longer than was necessary. And
teaching index-pack to close the connection is awkward. It's not even
seeing the whole conversation (the sideband demuxer is, but it doesn't
actually know what's in the packets, or when the end comes).

Note that this close() is happening deep within the transport code. It's
possible that a caller would want to perform other operations over the
same ssh transport after receiving the pack. But as of the current code,
none of the callers do, and there haven't been discussions of any plans
to change this. If we need to support that later, we can probably do so
by passing down a flag for "you're the last request on the transport;
it's OK to close" instead of the code just assuming that's true.

The description above all discusses v2 ssh, so it's worth thinking about
how this interacts with other protocols:

  - in v0 protocols, we could do the same half-duplex shutdown (it just
    goes into the v0 do_fetch_pack() instead). This does work, but since
    it doesn't have the same persistence problem in the first place,
    there's little reason to change it at this point.

  - local fetches against git-upload-pack on the same machine will
    behave the same as ssh (they are talking over two pipes, and see EOF
    on their input pipe)

  - fetches against git-daemon will run this same code, and close one of
    the descriptors. In practice, this won't do anything, since there
    our two descriptors are dups of each other, and not part of a
    half-duplex pair. The right thing would probably be to call
    shutdown(SHUT_WR) on it. I didn't bother with that here. It doesn't
    face the same error-code problem (since it's just a TCP connection),
    so it's really only an optimization problem. And git:// is not that
    widely used these days, and has less impact on server resources than
    an ssh termination.

  - v2 http doesn't suffer from this problem in the first place, as our
    pipes terminate at a local git-remote-https, which is passing data
    along as individual requests via curl. Probably curl is keeping the
    TCP/TLS connection open for more requests, and we might be able to
    tell it manually "hey, we are done making requests now". But I think
    that's much less important. It again doesn't suffer from the
    error-code problem, and HTTP keepalive is pretty well understood
    (importantly, the timeouts can be set low, because clients like curl
    know how to reconnect for subsequent requests if necessary). So it's
    probably not worth figuring out how to tell curl that we're done
    (though if we do, this patch is probably the first step anyway;
    fetch-pack closes the pipe back to remote-https, which would be the
    signal that it should tell curl we're done).

The code is pretty straightforward. We close the pipe at the right
moment, and set it to -1 to mark it as invalid. I modified the later
cleanup code to avoid calling close(-1). That's not strictly necessary,
since close(-1) is a noop, but hopefully makes things a bit more obvious
to a reader.

I suspect that trying to call more transport functions after the close()
(e.g., calling transport_fetch_refs() again) would fail, as it's not
smart enough to realize we need to re-open the ssh connection. But
that's already true when v0 is in use. And no current callers want to do
that (and again, the solution is probably a flag in the transport code
to keep things open, which can be added later).

There's no test here, as the situation it covers is inherently racy (the
question is when upload-pack exits, compared to when index-pack finishes
resolving deltas and exits). The rather gross shell snippet below does
recreate the problematic situation; when run on a sufficiently-large
repository (git.git works fine), it kills an "idle" upload-pack while
the client is resolving deltas, leading to a failed clone.

    (
	    git clone --no-local --progress . foo.git 2&gt;&amp;1
	    echo &gt;&amp;2 "clone exit code=$?"
    ) |
    tr '\r' '\n' |
    while read line
    do
	    case "$done,$line" in
	    ,Resolving*)
		    echo "hit resolving deltas; killing upload-pack"
		    killall -9 git-upload-pack
		    done=t
		    ;;
	    esac
    done

Reported-by: Greg Pflaum &lt;greg.pflaum@pnp-hcl.com&gt;
Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>fetch: teach independent negotiation (no packfile)</title>
<updated>2021-05-05T01:41:29Z</updated>
<author>
<name>Jonathan Tan</name>
<email>jonathantanmy@google.com</email>
</author>
<published>2021-05-04T21:16:01Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=9c1e657a8fd26fa3ed8d13fb8c796cef8db8b124'/>
<id>urn:sha1:9c1e657a8fd26fa3ed8d13fb8c796cef8db8b124</id>
<content type='text'>
Currently, the packfile negotiation step within a Git fetch cannot be
done independent of sending the packfile, even though there is at least
one application wherein this is useful. Therefore, make it possible for
this negotiation step to be done independently. A subsequent commit will
use this for one such application - push negotiation.

This feature is for protocol v2 only. (An implementation for protocol v0
would require a separate implementation in the fetch, transport, and
transport helper code.)

In the protocol, the main hindrance towards independent negotiation is
that the server can unilaterally decide to send the packfile. This is
solved by a "wait-for-done" argument: the server will then wait for the
client to say "done". In practice, the client will never say it; instead
it will cease requests once it is satisfied.

In the client, the main change lies in the transport and transport
helper code. fetch_refs_via_pack() performs everything needed - protocol
version and capability checks, and the negotiation itself.

There are 2 code paths that do not go through fetch_refs_via_pack() that
needed to be individually excluded: the bundle transport (excluded
through requiring smart_options, which the bundle transport doesn't
support) and transport helpers that do not support takeover. If or when
we support independent negotiation for protocol v0, we will need to
modify these 2 code paths to support it. But for now, report failure if
independent negotiation is requested in these cases.

Signed-off-by: Jonathan Tan &lt;jonathantanmy@google.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
