<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/connected.c, branch v2.32.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.32.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.32.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2020-08-24T21:54:31Z</updated>
<entry>
<title>Merge branch 'rs/more-buffered-io'</title>
<updated>2020-08-24T21:54:31Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2020-08-24T21:54:31Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=d8488b9e86f241ba313d04091449869fb15154d6'/>
<id>urn:sha1:d8488b9e86f241ba313d04091449869fb15154d6</id>
<content type='text'>
Use more buffered I/O where we used to call many small write(2)s.

* rs/more-buffered-io:
  upload-pack: use buffered I/O to talk to rev-list
  midx: use buffered I/O to talk to pack-objects
  connected: use buffered I/O to talk to rev-list
</content>
</entry>
<entry>
<title>connected: use buffered I/O to talk to rev-list</title>
<updated>2020-08-17T17:29:39Z</updated>
<author>
<name>René Scharfe</name>
<email>l.s.r@web.de</email>
</author>
<published>2020-08-12T16:52:49Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=24b75faf0da7a025a192ab4c65cb1f1d6dc6b7f6'/>
<id>urn:sha1:24b75faf0da7a025a192ab4c65cb1f1d6dc6b7f6</id>
<content type='text'>
Like f0bca72dc77 (send-pack: use buffered I/O to talk to pack-objects,
2016-06-08), significantly reduce the number of system calls and
simplify the code for sending object IDs to rev-list by using stdio's
buffering.

Take care to handle errors immediately to get the correct error code,
and to flush the buffer explicitly before closing the stream in order to
catch any write errors for these last bytes.

Helped-by: Chris Torek &lt;chris.torek@gmail.com&gt;
Helped-by: Johannes Sixt &lt;j6t@kdbg.org&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>strvec: fix indentation in renamed calls</title>
<updated>2020-07-28T22:02:18Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2020-07-28T20:26:31Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=f6d8942b1fc6c968980c8ae03054d7b2114b4415'/>
<id>urn:sha1:f6d8942b1fc6c968980c8ae03054d7b2114b4415</id>
<content type='text'>
Code which split an argv_array call across multiple lines, like:

  argv_array_pushl(&amp;args, "one argument",
                   "another argument", "and more",
		   NULL);

was recently mechanically renamed to use strvec, which results in
mis-matched indentation like:

  strvec_pushl(&amp;args, "one argument",
                   "another argument", "and more",
		   NULL);

Let's fix these up to align the arguments with the opening paren. I did
this manually by sifting through the results of:

  git jump grep 'strvec_.*,$'

and liberally applying my editor's auto-format. Most of the changes are
of the form shown above, though I also normalized a few that had
originally used a single-tab indentation (rather than our usual style of
aligning with the open paren). I also rewrapped a couple of obvious
cases (e.g., where previously too-long lines became short enough to fit
on one), but I wasn't aggressive about it. In cases broken to three or
more lines, the grouping of arguments is sometimes meaningful, and it
wasn't worth my time or reviewer time to ponder each case individually.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>strvec: convert more callers away from argv_array name</title>
<updated>2020-07-28T22:02:18Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2020-07-28T20:24:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=ef8d7ac42a6a62d678166fe25ea743315809d2bb'/>
<id>urn:sha1:ef8d7ac42a6a62d678166fe25ea743315809d2bb</id>
<content type='text'>
We eventually want to drop the argv_array name and just use strvec
consistently. There's no particular reason we have to do it all at once,
or care about interactions between converted and unconverted bits.
Because of our preprocessor compat layer, the names are interchangeable
to the compiler (so even a definition and declaration using different
names is OK).

This patch converts remaining files from the first half of the alphabet,
to keep the diff to a manageable size.

The conversion was done purely mechanically with:

  git ls-files '*.c' '*.h' |
  xargs perl -i -pe '
    s/ARGV_ARRAY/STRVEC/g;
    s/argv_array/strvec/g;
  '

and then selectively staging files with "git add '[abcdefghjkl]*'".
We'll deal with any indentation/style fallouts separately.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>fetch-pack: support more than one pack lockfile</title>
<updated>2020-06-11T01:06:34Z</updated>
<author>
<name>Jonathan Tan</name>
<email>jonathantanmy@google.com</email>
</author>
<published>2020-06-10T20:57:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=9da69a6539e98da1b7ed8832cb54b494961463cb'/>
<id>urn:sha1:9da69a6539e98da1b7ed8832cb54b494961463cb</id>
<content type='text'>
Whenever a fetch results in a packfile being downloaded, a .keep file is
generated, so that the packfile can be preserved (from, say, a running
"git repack") until refs are written referring to the contents of the
packfile.

In a subsequent patch, a successful fetch using protocol v2 may result
in more than one .keep file being generated. Therefore, teach
fetch_pack() and the transport mechanism to support multiple .keep
files.

Implementation notes:

 - builtin/fetch-pack.c normally does not generate .keep files, and thus
   is unaffected by this or future changes. However, it has an
   undocumented "--lock-pack" feature, used by remote-curl.c when
   implementing the "fetch" remote helper command. In keeping with the
   remote helper protocol, only one "lock" line will ever be written;
   the rest will result in warnings to stderr. However, in practice,
   warnings will never be written because the remote-curl.c "fetch" is
   only used for protocol v0/v1 (which will not generate multiple .keep
   files). (Protocol v2 uses the "stateless-connect" command, not the
   "fetch" command.)

 - connected.c has an optimization in that connectivity checks on a ref
   need not be done if the target object is in a pack known to be
   self-contained and connected. If there are multiple packfiles, this
   optimization can no longer be done.

Signed-off-by: Jonathan Tan &lt;jonathantanmy@google.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'jt/connectivity-check-optim-in-partial-clone'</title>
<updated>2020-04-22T20:42:43Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2020-04-22T20:42:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=0c601052a5f761fd9c207a56ae11789c8598e25d'/>
<id>urn:sha1:0c601052a5f761fd9c207a56ae11789c8598e25d</id>
<content type='text'>
Simplify the commit ancestry connectedness check in a partial clone
repository in which "promised" objects are assumed to be obtainable
lazily on-demand from promisor remote repositories.

* jt/connectivity-check-optim-in-partial-clone:
  connected: always use partial clone optimization
</content>
</entry>
<entry>
<title>connected: always use partial clone optimization</title>
<updated>2020-03-29T17:37:44Z</updated>
<author>
<name>Jonathan Tan</name>
<email>jonathantanmy@google.com</email>
</author>
<published>2020-03-20T22:00:45Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=2b98478c6fc32b04434981c783b0472e0137cfad'/>
<id>urn:sha1:2b98478c6fc32b04434981c783b0472e0137cfad</id>
<content type='text'>
With 50033772d5 ("connected: verify promisor-ness of partial clone",
2020-01-30), the fast path (checking promisor packs) in
check_connected() now passes a subset of the slow path (rev-list) - if
all objects to be checked are found in promisor packs, both the fast
path and the slow path will pass; otherwise, the fast path will
definitely not pass. This means that we can always attempt the fast path
whenever we need to do the slow path.

The fast path is currently guarded by a flag; therefore, remove that
flag. Also, make the fast path fallback to the slow path - if the fast
path fails, the failing OID and all remaining OIDs will be passed to
rev-list.

The main user-visible benefit is the performance of fetch from a partial
clone - specifically, the speedup of the connectivity check done before
the fetch. In particular, a no-op fetch into a partial clone on my
computer was sped up from 7 seconds to 0.01 seconds. This is a
complement to the work in 2df1aa239c ("fetch: forgo full
connectivity check if --filter", 2020-01-30), which is the child of the
aforementioned 50033772d5. In that commit, the connectivity check
*after* the fetch was sped up.

The addition of the fast path might cause performance reductions in
these cases:

 - If a partial clone or a fetch into a partial clone fails, Git will
   fruitlessly run rev-list (it is expected that everything fetched
   would go into promisor packs, so if that didn't happen, it is most
   likely that rev-list will fail too).

 - Any connectivity checks done by receive-pack, in the (in my opinion,
   unlikely) event that a partial clone serves receive-pack.

I think that these cases are rare enough, and the performance reduction
in this case minor enough (additional object DB access), that the
benefit of avoiding a flag outweighs these.

Signed-off-by: Jonathan Tan &lt;jonathantanmy@google.com&gt;
Reviewed-by: Josh Steadmon &lt;steadmon@google.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>connected.c: reprepare packs for corner cases</title>
<updated>2020-03-15T22:39:00Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2020-03-13T21:11:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=b739d971e5f9997d90bc09b6c8cfdc677e3a0da6'/>
<id>urn:sha1:b739d971e5f9997d90bc09b6c8cfdc677e3a0da6</id>
<content type='text'>
While updating the microsoft/git fork on top of v2.26.0-rc0 and
consuming that build into Scalar, I noticed a corner case bug around
partial clone.

The "scalar clone" command can create a Git repository with the
proper config for using partial clone with the "blob:none" filter.
Instead of calling "git clone", it runs "git init" then sets a few
more config values before running "git fetch".

In our builds on v2.26.0-rc0, we noticed that our "git fetch"
command was failing with

  error: https://github.com/microsoft/scalar did not send all necessary objects

This does not happen if you copy the config file from a repository
created by "git clone --filter=blob:none &lt;url&gt;", but it does happen
when adding the config option "core.logAllRefUpdates = true".

By debugging, I was able to see that the loop inside
check_connnected() that checks if all refs are contained in
promisor packs actually did not have any packfiles in the packed_git
list.

I'm not sure what corner-case issues caused this config option to
prevent the reprepare_packed_git() from being called at the proper
spot during the fetch operation. This approach requires a situation
where we use the remote helper process, which makes it difficult to
test.

It is possible to place a reprepare_packed_git() call in the fetch code
closer to where we receive a pack, but that leaves an opening for a
later change to re-introduce this problem. Further, a concurrent repack
operation could replace the pack-file list we already loaded into
memory, causing this issue in an even harder to reproduce scenario.

It is really the responsibility of anyone looping through the list of
pack-files for a certain object to fall back to reprepare_packed_git()
on a fail-to-find. The loop in check_connected() does not have this
fallback, leading to this bug.

We _could_ try looping through the packs and only reprepare the packs
after a miss, but that change is more involved and has little value.
Since this case is isolated to the case when
opt-&gt;check_refs_are_promisor_objects_only is true, we are confident that
we are verifying the refs after downloading new data. This implies that
calling reprepare_packed_git() in advance is not a huge cost compared to
the rest of the operations already made.

Helped-by: Jeff King &lt;peff@peff.net&gt;
Helped-by: Junio Hamano &lt;gitster@pobox.com&gt;
Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>connected: verify promisor-ness of partial clone</title>
<updated>2020-01-30T18:55:31Z</updated>
<author>
<name>Jonathan Tan</name>
<email>jonathantanmy@google.com</email>
</author>
<published>2020-01-12T04:15:24Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=50033772d50ef3c4023d63561d20bc61db96500e'/>
<id>urn:sha1:50033772d50ef3c4023d63561d20bc61db96500e</id>
<content type='text'>
Commit dfa33a298d ("clone: do faster object check for partial clones",
2019-04-21) optimized the connectivity check done when cloning with
--filter to check only the existence of objects directly pointed to by
refs. But this is not sufficient: they also need to be promisor objects.
Make this check more robust by instead checking that these objects are
promisor objects, that is, they appear in a promisor pack.

Signed-off-by: Jonathan Tan &lt;jonathantanmy@google.com&gt;
Reviewed-by: Jonathan Nieder &lt;jrnieder@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>clone: remove fetch_if_missing=0</title>
<updated>2019-11-13T02:48:47Z</updated>
<author>
<name>Jonathan Tan</name>
<email>jonathantanmy@google.com</email>
</author>
<published>2019-11-13T00:34:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=e362fadcd03753471cf8e7fc91d6d721b7423b8f'/>
<id>urn:sha1:e362fadcd03753471cf8e7fc91d6d721b7423b8f</id>
<content type='text'>
Commit 6462d5eb9a ("fetch: remove fetch_if_missing=0", 2019-11-08)
strove to remove the need for fetch_if_missing=0 from the fetching
mechanism, so it is plausible to attempt removing fetch_if_missing=0
from clone as well. But doing so reveals a bug - when the server does
not send an object directly pointed to by a ref, this should be an
error, not a trigger for a lazy fetch. (This case in the fetching
mechanism was covered by a test using "git clone", not "git fetch",
which is why the aforementioned commit didn't uncover the bug.)

The bug can be fixed by suppressing lazy-fetching during the
connectivity check. Fix this bug, and remove fetch_if_missing from
clone.

Signed-off-by: Jonathan Tan &lt;jonathantanmy@google.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
