<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/connected.c, branch v2.35.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.35.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.35.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2021-11-26T06:15:08Z</updated>
<entry>
<title>run-command API: remove "env" member, always use "env_array"</title>
<updated>2021-11-26T06:15:08Z</updated>
<author>
<name>Ævar Arnfjörð Bjarmason</name>
<email>avarab@gmail.com</email>
</author>
<published>2021-11-25T22:52:24Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=c7c4bdeccf3e737e6e674cd9f0828922e629ab06'/>
<id>urn:sha1:c7c4bdeccf3e737e6e674cd9f0828922e629ab06</id>
<content type='text'>
Remove the "env" member from "struct child_process" in favor of always
using the "env_array". As with the preceding removal of "argv" in
favor of "args" this gets rid of current and future oddities around
memory management at the API boundary (see the amended API docs).

For some of the conversions we can replace patterns like:

    child.env = env-&gt;v;

With:

    strvec_pushv(&amp;child.env_array, env-&gt;v);

But for others we need to guard the strvec_pushv() with a NULL check,
since we're not passing in the "v" member of a "struct strvec",
e.g. in the case of tmp_objdir_env()'s return value.

Ideally we'd rename the "env_array" member to simply "env" as a
follow-up, since it and "args" are now inconsistent in not having an
"_array" suffix, and seemingly without any good reason, unless we look
at the history of how they came to be.

But as we've currently got 122 in-tree hits for a "git grep env_array"
let's leave that for now (and possibly forever). Doing that rename
would be too disruptive.

Signed-off-by: Ævar Arnfjörð Bjarmason &lt;avarab@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'ps/connectivity-optim'</title>
<updated>2021-11-12T23:29:24Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2021-11-12T23:29:24Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=8996d68ac7052bf96a38a2b36546c4239280279e'/>
<id>urn:sha1:8996d68ac7052bf96a38a2b36546c4239280279e</id>
<content type='text'>
Regression fix.

* ps/connectivity-optim:
  Revert "connected: do not sort input revisions"
</content>
</entry>
<entry>
<title>Revert "connected: do not sort input revisions"</title>
<updated>2021-11-11T20:34:41Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2021-11-11T20:34:41Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=a7df4f52affe49f6b06246b9398e99e1937f9efc'/>
<id>urn:sha1:a7df4f52affe49f6b06246b9398e99e1937f9efc</id>
<content type='text'>
This reverts commit f45022dc2fd692fd024f2eb41a86a66f19013d43,
as this is like breakage in the traversal more likely.  In a
history with 10 single strand of pearls,

   1--&gt;2--&gt;3--...-&gt;7--&gt;8--&gt;9--&gt;10

asking "rev-list --unsorted-input 1 10 --not 9 8 7 6 5 4" fails to
paint the bottom 1 uninteresting as the traversal stops, without
completing the propagation of uninteresting bit starting at 4 down
through 3 and 2 to 1.
</content>
</entry>
<entry>
<title>Merge branch 'ps/fetch-optim'</title>
<updated>2021-09-20T22:20:39Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2021-09-20T22:20:39Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=deec8aa2d05aba0a2ba495f9d41ab4e2805c8b9e'/>
<id>urn:sha1:deec8aa2d05aba0a2ba495f9d41ab4e2805c8b9e</id>
<content type='text'>
Optimize code that handles large number of refs in the "git fetch"
code path.

* ps/fetch-optim:
  fetch: avoid second connectivity check if we already have all objects
  fetch: merge fetching and consuming refs
  fetch: refactor fetch refs to be more extendable
  fetch-pack: optimize loading of refs via commit graph
  connected: refactor iterator to return next object ID directly
  fetch: avoid unpacking headers in object existence check
  fetch: speed up lookup of want refs via commit-graph
</content>
</entry>
<entry>
<title>connected: refactor iterator to return next object ID directly</title>
<updated>2021-09-01T19:43:56Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2021-09-01T13:09:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=9fec7b213045135655354e864d15894175428d5a'/>
<id>urn:sha1:9fec7b213045135655354e864d15894175428d5a</id>
<content type='text'>
The object ID iterator used by the connectivity checks returns the next
object ID via an out-parameter and then uses a return code to indicate
whether an item was found. This is a bit roundabout: instead of a
separate error code, we can just return the next object ID directly and
use `NULL` pointers as indicator that the iterator got no items left.
Furthermore, this avoids a copy of the object ID.

Refactor the iterator and all its implementations to return object IDs
directly. This brings a tiny performance improvement when doing a mirror-fetch of a repository with about 2.3M refs:

    Benchmark #1: 328dc58b49919c43897240f2eabfa30be2ce32a4~: git-fetch
      Time (mean ± σ):     30.110 s ±  0.148 s    [User: 27.161 s, System: 5.075 s]
      Range (min … max):   29.934 s … 30.406 s    10 runs

    Benchmark #2: 328dc58b49919c43897240f2eabfa30be2ce32a4: git-fetch
      Time (mean ± σ):     29.899 s ±  0.109 s    [User: 26.916 s, System: 5.104 s]
      Range (min … max):   29.696 s … 29.996 s    10 runs

    Summary
      '328dc58b49919c43897240f2eabfa30be2ce32a4: git-fetch' ran
        1.01 ± 0.01 times faster than '328dc58b49919c43897240f2eabfa30be2ce32a4~: git-fetch'

While this 1% speedup could be labelled as statistically insignificant,
the speedup is consistent on my machine. Furthermore, this is an end to
end test, so it is expected that the improvement in the connectivity
check itself is more significant.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>connected: do not sort input revisions</title>
<updated>2021-08-09T16:51:12Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2021-08-09T08:11:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=f45022dc2fd692fd024f2eb41a86a66f19013d43'/>
<id>urn:sha1:f45022dc2fd692fd024f2eb41a86a66f19013d43</id>
<content type='text'>
In order to compute whether objects reachable from a set of tips are all
connected, we do a revision walk with these tips as positive references
and `--not --all`. `--not --all` will cause the revision walk to load
all preexisting references as uninteresting, which can be very expensive
in repositories with many references.

Benchmarking the git-rev-list(1) command highlights that by far the most
expensive single phase is initial sorting of the input revisions: after
all references have been loaded, we first sort commits by author date.
In a real-world repository with about 2.2 million references, it makes
up about 40% of the total runtime of git-rev-list(1).

Ultimately, the connectivity check shouldn't really bother about the
order of input revisions at all. We only care whether we can actually
walk all objects until we hit the cut-off point. So sorting the input is
a complete waste of time.

Introduce a new "--unsorted-input" flag to git-rev-list(1) which will
cause it to not sort the commits and adjust the connectivity check to
always pass the flag. This results in the following speedups, executed
in a clone of gitlab-org/gitlab [1]:

    Benchmark #1: git rev-list  --objects --quiet --not --all --not $(cat newrev)
      Time (mean ± σ):      7.639 s ±  0.065 s    [User: 7.304 s, System: 0.335 s]
      Range (min … max):    7.543 s …  7.742 s    10 runs

    Benchmark #2: git rev-list --unsorted-input --objects --quiet --not --all --not $newrev
      Time (mean ± σ):      4.995 s ±  0.044 s    [User: 4.657 s, System: 0.337 s]
      Range (min … max):    4.909 s …  5.048 s    10 runs

    Summary
      'git rev-list --unsorted-input --objects --quiet --not --all --not $(cat newrev)' ran
        1.53 ± 0.02 times faster than 'git rev-list  --objects --quiet --not --all --not $newrev'

[1]: https://gitlab.com/gitlab-org/gitlab.git. Note that not all refs
     are visible to clients.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'rs/more-buffered-io'</title>
<updated>2020-08-24T21:54:31Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2020-08-24T21:54:31Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=d8488b9e86f241ba313d04091449869fb15154d6'/>
<id>urn:sha1:d8488b9e86f241ba313d04091449869fb15154d6</id>
<content type='text'>
Use more buffered I/O where we used to call many small write(2)s.

* rs/more-buffered-io:
  upload-pack: use buffered I/O to talk to rev-list
  midx: use buffered I/O to talk to pack-objects
  connected: use buffered I/O to talk to rev-list
</content>
</entry>
<entry>
<title>connected: use buffered I/O to talk to rev-list</title>
<updated>2020-08-17T17:29:39Z</updated>
<author>
<name>René Scharfe</name>
<email>l.s.r@web.de</email>
</author>
<published>2020-08-12T16:52:49Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=24b75faf0da7a025a192ab4c65cb1f1d6dc6b7f6'/>
<id>urn:sha1:24b75faf0da7a025a192ab4c65cb1f1d6dc6b7f6</id>
<content type='text'>
Like f0bca72dc77 (send-pack: use buffered I/O to talk to pack-objects,
2016-06-08), significantly reduce the number of system calls and
simplify the code for sending object IDs to rev-list by using stdio's
buffering.

Take care to handle errors immediately to get the correct error code,
and to flush the buffer explicitly before closing the stream in order to
catch any write errors for these last bytes.

Helped-by: Chris Torek &lt;chris.torek@gmail.com&gt;
Helped-by: Johannes Sixt &lt;j6t@kdbg.org&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>strvec: fix indentation in renamed calls</title>
<updated>2020-07-28T22:02:18Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2020-07-28T20:26:31Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=f6d8942b1fc6c968980c8ae03054d7b2114b4415'/>
<id>urn:sha1:f6d8942b1fc6c968980c8ae03054d7b2114b4415</id>
<content type='text'>
Code which split an argv_array call across multiple lines, like:

  argv_array_pushl(&amp;args, "one argument",
                   "another argument", "and more",
		   NULL);

was recently mechanically renamed to use strvec, which results in
mis-matched indentation like:

  strvec_pushl(&amp;args, "one argument",
                   "another argument", "and more",
		   NULL);

Let's fix these up to align the arguments with the opening paren. I did
this manually by sifting through the results of:

  git jump grep 'strvec_.*,$'

and liberally applying my editor's auto-format. Most of the changes are
of the form shown above, though I also normalized a few that had
originally used a single-tab indentation (rather than our usual style of
aligning with the open paren). I also rewrapped a couple of obvious
cases (e.g., where previously too-long lines became short enough to fit
on one), but I wasn't aggressive about it. In cases broken to three or
more lines, the grouping of arguments is sometimes meaningful, and it
wasn't worth my time or reviewer time to ponder each case individually.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>strvec: convert more callers away from argv_array name</title>
<updated>2020-07-28T22:02:18Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2020-07-28T20:24:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=ef8d7ac42a6a62d678166fe25ea743315809d2bb'/>
<id>urn:sha1:ef8d7ac42a6a62d678166fe25ea743315809d2bb</id>
<content type='text'>
We eventually want to drop the argv_array name and just use strvec
consistently. There's no particular reason we have to do it all at once,
or care about interactions between converted and unconverted bits.
Because of our preprocessor compat layer, the names are interchangeable
to the compiler (so even a definition and declaration using different
names is OK).

This patch converts remaining files from the first half of the alphabet,
to keep the diff to a manageable size.

The conversion was done purely mechanically with:

  git ls-files '*.c' '*.h' |
  xargs perl -i -pe '
    s/ARGV_ARRAY/STRVEC/g;
    s/argv_array/strvec/g;
  '

and then selectively staging files with "git add '[abcdefghjkl]*'".
We'll deal with any indentation/style fallouts separately.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
