<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/builtin/pack-objects.c, branch v2.41.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.41.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.41.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2023-05-06T21:34:09Z</updated>
<entry>
<title>attr: teach "--attr-source=&lt;tree&gt;" global option to "git"</title>
<updated>2023-05-06T21:34:09Z</updated>
<author>
<name>John Cai</name>
<email>johncai86@gmail.com</email>
</author>
<published>2023-05-06T04:15:29Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=44451a2e5eec5360378be23e2cdbd9ecee49e14e'/>
<id>urn:sha1:44451a2e5eec5360378be23e2cdbd9ecee49e14e</id>
<content type='text'>
Earlier, 47cfc9bd (attr: add flag `--source` to work with tree-ish,
2023-01-14) taught "git check-attr" the "--source=&lt;tree&gt;" option to
allow it to read attribute files from a tree-ish, but did so only
for the command.  Just like "check-attr" users wanted a way to use
attributes from a tree-ish and not from the working tree files,
users of other commands (like "git diff") would benefit from the
same.

Undo most of the UI change the commit made, while keeping the
internal logic to read attributes from a given tree-ish. Expose the
internal logic via a new "--attr-source=&lt;tree&gt;" command line option
given to "git", so that it can be used with any git command that
runs as part of the main git process.

Additionally, add an environment variable GIT_ATTR_SOURCE that is set
when --attr-source is passed in, so that subprocesses use the same value
for the attributes source tree.

Signed-off-by: John Cai &lt;johncai86@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'tb/pack-revindex-on-disk'</title>
<updated>2023-04-27T23:00:59Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2023-04-27T23:00:59Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=849c8b3dbf8e850020951e81a42df4dc0b1e5b5d'/>
<id>urn:sha1:849c8b3dbf8e850020951e81a42df4dc0b1e5b5d</id>
<content type='text'>
The on-disk reverse index that allows mapping from the pack offset
to the object name for the object stored at the offset has been
enabled by default.

* tb/pack-revindex-on-disk:
  t: invert `GIT_TEST_WRITE_REV_INDEX`
  config: enable `pack.writeReverseIndex` by default
  pack-revindex: introduce `pack.readReverseIndex`
  pack-revindex: introduce GIT_TEST_REV_INDEX_DIE_ON_DISK
  pack-revindex: make `load_pack_revindex` take a repository
  t5325: mark as leak-free
  pack-write.c: plug a leak in stage_tmp_packfiles()
</content>
</entry>
<entry>
<title>Merge branch 'ps/fix-geom-repack-with-alternates'</title>
<updated>2023-04-25T20:56:20Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2023-04-25T20:56:20Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=36628c56ed54b5a86aa9a17ed589fc0ee7f5bea5'/>
<id>urn:sha1:36628c56ed54b5a86aa9a17ed589fc0ee7f5bea5</id>
<content type='text'>
Geometric repacking ("git repack --geometric=&lt;n&gt;") in a repository
that borrows from an alternate object database had various corner
case bugs, which have been corrected.

* ps/fix-geom-repack-with-alternates:
  repack: disable writing bitmaps when doing a local repack
  repack: honor `-l` when calculating pack geometry
  t/helper: allow chmtime to print verbosely without modifying mtime
  pack-objects: extend test coverage of `--stdin-packs` with alternates
  pack-objects: fix error when same packfile is included and excluded
  pack-objects: fix error when packing same pack twice
  pack-objects: split out `--stdin-packs` tests into separate file
  repack: fix generating multi-pack-index with only non-local packs
  repack: fix trying to use preferred pack in alternates
  midx: fix segfault with no packs and invalid preferred pack
</content>
</entry>
<entry>
<title>Merge branch 'en/header-split-cache-h'</title>
<updated>2023-04-25T20:56:20Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2023-04-25T20:56:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=0807e57807aaffe2813fffb7704dcc9153f03832'/>
<id>urn:sha1:0807e57807aaffe2813fffb7704dcc9153f03832</id>
<content type='text'>
Header clean-up.

* en/header-split-cache-h: (24 commits)
  protocol.h: move definition of DEFAULT_GIT_PORT from cache.h
  mailmap, quote: move declarations of global vars to correct unit
  treewide: reduce includes of cache.h in other headers
  treewide: remove double forward declaration of read_in_full
  cache.h: remove unnecessary includes
  treewide: remove cache.h inclusion due to pager.h changes
  pager.h: move declarations for pager.c functions from cache.h
  treewide: remove cache.h inclusion due to editor.h changes
  editor: move editor-related functions and declarations into common file
  treewide: remove cache.h inclusion due to object.h changes
  object.h: move some inline functions and defines from cache.h
  treewide: remove cache.h inclusion due to object-file.h changes
  object-file.h: move declarations for object-file.c functions from cache.h
  treewide: remove cache.h inclusion due to git-zlib changes
  git-zlib: move declarations for git-zlib functions from cache.h
  treewide: remove cache.h inclusion due to object-name.h changes
  object-name.h: move declarations for object-name.c functions from cache.h
  treewide: remove unnecessary cache.h inclusion
  treewide: be explicit about dependence on mem-pool.h
  treewide: be explicit about dependence on oid-array.h
  ...
</content>
</entry>
<entry>
<title>pack-objects: fix error when same packfile is included and excluded</title>
<updated>2023-04-14T17:27:51Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2023-04-14T06:01:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=752b465c3c0fd7f503b50c326017b8b13af83c3b'/>
<id>urn:sha1:752b465c3c0fd7f503b50c326017b8b13af83c3b</id>
<content type='text'>
When passing the same packfile both as included and excluded via the
`--stdin-packs` option, then we will return an error because the
excluded packfile cannot be found. This is because we will only set the
`util` pointer for the included packfile list if it was found, so that
we later die when we notice that it's in fact not set for the excluded
packfile list.

Fix this bug by always setting the `util` pointer for both the included
and excluded list entries.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>pack-objects: fix error when packing same pack twice</title>
<updated>2023-04-14T17:27:51Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2023-04-14T06:01:49Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=732194b5f2ff50fc536b28c3d5a3e43e0110419b'/>
<id>urn:sha1:732194b5f2ff50fc536b28c3d5a3e43e0110419b</id>
<content type='text'>
When passed the same packfile twice via `--stdin-packs` we return an
error that the packfile supposedly was not found. This is because when
reading packs into the list of included or excluded packfiles, we will
happily re-add packfiles even if they are part of the lists already. And
while the list can now contain duplicates, we will only set the `util`
pointer of the first list entry to the `packed_git` structure. We notice
that at a later point when checking that all list entries have their
`util` pointer set and die with an error.

While this is kind of a nonsensical request, this scenario can be hit
when doing geometric repacks. When a repository is connected to an
alternate object directory and both have the exact same packfile then
both would get added to the geometric sequence. And when we then decide
to perform the repack, we will invoke git-pack-objects(1) with the same
packfile twice.

Fix this bug by removing any duplicates from both the included and
excluded packs.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>t: invert `GIT_TEST_WRITE_REV_INDEX`</title>
<updated>2023-04-13T14:55:46Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2023-04-12T22:20:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=9f7f10a282d8adeb9da0990aa0eb2adf93a47ca7'/>
<id>urn:sha1:9f7f10a282d8adeb9da0990aa0eb2adf93a47ca7</id>
<content type='text'>
Back in e8c58f894b (t: support GIT_TEST_WRITE_REV_INDEX, 2021-01-25), we
added a test knob to conditionally enable writing a ".rev" file when
indexing a pack. At the time, this was used to ensure that the test
suite worked even when ".rev" files were written, which served as a
stress-test for the on-disk reverse index implementation.

Now that reading from on-disk ".rev" files is enabled by default, the
test knob `GIT_TEST_WRITE_REV_INDEX` no longer has any meaning.

We could get rid of the option entirely, but there would be no
convenient way to test Git when ".rev" files *aren't* in place.

Instead of getting rid of the option, invert its meaning to instead
disable writing ".rev" files, thereby running the test suite in a mode
where the reverse index is generated from scratch.

This ensures that, when GIT_TEST_NO_WRITE_REV_INDEX is set to some
spelling of "true", we are still running and exercising Git's behavior
when forced to generate reverse indexes from scratch. Do so by setting
it in the linux-TEST-vars CI run to ensure that we are maintaining good
coverage of this now-legacy code.

Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Acked-by: Derrick Stolee &lt;derrickstolee@github.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>config: enable `pack.writeReverseIndex` by default</title>
<updated>2023-04-13T14:55:46Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2023-04-12T22:20:33Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=a8dd7e05b1c033917b900ea3d930e79eea3ff9a4'/>
<id>urn:sha1:a8dd7e05b1c033917b900ea3d930e79eea3ff9a4</id>
<content type='text'>
Back in e37d0b8730 (builtin/index-pack.c: write reverse indexes,
2021-01-25), Git learned how to read and write a pack's reverse index
from a file instead of in-memory.

A pack's reverse index is a mapping from pack position (that is, the
order that objects appear together in a ".pack")  to their position in
lexical order (that is, the order that objects are listed in an ".idx"
file).

Reverse indexes are consulted often during pack-objects, as well as
during auxiliary operations that require mapping between pack offsets,
pack order, and index index.

They are useful in GitHub's infrastructure, where we have seen a
dramatic increase in performance when writing ".rev" files[1]. In
particular:

  - an ~80% reduction in the time it takes to serve fetches on a popular
    repository, Homebrew/homebrew-core.

  - a ~60% reduction in the peak memory usage to serve fetches on that
    same repository.

  - a collective savings of ~35% in CPU time across all pack-objects
    invocations serving fetches across all repositories in a single
    datacenter.

Reverse indexes are also beneficial to end-users as well as forges. For
example, the time it takes to generate a pack containing the objects for
the 10 most recent commits in linux.git (representing a typical push) is
significantly faster when on-disk reverse indexes are available:

    $ { git rev-parse HEAD &amp;&amp; printf '^' &amp;&amp; git rev-parse HEAD~10 } &gt;in
    $ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} pack-objects --delta-base-offset --revs --stdout &lt;in &gt;/dev/null'
    Benchmark 1: git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout &lt;in &gt;/dev/null
      Time (mean ± σ):     543.0 ms ±  20.3 ms    [User: 616.2 ms, System: 58.8 ms]
      Range (min … max):   521.0 ms … 577.9 ms    10 runs

    Benchmark 2: git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout &lt;in &gt;/dev/null
      Time (mean ± σ):     245.0 ms ±  11.4 ms    [User: 335.6 ms, System: 31.3 ms]
      Range (min … max):   226.0 ms … 259.6 ms    13 runs

    Summary
      'git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout &lt;in &gt;/dev/null' ran
	2.22 ± 0.13 times faster than 'git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout &lt;in &gt;/dev/null'

The same is true of writing a pack containing the objects for the 30
most-recent commits:

    $ { git rev-parse HEAD &amp;&amp; printf '^' &amp;&amp; git rev-parse HEAD~30 } &gt;in
    $ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} pack-objects --delta-base-offset --revs --stdout &lt;in &gt;/dev/null'
    Benchmark 1: git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout &lt;in &gt;/dev/null
      Time (mean ± σ):     866.5 ms ±  16.2 ms    [User: 1414.5 ms, System: 97.0 ms]
      Range (min … max):   839.3 ms … 886.9 ms    10 runs

    Benchmark 2: git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout &lt;in &gt;/dev/null
      Time (mean ± σ):     581.6 ms ±  10.2 ms    [User: 1181.7 ms, System: 62.6 ms]
      Range (min … max):   567.5 ms … 599.3 ms    10 runs

    Summary
      'git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout &lt;in &gt;/dev/null' ran
	1.49 ± 0.04 times faster than 'git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout &lt;in &gt;/dev/null'

...and savings on trivial operations like computing the on-disk size of
a single (packed) object are even more dramatic:

    $ git rev-parse HEAD &gt;in
    $ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} cat-file --batch-check="%(objectsize:disk)" &lt;in'
    Benchmark 1: git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" &lt;in
      Time (mean ± σ):     305.8 ms ±  11.4 ms    [User: 264.2 ms, System: 41.4 ms]
      Range (min … max):   290.3 ms … 331.1 ms    10 runs

    Benchmark 2: git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" &lt;in
      Time (mean ± σ):       4.0 ms ±   0.3 ms    [User: 1.7 ms, System: 2.3 ms]
      Range (min … max):     1.6 ms …   4.6 ms    1155 runs

    Summary
      'git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" &lt;in' ran
       76.96 ± 6.25 times faster than 'git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" &lt;in'

In the more than two years since e37d0b8730 was merged, Git's
implementation of on-disk reverse indexes has been thoroughly tested,
both from users enabling `pack.writeReverseIndexes`, and from GitHub's
deployment of the feature. The latter has been running without incident
for more than two years.

This patch changes Git's behavior to write on-disk reverse indexes by
default when indexing a pack, which should make the above operations
faster for everybody's Git installation after a repack.

(The previous commit explains some potential drawbacks of using on-disk
reverse indexes in certain limited circumstances, that essentially boil
down to a trade-off between time to generate, and time to access. For
those limited cases, the `pack.readReverseIndex` escape hatch can be
used).

[1]: https://github.blog/2021-04-29-scaling-monorepo-maintenance/#reverse-indexes

Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Acked-by: Derrick Stolee &lt;derrickstolee@github.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>object-file.h: move declarations for object-file.c functions from cache.h</title>
<updated>2023-04-11T15:52:10Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2023-04-11T07:41:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=87bed17907b2cb9a9581a5b8b16b8da264c2a2a8'/>
<id>urn:sha1:87bed17907b2cb9a9581a5b8b16b8da264c2a2a8</id>
<content type='text'>
Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Acked-by: Calvin Wan &lt;calvinwan@google.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'en/header-split-cleanup'</title>
<updated>2023-04-06T20:38:31Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2023-04-06T20:38:31Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=6047b28eb7e1a0b0061c5310034f7b5683ea401a'/>
<id>urn:sha1:6047b28eb7e1a0b0061c5310034f7b5683ea401a</id>
<content type='text'>
Split key function and data structure definitions out of cache.h to
new header files and adjust the users.

* en/header-split-cleanup:
  csum-file.h: remove unnecessary inclusion of cache.h
  write-or-die.h: move declarations for write-or-die.c functions from cache.h
  treewide: remove cache.h inclusion due to setup.h changes
  setup.h: move declarations for setup.c functions from cache.h
  treewide: remove cache.h inclusion due to environment.h changes
  environment.h: move declarations for environment.c functions from cache.h
  treewide: remove unnecessary includes of cache.h
  wrapper.h: move declarations for wrapper.c functions from cache.h
  path.h: move function declarations for path.c functions from cache.h
  cache.h: remove expand_user_path()
  abspath.h: move absolute path functions from cache.h
  environment: move comment_line_char from cache.h
  treewide: remove unnecessary cache.h inclusion from several sources
  treewide: remove unnecessary inclusion of gettext.h
  treewide: be explicit about dependence on gettext.h
  treewide: remove unnecessary cache.h inclusion from a few headers
</content>
</entry>
</feed>
