<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/Documentation/git-pack-objects.txt, branch v2.45.4</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.45.4</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.45.4'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2023-10-02T21:54:29Z</updated>
<entry>
<title>pack-objects: allow `--filter` without `--stdout`</title>
<updated>2023-10-02T21:54:29Z</updated>
<author>
<name>Christian Couder</name>
<email>christian.couder@gmail.com</email>
</author>
<published>2023-10-02T16:54:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=6cfcabfb9f60369f6fb4d24465a7c98b1d90b34b'/>
<id>urn:sha1:6cfcabfb9f60369f6fb4d24465a7c98b1d90b34b</id>
<content type='text'>
9535ce7337 (pack-objects: add list-objects filtering, 2017-11-21)
taught `git pack-objects` to use `--filter`, but required the use of
`--stdout` since a partial clone mechanism was not yet in place to
handle missing objects. Since then, changes like 9e27beaa23
(promisor-remote: implement promisor_remote_get_direct(), 2019-06-25)
and others added support to dynamically fetch objects that were missing.

Even without a promisor remote, filtering out objects can also be useful
if we can put the filtered out objects in a separate pack, and in this
case it also makes sense for pack-objects to write the packfile directly
to an actual file rather than on stdout.

Remove the `--stdout` requirement when using `--filter`, so that in a
follow-up commit, repack can pass `--filter` to pack-objects to omit
certain objects from the resulting packfile.

Signed-off-by: John Cai &lt;johncai86@gmail.com&gt;
Signed-off-by: Christian Couder &lt;chriscool@tuxfamily.org&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>builtin/pack-objects.c: support `--max-pack-size` with `--cruft`</title>
<updated>2023-08-29T18:58:06Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2023-08-28T22:49:07Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=61568efa95608fdafffe67967a82e88bcd90fade'/>
<id>urn:sha1:61568efa95608fdafffe67967a82e88bcd90fade</id>
<content type='text'>
When pack-objects learned the `--cruft` option back in b757353676
(builtin/pack-objects.c: --cruft without expiration, 2022-05-20), we
explicitly forbade `--cruft` with `--max-pack-size`.

At the time, there was no specific rationale given in the patch for not
supporting the `--max-pack-size` option with `--cruft`. (As best I can
remember, it's because we were trying to push users towards only ever
having a single cruft pack, but I cannot be sure).

However, `--max-pack-size` is flexible enough that it already works with
`--cruft` and can shard unreachable objects across multiple cruft packs,
creating separate ".mtimes" files as appropriate. In fact, the
`--max-pack-size` option worked with `--cruft` as far back as
b757353676!

This is because we overwrite the `written_list`, and pass down the
appropriate length, i.e. the number of objects written in each pack
shard.

Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>builtin/pack-objects.c: --cruft without expiration</title>
<updated>2022-05-26T22:48:26Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2022-05-20T23:17:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=b757353676d6ffe1ac275366fa8b5b42b5d9727d'/>
<id>urn:sha1:b757353676d6ffe1ac275366fa8b5b42b5d9727d</id>
<content type='text'>
Teach `pack-objects` how to generate a cruft pack when no objects are
dropped (i.e., `--cruft-expiration=never`). Later patches will teach
`pack-objects` how to generate a cruft pack that prunes objects.

When generating a cruft pack which does not prune objects, we want to
collect all unreachable objects into a single pack (noting and updating
their mtimes as we accumulate them). Ordinary use will pass the result
of a `git repack -A` as a kept pack, so when this patch says "kept
pack", readers should think "reachable objects".

Generating a non-expiring cruft packs works as follows:

  - Callers provide a list of every pack they know about, and indicate
    which packs are about to be removed.

  - All packs which are going to be removed (we'll call these the
    redundant ones) are marked as kept in-core.

    Any packs the caller did not mention (but are known to the
    `pack-objects` process) are also marked as kept in-core. Packs not
    mentioned by the caller are assumed to be unknown to them, i.e.,
    they entered the repository after the caller decided which packs
    should be kept and which should be discarded.

    Since we do not want to include objects in these "unknown" packs
    (because we don't know which of their objects are or aren't
    reachable), these are also marked as kept in-core.

  - Then, we enumerate all objects in the repository, and add them to
    our packing list if they do not appear in an in-core kept pack.

This results in a new cruft pack which contains all known objects that
aren't included in the kept packs. When the kept pack is the result of
`git repack -A`, the resulting pack contains all unreachable objects.

Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>doc: express grammar placeholders between angle brackets</title>
<updated>2021-11-09T17:39:11Z</updated>
<author>
<name>Jean-Noël Avila</name>
<email>jn.avila@free.fr</email>
</author>
<published>2021-11-06T18:48:51Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=49cbad0edd4dcf53e373e9fd27a9c36a41fb044e'/>
<id>urn:sha1:49cbad0edd4dcf53e373e9fd27a9c36a41fb044e</id>
<content type='text'>
This discerns user inputs from verbatim options in the synopsis.

Signed-off-by: Jean-Noël Avila &lt;jn.avila@free.fr&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'jk/doc-max-pack-size'</title>
<updated>2021-07-08T20:15:03Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2021-07-08T20:15:03Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=18b49be4927bb327449624c37eebd13a5b9c1582'/>
<id>urn:sha1:18b49be4927bb327449624c37eebd13a5b9c1582</id>
<content type='text'>
Doc update.

* jk/doc-max-pack-size:
  doc: warn people against --max-pack-size
</content>
</entry>
<entry>
<title>doc: warn people against --max-pack-size</title>
<updated>2021-06-08T23:56:09Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2021-06-08T07:24:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=6fb9195f6c6df047949efcdb8a1fb9eaed0c925e'/>
<id>urn:sha1:6fb9195f6c6df047949efcdb8a1fb9eaed0c925e</id>
<content type='text'>
This option is almost never a good idea, as the resulting repository is
larger and slower (see the new explanations in the docs).

I outlined the potential problems. We could go further and make the
option harder to find (or at least, make the command-line option
descriptions a much more terse "you probably don't want this; see
pack.packsizeLimit for details"). But this seems like a minimal change
that may prevent people from thinking it's more useful than it is.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'tb/geometric-repack'</title>
<updated>2021-03-24T21:36:27Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2021-03-24T21:36:27Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=2744383cbda9bbbe4219bd3532757ae6d28460e1'/>
<id>urn:sha1:2744383cbda9bbbe4219bd3532757ae6d28460e1</id>
<content type='text'>
"git repack" so far has been only capable of repacking everything
under the sun into a single pack (or split by size).  A cleverer
strategy to reduce the cost of repacking a repository has been
introduced.

* tb/geometric-repack:
  builtin/pack-objects.c: ignore missing links with --stdin-packs
  builtin/repack.c: reword comment around pack-objects flags
  builtin/repack.c: be more conservative with unsigned overflows
  builtin/repack.c: assign pack split later
  t7703: test --geometric repack with loose objects
  builtin/repack.c: do not repack single packs with --geometric
  builtin/repack.c: add '--geometric' option
  packfile: add kept-pack cache for find_kept_pack_entry()
  builtin/pack-objects.c: rewrite honor-pack-keep logic
  p5303: measure time to repack with keep
  p5303: add missing &amp;&amp;-chains
  builtin/pack-objects.c: add '--stdin-packs' option
  revision: learn '--no-kept-objects'
  packfile: introduce 'find_kept_pack_entry()'
</content>
</entry>
<entry>
<title>builtin/pack-objects.c: add '--stdin-packs' option</title>
<updated>2021-02-23T07:30:52Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2021-02-23T02:25:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=339bce27f4f2a6f3bfab3e708429c810f4030c43'/>
<id>urn:sha1:339bce27f4f2a6f3bfab3e708429c810f4030c43</id>
<content type='text'>
In an upcoming commit, 'git repack' will want to create a pack comprised
of all of the objects in some packs (the included packs) excluding any
objects in some other packs (the excluded packs).

This caller could iterate those packs themselves and feed the objects it
finds to 'git pack-objects' directly over stdin, but this approach has a
few downsides:

  - It requires every caller that wants to drive 'git pack-objects' in
    this way to implement pack iteration themselves. This forces the
    caller to think about details like what order objects are fed to
    pack-objects, which callers would likely rather not do.

  - If the set of objects in included packs is large, it requires
    sending a lot of data over a pipe, which is inefficient.

  - The caller is forced to keep track of the excluded objects, too, and
    make sure that it doesn't send any objects that appear in both
    included and excluded packs.

But the biggest downside is the lack of a reachability traversal.
Because the caller passes in a list of objects directly, those objects
don't get a namehash assigned to them, which can have a negative impact
on the delta selection process, causing 'git pack-objects' to fail to
find good deltas even when they exist.

The caller could formulate a reachability traversal themselves, but the
only way to drive 'git pack-objects' in this way is to do a full
traversal, and then remove objects in the excluded packs after the
traversal is complete. This can be detrimental to callers who care
about performance, especially in repositories with many objects.

Introduce 'git pack-objects --stdin-packs' which remedies these four
concerns.

'git pack-objects --stdin-packs' expects a list of pack names on stdin,
where 'pack-xyz.pack' denotes that pack as included, and
'^pack-xyz.pack' denotes it as excluded. The resulting pack includes all
objects that are present in at least one included pack, and aren't
present in any excluded pack.

To address the delta selection problem, 'git pack-objects --stdin-packs'
works as follows. First, it assembles a list of objects that it is going
to pack, as above. Then, a reachability traversal is started, whose tips
are any commits mentioned in included packs. Upon visiting an object, we
find its corresponding object_entry in the to_pack list, and set its
namehash parameter appropriately.

To avoid the traversal visiting more objects than it needs to, the
traversal is halted upon encountering an object which can be found in an
excluded pack (by marking the excluded packs as kept in-core, and
passing --no-kept-objects=in-core to the revision machinery).

This can cause the traversal to halt early, for example if an object in
an included pack is an ancestor of ones in excluded packs. But stopping
early is OK, since filling in the namehash fields of objects in the
to_pack list is only additive (i.e., having it helps the delta selection
process, but leaving it blank doesn't impact the correctness of the
resulting pack).

Even still, it is unlikely that this hurts us much in practice, since
the 'git repack --geometric' caller (which is introduced in a later
commit) marks small packs as included, and large ones as excluded.
During ordinary use, the small packs usually represent pushes after a
large repack, and so are unlikely to be ancestors of objects that
already exist in the repository.

(I found it convenient while developing this patch to have 'git
pack-objects' report the number of objects which were visited and got
their namehash fields filled in during traversal. This is also included
in the below patch via trace2 data lines).

Suggested-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Reviewed-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>doc: mention bigFileThreshold for packing</title>
<updated>2021-02-22T21:18:30Z</updated>
<author>
<name>Christian Walther</name>
<email>cwalther@gmx.ch</email>
</author>
<published>2021-02-21T13:23:57Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=3a837b58e33ee67f21a2ca737c6a12d74cee9e5e'/>
<id>urn:sha1:3a837b58e33ee67f21a2ca737c6a12d74cee9e5e</id>
<content type='text'>
Knowing about the core.bigFileThreshold configuration variable is
helpful when examining pack file size differences between repositories.
Add a reference to it to the manpages a user is likely to read in this
situation.

Capitalize CONFIGURATION for consistency with other pages having such a
section.

Signed-off-by: Christian Walther &lt;cwalther@gmx.ch&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>pack-objects: no fetch when allow-{any,promisor}</title>
<updated>2020-08-06T20:01:03Z</updated>
<author>
<name>Jonathan Tan</name>
<email>jonathantanmy@google.com</email>
</author>
<published>2020-08-05T23:06:51Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=ee47243d7636d3d54b727ad24027a9167b68ebb1'/>
<id>urn:sha1:ee47243d7636d3d54b727ad24027a9167b68ebb1</id>
<content type='text'>
The options --missing=allow-{any,promisor} were introduced in caf3827e2f
("rev-list: add list-objects filtering support", 2017-11-22) with the
following note in the commit message:

    This patch introduces handling of missing objects to help
    debugging and development of the "partial clone" mechanism,
    and once the mechanism is implemented, for a power user to
    perform operations that are missing-object aware without
    incurring the cost of checking if a missing link is expected.

The idea that these options are missing-object aware (and thus do not
need to lazily fetch objects, unlike unaware commands that assume that
all objects are present) are assumed in later commits such as 07ef3c6604
("fetch test: use more robust test for filtered objects", 2020-01-15).

However, the current implementations of these options use
has_object_file(), which indeed lazily fetches missing objects. Teach
these implementations not to do so. Also, update the documentation of
these options to be clearer.

Signed-off-by: Jonathan Tan &lt;jonathantanmy@google.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
