<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/packfile.h, branch v2.50.0</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.50.0</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.50.0'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2025-04-29T17:08:12Z</updated>
<entry>
<title>object-store: move function declarations to their respective subsystems</title>
<updated>2025-04-29T17:08:12Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2025-04-29T07:52:18Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=1a793261c53507f7c46f748cc76378a9c5bb05cf'/>
<id>urn:sha1:1a793261c53507f7c46f748cc76378a9c5bb05cf</id>
<content type='text'>
We carry declarations for a couple of functions in "object-store.h" that
are not defined in "object-store.c", but in a different subsystem. Move
these declarations to the respective headers whose matching code files
carry the corresponding definition.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>object-store: move `struct packed_git` into "packfile.h"</title>
<updated>2025-04-29T17:08:11Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2025-04-29T07:52:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=ddb28da58fd657fa672f4605e50e140ce4c662f8'/>
<id>urn:sha1:ddb28da58fd657fa672f4605e50e140ce4c662f8</id>
<content type='text'>
The "object-store.h" header contains the definition of `struct
packed_git`. As this structure hosts all kinds of information about a
specific packfile, it is arguably a bit out of place in a generic header
like "object-store.h".

Move the structure as well as `pack_map_entry_cmp()` into "packfile.h".

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>builtin/pack-objects.c: freshen objects from existing cruft packs</title>
<updated>2025-03-13T18:48:04Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2025-03-13T18:09:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=08f612ba7000bf181ef6d8baed9ece322e567efd'/>
<id>urn:sha1:08f612ba7000bf181ef6d8baed9ece322e567efd</id>
<content type='text'>
Once an object is written into a cruft pack, we can only freshen it by
writing a new loose or packed copy of that object with a more recent
mtime.

Prior to 61568efa95 (builtin/pack-objects.c: support `--max-pack-size`
with `--cruft`, 2023-08-28), we typically had at most one cruft pack in
a repository at any given time. So freshening unreachable objects was
straightforward when already rewriting the cruft pack (and its *.mtimes
file).

But 61568efa95 changes things: 'pack-objects' now supports writing
multiple cruft packs when invoked with `--cruft` and the
`--max-pack-size` flag. Cruft packs are rewritten until they reach some
size threshold, at which point they are considered "frozen", and will
only be modified in a pruning GC, or if the threshold itself is
adjusted.

Prior to this patch, however, this process breaks down when we attempt
to freshen an object packed in an earlier cruft pack, and that cruft
pack is larger than the threshold and thus will survive the repack.

When this is the case, it is impossible to freshen objects in cruft
pack(s) that are larger than the threshold. This is
because we would avoid writing them in the new cruft pack entirely, for
a couple of reasons.

 1. When enumerating packed objects via 'add_objects_in_unpacked_packs()'
    we pass the SKIP_IN_CORE_KEPT_PACKS flag, which is used to avoid looping
    over the packs we're going to retain (which are marked as kept
    in-core by 'read_cruft_objects()').

    This means that we will avoid enumerating additional packed copies
    of objects found in any cruft packs which are larger than the given
    size threshold. Thus there is no opportunity to call
    'create_object_entry()' whatsoever.

 2. We likewise will discard the loose copy (if one exists) of any
    unreachable object packed in a cruft pack that is larger than the
    threshold. Here our call path is 'add_unreachable_loose_objects()',
    which uses the 'add_loose_object()' callback.

    That function will eventually land us in 'want_object_in_pack()'
    (via 'add_cruft_object_entry()'), and we'll discard the object as it
    appears in one of the packs which we marked as kept in-core.

This means in effect that it is impossible to freshen an unreachable
object once it appears in a cruft pack larger than the given threshold.

Instead, we should pack an additional copy of an unreachable object we
want to freshen even if it appears in a cruft pack, provided that the
cruft copy has an mtime which is before the mtime of the copy we are
trying to pack/freshen. This is sub-optimal in the sense that it
requires keeping an additional copy of unreachable objects upon
freshening, but we don't have a better alternative without the ability
to make in-place modifications to existing *.mtimes files.

In order to implement this, we have to adjust the behavior of
'want_found_object()'. When 'pack-objects' is told that we're *not*
going to retain any cruft packs (i.e. the set of packs marked as kept
in-core does not contain a cruft pack), the behavior is unchanged.

But when there *is* at least one cruft pack that we're holding onto, it
is no longer sufficient to reject a copy of an object found in that
cruft pack for that reason alone. In this case, we only want to reject a
candidate object when a copy of that object either:

 - exists in a non-cruft pack that we are retaining, regardless of that
   pack's mtime, or

 - exists in a cruft pack with an mtime at least as recent as the copy
   we are debating whether or not to pack, in which case freshening
   would be redundant.

To do this, keep track of whether or not we have any cruft packs in our
in-core kept list with a new 'ignore_packed_keep_in_core_has_cruft'
flag. When we end up in this new special case, we replace the call to
'has_object_kept_pack()' with 'want_cruft_object_mtime()', and only reject
objects when we have a copy in an existing cruft pack with at least as
recent an mtime as our candidate (in which case "freshening" would be
redundant).
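
To illustrate the rule above, here is a minimal sketch in C. It is not
the code in builtin/pack-objects.c: the struct 'existing_copy' and the
function 'want_candidate()' are hypothetical names, and the real check
works on packs and *.mtimes data rather than on precomputed flags.

<![CDATA[
```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical summary of an already-packed copy of the candidate object. */
struct existing_copy {
	bool in_kept_non_cruft_pack; /* found in a retained non-cruft pack */
	bool in_kept_cruft_pack;     /* found in a retained cruft pack */
	time_t cruft_mtime;          /* mtime recorded for the cruft copy */
};

/*
 * Return true when the candidate copy should be packed, false when it
 * should be rejected. Assumes at least one cruft pack is being retained.
 */
bool want_candidate(const struct existing_copy *copy, time_t candidate_mtime)
{
	/* A copy in a retained non-cruft pack always wins, whatever its mtime. */
	if (copy->in_kept_non_cruft_pack)
		return false;

	/* A cruft copy that is at least as recent makes freshening redundant. */
	if (copy->in_kept_cruft_pack && copy->cruft_mtime >= candidate_mtime)
		return false;

	/* Otherwise pack an additional, fresher copy. */
	return true;
}
```
]]>

With a cruft copy at mtime 50 and a candidate at mtime 100 the sketch
keeps the candidate; with equal mtimes it rejects it, since freshening
would be redundant.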

Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'jk/pack-header-parse-alignment-fix'</title>
<updated>2025-01-28T21:02:23Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2025-01-28T21:02:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=f8b9821f7d4f5239911794d03ac74edf27d711e6'/>
<id>urn:sha1:f8b9821f7d4f5239911794d03ac74edf27d711e6</id>
<content type='text'>
It was possible for "git unpack-objects" and "git index-pack" to
make an unaligned access, which has been corrected.

* jk/pack-header-parse-alignment-fix:
  index-pack, unpack-objects: use skip_prefix to avoid magic number
  index-pack, unpack-objects: use get_be32() for reading pack header
  parse_pack_header_option(): avoid unaligned memory writes
  packfile: factor out --pack_header argument parsing
  bswap.h: squelch potential sparse -Wcast-truncate warnings
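
As a rough illustration of how a big-endian 32-bit field can be read
without an unaligned access, the sketch below assembles the value one
byte at a time. The name 'read_be32_bytes()' is hypothetical; this is
not a copy of the get_be32() helper used by the series.

<![CDATA[
```c
#include <stdint.h>

/*
 * Read a 32-bit big-endian value from p. Only single bytes are loaded,
 * so p does not need any particular alignment.
 */
uint32_t read_be32_bytes(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) |
	       ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) |
	       (uint32_t)p[3];
}
```
]]>

Because no pointer is cast to a wider integer type, the function can be
called on any offset into a byte buffer.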
</content>
</entry>
<entry>
<title>packfile: factor out --pack_header argument parsing</title>
<updated>2025-01-21T16:42:55Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2025-01-19T13:23:37Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=798e0f451661f81f4568dce4033cf1c9076f224f'/>
<id>urn:sha1:798e0f451661f81f4568dce4033cf1c9076f224f</id>
<content type='text'>
Both index-pack and unpack-objects accept a --pack_header argument. This
is an undocumented internal argument used by receive-pack and fetch to
pass along information about the header of the pack, which they've
already read from the incoming stream.

In preparation for a bugfix, let's factor the duplicated code into a
common helper.

The callers are still responsible for identifying the option. While this
could likewise be factored out, it is more flexible this way (e.g., if
they ever started using parse-options and wanted to handle both the
stuck and unstuck forms).

Likewise, the callers are responsible for reporting errors, though they
both just call die(). I've tweaked unpack-objects to match index-pack in
marking the error for translation.
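
For illustration only, a shared parser of this kind could look roughly
like the sketch below. It assumes the option value has the form
"version,entries" with both fields in decimal, and that the header is
the 4-byte "PACK" signature followed by two big-endian 32-bit fields.
The names 'parse_header_value()' and 'put_be32_bytes()' are
hypothetical and are not the helper added by this patch. As described
above, the sketch only returns an error code and leaves reporting to
the caller.

<![CDATA[
```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Store a 32-bit value in big-endian order, one byte at a time. */
void put_be32_bytes(unsigned char *out, uint32_t v)
{
	out[0] = (unsigned char)(v >> 24);
	out[1] = (unsigned char)(v >> 16);
	out[2] = (unsigned char)(v >> 8);
	out[3] = (unsigned char)v;
}

/*
 * Parse "version,entries" into a 12-byte header written to out.
 * Returns 0 on success and -1 on malformed input.
 */
int parse_header_value(const char *in, unsigned char *out)
{
	char *end;
	unsigned long version, entries;

	/* Require a digit so strtoul() cannot accept a sign or whitespace. */
	if (!isdigit((unsigned char)*in))
		return -1;
	errno = 0;
	version = strtoul(in, &end, 10);
	if (*end != ',' || errno || version > UINT32_MAX)
		return -1;

	in = end + 1;
	if (!isdigit((unsigned char)*in))
		return -1;
	entries = strtoul(in, &end, 10);
	if (*end != '\0' || errno || entries > UINT32_MAX)
		return -1;

	memcpy(out, "PACK", 4);
	put_be32_bytes(out + 4, (uint32_t)version);
	put_be32_bytes(out + 8, (uint32_t)entries);
	return 0;
}
```
]]>

A caller would first recognize the option itself, pass the remainder of
the argument to the parser, and die() with its own message when the
parser returns -1.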

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>config: make `packed_git_(limit|window_size)` non-global variables</title>
<updated>2024-12-03T23:21:55Z</updated>
<author>
<name>Karthik Nayak</name>
<email>karthik.188@gmail.com</email>
</author>
<published>2024-12-03T14:44:02Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=d284713bae71877577cf1a07501c8528f8c44bb2'/>
<id>urn:sha1:d284713bae71877577cf1a07501c8528f8c44bb2</id>
<content type='text'>
The variables `packed_git_window_size` and `packed_git_limit` are global
config variables used in the `packfile.c` file. Since they are only used in
this file, let's change them from being global config variables to
local variables for the subsystem.

With this, we rid `packfile.c` of all global variable usage and this
means we can also remove the `USE_THE_REPOSITORY_VARIABLE` guard from
the file.

Helped-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Karthik Nayak &lt;karthik.188@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>packfile: pass down repository to `for_each_packed_object`</title>
<updated>2024-12-03T23:21:54Z</updated>
<author>
<name>Karthik Nayak</name>
<email>karthik.188@gmail.com</email>
</author>
<published>2024-12-03T14:44:00Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=c87910b96b1223d4a1dfe65c04e03863fadef6bc'/>
<id>urn:sha1:c87910b96b1223d4a1dfe65c04e03863fadef6bc</id>
<content type='text'>
The function `for_each_packed_object` currently relies on the global
variable `the_repository`. To eliminate global variable usage in
`packfile.c`, we should progressively shift the dependency on
the_repository to higher layers. Let's remove its usage from this
function and the closely related function `is_promisor_object`.

Signed-off-by: Karthik Nayak &lt;karthik.188@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>packfile: pass down repository to `has_object[_kept]_pack`</title>
<updated>2024-12-03T23:21:54Z</updated>
<author>
<name>Karthik Nayak</name>
<email>karthik.188@gmail.com</email>
</author>
<published>2024-12-03T14:43:59Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=cc656f4eb2b7b10bc530c96844909c869bdd1fdf'/>
<id>urn:sha1:cc656f4eb2b7b10bc530c96844909c869bdd1fdf</id>
<content type='text'>
The functions `has_object[_kept]_pack` currently rely on the global
variable `the_repository`. To eliminate global variable usage in
`packfile.c`, we should progressively shift the dependency on
the_repository to higher layers. Let's remove its usage from these
functions and any related ones.

Signed-off-by: Karthik Nayak &lt;karthik.188@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>packfile: pass down repository to `odb_pack_name`</title>
<updated>2024-12-03T23:21:54Z</updated>
<author>
<name>Karthik Nayak</name>
<email>karthik.188@gmail.com</email>
</author>
<published>2024-12-03T14:43:58Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=873b00597bbf20c1bcda089a687641167b148fa2'/>
<id>urn:sha1:873b00597bbf20c1bcda089a687641167b148fa2</id>
<content type='text'>
The function `odb_pack_name` currently relies on the global variable
`the_repository`. To eliminate global variable usage in `packfile.c`, we
should progressively shift the dependency on the_repository to higher
layers.

Signed-off-by: Karthik Nayak &lt;karthik.188@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>packfile: add repository to struct `packed_git`</title>
<updated>2024-12-03T23:21:53Z</updated>
<author>
<name>Karthik Nayak</name>
<email>karthik.188@gmail.com</email>
</author>
<published>2024-12-03T14:43:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=2cf3fe63f6eedd6d132c530b897595345a05088b'/>
<id>urn:sha1:2cf3fe63f6eedd6d132c530b897595345a05088b</id>
<content type='text'>
The struct `packed_git` holds information regarding a packed object
file. Let's add the repository variable to this object, to represent the
repository that this packfile belongs to. This helps remove dependency
on the global `the_repository` object in `packfile.c` by simply using
repository information now readily available in the struct.

We do need to consider that a packfile could be part of the alternates
of a repository, but considering that we only have one repository struct
and also that we currently use 'the_repository' anyway, we should be
OK with this change.

We also modify `alloc_packed_git` to ensure that the repository is added
to newly created `packed_git` structs. This requires modifying the
function and all its callers to pass the repository object down the
levels.
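
The general pattern can be sketched as follows. This is a simplified
illustration, not the actual definitions: 'struct repo_sketch',
'struct pack_sketch', 'alloc_pack_sketch()' and 'pack_gitdir()' are
hypothetical stand-ins for the real repository and `packed_git` types.

<![CDATA[
```c
#include <stdlib.h>

/* Hypothetical stand-in for a repository object. */
struct repo_sketch {
	const char *gitdir;
};

/* Hypothetical stand-in for a pack that remembers its repository. */
struct pack_sketch {
	struct repo_sketch *repo; /* repository this pack belongs to */
	const char *name;
};

/*
 * The allocator takes the repository explicitly, so every new pack
 * carries it from the start. Returns NULL if allocation fails.
 */
struct pack_sketch *alloc_pack_sketch(struct repo_sketch *repo, const char *name)
{
	struct pack_sketch *p = calloc(1, sizeof(*p));
	if (!p)
		return NULL;
	p->repo = repo;
	p->name = name;
	return p;
}

/* Code that used to reach for a global can use the pack's own pointer. */
const char *pack_gitdir(const struct pack_sketch *p)
{
	return p->repo->gitdir;
}
```
]]>

Functions that receive a pack can then look up repository state through
the pack itself instead of through a global variable.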

Helped-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Karthik Nayak &lt;karthik.188@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
