<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/builtin/pack-objects.c, branch v2.48.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.48.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.48.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2024-12-23T17:32:11Z</updated>
<entry>
<title>Merge branch 'ps/build-sign-compare'</title>
<updated>2024-12-23T17:32:11Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2024-12-23T17:32:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=4156b6a741c7fb15a4eccb320612fb6e453f439c'/>
<id>urn:sha1:4156b6a741c7fb15a4eccb320612fb6e453f439c</id>
<content type='text'>
Start working to make the codebase buildable with -Wsign-compare.

* ps/build-sign-compare:
  t/helper: don't depend on implicit wraparound
  scalar: address -Wsign-compare warnings
  builtin/patch-id: fix type of `get_one_patchid()`
  builtin/blame: fix type of `length` variable when emitting object ID
  gpg-interface: address -Wsign-comparison warnings
  daemon: fix type of `max_connections`
  daemon: fix loops that have mismatching integer types
  global: trivial conversions to fix `-Wsign-compare` warnings
  pkt-line: fix -Wsign-compare warning on 32 bit platform
  csum-file: fix -Wsign-compare warning on 32-bit platform
  diff.h: fix index used to loop through unsigned integer
  config.mak.dev: drop `-Wno-sign-compare`
  global: mark code units that generate warnings with `-Wsign-compare`
  compat/win32: fix -Wsign-compare warning in "wWinMain()"
  compat/regex: explicitly ignore "-Wsign-compare" warnings
  git-compat-util: introduce macros to disable "-Wsign-compare" warnings
</content>
</entry>
<entry>
<title>global: mark code units that generate warnings with `-Wsign-compare`</title>
<updated>2024-12-06T11:20:02Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-12-06T10:27:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=41f43b8243f42b9df2e98be8460646d4c0100ad3'/>
<id>urn:sha1:41f43b8243f42b9df2e98be8460646d4c0100ad3</id>
<content type='text'>
Mark code units that generate warnings with `-Wsign-compare`. This
allows for a structured approach to get rid of all such warnings over
time in a way that can be easily measured.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'kn/pass-repo-to-builtin-sub-sub-commands' into kn/midx-wo-the-repository</title>
<updated>2024-12-04T01:32:02Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2024-12-04T01:32:02Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=aaafb67ba92939125d909166e74eb36a1e652fd4'/>
<id>urn:sha1:aaafb67ba92939125d909166e74eb36a1e652fd4</id>
<content type='text'>
* kn/pass-repo-to-builtin-sub-sub-commands:
  builtin: pass repository to sub commands
  Git 2.47.1
  Makefile(s): avoid recipe prefix in conditional statements
  doc: switch links to https
  doc: update links to current pages
  The eleventh batch
  pack-objects: only perform verbatim reuse on the preferred pack
  t5332-multi-pack-reuse.sh: demonstrate duplicate packing failure
  test-lib: move malloc-debug setup after $PATH setup
  builtin/difftool: intialize some hashmap variables
  refspec: store raw refspecs inside refspec_item
  refspec: drop separate raw_nr count
  fetch: adjust refspec-&gt;raw_nr when filtering prefetch refspecs
  test-lib: check malloc debug LD_PRELOAD before using
</content>
</entry>
<entry>
<title>Merge branch 'kn/the-repository' into kn/midx-wo-the-repository</title>
<updated>2024-12-04T01:31:46Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2024-12-04T01:31:46Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=33833ed08b67d1c43f4121ce8b1ec7e03b734dea'/>
<id>urn:sha1:33833ed08b67d1c43f4121ce8b1ec7e03b734dea</id>
<content type='text'>
* kn/the-repository:
  packfile.c: remove unnecessary prepare_packed_git() call
  midx: add repository to `multi_pack_index` struct
  config: make `packed_git_(limit|window_size)` non-global variables
  config: make `delta_base_cache_limit` a non-global variable
  packfile: pass down repository to `for_each_packed_object`
  packfile: pass down repository to `has_object[_kept]_pack`
  packfile: pass down repository to `odb_pack_name`
  packfile: pass `repository` to static function in the file
  packfile: use `repository` from `packed_git` directly
  packfile: add repository to struct `packed_git`
</content>
</entry>
<entry>
<title>packfile: pass down repository to `for_each_packed_object`</title>
<updated>2024-12-03T23:21:54Z</updated>
<author>
<name>Karthik Nayak</name>
<email>karthik.188@gmail.com</email>
</author>
<published>2024-12-03T14:44:00Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=c87910b96b1223d4a1dfe65c04e03863fadef6bc'/>
<id>urn:sha1:c87910b96b1223d4a1dfe65c04e03863fadef6bc</id>
<content type='text'>
The function `for_each_packed_object` currently relies on the global
variable `the_repository`. To eliminate global variable usage in
`packfile.c`, we should progressively shift the dependency on
the_repository to higher layers. Let's remove its usage from this
function and closely related function `is_promisor_object`.

Signed-off-by: Karthik Nayak &lt;karthik.188@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>packfile: pass down repository to `has_object[_kept]_pack`</title>
<updated>2024-12-03T23:21:54Z</updated>
<author>
<name>Karthik Nayak</name>
<email>karthik.188@gmail.com</email>
</author>
<published>2024-12-03T14:43:59Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=cc656f4eb2b7b10bc530c96844909c869bdd1fdf'/>
<id>urn:sha1:cc656f4eb2b7b10bc530c96844909c869bdd1fdf</id>
<content type='text'>
The functions `has_object[_kept]_pack` currently rely on the global
variable `the_repository`. To eliminate global variable usage in
`packfile.c`, we should progressively shift the dependency on
the_repository to higher layers. Let's remove its usage from these
functions and any related ones.

Signed-off-by: Karthik Nayak &lt;karthik.188@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'tb/multi-pack-reuse-dupfix'</title>
<updated>2024-11-22T05:34:19Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2024-11-22T05:34:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=0a83b39594ae2b6f04d0e90d02b6ad9600e12551'/>
<id>urn:sha1:0a83b39594ae2b6f04d0e90d02b6ad9600e12551</id>
<content type='text'>
Object reuse code based on multi-pack-index sent an unwanted copy
of object.

* tb/multi-pack-reuse-dupfix:
  pack-objects: only perform verbatim reuse on the preferred pack
  t5332-multi-pack-reuse.sh: demonstrate duplicate packing failure
</content>
</entry>
<entry>
<title>pack-objects: only perform verbatim reuse on the preferred pack</title>
<updated>2024-11-15T00:13:31Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2024-11-14T13:42:12Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=e1992905927cfd1c6503ecad22f538f5f2143b5e'/>
<id>urn:sha1:e1992905927cfd1c6503ecad22f538f5f2143b5e</id>
<content type='text'>
When reusing objects from source pack(s), write_reused_pack_verbatim()
is responsible for reusing objects whole eword_t's at a time. It works
by taking the longest continuous run of objects from the beginning of
each source pack that the caller wants, and reuses the entirety of that
section from each pack.

This is based on the assumption that we don't have any gaps within the
region. This assumption relieves us from having to patch any
OFS_DELTAs, since we know that there aren't any gaps between any delta
and its base in that region.

To illustrate why this assumption is necessary, suppose we have some
pack P, which has objects X, Y, and Z. If the MIDX's copy of Y was
selected from a pack other than P, then the bit corresponding to object
Y will appear earlier in the bitmap than the bits corresponding to X and
Z.

If pack-objects already has or will use the copy of Y from the pack it
was selected from in the MIDX, then it is an error to reuse all objects
between X and Z in the source pack. Doing so will cause us to reuse Y
from a different pack than the one which represents Y in the MIDX,
causing us to either:

 - include the object twice, assuming that the caller wants Y in the
   pack, or

 - include the object once, resulting in us packing more objects than
   necessary.

This regression comes from ca0fd69e37 (pack-objects: prepare
`write_reused_pack_verbatim()` for multi-pack reuse, 2023-12-14), which
incorrectly assumed that there would be no gaps in reusable regions of
non-preferred packs.

Instead, we can only safely perform the whole-word reuse optimization on
the preferred pack, where we know with certainty that no gaps exist in
that region of the bitmap. We can still reuse objects from non-preferred
packs, but we have to inspect them individually in write_reused_pack()
to ensure that any gaps that may exist are accounted for.

This allows us to simplify the implementation of
write_reused_pack_verbatim() back to almost its pre-multi-pack reuse
form, since we can now assume that the beginning of the pack appears at
the beginning of the bitmap, meaning that we don't have to account for
any bits up to the first word boundary (like we had to special case in
ca0fd69e37).

The only significant changes from the pre-ca0fd69e37 implementation are:

 - that we can no longer inspect words up to the end of
   reuse_packfile_bitmap-&gt;word_alloc, since we only want to look at
   words whose bits all correspond to objects in the given packfile, and

 - that we return early when given a reuse_packfile which is not
   preferred, making the call a noop.

In the future, it might be possible to restore this optimization if we
could guarantee that some reuse packs don't contain any gaps by
construction (similar to the "disjoint packs" idea in very early
versions of multi-pack reuse).

Helped-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>index-pack: repack local links into promisor packs</title>
<updated>2024-11-12T01:18:16Z</updated>
<author>
<name>Jonathan Tan</name>
<email>jonathantanmy@google.com</email>
</author>
<published>2024-11-01T20:11:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=c08589efdc57604590c5892b52c6820d00f6e0fd'/>
<id>urn:sha1:c08589efdc57604590c5892b52c6820d00f6e0fd</id>
<content type='text'>
Teach index-pack to, when processing the objects in a pack with
--promisor specified on the CLI, repack local objects (and the local
objects that they refer to, recursively) referenced by these objects
into promisor packs.

This prevents the situation in which, when fetching from a promisor
remote, we end up with promisor objects (newly fetched) referring
to non-promisor objects (locally created prior to the fetch). This
situation may arise if the client had previously pushed objects to the
remote, for example. One issue that arises in this situation is that,
if the non-promisor objects become inaccessible except through promisor
objects (for example, if the branch pointing to them has moved to
point to the promisor object that refers to them), then GC will garbage
collect them. There are other ways to solve this, but the simplest
seems to be to enforce the invariant that we don't have promisor objects
referring to non-promisor objects.

This repacking is done from index-pack to minimize the performance
impact. During a fetch, the only time most objects are fully inflated
in memory is when their object ID is computed, so we also scan the
objects (to see which objects they refer to) during this time.

Also to minimize the performance impact, an object is calculated to be
local if it's a loose object or present in a non-promisor pack. (If it's
also in a promisor pack or referred to by an object in a promisor pack,
it is technically already a promisor object. But a misidentification
of a promisor object as a non-promisor object is relatively benign
here - we will thus repack that promisor object into a promisor pack,
duplicating it in the object store, but there is no correctness issue,
just an issue of inefficiency.)

Signed-off-by: Jonathan Tan &lt;jonathantanmy@google.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>packfile: use object_id in find_pack_entry_one()</title>
<updated>2024-10-25T21:35:46Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2024-10-25T07:06:06Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=479ab76c9ffbd35585a1506ac5c99fe218df70b9'/>
<id>urn:sha1:479ab76c9ffbd35585a1506ac5c99fe218df70b9</id>
<content type='text'>
The main function we use to search a pack index for an object is
find_pack_entry_one(). That function still takes a bare pointer to the
hash, despite the fact that its underlying bsearch_pack() function needs
an object_id struct. And so we end up making an extra copy of the hash
into the struct just to do a lookup.

As it turns out, all callers but one already have such an object_id. So
we can just take a pointer to that struct and use it directly. This
avoids the extra copy and provides a more type-safe interface.

The one exception is get_delta_base() in packfile.c, when we are chasing
a REF_DELTA from inside the pack (and thus we have a pointer directly to
the mmap'd pack memory, not a struct). We can just bump the hashcpy()
from inside find_pack_entry_one() to this one caller that needs it.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
</content>
</entry>
</feed>
