<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/pack-objects.h, branch v2.50.0</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.50.0</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.50.0'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2025-04-29T17:08:11Z</updated>
<entry>
<title>object-store: move `struct packed_git` into "packfile.h"</title>
<updated>2025-04-29T17:08:11Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2025-04-29T07:52:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=ddb28da58fd657fa672f4605e50e140ce4c662f8'/>
<id>urn:sha1:ddb28da58fd657fa672f4605e50e140ce4c662f8</id>
<content type='text'>
The "object-store.h" header contains the definition of `struct
packed_git`. As this structure hosts all kind of information about a
specific packfile it is arguably a bit out of place in a generic place
like "object-store.h".

Move the structure as well as `pack_map_entry_cmp()` into "packfile.h".

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>object-store: merge "object-store-ll.h" and "object-store.h"</title>
<updated>2025-04-15T15:24:37Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2025-04-15T09:38:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=68cd492a3e662c75dec364986c81e94716d4ac56'/>
<id>urn:sha1:68cd492a3e662c75dec364986c81e94716d4ac56</id>
<content type='text'>
The "object-store-ll.h" header has been introduced to keep transitive
header dependendcies and compile times at bay. Now that we have created
a new "object-store.c" file though we can easily move the last remaining
additional bit of "object-store.h", the `odb_path_map`, out of the
header.

Do so. As the "object-store.h" header is now equivalent to its low-level
alternative we drop the latter and inline it into the former.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'ds/name-hash-tweaks'</title>
<updated>2025-02-12T18:08:51Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2025-02-12T18:08:51Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=aae91a86fb2a71ff89a71b63ccec3a947b26ca51'/>
<id>urn:sha1:aae91a86fb2a71ff89a71b63ccec3a947b26ca51</id>
<content type='text'>
"git pack-objects" and its wrapper "git repack" learned an option
to use an alternative path-hash function to improve delta-base
selection to produce a packfile with deeper history than window
size.

* ds/name-hash-tweaks:
  pack-objects: prevent name hash version change
  test-tool: add helper for name-hash values
  p5313: add size comparison test
  pack-objects: add GIT_TEST_NAME_HASH_VERSION
  repack: add --name-hash-version option
  pack-objects: add --name-hash-version option
  pack-objects: create new name-hash function version
</content>
</entry>
<entry>
<title>pack-objects: create new name-hash function version</title>
<updated>2025-01-27T21:21:05Z</updated>
<author>
<name>Jonathan Tan</name>
<email>jonathantanmy@google.com</email>
</author>
<published>2025-01-27T19:02:28Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=dca924b4508e3147e82be5bba10cfc8aefa7ecf0'/>
<id>urn:sha1:dca924b4508e3147e82be5bba10cfc8aefa7ecf0</id>
<content type='text'>
As we will explore in later changes, the default name-hash function used
in 'git pack-objects' has a tendency to cause collisions and cause poor
delta selection. This change creates an alternative that avoids some
collisions while preserving some amount of hash locality.

The pack_name_hash() method has not been materially changed since it was
introduced in ce0bd64 (pack-objects: improve path grouping
heuristics., 2006-06-05). The intention here is to group objects by path
name, but also attempt to group similar file types together by making
the most-significant digits of the hash be focused on the final
characters.

Here's the crux of the implementation:

	/*
	 * This effectively just creates a sortable number from the
	 * last sixteen non-whitespace characters. Last characters
	 * count "most", so things that end in ".c" sort together.
	 */
	while ((c = *name++) != 0) {
		if (isspace(c))
			continue;
		hash = (hash &gt;&gt; 2) + (c &lt;&lt; 24);
	}

As the comment mentions, this only cares about the last sixteen
non-whitespace characters. This cause some filenames to collide more than
others. This collision is somewhat by design in order to promote hash
locality for files that have similar types (.c, .h, .json) or could be the
same file across a directory rename (a/foo.txt to b/foo.txt). This leads to
decent cross-path deltas in cases like shallow clones or packing a
repository with very few historical versions of files that share common data
with other similarly-named files.

However, when the name-hash instead leads to a large number of name-hash
collisions for otherwise unrelated files, this can lead to confusing the
delta calculation to prefer cross-path deltas over previous versions of the
same file.

The new pack_name_hash_v2() function attempts to fix this issue by
taking more of the directory path into account through its hash
function. Its naming implies that we will later wire up details for
choosing a name-hash function by version.

The first change is to be more careful about paths using non-ASCII
characters. With these characters in mind, reverse the bits in the byte
as the least-significant bits have the highest entropy and we want to
maximize their influence. This is done with some bit manipulation that
swaps the two halves, then the quarters within those halves, and then
the bits within those quarters.

The second change is to perform hash composition operations at every
level of the path. This is done by storing a 'base' hash value that
contains the hash of the parent directory. When reaching a directory
boundary, we XOR the current level's name-hash value with a downshift of
the previous level's hash. This perturbation intends to create low-bit
distinctions for paths with the same final 16 bytes but distinct parent
directory structures.

The collision rate and effectiveness of this hash function will be
explored in later changes as the function is integrated with 'git
pack-objects' and 'git repack'.

Signed-off-by: Jonathan Tan &lt;jonathantanmy@google.com&gt;
Signed-off-by: Derrick Stolee &lt;stolee@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>config: make `delta_base_cache_limit` a non-global variable</title>
<updated>2024-12-03T23:21:55Z</updated>
<author>
<name>Karthik Nayak</name>
<email>karthik.188@gmail.com</email>
</author>
<published>2024-12-03T14:44:01Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=d6b2d21fbf269db7a6be56d28a62cb65a7d7a660'/>
<id>urn:sha1:d6b2d21fbf269db7a6be56d28a62cb65a7d7a660</id>
<content type='text'>
The `delta_base_cache_limit` variable is a global config variable used
by multiple subsystems. Let's make this non-global, by adding this
variable independently to the subsystems where it is used.

First, add the setting to the `repo_settings` struct, this provides
access to the config in places where the repository is available. Use
this in `packfile.c`.

In `index-pack.c` we add it to the `pack_idx_option` struct and its
constructor. While the repository struct is available here, it may not
be set  because `git index-pack` can be used without a repository.

In `gc.c` add it to the `gc_config` struct and also the constructor
function. The gc functions currently do not have direct access to a
repository struct.

These changes are made to remove the usage of `delta_base_cache_limit`
as a global variable in `packfile.c`. This brings us one step closer to
removing the `USE_THE_REPOSITORY_VARIABLE` definition in `packfile.c`
which we complete in the next patch.

Signed-off-by: Karthik Nayak &lt;karthik.188@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>pack-objects: free packing_data in more places</title>
<updated>2023-12-14T22:38:07Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2023-12-14T22:23:39Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=66f0c71073ee5fe1c9d12d2952305a4793d7b43f'/>
<id>urn:sha1:66f0c71073ee5fe1c9d12d2952305a4793d7b43f</id>
<content type='text'>
The pack-objects internals use a packing_data struct to track what
objects are part of the pack(s) being formed.

Since these structures contain allocated fields, failing to
appropriately free() them results in a leak. Plug that leak by
introducing a clear_packing_data() function, and call it in the
appropriate spots.

This is a fairly straightforward leak to plug, since none of the callers
expect to read any values or have any references to parts of the address
space being freed.

Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>object-store-ll.h: split this header out of object-store.h</title>
<updated>2023-06-21T20:39:54Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2023-05-16T06:34:06Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=a034e9106ff1a4cb6fcb6f2ea3a1a47b4d2ba173'/>
<id>urn:sha1:a034e9106ff1a4cb6fcb6f2ea3a1a47b4d2ba173</id>
<content type='text'>
The vast majority of files including object-store.h did not need dir.h
nor khash.h.  Split the header into two files, and let most just depend
upon object-store-ll.h, while letting the two callers that need it
depend on the full object-store.h.

After this patch:
    $ git grep -h include..object-store | sort | uniq -c
          2 #include "object-store.h"
        129 #include "object-store-ll.h"

Diff best viewed with `--color-moved`.

Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>pack-objects.h: remove outdated pahole results</title>
<updated>2022-06-28T22:39:03Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2022-06-28T18:30:20Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=14deb585fb9bd98521ceb189a2d2dcdae9f44606'/>
<id>urn:sha1:14deb585fb9bd98521ceb189a2d2dcdae9f44606</id>
<content type='text'>
The size and padding of `struct object_entry` is an important factor in
determining the memory usage of `pack-objects`. For this reason,
3b13a5f263 (pack-objects: reorder members to shrink struct object_entry,
2018-04-14) added a comment containing some information from pahole
indicating the size and padding of that struct.

Unfortunately, this comment hasn't been updated since 9ac3f0e5b3
(pack-objects: fix performance issues on packing large deltas,
2018-07-22), despite the size of this struct changing many times since
that commit.

To see just how often the size of object_entry changes, I skimmed the
first-parent history with this script:

    for sha in $(git rev-list --first-parent --reverse 9ac3f0e..)
    do
      echo -n "$sha "
      git checkout -q $sha
      make -s pack-objects.o 2&gt;/dev/null
      pahole -C object_entry pack-objects.o | sed -n \
        -e 's/\/\* size: \([0-9]*\).*/size \1/p' \
        -e 's/\/\*.*padding: \([0-9]*\).*/padding \1/p' | xargs
    done | uniq -f1

In between each merge, the size of object_entry changes too often to
record every instance here. But the important merges (along with their
corresponding sizes and bit paddings) in chronological order are:

    ad635e82d6 (Merge branch 'nd/pack-objects-pack-struct', 2018-05-23) size 80 padding 4
    29d9e3e2c4 (Merge branch 'nd/pack-deltify-regression-fix', 2018-08-22) size 80 padding 9
    3ebdef2e1b (Merge branch 'jk/pack-delta-reuse-with-bitmap', 2018-09-17) size 80 padding 8
    33e4ae9c50 (Merge branch 'bc/sha-256', 2019-01-29) size 96 padding 8

(indicating that the current size of the struct is 96 bytes, with 8
padding bits).

Even though this comment was written in a good spirit, it is updated
infrequently enough that it serves to confuse rather than to encourage
contributors to update the appropriate values when the modify the
definition of object_entry.

For that reason, eliminate the confusion by removing the comment
altogether.

Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>pack-mtimes: support writing pack .mtimes files</title>
<updated>2022-05-26T22:48:26Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2022-05-20T23:17:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=5dfaf49a5a651d3e8c3552bc2e833312a3671a4f'/>
<id>urn:sha1:5dfaf49a5a651d3e8c3552bc2e833312a3671a4f</id>
<content type='text'>
Now that the `.mtimes` format is defined, supplement the pack-write API
to be able to conditionally write an `.mtimes` file along with a pack by
setting an additional flag and passing an oidmap that contains the
timestamps corresponding to each object in the pack.

Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>pack-objects: move static inline from a header to the sole consumer</title>
<updated>2021-05-27T03:14:41Z</updated>
<author>
<name>Ævar Arnfjörð Bjarmason</name>
<email>avarab@gmail.com</email>
</author>
<published>2021-05-27T00:52:51Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=7d089fb9b73a0d7f9bf43a81f284fd4f499d4886'/>
<id>urn:sha1:7d089fb9b73a0d7f9bf43a81f284fd4f499d4886</id>
<content type='text'>
Move the code that is only used in builtin/pack-objects.c out of
pack-objects.h.

This fixes an issue where Solaris's SunCC hasn't been able to compile
git since 483fa7f42d9 (t/helper/test-bitmap.c: initial commit,
2021-03-31).

The real origin of that issue is that in 898eba5e630 (pack-objects:
refer to delta objects by index instead of pointer, 2018-04-14)
utility functions only needed by builtin/pack-objects.c were added to
pack-objects.h. Since then the header has been used in a few other
places, but 483fa7f42d9 was the first time it was used by test helper.

Since Solaris is stricter about linking and the oe_get_size_slow()
function lives in builtin/pack-objects.c the build started failing
with:

    Undefined                       first referenced
     symbol                             in file
    oe_get_size_slow                    t/helper/test-bitmap.o
    ld: fatal: symbol referencing errors. No output written to t/helper/test-tool

On other platforms this is presumably OK because the compiler and/or
linker detects that the "static inline" functions that reference
oe_get_size_slow() aren't used.

Let's solve this by moving the relevant code from pack-objects.h to
builtin/pack-objects.c. This is almost entirely a code-only move, but
because of the early macro definitions in that file referencing some
of these inline functions we need to move the definition of "static
struct packing_data to_pack" earlier, and declare these inline
functions above the macros.

Signed-off-by: Ævar Arnfjörð Bjarmason &lt;avarab@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
