<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/builtin/pack-objects.c, branch v2.34.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.34.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.34.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2021-09-20T22:20:42Z</updated>
<entry>
<title>Merge branch 'tb/pack-finalize-ordering'</title>
<updated>2021-09-20T22:20:42Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2021-09-20T22:20:42Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=a1af5333234b42e9e44729280a0e92b989bd865d'/>
<id>urn:sha1:a1af5333234b42e9e44729280a0e92b989bd865d</id>
<content type='text'>
The order in which various files that make up a single (conceptual)
packfile has been reevaluated and straightened up.  This matters in
correctness, as an incomplete set of files must not be shown to a
running Git.

* tb/pack-finalize-ordering:
  pack-objects: rename .idx files into place after .bitmap files
  pack-write: split up finish_tmp_packfile() function
  builtin/index-pack.c: move `.idx` files into place last
  index-pack: refactor renaming in final()
  builtin/repack.c: move `.idx` files into place last
  pack-write.c: rename `.idx` files after `*.rev`
  pack-write: refactor renaming in finish_tmp_packfile()
  bulk-checkin.c: store checksum directly
  pack.h: line-wrap the definition of finish_tmp_packfile()
</content>
</entry>
<entry>
<title>Merge branch 'tb/multi-pack-bitmaps'</title>
<updated>2021-09-20T22:20:39Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2021-09-20T22:20:39Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=0649303820cf88fb5a6ab440af15c8d6b8799d3f'/>
<id>urn:sha1:0649303820cf88fb5a6ab440af15c8d6b8799d3f</id>
<content type='text'>
The reachability bitmap file used to be generated only for a single
pack, but now we've learned to generate bitmaps for history that
span across multiple packfiles.

* tb/multi-pack-bitmaps: (29 commits)
  pack-bitmap: drop bitmap_index argument from try_partial_reuse()
  pack-bitmap: drop repository argument from prepare_midx_bitmap_git()
  p5326: perf tests for MIDX bitmaps
  p5310: extract full and partial bitmap tests
  midx: respect 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP'
  t7700: update to work with MIDX bitmap test knob
  t5319: don't write MIDX bitmaps in t5319
  t5310: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP
  t0410: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP
  t5326: test multi-pack bitmap behavior
  t/helper/test-read-midx.c: add --checksum mode
  t5310: move some tests to lib-bitmap.sh
  pack-bitmap: write multi-pack bitmaps
  pack-bitmap: read multi-pack bitmaps
  pack-bitmap.c: avoid redundant calls to try_partial_reuse
  pack-bitmap.c: introduce 'bitmap_is_preferred_refname()'
  pack-bitmap.c: introduce 'nth_bitmap_object_oid()'
  pack-bitmap.c: introduce 'bitmap_num_objects()'
  midx: avoid opening multiple MIDXs when writing
  midx: close linked MIDXs, avoid leaking memory
  ...
</content>
</entry>
<entry>
<title>pack-objects: rename .idx files into place after .bitmap files</title>
<updated>2021-09-10T01:23:11Z</updated>
<author>
<name>Ævar Arnfjörð Bjarmason</name>
<email>avarab@gmail.com</email>
</author>
<published>2021-09-09T23:25:00Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=4bc1fd6e3941be74027594efad3d2358a93702df'/>
<id>urn:sha1:4bc1fd6e3941be74027594efad3d2358a93702df</id>
<content type='text'>
In preceding commits the race of renaming .idx files in place before
.rev files and other auxiliary files was fixed in pack-write.c's
finish_tmp_packfile(), builtin/repack.c's "struct exts", and
builtin/index-pack.c's final(). As noted in the change to pack-write.c
we left in place the issue of writing *.bitmap files after the *.idx,
let's fix that issue.

See 7cc8f971085 (pack-objects: implement bitmap writing, 2013-12-21)
for commentary at the time when *.bitmap was implemented about how
those files are written out, nothing in that commit contradicts what's
being done here.

Note that this commit and preceding ones only close any race condition
with *.idx files being written before their auxiliary files if we're
optimistic about our lack of fsync()-ing in this are not tripping us
over. See the thread at [1] for a rabbit hole of various discussions
about filesystem races in the face of doing and not doing fsync() (and
if doing fsync(), not doing it properly).

We may want to fsync the containing directory once after renaming the
*.idx file into place, but that is outside the scope of this series.

1. https://lore.kernel.org/git/8735qgkvv1.fsf@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason &lt;avarab@gmail.com&gt;
Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>pack-write: split up finish_tmp_packfile() function</title>
<updated>2021-09-10T01:23:11Z</updated>
<author>
<name>Ævar Arnfjörð Bjarmason</name>
<email>avarab@gmail.com</email>
</author>
<published>2021-09-09T23:24:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=2ec02dd5a8261bc837b961ef36788081ded5c2bc'/>
<id>urn:sha1:2ec02dd5a8261bc837b961ef36788081ded5c2bc</id>
<content type='text'>
Split up the finish_tmp_packfile() function and use the split-up version
in pack-objects.c in preparation for moving the step of renaming the
*.idx file later as part of a function change.

Since the only other caller of finish_tmp_packfile() was in
bulk-checkin.c, and it won't be needing a change to its *.idx renaming,
provide a thin wrapper for the old function as a static function in that
file. If other callers end up needing the simpler version it could be
moved back to "pack-write.c" and "pack.h".

Signed-off-by: Ævar Arnfjörð Bjarmason &lt;avarab@gmail.com&gt;
Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>pack-write: refactor renaming in finish_tmp_packfile()</title>
<updated>2021-09-10T01:23:11Z</updated>
<author>
<name>Ævar Arnfjörð Bjarmason</name>
<email>avarab@gmail.com</email>
</author>
<published>2021-09-09T23:24:37Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=66833f0e70c473ca6c4e6a79d34e879d8b40ba9d'/>
<id>urn:sha1:66833f0e70c473ca6c4e6a79d34e879d8b40ba9d</id>
<content type='text'>
Refactor the renaming in finish_tmp_packfile() into a helper function.
The callers are now expected to pass a "name_buffer" ending in
"pack-OID." instead of the previous "pack-", we then append "pack",
"idx" or "rev" to it.

By doing the strbuf_setlen() in rename_tmp_packfile() we reuse the
buffer and avoid the repeated allocations we'd get if that function had
its own temporary "struct strbuf".

This approach of reusing the buffer does make the last user in
pack-object.c's write_pack_file() slightly awkward, since we needlessly
do a strbuf_setlen() before calling strbuf_release() for consistency. In
subsequent changes we'll move that bitmap writing code around, so let's
not skip the strbuf_setlen() now.

The previous strbuf_reset() idiom originated with 5889271114a
(finish_tmp_packfile():use strbuf for pathname construction,
2014-03-03), which in turn was a minimal adjustment of pre-strbuf code
added in 0e990530ae (finish_tmp_packfile(): a helper function,
2011-10-28).

Signed-off-by: Ævar Arnfjörð Bjarmason &lt;avarab@gmail.com&gt;
Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>pack-bitmap: read multi-pack bitmaps</title>
<updated>2021-09-01T20:56:43Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2021-08-31T20:52:21Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=0f533c728418fd3ef6ebcae5240e8df566cdaa72'/>
<id>urn:sha1:0f533c728418fd3ef6ebcae5240e8df566cdaa72</id>
<content type='text'>
This prepares the code in pack-bitmap to interpret the new multi-pack
bitmaps described in Documentation/technical/bitmap-format.txt, which
mostly involves converting bit positions to accommodate looking them up
in a MIDX.

Note that there are currently no writers who write multi-pack bitmaps,
and that this will be implemented in the subsequent commit. Note also
that get_midx_checksum() and get_midx_filename() are made non-static so
they can be called from pack-bitmap.c.

Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>builtin/pack-objects.c: remove duplicate hash lookup</title>
<updated>2021-08-30T06:25:43Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2021-08-30T02:48:57Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=b0173340c6b5fb330f5ea22504389fc6c5367f14'/>
<id>urn:sha1:b0173340c6b5fb330f5ea22504389fc6c5367f14</id>
<content type='text'>
In the original code from 08cdfb1337 (pack-objects --keep-unreachable,
2007-09-16), we add each object to the packing list with type
`obj-&gt;type`, where `obj` comes from `lookup_unknown_object()`. Unless we
had already looked up and parsed the object, this will be `OBJ_NONE`.
That's fine, since oe_set_type() sets the type_valid bit to '0', and we
determine the real type later on.

So the only thing we need from the object lookup is access to the
`flags` field so that we can mark that we've added the object with
`OBJECT_ADDED` to avoid adding it again (we can just pass `OBJ_NONE`
directly instead of grabbing it from the object).

But add_object_entry() already rejects duplicates! This has been the
behavior since 7a979d99ba (Thin pack - create packfile with missing
delta base., 2006-02-19), but 08cdfb1337 didn't take advantage of it.
Moreover, to do the OBJECT_ADDED check, we have to do a hash lookup in
`obj_hash`.

So we can drop the lookup_unknown_object() call completely, *and* the
OBJECT_ADDED flag, too, since the spot we're touching here is the only
location that checks it.

In the end, we perform the same number of hash lookups, but with the
added bonus that we don't waste memory allocating an OBJ_NONE object (if
we were traversing, we'd need it eventually, but the whole point of this
code path is not to traverse).

Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>builtin/pack-objects.c: simplify add_objects_in_unpacked_packs()</title>
<updated>2021-08-30T06:25:20Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2021-08-30T02:48:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=a9fd2f207d0318cd7a4771f13c9fb4759d280f68'/>
<id>urn:sha1:a9fd2f207d0318cd7a4771f13c9fb4759d280f68</id>
<content type='text'>
This function is used to implement `pack-objects`'s `--keep-unreachable`
option, but can be simplified in a couple of ways:

  - add_objects_in_unpacked_packs() iterates over all packs (and then
    all packed objects) itself, but could use for_each_packed_object()
    instead since the missing flags necessary were added in the previous
    commit

  - objects are added to an in_pack array which store (off_t, object)
    tuples, and then sorted in offset order when we could iterate
    objects in offset order.

    There is a slight behavior change here: before we would have added
    objects in sorted offset order among _all_ packs. Handing objects to
    create_object_entry() in pack order for each pack (instead of
    feeding objects from all packs simultaneously their offset relative
    to different packs) is much more reasonable, if different than how
    the code currently works.

  - objects in a single pack are iterated in index order and searched
    for in order to discover their offsets, which is much less efficient
    than using the on-disk reverse index

Simplify the function by addressing each of the above and moving the
core of the loop into a callback function that we then pass to
for_each_packed_object() instead of open-coding the latter function
ourselves.

Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>pack-bitmap-write.c: gracefully fail to write non-closed bitmaps</title>
<updated>2021-08-24T20:21:13Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2021-08-24T16:15:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=3ba3d0621b6822e974f06123db9394b33e454ed7'/>
<id>urn:sha1:3ba3d0621b6822e974f06123db9394b33e454ed7</id>
<content type='text'>
The set of objects covered by a bitmap must be closed under
reachability, since it must be the case that there is a valid bit
position assigned for every possible reachable object (otherwise the
bitmaps would be incomplete).

Pack bitmaps are never written from 'git repack' unless repacking
all-into-one, and so we never write non-closed bitmaps (except in the
case of partial clones where we aren't guaranteed to have all objects).

But multi-pack bitmaps change this, since it isn't known whether the
set of objects in the MIDX is closed under reachability until walking
them. Plumb through a bit that is set when a reachable object isn't
found.

As soon as a reachable object isn't found in the set of objects to
include in the bitmap, bitmap_writer_build() knows that the set is not
closed, and so it now fails gracefully.

A test is added in t0410 to trigger a bitmap write without full
reachability closure by removing local copies of some reachable objects
from a promisor remote.

Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>pack-objects: fix segfault in --stdin-packs option</title>
<updated>2021-07-09T18:53:40Z</updated>
<author>
<name>Ævar Arnfjörð Bjarmason</name>
<email>avarab@gmail.com</email>
</author>
<published>2021-07-09T10:13:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=561fa03529202a36e0d77548fdcb5d81422c1865'/>
<id>urn:sha1:561fa03529202a36e0d77548fdcb5d81422c1865</id>
<content type='text'>
Fix a segfault in the --stdin-packs option added in
339bce27f4f (builtin/pack-objects.c: add '--stdin-packs' option,
2021-02-22).

The read_packs_list_from_stdin() function didn't check that the lines
it was reading were valid packs, and thus when doing the QSORT() with
pack_mtime_cmp() we'd have a NULL "util" field. The "util" field is
used to associate the names of included/excluded packs with the
packed_git structs they correspond to.

The logic error was in assuming that we could iterate all packs and
annotate the excluded and included packs we got, as opposed to
checking the lines we got on stdin. There was a check for excluded
packs, but included packs were simply assumed to be valid.

As noted in the test we'll not report the first bad line, but whatever
line sorted first according to the string-list.c API. In this case I
think that's fine. There was further discussion of alternate
approaches in [1].

Even though we're being lazy let's assert the line we do expect to get
in the test, since whoever changes this code in the future might miss
this case, and would want to update the test and comments.

1. http://lore.kernel.org/git/YND3h2l10PlnSNGJ@nand.local

Signed-off-by: Ævar Arnfjörð Bjarmason &lt;avarab@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
