<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/pack-bitmap.c, branch v2.30.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.30.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.30.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2020-05-05T04:57:58Z</updated>
<entry>
<title>pack-bitmap: pass object filter to fill-in traversal</title>
<updated>2020-05-05T04:57:58Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2020-05-04T23:12:38Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=9639474b6dd126bae29dba70e2fbe11169d60e20'/>
<id>urn:sha1:9639474b6dd126bae29dba70e2fbe11169d60e20</id>
<content type='text'>
Sometimes a bitmap traversal still has to walk some commits manually,
because those commits aren't included in the bitmap packfile (e.g., due
to a push or commit since the last full repack). If we're given an
object filter, we don't pass it down to this traversal. It's not
necessary for correctness because the bitmap code has its own filters to
post-process the bitmap result (which it must, to filter out the objects
that _are_ mentioned in the bitmapped packfile).

And with blob filters, there was no performance reason to pass along
those filters, either. The fill-in traversal could omit them from the
result, but it wouldn't save us any time to do so, since we'd still have
to walk each tree entry to see if it's a blob or not.

But now that we support tree filters, there's opportunity for savings. A
tree:depth=0 filter means we can avoid accessing trees entirely, since
we know we won't them (or any of the subtrees or blobs they point to).
The new test in p5310 shows this off (the "partial bitmap" state is one
where HEAD~100 and its ancestors are all in a bitmapped pack, but
HEAD~100..HEAD are not). Here are the results (run against linux.git):

  Test                                                  HEAD^               HEAD
  -------------------------------------------------------------------------------------------------
  [...]
  5310.16: rev-list with tree filter (partial bitmap)   0.19(0.17+0.02)     0.03(0.02+0.01) -84.2%

The absolute number of savings isn't _huge_, but keep in mind that we
only omitted 100 first-parent links (in the version of linux.git here,
that's 894 actual commits). In a more pathological case, we might have a
much larger proportion of non-bitmapped commits. I didn't bother
creating such a case in the perf script because the setup is expensive,
and this is plenty to show the savings as a percentage.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>pack-bitmap.c: support 'tree:0' filtering</title>
<updated>2020-05-05T04:57:58Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2020-05-04T23:12:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=b0a8d4820baa9966511ea9bd8c8b09dab087475f'/>
<id>urn:sha1:b0a8d4820baa9966511ea9bd8c8b09dab087475f</id>
<content type='text'>
In the previous patch, we made it easy to define other filters that
exclude all objects of a certain type. Use that in order to implement
bitmap-level filtering for the '--filter=tree:&lt;n&gt;' filter when 'n' is
equal to 0.

The general case is not helped by bitmaps, since for values of 'n &gt; 0',
the object filtering machinery requires a full-blown tree traversal in
order to determine the depth of a given tree. Caching this is
non-obvious, too, since the same tree object can have a different depth
depending on the context (e.g., a tree was moved up in the directory
hierarchy between two commits).

But, the 'n = 0' case can be helped, and this patch does so. Running
p5310.11 in this tree and on master with the kernel, we can see that
this case is helped substantially:

  Test                                  master              this tree
  --------------------------------------------------------------------------------
  5310.11: rev-list count with tree:0   10.68(10.39+0.27)   0.06(0.04+0.01) -99.4%

Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>pack-bitmap.c: make object filtering functions generic</title>
<updated>2020-05-05T04:57:58Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2020-05-04T23:12:31Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=856e12c18a4616bbfbdfd1777577c1182b148519'/>
<id>urn:sha1:856e12c18a4616bbfbdfd1777577c1182b148519</id>
<content type='text'>
In 4f3bd5606a (pack-bitmap: implement BLOB_NONE filtering, 2020-02-14),
filtering support for bitmaps was added for the 'LOFC_BLOB_NONE' filter.

In the future, we would like to add support for filters that behave as
if they exclude a certain type of object, for e.g., the tree depth
filter with depth 0.

To prepare for this, make some of the functions used for filtering more
generic, such as 'find_tip_blobs' and 'filter_bitmap_blob_none' so that
they can work over arbitrary object types.

To that end, create 'find_tip_objects' and
'filter_bitmap_exclude_type', and redefine the aforementioned functions
in terms of those.

Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'jk/nth-packed-object-id'</title>
<updated>2020-03-05T18:43:03Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2020-03-05T18:43:03Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=e8e71848ea866d7dc34eacffc20b9c3826ae29a1'/>
<id>urn:sha1:e8e71848ea866d7dc34eacffc20b9c3826ae29a1</id>
<content type='text'>
Code cleanup to use "struct object_id" more by replacing use of
"char *sha1"

* jk/nth-packed-object-id:
  packfile: drop nth_packed_object_sha1()
  packed_object_info(): use object_id internally for delta base
  packed_object_info(): use object_id for returning delta base
  pack-check: push oid lookup into loop
  pack-check: convert "internal error" die to a BUG()
  pack-bitmap: use object_id when loading on-disk bitmaps
  pack-objects: use object_id struct in pack-reuse code
  pack-objects: convert oe_set_delta_ext() to use object_id
  pack-objects: read delta base oid into object_id struct
  nth_packed_object_oid(): use customary integer return
</content>
</entry>
<entry>
<title>Merge branch 'jk/object-filter-with-bitmap'</title>
<updated>2020-03-02T23:07:18Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2020-03-02T23:07:18Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=0df82d99dae85dbd4f667e95020a146ea0167975'/>
<id>urn:sha1:0df82d99dae85dbd4f667e95020a146ea0167975</id>
<content type='text'>
The object reachability bitmap machinery and the partial cloning
machinery were not prepared to work well together, because some
object-filtering criteria that partial clones use inherently rely
on object traversal, but the bitmap machinery is an optimization
to bypass that object traversal.  There however are some cases
where they can work together, and they were taught about them.

* jk/object-filter-with-bitmap:
  rev-list --count: comment on the use of count_right++
  pack-objects: support filters with bitmaps
  pack-bitmap: implement BLOB_LIMIT filtering
  pack-bitmap: implement BLOB_NONE filtering
  bitmap: add bitmap_unset() function
  rev-list: use bitmap filters for traversal
  pack-bitmap: basic noop bitmap filter infrastructure
  rev-list: allow commit-only bitmap traversals
  t5310: factor out bitmap traversal comparison
  rev-list: allow bitmaps when counting objects
  rev-list: make --count work with --objects
  rev-list: factor out bitmap-optimized routines
  pack-bitmap: refuse to do a bitmap traversal with pathspecs
  rev-list: fallback to non-bitmap traversal when filtering
  pack-bitmap: fix leak of haves/wants object lists
  pack-bitmap: factor out type iterator initialization
</content>
</entry>
<entry>
<title>pack-bitmap: use object_id when loading on-disk bitmaps</title>
<updated>2020-02-24T20:55:53Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2020-02-24T04:32:27Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=500e4f236606684467b0b34b86e319dfa40747c4'/>
<id>urn:sha1:500e4f236606684467b0b34b86e319dfa40747c4</id>
<content type='text'>
A pack bitmap file contains the index position of the commit for each
bitmap, which we then translate into an object id via
nth_packed_object_sha1(). In preparation for that function going away,
we can switch to the more type-safe nth_packed_object_id().

Note that even though the result ends up in an object_id this does incur
an extra copy of the hash (into our temporary object_id, and then into
the final malloc'd stored_bitmap struct). This shouldn't make any
measurable difference. If it did, we could avoid this copy _and_ the
copy of the rest of the items by allocating the stored_bitmap struct
beforehand and reading directly into it from the bitmap file. Or better
still, if this is a bottleneck, we could introduce an on-disk index to
the bitmap file so we don't have to read every single entry to use just
one of them. So it's not worth worrying about micro-optimizing out this
one hash copy.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>nth_packed_object_oid(): use customary integer return</title>
<updated>2020-02-24T20:55:42Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2020-02-24T04:27:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=0763671b8e0b3ef873df13c741a911b809e6813d'/>
<id>urn:sha1:0763671b8e0b3ef873df13c741a911b809e6813d</id>
<content type='text'>
Our nth_packed_object_sha1() function returns NULL for error. So when we
wrapped it with nth_packed_object_oid(), we kept the same semantics. But
it's a bit funny, because the caller actually passes in an out
parameter, and the pointer we return is just that same struct they
passed to us (or NULL).

It's not too terrible, but it does make the interface a little
non-idiomatic. Let's switch to our usual "0 for success, negative for
error" return value. Most callers either don't check it, or are
trivially converted. The one that requires the biggest change is
actually improved, as we can ditch an extra aliased pointer variable.

Since we are changing the interface in a subtle way that the compiler
wouldn't catch, let's also change the name to catch any topics in
flight. We can drop the 'o' and make it nth_packed_object_id(). That's
slightly shorter, but also less redundant since the 'o' stands for
"object" already.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'jk/packfile-reuse-cleanup'</title>
<updated>2020-02-14T20:54:19Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2020-02-14T20:54:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=a14aebeac330e6d58f9628a02521ea780daf0a5b'/>
<id>urn:sha1:a14aebeac330e6d58f9628a02521ea780daf0a5b</id>
<content type='text'>
The way "git pack-objects" reuses objects stored in existing pack
to generate its result has been improved.

* jk/packfile-reuse-cleanup:
  pack-bitmap: don't rely on bitmap_git-&gt;reuse_objects
  pack-objects: add checks for duplicate objects
  pack-objects: improve partial packfile reuse
  builtin/pack-objects: introduce obj_is_packed()
  pack-objects: introduce pack.allowPackReuse
  csum-file: introduce hashfile_total()
  pack-bitmap: simplify bitmap_has_oid_in_uninteresting()
  pack-bitmap: uninteresting oid can be outside bitmapped packfile
  pack-bitmap: introduce bitmap_walk_contains()
  ewah/bitmap: introduce bitmap_word_alloc()
  packfile: expose get_delta_base()
  builtin/pack-objects: report reused packfile objects
</content>
</entry>
<entry>
<title>pack-bitmap: implement BLOB_LIMIT filtering</title>
<updated>2020-02-14T18:46:22Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2020-02-14T18:22:39Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=84243da1298890bd7370df66b754c2b252a08346'/>
<id>urn:sha1:84243da1298890bd7370df66b754c2b252a08346</id>
<content type='text'>
Just as the previous commit implemented BLOB_NONE, we can support
BLOB_LIMIT filters by looking at the sizes of any blobs in the result
and unsetting their bits as appropriate. This is slightly more expensive
than BLOB_NONE, but still produces a noticeable speedup (these results
are on git.git):

  Test                                         HEAD~2            HEAD
  ------------------------------------------------------------------------------------
  5310.9:  rev-list count with blob:none       1.80(1.77+0.02)   0.22(0.20+0.02) -87.8%
  5310.10: rev-list count with blob:limit=1k   1.99(1.96+0.03)   0.29(0.25+0.03) -85.4%

The implementation is similar to the BLOB_NONE one, with the exception
that we have to go object-by-object while walking the blob-type bitmap
(since we can't mask out the matches, but must look up the size
individually for each blob). The trick with using ctz64() is taken from
show_objects_for_type(), which likewise needs to find individual bits
(but wants to quickly skip over big chunks without blobs).

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>pack-bitmap: implement BLOB_NONE filtering</title>
<updated>2020-02-14T18:46:22Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2020-02-14T18:22:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=4f3bd5606a02260274555f41fd7d6368f2bea1d8'/>
<id>urn:sha1:4f3bd5606a02260274555f41fd7d6368f2bea1d8</id>
<content type='text'>
We can easily support BLOB_NONE filters with bitmaps. Since we know the
types of all of the objects, we just need to clear the result bits of
any blobs.

Note two subtleties in the implementation (which I also called out in
comments):

  - we have to include any blobs that were specifically asked for (and
    not reached through graph traversal) to match the non-bitmap version

  - we have to handle in-pack and "ext_index" objects separately.
    Arguably prepare_bitmap_walk() could be adding these ext_index
    objects to the type bitmaps. But it doesn't for now, so let's match
    the rest of the bitmap code here (it probably wouldn't be an
    efficiency improvement to do so since the cost of extending those
    bitmaps is about the same as our loop here, but it might make the
    code a bit simpler).

Here are perf results for the new test on git.git:

  Test                                    HEAD^             HEAD
  --------------------------------------------------------------------------------
  5310.9: rev-list count with blob:none   1.67(1.62+0.05)   0.22(0.21+0.02) -86.8%

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
