<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/pack-bitmap-write.c, branch v2.8.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.8.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.8.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2016-02-12T20:51:17Z</updated>
<entry>
<title>list-objects: pass full pathname to callbacks</title>
<updated>2016-02-12T20:51:17Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2016-02-11T22:28:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=de1e67d0703894cb6ea782e36abb63976ab07e60'/>
<id>urn:sha1:de1e67d0703894cb6ea782e36abb63976ab07e60</id>
<content type='text'>
When we find a blob at "a/b/c", we currently pass this to
our show_object_fn callbacks as two components: "a/b/" and
"c". Callbacks which want the full value then call
path_name(), which concatenates the two. But this is an
inefficient interface; the path is a strbuf, and we could
simply append "c" to it temporarily, then roll back the
length, without creating a new copy.

So we could improve this by teaching the callsites of
path_name() this trick (and there are only 3). But we can
also notice that no callback actually cares about the
broken-down representation, and simply pass each callback
the full path "a/b/c" as a string. The callback code becomes
even simpler, then, as we do not have to worry about freeing
an allocated buffer, nor rolling back our modification to
the strbuf.

This is theoretically less efficient, as some callbacks
would not bother to format the final path component. But in
practice this is not measurable. Since we use the same
strbuf over and over, our work to grow it is amortized, and
we really only pay to memcpy a few bytes.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>list-objects: drop name_path entirely</title>
<updated>2016-02-12T20:51:15Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2016-02-11T22:26:44Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=bd64516aca4d4e22acb33c71429d293a14d355cf'/>
<id>urn:sha1:bd64516aca4d4e22acb33c71429d293a14d355cf</id>
<content type='text'>
In the previous commit, we left name_path as a thin wrapper
around a strbuf. This patch drops it entirely. As a result,
every show_object_fn callback needs to be adjusted. However,
none of their code needs to be changed at all, because the
only use was to pass it to path_name(), which now handles
the bare strbuf.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Remove get_object_hash.</title>
<updated>2015-11-20T13:02:05Z</updated>
<author>
<name>brian m. carlson</name>
<email>sandals@crustytoothpaste.net</email>
</author>
<published>2015-11-10T02:22:29Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=ed1c9977cb1b63e4270ad8bdf967a2d02580aa08'/>
<id>urn:sha1:ed1c9977cb1b63e4270ad8bdf967a2d02580aa08</id>
<content type='text'>
Convert all instances of get_object_hash to use an appropriate reference
to the hash member of the oid member of struct object.  This provides no
functional change, as it is essentially a macro substitution.

Signed-off-by: brian m. carlson &lt;sandals@crustytoothpaste.net&gt;
Signed-off-by: Jeff King &lt;peff@peff.net&gt;
</content>
</entry>
<entry>
<title>Convert struct object to object_id</title>
<updated>2015-11-20T13:02:05Z</updated>
<author>
<name>brian m. carlson</name>
<email>sandals@crustytoothpaste.net</email>
</author>
<published>2015-11-10T02:22:28Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=f2fd0760f62e79609fef7bfd7ecebb002e8e4ced'/>
<id>urn:sha1:f2fd0760f62e79609fef7bfd7ecebb002e8e4ced</id>
<content type='text'>
struct object is one of the major data structures dealing with object
IDs.  Convert it to use struct object_id instead of an unsigned char
array.  Convert get_object_hash to refer to the new member as well.

Signed-off-by: brian m. carlson &lt;sandals@crustytoothpaste.net&gt;
Signed-off-by: Jeff King &lt;peff@peff.net&gt;
</content>
</entry>
<entry>
<title>Add several uses of get_object_hash.</title>
<updated>2015-11-20T13:02:05Z</updated>
<author>
<name>brian m. carlson</name>
<email>sandals@crustytoothpaste.net</email>
</author>
<published>2015-11-10T02:22:27Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=7999b2cf772956466baa8925491d6fb1b0963292'/>
<id>urn:sha1:7999b2cf772956466baa8925491d6fb1b0963292</id>
<content type='text'>
Convert most instances where the sha1 member of struct object is
dereferenced to use get_object_hash.  Most instances that are passed to
functions that have versions taking struct object_id, such as
get_sha1_hex/get_oid_hex, or instances that can be trivially converted
to use struct object_id instead, are not converted.

Signed-off-by: brian m. carlson &lt;sandals@crustytoothpaste.net&gt;
Signed-off-by: Jeff King &lt;peff@peff.net&gt;
</content>
</entry>
<entry>
<title>Merge branch 'jk/pack-bitmap'</title>
<updated>2014-12-12T22:31:42Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2014-12-12T22:31:41Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=3889e7a60c65031f0c5381d8060cefb4294d932e'/>
<id>urn:sha1:3889e7a60c65031f0c5381d8060cefb4294d932e</id>
<content type='text'>
* jk/pack-bitmap:
  pack-bitmap: do not use gcc packed attribute
</content>
</entry>
<entry>
<title>pack-bitmap: do not use gcc packed attribute</title>
<updated>2014-12-01T02:07:34Z</updated>
<author>
<name>Karsten Blees</name>
<email>blees@dcon.de</email>
</author>
<published>2014-11-27T05:24:01Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=b5007211b6582fc38647ff695b5ac51541ea9de8'/>
<id>urn:sha1:b5007211b6582fc38647ff695b5ac51541ea9de8</id>
<content type='text'>
The "__attribute__" flag may be a noop on some compilers.
That's OK as long as the code is correct without the
attribute, but in this case it is not. We would typically
end up with a struct that is 2 bytes too long due to struct
padding, breaking both reading and writing of bitmaps.

Instead of marshalling the data in a struct, let's just
provide helpers for reading and writing the appropriate
types. Besides being correct on all platforms, the result is
more efficient and simpler to read.

Signed-off-by: Karsten Blees &lt;blees@dcon.de&gt;
Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>use REALLOC_ARRAY for changing the allocation size of arrays</title>
<updated>2014-09-18T16:13:42Z</updated>
<author>
<name>René Scharfe</name>
<email>l.s.r@web.de</email>
</author>
<published>2014-09-16T18:56:57Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=2756ca4347cbda05b16954cd7f445c216b935e76'/>
<id>urn:sha1:2756ca4347cbda05b16954cd7f445c216b935e76</id>
<content type='text'>
Signed-off-by: Rene Scharfe &lt;l.s.r@web.de&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Use hashcpy() when copying object names</title>
<updated>2014-03-06T22:03:12Z</updated>
<author>
<name>Sun He</name>
<email>sunheehnus@gmail.com</email>
</author>
<published>2014-03-03T09:39:59Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=50546b15ed1df25837f8b291e6fa5bbcdb84635e'/>
<id>urn:sha1:50546b15ed1df25837f8b291e6fa5bbcdb84635e</id>
<content type='text'>
We invented hashcpy() to keep the abstraction of "object name"
behind it.  Use it instead of calling memcpy() with hard-coded
20-byte length when moving object names between pieces of memory.

Leave ppc/sha1.c as-is, because the function is about the SHA-1 hash
algorithm whose output is and will always be 20 bytes.

Helped-by: Michael Haggerty &lt;mhagger@alum.mit.edu&gt;
Helped-by: Duy Nguyen &lt;pclouds@gmail.com&gt;
Signed-off-by: Sun He &lt;sunheehnus@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>pack-bitmap: implement optional name_hash cache</title>
<updated>2013-12-30T20:19:23Z</updated>
<author>
<name>Vicent Marti</name>
<email>tanoku@gmail.com</email>
</author>
<published>2013-12-21T14:00:45Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=ae4f07fbccaab6dc93be52c0f34e137dd9fcbcf4'/>
<id>urn:sha1:ae4f07fbccaab6dc93be52c0f34e137dd9fcbcf4</id>
<content type='text'>
When we use pack bitmaps rather than walking the object
graph, we end up with the list of objects to include in the
packfile, but we do not know the path at which any tree or
blob objects would be found.

In a recently packed repository, this is fine. A fetch would
use the paths only as a heuristic in the delta compression
phase, and a fully packed repository should not need to do
much delta compression.

As time passes, though, we may acquire more objects on top
of our large bitmapped pack. If clients fetch frequently,
then they never even look at the bitmapped history, and all
works as usual. However, a client who has not fetched since
the last bitmap repack will have "have" tips in the
bitmapped history, but "want" newer objects.

The bitmaps themselves degrade gracefully in this
circumstance. We manually walk the more recent bits of
history, and then use bitmaps when we hit them.

But we would also like to perform delta compression between
the newer objects and the bitmapped objects (both to delta
against what we know the user already has, but also between
"new" and "old" objects that the user is fetching). The lack
of pathnames makes our delta heuristics much less effective.

This patch adds an optional cache of the 32-bit name_hash
values to the end of the bitmap file. If present, a reader
can use it to match bitmapped and non-bitmapped names during
delta compression.

Here are perf results for p5310:

Test                      origin/master       HEAD^                      HEAD
-------------------------------------------------------------------------------------------------
5310.2: repack to disk    36.81(37.82+1.43)   47.70(48.74+1.41) +29.6%   47.75(48.70+1.51) +29.7%
5310.3: simulated clone   30.78(29.70+2.14)   1.08(0.97+0.10) -96.5%     1.07(0.94+0.12) -96.5%
5310.4: simulated fetch   3.16(6.10+0.08)     3.54(10.65+0.06) +12.0%    1.70(3.07+0.06) -46.2%
5310.6: partial bitmap    36.76(43.19+1.81)   6.71(11.25+0.76) -81.7%    4.08(6.26+0.46) -88.9%

You can see that the time spent on an incremental fetch goes
down, as our delta heuristics are able to do their work.
And we save time on the partial bitmap clone for the same
reason.

Signed-off-by: Vicent Marti &lt;tanoku@gmail.com&gt;
Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
