<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/builtin/pack-objects.c, branch v2.13.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.13.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.13.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2017-06-04T01:21:02Z</updated>
<entry>
<title>Merge branch 'jk/disable-pack-reuse-when-broken' into maint</title>
<updated>2017-06-04T01:21:02Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2017-06-04T01:21:02Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=f305016f428db71df44d3e0072e937a8739b4e1a'/>
<id>urn:sha1:f305016f428db71df44d3e0072e937a8739b4e1a</id>
<content type='text'>
"pack-objects" can stream a slice of an existing packfile out when
the pack bitmap can tell that the reachable objects are all needed
in the output, without inspecting individual objects.  This
strategy however would not work well when "--local" and other
options are in use, and need to be disabled.

* jk/disable-pack-reuse-when-broken:
  t5310: fix "; do" style
  pack-objects: disable pack reuse for object-selection options
</content>
</entry>
<entry>
<title>pack-objects: disable pack reuse for object-selection options</title>
<updated>2017-05-09T03:07:24Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2017-05-09T02:54:13Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=9df4a6074a98740b70b980d43dca5203db0ff688'/>
<id>urn:sha1:9df4a6074a98740b70b980d43dca5203db0ff688</id>
<content type='text'>
If certain options like --honor-pack-keep, --local, or
--incremental are used with pack-objects, then we need to
feed each potential object to want_object_in_pack() to see
if it should be filtered out. But when the bitmap
reuse_packfile optimization is in effect, we do not call
that function at all, and in fact skip adding the objects to
the to_pack list entirely.  This means we have a bug: for
certain requests we will silently ignore those options and
include objects in that pack that should not be there.

The problem has been present since the inception of the
pack-reuse code in 6b8fda2db (pack-objects: use bitmaps when
packing objects, 2013-12-21), but it was unlikely to come up
in practice.  These options are generally used for on-disk
packing, not transfer packs (which go to stdout), but we've
never allowed pack reuse for non-stdout packs (until
645c432d6, we did not even use bitmaps, which the reuse
optimization relies on; after that, we explicitly turned it
off when not packing to stdout).

We can fix this by just disabling the reuse_packfile
optimization when the options are in use. In theory we could
teach the pack-reuse code to satisfy these checks, but it's
not worth the complexity. The purpose of the optimization is
to keep the amount of per-object work we do to a minimum.
But these options inherently require us to search for other
copies of each object, drowning out any benefit of the
pack-reuse optimization. But note that the optimizations
from 56dfeb626 (pack-objects: compute local/ignore_pack_keep
early, 2016-07-29) happen before pack-reuse, meaning that
specifying "--honor-pack-keep" in a repository with no .keep
files can still follow the fast path.

There are tests in t5310 that check these options with
bitmaps and --stdout, but they didn't catch the bug, and
it's hard to adapt them to do so.

One problem is that they don't use --delta-base-offset;
without that option, we always disable the reuse
optimization entirely. It would be fine to add it in (it
actually makes the test more realistic), but that still
isn't quite enough.

The other problem is that the reuse code is very picky; it
only kicks in when it can reuse most of a pack, starting
from the first byte. So we'd have to start from a fully
repacked and bitmapped state to trigger it. But the tests
for these options use a much more subtle state; they want to
be sure that the want_object_in_pack() code is allowing some
objects but not others. Doing a full repack runs counter to
that.

So this patch adds new tests at the end of the script which
create the fully-packed state and make sure that each option
is not fooled by reusable pack.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'bc/object-id'</title>
<updated>2017-04-20T04:37:13Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2017-04-20T04:37:13Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=b1081e4004091947b6c6a806625addd1cbba61b7'/>
<id>urn:sha1:b1081e4004091947b6c6a806625addd1cbba61b7</id>
<content type='text'>
Conversion from unsigned char [40] to struct object_id continues.

* bc/object-id:
  Documentation: update and rename api-sha1-array.txt
  Rename sha1_array to oid_array
  Convert sha1_array_for_each_unique and for_each_abbrev to object_id
  Convert sha1_array_lookup to take struct object_id
  Convert remaining callers of sha1_array_lookup to object_id
  Make sha1_array_append take a struct object_id *
  sha1-array: convert internal storage for struct sha1_array to object_id
  builtin/pull: convert to struct object_id
  submodule: convert check_for_new_submodule_commits to object_id
  sha1_name: convert disambiguate_hint_fn to take object_id
  sha1_name: convert struct disambiguate_state to object_id
  test-sha1-array: convert most code to struct object_id
  parse-options-cb: convert sha1_array_append caller to struct object_id
  fsck: convert init_skiplist to struct object_id
  builtin/receive-pack: convert portions to struct object_id
  builtin/pull: convert portions to struct object_id
  builtin/diff: convert to struct object_id
  Convert GIT_SHA1_RAWSZ used for allocation to GIT_MAX_RAWSZ
  Convert GIT_SHA1_HEXSZ used for allocation to GIT_MAX_HEXSZ
  Define new hash-size constants for allocating memory
</content>
</entry>
<entry>
<title>Rename sha1_array to oid_array</title>
<updated>2017-03-31T15:33:56Z</updated>
<author>
<name>brian m. carlson</name>
<email>sandals@crustytoothpaste.net</email>
</author>
<published>2017-03-31T01:40:00Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=910650d2f8755359ab7b1f0e2a2d576c06a68091'/>
<id>urn:sha1:910650d2f8755359ab7b1f0e2a2d576c06a68091</id>
<content type='text'>
Since this structure handles an array of object IDs, rename it to struct
oid_array.  Also rename the accessor functions and the initialization
constant.

This commit was produced mechanically by providing non-Documentation
files to the following Perl one-liners:

    perl -pi -E 's/struct sha1_array/struct oid_array/g'
    perl -pi -E 's/\bsha1_array_/oid_array_/g'
    perl -pi -E 's/SHA1_ARRAY_INIT/OID_ARRAY_INIT/g'

Signed-off-by: brian m. carlson &lt;sandals@crustytoothpaste.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Convert sha1_array_lookup to take struct object_id</title>
<updated>2017-03-31T15:33:55Z</updated>
<author>
<name>brian m. carlson</name>
<email>sandals@crustytoothpaste.net</email>
</author>
<published>2017-03-31T01:39:58Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=5d3206d5010423b3bbf479e32567d0fddc7095e0'/>
<id>urn:sha1:5d3206d5010423b3bbf479e32567d0fddc7095e0</id>
<content type='text'>
Convert this function by changing the declaration and definition and
applying the following semantic patch to update the callers:

@@
expression E1, E2;
@@
- sha1_array_lookup(E1, E2.hash)
+ sha1_array_lookup(E1, &amp;E2)

@@
expression E1, E2;
@@
- sha1_array_lookup(E1, E2-&gt;hash)
+ sha1_array_lookup(E1, E2)

Signed-off-by: brian m. carlson &lt;sandals@crustytoothpaste.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Convert remaining callers of sha1_array_lookup to object_id</title>
<updated>2017-03-31T15:33:55Z</updated>
<author>
<name>brian m. carlson</name>
<email>sandals@crustytoothpaste.net</email>
</author>
<published>2017-03-31T01:39:57Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=4ce3621a6dd3f98dab3c588b7f53f81b8404df32'/>
<id>urn:sha1:4ce3621a6dd3f98dab3c588b7f53f81b8404df32</id>
<content type='text'>
There are a very small number of callers which don't already use struct
object_id.  Convert them.

Signed-off-by: brian m. carlson &lt;sandals@crustytoothpaste.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Make sha1_array_append take a struct object_id *</title>
<updated>2017-03-31T15:33:55Z</updated>
<author>
<name>brian m. carlson</name>
<email>sandals@crustytoothpaste.net</email>
</author>
<published>2017-03-31T01:39:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=98a72ddc12e13231537150607f4a6a8ff8b43854'/>
<id>urn:sha1:98a72ddc12e13231537150607f4a6a8ff8b43854</id>
<content type='text'>
Convert the callers to pass struct object_id by changing the function
declaration and definition and applying the following semantic patch:

@@
expression E1, E2;
@@
- sha1_array_append(E1, E2.hash)
+ sha1_array_append(E1, &amp;E2)

@@
expression E1, E2;
@@
- sha1_array_append(E1, E2-&gt;hash)
+ sha1_array_append(E1, E2)

Signed-off-by: brian m. carlson &lt;sandals@crustytoothpaste.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'jk/fast-import-cleanup'</title>
<updated>2017-03-28T21:05:59Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2017-03-28T21:05:59Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=53a0f9f7ad8cda231ce5bb15a0f086f29bdc6de3'/>
<id>urn:sha1:53a0f9f7ad8cda231ce5bb15a0f086f29bdc6de3</id>
<content type='text'>
Code clean-up.

* jk/fast-import-cleanup:
  pack.h: define largest possible encoded object size
  encode_in_pack_object_header: respect output buffer length
  fast-import: use xsnprintf for formatting headers
  fast-import: use xsnprintf for writing sha1s
</content>
</entry>
<entry>
<title>pack.h: define largest possible encoded object size</title>
<updated>2017-03-24T19:34:07Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2017-03-24T17:26:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=2c5e2865cc3dfc053e71510415f479e165119d04'/>
<id>urn:sha1:2c5e2865cc3dfc053e71510415f479e165119d04</id>
<content type='text'>
Several callers use fixed buffers for storing the pack
object header, and they've picked 10 as a magic number. This
is reasonable, since it handles objects up to 2^67. But
let's give them a constant so it's clear that the number
isn't pulled out of thin air.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>encode_in_pack_object_header: respect output buffer length</title>
<updated>2017-03-24T19:34:07Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2017-03-24T17:26:40Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=7202a6fa8773fdcf3f374625def3c15276250b67'/>
<id>urn:sha1:7202a6fa8773fdcf3f374625def3c15276250b67</id>
<content type='text'>
The encode_in_pack_object_header() writes a variable-length
header to an output buffer, but it doesn't actually know
long the buffer is. At first glance, this looks like it
might be possible to overflow.

In practice, this is probably impossible. The smallest
buffer we use is 10 bytes, which would hold the header for
an object up to 2^67 bytes. Obviously we're not likely to
see such an object, but we might worry that an object could
lie about its size (causing us to overflow before we realize
it does not actually have that many bytes). But the argument
is passed as a uintmax_t. Even on systems that have __int128
available, uintmax_t is typically restricted to 64-bit by
the ABI.

So it's unlikely that a system exists where this could be
exploited. Still, it's easy enough to use a normal out/len
pair and make sure we don't write too far. That protects the
hypothetical 128-bit system, makes it harder for callers to
accidentally specify a too-small buffer, and makes the
resulting code easier to audit.

Note that the one caller in fast-import tried to catch such
a case, but did so _after_ the call (at which point we'd
have already overflowed!). This check can now go away.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
