<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/pack-objects.h, branch v2.18.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.18.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.18.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2018-05-23T05:38:19Z</updated>
<entry>
<title>Merge branch 'nd/pack-objects-pack-struct'</title>
<updated>2018-05-23T05:38:19Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2018-05-23T05:38:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=ad635e82d600e1b725a2e65d69114140db6bc876'/>
<id>urn:sha1:ad635e82d600e1b725a2e65d69114140db6bc876</id>
<content type='text'>
"git pack-objects" needs to allocate tons of "struct object_entry"
while doing its work, and shrinking its size helps the performance
quite a bit.

* nd/pack-objects-pack-struct:
  ci: exercise the whole test suite with uncommon code in pack-objects
  pack-objects: reorder members to shrink struct object_entry
  pack-objects: shrink delta_size field in struct object_entry
  pack-objects: shrink size field in struct object_entry
  pack-objects: clarify the use of object_entry::size
  pack-objects: don't check size when the object is bad
  pack-objects: shrink z_delta_size field in struct object_entry
  pack-objects: refer to delta objects by index instead of pointer
  pack-objects: move in_pack out of struct object_entry
  pack-objects: move in_pack_pos out of struct object_entry
  pack-objects: use bitfield for object_entry::depth
  pack-objects: use bitfield for object_entry::dfs_state
  pack-objects: turn type and in_pack_type to bitfields
  pack-objects: a bit of document about struct object_entry
  read-cache.c: make $GIT_TEST_SPLIT_INDEX boolean
</content>
</entry>
<entry>
<title>gc --auto: exclude base pack if not enough mem to "repack -ad"</title>
<updated>2018-04-16T04:52:29Z</updated>
<author>
<name>Nguyễn Thái Ngọc Duy</name>
<email>pclouds@gmail.com</email>
</author>
<published>2018-04-15T15:36:17Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=9806f5a7bf3d02247c2c500ef74f56213cd7b07a'/>
<id>urn:sha1:9806f5a7bf3d02247c2c500ef74f56213cd7b07a</id>
<content type='text'>
pack-objects could be a big memory hog especially on large repos,
everybody knows that. The suggestion to stick a .keep file on the
giant base pack to avoid this problem is also known for a long time.

Recent patches add an option to do just this, but it has to be either
configured or activated manually. This patch lets `git gc --auto`
activate this mode automatically when it thinks `repack -ad` will use
a lot of memory and start affecting the system due to swapping or
flushing OS cache.

gc --auto decides to do this based on an estimation of pack-objects
memory usage, which is quite accurate at least for the heap part, and
whether that fits in half of system memory (the assumption here is for
desktop environment where there are many other applications running).

This mechanism only kicks in if gc.bigBasePackThreshold is not configured.
If it is, it is assumed that the user already knows what they want.

Signed-off-by: Nguyễn Thái Ngọc Duy &lt;pclouds@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>pack-objects: reorder members to shrink struct object_entry</title>
<updated>2018-04-16T03:38:59Z</updated>
<author>
<name>Nguyễn Thái Ngọc Duy</name>
<email>pclouds@gmail.com</email>
</author>
<published>2018-04-14T15:35:12Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=3b13a5f263872c4643f7f0eb6eccb7a8196b2760'/>
<id>urn:sha1:3b13a5f263872c4643f7f0eb6eccb7a8196b2760</id>
<content type='text'>
Previous patches leave lots of holes and padding in this struct. This
patch reorders the members and shrinks the struct down to 80 bytes
(from 136 bytes on 64-bit systems, before any field shrinking is done)
with 16 bits to spare (and a couple more in in_pack_header_size when
we really run out of bits).

This is the last in a series of memory reduction patches (see
"pack-objects: a bit of document about struct object_entry" for the
first one).

Overall they've reduced repack memory size on linux-2.6.git from
3.747G to 3.424G, or by around 320M, a decrease of 8.5%. The runtime
of repack has stayed the same throughout this series. Ævar's testing
on a big monorepo he has access to (bigger than linux-2.6.git) has
shown a 7.9% reduction, so the overall expected improvement should be
somewhere around 8%.

See 87po42cwql.fsf@evledraar.gmail.com on-list
(https://public-inbox.org/git/87po42cwql.fsf@evledraar.gmail.com/) for
more detailed numbers and a test script used to produce the numbers
cited above.

Signed-off-by: Nguyễn Thái Ngọc Duy &lt;pclouds@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>pack-objects: shrink delta_size field in struct object_entry</title>
<updated>2018-04-16T03:38:59Z</updated>
<author>
<name>Nguyễn Thái Ngọc Duy</name>
<email>pclouds@gmail.com</email>
</author>
<published>2018-04-14T15:35:11Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=0aca34e8269514ebb67676e0470a314615494ae8'/>
<id>urn:sha1:0aca34e8269514ebb67676e0470a314615494ae8</id>
<content type='text'>
Allowing a delta size of 64 bits is crazy. Shrink this field down to
20 bits with one overflow bit.

If we find an existing delta larger than 1MB, we do not cache
delta_size at all and will get the value from oe_size(), potentially
from disk if it's larger than 4GB.

Note, since DELTA_SIZE() is used in try_delta() code, it must be
thread-safe. Luckily oe_size() does guarantee this so we it is
thread-safe.

Signed-off-by: Nguyễn Thái Ngọc Duy &lt;pclouds@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>pack-objects: shrink size field in struct object_entry</title>
<updated>2018-04-16T03:38:59Z</updated>
<author>
<name>Nguyễn Thái Ngọc Duy</name>
<email>pclouds@gmail.com</email>
</author>
<published>2018-04-14T15:35:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=ac77d0c370dc5b23ac1ec4a13c754fe7ffa48564'/>
<id>urn:sha1:ac77d0c370dc5b23ac1ec4a13c754fe7ffa48564</id>
<content type='text'>
It's very very rare that an uncompressed object is larger than 4GB
(partly because Git does not handle those large files very well to
begin with). Let's optimize it for the common case where object size
is smaller than this limit.

Shrink size field down to 31 bits and one overflow bit. If the size is
too large, we read it back from disk. As noted in the previous patch,
we need to return the delta size instead of canonical size when the
to-be-reused object entry type is a delta instead of a canonical one.

Add two compare helpers that can take advantage of the overflow
bit (e.g. if the file is 4GB+, chances are it's already larger than
core.bigFileThreshold and there's no point in comparing the actual
value).

Another note about oe_get_size_slow(). This function MUST be thread
safe because SIZE() macro is used inside try_delta() which may run in
parallel. Outside parallel code, no-contention locking should be dirt
cheap (or insignificant compared to i/o access anyway). To exercise
this code, it's best to run the test suite with something like

    make test GIT_TEST_OE_SIZE=4

which forces this code on all objects larger than 3 bytes.

Signed-off-by: Nguyễn Thái Ngọc Duy &lt;pclouds@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>pack-objects: clarify the use of object_entry::size</title>
<updated>2018-04-16T03:38:59Z</updated>
<author>
<name>Nguyễn Thái Ngọc Duy</name>
<email>pclouds@gmail.com</email>
</author>
<published>2018-04-14T15:35:09Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=27a7d0679f17bd536f566e76e51058de0e1fa17a'/>
<id>urn:sha1:27a7d0679f17bd536f566e76e51058de0e1fa17a</id>
<content type='text'>
While this field most of the time contains the canonical object size,
there is one case it does not: when we have found that the base object
of the delta in question is also to be packed, we will very happily
reuse the delta by copying it over instead of regenerating the new
delta.

"size" in this case will record the delta size, not canonical object
size. Later on in write_reuse_object(), we reconstruct the delta
header and "size" is used for this purpose. When this happens, the
"type" field contains a delta type instead of a canonical type.
Highlight this in the code since it could be tricky to see.

Signed-off-by: Nguyễn Thái Ngọc Duy &lt;pclouds@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>pack-objects: shrink z_delta_size field in struct object_entry</title>
<updated>2018-04-16T03:38:58Z</updated>
<author>
<name>Nguyễn Thái Ngọc Duy</name>
<email>pclouds@gmail.com</email>
</author>
<published>2018-04-14T15:35:07Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=0cb3c1427a1e9cee638e0c1629933c907493d7e3'/>
<id>urn:sha1:0cb3c1427a1e9cee638e0c1629933c907493d7e3</id>
<content type='text'>
We only cache deltas when it's smaller than a certain limit. This limit
defaults to 1000 but save its compressed length in a 64-bit field.
Shrink that field down to 20 bits, so you can only cache 1MB deltas.
Larger deltas must be recomputed at when the pack is written down.

Signed-off-by: Nguyễn Thái Ngọc Duy &lt;pclouds@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>pack-objects: refer to delta objects by index instead of pointer</title>
<updated>2018-04-16T03:38:58Z</updated>
<author>
<name>Nguyễn Thái Ngọc Duy</name>
<email>pclouds@gmail.com</email>
</author>
<published>2018-04-14T15:35:06Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=898eba5e630696608ddca0ede489378e87464ad6'/>
<id>urn:sha1:898eba5e630696608ddca0ede489378e87464ad6</id>
<content type='text'>
These delta pointers always point to elements in the objects[] array
in packing_data struct. We can only hold maximum 4G of those objects
because the array size in nr_objects is uint32_t. We could use
uint32_t indexes to address these elements instead of pointers. On
64-bit architecture (8 bytes per pointer) this would save 4 bytes per
pointer.

Convert these delta pointers to indexes. Since we need to handle NULL
pointers as well, the index is shifted by one [1].

[1] This means we can only index 2^32-2 objects even though nr_objects
    could contain 2^32-1 objects. It should not be a problem in
    practice because when we grow objects[], nr_alloc would probably
    blow up long before nr_objects hits the wall.

Signed-off-by: Nguyễn Thái Ngọc Duy &lt;pclouds@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>pack-objects: move in_pack out of struct object_entry</title>
<updated>2018-04-16T03:38:58Z</updated>
<author>
<name>Nguyễn Thái Ngọc Duy</name>
<email>pclouds@gmail.com</email>
</author>
<published>2018-04-14T15:35:05Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=43fa44fa3b68e6570145126892e1e43380d7bb5a'/>
<id>urn:sha1:43fa44fa3b68e6570145126892e1e43380d7bb5a</id>
<content type='text'>
Instead of using 8 bytes (on 64 bit arch) to store a pointer to a
pack. Use an index instead since the number of packs should be
relatively small.

This limits the number of packs we can handle to 1k. Since we can't be
sure people can never run into the situation where they have more than
1k pack files. Provide a fall back route for it.

If we find out they have too many packs, the new in_pack_by_idx[]
array (which has at most 1k elements) will not be used. Instead we
allocate in_pack[] array that holds nr_objects elements. This is
similar to how the optional in_pack_pos field is handled.

The new simple test is just to make sure the too-many-packs code path
is at least executed. The true test is running

    make test GIT_TEST_FULL_IN_PACK_ARRAY=1

to take advantage of other special case tests.

Signed-off-by: Nguyễn Thái Ngọc Duy &lt;pclouds@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>pack-objects: move in_pack_pos out of struct object_entry</title>
<updated>2018-04-16T03:38:58Z</updated>
<author>
<name>Nguyễn Thái Ngọc Duy</name>
<email>pclouds@gmail.com</email>
</author>
<published>2018-04-14T15:35:04Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=06af3bba414b832fe9e04fb959daa2b9b678d2d5'/>
<id>urn:sha1:06af3bba414b832fe9e04fb959daa2b9b678d2d5</id>
<content type='text'>
This field is only need for pack-bitmap, which is an optional
feature. Move it to a separate array that is only allocated when
pack-bitmap is used (like objects[], it is not freed, since we need it
until the end of the process)

Signed-off-by: Nguyễn Thái Ngọc Duy &lt;pclouds@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
