<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/packfile.c, branch v2.45.4</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.45.4</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.45.4'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2024-03-28T21:13:50Z</updated>
<entry>
<title>Merge branch 'eb/hash-transition'</title>
<updated>2024-03-28T21:13:50Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2024-03-28T21:13:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=1002f28a527d33893f7dab068dbac7011f84af65'/>
<id>urn:sha1:1002f28a527d33893f7dab068dbac7011f84af65</id>
<content type='text'>
Work to support a repository that works with both SHA-1 and SHA-256
hash algorithms has started.

* eb/hash-transition: (30 commits)
  t1016-compatObjectFormat: add tests to verify the conversion between objects
  t1006: test oid compatibility with cat-file
  t1006: rename sha1 to oid
  test-lib: compute the compatibility hash so tests may use it
  builtin/ls-tree: let the oid determine the output algorithm
  object-file: handle compat objects in check_object_signature
  tree-walk: init_tree_desc take an oid to get the hash algorithm
  builtin/cat-file: let the oid determine the output algorithm
  rev-parse: add an --output-object-format parameter
  repository: implement extensions.compatObjectFormat
  object-file: update object_info_extended to reencode objects
  object-file-convert: convert commits that embed signed tags
  object-file-convert: convert commit objects when writing
  object-file-convert: don't leak when converting tag objects
  object-file-convert: convert tag objects when writing
  object-file-convert: add a function to convert trees between algorithms
  object: factor parse_mode out of fast-import and tree-walk into object.h
  cache: add a function to read an OID of a specific algorithm
  tag: sign both hashes
  commit: export add_header_signature to support handling signatures on tags
  ...
</content>
</entry>
<entry>
<title>treewide: remove unnecessary includes in source files</title>
<updated>2023-12-26T20:04:31Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2023-12-23T17:14:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=eea0e59ffbed6e33d171ace5be13cde9faa41639'/>
<id>urn:sha1:eea0e59ffbed6e33d171ace5be13cde9faa41639</id>
<content type='text'>
Each of these was checked with
   gcc -E -I. ${SOURCE_FILE} | grep ${HEADER_FILE}
to ensure that removing the direct inclusion of the header actually
resulted in that header no longer being included at all (i.e. that
no other header pulled it in transitively).

...except for a few cases where we verified that although the header
was brought in transitively, nothing from it was directly used in
that source file.  These cases were:
  * builtin/credential-cache.c
  * builtin/pull.c
  * builtin/send-pack.c

Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>tree-walk: init_tree_desc take an oid to get the hash algorithm</title>
<updated>2023-10-02T21:57:40Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2023-10-02T02:40:28Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=efed687edcaa272601e0f4e192db368972daa7ac'/>
<id>urn:sha1:efed687edcaa272601e0f4e192db368972daa7ac</id>
<content type='text'>
To make it possible for git ls-tree to display the tree encoded
in the hash algorithm of the oid specified to git ls-tree, update
init_tree_desc to take as a parameter the oid of the tree object.

Update all callers of init_tree_desc and init_tree_desc_gently
to pass the oid of the tree object.

Use the oid of the tree object to discover the hash algorithm
of the oid and store that hash algorithm in struct tree_desc.

Use the hash algorithm in decode_tree_entry and
update_tree_entry_internal to handle reading a tree object encoded in
a hash algorithm that differs from the repository's hash algorithm.
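
As a rough illustration of why the decoder needs the hash algorithm:
each tree entry is "mode SP name NUL" followed by a raw object name
whose length depends on the algorithm (20 bytes for SHA-1, 32 for
SHA-256). The Python sketch below is not Git's implementation; `RAWSZ`
and `decode_tree()` are hypothetical names used only for this example.

```python
RAWSZ = {"sha1": 20, "sha256": 32}  # raw object-name length per algorithm


def decode_tree(buf, algo):
    """Split a raw tree buffer into (mode, name, raw_oid) tuples."""
    rawsz = RAWSZ[algo]
    entries = []
    pos = 0
    while pos != len(buf):
        space = buf.index(b" ", pos)    # end of the octal mode
        nul = buf.index(b"\0", space)   # end of the entry name
        mode = int(buf[pos:space].decode(), 8)
        name = buf[space + 1:nul].decode()
        oid = buf[nul + 1:nul + 1 + rawsz]
        if len(oid) != rawsz:
            raise ValueError("truncated tree entry")
        entries.append((mode, name, oid))
        pos = nul + 1 + rawsz           # the stride depends on the algorithm
    return entries


# Two entries encoded with 32-byte (SHA-256 sized) object names.
tree = (b"100644 a.txt\0" + bytes(32)
        + b"40000 dir\0" + bytes([1]) * 32)
entries = decode_tree(tree, "sha256")
```

Decoding the same buffer with the 20-byte stride would misread the
entry boundaries, which is why the algorithm has to travel with the
tree descriptor.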

Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'jc/retire-get-sha1-hex'</title>
<updated>2023-08-04T17:52:30Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2023-08-04T17:52:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=3365e2675e5ac95e0a44665966e8cfbb9433456e'/>
<id>urn:sha1:3365e2675e5ac95e0a44665966e8cfbb9433456e</id>
<content type='text'>
The implementation of "get_sha1_hex()" that reads a hexadecimal
string that spells a full object name has been extended to cope
with any hash function used in the repository, but the "sha1" in
its name survived.  Rename it to get_hash_hex(), a name that is
more consistent with its friends like get_hash_hex_algop().

* jc/retire-get-sha1-hex:
  hex: retire get_sha1_hex()
</content>
</entry>
<entry>
<title>Merge branch 'tb/object-access-overflow-protection'</title>
<updated>2023-07-25T19:05:23Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2023-07-25T19:05:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=4488bb3bed8cc80aee1642d0cdc331c9ea6be8fb'/>
<id>urn:sha1:4488bb3bed8cc80aee1642d0cdc331c9ea6be8fb</id>
<content type='text'>
Various offset computations in the code that accesses the packfiles
and other data in the object layer have been hardened against
arithmetic overflow, especially on 32-bit systems.

* tb/object-access-overflow-protection:
  commit-graph.c: prevent overflow in `verify_commit_graph()`
  commit-graph.c: prevent overflow in `write_commit_graph()`
  commit-graph.c: prevent overflow in `merge_commit_graph()`
  commit-graph.c: prevent overflow in `split_graph_merge_strategy()`
  commit-graph.c: prevent overflow in `load_tree_for_commit()`
  commit-graph.c: prevent overflow in `fill_commit_in_graph()`
  commit-graph.c: prevent overflow in `fill_commit_graph_info()`
  commit-graph.c: prevent overflow in `load_oid_from_graph()`
  commit-graph.c: prevent overflow in add_graph_to_chain()
  commit-graph.c: prevent overflow in `write_commit_graph_file()`
  pack-bitmap.c: ensure that eindex lookups don't overflow
  midx.c: prevent overflow in `fill_included_packs_batch()`
  midx.c: prevent overflow in `write_midx_internal()`
  midx.c: store `nr`, `alloc` variables as `size_t`'s
  midx.c: prevent overflow in `nth_midxed_offset()`
  midx.c: prevent overflow in `nth_midxed_object_oid()`
  midx.c: use `size_t`'s for fanout nr and alloc
  packfile.c: use checked arithmetic in `nth_packed_object_offset()`
  packfile.c: prevent overflow in `load_idx()`
  packfile.c: prevent overflow in `nth_packed_object_id()`
</content>
</entry>
<entry>
<title>hex: retire get_sha1_hex()</title>
<updated>2023-07-24T23:11:23Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2023-07-24T23:11:03Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=08e5fb1296238c9c4468ae2cfbd7a49045159c60'/>
<id>urn:sha1:08e5fb1296238c9c4468ae2cfbd7a49045159c60</id>
<content type='text'>
The naming convention around get_sha1_hex() and its friends is
awkward these days, after "struct object_id" was introduced.

There are three public functions around this area:

 * get_sha1_hex()       - use the implied the_hash_algo, fill uchar *
 * get_oid_hex()        - use the implied the_hash_algo, fill oid *
 * get_oid_hex_algop()  - use the passed algop, fill oid *

Between the latter two, the "_algop" suffix signals whether the
the_hash_algo is used as the implied algorithm or the caller should
pass an algorithm explicitly.  That is very much understandable and
is a good convention.

Between the former two, however, the "SHA1" vs "OID" in the names
differentiates the type of variable in which the result is stored.

We could argue that it makes sense to use "SHA1" to mean "flat byte
buffer" to honor the historical practice in the days before "struct
object_id" was invented, but the natural fourth friend of the above
group would take an algop and fill a flat byte buffer, and it would
be strange to name it get_sha1_hex_algop().  Do we use the passed in
algo, or are we limited to SHA-1 ;-)?

In fact, such a function exists, albeit as a private helper function
used by the implementation of these functions, and is named a lot
more sensibly: get_hash_hex_algop().

Correct the misnomer of get_sha1_hex() and use "hash", instead of
"sha1", as "flat byte buffer that stores binary (as opposed to
hexadecimal) representation of the hash".

The four (2x2) friends now become:

 * get_hash_hex()       - use the implied the_hash_algo, fill uchar *
 * get_oid_hex()        - use the implied the_hash_algo, fill oid *
 * get_hash_hex_algop() - use the passed algop, fill uchar *
 * get_oid_hex_algop()  - use the passed algop, fill oid *
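
The 2x2 layout can be modelled loosely in Python. This is only a
sketch of the naming scheme, not Git's code: the real C functions fill
an out-parameter and return a status code, and the `HashAlgo` and
`ObjectId` classes, the exact-length check, and the module-level
`the_hash_algo` default are simplifying assumptions.

```python
from dataclasses import dataclass


@dataclass
class HashAlgo:
    name: str
    rawsz: int  # bytes in the binary form of the hash


@dataclass
class ObjectId:
    hash: bytes
    algo: str


SHA1 = HashAlgo("sha1", 20)
SHA256 = HashAlgo("sha256", 32)
the_hash_algo = SHA1  # stand-in for the repository's implied algorithm


def get_hash_hex_algop(hex_str, algop):
    """Passed algop, result is a flat byte buffer."""
    if len(hex_str) != 2 * algop.rawsz:
        raise ValueError("wrong length for " + algop.name)
    return bytes.fromhex(hex_str)


def get_oid_hex_algop(hex_str, algop):
    """Passed algop, result is an object id."""
    return ObjectId(get_hash_hex_algop(hex_str, algop), algop.name)


def get_hash_hex(hex_str):
    """Implied the_hash_algo, result is a flat byte buffer."""
    return get_hash_hex_algop(hex_str, the_hash_algo)


def get_oid_hex(hex_str):
    """Implied the_hash_algo, result is an object id."""
    return get_oid_hex_algop(hex_str, the_hash_algo)
```

The "_algop" suffix picks the column (implied or explicit algorithm)
and "hash" vs "oid" picks the row (flat bytes or object id).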

As there are only two remaining calls to get_sha1_hex() in the
codebase right now, the blast radius of this change is fairly
small.

Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>packfile.c: use checked arithmetic in `nth_packed_object_offset()`</title>
<updated>2023-07-14T16:32:03Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2023-07-12T23:37:32Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=a519abca02eeca7dce864717b9664c62a124e1c0'/>
<id>urn:sha1:a519abca02eeca7dce864717b9664c62a124e1c0</id>
<content type='text'>
In a similar spirit to the previous commits, ensure that we use
`st_add()` or `st_mult()` when computing values that may overflow the
32-bit unsigned limit.

Note that in each of these instances, we prevent 32-bit overflow
already since we have explicit casts to `size_t`.

So this code is OK as-is, but let's clarify it by using the `st_xyz()`
helpers to make it obvious that we are performing the relevant
computations using 64 bits.
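
For readers unfamiliar with the helpers, here is a minimal Python
sketch of checked addition and multiplication. It only mimics the
idea: in Git the helpers operate on `size_t` and die on overflow,
whereas this stand-in takes a hypothetical `size_max` parameter and
raises `OverflowError`.

```python
SIZE_MAX_32 = 2**32 - 1  # size_t limit on a typical 32-bit platform
SIZE_MAX_64 = 2**64 - 1  # size_t limit on a typical 64-bit platform


def st_add(a, b, size_max=SIZE_MAX_64):
    """Return a + b, refusing results that do not fit in size_max."""
    total = a + b
    if max(total, size_max) != size_max:
        raise OverflowError("size_t overflow: %d + %d" % (a, b))
    return total


def st_mult(a, b, size_max=SIZE_MAX_64):
    """Return a * b, refusing results that do not fit in size_max."""
    product = a * b
    if max(product, size_max) != size_max:
        raise OverflowError("size_t overflow: %d * %d" % (a, b))
    return product


# Offset of object number 215043814 in a table of 20-byte entries.
offset = st_mult(20, 215043814)
```

With a 64-bit limit the product is returned unchanged; with
`SIZE_MAX_32` the same call raises instead of silently wrapping.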

Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>packfile.c: prevent overflow in `load_idx()`</title>
<updated>2023-07-14T16:31:34Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2023-07-14T00:54:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=42be681b33ef73be056fb99e3c63c6e9b9c2e7ef'/>
<id>urn:sha1:42be681b33ef73be056fb99e3c63c6e9b9c2e7ef</id>
<content type='text'>
Prevent an overflow when locating a pack's CRC offset when the number
of packed items is greater than (2^32-1)/hashsz by guarding the
computation with an `st_mult()`.

Note that to avoid truncating the result, the `crc_offset` member must
itself become a `size_t`. The only usage of this variable (besides the
assignment in `load_idx()`) is in `read_v2_anomalous_offsets()` in the
index-pack code. There we use the `crc_offset` as a pointer offset, so
we are already equipped to handle the type change.
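
To put numbers on that threshold, the following Python sketch computes
the largest object count whose `nr * hashsz` product still fits in 32
bits. It looks only at that one term rather than the full offset
computation in `load_idx()`, and `max_safe_objects()` is a
hypothetical helper name.

```python
def max_safe_objects(hashsz, bits=32):
    """Largest nr for which nr * hashsz fits in an unsigned `bits`-bit value."""
    return (2**bits - 1) // hashsz


def wrapped_product(nr, hashsz, bits=32):
    """nr * hashsz as computed in unsigned `bits`-bit arithmetic."""
    return (nr * hashsz) % 2**bits


sha1_limit = max_safe_objects(20)    # 214748364
sha256_limit = max_safe_objects(32)  # 134217727

# One object past the SHA-1 limit wraps around to a tiny value.
past_limit = wrapped_product(sha1_limit + 1, 20)  # 4
```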

Helped-by: Phillip Wood &lt;phillip.wood@dunelm.org.uk&gt;
Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>packfile.c: prevent overflow in `nth_packed_object_id()`</title>
<updated>2023-07-13T04:44:59Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2023-07-12T23:37:27Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=de41d03e1c7ab73174716c99b8eaf7ff5884d6bb'/>
<id>urn:sha1:de41d03e1c7ab73174716c99b8eaf7ff5884d6bb</id>
<content type='text'>
In 37fec86a83 (packfile: abstract away hash constant values,
2018-05-02), `nth_packed_object_id()` started using the variable
`the_hash_algo-&gt;rawsz` instead of a fixed constant when trying to
compute an offset into the ".idx" file for some object position.

This can lead to surprising truncation when looking for an object
towards the end of a large enough pack, like the following:

    (gdb) p hashsz
    $1 = 20
    (gdb) p n
    $2 = 215043814
    (gdb) p hashsz * n
    $3 = 5908984

, which is a debugger session broken on a known-bad call to the
`nth_packed_object_id()` function.

This behavior predates 37fec86a83, and is original to the v2 index
format, via: 74e34e1fca (sha1_file.c: learn about index version 2,
2007-04-09).

This is due to §6.4.4.1 of the C99 standard, which states that an
unsuffixed decimal integer constant will take the first type in which
the value can be accurately represented, among `int`, `long int`, and
`long long int`.

Since 20 can be represented as an `int`, and `n` is a 32-bit unsigned
integer, the resulting computation is defined by §6.3.1.8, and the
(signed) integer value 20 is converted to an unsigned
type, meaning that `20 * n` (for `n` having type `uint32_t`) is
equivalent to a multiplication between two unsigned 32-bit integers.

When multiplying a sufficiently large `n`, the resulting value can
exceed 2^32-1, wrapping around and producing an invalid result. Let's
follow the example in f86f769550e (compute pack .idx byte offsets using
size_t, 2020-11-13) and replace this computation with `st_mult()`, which
will ensure that the computation is done using 64 bits.
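
The wraparound from the debugger session can be reproduced with a
short Python sketch that reduces the product modulo 2^32, which is how
unsigned 32-bit multiplication behaves in C. `mul_u32()` is a
hypothetical helper written for this illustration.

```python
UINT32_MOD = 2**32


def mul_u32(a, b):
    """Multiply the way two unsigned 32-bit integers would (wraps mod 2^32)."""
    return (a * b) % UINT32_MOD


hashsz = 20
n = 215043814

wrapped = mul_u32(hashsz, n)  # the truncated value gdb printed
exact = hashsz * n            # the offset a 64-bit computation yields

print(wrapped)  # 5908984
print(exact)    # 4300876280
```

The two values differ by exactly 2^32, the part that is lost when the
product is truncated to 32 bits.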

While here, guard the corresponding computation for packs with v1
indexes, too. Though the likelihood of seeing a bug there is much
smaller, since (a) v1 indexes are generated far less frequently than v2
indexes, and (b) they all correspond to packs no larger than 2 GiB, so
having enough objects to trigger this overflow is unlikely if not
impossible.

Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>git-compat-util: move alloc macros to git-compat-util.h</title>
<updated>2023-07-05T18:42:31Z</updated>
<author>
<name>Calvin Wan</name>
<email>calvinwan@google.com</email>
</author>
<published>2023-07-05T17:09:24Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=91c080dff511b7a81f91d1cc79589b49e16a2b7a'/>
<id>urn:sha1:91c080dff511b7a81f91d1cc79589b49e16a2b7a</id>
<content type='text'>
alloc_nr, ALLOC_GROW, and ALLOC_GROW_BY are commonly used macros for
dynamic array allocation. Moving these macros to git-compat-util.h with
the other alloc macros focuses alloc.[ch] to allocation for Git objects
and additionally allows us to remove inclusions to alloc.h from files
that solely used the above macros.
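
For context, the growth policy behind these macros can be sketched in
Python as below. This reflects my reading of the macro definitions
(grow to `(alloc + 16) * 3 / 2`, or straight to the requested size if
that is larger) and should be checked against git-compat-util.h;
`alloc_grow()` is a hypothetical function that returns the new
capacity instead of reallocating an array.

```python
def alloc_nr(x):
    """Next capacity suggested for a buffer that currently holds x slots."""
    return (x + 16) * 3 // 2


def alloc_grow(nr, alloc):
    """Return the capacity needed so that nr items fit, given capacity alloc."""
    if max(nr, alloc) == alloc:
        return alloc                 # already large enough
    return max(alloc_nr(alloc), nr)  # grow geometrically, or jump to nr


first = alloc_grow(1, 0)       # empty array, one item needed
second = alloc_grow(25, 24)    # just past the current capacity
jump = alloc_grow(1000, 24)    # request far beyond the geometric step
```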

Signed-off-by: Calvin Wan &lt;calvinwan@google.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
