<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/object-file.c, branch v2.34.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.34.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.34.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2021-11-12T23:29:25Z</updated>
<entry>
<title>Merge branch 'ab/fsck-unexpected-type'</title>
<updated>2021-11-12T23:29:25Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2021-11-12T23:29:25Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=2c0fa66bc81a0725cd65fda69640552361536e50'/>
<id>urn:sha1:2c0fa66bc81a0725cd65fda69640552361536e50</id>
<content type='text'>
Regression fix.

* ab/fsck-unexpected-type:
  object-file: free(*contents) only in read_loose_object() caller
  object-file: fix SEGV on free() regression in v2.34.0-rc2
</content>
</entry>
<entry>
<title>object-file: free(*contents) only in read_loose_object() caller</title>
<updated>2021-11-11T21:40:43Z</updated>
<author>
<name>Ævar Arnfjörð Bjarmason</name>
<email>avarab@gmail.com</email>
</author>
<published>2021-11-11T05:18:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=16235e3b1460ddd708995b4cc5028e49d47e3761'/>
<id>urn:sha1:16235e3b1460ddd708995b4cc5028e49d47e3761</id>
<content type='text'>
In the preceding commit a free() of uninitialized memory regression in
96e41f58fe1 (fsck: report invalid object type-path combinations,
2021-10-01) was fixed, but we'd still have an issue with leaking
memory from fsck_loose(). Let's fix that issue too.

That issue was introduced in my 31deb28f5e0 (fsck: don't hard die on
invalid object types, 2021-10-01). It can be reproduced under
SANITIZE=leak with the test I added in 093fffdfbec (fsck tests: add
test for fsck-ing an unknown type, 2021-10-01):

    ./t1450-fsck.sh --run=84 -vixd

In some sense it's not a problem, we lost the same amount of memory in
terms of things malloc'd and not free'd. It just moved from the "still
reachable" to "definitely lost" column in valgrind(1) nomenclature[1],
since we'd have die()'d before.

But now that we don't hard die() anymore in the library let's properly
free() it. Doing so makes this code much easier to follow, since we'll
now have one function owning the freeing of the "contents" variable,
not two.

For context on that memory management pattern the read_loose_object()
function was added in f6371f92104 (sha1_file: add read_loose_object()
function, 2017-01-13) and subsequently used in c68b489e564 (fsck:
parse loose object paths directly, 2017-01-13). The pattern of it
being the task of both sides to free() the memory has been there in
this form since its inception.

1. https://valgrind.org/docs/manual/mc-manual.html#mc-manual.leaks

Signed-off-by: Ævar Arnfjörð Bjarmason &lt;avarab@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>object-file: fix SEGV on free() regression in v2.34.0-rc2</title>
<updated>2021-11-11T18:41:54Z</updated>
<author>
<name>Ævar Arnfjörð Bjarmason</name>
<email>avarab@gmail.com</email>
</author>
<published>2021-11-11T05:18:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=168a937bbcf74370267fe04f5368035948745baa'/>
<id>urn:sha1:168a937bbcf74370267fe04f5368035948745baa</id>
<content type='text'>
Fix a regression introduced in my 96e41f58fe1 (fsck: report invalid
object type-path combinations, 2021-10-01). When fsck-ing blobs larger
than core.bigFileThreshold, we'd free() a pointer to uninitialized
memory.

This issue would have been caught by SANITIZE=address, but since it
involves core.bigFileThreshold, none of the existing tests in our test
suite covered it.

Running them with the "big_file_threshold" in "environment.c" changed
to say "6" would have shown this failure, but let's add a dedicated
test for this scenario based on Han Xin's report[1].

The bug was introduced between v9 and v10[2] of the fsck series merged
in 061a21d36d8 (Merge branch 'ab/fsck-unexpected-type', 2021-10-25).

1. https://lore.kernel.org/git/20211111030302.75694-1-hanxin.hx@alibaba-inc.com/
2. https://lore.kernel.org/git/cover-v10-00.17-00000000000-20211001T091051Z-avarab@gmail.com/

Reported-by: Han Xin &lt;chiyutianyi@gmail.com&gt;
Signed-off-by: Ævar Arnfjörð Bjarmason &lt;avarab@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'ab/fix-commit-error-message-upon-unwritable-object-store'</title>
<updated>2021-10-25T23:06:57Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2021-10-25T23:06:57Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=2c428e4205a50cee19669d5b73cc149ec2254a5d'/>
<id>urn:sha1:2c428e4205a50cee19669d5b73cc149ec2254a5d</id>
<content type='text'>
"git commit" gave duplicated error message when the object store
was unwritable, which has been corrected.

* ab/fix-commit-error-message-upon-unwritable-object-store:
  commit: fix duplication regression in permission error output
  unwritable tests: assert exact error output
</content>
</entry>
<entry>
<title>Merge branch 'jt/no-abuse-alternate-odb-for-submodules'</title>
<updated>2021-10-25T23:06:56Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2021-10-25T23:06:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=162a13b855ee8d920a31d6d1e928cef0f0a18e18'/>
<id>urn:sha1:162a13b855ee8d920a31d6d1e928cef0f0a18e18</id>
<content type='text'>
Follow through the work to use the repo interface to access
submodule objects in-process, instead of abusing the alternate
object database interface.

* jt/no-abuse-alternate-odb-for-submodules:
  submodule: trace adding submodule ODB as alternate
  submodule: pass repo to check_has_commit()
  object-file: only register submodule ODB if needed
  merge-{ort,recursive}: remove add_submodule_odb()
  refs: peeling non-the_repository iterators is BUG
  refs: teach arbitrary repo support to iterators
  refs: plumb repo into ref stores
</content>
</entry>
<entry>
<title>Merge branch 'ab/fsck-unexpected-type'</title>
<updated>2021-10-25T23:06:56Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2021-10-25T23:06:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=061a21d36d807bdcf996f388d5e487d5e1993bbc'/>
<id>urn:sha1:061a21d36d807bdcf996f388d5e487d5e1993bbc</id>
<content type='text'>
"git fsck" has been taught to report mismatch between expected and
actual types of an object better.

* ab/fsck-unexpected-type:
  fsck: report invalid object type-path combinations
  fsck: don't hard die on invalid object types
  object-file.c: stop dying in parse_loose_header()
  object-file.c: return ULHR_TOO_LONG on "header too long"
  object-file.c: use "enum" return type for unpack_loose_header()
  object-file.c: simplify unpack_loose_short_header()
  object-file.c: make parse_loose_header_extended() public
  object-file.c: return -1, not "status" from unpack_loose_header()
  object-file.c: don't set "typep" when returning non-zero
  cat-file tests: test for current --allow-unknown-type behavior
  cat-file tests: add corrupt loose object test
  cat-file tests: test for missing/bogus object with -t, -s and -p
  cat-file tests: move bogus_* variable declarations earlier
  fsck tests: test for garbage appended to a loose object
  fsck tests: test current hash/type mismatch behavior
  fsck tests: refactor one test to use a sub-repo
  fsck tests: add test for fsck-ing an unknown type
</content>
</entry>
<entry>
<title>commit: fix duplication regression in permission error output</title>
<updated>2021-10-12T18:16:59Z</updated>
<author>
<name>Ævar Arnfjörð Bjarmason</name>
<email>avarab@gmail.com</email>
</author>
<published>2021-10-12T14:30:49Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=4ef91a2d795c424eda2bec1bfbbd0c813bcc978a'/>
<id>urn:sha1:4ef91a2d795c424eda2bec1bfbbd0c813bcc978a</id>
<content type='text'>
Fix a regression in the error output emitted when .git/objects can't
be written to. Before 9c4d6c0297b (cache-tree: Write updated
cache-tree after commit, 2014-07-13) we'd emit only one "insufficient
permission" error, now we'll do so again.

The cause is rather straightforward, we've got WRITE_TREE_SILENT for
the use-case of wanting to prepare an index silently, quieting any
permission etc. error output. Then when we attempt to update to
that (possibly broken) index we'll run into the same errors again.

But with 9c4d6c0297b the gap between the cache-tree API and the object
store wasn't closed in terms of asking write_object_file() to be
silent. I.e. post-9c4d6c0297b the first call is to prepare_index(),
and after that we'll call prepare_to_commit(). We only want verbose
error output from the latter.

So let's add and use that facility with a corresponding HASH_SILENT
flag, its only user is cache-tree.c's update_one(), which will set it
if its "WRITE_TREE_SILENT" flag is set.

Signed-off-by: Ævar Arnfjörð Bjarmason &lt;avarab@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>object-file: only register submodule ODB if needed</title>
<updated>2021-10-08T22:06:06Z</updated>
<author>
<name>Jonathan Tan</name>
<email>jonathantanmy@google.com</email>
</author>
<published>2021-10-08T21:08:18Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=eef71904ff68e90f2db37aa9e25fe82c3fccf2a0'/>
<id>urn:sha1:eef71904ff68e90f2db37aa9e25fe82c3fccf2a0</id>
<content type='text'>
In a35e03dee0 ("submodule: lazily add submodule ODBs as alternates",
2021-09-08), Git was taught to add all known submodule ODBs as
alternates when attempting to read an object that doesn't exist, as a
fallback for when a submodule object is read as if it were in
the_repository. However, this behavior wasn't restricted to happen only
when reading from the_repository. Fix this.

Signed-off-by: Jonathan Tan &lt;jonathantanmy@google.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'hn/refs-errno-cleanup'</title>
<updated>2021-10-04T04:49:18Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2021-10-04T04:49:18Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=92382d14cd4146ad05e891a529ed3ce5822d11f7'/>
<id>urn:sha1:92382d14cd4146ad05e891a529ed3ce5822d11f7</id>
<content type='text'>
Futz with the way 'errno' is relied on in the refs API to carry the
failure modes up the call chain.

* hn/refs-errno-cleanup:
  refs: make errno output explicit for read_raw_ref_fn
  refs/files-backend: stop setting errno from lock_ref_oid_basic
  refs: remove EINVAL errno output from specification of read_raw_ref_fn
  refs file backend: move raceproof_create_file() here
</content>
</entry>
<entry>
<title>fsck: report invalid object type-path combinations</title>
<updated>2021-10-01T22:06:01Z</updated>
<author>
<name>Ævar Arnfjörð Bjarmason</name>
<email>avarab@gmail.com</email>
</author>
<published>2021-10-01T09:16:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=96e41f58fe1a5aeadf2bf1c1850c53a1c1144bbc'/>
<id>urn:sha1:96e41f58fe1a5aeadf2bf1c1850c53a1c1144bbc</id>
<content type='text'>
Improve the error that's emitted in cases where we find a loose object
we parse, but which isn't at the location we expect it to be.

Before this change we'd prefix the error with a not-a-OID derived from
the path at which the object was found, due to an emergent behavior in
how we'd end up with an "OID" in these codepaths.

Now we'll instead say what object we hashed, and what path it was
found at. Before this patch series e.g.:

    $ git hash-object --stdin -w -t blob &lt;/dev/null
    e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
    $ mv objects/e6/ objects/e7

Would emit ("[...]" used to abbreviate the OIDs):

    git fsck
    error: hash mismatch for ./objects/e7/9d[...] (expected e79d[...])
    error: e79d[...]: object corrupt or missing: ./objects/e7/9d[...]

Now we'll instead emit:

    error: e69d[...]: hash-path mismatch, found at: ./objects/e7/9d[...]

Furthermore, we'll do the right thing when the object type and its
location are bad. I.e. this case:

    $ git hash-object --stdin -w -t garbage --literally &lt;/dev/null
    8315a83d2acc4c174aed59430f9a9c4ed926440f
    $ mv objects/83 objects/84

As noted in an earlier commits we'd simply die early in those cases,
until preceding commits fixed the hard die on invalid object type:

    $ git fsck
    fatal: invalid object type

Now we'll instead emit sensible error messages:

    $ git fsck
    error: 8315[...]: hash-path mismatch, found at: ./objects/84/15[...]
    error: 8315[...]: object is of unknown type 'garbage': ./objects/84/15[...]

In both fsck.c and object-file.c we're using null_oid as a sentinel
value for checking whether we got far enough to be certain that the
issue was indeed this OID mismatch.

We need to add the "object corrupt or missing" special-case to deal
with cases where read_loose_object() will return an error before
completing check_object_signature(), e.g. if we have an error in
unpack_loose_rest() because we find garbage after the valid gzip
content:

    $ git hash-object --stdin -w -t blob &lt;/dev/null
    e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
    $ chmod 755 objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
    $ echo garbage &gt;&gt;objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
    $ git fsck
    error: garbage at end of loose object 'e69d[...]'
    error: unable to unpack contents of ./objects/e6/9d[...]
    error: e69d[...]: object corrupt or missing: ./objects/e6/9d[...]

There is currently some weird messaging in the edge case when the two
are combined, i.e. because we're not explicitly passing along an error
state about this specific scenario from check_stream_oid() via
read_loose_object() we'll end up printing the null OID if an object is
of an unknown type *and* it can't be unpacked by zlib, e.g.:

    $ git hash-object --stdin -w -t garbage --literally &lt;/dev/null
    8315a83d2acc4c174aed59430f9a9c4ed926440f
    $ chmod 755 objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
    $ echo garbage &gt;&gt;objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
    $ /usr/bin/git fsck
    fatal: invalid object type
    $ ~/g/git/git fsck
    error: garbage at end of loose object '8315a83d2acc4c174aed59430f9a9c4ed926440f'
    error: unable to unpack contents of ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
    error: 8315a83d2acc4c174aed59430f9a9c4ed926440f: object corrupt or missing: ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
    error: 0000000000000000000000000000000000000000: object is of unknown type 'garbage': ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
    [...]

I think it's OK to leave that for future improvements, which would
involve enum-ifying more error state as we've done with "enum
unpack_loose_header_result" in preceding commits. In these
increasingly more obscure cases the worst that can happen is that
we'll get slightly nonsensical or inapplicable error messages.

There's other such potential edge cases, all of which might produce
some confusing messaging, but still be handled correctly as far as
passing along errors goes. E.g. if check_object_signature() returns
and oideq(real_oid, null_oid()) is true, which could happen if it
returns -1 due to the read_istream() call having failed.

Signed-off-by: Ævar Arnfjörð Bjarmason &lt;avarab@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
