<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/object.c, branch v2.43.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.43.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.43.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2023-06-29T23:43:21Z</updated>
<entry>
<title>Merge branch 'en/header-split-cache-h-part-3'</title>
<updated>2023-06-29T23:43:21Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2023-06-29T23:43:20Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=a1264a08a1a6e0cd7e510c899cd0ba42dcf1045d'/>
<id>urn:sha1:a1264a08a1a6e0cd7e510c899cd0ba42dcf1045d</id>
<content type='text'>
Header files cleanup.

* en/header-split-cache-h-part-3: (28 commits)
  fsmonitor-ll.h: split this header out of fsmonitor.h
  hash-ll, hashmap: move oidhash() to hash-ll
  object-store-ll.h: split this header out of object-store.h
  khash: name the structs that khash declares
  merge-ll: rename from ll-merge
  git-compat-util.h: remove unneccessary include of wildmatch.h
  builtin.h: remove unneccessary includes
  list-objects-filter-options.h: remove unneccessary include
  diff.h: remove unnecessary include of oidset.h
  repository: remove unnecessary include of path.h
  log-tree: replace include of revision.h with simple forward declaration
  cache.h: remove this no-longer-used header
  read-cache*.h: move declarations for read-cache.c functions from cache.h
  repository.h: move declaration of the_index from cache.h
  merge.h: move declarations for merge.c from cache.h
  diff.h: move declaration for global in diff.c from cache.h
  preload-index.h: move declarations for preload-index.c from elsewhere
  sparse-index.h: move declarations for sparse-index.c from cache.h
  name-hash.h: move declarations for name-hash.c from cache.h
  run-command.h: move declarations for run-command.c from cache.h
  ...
</content>
</entry>
<entry>
<title>cache.h: remove this no-longer-used header</title>
<updated>2023-06-21T20:39:53Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2023-05-16T06:33:57Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=bc5c5ec0446895f5c4139cd470066beb3c4ac6d5'/>
<id>urn:sha1:bc5c5ec0446895f5c4139cd470066beb3c4ac6d5</id>
<content type='text'>
Since this header showed up in some places besides just #include
statements, update/clean-up/remove those other places as well.

Note that compat/fsmonitor/fsm-path-utils-darwin.c previously got
away with violating the rule that all files must start with an include
of git-compat-util.h (or a short-list of alternate headers that happen
to include it first).  This change exposed the violation and caused it
to stop building correctly; fix it by having it include
git-compat-util.h first, as per policy.

Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>statinfo: move stat_{data,validity} functions from cache/read-cache</title>
<updated>2023-06-21T20:39:53Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2023-05-16T06:33:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=90cbae9ce5d22df29867be9026c514b8c79e3d31'/>
<id>urn:sha1:90cbae9ce5d22df29867be9026c514b8c79e3d31</id>
<content type='text'>
These functions do not depend upon struct cache_entry or struct
index_state in any way, and it seems more logical to break them out into
this file, especially since statinfo.h already has the struct stat_data
declaration.

Diff best viewed with `--color-moved`.

Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>object: add object_array initializer helper function</title>
<updated>2023-05-08T19:05:55Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2023-05-08T17:38:06Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=fe90355361430dc52f858845a821370db0c54c80'/>
<id>urn:sha1:fe90355361430dc52f858845a821370db0c54c80</id>
<content type='text'>
The object_array API has an OBJECT_ARRAY_INIT macro, but lacks a
function to initialize an object_array at a given location in memory.

Introduce `object_array_init()` to implement such a function.

Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>object-file.h: move declarations for object-file.c functions from cache.h</title>
<updated>2023-04-11T15:52:10Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2023-04-11T07:41:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=87bed17907b2cb9a9581a5b8b16b8da264c2a2a8'/>
<id>urn:sha1:87bed17907b2cb9a9581a5b8b16b8da264c2a2a8</id>
<content type='text'>
Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Acked-by: Calvin Wan &lt;calvinwan@google.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>treewide: be explicit about dependence on gettext.h</title>
<updated>2023-03-21T17:56:51Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2023-03-21T06:25:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=f394e093df10f1867d9bb2180b3789ee61124aed'/>
<id>urn:sha1:f394e093df10f1867d9bb2180b3789ee61124aed</id>
<content type='text'>
Dozens of files made use of gettext functions, without explicitly
including gettext.h.  This made it more difficult to find which files
could remove a dependence on cache.h.  Make C files explicitly include
gettext.h if they are using it.

However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an
include of gettext.h, it was left out to avoid conflicting with an
in-flight topic.

Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>cache.h: remove dependence on hex.h; make other files include it explicitly</title>
<updated>2023-02-24T01:25:29Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2023-02-24T00:09:27Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=41771fa435a44ff8be3f23753bde0309a2a65b03'/>
<id>urn:sha1:41771fa435a44ff8be3f23753bde0309a2a65b03</id>
<content type='text'>
Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>blob: drop unused parts of parse_blob_buffer()</title>
<updated>2022-12-13T13:16:22Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2022-12-13T11:11:57Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=c1166ca0e23d60629e0e7babd4eb4be64f578286'/>
<id>urn:sha1:c1166ca0e23d60629e0e7babd4eb4be64f578286</id>
<content type='text'>
Our parse_blob_buffer() takes a ptr/len combo, just like
parse_tree_buffer(), etc, and returns success or failure. But it doesn't
actually do anything with them; we just set the "parsed" flag in the
object and return success, without even looking at the contents.

There could be some value to keeping these unused parameters:

  - it's consistent with the parse functions for other object types. But
    we already lost that consistency in 837d395a5c (Replace parse_blob()
    with an explanatory comment, 2010-01-18).

  - As the comment from 837d395a5c explains, callers are supposed to
    make sure they have the object content available. So in theory
    asking for these parameters could serve as a signal. But there are
    only two callers, and one of them always passes NULL (after doing a
    streaming check of the object hash).

    This shows that there aren't likely to be a lot of callers (since
    everyone either uses the type-generic parse functions, or handles
    blobs individually), and that they need to take special care anyway
    (because we usually want to avoid loading whole blobs in memory if
    we can avoid it).

So let's just drop these unused parameters, and likewise the useless
return value. While we're touching the header file, let's move the
declaration of parse_blob_buffer() right below that explanatory comment,
where it's more likely to be seen by people looking for the function.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>parse_object(): simplify blob conditional</title>
<updated>2022-11-22T01:13:54Z</updated>
<author>
<name>Ævar Arnfjörð Bjarmason</name>
<email>avarab@gmail.com</email>
</author>
<published>2022-11-21T19:26:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=40286ca2fa1e08c386ea7bc6b76616a3cac63ffd'/>
<id>urn:sha1:40286ca2fa1e08c386ea7bc6b76616a3cac63ffd</id>
<content type='text'>
Commit 8db2dad7a0 (parse_object(): check on-disk type of suspected blob,
2022-11-17) simplified the conditional for checking if we might have a
blob. But we can simplify it further. In:

  !obj || (obj &amp;&amp; obj-&gt;type == OBJ_BLOB)

the short-circuit "OR" means "obj" will always be true on the right-hand
side. The compiler almost certainly optimized that out anyway, but
dropping it makes the conditional easier to understand for humans.
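
As a hedged illustration (not part of the original patch), the
equivalence can be checked exhaustively. The sketch below is written in
Python rather than C, and the names Obj, OBJ_TREE, OBJ_BLOB, before and
after are hypothetical stand-ins for the C expression quoted above:

```python
from collections import namedtuple

# Hypothetical stand-ins for git's struct object and its type constants.
Obj = namedtuple("Obj", ["type"])
OBJ_TREE = 2
OBJ_BLOB = 3


def before(obj):
    # Original form: the right-hand side re-tests obj before reading its type.
    return obj is None or (obj is not None and obj.type == OBJ_BLOB)


def after(obj):
    # Simplified form: short-circuit "or" guarantees obj is set at this point.
    return obj is None or obj.type == OBJ_BLOB


# Cover every case: no object, a suspected blob, and a non-blob object.
candidates = [None, Obj(OBJ_BLOB), Obj(OBJ_TREE)]
results = [(before(o), after(o)) for o in candidates]
```

Each pair in results holds the same value twice, which is the sense in
which the extra check of "obj" is redundant.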

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>parse_object(): check on-disk type of suspected blob</title>
<updated>2022-11-18T18:59:31Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2022-11-17T22:41:16Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=8db2dad7a045e376b9c4f51ddd33da43c962e3a4'/>
<id>urn:sha1:8db2dad7a045e376b9c4f51ddd33da43c962e3a4</id>
<content type='text'>
In parse_object(), we try to handle blobs by streaming rather than
loading them entirely into memory. The most common case here will be
that we haven't seen the object yet and check oid_object_info(), which
tells us we have a blob.

But we trigger this code on one other case: when we have an in-memory
object struct with type OBJ_BLOB (and without its "parsed" flag set,
since otherwise we'd return early from the function). This indicates
that some other part of the code suspected we have a blob (e.g., it was
mentioned by a tree or tag) but we haven't yet looked at the on-disk
copy.

In this case before hitting the streaming path, we check if we have the
object on-disk at all. This is mostly pointless extra work, as the
streaming path would complain if it couldn't open the object (albeit
with the message "hash mismatch", which is a little misleading).

But it's also insufficient to catch all problems. The streaming code
will only tell us "yes, the on-disk object matches the oid". But it
doesn't actually confirm that what we found was indeed a blob, and
neither does repo_has_object_file().

One way to improve this would be to teach stream_object_signature() to
check the type (either by returning it to us to check, or taking an
"expected" type). But there's an even simpler fix here: if we suspect
the object is a blob, just call oid_object_info() to confirm that we
have it on-disk, and that it really is a blob.

This is slightly less efficient than teaching stream_object_signature()
to do it (since it has to open the object already). But this case very
rarely comes up. In practice, we usually don't have any clue what the
type is, in which case we already call oid_object_info(). This
"suspected" case happens only when some other code created an object
struct but didn't actually parse the blob, which is actually tricky to
trigger at all (see the discussion of the test below).

I reworked the conditional a bit so that instead of:

  if ((suspected_blob &amp;&amp; oid_object_info() == OBJ_BLOB) ||
      (no_clue &amp;&amp; oid_object_info() == OBJ_BLOB))

we have the simpler:

  if ((suspected_blob || no_clue) &amp;&amp; oid_object_info() == OBJ_BLOB)

This is shorter, but also reflects what we really want to say, which is
"have we ruled out this being a blob; if not, check it on-disk".
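
A minimal sketch of why the two forms agree, written in Python purely as
an illustration and not taken from the patch. The parameters
suspected_blob, no_clue and is_blob_on_disk are hypothetical booleans
standing in for the two cases and for the result of comparing
oid_object_info() against OBJ_BLOB:

```python
from itertools import product


def long_form(suspected_blob, no_clue, is_blob_on_disk):
    # Each case repeats the on-disk type check.
    return (suspected_blob and is_blob_on_disk) or (no_clue and is_blob_on_disk)


def short_form(suspected_blob, no_clue, is_blob_on_disk):
    # The shared on-disk check is factored out of both cases.
    return (suspected_blob or no_clue) and is_blob_on_disk


# Compare both forms on all eight combinations of the three inputs.
mismatches = [
    args for args in product([False, True], repeat=3)
    if long_form(*args) != short_form(*args)
]
```

The mismatches list comes out empty, so the shorter conditional only
changes how often the on-disk check is written, not the outcome.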

In either case, if oid_object_info() fails to tell us it's a blob, we'll
skip the streaming code path and call repo_read_object_file(), just as
before. And if we really do have a mismatch with the existing object
struct, we'll eventually call lookup_commit(), etc, via
parse_object_buffer(), which will complain that it doesn't match our
existing obj-&gt;type.

So this fixes one of the lingering expect_failure cases from 0616617c7e
(t: introduce tests for unexpected object types, 2019-04-09).  That test
works by peeling a tag that claims to point to a blob (triggering us to
create the struct), but really points to something else, which we later
discover when we call parse_object() as part of the actual traversal.
Prior to this commit, we'd quietly check the sha1 and mark the blob as
"parsed". Now we correctly complain about the mismatch.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
</content>
</entry>
</feed>
