<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/pack-check.c, branch jch</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=jch</id>
<link rel='self' href='https://git.shady.money/git/atom?h=jch'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2026-03-23T15:33:11Z</updated>
<entry>
<title>builtin/fsck: stop using `the_repository` when checking packed objects</title>
<updated>2026-03-23T15:33:11Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2026-03-23T15:03:01Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=1c5f77b6103adae5d45ae9ff24e9945b8f8b76c8'/>
<id>urn:sha1:1c5f77b6103adae5d45ae9ff24e9945b8f8b76c8</id>
<content type='text'>
We implicitly rely on `the_repository` when checking objects part of a
packfile. These objects are iterated over via `verify_pack()`, which is
provided by the packfile subsystem, and a callback function is then
invoked for each of the objects in that specific pack.

Unfortunately, it is not possible to provide a payload to the callback
function. Refactor `verify_pack()` to accept a payload that is passed
through to the callback so that we can inject the repository and get rid
of the use of `the_repository`.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>pack-check: fix verification of large objects</title>
<updated>2026-02-23T21:19:00Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2026-02-23T16:00:09Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=13eb65d36615d7269df053015bddf7987ef6d923'/>
<id>urn:sha1:13eb65d36615d7269df053015bddf7987ef6d923</id>
<content type='text'>
It was reported [1] that git-fsck(1) may sometimes run into an infinite
loop when processing packfiles. This bug was bisected to c31bad4f7d
(packfile: track packs via the MRU list exclusively, 2025-10-30), which
refactored our lsit of packfiles to only be tracked via an MRU list,
exclusively. This isn't entirely surprising: any caller that iterates
through the list of packfiles and then hits `find_pack_entry()`, for
example because they read an object from it, may cause the MRU list to
be updated. And if the caller is unlucky, this may cause the mentioned
infinite loop.

While this mechanism is somewhat fragile, it is still surprising that we
encounter it when verifying the packfile. We iterate through objects in
a given pack one by one and then read them via their offset, and doing
this shouldn't ever end up in `find_pack_entry()`.

But there is an edge case here: when the object in question is a blob
bigger than "core.largeFileThreshold", then we will be careful to not
read it into memory. Instead, we read it via an object stream by calling
`odb_read_object_stream()`, and that function will perform an object
lookup via `odb_read_object_info()`. So in the case where there are at
least two blobs in two different packfiles, and both of these blobs
exceed "core.largeFileThreshold", then we'll run into an infinite loop
because we'll always update the MRU.

We could fix this by improving `repo_for_each_pack()` to not update the
MRU, and this would address the issue. But the fun part is that using
`odb_read_object_stream()` is the wrong thing to do in the first place:
it may open _any_ instance of this object, so we ultimately cannot be
sure that we even verified the object in our given packfile.

Fix this bug by creating the object stream for the packed object
directly via `packfile_read_object_stream()`. Add a test that would have
caused the infinite loop.

[1]: &lt;20260222183710.2963424-1-sandals@crustytoothpaste.net&gt;

Reported-by: brian m. carlson &lt;sandals@crustytoothpaste.net&gt;
Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>object-file: adapt `stream_object_signature()` to take a stream</title>
<updated>2026-02-23T21:19:00Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2026-02-23T16:00:07Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=10a6762719f612bb5edc554e62239a744bbc4283'/>
<id>urn:sha1:10a6762719f612bb5edc554e62239a744bbc4283</id>
<content type='text'>
The function `stream_object_signature()` is responsible for verifying
whether the given object ID matches the actual hash of the object's
contents. In contrast to `check_object_signature()` it does so in a
streaming fashion so that we don't have to load the full object into
memory.

In a subsequent commit we'll want to adapt one of its callsites to pass
a preconstructed stream. Prepare for this by accepting a stream as input
that the caller needs to assemble.

While at it, improve the error reporting in `parse_object_with_flags()`
to tell apart the two failure modes.

Helped-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>object-store: rename files to "odb.{c,h}"</title>
<updated>2025-07-01T21:46:34Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2025-07-01T12:22:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=8f49151763cb81adf4bcec53c1ae67057081b02d'/>
<id>urn:sha1:8f49151763cb81adf4bcec53c1ae67057081b02d</id>
<content type='text'>
In the preceding commits we have renamed the structures contained in
"object-store.h" to `struct object_database` and `struct odb_backend`.
As such, the code files "object-store.{c,h}" are confusingly named now.
Rename them to "odb.{c,h}" accordingly.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>object-store: merge "object-store-ll.h" and "object-store.h"</title>
<updated>2025-04-15T15:24:37Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2025-04-15T09:38:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=68cd492a3e662c75dec364986c81e94716d4ac56'/>
<id>urn:sha1:68cd492a3e662c75dec364986c81e94716d4ac56</id>
<content type='text'>
The "object-store-ll.h" header has been introduced to keep transitive
header dependendcies and compile times at bay. Now that we have created
a new "object-store.c" file though we can easily move the last remaining
additional bit of "object-store.h", the `odb_path_map`, out of the
header.

Do so. As the "object-store.h" header is now equivalent to its low-level
alternative we drop the latter and inline it into the former.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>pack-check: stop depending on `the_repository`</title>
<updated>2025-03-10T20:16:19Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2025-03-10T07:13:24Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=7ebf19ce55ebfddd152aab6ddcc6559bba378aec'/>
<id>urn:sha1:7ebf19ce55ebfddd152aab6ddcc6559bba378aec</id>
<content type='text'>
There are multiple sites in "pack-check.c" where we use the global
`the_repository` variable, either explicitly or implicitly by using
`the_hash_algo`. In all of those cases we already have a repository
available in the calling context though.

Refactor the code to instead use the caller-provided repository and
remove the `USE_THE_REPOSITORY_VARIABLE` define.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>environment: move access to "core.bigFileThreshold" into repo settings</title>
<updated>2025-03-10T20:16:18Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2025-03-10T07:13:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=7835ee75cdffbce925246cbacc83e8b4a932a681'/>
<id>urn:sha1:7835ee75cdffbce925246cbacc83e8b4a932a681</id>
<content type='text'>
The "core.bigFileThreshold" setting is stored in a global variable and
populated via `git_default_core_config()`. This may cause issues in
the case where one is handling multiple different repositories in a
single process with different values for that config key, as we may or
may not see the correct value in that case. Furthermore, global state
blocks our path towards libification.

Refactor the code so that we instead store the value in `struct
repo_settings`, where the value is computed as-needed and cached.

Note that this change requires us to adapt one test in t1050 that
verifies that we die when parsing an invalid "core.bigFileThreshold"
value. The exercised Git command doesn't use the value at all, and thus
it won't hit the new code path that parses the value. This is addressed
by using git-hash-object(1) instead, which does read the value.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>csum-file: stop depending on `the_repository`</title>
<updated>2025-03-10T20:16:18Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2025-03-10T07:13:20Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=228457c9d9f32f000f5c04c36fcce9002f72965a'/>
<id>urn:sha1:228457c9d9f32f000f5c04c36fcce9002f72965a</id>
<content type='text'>
There are multiple sites in "csum-file.c" where we use the global
`the_repository` variable, either explicitly or implicitly by using
`the_hash_algo`.

Refactor the code to stop using `the_repository` by adapting functions
to receive required data as parameters. Adapt callsites accordingly by
either using `the_repository-&gt;hash_algo`, or by using a context-provided
hash algorithm in case the subsystem already got rid of its dependency
on `the_repository`.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>global: adapt callers to use generic hash context helpers</title>
<updated>2025-01-31T18:06:11Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2025-01-31T12:55:31Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=0578f1e66aa381356bfe2f53decf3864d88d23d3'/>
<id>urn:sha1:0578f1e66aa381356bfe2f53decf3864d88d23d3</id>
<content type='text'>
Adapt callers to use generic hash context helpers instead of using the
hash algorithm to update them. This makes the callsites easier to reason
about and removes the possibility that the wrong hash algorithm is used
to update the hash context's state. And as a nice side effect this also
gets rid of a bunch of users of `the_hash_algo`.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>hash: stop typedeffing the hash context</title>
<updated>2025-01-31T18:06:10Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2025-01-31T12:55:29Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=7346e340f147131ca32089f61f7d0f502f80d19d'/>
<id>urn:sha1:7346e340f147131ca32089f61f7d0f502f80d19d</id>
<content type='text'>
We generally avoid using `typedef` in the Git codebase. One exception
though is the `git_hash_ctx`, likely because it used to be a union
rather than a struct until the preceding commit refactored it. But now
that it is a normal `struct` there isn't really a need for a typedef
anymore.

Drop the typedef and adapt all callers accordingly.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
