<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/path.c, branch v2.47.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.47.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.47.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2024-09-12T17:15:42Z</updated>
<entry>
<title>environment: guard state depending on a repository</title>
<updated>2024-09-12T17:15:42Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-09-12T11:30:01Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=673af418d0f271faadb24486348430e547d32d2a'/>
<id>urn:sha1:673af418d0f271faadb24486348430e547d32d2a</id>
<content type='text'>
In "environment.h" we have quite a lot of functions and variables that
either explicitly or implicitly depend on `the_repository`.

The implicit set of stateful declarations includes for example variables
which get populated when parsing a repository's Git configuration. This
set of variables is broken by design, as their state often depends on
the last repository config that has been parsed. So they may or may not
represent the state of `the_repository`.

Fixing that is quite a big undertaking, and later patches in this series
will demonstrate a solution for a first small set of those variables. So
for now, let's guard these with `USE_THE_REPOSITORY_VARIABLE` so that
callers are aware of the implicit dependency.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>path: hide functions using `the_repository` by default</title>
<updated>2024-08-13T17:01:01Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-08-13T09:13:40Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=7ac16649eca0ee206fdb203a76fa4330484b3b6a'/>
<id>urn:sha1:7ac16649eca0ee206fdb203a76fa4330484b3b6a</id>
<content type='text'>
The path subsystem provides a bunch of legacy functions that compute
paths relative to the "gitdir" and "commondir" directories of the global
`the_repository` variable. Use of those functions is discouraged, and it
is easy to miss the implicit dependency on `the_repository` that calls
to those functions may cause.

With `USE_THE_REPOSITORY_VARIABLE`, we have recently introduced a tool
that allows us to get rid of such functions over time. With this macro,
we can hide away functions that have such implicit dependency such that
other subsystems that want to be free of `the_repository` will not use
them by accident.

Move all path-related functions that use `the_repository` into a block
that gets only conditionally compiled depending on whether or not the
macro has been defined. This also removes all dependencies on that
variable in "path.c", allowing us to remove the definition of said
preprocessor macro.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>path: stop relying on `the_repository` in `worktree_git_path()`</title>
<updated>2024-08-13T17:01:01Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-08-13T09:13:37Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=a973f60dc7c178828e351ec4e68886ffecfbcadc'/>
<id>urn:sha1:a973f60dc7c178828e351ec4e68886ffecfbcadc</id>
<content type='text'>
When not provided a worktree, then `worktree_git_path()` will fall back
to returning a path relative to the main repository. In this case, we
implicitly rely on `the_repository` to derive the path. Remove this
dependency by passing a `struct repository` as parameter.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>path: stop relying on `the_repository` when reporting garbage</title>
<updated>2024-08-13T17:01:01Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-08-13T09:13:31Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=78f2210b3c8605144d62a90b58e888312f64efc8'/>
<id>urn:sha1:78f2210b3c8605144d62a90b58e888312f64efc8</id>
<content type='text'>
We access `the_repository` in `report_linked_checkout_garbage()` both
directly and indirectly via `get_git_dir()`. Remove this dependency by
instead passing a `struct repository` as parameter.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>path: expose `do_git_common_path()` as `repo_common_pathv()`</title>
<updated>2024-08-13T17:01:00Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-08-13T09:13:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=61419a42f641c7b9f7bfc9585e3ec9c393ab0166'/>
<id>urn:sha1:61419a42f641c7b9f7bfc9585e3ec9c393ab0166</id>
<content type='text'>
With the same reasoning as the preceding commit, expose the function
`do_git_common_path()` as `repo_common_pathv()`. While at it, reorder
parameters such that they match the order we have in `repo_git_pathv()`.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>path: expose `do_git_path()` as `repo_git_pathv()`</title>
<updated>2024-08-13T17:01:00Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-08-13T09:13:20Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=b6c6bfef31e1ffe3804e5a1ba5bfe9e7879eda92'/>
<id>urn:sha1:b6c6bfef31e1ffe3804e5a1ba5bfe9e7879eda92</id>
<content type='text'>
We're about to move functions of the "path" subsytem that do not use a
`struct repository` into "path.h" as static inlined functions. This will
require us to call `do_git_path()`, which is internal to "path.c".

Expose the function as `repo_git_pathv()` to prepare for the change.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>global: introduce `USE_THE_REPOSITORY_VARIABLE` macro</title>
<updated>2024-06-14T17:26:33Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-06-14T06:50:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=e7da9385708accf518a80a1e17969020fb361048'/>
<id>urn:sha1:e7da9385708accf518a80a1e17969020fb361048</id>
<content type='text'>
Use of the `the_repository` variable is deprecated nowadays, and we
slowly but steadily convert the codebase to not use it anymore. Instead,
callers should be passing down the repository to work on via parameters.

It is hard though to prove that a given code unit does not use this
variable anymore. The most trivial case, merely demonstrating that there
is no direct use of `the_repository`, is already a bit of a pain during
code reviews as the reviewer needs to manually verify claims made by the
patch author. The bigger problem though is that we have many interfaces
that implicitly rely on `the_repository`.

Introduce a new `USE_THE_REPOSITORY_VARIABLE` macro that allows code
units to opt into usage of `the_repository`. The intent of this macro is
to demonstrate that a certain code unit does not use this variable
anymore, and to keep it from new dependencies on it in future changes,
be it explicit or implicit

For now, the macro only guards `the_repository` itself as well as
`the_hash_algo`. There are many more known interfaces where we have an
implicit dependency on `the_repository`, but those are not guarded at
the current point in time. Over time though, we should start to add
guards as required (or even better, just remove them).

Define the macro as required in our code units. As expected, most of our
code still relies on the global variable. Nearly all of our builtins
rely on the variable as there is no way yet to pass `the_repository` to
their entry point. For now, declare the macro in "biultin.h" to keep the
required changes at least a little bit more contained.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'ps/undecided-is-not-necessarily-sha1'</title>
<updated>2024-05-30T21:15:11Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2024-05-30T21:15:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=a60c21b7206fff1a6ab561e29ac7312c437d2592'/>
<id>urn:sha1:a60c21b7206fff1a6ab561e29ac7312c437d2592</id>
<content type='text'>
Before discovering the repository details, We used to assume SHA-1
as the "default" hash function, which has been corrected. Hopefully
this will smoke out codepaths that rely on such an unwarranted
assumptions.

* ps/undecided-is-not-necessarily-sha1:
  repository: stop setting SHA1 as the default object hash
  oss-fuzz/commit-graph: set up hash algorithm
  builtin/shortlog: don't set up revisions without repo
  builtin/diff: explicitly set hash algo when there is no repo
  builtin/bundle: abort "verify" early when there is no repository
  builtin/blame: don't access potentially unitialized `the_hash_algo`
  builtin/rev-parse: allow shortening to more than 40 hex characters
  remote-curl: fix parsing of detached SHA256 heads
  attr: fix BUG() when parsing attrs outside of repo
  attr: don't recompute default attribute source
  parse-options-cb: only abbreviate hashes when hash algo is known
  path: move `validate_headref()` to its only user
  path: harden validation of HEAD with non-standard hashes
</content>
</entry>
<entry>
<title>path: move `validate_headref()` to its only user</title>
<updated>2024-05-07T05:50:48Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-05-07T04:52:51Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=0c6bd2b81d1e18dfa1e143c354c554cca34b3685'/>
<id>urn:sha1:0c6bd2b81d1e18dfa1e143c354c554cca34b3685</id>
<content type='text'>
While `validate_headref()` is only called from `is_git_directory()` in
"setup.c", it is currently implemented in "path.c". Move it over such
that it becomes clear that it is only really used during setup in order
to discover repositories.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>path: harden validation of HEAD with non-standard hashes</title>
<updated>2024-05-07T05:50:48Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-05-07T04:52:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=a0851cece542be9fabf00d6145860562db450205'/>
<id>urn:sha1:a0851cece542be9fabf00d6145860562db450205</id>
<content type='text'>
The `validate_headref()` function takes a path to a supposed "HEAD" file
and checks whether its format is something that we understand. It is
used as part of our repository discovery to check whether a specific
directory is a Git directory or not.

Part of the validation is a check for a detached HEAD that contains a
plain object ID. To do this validation we use `get_oid_hex()`, which
relies on `the_hash_algo`. At this point in time the hash algo cannot
yet be initialized though because we didn't yet read the Git config.
Consequently, it will always be the SHA1 hash algorithm.

In practice this works alright because `get_oid_hex()` only ends up
checking whether the prefix of the buffer is a valid object ID. And
because SHA1 is shorter than SHA256, the function will successfully
parse SHA256 object IDs, as well.

It is somewhat fragile though and not really the intent to only check
for SHA1. With this in mind, harden the code to use `get_oid_hex_any()`
to check whether the "HEAD" file parses as any known hash.

One might be hard pressed to tighten the check even further and fully
validate the file contents, not only the prefix. In practice though that
wouldn't make a lot of sense as it could be that the repository uses a
hash function that produces longer hashes than SHA256, but which the
current version of Git doesn't understand yet. We'd still want to detect
the repository as proper Git repository in that case, and we will fail
eventually with a proper error message that the hash isn't understood
when trying to set up the repository format.

It follows that we could just leave the current code intact, as in
practice the code change doesn't have any user visible impact. But it
also prepares us for `the_hash_algo` being unset when there is no
repository.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
