<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/builtin/diff.c, branch jch</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=jch</id>
<link rel='self' href='https://git.shady.money/git/atom?h=jch'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2025-08-10T00:22:01Z</updated>
<entry>
<title>diff: --no-index should ignore the worktree</title>
<updated>2025-08-10T00:22:01Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2025-08-10T00:20:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=e1d3d61a45bfdc5031d2066c0e4505ebd8145777'/>
<id>urn:sha1:e1d3d61a45bfdc5031d2066c0e4505ebd8145777</id>
<content type='text'>
The act of giving "--no-index" tells Git to pretend that the current
directory is not under control of any Git index or repository, so
even when you happen to be in a Git controlled working tree, where
you are in that working tree should not matter.

But the start-up sequence tries to discover the top of the working
tree and chdir(2)'s there, even before Git passes control to the
subcommand being run.  When diff_no_index() starts running, it
starts in the wrong directory (from the point of view of an end user
who thinks "git diff --no-index" is merely a better version of GNU
diff), and the directory in which the user started the command is
recorded in "prefix".

Because the paths given in argv[] have already been adjusted to
account for this path shuffling by prepending the prefix, and the
resulting paths are shown with the prefix stripped, the effect of
these nonsense operations (nonsense in the context of "--no-index",
that is) is usually not observable.

The exception is special cases like "-", which are not preprocessed
by prepending the prefix.

Instead of papering over the problem by adding more special cases to
the generic code only to cater to the no-index codepath, drive the
diff machinery more faithfully to what is going on.  If the user
started "git diff --no-index" in directory X/Y/Z in a working tree
controlled by Git, and the start-up sequence of Git chdir(2)'ed up
to directory X and left Y/Z in the prefix, revert the effect of the
start-up sequence by chdir'ing back to Y/Z and emptying the prefix.
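
As a rough illustration only (Python rather than the C used in Git,
with the hypothetical helper name undo_startup_chdir), the revert
described above could be sketched like this, assuming the process
sits at the top of the working tree and the prefix is a relative
path:

```python
import os
import tempfile


def undo_startup_chdir(prefix):
    """Go back to where the user started and return an emptied prefix.

    Assumes the current directory is the top of the working tree and
    that `prefix` is the relative path (possibly empty) left behind by
    the start-up sequence.
    """
    if prefix:
        os.chdir(prefix)
    return ""


# Simulate a working tree X with the user starting in X/Y/Z.
top = os.path.realpath(tempfile.mkdtemp())
start_dir = os.path.join(top, "Y", "Z")
os.makedirs(start_dir)

os.chdir(top)        # what the start-up sequence did
prefix = "Y/Z/"      # what it left in the prefix
prefix = undo_startup_chdir(prefix)

restored = os.path.realpath(os.getcwd())
```

After this runs, the current directory is X/Y/Z again and the prefix
is empty, so a path such as "-" needs no special treatment.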

Reported-by: Gregoire Geis &lt;opensource@gregoirege.is&gt;
Helped-by: Ramsay Jones &lt;ramsay@ramsayjones.plus.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>config: drop `git_config()` wrapper</title>
<updated>2025-07-23T15:15:18Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2025-07-23T14:08:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=9ce196e86b455fa2552812802c58f30c090c94af'/>
<id>urn:sha1:9ce196e86b455fa2552812802c58f30c090c94af</id>
<content type='text'>
In 036876a1067 (config: hide functions using `the_repository` by
default, 2024-08-13) we have moved around a bunch of functions in the
config subsystem that depend on `the_repository`. Those functions have
been converted into mere wrappers around their equivalent functions that
take in a repository as a parameter, and the intent was that we'll
eventually remove those wrappers to make the dependency on the global
repository variable explicit at the callsite.

Follow through with that intent and remove `git_config()`. All callsites
are adjusted so that they use `repo_config(the_repository, ...)`
instead. While some callsites might already have a repository available,
this mechanical conversion is exactly equivalent to the current situation
and thus cannot cause any regression. Those sites should eventually be
cleaned up in a later patch series.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'bc/use-sha256-by-default-in-3.0' into ps/config-wo-the-repository</title>
<updated>2025-07-17T16:30:56Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2025-07-17T16:30:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=86c9c14eb9c7bfa20efd8d65f1aaa685282b7221'/>
<id>urn:sha1:86c9c14eb9c7bfa20efd8d65f1aaa685282b7221</id>
<content type='text'>
* bc/use-sha256-by-default-in-3.0:
  Enable SHA-256 by default in breaking changes mode
  help: add a build option for default hash
  t5300: choose the built-in hash outside of a repo
  t4042: choose the built-in hash outside of a repo
  t1007: choose the built-in hash outside of a repo
  t: default to compile-time default hash if not set
  setup: use the default algorithm to initialize repo format
  Use legacy hash for legacy formats
  builtin: use default hash when outside a repository
  hash: add a constant for the legacy hash algorithm
  hash: add a constant for the default hash algorithm
</content>
</entry>
<entry>
<title>builtin: use default hash when outside a repository</title>
<updated>2025-07-01T21:58:24Z</updated>
<author>
<name>brian m. carlson</name>
<email>sandals@crustytoothpaste.net</email>
</author>
<published>2025-07-01T21:22:29Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=dc9c16c2fc8222364277696cb4d70782281d3c06'/>
<id>urn:sha1:dc9c16c2fc8222364277696cb4d70782281d3c06</id>
<content type='text'>
We have some commands that can operate inside or outside a repository.
If we're operating outside a repository, we clearly cannot use the
repository's hash algorithm as a default since it doesn't exist, so
let's pick the default instead of specifically SHA-1.  Right
now this results in no functional change since the default is SHA-1, but
that may change in the future.

Signed-off-by: brian m. carlson &lt;sandals@crustytoothpaste.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>diff --no-index: support limiting by pathspec</title>
<updated>2025-05-22T21:20:11Z</updated>
<author>
<name>Jacob Keller</name>
<email>jacob.keller@gmail.com</email>
</author>
<published>2025-05-21T23:29:17Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=09fb155f11128b505c227aae673de957c9388240'/>
<id>urn:sha1:09fb155f11128b505c227aae673de957c9388240</id>
<content type='text'>
The --no-index option of git-diff enables using the diff machinery from
git while operating outside of a repository. This mode of git diff is
able to compare directories and produce a diff of their contents.

When operating git diff in a repository, git has the notion of
"pathspecs" which can specify which files to compare. In particular,
when using git to diff two trees, you might invoke:

  $ git diff-tree -r &lt;treeish1&gt; &lt;treeish2&gt;

where the treeish could point to a subdirectory of the repository.

When invoked this way, users can limit the selected paths of the tree by
using a pathspec, either by providing some list of paths to accept, or
by removing paths via a negative pathspec.

The git diff --no-index mode does not support pathspecs, and cannot
limit the diff output in this way. Other diff programs such as GNU
diffutils have options for excluding paths based on a pattern match.
However, using git diff as a diff replacement has several advantages
over many popular diff tools, including coloring moved lines, rename
detection, and similar.

Teach git diff --no-index how to handle pathspecs to limit the
comparisons. This will only be supported if both provided paths are
directories.

For comparisons where one path isn't a directory, the --no-index mode
already has some DWIM shortcuts implemented in the fixup_paths()
function.

Modify the fixup_paths function to return 1 if both paths are
directories. If this is the case, interpret any extra arguments to git
diff as pathspecs via parse_pathspec.

Use parse_pathspec to load the remaining arguments (if any) to git diff
--no-index as pathspec items. Disable PATHSPEC_ATTR support since we do
not have a repository to do attribute lookup. Disable PATHSPEC_FROMTOP
since we do not have a repository root. All pathspecs are treated as
rooted at the provided comparison paths.

After loading the pathspec data, calculate skip offsets for skipping
past the root portion of the paths. This is required to ensure that
pathspecs start matching from the provided path, rather than matching
from the absolute path. We could instead pass the paths as prefix values
to parse_pathspec. This is slightly problematic because the paths come
from the command line and don't necessarily have the proper trailing
slash. Additionally, that would require parsing pathspecs multiple
times.

Pass the pathspec object and the skip offsets into queue_diff, which
in turn must pass them along to read_directory_contents.

Modify read_directory_contents to check against the pathspecs when
scanning the directory. Use the skip offset to skip past the initial
root of the path, and only match against portions that are below the
intended directory structure being compared.
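
A minimal sketch of the skip-offset idea, in Python rather than C and
limited to literal pathspecs. The helper names skip_offset and matches
are made up for illustration and are not the functions in
diff-no-index.c:

```python
def skip_offset(root):
    """Length of the root portion, counting exactly one trailing slash."""
    return len(root.rstrip("/")) + 1


def matches(path, root, pathspecs):
    """Match only the part of `path` below `root` against the pathspecs."""
    relative = path[skip_offset(root):]
    for spec in pathspecs:
        spec = spec.rstrip("/")
        # Exact hit, or the entry lies below a directory the spec names.
        if relative == spec or relative.startswith(spec + "/"):
            return True
    return False
```

With this, matches("dir-a/src/main.c", "dir-a", ["src"]) is true
whether or not the root was given with a trailing slash, while
matches("src-tree/docs/readme", "src-tree", ["src"]) is false because
the root itself is skipped before matching.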

The search algorithm for finding paths is recursive with read_dir. To
make pathspec matching work properly, we must set both
DO_MATCH_DIRECTORY and DO_MATCH_LEADING_PATHSPEC.

Without DO_MATCH_DIRECTORY, paths like "a/b/c/d" will not match against
pathspecs like "a/b/c". This is usually achieved by setting the is_dir
parameter of match_pathspec.

Without DO_MATCH_LEADING_PATHSPEC, paths like "a/b/c" would not match
against pathspecs like "a/b/c/d". This is crucial because we recursively
iterate down the directories. We could simply avoid checking pathspecs
at subdirectories, but this would force recursion down directories
which would simply be skipped.

If we always passed DO_MATCH_LEADING_PATHSPEC, then we would
incorrectly match in certain cases such as matching 'a/c' against
':(glob)**/d'. The match logic would see that 'a' matches the leading
part of the '**/' and accept this even though 'c' doesn't match.

To avoid this, use the match_leading_pathspec() variant recently
introduced. This sets both flags when is_dir is set, but leaves them
both cleared when is_dir is 0.
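
To make the leading-pathspec half of this concrete, here is a toy
model in Python. It handles literal pathspecs only, match_entry is a
hypothetical name, and it is not Git's matcher; it only shows why a
directory must be allowed to match a longer pathspec while a
non-directory must not:

```python
def match_entry(path, spec, is_dir):
    """Toy literal matcher; `is_dir` enables the leading-pathspec case."""
    # Exact match, or the entry lies below the directory named by spec.
    if path == spec or path.startswith(spec + "/"):
        return True
    # Leading match: a directory such as "a/b" is accepted for the
    # pathspec "a/b/c/d" so that recursion can continue into it.
    return is_dir and spec.startswith(path + "/")
```

Here match_entry("a/b", "a/b/c/d", True) is true, so the walk descends
into "a/b", whereas the same call with is_dir set to False is false.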

Add test cases and documentation covering the new functionality. Note
for the documentation I opted not to move the placement of '--' which is
sometimes used to disambiguate arguments. The diff --no-index mode
requires exactly 2 arguments determining what to compare. Any additional
arguments are interpreted as pathspecs and must come afterwards. Use of
'--' would not actually disambiguate anything, since there will never be
ambiguity over which arguments represent paths or pathspecs.

Signed-off-by: Jacob Keller &lt;jacob.keller@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>hash: stop depending on `the_repository` in `null_oid()`</title>
<updated>2025-03-10T20:16:20Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2025-03-10T07:13:31Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=7d70b29c4f0b2fd3c6698956d9fb4026632d9c6e'/>
<id>urn:sha1:7d70b29c4f0b2fd3c6698956d9fb4026632d9c6e</id>
<content type='text'>
The `null_oid()` function returns the object ID that only consists of
zeroes. Naturally, this ID also depends on the hash algorithm used, as
the number of zeroes is different between SHA1 and SHA256. Consequently,
the function returns the hash-algorithm-specific null object ID.
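
As a small illustration in Python (not the C implementation; the
helper name null_oid_hex is made up here), the null object ID can be
derived from the digest size of whichever algorithm is passed in:

```python
import hashlib


def null_oid_hex(algo):
    """Return the all-zero object ID for `algo` as a hex string."""
    raw_size = hashlib.new(algo).digest_size  # 20 for sha1, 32 for sha256
    return "0" * (2 * raw_size)
```

The result has 40 hex digits for "sha1" and 64 for "sha256", which is
why the caller has to say which algorithm it means.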

This is currently done by depending on `the_hash_algo`, which implicitly
makes us depend on `the_repository`. Refactor the function to instead
pass in the hash algorithm for which we want to retrieve the null object
ID. Adapt callsites accordingly by passing in `the_repository`, thus
bubbling up the dependency on that global variable by one layer.

There are a couple of trivial exceptions for subsystems that already got
rid of `the_repository`. These subsystems instead use the repository
that is available via the calling context:

  - "builtin/grep.c"
  - "grep.c"
  - "refs/debug.c"

There are also two non-trivial exceptions:

  - "diff-no-index.c": Here we know that we may not have a repository
    initialized at all, so we cannot rely on `the_repository`. Instead,
    we adapt `diff_no_index()` to get a `struct git_hash_algo` as
    parameter. The only caller is located in "builtin/diff.c", where we
    know to call `repo_set_hash_algo()` in case we're running outside of
    a Git repository. Consequently, it is fine to continue passing
    `the_repository-&gt;hash_algo` even in this case.

    This means that we could in theory just not bother about this edge
    case at all and just use `the_repository` in "diff-no-index.c". But
    doing so would feel misdesigned.

  - "builtin/ls-files.c": There is an in-flight patch series that drops
    `USE_THE_REPOSITORY_VARIABLE` in this file, which causes a semantic
    conflict because we use `null_oid()` in `show_submodule()`. The
    value is passed to `repo_submodule_init()`, which may use the object
    ID to resolve a tree-ish in the superproject from which we want to
    read the submodule config. As such, the object ID should refer to an
    object in the superproject, and consequently we need to use its hash
    algorithm.

Remove the `USE_THE_REPOSITORY_VARIABLE` preprocessor define in
"hash.c".

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>global: mark code units that generate warnings with `-Wsign-compare`</title>
<updated>2024-12-06T11:20:02Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-12-06T10:27:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=41f43b8243f42b9df2e98be8460646d4c0100ad3'/>
<id>urn:sha1:41f43b8243f42b9df2e98be8460646d4c0100ad3</id>
<content type='text'>
Mark code units that generate warnings with `-Wsign-compare`. This
allows for a structured approach to get rid of all such warnings over
time in a way that can be easily measured.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>global: drop `UNLEAK()` annotation</title>
<updated>2024-11-20T23:23:46Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-11-20T13:39:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=d91a9db33c53744b3d1034926edda7846442bc9e'/>
<id>urn:sha1:d91a9db33c53744b3d1034926edda7846442bc9e</id>
<content type='text'>
There are two users of `UNLEAK()` left in our codebase:

  - In "builtin/clone.c", annotating the `repo` variable. That leak has
    already been fixed though as you can see in the context, where we do
    know to free `repo_to_free`.

  - In "builtin/diff.c", to unleak entries of the `blob[]` array. That
    leak has also been fixed, because the entries we assign to that
    array come from `rev.pending.objects`, and we do eventually release
    `rev`.

This neatly demonstrates one of the issues with `UNLEAK()`: it is quite
easy for the annotation to become stale. A second issue is that its
whole intent is to paper over leaks. And while that has been a necessary
evil in the past, because Git was leaking left and right, it isn't
really much of an issue nowadays where our test suite has no known leaks
anymore.

Remove the last two users of this macro.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'jc/pass-repo-to-builtins'</title>
<updated>2024-09-23T17:35:09Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2024-09-23T17:35:09Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=b8e318ea58a0502ff99f37032ee8ac536df4e730'/>
<id>urn:sha1:b8e318ea58a0502ff99f37032ee8ac536df4e730</id>
<content type='text'>
The convention for calling into built-in command implementations has
been updated to pass the repository, if known, together with the
prefix value.

* jc/pass-repo-to-builtins:
  add: pass in repo variable instead of global the_repository
  builtin: remove USE_THE_REPOSITORY for those without the_repository
  builtin: remove USE_THE_REPOSITORY_VARIABLE from builtin.h
  builtin: add a repository parameter for builtin functions
</content>
</entry>
<entry>
<title>Merge branch 'jc/range-diff-lazy-setup'</title>
<updated>2024-09-16T21:22:55Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2024-09-16T21:22:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=be8ca2848a9e73f6ddc31ebce2ddc3c367d4f0cb'/>
<id>urn:sha1:be8ca2848a9e73f6ddc31ebce2ddc3c367d4f0cb</id>
<content type='text'>
Code clean-up.

* jc/range-diff-lazy-setup:
  remerge-diff: clean up temporary objdir at a central place
  remerge-diff: lazily prepare temporary objdir on demand
</content>
</entry>
</feed>
