<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/diff-no-index.c, branch jch</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=jch</id>
<link rel='self' href='https://git.shady.money/git/atom?h=jch'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2025-09-25T18:35:20Z</updated>
<entry>
<title>diff --no-index: fix logic for paths ending in '/'</title>
<updated>2025-09-25T18:35:20Z</updated>
<author>
<name>Jacob Keller</name>
<email>jacob.e.keller@intel.com</email>
</author>
<published>2025-09-24T20:57:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=c0bec06cfedb930ecb3cd8c8cdb0f3e14e5e9308'/>
<id>urn:sha1:c0bec06cfedb930ecb3cd8c8cdb0f3e14e5e9308</id>
<content type='text'>
If one of the two provided paths for git diff --no-index ends in a '/',
a failure similar to the following occurs:

  $ git diff --no-index -- /tmp/ /tmp/ ':!'
  fatal: `pos + len' is too far after the end of the buffer

This occurs because of an incorrect calculation of the skip lengths in
diff_no_index(). The code wants to calculate the length of the string,
but add one in case the string doesn't end with a slash.

The method it uses is incorrect, as it always checks the trailing NUL
character of the string. This will never be a '/', so we always add one.
In the event that we *do* have a trailing slash, this will create an
off-by-one length error later when using the skip value.

The most straightforward fix would be to correct the skip1 and skip2
lengths by using ends_with().

However, Johannes made a good point that the existing logic is wasting a
lot of computation. We generate the match string by copying the path in
and then skipping almost all of it immediately with a potentially
expensive memmove() from the strbuf_remove() call. We also re-initialize
the match stringbuf each time we call read_directory_contents.

The read_directory_contents really wants a path that is rooted at the
start of the directory scan. We're currently building this by taking the
full path and stripping out the start portion. Instead, replace this
logic by building up the portion of the match as we go.

Start by initializing two strbuf in diff_no_index containing the empty
string. Pass these into queue_diff, which in turn passes the appropriate
left or right side into read_directory_contents.

As before, we build up the matches by appending elements to the match
path and then clearing them using strbuf_setlen.

In the recursive portion of the queue_diff algorithm, we build up new
match paths the same way that we build up new buffer paths, by appending
the elements and then clearing them with strbuf_setlen after each
iteration. This is cheaper as it avoids repeated allocations, and is a
bit simpler to track what is going on.

Add a couple of test cases that pass in paths already ending in '/', to
ensure the tests cover this regression.

Reported-by: Johannes Schindelin &lt;Johannes.Schindelin@gmx.de&gt;
Closes: https://lore.kernel.org/git/c75ec5f9-407a-6555-d4fb-bb629d54ec61@gmx.de/
Signed-off-by: Jacob Keller &lt;jacob.e.keller@intel.com&gt;
[jc: small leakfixes at the end of diff_no_index()]
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>diff-no-index: do not reference .d_type member of struct dirent</title>
<updated>2025-06-18T20:05:29Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2025-06-18T20:04:12Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=d094e05ea596145aa4362cf957f9d99663d4a2e3'/>
<id>urn:sha1:d094e05ea596145aa4362cf957f9d99663d4a2e3</id>
<content type='text'>
Some platforms like AIX lack .d_type member in "struct dirent"; use
the DTYPE(e) macro instead of a direct reference to e-&gt;d_type and
when it yields DT_UNKNOWN, find the real type with get_dtype().

Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>diff --no-index: support limiting by pathspec</title>
<updated>2025-05-22T21:20:11Z</updated>
<author>
<name>Jacob Keller</name>
<email>jacob.keller@gmail.com</email>
</author>
<published>2025-05-21T23:29:17Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=09fb155f11128b505c227aae673de957c9388240'/>
<id>urn:sha1:09fb155f11128b505c227aae673de957c9388240</id>
<content type='text'>
The --no-index option of git-diff enables using the diff machinery from
git while operating outside of a repository. This mode of git diff is
able to compare directories and produce a diff of their contents.

When operating git diff in a repository, git has the notion of
"pathspecs" which can specify which files to compare. In particular,
when using git to diff two trees, you might invoke:

  $ git diff-tree -r &lt;treeish1&gt; &lt;treeish2&gt;.

where the treeish could point to a subdirectory of the repository.

When invoked this way, users can limit the selected paths of the tree by
using a pathspec. Either by providing some list of paths to accept, or
by removing paths via a negative refspec.

The git diff --no-index mode does not support pathspecs, and cannot
limit the diff output in this way. Other diff programs such as GNU
difftools have options for excluding paths based on a pattern match.
However, using git diff as a diff replacement has several advantages
over many popular diff tools, including coloring moved lines, rename
detections, and similar.

Teach git diff --no-index how to handle pathspecs to limit the
comparisons. This will only be supported if both provided paths are
directories.

For comparisons where one path isn't a directory, the --no-index mode
already has some DWIM shortcuts implemented in the fixup_paths()
function.

Modify the fixup_paths function to return 1 if both paths are
directories. If this is the case, interpret any extra arguments to git
diff as pathspecs via parse_pathspec.

Use parse_pathspec to load the remaining arguments (if any) to git diff
--no-index as pathspec items. Disable PATHSPEC_ATTR support since we do
not have a repository to do attribute lookup. Disable PATHSPEC_FROMTOP
since we do not have a repository root. All pathspecs are treated as
rooted at the provided comparison paths.

After loading the pathspec data, calculate skip offsets for skipping
past the root portion of the paths. This is required to ensure that
pathspecs start matching from the provided path, rather than matching
from the absolute path. We could instead pass the paths as prefix values
to parse_pathspec. This is slightly problematic because the paths come
from the command line and don't necessarily have the proper trailing
slash. Additionally, that would require parsing pathspecs multiple
times.

Pass the pathspec object and the skip offsets into queue_diff, which
in-turn must pass them along to read_directory_contents.

Modify read_directory_contents to check against the pathspecs when
scanning the directory. Use the skip offset to skip past the initial
root of the path, and only match against portions that are below the
intended directory structure being compared.

The search algorithm for finding paths is recursive with read_dir. To
make pathspec matching work properly, we must set both
DO_MATCH_DIRECTORY and DO_MATCH_LEADING_PATHSPEC.

Without DO_MATCH_DIRECTORY, paths like "a/b/c/d" will not match against
pathspecs like "a/b/c". This is usually achieved by setting the is_dir
parameter of match_pathspec.

Without DO_MATCH_LEADING_PATHSPEC, paths like "a/b/c" would not match
against pathspecs like "a/b/c/d". This is crucial because we recursively
iterate down the directories. We could simply avoid checking pathspecs
at subdirectories, but this would force recursion down directories
which would simply be skipped.

If we always passed DO_MATCH_LEADING_PATHSPEC, then we will
incorrectly match in certain cases such as matching 'a/c' against
':(glob)**/d'. The match logic will see that a matches the leading part
of the **/ and accept this even tho c doesn't match.

To avoid this, use the match_leading_pathspec() variant recently
introduced. This sets both flags when is_dir is set, but leaves them
both cleared when is_dir is 0.

Add test cases and documentation covering the new functionality. Note
for the documentation I opted not to move the placement of '--' which is
sometimes used to disambiguate arguments. The diff --no-index mode
requires exactly 2 arguments determining what to compare. Any additional
arguments are interpreted as pathspecs and must come afterwards. Use of
'--' would not actually disambiguate anything, since there will never be
ambiguity over which arguments represent paths or pathspecs.

Signed-off-by: Jacob Keller &lt;jacob.keller@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>hash: stop depending on `the_repository` in `null_oid()`</title>
<updated>2025-03-10T20:16:20Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2025-03-10T07:13:31Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=7d70b29c4f0b2fd3c6698956d9fb4026632d9c6e'/>
<id>urn:sha1:7d70b29c4f0b2fd3c6698956d9fb4026632d9c6e</id>
<content type='text'>
The `null_oid()` function returns the object ID that only consists of
zeroes. Naturally, this ID also depends on the hash algorithm used, as
the number of zeroes is different between SHA1 and SHA256. Consequently,
the function returns the hash-algorithm-specific null object ID.

This is currently done by depending on `the_hash_algo`, which implicitly
makes us depend on `the_repository`. Refactor the function to instead
pass in the hash algorithm for which we want to retrieve the null object
ID. Adapt callsites accordingly by passing in `the_repository`, thus
bubbling up the dependency on that global variable by one layer.

There are a couple of trivial exceptions for subsystems that already got
rid of `the_repository`. These subsystems instead use the repository
that is available via the calling context:

  - "builtin/grep.c"
  - "grep.c"
  - "refs/debug.c"

There are also two non-trivial exceptions:

  - "diff-no-index.c": Here we know that we may not have a repository
    initialized at all, so we cannot rely on `the_repository`. Instead,
    we adapt `diff_no_index()` to get a `struct git_hash_algo` as
    parameter. The only caller is located in "builtin/diff.c", where we
    know to call `repo_set_hash_algo()` in case we're running outside of
    a Git repository. Consequently, it is fine to continue passing
    `the_repository-&gt;hash_algo` even in this case.

  - "builtin/ls-files.c": There is an in-flight patch series that drops
    `USE_THE_REPOSITORY_VARIABLE` in this file, which causes a semantic
    conflict because we use `null_oid()` in `show_submodule()`. The
    value is passed to `repo_submodule_init()`, which may use the object
    ID to resolve a tree-ish in the superproject from which we want to
    read the submodule config. As such, the object ID should refer to an
    object in the superproject, and consequently we need to use its hash
    algorithm.

    This means that we could in theory just not bother about this edge
    case at all and just use `the_repository` in "diff-no-index.c". But
    doing so would feel misdesigned.

Remove the `USE_THE_REPOSITORY_VARIABLE` preprocessor define in
"hash.c".

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>global: mark code units that generate warnings with `-Wsign-compare`</title>
<updated>2024-12-06T11:20:02Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-12-06T10:27:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=41f43b8243f42b9df2e98be8460646d4c0100ad3'/>
<id>urn:sha1:41f43b8243f42b9df2e98be8460646d4c0100ad3</id>
<content type='text'>
Mark code units that generate warnings with `-Wsign-compare`. This
allows for a structured approach to get rid of all such warnings over
time in a way that can be easily measured.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>remerge-diff: clean up temporary objdir at a central place</title>
<updated>2024-08-09T22:42:40Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2024-08-09T22:31:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=4460e052e074490cfc083703fba285d3c2e36560'/>
<id>urn:sha1:4460e052e074490cfc083703fba285d3c2e36560</id>
<content type='text'>
After running a diff between two things, or a series of diffs while
walking the history, the diff computation is concluded by a call to
diff_result_code() to extract the exit status of the diff machinery.

The function can work on "struct diffopt", but all the callers
historically and currently pass "struct diffopt" that is embedded in
the "struct rev_info" that is used to hold the remerge_diff bit and
the remerge_objdir variable that points at the temporary object
directory in use.

Redefine diff_result_code() to take the whole "struct rev_info" to
give it an access to these members related to remerge-diff, so that
it can get rid of the temporary object directory for any and all
callers that used the feature.  We can lose the equivalent code to
do so from the code paths for individual commands, diff-tree, diff,
and log.

Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>treewide: remove unnecessary includes in source files</title>
<updated>2023-12-26T20:04:31Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2023-12-23T17:14:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=eea0e59ffbed6e33d171ace5be13cde9faa41639'/>
<id>urn:sha1:eea0e59ffbed6e33d171ace5be13cde9faa41639</id>
<content type='text'>
Each of these were checked with
   gcc -E -I. ${SOURCE_FILE} | grep ${HEADER_FILE}
to ensure that removing the direct inclusion of the header actually
resulted in that header no longer being included at all (i.e. that
no other header pulled it in transitively).

...except for a few cases where we verified that although the header
was brought in transitively, nothing from it was directly used in
that source file.  These cases were:
  * builtin/credential-cache.c
  * builtin/pull.c
  * builtin/send-pack.c

Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'pw/diff-no-index-from-named-pipes'</title>
<updated>2023-09-20T17:44:57Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2023-09-20T17:44:57Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=7435d51bfd183d2f6fd9fc9fc20a11d413d152ac'/>
<id>urn:sha1:7435d51bfd183d2f6fd9fc9fc20a11d413d152ac</id>
<content type='text'>
"git diff --no-index -R &lt;(one) &lt;(two)" did not work correctly,
which has been corrected.

* pw/diff-no-index-from-named-pipes:
  diff --no-index: fix -R with stdin
</content>
</entry>
<entry>
<title>diff --no-index: fix -R with stdin</title>
<updated>2023-09-11T19:05:37Z</updated>
<author>
<name>René Scharfe</name>
<email>l.s.r@web.de</email>
</author>
<published>2023-09-09T22:12:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=48944f214c7cd7402e70e661cf9efb8dd118fe0c'/>
<id>urn:sha1:48944f214c7cd7402e70e661cf9efb8dd118fe0c</id>
<content type='text'>
When -R is given, queue_diff() swaps the mode and name variables of the
two files to produce a reverse diff.  1e3f26542a (diff --no-index:
support reading from named pipes, 2023-07-05) added variables that
indicate whether files are special, i.e named pipes or - for stdin.
These new variables were not swapped, though, which broke the handling
of stdin with with -R.  Swap them like the other metadata variables.

Reported-by: Martin Storsjö &lt;martin@martin.st&gt;
Signed-off-by: René Scharfe &lt;l.s.r@web.de&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>diff: drop useless "status" parameter from diff_result_code()</title>
<updated>2023-08-21T22:33:24Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2023-08-21T20:20:46Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=5cc6b2d70bc55ab75913ee93d9ac96ad875fbb29'/>
<id>urn:sha1:5cc6b2d70bc55ab75913ee93d9ac96ad875fbb29</id>
<content type='text'>
Many programs use diff_result_code() to get a user-visible program exit
code from a diff result (e.g., checking opts.found_changes if
--exit-code was requested).

This function also takes a "status" parameter, which seems at first
glance that it could be used to propagate an error encountered when
computing the diff. But it doesn't work that way:

  - negative values are passed through as-is, but are not appropriate as
    program exit codes

  - when --exit-code or --check is in effect, we _ignore_ the passed-in
    status completely. So a failed diff which did not have a chance to
    set opts.found_changes would erroneously report "success, no
    changes" instead of propagating the error.

After recent cleanups, neither of these bugs is possible to trigger, as
every caller just passes in "0". So rather than fixing them, we can
simply drop the useless parameter instead.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
