<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/userdiff.h, branch jch</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=jch</id>
<link rel='self' href='https://git.shady.money/git/atom?h=jch'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2024-08-14T17:08:01Z</updated>
<entry>
<title>userdiff: fix leaking memory for configured diff drivers</title>
<updated>2024-08-14T17:08:01Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-08-14T06:52:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=38678e5df55b77e98b47e7847faf83451eb161de'/>
<id>urn:sha1:38678e5df55b77e98b47e7847faf83451eb161de</id>
<content type='text'>
The userdiff structures may be initialized either statically on the
stack or dynamically via configuration keys. In the latter case we end
up leaking memory because we didn't have any infrastructure to discern
those strings which have been allocated statically and those which have
been allocated dynamically.

Refactor the code such that we have two pointers for each of these
strings: one that holds the value as accessed by other subsystems, and
one that points to the same string in case it has been allocated. Like
this, we can safely free the second pointer and thus plug those memory
leaks.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'rs/diff-exit-code-with-external-diff'</title>
<updated>2024-06-20T22:45:08Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2024-06-20T22:45:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=8ba7dbdefbc274628b6bdc0a4cb573c2fa08b2cf'/>
<id>urn:sha1:8ba7dbdefbc274628b6bdc0a4cb573c2fa08b2cf</id>
<content type='text'>
"git diff --exit-code --ext-diff" learned to take the exit status
of the external diff driver into account when deciding the exit
status of the overall "git diff" invocation when configured to do
so.

* rs/diff-exit-code-with-external-diff:
  diff: let external diffs report that changes are uninteresting
  userdiff: add and use struct external_diff
  t4020: test exit code with external diffs
</content>
</entry>
<entry>
<title>diff: let external diffs report that changes are uninteresting</title>
<updated>2024-06-10T16:20:46Z</updated>
<author>
<name>René Scharfe</name>
<email>l.s.r@web.de</email>
</author>
<published>2024-06-09T07:41:44Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=d7b97b7185521e3b9364b3abc6553df2480da173'/>
<id>urn:sha1:d7b97b7185521e3b9364b3abc6553df2480da173</id>
<content type='text'>
The options --exit-code and --quiet instruct git diff to indicate
whether it found any significant changes by exiting with code 1 if it
did and 0 if there were none.  Currently this doesn't work if external
diff programs are involved, as we have no way to learn what they found.

Add that ability in the form of the new configuration options
diff.trustExitCode and diff.&lt;driver&gt;.trustExitCode and the environment
variable GIT_EXTERNAL_DIFF_TRUST_EXIT_CODE.  They pair with the config
options diff.external and diff.&lt;driver&gt;.command and the environment
variable GIT_EXTERNAL_DIFF, respectively.

The new options are off by default, keeping the old behavior.  Enabling
them indicates that the external diff returns exit code 1 if it finds
significant changes and 0 if it doesn't, like diff(1).

The name of the new options is taken from the git difftool and mergetool
options of similar purpose.  (There they enable passing on the exit code
of a diff tool and to infer whether a merge done by a merge tool is
successful.)

The new feature sets the diff flag diff_from_contents in
diff_setup_done() if we need the exit code and are allowed to call
external diffs.  This disables the optimization that avoids calling the
program with --quiet.  Add it back by skipping the call if the external
diff is not able to report empty diffs.  We can only do that check after
evaluating the file-specific attributes in run_external_diff().

If we do run the external diff with --quiet, send its output to
/dev/null.

I considered checking the output of the external diff to check whether
its empty.  It was added as 11be65cfa4 (diff: fix --exit-code with
external diff, 2024-05-05) and quickly reverted, as it does not work
with external diffs that do not write to stdout.  There's no reason why
a graphical diff tool would even need to write anything there at all.

I also considered using a non-zero exit code for empty diffs, which
could be done without adding new configuration options.  We'd need to
disable the optimization that allows git diff --quiet to skip calling
external diffs, though -- that might be quite surprising if graphical
diff programs are involved.  And assigning the opposite meaning of the
exit codes compared to diff(1) and git diff --exit-code to the external
diff can cause unnecessary confusion.

Suggested-by: Phillip Wood &lt;phillip.wood123@gmail.com&gt;
Signed-off-by: René Scharfe &lt;l.s.r@web.de&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>userdiff: add and use struct external_diff</title>
<updated>2024-06-10T16:19:20Z</updated>
<author>
<name>René Scharfe</name>
<email>l.s.r@web.de</email>
</author>
<published>2024-06-09T07:39:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=54443bbfc38d0252b31c821fea77320fcf0fe277'/>
<id>urn:sha1:54443bbfc38d0252b31c821fea77320fcf0fe277</id>
<content type='text'>
Wrap the string specifying the external diff command in a new struct to
simplify adding attributes, which the next patch will do.

Make sure external_diff() still returns NULL if neither the environment
variable GIT_EXTERNAL_DIFF nor the configuration option diff.external is
set, to continue allowing its use in a boolean context.

Use a designated initializer for the default builtin userdiff driver to
adjust to the type change of the second struct member.  Spelling out
only the non-zero members improves readability as a nice side-effect.

No functional change intended.

Signed-off-by: René Scharfe &lt;l.s.r@web.de&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>global: convert intentionally-leaking config strings to consts</title>
<updated>2024-06-07T17:30:48Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-06-07T06:37:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=c113c5df7911bf7bc6a4542131ac5bf983532a97'/>
<id>urn:sha1:c113c5df7911bf7bc6a4542131ac5bf983532a97</id>
<content type='text'>
There are multiple cases where we intentionally leak config strings:

  - `struct gpg_format` is used to track programs that can be used for
    signing commits, either via gpg(1), gpgsm(1) or ssh-keygen(1). The
    user can override the commands via several config variables. As the
    array is populated once, only, and the struct memers are never
    written to or free'd.

  - `struct ll_merge_driver` is used to track merge drivers. Same as
    with the GPG format, these drivers are populated once and then
    reused. Its data is never written to or free'd, either.

  - `struct userdiff_funcname` and `struct userdiff_driver` can be
    configured via `diff.&lt;driver&gt;.*` to add additional drivers. Again,
    these have a global lifetime and are never written to or free'd.

All of these are intentionally kept alive and are never written to.
Furthermore, all of these are being assigned both string constants in
some places, and allocated strings in other places. This will cause
warnings once we enable `-Wwrite-strings`, so let's mark the respective
fields as `const char *` and cast away the constness when assigning
those values.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>config: clarify memory ownership in `git_config_string()`</title>
<updated>2024-05-27T18:20:00Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-05-27T11:46:39Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=1b261c20ed28ad26ddbcd3dff94a248ac6866ac8'/>
<id>urn:sha1:1b261c20ed28ad26ddbcd3dff94a248ac6866ac8</id>
<content type='text'>
The out parameter of `git_config_string()` is a `const char **` even
though we transfer ownership of memory to the caller. This is quite
misleading and has led to many memory leaks all over the place. Adapt
the parameter to instead be `char **`.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'rs/userdiff-multibyte-regex'</title>
<updated>2023-04-20T21:33:35Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2023-04-20T21:33:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=cbfe844aa1b6b0b9513f2ae2fc3d18ff3dd385e6'/>
<id>urn:sha1:cbfe844aa1b6b0b9513f2ae2fc3d18ff3dd385e6</id>
<content type='text'>
The userdiff regexp patterns for various filetypes that are built
into the system have been updated to avoid triggering regexp errors
from UTF-8 aware regex engines.

* rs/userdiff-multibyte-regex:
  userdiff: support regexec(3) with multi-byte support
</content>
</entry>
<entry>
<title>userdiff: support regexec(3) with multi-byte support</title>
<updated>2023-04-07T14:38:09Z</updated>
<author>
<name>René Scharfe</name>
<email>l.s.r@web.de</email>
</author>
<published>2023-04-06T20:19:11Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=be391449542d8a67cb343ec9d0b2f6854d665354'/>
<id>urn:sha1:be391449542d8a67cb343ec9d0b2f6854d665354</id>
<content type='text'>
Since 1819ad327b (grep: fix multibyte regex handling under macOS,
2022-08-26) we use the system library for all regular expression
matching on macOS, not just for git grep.  It supports multi-byte
strings and rejects invalid multi-byte characters.

This broke all built-in userdiff word regexes in UTF-8 locales because
they all include such invalid bytes in expressions that are intended to
match multi-byte characters without explicit support for that from the
regex engine.

"|[^[:space:]]|[\xc0-\xff][\x80-\xbf]+" is added to all built-in word
regexes to match a single non-space or multi-byte character.  The \xNN
characters are invalid if interpreted as UTF-8 because they have their
high bit set, which indicates they are part of a multi-byte character,
but they are surrounded by single-byte characters.

Replace that expression with "|[^[:space:]]" if the regex engine
supports multi-byte matching, as there is no need to have an explicit
range for multi-byte characters then.  Check for that capability at
runtime, because it depends on the locale and thus on environment
variables.  Construct the full replacement expression at build time
and just switch it in if necessary to avoid string manipulation and
allocations at runtime.

Additionally the word regex for tex contains the expression
"[a-zA-Z0-9\x80-\xff]+" with a similarly invalid range.  The best
replacement with only valid characters that I can come up with is
"([a-zA-Z0-9]|[^\x01-\x7f])+".  Unlike the original it matches NUL
characters, though.  Assuming that tex files usually don't contain NUL
this should be acceptable.

Reported-by: D. Ben Knoble &lt;ben.knoble@gmail.com&gt;
Reported-by: Eric Sunshine &lt;sunshine@sunshineco.com&gt;
Helped-by: Junio C Hamano &lt;gitster@pobox.com&gt;
Signed-off-by: René Scharfe &lt;l.s.r@web.de&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>diff: teach diff to read algorithm from diff driver</title>
<updated>2023-02-21T17:29:10Z</updated>
<author>
<name>John Cai</name>
<email>johncai86@gmail.com</email>
</author>
<published>2023-02-20T21:04:42Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=a4cf900ee734ce9bb73d57c5dfbb1da4a5a88bd3'/>
<id>urn:sha1:a4cf900ee734ce9bb73d57c5dfbb1da4a5a88bd3</id>
<content type='text'>
It can be useful to specify diff algorithms per file type. For example,
one may want to use the minimal diff algorithm for .json files, another
for .c files, etc.

The diff machinery already checks attributes for a diff driver. Teach
the diff driver parser a new type "algorithm" to look for in the
config, which will be used if a driver has been specified through the
attributes.

Enforce precedence of the diff algorithm by favoring the command line
option, then looking at the driver attributes &amp; config combination, then
finally the diff.algorithm config.

To enforce precedence order, use a new `ignore_driver_algorithm` member
during options parsing to indicate the diff algorithm was set via command
line args.

Signed-off-by: John Cai &lt;johncai86@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>userdiff: add and use for_each_userdiff_driver()</title>
<updated>2021-04-08T19:19:10Z</updated>
<author>
<name>Ævar Arnfjörð Bjarmason</name>
<email>avarab@gmail.com</email>
</author>
<published>2021-04-08T15:04:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=f12fa9ee6c87efa8a926973bd203ef327686fb62'/>
<id>urn:sha1:f12fa9ee6c87efa8a926973bd203ef327686fb62</id>
<content type='text'>
Refactor the userdiff_find_by_namelen() function so that a new
for_each_userdiff_driver() API function does most of the work.

This will be useful for the same reason we've got other for_each_*()
API functions as part of various APIs, and will be used in a follow-up
commit.

Signed-off-by: Ævar Arnfjörð Bjarmason &lt;avarab@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
