<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/userdiff.c, branch v2.41.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.41.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.41.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2023-05-06T21:34:09Z</updated>
<entry>
<title>attr: teach "--attr-source=&lt;tree&gt;" global option to "git"</title>
<updated>2023-05-06T21:34:09Z</updated>
<author>
<name>John Cai</name>
<email>johncai86@gmail.com</email>
</author>
<published>2023-05-06T04:15:29Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=44451a2e5eec5360378be23e2cdbd9ecee49e14e'/>
<id>urn:sha1:44451a2e5eec5360378be23e2cdbd9ecee49e14e</id>
<content type='text'>
Earlier, 47cfc9bd (attr: add flag `--source` to work with tree-ish,
2023-01-14) taught "git check-attr" the "--source=&lt;tree&gt;" option to
allow it to read attribute files from a tree-ish, but did so only
for that command.  Just like "check-attr" users wanted a way to use
attributes from a tree-ish and not from the working tree files,
users of other commands (like "git diff") would benefit from the
same.

Undo most of the UI change the commit made, while keeping the
internal logic to read attributes from a given tree-ish. Expose the
internal logic via a new "--attr-source=&lt;tree&gt;" command line option
given to "git", so that it can be used with any git command that
runs as part of the main git process.

Additionally, add an environment variable GIT_ATTR_SOURCE that is set
when --attr-source is passed in, so that subprocesses use the same value
for the attributes source tree.

Signed-off-by: John Cai &lt;johncai86@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'rs/userdiff-multibyte-regex'</title>
<updated>2023-04-20T21:33:35Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2023-04-20T21:33:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=cbfe844aa1b6b0b9513f2ae2fc3d18ff3dd385e6'/>
<id>urn:sha1:cbfe844aa1b6b0b9513f2ae2fc3d18ff3dd385e6</id>
<content type='text'>
The userdiff regexp patterns for various filetypes that are built
into the system have been updated to avoid triggering regexp errors
from UTF-8 aware regex engines.

* rs/userdiff-multibyte-regex:
  userdiff: support regexec(3) with multi-byte support
</content>
</entry>
<entry>
<title>userdiff: support regexec(3) with multi-byte support</title>
<updated>2023-04-07T14:38:09Z</updated>
<author>
<name>René Scharfe</name>
<email>l.s.r@web.de</email>
</author>
<published>2023-04-06T20:19:11Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=be391449542d8a67cb343ec9d0b2f6854d665354'/>
<id>urn:sha1:be391449542d8a67cb343ec9d0b2f6854d665354</id>
<content type='text'>
Since 1819ad327b (grep: fix multibyte regex handling under macOS,
2022-08-26) we use the system library for all regular expression
matching on macOS, not just for git grep.  It supports multi-byte
strings and rejects invalid multi-byte characters.

This broke all built-in userdiff word regexes in UTF-8 locales because
they all include such invalid bytes in expressions that are intended to
match multi-byte characters without explicit support for that from the
regex engine.

"|[^[:space:]]|[\xc0-\xff][\x80-\xbf]+" is added to all built-in word
regexes to match a single non-space or multi-byte character.  The \xNN
characters are invalid if interpreted as UTF-8 because they have their
high bit set, which indicates they are part of a multi-byte character,
but they are surrounded by single-byte characters.

Replace that expression with "|[^[:space:]]" if the regex engine
supports multi-byte matching, as there is no need to have an explicit
range for multi-byte characters then.  Check for that capability at
runtime, because it depends on the locale and thus on environment
variables.  Construct the full replacement expression at build time
and just switch it in if necessary to avoid string manipulation and
allocations at runtime.

Additionally, the word regex for tex contains the expression
"[a-zA-Z0-9\x80-\xff]+" with a similarly invalid range.  The best
replacement with only valid characters that I can come up with is
"([a-zA-Z0-9]|[^\x01-\x7f])+".  Unlike the original it matches NUL
characters, though.  Assuming that tex files usually don't contain NUL
this should be acceptable.
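
As an illustration only (not part of this patch), the Python sketch
below shows the idea behind the two fallback expressions.  Python's
`re` module stands in for regexec(3); because it tries alternatives in
order instead of preferring the longest match as POSIX does, the
multi-byte alternative is listed first here.  The names BYTE_FALLBACK
and CHAR_FALLBACK are made up for the example.

```python
import re

# Hypothetical names; Python's re stands in for regexec(3).
# Byte-oriented matching: a lead byte (0xc0-0xff) followed by
# continuation bytes (0x80-0xbf) is grouped into one token by hand.
BYTE_FALLBACK = re.compile(rb"[\xc0-\xff][\x80-\xbf]+|[^\s]")

# Multi-byte aware matching: one non-space character is enough.
CHAR_FALLBACK = re.compile(r"[^\s]")

text = "\u00e9 x"  # e with acute accent, a space, and an ASCII letter
raw = text.encode("utf-8")

byte_tokens = [m.group().decode("utf-8")
               for m in BYTE_FALLBACK.finditer(raw)]
char_tokens = CHAR_FALLBACK.findall(text)
```

Both approaches split the sample into the same two tokens; the second
one simply leaves the multi-byte handling to the engine.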

Reported-by: D. Ben Knoble &lt;ben.knoble@gmail.com&gt;
Reported-by: Eric Sunshine &lt;sunshine@sunshineco.com&gt;
Helped-by: Junio C Hamano &lt;gitster@pobox.com&gt;
Signed-off-by: René Scharfe &lt;l.s.r@web.de&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'en/header-cleanup'</title>
<updated>2023-03-17T21:03:09Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2023-03-17T21:03:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=88cc8ed8bc7d4bc9521b426e95ae2a38d3aec13a'/>
<id>urn:sha1:88cc8ed8bc7d4bc9521b426e95ae2a38d3aec13a</id>
<content type='text'>
Code clean-up to clarify the rule that "git-compat-util.h" must be
the first to be included.

* en/header-cleanup:
  diff.h: remove unnecessary include of object.h
  Remove unnecessary includes of builtin.h
  treewide: replace cache.h with more direct headers, where possible
  replace-object.h: move read_replace_refs declaration from cache.h to here
  object-store.h: move struct object_info from cache.h
  dir.h: refactor to no longer need to include cache.h
  object.h: stop depending on cache.h; make cache.h depend on object.h
  ident.h: move ident-related declarations out of cache.h
  pretty.h: move has_non_ascii() declaration from commit.h
  cache.h: remove dependence on hex.h; make other files include it explicitly
  hex.h: move some hex-related declarations from cache.h
  hash.h: move some oid-related declarations from cache.h
  alloc.h: move ALLOC_GROW() functions from cache.h
  treewide: remove unnecessary cache.h includes in source files
  treewide: remove unnecessary cache.h includes
  treewide: remove unnecessary git-compat-util.h includes in headers
  treewide: ensure one of the appropriate headers is sourced first
</content>
</entry>
<entry>
<title>Merge branch 'jc/diff-algo-attribute'</title>
<updated>2023-02-27T18:08:56Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2023-02-27T18:08:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=ece8dc97ae53d08247aa283b6f299e3e5b2669db'/>
<id>urn:sha1:ece8dc97ae53d08247aa283b6f299e3e5b2669db</id>
<content type='text'>
The "diff" drivers specified by the "diff" attribute attached to
paths can now specify which algorithm (e.g. histogram) to use.

* jc/diff-algo-attribute:
  diff: teach diff to read algorithm from diff driver
  diff: consolidate diff algorithm option parsing
</content>
</entry>
<entry>
<title>alloc.h: move ALLOC_GROW() functions from cache.h</title>
<updated>2023-02-24T01:25:28Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2023-02-24T00:09:24Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=36bf19589055fb71aac0ed6719dfe5b385adc2bf'/>
<id>urn:sha1:36bf19589055fb71aac0ed6719dfe5b385adc2bf</id>
<content type='text'>
This allows us to replace includes of cache.h with includes of the much
smaller alloc.h in many places.  It does mean that we also need to add
includes of alloc.h in a number of C files.

Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>diff: teach diff to read algorithm from diff driver</title>
<updated>2023-02-21T17:29:10Z</updated>
<author>
<name>John Cai</name>
<email>johncai86@gmail.com</email>
</author>
<published>2023-02-20T21:04:42Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=a4cf900ee734ce9bb73d57c5dfbb1da4a5a88bd3'/>
<id>urn:sha1:a4cf900ee734ce9bb73d57c5dfbb1da4a5a88bd3</id>
<content type='text'>
It can be useful to specify diff algorithms per file type. For example,
one may want to use the minimal diff algorithm for .json files, another
for .c files, etc.

The diff machinery already checks attributes for a diff driver. Teach
the diff driver parser a new type "algorithm" to look for in the
config, which will be used if a driver has been specified through the
attributes.

Enforce precedence of the diff algorithm by favoring the command line
option, then looking at the driver attributes &amp; config combination, then
finally the diff.algorithm config.

To enforce precedence order, use a new `ignore_driver_algorithm` member
during options parsing to indicate the diff algorithm was set via command
line args.
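
The precedence described above can be sketched roughly as follows.
This is an illustrative Python model, not the C code touched by this
patch: the function name, its parameters, and the "myers" fallback
value are assumptions made for the example.

```python
# Hypothetical helper, not git's C code: each argument is the algorithm
# name coming from one source, or None when that source did not set one.
def pick_diff_algorithm(cli, driver, config, default="myers"):
    # Earlier entries win: command line option, then the driver chosen
    # through attributes plus its config, then diff.algorithm.
    for candidate in (cli, driver, config):
        if candidate is not None:
            return candidate
    return default


# No command line option given, so the driver setting wins.
chosen = pick_diff_algorithm(None, "minimal", "histogram")
```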

Signed-off-by: John Cai &lt;johncai86@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>userdiff: support Java sealed classes</title>
<updated>2023-02-08T20:57:13Z</updated>
<author>
<name>Andrei Rybak</name>
<email>rybak.a.v@gmail.com</email>
</author>
<published>2023-02-07T23:42:59Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=93d52ed050f5613897b73e75961df5c589d63a4b'/>
<id>urn:sha1:93d52ed050f5613897b73e75961df5c589d63a4b</id>
<content type='text'>
A new kind of class was added in Java 17 -- sealed classes.[1]  This
feature includes several new keywords that may appear in a declaration
of a class.  New modifiers before the name of the class: "sealed" and
"non-sealed", and a clause after the name of the class marked by the
keyword "permits".

The current set of regular expressions in userdiff.c already allows the
modifier "sealed" and the "permits" clause, but not the modifier
"non-sealed", which is the first hyphenated keyword in Java.[2]  Allow
a hyphen in the words that precede the name of the type to match the
"non-sealed" modifier.
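
As a rough illustration, the effect of allowing a hyphen can be tried
with Python's `re` module.  Both patterns are simplified stand-ins
written for this example and are not copied from userdiff.c; the names
WITHOUT_HYPHEN and WITH_HYPHEN are hypothetical.

```python
import re

# Simplified stand-ins for the class-declaration pattern.
# Modifier words made of lowercase letters only:
WITHOUT_HYPHEN = re.compile(
    r"^[ \t]*(([a-z]+[ \t]+)*(class|interface)[ \t]+.*)$"
)
# Modifier words that may also contain a hyphen:
WITH_HYPHEN = re.compile(
    r"^[ \t]*(([a-z-]+[ \t]+)*(class|interface)[ \t]+.*)$"
)

sealed = "public sealed class Shape permits Square {"
non_sealed = "public non-sealed class Square extends Shape {"

sealed_ok = WITHOUT_HYPHEN.match(sealed) is not None
non_sealed_before = WITHOUT_HYPHEN.match(non_sealed) is not None
non_sealed_after = WITH_HYPHEN.match(non_sealed) is not None
```

The "sealed" line matches either way, while the "non-sealed" line only
matches once the hyphen is allowed in the modifier words.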

In new input file "java-sealed" for the test t4018-diff-funcname.sh, use
a Java code comment for the marker "RIGHT".  This workaround is needed,
because the name of the sealed class appears on the line of code that
has the "ChangeMe" marker.

[1] Detailed description in "JEP 409: Sealed Classes"
    https://openjdk.org/jeps/409
[2] "JEP draft: Keyword Management for the Java Language"
    https://openjdk.org/jeps/8223002

Signed-off-by: Andrei Rybak &lt;rybak.a.v@gmail.com&gt;
Reviewed-by: Johannes Sixt &lt;j6t@kdbg.org&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>userdiff: support Java record types</title>
<updated>2023-02-08T20:57:11Z</updated>
<author>
<name>Andrei Rybak</name>
<email>rybak.a.v@gmail.com</email>
</author>
<published>2023-02-07T23:42:58Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=575e6fcfcc961a64a222e0241cdc117d24f9ec87'/>
<id>urn:sha1:575e6fcfcc961a64a222e0241cdc117d24f9ec87</id>
<content type='text'>
A new kind of class was added in Java 16 -- records.[1]  The syntax of
records is similar to regular classes with one important distinction:
the name of the record class is followed by a mandatory list of
components.  The list is enclosed in parentheses, it may be empty, and
it may immediately follow the name of the class or type parameters, if
any, with or without separating whitespace.  For example:

    public record Example(int i, String s) {
    }

    public record WithTypeParameters&lt;A, B&gt;(A a, B b, String s) {
    }

    record SpaceBeforeComponents (String comp1, int comp2) {
    }

Support records in the builtin userdiff pattern for Java.  Add "record"
to the alternatives of keywords for kinds of class.

Matching the various possibilities for the type parameters and/or the
list of components of a record is already covered by the preceding
patch.

[1] Detailed description is available in "JEP 395: Records"
    https://openjdk.org/jeps/395

Signed-off-by: Andrei Rybak &lt;rybak.a.v@gmail.com&gt;
Reviewed-by: Johannes Sixt &lt;j6t@kdbg.org&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>userdiff: support Java type parameters</title>
<updated>2023-02-08T20:56:57Z</updated>
<author>
<name>Andrei Rybak</name>
<email>rybak.a.v@gmail.com</email>
</author>
<published>2023-02-07T23:42:57Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=39226a8dacc866417be19b0a95b45e82d5975a84'/>
<id>urn:sha1:39226a8dacc866417be19b0a95b45e82d5975a84</id>
<content type='text'>
A class or interface in Java can have type parameters[1] following the
name in the declared type, surrounded by angle brackets (paired less
than and greater than signs).[2]  The type parameters -- `A` and `B` in
the examples -- may follow the class name immediately:

    public class ParameterizedClass&lt;A, B&gt; {
    }

or may be separated by whitespace:

    public class SpaceBeforeTypeParameters &lt;A, B&gt; {
    }

A part of the builtin userdiff pattern for Java matches declarations of
classes, enums, and interfaces.  The regular expression requires at
least one whitespace character after the name of the declared type.
This prevents a match when the opening angle bracket of the type
parameters immediately follows the name of the type.  Mandatory
whitespace after the name of the type also prevents the pattern from
matching in repositories with a fairly common code style that puts
braces for the body of a class on separate lines:

    class WithLineBreakBeforeOpeningBrace
    {
    }

Support matching Java code in more diverse code styles and declarations
of classes and interfaces with type parameters immediately following the
name of the type in the builtin userdiff pattern for Java.  Do so by
just matching anything until the end of the line after the keywords for
the kind of type being declared.
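
The difference can be sketched with Python's `re` module.  Both
patterns below are simplified stand-ins written for this example, not
copies of the expressions in userdiff.c, and the names STRICT and
RELAXED are hypothetical.

```python
import re

# Simplified stand-ins for the class-declaration pattern.
# Before: whitespace is required after the name of the type.
STRICT = re.compile(
    r"^[ \t]*(([a-z]+[ \t]+)*(class|enum|interface)[ \t]+"
    r"[A-Za-z][A-Za-z0-9_$]*[ \t]+.*)$"
)
# After: anything up to the end of the line may follow the keyword.
RELAXED = re.compile(
    r"^[ \t]*(([a-z]+[ \t]+)*(class|enum|interface)[ \t]+.*)$"
)

# The opening brace is on the next line, so nothing follows the name.
line = "class WithLineBreakBeforeOpeningBrace"

strict_hit = STRICT.match(line) is not None
relaxed_hit = RELAXED.match(line) is not None
```

Only the relaxed pattern accepts the line; a declaration with a space
after the name, such as "public class Example {", matches both.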

[1] Since Java 5, released in 2004.
[2] Detailed description is available in the Java Language
    Specification, sections "Type Variables" and "Parameterized Types":
    https://docs.oracle.com/javase/specs/jls/se17/html/jls-4.html#jls-4.4

Signed-off-by: Andrei Rybak &lt;rybak.a.v@gmail.com&gt;
Reviewed-by: Johannes Sixt &lt;j6t@kdbg.org&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
