<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/userdiff.h, branch v2.41.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.41.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.41.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2023-04-20T21:33:35Z</updated>
<entry>
<title>Merge branch 'rs/userdiff-multibyte-regex'</title>
<updated>2023-04-20T21:33:35Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2023-04-20T21:33:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=cbfe844aa1b6b0b9513f2ae2fc3d18ff3dd385e6'/>
<id>urn:sha1:cbfe844aa1b6b0b9513f2ae2fc3d18ff3dd385e6</id>
<content type='text'>
The userdiff regexp patterns for various filetypes that are built
into the system have been updated to avoid triggering regexp errors
from UTF-8 aware regex engines.

* rs/userdiff-multibyte-regex:
  userdiff: support regexec(3) with multi-byte support
</content>
</entry>
<entry>
<title>userdiff: support regexec(3) with multi-byte support</title>
<updated>2023-04-07T14:38:09Z</updated>
<author>
<name>René Scharfe</name>
<email>l.s.r@web.de</email>
</author>
<published>2023-04-06T20:19:11Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=be391449542d8a67cb343ec9d0b2f6854d665354'/>
<id>urn:sha1:be391449542d8a67cb343ec9d0b2f6854d665354</id>
<content type='text'>
Since 1819ad327b (grep: fix multibyte regex handling under macOS,
2022-08-26) we use the system library for all regular expression
matching on macOS, not just for git grep.  It supports multi-byte
strings and rejects invalid multi-byte characters.

This broke all built-in userdiff word regexes in UTF-8 locales because
they all include such invalid bytes in expressions that are intended to
match multi-byte characters without explicit support for that from the
regex engine.

"|[^[:space:]]|[\xc0-\xff][\x80-\xbf]+" is added to all built-in word
regexes to match a single non-space or multi-byte character.  The \xNN
characters are invalid if interpreted as UTF-8 because they have their
high bit set, which indicates they are part of a multi-byte character,
but they are surrounded by single-byte characters.

Replace that expression with "|[^[:space:]]" if the regex engine
supports multi-byte matching, as there is no need to have an explicit
range for multi-byte characters then.  Check for that capability at
runtime, because it depends on the locale and thus on environment
variables.  Construct the full replacement expression at build time
and just switch it in if necessary to avoid string manipulation and
allocations at runtime.

Additionally the word regex for tex contains the expression
"[a-zA-Z0-9\x80-\xff]+" with a similarly invalid range.  The best
replacement with only valid characters that I can come up with is
"([a-zA-Z0-9]|[^\x01-\x7f])+".  Unlike the original it matches NUL
characters, though.  Assuming that tex files usually don't contain NUL
this should be acceptable.

Reported-by: D. Ben Knoble &lt;ben.knoble@gmail.com&gt;
Reported-by: Eric Sunshine &lt;sunshine@sunshineco.com&gt;
Helped-by: Junio C Hamano &lt;gitster@pobox.com&gt;
Signed-off-by: René Scharfe &lt;l.s.r@web.de&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>diff: teach diff to read algorithm from diff driver</title>
<updated>2023-02-21T17:29:10Z</updated>
<author>
<name>John Cai</name>
<email>johncai86@gmail.com</email>
</author>
<published>2023-02-20T21:04:42Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=a4cf900ee734ce9bb73d57c5dfbb1da4a5a88bd3'/>
<id>urn:sha1:a4cf900ee734ce9bb73d57c5dfbb1da4a5a88bd3</id>
<content type='text'>
It can be useful to specify diff algorithms per file type. For example,
one may want to use the minimal diff algorithm for .json files, another
for .c files, etc.

The diff machinery already checks attributes for a diff driver. Teach
the diff driver parser a new type "algorithm" to look for in the
config, which will be used if a driver has been specified through the
attributes.

Enforce precedence of the diff algorithm by favoring the command line
option, then looking at the driver attributes &amp; config combination, then
finally the diff.algorithm config.

To enforce precedence order, use a new `ignore_driver_algorithm` member
during options parsing to indicate the diff algorithm was set via command
line args.

Signed-off-by: John Cai &lt;johncai86@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>userdiff: add and use for_each_userdiff_driver()</title>
<updated>2021-04-08T19:19:10Z</updated>
<author>
<name>Ævar Arnfjörð Bjarmason</name>
<email>avarab@gmail.com</email>
</author>
<published>2021-04-08T15:04:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=f12fa9ee6c87efa8a926973bd203ef327686fb62'/>
<id>urn:sha1:f12fa9ee6c87efa8a926973bd203ef327686fb62</id>
<content type='text'>
Refactor the userdiff_find_by_namelen() function so that a new
for_each_userdiff_driver() API function does most of the work.

This will be useful for the same reason we've got other for_each_*()
API functions as part of various APIs, and will be used in a follow-up
commit.

Signed-off-by: Ævar Arnfjörð Bjarmason &lt;avarab@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>notes-cache.c: remove the_repository references</title>
<updated>2018-11-12T05:50:06Z</updated>
<author>
<name>Nguyễn Thái Ngọc Duy</name>
<email>pclouds@gmail.com</email>
</author>
<published>2018-11-10T05:49:06Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=bd7ad45b64bf8b5a5256bd1eeb3907f8516244a8'/>
<id>urn:sha1:bd7ad45b64bf8b5a5256bd1eeb3907f8516244a8</id>
<content type='text'>
Signed-off-by: Nguyễn Thái Ngọc Duy &lt;pclouds@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>userdiff.c: remove implicit dependency on the_index</title>
<updated>2018-09-21T16:50:58Z</updated>
<author>
<name>Nguyễn Thái Ngọc Duy</name>
<email>pclouds@gmail.com</email>
</author>
<published>2018-09-21T15:57:33Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=acd00ea04998ce469d1775c658134b097e18f5a3'/>
<id>urn:sha1:acd00ea04998ce469d1775c658134b097e18f5a3</id>
<content type='text'>
[jc: squashed in missing forward decl in userdiff.h found by Ramsay]

Helped-by: Ramsay Jones &lt;ramsay@ramsayjones.plus.com&gt;
Signed-off-by: Nguyễn Thái Ngọc Duy &lt;pclouds@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>diff: clarify textconv interface</title>
<updated>2016-02-22T18:40:35Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2016-02-22T18:28:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=a64e6a44c63a965c5bc26242ddd3ed049b42e117'/>
<id>urn:sha1:a64e6a44c63a965c5bc26242ddd3ed049b42e117</id>
<content type='text'>
The memory allocation scheme for the textconv interface is a
bit tricky, and not well documented. It was originally
designed as an internal part of diff.c (matching
fill_mmfile), but gradually was made public.

Refactoring it is difficult, but we can at least improve the
situation by documenting the intended flow and enforcing it
with an in-code assertion.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>refactor get_textconv to not require diff_filespec</title>
<updated>2011-05-23T22:46:02Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2011-05-23T20:30:14Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=3813e69031d2df2702f50b9649fa2e40ea11e558'/>
<id>urn:sha1:3813e69031d2df2702f50b9649fa2e40ea11e558</id>
<content type='text'>
This function actually does two things:

  1. Load the userdiff driver for the filespec.

  2. Decide whether the driver has a textconv component, and
     initialize the textconv cache if applicable.

Only part (1) requires the filespec object, and some callers
may not have a filespec at all. So let's split them it into
two functions, and put part (2) with the userdiff code,
which is a better fit.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>diff: cache textconv output</title>
<updated>2010-04-02T07:05:31Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2010-04-02T00:12:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=d9bae1a178f0f8b198ea611e874975214ad6f990'/>
<id>urn:sha1:d9bae1a178f0f8b198ea611e874975214ad6f990</id>
<content type='text'>
Running a textconv filter can take a long time. It's
particularly bad for a large file which needs to be spooled
to disk, but even for small files, the fork+exec overhead
can add up for something like "git log -p".

This patch uses the notes-cache mechanism to keep a fast
cache of textconv output. Caches are stored in
refs/notes/textconv/$x, where $x is the userdiff driver
defined in gitattributes.

Caching is enabled only if diff.$x.cachetextconv is true.

In my test repo, on a commit with 45 jpg and avi files
changed and a textconv to show their exif tags:

  [before]
  $ time git show &gt;/dev/null
  real    0m13.724s
  user    0m12.057s
  sys     0m1.624s

  [after, first run]
  $ git config diff.mfo.cachetextconv true
  $ time git show &gt;/dev/null
  real    0m14.252s
  user    0m12.197s
  sys     0m1.800s

  [after, subsequent runs]
  $ time git show &gt;/dev/null
  real    0m0.352s
  user    0m0.148s
  sys     0m0.200s

So for a slight (3.8%) cost on the first run, we achieve an
almost 40x speed up on subsequent runs.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>color-words: make regex configurable via attributes</title>
<updated>2009-01-17T18:44:21Z</updated>
<author>
<name>Thomas Rast</name>
<email>trast@student.ethz.ch</email>
</author>
<published>2009-01-17T16:29:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=80c49c3de2d5a3aa12b0980a65f1163c8aef0c16'/>
<id>urn:sha1:80c49c3de2d5a3aa12b0980a65f1163c8aef0c16</id>
<content type='text'>
Make the --color-words splitting regular expression configurable via
the diff driver's 'wordregex' attribute.  The user can then set the
driver on a file in .gitattributes.  If a regex is given on the
command line, it overrides the driver's setting.

We also provide built-in regexes for the languages that already had
funcname patterns, and add an appropriate diff driver entry for C/++.
(The patterns are designed to run UTF-8 sequences into a single chunk
to make sure they remain readable.)

Signed-off-by: Thomas Rast &lt;trast@student.ethz.ch&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
