<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/grep.c, branch v2.36.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.36.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.36.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2022-02-25T23:47:38Z</updated>
<entry>
<title>Merge branch 'rs/pcre-invalid-utf8-fix-fix'</title>
<updated>2022-02-25T23:47:38Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2022-02-25T23:47:38Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=5c4f3804a73933504e76b6ebdd946a2db3b1d214'/>
<id>urn:sha1:5c4f3804a73933504e76b6ebdd946a2db3b1d214</id>
<content type='text'>
Workaround we have for versions of PCRE2 before their version 10.36
were in effect only for their versions newer than 10.36 by mistake,
which has been corrected.

* rs/pcre-invalid-utf8-fix-fix:
  grep: fix triggering PCRE2_NO_START_OPTIMIZE workaround
</content>
</entry>
<entry>
<title>Merge branch 'ab/grep-patterntype'</title>
<updated>2022-02-25T23:47:36Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2022-02-25T23:47:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=5b84280c65bef19bc3a810e17474c9861ea7ce89'/>
<id>urn:sha1:5b84280c65bef19bc3a810e17474c9861ea7ce89</id>
<content type='text'>
Some code clean-up in the "git grep" machinery.

* ab/grep-patterntype:
  grep: simplify config parsing and option parsing
  grep.c: do "if (bool &amp;&amp; memchr())" not "if (memchr() &amp;&amp; bool)"
  grep.h: make "grep_opt.pattern_type_option" use its enum
  grep API: call grep_config() after grep_init()
  grep.c: don't pass along NULL callback value
  built-ins: trust the "prefix" from run_builtin()
  grep tests: add missing "grep.patternType" config tests
  grep tests: create a helper function for "BRE" or "ERE"
  log tests: check if grep_config() is called by "log"-like cmds
  grep.h: remove unused "regex_t regexp" from grep_opt
</content>
</entry>
<entry>
<title>grep: fix triggering PCRE2_NO_START_OPTIMIZE workaround</title>
<updated>2022-02-17T22:56:56Z</updated>
<author>
<name>René Scharfe</name>
<email>l.s.r@web.de</email>
</author>
<published>2022-02-17T21:14:29Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=97169fc361fd3ab2eedbfedd6c13806433be4036'/>
<id>urn:sha1:97169fc361fd3ab2eedbfedd6c13806433be4036</id>
<content type='text'>
PCRE2 bug 2642 was fixed in version 10.36.  Our 95ca1f987e (grep/pcre2:
better support invalid UTF-8 haystacks, 2021-01-24) worked around it on
older versions by setting the flag PCRE2_NO_START_OPTIMIZE.  797c359978
(grep/pcre2: use compile-time PCREv2 version test, 2021-02-18) switched
it around to set the flag on 10.36 and higher instead, while it claimed
to use "the same test done at compile-time".

Switch the condition back to apply the workaround on PCRE2 versions
_before_ 10.36.

Signed-off-by: René Scharfe &lt;l.s.r@web.de&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>grep: simplify config parsing and option parsing</title>
<updated>2022-02-16T02:00:50Z</updated>
<author>
<name>Ævar Arnfjörð Bjarmason</name>
<email>avarab@gmail.com</email>
</author>
<published>2022-02-16T00:00:39Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=04bf052eef53c6be04d313d8ce11690beaf890b6'/>
<id>urn:sha1:04bf052eef53c6be04d313d8ce11690beaf890b6</id>
<content type='text'>
Simplify the parsing of "grep.patternType" and
"grep.extendedRegexp". This changes no behavior, but gets rid of
complex parsing logic that isn't needed anymore.

When "grep.patternType" was introduced in 84befcd0a4a (grep: add a
grep.patternType configuration setting, 2012-08-03) we promised that:

 1. You can set "grep.patternType", and "[setting it to] 'default'
    will return to the default matching behavior".

    In that context "the default" meant whatever the configuration
    system specified before that change, i.e. via grep.extendedRegexp.

 2. We'd support the existing "grep.extendedRegexp" option, but ignore
    it when the new "grep.patternType" option is set. We said we'd
    only ignore the older "grep.extendedRegexp" option "when the
    `grep.patternType` option is set to a value other than
    'default'".

In a preceding commit we changed grep_config() to be called after
grep_init(), which means that much of the complexity here can go
away.

As before both "grep.patternType" and "grep.extendedRegexp" are
last-one-wins variable, with "grep.extendedRegexp" yielding to
"grep.patternType", except when "grep.patternType=default".

Note that as the previously added tests indicate this cannot be done
on-the-fly as we see the config variables, without introducing more
state keeping. I.e. if we see:

    -c grep.extendedRegexp=false
    -c grep.patternType=default
    -c extendedRegexp=true

We need to select ERE, since grep.patternType=default unselects that
variable, which normally has higher precedence, but we also need to
select BRE in cases of:

    -c grep.extendedRegexp=true \
    -c grep.extendedRegexp=false

Which would not be the case for this, which select ERE:

    -c grep.patternType=extended \
    -c grep.extendedRegexp=false

Therefore we cannot do this on-the-fly in grep_config without also
introducing tracking variables for not only the pattern type, but what
the source of that pattern type was.

So we need to decide on the pattern after our config was fully
parsed. Let's do that by deferring the decision on the pattern type
until it's time to compile it in compile_regexp().

By that time we've not only parsed the config, but also handled the
command-line options. Those will set "opt.pattern_type_option" (*not*
"opt.extended_regexp_option"!).

At that point all we need to do is see if "grep.patternType" was
UNSPECIFIED in the end (including an explicit "=default"), if so we'll
use the "grep.extendedRegexp" configuration, if any.

See my 07a3d411739 (grep: remove regflags from the public grep_opt
API, 2017-06-29) for addition of the two comments being removed here,
i.e. the complexity noted in that commit is now going away.

1. https://lore.kernel.org/git/patch-v8-09.10-c211bb0c69d-20220118T155211Z-avarab@gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason &lt;avarab@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>grep.c: do "if (bool &amp;&amp; memchr())" not "if (memchr() &amp;&amp; bool)"</title>
<updated>2022-02-16T02:00:50Z</updated>
<author>
<name>Ævar Arnfjörð Bjarmason</name>
<email>avarab@gmail.com</email>
</author>
<published>2022-02-16T00:00:38Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=ae807d778f502c81ec55897ac6d1f42ef3a4e23b'/>
<id>urn:sha1:ae807d778f502c81ec55897ac6d1f42ef3a4e23b</id>
<content type='text'>
Change code in compile_regexp() to check the cheaper boolean
"!opt-&gt;pcre2" condition before the "memchr()" search.

This doesn't noticeably optimize anything, but makes the code more
obvious and conventional. The line wrapping being added here also
makes a subsequent commit smaller.

Signed-off-by: Ævar Arnfjörð Bjarmason &lt;avarab@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>grep API: call grep_config() after grep_init()</title>
<updated>2022-02-16T02:00:50Z</updated>
<author>
<name>Ævar Arnfjörð Bjarmason</name>
<email>avarab@gmail.com</email>
</author>
<published>2022-02-16T00:00:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=72365bb49923da065e7a43e61a912ef17f143c7f'/>
<id>urn:sha1:72365bb49923da065e7a43e61a912ef17f143c7f</id>
<content type='text'>
The grep_init() function used the odd pattern of initializing the
passed-in "struct grep_opt" with a statically defined "grep_defaults"
struct, which would be modified in-place when we invoked
grep_config().

So we effectively (b) initialized config, (a) then defaults, (c)
followed by user options. Usually those are ordered as "a", "b" and
"c" instead.

As the comments being removed here show the previous behavior needed
to be carefully explained as we'd potentially share the populated
configuration among different instances of grep_init(). In practice we
didn't do that, but now that it can't be a concern anymore let's
remove those comments.

This does not change the behavior of any of the configuration
variables or options. That would have been the case if we didn't move
around the grep_config() call in "builtin/log.c". But now that we call
"grep_config" after "git_log_config" and "git_format_config" we'll
need to pass in the already initialized "struct grep_opt *".

See 6ba9bb76e02 (grep: copy struct in one fell swoop, 2020-11-29) and
7687a0541e0 (grep: move the configuration parsing logic to grep.[ch],
2012-10-09) for the commits that added the comments.

The memcpy() pattern here will be optimized away and follows the
convention of other *_init() functions. See 5726a6b4012 (*.c *_init():
define in terms of corresponding *_INIT macro, 2021-07-01).

Signed-off-by: Ævar Arnfjörð Bjarmason &lt;avarab@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>built-ins: trust the "prefix" from run_builtin()</title>
<updated>2022-02-16T02:00:50Z</updated>
<author>
<name>Ævar Arnfjörð Bjarmason</name>
<email>avarab@gmail.com</email>
</author>
<published>2022-02-16T00:00:34Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=9725c8dda20dc9bc02b552a2333963c8cb834d1d'/>
<id>urn:sha1:9725c8dda20dc9bc02b552a2333963c8cb834d1d</id>
<content type='text'>
Change code in "builtin/grep.c" and "builtin/ls-tree.c" to trust the
"prefix" passed from "run_builtin()". The "prefix" we get from setup.c
is either going to be NULL or a string of length &gt;0, never "".

So we can drop the "prefix &amp;&amp; *prefix" checks added for
"builtin/grep.c" in 0d042fecf2f (git-grep: show pathnames relative to
the current directory, 2006-08-11), and for "builtin/ls-tree.c" in
a69dd585fca (ls-tree: chomp leading directories when run from a
subdirectory, 2005-12-23).

As seen in code in revision.c that was added in cd676a51367 (diff
--relative: output paths as relative to the current subdirectory,
2008-02-12) we already have existing code that does away with this
assertion.

This makes it easier to reason about a subsequent change to the
"prefix_length" code in grep.c in a subsequent commit, and since we're
going to the trouble of doing that let's leave behind an assert() to
promise this to any future callers.

For "builtin/grep.c" it would be painful to pass the "prefix" down the
callchain of:

    cmd_grep -&gt; grep_tree -&gt; grep_submodule -&gt; grep_cache -&gt; grep_oid -&gt;
    grep_source_name

So for the code that needs it in grep_source_name() let's add a
"grep_prefix" variable similar to the existing "ls_tree_prefix".

While at it let's move the code in cmd_ls_tree() around so that we
assign to the "ls_tree_prefix" right after declaring the variables,
and stop assigning to "prefix". We only subsequently used that
variable later in the function after clobbering it. Let's just use our
own "grep_prefix" instead.

Let's also add an assert() in git.c, so that we'll make this promise
about the "prefix" to any current and future callers, as well as to
any readers of the code.

Code history:

 * The strlen() in "grep.c" hasn't been used since 493b7a08d80 (grep:
   accept relative paths outside current working directory, 2009-09-05).

   When that code was added in 0d042fecf2f (git-grep: show pathnames
   relative to the current directory, 2006-08-11) we used the length.

   But since 493b7a08d80 we haven't used it for anything except a
   boolean check that we could have done on the "prefix" member
   itself.

Signed-off-by: Ævar Arnfjörð Bjarmason &lt;avarab@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'rs/grep-expr-cleanup'</title>
<updated>2022-02-05T17:42:30Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2022-02-05T17:42:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=d0bb19cbf7ae9a077f67a97d786482313c3c3f0b'/>
<id>urn:sha1:d0bb19cbf7ae9a077f67a97d786482313c3c3f0b</id>
<content type='text'>
Code clean-up.

* rs/grep-expr-cleanup:
  grep: use grep_and_expr() in compile_pattern_and()
  grep: extract grep_binexp() from grep_or_expr()
  grep: use grep_not_expr() in compile_pattern_not()
  grep: use grep_or_expr() in compile_pattern_or()
</content>
</entry>
<entry>
<title>Merge branch 'lh/use-gnu-color-in-grep'</title>
<updated>2022-01-10T19:52:54Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2022-01-10T19:52:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=c0450ca09864baae1cd80b746500aaef2eeda956'/>
<id>urn:sha1:c0450ca09864baae1cd80b746500aaef2eeda956</id>
<content type='text'>
The color palette used by "git grep" has been updated to match that
of GNU grep.

* lh/use-gnu-color-in-grep:
  grep: align default colors with GNU grep ones
</content>
</entry>
<entry>
<title>grep: use grep_and_expr() in compile_pattern_and()</title>
<updated>2022-01-06T22:15:33Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2022-01-06T19:50:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=0a6adc26e2efd2dcfb3e65407623287e08989a63'/>
<id>urn:sha1:0a6adc26e2efd2dcfb3e65407623287e08989a63</id>
<content type='text'>
In a similar spirit as a previous commit, use the new `grep_and_expr()`
to construct the AND node in `compile_pattern_and()`.

Unlike the aforementioned previous commit, this is not about code
duplication, since this is the only spot in grep.c where an AND node is
constructed. Rather, this is about visual consistency with the other
`compile_pattern_xyz()` functions.

Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
