<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/grep.h, branch v2.41.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.41.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.41.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2023-03-23T18:19:34Z</updated>
<entry>
<title>grep: work around UTF-8 related JIT bug in PCRE2 &lt;= 10.34</title>
<updated>2023-03-23T18:19:34Z</updated>
<author>
<name>Mathias Krause</name>
<email>minipli@grsecurity.net</email>
</author>
<published>2023-03-23T17:25:39Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=14b9a044798ebb3858a1f1a1377309a3d6054ac8'/>
<id>urn:sha1:14b9a044798ebb3858a1f1a1377309a3d6054ac8</id>
<content type='text'>
Stephane is reporting[1] a regression introduced in git v2.40.0 that leads
to 'git grep' segfaulting in his CI pipeline. It turns out, he's using an
older version of libpcre2 that triggers a wild pointer dereference in
the generated JIT code that was fixed in PCRE2 10.35.

Instead of completely disabling the JIT compiler for the buggy version,
just mask out the Unicode property handling as we used to do prior to
commit acabd2048ee0 ("grep: correctly identify utf-8 characters with
\{b,w} in -P").

[1] https://lore.kernel.org/git/7E83DAA1-F9A9-4151-8D07-D80EA6D59EEA@clumio.com/

Reported-by: Stephane Odul &lt;stephane@clumio.com&gt;
Signed-off-by: Mathias Krause &lt;minipli@grsecurity.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'ab/grep-simplify-extended-expression'</title>
<updated>2022-10-21T18:37:28Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2022-10-21T18:37:28Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=91d3d7e6e2be4aad74c5c602ed4988fbb1d82960'/>
<id>urn:sha1:91d3d7e6e2be4aad74c5c602ed4988fbb1d82960</id>
<content type='text'>
Giving "--invert-grep" and "--all-match" without "--grep" to the
"git log" command resulted in an attempt to access grep pattern
expression structure that has not been allocated, which has been
corrected.

* ab/grep-simplify-extended-expression:
  grep.c: remove "extended" in favor of "pattern_expression", fix segfault
</content>
</entry>
<entry>
<title>grep.c: remove "extended" in favor of "pattern_expression", fix segfault</title>
<updated>2022-10-11T15:48:54Z</updated>
<author>
<name>Ævar Arnfjörð Bjarmason</name>
<email>avarab@gmail.com</email>
</author>
<published>2022-10-11T09:48:45Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=db84376f981c64a1577d17d99918b2ef65a07a11'/>
<id>urn:sha1:db84376f981c64a1577d17d99918b2ef65a07a11</id>
<content type='text'>
Since 79d3696cfb4 (git-grep: boolean expression on pattern matching.,
2006-06-30) the "pattern_expression" member has been used for complex
queries (AND/OR...), with "pattern_list" being used for the simple OR
queries. Since then we've used both "pattern_expression" and its
associated boolean "extended" member to see if we have a complex
expression.

Since f41fb662f57 (revisions API: have release_revisions() release
"grep_filter", 2022-04-13) we've had a subtle bug relating to that: If
we supplied options that were only used for "complex queries", but
didn't supply the query itself we'd set "opt-&gt;extended", but would
have a NULL "pattern_expression". As a result these would segfault as
we tried to call "free_grep_patterns()" from "release_revisions()":

	git -P log -1 --invert-grep
	git -P log -1 --all-match

The root cause of this is that we were conflating the state management
we needed in "compile_grep_patterns()" itself with whether or not we
had an "opt-&gt;pattern_expression" later on.

In this cases as we're going through "compile_grep_patterns()" we have
no "opt-&gt;pattern_list" but have "opt-&gt;no_body_match" or
"opt-&gt;all_match". So we'd set "opt-&gt;extended = 1", but not "return" on
"opt-&gt;extended" as that's an "else if" in the same "if" statement.

That behavior is intentional and required, as the common case is that
we have an "opt-&gt;pattern_list" that we're about to parse into the
"opt-&gt;pattern_expression".

But we don't need to keep track of this "extended" flag beyond the
state management in compile_grep_patterns() itself. It needs it, but
once we're out of that function we can rely on
"opt-&gt;pattern_expression" being non-NULL instead for using these
extended patterns.

As 79d3696cfb4 itself shows we've assumed that there's a one-to-one
mapping between the two since the very beginning. I.e. "match_line()"
would check "opt-&gt;extended" to see if it should call "match_expr()",
and the first thing we do in that function is assume that we have a
"opt-&gt;pattern_expression". We'd then call "match_expr_eval()", which
would have died if that "opt-&gt;pattern_expression" was NULL.

The "die" was added in c922b01f54c (grep: fix segfault when "git grep
'('" is given, 2009-04-27), and can now be removed as it's now clearly
unreachable. We still do the right thing in the case that prompted
that fix:

	git grep '('
	fatal: unmatched parenthesis

Arguably neither the "--invert-grep" option added in [1] nor the
earlier "--all-match" option added in [2] were intended to be used
stand-alone, and another approach[3] would be to error out in those
cases. But since we've been treating them as a NOOP when given without
--grep for a long time let's keep doing that.

We could also return in "free_pattern_expr()" if the argument is
non-NULL, as an alternative fix for this segfault does [4]. That would
be more elegant in making the "free_*()" function behave like
"free()", but it would also remove a sanity check: The
"free_pattern_expr()" function calls itself recursively, and only the
top-level is allowed to be NULL, let's not conflate those two
conditions.

1. 22dfa8a23de (log: teach --invert-grep option, 2015-01-12)
2. 0ab7befa31d (grep --all-match, 2006-09-27)
3. https://lore.kernel.org/git/patch-1.1-f4b90799fce-20221010T165711Z-avarab@gmail.com/
4. http://lore.kernel.org/git/7e094882c2a71894416089f894557a9eae07e8f8.1665423686.git.me@ttaylorr.com

Reported-by: orygaw &lt;orygaw@protonmail.com&gt;
Signed-off-by: Ævar Arnfjörð Bjarmason &lt;avarab@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>grep: add --max-count command line option</title>
<updated>2022-06-22T20:23:29Z</updated>
<author>
<name>Carlos López</name>
<email>00xc@protonmail.com</email>
</author>
<published>2022-06-22T19:47:32Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=68437ede53dccd1dea9e44e831a59de274d389de'/>
<id>urn:sha1:68437ede53dccd1dea9e44e831a59de274d389de</id>
<content type='text'>
This patch adds a command line option analogous to that of GNU
grep(1)'s -m / --max-count, which users might already be used to.
This makes it possible to limit the amount of matches shown in the
output while keeping the functionality of other options such as -C
(show code context) or -p (show containing function), which would be
difficult to do with a shell pipeline (e.g. head(1)).

Signed-off-by: Carlos López 00xc@protonmail.com
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>grep: simplify config parsing and option parsing</title>
<updated>2022-02-16T02:00:50Z</updated>
<author>
<name>Ævar Arnfjörð Bjarmason</name>
<email>avarab@gmail.com</email>
</author>
<published>2022-02-16T00:00:39Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=04bf052eef53c6be04d313d8ce11690beaf890b6'/>
<id>urn:sha1:04bf052eef53c6be04d313d8ce11690beaf890b6</id>
<content type='text'>
Simplify the parsing of "grep.patternType" and
"grep.extendedRegexp". This changes no behavior, but gets rid of
complex parsing logic that isn't needed anymore.

When "grep.patternType" was introduced in 84befcd0a4a (grep: add a
grep.patternType configuration setting, 2012-08-03) we promised that:

 1. You can set "grep.patternType", and "[setting it to] 'default'
    will return to the default matching behavior".

    In that context "the default" meant whatever the configuration
    system specified before that change, i.e. via grep.extendedRegexp.

 2. We'd support the existing "grep.extendedRegexp" option, but ignore
    it when the new "grep.patternType" option is set. We said we'd
    only ignore the older "grep.extendedRegexp" option "when the
    `grep.patternType` option is set to a value other than
    'default'".

In a preceding commit we changed grep_config() to be called after
grep_init(), which means that much of the complexity here can go
away.

As before both "grep.patternType" and "grep.extendedRegexp" are
last-one-wins variable, with "grep.extendedRegexp" yielding to
"grep.patternType", except when "grep.patternType=default".

Note that as the previously added tests indicate this cannot be done
on-the-fly as we see the config variables, without introducing more
state keeping. I.e. if we see:

    -c grep.extendedRegexp=false
    -c grep.patternType=default
    -c extendedRegexp=true

We need to select ERE, since grep.patternType=default unselects that
variable, which normally has higher precedence, but we also need to
select BRE in cases of:

    -c grep.extendedRegexp=true \
    -c grep.extendedRegexp=false

Which would not be the case for this, which select ERE:

    -c grep.patternType=extended \
    -c grep.extendedRegexp=false

Therefore we cannot do this on-the-fly in grep_config without also
introducing tracking variables for not only the pattern type, but what
the source of that pattern type was.

So we need to decide on the pattern after our config was fully
parsed. Let's do that by deferring the decision on the pattern type
until it's time to compile it in compile_regexp().

By that time we've not only parsed the config, but also handled the
command-line options. Those will set "opt.pattern_type_option" (*not*
"opt.extended_regexp_option"!).

At that point all we need to do is see if "grep.patternType" was
UNSPECIFIED in the end (including an explicit "=default"), if so we'll
use the "grep.extendedRegexp" configuration, if any.

See my 07a3d411739 (grep: remove regflags from the public grep_opt
API, 2017-06-29) for addition of the two comments being removed here,
i.e. the complexity noted in that commit is now going away.

1. https://lore.kernel.org/git/patch-v8-09.10-c211bb0c69d-20220118T155211Z-avarab@gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason &lt;avarab@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>grep.h: make "grep_opt.pattern_type_option" use its enum</title>
<updated>2022-02-16T02:00:50Z</updated>
<author>
<name>Ævar Arnfjörð Bjarmason</name>
<email>avarab@gmail.com</email>
</author>
<published>2022-02-16T00:00:37Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=321ee43628c53d6050fb7fbc552332bab681f1a4'/>
<id>urn:sha1:321ee43628c53d6050fb7fbc552332bab681f1a4</id>
<content type='text'>
Change the "pattern_type_option" member of "struct grep_opt" to use
the enum type we use for it.

Signed-off-by: Ævar Arnfjörð Bjarmason &lt;avarab@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>grep API: call grep_config() after grep_init()</title>
<updated>2022-02-16T02:00:50Z</updated>
<author>
<name>Ævar Arnfjörð Bjarmason</name>
<email>avarab@gmail.com</email>
</author>
<published>2022-02-16T00:00:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=72365bb49923da065e7a43e61a912ef17f143c7f'/>
<id>urn:sha1:72365bb49923da065e7a43e61a912ef17f143c7f</id>
<content type='text'>
The grep_init() function used the odd pattern of initializing the
passed-in "struct grep_opt" with a statically defined "grep_defaults"
struct, which would be modified in-place when we invoked
grep_config().

So we effectively (b) initialized config, (a) then defaults, (c)
followed by user options. Usually those are ordered as "a", "b" and
"c" instead.

As the comments being removed here show the previous behavior needed
to be carefully explained as we'd potentially share the populated
configuration among different instances of grep_init(). In practice we
didn't do that, but now that it can't be a concern anymore let's
remove those comments.

This does not change the behavior of any of the configuration
variables or options. That would have been the case if we didn't move
around the grep_config() call in "builtin/log.c". But now that we call
"grep_config" after "git_log_config" and "git_format_config" we'll
need to pass in the already initialized "struct grep_opt *".

See 6ba9bb76e02 (grep: copy struct in one fell swoop, 2020-11-29) and
7687a0541e0 (grep: move the configuration parsing logic to grep.[ch],
2012-10-09) for the commits that added the comments.

The memcpy() pattern here will be optimized away and follows the
convention of other *_init() functions. See 5726a6b4012 (*.c *_init():
define in terms of corresponding *_INIT macro, 2021-07-01).

Signed-off-by: Ævar Arnfjörð Bjarmason &lt;avarab@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>built-ins: trust the "prefix" from run_builtin()</title>
<updated>2022-02-16T02:00:50Z</updated>
<author>
<name>Ævar Arnfjörð Bjarmason</name>
<email>avarab@gmail.com</email>
</author>
<published>2022-02-16T00:00:34Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=9725c8dda20dc9bc02b552a2333963c8cb834d1d'/>
<id>urn:sha1:9725c8dda20dc9bc02b552a2333963c8cb834d1d</id>
<content type='text'>
Change code in "builtin/grep.c" and "builtin/ls-tree.c" to trust the
"prefix" passed from "run_builtin()". The "prefix" we get from setup.c
is either going to be NULL or a string of length &gt;0, never "".

So we can drop the "prefix &amp;&amp; *prefix" checks added for
"builtin/grep.c" in 0d042fecf2f (git-grep: show pathnames relative to
the current directory, 2006-08-11), and for "builtin/ls-tree.c" in
a69dd585fca (ls-tree: chomp leading directories when run from a
subdirectory, 2005-12-23).

As seen in code in revision.c that was added in cd676a51367 (diff
--relative: output paths as relative to the current subdirectory,
2008-02-12) we already have existing code that does away with this
assertion.

This makes it easier to reason about a subsequent change to the
"prefix_length" code in grep.c in a subsequent commit, and since we're
going to the trouble of doing that let's leave behind an assert() to
promise this to any future callers.

For "builtin/grep.c" it would be painful to pass the "prefix" down the
callchain of:

    cmd_grep -&gt; grep_tree -&gt; grep_submodule -&gt; grep_cache -&gt; grep_oid -&gt;
    grep_source_name

So for the code that needs it in grep_source_name() let's add a
"grep_prefix" variable similar to the existing "ls_tree_prefix".

While at it let's move the code in cmd_ls_tree() around so that we
assign to the "ls_tree_prefix" right after declaring the variables,
and stop assigning to "prefix". We only subsequently used that
variable later in the function after clobbering it. Let's just use our
own "grep_prefix" instead.

Let's also add an assert() in git.c, so that we'll make this promise
about the "prefix" to any current and future callers, as well as to
any readers of the code.

Code history:

 * The strlen() in "grep.c" hasn't been used since 493b7a08d80 (grep:
   accept relative paths outside current working directory, 2009-09-05).

   When that code was added in 0d042fecf2f (git-grep: show pathnames
   relative to the current directory, 2006-08-11) we used the length.

   But since 493b7a08d80 we haven't used it for anything except a
   boolean check that we could have done on the "prefix" member
   itself.

Signed-off-by: Ævar Arnfjörð Bjarmason &lt;avarab@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>grep.h: remove unused "regex_t regexp" from grep_opt</title>
<updated>2022-02-16T02:00:49Z</updated>
<author>
<name>Ævar Arnfjörð Bjarmason</name>
<email>avarab@gmail.com</email>
</author>
<published>2022-02-16T00:00:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=77e3f931ef76444d6b28d50ac4fbdc933a15f358'/>
<id>urn:sha1:77e3f931ef76444d6b28d50ac4fbdc933a15f358</id>
<content type='text'>
This "regex_t" in grep_opt has not been used since
f9b9faf6f8a (builtin-grep: allow more than one patterns., 2006-05-02),
we still use a "regex_t" for compiling regexes, but that's in the
"grep_pat" struct".

Signed-off-by: Ævar Arnfjörð Bjarmason &lt;avarab@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>log: let --invert-grep only invert --grep</title>
<updated>2021-12-17T22:13:08Z</updated>
<author>
<name>René Scharfe</name>
<email>l.s.r@web.de</email>
</author>
<published>2021-12-17T16:48:49Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=794c000267b7bd29024b56e282509a82b31e6fc8'/>
<id>urn:sha1:794c000267b7bd29024b56e282509a82b31e6fc8</id>
<content type='text'>
The option --invert-grep is documented to filter out commits whose
messages match the --grep filters.  However, it also affects the
header matches (--author, --committer), which is not intended.

Move the handling of that option to grep.c, as only the code there can
distinguish between matches in the header from those in the message
body.  If --invert-grep is given then enable extended expressions (not
the regex type, we just need git grep's --not to work), negate the body
patterns and check if any of them match by piggy-backing on the
collect_hits mechanism of grep_source_1().

Collecting the matches in struct grep_opt is a bit iffy, but with
"last_shown" we have a precedent for writing state information to that
struct.

Reported-by: Dotan Cohen &lt;dotancohen@gmail.com&gt;
Signed-off-by: René Scharfe &lt;l.s.r@web.de&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
