<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/grep.c, branch v2.40.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.40.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.40.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2023-02-22T22:55:45Z</updated>
<entry>
<title>Merge branch 'ab/various-leak-fixes'</title>
<updated>2023-02-22T22:55:45Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2023-02-22T22:55:45Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=72972ea0b978f20b335339d18e497da617398967'/>
<id>urn:sha1:72972ea0b978f20b335339d18e497da617398967</id>
<content type='text'>
Leak fixes.

* ab/various-leak-fixes:
  push: free_refs() the "local_refs" in set_refspecs()
  push: refactor refspec_append_mapped() for subsequent leak-fix
  receive-pack: release the linked "struct command *" list
  grep API: plug memory leaks by freeing "header_list"
  grep.c: refactor free_grep_patterns()
  builtin/merge.c: free "&amp;buf" on "Your local changes..." error
  builtin/merge.c: use fixed strings, not "strbuf", fix leak
  show-branch: free() allocated "head" before return
  commit-graph: fix a parse_options_concat() leak
  http-backend.c: fix cmd_main() memory leak, refactor reg{exec,free}()
  http-backend.c: fix "dir" and "cmd_arg" leaks in cmd_main()
  worktree: fix a trivial leak in prune_worktrees()
  repack: fix leaks on error with "goto cleanup"
  name-rev: don't xstrdup() an already dup'd string
  various: add missing clear_pathspec(), fix leaks
  clone: use free() instead of UNLEAK()
  commit-graph: use free_commit_graph() instead of UNLEAK()
  bundle.c: don't leak the "args" in the "struct child_process"
  tests: mark tests as passing with SANITIZE=leak
</content>
</entry>
<entry>
<title>Merge branch 'cb/grep-fallback-failing-jit'</title>
<updated>2023-02-16T01:11:51Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2023-02-16T01:11:51Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=214242a6ab8a971fb1a8e80459b561bfcf93ae18'/>
<id>urn:sha1:214242a6ab8a971fb1a8e80459b561bfcf93ae18</id>
<content type='text'>
In an environment where dynamically generated code is prohibited from
running (e.g. SELinux), failure to JIT pcre patterns is expected.  Fall
back to interpreted execution in such a case.

* cb/grep-fallback-failing-jit:
  grep: fall back to interpreter if JIT memory allocation fails
</content>
</entry>
<entry>
<title>grep API: plug memory leaks by freeing "header_list"</title>
<updated>2023-02-06T23:34:39Z</updated>
<author>
<name>Ævar Arnfjörð Bjarmason</name>
<email>avarab@gmail.com</email>
</author>
<published>2023-02-06T23:07:51Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=fb2ebe72a37423e7c375d933d3c277b8cc81efba'/>
<id>urn:sha1:fb2ebe72a37423e7c375d933d3c277b8cc81efba</id>
<content type='text'>
When the "header_list" struct member was added in [1], freeing this
field was neglected. Fix that now, so that commands like

	./git -P log -1 --color=always --author=A origin/master

will run leak-free.

1. 80235ba79ef ("log --author=me --grep=it" should find intersection,
   not union, 2010-01-17)

Helped-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Ævar Arnfjörð Bjarmason &lt;avarab@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>grep.c: refactor free_grep_patterns()</title>
<updated>2023-02-06T23:34:39Z</updated>
<author>
<name>Ævar Arnfjörð Bjarmason</name>
<email>avarab@gmail.com</email>
</author>
<published>2023-02-06T23:07:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=891c9965fbc05848fb66444274e39c7ae2c6f321'/>
<id>urn:sha1:891c9965fbc05848fb66444274e39c7ae2c6f321</id>
<content type='text'>
Refactor the free_grep_patterns() function to split out the freeing of
the "struct grep_pat" it contains. Right now we're only freeing the
"pattern_list", but we should be freeing another member of the same
type, which we'll do in the subsequent commit.

Let's also replace the "return" if we don't have an
"opt-&gt;pattern_expression" with a conditional call of
free_pattern_expr().

Before db84376f981 (grep.c: remove "extended" in favor of
"pattern_expression", fix segfault, 2022-10-11) the pattern here was:

	if (!x)
		return;
	free_pattern_expr(y);

While at it, instead of:

	if (!x)
		return;
	free_pattern_expr(x);

Let's instead do:

	if (x)
		free_pattern_expr(x);

This will make it easier to free additional members from
free_grep_patterns() in the future.
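
<![CDATA[The shape of this refactoring can be sketched as follows. This is a minimal, self-contained illustration rather than the code from grep.c: "struct pat", "struct expr", "struct opts", free_pat_list(), free_expr() and free_opts_patterns() are hypothetical stand-ins for the real types and functions, and the two counters exist only so the behavior can be observed. The second free_pat_list() call is the kind of addition the split-out helper is meant to make easy.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for git's pattern and option structures. */
struct pat {
	struct pat *next;
};

struct expr {
	struct expr *left, *right;
};

struct opts {
	struct pat *pattern_list;
	struct pat *header_list;
	struct expr *pattern_expression;
};

/* Counters for demonstration only. */
static int freed_pats, freed_exprs;

/* Recursive free; the caller must never pass NULL. */
static void free_expr(struct expr *x)
{
	assert(x);
	if (x->left)
		free_expr(x->left);
	if (x->right)
		free_expr(x->right);
	free(x);
	freed_exprs++;
}

/* Split-out helper: frees one linked list of patterns. */
static void free_pat_list(struct pat *p)
{
	while (p) {
		struct pat *next = p->next;
		free(p);
		freed_pats++;
		p = next;
	}
}

static void free_opts_patterns(struct opts *opt)
{
	free_pat_list(opt->pattern_list);
	free_pat_list(opt->header_list);
	opt->pattern_list = opt->header_list = NULL;

	/* Conditional call instead of an early "return". */
	if (opt->pattern_expression)
		free_expr(opt->pattern_expression);
	opt->pattern_expression = NULL;
}
```

With the early "return" gone, further members can be released after the conditional call without restructuring the function.
]]>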

Helped-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Ævar Arnfjörð Bjarmason &lt;avarab@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>grep: fall back to interpreter if JIT memory allocation fails</title>
<updated>2023-01-31T19:39:02Z</updated>
<author>
<name>Mathias Krause</name>
<email>minipli@grsecurity.net</email>
</author>
<published>2023-01-31T18:56:11Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=50b6ad55b04d741eaa6121deed6876ff4fc28bc8'/>
<id>urn:sha1:50b6ad55b04d741eaa6121deed6876ff4fc28bc8</id>
<content type='text'>
Under Linux systems with SELinux's 'deny_execmem' or PaX's MPROTECT
enabled, the allocation of PCRE2's JIT rwx memory may be prohibited,
making pcre2_jit_compile() fail with PCRE2_ERROR_NOMEMORY (-48):

  [user@fedora git]$ git grep -c PCRE2_JIT
  grep.c:1

  [user@fedora git]$ # Enable SELinux's W^X policy
  [user@fedora git]$ sudo semanage boolean -m -1 deny_execmem

  [user@fedora git]$ # JIT memory allocation fails, breaking 'git grep'
  [user@fedora git]$ git grep -c PCRE2_JIT
  fatal: Couldn't JIT the PCRE2 pattern 'PCRE2_JIT', got '-48'

Instead of failing hard in this case and making 'git grep' unusable on
such systems, simply fall back to interpreter mode, leading to a much
better user experience.

As having a functional PCRE2 JIT compiler is a legitimate use case for
performance reasons, we'll only do the fallback if the supposedly
available JIT is found to be non-functional by attempting to JIT compile
a very simple pattern. If this fails, JIT is deemed to be non-functional
and we do the interpreter fallback. For all other cases, i.e. the simple
pattern can be compiled but the user-provided one cannot, we fail hard as
we do now, as the reason for the failure must be the pattern itself. To
aid users in helping themselves, change the error message to include a
hint about the '(*NO_JIT)' prefix. Also clip the pattern at 64 characters
to ensure the hint will be seen by the user and not internally truncated
by the die() function.
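
<![CDATA[The decision logic described above can be sketched roughly as follows. This is not the code from grep.c and it does not call PCRE2: the "jit_fn" hook, choose_mode(), format_jit_error(), the two stub functions and the wording of the message are all hypothetical, chosen only to show the probe-then-fall-back idea and the 64-character clipping.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum mode { MODE_JIT, MODE_INTERPRETER, MODE_FATAL };

/* Hypothetical JIT hook: returns 0 on success, negative on failure. */
typedef int (*jit_fn)(const char *pattern);

static enum mode choose_mode(jit_fn jit, const char *pattern)
{
	assert(jit && pattern);
	if (jit(pattern) == 0)
		return MODE_JIT;
	/* Probe with a trivial pattern to see whether JIT works at all. */
	if (jit(".") != 0)
		return MODE_INTERPRETER;
	/* JIT works in general, so the user's pattern is at fault. */
	return MODE_FATAL;
}

/* Clip the pattern to 64 characters so the hint stays visible. */
static void format_jit_error(char *buf, size_t len,
			     const char *pattern, int err)
{
	snprintf(buf, len,
		 "JIT failed for '%.64s' (error %d); hint: try the (*NO_JIT) prefix",
		 pattern, err);
}

/* Stub: simulates a system that denies executable memory. */
static int jit_always_fails(const char *pattern)
{
	(void)pattern;
	return -48;
}

/* Stub: simulates a working JIT that rejects one specific pattern. */
static int jit_rejects_bad(const char *pattern)
{
	return strcmp(pattern, "bad") == 0 ? -1 : 0;
}
```

With jit_always_fails() the sketch selects the interpreter, while jit_rejects_bad() given "bad" still leads to the hard failure, matching the two cases distinguished in the text.
]]>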

Cc: Carlo Marcelo Arenas Belón &lt;carenas@gmail.com&gt;
Signed-off-by: Mathias Krause &lt;minipli@grsecurity.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>grep: correctly identify utf-8 characters with \{b,w} in -P</title>
<updated>2023-01-18T23:24:52Z</updated>
<author>
<name>Carlo Marcelo Arenas Belón</name>
<email>carenas@gmail.com</email>
</author>
<published>2023-01-08T15:52:17Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=acabd2048ee0ee53728100408970ab45a6dab65e'/>
<id>urn:sha1:acabd2048ee0ee53728100408970ab45a6dab65e</id>
<content type='text'>
When UTF is enabled for a PCRE match, the corresponding flags are
added to the pcre2_compile() call, but PCRE2_UCP wasn't included.

This prevents the character classes from being extended to include
those newly valid characters, and therefore results in failed matches
for expressions that rely on that extension, for example:

  $ git grep -P '\bÆvar'

Add PCRE2_UCP so that \w will include Æ and therefore \b could
correctly match the beginning of that word.

This has a performance impact that has been estimated to be between
20% and 40%, as shown by the added performance test.

Signed-off-by: Carlo Marcelo Arenas Belón &lt;carenas@gmail.com&gt;
Acked-by: Ævar Arnfjörð Bjarmason &lt;avarab@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'ab/grep-simplify-extended-expression'</title>
<updated>2022-10-21T18:37:28Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2022-10-21T18:37:28Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=91d3d7e6e2be4aad74c5c602ed4988fbb1d82960'/>
<id>urn:sha1:91d3d7e6e2be4aad74c5c602ed4988fbb1d82960</id>
<content type='text'>
Giving "--invert-grep" and "--all-match" without "--grep" to the
"git log" command resulted in an attempt to access grep pattern
expression structure that has not been allocated, which has been
corrected.

* ab/grep-simplify-extended-expression:
  grep.c: remove "extended" in favor of "pattern_expression", fix segfault
</content>
</entry>
<entry>
<title>grep.c: remove "extended" in favor of "pattern_expression", fix segfault</title>
<updated>2022-10-11T15:48:54Z</updated>
<author>
<name>Ævar Arnfjörð Bjarmason</name>
<email>avarab@gmail.com</email>
</author>
<published>2022-10-11T09:48:45Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=db84376f981c64a1577d17d99918b2ef65a07a11'/>
<id>urn:sha1:db84376f981c64a1577d17d99918b2ef65a07a11</id>
<content type='text'>
Since 79d3696cfb4 (git-grep: boolean expression on pattern matching.,
2006-06-30) the "pattern_expression" member has been used for complex
queries (AND/OR...), with "pattern_list" being used for the simple OR
queries. Since then we've used both "pattern_expression" and its
associated boolean "extended" member to see if we have a complex
expression.

Since f41fb662f57 (revisions API: have release_revisions() release
"grep_filter", 2022-04-13) we've had a subtle bug relating to that: If
we supplied options that were only used for "complex queries", but
didn't supply the query itself we'd set "opt-&gt;extended", but would
have a NULL "pattern_expression". As a result these would segfault as
we tried to call "free_grep_patterns()" from "release_revisions()":

	git -P log -1 --invert-grep
	git -P log -1 --all-match

The root cause of this is that we were conflating the state management
we needed in "compile_grep_patterns()" itself with whether or not we
had an "opt-&gt;pattern_expression" later on.

In these cases, as we're going through "compile_grep_patterns()", we have
no "opt-&gt;pattern_list" but have "opt-&gt;no_body_match" or
"opt-&gt;all_match". So we'd set "opt-&gt;extended = 1", but not "return" on
"opt-&gt;extended" as that's an "else if" in the same "if" statement.

That behavior is intentional and required, as the common case is that
we have an "opt-&gt;pattern_list" that we're about to parse into the
"opt-&gt;pattern_expression".

But we don't need to keep track of this "extended" flag beyond the
state management in compile_grep_patterns() itself. It needs it, but
once we're out of that function we can rely on
"opt-&gt;pattern_expression" being non-NULL instead for using these
extended patterns.

As 79d3696cfb4 itself shows we've assumed that there's a one-to-one
mapping between the two since the very beginning. I.e. "match_line()"
would check "opt-&gt;extended" to see if it should call "match_expr()",
and the first thing we do in that function is assume that we have a
"opt-&gt;pattern_expression". We'd then call "match_expr_eval()", which
would have died if that "opt-&gt;pattern_expression" was NULL.

The "die" was added in c922b01f54c (grep: fix segfault when "git grep
'('" is given, 2009-04-27), and can now be removed as it's now clearly
unreachable. We still do the right thing in the case that prompted
that fix:

	git grep '('
	fatal: unmatched parenthesis

Arguably neither the "--invert-grep" option added in [1] nor the
earlier "--all-match" option added in [2] were intended to be used
stand-alone, and another approach[3] would be to error out in those
cases. But since we've been treating them as a NOOP when given without
--grep for a long time let's keep doing that.

We could also return in "free_pattern_expr()" if the argument is
NULL, as an alternative fix for this segfault does [4]. That would
be more elegant in making the "free_*()" function behave like
"free()", but it would also remove a sanity check: The
"free_pattern_expr()" function calls itself recursively, and only the
top-level is allowed to be NULL. Let's not conflate those two
conditions.

1. 22dfa8a23de (log: teach --invert-grep option, 2015-01-12)
2. 0ab7befa31d (grep --all-match, 2006-09-27)
3. https://lore.kernel.org/git/patch-1.1-f4b90799fce-20221010T165711Z-avarab@gmail.com/
4. http://lore.kernel.org/git/7e094882c2a71894416089f894557a9eae07e8f8.1665423686.git.me@ttaylorr.com

Reported-by: orygaw &lt;orygaw@protonmail.com&gt;
Signed-off-by: Ævar Arnfjörð Bjarmason &lt;avarab@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>grep: add --max-count command line option</title>
<updated>2022-06-22T20:23:29Z</updated>
<author>
<name>Carlos López</name>
<email>00xc@protonmail.com</email>
</author>
<published>2022-06-22T19:47:32Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=68437ede53dccd1dea9e44e831a59de274d389de'/>
<id>urn:sha1:68437ede53dccd1dea9e44e831a59de274d389de</id>
<content type='text'>
This patch adds a command line option analogous to that of GNU
grep(1)'s -m / --max-count, which users might already be used to.
This makes it possible to limit the number of matches shown in the
output while keeping the functionality of other options such as -C
(show code context) or -p (show containing function), which would be
difficult to do with a shell pipeline (e.g. head(1)).
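
<![CDATA[As a rough illustration of why a built-in limit differs from piping through head(1), the sketch below counts matching lines and stops once a limit is reached, so the limit applies to matches rather than to output lines. It is not taken from grep.c: count_matches() and its parameters are hypothetical, and treating a negative limit as "no limit" is an assumption of this sketch.

```c
#include <assert.h>
#include <string.h>

/*
 * Count lines containing "needle", stopping after max_count matches.
 * A negative max_count means no limit (an assumption of this sketch).
 */
static int count_matches(const char *const *lines, int nr,
			 const char *needle, int max_count)
{
	int i, hits = 0;

	assert(lines && needle);
	for (i = 0; i < nr; i++) {
		if (max_count >= 0 && hits >= max_count)
			break;
		if (strstr(lines[i], needle))
			hits++;
	}
	return hits;
}
```

Because the counter only advances on a match, any context lines printed around a match would not consume the budget, which is what a line-based pipeline cannot distinguish.
]]>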

Signed-off-by: Carlos López &lt;00xc@protonmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'rs/pcre-invalid-utf8-fix-fix'</title>
<updated>2022-02-25T23:47:38Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2022-02-25T23:47:38Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=5c4f3804a73933504e76b6ebdd946a2db3b1d214'/>
<id>urn:sha1:5c4f3804a73933504e76b6ebdd946a2db3b1d214</id>
<content type='text'>
The workaround we have for PCRE2 versions before 10.36 was, by
mistake, in effect only for versions newer than 10.36; this has been
corrected.

* rs/pcre-invalid-utf8-fix-fix:
  grep: fix triggering PCRE2_NO_START_OPTIMIZE workaround
</content>
</entry>
</feed>
