<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/grep.h, branch jch</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=jch</id>
<link rel='self' href='https://git.shady.money/git/atom?h=jch'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2025-09-17T00:59:53Z</updated>
<entry>
<title>color: use git_colorbool enum type to store colorbools</title>
<updated>2025-09-17T00:59:53Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2025-09-16T23:13:59Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=e9330ae4b820147c98e723399e9438c8bee60a80'/>
<id>urn:sha1:e9330ae4b820147c98e723399e9438c8bee60a80</id>
<content type='text'>
We traditionally used "int" to store and pass around the values defined
by "enum git_colorbool" (which were originally just #define macros).
Using an int doesn't produce incorrect results, but using the actual
enum makes the intent of the code more clear.

It would be nice if the compiler could catch cases where we used the
enum and an int interchangeably, since it's very easy to accidentally
check the boolean true/false of a colorbool like:

  if (branch_use_color)

This is wrong because GIT_COLOR_UNKNOWN and GIT_COLOR_AUTO evaluate to
true in C, even though we may ultimately decide not to use color. But C
is pretty happy to convert between ints and enums (even with various
-Wenum-* warnings). So this sadly doesn't protect us from such mistakes,
but it hopefully does make the code easier to read.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>color: use GIT_COLOR_* instead of numeric constants</title>
<updated>2025-09-16T20:37:03Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2025-09-16T20:13:28Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=3c3e9b830383364316ba07730aecbc47a680b513'/>
<id>urn:sha1:3c3e9b830383364316ba07730aecbc47a680b513</id>
<content type='text'>
Long ago Git's decision to show color for a subsytem was stored in a
tri-state variable: it could be true (1), false (0), or unknown (-1).
But since daa0c3d971 (color: delay auto-color decision until point of
use, 2011-08-17) we want to carry around a new state, "auto", which
bases the decision on the tty-ness of stdout (rather than collapsing
that "auto" state to a true/false immediately).

That commit introduced a set of GIT_COLOR_* defines to represent each
state: UNKNOWN, ALWAYS, NEVER, and AUTO. But it only used the AUTO
value, and left alone code using bare 0/1/-1 values. And of course since
then we've grown many new spots that use those bare values.

Let's switch all of these to use the named constants. That should make
the code a bit easier to read, as it is more obvious that we're
representing a color decision.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>config: add ctx arg to config_fn_t</title>
<updated>2023-06-28T21:06:39Z</updated>
<author>
<name>Glen Choo</name>
<email>chooglen@google.com</email>
</author>
<published>2023-06-28T19:26:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=a4e7e317f8f27f861321e6eb08b9c8c0f3ab570c'/>
<id>urn:sha1:a4e7e317f8f27f861321e6eb08b9c8c0f3ab570c</id>
<content type='text'>
Add a new "const struct config_context *ctx" arg to config_fn_t to hold
additional information about the config iteration operation.
config_context has a "struct key_value_info kvi" member that holds
metadata about the config source being read (e.g. what kind of config
source it is, the filename, etc). In this series, we're only interested
in .kvi, so we could have just used "struct key_value_info" as an arg,
but config_context makes it possible to add/adjust members in the future
without changing the config_fn_t signature. We could also consider other
ways of organizing the args (e.g. moving the config name and value into
config_context or key_value_info), but in my experiments, the
incremental benefit doesn't justify the added complexity (e.g. a
config_fn_t will sometimes invoke another config_fn_t but with a
different config value).

In subsequent commits, the .kvi member will replace the global "struct
config_reader" in config.c, making config iteration a global-free
operation. It requires much more work for the machinery to provide
meaningful values of .kvi, so for now, merely change the signature and
call sites, pass NULL as a placeholder value, and don't rely on the arg
in any meaningful way.

Most of the changes are performed by
contrib/coccinelle/config_fn_ctx.pending.cocci, which, for every
config_fn_t:

- Modifies the signature to accept "const struct config_context *ctx"
- Passes "ctx" to any inner config_fn_t, if needed
- Adds UNUSED attributes to "ctx", if needed

Most config_fn_t instances are easily identified by seeing if they are
called by the various config functions. Most of the remaining ones are
manually named in the .cocci patch. Manual cleanups are still needed,
but the majority of it is trivial; it's either adjusting config_fn_t
that the .cocci patch didn't catch, or adding forward declarations of
"struct config_context ctx" to make the signatures make sense.

The non-trivial changes are in cases where we are invoking a config_fn_t
outside of config machinery, and we now need to decide what value of
"ctx" to pass. These cases are:

- trace2/tr2_cfg.c:tr2_cfg_set_fl()

  This is indirectly called by git_config_set() so that the trace2
  machinery can notice the new config values and update its settings
  using the tr2 config parsing function, i.e. tr2_cfg_cb().

- builtin/checkout.c:checkout_main()

  This calls git_xmerge_config() as a shorthand for parsing a CLI arg.
  This might be worth refactoring away in the future, since
  git_xmerge_config() can call git_default_config(), which can do much
  more than just parsing.

Handle them by creating a KVI_INIT macro that initializes "struct
key_value_info" to a reasonable default, and use that to construct the
"ctx" arg.

Signed-off-by: Glen Choo &lt;chooglen@google.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>grep: work around UTF-8 related JIT bug in PCRE2 &lt;= 10.34</title>
<updated>2023-03-23T18:19:34Z</updated>
<author>
<name>Mathias Krause</name>
<email>minipli@grsecurity.net</email>
</author>
<published>2023-03-23T17:25:39Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=14b9a044798ebb3858a1f1a1377309a3d6054ac8'/>
<id>urn:sha1:14b9a044798ebb3858a1f1a1377309a3d6054ac8</id>
<content type='text'>
Stephane is reporting[1] a regression introduced in git v2.40.0 that leads
to 'git grep' segfaulting in his CI pipeline. It turns out, he's using an
older version of libpcre2 that triggers a wild pointer dereference in
the generated JIT code that was fixed in PCRE2 10.35.

Instead of completely disabling the JIT compiler for the buggy version,
just mask out the Unicode property handling as we used to do prior to
commit acabd2048ee0 ("grep: correctly identify utf-8 characters with
\{b,w} in -P").

[1] https://lore.kernel.org/git/7E83DAA1-F9A9-4151-8D07-D80EA6D59EEA@clumio.com/

Reported-by: Stephane Odul &lt;stephane@clumio.com&gt;
Signed-off-by: Mathias Krause &lt;minipli@grsecurity.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'ab/grep-simplify-extended-expression'</title>
<updated>2022-10-21T18:37:28Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2022-10-21T18:37:28Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=91d3d7e6e2be4aad74c5c602ed4988fbb1d82960'/>
<id>urn:sha1:91d3d7e6e2be4aad74c5c602ed4988fbb1d82960</id>
<content type='text'>
Giving "--invert-grep" and "--all-match" without "--grep" to the
"git log" command resulted in an attempt to access grep pattern
expression structure that has not been allocated, which has been
corrected.

* ab/grep-simplify-extended-expression:
  grep.c: remove "extended" in favor of "pattern_expression", fix segfault
</content>
</entry>
<entry>
<title>grep.c: remove "extended" in favor of "pattern_expression", fix segfault</title>
<updated>2022-10-11T15:48:54Z</updated>
<author>
<name>Ævar Arnfjörð Bjarmason</name>
<email>avarab@gmail.com</email>
</author>
<published>2022-10-11T09:48:45Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=db84376f981c64a1577d17d99918b2ef65a07a11'/>
<id>urn:sha1:db84376f981c64a1577d17d99918b2ef65a07a11</id>
<content type='text'>
Since 79d3696cfb4 (git-grep: boolean expression on pattern matching.,
2006-06-30) the "pattern_expression" member has been used for complex
queries (AND/OR...), with "pattern_list" being used for the simple OR
queries. Since then we've used both "pattern_expression" and its
associated boolean "extended" member to see if we have a complex
expression.

Since f41fb662f57 (revisions API: have release_revisions() release
"grep_filter", 2022-04-13) we've had a subtle bug relating to that: If
we supplied options that were only used for "complex queries", but
didn't supply the query itself we'd set "opt-&gt;extended", but would
have a NULL "pattern_expression". As a result these would segfault as
we tried to call "free_grep_patterns()" from "release_revisions()":

	git -P log -1 --invert-grep
	git -P log -1 --all-match

The root cause of this is that we were conflating the state management
we needed in "compile_grep_patterns()" itself with whether or not we
had an "opt-&gt;pattern_expression" later on.

In this cases as we're going through "compile_grep_patterns()" we have
no "opt-&gt;pattern_list" but have "opt-&gt;no_body_match" or
"opt-&gt;all_match". So we'd set "opt-&gt;extended = 1", but not "return" on
"opt-&gt;extended" as that's an "else if" in the same "if" statement.

That behavior is intentional and required, as the common case is that
we have an "opt-&gt;pattern_list" that we're about to parse into the
"opt-&gt;pattern_expression".

But we don't need to keep track of this "extended" flag beyond the
state management in compile_grep_patterns() itself. It needs it, but
once we're out of that function we can rely on
"opt-&gt;pattern_expression" being non-NULL instead for using these
extended patterns.

As 79d3696cfb4 itself shows we've assumed that there's a one-to-one
mapping between the two since the very beginning. I.e. "match_line()"
would check "opt-&gt;extended" to see if it should call "match_expr()",
and the first thing we do in that function is assume that we have a
"opt-&gt;pattern_expression". We'd then call "match_expr_eval()", which
would have died if that "opt-&gt;pattern_expression" was NULL.

The "die" was added in c922b01f54c (grep: fix segfault when "git grep
'('" is given, 2009-04-27), and can now be removed as it's now clearly
unreachable. We still do the right thing in the case that prompted
that fix:

	git grep '('
	fatal: unmatched parenthesis

Arguably neither the "--invert-grep" option added in [1] nor the
earlier "--all-match" option added in [2] were intended to be used
stand-alone, and another approach[3] would be to error out in those
cases. But since we've been treating them as a NOOP when given without
--grep for a long time let's keep doing that.

We could also return in "free_pattern_expr()" if the argument is
non-NULL, as an alternative fix for this segfault does [4]. That would
be more elegant in making the "free_*()" function behave like
"free()", but it would also remove a sanity check: The
"free_pattern_expr()" function calls itself recursively, and only the
top-level is allowed to be NULL, let's not conflate those two
conditions.

1. 22dfa8a23de (log: teach --invert-grep option, 2015-01-12)
2. 0ab7befa31d (grep --all-match, 2006-09-27)
3. https://lore.kernel.org/git/patch-1.1-f4b90799fce-20221010T165711Z-avarab@gmail.com/
4. http://lore.kernel.org/git/7e094882c2a71894416089f894557a9eae07e8f8.1665423686.git.me@ttaylorr.com

Reported-by: orygaw &lt;orygaw@protonmail.com&gt;
Signed-off-by: Ævar Arnfjörð Bjarmason &lt;avarab@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>grep: add --max-count command line option</title>
<updated>2022-06-22T20:23:29Z</updated>
<author>
<name>Carlos López</name>
<email>00xc@protonmail.com</email>
</author>
<published>2022-06-22T19:47:32Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=68437ede53dccd1dea9e44e831a59de274d389de'/>
<id>urn:sha1:68437ede53dccd1dea9e44e831a59de274d389de</id>
<content type='text'>
This patch adds a command line option analogous to that of GNU
grep(1)'s -m / --max-count, which users might already be used to.
This makes it possible to limit the amount of matches shown in the
output while keeping the functionality of other options such as -C
(show code context) or -p (show containing function), which would be
difficult to do with a shell pipeline (e.g. head(1)).

Signed-off-by: Carlos López 00xc@protonmail.com
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>grep: simplify config parsing and option parsing</title>
<updated>2022-02-16T02:00:50Z</updated>
<author>
<name>Ævar Arnfjörð Bjarmason</name>
<email>avarab@gmail.com</email>
</author>
<published>2022-02-16T00:00:39Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=04bf052eef53c6be04d313d8ce11690beaf890b6'/>
<id>urn:sha1:04bf052eef53c6be04d313d8ce11690beaf890b6</id>
<content type='text'>
Simplify the parsing of "grep.patternType" and
"grep.extendedRegexp". This changes no behavior, but gets rid of
complex parsing logic that isn't needed anymore.

When "grep.patternType" was introduced in 84befcd0a4a (grep: add a
grep.patternType configuration setting, 2012-08-03) we promised that:

 1. You can set "grep.patternType", and "[setting it to] 'default'
    will return to the default matching behavior".

    In that context "the default" meant whatever the configuration
    system specified before that change, i.e. via grep.extendedRegexp.

 2. We'd support the existing "grep.extendedRegexp" option, but ignore
    it when the new "grep.patternType" option is set. We said we'd
    only ignore the older "grep.extendedRegexp" option "when the
    `grep.patternType` option is set to a value other than
    'default'".

In a preceding commit we changed grep_config() to be called after
grep_init(), which means that much of the complexity here can go
away.

As before both "grep.patternType" and "grep.extendedRegexp" are
last-one-wins variable, with "grep.extendedRegexp" yielding to
"grep.patternType", except when "grep.patternType=default".

Note that as the previously added tests indicate this cannot be done
on-the-fly as we see the config variables, without introducing more
state keeping. I.e. if we see:

    -c grep.extendedRegexp=false
    -c grep.patternType=default
    -c extendedRegexp=true

We need to select ERE, since grep.patternType=default unselects that
variable, which normally has higher precedence, but we also need to
select BRE in cases of:

    -c grep.extendedRegexp=true \
    -c grep.extendedRegexp=false

Which would not be the case for this, which select ERE:

    -c grep.patternType=extended \
    -c grep.extendedRegexp=false

Therefore we cannot do this on-the-fly in grep_config without also
introducing tracking variables for not only the pattern type, but what
the source of that pattern type was.

So we need to decide on the pattern after our config was fully
parsed. Let's do that by deferring the decision on the pattern type
until it's time to compile it in compile_regexp().

By that time we've not only parsed the config, but also handled the
command-line options. Those will set "opt.pattern_type_option" (*not*
"opt.extended_regexp_option"!).

At that point all we need to do is see if "grep.patternType" was
UNSPECIFIED in the end (including an explicit "=default"), if so we'll
use the "grep.extendedRegexp" configuration, if any.

See my 07a3d411739 (grep: remove regflags from the public grep_opt
API, 2017-06-29) for addition of the two comments being removed here,
i.e. the complexity noted in that commit is now going away.

1. https://lore.kernel.org/git/patch-v8-09.10-c211bb0c69d-20220118T155211Z-avarab@gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason &lt;avarab@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>grep.h: make "grep_opt.pattern_type_option" use its enum</title>
<updated>2022-02-16T02:00:50Z</updated>
<author>
<name>Ævar Arnfjörð Bjarmason</name>
<email>avarab@gmail.com</email>
</author>
<published>2022-02-16T00:00:37Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=321ee43628c53d6050fb7fbc552332bab681f1a4'/>
<id>urn:sha1:321ee43628c53d6050fb7fbc552332bab681f1a4</id>
<content type='text'>
Change the "pattern_type_option" member of "struct grep_opt" to use
the enum type we use for it.

Signed-off-by: Ævar Arnfjörð Bjarmason &lt;avarab@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>grep API: call grep_config() after grep_init()</title>
<updated>2022-02-16T02:00:50Z</updated>
<author>
<name>Ævar Arnfjörð Bjarmason</name>
<email>avarab@gmail.com</email>
</author>
<published>2022-02-16T00:00:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=72365bb49923da065e7a43e61a912ef17f143c7f'/>
<id>urn:sha1:72365bb49923da065e7a43e61a912ef17f143c7f</id>
<content type='text'>
The grep_init() function used the odd pattern of initializing the
passed-in "struct grep_opt" with a statically defined "grep_defaults"
struct, which would be modified in-place when we invoked
grep_config().

So we effectively (b) initialized config, (a) then defaults, (c)
followed by user options. Usually those are ordered as "a", "b" and
"c" instead.

As the comments being removed here show the previous behavior needed
to be carefully explained as we'd potentially share the populated
configuration among different instances of grep_init(). In practice we
didn't do that, but now that it can't be a concern anymore let's
remove those comments.

This does not change the behavior of any of the configuration
variables or options. That would have been the case if we didn't move
around the grep_config() call in "builtin/log.c". But now that we call
"grep_config" after "git_log_config" and "git_format_config" we'll
need to pass in the already initialized "struct grep_opt *".

See 6ba9bb76e02 (grep: copy struct in one fell swoop, 2020-11-29) and
7687a0541e0 (grep: move the configuration parsing logic to grep.[ch],
2012-10-09) for the commits that added the comments.

The memcpy() pattern here will be optimized away and follows the
convention of other *_init() functions. See 5726a6b4012 (*.c *_init():
define in terms of corresponding *_INIT macro, 2021-07-01).

Signed-off-by: Ævar Arnfjörð Bjarmason &lt;avarab@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
