<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/grep.c, branch jch</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=jch</id>
<link rel='self' href='https://git.shady.money/git/atom?h=jch'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2025-09-16T20:37:05Z</updated>
<entry>
<title>grep: don't treat grep_opt.color as a strict bool</title>
<updated>2025-09-16T20:37:05Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2025-09-16T20:16:00Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=8ee247671ddbe5b1128d774b0950627bb4afc4a6'/>
<id>urn:sha1:8ee247671ddbe5b1128d774b0950627bb4afc4a6</id>
<content type='text'>
In show_line(), we check to see if colors are desired with just:

  if (opt-&gt;color)
     ...we want colors...

But this is incorrect. The color field here is really a git_colorbool,
so it may be "true" for GIT_COLOR_UNKNOWN or GIT_COLOR_AUTO. Either of
those _might_ end up true eventually (once we apply default fallbacks
and check stdout's tty), but they may not. E.g.:

  git grep foo | cat

will enter the conditional even though we're not going to show colors.
We should collapse it into a true boolean by calling want_color().

It turns out that this does not produce a user-visible bug. We do some
extra processing to isolate the matched portion of the line in order to
colorize it, but ultimately we pass it to our output_color() helper,
which does correctly check want_color(). So we end up with no colors.

But dropping the extra processing saves a measurable amount of time. For
example, running under hyperfine (which redirects to /dev/null, and thus
does not colorize):

  Benchmark 1: ./git.old grep a
    Time (mean ± σ):      58.7 ms ±   3.5 ms    [User: 580.6 ms, System: 74.3 ms]
    Range (min … max):    53.5 ms …  67.1 ms    48 runs

  Benchmark 2: ./git.new grep a
    Time (mean ± σ):      35.5 ms ±   0.9 ms    [User: 276.8 ms, System: 73.8 ms]
    Range (min … max):    34.3 ms …  39.3 ms    79 runs

  Summary
    ./git.new grep a ran
      1.65 ± 0.11 times faster than ./git.old grep a

That's a fairly extreme benchmark, just because it will come up with a
ton of small matches, but it shows that this really does matter.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>odb: rename `repo_read_object_file()`</title>
<updated>2025-07-01T21:46:38Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2025-07-01T12:22:26Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=d4ff88aee3967e5d1ef1237cd9b8792b7cdb304c'/>
<id>urn:sha1:d4ff88aee3967e5d1ef1237cd9b8792b7cdb304c</id>
<content type='text'>
Rename `repo_read_object_file()` to `odb_read_object()` to match other
functions related to the object database and our modern coding
guidelines.

Introduce a compatibility wrapper so that any in-flight topics will
continue to compile.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>object-store: rename files to "odb.{c,h}"</title>
<updated>2025-07-01T21:46:34Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2025-07-01T12:22:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=8f49151763cb81adf4bcec53c1ae67057081b02d'/>
<id>urn:sha1:8f49151763cb81adf4bcec53c1ae67057081b02d</id>
<content type='text'>
In the preceding commits we have renamed the structures contained in
"object-store.h" to `struct object_database` and `struct odb_backend`.
As such, the code files "object-store.{c,h}" are confusingly named now.
Rename them to "odb.{c,h}" accordingly.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>object-store: merge "object-store-ll.h" and "object-store.h"</title>
<updated>2025-04-15T15:24:37Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2025-04-15T09:38:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=68cd492a3e662c75dec364986c81e94716d4ac56'/>
<id>urn:sha1:68cd492a3e662c75dec364986c81e94716d4ac56</id>
<content type='text'>
The "object-store-ll.h" header has been introduced to keep transitive
header dependendcies and compile times at bay. Now that we have created
a new "object-store.c" file though we can easily move the last remaining
additional bit of "object-store.h", the `odb_path_map`, out of the
header.

Do so. As the "object-store.h" header is now equivalent to its low-level
alternative we drop the latter and inline it into the former.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>hash: stop depending on `the_repository` in `null_oid()`</title>
<updated>2025-03-10T20:16:20Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2025-03-10T07:13:31Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=7d70b29c4f0b2fd3c6698956d9fb4026632d9c6e'/>
<id>urn:sha1:7d70b29c4f0b2fd3c6698956d9fb4026632d9c6e</id>
<content type='text'>
The `null_oid()` function returns the object ID that only consists of
zeroes. Naturally, this ID also depends on the hash algorithm used, as
the number of zeroes is different between SHA1 and SHA256. Consequently,
the function returns the hash-algorithm-specific null object ID.

This is currently done by depending on `the_hash_algo`, which implicitly
makes us depend on `the_repository`. Refactor the function to instead
pass in the hash algorithm for which we want to retrieve the null object
ID. Adapt callsites accordingly by passing in `the_repository`, thus
bubbling up the dependency on that global variable by one layer.

There are a couple of trivial exceptions for subsystems that already got
rid of `the_repository`. These subsystems instead use the repository
that is available via the calling context:

  - "builtin/grep.c"
  - "grep.c"
  - "refs/debug.c"

There are also two non-trivial exceptions:

  - "diff-no-index.c": Here we know that we may not have a repository
    initialized at all, so we cannot rely on `the_repository`. Instead,
    we adapt `diff_no_index()` to get a `struct git_hash_algo` as
    parameter. The only caller is located in "builtin/diff.c", where we
    know to call `repo_set_hash_algo()` in case we're running outside of
    a Git repository. Consequently, it is fine to continue passing
    `the_repository-&gt;hash_algo` even in this case.

  - "builtin/ls-files.c": There is an in-flight patch series that drops
    `USE_THE_REPOSITORY_VARIABLE` in this file, which causes a semantic
    conflict because we use `null_oid()` in `show_submodule()`. The
    value is passed to `repo_submodule_init()`, which may use the object
    ID to resolve a tree-ish in the superproject from which we want to
    read the submodule config. As such, the object ID should refer to an
    object in the superproject, and consequently we need to use its hash
    algorithm.

    This means that we could in theory just not bother about this edge
    case at all and just use `the_repository` in "diff-no-index.c". But
    doing so would feel misdesigned.

Remove the `USE_THE_REPOSITORY_VARIABLE` preprocessor define in
"hash.c".

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>global: mark code units that generate warnings with `-Wsign-compare`</title>
<updated>2024-12-06T11:20:02Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-12-06T10:27:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=41f43b8243f42b9df2e98be8460646d4c0100ad3'/>
<id>urn:sha1:41f43b8243f42b9df2e98be8460646d4c0100ad3</id>
<content type='text'>
Mark code units that generate warnings with `-Wsign-compare`. This
allows for a structured approach to get rid of all such warnings over
time in a way that can be easily measured.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'ps/leakfixes-part-9'</title>
<updated>2024-11-12T23:35:31Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2024-11-12T23:35:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=6890c99e38c378c2fff7747b52bd2102b403357d'/>
<id>urn:sha1:6890c99e38c378c2fff7747b52bd2102b403357d</id>
<content type='text'>
More leakfixes.

* ps/leakfixes-part-9: (22 commits)
  list-objects-filter-options: work around reported leak on error
  builtin/merge: release output buffer after performing merge
  dir: fix leak when parsing "status.showUntrackedFiles"
  t/helper: fix leaking buffer in "dump-untracked-cache"
  t/helper: stop re-initialization of `the_repository`
  sparse-index: correctly free EWAH contents
  dir: release untracked cache data
  combine-diff: fix leaking lost lines
  builtin/tag: fix leaking key ID on failure to sign
  transport-helper: fix leaking import/export marks
  builtin/commit: fix leaking cleanup config
  trailer: fix leaking strbufs when formatting trailers
  trailer: fix leaking trailer values
  builtin/commit: fix leaking change data contents
  upload-pack: fix leaking URI protocols
  pretty: clear signature check
  diff-lib: fix leaking diffopts in `do_diff_cache()`
  revision: fix leaking bloom filters
  builtin/grep: fix leak with `--max-count=0`
  grep: fix leak in `grep_splice_or()`
  ...
</content>
</entry>
<entry>
<title>grep: fix leak in `grep_splice_or()`</title>
<updated>2024-11-05T06:37:52Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-11-05T06:16:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=a6590ccdd431e2ab7b9c521cac674546725a54d2'/>
<id>urn:sha1:a6590ccdd431e2ab7b9c521cac674546725a54d2</id>
<content type='text'>
In `grep_splice_or()` we search for the next `TRUE` node in our tree of
grep expressions and replace it with the given new expression. But we
don't free the old node, which causes a memory leak. Plug it.

This leak is exposed by t7810, but plugging it alone isn't sufficient to
make the test suite pass.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>grep: disable lookahead on error</title>
<updated>2024-10-22T16:45:49Z</updated>
<author>
<name>René Scharfe</name>
<email>l.s.r@web.de</email>
</author>
<published>2024-10-20T11:02:32Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=ce025ae4f61e8e32b2ae6589e43e03e60f713f2d'/>
<id>urn:sha1:ce025ae4f61e8e32b2ae6589e43e03e60f713f2d</id>
<content type='text'>
regexec(3) can fail.  E.g. on macOS it fails if it is used with an UTF-8
locale to match a valid regex against a buffer containing invalid UTF-8
characters.

git grep has two ways to search for matches in a file: Either it splits
its contents into lines and matches them separately, or it matches the
whole content and figures out line boundaries later.  The latter is done
by look_ahead() and it's quicker in the common case where most files
don't contain a match.

Fall back to line-by-line matching if look_ahead() encounters an
regexec(3) error by propagating errors out of patmatch() and bailing out
of look_ahead() if there is one.  This way we at least can find matches
in lines that contain only valid characters.  That matches the behavior
of grep(1) on macOS.

pcre2match() dies if pcre2_jit_match() or pcre2_match() fail, but since
we use the flag PCRE2_MATCH_INVALID_UTF it handles invalid UTF-8
characters gracefully.  So implement the fall-back only for regexec(3)
and leave the PCRE2 matching unchanged.

Reported-by: David Gstir &lt;david@sigma-star.at&gt;
Signed-off-by: René Scharfe &lt;l.s.r@web.de&gt;
Tested-by: David Gstir &lt;david@sigma-star.at&gt;
Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
</content>
</entry>
<entry>
<title>grep: fix leaking grep pattern</title>
<updated>2024-09-27T15:25:36Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-09-26T11:46:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=6d82437a470fc7797e41f5a7ac4874914db7cdb8'/>
<id>urn:sha1:6d82437a470fc7797e41f5a7ac4874914db7cdb8</id>
<content type='text'>
When creating a pattern via `create_grep_pat()` we allocate the pattern
member of the structure regardless of the token type. But later, when we
try to free the structure, we free the pattern member conditionally on
the token type and thus leak memory.

Plug this leak. The leak is exposed by t7814, but plugging it alone does
not make the whole test suite pass.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
