<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/grep.c, branch v2.50.0</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.50.0</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.50.0'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2025-04-15T15:24:37Z</updated>
<entry>
<title>object-store: merge "object-store-ll.h" and "object-store.h"</title>
<updated>2025-04-15T15:24:37Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2025-04-15T09:38:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=68cd492a3e662c75dec364986c81e94716d4ac56'/>
<id>urn:sha1:68cd492a3e662c75dec364986c81e94716d4ac56</id>
<content type='text'>
The "object-store-ll.h" header has been introduced to keep transitive
header dependendcies and compile times at bay. Now that we have created
a new "object-store.c" file though we can easily move the last remaining
additional bit of "object-store.h", the `odb_path_map`, out of the
header.

Do so. As the "object-store.h" header is now equivalent to its low-level
alternative we drop the latter and inline it into the former.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>hash: stop depending on `the_repository` in `null_oid()`</title>
<updated>2025-03-10T20:16:20Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2025-03-10T07:13:31Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=7d70b29c4f0b2fd3c6698956d9fb4026632d9c6e'/>
<id>urn:sha1:7d70b29c4f0b2fd3c6698956d9fb4026632d9c6e</id>
<content type='text'>
The `null_oid()` function returns the object ID that only consists of
zeroes. Naturally, this ID also depends on the hash algorithm used, as
the number of zeroes is different between SHA1 and SHA256. Consequently,
the function returns the hash-algorithm-specific null object ID.

This is currently done by depending on `the_hash_algo`, which implicitly
makes us depend on `the_repository`. Refactor the function to instead
pass in the hash algorithm for which we want to retrieve the null object
ID. Adapt callsites accordingly by passing in `the_repository`, thus
bubbling up the dependency on that global variable by one layer.

There are a couple of trivial exceptions for subsystems that already got
rid of `the_repository`. These subsystems instead use the repository
that is available via the calling context:

  - "builtin/grep.c"
  - "grep.c"
  - "refs/debug.c"

There are also two non-trivial exceptions:

  - "diff-no-index.c": Here we know that we may not have a repository
    initialized at all, so we cannot rely on `the_repository`. Instead,
    we adapt `diff_no_index()` to get a `struct git_hash_algo` as
    parameter. The only caller is located in "builtin/diff.c", where we
    know to call `repo_set_hash_algo()` in case we're running outside of
    a Git repository. Consequently, it is fine to continue passing
    `the_repository-&gt;hash_algo` even in this case.

  - "builtin/ls-files.c": There is an in-flight patch series that drops
    `USE_THE_REPOSITORY_VARIABLE` in this file, which causes a semantic
    conflict because we use `null_oid()` in `show_submodule()`. The
    value is passed to `repo_submodule_init()`, which may use the object
    ID to resolve a tree-ish in the superproject from which we want to
    read the submodule config. As such, the object ID should refer to an
    object in the superproject, and consequently we need to use its hash
    algorithm.

    This means that we could in theory just not bother about this edge
    case at all and just use `the_repository` in "diff-no-index.c". But
    doing so would feel misdesigned.

Remove the `USE_THE_REPOSITORY_VARIABLE` preprocessor define in
"hash.c".

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>global: mark code units that generate warnings with `-Wsign-compare`</title>
<updated>2024-12-06T11:20:02Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-12-06T10:27:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=41f43b8243f42b9df2e98be8460646d4c0100ad3'/>
<id>urn:sha1:41f43b8243f42b9df2e98be8460646d4c0100ad3</id>
<content type='text'>
Mark code units that generate warnings with `-Wsign-compare`. This
allows for a structured approach to get rid of all such warnings over
time in a way that can be easily measured.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'ps/leakfixes-part-9'</title>
<updated>2024-11-12T23:35:31Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2024-11-12T23:35:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=6890c99e38c378c2fff7747b52bd2102b403357d'/>
<id>urn:sha1:6890c99e38c378c2fff7747b52bd2102b403357d</id>
<content type='text'>
More leakfixes.

* ps/leakfixes-part-9: (22 commits)
  list-objects-filter-options: work around reported leak on error
  builtin/merge: release output buffer after performing merge
  dir: fix leak when parsing "status.showUntrackedFiles"
  t/helper: fix leaking buffer in "dump-untracked-cache"
  t/helper: stop re-initialization of `the_repository`
  sparse-index: correctly free EWAH contents
  dir: release untracked cache data
  combine-diff: fix leaking lost lines
  builtin/tag: fix leaking key ID on failure to sign
  transport-helper: fix leaking import/export marks
  builtin/commit: fix leaking cleanup config
  trailer: fix leaking strbufs when formatting trailers
  trailer: fix leaking trailer values
  builtin/commit: fix leaking change data contents
  upload-pack: fix leaking URI protocols
  pretty: clear signature check
  diff-lib: fix leaking diffopts in `do_diff_cache()`
  revision: fix leaking bloom filters
  builtin/grep: fix leak with `--max-count=0`
  grep: fix leak in `grep_splice_or()`
  ...
</content>
</entry>
<entry>
<title>grep: fix leak in `grep_splice_or()`</title>
<updated>2024-11-05T06:37:52Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-11-05T06:16:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=a6590ccdd431e2ab7b9c521cac674546725a54d2'/>
<id>urn:sha1:a6590ccdd431e2ab7b9c521cac674546725a54d2</id>
<content type='text'>
In `grep_splice_or()` we search for the next `TRUE` node in our tree of
grep expressions and replace it with the given new expression. But we
don't free the old node, which causes a memory leak. Plug it.

This leak is exposed by t7810, but plugging it alone isn't sufficient to
make the test suite pass.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>grep: disable lookahead on error</title>
<updated>2024-10-22T16:45:49Z</updated>
<author>
<name>René Scharfe</name>
<email>l.s.r@web.de</email>
</author>
<published>2024-10-20T11:02:32Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=ce025ae4f61e8e32b2ae6589e43e03e60f713f2d'/>
<id>urn:sha1:ce025ae4f61e8e32b2ae6589e43e03e60f713f2d</id>
<content type='text'>
regexec(3) can fail.  E.g. on macOS it fails if it is used with an UTF-8
locale to match a valid regex against a buffer containing invalid UTF-8
characters.

git grep has two ways to search for matches in a file: Either it splits
its contents into lines and matches them separately, or it matches the
whole content and figures out line boundaries later.  The latter is done
by look_ahead() and it's quicker in the common case where most files
don't contain a match.

Fall back to line-by-line matching if look_ahead() encounters an
regexec(3) error by propagating errors out of patmatch() and bailing out
of look_ahead() if there is one.  This way we at least can find matches
in lines that contain only valid characters.  That matches the behavior
of grep(1) on macOS.

pcre2match() dies if pcre2_jit_match() or pcre2_match() fail, but since
we use the flag PCRE2_MATCH_INVALID_UTF it handles invalid UTF-8
characters gracefully.  So implement the fall-back only for regexec(3)
and leave the PCRE2 matching unchanged.

Reported-by: David Gstir &lt;david@sigma-star.at&gt;
Signed-off-by: René Scharfe &lt;l.s.r@web.de&gt;
Tested-by: David Gstir &lt;david@sigma-star.at&gt;
Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
</content>
</entry>
<entry>
<title>grep: fix leaking grep pattern</title>
<updated>2024-09-27T15:25:36Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-09-26T11:46:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=6d82437a470fc7797e41f5a7ac4874914db7cdb8'/>
<id>urn:sha1:6d82437a470fc7797e41f5a7ac4874914db7cdb8</id>
<content type='text'>
When creating a pattern via `create_grep_pat()` we allocate the pattern
member of the structure regardless of the token type. But later, when we
try to free the structure, we free the pattern member conditionally on
the token type and thus leak memory.

Plug this leak. The leak is exposed by t7814, but plugging it alone does
not make the whole test suite pass.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>grep: prefer UNUSED to MAYBE_UNUSED for pcre allocators</title>
<updated>2024-08-29T20:59:46Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2024-08-29T20:09:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=516a9ec3d5874aad41757e573f7b841bb45cb098'/>
<id>urn:sha1:516a9ec3d5874aad41757e573f7b841bb45cb098</id>
<content type='text'>
We provide custom malloc/free callbacks for the pcre library to use.
Those take an extra "data" parameter, but we don't use it. Back when
these were added in 513f2b0bbd (grep: make PCRE2 aware of custom
allocator, 2019-10-16), we only had MAYBE_UNUSED.

But these days we have UNUSED, which we should prefer, as it will
let the compiler inform us if the code changes to actually use the
parameters.

I also moved the annotations to come after the variable name, which is
how we typically spell it.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>grep: -W: skip trailing empty lines at EOF, too</title>
<updated>2024-07-30T16:59:04Z</updated>
<author>
<name>René Scharfe</name>
<email>l.s.r@web.de</email>
</author>
<published>2024-07-30T14:18:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=8e5dd94e68e37d2d4b67e34db00b9e9790a2325b'/>
<id>urn:sha1:8e5dd94e68e37d2d4b67e34db00b9e9790a2325b</id>
<content type='text'>
4aa2c4753d (grep: -W: don't extend context to trailing empty lines,
2016-05-28) stopped showing empty lines at the end of function context
when using -W.  Do the same for trailing empty lines at the end of
files, for consistency -- it doesn't matter whether a function section
is ended by the next function or the end of the file.

Test it by adding a trailing empty line to the file used by the test
"grep -W" and leave its expected output the same.

Signed-off-by: René Scharfe &lt;l.s.r@web.de&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>grep: improve errors for unmatched ( and )</title>
<updated>2024-03-25T18:40:53Z</updated>
<author>
<name>Ahelenia Ziemiańska</name>
<email>nabijaczleweli@nabijaczleweli.xyz</email>
</author>
<published>2024-03-23T13:18:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=0d527842b7633091158450d4dcda3ceb3547a636'/>
<id>urn:sha1:0d527842b7633091158450d4dcda3ceb3547a636</id>
<content type='text'>
Imagine you want to grep for (. Easy:

  $ git grep '('
  fatal: unmatched parenthesis

uhoh. This is plainly wrong. Unless you know specifically that

 (a) git grep has expression groups and '(' ... ')' are used for them.
 (b) you can use -e '(' to explicitly say '(' is what you are looking
     for, not the beginning of a group.

Similarly,

  $ git grep ')'
  fatal: incomplete pattern expression: )

is somehow worse. ")" is a complete regular expression pattern.
Of course, the error wants to say "group" here.
In this case it is also not "incomplete", it is unmatched.

Make them say

  $ ./git grep '('
  fatal: unmatched ( for expression group
  $ ./git grep ')'
  fatal: incomplete pattern expression group: )

which are clearer in indicating that it is not the expression that
is wrong (since no pattern had been parsed at all), but rather that
it is been misconstrued as a grouping operator.

Link: https://bugs.debian.org/1051205
Signed-off-by: Ahelenia Ziemiańska &lt;nabijaczleweli@nabijaczleweli.xyz&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
