<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/grep.c, branch v2.34.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.34.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.34.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2021-11-19T17:10:27Z</updated>
<entry>
<title>Revert "grep/pcre2: fix an edge case concerning ascii patterns and UTF-8 data"</title>
<updated>2021-11-19T17:10:27Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2021-11-19T17:06:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=e7f3925bed86edf1b79fd18e5600252e445019d1'/>
<id>urn:sha1:e7f3925bed86edf1b79fd18e5600252e445019d1</id>
<content type='text'>
This reverts commit ae39ba431ab861548eb60b4bd2e1d8b8813db76f, as it
breaks "grep" when looking for a string in a non-UTF-8 haystack when
linked with certain versions of the PCRE2 library.

Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>grep/pcre2: fix an edge case concerning ascii patterns and UTF-8 data</title>
<updated>2021-10-15T19:45:39Z</updated>
<author>
<name>Hamza Mahfooz</name>
<email>someguy@effective-light.com</email>
</author>
<published>2021-10-15T16:13:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=ae39ba431ab861548eb60b4bd2e1d8b8813db76f'/>
<id>urn:sha1:ae39ba431ab861548eb60b4bd2e1d8b8813db76f</id>
<content type='text'>
If we attempt to grep non-ASCII log message text with an ASCII pattern, we
run into the following issue:

    $ git log --color --author='.var.*Bjar' -1 origin/master | grep ^Author
    grep: (standard input): binary file matches

To fix this, teach the grep code to use PCRE2_UTF as long as the log
output is encoded in UTF-8.

Signed-off-by: Ævar Arnfjörð Bjarmason &lt;avarab@gmail.com&gt;
Signed-off-by: Hamza Mahfooz &lt;someguy@effective-light.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>grep: refactor next_match() and match_one_pattern() for external use</title>
<updated>2021-09-29T20:23:11Z</updated>
<author>
<name>Hamza Mahfooz</name>
<email>someguy@effective-light.com</email>
</author>
<published>2021-09-29T11:57:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=3f566c4e695a6df8237c34b7c1f34f0832b7e575'/>
<id>urn:sha1:3f566c4e695a6df8237c34b7c1f34f0832b7e575</id>
<content type='text'>
These changes are made in preparation for colorization support for the
"git log" options that rely on regex functionality (i.e. "--author",
"--committer" and "--grep"). They are necessary primarily for two
reasons. First, match_one_pattern() expects header lines to be prefixed,
but in pretty the prefixes are stripped from the lines because the
name-email pairs need to go through additional parsing before they can
be printed. Second, next_match() doesn't handle the case of
"ctx == GREP_CONTEXT_HEAD" at all. So, teach next_match() how to handle
the new case and move match_one_pattern()'s core logic to
headerless_match_one_pattern(), while preserving match_one_pattern()'s
uses that depend on the additional processing.

Signed-off-by: Hamza Mahfooz &lt;someguy@effective-light.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>grep: store grep_source buffer as const</title>
<updated>2021-09-22T18:59:50Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2021-09-21T03:51:28Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=1e66871608d1f6f4cd66e899ee33755bbf6deafa'/>
<id>urn:sha1:1e66871608d1f6f4cd66e899ee33755bbf6deafa</id>
<content type='text'>
Our grep_buffer() function takes a non-const buffer, which is confusing:
we don't take ownership of nor write to the buffer.

This mostly comes from the fact that the underlying grep_source struct
in which we store the buffer uses a non-const pointer. The memory pointed
to by the struct is sometimes owned by us (for FILE or OID sources), and
sometimes not (for BUF sources).

Let's store it as const, which lets us err on the side of caution (i.e.,
the compiler will warn us if any of our code writes to or tries to free
it).

As a result, we must annotate the one place where we do free it by
casting away the constness. But that's a small price to pay for the
extra safety and clarity elsewhere (and indeed, it already had a comment
explaining why GREP_SOURCE_BUF _didn't_ free it).

And then we can mark grep_buffer() as taking a const buffer.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>grep: mark "haystack" buffers as const</title>
<updated>2021-09-22T18:59:50Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2021-09-21T03:49:49Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=1a845fbc48f2613c0daab717ee934e066e65723d'/>
<id>urn:sha1:1a845fbc48f2613c0daab717ee934e066e65723d</id>
<content type='text'>
When we're grepping in a buffer, we don't need to modify it. So we can
take "const char *" buffers, rather than "char *". This can avoid some
awkward casts in our callers, and make our expectations more clear (we
will not take ownership of the memory, nor will we ever write to it).

These spots don't all necessarily have to be converted at the same time,
but some of them are dependent on others, because we pass
pointers-to-pointers in a few cases. And none of this should change any
behavior, since we're just adding "const" qualifiers (and likewise, the
compiler will let us know if we missed any spots). So it's relatively
low-risk to just do this all at once.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>grep: stop modifying buffer in grep_source_1()</title>
<updated>2021-09-22T18:59:50Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2021-09-21T03:48:44Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=f84e79ff4bffbcd9de85adc270f5164a6b024d34'/>
<id>urn:sha1:f84e79ff4bffbcd9de85adc270f5164a6b024d34</id>
<content type='text'>
We find the end of each matching line of a buffer, and then temporarily
write a NUL to turn it into a regular C string. But we don't need to do
so, because the only thing we do in the interim is pass the line and its
length (via an "eol" pointer) to match_line(). And that function should
only look at the bytes we passed it, whether it has a terminating NUL or
not.

We can drop this temporary write in order to simplify the code and make
it easier to use const buffers in more of grep.c.
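
As a rough illustration of the idea (a sketch only, not the code in
grep.c; find_eol(), line_contains() and count_matching_lines() are
hypothetical names), a line can be handed to a matcher as a pair of
pointers, so the buffer can stay const and no NUL is ever written:

```c
/* Return a pointer to the newline ending the line that starts at bol,
 * or end if the last line has no trailing newline. Nothing is written. */
const char *find_eol(const char *bol, const char *end)
{
	while (bol != end) {
		if (*bol == '\n')
			break;
		bol++;
	}
	return bol;
}

/* Bounded matcher: looks only at [bol, eol) and never relies on a NUL. */
int line_contains(const char *bol, const char *eol, char c)
{
	for (; bol != eol; bol++)
		if (*bol == c)
			return 1;
	return 0;
}

/* Count the lines in [buf, buf + len) that contain the byte c. */
int count_matching_lines(const char *buf, unsigned long len, char c)
{
	const char *bol = buf, *end = buf + len;
	int hits = 0;

	while (bol != end) {
		const char *eol = find_eol(bol, end);
		hits += line_contains(bol, eol, c);
		bol = (eol == end) ? end : eol + 1;
	}
	return hits;
}
```

Because the matcher is given an explicit end pointer, it works the same
whether or not the byte after the line happens to be a NUL.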

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>grep: stop modifying buffer in show_line()</title>
<updated>2021-09-22T18:59:50Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2021-09-21T03:48:09Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=995e525b1729ada354e443f16e1c0fad59df25a8'/>
<id>urn:sha1:995e525b1729ada354e443f16e1c0fad59df25a8</id>
<content type='text'>
When showing lines via grep (or looking for funcnames), we call
show_line() on a multi-line buffer. It finds the end of line and marks
it with a NUL. However, we don't need to do so, as the resulting line is
only used along with its "eol" marker:

 - we pass both to next_match(), which takes care to look at only the
   bytes we specified

 - we pass the line to output_color() without its matching eol marker.
   However, we do use the "match" struct we got from next_match() to
   tell it how many bytes to look at (which can never exceed the string
   we passed it).

So we can stop setting and restoring this NUL marker. That makes the
code simpler, and will allow us to take a const buffer in a future
patch.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>grep: stop modifying buffer in strip_timestamp</title>
<updated>2021-09-22T18:59:50Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2021-09-21T03:46:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=cc8e26ee8d56dbdb442796a43aa7f30d3684b036'/>
<id>urn:sha1:cc8e26ee8d56dbdb442796a43aa7f30d3684b036</id>
<content type='text'>
When grepping for headers in commit objects, we receive individual
lines (e.g., "author Name &lt;email&gt; 1234 -0000"), and then strip off the
timestamp to do our match. We do so by writing a NUL byte over the
whitespace separator, and then remembering to restore it later.

We had to do it this way when this was added back in a4d7d2c6db (log
--author/--committer: really match only with name part, 2008-09-04),
because we fed the result directly to regexec(), which expects a
NUL-terminated string. But since b7d36ffca0 (regex: use regexec_buf(),
2016-09-21), we have a function which can match part of a buffer.

So instead of modifying the string, we can instead just move the "eol"
pointer, and the rest of the code will do the right thing. This will let
further patches mark more buffers as "const".
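
A minimal sketch of the pointer-moving approach (this is not the actual
strip_timestamp() code; end_of_ident() is a hypothetical name, and it
assumes the header line ends in two space-separated fields, the
timestamp and the timezone):

```c
/* Given a header line in [bol, eol), return the end of the part that
 * precedes the trailing " TIMESTAMP TIMEZONE" fields. The buffer is
 * only read, so it can be const. */
const char *end_of_ident(const char *bol, const char *eol)
{
	const char *p = eol;
	int spaces = 0;

	/* Walk backwards until the second space from the end. */
	while (p != bol) {
		p--;
		if (*p == ' ') {
			spaces++;
			if (spaces == 2)
				return p; /* first byte after the ident */
		}
	}
	return eol; /* fewer than two spaces: leave the range unchanged */
}
```

A caller would then match against [bol, returned pointer) with a
length-aware matcher instead of overwriting the separator with a NUL.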

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>grep: add repository to OID grep sources</title>
<updated>2021-09-08T18:48:05Z</updated>
<author>
<name>Jonathan Tan</name>
<email>jonathantanmy@google.com</email>
</author>
<published>2021-08-16T21:09:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=0693806bf82fb76347e226d8fc5e69077c0a3df5'/>
<id>urn:sha1:0693806bf82fb76347e226d8fc5e69077c0a3df5</id>
<content type='text'>
Record the repository whenever an OID grep source is created, and teach
the worker threads to explicitly provide the repository when accessing
objects.

Signed-off-by: Jonathan Tan &lt;jonathantanmy@google.com&gt;
Reviewed-by: Matheus Tavares &lt;matheus.bernardino@usp.br&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>grep: typesafe versions of grep_source_init</title>
<updated>2021-09-08T18:47:55Z</updated>
<author>
<name>Jonathan Tan</name>
<email>jonathantanmy@google.com</email>
</author>
<published>2021-08-16T21:09:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=50d92b5f03f3c84d581b27cb8fa3a4b8cfbf2567'/>
<id>urn:sha1:50d92b5f03f3c84d581b27cb8fa3a4b8cfbf2567</id>
<content type='text'>
grep_source_init() can create "struct grep_source" objects and,
depending on the value of the type passed, some void-pointer parameters have
different meanings. Because one of these types (GREP_SOURCE_OID) will
require an additional parameter in a subsequent patch, take the
opportunity to increase clarity and type safety by replacing this
function with individual functions for each type.
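
To illustrate the general pattern (a hypothetical sketch; the struct,
enum and function names below are made up and are not the ones used in
grep.c), compare a single initializer taking void pointers with one
constructor per source type, where the compiler checks each argument:

```c
enum src_type { SRC_FILE, SRC_OID };

struct src {
	enum src_type type;
	const char *path;         /* meaningful only for SRC_FILE */
	const unsigned char *oid; /* meaningful only for SRC_OID */
};

/* One constructor per type: passing an object ID where a path is
 * expected is now a compile-time diagnostic, not a silent mix-up. */
struct src src_from_file(const char *path)
{
	struct src s = { SRC_FILE, path, 0 };
	return s;
}

struct src src_from_oid(const unsigned char *oid)
{
	struct src s = { SRC_OID, 0, oid };
	return s;
}
```

With separate functions, a type that later needs an extra parameter can
gain it without changing the signature seen by the other types.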

Signed-off-by: Jonathan Tan &lt;jonathantanmy@google.com&gt;
Reviewed-by: Matheus Tavares &lt;matheus.bernardino@usp.br&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
