<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/grep.c, branch v2.30.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.30.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.30.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2020-11-30T21:55:54Z</updated>
<entry>
<title>grep: copy struct in one fell swoop</title>
<updated>2020-11-30T21:55:54Z</updated>
<author>
<name>Martin Ågren</name>
<email>martin.agren@gmail.com</email>
</author>
<published>2020-11-29T19:52:21Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=6ba9bb76e0279bce9f614cb7f4ee28d8a601e79e'/>
<id>urn:sha1:6ba9bb76e0279bce9f614cb7f4ee28d8a601e79e</id>
<content type='text'>
We have a `struct grep_opt` with our defaults which we then copy into
the caller's struct. Rather than zeroing the target struct and copying
each element one by one, just copy everything at once. This leaves the
code simpler and more maintainable.

We don't have any ownership issues with what we're copying now and can
just greedily copy the whole thing. If and when we do need to handle
such elements (`char *`?), we must and can handle it appropriately. Make
sure to leave a comment to our future selves.

Signed-off-by: Martin Ågren &lt;martin.agren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>grep: use designated initializers for `grep_defaults`</title>
<updated>2020-11-21T22:50:33Z</updated>
<author>
<name>Martin Ågren</name>
<email>martin.agren@gmail.com</email>
</author>
<published>2020-11-21T18:31:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=96313423a75fa8d88b6ecd5a15c21a7fbaf9e9be'/>
<id>urn:sha1:96313423a75fa8d88b6ecd5a15c21a7fbaf9e9be</id>
<content type='text'>
In 15fabd1bbd ("builtin/grep.c: make configuration callback more
reusable", 2012-10-09), we learned to fill a `static struct grep_opt
grep_defaults` which we can use as a blueprint for other such structs.

At the time, we didn't consider designated initializers to be widely
useable, but these days, we do. (See, e.g., cbc0f81d96 ("strbuf: use
designated initializers in STRBUF_INIT", 2017-07-10).)

Use designated initializers to let the compiler set up the struct and so
that we don't need to remember to call `init_grep_defaults()`.

Signed-off-by: Martin Ågren &lt;martin.agren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>grep: don't set up a "default" repo for grep</title>
<updated>2020-11-21T22:50:29Z</updated>
<author>
<name>Martin Ågren</name>
<email>martin.agren@gmail.com</email>
</author>
<published>2020-11-21T18:31:07Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=1d3878799f8260968ea9f6a75a92c4daca1da133'/>
<id>urn:sha1:1d3878799f8260968ea9f6a75a92c4daca1da133</id>
<content type='text'>
`init_grep_defaults()` fills a `static struct grep_opt grep_defaults`.
This struct is then used by `grep_init()` as a blueprint for other such
structs. Notably, `grep_init()` takes a `struct repo *` and assigns it
into the target struct.

As a result, it is unnecessary for us to take a `struct repo *` in
`init_grep_defaults()` as well. We assign it into the default struct and
never look at it again. And in light of how we return early if we have
already set up the default struct, it's not just unnecessary, but is
also a bit confusing: If we are called twice and with different repos,
is it a bug or a feature that we ignore the second repo?

Drop the repo parameter for `init_grep_defaults()`.

Signed-off-by: Martin Ågren &lt;martin.agren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>comment: fix spelling mistakes inside comments</title>
<updated>2020-07-29T18:39:40Z</updated>
<author>
<name>Steve Kemp</name>
<email>steve@steve.org.uk</email>
</author>
<published>2020-07-29T03:33:28Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=84544f2ea3441a5715fc3e2dfbb025083872fac5'/>
<id>urn:sha1:84544f2ea3441a5715fc3e2dfbb025083872fac5</id>
<content type='text'>
This commit fixes a couple of minor spelling mistakes inside
comments.

Signed-off-by: Steve Kemp &lt;steve@steve.org.uk&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>grep: replace grep_read_mutex by internal obj read lock</title>
<updated>2020-01-17T21:52:14Z</updated>
<author>
<name>Matheus Tavares</name>
<email>matheus.bernardino@usp.br</email>
</author>
<published>2020-01-16T02:39:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=1d1729caebd41b340dd8dd61057f613da4df526c'/>
<id>urn:sha1:1d1729caebd41b340dd8dd61057f613da4df526c</id>
<content type='text'>
git-grep uses 'grep_read_mutex' to protect its calls to object reading
operations. But these have their own internal lock now, which ensures a
better performance (allowing parallel access to more regions). So, let's
remove the former and, instead, activate the latter with
enable_obj_read_lock().

Sections that are currently protected by 'grep_read_mutex' but are not
internally protected by the object reading lock should be surrounded by
obj_read_lock() and obj_read_unlock(). These guarantee mutual exclusion
with object reading operations, keeping the current behavior and
avoiding race conditions. Namely, these places are:

  In grep.c:

  - fill_textconv() at fill_textconv_grep().
  - userdiff_get_textconv() at grep_source_1().

  In builtin/grep.c:

  - parse_object_or_die() and the submodule functions at
    grep_submodule().
  - deref_tag() and gitmodules_config_oid() at grep_objects().

If these functions become thread-safe, in the future, we might remove
the locking and probably get some speedup.

Note that some of the submodule functions will already be thread-safe
(or close to being thread-safe) with the internal object reading lock.
However, as some of them will require additional modifications to be
removed from the critical section, this will be done in its own patch.

Signed-off-by: Matheus Tavares &lt;matheus.bernardino@usp.br&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>grep: fix race conditions on userdiff calls</title>
<updated>2020-01-17T21:52:14Z</updated>
<author>
<name>Matheus Tavares</name>
<email>matheus.bernardino@usp.br</email>
</author>
<published>2020-01-16T02:39:49Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=c3a5bb31c1567edd1671b54726d4acbb10563a2c'/>
<id>urn:sha1:c3a5bb31c1567edd1671b54726d4acbb10563a2c</id>
<content type='text'>
git-grep uses an internal grep_read_mutex to protect object reading
operations. Similarly, there's a grep_attr_mutex to protect calls to the
gitattributes machinery. However, two of the three functions protected
by the last mutex may also perform object reading, as seen below:

- userdiff_get_textconv() &gt; notes_cache_init() &gt;
  notes_cache_match_validity() &gt; lookup_commit_reference_gently() &gt;
  parse_object() &gt; repo_has_object_file() &gt;
  repo_has_object_file_with_flags() &gt; oid_object_info_extended()

- userdiff_find_by_path() &gt; git_check_attr() &gt; collect_some_attrs() &gt;
  prepare_attr_stack() &gt; read_attr() &gt; read_attr_from_index() &gt;
  read_blob_data_from_index() &gt; read_object_file()

As these calls are not protected by grep_read_mutex, there might be race
conditions with other threads performing object reading (e.g. threads
calling fill_textconv() at grep.c:fill_textconv_grep()). To prevent
that, let's make sure to acquire the lock before both of these calls.

Note: this patch might slow down the threaded grep in worktree, for the
sake of thread-safeness. However, in the following patches, we should
regain performance by replacing grep_read_mutex for an internal object
reading lock and allowing parallel inflation during object reading.

Signed-off-by: Matheus Tavares &lt;matheus.bernardino@usp.br&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>grep: don't return an expression from pcre2_free()</title>
<updated>2019-11-30T22:06:58Z</updated>
<author>
<name>Hans Jerry Illikainen</name>
<email>hji@dyntopia.com</email>
</author>
<published>2019-11-27T20:24:11Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=867fc7f31010f668e18caa3e0825a36c308d3e3b'/>
<id>urn:sha1:867fc7f31010f668e18caa3e0825a36c308d3e3b</id>
<content type='text'>
Previously, the void pcre2_free() function in grep.c returned free().
While free() itself is void, afaict it's still an expression as per
section A.2.3, subsection 6.8.6 (jump-statement) in both C99 [1] and C11
[2]:

&gt; return expression

Section 6.8.6.4 in C99 [1] and C11 [2] says that:

&gt; A return statement with an expression shall not appear in a function
&gt; whose return type is void.

The consequence of the old behavior was that developer builds with
pedantic errors enabled broke Git if PCRE2 was enabled and a
smart-enough compiler to detect these errors was used.  This commit
fixes pedantic builds of Git that enables --with-libpcre.

[1] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf
[2] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf

Signed-off-by: Hans Jerry Illikainen &lt;hji@dyntopia.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'cb/pcre2-chartables-leakfix'</title>
<updated>2019-10-23T05:43:11Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2019-10-23T05:43:11Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=e0ff2d4c7ec338e30ea5e0340cda7f5fe8a187dc'/>
<id>urn:sha1:e0ff2d4c7ec338e30ea5e0340cda7f5fe8a187dc</id>
<content type='text'>
Leakfix.

* cb/pcre2-chartables-leakfix:
  grep: avoid leak of chartables in PCRE2
  grep: make PCRE2 aware of custom allocator
  grep: make PCRE1 aware of custom allocator
</content>
</entry>
<entry>
<title>grep: avoid leak of chartables in PCRE2</title>
<updated>2019-10-18T01:33:18Z</updated>
<author>
<name>Carlo Marcelo Arenas Belón</name>
<email>carenas@gmail.com</email>
</author>
<published>2019-10-16T12:10:24Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=10da030ab757c38a01507bf18e2484698000b791'/>
<id>urn:sha1:10da030ab757c38a01507bf18e2484698000b791</id>
<content type='text'>
94da9193a6 ("grep: add support for PCRE v2", 2017-06-01) introduced
a small memory leak visible with valgrind in t7813.

Complete the creation of a PCRE2 specific variable that was missing from
the original change and free the generated table just like it is done
for PCRE1.

Signed-off-by: Carlo Marcelo Arenas Belón &lt;carenas@gmail.com&gt;
Signed-off-by: Johannes Schindelin &lt;johannes.schindelin@gmx.de&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>grep: make PCRE2 aware of custom allocator</title>
<updated>2019-10-18T01:33:18Z</updated>
<author>
<name>Carlo Marcelo Arenas Belón</name>
<email>carenas@gmail.com</email>
</author>
<published>2019-10-16T12:10:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=513f2b0bbd47015640723a10210c527839e8946a'/>
<id>urn:sha1:513f2b0bbd47015640723a10210c527839e8946a</id>
<content type='text'>
94da9193a6 (grep: add support for PCRE v2, 2017-06-01) didn't include
a way to override the system allocator, and so it is incompatible with
custom allocators (e.g. nedmalloc). This problem became obvious when we
tried to plug a memory leak by `free()`ing a data structure allocated by
PCRE2, triggering a segfault in Windows (where we use nedmalloc by
default).

PCRE2 requires the use of a general context to override the allocator
and therefore, there is a lot more code needed than in PCRE1, including
a couple of wrapper functions.

Extend the grep API with a "destructor" that could be called to cleanup
any objects that were created and used globally.

Update `builtin/grep.c` to use that new API, but any other future users
should make sure to have matching `grep_init()`/`grep_destroy()` calls
if they are using the pattern matching functionality.

Move some of the logic that was before done per thread (in the workers)
into an earlier phase to avoid degrading performance, but as the use of
PCRE2 with custom allocators is better understood it is expected more of
its functions will be instructed to use the custom allocator as well as
was done in the original code[1] this work was based on.

[1] https://public-inbox.org/git/3397e6797f872aedd18c6d795f4976e1c579514b.1565005867.git.gitgitgadget@gmail.com/

Reported-by: Johannes Schindelin &lt;johannes.schindelin@gmx.de&gt;
Signed-off-by: Carlo Marcelo Arenas Belón &lt;carenas@gmail.com&gt;
Signed-off-by: Johannes Schindelin &lt;johannes.schindelin@gmx.de&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
