<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/diffcore-pickaxe.c, branch v2.18.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.18.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.18.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2018-05-21T04:58:32Z</updated>
<entry>
<title>regex: do not call `regfree()` if compilation fails</title>
<updated>2018-05-21T04:58:32Z</updated>
<author>
<name>Martin Ågren</name>
<email>martin.agren@gmail.com</email>
</author>
<published>2018-05-20T10:50:32Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=17154b1576f8be5aa3044f3adadb5473da618a42'/>
<id>urn:sha1:17154b1576f8be5aa3044f3adadb5473da618a42</id>
<content type='text'>
It is apparently undefined behavior to call `regfree()` on a regex where
`regcomp()` failed. The language in [1] is a bit muddy, at least to me,
but the clearest hint is this (`preg` is the `regex_t *`):

    Upon successful completion, the regcomp() function shall return 0.
    Otherwise, it shall return an integer value indicating an error as
    described in &lt;regex.h&gt;, and the content of preg is undefined.

Funnily enough, there is also the `regerror()` function which should be
given a pointer to such a "failed" `regex_t` -- the content of which
would supposedly be undefined -- and which may investigate it to come up
with a detailed error message.

In any case, the example in that document shows how `regfree()` is not
called after `regcomp()` fails.

We have quite a few users of this API and most get this right. These
three users do not.

Several implementations can handle this just fine [2] and these code paths
supposedly have not wreaked havoc or we'd have heard about it. (These
are all in code paths where git got bad input and is just about to die
anyway.) But let's just avoid the issue altogether.

[1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/regcomp.html

[2] https://www.redhat.com/archives/libvir-list/2013-September/msg00262.html

Researched-by: Eric Sunshine &lt;sunshine@sunshineco.com&gt;
Signed-off-byi Martin Ågren &lt;martin.agren@gmail.com&gt;

Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>diff: properly error out when combining multiple pickaxe options</title>
<updated>2018-01-04T23:02:40Z</updated>
<author>
<name>Stefan Beller</name>
<email>sbeller@google.com</email>
</author>
<published>2018-01-04T22:50:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=5e505257f2651647c072f9c61fdc5dd52bbce8b2'/>
<id>urn:sha1:5e505257f2651647c072f9c61fdc5dd52bbce8b2</id>
<content type='text'>
In f506b8e8b5 (git log/diff: add -G&lt;regexp&gt; that greps in the patch text,
2010-08-23) we were hesitant to check if the user requests both -S and
-G at the same time. Now that the pickaxe family also offers --find-object,
which looks slightly more different than the former two, let's add a check
that those are not used at the same time.

Signed-off-by: Stefan Beller &lt;sbeller@google.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>diffcore: add a pickaxe option to find a specific blob</title>
<updated>2018-01-04T23:02:40Z</updated>
<author>
<name>Stefan Beller</name>
<email>sbeller@google.com</email>
</author>
<published>2018-01-04T22:50:42Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=15af58c1adba431c216e2a45fa0d22944560ba02'/>
<id>urn:sha1:15af58c1adba431c216e2a45fa0d22944560ba02</id>
<content type='text'>
Sometimes users are given a hash of an object and they want to
identify it further (ex.: Use verify-pack to find the largest blobs,
but what are these? or [1])

One might be tempted to extend git-describe to also work with blobs,
such that `git describe &lt;blob-id&gt;` gives a description as
'&lt;commit-ish&gt;:&lt;path&gt;'.  This was implemented at [2]; as seen by the sheer
number of responses (&gt;110), it turns out this is tricky to get right.
The hard part to get right is picking the correct 'commit-ish' as that
could be the commit that (re-)introduced the blob or the blob that
removed the blob; the blob could exist in different branches.

Junio hinted at a different approach of solving this problem, which this
patch implements. Teach the diff machinery another flag for restricting
the information to what is shown. For example:

    $ ./git log --oneline --find-object=v2.0.0:Makefile
    b2feb64309 Revert the whole "ask curl-config" topic for now
    47fbfded53 i18n: only extract comments marked with "TRANSLATORS:"

we observe that the Makefile as shipped with 2.0 was appeared in
v1.9.2-471-g47fbfded53 and in v2.0.0-rc1-5-gb2feb6430b.  The
reason why these commits both occur prior to v2.0.0 are evil
merges that are not found using this new mechanism.

[1] https://stackoverflow.com/questions/223678/which-commit-has-this-blob
[2] https://public-inbox.org/git/20171028004419.10139-1-sbeller@google.com/

Signed-off-by: Stefan Beller &lt;sbeller@google.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>diff: migrate diff_flags.pickaxe_ignore_case to a pickaxe_opts bit</title>
<updated>2018-01-04T23:02:40Z</updated>
<author>
<name>Stefan Beller</name>
<email>sbeller@google.com</email>
</author>
<published>2018-01-04T22:50:40Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=c1ddc4610c553b06591aac74b610b56448cbb976'/>
<id>urn:sha1:c1ddc4610c553b06591aac74b610b56448cbb976</id>
<content type='text'>
Currently flags for pickaxing are found in different places. Unify the
flags into the `pickaxe_opts` field, which will contain any pickaxe related
flags.

Signed-off-by: Stefan Beller &lt;sbeller@google.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>diff: make struct diff_flags members lowercase</title>
<updated>2017-11-01T02:51:40Z</updated>
<author>
<name>Brandon Williams</name>
<email>bmwill@google.com</email>
</author>
<published>2017-10-31T18:19:11Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=0d1e0e7801bb2ae22036ad09f9f5cec17e08c48b'/>
<id>urn:sha1:0d1e0e7801bb2ae22036ad09f9f5cec17e08c48b</id>
<content type='text'>
Now that the flags stored in struct diff_flags are being accessed
directly and not through macros, change all struct members from being
uppercase to lowercase.
This conversion is done using the following semantic patch:

	@@
	expression E;
	@@
	- E.RECURSIVE
	+ E.recursive

	@@
	expression E;
	@@
	- E.TREE_IN_RECURSIVE
	+ E.tree_in_recursive

	@@
	expression E;
	@@
	- E.BINARY
	+ E.binary

	@@
	expression E;
	@@
	- E.TEXT
	+ E.text

	@@
	expression E;
	@@
	- E.FULL_INDEX
	+ E.full_index

	@@
	expression E;
	@@
	- E.SILENT_ON_REMOVE
	+ E.silent_on_remove

	@@
	expression E;
	@@
	- E.FIND_COPIES_HARDER
	+ E.find_copies_harder

	@@
	expression E;
	@@
	- E.FOLLOW_RENAMES
	+ E.follow_renames

	@@
	expression E;
	@@
	- E.RENAME_EMPTY
	+ E.rename_empty

	@@
	expression E;
	@@
	- E.HAS_CHANGES
	+ E.has_changes

	@@
	expression E;
	@@
	- E.QUICK
	+ E.quick

	@@
	expression E;
	@@
	- E.NO_INDEX
	+ E.no_index

	@@
	expression E;
	@@
	- E.ALLOW_EXTERNAL
	+ E.allow_external

	@@
	expression E;
	@@
	- E.EXIT_WITH_STATUS
	+ E.exit_with_status

	@@
	expression E;
	@@
	- E.REVERSE_DIFF
	+ E.reverse_diff

	@@
	expression E;
	@@
	- E.CHECK_FAILED
	+ E.check_failed

	@@
	expression E;
	@@
	- E.RELATIVE_NAME
	+ E.relative_name

	@@
	expression E;
	@@
	- E.IGNORE_SUBMODULES
	+ E.ignore_submodules

	@@
	expression E;
	@@
	- E.DIRSTAT_CUMULATIVE
	+ E.dirstat_cumulative

	@@
	expression E;
	@@
	- E.DIRSTAT_BY_FILE
	+ E.dirstat_by_file

	@@
	expression E;
	@@
	- E.ALLOW_TEXTCONV
	+ E.allow_textconv

	@@
	expression E;
	@@
	- E.TEXTCONV_SET_VIA_CMDLINE
	+ E.textconv_set_via_cmdline

	@@
	expression E;
	@@
	- E.DIFF_FROM_CONTENTS
	+ E.diff_from_contents

	@@
	expression E;
	@@
	- E.DIRTY_SUBMODULES
	+ E.dirty_submodules

	@@
	expression E;
	@@
	- E.IGNORE_UNTRACKED_IN_SUBMODULES
	+ E.ignore_untracked_in_submodules

	@@
	expression E;
	@@
	- E.IGNORE_DIRTY_SUBMODULES
	+ E.ignore_dirty_submodules

	@@
	expression E;
	@@
	- E.OVERRIDE_SUBMODULE_CONFIG
	+ E.override_submodule_config

	@@
	expression E;
	@@
	- E.DIRSTAT_BY_LINE
	+ E.dirstat_by_line

	@@
	expression E;
	@@
	- E.FUNCCONTEXT
	+ E.funccontext

	@@
	expression E;
	@@
	- E.PICKAXE_IGNORE_CASE
	+ E.pickaxe_ignore_case

	@@
	expression E;
	@@
	- E.DEFAULT_FOLLOW_RENAMES
	+ E.default_follow_renames

Signed-off-by: Brandon Williams &lt;bmwill@google.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>diff: remove DIFF_OPT_TST macro</title>
<updated>2017-11-01T02:50:03Z</updated>
<author>
<name>Brandon Williams</name>
<email>bmwill@google.com</email>
</author>
<published>2017-10-31T18:19:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=3b69daed861daec1923c369d59c97e46eb3c3d7b'/>
<id>urn:sha1:3b69daed861daec1923c369d59c97e46eb3c3d7b</id>
<content type='text'>
Remove the `DIFF_OPT_TST` macro and instead access the flags directly.
This conversion is done using the following semantic patch:

	@@
	expression E;
	identifier fld;
	@@
	- DIFF_OPT_TST(&amp;E, fld)
	+ E.flags.fld

	@@
	type T;
	T *ptr;
	identifier fld;
	@@
	- DIFF_OPT_TST(ptr, fld)
	+ ptr-&gt;flags.fld

Signed-off-by: Brandon Williams &lt;bmwill@google.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'js/regexec-buf'</title>
<updated>2017-03-24T20:07:35Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2017-03-24T20:07:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=0efeb5ca12f070ca3a8cec48761af863e9a7f6fe'/>
<id>urn:sha1:0efeb5ca12f070ca3a8cec48761af863e9a7f6fe</id>
<content type='text'>
Fix for potential segv introduced in v2.11.0 and later (also
v2.10.2).

* js/regexec-buf:
  pickaxe: fix segfault with '-S&lt;...&gt; --pickaxe-regex'
</content>
</entry>
<entry>
<title>pickaxe: fix segfault with '-S&lt;...&gt; --pickaxe-regex'</title>
<updated>2017-03-18T19:22:33Z</updated>
<author>
<name>SZEDER Gábor</name>
<email>szeder.dev@gmail.com</email>
</author>
<published>2017-03-18T18:24:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=f53c5de29cec68e3294a008052251631eaffcf07'/>
<id>urn:sha1:f53c5de29cec68e3294a008052251631eaffcf07</id>
<content type='text'>
'git {log,diff,...} -S&lt;...&gt; --pickaxe-regex' can segfault as a result
of out-of-bounds memory reads.

diffcore-pickaxe.c:contains() looks for all matches of the given regex
in a buffer in a loop, advancing the buffer pointer to the end of the
last match in each iteration.  When we switched to REG_STARTEND in
b7d36ffca (regex: use regexec_buf(), 2016-09-21), we started passing
the size of that buffer to the regexp engine, too.  Unfortunately,
this buffer size is never updated on subsequent iterations, and as the
buffer pointer advances on each iteration, this "bufptr+bufsize"
points past the end of the buffer.  This results in segmentation
fault, if that memory can't be accessed.  In case of 'git log' it can
also result in erroneously listed commits, if the memory past the end
of buffer is accessible and happens to contain data matching the
regex.

Reduce the buffer size on each iteration as the buffer pointer is
advanced, thus maintaining the correct end of buffer location.
Furthermore, make sure that the buffer pointer is not dereferenced in
the control flow statements when we already reached the end of the
buffer.

The new test is flaky, I've never seen it fail on my Linux box even
without the fix, but this is expected according to db5dfa3 (regex:
-G&lt;pattern&gt; feeds a non NUL-terminated string to regexec() and fails,
2016-09-21).  However, it did fail on Travis CI with the first (and
incomplete) version of the fix, and based on that commit message I
would expect the new test without the fix to fail most of the time on
Windows.

Signed-off-by: SZEDER Gábor &lt;szeder.dev@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'js/regexec-buf'</title>
<updated>2016-09-26T23:09:19Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2016-09-26T23:09:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=6a67695268562f67babdb7d5195c8a43cc4015fa'/>
<id>urn:sha1:6a67695268562f67babdb7d5195c8a43cc4015fa</id>
<content type='text'>
Some codepaths in "git diff" used regexec(3) on a buffer that was
mmap(2)ed, which may not have a terminating NUL, leading to a read
beyond the end of the mapped region.  This was fixed by introducing
a regexec_buf() helper that takes a &lt;ptr,len&gt; pair with REG_STARTEND
extension.

* js/regexec-buf:
  regex: use regexec_buf()
  regex: add regexec_buf() that can work on a non NUL-terminated string
  regex: -G&lt;pattern&gt; feeds a non NUL-terminated string to regexec() and fails
</content>
</entry>
<entry>
<title>regex: use regexec_buf()</title>
<updated>2016-09-21T20:56:15Z</updated>
<author>
<name>Johannes Schindelin</name>
<email>johannes.schindelin@gmx.de</email>
</author>
<published>2016-09-21T18:24:14Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=b7d36ffca02c23f545d6e098d78180e6e72dfd8d'/>
<id>urn:sha1:b7d36ffca02c23f545d6e098d78180e6e72dfd8d</id>
<content type='text'>
The new regexec_buf() function operates on buffers with an explicitly
specified length, rather than NUL-terminated strings.

We need to use this function whenever the buffer we want to pass to
regexec(3) may have been mmap(2)ed (and is hence not NUL-terminated).

Note: the original motivation for this patch was to fix a bug where
`git diff -G &lt;regex&gt;` would crash. This patch converts more callers,
though, some of which allocated to construct NUL-terminated strings,
or worse, modified buffers to temporarily insert NULs while calling
regexec(3).  By converting them to use regexec_buf(), the code has
become much cleaner.

Signed-off-by: Johannes Schindelin &lt;johannes.schindelin@gmx.de&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
