<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/wildmatch.c, branch jch</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=jch</id>
<link rel='self' href='https://git.shady.money/git/atom?h=jch'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2025-03-29T00:38:11Z</updated>
<entry>
<title>wildmatch: avoid using of the comma operator</title>
<updated>2025-03-29T00:38:11Z</updated>
<author>
<name>Johannes Schindelin</name>
<email>johannes.schindelin@gmx.de</email>
</author>
<published>2025-03-27T11:53:00Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=752fe9dc929afe1944e44b852f1248df4fb82986'/>
<id>urn:sha1:752fe9dc929afe1944e44b852f1248df4fb82986</id>
<content type='text'>
The comma operator is a somewhat obscure C feature that is often used by
mistake and can even cause unintentional code flow. That is why the
`-Wcomma` option of clang was introduced: To identify unintentional uses
of the comma operator.

In this instance, the usage is intentional because it allows storing the
value of the current character as `prev_ch` before making the next
character the current one, all of which happens in the loop condition
that lets the loop stop at a closing bracket.

However, it is hard to read.

The chosen alternative to using the comma operator is to move those
assignments from the condition into the loop body. In this particular
case that requires special care because the loop body contains a
`continue` for the case where a character class is found that starts
with `[:` but does not end in `:]`. The assignments should occur even
when that code path is taken, so the `continue` needs to be turned
into a `goto`.
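
As a rough illustration of that transformation (a minimal sketch, not
the actual dowild() code; the function count_repeats() and the '-' skip
rule are hypothetical), the stepping assignments sit at the end of the
loop body behind a label, so the early-exit path still executes them:

```c
/*
 * Illustrative only, not git's dowild(): count characters that repeat
 * their predecessor inside a bracket expression body such as "aab]".
 * Returns -1 if the closing ']' is missing.
 */
static int count_repeats(const char *p)
{
	char prev_ch = '\0';
	char ch = *p;
	int repeats = 0;

	do {
		if (ch == '\0')
			return -1;	/* unterminated expression */
		if (ch == '-')
			goto next;	/* was `continue`; must still step below */
		if (ch == prev_ch)
			repeats++;
next:
		/* formerly: while (prev_ch = ch, (ch = *++p) != ']') */
		prev_ch = ch;
		ch = *++p;
	} while (ch != ']');
	return repeats;
}
```

In a do-while loop a plain `continue` jumps straight to the condition,
which would skip the two assignments and never advance `p`; the label
keeps them on every path.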

Helped-by: Phillip Wood &lt;phillip.wood@dunelm.org.uk&gt;
Signed-off-by: Johannes Schindelin &lt;johannes.schindelin@gmx.de&gt;
Acked-by: Phillip Wood &lt;phillip.wood@dunelm.org.uk&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'pw/wildmatch-fixes'</title>
<updated>2023-04-04T21:28:27Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2023-04-04T21:28:27Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=f834089925e39cdf786f07757308e14b57973542'/>
<id>urn:sha1:f834089925e39cdf786f07757308e14b57973542</id>
<content type='text'>
The wildmatch library code unlearns exponential behaviour it
acquired some time ago since it was borrowed from rsync.

* pw/wildmatch-fixes:
  t3070: make chain lint tester happy
  wildmatch: hide internal return values
  wildmatch: avoid undefined behavior
  wildmatch: fix exponential behavior
</content>
</entry>
<entry>
<title>wildmatch: hide internal return values</title>
<updated>2023-03-20T17:58:53Z</updated>
<author>
<name>Phillip Wood</name>
<email>phillip.wood@dunelm.org.uk</email>
</author>
<published>2023-03-20T16:10:02Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=91b81b64e332da185d3ac8679a977c665c80914e'/>
<id>urn:sha1:91b81b64e332da185d3ac8679a977c665c80914e</id>
<content type='text'>
WM_ABORT_ALL and WM_ABORT_TO_STARSTAR are used internally to limit
backtracking when a match fails. They are not of interest to the caller
and so should not be public.

Suggested-by: Derrick Stolee &lt;derrickstolee@github.com&gt;
Signed-off-by: Phillip Wood &lt;phillip.wood@dunelm.org.uk&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>wildmatch: avoid undefined behavior</title>
<updated>2023-03-20T17:58:53Z</updated>
<author>
<name>Phillip Wood</name>
<email>phillip.wood@dunelm.org.uk</email>
</author>
<published>2023-03-20T16:10:01Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=81b26f8f2891f1a63d5dbf7c2d4209b8325062b6'/>
<id>urn:sha1:81b26f8f2891f1a63d5dbf7c2d4209b8325062b6</id>
<content type='text'>
The code changed in this commit is designed to check if the pattern
starts with "**/" or contains "/**/" (see 3a078dec33 (wildmatch: fix
"**" special case, 2013-01-01)). Unfortunately, when the pattern begins
with "**/", `prev_p = p - 2` is evaluated while `p` points to the second
"*", so the subtraction is undefined according to section 6.5.6 of
the C standard because the result does not point within the same object
as `p`. Fix this by avoiding the subtraction unless it is well defined.
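
One way to express such a guard is sketched below. This is an
assumption-laden illustration rather than the code in this commit: the
helper name starstar_at_dir_boundary() is hypothetical, and it takes a
pointer to the first "*" instead of the second. The point is only that
the start-of-pattern case is tested by comparing pointers, so no pointer
before the beginning of the string is ever formed:

```c
/*
 * Hypothetical helper: `star` points at the first '*' of a "**" run
 * inside `pattern`. Report whether the run sits at the very start of
 * the pattern or directly after a '/'.
 */
static int starstar_at_dir_boundary(const char *pattern, const char *star)
{
	if (star == pattern)
		return 1;	/* nothing precedes it; never form star - 1 */
	return star[-1] == '/';	/* safe: star - 1 is inside the pattern */
}
```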

Signed-off-by: Phillip Wood &lt;phillip.wood@dunelm.org.uk&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>wildmatch: fix exponential behavior</title>
<updated>2023-03-20T17:58:53Z</updated>
<author>
<name>Phillip Wood</name>
<email>phillip.wood@dunelm.org.uk</email>
</author>
<published>2023-03-20T16:10:00Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=1f2e05f0b794d9e4b1cf07d63c9efd1325893ecc'/>
<id>urn:sha1:1f2e05f0b794d9e4b1cf07d63c9efd1325893ecc</id>
<content type='text'>
When dowild() cannot match a '*' or '/**/' wildcard then it must return
WM_ABORT_TO_STARSTAR or WM_ABORT_ALL respectively. Failure to observe
this results in unnecessary backtracking and the time taken for a failed
match increases exponentially with the number of wildcards in the
pattern [1]. Unfortunately, in some instances dowild() returns WM_NOMATCH
for a failed match, resulting in long match times for patterns containing
multiple wildcards, as can be seen in the following benchmark.
(Note that the timings in Benchmark 1 really measure the time to
execute test-tool rather than the time to match the pattern.)

Benchmark 1: t/helper/test-tool wildmatch wildmatch aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab "*a"
  Time (mean ± σ):      22.8 ms ±   1.7 ms    [User: 12.1 ms, System: 10.6 ms]
  Range (min … max):    19.4 ms …  26.9 ms    113 runs

  Warning: Ignoring non-zero exit code.

Benchmark 2: t/helper/test-tool wildmatch wildmatch aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab "*a*a*a*a*a*a*a*a*a"
  Time (mean ± σ):      5.244 s ±  0.228 s    [User: 5.229 s, System: 0.010 s]
  Range (min … max):    4.969 s …  5.707 s    10 runs

  Warning: Ignoring non-zero exit code.

Summary
  't/helper/test-tool wildmatch wildmatch aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab "*a"' ran
  230.37 ± 20.04 times faster than 't/helper/test-tool wildmatch wildmatch aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab "*a*a*a*a*a*a*a*a*a"'

The security implications are limited as it only affects operations that
are potentially DoS vectors. For example by creating a blob containing
such a pattern a malicious user can exploit this behavior to use large
amounts of CPU time on a remote server by pushing the blob and then
creating a new clone with --filter=sparse:oid. However this filter type
is usually disabled as it is known to consume large amounts of CPU time
even without this bug.

The WM_MATCH changed in the first hunk of this patch comes from the
original implementation imported from rsync in 5230f605e1 (Import
wildmatch from rsync, 2012-10-15). Compared to the others converted here
it is fairly harmless as it only triggers at the end of the pattern and
so will only cause a single unnecessary backtrack. The others introduced
by 6f1a31f0aa (wildmatch: advance faster in &lt;asterisk&gt; + &lt;literal&gt;
patterns, 2013-01-01) and 46983441ae (wildmatch: make a special case for
"*/" with FNM_PATHNAME, 2013-01-01) are more pernicious and will cause
exponential behavior.
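
The role of the abort code can be sketched with a much simplified
matcher. Everything here is an assumption made for illustration: the
names sketch_match(), sketch_glob() and the SK_* codes are hypothetical,
only literals and '*' are supported, and there is no pathname mode, so
only an analogue of WM_ABORT_ALL appears. Once a '*' has run out of
text, no earlier '*' can succeed by consuming more, so the failure is
propagated instead of being retried at every level:

```c
enum { SK_MATCH = 0, SK_NOMATCH = 1, SK_ABORT_ALL = -1 };

/* Simplified sketch: literals and '*' only, no character classes. */
static int sketch_match(const char *p, const char *t)
{
	while (*p != '\0') {
		if (*p == '*') {
			while (*p == '*')
				p++;
			if (*p == '\0')
				return SK_MATCH;	/* trailing '*' takes the rest */
			while (*t != '\0') {
				int r = sketch_match(p, t);
				if (r != SK_NOMATCH)
					return r;	/* match or abort: stop retrying */
				t++;
			}
			return SK_ABORT_ALL;	/* text exhausted under this '*' */
		}
		if (*t == '\0')
			return SK_ABORT_ALL;	/* text ended before the pattern */
		if (*p != *t)
			return SK_NOMATCH;
		p++;
		t++;
	}
	return *t == '\0' ? SK_MATCH : SK_NOMATCH;
}

/* Public wrapper: callers only see match (1) or no match (0). */
static int sketch_glob(const char *pattern, const char *text)
{
	return sketch_match(pattern, text) == SK_MATCH;
}
```

Returning SK_NOMATCH in place of either SK_ABORT_ALL would still give
correct answers, but every enclosing '*' would then retry from each
later position, which is the backtracking described above.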

A new test is added to protect against future regressions.

[1] https://research.swtch.com/glob

Helped-by: Derrick Stolee &lt;derrickstolee@github.com&gt;
Signed-off-by: Phillip Wood &lt;phillip.wood@dunelm.org.uk&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>hex.h: move some hex-related declarations from cache.h</title>
<updated>2023-02-24T01:25:28Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2023-02-24T00:09:26Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=b73ecb48114926d063d7ab96943bafcc0ae913b6'/>
<id>urn:sha1:b73ecb48114926d063d7ab96943bafcc0ae913b6</id>
<content type='text'>
hex.c contains code for hex-related functions, but for some reason these
functions were declared in the catch-all cache.h.  Move the function
declarations into a hex.h header instead.

This also allows us to remove includes of cache.h from a few C files.
For now, we make cache.h include hex.h, so that it is easier to review
the direct changes being made by this patch.  In the next patch, we will
remove that, and add the necessary direct '#include "hex.h"' in the
hundreds of C files that need it.

Note that reviewing the header changes in this commit might be
simplified via
    git log --no-walk -p --color-moved $COMMIT -- '*.h'
In particular, it highlights the simple movement of code in .h files
rather nicely.

Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>tree-wide: apply equals-null.cocci</title>
<updated>2022-05-02T16:50:37Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2022-05-02T16:50:37Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=afe8a9070bc62db9cfde1e30147178c40d391d93'/>
<id>urn:sha1:afe8a9070bc62db9cfde1e30147178c40d391d93</id>
<content type='text'>
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>wildmatch: change behavior of "foo**bar" in WM_PATHNAME mode</title>
<updated>2018-10-29T04:19:22Z</updated>
<author>
<name>Nguyễn Thái Ngọc Duy</name>
<email>pclouds@gmail.com</email>
</author>
<published>2018-10-27T08:48:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=e5bbe09e88545cd1a3bcf2b157f020f92e0b5def'/>
<id>urn:sha1:e5bbe09e88545cd1a3bcf2b157f020f92e0b5def</id>
<content type='text'>
In WM_PATHNAME mode (or FNM_PATHNAME), '*' does not match '/', and '**'
can, but only in three patterns:

- '**/' matches zero or more leading directories
- '/**/' matches zero or more directories in between
- '/**' matches zero or more trailing directories/files

When '**' is present but not in one of these patterns, the current
behavior is to consider the pattern invalid and stop matching. In other
words, 'foo**bar' never matches anything, whatever you throw at it.

This behavior is arguably a bit confusing partly because we can't
really tell the user their pattern is invalid so that they can fix
it. So instead, tolerate it and make '**' act like two regular '*'s
(which is essentially the same as a single asterisk). This behavior
seems more predictable.
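
The distinction between the two cases can be sketched as a small
classifier. This is a hypothetical helper, not the wildmatch code: the
name starstar_is_special() is made up, and the rule it encodes (a "**"
run is special when it is bounded by the pattern start or '/' on the
left and by the pattern end or '/' on the right) is an assumed
generalization of the three patterns listed above. Escaped slashes are
ignored.

```c
/*
 * Hypothetical helper: `star` points at the first '*' of a "**" run in
 * `pattern`. Returns 1 if the run may match across '/', or 0 if it
 * should behave like a regular '*'.
 */
static int starstar_is_special(const char *pattern, const char *star)
{
	const char *after = star;
	int boundary_before, boundary_after;

	while (*after == '*')
		after++;	/* step past the whole run of stars */
	/* the == test runs first, so star[-1] is never read at the start */
	boundary_before = (star == pattern) || (star[-1] == '/');
	boundary_after = (*after == '\0') || (*after == '/');
	return boundary_before ? boundary_after : 0;
}
```

Under this sketch, the "**" in 'foo**bar' is classified as a regular
wildcard, while the ones in '**/', '/**/' and '/**' are special.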

Noticed-by: dana &lt;dana@dana.is&gt;
Signed-off-by: Nguyễn Thái Ngọc Duy &lt;pclouds@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>wildmatch: remove unused wildopts parameter</title>
<updated>2017-06-24T01:27:07Z</updated>
<author>
<name>Ævar Arnfjörð Bjarmason</name>
<email>avarab@gmail.com</email>
</author>
<published>2017-06-22T21:38:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=55d3426929d4d8c3dec402cabe6fb1bf27d6abad'/>
<id>urn:sha1:55d3426929d4d8c3dec402cabe6fb1bf27d6abad</id>
<content type='text'>
Remove the unused wildopts placeholder struct from being passed to all
wildmatch() invocations, or rather remove all the boilerplate NULL
parameters.

This parameter was added back in commit 9b3497cab9 ("wildmatch: rename
constants and update prototype", 2013-01-01) as a placeholder for
future use. Over 4 years later nothing has made use of it, so let's
just remove it. It can be added in the future if we find some reason to
start using such a parameter.

Signed-off-by: Ævar Arnfjörð Bjarmason &lt;avarab@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>typofix: assorted typofixes in comments, documentation and messages</title>
<updated>2016-05-06T20:16:37Z</updated>
<author>
<name>Li Peng</name>
<email>lip@dtdream.com</email>
</author>
<published>2016-05-06T12:36:46Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=832c0e5e63a0f61c3788847d4a7abb82d9e86ef4'/>
<id>urn:sha1:832c0e5e63a0f61c3788847d4a7abb82d9e86ef4</id>
<content type='text'>
Many instances of duplicate words (e.g. "the the path") and
a few typos are fixed, originally in multiple patches.

    wildmatch: fix duplicate words of "the"
    t: fix duplicate words of "output"
    transport-helper: fix duplicate words of "read"
    Git.pm: fix duplicate words of "return"
    path: fix duplicate words of "look"
    pack-protocol.txt: fix duplicate words of "the"
    precompose-utf8: fix typo of "sequences"
    split-index: fix typo
    worktree.c: fix typo
    remote-ext: fix typo
    utf8: fix duplicate words of "the"
    git-cvsserver: fix duplicate words

Signed-off-by: Li Peng &lt;lip@dtdream.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
