<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/t/chainlint, branch master</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=master</id>
<link rel='self' href='https://git.shady.money/git/atom?h=master'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2024-09-10T17:01:40Z</updated>
<entry>
<title>chainlint: make error messages self-explanatory</title>
<updated>2024-09-10T17:01:40Z</updated>
<author>
<name>Eric Sunshine</name>
<email>sunshine@sunshineco.com</email>
</author>
<published>2024-09-10T04:10:12Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=e44f15ba3ee873b5df5e8e5d8cc018df288472ef'/>
<id>urn:sha1:e44f15ba3ee873b5df5e8e5d8cc018df288472ef</id>
<content type='text'>
The annotations emitted by chainlint to indicate detected problems are
overly terse, so much so that developers new to the project -- those who
should most benefit from the linting -- may find them baffling. For
instance, although the author of chainlint and seasoned Git developers
may understand that "?!AMP?!" is an abbreviation of "ampersand" and
indicates a break in the &amp;&amp;-chain, this may not be obvious to newcomers.

The "?!LOOP?!" case is particularly serious because that terse single
word does nothing to convey that the loop body should end with
"|| return 1" (or "|| exit 1" in a subshell) to ensure that a failing
command in the body aborts the loop immediately. Moreover, unlike
&amp;&amp;-chaining which is ubiquitous in Git tests, the "|| return 1" idiom is
relatively infrequent, thus may be harder for a newcomer to discover by
consulting nearby code.

Address these shortcomings by emitting human-readable messages which
both explain the problem and give a strong hint about how to correct it.

Signed-off-by: Eric Sunshine &lt;sunshine@sunshineco.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>chainlint.pl: add tests for test body in heredoc</title>
<updated>2024-07-10T17:14:22Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2024-07-10T08:39:16Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=0c7d630220ee4df65db653300d236ac9f65be0b3'/>
<id>urn:sha1:0c7d630220ee4df65db653300d236ac9f65be0b3</id>
<content type='text'>
The chainlint.pl script recently learned about the upcoming:

  test_expect_success 'some test' - &lt;&lt;\EOT
	TEST_BODY
  EOT

syntax, where TEST_BODY should be checked in the usual way. Let's make
sure this works by adding a few tests. The "here-doc-body" file tests
the basic syntax, including an embedded here-doc which we should still
be able to recognize.

Likewise the "here-doc-body-indent" checks the same thing, but using the
"&lt;&lt;-" operator. We wouldn't expect this to be used normally, but we
would not want to accidentally miss a body that uses it. The
"pathological" variant checks the opposite: we don't get confused by an
indented tag within the here-doc body.

The "here-doc-double" tests the handling of two here-doc tags on the
same line. This is not something we'd expect anybody to do in practice,
but the code was written defensively to handle this, so let's make sure
it works.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>chainlint.pl: check line numbers in expected output</title>
<updated>2024-07-10T17:14:22Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2024-07-10T08:37:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=03763e68fb7fc4b37dd2d184dde501618e6c171d'/>
<id>urn:sha1:03763e68fb7fc4b37dd2d184dde501618e6c171d</id>
<content type='text'>
While working on chainlint.pl recently, we introduced some bugs that
showed incorrect line numbers in the output. But it was hard to notice,
since we sanitize the output by removing all of the line numbers! It
would be nice to retain these so we can catch any regressions.

The main reason we sanitize is for maintainability: we concatenate all
of the test snippets into a single file, so it's hard for each ".expect"
file to know at which offset its test input will be found. We can handle
that by storing the per-test line numbers in the ".expect" files, and
then dynamically offsetting them as we build the concatenated test and
expect files together.

The changes to the ".expect" files look like tedious boilerplate, but it
actually makes adding new tests easier. You can now just run:

  perl chainlint.pl chainlint/foo.test |
  tail -n +2 &gt;chainlint/foo.expect

to save the output of the script minus the comment headers (after
checking that it is correct, of course). Whereas before you had to strip
the line numbers. The conversions here were done mechanically using
something like the script above, and then spot-checked manually.

It would be possible to do all of this in shell via the Makefile, but it
gets a bit complicated (and requires a lot of extra processes). Instead,
I've written a short perl script that generates the concatenated files
(we already depend on perl, since chainlint.pl uses it). Incidentally,
this improves a few other things:

  - we incorrectly used $(CHAINLINTTMP_SQ) inside a double-quoted
    string. So if your test directory required quoting, like:

       make "TEST_OUTPUT_DIRECTORY=/tmp/h'orrible"

    we'd fail the chainlint tests.

  - the shell in the Makefile didn't handle &amp;&amp;-chaining correctly in its
    loops (though in practice the "sed" and "cat" invocations are not
    likely to fail).

  - likewise, the sed invocation to strip numbers was hiding the exit
    code of chainlint.pl itself. In practice this isn't a big deal;
    since there are linter violations in the test files, we expect it to
    exit non-zero. But we could later use exit codes to distinguish
    serious errors from expected ones.

  - we now use a constant number of processes, instead of scaling with
    the number of test scripts. So it should be a little faster (on my
    machine, "make check-chainlint" goes from 133ms to 73ms).

There are some alternatives to this approach, but I think this is still
a good intermediate step:

  1. We could invoke chainlint.pl individually on each test file, and
     compare it to the expected output (and possibly using "make" to
     avoid repeating already-done checks). This is a much bigger change
     (and we'd have to figure out what to do with the "# LINT" lines in
     the inputs). But in this case we'd still want the "expect" files to
     be annotated with line numbers. So most of what's in this patch
     would be needed anyway.

  2. Likewise, we could run a single chainlint.pl and feed it all of the
     scripts (with "--jobs=1" to get deterministic output). But we'd
     still need to annotate the scripts as we did here, and we'd still
     need to either assemble the "expect" file, or break apart the
     script output to compare to each individual ".expect" file.

So we may pursue those in the long run, but this patch gives us more
robust tests without too much extra work or moving in a useless
direction.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>chainlint.pl: add test_expect_success call to test snippets</title>
<updated>2024-07-10T17:14:21Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2024-07-10T08:34:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=a5e450144dbc85b5cd7b0e01c6aa7fd30f117ee2'/>
<id>urn:sha1:a5e450144dbc85b5cd7b0e01c6aa7fd30f117ee2</id>
<content type='text'>
The chainlint tests are a series of individual files, each holding a
test body. The "make check-chainlint" target assembles them into a
single file, adding a "test_expect_success" function call around each.
Let's instead include that function call in the files themselves. This
is a little more boilerplate, but has several advantages:

  1. You can now run chainlint manually on snippets with just "perl
     chainlint.perl chainlint/foo.test". This can make developing and
     debugging a little easier.

  2. Many of the tests implicitly relied on the syntax of the lines
     added by the Makefile (in particular the use of single-quotes).
     This assumption is much easier to see when the single-quotes are
     alongside the test body.

  3. We had no way to test how the chainlint program handled
     various test_expect_success lines themselves. Now we'll be able to
     check variations.

The change to the .test files was done mechanically, using the same
test names they would have been assigned by the Makefile (this is
important to match the expected output). The Makefile has the minimal
change to drop the extra lines; there are more cleanups possible but a
future patch in this series will rewrite this substantially anyway.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>tests: adjust whitespace in chainlint expectations</title>
<updated>2023-12-15T16:36:14Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2023-12-15T06:42:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=647b5e09987ac74e158d66ac2fd0036f2c9c99fe'/>
<id>urn:sha1:647b5e09987ac74e158d66ac2fd0036f2c9c99fe</id>
<content type='text'>
The "check-chainlint" target runs automatically when running tests and
performs self-checks to verify that the chainlinter itself produces the
expected output. Originally, the chainlinter was implemented via sed,
but the infrastructure has been rewritten in fb41727b7e (t: retire
unused chainlint.sed, 2022-09-01) to use a Perl script instead.

The rewrite caused some slight whitespace changes in the output that are
ultimately not of much importance. In order to be able to assert that
the actual chainlinter errors match our expectations we thus have to
ignore whitespace characters when diffing them. As the `-w` flag is not
in POSIX we try to use `git diff -w --no-index` before we fall back to
`diff -w -u`.

To accomodate for cases where the host system has no Git installation we
use the locally-compiled version of Git. This can result in problems
though when the Git project's repository is using extensions that the
locally-compiled version of Git doesn't understand. It will refuse to
run and thus cause the checks to fail.

Instead of improving the detection logic, fix our ".expect" files so
that we do not need any post-processing at all anymore. This allows us
to drop the `-w` flag when diffing so that we can always use diff(1)
now.

Note that we keep some of the post-processing of `chainlint.pl` output
intact to strip leading line numbers generated by the script. Having
these would cause a rippling effect whenever we add a new test that
sorts into the middle of existing tests and would require us to
renumerate all subsequent lines, which seems rather pointless.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Reviewed-by: Eric Sunshine &lt;sunshine@sunshineco.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>tests: diagnose unclosed here-doc in chainlint.pl</title>
<updated>2023-03-30T20:07:29Z</updated>
<author>
<name>Eric Sunshine</name>
<email>sunshine@sunshineco.com</email>
</author>
<published>2023-03-30T19:30:31Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=2b61c8dc8843319d09f1485fbcb3b1dc4aecb36d'/>
<id>urn:sha1:2b61c8dc8843319d09f1485fbcb3b1dc4aecb36d</id>
<content type='text'>
An unclosed here-doc in a test is a problem, because it silently gobbles
up any remaining commands. Since 99a64e4b73c (tests: lint for run-away
here-doc, 2017-03-22) we detect this by piggy-backing on the internal
chainlint checker in test-lib.sh.

However, it would be nice to detect it in chainlint.pl, for a few
reasons:

  - the output from chainlint.pl is much nicer; it can show the exact
    spot of the error, rather than a vague "somewhere in this test you
    broke the &amp;&amp;-chain or had a bad here-doc" message.

  - the implementation in test-lib.sh runs for each test snippet. And
    since it requires a subshell, the extra cost is small but not zero.
    If chainlint.pl can reliably find the problem, we can optimize the
    test-lib.sh code.

The chainlint.pl code never intended to find here-doc problems. But
since it has to parse them anyway (to avoid reporting problems inside
here-docs), most of what we need is already there. We can detect the
problem when we fail to find the missing end-tag in swallow_heredocs().
The extra change in scan_heredoc_tag() stores the location of the start
of the here-doc, which lets us mark it as the source of the error in the
output (see the new tests for examples).

[jk: added commit message and tests]

Signed-off-by: Eric Sunshine &lt;sunshine@sunshineco.com&gt;
Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>chainlint: annotate original test definition rather than token stream</title>
<updated>2022-11-08T20:10:49Z</updated>
<author>
<name>Eric Sunshine</name>
<email>sunshine@sunshineco.com</email>
</author>
<published>2022-11-08T19:08:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=73c768dae9ea4838736693965b25ba34e941ac88'/>
<id>urn:sha1:73c768dae9ea4838736693965b25ba34e941ac88</id>
<content type='text'>
When chainlint detects problems in a test, such as a broken &amp;&amp;-chain, it
prints out the test with "?!FOO?!" annotations inserted at each problem
location. However, rather than annotating the original test definition,
it instead dumps out a parsed token representation of the test. Since it
lacks comments, indentations, here-doc bodies, and so forth, this
tokenized representation can be difficult for the test author to digest
and relate back to the original test definition.

However, now that each parsed token carries positional information, the
location of a detected problem can be pinpointed precisely in the
original test definition. Therefore, take advantage of this information
to annotate the test definition itself rather than annotating the parsed
token stream, thus making it easier for a test author to relate a
problem back to the source.

Maintaining the positional meta-information associated with each
detected problem requires a slight change in how the problems are
managed internally. In particular, shell syntax such as:

    msg="total: $(cd data; wc -w *.txt) words"

requires the lexical analyzer to recursively invoke the parser in order
to detect problems within the $(...) expression inside the double-quoted
string. In this case, the recursive parse context will detect the broken
&amp;&amp;-chain between the `cd` and `wc` commands, returning the token stream:

    cd data ; ?!AMP?! wc -w *.txt

However, the parent parse context will see everything inside the
double-quotes as a single string token:

    "total: $(cd data ; ?!AMP?! wc -w *.txt) words"

losing whatever positional information was attached to the ";" token
where the problem was detected.

One way to preserve the positional information of a detected problem in
a recursive parse context within a string would be to attach the
positional information to the annotation textually; for instance:

    "total: $(cd data ; ?!AMP:21:22?! wc -w *.txt) words"

and then extract the positional information when annotating the original
test definition.

However, a cleaner and much simpler approach is to maintain the list of
detected problems separately rather than embedding the problems as
annotations directly in the parsed token stream. Not only does this
ensure that positional information within recursive parse contexts is
not lost, but it keeps the token stream free from non-token pollution,
which may simplify implementation of validations added in the future
since they won't have to handle non-token "?!FOO!?" items specially.

Finally, the chainlint self-test "expect" files need a few mechanical
adjustments now that the original test definitions are emitted rather
than the parsed token stream. In particular, the following items missing
from the historic parsed-token output are now preserved verbatim:

    * indentation (and whitespace, in general)

    * comments

    * here-doc bodies

    * here-doc tag quoting (i.e. "\EOF")

    * line-splices (i.e. "\" at the end of a line)

Signed-off-by: Eric Sunshine &lt;sunshine@sunshineco.com&gt;
Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
</content>
</entry>
<entry>
<title>t/chainlint: add more chainlint.pl self-tests</title>
<updated>2022-09-01T17:07:41Z</updated>
<author>
<name>Eric Sunshine</name>
<email>sunshine@sunshineco.com</email>
</author>
<published>2022-09-01T00:29:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=56066523ed3ebd16b455e99ce954ec19b6ac5ada'/>
<id>urn:sha1:56066523ed3ebd16b455e99ce954ec19b6ac5ada</id>
<content type='text'>
During the development of chainlint.pl, numerous new self-tests were
created to verify correct functioning beyond the checks already
represented by the existing self-tests. The new checks fall into several
categories:

* behavior of the lexical analyzer for complex cases, such as line
  splicing, token pasting, entering and exiting string contexts inside
  and outside of test script bodies; for instance:

    test_expect_success 'title' '
      x=$(echo "something" |
        sed -e '\''s/\\/\\\\/g'\'' -e '\''s/[[/.*^$]/\\&amp;/g'\''
    '

* behavior of the parser for all compound grammatical constructs, such
  as `if...fi`, `case...esac`, `while...done`, `{...}`, etc., and for
  other legal shell grammatical constructs not covered by existing
  chainlint.sed self-tests, as well as complex cases, such as:

    OUT=$( ((large_git 1&gt;&amp;3) | :) 3&gt;&amp;1 ) &amp;&amp;

* detection of problems, such as &amp;&amp;-chain breakage, from top-level to
  any depth since the existing self-tests do not cover any top-level
  context and only cover subshells one level deep due to limitations of
  chainlint.sed

* address blind spots in chainlint.sed (such as not detecting a broken
  &amp;&amp;-chain on a one-line for-loop in a subshell[1]) which chainlint.pl
  correctly detects

* real-world cases which tripped up chainlint.pl during its development

[1]: https://lore.kernel.org/git/dce35a47012fecc6edc11c68e91dbb485c5bc36f.1661663880.git.gitgitgadget@gmail.com/

Signed-off-by: Eric Sunshine &lt;sunshine@sunshineco.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>chainlint.pl: allow `|| echo` to signal failure upstream of a pipe</title>
<updated>2022-09-01T17:07:41Z</updated>
<author>
<name>Eric Sunshine</name>
<email>sunshine@sunshineco.com</email>
</author>
<published>2022-09-01T00:29:51Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=ae0c55abf8217bb06422f9eafcd7a30b2c8f9e8b'/>
<id>urn:sha1:ae0c55abf8217bb06422f9eafcd7a30b2c8f9e8b</id>
<content type='text'>
The use of `|| return` (or `|| exit`) to signal failure within a loop
isn't effective when the loop is upstream of a pipe since the pipe
swallows all upstream exit codes and returns only the exit code of the
final command in the pipeline.

To work around this limitation, tests may adopt an alternative strategy
of signaling failure by emitting text which would never be emitted in
the non-failing case. For instance:

    while condition
    do
        command1 &amp;&amp;
        command2 ||
        echo "impossible text"
    done |
    sort &gt;actual &amp;&amp;

Such usage indicates deliberate thought about failure cases by the test
author, thus flagging them as missing `|| return` (or `|| exit`) is not
helpful. Therefore, take this case into consideration when checking for
explicit loop termination.

Signed-off-by: Eric Sunshine &lt;sunshine@sunshineco.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>chainlint.pl: complain about loops lacking explicit failure handling</title>
<updated>2022-09-01T17:07:41Z</updated>
<author>
<name>Eric Sunshine</name>
<email>sunshine@sunshineco.com</email>
</author>
<published>2022-09-01T00:29:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=fd4094c3cad7c62adb0b7080e0dca37f66bf0c6e'/>
<id>urn:sha1:fd4094c3cad7c62adb0b7080e0dca37f66bf0c6e</id>
<content type='text'>
Shell `for` and `while` loops do not terminate automatically just
because a command fails within the loop body. Instead, the loop
continues to iterate and eventually returns the exit status of the final
command of the final iteration, which may not be the command which
failed, thus it is possible for failures to go undetected. Consequently,
it is important for test authors to explicitly handle failure within the
loop body by terminating the loop manually upon failure. This can be
done by returning a non-zero exit code from within the loop body
(i.e. `|| return 1`) or exiting (i.e. `|| exit 1`) if the loop is within
a subshell, or by manually checking `$?` and taking some appropriate
action. Therefore, add logic to detect and complain about loops which
lack explicit `return` or `exit`, or `$?` check.

Signed-off-by: Eric Sunshine &lt;sunshine@sunshineco.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
