git/trailer.c, branch v2.45.2

Merge branch 'la/format-trailer-info'

2024-04-23T18:52:39Z

The code to format trailers have been cleaned up. * la/format-trailer-info: trailer: finish formatting unification trailer: begin formatting unification format_trailer_info(): append newline for non-trailer lines format_trailer_info(): drop redundant unfold_value() format_trailer_info(): use trailer_item objects

Merge branch 'jk/core-comment-string'

2024-04-05T17:49:49Z

core.commentChar used to be limited to a single byte, but has been updated to allow an arbitrary multi-byte sequence. * jk/core-comment-string: config: add core.commentString config: allow multi-byte core.commentChar environment: drop comment_line_char compatibility macro wt-status: drop custom comment-char stringification sequencer: handle multi-byte comment characters when writing todo list find multi-byte comment chars in unterminated buffers find multi-byte comment chars in NUL-terminated strings prefer comment_line_str to comment_line_char for printing strbuf: accept a comment string for strbuf_add_commented_lines() strbuf: accept a comment string for strbuf_commented_addf() strbuf: accept a comment string for strbuf_stripspace() environment: store comment_line_char as a string strbuf: avoid shadowing global comment_line_char name commit: refactor base-case of adjust_comment_line_char() strbuf: avoid static variables in strbuf_add_commented_lines() strbuf: simplify comment-handling in add_lines() helper config: forbid newline as core.commentChar

trailer: finish formatting unification

2024-03-15T17:10:25Z

Rename format_trailer_info() to format_trailers(). Finally, both interpret-trailers and format_trailers_from_commit() can call "format_trailers()"! Update the comment in to remove the (now obsolete) caveats about format_trailers_from_commit(). Those caveats come from a388b10fc1 (pretty: move trailer formatting to trailer.c, 2017-08-15) where it says: pretty: move trailer formatting to trailer.c The next commit will add many features to the %(trailer) placeholder in pretty.c. We'll need to access some internal functions of trailer.c for that, so our options are either: 1. expose those functions publicly or 2. make an entry point into trailer.c to do the formatting Doing (2) ends up exposing less surface area, though do note that caveats in the docstring of the new function. which suggests format_trailers_from_commit() started out from pretty.c and did not have access to all of the trailer implementation internals, and was never intended to replace (unify) the formatting machinery in trailer.c. The refactors leading up to this commit (as well as additional refactors that will follow) expose additional functions publicly, and is therefore choosing option (1) as described in a388b10fc1. Signed-off-by: Linus Arver Signed-off-by: Junio C Hamano

trailer: begin formatting unification

2024-03-15T17:10:25Z

Now that the preparatory refactors are over, we can replace the call to format_trailers() in interpret-trailers with format_trailer_info(). This unifies the trailer formatting machinery In order to avoid breakages in t7502 and t7513, we have to steal the features present in format_trailers(). Namely, we have to teach format_trailer_info() as follows: (1) make it aware of opts->trim_empty, and (2) make it avoid hardcoding ": " as the separator and space (which can result in double-printing these characters). For (2), make it only print the separator and space if we cannot find any recognized separator somewhere in the key (yes, keys may have a trailing separator in it --- we will eventually fix this design but not now). Do so by copying the code out of print_tok_val(), and deleting the same function. Helped-by: Junio C Hamano Helped-by: Christian Couder Signed-off-by: Linus Arver Signed-off-by: Junio C Hamano

format_trailer_info(): append newline for non-trailer lines

2024-03-15T17:10:25Z

This wraps up the preparatory refactors to unify the trailer formatters. Two patches ago we made format_trailer_info() use trailer_item objects instead of the "trailers" string array. The strings in the array include trailing newlines, because the string array is split up with trailer_lines = strbuf_split_buf(str + trailer_block_start, end_of_log_message - trailer_block_start, '\n', 0); in trailer_info_get() and strbuf_split_buf() includes the terminator (in this case the newline character '\n') for each split-up substring. And before we made the transition to use trailer_item objects for it, format_trailer_info() called parse_trailer() (which trims newlines) for trailer lines but did _not_ call parse_trailer() for non-trailer lines. So for trailer lines it had to add back the trimmed newline like this if (!opts->separator) strbuf_addch(out, '\n'); But for non-trailer lines it didn't have to add back the newline because it could just reuse same string in the "trailers" string array (which again, already included the trailing newline). Now that format_trailer_info() uses trailer_item objects for all cases, it can't rely on "trailers" string array anymore. And so it must be taught to add a newline back when printing non-trailer lines, just like it already does for trailer lines. Do so now. The test suite can pass again without the need to hide failures with *_failure, so flip the affected test cases back to *_success. Now, format_trailer_info() is in better shape to supersede format_trailers(), which we'll do in the next commit. Signed-off-by: Linus Arver Signed-off-by: Junio C Hamano

format_trailer_info(): drop redundant unfold_value()

2024-03-15T17:10:24Z

This is another preparatory refactor to unify the trailer formatters. In the last patch we made format_trailer_info() use trailer_item objects instead of the "trailers" string array. This means that the call to unfold_value() here is redundant because the trailer_item objects are already unfolded in parse_trailers() which is a dependency of our caller, format_trailers_from_commit(). Remove the redundant call. Signed-off-by: Linus Arver Signed-off-by: Junio C Hamano

format_trailer_info(): use trailer_item objects

2024-03-15T17:10:24Z

This is another preparatory refactor to unify the trailer formatters. Make format_trailer_info() operate on trailer_item objects, not the raw string array. We will continue to make improvements, culminating in the renaming of format_trailer_info() to format_trailers(), at which point the unification of these formatters will be complete. In order to avoid breaking t4205 and t6300, flip *_success to *_failure in the affected test cases. Add a temporary "test_trailer_option_expect_failure" wrapper which we will use along with "test_expect_failure" in the next commit to avoid breaking tests. When the dust settles with the refactors a few more commits later, we will drop the use of *_failure to make the tests truly pass again. When the preparatory refactors are complete, we'll be able to drop the use of *_failure that we introduce here. Signed-off-by: Linus Arver Signed-off-by: Junio C Hamano

Merge branch 'la/trailer-api'

2024-03-14T21:05:24Z

Trailer API updates. Acked-by: Christian Couder cf. * la/trailer-api: format_trailers_from_commit(): indirectly call trailer_info_get() format_trailer_info(): move "fast path" to caller format_trailers(): use strbuf instead of FILE trailer_info_get(): reorder parameters trailer: move interpret_trailers() to interpret-trailers.c trailer: reorder format_trailers_from_commit() parameters trailer: rename functions to use 'trailer' shortlog: add test for de-duplicating folded trailers trailer: free trailer_info _after_ all related usage

find multi-byte comment chars in unterminated buffers

2024-03-12T20:28:10Z

As with the previous patch, we need to swap out single-byte matching for something like starts_with() to match all bytes of a multi-byte comment character. But for cases where the buffer is not NUL-terminated (and we instead have an explicit size or end pointer), it's not safe to use starts_with(), as it might walk off the end of the buffer. Let's introduce a new starts_with_mem() that does the same thing but also accepts the length of the "haystack" str and makes sure not to walk past it. Note that in most cases the existing code did not need a length check at all, since it was written in a way that knew we had at least one byte available (and that was all we checked). So I had to read each one to find the appropriate bounds. The one exception is sequencer.c's add_commented_lines(), where we can actually get rid of the length check. Just like starts_with(), our starts_with_mem() handles an empty haystack variable by not matching (assuming a non-empty prefix). A few notes on the implementation of starts_with_mem(): - it would be equally correct to take an "end" pointer (and indeed, many of the callers have this and have to subtract to come up with the length). I think taking a ptr/size combo is a more usual interface for our codebase, though, and has the added benefit that the function signature makes it harder to mix up the three parameters. - we could obviously build starts_with() on top of this by passing strlen(str) as the length. But it's possible that starts_with() is a relatively hot code path, and it should not pay that penalty (it can generally return an answer proportional to the size of the prefix, not the whole string). - it naively feels like xstrncmpz() should be able to do the same thing, but that's not quite true. If you pass the length of the haystack buffer, then strncmp() finds that a shorter prefix string is "less than" than the haystack, even if the haystack starts with the prefix. If you pass the length of the prefix, then you risk reading past the end of the haystack if it is shorter than the prefix. So I think we really do need a new function. Signed-off-by: Jeff King Signed-off-by: Junio C Hamano

find multi-byte comment chars in NUL-terminated strings

2024-03-12T20:28:10Z

Several parts of the code need to identify lines that begin with the comment character, and do so with a simple byte equality check. As part of the transition to handling multi-byte characters, we need to match all of the bytes. For cases where we are looking in a NUL-terminated string, we can just use starts_with(), which checks all of the characters in comment_line_str. Note that we can drop the "line.len" check in wt-status.c's read_rebase_todolist(). The starts_with() function handles the case of an empty haystack buffer (it will always return false for a non-empty prefix). Signed-off-by: Jeff King Signed-off-by: Junio C Hamano