<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/string-list.h, branch v2.50.0</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.50.0</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.50.0'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2023-04-24T23:01:28Z</updated>
<entry>
<title>string-list: introduce `string_list_setlen()`</title>
<updated>2023-04-24T23:01:28Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2023-04-24T22:20:14Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=492ba81346cc45322d0c26bc927b01a34becf304'/>
<id>urn:sha1:492ba81346cc45322d0c26bc927b01a34becf304</id>
<content type='text'>
It is sometimes useful to reduce the size of a `string_list`'s list of
items without having to re-allocate them. For example, doing the
following:

    struct strbuf buf = STRBUF_INIT;
    struct string_list parts = STRING_LIST_INIT_NO_DUP;
    while (strbuf_getline(&amp;buf, stdin) != EOF) {
      parts.nr = 0;
      string_list_split_in_place(&amp;parts, buf.buf, ":", -1);
      /* ... */
    }
    string_list_clear(&amp;parts, 0);

is preferable over calling `string_list_clear()` on every iteration of
the loop. This is because `string_list_clear()` causes us free our
existing `items` array. This means that every time we call
`string_list_split_in_place()`, the string-list internals re-allocate
the same size array.

Since in the above example we do not care about the individual parts
after processing each line, it is much more efficient to pretend that
there aren't any elements in the `string_list` by setting `list-&gt;nr` to
0 while leaving the list of elements allocated as-is.

This allows `string_list_split_in_place()` to overwrite any existing
entries without needing to free and re-allocate them.

However, setting `list-&gt;nr` manually is not safe in all instances. There
are a couple of cases worth worrying about:

  - If the `string_list` is initialized with `strdup_strings`,
    truncating the list can lead to overwriting strings which are
    allocated elsewhere. If there aren't any other pointers to those
    strings other than the ones inside of the `items` array, they will
    become unreachable and leak.

    (We could ourselves free the truncated items between
    string_list-&gt;items[nr] and `list-&gt;nr`, but no present or future
    callers would benefit from this additional complexity).

  - If the given `nr` is larger than the current value of `list-&gt;nr`,
    we'll trick the `string_list` into a state where it thinks there are
    more items allocated than there actually are, which can lead to
    undefined behavior if we try to read or write those entries.

Guard against both of these by introducing a helper function which
guards assignment of `list-&gt;nr` against each of the above conditions.

Co-authored-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>string-list: multi-delimiter `string_list_split_in_place()`</title>
<updated>2023-04-24T23:01:28Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2023-04-24T22:20:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=52acddf36c8cb3778ab2098a0d95cc2e375a4069'/>
<id>urn:sha1:52acddf36c8cb3778ab2098a0d95cc2e375a4069</id>
<content type='text'>
Enhance `string_list_split_in_place()` to accept multiple characters as
delimiters instead of a single character.

Instead of using `strchr(2)` to locate the first occurrence of the given
delimiter character, `string_list_split_in_place_multi()` uses
`strcspn(2)` to move past the initial segment of characters comprised of
any characters in the delimiting set.

When only a single delimiting character is provided, `strpbrk(2)` (which
is implemented with `strcspn(2)`) has equivalent performance to
`strchr(2)`. Modern `strcspn(2)` implementations treat an empty
delimiter or the singleton delimiter as a special case and fall back to
calling strchrnul(). Both glibc[1] and musl[2] implement `strcspn(2)`
this way.

This change is one step to removing `strtok(2)` from the tree. Note that
`string_list_split_in_place()` is not a strict replacement for
`strtok()`, since it will happily turn sequential delimiter characters
into empty entries in the resulting string_list. For example:

    string_list_split_in_place(&amp;xs, "foo:;:bar:;:baz", ":;", -1)

would yield a string list of:

    ["foo", "", "", "bar", "", "", "baz"]

Callers that wish to emulate the behavior of strtok(2) more directly
should call `string_list_remove_empty_items()` after splitting.

To avoid regressions for the new multi-character delimter cases, update
t0063 in this patch as well.

[1]: https://sourceware.org/git/?p=glibc.git;a=blob;f=string/strcspn.c;hb=glibc-2.37#l35
[2]: https://git.musl-libc.org/cgit/musl/tree/src/string/strcspn.c?h=v1.2.3#n11

Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>string-list: document iterator behavior on NULL input</title>
<updated>2022-09-27T16:32:26Z</updated>
<author>
<name>Derrick Stolee</name>
<email>derrickstolee@github.com</email>
</author>
<published>2022-09-27T13:57:01Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=d151f0cce7fca1fc156a9ea1dc98c59e1be512c9'/>
<id>urn:sha1:d151f0cce7fca1fc156a9ea1dc98c59e1be512c9</id>
<content type='text'>
The for_each_string_list_item() macro takes a string_list and
automatically constructs a for loop to iterate over its contents. This
macro will segfault if the list is non-NULL.

We cannot change the macro to be careful around NULL values because
there are many callers that use the address of a local variable, which
will never be NULL and will cause compile errors with -Werror=address.

For now, leave a documentation comment to try to avoid mistakes in the
future where a caller does not check for a NULL list.

Signed-off-by: Derrick Stolee &lt;derrickstolee@github.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>string-list API: change "nr" and "alloc" to "size_t"</title>
<updated>2022-03-07T20:02:04Z</updated>
<author>
<name>Ævar Arnfjörð Bjarmason</name>
<email>avarab@gmail.com</email>
</author>
<published>2022-03-07T15:27:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=99d60545f87445d7050999b826fc4cd49e69376c'/>
<id>urn:sha1:99d60545f87445d7050999b826fc4cd49e69376c</id>
<content type='text'>
Change the "nr" and "alloc" members of "struct string_list" to use
"size_t" instead of "nr". On some platforms the size of an "unsigned
int" will be smaller than a "size_t", e.g. a 32 bit unsigned v.s. 64
bit unsigned. As "struct string_list" is a generic API we use in a lot
of places this might cause overflows.

As one example: code in "refs.c" keeps track of the number of refs
with a "size_t", and auxiliary code in builtin/remote.c in
get_ref_states() appends those to a "struct string_list".

While we're at it split the "nr" and "alloc" in string-list.h across
two lines, which is the case for most such struct member
declarations (e.g. in "strbuf.h" and "strvec.h").

Changing e.g. "int i" to "size_t i" in run_and_feed_hook() isn't
strictly necessary, and there are a lot more cases where we'll use a
local "int", "unsigned int" etc. variable derived from the "nr" in the
"struct string_list". But in that case as well as
add_wrapped_shortlog_msg() in builtin/shortlog.c we need to adjust the
printf format referring to "nr" anyway, so let's also change the other
variables referring to it.

Signed-off-by: Ævar Arnfjörð Bjarmason &lt;avarab@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>string-list.[ch]: remove string_list_init() compatibility function</title>
<updated>2021-09-28T21:43:38Z</updated>
<author>
<name>Ævar Arnfjörð Bjarmason</name>
<email>avarab@gmail.com</email>
</author>
<published>2021-09-28T12:49:45Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=abf897bacd2d36b9dbd07c70b4a2f97a084704ee'/>
<id>urn:sha1:abf897bacd2d36b9dbd07c70b4a2f97a084704ee</id>
<content type='text'>
Remove this function left over to accommodate in-flight changes, see
770fedaf9fb (string-list.[ch]: add a string_list_init_{nodup,dup}(),
2021-07-01) for the recent change to add
"string_list_init_{nodup,dup}()" initializers.

There was only one user of the API left in remote-curl.c. I don't know
why I didn't include this change to remote-curl.c in
bc40dfb10a0 (string-list.h users: change to use *_{nodup,dup}(),
2021-07-01), perhaps I just missed it.

In any case, let's change that one user to use the new API, as of
writing this there are no in-flight changes that use, so this seems
like a good time to drop this before we get any new users of this
compatibility API.

Signed-off-by: Ævar Arnfjörð Bjarmason &lt;avarab@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>string-list.[ch]: add a string_list_init_{nodup,dup}()</title>
<updated>2021-07-01T19:32:22Z</updated>
<author>
<name>Ævar Arnfjörð Bjarmason</name>
<email>avarab@gmail.com</email>
</author>
<published>2021-07-01T10:51:28Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=770fedaf9fb156bd8c18da41770eac0cb63fba63'/>
<id>urn:sha1:770fedaf9fb156bd8c18da41770eac0cb63fba63</id>
<content type='text'>
In order to use the new "memcpy() a 'blank' struct on the stack"
pattern for string_list_init(), and to make the macro initialization
consistent with the function initialization introduce two new
string_list_init_{nodup,dup}() functions. These are like the old
string_list_init() when called with a false and true second argument,
respectively.

I think this not only makes things more consistent, but also easier to
read. I often had to lookup what the ", 0)" or ", 1)" in these
invocations meant, now it's right there in the function name, and
corresponds to the macros.

A subsequent commit will convert existing API users to this pattern,
but as this is a very common API let's leave a compatibility function
in place for later removal. This intermediate state also proves that
the compatibility function works.

Signed-off-by: Ævar Arnfjörð Bjarmason &lt;avarab@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>*.h: move some *_INIT to designated initializers</title>
<updated>2021-07-01T19:31:45Z</updated>
<author>
<name>Ævar Arnfjörð Bjarmason</name>
<email>avarab@gmail.com</email>
</author>
<published>2021-07-01T10:51:25Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=3d97ea479fdcb88671105e2f2d04064bab110bd5'/>
<id>urn:sha1:3d97ea479fdcb88671105e2f2d04064bab110bd5</id>
<content type='text'>
Move *_INIT macros I'll use in a subsequent commits to designated
initializers. This isn't required for those follow-up changes, but
since next commits will change things in this area, let's use the
modern pattern over the old one while we're at it.

Signed-off-by: Ævar Arnfjörð Bjarmason &lt;avarab@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'en/string-list-can-be-custom-sorted'</title>
<updated>2020-01-22T23:07:31Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2020-01-22T23:07:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=1f10b84e43cbcf50715b6effd858530e14626952'/>
<id>urn:sha1:1f10b84e43cbcf50715b6effd858530e14626952</id>
<content type='text'>
API-doc update.

* en/string-list-can-be-custom-sorted:
  string-list: note in docs that callers can specify sorting function
</content>
</entry>
<entry>
<title>string-list: note in docs that callers can specify sorting function</title>
<updated>2020-01-07T17:12:36Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2020-01-07T15:07:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=065027ee1a7b3ecc45b6d331983d38642835c451'/>
<id>urn:sha1:065027ee1a7b3ecc45b6d331983d38642835c451</id>
<content type='text'>
In commit 1959bf6430 (string_list API: document what "sorted" means,
2012-09-17), Documentation/technical/api-string-list.txt was updated to
specify that strcmp() was used for sorting.  In commit 8dd5afc926
(string-list: allow case-insensitive string list, 2013-01-07), a cmp
member was added to struct string_list to allow callers to specify an
alternative comparison function, but api-string-list.txt was not
updated.  In commit 4f665f2cf3 (string-list.h: move documentation from
Documentation/api/ into header, 2017-09-26), the now out-dated
api-string-list.txt documentation was moved into string-list.h.  Update
the docs to reflect the configurability of sorting.

Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Fix spelling errors in code comments</title>
<updated>2019-11-10T07:00:54Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2019-11-05T17:07:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=15beaaa3d1f6b555900446deb5e376b4f806d734'/>
<id>urn:sha1:15beaaa3d1f6b555900446deb5e376b4f806d734</id>
<content type='text'>
Reported-by: Jens Schleusener &lt;Jens.Schleusener@fossies.org&gt;
Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
