<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/refs.c, branch v2.40.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.40.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.40.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2023-02-27T18:08:57Z</updated>
<entry>
<title>Merge branch 'jk/shorten-unambiguous-ref-wo-sscanf'</title>
<updated>2023-02-27T18:08:57Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2023-02-27T18:08:57Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=dda83e69d08ce37c64157007e050a142660fd15c'/>
<id>urn:sha1:dda83e69d08ce37c64157007e050a142660fd15c</id>
<content type='text'>
sscanf(3) used in "git symbolic-ref --short" implementation found
to be not working reliably on macOS in UTF-8 locales.  Rewrite the
code to avoid sscanf() altogether to work it around.

* jk/shorten-unambiguous-ref-wo-sscanf:
  shorten_unambiguous_ref(): avoid sscanf()
  shorten_unambiguous_ref(): use NUM_REV_PARSE_RULES constant
  shorten_unambiguous_ref(): avoid integer truncation
</content>
</entry>
<entry>
<title>shorten_unambiguous_ref(): avoid sscanf()</title>
<updated>2023-02-15T16:53:17Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2023-02-15T15:16:21Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=613bef56b820cf7f24dc4b3ae65fc91826368185'/>
<id>urn:sha1:613bef56b820cf7f24dc4b3ae65fc91826368185</id>
<content type='text'>
To shorten a fully qualified ref (e.g., taking "refs/heads/foo" to just
"foo"), we munge the usual lookup rules ("refs/heads/%.*s", etc) to drop
the ".*" modifier (so "refs/heads/%s"), and then use sscanf() to match
that against the refname, pulling the "%s" content into a separate
buffer.

This has a few downsides:

  - sscanf("%s") reportedly misbehaves on macOS with some input and
    locale combinations, returning a partial or garbled string. See
    this thread:

      https://lore.kernel.org/git/CAGF3oAcCi+fG12j-1U0hcrWwkF5K_9WhOi6ZPHBzUUzfkrZDxA@mail.gmail.com/

  - scanf's matching of "%s" is greedy. So the "refs/remotes/%s/HEAD"
    rule would never pull "origin" out of "refs/remotes/origin/HEAD".
    Instead it always produced "origin/HEAD", which is redundant with
    the "refs/remotes/%s" rule.

  - scanf in general is an error-prone interface. For example, scanning
    for "%s" will copy bytes into a destination string, which must have
    been correctly sized ahead of time to avoid a buffer overflow. In
    this case, the code is OK (the buffer is pessimistically sized to
    match the original string, which should give us a maximum). But in
    general, we do not want to encourage people to use scanf at all.

So instead, let's note that our lookup rules are not arbitrary format
strings, but all contain exactly one "%.*s" placeholder. We already rely
on this, both for lookup (we feed the lookup format along with exactly
one int/ptr combo to snprintf, etc) and for shortening (we munge "%.*s"
to "%s", and then insist that sscanf() finds exactly one result).

We can parse this manually by just matching the bytes that occur before
and after the "%.*s" placeholder. While we have a few extra lines of
parsing code, the result is arguably simpler, as can skip the
preprocessing step and its tricky memory management entirely.

The in-code comments should explain the parsing strategy, but there's
one subtle change here. The original code allocated a single buffer, and
then overwrote it in each loop iteration, since that's the only option
sscanf() gives us. But our parser can actually return a ptr/len combo
for the matched string, which is all we need (since we just feed it back
to the lookup rules with "%.*s"), and then copy it only when returning
to the caller.

There are a few new tests here, all using symbolic-ref (the code can be
triggered in many ways, but symrefs are convenient in that we don't need
to create a real ref, which avoids any complications from the filesystem
munging the name):

  - the first covers the real-world case which misbehaved on macOS.
    Setting LC_ALL is required to trigger the problem there (since
    otherwise our tests use LC_ALL=C), and hopefully is at worst simply
    ignored on other systems (and doesn't cause libc to complain, etc,
    on systems without that locale).

  - the second covers the "origin/HEAD" case as discussed above, which
    is now fixed

  - the remainder are for "weird" cases that work both before and after
    this patch, but would be easy to get wrong with off-by-one problems
    in the parsing (and came out of discussions and earlier iterations
    of the patch that did get them wrong).

  - absent here are tests of boring, expected-to-work cases like
    "refs/heads/foo", etc. Those are covered all over the test suite
    both explicitly (for-each-ref's refname:short) and implicitly (in
    the output of git-status, etc).

Reported-by: 孟子易 &lt;mengziyi540841@gmail.com&gt;
Helped-by: Eric Sunshine &lt;sunshine@sunshineco.com&gt;
Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>shorten_unambiguous_ref(): use NUM_REV_PARSE_RULES constant</title>
<updated>2023-02-15T16:53:17Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2023-02-15T15:16:18Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=8f416f65c9b1a42d7e6bf747c7ac5cf6a49250f8'/>
<id>urn:sha1:8f416f65c9b1a42d7e6bf747c7ac5cf6a49250f8</id>
<content type='text'>
The ref_rev_parse_rules[] array is terminated with a NULL entry, and we
count it and store the result in the local nr_rules variable. But we
don't need to do so; since the array is a constant, we can compute its
size directly. The original code probably didn't do that because it was
written as part of for-each-ref, and saw the array only as a pointer. It
was migrated in 7c2b3029df (make get_short_ref a public function,
2009-04-07) and could have been updated then, but that subtlety was not
noticed.

We even have a constant that represents this value already, courtesy of
60650a48c0 (remote: make refspec follow the same disambiguation rule as
local refs, 2018-08-01), though again, nobody noticed at the time that
it could be used here, too.

The current count-up isn't a big deal, as we need to preprocess that
array anyway. But it will become more cumbersome as we refactor the
shortening code. So let's get rid of it and just use the constant
everywhere.

Note that there are two things here that aren't just simple text
replacements:

  1. We also use nr_rules to see if a previous call has initialized the
     static pre-processing variables. We can just use the scanf_fmts
     pointer to do the same thing, as it is non-NULL only after we've
     done that initialization.

  2. If nr_rules is zero after we've counted it up, we bail from the
     function. This code is unreachable, though, as the set of rules is
     hard-coded and non-empty. And that becomes even more apparent now
     that we are using the constant. So we can drop this conditional
     completely (and ironically, the code would have the same output if
     it _did_ trigger, as we'd simply skip the loop entirely and return
     the whole refname).

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>shorten_unambiguous_ref(): avoid integer truncation</title>
<updated>2023-02-15T16:53:17Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2023-02-15T15:16:14Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=dd5e4d39765f44492031f2d9319c24f0868bbe1a'/>
<id>urn:sha1:dd5e4d39765f44492031f2d9319c24f0868bbe1a</id>
<content type='text'>
We parse the shortened name "foo" out of the full refname
"refs/heads/foo", and then assign the result of strlen(short_name) to an
int, which may truncate or wrap to negative.

In practice, this should never happen, as it requires a 2GB refname. And
even somebody trying to do something malicious should at worst end up
with a confused answer (we use the size only to feed back as a
placeholder length to strbuf_addf() to see if there are any collisions
in the lookup rules).

And it may even be impossible to trigger this, as we parse the string
with sscanf(), and stdio formatting functions are not known for handling
large strings well. I didn't test, but I wouldn't be surprised if
sscanf() on many platforms simply reports no match here.

But even if it is not a problem in practice so far, it is worth fixing
for two reasons:

  1. We'll shortly be replacing the sscanf() call with a real parser
     which will handle arbitrary-sized strings.

  2. Assigning strlen() to an int is an anti-pattern that requires
     people to look twice when auditing for real overflow problems.

So we'll make this a size_t. Unfortunately we still have to cast to int
eventually for the strbuf_addf() call, but at least we can localize the
cast there, and check that it will be valid. I used our new cast helper
here, which will just bail completely. That should be OK, as anybody
with a 2GB refname is up to no good, but if we really wanted to, we
could detect it manually and just refuse to shorten the refname.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>ls-refs: use repository parameter to iterate refs</title>
<updated>2022-12-13T13:16:22Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2022-12-13T11:11:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=91e2ab1587d8ee18e3d2978f2b7bc250faf5df8f'/>
<id>urn:sha1:91e2ab1587d8ee18e3d2978f2b7bc250faf5df8f</id>
<content type='text'>
The ls_refs() function (for the v2 protocol command of the same name)
takes a repository parameter (like all v2 commands), but ignores it. It
should use it to access the refs.

This isn't a bug in practice, since we only call this function when
serving upload-pack from the main repository. But it's an awkward
gotcha, and it causes -Wunused-parameter to complain.

The main reason we don't use the repository parameter is that the ref
iteration interface we call doesn't have a "refs_" variant that takes a
ref_store. However we can easily add one. In fact, since there is only
one other caller (in ref-filter.c), there is no need to maintain the
non-repository wrapper; that caller can just use the_repository. It's
still a long way from consistently using a repository object, but it's
one small step in the right direction.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>refs: get rid of global list of hidden refs</title>
<updated>2022-11-17T21:22:51Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2022-11-17T05:46:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=9b67eb6fbeb9666640f34cccf401cfea22f7bd22'/>
<id>urn:sha1:9b67eb6fbeb9666640f34cccf401cfea22f7bd22</id>
<content type='text'>
We're about to add a new argument to git-rev-list(1) that allows it to
add all references that are visible when taking `transfer.hideRefs` et
al into account. This will require us to potentially parse multiple sets
of hidden refs, which is not easily possible right now as there is only
a single, global instance of the list of parsed hidden refs.

Refactor `parse_hide_refs_config()` and `ref_is_hidden()` so that both
take the list of hidden references as input and adjust callers to keep a
local list, instead. This allows us to easily use multiple hidden-ref
lists. Furthermore, it allows us to properly free this list before we
exit.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
</content>
</entry>
<entry>
<title>refs: fix memory leak when parsing hideRefs config</title>
<updated>2022-11-17T21:22:51Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2022-11-17T05:46:39Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=5eeb9aa2086edc95f4f2c9cc844f60535f0a5ca4'/>
<id>urn:sha1:5eeb9aa2086edc95f4f2c9cc844f60535f0a5ca4</id>
<content type='text'>
When parsing the hideRefs configuration, we first duplicate the config
value so that we can modify it. We then subsequently append it to the
`hide_refs` string list, which is initialized with `strdup_strings`
enabled. As a consequence we again reallocate the string, but never
free the first duplicate and thus have a memory leak.

While we never clean up the static `hide_refs` variable anyway, this is
no excuse to make the leak worse by leaking every value twice. We are
also about to change the way this variable will be handled so that we do
indeed start to clean it up. So let's fix the memory leak by using the
`string_list_append_nodup()` so that we pass ownership of the allocated
string to `hide_refs`.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
</content>
</entry>
<entry>
<title>refs: unify parse_worktree_ref() and ref_type()</title>
<updated>2022-09-19T18:11:11Z</updated>
<author>
<name>Han-Wen Nienhuys</name>
<email>hanwen@google.com</email>
</author>
<published>2022-09-19T16:34:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=71e5473493612f74244e2fa7a257a868df98be53'/>
<id>urn:sha1:71e5473493612f74244e2fa7a257a868df98be53</id>
<content type='text'>
The logic to handle worktree refs (worktrees/NAME/REF and
main-worktree/REF) existed in two places:

* ref_type() in refs.c

* parse_worktree_ref() in worktree.c

Collapse this logic together in one function parse_worktree_ref():
this avoids having to cross-check the result of parse_worktree_ref()
and ref_type().

Introduce enum ref_worktree_type, which is slightly different from
enum ref_type. The latter is a misleading name (one would think that
'ref_type' would have the symref option).

Instead, enum ref_worktree_type only makes explicit how a refname
relates to a worktree. From this point of view, HEAD and
refs/bisect/abc are the same: they specify the current worktree
implicitly.

The files-backend must avoid packing refs/bisect/* and friends into
packed-refs, so expose is_per_worktree_ref() separately.

Signed-off-by: Han-Wen Nienhuys &lt;hanwen@google.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'ab/unused-annotation'</title>
<updated>2022-09-14T19:56:39Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2022-09-14T19:56:39Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=dd407f1c7c7cce148d7313537494d0bc049ccb0e'/>
<id>urn:sha1:dd407f1c7c7cce148d7313537494d0bc049ccb0e</id>
<content type='text'>
Undoes 'jk/unused-annotation' topic and redoes it to work around
Coccinelle rules misfiring false positives in unrelated codepaths.

* ab/unused-annotation:
  git-compat-util.h: use "deprecated" for UNUSED variables
  git-compat-util.h: use "UNUSED", not "UNUSED(var)"
</content>
</entry>
<entry>
<title>Merge branch 'jk/unused-annotation'</title>
<updated>2022-09-14T19:56:39Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2022-09-14T19:56:38Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=a6b42ec0c6e2ef492b0ed6d1f1123dc7e724154e'/>
<id>urn:sha1:a6b42ec0c6e2ef492b0ed6d1f1123dc7e724154e</id>
<content type='text'>
Annotate function parameters that are not used (but cannot be
removed for structural reasons), to prepare us to later compile
with -Wunused warning turned on.

* jk/unused-annotation:
  is_path_owned_by_current_uid(): mark "report" parameter as unused
  run-command: mark unused async callback parameters
  mark unused read_tree_recursive() callback parameters
  hashmap: mark unused callback parameters
  config: mark unused callback parameters
  streaming: mark unused virtual method parameters
  transport: mark bundle transport_options as unused
  refs: mark unused virtual method parameters
  refs: mark unused reflog callback parameters
  refs: mark unused each_ref_fn parameters
  git-compat-util: add UNUSED macro
</content>
</entry>
</feed>
