<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/ref-filter.c, branch v2.48.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.48.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.48.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2024-12-06T11:20:02Z</updated>
<entry>
<title>global: mark code units that generate warnings with `-Wsign-compare`</title>
<updated>2024-12-06T11:20:02Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-12-06T10:27:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=41f43b8243f42b9df2e98be8460646d4c0100ad3'/>
<id>urn:sha1:41f43b8243f42b9df2e98be8460646d4c0100ad3</id>
<content type='text'>
Mark code units that generate warnings with `-Wsign-compare`. This
allows for a structured approach to get rid of all such warnings over
time in a way that can be easily measured.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>ref-filter: format iteratively with lexicographic refname sorting</title>
<updated>2024-10-21T20:46:03Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-10-21T11:33:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=2e7c6d2f4112dd374f615f4e612e1cebbcb6d431'/>
<id>urn:sha1:2e7c6d2f4112dd374f615f4e612e1cebbcb6d431</id>
<content type='text'>
In bd98f9774e (ref-filter.c: filter &amp; format refs in the same callback,
2023-11-14), we have introduced logic into the ref-filter subsystem that
determines whether or not we can output references iteratively instead
of first collecting all references, post-processing them and printing
them once done. This has the advantage that we don't have to store all
refs in memory and, when used with e.g. `--count=1`, that we don't have
to read all refs in the first place.

One restriction we have in place for that is that caller must not ask
for sorted refs, because there is no way to sort the refs without first
reading them all into an array. So the benefits can only be reaped when
explicitly asking for output not to be sorted.

But there is one exception here where we _can_ get away with sorting
refs while streaming: ref backends sort references returned by their
iterators in lexicographic order. So if the following conditions are all
true we can do iterative streaming:

  - There must be at most a single sorting specification, as otherwise
    we're not using plain lexicographic ordering.

  - The sorting specification must use the "refname".

  - The sorting specification must not be using any flags, like
    case-insensitive sorting.

Now the resulting logic does feel quite fragile overall, which makes me
a bit uneasy. But after thinking about this for a while I couldn't find
any obvious gaps in my reasoning. Furthermore, given that lexicographic
sorting order is the default in git-for-each-ref(1), this is likely to
benefit a whole lot of usecases out there.

The following benchmark executes git-for-each-ref(1) in a crafted repo
with 1 million references:

  Benchmark 1: git for-each-ref (revision = HEAD~)
    Time (mean ± σ):      6.756 s ±  0.014 s    [User: 3.004 s, System: 3.541 s]
    Range (min … max):    6.738 s …  6.784 s    10 runs

  Benchmark 2: git for-each-ref (revision = HEAD)
    Time (mean ± σ):      6.479 s ±  0.017 s    [User: 2.858 s, System: 3.422 s]
    Range (min … max):    6.450 s …  6.519 s    10 runs

  Summary
    git for-each-ref (revision = HEAD)
      1.04 ± 0.00 times faster than git for-each-ref (revision = HEAD~)

The change results in a slight performance improvement, but nothing that
would really stand out. Something that cannot be seen in the benchmark
though is peak memory usage, which went from 404.5MB to 68.96kB.

A more interesting benchmark is printing a single referenence with
`--count=1`:

  Benchmark 1: git for-each-ref --count=1 (revision = HEAD~)
    Time (mean ± σ):      6.655 s ±  0.018 s    [User: 2.865 s, System: 3.576 s]
    Range (min … max):    6.630 s …  6.680 s    10 runs

  Benchmark 2: git for-each-ref --count=1 (revision = HEAD)
    Time (mean ± σ):       8.6 ms ±   1.3 ms    [User: 2.3 ms, System: 6.1 ms]
    Range (min … max):     6.7 ms …  14.4 ms    266 runs

  Summary
    git git for-each-ref --count=1 (revision = HEAD)
    770.58 ± 116.19 times faster than git for-each-ref --count=1 (revision = HEAD~)

Whereas we scaled with the number of references before, we now print the
first reference and exit immediately, which provides a massive win.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'ak/refs-symref-referent-typofix'</title>
<updated>2024-09-25T17:37:12Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2024-09-25T17:37:12Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=7834cc321294cec6dafb91dc6e9485a9d65742ff'/>
<id>urn:sha1:7834cc321294cec6dafb91dc6e9485a9d65742ff</id>
<content type='text'>
Typofix.

* ak/refs-symref-referent-typofix:
  ref-filter: fix a typo
</content>
</entry>
<entry>
<title>Merge branch 'ps/environ-wo-the-repository'</title>
<updated>2024-09-23T17:35:05Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2024-09-23T17:35:04Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=3eb66799593f3676d85ca66f9e3192a7db603805'/>
<id>urn:sha1:3eb66799593f3676d85ca66f9e3192a7db603805</id>
<content type='text'>
Code clean-up.

* ps/environ-wo-the-repository: (21 commits)
  environment: stop storing "core.notesRef" globally
  environment: stop storing "core.warnAmbiguousRefs" globally
  environment: stop storing "core.preferSymlinkRefs" globally
  environment: stop storing "core.logAllRefUpdates" globally
  refs: stop modifying global `log_all_ref_updates` variable
  branch: stop modifying `log_all_ref_updates` variable
  repo-settings: track defaults close to `struct repo_settings`
  repo-settings: split out declarations into a standalone header
  environment: guard state depending on a repository
  environment: reorder header to split out `the_repository`-free section
  environment: move `set_git_dir()` and related into setup layer
  environment: make `get_git_namespace()` self-contained
  environment: move object database functions into object layer
  config: make dependency on repo in `read_early_config()` explicit
  config: document `read_early_config()` and `read_very_early_config()`
  environment: make `get_git_work_tree()` accept a repository
  environment: make `get_graft_file()` accept a repository
  environment: make `get_index_file()` accept a repository
  environment: make `get_object_directory()` accept a repository
  environment: make `get_git_common_dir()` accept a repository
  ...
</content>
</entry>
<entry>
<title>ref-filter: fix a typo</title>
<updated>2024-09-19T20:50:36Z</updated>
<author>
<name>Andrew Kreimer</name>
<email>algonell@gmail.com</email>
</author>
<published>2024-09-19T18:34:34Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=20652956420f7bd64339b4566623c293016e86b7'/>
<id>urn:sha1:20652956420f7bd64339b4566623c293016e86b7</id>
<content type='text'>
Fix a typo in comments.

Signed-off-by: Andrew Kreimer &lt;algonell@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>environment: stop storing "core.warnAmbiguousRefs" globally</title>
<updated>2024-09-12T17:15:44Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-09-12T11:30:24Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=11dbb4ace3ac428574fadf6f7895f56aba9dca81'/>
<id>urn:sha1:11dbb4ace3ac428574fadf6f7895f56aba9dca81</id>
<content type='text'>
Same as the preceding commits, storing the "core.warnAmbiguousRefs"
value globally is misdesigned as this setting may be set per repository.

Move the logic into the repo-settings subsystem. The usual pattern here
is that users are expected to call `prepare_repo_settings()` before they
access the settings themselves. This seems somewhat fragile though, as
it is easy to miss and leads to somewhat ugly code patterns at the call
sites.

Instead, introduce a new function that encapsulates this logic for us.
This also allows us to change how exactly the lazy initialization works
in the future, e.g. by only partially initializing values as requested
by the caller.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>ref-filter: fix leak with unterminated %(if) atoms</title>
<updated>2024-09-10T16:26:13Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-09-10T06:57:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=04d9744f839dc90f27f08f94cc26f8bb33b3adfa'/>
<id>urn:sha1:04d9744f839dc90f27f08f94cc26f8bb33b3adfa</id>
<content type='text'>
When parsing `%(if)` atoms we expect a few other atoms to exist to
complete it, like `%(then)` and `%(end)`. Whether or not we have seen
these other atoms is tracked in an allocated `if_then_else` structure,
which gets free'd by the `if_then_else_handler()` once we have parsed
the complete conditional expression.

This results in a memory leak when the `%(if)` atom is not terminated
correctly and thus incomplete. We never end up executing its handler and
thus don't end up freeing the structure.

Plug this memory leak by introducing a new `at_end_data_free` callback
function. If set, we'll execute it in `pop_stack_element()` and pass it
the `at_end_data` variable with the intent to free its state. Wire it up
for the `%(if)` atom accordingly.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>ref-filter: add ref_format_clear() function</title>
<updated>2024-09-09T23:26:11Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2024-09-09T23:21:18Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=db629c61f0be3665a36750fe2353b9ee958b0376'/>
<id>urn:sha1:db629c61f0be3665a36750fe2353b9ee958b0376</id>
<content type='text'>
After using the ref-filter API, callers should use ref_filter_clear() to
free any used memory. However, there's not a matching function to clear
the ref_format struct.

Traditionally this did not need to be cleaned up, as it was just a way
for the caller to store and pass format options as a single unit. Even
though the parsing step of some placeholders may allocate data, that's
usually inside their "used_atom" structs, which are part of the
ref_filter itself.

But a few placeholders keep data outside of there. The %(ahead-behind)
and %(is-base) parsers both keep a master list of bases, because they
perform a single filtering pass outside of the use of any particular
atom. And since the format parser does not have access to the ref_filter
struct, they store their cross-atom data in the ref_format struct
itself.

And thus when they are finished, the ref_format also needs to be cleaned
up. So let's add a function to do so, and call it from all of the users
of the ref-filter API.

The %(is-base) case is found by running LSan on t6300. After this patch,
the script can now be marked leak-free.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>ref-filter: fix leak when formatting %(push:remoteref)</title>
<updated>2024-09-09T23:26:10Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2024-09-09T23:19:51Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=f046127b6682f98d41bb4d26164da7f1a4a8e8d0'/>
<id>urn:sha1:f046127b6682f98d41bb4d26164da7f1a4a8e8d0</id>
<content type='text'>
When we expand the %(upstream) or %(push) placeholders, we rely on
remote.c's remote_ref_for_branch() to fill in the ":refname" argument.
But that function has confusing memory ownership semantics: it may or
may not return an allocated string, depending on whether we are in
"upstream" mode or "push" mode. The caller in ref-filter.c always
duplicates the result, meaning that we leak the original in the case of
%(push:refname).

To solve this, let's make the return value from remote_ref_for_branch()
consistent, by always returning an allocated pointer. Note that the
switch to returning a non-const pointer has a ripple effect inside the
function, too. We were storing the "dst" result as a const pointer, too,
even though it is always allocated! It is the return value from
apply_refspecs(), which is always a non-const allocated string.

And then on the caller side in ref-filter.c (and this is the only caller
at all), we just need to avoid the extra duplication when the return
value is non-NULL.

This clears up one case that LSan finds in t6300, but there are more.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>ref-filter: fix leak with %(describe) arguments</title>
<updated>2024-09-09T23:26:10Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2024-09-09T23:19:02Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=ec007cde941226ed35434ea6443ad1ba066279cf'/>
<id>urn:sha1:ec007cde941226ed35434ea6443ad1ba066279cf</id>
<content type='text'>
When we parse a %(describe) placeholder, we stuff its arguments into a
strvec, which is then detached into the used_atom struct. But later,
when ref_array_clear() frees the atom, we never free the memory.

To solve this, we just need to add the appropriate free() calls. But
it's a little awkward, since we have to free each element of the array,
in addition to the array itself. Instead, let's store the actual strvec,
which lets us do a simple strvec_clear().

This clears up one case that LSan finds in t6300, but there are more.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
