<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/diff-lib.c, branch v2.50.0</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.50.0</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.50.0'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2025-03-10T20:16:20Z</updated>
<entry>
<title>hash: stop depending on `the_repository` in `null_oid()`</title>
<updated>2025-03-10T20:16:20Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2025-03-10T07:13:31Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=7d70b29c4f0b2fd3c6698956d9fb4026632d9c6e'/>
<id>urn:sha1:7d70b29c4f0b2fd3c6698956d9fb4026632d9c6e</id>
<content type='text'>
The `null_oid()` function returns the object ID that only consists of
zeroes. Naturally, this ID also depends on the hash algorithm used, as
the number of zeroes is different between SHA1 and SHA256. Consequently,
the function returns the hash-algorithm-specific null object ID.

This is currently done by depending on `the_hash_algo`, which implicitly
makes us depend on `the_repository`. Refactor the function to instead
pass in the hash algorithm for which we want to retrieve the null object
ID. Adapt callsites accordingly by passing in `the_repository`, thus
bubbling up the dependency on that global variable by one layer.

There are a couple of trivial exceptions for subsystems that already got
rid of `the_repository`. These subsystems instead use the repository
that is available via the calling context:

  - "builtin/grep.c"
  - "grep.c"
  - "refs/debug.c"

There are also two non-trivial exceptions:

  - "diff-no-index.c": Here we know that we may not have a repository
    initialized at all, so we cannot rely on `the_repository`. Instead,
    we adapt `diff_no_index()` to get a `struct git_hash_algo` as
    parameter. The only caller is located in "builtin/diff.c", where we
    know to call `repo_set_hash_algo()` in case we're running outside of
    a Git repository. Consequently, it is fine to continue passing
    `the_repository-&gt;hash_algo` even in this case.

  - "builtin/ls-files.c": There is an in-flight patch series that drops
    `USE_THE_REPOSITORY_VARIABLE` in this file, which causes a semantic
    conflict because we use `null_oid()` in `show_submodule()`. The
    value is passed to `repo_submodule_init()`, which may use the object
    ID to resolve a tree-ish in the superproject from which we want to
    read the submodule config. As such, the object ID should refer to an
    object in the superproject, and consequently we need to use its hash
    algorithm.

    This means that we could in theory just not bother about this edge
    case at all and just use `the_repository` in "diff-no-index.c". But
    doing so would feel misdesigned.

Remove the `USE_THE_REPOSITORY_VARIABLE` preprocessor define in
"hash.c".

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>run_diff_files(): de-mystify the size of combine_diff_path struct</title>
<updated>2025-01-09T20:24:24Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2025-01-09T08:44:21Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=ca3abe41d71c4789ca00cba0ca2b6c22d67f08a3'/>
<id>urn:sha1:ca3abe41d71c4789ca00cba0ca2b6c22d67f08a3</id>
<content type='text'>
We allocate a combine_diff_path struct with space for 5 parents. Why 5?

The history is not particularly enlightening. The allocation comes from
b4b1550315 (Don't instantiate structures with FAMs., 2006-06-18), which
just switched to xmalloc from a stack struct with 5 elements. That
struct changed to 5 from 4 in 2454c962fb (combine-diff: show mode
changes as well., 2006-02-06), when we also moved from storing raw sha1
bytes to the combine_diff_parent struct. But no explanation is given.
That 4 comes from the earliest code in ea726d02e9 (diff-files: -c and
--cc options., 2006-01-28).

One might guess it is for the 4 stages we can store in the index. But
this code path only ever diffs the current state against stages 2 and 3.
So we only need two slots.

And it's easy to see this is still the case. We fill the parent slots by
subtracting 2 from the ce_stage() values, ignoring values below 2. And
since ce_stage() is only 2 bits, there are 4 values, and thus we need 2
slots.

Let's use the correct value (saving a tiny bit of memory) and add a
comment explaining what's going on (saving a tiny bit of programmer
brain power).

Arguably we could use:

  1 + (STAGEMASK &gt;&gt; STAGESHIFT) - 2

which lets the compiler enforce that we will not go out-of-bounds if we
see an unexpected value from ce_stage(). But that is more confusing to
explain, and the constant "2" is baked into other parts of the function.
It is a fundamental constant, not something where somebody might bump a
macro and forget to update this code.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>combine-diff: add combine_diff_path_new()</title>
<updated>2025-01-09T17:57:44Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2025-01-09T08:32:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=706779344155823518745a19515601905877c41f'/>
<id>urn:sha1:706779344155823518745a19515601905877c41f</id>
<content type='text'>
The combine_diff_path struct has variable size, since it embeds both the
memory allocation for the path field as well as a variable-sized parent
array. This makes allocating one a bit tricky.

We have a helper to compute the required size, but it's up to individual
sites to actually initialize all of the fields. Let's provide a
constructor function to make that a little nicer. Besides being shorter,
it also hides away tricky bits like the computation of the "path"
pointer (which is right after the "parent" flex array).

As a bonus, using the same constructor everywhere means that we'll
consistently initialize all parts of the struct. A few code paths left
the parent array unitialized. This didn't cause any bugs, but we'll be
able to simplify some code in the next few patches knowing that the
parent fields have all been zero'd.

This also gets rid of some questionable uses of "int" to store buffer
lengths. Though we do use them to allocate, I don't think there are any
integer overflow vulnerabilities here (the allocation helper promotes
them to size_t and checks arithmetic for overflow, and the actual memcpy
of the bytes is done using the possibly-truncated "int" value).

Sadly we can't use the FLEX_* macros to simplify the allocation here,
because there are two variable-sized parts to the struct (and those
macros only handle one).

Nor can we get stop publicly declaring combine_diff_path_size(). This
patch does not touch the code in path_appendnew() at all, which is not
ready to be moved to our new constructor for a few reasons:

  - path_appendnew() has a memory-reuse optimization where it tries to
    reuse combine_diff_path structs rather than freeing and
    reallocating.

  - path_appendnew() does not create the struct from a single path
    string, but rather allocates and copies into the buffer from
    multiple sources.

These can be addressed by some refactoring, but let's leave it as-is for
now.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>run_diff_files(): delay allocation of combine_diff_path</title>
<updated>2025-01-09T17:56:28Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2025-01-09T08:28:18Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=949bb8f74f6db7405d6ad8bbf02ebc42a947801d'/>
<id>urn:sha1:949bb8f74f6db7405d6ad8bbf02ebc42a947801d</id>
<content type='text'>
While looping over the index entries, when we see a higher level stage
the first thing we do is allocate a combine_diff_path struct for it. But
this can leak; if check_removed() returns an error, we'll continue to
the next iteration of the loop without cleaning up.

We can fix this by just delaying the allocation by a few lines.

I don't think this leak is triggered in the test suite, but it's pretty
easy to see by inspection. My ulterior motive here is that the delayed
allocation means we have all of the data needed to initialize "dpath" at
the time of malloc, making it easier to factor out a constructor
function.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>global: mark code units that generate warnings with `-Wsign-compare`</title>
<updated>2024-12-06T11:20:02Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-12-06T10:27:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=41f43b8243f42b9df2e98be8460646d4c0100ad3'/>
<id>urn:sha1:41f43b8243f42b9df2e98be8460646d4c0100ad3</id>
<content type='text'>
Mark code units that generate warnings with `-Wsign-compare`. This
allows for a structured approach to get rid of all such warnings over
time in a way that can be easily measured.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>diff-lib: fix leaking diffopts in `do_diff_cache()`</title>
<updated>2024-11-05T06:37:52Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-11-05T06:17:00Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=8dd3cb4b45d06959f3344f02b2d45a4ccf8207d8'/>
<id>urn:sha1:8dd3cb4b45d06959f3344f02b2d45a4ccf8207d8</id>
<content type='text'>
In `do_diff_cache()` we initialize a new `rev_info` and then overwrite
its `diffopt` with a user-provided set of options. This can leak memory
because `repo_init_revisions()` may end up allocating memory for the
`diffopt` itself depending on the configuration. And since that field is
overwritten we won't ever free it.

Plug the memory leak by releasing the diffopts before we overwrite them.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'jk/output-prefix-cleanup'</title>
<updated>2024-10-10T21:22:30Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2024-10-10T21:22:29Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=3eb4cc451ed97123ff76e183a5be8a7dc164d1f6'/>
<id>urn:sha1:3eb4cc451ed97123ff76e183a5be8a7dc164d1f6</id>
<content type='text'>
Code clean-up.

* jk/output-prefix-cleanup:
  diff: store graph prefix buf in git_graph struct
  diff: return line_prefix directly when possible
  diff: return const char from output_prefix callback
  diff: drop line_prefix_length field
  line-log: use diff_line_prefix() instead of custom helper
</content>
</entry>
<entry>
<title>diff: return const char from output_prefix callback</title>
<updated>2024-10-03T21:22:22Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2024-10-03T21:09:24Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=436728fe9d75d05fa2439f867ca2039012b86e69'/>
<id>urn:sha1:436728fe9d75d05fa2439f867ca2039012b86e69</id>
<content type='text'>
The diff_options structure has an output_prefix callback for returning a
prefix string, but it does so by returning a pointer to a strbuf.

This makes the interface awkward. There's no reason the callback should
need to use a strbuf, and it creates questions about whether the
ownership of the resulting buffer should be transferred to the caller
(it should not be, but a recent attempt to clean up this code led to a
double-free in some cases).

The one advantage we get is that the strbuf contains a ptr/len pair, so
we could in theory have a prefix with embedded NULs. But we can observe
that none of the existing callbacks would ever produce such a NUL (they
are usually just indentation or graph symbols, and even the
"--line-prefix" option takes a NUL-terminated string).

And anyway, only one caller (the one in log_tree_diff_flush) actually
looks at the strbuf length. In every other case we use a helper function
which discards the length and just returns the NUL-terminated string.

So let's just have the callback return a "const char *" pointer. It's up
to the callbacks themselves if they want to use a strbuf under the hood.
And now the caller in log_tree_diff_flush() can just use the helper
function along with everybody else. That lets us even simplify out the
function pointer check, since the helper returns an empty string
(technically this does mean we'll sometimes issue an empty fputs() call,
but I don't think this code path is hot enough to care about that).

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>diff-lib: drop unused index argument from get_stat_data()</title>
<updated>2024-08-17T16:44:41Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2024-08-17T07:29:49Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=72c9793c154edbde16085f0f04db970e082c7f0b'/>
<id>urn:sha1:72c9793c154edbde16085f0f04db970e082c7f0b</id>
<content type='text'>
The "struct index_state" parameter passed to get_stat_data() has been
unused since we stopped passing it to check_removed() in 6a044a2048
(diff-lib: fix check_removed when fsmonitor is on, 2023-09-11). We can
just drop it, which in turns lets us simplify our callers a bit.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'ps/leakfixes-more'</title>
<updated>2024-07-08T21:53:10Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2024-07-08T21:53:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=3997614c249b4b475d07c00556446d8b698d1a49'/>
<id>urn:sha1:3997614c249b4b475d07c00556446d8b698d1a49</id>
<content type='text'>
More memory leaks have been plugged.

* ps/leakfixes-more: (29 commits)
  builtin/blame: fix leaking ignore revs files
  builtin/blame: fix leaking prefixed paths
  blame: fix leaking data for blame scoreboards
  line-range: plug leaking find functions
  merge: fix leaking merge bases
  builtin/merge: fix leaking `struct cmdnames` in `get_strategy()`
  sequencer: fix memory leaks in `make_script_with_merges()`
  builtin/clone: plug leaking HEAD ref in `wanted_peer_refs()`
  apply: fix leaking string in `match_fragment()`
  sequencer: fix leaking string buffer in `commit_staged_changes()`
  commit: fix leaking parents when calling `commit_tree_extended()`
  config: fix leaking "core.notesref" variable
  rerere: fix various trivial leaks
  builtin/stash: fix leak in `show_stash()`
  revision: free diff options
  builtin/log: fix leaking commit list in git-cherry(1)
  merge-recursive: fix memory leak when finalizing merge
  builtin/merge-recursive: fix leaking object ID bases
  builtin/difftool: plug memory leaks in `run_dir_diff()`
  object-name: free leaking object contexts
  ...
</content>
</entry>
</feed>
