git/combine-diff.c, branch v2.51.2

odb: rename `repo_read_object_file()`

2025-07-01T21:46:38Z

Rename `repo_read_object_file()` to `odb_read_object()` to match other functions related to the object database and our modern coding guidelines. Introduce a compatibility wrapper so that any in-flight topics will continue to compile. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano

object-store: rename files to "odb.{c,h}"

2025-07-01T21:46:34Z

In the preceding commits we have renamed the structures contained in "object-store.h" to `struct object_database` and `struct odb_backend`. As such, the code files "object-store.{c,h}" are confusingly named now. Rename them to "odb.{c,h}" accordingly. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano

object-store: merge "object-store-ll.h" and "object-store.h"

2025-04-15T15:24:37Z

The "object-store-ll.h" header has been introduced to keep transitive header dependendcies and compile times at bay. Now that we have created a new "object-store.c" file though we can easily move the last remaining additional bit of "object-store.h", the `odb_path_map`, out of the header. Do so. As the "object-store.h" header is now equivalent to its low-level alternative we drop the latter and inline it into the former. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano

hash: stop depending on `the_repository` in `null_oid()`

2025-03-10T20:16:20Z

The `null_oid()` function returns the object ID that only consists of zeroes. Naturally, this ID also depends on the hash algorithm used, as the number of zeroes is different between SHA1 and SHA256. Consequently, the function returns the hash-algorithm-specific null object ID. This is currently done by depending on `the_hash_algo`, which implicitly makes us depend on `the_repository`. Refactor the function to instead pass in the hash algorithm for which we want to retrieve the null object ID. Adapt callsites accordingly by passing in `the_repository`, thus bubbling up the dependency on that global variable by one layer. There are a couple of trivial exceptions for subsystems that already got rid of `the_repository`. These subsystems instead use the repository that is available via the calling context: - "builtin/grep.c" - "grep.c" - "refs/debug.c" There are also two non-trivial exceptions: - "diff-no-index.c": Here we know that we may not have a repository initialized at all, so we cannot rely on `the_repository`. Instead, we adapt `diff_no_index()` to get a `struct git_hash_algo` as parameter. The only caller is located in "builtin/diff.c", where we know to call `repo_set_hash_algo()` in case we're running outside of a Git repository. Consequently, it is fine to continue passing `the_repository->hash_algo` even in this case. - "builtin/ls-files.c": There is an in-flight patch series that drops `USE_THE_REPOSITORY_VARIABLE` in this file, which causes a semantic conflict because we use `null_oid()` in `show_submodule()`. The value is passed to `repo_submodule_init()`, which may use the object ID to resolve a tree-ish in the superproject from which we want to read the submodule config. As such, the object ID should refer to an object in the superproject, and consequently we need to use its hash algorithm. This means that we could in theory just not bother about this edge case at all and just use `the_repository` in "diff-no-index.c". But doing so would feel misdesigned. Remove the `USE_THE_REPOSITORY_VARIABLE` preprocessor define in "hash.c". Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano

tree-diff: drop list-tail argument to diff_tree_paths()

2025-01-09T20:24:26Z

The internals of the path diffing code, including ll_diff_tree_paths(), all take an extra combine_diff_path parameter which they use as the tail of a list of results, appending any new entries to it. The public-facing diff_tree_paths() takes the same argument, but it just makes the callers more awkward. They always start with a clean list, and have to set up a fake head struct to pass in. Let's keep the public API clean by always returning a new list. That keeps the fake struct as an implementation detail of tree-diff.c. Signed-off-by: Jeff King Signed-off-by: Junio C Hamano

combine-diff: drop public declaration of combine_diff_path_size()

2025-01-09T20:24:26Z

We want callers to use combine_diff_path_new() to allocate structs, rather than using combine_diff_path_size() and xmalloc(). That gives us more consistency over the initialization of the fields. Now that the final external user of combine_diff_path_size() is gone, we can stop declaring it publicly. And since our constructor is the only caller, we can just inline it there. Breaking the size computation into two parts also lets us reuse the intermediate multiplication result of the parent length, since we need to know it to perform our memset(). The result is a little easier to read. Signed-off-by: Jeff King Signed-off-by: Junio C Hamano

combine-diff: use pointer for parent paths

2025-01-09T18:31:23Z

Commit d76ce4f734 (log,diff-tree: add --combined-all-paths option, 2019-02-07) added a "path" field to each combine_diff_parent struct. It's defined as a strbuf, but this is overkill. We never manipulate the buffer beyond inserting a single string into it. And in fact there's a small bug: we zero the parent structs, including the path strbufs. For the 0th parent, we strbuf_init() the strbuf before adding to it. But for subsequent parents, we never do the init. This is technically violating the strbuf API, though the code there is resilient enough to handle this zero'd state. This patch switches us to just store an allocated string pointer. Zeroing it is enough to properly initialize it there (modulo the usual assumption we make that a NULL pointer is all-zeroes). And as a bonus, we can just check for a non-NULL value to see if it is present, rather than repeating the combined_all_paths logic at each site. Signed-off-by: Jeff King Signed-off-by: Junio C Hamano

combine-diff: add combine_diff_path_new()

2025-01-09T17:57:44Z

The combine_diff_path struct has variable size, since it embeds both the memory allocation for the path field as well as a variable-sized parent array. This makes allocating one a bit tricky. We have a helper to compute the required size, but it's up to individual sites to actually initialize all of the fields. Let's provide a constructor function to make that a little nicer. Besides being shorter, it also hides away tricky bits like the computation of the "path" pointer (which is right after the "parent" flex array). As a bonus, using the same constructor everywhere means that we'll consistently initialize all parts of the struct. A few code paths left the parent array unitialized. This didn't cause any bugs, but we'll be able to simplify some code in the next few patches knowing that the parent fields have all been zero'd. This also gets rid of some questionable uses of "int" to store buffer lengths. Though we do use them to allocate, I don't think there are any integer overflow vulnerabilities here (the allocation helper promotes them to size_t and checks arithmetic for overflow, and the actual memcpy of the bytes is done using the possibly-truncated "int" value). Sadly we can't use the FLEX_* macros to simplify the allocation here, because there are two variable-sized parts to the struct (and those macros only handle one). Nor can we get stop publicly declaring combine_diff_path_size(). This patch does not touch the code in path_appendnew() at all, which is not ready to be moved to our new constructor for a few reasons: - path_appendnew() has a memory-reuse optimization where it tries to reuse combine_diff_path structs rather than freeing and reallocating. - path_appendnew() does not create the struct from a single path string, but rather allocates and copies into the buffer from multiple sources. These can be addressed by some refactoring, but let's leave it as-is for now. Signed-off-by: Jeff King Signed-off-by: Junio C Hamano

global: mark code units that generate warnings with `-Wsign-compare`

2024-12-06T11:20:02Z

Mark code units that generate warnings with `-Wsign-compare`. This allows for a structured approach to get rid of all such warnings over time in a way that can be easily measured. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano

combine-diff: fix leaking lost lines

2024-11-05T06:37:55Z

The `cnt` variable tracks the number of lines in a patch diff. It can happen though that there are no newlines, in which case we'd still end up allocating our array of `sline`s. In fact, we always allocate it with `cnt + 2` entries: one extra entry for the deletion hunk at the end, and another entry that we don't seem to ever populate at all but acts as a kind of sentinel value. When we loop through the array to clear it at the end of this function we only loop until `lno < cnt`, and thus we may not end up releasing whatever the two extra `sline`s contain. While that shouldn't matter for the sentinel value, it does matter for the extra deletion hunk sline. Regardless of that, plug this memory leak by releasing both extra entries, which makes the logic a bit easier to reason about. While at it, fix the formatting of a local comment, which incidentally also provides the necessary context for why we overallocate the `sline` array. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano