path: root/reftable/basics.h
Age | Commit message | Author | Files | Lines
2025-10-07 | reftable: check for trailing newline in 'tables.list' | Karthik Nayak | 1 | -3/+4
In the reftable format, the 'tables.list' file contains a newline separated list of tables. While we parse this file, we do not check or care about the last newline. Tighten the parser in `parse_names()` to return an appropriate error if the last newline is missing. This requires modification to `parse_names()` to now return the error while accepting the output as a third argument. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
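The stricter contract described above can be sketched roughly as follows. This is not the real `parse_names()`: the helper `count_names_strict()` and the error code `SKETCH_FORMAT_ERROR` are hypothetical names chosen for illustration. The sketch only shows the shape of the change, where the return value carries the error and the result moves to an output argument.

```c
#include <stddef.h>

/* Hypothetical error code for this sketch; the library defines its own. */
#define SKETCH_FORMAT_ERROR (-1)

/*
 * Count the newline-terminated names in `buf` (of length `len`).
 * Returns 0 and stores the count in `*out` on success, or
 * SKETCH_FORMAT_ERROR if the last name lacks its trailing newline.
 */
static int count_names_strict(const char *buf, size_t len, size_t *out)
{
	size_t count = 0;

	/* A non-empty list must end with '\n'. */
	if (len > 0 && buf[len - 1] != '\n')
		return SKETCH_FORMAT_ERROR;

	for (size_t i = 0; i < len; i++)
		if (buf[i] == '\n')
			count++;

	*out = count;
	return 0;
}
```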
2025-05-29 | reftable: make REFTABLE_UNUSED C99 compatible | Carlo Marcelo Arenas Belón | 1 | -0/+4
Since f93b2a0424 (reftable/basics: introduce `REFTABLE_UNUSED` annotation, 2025-02-18), the reftable library was migrated to use an internal version of `UNUSED`, which unconditionally sets a GNU __attribute__ to avoid warnings about function parameters that are not being used.

Make the definition conditional to prevent breaking the build with non-GNU compilers.

Reported-by: "Randall S. Becker" <rsbecker@nexbridge.com> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
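A conditional definition of this kind might look like the sketch below. The macro name `SKETCH_UNUSED` and the exact preprocessor condition are assumptions for illustration; the condition the library actually uses may differ.

```c
/* Sketch: only emit the GNU attribute when the compiler understands it. */
#if defined(__GNUC__)
#define SKETCH_UNUSED __attribute__((__unused__))
#else
#define SKETCH_UNUSED
#endif

/* The parameter is intentionally unused; the annotation silences the warning. */
static int always_zero(int ignored SKETCH_UNUSED)
{
	return 0;
}
```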
2025-04-07 | reftable/block: create public interface for reading blocks | Patrick Steinhardt | 1 | -7/+0
While users of the reftable library wouldn't generally require access to individual blocks in a reftable table, there are valid usecases where one may require low-level access to them. One such upcoming usecase in the Git codebase is to implement consistency checks for the reftable library where we want to verify each block individually. Create a public interface for reading blocks. The interface isn't yet complete and lacks e.g. a way to read individual records from a block. Such missing functionality will be backfilled in subsequent commits. Note that this change also requires us to expose `reftable_buf`, which is used by the `reftable_block_first_key()` function. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 | reftable: fix formatting of the license header | Patrick Steinhardt | 1 | -6/+6
The license headers used across the reftable library don't follow our typical coding style for multi-line comments. Fix it. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18 | reftable/basics: introduce `REFTABLE_UNUSED` annotation | Patrick Steinhardt | 1 | -0/+2
Introduce the `REFTABLE_UNUSED` annotation and replace all existing users of `UNUSED` in the reftable library to use the new macro instead. Note that we unconditionally define `MAYBE_UNUSED` in the exact same way, so doing so unconditionally for `REFTABLE_UNUSED` should be fine, too. Suggested-by: Toon Claes <toon@iotcl.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18 | reftable/basics: stop using `SWAP()` macro | Patrick Steinhardt | 1 | -0/+9
Stop using `SWAP()` macro in favor of an open-coded variant of it. Note that this also requires us to open-code the build assert that `SWAP()` itself uses to verify that the size of both variables matches. This is done to reduce our dependency on the Git codebase. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
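An open-coded swap with a compile-time size check could be sketched as below. The names `SKETCH_SWAP` and `SKETCH_BUILD_ASSERT_OR_ZERO` are hypothetical, and the code is an illustration of the general technique rather than the code this commit added.

```c
#include <string.h>

/*
 * Evaluates to 0 when `cond` is true. When `cond` is false the array
 * size becomes negative and compilation fails.
 */
#define SKETCH_BUILD_ASSERT_OR_ZERO(cond) \
	(sizeof(char[1 - 2 * !(cond)]) - 1)

/* Swap two variables of equal size through a temporary byte buffer. */
#define SKETCH_SWAP(a, b) do { \
	void *a_ptr_ = &(a); \
	void *b_ptr_ = &(b); \
	unsigned char tmp_[sizeof(a)]; \
	memcpy(tmp_, a_ptr_, sizeof(a)); \
	memcpy(a_ptr_, b_ptr_, sizeof(a) + \
	       SKETCH_BUILD_ASSERT_OR_ZERO(sizeof(a) == sizeof(b))); \
	memcpy(b_ptr_, tmp_, sizeof(a)); \
} while (0)
```

Passing two variables of different sizes, for example a `char` and an `int`, is rejected at build time instead of silently corrupting memory.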
2025-02-18 | reftable/basics: provide wrappers for big endian conversion | Patrick Steinhardt | 1 | -3/+73
We're using a mixture of big endian conversion functions, some provided by the reftable library and some by the Git codebase. Refactor the code so that we exclusively use reftable-provided wrappers in order to untangle us from the Git codebase. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
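As an illustration of what such wrappers do, here is a portable sketch for 32-bit values. The function names `sketch_put_be32()` and `sketch_get_be32()` are hypothetical and are not claimed to match the names introduced by this commit.

```c
#include <stdint.h>

/* Store `v` at `out` in big-endian (most significant byte first) order. */
static void sketch_put_be32(unsigned char *out, uint32_t v)
{
	out[0] = (unsigned char)(v >> 24);
	out[1] = (unsigned char)(v >> 16);
	out[2] = (unsigned char)(v >> 8);
	out[3] = (unsigned char)v;
}

/* Read a big-endian 32-bit value from `in`, independent of host byte order. */
static uint32_t sketch_get_be32(const unsigned char *in)
{
	return ((uint32_t)in[0] << 24) |
	       ((uint32_t)in[1] << 16) |
	       ((uint32_t)in[2] << 8) |
	       (uint32_t)in[3];
}
```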
2025-02-18 | reftable/basics: stop using `st_mult()` in array allocators | Patrick Steinhardt | 1 | -4/+32
We're using `st_mult()` as part of our macro helpers that allocate arrays. This is bad for two reasons:

- `st_mult()` causes us to die in case the multiplication overflows.
- `st_mult()` ties us to the Git codebase.

Refactor the code to instead detect overflows manually and return an error in such cases.

Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
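Manual overflow detection of this kind can be sketched as follows. The helper `sketch_alloc_array()` is hypothetical, and it calls malloc(3) directly for brevity, whereas the library goes through its pluggable allocators.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Allocate room for `nmemb` items of `size` bytes each. Returns NULL,
 * rather than aborting, when `nmemb * size` would overflow `size_t`.
 */
static void *sketch_alloc_array(size_t nmemb, size_t size)
{
	if (size && nmemb > SIZE_MAX / size)
		return NULL;
	return malloc(nmemb * size);
}
```

The caller sees the same NULL result for an overflow as for an out-of-memory condition and can turn either into an error code.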
2025-01-21 | reftable/basics: adjust `hash_size()` to return `uint32_t` | Patrick Steinhardt | 1 | -1/+1
The `hash_size()` function returns the number of bytes used by the hash function. Weirdly enough though, it returns a signed integer for its size even though the size obviously cannot ever be negative. The only case where it could be negative is if the function returned an error when asked for an unknown hash, but we assert(3p) instead. Adjust the type of `hash_size()` to be `uint32_t` and adapt all places that use signed integers for the hash size to follow suit. This also allows us to get rid of a couple asserts that we had which verified that the size was indeed positive, which further stresses the point that this refactoring makes sense. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-21 | reftable/basics: adjust `common_prefix_size()` to return `size_t` | Patrick Steinhardt | 1 | -1/+1
The `common_prefix_size()` function computes the length of the common prefix between two buffers. As such its return value will always be an unsigned integer, as the length cannot be negative. Regardless of that, the function returns a signed integer, which is nonsensical and causes a couple of -Wsign-compare warnings all over the place. Adjust the function to return a `size_t` instead. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
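A prefix-length computation with an unsigned return type might look like this sketch. The signature is an assumption: `sketch_common_prefix_size()` takes raw pointers and lengths for the sake of a self-contained example, which is not necessarily how the library function is declared.

```c
#include <stddef.h>

/* Return the number of leading bytes that `a` and `b` have in common. */
static size_t sketch_common_prefix_size(const char *a, size_t a_len,
					const char *b, size_t b_len)
{
	size_t max = a_len < b_len ? a_len : b_len;
	size_t i = 0;

	while (i < max && a[i] == b[i])
		i++;
	return i;
}
```

Because the result is a `size_t`, it can be compared against buffer lengths without triggering -Wsign-compare.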
2024-12-28 | reftable: fix allocation count on realloc error | René Scharfe | 1 | -13/+26
When realloc(3) fails, it returns NULL and keeps the original allocation intact. REFTABLE_ALLOC_GROW overwrites both the original pointer and the allocation count variable in that case, simultaneously leaking the original allocation and misrepresenting the number of storable items.

parse_names() avoids the leak by keeping the original pointer if reallocation fails, but still increases the allocation count in such a case as if it succeeded. That's OK, because the error handling code just frees everything and doesn't look at names_cap anymore.

reftable_buf_add() does the same, but here it is a problem as it leaves the reftable_buf in a broken state, with ->alloc being roughly twice as big as the actually allocated memory, allowing out-of-bounds writes in subsequent calls.

Reimplement REFTABLE_ALLOC_GROW to avoid leaks, keep allocation counts in sync and still signal failures to callers while avoiding code duplication in callers. Make it an expression that evaluates to 0 if no reallocation is needed or it succeeded and 1 on failure while keeping the original pointer and allocation counter values.

Adjust REFTABLE_ALLOC_GROW_OR_NULL to the new calling convention for REFTABLE_ALLOC_GROW, but keep its support for non-size_t alloc variables for now.

Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
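The calling convention described above (0 on success or no-op, 1 on failure, with pointer and counter left untouched on failure) can be sketched as a plain function. The real REFTABLE_ALLOC_GROW is a macro; `sketch_alloc_grow()` is a hypothetical stand-in that uses realloc(3) directly and assumes a non-zero element size.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Make sure `*ptr` can hold at least `need` items of `elem_size` bytes.
 * Returns 0 if no growth was needed or growth succeeded, 1 on failure.
 * On failure, `*ptr` and `*alloc` keep their original values, so the
 * caller neither leaks memory nor sees a wrong capacity.
 */
static int sketch_alloc_grow(void **ptr, size_t need, size_t *alloc,
			     size_t elem_size)
{
	size_t new_alloc;
	void *new_ptr;

	if (need <= *alloc)
		return 0;

	new_alloc = 2 * *alloc + 1;
	if (new_alloc < need)
		new_alloc = need;
	if (new_alloc > SIZE_MAX / elem_size)
		return 1;

	new_ptr = realloc(*ptr, new_alloc * elem_size);
	if (!new_ptr)
		return 1;

	/* Only commit the new state once the reallocation has succeeded. */
	*ptr = new_ptr;
	*alloc = new_alloc;
	return 0;
}
```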
2024-12-28 | reftable: avoid leaks on realloc error | René Scharfe | 1 | -0/+10
When realloc(3) fails, it returns NULL and keeps the original allocation intact. REFTABLE_ALLOC_GROW overwrites both the original pointer and the allocation count variable in that case, simultaneously leaking the original allocation and misrepresenting the number of storable items. parse_names() and reftable_buf_add() avoid leaking by restoring the original pointer value on failure, but all other callers seem to be OK with losing the old allocation. Add a new variant of the macro, REFTABLE_ALLOC_GROW_OR_NULL, which plugs the leak and zeros the allocation counter. Use it for those callers. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-19 | reftable/system: stop depending on "hash.h" | Patrick Steinhardt | 1 | -1/+1
We include "hash.h" in "reftable/system.h" such that we can use hash format IDs as well as the raw size of SHA1 and SHA256. As we are in the process of converting the reftable library to become standalone we of course cannot rely on those constants anymore. Introduce a new `enum reftable_hash` to replace internal uses of the hash format IDs and new constants that replace internal uses of the hash size. Adapt the reftable backend to set up the correct hash function. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-19 | reftable: explicitly handle hash format IDs | Patrick Steinhardt | 1 | -0/+8
The hash format IDs are used for two different things across the reftable codebase:

- They are used as a 32 bit unsigned integer when reading and writing the header in order to identify the hash function.
- They are used internally to identify which hash function is in use.

When one only considers the second usecase one might think that one can easily change the representation of those hash IDs. But because those IDs end up in the reftable header and footer on disk it is important that those never change.

Create separate constants `REFTABLE_FORMAT_ID_*` and use them in contexts where we read or write reftable headers. This serves multiple purposes:

- It allows us to more easily discern cases where we actually use those constants for the on-disk format.
- It detangles us from the same constants that are defined in libgit.a, which is another required step to convert the reftable library to become standalone.
- It makes the next step easier where we stop using `GIT_*_FORMAT_ID` constants in favor of a custom enum.

Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
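The separation between on-disk IDs and the in-memory representation can be sketched as below. All names with a `SKETCH_`/`sketch_` prefix are hypothetical, and the numeric values are assumptions used for illustration: they are the ASCII bytes "sha1" and "s256" read as 32-bit big-endian integers.

```c
#include <stdint.h>

/* On-disk identifiers: these must never change once written to files. */
#define SKETCH_FORMAT_ID_SHA1   ((uint32_t)0x73686131) /* "sha1" */
#define SKETCH_FORMAT_ID_SHA256 ((uint32_t)0x73323536) /* "s256" */

/* In-memory representation: free to be renumbered at any time. */
enum sketch_hash {
	SKETCH_HASH_SHA1 = 1,
	SKETCH_HASH_SHA256 = 2,
};

/*
 * Translate an ID read from a table header into the internal enum.
 * Returns 0 on success and -1 for an unknown format ID.
 */
static int sketch_hash_from_format_id(uint32_t id, enum sketch_hash *out)
{
	switch (id) {
	case SKETCH_FORMAT_ID_SHA1:
		*out = SKETCH_HASH_SHA1;
		return 0;
	case SKETCH_FORMAT_ID_SHA256:
		*out = SKETCH_HASH_SHA256;
		return 0;
	default:
		return -1;
	}
}
```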
2024-10-17 | reftable: convert from `strbuf` to `reftable_buf` | Patrick Steinhardt | 1 | -3/+4
Convert the reftable library to use the `reftable_buf` interface instead of the `strbuf` interface. This is mostly a mechanical change via sed(1) with some manual fixes where functions for `strbuf` and `reftable_buf` differ. The converted code does not yet handle allocation failures. This will be handled in subsequent commits. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-17 | reftable/basics: provide new `reftable_buf` interface | Patrick Steinhardt | 1 | -0/+56
Implement a new `reftable_buf` interface that will replace Git's own `strbuf` interface. This is done for three reasons:

- The `strbuf` interface does not handle memory allocation failures and instead causes us to die. This is okay in the context of Git, but is not in the context of the reftable library, which is supposed to be usable by third-party applications.
- The `strbuf` interface is quite deeply tied into Git, which makes it hard to use the reftable library as a standalone library. Any dependent would have to carefully extract the relevant parts of it to make things work, which is not all that sensible.
- The `strbuf` interface does not use the pluggable allocators that can be set up via `reftable_set_alloc()`.

So we have good reasons to use our own type, and the implementation is rather trivial. Implement our own type. Conversion of the reftable library will be handled in subsequent commits.

Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Taylor Blau <me@ttaylorr.com>
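To illustrate the first point, a growable buffer that reports allocation failure to its caller could look like the sketch below. `struct sketch_buf` and `sketch_buf_add()` are hypothetical; they are not the `reftable_buf` type or its functions, and they call realloc(3) directly instead of a pluggable allocator.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct sketch_buf {
	char *data;   /* NUL-terminated contents, or NULL when empty */
	size_t len;   /* number of bytes in use, excluding the NUL */
	size_t alloc; /* number of bytes allocated */
};

/*
 * Append `len` bytes from `src`. Returns 0 on success and -1 on
 * allocation failure, in which case the buffer is left unchanged.
 */
static int sketch_buf_add(struct sketch_buf *b, const void *src, size_t len)
{
	size_t need;

	if (len > SIZE_MAX - b->len - 1)
		return -1;
	need = b->len + len + 1; /* +1 for the trailing NUL */

	if (need > b->alloc) {
		char *p = realloc(b->data, need);
		if (!p)
			return -1;
		b->data = p;
		b->alloc = need;
	}

	memcpy(b->data + b->len, src, len);
	b->len += len;
	b->data[b->len] = '\0';
	return 0;
}
```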
2024-10-02 | reftable/basics: ban standard allocator functions | Patrick Steinhardt | 1 | -0/+14
The reftable library uses pluggable allocators, which means that we shouldn't ever use the standard allocator functions. But it is an easy mistake to make to accidentally use e.g. free(3P) instead of the reftable-specific `reftable_free()` function, and we do not have any mechanism to detect this misuse right now.

Introduce a couple of macros that ban the standard allocators, similar to how we do it in "banned.h". Note that we do not ban the following two classes of functions:

- Macros like `FREE_AND_NULL()` or `REALLOC_ARRAY()`. As those expand to code that contains already-banned functions we'd get a compiler error even without banning those macros explicitly.
- Git-specific allocators like `xmalloc()` and friends. The primary reason is that there are simply too many of them, so we're rather aiming for best effort here. Furthermore, the eventual goal is to make them unavailable in the reftable library by not pulling them in via "git-compat-utils.h" anymore.

Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
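The banning technique can be sketched with function-like macros that expand to an undeclared identifier. The names `SKETCH_BANNED`, `sketch_malloc()` and `sketch_free()` are hypothetical; the wrappers are defined before the ban so that they are still allowed to call libc.

```c
#include <stdlib.h>

/* Hypothetical wrappers, defined before the ban takes effect. */
static void *sketch_malloc(size_t sz)
{
	return malloc(sz);
}

static void sketch_free(void *p)
{
	free(p);
}

/*
 * From here on, a direct call such as `free(p)` expands to an
 * undeclared identifier and therefore fails to compile.
 */
#define SKETCH_BANNED(func) sorry_##func##_is_banned_in_this_sketch

#undef malloc
#define malloc(sz) SKETCH_BANNED(malloc)
#undef free
#define free(p) SKETCH_BANNED(free)
```

Code placed after these definitions has to go through `sketch_malloc()` and `sketch_free()`, which is the property the commit is after.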
2024-10-02 | reftable: introduce `REFTABLE_FREE_AND_NULL()` | Patrick Steinhardt | 1 | -0/+1
We have several calls to `FREE_AND_NULL()` in the reftable library, which of course uses free(3P). As the reftable allocators are pluggable we should rather call the reftable specific function, which is `reftable_free()`. Introduce a new macro `REFTABLE_FREE_AND_NULL()` and adapt the callsites accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-10-02 | reftable/basics: handle allocation failures in `parse_names()` | Patrick Steinhardt | 1 | -3/+6
Handle allocation failures in `parse_names()` by returning `NULL` in case any allocation fails. While at it, refactor the function to return the array directly instead of assigning it to an out-pointer. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-10-02 | reftable: introduce `reftable_strdup()` | Patrick Steinhardt | 1 | -0/+1
The reftable library provides the ability to swap out allocators. There is a gap here though, because we continue to use `xstrdup()` even in the case where all the other allocators have been swapped out. Introduce `reftable_strdup()` that uses `reftable_malloc()` to do the allocation. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
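A string duplication helper that honors a swappable allocator might be sketched like this. The hook `sketch_malloc_fn` and the function `sketch_strdup()` are hypothetical; the library routes the allocation through `reftable_malloc()` instead.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical allocator hook; defaults to malloc(3) but can be replaced. */
static void *(*sketch_malloc_fn)(size_t) = malloc;

/* Duplicate `s` using the configured allocator; returns NULL on failure. */
static char *sketch_strdup(const char *s)
{
	size_t len = strlen(s) + 1; /* include the trailing NUL */
	char *copy = sketch_malloc_fn(len);

	if (!copy)
		return NULL;
	memcpy(copy, s, len);
	return copy;
}
```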
2024-10-02 | reftable/basics: merge "publicbasics" into "basics" | Patrick Steinhardt | 1 | -0/+3
The split between "basics" and "publicbasics" is somewhat arbitrary and not in line with how we typically structure code in the reftable library. While we do indeed split up headers into a public and internal part, we don't do that for the compilation unit itself. Furthermore, the declarations for "publicbasics.c" are in "reftable-malloc.h", which isn't in line with our naming schema, either.

Fix these inconsistencies by:

- Merging "publicbasics.c" into "basics.c".
- Renaming "reftable-malloc.h" to "reftable-basics.h" as the public header.

Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-07 | global: improve const correctness when assigning string constants | Patrick Steinhardt | 1 | -2/+2
We're about to enable `-Wwrite-strings`, which changes the type of string constants to `const char[]`. Fix various sites where we assign such constants to non-const variables. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-04-03 | reftable/block: fix error handling when searching restart points | Patrick Steinhardt | 1 | -2/+3
When doing the binary search over restart points in a block we need to decode the record keys. This decoding step can result in an error when the block is corrupted, which we indicate to the caller of the binary search by setting `args.error = 1`. But the only caller that exists mishandles this because it in fact performs the error check before calling `binsearch()`. Fix this bug by checking for errors at the right point in time. Furthermore, refactor `binsearch()` so that it aborts the search in case the callback function returns a negative value so that we don't needlessly continue to search the block. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
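The search contract described here (find the first index for which the callback returns a positive value, and stop early when it returns a negative one) can be sketched as follows. This is an illustration and not the library's `binsearch()`; the `sketch_` names and the convention of treating negative array entries as "corrupt" are assumptions made for the example.

```c
#include <stddef.h>

/*
 * Return the first index in [0, sz) for which `f` returns a positive
 * value, or `sz` if there is none. A negative return from `f` aborts
 * the search; the caller then has to inspect its own error flag.
 */
static size_t sketch_binsearch(size_t sz, int (*f)(size_t idx, void *args),
			       void *args)
{
	size_t lo = 0, hi = sz;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		int ret = f(mid, args);

		if (ret < 0)
			return sz;
		if (ret > 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return lo;
}

struct sketch_search_args {
	const int *values;
	int needle;
	int error;
};

/* Example callback: negative entries stand in for a corrupted block. */
static int sketch_at_least(size_t idx, void *p)
{
	struct sketch_search_args *args = p;

	if (args->values[idx] < 0) {
		args->error = 1;
		return -1;
	}
	return args->values[idx] >= args->needle;
}
```

The important part for callers is the ordering: the error flag is only meaningful after the search has returned.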
2024-04-03 | reftable/basics: fix return type of `binsearch()` to be `size_t` | Patrick Steinhardt | 1 | -1/+1
The `binsearch()` function can be used to find the first element for which a callback function returns a truthy value. But while the array size is of type `size_t`, the function in fact returns an `int` that is supposed to index into that array. Fix the function signature to return a `size_t`. This conversion does not change any semantics given that the function would only ever return a value in the range `[0, sz]` anyway. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-06 | reftable/stack: use `size_t` to track stack length | Patrick Steinhardt | 1 | -1/+1
While the stack length is already stored as `size_t`, we frequently use `int`s to refer to those stacks throughout the reftable library. Convert those cases to use `size_t` instead to make things consistent. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-06 | reftable: introduce macros to allocate arrays | Patrick Steinhardt | 1 | -1/+3
Similar to the preceding commit, let's carry over macros to allocate arrays with `REFTABLE_ALLOC_ARRAY()` and `REFTABLE_CALLOC_ARRAY()`. This requires us to change the signature of `reftable_calloc()`, which only takes a single argument right now and thus puts the burden on the caller to calculate the final array's size. This is a net improvement though as it means that we can now provide proper overflow checks when multiplying the array size with the member size. Convert callsites of `reftable_calloc()` to the new signature and start using the new macros where possible. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-06 | reftable: introduce macros to grow arrays | Patrick Steinhardt | 1 | -0/+11
Throughout the reftable library we have many cases where we need to grow arrays. In order to avoid too many reallocations, we roughly double the capacity of the array on each iteration. The resulting code pattern is duplicated across many sites.

We have similar patterns in our main codebase, which is why we have eventually introduced an `ALLOC_GROW()` macro to abstract it away and avoid some code duplication. We cannot easily reuse this macro here though because `ALLOC_GROW()` uses `REALLOC_ARRAY()`, which in turn will call realloc(3P) to grow the array. The reftable code is structured as a library though (even if the boundaries are fuzzy), and one property this brings with it is that it is possible to plug in your own allocators. So instead of using realloc(3P), we need to use `reftable_realloc()` that knows to use the user-provided implementation.

So let's introduce two new macros `REFTABLE_REALLOC_ARRAY()` and `REFTABLE_ALLOC_GROW()` that mirror what we do in our main codebase, with two modifications:

- They use `reftable_realloc()`, as explained above.
- They use a different growth factor of `2 * cap + 1` instead of `(cap + 16) * 3 / 2`.

The second change is because we know a bit more about the allocation patterns in the reftable library. In most cases, we end up only having a handful of items in the array and don't end up growing them. The initial capacity that our normal growth factor uses (which is 24) would thus end up over-allocating in a lot of code paths.

This effect is measurable:

- Before change:

      HEAP SUMMARY:
          in use at exit: 671,983 bytes in 152 blocks
        total heap usage: 3,843,446 allocs, 3,843,294 frees, 223,761,402 bytes allocated

- After change with a growth factor of `(2 * alloc + 1)`:

      HEAP SUMMARY:
          in use at exit: 671,983 bytes in 152 blocks
        total heap usage: 3,843,446 allocs, 3,843,294 frees, 223,761,410 bytes allocated

- After change with a growth factor of `(alloc + 16) * 3 / 2`:

      HEAP SUMMARY:
          in use at exit: 671,983 bytes in 152 blocks
        total heap usage: 3,833,673 allocs, 3,833,521 frees, 4,728,251,742 bytes allocated

While the total heap usage is roughly the same, we do end up allocating significantly more bytes with our usual growth factor (in fact, roughly 21 times as many).

Convert the reftable library to use these new macros.

Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
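The difference between the two growth factors is easiest to see for an empty array. The sketch below only restates the two formulas from the message above; the function names are hypothetical.

```c
#include <stddef.h>

/* Growth step used by the reftable macros: roughly doubles the capacity. */
static size_t sketch_grow_reftable(size_t cap)
{
	return 2 * cap + 1;
}

/* Growth step used in the main codebase. */
static size_t sketch_grow_git(size_t cap)
{
	return (cap + 16) * 3 / 2;
}
```

Starting from a capacity of 0, the first formula yields 1, 3, 7 and so on, while the second jumps straight to 24, which is the over-allocation the message refers to for arrays that only ever hold a handful of items.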
2021-10-08 | reftable: utility functions | Han-Wen Nienhuys | 1 | -0/+60
This commit provides basic utility classes for the reftable library. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>