<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/reftable/reader.c, branch v2.47.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.47.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.47.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2024-09-16T20:57:19Z</updated>
<entry>
<title>reftable/reader: make table iterator reseekable</title>
<updated>2024-09-16T20:57:19Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-09-16T08:50:14Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=0a148a8eda7da6f7ff039ff2ea62b143ceb57f2f'/>
<id>urn:sha1:0a148a8eda7da6f7ff039ff2ea62b143ceb57f2f</id>
<content type='text'>
In 67ce50ba26 (Merge branch 'ps/reftable-reusable-iterator', 2024-05-30)
we have refactored the interface of reftable iterators such that they
can be reused in theory. This patch series only landed the required
changes on the interface level, but didn't yet implement the actual
logic to make iterators reusable.

As it turns out almost all of the infrastructure already does support
re-seeking. The only exception is the table iterator, which does not
reset its `is_finished` bit. Do so and add a couple of tests that verify
that we can re-seek iterators.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>reftable/reader: keep readers alive during iteration</title>
<updated>2024-08-23T15:04:47Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-08-23T14:12:51Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=89eada4ea1f7a9c0a5f9b9e29592daa0847a79fc'/>
<id>urn:sha1:89eada4ea1f7a9c0a5f9b9e29592daa0847a79fc</id>
<content type='text'>
The lifetime of a table iterator may survive the lifetime of a reader
when the stack gets reloaded. Keep the reader from being released by
increasing its refcount while the iterator is still being used.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>reftable/reader: introduce refcounting</title>
<updated>2024-08-23T15:04:47Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-08-23T14:12:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=d857469d850b5e020de181ec07806872531d35e7'/>
<id>urn:sha1:d857469d850b5e020de181ec07806872531d35e7</id>
<content type='text'>
It was recently reported that concurrent reads and writes may cause the
reftable backend to segfault. The root cause of this is that we do not
properly keep track of reftable readers across reloads.

Suppose that you have a reftable iterator and then decide to reload the
stack while iterating through the iterator. When the stack has been
rewritten since we have created the iterator, then we would end up
discarding a subset of readers that may still be in use by the iterator.
The consequence is that we now try to reference deallocated memory,
which of course segfaults.

One way to trigger this is in t5616, where some background maintenance
jobs have been leaking from one test into another. This leads to stack
traces like the following one:

  + git -c protocol.version=0 -C pc1 fetch --filter=blob:limit=29999 --refetch origin
  AddressSanitizer:DEADLYSIGNAL
  =================================================================
  ==657994==ERROR: AddressSanitizer: SEGV on unknown address 0x7fa0f0ec6089 (pc 0x55f23e52ddf9 bp
0x7ffe7bfa1700 sp 0x7ffe7bfa1700 T0)
  ==657994==The signal is caused by a READ memory access.
      #0 0x55f23e52ddf9 in get_var_int reftable/record.c:29
      #1 0x55f23e53295e in reftable_decode_keylen reftable/record.c:170
      #2 0x55f23e532cc0 in reftable_decode_key reftable/record.c:194
      #3 0x55f23e54e72e in block_iter_next reftable/block.c:398
      #4 0x55f23e5573dc in table_iter_next_in_block reftable/reader.c:240
      #5 0x55f23e5573dc in table_iter_next reftable/reader.c:355
      #6 0x55f23e5573dc in table_iter_next reftable/reader.c:339
      #7 0x55f23e551283 in merged_iter_advance_subiter reftable/merged.c:69
      #8 0x55f23e55169e in merged_iter_next_entry reftable/merged.c:123
      #9 0x55f23e55169e in merged_iter_next_void reftable/merged.c:172
      #10 0x55f23e537625 in reftable_iterator_next_ref reftable/generic.c:175
      #11 0x55f23e2cf9c6 in reftable_ref_iterator_advance refs/reftable-backend.c:464
      #12 0x55f23e2d996e in ref_iterator_advance refs/iterator.c:13
      #13 0x55f23e2d996e in do_for_each_ref_iterator refs/iterator.c:452
      #14 0x55f23dca6767 in get_ref_map builtin/fetch.c:623
      #15 0x55f23dca6767 in do_fetch builtin/fetch.c:1659
      #16 0x55f23dca6767 in fetch_one builtin/fetch.c:2133
      #17 0x55f23dca6767 in cmd_fetch builtin/fetch.c:2432
      #18 0x55f23dba7764 in run_builtin git.c:484
      #19 0x55f23dba7764 in handle_builtin git.c:741
      #20 0x55f23dbab61e in run_argv git.c:805
      #21 0x55f23dbab61e in cmd_main git.c:1000
      #22 0x55f23dba4781 in main common-main.c:64
      #23 0x7fa0f063fc89 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
      #24 0x7fa0f063fd44 in __libc_start_main_impl ../csu/libc-start.c:360
      #25 0x55f23dba6ad0 in _start (git+0xadfad0) (BuildId: 803b2b7f59beb03d7849fb8294a8e2145dd4aa27)

While it is somewhat awkward that the maintenance processes survive
tests in the first place, it is totally expected that reftables should
work alright with concurrent writers. Seemingly they don't.

The only underlying resource that we need to care about in this context
is the reftable reader, which is responsible for reading a single table
from disk. These readers get discarded immediately (unless reused) when
calling `reftable_stack_reload()`, which is wrong. We can only close
them once we know that there are no iterators using them anymore.

Prepare for a fix by converting the reftable readers to be refcounted.

Reported-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>reftable/reader: inline `reader_close()`</title>
<updated>2024-08-23T15:04:47Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-08-23T14:12:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=00e130a6bb3535de2a1a5f96640a8723fef09c87'/>
<id>urn:sha1:00e130a6bb3535de2a1a5f96640a8723fef09c87</id>
<content type='text'>
Same as with the preceding commit, we also provide a `reader_close()`
function that allows the caller to close a reader without freeing it.
This is unnecessary now that all users will have an allocated version of
the reader.

Inline it into `reftable_reader_free()`.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>reftable/reader: inline `init_reader()`</title>
<updated>2024-08-23T15:04:46Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-08-23T14:12:37Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=2de3c0d34555685a0502e81a37436a7db41a2ddf'/>
<id>urn:sha1:2de3c0d34555685a0502e81a37436a7db41a2ddf</id>
<content type='text'>
Most users use an allocated version of the `reftable_reader`, except for
some tests. We are about to convert the reader to become refcounted
though, and providing the ability to keep a reader on the stack makes
this conversion harder than necessary.

Update the tests to use `reftable_reader_new()` instead to prepare for
this change.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>reftable/reader: rename `reftable_new_reader()`</title>
<updated>2024-08-23T15:04:46Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-08-23T14:12:34Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=a0218203cdbcc4feb494074ec3d95c7c52670d09'/>
<id>urn:sha1:a0218203cdbcc4feb494074ec3d95c7c52670d09</id>
<content type='text'>
Rename the `reftable_new_reader()` function to `reftable_reader_new()`
to match our coding guidelines.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>reftable/generic: drop interface</title>
<updated>2024-08-22T14:59:48Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-08-22T06:35:29Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=6014639837a8f118513563e6d9e3ff404e470b31'/>
<id>urn:sha1:6014639837a8f118513563e6d9e3ff404e470b31</id>
<content type='text'>
The `reftable_table` interface provides a generic infrastructure that
can abstract away whether the underlying table is a single table, or a
merged table. This abstraction can make it rather hard to reason about
the code. We didn't ever use it to implement the reftable backend, and
with the preceding patches in this patch series we in fact don't use it
at all anymore. Furthermore, it became somewhat useless with the recent
refactorings that made it possible to seek reftable iterators multiple
times, as these now provide generic access to tables for us. The
interface is thus redundant and only brings unnecessary complexity with
it.

Remove the `struct reftable_table` interface and its associated
functions.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>t/helper: inline `reftable_reader_print_file()`</title>
<updated>2024-08-22T14:59:47Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-08-22T06:35:05Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=22f519a9a058fc0cf05e4bdd7a20adee13668150'/>
<id>urn:sha1:22f519a9a058fc0cf05e4bdd7a20adee13668150</id>
<content type='text'>
Move `reftable_reader_print_file()` into the "dump-reftable" helper.
This follows the same reasoning as the preceding commit.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>reftable/iter: drop double-checking logic</title>
<updated>2024-08-22T14:59:46Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-08-22T06:34:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=b34ce7e911e315390b64f565accc6ca79e85443a'/>
<id>urn:sha1:b34ce7e911e315390b64f565accc6ca79e85443a</id>
<content type='text'>
The filtering ref iterator can be used to only yield refs which are not
in a specific skip list. This iterator has an option to double-check the
results it returns, which causes us to seek the reference we are about
to yield via a separate table such that we detect whether the reference
that the first iterator has yielded actually exists.

The value of this is somewhat dubious, and I cannot think of any usecase
where this functionality should be required. Furthermore, this option is
never set in our codebase, which means that it is essentially untested.
And last but not least, the `struct reftable_table` that is used to
implement it is about to go away.

So while we could refactor the code to not use a `reftable_table`, it
very much feels like a wasted effort. Let's just drop this code.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>reftable/merged: stop using generic tables in the merged table</title>
<updated>2024-08-22T14:59:46Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-08-22T06:34:44Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=b8ca235ca5fad0421766db7fe2c0f52c6707bf8e'/>
<id>urn:sha1:b8ca235ca5fad0421766db7fe2c0f52c6707bf8e</id>
<content type='text'>
The merged table provides access to a reftable stack by merging the
contents of those tables into a virtual table. These subtables are being
tracked via `struct reftable_table`, which is a generic interface for
accessing either a single reftable or a merged reftable. So in theory,
it would be possible for the merged table to merge together other merged
tables.

This is somewhat nonsensical though: we only ever set up a merged table
over normal reftables, and there is no reason to do otherwise. This
generic interface thus makes the code way harder to follow and reason
about than really necessary. The abstraction layer may also have an
impact on performance, even though the extra set of vtable function
calls probably doesn't really matter.

Refactor the merged tables to use a `struct reftable_reader` for each of
the subtables instead, which gives us direct access to the underlying
tables. Adjust names accordingly.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
