<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/reftable/stack.c, branch v2.47.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.47.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.47.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2024-09-24T16:45:25Z</updated>
<entry>
<title>reftable/stack: allow locking of outdated stacks</title>
<updated>2024-09-24T16:45:25Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-09-24T05:33:05Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=80e7342ea8ecda48bdf034e77c32ac1c5d2bda85'/>
<id>urn:sha1:80e7342ea8ecda48bdf034e77c32ac1c5d2bda85</id>
<content type='text'>
In `reftable_stack_new_addition()` we first lock the stack and then
check whether it is still up-to-date. If it is not we return an error to
the caller indicating that the stack is outdated.

This is overly restrictive in our ref transaction interface though: we
lock the stack right before we start to verify the transaction, so we do
not really care whether it is outdated or not. What we really want is
that the stack is up-to-date after it has been locked so that we can
verify queued updates against its current state while we know that it is
locked for concurrent modification.

Introduce a new flag `REFTABLE_STACK_NEW_ADDITION_RELOAD` that alters
the behaviour of `reftable_stack_init_addition()` in this case: when we
notice that it is out-of-date we reload it instead of returning an error
to the caller.

This logic will be wired up in the reftable backend in the next commit.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>refs/reftable: introduce "reftable.lockTimeout"</title>
<updated>2024-09-24T16:45:25Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-09-24T05:33:02Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=bc39b6a796eb707c34ffb759d50a75f96313f26c'/>
<id>urn:sha1:bc39b6a796eb707c34ffb759d50a75f96313f26c</id>
<content type='text'>
When multiple concurrent processes try to update references in a
repository they may try to lock the same lockfiles. This can happen even
when the updates are non-conflicting and can both be applied, so it
doesn't always make sense to abort the transaction immediately. Both the
"loose" and "packed" backends thus have a grace period that they wait
for the lock to be released that can be controlled via the config values
"core.filesRefLockTimeout" and "core.packedRefsTimeout", respectively.

The reftable backend doesn't have such a setting yet and instead fails
immediately when it sees such a lock. But the exact same concepts apply
here as they do apply to the other backends.

Introduce a new "reftable.lockTimeout" config that controls how long we
may wait for a "tables.list" lock to be released. The default value of
this config is 100ms, which is the same default as we have it for the
"loose" backend.

Note that even though we also lock individual tables, this config really
only applies to the "tables.list" file. This is because individual
tables are only ever locked when we already hold the "tables.list" lock
during compaction. When we observe such a lock we in fact do not want to
compact the table at all because it is already in the process of being
compacted by a concurrent process. So applying the same timeout here
would not make any sense and only delay progress.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>reftable/stack: fix segfault when reload with reused readers fails</title>
<updated>2024-08-23T15:04:48Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-08-23T14:12:57Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=85da2a2ab62e24a8b9ff183fe3a451b445632487'/>
<id>urn:sha1:85da2a2ab62e24a8b9ff183fe3a451b445632487</id>
<content type='text'>
It is expected that reloading the stack fails with concurrent writers,
e.g. because a table that we just wanted to read just got compacted.
In case we decided to reuse readers this will cause a segfault though
because we unconditionally release all new readers, including the reused
ones. As those are still referenced by the current stack, the result is
that we will eventually try to dereference those already-freed readers.

Fix this bug by incrementing the refcount of reused readers temporarily.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>reftable/stack: reorder swapping in the reloaded stack contents</title>
<updated>2024-08-23T15:04:47Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-08-23T14:12:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=1302ed68d4f5452242d47ac609b06b793a18ea0b'/>
<id>urn:sha1:1302ed68d4f5452242d47ac609b06b793a18ea0b</id>
<content type='text'>
The code flow of how we swap in the reloaded stack contents is somewhat
convoluted because we switch back and forth between swapping in
different parts of the stack.

Reorder the code to simplify it. We now first close and unlink the old
tables which do not get reused before we update the stack to point to
the new stack.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>reftable/reader: introduce refcounting</title>
<updated>2024-08-23T15:04:47Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-08-23T14:12:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=d857469d850b5e020de181ec07806872531d35e7'/>
<id>urn:sha1:d857469d850b5e020de181ec07806872531d35e7</id>
<content type='text'>
It was recently reported that concurrent reads and writes may cause the
reftable backend to segfault. The root cause of this is that we do not
properly keep track of reftable readers across reloads.

Suppose that you have a reftable iterator and then decide to reload the
stack while iterating through the iterator. When the stack has been
rewritten since we have created the iterator, then we would end up
discarding a subset of readers that may still be in use by the iterator.
The consequence is that we now try to reference deallocated memory,
which of course segfaults.

One way to trigger this is in t5616, where some background maintenance
jobs have been leaking from one test into another. This leads to stack
traces like the following one:

  + git -c protocol.version=0 -C pc1 fetch --filter=blob:limit=29999 --refetch origin
  AddressSanitizer:DEADLYSIGNAL
  =================================================================
  ==657994==ERROR: AddressSanitizer: SEGV on unknown address 0x7fa0f0ec6089 (pc 0x55f23e52ddf9 bp
0x7ffe7bfa1700 sp 0x7ffe7bfa1700 T0)
  ==657994==The signal is caused by a READ memory access.
      #0 0x55f23e52ddf9 in get_var_int reftable/record.c:29
      #1 0x55f23e53295e in reftable_decode_keylen reftable/record.c:170
      #2 0x55f23e532cc0 in reftable_decode_key reftable/record.c:194
      #3 0x55f23e54e72e in block_iter_next reftable/block.c:398
      #4 0x55f23e5573dc in table_iter_next_in_block reftable/reader.c:240
      #5 0x55f23e5573dc in table_iter_next reftable/reader.c:355
      #6 0x55f23e5573dc in table_iter_next reftable/reader.c:339
      #7 0x55f23e551283 in merged_iter_advance_subiter reftable/merged.c:69
      #8 0x55f23e55169e in merged_iter_next_entry reftable/merged.c:123
      #9 0x55f23e55169e in merged_iter_next_void reftable/merged.c:172
      #10 0x55f23e537625 in reftable_iterator_next_ref reftable/generic.c:175
      #11 0x55f23e2cf9c6 in reftable_ref_iterator_advance refs/reftable-backend.c:464
      #12 0x55f23e2d996e in ref_iterator_advance refs/iterator.c:13
      #13 0x55f23e2d996e in do_for_each_ref_iterator refs/iterator.c:452
      #14 0x55f23dca6767 in get_ref_map builtin/fetch.c:623
      #15 0x55f23dca6767 in do_fetch builtin/fetch.c:1659
      #16 0x55f23dca6767 in fetch_one builtin/fetch.c:2133
      #17 0x55f23dca6767 in cmd_fetch builtin/fetch.c:2432
      #18 0x55f23dba7764 in run_builtin git.c:484
      #19 0x55f23dba7764 in handle_builtin git.c:741
      #20 0x55f23dbab61e in run_argv git.c:805
      #21 0x55f23dbab61e in cmd_main git.c:1000
      #22 0x55f23dba4781 in main common-main.c:64
      #23 0x7fa0f063fc89 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
      #24 0x7fa0f063fd44 in __libc_start_main_impl ../csu/libc-start.c:360
      #25 0x55f23dba6ad0 in _start (git+0xadfad0) (BuildId: 803b2b7f59beb03d7849fb8294a8e2145dd4aa27)

While it is somewhat awkward that the maintenance processes survive
tests in the first place, it is totally expected that reftables should
work alright with concurrent writers. Seemingly they don't.

The only underlying resource that we need to care about in this context
is the reftable reader, which is responsible for reading a single table
from disk. These readers get discarded immediately (unless reused) when
calling `reftable_stack_reload()`, which is wrong. We can only close
them once we know that there are no iterators using them anymore.

Prepare for a fix by converting the reftable readers to be refcounted.

Reported-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>reftable/reader: inline `reader_close()`</title>
<updated>2024-08-23T15:04:47Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-08-23T14:12:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=00e130a6bb3535de2a1a5f96640a8723fef09c87'/>
<id>urn:sha1:00e130a6bb3535de2a1a5f96640a8723fef09c87</id>
<content type='text'>
Same as with the preceding commit, we also provide a `reader_close()`
function that allows the caller to close a reader without freeing it.
This is unnecessary now that all users will have an allocated version of
the reader.

Inline it into `reftable_reader_free()`.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>reftable/reader: rename `reftable_new_reader()`</title>
<updated>2024-08-23T15:04:46Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-08-23T14:12:34Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=a0218203cdbcc4feb494074ec3d95c7c52670d09'/>
<id>urn:sha1:a0218203cdbcc4feb494074ec3d95c7c52670d09</id>
<content type='text'>
Rename the `reftable_new_reader()` function to `reftable_reader_new()`
to match our coding guidelines.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>reftable/stack: inline `stack_compact_range_stats()`</title>
<updated>2024-08-23T15:04:46Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-08-23T14:12:32Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=a52bac9ac0c6eac60244e902e4b65e9ddab066aa'/>
<id>urn:sha1:a52bac9ac0c6eac60244e902e4b65e9ddab066aa</id>
<content type='text'>
The only difference between `stack_compact_range_stats()` and
`stack_compact_range()` is that the former updates stats on failure,
whereas the latter doesn't. There are no callers anymore that do not
want their stats updated though, making the indirection unnecessary.

Inline the stat updates into `stack_compact_range()`.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>reftable/generic: drop interface</title>
<updated>2024-08-22T14:59:48Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-08-22T06:35:29Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=6014639837a8f118513563e6d9e3ff404e470b31'/>
<id>urn:sha1:6014639837a8f118513563e6d9e3ff404e470b31</id>
<content type='text'>
The `reftable_table` interface provides a generic infrastructure that
can abstract away whether the underlying table is a single table, or a
merged table. This abstraction can make it rather hard to reason about
the code. We didn't ever use it to implement the reftable backend, and
with the preceding patches in this patch series we in fact don't use it
at all anymore. Furthermore, it became somewhat useless with the recent
refactorings that made it possible to seek reftable iterators multiple
times, as these now provide generic access to tables for us. The
interface is thus redundant and only brings unnecessary complexity with
it.

Remove the `struct reftable_table` interface and its associated
functions.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>t/helper: inline `reftable_stack_print_directory()`</title>
<updated>2024-08-22T14:59:47Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-08-22T06:35:12Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=ca74ef6ffb7388d862379d5016282340aff1f68b'/>
<id>urn:sha1:ca74ef6ffb7388d862379d5016282340aff1f68b</id>
<content type='text'>
Move `reftable_stack_print_directory()` into the "dump-reftable" helper.
This follows the same reasoning as the preceding commit.

Note that this requires us to remove the tests for this functionality in
`reftable/stack_test.c`. The test does not really add much anyway,
because all it verifies is that we do not crash or run into an error,
and it specifically doesn't check the outputted data. Also, as the code
is now part of the test helper, it doesn't make much sense to have a
unit test for it in the first place.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
