<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/reftable/readwrite_test.c, branch v2.45.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.45.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.45.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2024-03-05T17:10:06Z</updated>
<entry>
<title>reftable/record: convert old and new object IDs to arrays</title>
<updated>2024-03-05T17:10:06Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-03-05T12:10:59Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=87ff723018bfca588b5d68e110ab04494c451ebd'/>
<id>urn:sha1:87ff723018bfca588b5d68e110ab04494c451ebd</id>
<content type='text'>
In 7af607c58d (reftable/record: store "val1" hashes as static arrays,
2024-01-03) and b31e3cc620 (reftable/record: store "val2" hashes as
static arrays, 2024-01-03) we have converted ref records to store their
object IDs in a static array. Convert log records to do the same so that
their old and new object IDs are arrays, too.

This change results in two allocations less per log record that we're
iterating over. Before:

    HEAP SUMMARY:
        in use at exit: 13,473 bytes in 122 blocks
      total heap usage: 8,068,495 allocs, 8,068,373 frees, 401,011,862 bytes allocated

After:

    HEAP SUMMARY:
        in use at exit: 13,473 bytes in 122 blocks
      total heap usage: 6,068,489 allocs, 6,068,367 frees, 361,011,822 bytes allocated

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'ps/reftable-styles'</title>
<updated>2024-02-12T21:16:10Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2024-02-12T21:16:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=f424d7c33df373fb8eb4c9dc63ab6dc24de24aa5'/>
<id>urn:sha1:f424d7c33df373fb8eb4c9dc63ab6dc24de24aa5</id>
<content type='text'>
Code clean-up in various reftable code paths.

* ps/reftable-styles:
  reftable/record: improve semantics when initializing records
  reftable/merged: refactor initialization of iterators
  reftable/merged: refactor seeking of records
  reftable/stack: use `size_t` to track stack length
  reftable/stack: use `size_t` to track stack slices during compaction
  reftable/stack: index segments with `size_t`
  reftable/stack: fix parameter validation when compacting range
  reftable: introduce macros to allocate arrays
  reftable: introduce macros to grow arrays
</content>
</entry>
<entry>
<title>reftable: introduce macros to allocate arrays</title>
<updated>2024-02-06T20:10:08Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-02-06T06:35:27Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=b4ff12c8eefff9cba73ba3cb7492111adfa31d87'/>
<id>urn:sha1:b4ff12c8eefff9cba73ba3cb7492111adfa31d87</id>
<content type='text'>
Similar to the preceding commit, let's carry over macros to allocate
arrays with `REFTABLE_ALLOC_ARRAY()` and `REFTABLE_CALLOC_ARRAY()`. This
requires us to change the signature of `reftable_calloc()`, which only
takes a single argument right now and thus puts the burden on the caller
to calculate the final array's size. This is a net improvement though as
it means that we can now provide proper overflow checks when multiplying
the array size with the member size.

Convert callsites of `reftable_calloc()` to the new signature and start
using the new macros where possible.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>reftable/writer: fix writing multi-level indices</title>
<updated>2024-02-01T19:11:32Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-02-01T07:52:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=e7485601ca4da6a09c6488add181609cffec5799'/>
<id>urn:sha1:e7485601ca4da6a09c6488add181609cffec5799</id>
<content type='text'>
When finishing a section we will potentially write an index that makes
it more efficient to look up relevant blocks. The index records written
will encode, for each block of the indexed section, what the offset of
that block is as well as the last key of that block. Thus, the reader
would iterate through the index records to find the first key larger or
equal to the wanted key and then use the encoded offset to look up the
desired block.

When there are a lot of blocks to index though we may end up writing
multiple index blocks, too. To not require a linear search across all
index blocks we instead end up writing a multi-level index. Instead of
referring to the block we are after, an index record may point to
another index block. The reader will then access the highest-level index
and follow down the chain of index blocks until it hits the sought-after
block.

It has been observed though that it is impossible to seek ref records of
the last ref block when using a multi-level index. While the multi-level
index exists and looks fine for most of the part, the highest-level
index was missing an index record pointing to the last block of the next
index. Thus, every additional level made more refs become unseekable at
the end of the ref section.

The root cause is that we are not flushing the last block of the current
level once done writing the level. Consequently, it wasn't recorded in
the blocks that need to be indexed by the next-higher level and thus we
forgot about it.

Fix this bug by flushing blocks after we have written all index records.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>reftable: honor core.fsync</title>
<updated>2024-01-23T21:45:27Z</updated>
<author>
<name>John Cai</name>
<email>johncai86@gmail.com</email>
</author>
<published>2024-01-23T18:51:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=1df18a1c9a50b58afb19ef218f6517f347606800'/>
<id>urn:sha1:1df18a1c9a50b58afb19ef218f6517f347606800</id>
<content type='text'>
While the reffiles backend honors configured fsync settings, the
reftable backend does not. Address this by fsyncing reftable files using
the write-or-die api's fsync_component() in two places: when we
add additional entries into the table, and when we close the reftable
writer.

This commits adds a flush function pointer as a new member of
reftable_writer because we are not sure that the first argument to the
*write function pointer always contains a file descriptor. In the case of
strbuf_add_void, the first argument is a buffer. This way, we can pass
in a corresponding flush function that knows how to flush depending on
which writer is being used.

This patch does not contain tests as they will need to wait for another
patch to start to exercise the reftable backend. At that point, the
tests will be added to observe that fsyncs are happening when the
reftable is in use.

Signed-off-by: John Cai &lt;johncai86@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'ps/reftable-fixes-and-optims'</title>
<updated>2024-01-16T18:11:57Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2024-01-16T18:11:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=481d69dd63328fb10422c8bf9e714b5b5c7d1820'/>
<id>urn:sha1:481d69dd63328fb10422c8bf9e714b5b5c7d1820</id>
<content type='text'>
More fixes and optimizations to the reftable backend.

* ps/reftable-fixes-and-optims:
  reftable/merged: transfer ownership of records when iterating
  reftable/merged: really reuse buffers to compute record keys
  reftable/record: store "val2" hashes as static arrays
  reftable/record: store "val1" hashes as static arrays
  reftable/record: constify some parts of the interface
  reftable/writer: fix index corruption when writing multiple indices
  reftable/stack: do not auto-compact twice in `reftable_stack_add()`
  reftable/stack: do not overwrite errors when compacting
</content>
</entry>
<entry>
<title>Merge branch 'en/header-cleanup'</title>
<updated>2024-01-08T22:05:15Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2024-01-08T22:05:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=492ee03f60297e7e83d101f4519ab8abc98782bc'/>
<id>urn:sha1:492ee03f60297e7e83d101f4519ab8abc98782bc</id>
<content type='text'>
Remove unused header "#include".

* en/header-cleanup:
  treewide: remove unnecessary includes in source files
  treewide: add direct includes currently only pulled in transitively
  trace2/tr2_tls.h: remove unnecessary include
  submodule-config.h: remove unnecessary include
  pkt-line.h: remove unnecessary include
  line-log.h: remove unnecessary include
  http.h: remove unnecessary include
  fsmonitor--daemon.h: remove unnecessary includes
  blame.h: remove unnecessary includes
  archive.h: remove unnecessary include
  treewide: remove unnecessary includes in source files
  treewide: remove unnecessary includes from header files
</content>
</entry>
<entry>
<title>reftable/record: store "val2" hashes as static arrays</title>
<updated>2024-01-03T17:54:21Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-01-03T06:22:34Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=b31e3cc620f926273af9346fbda4ff507f60682e'/>
<id>urn:sha1:b31e3cc620f926273af9346fbda4ff507f60682e</id>
<content type='text'>
Similar to the preceding commit, convert ref records of type "val2" to
store their object IDs in static arrays instead of allocating them for
every single record.

We're using the same benchmark as in the preceding commit, with `git
show-ref --quiet` in a repository with ~350k refs. This time around
though the effects aren't this huge. Before:

    HEAP SUMMARY:
        in use at exit: 21,163 bytes in 193 blocks
      total heap usage: 1,419,040 allocs, 1,418,847 frees, 62,153,868 bytes allocated

After:

    HEAP SUMMARY:
        in use at exit: 21,163 bytes in 193 blocks
      total heap usage: 1,410,148 allocs, 1,409,955 frees, 61,976,068 bytes allocated

This is because "val2"-type records are typically only stored for peeled
tags, and the number of annotated tags in the benchmark repository is
rather low. Still, it can be seen that this change leads to a reduction
of allocations overall, even if only a small one.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>reftable/record: store "val1" hashes as static arrays</title>
<updated>2024-01-03T17:54:20Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-01-03T06:22:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=7af607c58d7985a0eb70fc3bca6eef8eb2381f14'/>
<id>urn:sha1:7af607c58d7985a0eb70fc3bca6eef8eb2381f14</id>
<content type='text'>
When reading ref records of type "val1", we store its object ID in an
allocated array. This results in an additional allocation for every
single ref record we read, which is rather inefficient especially when
iterating over refs.

Refactor the code to instead use an embedded array of `GIT_MAX_RAWSZ`
bytes. While this means that `struct ref_record` is bigger now, we
typically do not store all refs in an array anyway and instead only
handle a limited number of records at the same point in time.

Using `git show-ref --quiet` in a repository with ~350k refs this leads
to a significant drop in allocations. Before:

    HEAP SUMMARY:
        in use at exit: 21,098 bytes in 192 blocks
      total heap usage: 2,116,683 allocs, 2,116,491 frees, 76,098,060 bytes allocated

After:

    HEAP SUMMARY:
        in use at exit: 21,098 bytes in 192 blocks
      total heap usage: 1,419,031 allocs, 1,418,839 frees, 62,145,036 bytes allocated

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>reftable/writer: fix index corruption when writing multiple indices</title>
<updated>2024-01-03T17:54:20Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-01-03T06:22:21Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=ddac965965719f19c2905a7184a7dd55287bf817'/>
<id>urn:sha1:ddac965965719f19c2905a7184a7dd55287bf817</id>
<content type='text'>
Each reftable may contain multiple types of blocks for refs, objects and
reflog records, where each of these may have an index that makes it more
efficient to find the records. It was observed that the index for log
records can become corrupted under certain circumstances, where the
first entry of the index points into the object index instead of to the
log records.

As it turns out, this corruption can occur whenever we write a log index
as well as at least one additional index. Writing records and their index
is basically a two-step process:

  1. We write all blocks for the corresponding record. Each block that
     gets written is added to a list of blocks to index.

  2. Once all blocks were written we finish the section. If at least two
     blocks have been added to the list of blocks to index then we will
     now write the index for those blocks and flush it, as well.

When we have a very large number of blocks then we may decide to write a
multi-level index, which is why we also keep track of the list of the
index blocks in the same way as we previously kept track of the blocks
to index.

Now when we have finished writing all index blocks we clear the index
and flush the last block to disk. This is done in the wrong order though
because flushing the block to disk will re-add it to the list of blocks
to be indexed. The result is that the next section we are about to write
will have an entry in the list of blocks to index that points to the
last block of the preceding section's index, which will corrupt the log
index.

Fix this corruption by clearing the index after having written the last
block.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
