<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/reftable/block.c, branch jch</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=jch</id>
<link rel='self' href='https://git.shady.money/git/atom?h=jch'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2025-05-19T23:02:48Z</updated>
<entry>
<title>Merge branch 'ps/reftable-read-block-perffix'</title>
<updated>2025-05-19T23:02:48Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2025-05-19T23:02:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=90eedabbf7cd949f76875821e983bce4e5172d23'/>
<id>urn:sha1:90eedabbf7cd949f76875821e983bce4e5172d23</id>
<content type='text'>
Performance regression in not-yet-released code has been corrected.

* ps/reftable-read-block-perffix:
  reftable: fix perf regression when reading blocks of unwanted type
</content>
</entry>
<entry>
<title>reftable: fix perf regression when reading blocks of unwanted type</title>
<updated>2025-05-12T17:55:24Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2025-05-12T15:15:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=1970333644fad127c68046697b9b86fd8d7f28c2'/>
<id>urn:sha1:1970333644fad127c68046697b9b86fd8d7f28c2</id>
<content type='text'>
In fd888311fbc (reftable/table: move reading block into block reader,
2025-04-07), we have refactored how reftable blocks are read so that
most of the logic is contained in the "block.c" subsystem itself. Most
importantly, the whole logic to read the data itself is now contained in
that subsystem.

This change caused a significant performance regression though when
reading blocks that aren't of the specific type one is searching for:

    Benchmark 1: update-ref: create 100k refs (revision = fd888311fbc~)
      Time (mean ± σ):      2.171 s ±  0.028 s    [User: 1.189 s, System: 0.977 s]
      Range (min … max):    2.117 s …  2.206 s    10 runs

    Benchmark 2: update-ref: create 100k refs (revision = fd888311fbc)
      Time (mean ± σ):      3.418 s ±  0.030 s    [User: 2.371 s, System: 1.037 s]
      Range (min … max):    3.377 s …  3.473 s    10 runs

    Summary
      update-ref: create 100k refs (revision = fd888311fbc~) ran
        1.57 ± 0.02 times faster than update-ref: create 100k refs (revision = fd888311fbc)

The root caute of the performance regression is that we changed when
exactly blocks of an uninteresting type are being discarded. Previous to
the refactoring in the mentioned commit we'd load the block data, read
its type, notice that it's not the wanted type and discard the block.
After the commit though we don't discard the block immediately, but we
fully decode it only to realize that it's not the desired type. We then
discard the block again, but have already performed a bunch of pointless
work.

Fix the regression by making `reftable_block_init()` return early in
case the block is not of the desired type. This fixes the performance
hit:

    Benchmark 1: update-ref: create 100k refs (revision = HEAD~)
      Time (mean ± σ):      2.712 s ±  0.018 s    [User: 1.990 s, System: 0.716 s]
      Range (min … max):    2.682 s …  2.741 s    10 runs

    Benchmark 2: update-ref: create 100k refs (revision = HEAD)
      Time (mean ± σ):      1.670 s ±  0.012 s    [User: 0.991 s, System: 0.676 s]
      Range (min … max):    1.652 s …  1.693 s    10 runs

    Summary
      update-ref: create 100k refs (revision = HEAD) ran
        1.62 ± 0.02 times faster than update-ref: create 100k refs (revision = HEAD~)

Note that the baseline performance is lower than in the original due to
a couple of unrelated performance improvements that have landed since
the original commit.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'ps/reftable-api-revamp'</title>
<updated>2025-04-29T21:21:30Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2025-04-29T21:21:29Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=a819a3da85655031a23abae0f75d0910697fb92c'/>
<id>urn:sha1:a819a3da85655031a23abae0f75d0910697fb92c</id>
<content type='text'>
Overhaul of the reftable API.

* ps/reftable-api-revamp:
  reftable/table: move printing logic into test helper
  reftable/constants: make block types part of the public interface
  reftable/table: introduce iterator for table blocks
  reftable/table: add `reftable_table` to the public interface
  reftable/block: expose a generic iterator over reftable records
  reftable/block: make block iterators reseekable
  reftable/block: store block pointer in the block iterator
  reftable/block: create public interface for reading blocks
  git-zlib: use `struct z_stream_s` instead of typedef
  reftable/block: rename `block_reader` to `reftable_block`
  reftable/block: rename `block` to `block_data`
  reftable/table: move reading block into block reader
  reftable/block: simplify how we track restart points
  reftable/blocksource: consolidate code into a single file
  reftable/reader: rename data structure to "table"
  reftable: fix formatting of the license header
</content>
</entry>
<entry>
<title>Merge branch 'ps/reftable-sans-compat-util'</title>
<updated>2025-04-08T18:43:14Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2025-04-08T18:43:14Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=6e2a3b8ae0e07c0c31f2247fec49b77b5d903a83'/>
<id>urn:sha1:6e2a3b8ae0e07c0c31f2247fec49b77b5d903a83</id>
<content type='text'>
Make the code in reftable library less reliant on the service
routines it used to borrow from Git proper, to make it easier to
use by external users of the library.

* ps/reftable-sans-compat-util:
  Makefile: skip reftable library for Coccinelle
  reftable: decouple from Git codebase by pulling in "compat/posix.h"
  git-compat-util.h: split out POSIX-emulating bits
  compat/mingw: split out POSIX-related bits
  reftable/basics: introduce `REFTABLE_UNUSED` annotation
  reftable/basics: stop using `SWAP()` macro
  reftable/stack: stop using `sleep_millisec()`
  reftable/system: introduce `reftable_rand()`
  reftable/reader: stop using `ARRAY_SIZE()` macro
  reftable/basics: provide wrappers for big endian conversion
  reftable/basics: stop using `st_mult()` in array allocators
  reftable: stop using `BUG()` in trivial cases
  reftable/record: don't `BUG()` in `reftable_record_cmp()`
  reftable/record: stop using `BUG()` in `reftable_record_init()`
  reftable/record: stop using `COPY_ARRAY()`
  reftable/blocksource: stop using `xmmap()`
  reftable/stack: stop using `write_in_full()`
  reftable/stack: stop using `read_in_full()`
</content>
</entry>
<entry>
<title>reftable/constants: make block types part of the public interface</title>
<updated>2025-04-07T21:53:12Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2025-04-07T13:16:27Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=0f8ee94b636b5ab183c62b8fdd26c1611c2b86f4'/>
<id>urn:sha1:0f8ee94b636b5ab183c62b8fdd26c1611c2b86f4</id>
<content type='text'>
Now that reftable blocks can be read individually via the public
interface it becomes necessary for callers to be able to distinguish the
different types of blocks. Expose the relevant constants.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>reftable/block: expose a generic iterator over reftable records</title>
<updated>2025-04-07T21:53:12Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2025-04-07T13:16:24Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=50d845947734f45970439518047ab1f79628bb7e'/>
<id>urn:sha1:50d845947734f45970439518047ab1f79628bb7e</id>
<content type='text'>
Expose a generic iterator over reftable records and expose it via the
public interface. Together with an upcoming iterator for reftable blocks
contained in a table this will allow users to trivially iterate through
blocks and their respective records individually.

This functionality will be used to implement consistency checks for the
reftable backend, which requires more fine-grained control over how we
read data.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>reftable/block: make block iterators reseekable</title>
<updated>2025-04-07T21:53:11Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2025-04-07T13:16:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=6da48a5e00ae77c4092e78ac8ac8641a90660343'/>
<id>urn:sha1:6da48a5e00ae77c4092e78ac8ac8641a90660343</id>
<content type='text'>
Refactor the block iterators so that initialization and seeking are
different from one another. This makes the iterator trivially reseekable
by storing the pointer to the block at initialization time, which we can
then reuse on every seek.

This refactoring prepares the code for exposing a `reftable_iterator`
interface for blocks in a subsequent commit. Callsites are adjusted
accordingly.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>reftable/block: store block pointer in the block iterator</title>
<updated>2025-04-07T21:53:11Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2025-04-07T13:16:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=156d79cef0de565408e41f840bbda87114367977'/>
<id>urn:sha1:156d79cef0de565408e41f840bbda87114367977</id>
<content type='text'>
The block iterator requires access to a bunch of data from the
underlying `reftable_block` that it is iterating over. This data is
stored by copying over relevant data into a separate set of variables.
This has multiple downsides:

  - We require more storage space than necessary. This is more of a
    theoretical issue as we shouldn't ever have many blocks.

  - We have to perform more bookkeeping, and the variable names are
    inconsistent across the two data structures. This can lead to some
    confusion.

  - The lifetime of the block iterator is tied to the block anyway, but
    we hide that a bit by only storing pointers pointing into the block.

There isn't really any good reason why we rip out parts of the block
instead of storing a pointer to the block itself.

Refactor the code to do so. Despite being simpler, it also allows us to
decouple the lifetime of the block iterator from seeking in a subsequent
commit.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>reftable/block: rename `block_reader` to `reftable_block`</title>
<updated>2025-04-07T21:53:10Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2025-04-07T13:16:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=12a9aa8cb76c120bca7609ac7ae57929d52605e9'/>
<id>urn:sha1:12a9aa8cb76c120bca7609ac7ae57929d52605e9</id>
<content type='text'>
The `block_reader` structure is used to access parsed data of a reftable
block. The structure is currently treated as an internal implementation
detail and not exposed via our public interfaces. The functionality
provided by the structure is useful to external users of the reftable
library though, for example when implementing consistency checks that
need to scan through the blocks manually.

Rename the structure to `reftable_block` now that the name has been made
available in the preceding commit. This name is in line with the naming
schema used for other data structures like `reftable_table` in that it
describes the underlying entity that it provides access to.

The new data structure isn't yet exposed via the public interface, which
is left for a subsequent commit.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>reftable/block: rename `block` to `block_data`</title>
<updated>2025-04-07T21:53:10Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2025-04-07T13:16:18Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=2b3362c10d39efe09fe9ef16122df3bed5149032'/>
<id>urn:sha1:2b3362c10d39efe09fe9ef16122df3bed5149032</id>
<content type='text'>
The `reftable_block` structure associates a byte slice with a block
source. As such it only holds the data of a reftable block without
actually encoding any of the details for how to access that data.

Rename the structure to instead be called `reftable_block_data`. Besides
clarifying that this really only holds data, it also allows us to rename
the `reftable_block_reader` to `reftable_block` in the next commit, as
this is the structure that actually encapsulates access to the reftable
blocks.

Rename the `struct reftable_block_reader::block` member accordingly.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
