<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/reftable/reader.c, branch v2.46.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.46.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.46.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2024-05-30T21:15:12Z</updated>
<entry>
<title>Merge branch 'ps/reftable-reusable-iterator'</title>
<updated>2024-05-30T21:15:12Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2024-05-30T21:15:12Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=67ce50ba26507e99c53dcd4b1d85ad8565a31c23'/>
<id>urn:sha1:67ce50ba26507e99c53dcd4b1d85ad8565a31c23</id>
<content type='text'>
Code clean-up to make the reftable iterator closer to be reusable.

* ps/reftable-reusable-iterator:
  reftable/merged: adapt interface to allow reuse of iterators
  reftable/stack: provide convenience functions to create iterators
  reftable/reader: adapt interface to allow reuse of iterators
  reftable/generic: adapt interface to allow reuse of iterators
  reftable/generic: move seeking of records into the iterator
  reftable/merged: simplify indices for subiterators
  reftable/merged: split up initialization and seeking of records
  reftable/reader: set up the reader when initializing table iterator
  reftable/reader: inline `reader_seek_internal()`
  reftable/reader: separate concerns of table iter and reftable reader
  reftable/reader: unify indexed and linear seeking
  reftable/reader: avoid copying index iterator
  reftable/block: use `size_t` to track restart point index
</content>
</entry>
<entry>
<title>reftable/reader: adapt interface to allow reuse of iterators</title>
<updated>2024-05-14T00:04:18Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-05-13T08:47:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=0e7be2b3ea444fc5375e76e42d81b1e1d3d4971f'/>
<id>urn:sha1:0e7be2b3ea444fc5375e76e42d81b1e1d3d4971f</id>
<content type='text'>
Refactor the interfaces exposed by `struct reftable_reader` and `struct
table_iterator` such that they support iterator reuse. This is done by
separating initialization of the iterator and seeking on it.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>reftable/generic: move seeking of records into the iterator</title>
<updated>2024-05-14T00:04:18Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-05-13T08:47:41Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=5bf96e0c393827481afab55d3b73ae09ec1ea0de'/>
<id>urn:sha1:5bf96e0c393827481afab55d3b73ae09ec1ea0de</id>
<content type='text'>
Reftable iterators are created by seeking on the parent structure of a
corresponding record. For example, to create an iterator for the merged
table you would call `reftable_merged_table_seek_ref()`. Most notably,
it is not posible to create an iterator and then seek it afterwards.

While this may be a bit easier to reason about, it comes with two
significant downsides. The first downside is that the logic to find
records is split up between the parent data structure and the iterator
itself. Conceptually, it is more straight forward if all that logic was
contained in a single place, which should be the iterator.

The second and more significant downside is that it is impossible to
reuse iterators for multiple seeks. Whenever you want to look up a
record, you need to re-create the whole infrastructure again, which is
quite a waste of time. Furthermore, it is impossible to optimize seeks,
such as when seeking the same record multiple times.

To address this, we essentially split up the concerns properly such that
the parent data structure is responsible for setting up the iterator via
a new `init_iter()` callback, whereas the iterator handles seeks via a
new `seek()` callback. This will eventually allow us to call `seek()` on
the iterator multiple times, where every iterator can potentially
optimize for certain cases.

Note that at this point in time we are not yet ready to reuse the
iterators. This will be left for a future patch series.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>reftable/reader: set up the reader when initializing table iterator</title>
<updated>2024-05-14T00:04:17Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-05-13T08:47:26Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=c82692f75591b320fbe5f4d2018505bd902f95e6'/>
<id>urn:sha1:c82692f75591b320fbe5f4d2018505bd902f95e6</id>
<content type='text'>
All the seeking functions accept a `struct reftable_reader` as input
such that they can use the reader to look up the respective blocks.
Refactor the code to instead set up the reader as a member of `struct
table_iter` during initialization such that we don't have to pass the
reader on every single call.

This step is required to move seeking of records into the generic
`struct reftable_iterator` infrastructure.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>reftable/reader: inline `reader_seek_internal()`</title>
<updated>2024-05-14T00:04:17Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-05-13T08:47:21Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=f1e3c12196b6c91086b58495499db0f4802fa5d1'/>
<id>urn:sha1:f1e3c12196b6c91086b58495499db0f4802fa5d1</id>
<content type='text'>
We have both `reader_seek()` and `reader_seek_internal()`, where the
former function only exists so that we can exit early in case the given
table has no records of the sought-after type.

Merge these two functions into one.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>reftable/reader: separate concerns of table iter and reftable reader</title>
<updated>2024-05-14T00:04:17Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-05-13T08:47:16Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=81a03a323664d3d14a25e2ef8b29fe52dcef2126'/>
<id>urn:sha1:81a03a323664d3d14a25e2ef8b29fe52dcef2126</id>
<content type='text'>
In "reftable/reader.c" we implement two different interfaces:

  - The reftable reader contains the logic to read reftables.

  - The table iterator is used to iterate through a single reftable read
    by the reader.

The way those two types are used in the code is somewhat confusing
though because seeking inside a table is implemented as if it was part
of the reftable reader, even though it is ultimately more of a detail
implemented by the table iterator.

Make the boundary between those two types clearer by renaming functions
that seek records in a table such that they clearly belong to the table
iterator's logic.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>reftable/reader: unify indexed and linear seeking</title>
<updated>2024-05-14T00:04:16Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-05-13T08:47:11Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=dfdd1455bbb2eed6674964000d037cd213687a33'/>
<id>urn:sha1:dfdd1455bbb2eed6674964000d037cd213687a33</id>
<content type='text'>
In `reader_seek_internal()` we either end up doing an indexed seek when
there is one or a linear seek otherwise. These two code paths are
disjunct without a good reason, where the indexed seek will cause us to
exit early.

Refactor the two code paths such that it becomes possible to share a bit
more code between them.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>reftable/reader: avoid copying index iterator</title>
<updated>2024-05-14T00:04:16Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-05-13T08:47:06Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=9a59b65dba0820671753f636e9417bfd63ea20c1'/>
<id>urn:sha1:9a59b65dba0820671753f636e9417bfd63ea20c1</id>
<content type='text'>
When doing an indexed seek we need to walk down the multi-level index
until we finally hit a record of the desired indexed type. This loop
performs a copy of the index iterator on every iteration, which is both
hard to understand and completely unnecessary.

Refactor the code so that we use a single iterator to walk down the
indices, only.

Note that while this should improve performance, the improvement is
negligible in all but the most unreasonable repositories. This is
because the effect is only really noticeable when we have to walk down
many levels of indices, which is not something that a repository would
typically have. So the motivation for this change is really only about
readability.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>reftable/dump: support dumping a table's block structure</title>
<updated>2024-05-14T00:02:38Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-05-13T08:18:13Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=fcf341890ef583f519bc2746940c048bb8261d3d'/>
<id>urn:sha1:fcf341890ef583f519bc2746940c048bb8261d3d</id>
<content type='text'>
We're about to introduce new configs that will allow users to have more
control over how exactly reftables are written. To verify that these
configs are effective we will need to take a peak into the actual blocks
written by the reftable backend.

Introduce a new mode to the dumping logic that prints out the block
structure. This logic can be invoked via `test-tool dump-reftables -b`.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>reftable/block: reuse `zstream` state on inflation</title>
<updated>2024-04-15T17:36:09Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-04-08T12:17:04Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=ce1f213cc91cf545736048f28117fe1de89b8134'/>
<id>urn:sha1:ce1f213cc91cf545736048f28117fe1de89b8134</id>
<content type='text'>
When calling `inflateInit()` and `inflate()`, the zlib library will
allocate several data structures for the underlying `zstream` to keep
track of various information. Thus, when inflating repeatedly, it is
possible to optimize memory allocation patterns by reusing the `zstream`
and then calling `inflateReset()` on it to prepare it for the next chunk
of data to inflate.

This is exactly what the reftable code is doing: when iterating through
reflogs we need to potentially inflate many log blocks, but we discard
the `zstream` every single time. Instead, as we reuse the `block_reader`
for each of the blocks anyway, we can initialize the `zstream` once and
then reuse it for subsequent inflations.

Refactor the code to do so, which leads to a significant reduction in
the number of allocations. The following measurements were done when
iterating through 1 million reflog entries. Before:

  HEAP SUMMARY:
      in use at exit: 13,473 bytes in 122 blocks
    total heap usage: 23,028 allocs, 22,906 frees, 162,813,552 bytes allocated

After:

  HEAP SUMMARY:
      in use at exit: 13,473 bytes in 122 blocks
    total heap usage: 302 allocs, 180 frees, 88,352 bytes allocated

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
