<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/t/t5318-commit-graph.sh, branch v2.32.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.32.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.32.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2021-03-22T21:00:23Z</updated>
<entry>
<title>Merge branch 'ds/commit-graph-generation-config'</title>
<updated>2021-03-22T21:00:23Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2021-03-22T21:00:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=d20fa3cf9dba095b9d8836b177da697425ce8d3c'/>
<id>urn:sha1:d20fa3cf9dba095b9d8836b177da697425ce8d3c</id>
<content type='text'>
A new configuration variable has been introduced to allow choosing
which version of the generation number gets used in the
commit-graph file.

* ds/commit-graph-generation-config:
  commit-graph: use config to specify generation type
  commit-graph: create local repository pointer
</content>
</entry>
<entry>
<title>Merge branch 'ds/chunked-file-api'</title>
<updated>2021-03-01T22:02:57Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2021-03-01T22:02:57Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=660dd97a62da66ffe95df20a9e27a01e39ae473f'/>
<id>urn:sha1:660dd97a62da66ffe95df20a9e27a01e39ae473f</id>
<content type='text'>
The common code to deal with "chunked file format" that is shared
by the multi-pack-index and commit-graph files have been factored
out, to help codepaths for both filetypes to become more robust.

* ds/chunked-file-api:
  commit-graph.c: display correct number of chunks when writing
  chunk-format: add technical docs
  chunk-format: restore duplicate chunk checks
  midx: use 64-bit multiplication for chunk sizes
  midx: use chunk-format read API
  commit-graph: use chunk-format read API
  chunk-format: create read chunk API
  midx: use chunk-format API in write_midx_internal()
  midx: drop chunk progress during write
  midx: return success/failure in chunk write methods
  midx: add num_large_offsets to write_midx_context
  midx: add pack_perm to write_midx_context
  midx: add entries to write_midx_context
  midx: use context in write_midx_pack_names()
  midx: rename pack_info to write_midx_context
  commit-graph: use chunk-format write API
  chunk-format: create chunk format write API
  commit-graph: anonymize data in chunk_write_fn
</content>
</entry>
<entry>
<title>commit-graph: use config to specify generation type</title>
<updated>2021-02-25T23:10:41Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2021-02-25T18:19:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=702110aac63556b4572d9c7b65c9123ec8038ebf'/>
<id>urn:sha1:702110aac63556b4572d9c7b65c9123ec8038ebf</id>
<content type='text'>
We have two established generation number versions:

 1: topological levels
 2: corrected commit dates

The corrected commit dates are enabled by default, but they also write
extra data in the GDAT and GDOV chunks. Services that host Git data
might want to have more control over when this feature rolls out than
just updating the Git binaries.

Add a new "commitGraph.generationVersion" config option that specifies
the intended generation number version. If this value is less than 2,
then the GDAT chunk is never written _or read_ from an existing file.

This can replace our use of the GIT_TEST_COMMIT_GRAPH_NO_GDAT
environment variable in the test suite. Remove it.

Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'ab/test-lib'</title>
<updated>2021-02-23T00:12:43Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2021-02-23T00:12:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=d68fccef86af1ceb5ae3a9abb02c8fe4f6ff3317'/>
<id>urn:sha1:d68fccef86af1ceb5ae3a9abb02c8fe4f6ff3317</id>
<content type='text'>
Test framework clean-up.

* ab/test-lib:
  test-lib-functions: assert correct parameter count
  test-lib-functions: remove bug-inducing "diagnostics" helper param
  test libs: rename "diff-lib" to "lib-diff"
  t/.gitattributes: sort lines
  test-lib-functions: move function to lib-bitmap.sh
  test libs: rename gitweb-lib.sh to lib-gitweb.sh
  test libs: rename bundle helper to "lib-bundle.sh"
  test-lib-functions: remove generate_zero_bytes() wrapper
  test-lib-functions: move test_set_index_version() to its user
  test lib: change "error" to "BUG" as appropriate
  test-lib: remove check_var_migration
</content>
</entry>
<entry>
<title>commit-graph: use chunk-format read API</title>
<updated>2021-02-18T21:38:16Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2021-02-18T14:07:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=2692c2f6fd1cc9f74e433cb05d5756575682b9ab'/>
<id>urn:sha1:2692c2f6fd1cc9f74e433cb05d5756575682b9ab</id>
<content type='text'>
Instead of parsing the table of contents directly, use the chunk-format
API methods read_table_of_contents() and pair_chunk(). While the current
implementation loses the duplicate-chunk detection, that will be added
in a future change.

Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>test-lib-functions: remove generate_zero_bytes() wrapper</title>
<updated>2021-02-10T21:54:34Z</updated>
<author>
<name>Ævar Arnfjörð Bjarmason</name>
<email>avarab@gmail.com</email>
</author>
<published>2021-02-09T21:41:51Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=f3ad2bf4712aa41d710ce3eb8207c12dda43710d'/>
<id>urn:sha1:f3ad2bf4712aa41d710ce3eb8207c12dda43710d</id>
<content type='text'>
Since d5cfd142ec1 (tests: teach the test-tool to generate NUL bytes
and use it, 2019-02-14) the generate_zero_bytes() functions has been a
thin wrapper for "test-tool genzeros". Let's have its only user call
that directly instead.

Signed-off-by: Ævar Arnfjörð Bjarmason &lt;avarab@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>commit-graph: prepare commit graph</title>
<updated>2021-02-02T05:03:36Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2021-02-02T03:01:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=bc50d6c91f4002b4197a9f5ea5dfdc3c9f105a1c'/>
<id>urn:sha1:bc50d6c91f4002b4197a9f5ea5dfdc3c9f105a1c</id>
<content type='text'>
Before checking if the repository has a commit-graph loaded, be sure
to run prepare_commit_graph(). This is necessary because otherwise
the topo_levels slab is not initialized. As we compute topo_levels for
the new commits, we iterate further into the lower layers since the
first visit to each commit looks as though the topo_level is not
populated.

By properly initializing the topo_slab, we fix the previously broken
case of a split commit graph where a base layer has the
generation_data_overflow chunk.

Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Reviewed-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>commit-graph: always parse before commit_graph_data_at()</title>
<updated>2021-02-02T05:03:36Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2021-02-01T17:15:04Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=90cb1c47c7cbf6cdd5152c1371a2732af59911a4'/>
<id>urn:sha1:90cb1c47c7cbf6cdd5152c1371a2732af59911a4</id>
<content type='text'>
There is a subtle failure happening when computing corrected commit
dates with --split enabled. It requires a base layer needing the
generation_data_overflow chunk. Then, the next layer on top
erroneously thinks it needs an overflow chunk due to a bug leading
to recalculating all reachable generation numbers. The output of
the failure is

  BUG: commit-graph.c:1912: expected to write 8 bytes to
  chunk 47444f56, but wrote 0 instead

These "expected" 8 bytes are due to re-computing the corrected
commit date for the lower layer but the new layer does not need
any overflow.

Add a test to t5318-commit-graph.sh that demonstrates this bug. However,
it does not trigger consistently with the existing code.

The generation number data is stored in a slab and accessed by
commit_graph_data_at(). This data is initialized when parsing a commit,
but is otherwise used assuming it has been populated. The loop in
compute_generation_numbers() did not enforce that all reachable
commits were parsed and had correct values. This could lead to some
problems when writing a commit-graph with corrected commit dates based
on a commit-graph without them.

It has been difficult to identify the issue here because it was so hard
to reproduce. It relies on this uninitialized data having a non-zero
value, but also on specifically in a way that overwrites the existing
data.

This patch adds the extra parse to ensure the data is filled before we
compute the generation number of a commit. This triggers the new test
to fail because the generation number overflow count does not match
between this computation and the write for that chunk.

The actual fix will follow as the next few changes.

Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Reviewed-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>commit-graph: implement generation data chunk</title>
<updated>2021-01-19T00:21:18Z</updated>
<author>
<name>Abhishek Kumar</name>
<email>abhishekkumar8222@gmail.com</email>
</author>
<published>2021-01-16T18:11:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=e8b63005c48696a26f976f5f9b0ccaf1983e439d'/>
<id>urn:sha1:e8b63005c48696a26f976f5f9b0ccaf1983e439d</id>
<content type='text'>
As discovered by Ævar, we cannot increment graph version to
distinguish between generation numbers v1 and v2 [1]. Thus, one of
pre-requistes before implementing generation number v2 was to
distinguish between graph versions in a backwards compatible manner.

We are going to introduce a new chunk called Generation DATa chunk (or
GDAT). GDAT will store corrected committer date offsets whereas CDAT
will still store topological level.

Old Git does not understand GDAT chunk and would ignore it, reading
topological levels from CDAT. New Git can parse GDAT and take advantage
of newer generation numbers, falling back to topological levels when
GDAT chunk is missing (as it would happen with a commit-graph written
by old Git).

We introduce a test environment variable 'GIT_TEST_COMMIT_GRAPH_NO_GDAT'
which forces commit-graph file to be written without generation data
chunk to emulate a commit-graph file written by old Git.

To minimize the space required to store corrrected commit date, Git
stores corrected commit date offsets into the commit-graph file, instea
of corrected commit dates. This saves us 4 bytes per commit, decreasing
the GDAT chunk size by half, but it's possible for the offset to
overflow the 4-bytes allocated for storage. As such overflows are and
should be exceedingly rare, we use the following overflow management
scheme:

We introduce a new commit-graph chunk, Generation Data OVerflow ('GDOV')
to store corrected commit dates for commits with offsets greater than
GENERATION_NUMBER_V2_OFFSET_MAX.

If the offset is greater than GENERATION_NUMBER_V2_OFFSET_MAX, we set
the MSB of the offset and the other bits store the position of corrected
commit date in GDOV chunk, similar to how Extra Edge List is maintained.

We test the overflow-related code with the following repo history:

           F - N - U
          /         \
U - N - U            N
         \          /
	  N - F - N

Where the commits denoted by U have committer date of zero seconds
since Unix epoch, the commits denoted by N have committer date of
1112354055 (default committer date for the test suite) seconds since
Unix epoch and the commits denoted by F have committer date of
(2 ^ 31 - 2) seconds since Unix epoch.

The largest offset observed is 2 ^ 31, just large enough to overflow.

[1]: https://lore.kernel.org/git/87a7gdspo4.fsf@evledraar.gmail.com/

Signed-off-by: Abhishek Kumar &lt;abhishekkumar8222@gmail.com&gt;
Reviewed-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Reviewed-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>commit-graph: use the "hash version" byte</title>
<updated>2020-08-17T23:45:14Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2020-08-17T14:04:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=665d70ad033b115d8c14cdc2bcecf6c8b1947260'/>
<id>urn:sha1:665d70ad033b115d8c14cdc2bcecf6c8b1947260</id>
<content type='text'>
The commit-graph format reserved a byte among the header of the file to
store a "hash version". During the SHA-256 work, this was not modified
because file formats are not necessarily intended to work across hash
versions. If a repository has SHA-256 as its hash algorithm, it
automatically up-shifts the lengths of object names in all necessary
formats.

However, since we have this byte available for adjusting the version, we
can make the file formats more obviously incompatible instead of relying
on other context from the repository.

Update the oid_version() method in commit-graph.c to add a new value, 2,
for sha-256. This automatically writes the new value in a SHA-256
repository _and_ verifies the value is correct. This is a breaking
change relative to the current 'master' branch since 092b677 (Merge
branch 'bc/sha-256-cvs-svn-updates', 2020-08-13) but it is not breaking
relative to any released version of Git.

The test impact is relatively minor: the output of 'test-tool
read-graph' lists the header information, so those instances of '1' need
to be replaced with a variable determined by GIT_TEST_DEFAULT_HASH. A
more careful test is added that specifically creates a repository of
each type then swaps the commit-graph files. The important value here is
that the "git log" command succeeds while writing a message to stderr.

Helped-by: brian m. carlson &lt;sandals@crustytoothpaste.net&gt;
Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Reviewed-by: brian m. carlson &lt;sandals@crustytoothpaste.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
