<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/t/t5324-split-commit-graph.sh, branch v2.30.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.30.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.30.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2020-11-02T21:17:39Z</updated>
<entry>
<title>Merge branch 'ds/commit-graph-merging-fix'</title>
<updated>2020-11-02T21:17:39Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2020-11-02T21:17:39Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=307a53dd9914f65c9b6399221574e24234e4b49f'/>
<id>urn:sha1:307a53dd9914f65c9b6399221574e24234e4b49f</id>
<content type='text'>
When "git commit-graph" detects the same commit recorded more than
once while it is merging the layers, it used to die.  The code now
ignores all but one of them and continues.

* ds/commit-graph-merging-fix:
  commit-graph: don't write commit-graph when disabled
  commit-graph: ignore duplicates when merging layers
</content>
</entry>
<entry>
<title>commit-graph: don't write commit-graph when disabled</title>
<updated>2020-10-09T21:16:32Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2020-10-09T20:53:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=85102ac71b98466eaa2b9b5a568c3a1de736202d'/>
<id>urn:sha1:85102ac71b98466eaa2b9b5a568c3a1de736202d</id>
<content type='text'>
The core.commitGraph config setting can be set to 'false' to prevent
parsing commits from the commit-graph file(s). This causes an issue when
trying to write with "--split" which needs to distinguish between
commits that are in the existing commit-graph layers and commits that
are not. The existing mechanism uses parse_commit() and follows by
checking if there is a 'graph_pos' that shows the commit was parsed from
the commit-graph file.

When core.commitGraph=false, we do not parse the commits from the
commit-graph and 'graph_pos' indicates that no commits are in the
existing file. The --split logic moves forward creating a new layer on
top that holds all reachable commits, then possibly merges down into
those layers, resulting in duplicate commits. The previous change makes
that merging process more robust to such a situation in case it happens
in the written commit-graph data.

The easy answer here is to avoid writing a commit-graph if reading the
commit-graph is disabled. Since the resulting commit-graph will would not
be read by subsequent Git processes. This is more natural than forcing
core.commitGraph to be true for the 'write' process.

Reported-by: Thomas Braun &lt;thomas.braun@virtuell-zuhause.de&gt;
Helped-by: Jeff King &lt;peff@peff.net&gt;
Helped-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>commit-graph: ignore duplicates when merging layers</title>
<updated>2020-10-09T21:16:23Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2020-10-09T20:53:51Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=150f11574bcb53a6880a03593ff46d6b25fa03a1'/>
<id>urn:sha1:150f11574bcb53a6880a03593ff46d6b25fa03a1</id>
<content type='text'>
Thomas reported [1] that a "git fetch" command was failing with an error
saying "unexpected duplicate commit id". The root cause is that they had
fetch.writeCommitGraph enabled which generates commit-graph chains, and
this instance was merging two layers that both contained the same commit
ID.

[1] https://lore.kernel.org/git/55f8f00c-a61c-67d4-889e-a9501c596c39@virtuell-zuhause.de/

The initial assumption is that Git would not write a commit ID into a
commit-graph layer if it already exists in a lower commit-graph layer.
Somehow, this specific case did get into that situation, leading to this
error.

While unexpected, this isn't actually invalid (as long as the two layers
agree on the metadata for the commit). When we parse a commit that does
not have a graph_pos in the commit_graph_data_slab, we use binary search
in the commit-graph layers to find the commit and set graph_pos. That
position is never used again in this case. However, when we parse a
commit from the commit-graph file, we load its parents from the
commit-graph and assign graph_pos at that point. If those parents were
already parsed from the commit-graph, then nothing needs to be done.
Otherwise, this graph_pos is a valid position in the commit-graph so we
can parse the parents, when necessary.

Thus, this die() is too aggressive. The easiest thing to do would be to
ignore the duplicates.

If we only ignore the duplicates, then we will produce a commit-graph
that has identical commit IDs listed in adjacent positions. This excess
data will never be removed from the commit-graph, which could cascade
into significantly bloated file sizes.

Thankfully, we can collapse the list to erase the duplicate commit
pointers. This allows us to get the end result we want without extra
memory costs and minimal CPU time.

The root cause is due to disabling core.commitGraph, which prevents
parsing commits from the lower layers during a 'git commit-graph write
--split' command. Since we use the 'graph_pos' value to determine
whether a commit is in a lower layer, we never discover that those
commits are already in the commit-graph chain and add them to the top
layer. This layer is then merged down, creating duplicates.

The test added in t5324-split-commit-graph.sh fails without this change.
However, we still have not completely removed the need for this
duplicate check. That will come in a follow-up change.

Reported-by: Thomas Braun &lt;thomas.braun@virtuell-zuhause.de&gt;
Helped-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Co-authored-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'tb/bloom-improvements'</title>
<updated>2020-09-29T21:01:20Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2020-09-29T21:01:20Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=288ed98bf768f4df9b569d51a52c233a1402c0f5'/>
<id>urn:sha1:288ed98bf768f4df9b569d51a52c233a1402c0f5</id>
<content type='text'>
"git commit-graph write" learned to limit the number of bloom
filters that are computed from scratch with the --max-new-filters
option.

* tb/bloom-improvements:
  commit-graph: introduce 'commitGraph.maxNewFilters'
  builtin/commit-graph.c: introduce '--max-new-filters=&lt;n&gt;'
  commit-graph: rename 'split_commit_graph_opts'
  bloom: encode out-of-bounds filters as non-empty
  bloom/diff: properly short-circuit on max_changes
  bloom: use provided 'struct bloom_filter_settings'
  bloom: split 'get_bloom_filter()' in two
  commit-graph.c: store maximum changed paths
  commit-graph: respect 'commitGraph.readChangedPaths'
  t/helper/test-read-graph.c: prepare repo settings
  commit-graph: pass a 'struct repository *' in more places
  t4216: use an '&amp;&amp;'-chain
  commit-graph: introduce 'get_bloom_filter_settings()'
</content>
</entry>
<entry>
<title>commit-graph: introduce 'get_bloom_filter_settings()'</title>
<updated>2020-09-09T19:51:48Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2020-09-09T15:22:44Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=4f3644056ad2b4c46ed0bcce72f5a1eb5b92bd7f'/>
<id>urn:sha1:4f3644056ad2b4c46ed0bcce72f5a1eb5b92bd7f</id>
<content type='text'>
Many places in the code often need a pointer to the commit-graph's
'struct bloom_filter_settings', in which case they often take the value
from the top-most commit-graph.

In the non-split case, this works as expected. In the split case,
however, things get a little tricky. Not all layers in a chain of
incremental commit-graphs are required to themselves have Bloom data,
and so whether or not some part of the code uses Bloom filters depends
entirely on whether or not the top-most level of the commit-graph chain
has Bloom filters.

This has been the behavior since Bloom filters were introduced, and has
been codified into the tests since a759bfa9ee (t4216: add end to end
tests for git log with Bloom filters, 2020-04-06). In fact, t4216.130
requires that Bloom filters are not used in exactly the case described
earlier.

There is no reason that this needs to be the case, since it is perfectly
valid for commits in an earlier layer to have Bloom filters when commits
in a newer layer do not.

Since Bloom settings are guaranteed in practice to be the same for any
layer in a chain that has Bloom data, it is sufficient to traverse the
'-&gt;base_graph' pointer until either (1) a non-null 'struct
bloom_filter_settings *' is found, or (2) until we are at the root of
the commit-graph chain.

Introduce a 'get_bloom_filter_settings()' function that does just this,
and use it instead of purely dereferencing the top-most graph's
'-&gt;bloom_filter_settings' pointer.

While we're at it, add an additional test in t5324 to guard against code
in the commit-graph writing machinery that doesn't correctly handle a
NULL 'struct bloom_filter *'.

Co-authored-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>commit-graph: use the "hash version" byte</title>
<updated>2020-08-17T23:45:14Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2020-08-17T14:04:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=665d70ad033b115d8c14cdc2bcecf6c8b1947260'/>
<id>urn:sha1:665d70ad033b115d8c14cdc2bcecf6c8b1947260</id>
<content type='text'>
The commit-graph format reserved a byte among the header of the file to
store a "hash version". During the SHA-256 work, this was not modified
because file formats are not necessarily intended to work across hash
versions. If a repository has SHA-256 as its hash algorithm, it
automatically up-shifts the lengths of object names in all necessary
formats.

However, since we have this byte available for adjusting the version, we
can make the file formats more obviously incompatible instead of relying
on other context from the repository.

Update the oid_version() method in commit-graph.c to add a new value, 2,
for sha-256. This automatically writes the new value in a SHA-256
repository _and_ verifies the value is correct. This is a breaking
change relative to the current 'master' branch since 092b677 (Merge
branch 'bc/sha-256-cvs-svn-updates', 2020-08-13) but it is not breaking
relative to any released version of Git.

The test impact is relatively minor: the output of 'test-tool
read-graph' lists the header information, so those instances of '1' need
to be replaced with a variable determined by GIT_TEST_DEFAULT_HASH. A
more careful test is added that specifically creates a repository of
each type then swaps the commit-graph files. The important value here is
that the "git log" command succeeds while writing a message to stderr.

Helped-by: brian m. carlson &lt;sandals@crustytoothpaste.net&gt;
Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Reviewed-by: brian m. carlson &lt;sandals@crustytoothpaste.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'bc/sha-256-part-3'</title>
<updated>2020-08-12T01:04:11Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2020-08-12T01:04:11Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=e0ad9574ddf5bb14d9ed6808112485ce0da99fea'/>
<id>urn:sha1:e0ad9574ddf5bb14d9ed6808112485ce0da99fea</id>
<content type='text'>
The final leg of SHA-256 transition.

* bc/sha-256-part-3: (39 commits)
  t: remove test_oid_init in tests
  docs: add documentation for extensions.objectFormat
  ci: run tests with SHA-256
  t: make SHA1 prerequisite depend on default hash
  t: allow testing different hash algorithms via environment
  t: add test_oid option to select hash algorithm
  repository: enable SHA-256 support by default
  setup: add support for reading extensions.objectformat
  bundle: add new version for use with SHA-256
  builtin/verify-pack: implement an --object-format option
  http-fetch: set up git directory before parsing pack hashes
  t0410: mark test with SHA1 prerequisite
  t5308: make test work with SHA-256
  t9700: make hash size independent
  t9500: ensure that algorithm info is preserved in config
  t9350: make hash size independent
  t9301: make hash size independent
  t9300: use $ZERO_OID instead of hard-coded object ID
  t9300: abstract away SHA-1-specific constants
  t8011: make hash size independent
  ...
</content>
</entry>
<entry>
<title>t: remove test_oid_init in tests</title>
<updated>2020-07-30T16:16:49Z</updated>
<author>
<name>brian m. carlson</name>
<email>sandals@crustytoothpaste.net</email>
</author>
<published>2020-07-29T23:14:28Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=e023ff0691ca207d421a0e75ea23c132ada9142a'/>
<id>urn:sha1:e023ff0691ca207d421a0e75ea23c132ada9142a</id>
<content type='text'>
Now that we call test_oid_init in the setup for all test scripts,
there's no point in calling it individually.  Remove all of the places
where we've done so to help keep tests tidy.

Signed-off-by: brian m. carlson &lt;sandals@crustytoothpaste.net&gt;
Reviewed-by: Eric Sunshine &lt;sunshine@sunshineco.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>t5324: reorder `run_with_limited_open_files test_might_fail`</title>
<updated>2020-07-07T20:07:27Z</updated>
<author>
<name>Denton Liu</name>
<email>liu.denton@gmail.com</email>
</author>
<published>2020-07-07T06:04:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=6861ac806b98076121ea69ed41476b14818f07d6'/>
<id>urn:sha1:6861ac806b98076121ea69ed41476b14818f07d6</id>
<content type='text'>
In the future, we plan on only allowing `test_might_fail` to work on a
restricted subset of commands, including `git`. Reorder the commands so
that `run_with_limited_open_files` comes before `test_might_fail`. This
way, `test_might_fail` operates on a git command.

Signed-off-by: Denton Liu &lt;liu.denton@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'tb/commit-graph-perm-bits'</title>
<updated>2020-05-05T21:54:28Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2020-05-05T21:54:28Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=1d7e9c4c4e60375146ad70ed5c555574a653e92a'/>
<id>urn:sha1:1d7e9c4c4e60375146ad70ed5c555574a653e92a</id>
<content type='text'>
Some of the files commit-graph subsystem keeps on disk did not
correctly honor the core.sharedRepository settings and some were
left read-write.

* tb/commit-graph-perm-bits:
  commit-graph.c: make 'commit-graph-chain's read-only
  commit-graph.c: ensure graph layers respect core.sharedRepository
  commit-graph.c: write non-split graphs as read-only
  lockfile.c: introduce 'hold_lock_file_for_update_mode'
  tempfile.c: introduce 'create_tempfile_mode'
</content>
</entry>
</feed>
