<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/commit-graph.h, branch v2.30.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.30.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.30.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2020-09-29T21:01:20Z</updated>
<entry>
<title>Merge branch 'tb/bloom-improvements'</title>
<updated>2020-09-29T21:01:20Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2020-09-29T21:01:20Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=288ed98bf768f4df9b569d51a52c233a1402c0f5'/>
<id>urn:sha1:288ed98bf768f4df9b569d51a52c233a1402c0f5</id>
<content type='text'>
"git commit-graph write" learned to limit the number of bloom
filters that are computed from scratch with the --max-new-filters
option.

* tb/bloom-improvements:
  commit-graph: introduce 'commitGraph.maxNewFilters'
  builtin/commit-graph.c: introduce '--max-new-filters=&lt;n&gt;'
  commit-graph: rename 'split_commit_graph_opts'
  bloom: encode out-of-bounds filters as non-empty
  bloom/diff: properly short-circuit on max_changes
  bloom: use provided 'struct bloom_filter_settings'
  bloom: split 'get_bloom_filter()' in two
  commit-graph.c: store maximum changed paths
  commit-graph: respect 'commitGraph.readChangedPaths'
  t/helper/test-read-graph.c: prepare repo settings
  commit-graph: pass a 'struct repository *' in more places
  t4216: use an '&amp;&amp;'-chain
  commit-graph: introduce 'get_bloom_filter_settings()'
</content>
</entry>
<entry>
<title>builtin/commit-graph.c: introduce '--max-new-filters=&lt;n&gt;'</title>
<updated>2020-09-18T17:35:39Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2020-09-18T13:27:27Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=809e0327f579267ea78a1b2f727d3b63c1f5d044'/>
<id>urn:sha1:809e0327f579267ea78a1b2f727d3b63c1f5d044</id>
<content type='text'>
Introduce a command-line flag to specify the maximum number of new Bloom
filters that a 'git commit-graph write' is willing to compute from
scratch.

Prior to this patch, a commit-graph write with '--changed-paths' would
compute Bloom filters for all selected commits which haven't already
been computed (i.e., by a previous commit-graph write with '--split'
such that a roll-up or replacement is performed).

This behavior can cause prohibitively-long commit-graph writes for a
variety of reasons:

  * There may be lots of filters whose diffs take a long time to
    generate (for example, they have close to the maximum number of
    changes, diffing itself takes a long time, etc.).

  * Old-style commit-graphs (which encode filters with too many entries
    as not having been computed at all) cause us to waste time
    recomputing filters that appear to have not been computed only to
    discover that they are too large.

This can make the upper bound on the time taken by 'git commit-graph
write --changed-paths' rather unpredictable.

To make this command behave more predictably, introduce
'--max-new-filters=&lt;n&gt;' to allow computing at most '&lt;n&gt;' Bloom filters
from scratch. This lets "computing" already-known filters proceed
quickly, while bounding the number of slow tasks that Git is willing to
do.
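
To illustrate the idea, here is a minimal Python model of a bounded
write (a sketch, not Git's C implementation). It assumes a dict 'known'
of filters that earlier writes already computed, and a hypothetical
'expensive_diff()' standing in for the tree diff that builds a filter.
Only the slow path counts against the '--max-new-filters' limit.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FilterStats:
    reused: int = 0      # filters found already computed
    computed: int = 0    # filters computed from scratch
    skipped: int = 0     # filters left uncomputed because of the limit


def expensive_diff(commit: str) -> bytes:
    # Hypothetical stand-in for the tree diff that builds a real filter.
    return commit.encode()


def compute_filters(commits: List[str],
                    known: Dict[str, bytes],
                    max_new_filters: Optional[int]) -> FilterStats:
    """Model of a bounded '--changed-paths' write.

    'known' maps a commit to a filter that an earlier write already
    computed; a 'max_new_filters' of None means "no limit".
    """
    stats = FilterStats()
    for commit in commits:
        if commit in known:
            stats.reused += 1        # cheap: reuse the stored filter
        elif max_new_filters is None or max_new_filters > stats.computed:
            known[commit] = expensive_diff(commit)
            stats.computed += 1      # slow path, counted against the limit
        else:
            stats.skipped += 1       # left for a later write
    return stats
```

For example, with five commits, one known filter and a limit of 2, the
model reuses one filter, computes two, and leaves two for a later run.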

Helped-by: Junio C Hamano &lt;gitster@pobox.com&gt;
Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>commit-graph: rename 'split_commit_graph_opts'</title>
<updated>2020-09-18T04:55:50Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2020-09-18T02:59:49Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=98bb796191f7234c88b7a97f587d37ffbd130289'/>
<id>urn:sha1:98bb796191f7234c88b7a97f587d37ffbd130289</id>
<content type='text'>
In the subsequent commit, additional options will be added to the
commit-graph API which have nothing to do with splitting.

Rename the 'split_commit_graph_opts' structure to the more-generic
'commit_graph_opts' to encompass both. Likewise, rename the 'flags'
member to instead be 'split_flags' to clarify that it only has to do
with the behavior implied by '--split'.

Suggested-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>maintenance: add commit-graph task</title>
<updated>2020-09-17T18:30:05Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2020-09-17T18:11:46Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=663b2b1b90bf76275044824ddeca96aaec240f09'/>
<id>urn:sha1:663b2b1b90bf76275044824ddeca96aaec240f09</id>
<content type='text'>
The first new task in the 'git maintenance' builtin is the
'commit-graph' task. This updates the commit-graph file
incrementally with the command

	git commit-graph write --reachable --split

By writing an incremental commit-graph file using the "--split"
option, we minimize the disruption from this operation. The default
behavior is to merge layers until the new "top" layer is less than
half the size of the layer below. This provides quick writes most
of the time, with the longer writes following a power-law
distribution.
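
The merge rule can be modeled as below. This is a simplified Python
sketch, not Git's code: it tracks only the commit count of each layer
(base first), uses a size multiple of 2, and ignores other inputs such
as a commit limit or layers that live in other object directories.

```python
from typing import List


def merge_layers(layers: List[int], new_commits: int,
                 size_mult: int = 2) -> List[int]:
    """Return the chain of layer sizes (base first) after a split write.

    'layers' holds the commit count of each existing layer, base first.
    The new top layer absorbs the layer below it for as long as that
    layer is not more than 'size_mult' times larger than the new one.
    """
    chain = list(layers)             # do not mutate the caller's list
    top = new_commits
    while chain and size_mult * top >= chain[-1]:
        top += chain.pop()           # merge the layer below into the new top
    chain.append(top)
    return chain
```

For example, with existing layers of 100 and 20 commits, a write adding
5 commits gives [100, 20, 5], adding 15 gives [100, 35], and adding 60
collapses everything into [180]. Small writes stop after a single
comparison, which is why most writes stay quick.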

Most importantly, concurrent Git processes only look at the
commit-graph-chain file for a very short amount of time, so they
will very likely not be holding a handle to the file when we try
to replace it. (This only matters on Windows.)

If a concurrent process reads the old commit-graph-chain file, but
our job expires some of the .graph files before they can be read,
then those processes will see a warning message (but not fail).
This could be avoided by a future update to use the --expire-time
argument when writing the commit-graph.
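
The grace period that '--expire-time' could provide is sketched below
as a small Python model. The file names and timestamps are made up; the
assumed rule is that a .graph file is deleted only when the new chain no
longer references it and its modification time is not newer than the
expire time.

```python
from typing import Dict, List, Set


def files_to_expire(graph_files: Dict[str, float], chain: Set[str],
                    expire_time: float) -> List[str]:
    """Pick .graph files that are safe to delete after a rewrite.

    'graph_files' maps a file name to its modification time (seconds
    since the epoch). Files still referenced by the new chain are kept,
    and so are unreferenced files modified after 'expire_time', giving
    concurrent readers of the old chain a grace period.
    """
    return sorted(name for name, mtime in graph_files.items()
                  if name not in chain and expire_time >= mtime)
```

With an expire time of 150, only an unreferenced file last modified at
100 would be removed; one modified at 200 survives long enough for a
reader of the old chain to open it.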

Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>commit-graph: pass a 'struct repository *' in more places</title>
<updated>2020-09-09T19:51:48Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2020-09-09T15:22:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=ab14d0676c58389e6f0ccbb9569b07045a05b6df'/>
<id>urn:sha1:ab14d0676c58389e6f0ccbb9569b07045a05b6df</id>
<content type='text'>
In a future commit, some commit-graph internals will want access to
'r-&gt;settings', but we only have the 'struct object_directory *'
corresponding to that repository.

Add an additional parameter to pass the repository around in more
places.

Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>commit-graph: introduce 'get_bloom_filter_settings()'</title>
<updated>2020-09-09T19:51:48Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2020-09-09T15:22:44Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=4f3644056ad2b4c46ed0bcce72f5a1eb5b92bd7f'/>
<id>urn:sha1:4f3644056ad2b4c46ed0bcce72f5a1eb5b92bd7f</id>
<content type='text'>
Many places in the code need a pointer to the commit-graph's
'struct bloom_filter_settings', and they typically take the value
from the top-most commit-graph.

In the non-split case, this works as expected. In the split case,
however, things get a little tricky. Not all layers in a chain of
incremental commit-graphs are required to have Bloom data themselves,
and so whether or not some part of the code uses Bloom filters depends
entirely on whether or not the top-most level of the commit-graph chain
has Bloom filters.

This has been the behavior since Bloom filters were introduced, and has
been codified into the tests since a759bfa9ee (t4216: add end to end
tests for git log with Bloom filters, 2020-04-06). In fact, t4216.130
requires that Bloom filters are not used in exactly the case described
earlier.

There is no reason that this needs to be the case, since it is perfectly
valid for commits in an earlier layer to have Bloom filters when commits
in a newer layer do not.

Since Bloom settings are guaranteed in practice to be the same for any
layer in a chain that has Bloom data, it is sufficient to traverse the
'-&gt;base_graph' pointer until either (1) a non-null 'struct
bloom_filter_settings *' is found, or (2) we reach the root of the
commit-graph chain.

Introduce a 'get_bloom_filter_settings()' function that does just this,
and use it instead of purely dereferencing the top-most graph's
'-&gt;bloom_filter_settings' pointer.
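
A Python model of that traversal follows. The class and field names
mirror the C structures mentioned above, but the Python types are
illustrative only; the real function is written in C.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BloomFilterSettings:
    hash_version: int
    num_hashes: int
    bits_per_entry: int


@dataclass
class CommitGraph:
    # Settings are None when this layer carries no Bloom data.
    bloom_filter_settings: Optional[BloomFilterSettings] = None
    base_graph: Optional["CommitGraph"] = None


def get_bloom_filter_settings(
        top: Optional[CommitGraph]) -> Optional[BloomFilterSettings]:
    """Walk from the top-most layer toward the root of the chain and
    return the first Bloom settings found, or None if no layer has any."""
    g = top
    while g is not None:
        if g.bloom_filter_settings is not None:
            return g.bloom_filter_settings
        g = g.base_graph
    return None
```

In a three-layer chain where only the base layer has Bloom data, the
function returns the base layer's settings instead of giving up at the
top layer.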

While we're at it, add an additional test in t5324 to guard against code
in the commit-graph writing machinery that doesn't correctly handle a
NULL 'struct bloom_filter *'.

Co-authored-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'ds/commit-graph-bloom-updates' into master</title>
<updated>2020-07-30T20:20:31Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2020-07-30T20:20:31Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=70cdbbe3a7028b9c1eb7542a0aa038b971d3fb27'/>
<id>urn:sha1:70cdbbe3a7028b9c1eb7542a0aa038b971d3fb27</id>
<content type='text'>
Updates to the changed-paths bloom filter.

* ds/commit-graph-bloom-updates:
  commit-graph: check all leading directories in changed path Bloom filters
  revision: empty pathspecs should not use Bloom filters
  revision.c: fix whitespace
  commit-graph: check chunk sizes after writing
  commit-graph: simplify chunk writes into loop
  commit-graph: unify the signatures of all write_graph_chunk_*() functions
  commit-graph: persist existence of changed-paths
  bloom: fix logic in get_bloom_filter()
  commit-graph: change test to die on parse, not load
  commit-graph: place bloom_settings in context
</content>
</entry>
<entry>
<title>Merge branch 'sg/commit-graph-cleanups' into master</title>
<updated>2020-07-30T20:20:30Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2020-07-30T20:20:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=de6dda0dc3d95cd9aaf43a7b85ceeb57316dcc27'/>
<id>urn:sha1:de6dda0dc3d95cd9aaf43a7b85ceeb57316dcc27</id>
<content type='text'>
The changed-path Bloom filter is improved using ideas from an
independent implementation.

* sg/commit-graph-cleanups:
  commit-graph: simplify write_commit_graph_file() #2
  commit-graph: simplify write_commit_graph_file() #1
  commit-graph: simplify parse_commit_graph() #2
  commit-graph: simplify parse_commit_graph() #1
  commit-graph: clean up #includes
  diff.h: drop diff_tree_oid() &amp; friends' return value
  commit-slab: add a function to deep free entries on the slab
  commit-graph-format.txt: all multi-byte numbers are in network byte order
  commit-graph: fix parsing the Chunk Lookup table
  tree-walk.c: don't match submodule entries for 'submod/anything'
</content>
</entry>
<entry>
<title>commit-graph: persist existence of changed-paths</title>
<updated>2020-07-01T21:17:43Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2020-07-01T13:27:24Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=0087a87ba8fc69b27dde0183ec24ade367a4aa5b'/>
<id>urn:sha1:0087a87ba8fc69b27dde0183ec24ade367a4aa5b</id>
<content type='text'>
The changed-path Bloom filters were released in v2.27.0, but have a
significant drawback. A user can opt in to writing the changed-path
filters using the "--changed-paths" option to "git commit-graph write"
but the next write will drop the filters unless that option is
specified.

This becomes even more important when considering the interaction with
gc.writeCommitGraph (on by default) or fetch.writeCommitGraph (part of
features.experimental). These config options trigger commit-graph writes
that the user did not signal, and hence there is no --changed-paths
option available.

Allow a user who opts in to the changed-path filters to persist the
property of "my commit-graph has changed-path filters" automatically. A
user can drop filters using the --no-changed-paths option.

In the process, we need to be extremely careful to match the Bloom
filter settings as specified by the commit-graph. This will allow future
versions of Git to customize these settings, and the version with this
change will persist those settings as commit-graphs are rewritten on
top.
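
The decision described above can be sketched in Python as follows. This
is a model, not Git's code: the tri-state 'changed_paths' argument
stands for the command-line options, and the default numbers in
'BloomFilterSettings' are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BloomFilterSettings:
    # Illustrative defaults, assumed for this sketch.
    hash_version: int = 1
    num_hashes: int = 7
    bits_per_entry: int = 10


def choose_bloom_settings(changed_paths: Optional[bool],
                          existing: Optional[BloomFilterSettings]
                          ) -> Optional[BloomFilterSettings]:
    """Decide whether a write produces changed-path filters, and how.

    changed_paths is True for '--changed-paths', False for
    '--no-changed-paths' and None when neither option was given (for
    example, a write triggered by gc.writeCommitGraph). 'existing' holds
    the settings found in the current commit-graph, if any.
    """
    if changed_paths is False:
        return None                    # explicit opt-out drops the filters
    if changed_paths is None and existing is None:
        return None                    # nothing to persist
    # Reuse the on-disk settings so rewritten layers stay compatible;
    # fall back to defaults only when starting fresh.
    return existing if existing is not None else BloomFilterSettings()
```

For example, a gc-triggered write (changed_paths=None) over a graph that
already has filters keeps them with the on-disk settings, while
'--no-changed-paths' returns None and drops them.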

Use the trace2 API to signal the settings used during the write, and
check that output in a test after manually adjusting the correct bytes
in the commit-graph file.

Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>commit-graph: change test to die on parse, not load</title>
<updated>2020-06-24T00:12:08Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2020-06-23T17:47:01Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=7b671f8c2b6b7de511fd2f6587e4540c25764b61'/>
<id>urn:sha1:7b671f8c2b6b7de511fd2f6587e4540c25764b61</id>
<content type='text'>
43d3561 (commit-graph write: don't die if the existing graph is corrupt,
2019-03-25) introduced the GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD environment
variable. This was created to verify that the commit-graph was not loaded
when writing a new non-incremental commit-graph.

An upcoming change wants to load a commit-graph in some valuable cases,
but we want to maintain that we don't trust the commit-graph data when
writing our new file. Instead of dying on load, die if we ever
try to parse a commit from the commit-graph. This functionally verifies
the same intended behavior, but allows a more advanced feature in the
next change.
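
The difference between the two guards can be modeled as below. The
'die_on_parse' flag is a hypothetical stand-in for the test environment
variable, and the classes are illustrative Python rather than Git's C
code: loading a graph succeeds, but any attempt to parse a commit from
it fails.

```python
from typing import Dict


class CommitGraphParseError(Exception):
    pass


class CommitGraph:
    def __init__(self, data: Dict[str, str], die_on_parse: bool = False):
        # Loading always succeeds, so later code can still inspect the
        # file (for example, to read its Bloom filter settings).
        self.data = data
        self.die_on_parse = die_on_parse

    def parse_commit(self, oid: str) -> str:
        # The test guard lives here: trusting commit data from the old
        # graph during a non-incremental write must never happen.
        if self.die_on_parse:
            raise CommitGraphParseError(f"dying as requested on parse of {oid}")
        return self.data[oid]
```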

Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
