<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/commit.c, branch v2.43.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.43.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.43.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2024-02-09T00:22:10Z</updated>
<entry>
<title>Merge branch 'en/header-cleanup' into maint-2.43</title>
<updated>2024-02-09T00:22:10Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2024-02-09T00:22:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=0f7a10a3aad344a28f0f30b32a979925a9629533'/>
<id>urn:sha1:0f7a10a3aad344a28f0f30b32a979925a9629533</id>
<content type='text'>
Remove unused header "#include".

* en/header-cleanup:
  treewide: remove unnecessary includes in source files
  treewide: add direct includes currently only pulled in transitively
  trace2/tr2_tls.h: remove unnecessary include
  submodule-config.h: remove unnecessary include
  pkt-line.h: remove unnecessary include
  line-log.h: remove unnecessary include
  http.h: remove unnecessary include
  fsmonitor--daemon.h: remove unnecessary includes
  blame.h: remove unnecessary includes
  archive.h: remove unnecessary include
  treewide: remove unnecessary includes in source files
  treewide: remove unnecessary includes from header files
</content>
</entry>
<entry>
<title>Merge branch 'la/trailer-cleanups' into maint-2.43</title>
<updated>2024-02-09T00:22:09Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2024-02-09T00:22:09Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=541d0d75e781fad5761414d7a5500b8840f3e5bb'/>
<id>urn:sha1:541d0d75e781fad5761414d7a5500b8840f3e5bb</id>
<content type='text'>
Code clean-up.

* la/trailer-cleanups:
  trailer: use offsets for trailer_start/trailer_end
  trailer: find the end of the log message
  commit: ignore_non_trailer computes number of bytes to ignore
</content>
</entry>
<entry>
<title>treewide: remove unnecessary includes in source files</title>
<updated>2023-12-26T20:04:31Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2023-12-23T17:14:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=eea0e59ffbed6e33d171ace5be13cde9faa41639'/>
<id>urn:sha1:eea0e59ffbed6e33d171ace5be13cde9faa41639</id>
<content type='text'>
Each of these were checked with
   gcc -E -I. ${SOURCE_FILE} | grep ${HEADER_FILE}
to ensure that removing the direct inclusion of the header actually
resulted in that header no longer being included at all (i.e. that
no other header pulled it in transitively).

...except for a few cases where we verified that although the header
was brought in transitively, nothing from it was directly used in
that source file.  These cases were:
  * builtin/credential-cache.c
  * builtin/pull.c
  * builtin/send-pack.c

Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>commit-graph: disable GIT_COMMIT_GRAPH_PARANOIA by default</title>
<updated>2023-11-26T01:10:00Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2023-11-24T11:08:21Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=b1df3b3867e351913887121063cbd69de24e83fc'/>
<id>urn:sha1:b1df3b3867e351913887121063cbd69de24e83fc</id>
<content type='text'>
In 7a5d604443 (commit: detect commits that exist in commit-graph but not
in the ODB, 2023-10-31), we have introduced a new object existence check
into `repo_parse_commit_internal()` so that we do not parse commits via
the commit-graph that don't have a corresponding object in the object
database. This new check of course comes with a performance penalty,
which the commit put at around 30% for `git rev-list --topo-order`. But
there are in fact scenarios where the performance regression is even
higher. The following benchmark against linux.git with a fully-build
commit-graph:

  Benchmark 1: git.v2.42.1 rev-list --count HEAD
    Time (mean ± σ):     658.0 ms ±   5.2 ms    [User: 613.5 ms, System: 44.4 ms]
    Range (min … max):   650.2 ms … 666.0 ms    10 runs

  Benchmark 2: git.v2.43.0-rc1 rev-list --count HEAD
    Time (mean ± σ):      1.333 s ±  0.019 s    [User: 1.263 s, System: 0.069 s]
    Range (min … max):    1.302 s …  1.361 s    10 runs

  Summary
    git.v2.42.1 rev-list --count HEAD ran
      2.03 ± 0.03 times faster than git.v2.43.0-rc1 rev-list --count HEAD

While it's a noble goal to ensure that results are the same regardless
of whether or not we have a potentially stale commit-graph, taking twice
as much time is a tough sell. Furthermore, we can generally assume that
the commit-graph will be updated by git-gc(1) or git-maintenance(1) as
required so that the case where the commit-graph is stale should not at
all be common.

With that in mind, default-disable GIT_COMMIT_GRAPH_PARANOIA and restore
the behaviour and thus performance previous to the mentioned commit. In
order to not be inconsistent, also disable this behaviour by default in
`lookup_commit_in_graph()`, where the object existence check has been
introduced right at its inception via f559d6d45e (revision: avoid
hitting packfiles when commits are in commit-graph, 2021-08-09).

This results in another speedup in commands that end up calling this
function, even though it's less pronounced compared to the above
benchmark. The following has been executed in linux.git with ~1.2
million references:

  Benchmark 1: GIT_COMMIT_GRAPH_PARANOIA=true git rev-list --all --no-walk=unsorted
    Time (mean ± σ):      2.947 s ±  0.003 s    [User: 2.412 s, System: 0.534 s]
    Range (min … max):    2.943 s …  2.949 s    3 runs

  Benchmark 2: GIT_COMMIT_GRAPH_PARANOIA=false git rev-list --all --no-walk=unsorted
    Time (mean ± σ):      2.724 s ±  0.030 s    [User: 2.207 s, System: 0.514 s]
    Range (min … max):    2.704 s …  2.759 s    3 runs

  Summary
    GIT_COMMIT_GRAPH_PARANOIA=false git rev-list --all --no-walk=unsorted ran
      1.08 ± 0.01 times faster than GIT_COMMIT_GRAPH_PARANOIA=true git rev-list --all --no-walk=unsorted

So whereas 7a5d604443 initially introduced the logic to start doing an
object existence check in `repo_parse_commit_internal()` by default, the
updated logic will now instead cause `lookup_commit_in_graph()` to stop
doing the check by default. This behaviour continues to be tweakable by
the user via the GIT_COMMIT_GRAPH_PARANOIA environment variable.

Note that this requires us to amend some tests to manually turn on the
paranoid checks again. This is because we cause repository corruption by
manually deleting objects which are part of the commit graph already.
These circumstances shouldn't usually happen in repositories.

Reported-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>commit: detect commits that exist in commit-graph but not in the ODB</title>
<updated>2023-11-01T03:04:06Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2023-10-31T07:16:18Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=7a5d604443ffc7afcd3788818f8fe00fc68c054d'/>
<id>urn:sha1:7a5d604443ffc7afcd3788818f8fe00fc68c054d</id>
<content type='text'>
Commit graphs can become stale and contain references to commits that do
not exist in the object database anymore. Theoretically, this can lead
to a scenario where we are able to successfully look up any such commit
via the commit graph even though such a lookup would fail if done via
the object database directly.

As the commit graph is mostly intended as a sort of cache to speed up
parsing of commits we do not want to have diverging behaviour in a
repository with and a repository without commit graphs, no matter
whether they are stale or not. As commits are otherwise immutable, the
only thing that we really need to care about is thus the presence or
absence of a commit.

To address potentially stale commit data that may exist in the graph,
our `lookup_commit_in_graph()` function will check for the commit's
existence in both the commit graph, but also in the object database. So
even if we were able to look up the commit's data in the graph, we would
still pretend as if the commit didn't exist if it is missing in the
object database.

We don't have the same safety net in `parse_commit_in_graph_one()`
though. This function is mostly used internally in "commit-graph.c"
itself to validate the commit graph, and this usage is fine. We do
expose its functionality via `parse_commit_in_graph()` though, which
gets called by `repo_parse_commit_internal()`, and that function is in
turn used in many places in our codebase.

For all I can see this function is never used to directly turn an object
ID into a commit object without additional safety checks before or after
this lookup. What it is being used for though is to walk history via the
parent chain of commits. So when commits in the parent chain of a graph
walk are missing it is possible that we wouldn't notice if that missing
commit was part of the commit graph. Thus, a query like `git rev-parse
HEAD~2` can succeed even if the intermittent commit is missing.

It's unclear whether there are additional ways in which such stale
commit graphs can lead to problems. In any case, it feels like this is a
bigger bug waiting to happen when we gain additional direct or indirect
callers of `repo_parse_commit_internal()`. So let's fix the inconsistent
behaviour by checking for object existence via the object database, as
well.

This check of course comes with a performance penalty. The following
benchmarks have been executed in a clone of linux.git with stable tags
added:

    Benchmark 1: git -c core.commitGraph=true rev-list --topo-order --all (git = master)
      Time (mean ± σ):      2.913 s ±  0.018 s    [User: 2.363 s, System: 0.548 s]
      Range (min … max):    2.894 s …  2.950 s    10 runs

    Benchmark 2: git -c core.commitGraph=true rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
      Time (mean ± σ):      3.834 s ±  0.052 s    [User: 3.276 s, System: 0.556 s]
      Range (min … max):    3.780 s …  3.961 s    10 runs

    Benchmark 3: git -c core.commitGraph=false rev-list --topo-order --all (git = master)
      Time (mean ± σ):     13.841 s ±  0.084 s    [User: 13.152 s, System: 0.687 s]
      Range (min … max):   13.714 s … 13.995 s    10 runs

    Benchmark 4: git -c core.commitGraph=false rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
      Time (mean ± σ):     13.762 s ±  0.116 s    [User: 13.094 s, System: 0.667 s]
      Range (min … max):   13.645 s … 14.038 s    10 runs

    Summary
      git -c core.commitGraph=true rev-list --topo-order --all (git = master) ran
        1.32 ± 0.02 times faster than git -c core.commitGraph=true rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
        4.72 ± 0.05 times faster than git -c core.commitGraph=false rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
        4.75 ± 0.04 times faster than git -c core.commitGraph=false rev-list --topo-order --all (git = master)

We look at a ~30% regression in general, but in general we're still a
whole lot faster than without the commit graph. To counteract this, the
new check can be turned off with the `GIT_COMMIT_GRAPH_PARANOIA` envvar.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>commit: ignore_non_trailer computes number of bytes to ignore</title>
<updated>2023-10-20T21:25:12Z</updated>
<author>
<name>Linus Arver</name>
<email>linusa@google.com</email>
</author>
<published>2023-10-20T19:01:33Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=7cb26a1722a7f7d9f1b01f2dcde8fe81d3a96bc1'/>
<id>urn:sha1:7cb26a1722a7f7d9f1b01f2dcde8fe81d3a96bc1</id>
<content type='text'>
ignore_non_trailer() returns the _number of bytes_ that should be
ignored from the end of the log message. It does not by itself "ignore"
anything.

Rename this function to remove the leading "ignore" verb, to sound more
like a quantity than an action.

Signed-off-by: Linus Arver &lt;linusa@google.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'rs/pack-objects-parseopt-fix'</title>
<updated>2023-07-28T16:45:22Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2023-07-28T16:45:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=ddcb8fd8b94c066406c2a4f41e04122cb2a742ad'/>
<id>urn:sha1:ddcb8fd8b94c066406c2a4f41e04122cb2a742ad</id>
<content type='text'>
Command line parser fix.

* rs/pack-objects-parseopt-fix:
  pack-objects: fix --no-quiet
  pack-objects: fix --no-keep-true-parents
</content>
</entry>
<entry>
<title>pack-objects: fix --no-keep-true-parents</title>
<updated>2023-07-21T17:02:59Z</updated>
<author>
<name>René Scharfe</name>
<email>l.s.r@web.de</email>
</author>
<published>2023-07-21T12:41:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=3a5f30874120880ddaa53d0db4c3b4ec0a07297a'/>
<id>urn:sha1:3a5f30874120880ddaa53d0db4c3b4ec0a07297a</id>
<content type='text'>
Since 99fb6e04cb (pack-objects: convert to use parse_options(),
2012-02-01) git pack-objects has accepted --no-keep-true-parents, but
this option does the same as --keep-true-parents.  That's because it's
defined using OPT_SET_INT with a value of 0, which sets 0 when negated
as well.

Turn --no-keep-true-parents into the opposite of --keep-true-parents by
using OPT_BOOL and storing the option's status directly in a variable
named "grafts_keep_true_parents" instead of in negative form in
"grafts_replace_parents".

Signed-off-by: René Scharfe &lt;l.s.r@web.de&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>object-store-ll.h: split this header out of object-store.h</title>
<updated>2023-06-21T20:39:54Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2023-05-16T06:34:06Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=a034e9106ff1a4cb6fcb6f2ea3a1a47b4d2ba173'/>
<id>urn:sha1:a034e9106ff1a4cb6fcb6f2ea3a1a47b4d2ba173</id>
<content type='text'>
The vast majority of files including object-store.h did not need dir.h
nor khash.h.  Split the header into two files, and let most just depend
upon object-store-ll.h, while letting the two callers that need it
depend on the full object-store.h.

After this patch:
    $ git grep -h include..object-store | sort | uniq -c
          2 #include "object-store.h"
        129 #include "object-store-ll.h"

Diff best viewed with `--color-moved`.

Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'en/header-split-cache-h-part-2'</title>
<updated>2023-05-09T23:45:46Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2023-05-09T23:45:45Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=ccd12a3d6cc62f51b746654ae56e26d92f89ba92'/>
<id>urn:sha1:ccd12a3d6cc62f51b746654ae56e26d92f89ba92</id>
<content type='text'>
More header clean-up.

* en/header-split-cache-h-part-2: (22 commits)
  reftable: ensure git-compat-util.h is the first (indirect) include
  diff.h: reduce unnecessary includes
  object-store.h: reduce unnecessary includes
  commit.h: reduce unnecessary includes
  fsmonitor: reduce includes of cache.h
  cache.h: remove unnecessary headers
  treewide: remove cache.h inclusion due to previous changes
  cache,tree: move basic name compare functions from read-cache to tree
  cache,tree: move cmp_cache_name_compare from tree.[ch] to read-cache.c
  hash-ll.h: split out of hash.h to remove dependency on repository.h
  tree-diff.c: move S_DIFFTREE_IFXMIN_NEQ define from cache.h
  dir.h: move DTYPE defines from cache.h
  versioncmp.h: move declarations for versioncmp.c functions from cache.h
  ws.h: move declarations for ws.c functions from cache.h
  match-trees.h: move declarations for match-trees.c functions from cache.h
  pkt-line.h: move declarations for pkt-line.c functions from cache.h
  base85.h: move declarations for base85.c functions from cache.h
  copy.h: move declarations for copy.c functions from cache.h
  server-info.h: move declarations for server-info.c functions from cache.h
  packfile.h: move pack_window and pack_entry from cache.h
  ...
</content>
</entry>
</feed>
