<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/pack-bitmap.c, branch v2.41.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.41.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.41.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2023-05-02T15:48:22Z</updated>
<entry>
<title>fsck: verify checksums of all .bitmap files</title>
<updated>2023-05-02T15:48:22Z</updated>
<author>
<name>Derrick Stolee</name>
<email>derrickstolee@github.com</email>
</author>
<published>2023-05-02T13:27:21Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=756f1bcd29a711075cbaecb02c923328e86a41a4'/>
<id>urn:sha1:756f1bcd29a711075cbaecb02c923328e86a41a4</id>
<content type='text'>
If a filesystem-level corruption occurs in a .bitmap file, Git can react
poorly. This could take the form of a run-time error due to failing to
parse an EWAH bitmap or be more subtle such as returning the wrong set
of objects to a fetch or clone.

A natural first response to either of these kinds of errors is to run
'git fsck' to see if any files are corrupt. This currently ignores all
.bitmap files.

Add checks to 'git fsck' for all .bitmap files that are currently
associated with a multi-pack-index or pack file. Verify their checksums
using the hashfile API.

We iterate through all multi-pack-indexes and pack-files to be sure to
check all .bitmap files, not just the one that would be read by the
process. For example, a multi-pack-index bitmap overrules a pack-bitmap.
However, if the multi-pack-index is removed, the pack-bitmap may be
selected instead. Be thorough to include every file that could become
active in such a way. This includes checking files in alternates.

There is potential that we could extend this effort to check the
structure of the reachability bitmaps themselves, but it is very
expensive to do so. At minimum, it's as expensive as generating the
bitmaps in the first place, and that's assuming that we don't use the
trivial algorithm of verifying each bitmap individually. The trivial
algorithm will result in quadratic behavior (number of objects times
number of bitmapped commits) while the bitmap building operation
constructs a lattice of commits to build bitmaps incrementally and then
generate the final bitmaps from a subset of those commits.

If we were to extend 'git fsck' to check .bitmap file contents more
closely like this, then we would likely want to hide it behind an option
that signals the user is more willing to do expensive operations such as
this.

For testing, set up a repository with a pack-bitmap _and_ a
multi-pack-index bitmap. This requires some file movement to avoid
deleting the pack-bitmap during the repack that creates the
multi-pack-index bitmap. We can then verify that 'git fsck' is checking
all files, not just the "active" bitmap.

Signed-off-by: Derrick Stolee &lt;derrickstolee@github.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'ds/fsck-pack-revindex'</title>
<updated>2023-04-27T23:00:59Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2023-04-27T23:00:59Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=a02675ad907a31205b2de86a945eec327de38132'/>
<id>urn:sha1:a02675ad907a31205b2de86a945eec327de38132</id>
<content type='text'>
"git fsck" learned to validate the on-disk pack reverse index files.

* ds/fsck-pack-revindex:
  fsck: validate .rev file header
  fsck: check rev-index position values
  fsck: check rev-index checksums
  fsck: create scaffolding for rev-index checks
</content>
</entry>
<entry>
<title>Merge branch 'tb/pack-revindex-on-disk'</title>
<updated>2023-04-27T23:00:59Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2023-04-27T23:00:59Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=849c8b3dbf8e850020951e81a42df4dc0b1e5b5d'/>
<id>urn:sha1:849c8b3dbf8e850020951e81a42df4dc0b1e5b5d</id>
<content type='text'>
The on-disk reverse index that allows mapping from the pack offset
to the object name for the object stored at the offset has been
enabled by default.

* tb/pack-revindex-on-disk:
  t: invert `GIT_TEST_WRITE_REV_INDEX`
  config: enable `pack.writeReverseIndex` by default
  pack-revindex: introduce `pack.readReverseIndex`
  pack-revindex: introduce GIT_TEST_REV_INDEX_DIE_ON_DISK
  pack-revindex: make `load_pack_revindex` take a repository
  t5325: mark as leak-free
  pack-write.c: plug a leak in stage_tmp_packfiles()
</content>
</entry>
<entry>
<title>Merge branch 'en/header-split-cache-h'</title>
<updated>2023-04-25T20:56:20Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2023-04-25T20:56:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=0807e57807aaffe2813fffb7704dcc9153f03832'/>
<id>urn:sha1:0807e57807aaffe2813fffb7704dcc9153f03832</id>
<content type='text'>
Header clean-up.

* en/header-split-cache-h: (24 commits)
  protocol.h: move definition of DEFAULT_GIT_PORT from cache.h
  mailmap, quote: move declarations of global vars to correct unit
  treewide: reduce includes of cache.h in other headers
  treewide: remove double forward declaration of read_in_full
  cache.h: remove unnecessary includes
  treewide: remove cache.h inclusion due to pager.h changes
  pager.h: move declarations for pager.c functions from cache.h
  treewide: remove cache.h inclusion due to editor.h changes
  editor: move editor-related functions and declarations into common file
  treewide: remove cache.h inclusion due to object.h changes
  object.h: move some inline functions and defines from cache.h
  treewide: remove cache.h inclusion due to object-file.h changes
  object-file.h: move declarations for object-file.c functions from cache.h
  treewide: remove cache.h inclusion due to git-zlib changes
  git-zlib: move declarations for git-zlib functions from cache.h
  treewide: remove cache.h inclusion due to object-name.h changes
  object-name.h: move declarations for object-name.c functions from cache.h
  treewide: remove unnecessary cache.h inclusion
  treewide: be explicit about dependence on mem-pool.h
  treewide: be explicit about dependence on oid-array.h
  ...
</content>
</entry>
<entry>
<title>fsck: validate .rev file header</title>
<updated>2023-04-17T21:39:05Z</updated>
<author>
<name>Derrick Stolee</name>
<email>derrickstolee@github.com</email>
</author>
<published>2023-04-17T16:21:41Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=5a6072f631dcf4d9f65e83b08d14c82e2af45dd8'/>
<id>urn:sha1:5a6072f631dcf4d9f65e83b08d14c82e2af45dd8</id>
<content type='text'>
While parsing a .rev file, we check the header information to be sure it
makes sense. This happens before doing any additional validation such as
a checksum or value check. In order to differentiate between a bad
header and a non-existent file, we need to update the API for loading a
reverse index.

Make load_pack_revindex_from_disk() non-static and specify that a
positive value means "the file does not exist" while other errors during
parsing are negative values. Since an invalid header prevents setting up
the structures we would use for further validations, we can stop at that
point.

The place where we can distinguish between a missing file and a corrupt
file is inside load_revindex_from_disk(), which is used both by pack
rev-indexes and multi-pack-index rev-indexes. Some tests in t5326
demonstrate that it is critical to take some conditions to allow
positive error signals.

Add tests that check the three header values.

Signed-off-by: Derrick Stolee &lt;derrickstolee@github.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>pack-revindex: make `load_pack_revindex` take a repository</title>
<updated>2023-04-13T14:55:45Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2023-04-12T22:20:24Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=65308ad8f757a823ccf3175609d580da5f767a15'/>
<id>urn:sha1:65308ad8f757a823ccf3175609d580da5f767a15</id>
<content type='text'>
In a future commit, we will introduce a `pack.readReverseIndex`
configuration, which forces Git to generate the reverse index from
scratch instead of loading it from disk.

In order to avoid reading this configuration value more than once, we'll
use the `repo_settings` struct to lazily load this value.

In order to access the `struct repo_settings`, add a repository argument
to `load_pack_revindex`, and update all callers to pass the correct
instance (in all cases, `the_repository`).

In certain instances, a new function-local variable is introduced to
take the place of a `struct repository *` argument to the function
itself to avoid propagating the new parameter even further throughout
the tree.

Co-authored-by: Derrick Stolee &lt;derrickstolee@github.com&gt;
Signed-off-by: Derrick Stolee &lt;derrickstolee@github.com&gt;
Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Acked-by: Derrick Stolee &lt;derrickstolee@github.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>treewide: remove cache.h inclusion due to object-file.h changes</title>
<updated>2023-04-11T15:52:10Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2023-04-11T07:41:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=b6fdc44c8441d04c6659252cdf9adae240339e17'/>
<id>urn:sha1:b6fdc44c8441d04c6659252cdf9adae240339e17</id>
<content type='text'>
Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Acked-by: Calvin Wan &lt;calvinwan@google.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>object-file.h: move declarations for object-file.c functions from cache.h</title>
<updated>2023-04-11T15:52:10Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2023-04-11T07:41:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=87bed17907b2cb9a9581a5b8b16b8da264c2a2a8'/>
<id>urn:sha1:87bed17907b2cb9a9581a5b8b16b8da264c2a2a8</id>
<content type='text'>
Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Acked-by: Calvin Wan &lt;calvinwan@google.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>treewide: be explicit about dependence on trace.h &amp; trace2.h</title>
<updated>2023-04-11T15:52:08Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2023-04-11T03:00:38Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=74ea5c9574d29a510602492fcd672e5d09c841b0'/>
<id>urn:sha1:74ea5c9574d29a510602492fcd672e5d09c841b0</id>
<content type='text'>
Dozens of files made use of trace and trace2 functions, without
explicitly including trace.h or trace2.h.  This made it more difficult
to find which files could remove a dependence on cache.h.  Make C files
explicitly include trace.h or trace2.h if they are using them.

Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Acked-by: Calvin Wan &lt;calvinwan@google.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'en/header-split-cleanup'</title>
<updated>2023-04-06T20:38:31Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2023-04-06T20:38:31Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=6047b28eb7e1a0b0061c5310034f7b5683ea401a'/>
<id>urn:sha1:6047b28eb7e1a0b0061c5310034f7b5683ea401a</id>
<content type='text'>
Split key function and data structure definitions out of cache.h to
new header files and adjust the users.

* en/header-split-cleanup:
  csum-file.h: remove unnecessary inclusion of cache.h
  write-or-die.h: move declarations for write-or-die.c functions from cache.h
  treewide: remove cache.h inclusion due to setup.h changes
  setup.h: move declarations for setup.c functions from cache.h
  treewide: remove cache.h inclusion due to environment.h changes
  environment.h: move declarations for environment.c functions from cache.h
  treewide: remove unnecessary includes of cache.h
  wrapper.h: move declarations for wrapper.c functions from cache.h
  path.h: move function declarations for path.c functions from cache.h
  cache.h: remove expand_user_path()
  abspath.h: move absolute path functions from cache.h
  environment: move comment_line_char from cache.h
  treewide: remove unnecessary cache.h inclusion from several sources
  treewide: remove unnecessary inclusion of gettext.h
  treewide: be explicit about dependence on gettext.h
  treewide: remove unnecessary cache.h inclusion from a few headers
</content>
</entry>
</feed>
