diff options
| author | Junio C Hamano <gitster@pobox.com> | 2024-07-08 14:53:09 -0700 |
|---|---|---|
| committer | Junio C Hamano <gitster@pobox.com> | 2024-07-08 14:53:10 -0700 |
| commit | ecf7fc600a5218c9ee3863ee70d5a6e312164f30 (patch) | |
| tree | bcea9bd3948c998759ea848ee55592ac1e1f8a16 /Documentation/gitformat-commit-graph.txt | |
| parent | Merge branch 'db/date-underflow-fix' (diff) | |
| parent | bloom: introduce `deinit_bloom_filters()` (diff) | |
| download | git-ecf7fc600a5218c9ee3863ee70d5a6e312164f30.tar.gz git-ecf7fc600a5218c9ee3863ee70d5a6e312164f30.zip | |
Merge branch 'tb/path-filter-fix'
The Bloom filter used for path limited history traversal was broken
on systems whose "char" is unsigned; update the implementation and
bump the format version to 2.
* tb/path-filter-fix:
bloom: introduce `deinit_bloom_filters()`
commit-graph: reuse existing Bloom filters where possible
object.h: fix mis-aligned flag bits table
commit-graph: new Bloom filter version that fixes murmur3
commit-graph: unconditionally load Bloom filters
bloom: prepare to discard incompatible Bloom filters
bloom: annotate filters with hash version
repo-settings: introduce commitgraph.changedPathsVersion
t4216: test changed path filters with high bit paths
t/helper/test-read-graph: implement `bloom-filters` mode
bloom.h: make `load_bloom_filter_from_graph()` public
t/helper/test-read-graph.c: extract `dump_graph_info()`
gitformat-commit-graph: describe version 2 of BDAT
commit-graph: ensure Bloom filters are read with consistent settings
revision.c: consult Bloom filters for root commits
t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()`
Diffstat (limited to 'Documentation/gitformat-commit-graph.txt')
| -rw-r--r-- | Documentation/gitformat-commit-graph.txt | 9 |
1 files changed, 6 insertions, 3 deletions
diff --git a/Documentation/gitformat-commit-graph.txt b/Documentation/gitformat-commit-graph.txt index 31cad585e2..3e906e8030 100644 --- a/Documentation/gitformat-commit-graph.txt +++ b/Documentation/gitformat-commit-graph.txt @@ -142,13 +142,16 @@ All multi-byte numbers are in network byte order. ==== Bloom Filter Data (ID: {'B', 'D', 'A', 'T'}) [Optional] * It starts with header consisting of three unsigned 32-bit integers: - - Version of the hash algorithm being used. We currently only support - value 1 which corresponds to the 32-bit version of the murmur3 hash + - Version of the hash algorithm being used. We currently support + value 2 which corresponds to the 32-bit version of the murmur3 hash implemented exactly as described in https://en.wikipedia.org/wiki/MurmurHash#Algorithm and the double hashing technique using seed values 0x293ae76f and 0x7e646e2 as described in https://doi.org/10.1007/978-3-540-30494-4_26 "Bloom Filters - in Probabilistic Verification" + in Probabilistic Verification". Version 1 Bloom filters have a bug that appears + when char is signed and the repository has path names that have characters >= + 0x80; Git supports reading and writing them, but this ability will be removed + in a future version of Git. - The number of times a path is hashed and hence the number of bit positions that cumulatively determine whether a file is present in the commit. - The minimum number of bits 'b' per entry in the Bloom filter. If the filter |
