<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/bundle.c, branch jch</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=jch</id>
<link rel='self' href='https://git.shady.money/git/atom?h=jch'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2025-07-21T16:14:25Z</updated>
<entry>
<title>Merge branch 'bc/use-sha256-by-default-in-3.0'</title>
<updated>2025-07-21T16:14:25Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2025-07-21T16:14:25Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=a636d395ff0b0ccecb4569c0fc80f7c55f2e5f2e'/>
<id>urn:sha1:a636d395ff0b0ccecb4569c0fc80f7c55f2e5f2e</id>
<content type='text'>
Prepare to flip the default hash function to SHA-256.

* bc/use-sha256-by-default-in-3.0:
  Enable SHA-256 by default in breaking changes mode
  help: add a build option for default hash
  t5300: choose the built-in hash outside of a repo
  t4042: choose the built-in hash outside of a repo
  t1007: choose the built-in hash outside of a repo
  t: default to compile-time default hash if not set
  setup: use the default algorithm to initialize repo format
  Use legacy hash for legacy formats
  builtin: use default hash when outside a repository
  hash: add a constant for the legacy hash algorithm
  hash: add a constant for the default hash algorithm
</content>
</entry>
<entry>
<title>Use legacy hash for legacy formats</title>
<updated>2025-07-01T21:58:24Z</updated>
<author>
<name>brian m. carlson</name>
<email>sandals@crustytoothpaste.net</email>
</author>
<published>2025-07-01T21:22:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=667d251a04c1dd769fb5a71bbe94d6d15ae594f1'/>
<id>urn:sha1:667d251a04c1dd769fb5a71bbe94d6d15ae594f1</id>
<content type='text'>
We have a large variety of data formats and protocols where no hash
algorithm was defined and the default was assumed to always be SHA-1.
Instead of explicitly stating SHA-1, let's use the constant to represent
the legacy hash algorithm (which is still SHA-1) so that it's clear
for documentary purposes that it's a legacy fallback option and not an
intentional choice to use SHA-1.

Signed-off-by: brian m. carlson &lt;sandals@crustytoothpaste.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>odb: rename `repo_read_object_file()`</title>
<updated>2025-07-01T21:46:38Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2025-07-01T12:22:26Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=d4ff88aee3967e5d1ef1237cd9b8792b7cdb304c'/>
<id>urn:sha1:d4ff88aee3967e5d1ef1237cd9b8792b7cdb304c</id>
<content type='text'>
Rename `repo_read_object_file()` to `odb_read_object()` to match other
functions related to the object database and our modern coding
guidelines.

Introduce a compatibility wrapper so that any in-flight topics will
continue to compile.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>object-store: rename files to "odb.{c,h}"</title>
<updated>2025-07-01T21:46:34Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2025-07-01T12:22:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=8f49151763cb81adf4bcec53c1ae67057081b02d'/>
<id>urn:sha1:8f49151763cb81adf4bcec53c1ae67057081b02d</id>
<content type='text'>
In the preceding commits we have renamed the structures contained in
"object-store.h" to `struct object_database` and `struct odb_backend`.
As such, the code files "object-store.{c,h}" are confusingly named now.
Rename them to "odb.{c,h}" accordingly.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>object-store: rename `object_directory` to `odb_source`</title>
<updated>2025-07-01T21:46:34Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2025-07-01T12:22:14Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=a1e2581a1e9ca2a85ae0a018ba5fb8fe5db3c322'/>
<id>urn:sha1:a1e2581a1e9ca2a85ae0a018ba5fb8fe5db3c322</id>
<content type='text'>
The `object_directory` structure is used as an access point for a single
object directory like ".git/objects". While the structure isn't yet
fully self-contained, the intent is for it to eventually contain all
information required to access objects in one specific location.

While the name "object directory" is a good fit for now, this will
change over time as we continue with the agenda to make pluggable object
databases a thing. Eventually, objects may not be accessed via any kind
of directory at all anymore, but they could instead be backed by any
kind of durable storage mechanism. While it seems quite far-fetched for
now, it is thinkable that eventually this might even be some form of a
database, for example.

As such, the current name of this structure will become worse over time
as we evolve into the direction of pluggable ODBs. Immediate next steps
will start to carve out proper self-contained object directories, which
requires us to pass in these object directories as parameters. Based on
our modern naming schema this means that those functions should then be
named after their subsystem, which means that we would start to bake the
current name into the codebase more and more.

Let's preempt this by renaming the structure. There have been a couple
alternatives that were discussed:

  - `odb_backend` was discarded because it led to the association that
    one object database has a single backend, but the model is that one
    alternate has one backend. Furthermore, "backend" is more about the
    actual backing implementation and less about the high-level concept.

  - `odb_alternate` was discarded because it is a bit of a stretch to
    also call the main object directory an "alternate".

Instead, pick `odb_source` as the new name. It makes it sufficiently
clear that there can be multiple sources and does not cause confusion
when mixed with the already-existing "alternate" terminology.

In the future, this change allows us to easily introduce for example a
`odb_files_source` and other format-specific implementations.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'ps/object-file-cleanup'</title>
<updated>2025-04-25T00:25:33Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2025-04-25T00:25:33Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=36d8035d27d6100a525a0e25619868b9542a4f35'/>
<id>urn:sha1:36d8035d27d6100a525a0e25619868b9542a4f35</id>
<content type='text'>
Code clean-up.

* ps/object-file-cleanup:
  object-store: merge "object-store-ll.h" and "object-store.h"
  object-store: remove global array of cached objects
  object: split out functions relating to object store subsystem
  object-file: drop `index_blob_stream()`
  object-file: split up concerns of `HASH_*` flags
  object-file: split out functions relating to object store subsystem
  object-file: move `xmmap()` into "wrapper.c"
  object-file: move `git_open_cloexec()` to "compat/open.c"
  object-file: move `safe_create_leading_directories()` into "path.c"
  object-file: move `mkdir_in_gitdir()` into "path.c"
</content>
</entry>
<entry>
<title>object-store: merge "object-store-ll.h" and "object-store.h"</title>
<updated>2025-04-15T15:24:37Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2025-04-15T09:38:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=68cd492a3e662c75dec364986c81e94716d4ac56'/>
<id>urn:sha1:68cd492a3e662c75dec364986c81e94716d4ac56</id>
<content type='text'>
The "object-store-ll.h" header has been introduced to keep transitive
header dependendcies and compile times at bay. Now that we have created
a new "object-store.c" file though we can easily move the last remaining
additional bit of "object-store.h", the `odb_path_map`, out of the
header.

Do so. As the "object-store.h" header is now equivalent to its low-level
alternative we drop the latter and inline it into the former.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>bundle: fix non-linear performance scaling with refs</title>
<updated>2025-04-08T21:21:49Z</updated>
<author>
<name>Karthik Nayak</name>
<email>karthik.188@gmail.com</email>
</author>
<published>2025-04-08T09:00:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=a52d459e72b890c192485002ec518bb9e01c19a6'/>
<id>urn:sha1:a52d459e72b890c192485002ec518bb9e01c19a6</id>
<content type='text'>
The 'git bundle create' command has non-linear performance with the
number of refs in the repository. Benchmarking the command shows that
a large portion of the time (~75%) is spent in the
`object_array_remove_duplicates()` function.

The `object_array_remove_duplicates()` function was added in
b2a6d1c686 (bundle: allow the same ref to be given more than once,
2009-01-17) to skip duplicate refs provided by the user from being
written to the bundle. Since this is an O(N^2) algorithm, in repos with
large number of references, this can take up a large amount of time.

Let's instead use a 'strset' to skip duplicates inside
`write_bundle_refs()`. This improves the performance by around 6 times
when tested against in repository with 100000 refs:

Benchmark 1: bundle (refcount = 100000, revision = master)
  Time (mean ± σ):     14.653 s ±  0.203 s    [User: 13.940 s, System: 0.762 s]
  Range (min … max):   14.237 s … 14.920 s    10 runs

Benchmark 2: bundle (refcount = 100000, revision = HEAD)
  Time (mean ± σ):      2.394 s ±  0.023 s    [User: 1.684 s, System: 0.798 s]
  Range (min … max):    2.364 s …  2.425 s    10 runs

Summary
  bundle (refcount = 100000, revision = HEAD) ran
    6.12 ± 0.10 times faster than bundle (refcount = 100000, revision = master)

Previously, `object_array_remove_duplicates()` ensured that both the
refname and the object it pointed to were checked for duplicates. The
new approach, implemented within `write_bundle_refs()`, eliminates
duplicate refnames without comparing the objects they reference. This
works because, for bundle creation, we only need to prevent duplicate
refs from being written to the bundle header. The `revs-&gt;pending` array
can contain duplicates of multiple types.

First, references which resolve to the same refname. For e.g. "git
bundle create out.bdl master master" or "git bundle create out.bdl
refs/heads/master refs/heads/master" or "git bundle create out.bdl
master refs/heads/master". In these scenarios we want to prevent writing
"refs/heads/master" twice to the bundle header. Since both the refnames
here would point to the same object (unless there is a race), we do not
need to check equality of the object.

Second, refnames which are duplicates but do not point to the same
object. This can happen when we use an exclusion criteria. For e.g. "git
bundle create out.bdl master master^!", Here `revs-&gt;pending` would
contain two elements, both with refname set to "master". However, each
of them would be pointing to an INTERESTING and UNINTERESTING object
respectively. Since we only write refnames with INTERESTING objects to
the bundle header, we perform our duplicate checks only on such objects.

Signed-off-by: Karthik Nayak &lt;karthik.188@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>bundle: avoid closing file descriptor twice</title>
<updated>2025-01-26T02:38:11Z</updated>
<author>
<name>Johannes Schindelin</name>
<email>johannes.schindelin@gmx.de</email>
</author>
<published>2025-01-25T23:57:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=9a84794ad8ad1bc8ec6b2c4e1592a1f63765e753'/>
<id>urn:sha1:9a84794ad8ad1bc8ec6b2c4e1592a1f63765e753</id>
<content type='text'>
Already when introduced in c7a8a16239 (Add bundle transport,
2007-09-10), the `bundle` transport had a bug where it would open a file
descriptor to the bundle file and then close it _twice_: First, the file
descriptor (`data-&gt;fd`) is passed to `unbundle()`, which would use it as
the `stdin` of the `index-pack` process, which as a consequence would
close it via `start_command()`. However, `data-&gt;fd` would still hold the
numerical value of the file descriptor, and `close_bundle()` would see
that and happily close it again.

This seems not to have caused too many problems in almost two decades,
but I encountered a situation today where it _does_ cause problems: In
i686 variants of Git for Windows, it seems that file descriptors are
reused quickly after they have been closed.

In the particular scenario I faced, `git fetch &lt;bundle&gt; &lt;ref&gt;` gets the
same file descriptor value when opening the bundle file and importing
its embedded packfile (which implicitly closes the file descriptor) and
then when opening a pack file in `fetch_and_consume_refs()` while
looking up an object's header.

Later on, after the bundle has been imported (and the `close_bundle()`
function erroneously closes the file descriptor that has _already_ been
closed when using it as `stdin` for `git index-pack`), the same file
descriptor value has now been reused via `use_pack()`. Now, when either
the recursive fetch (which defaults to "on", unfortunately) or a
commit-graph update needs to `mmap()` the packfile, it fails due to a
now-invalid file descriptor that _should_ point to the pack file but
doesn't anymore.

To fix that, let's invalidate `data-&gt;fd` after calling `unbundle()`.
That way, `close_bundle()` does not close a file descriptor that may
have been reused for something different. While at it, document that
`unbundle()` closes the file descriptor, and ensure that it also does
that when failing to verify the bundle.

Luckily, this bug does not affect the bundle URI feature, it only
affects the `git fetch &lt;bundle&gt;` code path.

Note that this patch does not _completely_ clarifies who is responsible
to close that file descriptor, as `run_command()` may fail _without_
closing `cmd-&gt;in`. Addressing this issue thoroughly, however, would
require a rather thorough re-design of the `start_command()` and
`finish_command()` functionality to make it a lot less murky who is
responsible for what file descriptors.

At least this here patch is relatively easy to reason about, and
addresses a hard failure (`fatal: mmap: could not determine filesize`)
at the expense of leaking a file descriptor under very rare
circumstances in which `git fetch` would error out anyway.

Signed-off-by: Johannes Schindelin &lt;johannes.schindelin@gmx.de&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'ps/build-sign-compare'</title>
<updated>2024-12-23T17:32:11Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2024-12-23T17:32:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=4156b6a741c7fb15a4eccb320612fb6e453f439c'/>
<id>urn:sha1:4156b6a741c7fb15a4eccb320612fb6e453f439c</id>
<content type='text'>
Start working to make the codebase buildable with -Wsign-compare.

* ps/build-sign-compare:
  t/helper: don't depend on implicit wraparound
  scalar: address -Wsign-compare warnings
  builtin/patch-id: fix type of `get_one_patchid()`
  builtin/blame: fix type of `length` variable when emitting object ID
  gpg-interface: address -Wsign-comparison warnings
  daemon: fix type of `max_connections`
  daemon: fix loops that have mismatching integer types
  global: trivial conversions to fix `-Wsign-compare` warnings
  pkt-line: fix -Wsign-compare warning on 32 bit platform
  csum-file: fix -Wsign-compare warning on 32-bit platform
  diff.h: fix index used to loop through unsigned integer
  config.mak.dev: drop `-Wno-sign-compare`
  global: mark code units that generate warnings with `-Wsign-compare`
  compat/win32: fix -Wsign-compare warning in "wWinMain()"
  compat/regex: explicitly ignore "-Wsign-compare" warnings
  git-compat-util: introduce macros to disable "-Wsign-compare" warnings
</content>
</entry>
</feed>
