<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/http.c, branch v2.48.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.48.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.48.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2024-12-23T17:32:11Z</updated>
<entry>
<title>Merge branch 'ps/build-sign-compare'</title>
<updated>2024-12-23T17:32:11Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2024-12-23T17:32:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=4156b6a741c7fb15a4eccb320612fb6e453f439c'/>
<id>urn:sha1:4156b6a741c7fb15a4eccb320612fb6e453f439c</id>
<content type='text'>
Start working to make the codebase buildable with -Wsign-compare.

* ps/build-sign-compare:
  t/helper: don't depend on implicit wraparound
  scalar: address -Wsign-compare warnings
  builtin/patch-id: fix type of `get_one_patchid()`
  builtin/blame: fix type of `length` variable when emitting object ID
  gpg-interface: address -Wsign-comparison warnings
  daemon: fix type of `max_connections`
  daemon: fix loops that have mismatching integer types
  global: trivial conversions to fix `-Wsign-compare` warnings
  pkt-line: fix -Wsign-compare warning on 32 bit platform
  csum-file: fix -Wsign-compare warning on 32-bit platform
  diff.h: fix index used to loop through unsigned integer
  config.mak.dev: drop `-Wno-sign-compare`
  global: mark code units that generate warnings with `-Wsign-compare`
  compat/win32: fix -Wsign-compare warning in "wWinMain()"
  compat/regex: explicitly ignore "-Wsign-compare" warnings
  git-compat-util: introduce macros to disable "-Wsign-compare" warnings
</content>
</entry>
<entry>
<title>Merge branch 'kn/midx-wo-the-repository'</title>
<updated>2024-12-13T15:33:44Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2024-12-13T15:33:44Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=ca43bd2562fb85b34c846c1bbc2175e1ef145bf1'/>
<id>urn:sha1:ca43bd2562fb85b34c846c1bbc2175e1ef145bf1</id>
<content type='text'>
Yet another "pass the repository through the callchain" topic.

* kn/midx-wo-the-repository:
  midx: inline the `MIDX_MIN_SIZE` definition
  midx: pass down `hash_algo` to functions using global variables
  midx: pass `repository` to `load_multi_pack_index`
  midx: cleanup internal usage of `the_repository` and `the_hash_algo`
  midx-write: pass down repository to `write_midx_file[_only]`
  write-midx: add repository field to `write_midx_context`
  midx-write: use `revs-&gt;repo` inside `read_refs_snapshot`
  midx-write: pass down repository to static functions
  packfile.c: remove unnecessary prepare_packed_git() call
  midx: add repository to `multi_pack_index` struct
  config: make `packed_git_(limit|window_size)` non-global variables
  config: make `delta_base_cache_limit` a non-global variable
  packfile: pass down repository to `for_each_packed_object`
  packfile: pass down repository to `has_object[_kept]_pack`
  packfile: pass down repository to `odb_pack_name`
  packfile: pass `repository` to static function in the file
  packfile: use `repository` from `packed_git` directly
  packfile: add repository to struct `packed_git`
</content>
</entry>
<entry>
<title>global: mark code units that generate warnings with `-Wsign-compare`</title>
<updated>2024-12-06T11:20:02Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-12-06T10:27:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=41f43b8243f42b9df2e98be8460646d4c0100ad3'/>
<id>urn:sha1:41f43b8243f42b9df2e98be8460646d4c0100ad3</id>
<content type='text'>
Mark code units that generate warnings with `-Wsign-compare`. This
allows for a structured approach to get rid of all such warnings over
time in a way that can be easily measured.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'bc/drop-ancient-libcurl-and-perl'</title>
<updated>2024-12-04T01:14:48Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2024-12-04T01:14:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=4c1b7e364e59107d7d460602abf0ac616f146631'/>
<id>urn:sha1:4c1b7e364e59107d7d460602abf0ac616f146631</id>
<content type='text'>
Drop support for older libcURL and Perl.

* bc/drop-ancient-libcurl-and-perl:
  gitweb: make use of s///r
  Require Perl 5.26.0
  INSTALL: document requirement for libcurl 7.61.0
  git-curl-compat: remove check for curl 7.56.0
  git-curl-compat: remove check for curl 7.53.0
  git-curl-compat: remove check for curl 7.52.0
  git-curl-compat: remove check for curl 7.44.0
  git-curl-compat: remove check for curl 7.43.0
  git-curl-compat: remove check for curl 7.39.0
  git-curl-compat: remove check for curl 7.34.0
  git-curl-compat: remove check for curl 7.25.0
  git-curl-compat: remove check for curl 7.21.5
</content>
</entry>
<entry>
<title>packfile: pass down repository to `odb_pack_name`</title>
<updated>2024-12-03T23:21:54Z</updated>
<author>
<name>Karthik Nayak</name>
<email>karthik.188@gmail.com</email>
</author>
<published>2024-12-03T14:43:58Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=873b00597bbf20c1bcda089a687641167b148fa2'/>
<id>urn:sha1:873b00597bbf20c1bcda089a687641167b148fa2</id>
<content type='text'>
The function `odb_pack_name` currently relies on the global variable
`the_repository`. To eliminate global variable usage in `packfile.c`, we
should progressively shift the dependency on the_repository to higher
layers.

Signed-off-by: Karthik Nayak &lt;karthik.188@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>packfile: add repository to struct `packed_git`</title>
<updated>2024-12-03T23:21:53Z</updated>
<author>
<name>Karthik Nayak</name>
<email>karthik.188@gmail.com</email>
</author>
<published>2024-12-03T14:43:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=2cf3fe63f6eedd6d132c530b897595345a05088b'/>
<id>urn:sha1:2cf3fe63f6eedd6d132c530b897595345a05088b</id>
<content type='text'>
The struct `packed_git` holds information regarding a packed object
file. Let's add the repository variable to this object, to represent the
repository that this packfile belongs to. This helps remove dependency
on the global `the_repository` object in `packfile.c` by simply using
repository information now readily available in the struct.

We do need to consider that a packfile could be part of the alternates
of a repository, but considering that we only have one repository struct
and also that we currently anyways use 'the_repository', we should be
OK with this change.

We also modify `alloc_packed_git` to ensure that the repository is added
to newly created `packed_git` structs. This requires modifying the
function and all its callee to pass the repository object down the
levels.

Helped-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Karthik Nayak &lt;karthik.188@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'jk/dumb-http-finalize'</title>
<updated>2024-11-01T16:53:32Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2024-11-01T16:53:31Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=1c5a712f26b67e07672a4bcda017046f45545369'/>
<id>urn:sha1:1c5a712f26b67e07672a4bcda017046f45545369</id>
<content type='text'>
The dumb-http code regressed when the result of re-indexing a pack
yielded an *.idx file that differs in content from the *.idx file it
downloaded from the remote. This has been corrected by no longer
relying on the *.idx file we got from the remote.

* jk/dumb-http-finalize:
  packfile: use oidread() instead of hashcpy() to fill object_id
  packfile: use object_id in find_pack_entry_one()
  packfile: convert find_sha1_pack() to use object_id
  http-walker: use object_id instead of bare hash
  packfile: warn people away from parse_packed_git()
  packfile: drop sha1_pack_index_name()
  packfile: drop sha1_pack_name()
  packfile: drop has_pack_index()
  dumb-http: store downloaded pack idx as tempfile
  t5550: count fetches in "previously-fetched .idx" test
  midx: avoid duplicate packed_git entries
</content>
</entry>
<entry>
<title>packfile: drop sha1_pack_name()</title>
<updated>2024-10-25T21:35:46Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2024-10-25T07:00:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=c2dc4c9fbb1f119be6ab55ff8676bf18b4b9446a'/>
<id>urn:sha1:c2dc4c9fbb1f119be6ab55ff8676bf18b4b9446a</id>
<content type='text'>
The sha1_pack_name() function has a few ugly bits:

  - it writes into a static strbuf (and not even a ring buffer of them),
    which can lead to subtle invalidation problems

  - it uses the term "sha1", but it's really using the_hash_algo, which
    could be sha256

There's only one caller of it left. And in fact that caller is better
off using the underlying odb_pack_name() function itself, since it's
just copying the result into its own strbuf anyway.

Converting that caller lets us get rid of this now-obselete function.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
</content>
</entry>
<entry>
<title>packfile: drop has_pack_index()</title>
<updated>2024-10-25T21:35:46Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2024-10-25T07:00:09Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=03b8eed7f50a56cd7581e18775c6f3a7caf639b4'/>
<id>urn:sha1:03b8eed7f50a56cd7581e18775c6f3a7caf639b4</id>
<content type='text'>
The has_pack_index() function has several oddities that may make it
surprising if you are trying to find out if we have a pack with some
$hash:

  - it is not looking for a valid pack that we found while searching
    object directories. It just looks for any pack-$hash.idx file in the
    pack directory.

  - it only looks in the local directory, not any alternates

  - it takes a bare "unsigned char" hash, which we try to avoid these
    days

The only caller it has is in the dumb http code; it wants to know if we
already have the pack idx in question. This can happen if we downloaded
the pack (and generated its index) during a previous fetch.

Before the previous patch ("dumb-http: store downloaded pack idx as
tempfile"), it could also happen if we downloaded the .idx from the
remote but didn't get the matching .pack. But since that patch, we don't
hold on to those .idx files. So there's no need to look for the .idx
file in the filesystem; we can just scan through the packed_git list to
see if we have it.

That lets us simplify the dumb http code a bit, as we know that if we
have the .idx we have the matching .pack already. And it lets us get rid
of this odd function that is unlikely to be needed again.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
</content>
</entry>
<entry>
<title>dumb-http: store downloaded pack idx as tempfile</title>
<updated>2024-10-25T21:35:46Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2024-10-25T06:58:06Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=63aca3f7f14e01cabccf2165b941205a8a9a01dc'/>
<id>urn:sha1:63aca3f7f14e01cabccf2165b941205a8a9a01dc</id>
<content type='text'>
This patch fixes a regression in b1b8dfde69 (finalize_object_file():
implement collision check, 2024-09-26) where fetching a v1 pack idx file
over the dumb-http protocol would cause the fetch to fail.

The core of the issue is that dumb-http stores the idx we fetch from the
remote at the same path that will eventually hold the idx we generate
from "index-pack --stdin". The sequence is something like this:

  0. We realize we need some object X, which we don't have locally, and
     nor does the other side have it as a loose object.

  1. We download the list of remote packs from objects/info/packs.

  2. For each entry in that file, we download each pack index and store
     it locally in .git/objects/pack/pack-$hash.idx (the $hash is not
     something we can verify yet and is given to us by the remote).

  3. We check each pack index we got to see if it has object X. When we
     find a match, we download the matching .pack file from the remote
     to a tempfile. We feed that to "index-pack --stdin", which
     reindexes the pack, rather than trusting that it has what the other
     side claims it does. In most cases, this will end up generating the
     exact same (byte-for-byte) pack index which we'll store at the same
     pack-$hash.idx path, because the index generation and $hash id are
     computed based on what's in the packfile. But:

       a. The other side might have used other options to generate the
          index. For instance we use index v2 by default, but long ago
          it was v1 (and you can still ask for v1 explicitly).

       b. The other side might even use a different mechanism to
          determine $hash. E.g., long ago it was based on the sorted
          list of objects in the packfile, but we switched to using the
          pack checksum in 1190a1acf8 (pack-objects: name pack files
          after trailer hash, 2013-12-05).

The regression we saw in the real world was (3a). A recent client
fetching from a server with a v1 index downloaded that index, then
complained about trying to overwrite it with its own v2 index. This
collision is otherwise harmless; we know we want to replace the remote
version with our local one, but the collision check doesn't realize
that.

There are a few options to fix it:

  - we could teach index-pack a command-line option to ignore only pack
    idx collisions, and use it when the dumb-http code invokes
    index-pack. This would be an awkward thing to expose users to and
    would involve a lot of boilerplate to get the option down to the
    collision code.

  - we could delete the remote .idx file right before running
    index-pack. It should be redundant at that point (since we've just
    downloaded the matching pack). But it feels risky to delete
    something from our own .git/objects based on what the other side has
    said. I'm not entirely positive that a malicious server couldn't lie
    about which pack-$hash.idx it has and get us to delete something
    precious.

  - we can stop co-mingling the downloaded idx files in our local
    objects directory. This is a slightly bigger change but I think
    fixes the root of the problem more directly.

This patch implements the third option. The big design questions are:
where do we store the downloaded files, and how do we manage their
lifetimes?

There are some additional quirks to the dumb-http system we should
consider. Remember that in step 2 we downloaded every pack index, but in
step 3 we may only download some of the matching packs. What happens to
those other idx files now? They sit in the .git/objects/pack directory,
possibly waiting to be used at a later date. That may save bandwidth for
a subsequent fetch, but it also creates a lot of weird corner cases:

  - our local object directory now has semi-untrusted .idx files sitting
    around, without their matching .pack

  - in case 3b, we noted that we might not generate the same hash as the
    other side. In that case even if we download the matching pack,
    our index-pack invocation will store it in a different
    pack-$hash.idx file. And the unmatched .idx will sit there forever.

  - if the server repacks, it may delete the old packs. Now we have
    these orphaned .idx files sitting around locally that will never be
    used (nor deleted).

  - if we repack locally we may delete our local version of the server's
    pack index and not realize we have it. So we'll download it again,
    even though we have all of the objects it mentions.

I think the right solution here is probably some more complex cache
management system: download the remote .idx files to their own storage
directory, mark them as "seen" when we get their matching pack (to avoid
re-downloading even if we repack), and then delete them when the
server's objects/info/refs no longer mentions them.

But since the dumb http protocol is so ancient and so inferior to the
smart http protocol, I don't think it's worth spending a lot of time
creating such a system. For this patch I'm just downloading the idx
files to .git/objects/tmp_pack_*, and marking them as tempfiles to be
deleted when we exit (and due to the name, any we miss due to a crash,
etc, should eventually be removed by "git gc" runs based on timestamps).

That is slightly worse for one case: if we download an idx but not the
matching pack, we won't retain that idx for subsequent runs. But the
flip side is that we're making other cases better (we never hold on to
useless idx files forever). I suspect that worse case does not even come
up often, since it implies that the packs are generated to match
distinct parts of history (i.e., in practice even in a repo with many
packs you're going to end up grabbing all of those packs to do a clone).
If somebody really cares about that, I think the right path forward is a
managed cache directory as above, and this patch is providing the first
step in that direction anyway (by moving things out of the objects/pack/
directory).

There are two test changes. One demonstrates the broken v1 index case
(it double-checks the resulting clone with fsck to be careful, but prior
to this patch it actually fails at the clone step). The other tweaks the
expectation for a test that covers the "slightly worse" case to
accommodate the extra index download.

The code changes are fairly simple. We stop using finalize_object_file()
to copy the remote's index file into place, and leave it as a tempfile.
We give the tempfile a real ".idx" name, since the packfile code expects
that, and thus we make sure it is out of the usual packs/ directory (so
we'd never mistake it for a real local .idx).

We also have to change parse_pack_index(), which creates a temporary
packed_git to access our index (we need this because all of the pack idx
code assumes we have that struct). It reads the index data from the
tempfile, but prior to this patch would speculatively write the
finalized name into the packed_git struct using the pack-$hash we expect
to use.

I was mildly surprised that this worked at all, since we call
verify_pack_index() on the packed_git which mentions the final name
before moving the file into place! But it works because
parse_pack_index() leaves the mmap-ed data in the struct, so the
lazy-open in verify_pack_index() never triggers, and we read from the
tempfile, ignoring the filename in the struct completely. Hacky, but it
works.

After this patch, parse_pack_index() now uses the index filename we pass
in to derive a matching .pack name. This is OK to change because there
are only two callers, both in the dumb http code (and the other passes
in an existing pack-$hash.idx name, so the derived name is going to be
pack-$hash.pack, which is what we were using anyway).

I'll follow up with some more cleanups in that area, but this patch is
sufficient to fix the regression.

Reported-by: fox &lt;fox.gbr@townlong-yak.com&gt;
Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
</content>
</entry>
</feed>
