<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/object.c, branch v2.48.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.48.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.48.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2024-12-06T11:20:02Z</updated>
<entry>
<title>global: mark code units that generate warnings with `-Wsign-compare`</title>
<updated>2024-12-06T11:20:02Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-12-06T10:27:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=41f43b8243f42b9df2e98be8460646d4c0100ad3'/>
<id>urn:sha1:41f43b8243f42b9df2e98be8460646d4c0100ad3</id>
<content type='text'>
Mark code units that generate warnings with `-Wsign-compare`. This
allows for a structured approach to get rid of all such warnings over
time in a way that can be easily measured.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>object: clear grafts when clearing parsed object pool</title>
<updated>2024-09-05T15:49:11Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-09-05T10:09:12Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=0d1d22f5a385d05bde40303c17483db2eec499b3'/>
<id>urn:sha1:0d1d22f5a385d05bde40303c17483db2eec499b3</id>
<content type='text'>
We do not clear grafts part of the parsed object pool when clearing the
pool itself, which can lead to memory leaks when a repository is being
cleared.

Fix this by moving `reset_commit_grafts()` into "object.c" and making it
part of the `struct parsed_object_pool` interface such that we can call
it from `parsed_object_pool_clear()`. Adapt `parsed_object_pool_new()`
to take and store a reference to its owning repository, which is needed
by `unparse_commit()`.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>object: fix leaking packfiles when closing object store</title>
<updated>2024-08-08T16:22:21Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-08-08T07:36:00Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=6f1e9394e2e02d16dfbef02c1585a1acfd2a5118'/>
<id>urn:sha1:6f1e9394e2e02d16dfbef02c1585a1acfd2a5118</id>
<content type='text'>
When calling `raw_object_store_clear()`, we close and free several
resources associated with the object store. Part of that is to close and
free all the packfiles, which is handled by `close_object_store()`.

That function really only ends up closing the packfiles though, but it
doesn't free them. And in fact it can't, as that function is being
called via `run_command()` when `close_object_store = 1`, which is done
e.g. when we execute git-maintenance(1). At that point, other structures
may still have references on those packfiles, and thus we cannot free
them here. So while it is in fact intentional that we really only close
them, the result is a memory leak because `raw_object_store_clear()`
does not free them, either.

Fix the leak by freeing the packfiles in `raw_object_store_clear()`.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>global: introduce `USE_THE_REPOSITORY_VARIABLE` macro</title>
<updated>2024-06-14T17:26:33Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-06-14T06:50:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=e7da9385708accf518a80a1e17969020fb361048'/>
<id>urn:sha1:e7da9385708accf518a80a1e17969020fb361048</id>
<content type='text'>
Use of the `the_repository` variable is deprecated nowadays, and we
slowly but steadily convert the codebase to not use it anymore. Instead,
callers should be passing down the repository to work on via parameters.

It is hard though to prove that a given code unit does not use this
variable anymore. The most trivial case, merely demonstrating that there
is no direct use of `the_repository`, is already a bit of a pain during
code reviews as the reviewer needs to manually verify claims made by the
patch author. The bigger problem though is that we have many interfaces
that implicitly rely on `the_repository`.

Introduce a new `USE_THE_REPOSITORY_VARIABLE` macro that allows code
units to opt into usage of `the_repository`. The intent of this macro is
to demonstrate that a certain code unit does not use this variable
anymore, and to keep it from new dependencies on it in future changes,
be it explicit or implicit

For now, the macro only guards `the_repository` itself as well as
`the_hash_algo`. There are many more known interfaces where we have an
implicit dependency on `the_repository`, but those are not guarded at
the current point in time. Over time though, we should start to add
guards as required (or even better, just remove them).

Define the macro as required in our code units. As expected, most of our
code still relies on the global variable. Nearly all of our builtins
rely on the variable as there is no way yet to pass `the_repository` to
their entry point. For now, declare the macro in "biultin.h" to keep the
required changes at least a little bit more contained.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>refs: pass repo when peeling objects</title>
<updated>2024-05-17T17:33:39Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-05-17T08:19:04Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=30aaff437fddd889ba429b50b96ea4c151c502c5'/>
<id>urn:sha1:30aaff437fddd889ba429b50b96ea4c151c502c5</id>
<content type='text'>
Both `peel_object()` and `peel_iterated_oid()` implicitly rely on
`the_repository` to look up objects. Despite the fact that we want to
get rid of `the_repository`, it also leads to some restrictions in our
ref iterators when trying to retrieve the peeled value for a repository
other than `the_repository`.

Refactor these functions such that both take a repository as argument
and remove the now-unnecessary restrictions.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>refs: move object peeling into "object.c"</title>
<updated>2024-05-17T17:33:39Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-05-17T08:18:59Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=19c76e8235747a61a703fe3bf7a5e40caf6fa3fa'/>
<id>urn:sha1:19c76e8235747a61a703fe3bf7a5e40caf6fa3fa</id>
<content type='text'>
Peeling an object has nothing to do with refs, but we still have the
code in "refs.c". Move it over into "object.c", which is a more natural
place to put it.

Ideally, we'd also move `peel_iterated_oid()` over into "object.c". But
this function is tied to the refs interfaces because it uses a global
ref iterator variable to optimize peeling when the iterator already has
the peeled object ID readily available.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'eb/hash-transition'</title>
<updated>2024-03-28T21:13:50Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2024-03-28T21:13:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=1002f28a527d33893f7dab068dbac7011f84af65'/>
<id>urn:sha1:1002f28a527d33893f7dab068dbac7011f84af65</id>
<content type='text'>
Work to support a repository that work with both SHA-1 and SHA-256
hash algorithms has started.

* eb/hash-transition: (30 commits)
  t1016-compatObjectFormat: add tests to verify the conversion between objects
  t1006: test oid compatibility with cat-file
  t1006: rename sha1 to oid
  test-lib: compute the compatibility hash so tests may use it
  builtin/ls-tree: let the oid determine the output algorithm
  object-file: handle compat objects in check_object_signature
  tree-walk: init_tree_desc take an oid to get the hash algorithm
  builtin/cat-file: let the oid determine the output algorithm
  rev-parse: add an --output-object-format parameter
  repository: implement extensions.compatObjectFormat
  object-file: update object_info_extended to reencode objects
  object-file-convert: convert commits that embed signed tags
  object-file-convert: convert commit objects when writing
  object-file-convert: don't leak when converting tag objects
  object-file-convert: convert tag objects when writing
  object-file-convert: add a function to convert trees between algorithms
  object: factor out parse_mode out of fast-import and tree-walk into in object.h
  cache: add a function to read an OID of a specific algorithm
  tag: sign both hashes
  commit: export add_header_signature to support handling signatures on tags
  ...
</content>
</entry>
<entry>
<title>Merge branch 'jk/upload-pack-bounded-resources'</title>
<updated>2024-03-07T23:59:42Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2024-03-07T23:59:42Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=56d608456000121f141a40c4501d41d712381674'/>
<id>urn:sha1:56d608456000121f141a40c4501d41d712381674</id>
<content type='text'>
Various parts of upload-pack has been updated to bound the resource
consumption relative to the size of the repository to protect from
abusive clients.

* jk/upload-pack-bounded-resources:
  upload-pack: free tree buffers after parsing
  upload-pack: use PARSE_OBJECT_SKIP_HASH_CHECK in more places
  upload-pack: always turn off save_commit_buffer
  upload-pack: disallow object-info capability by default
  upload-pack: accept only a single packfile-uri line
  upload-pack: use a strmap for want-ref lines
  upload-pack: use oidset for deepen_not list
  upload-pack: switch deepen-not list to an oid_array
  upload-pack: drop separate v2 "haves" array
</content>
</entry>
<entry>
<title>upload-pack: free tree buffers after parsing</title>
<updated>2024-02-28T22:42:01Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2024-02-28T22:39:07Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=6cd05e768b7e54ca48b16fb0214df4c70aecd46c'/>
<id>urn:sha1:6cd05e768b7e54ca48b16fb0214df4c70aecd46c</id>
<content type='text'>
When a client sends us a "want" or "have" line, we call parse_object()
to get an object struct. If the object is a tree, then the parsed state
means that tree-&gt;buffer points to the uncompressed contents of the tree.
But we don't really care about it. We only really need to parse commits
and tags; for trees and blobs, the important output is just a "struct
object" with the correct type.

But much worse, we do not ever free that tree buffer. It's not leaked in
the traditional sense, in that we still have a pointer to it from the
global object hash. But if the client requests many trees, we'll hold
all of their contents in memory at the same time.

Nobody really noticed because it's rare for clients to directly request
a tree. It might happen for a lightweight tag pointing straight at a
tree, or it might happen for a "tree:depth" partial clone filling in
missing trees.

But it's also possible for a malicious client to request a lot of trees,
causing upload-pack's memory to balloon. For example, without this
patch, requesting every tree in git.git like:

  pktline() {
    local msg="$*"
    printf "%04x%s\n" $((1+4+${#msg})) "$msg"
  }

  want_trees() {
    pktline command=fetch
    printf 0001
    git cat-file --batch-all-objects --batch-check='%(objectname) %(objecttype)' |
      while read oid type; do
        test "$type" = "tree" || continue
        pktline want $oid
      done
      pktline done
      printf 0000
  }

  want_trees | GIT_PROTOCOL=version=2 valgrind --tool=massif ./git upload-pack . &gt;/dev/null

shows a peak heap usage of ~3.7GB. Which is just about the sum of the
sizes of all of the uncompressed trees. For linux.git, it's closer to
17GB.

So the obvious thing to do is to call free_tree_buffer() after we
realize that we've parsed a tree. We know that upload-pack won't need it
later. But let's push the logic into parse_object_with_flags(), telling
it to discard the tree buffer immediately. There are two reasons for
this. One, all of the relevant call-sites already call the with_options
variant to pass the SKIP_HASH flag. So it actually ends up as less code
than manually free-ing in each spot. And two, it enables an extra
optimization that I'll discuss below.

I've touched all of the sites that currently use SKIP_HASH in
upload-pack. That drops the peak heap of the upload-pack invocation
above from 3.7GB to ~24MB.

I've also modified the caller in get_reference(); a partial clone
benefits from its use in pack-objects for the reasons given in
0bc2557951 (upload-pack: skip parse-object re-hashing of "want" objects,
2022-09-06), where we were measuring blob requests. But note that the
results of get_reference() are used for traversing, as well; so we
really would _eventually_ use the tree contents. That makes this at
first glance a space/time tradeoff: we won't hold all of the trees in
memory at once, but we'll have to reload them each when it comes time to
traverse.

And here's where our extra optimization comes in. If the caller is not
going to immediately look at the tree contents, and it doesn't care
about checking the hash, then parse_object() can simply skip loading the
tree entirely, just like we do for blobs! And now it's not a space/time
tradeoff in get_reference() anymore. It's just a lazy-load: we're
delaying reading the tree contents until it's time to actually traverse
them one by one.

And of course for upload-pack, this optimization means we never load the
trees at all, saving lots of CPU time. Timing the "every tree from
git.git" request above shows upload-pack dropping from 32 seconds of CPU
to 19 (the remainder is mostly due to pack-objects actually sending the
pack; timing just the upload-pack portion shows we go from 13s to
~0.28s).

These are all highly gamed numbers, of course. For real-world
partial-clone requests we're saving only a small bit of time in
practice. But it does help harden upload-pack against malicious
denial-of-service attacks.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>use xstrncmpz()</title>
<updated>2024-02-12T17:32:41Z</updated>
<author>
<name>René Scharfe</name>
<email>l.s.r@web.de</email>
</author>
<published>2024-02-10T07:43:01Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=f0e578c69cd91a554179c09dab6989f6eb0e2910'/>
<id>urn:sha1:f0e578c69cd91a554179c09dab6989f6eb0e2910</id>
<content type='text'>
Add and apply a semantic patch for calling xstrncmpz() to compare a
NUL-terminated string with a buffer of a known length instead of using
strncmp() and checking the terminating NUL explicitly.  This simplifies
callers by reducing code duplication.

I had to adjust remote.c manually because Coccinelle inexplicably
changed the indent of the else branches.

Signed-off-by: René Scharfe &lt;l.s.r@web.de&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
