<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/object.c, branch v2.45.3</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.45.3</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.45.3'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2024-03-28T21:13:50Z</updated>
<entry>
<title>Merge branch 'eb/hash-transition'</title>
<updated>2024-03-28T21:13:50Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2024-03-28T21:13:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=1002f28a527d33893f7dab068dbac7011f84af65'/>
<id>urn:sha1:1002f28a527d33893f7dab068dbac7011f84af65</id>
<content type='text'>
Work to support a repository that work with both SHA-1 and SHA-256
hash algorithms has started.

* eb/hash-transition: (30 commits)
  t1016-compatObjectFormat: add tests to verify the conversion between objects
  t1006: test oid compatibility with cat-file
  t1006: rename sha1 to oid
  test-lib: compute the compatibility hash so tests may use it
  builtin/ls-tree: let the oid determine the output algorithm
  object-file: handle compat objects in check_object_signature
  tree-walk: init_tree_desc take an oid to get the hash algorithm
  builtin/cat-file: let the oid determine the output algorithm
  rev-parse: add an --output-object-format parameter
  repository: implement extensions.compatObjectFormat
  object-file: update object_info_extended to reencode objects
  object-file-convert: convert commits that embed signed tags
  object-file-convert: convert commit objects when writing
  object-file-convert: don't leak when converting tag objects
  object-file-convert: convert tag objects when writing
  object-file-convert: add a function to convert trees between algorithms
  object: factor out parse_mode out of fast-import and tree-walk into in object.h
  cache: add a function to read an OID of a specific algorithm
  tag: sign both hashes
  commit: export add_header_signature to support handling signatures on tags
  ...
</content>
</entry>
<entry>
<title>Merge branch 'jk/upload-pack-bounded-resources'</title>
<updated>2024-03-07T23:59:42Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2024-03-07T23:59:42Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=56d608456000121f141a40c4501d41d712381674'/>
<id>urn:sha1:56d608456000121f141a40c4501d41d712381674</id>
<content type='text'>
Various parts of upload-pack has been updated to bound the resource
consumption relative to the size of the repository to protect from
abusive clients.

* jk/upload-pack-bounded-resources:
  upload-pack: free tree buffers after parsing
  upload-pack: use PARSE_OBJECT_SKIP_HASH_CHECK in more places
  upload-pack: always turn off save_commit_buffer
  upload-pack: disallow object-info capability by default
  upload-pack: accept only a single packfile-uri line
  upload-pack: use a strmap for want-ref lines
  upload-pack: use oidset for deepen_not list
  upload-pack: switch deepen-not list to an oid_array
  upload-pack: drop separate v2 "haves" array
</content>
</entry>
<entry>
<title>upload-pack: free tree buffers after parsing</title>
<updated>2024-02-28T22:42:01Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2024-02-28T22:39:07Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=6cd05e768b7e54ca48b16fb0214df4c70aecd46c'/>
<id>urn:sha1:6cd05e768b7e54ca48b16fb0214df4c70aecd46c</id>
<content type='text'>
When a client sends us a "want" or "have" line, we call parse_object()
to get an object struct. If the object is a tree, then the parsed state
means that tree-&gt;buffer points to the uncompressed contents of the tree.
But we don't really care about it. We only really need to parse commits
and tags; for trees and blobs, the important output is just a "struct
object" with the correct type.

But much worse, we do not ever free that tree buffer. It's not leaked in
the traditional sense, in that we still have a pointer to it from the
global object hash. But if the client requests many trees, we'll hold
all of their contents in memory at the same time.

Nobody really noticed because it's rare for clients to directly request
a tree. It might happen for a lightweight tag pointing straight at a
tree, or it might happen for a "tree:depth" partial clone filling in
missing trees.

But it's also possible for a malicious client to request a lot of trees,
causing upload-pack's memory to balloon. For example, without this
patch, requesting every tree in git.git like:

  pktline() {
    local msg="$*"
    printf "%04x%s\n" $((1+4+${#msg})) "$msg"
  }

  want_trees() {
    pktline command=fetch
    printf 0001
    git cat-file --batch-all-objects --batch-check='%(objectname) %(objecttype)' |
      while read oid type; do
        test "$type" = "tree" || continue
        pktline want $oid
      done
      pktline done
      printf 0000
  }

  want_trees | GIT_PROTOCOL=version=2 valgrind --tool=massif ./git upload-pack . &gt;/dev/null

shows a peak heap usage of ~3.7GB. Which is just about the sum of the
sizes of all of the uncompressed trees. For linux.git, it's closer to
17GB.

So the obvious thing to do is to call free_tree_buffer() after we
realize that we've parsed a tree. We know that upload-pack won't need it
later. But let's push the logic into parse_object_with_flags(), telling
it to discard the tree buffer immediately. There are two reasons for
this. One, all of the relevant call-sites already call the with_options
variant to pass the SKIP_HASH flag. So it actually ends up as less code
than manually free-ing in each spot. And two, it enables an extra
optimization that I'll discuss below.

I've touched all of the sites that currently use SKIP_HASH in
upload-pack. That drops the peak heap of the upload-pack invocation
above from 3.7GB to ~24MB.

I've also modified the caller in get_reference(); a partial clone
benefits from its use in pack-objects for the reasons given in
0bc2557951 (upload-pack: skip parse-object re-hashing of "want" objects,
2022-09-06), where we were measuring blob requests. But note that the
results of get_reference() are used for traversing, as well; so we
really would _eventually_ use the tree contents. That makes this at
first glance a space/time tradeoff: we won't hold all of the trees in
memory at once, but we'll have to reload them each when it comes time to
traverse.

And here's where our extra optimization comes in. If the caller is not
going to immediately look at the tree contents, and it doesn't care
about checking the hash, then parse_object() can simply skip loading the
tree entirely, just like we do for blobs! And now it's not a space/time
tradeoff in get_reference() anymore. It's just a lazy-load: we're
delaying reading the tree contents until it's time to actually traverse
them one by one.

And of course for upload-pack, this optimization means we never load the
trees at all, saving lots of CPU time. Timing the "every tree from
git.git" request above shows upload-pack dropping from 32 seconds of CPU
to 19 (the remainder is mostly due to pack-objects actually sending the
pack; timing just the upload-pack portion shows we go from 13s to
~0.28s).

These are all highly gamed numbers, of course. For real-world
partial-clone requests we're saving only a small bit of time in
practice. But it does help harden upload-pack against malicious
denial-of-service attacks.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>use xstrncmpz()</title>
<updated>2024-02-12T17:32:41Z</updated>
<author>
<name>René Scharfe</name>
<email>l.s.r@web.de</email>
</author>
<published>2024-02-10T07:43:01Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=f0e578c69cd91a554179c09dab6989f6eb0e2910'/>
<id>urn:sha1:f0e578c69cd91a554179c09dab6989f6eb0e2910</id>
<content type='text'>
Add and apply a semantic patch for calling xstrncmpz() to compare a
NUL-terminated string with a buffer of a known length instead of using
strncmp() and checking the terminating NUL explicitly.  This simplifies
callers by reducing code duplication.

I had to adjust remote.c manually because Coccinelle inexplicably
changed the indent of the else branches.

Signed-off-by: René Scharfe &lt;l.s.r@web.de&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>loose: add a mapping between SHA-1 and SHA-256 for loose objects</title>
<updated>2023-10-02T21:57:38Z</updated>
<author>
<name>brian m. carlson</name>
<email>sandals@crustytoothpaste.net</email>
</author>
<published>2023-10-02T02:40:09Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=23b2c7e95b6f8f3045665835d2dc5028701eff18'/>
<id>urn:sha1:23b2c7e95b6f8f3045665835d2dc5028701eff18</id>
<content type='text'>
As part of the transition plan, we'd like to add a file in the .git
directory that maps loose objects between SHA-1 and SHA-256.  Let's
implement the specification in the transition plan and store this data
on a per-repository basis in struct repository.

Signed-off-by: brian m. carlson &lt;sandals@crustytoothpaste.net&gt;
Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'en/header-split-cache-h-part-3'</title>
<updated>2023-06-29T23:43:21Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2023-06-29T23:43:20Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=a1264a08a1a6e0cd7e510c899cd0ba42dcf1045d'/>
<id>urn:sha1:a1264a08a1a6e0cd7e510c899cd0ba42dcf1045d</id>
<content type='text'>
Header files cleanup.

* en/header-split-cache-h-part-3: (28 commits)
  fsmonitor-ll.h: split this header out of fsmonitor.h
  hash-ll, hashmap: move oidhash() to hash-ll
  object-store-ll.h: split this header out of object-store.h
  khash: name the structs that khash declares
  merge-ll: rename from ll-merge
  git-compat-util.h: remove unneccessary include of wildmatch.h
  builtin.h: remove unneccessary includes
  list-objects-filter-options.h: remove unneccessary include
  diff.h: remove unnecessary include of oidset.h
  repository: remove unnecessary include of path.h
  log-tree: replace include of revision.h with simple forward declaration
  cache.h: remove this no-longer-used header
  read-cache*.h: move declarations for read-cache.c functions from cache.h
  repository.h: move declaration of the_index from cache.h
  merge.h: move declarations for merge.c from cache.h
  diff.h: move declaration for global in diff.c from cache.h
  preload-index.h: move declarations for preload-index.c from elsewhere
  sparse-index.h: move declarations for sparse-index.c from cache.h
  name-hash.h: move declarations for name-hash.c from cache.h
  run-command.h: move declarations for run-command.c from cache.h
  ...
</content>
</entry>
<entry>
<title>cache.h: remove this no-longer-used header</title>
<updated>2023-06-21T20:39:53Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2023-05-16T06:33:57Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=bc5c5ec0446895f5c4139cd470066beb3c4ac6d5'/>
<id>urn:sha1:bc5c5ec0446895f5c4139cd470066beb3c4ac6d5</id>
<content type='text'>
Since this header showed up in some places besides just #include
statements, update/clean-up/remove those other places as well.

Note that compat/fsmonitor/fsm-path-utils-darwin.c previously got
away with violating the rule that all files must start with an include
of git-compat-util.h (or a short-list of alternate headers that happen
to include it first).  This change exposed the violation and caused it
to stop building correctly; fix it by having it include
git-compat-util.h first, as per policy.

Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>statinfo: move stat_{data,validity} functions from cache/read-cache</title>
<updated>2023-06-21T20:39:53Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2023-05-16T06:33:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=90cbae9ce5d22df29867be9026c514b8c79e3d31'/>
<id>urn:sha1:90cbae9ce5d22df29867be9026c514b8c79e3d31</id>
<content type='text'>
These functions do not depend upon struct cache_entry or struct
index_state in any way, and it seems more logical to break them out into
this file, especially since statinfo.h already has the struct stat_data
declaration.

Diff best viewed with `--color-moved`.

Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>object: add object_array initializer helper function</title>
<updated>2023-05-08T19:05:55Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2023-05-08T17:38:06Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=fe90355361430dc52f858845a821370db0c54c80'/>
<id>urn:sha1:fe90355361430dc52f858845a821370db0c54c80</id>
<content type='text'>
The object_array API has an OBJECT_ARRAY_INIT macro, but lacks a
function to initialize an object_array at a given location in memory.

Introduce `object_array_init()` to implement such a function.

Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>object-file.h: move declarations for object-file.c functions from cache.h</title>
<updated>2023-04-11T15:52:10Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2023-04-11T07:41:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=87bed17907b2cb9a9581a5b8b16b8da264c2a2a8'/>
<id>urn:sha1:87bed17907b2cb9a9581a5b8b16b8da264c2a2a8</id>
<content type='text'>
Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Acked-by: Calvin Wan &lt;calvinwan@google.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
