git/object.h, branch v2.41.2

hash-ll.h: split out of hash.h to remove dependency on repository.h

2023-04-24T19:47:32Z

hash.h depends upon and includes repository.h, due to the definition and use of the_hash_algo (defined as the_repository->hash_algo). However, most headers trying to include hash.h are only interested in the layout of the structs like object_id. Move the parts of hash.h that do not depend upon repository.h into a new file hash-ll.h (the "low level" parts of hash.h), and adjust other files to use this new header where the convenience inline functions aren't needed. This allows hash.h and object.h to be fairly small, minimal headers. It also exposes a lot of hidden dependencies on both path.h (which was brought in by repository.h) and repository.h (which was previously implicitly brought in by object.h), so also adjust other files to be more explicit about what they depend upon. Signed-off-by: Elijah Newren Signed-off-by: Junio C Hamano

object.h: move some inline functions and defines from cache.h

2023-04-11T15:52:10Z

The object_type() inline function is very tied to the enum object_type declaration within object.h, and just seemed to make more sense to live there. That makes S_ISGITLINK and some other defines make sense to go with it, as well as the create_ce_mode() and canon_mode() inline functions. Move all these inline functions and defines from cache.h to object.h. Signed-off-by: Elijah Newren Acked-by: Calvin Wan Signed-off-by: Junio C Hamano

object.h: stop depending on cache.h; make cache.h depend on object.h

2023-02-24T01:25:29Z

Things should be able to depend on object.h without pulling in all of cache.h. Move an enum to allow this. Note that a couple files previously depended on things brought in through cache.h indirectly (revision.h -> commit.h -> object.h -> cache.h). As such, this change requires making existing dependencies more explicit in half a dozen files. The inclusion of strbuf.h in some headers if of particular note: these headers directly embedded a strbuf in some new structs, meaning they should have been including strbuf.h all along but were indirectly getting the necessary definitions. Signed-off-by: Elijah Newren Signed-off-by: Junio C Hamano

parse_object(): allow skipping hash check

2022-09-07T19:18:57Z

The parse_object() function checks the object hash of any object it parses. This is a nice feature, as it means we may catch bit corruption during normal use, rather than waiting for specific fsck operations. But it also can be slow. It's particularly noticeable for blobs, where except for the hash check, we could return without loading the object contents at all. Now one may wonder what is the point of calling parse_object() on a blob in the first place then, but usually it's not intentional: we were fed an oid from somewhere, don't know the type, and want an object struct. For commits and trees, the parsing is usually helpful; we're about to look at the contents anyway. But this is less true for blobs, where we may be collecting them as part of a reachability traversal, etc, and don't actually care what's in them. And blobs, of course, tend to be larger. We don't want to just throw out the hash-checks for blobs, though. We do depend on them in some circumstances (e.g., rev-list --verify-objects uses parse_object() to check them). It's only the callers that know how they're going to use the result. And so we can help them by providing a special flag to skip the hash check. We could just apply this to blobs, as they're going to be the main source of performance improvement. But if a caller doesn't care about checking the hash, we might as well skip it for other object types, too. Even though we can't avoid reading the object contents, we can still skip the actual hash computation. If this seems like it is making Git a little bit less safe against corruption, it may be. But it's part of a series of tradeoffs we're already making. For instance, "rev-list --objects" does not open the contents of blobs it prints. And when a commit graph is present, we skip opening most commits entirely. The important thing will be to use this flag in cases where it's safe to skip the check. For instance, when serving a pack for a fetch, we know the client will fully index the objects and do a connectivity check itself. There's little to be gained from the server side re-hashing a blob itself. And indeed, most of the time we don't! The revision machinery won't open up a blob reached by traversal, but only one requested directly with a "want" line. So applied properly, this new feature shouldn't make anything less safe in practice. Signed-off-by: Jeff King Signed-off-by: Junio C Hamano

revision: allow --ancestry-path to take an argument

2022-08-19T17:45:08Z

We have long allowed users to run e.g. git log --ancestry-path master..seen which shows all commits which satisfy all three of these criteria: * are an ancestor of seen * are not an ancestor of master * have master as an ancestor This commit allows another variant: git log --ancestry-path=$TOPIC master..seen which shows all commits which satisfy all of these criteria: * are an ancestor of seen * are not an ancestor of master * have $TOPIC in their ancestry-path that last bullet can be defined as commits meeting any of these criteria: * are an ancestor of $TOPIC * have $TOPIC as an ancestor * are $TOPIC This also allows multiple --ancestry-path arguments, which can be used to find commits with any of the given topics in their ancestry path. Signed-off-by: Elijah Newren Acked-by: Derrick Stolee Signed-off-by: Junio C Hamano

reflog: libify delete reflog function and helpers

2022-03-02T23:24:47Z

Currently stash shells out to reflog in order to delete refs. In an effort to reduce how much we shell out to a subprocess, libify the functionality that stash needs into reflog.c. Add a reflog_delete function that is pretty much the logic in the while loop in builtin/reflog.c cmd_reflog_delete(). This is a function that builtin/reflog.c and builtin/stash.c can both call. Also move functions needed by reflog_delete and export them. Helped-by: Ævar Arnfjörð Bjarmason Signed-off-by: John Cai Signed-off-by: Junio C Hamano

.[ch] _INIT macros: use { 0 } for a "zero out" idiom

2021-09-27T21:47:59Z

In C it isn't required to specify that all members of a struct are zero'd out to 0, NULL or '\0', just providing a "{ 0 }" will accomplish that. Let's also change code that provided N zero'd fields to just provide one, and change e.g. "{ NULL }" to "{ 0 }" for consistency. I.e. even if the first member is a pointer let's use "0" instead of "NULL". The point of using "0" consistently is to pick one, and to not have the reader wonder why we're not using the same pattern everywhere. Signed-off-by: Ævar Arnfjörð Bjarmason Signed-off-by: Junio C Hamano

builtin/pack-objects.c: remove duplicate hash lookup

2021-08-30T06:25:43Z

In the original code from 08cdfb1337 (pack-objects --keep-unreachable, 2007-09-16), we add each object to the packing list with type `obj->type`, where `obj` comes from `lookup_unknown_object()`. Unless we had already looked up and parsed the object, this will be `OBJ_NONE`. That's fine, since oe_set_type() sets the type_valid bit to '0', and we determine the real type later on. So the only thing we need from the object lookup is access to the `flags` field so that we can mark that we've added the object with `OBJECT_ADDED` to avoid adding it again (we can just pass `OBJ_NONE` directly instead of grabbing it from the object). But add_object_entry() already rejects duplicates! This has been the behavior since 7a979d99ba (Thin pack - create packfile with missing delta base., 2006-02-19), but 08cdfb1337 didn't take advantage of it. Moreover, to do the OBJECT_ADDED check, we have to do a hash lookup in `obj_hash`. So we can drop the lookup_unknown_object() call completely, *and* the OBJECT_ADDED flag, too, since the spot we're touching here is the only location that checks it. In the end, we perform the same number of hash lookups, but with the added bonus that we don't waste memory allocating an OBJ_NONE object (if we were traversing, we'd need it eventually, but the whole point of this code path is not to traverse). Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano

object.h: add lookup_object_by_type() function

2021-06-29T03:30:18Z

In some cases it's useful for efficiency reasons to get the type of an object before deciding whether to parse it, but we still want an object struct. E.g., in reachable.c, bitmaps give us the type, but we just want to mark flags on each object. Likewise, we may loop over every object and only parse tags in order to peel them; checking the type first lets us avoid parsing the non-tags. But our lookup_blob(), etc, functions make getting an object struct annoying: we have to call the right function for every type. And we cannot just use the generic lookup_object(), because it only returns an already-seen object; it won't allocate a new object struct. Let's provide a function that dispatches to the correct lookup_* function based on a run-time type. In fact, reachable.c already has such a helper, so we'll just make that public. I did change the return type from "void *" to "struct object *". While the former is a clever way to avoid casting inside the function, it's less safe and less informative to people reading the function declaration. The next commit will add a new caller. Signed-off-by: Jeff King Signed-off-by: Junio C Hamano

object.h: expand docstring for lookup_unknown_object()

2021-06-29T03:30:17Z

The lookup_unknown_object() system is not often used and is somewhat confusing. Let's try to explain it a bit more (which is especially important as I'm adding a related but slightly different function in the next commit). Signed-off-by: Jeff King Signed-off-by: Junio C Hamano