<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/list-objects-filter.c, branch v2.35.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.35.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.35.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2021-04-19T21:09:11Z</updated>
<entry>
<title>list-objects: implement object type filter</title>
<updated>2021-04-19T21:09:11Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2021-04-19T11:46:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=b0c42a53c9d36ea69f4d2650001f05e98eb347cb'/>
<id>urn:sha1:b0c42a53c9d36ea69f4d2650001f05e98eb347cb</id>
<content type='text'>
While it already is possible to filter objects by some criteria in
git-rev-list(1), it is not yet possible to filter out only a specific
type of objects. This makes some filters less useful. The `blob:limit`
filter for example filters blobs such that only those which are smaller
than the given limit are returned. But it cannot be used to ask only for these
smallish blobs, given that git-rev-list(1) will continue to print tags,
commits and trees.

Now that we have the infrastructure in place to also filter tags and
commits, we can improve this situation by implementing a new filter
which selects objects based on their type. The above query can thus
trivially be implemented with the following command:

    $ git rev-list --objects --filter=object:type=blob \
        --filter=blob:limit=200

Furthermore, this filter allows optimizing for certain other cases: if
for example only tags or commits have been selected, there is no need to
walk down trees.
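
As a rough illustration of that optimization, the following C sketch
models a type filter as two predicates. The `enum object_type` values are
assumed to mirror Git's own numbering; the helper names
`type_filter_keeps()` and `type_filter_needs_tree_walk()` are hypothetical
and are not taken from this commit:

```c
/* Object types; the numbering is assumed to mirror Git's enum. */
enum object_type {
	OBJ_COMMIT = 1,
	OBJ_TREE = 2,
	OBJ_BLOB = 3,
	OBJ_TAG = 4
};

/* Hypothetical helper: keep an object only if it has the wanted type. */
int type_filter_keeps(enum object_type wanted, enum object_type actual)
{
	return wanted == actual;
}

/*
 * Hypothetical helper: trees only have to be walked when the wanted
 * objects are trees or the blobs reachable through them. For tags and
 * commits the walk down into trees can be skipped.
 */
int type_filter_needs_tree_walk(enum object_type wanted)
{
	return wanted == OBJ_TREE || wanted == OBJ_BLOB;
}
```

With `object:type=tag` or `object:type=commit`, the second predicate
returns 0, which is the case where the traversal can stop at commits.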

The new filter is not yet supported in bitmaps. This is going to be
implemented in a subsequent commit.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>list-objects: support filtering by tag and commit</title>
<updated>2021-04-12T16:35:50Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2021-04-12T13:37:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=9a2a4f95448890d138a800c8a55c5d5dcfe16082'/>
<id>urn:sha1:9a2a4f95448890d138a800c8a55c5d5dcfe16082</id>
<content type='text'>
Object filters currently only support filtering blobs or trees based on
some criteria. This commit lays the foundation to also allow filtering
of tags and commits.

No change in behaviour is expected from this commit given that there are
no filters yet for those object types.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>use CALLOC_ARRAY</title>
<updated>2021-03-14T00:00:09Z</updated>
<author>
<name>René Scharfe</name>
<email>l.s.r@web.de</email>
</author>
<published>2021-03-13T16:17:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=ca56dadb4b65ccaeab809d80db80a312dc00941a'/>
<id>urn:sha1:ca56dadb4b65ccaeab809d80db80a312dc00941a</id>
<content type='text'>
Add and apply a semantic patch for converting code that open-codes
CALLOC_ARRAY to use it instead.  It shortens the code and infers the
element size automatically.

Signed-off-by: René Scharfe &lt;l.s.r@web.de&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>object-name.c: rename from sha1-name.c</title>
<updated>2021-01-04T21:01:55Z</updated>
<author>
<name>Martin Ågren</name>
<email>martin.agren@gmail.com</email>
</author>
<published>2020-12-31T11:56:20Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=1e6771e5046232e448e2dde4f0fee1752c9e1d04'/>
<id>urn:sha1:1e6771e5046232e448e2dde4f0fee1752c9e1d04</id>
<content type='text'>
Generalize the last remnants of "sha" and "sha1" in this file and rename
it to reflect that we're not just able to handle SHA-1 these days.

We need to update one test to check for an updated error string.

Signed-off-by: Martin Ågren &lt;martin.agren@gmail.com&gt;
Reviewed-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>list-objects-filter: treat NULL filter_options as "disabled"</title>
<updated>2020-05-05T04:57:58Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2020-05-04T23:12:27Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=5bf7f1eaa51b3f35161e1c9e4d8bc843330dea3c'/>
<id>urn:sha1:5bf7f1eaa51b3f35161e1c9e4d8bc843330dea3c</id>
<content type='text'>
In most callers, we have an actual list_objects_filter_options struct,
and if no filtering is desired its "choice" element will be
LOFC_DISABLED. However, some code may have only a pointer to such a
struct which may be NULL (because _their_ callers didn't care about
filtering, either). Rather than forcing them to handle this explicitly
like:

  if (filter_options)
          traverse_commit_list_filtered(filter_options, revs,
                                        show_commit, show_object,
                                        show_data, NULL);
  else
          traverse_commit_list(revs, show_commit, show_object,
                               show_data);

let's just treat a NULL filter_options the same as LOFC_DISABLED. We
only need a small change, since that option struct is converted into a
real filter only in the "init" function.
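
A minimal sketch of that idea, assuming a cut-down version of the options
struct with only the "choice" member. The helper name
`effective_filter_choice()` is hypothetical, and the real change lives in
the "init" function rather than in a standalone helper like this one:

```c
/* Cut-down stand-ins for the real types; only "choice" is modelled. */
enum list_objects_filter_choice {
	LOFC_DISABLED = 0,
	LOFC_BLOB_NONE
};

struct list_objects_filter_options {
	enum list_objects_filter_choice choice;
};

/*
 * Hypothetical helper: map a possibly-NULL options pointer to the
 * choice the traversal should act on. A NULL pointer is treated the
 * same as an options struct whose choice is LOFC_DISABLED.
 */
enum list_objects_filter_choice effective_filter_choice(
	const struct list_objects_filter_options *filter_options)
{
	if (!filter_options)
		return LOFC_DISABLED;
	return (*filter_options).choice;
}
```

Callers can then pass their pointer through unconditionally instead of
branching on it themselves.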

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'jk/partial-clone-sparse-blob'</title>
<updated>2019-10-07T02:32:54Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2019-10-07T02:32:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=ad8f0368b45bf1ab0f1339033d0a62cee94b1ae2'/>
<id>urn:sha1:ad8f0368b45bf1ab0f1339033d0a62cee94b1ae2</id>
<content type='text'>
The name of the blob object that stores the filter specification
for sparse cloning/fetching was interpreted in a wrong place in the
code, causing Git to abort.

* jk/partial-clone-sparse-blob:
  list-objects-filter: use empty string instead of NULL for sparse "base"
  list-objects-filter: give a more specific error sparse parsing error
  list-objects-filter: delay parsing of sparse oid
  t5616: test cloning/fetching with sparse:oid=&lt;oid&gt; filter
</content>
</entry>
<entry>
<title>list-objects-filter: use empty string instead of NULL for sparse "base"</title>
<updated>2019-09-16T19:47:51Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2019-09-15T16:51:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=a4cafc737916c2df5a52875cb1d0976662e3ab0e'/>
<id>urn:sha1:a4cafc737916c2df5a52875cb1d0976662e3ab0e</id>
<content type='text'>
We use add_excludes_from_blob_to_list() to parse a sparse blob. Since
we don't have a base path, we pass NULL and 0 for the base and baselen,
respectively. But the rest of the exclude code passes a literal empty
string instead of NULL for this case. And indeed, we eventually end up
with match_pathname() calling fspathncmp(), which then calls the system
strncmp(path, base, baselen).

This works on many platforms, which notice that baselen is 0 and do not
look at the bytes of "base" at all. But it does violate the C standard,
and building with SANITIZE=undefined will complain. You can also see it
by instrumenting fspathncmp like this:

	diff --git a/dir.c b/dir.c
	index d021c908e5..4bb3d3ec96 100644
	--- a/dir.c
	+++ b/dir.c
	@@ -71,6 +71,8 @@ int fspathcmp(const char *a, const char *b)

	 int fspathncmp(const char *a, const char *b, size_t count)
	 {
	+	if (!a || !b)
	+		BUG("null fspathncmp arguments");
	 	return ignore_case ? strncasecmp(a, b, count) : strncmp(a, b, count);
	 }

We could perhaps be more defensive in match_pathname(), but even if we
did so, it makes sense for this code to match the rest of the exclude
callers.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Acked-by: Jeff Hostetler &lt;jeffhost@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>list-objects-filter: give a more specific error sparse parsing error</title>
<updated>2019-09-16T19:47:45Z</updated>
<author>
<name>Jon Simons</name>
<email>jon@jonsimons.org</email>
</author>
<published>2019-09-15T01:13:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=cf34337f9886bb45f16f0114dc8f3265aea912ce'/>
<id>urn:sha1:cf34337f9886bb45f16f0114dc8f3265aea912ce</id>
<content type='text'>
The sparse:oid filter has two error modes: we might fail to resolve the
name to an OID, or we might fail to parse the contents of that OID. In
the latter case, let's give a less generic error message, and mention
the OID we did find.

While we're here, let's also mark both messages as translatable.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Acked-by: Jeff Hostetler &lt;jeffhost@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>list-objects-filter: delay parsing of sparse oid</title>
<updated>2019-09-16T19:47:37Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2019-09-15T16:12:44Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=4c96a775945d0299e39b982ab9cb32c5132e877d'/>
<id>urn:sha1:4c96a775945d0299e39b982ab9cb32c5132e877d</id>
<content type='text'>
The list-objects-filter code has two steps to its initialization:

  1. parse_list_objects_filter() makes sure the spec is a filter we know
     about and is syntactically correct. This step is done by "rev-list"
     or "upload-pack" that is going to apply a filter, but also by "git
     clone" or "git fetch" before they send the spec across the wire.

  2. list_objects_filter__init() runs the type-specific initialization
     (using function pointers established in step 1). This happens at
     the start of traverse_commit_list_filtered(), when we're about to
     actually use the filter.

It's a good idea to parse as much as we can in step 1, in order to catch
problems early (e.g., a blob size limit that isn't a number). But one
thing we _shouldn't_ do is resolve any oids at that step (e.g., for
sparse-file contents specified by oid). In the case of a fetch, the oid
has to be resolved on the remote side.

The current code does resolve the oid during the parse phase, but
ignores any error (which we must do, because we might just be sending
the spec across the wire). This leads to two bugs:

  - if we're not in a repository (e.g., because it's git-clone parsing
    the spec), then we trigger a BUG() trying to resolve the name

  - if we did hit the error case, we still have to notice that later and
    bail. The code path in rev-list handles this, but the one in
    upload-pack does not, leading to a segfault.

We can fix both by moving the oid resolution into the sparse-oid init
function. At that point we know we have a repository (because we're
about to traverse), and handling the error there fixes the segfault.

As a bonus, we can drop the NULL sparse_oid_value check in rev-list,
since this is now handled in the sparse-oid-filter init function.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Acked-by: Jeff Hostetler &lt;jeffhost@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>unpack-trees: rename 'is_excluded_from_list()'</title>
<updated>2019-09-05T21:05:12Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2019-09-03T18:04:58Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=468ce99b77a0efaf1ace4c31a7b0a7d036fd9ca1'/>
<id>urn:sha1:468ce99b77a0efaf1ace4c31a7b0a7d036fd9ca1</id>
<content type='text'>
The first consumer of pattern-matching filenames was the
.gitignore feature. In that context, storing a list of patterns
as a 'struct exclude_list' makes sense. However, the
sparse-checkout feature then adopted these structures and methods,
but with the opposite meaning: these patterns match the files
that should be included!

Now that this library is renamed to use 'struct pattern_list'
and 'struct pattern', we can rename the method used by
the sparse-checkout feature to determine which paths should
appear in the working directory.

The method is_excluded_from_list() is only used by the
sparse-checkout logic in unpack-trees and list-objects-filter.
The confusing part is that it returned 1 for "excluded" (i.e.
it matches the list of exclusions) but that really means that
the path matched the list of patterns for _inclusion_ in the
working directory.

Rename the method to be path_matches_pattern_list() and have
it return an explicit 'enum pattern_match_result'. Here, the
values MATCHED = 1, UNMATCHED = 0, and UNDECIDED = -1 agree
with the previous integer values. This shift allows future
consumers to better understand what the return values mean,
and provides more type checking for handling those values.
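
To illustrate what the explicit enum buys a caller, here is a small C
sketch. The enum values are the ones listed above; the helper
`path_is_included()` and its fallback to a parent's state for UNDECIDED
are assumptions made for this example, not code from this commit:

```c
/* Values as listed in the commit message. */
enum pattern_match_result {
	UNDECIDED = -1,
	UNMATCHED = 0,
	MATCHED = 1
};

/*
 * Hypothetical consumer: decide whether a path is included. When the
 * pattern list gives no verdict, fall back to the state supplied by
 * the caller (for example, the verdict of the parent directory).
 */
int path_is_included(enum pattern_match_result result, int parent_included)
{
	switch (result) {
	case MATCHED:
		return 1;
	case UNMATCHED:
		return 0;
	case UNDECIDED:
		return parent_included;
	}
	return parent_included;
}
```

Because every enumerator is named in the switch, a compiler that warns
about unhandled enum cases can flag a consumer that forgets one.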

Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
