<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/list-objects-filter.c, branch v2.30.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.30.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.30.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2020-05-05T04:57:58Z</updated>
<entry>
<title>list-objects-filter: treat NULL filter_options as "disabled"</title>
<updated>2020-05-05T04:57:58Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2020-05-04T23:12:27Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=5bf7f1eaa51b3f35161e1c9e4d8bc843330dea3c'/>
<id>urn:sha1:5bf7f1eaa51b3f35161e1c9e4d8bc843330dea3c</id>
<content type='text'>
In most callers, we have an actual list_objects_filter_options struct,
and if no filtering is desired its "choice" element will be
LOFC_DISABLED. However, some code may have only a pointer to such a
struct which may be NULL (because _their_ callers didn't care about
filtering, either). Rather than forcing them to handle this explicitly
like:

  if (filter_options)
          traverse_commit_list_filtered(filter_options, revs,
	                                show_commit, show_object,
					show_data, NULL);
  else
          traverse_commit_list(revs, show_commit, show_object,
	                             show_data);

let's just treat a NULL filter_options the same as LOFC_DISABLED. We
only need a small change, since that option struct is converted into a
real filter only in the "init" function.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'jk/partial-clone-sparse-blob'</title>
<updated>2019-10-07T02:32:54Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2019-10-07T02:32:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=ad8f0368b45bf1ab0f1339033d0a62cee94b1ae2'/>
<id>urn:sha1:ad8f0368b45bf1ab0f1339033d0a62cee94b1ae2</id>
<content type='text'>
The name of the blob object that stores the filter specification
for sparse cloning/fetching was interpreted in a wrong place in the
code, causing Git to abort.

* jk/partial-clone-sparse-blob:
  list-objects-filter: use empty string instead of NULL for sparse "base"
  list-objects-filter: give a more specific error sparse parsing error
  list-objects-filter: delay parsing of sparse oid
  t5616: test cloning/fetching with sparse:oid=&lt;oid&gt; filter
</content>
</entry>
<entry>
<title>list-objects-filter: use empty string instead of NULL for sparse "base"</title>
<updated>2019-09-16T19:47:51Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2019-09-15T16:51:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=a4cafc737916c2df5a52875cb1d0976662e3ab0e'/>
<id>urn:sha1:a4cafc737916c2df5a52875cb1d0976662e3ab0e</id>
<content type='text'>
We use add_excludes_from_blob_to_list() to parse a sparse blob. Since
we don't have a base path, we pass NULL and 0 for the base and baselen,
respectively. But the rest of the exclude code passes a literal empty
string instead of NULL for this case. And indeed, we eventually end up
with match_pathname() calling fspathncmp(), which then calls the system
strncmp(path, base, baselen).

This works on many platforms, which notice that baselen is 0 and do not
look at the bytes of "base" at all. But it does violate the C standard,
and building with SANITIZE=undefined will complain. You can also see it
by instrumenting fspathncmp like this:

	diff --git a/dir.c b/dir.c
	index d021c908e5..4bb3d3ec96 100644
	--- a/dir.c
	+++ b/dir.c
	@@ -71,6 +71,8 @@ int fspathcmp(const char *a, const char *b)

	 int fspathncmp(const char *a, const char *b, size_t count)
	 {
	+	if (!a || !b)
	+		BUG("null fspathncmp arguments");
	 	return ignore_case ? strncasecmp(a, b, count) : strncmp(a, b, count);
	 }

We could perhaps be more defensive in match_pathname(), but even if we
did so, it makes sense for this code to match the rest of the exclude
callers.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Acked-by: Jeff Hostetler &lt;jeffhost@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>list-objects-filter: give a more specific error sparse parsing error</title>
<updated>2019-09-16T19:47:45Z</updated>
<author>
<name>Jon Simons</name>
<email>jon@jonsimons.org</email>
</author>
<published>2019-09-15T01:13:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=cf34337f9886bb45f16f0114dc8f3265aea912ce'/>
<id>urn:sha1:cf34337f9886bb45f16f0114dc8f3265aea912ce</id>
<content type='text'>
The sparse:oid filter has two error modes: we might fail to resolve the
name to an OID, or we might fail to parse the contents of that OID. In
the latter case, let's give a less generic error message, and mention
the OID we did find.

While we're here, let's also mark both messages as translatable.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Acked-by: Jeff Hostetler &lt;jeffhost@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>list-objects-filter: delay parsing of sparse oid</title>
<updated>2019-09-16T19:47:37Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2019-09-15T16:12:44Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=4c96a775945d0299e39b982ab9cb32c5132e877d'/>
<id>urn:sha1:4c96a775945d0299e39b982ab9cb32c5132e877d</id>
<content type='text'>
The list-objects-filter code has two steps to its initialization:

  1. parse_list_objects_filter() makes sure the spec is a filter we know
     about and is syntactically correct. This step is done by "rev-list"
     or "upload-pack" that is going to apply a filter, but also by "git
     clone" or "git fetch" before they send the spec across the wire.

  2. list_objects_filter__init() runs the type-specific initialization
     (using function pointers established in step 1). This happens at
     the start of traverse_commit_list_filtered(), when we're about to
     actually use the filter.

It's a good idea to parse as much as we can in step 1, in order to catch
problems early (e.g., a blob size limit that isn't a number). But one
thing we _shouldn't_ do is resolve any oids at that step (e.g., for
sparse-file contents specified by oid). In the case of a fetch, the oid
has to be resolved on the remote side.

The current code does resolve the oid during the parse phase, but
ignores any error (which we must do, because we might just be sending
the spec across the wire). This leads to two bugs:

  - if we're not in a repository (e.g., because it's git-clone parsing
    the spec), then we trigger a BUG() trying to resolve the name

  - if we did hit the error case, we still have to notice that later and
    bail. The code path in rev-list handles this, but the one in
    upload-pack does not, leading to a segfault.

We can fix both by moving the oid resolution into the sparse-oid init
function. At that point we know we have a repository (because we're
about to traverse), and handling the error there fixes the segfault.

As a bonus, we can drop the NULL sparse_oid_value check in rev-list,
since this is now handled in the sparse-oid-filter init function.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Acked-by: Jeff Hostetler &lt;jeffhost@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>unpack-trees: rename 'is_excluded_from_list()'</title>
<updated>2019-09-05T21:05:12Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2019-09-03T18:04:58Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=468ce99b77a0efaf1ace4c31a7b0a7d036fd9ca1'/>
<id>urn:sha1:468ce99b77a0efaf1ace4c31a7b0a7d036fd9ca1</id>
<content type='text'>
The first consumer of pattern-matching filenames was the
.gitignore feature. In that context, storing a list of patterns
as a 'struct exclude_list'  makes sense. However, the
sparse-checkout feature then adopted these structures and methods,
but with the opposite meaning: these patterns match the files
that should be included!

Now that this library is renamed to use 'struct pattern_list'
and 'struct pattern', we can now rename the method used by
the sparse-checkout feature to determine which paths should
appear in the working directory.

The method is_excluded_from_list() is only used by the
sparse-checkout logic in unpack-trees and list-objects-filter.
The confusing part is that it returned 1 for "excluded" (i.e.
it matches the list of exclusions) but that really manes that
the path matched the list of patterns for _inclusion_ in the
working directory.

Rename the method to be path_matches_pattern_list() and have
it return an explicit 'enum pattern_match_result'. Here, the
values MATCHED = 1, UNMATCHED = 0, and UNDECIDED = -1 agree
with the previous integer values. This shift allows future
consumers to better understand what the retur values mean,
and provides more type checking for handling those values.

Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>treewide: rename 'exclude' methods to 'pattern'</title>
<updated>2019-09-05T21:05:12Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2019-09-03T18:04:57Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=65edd96aecdee2cd4d16a7c17ae9f723c3fe61a4'/>
<id>urn:sha1:65edd96aecdee2cd4d16a7c17ae9f723c3fe61a4</id>
<content type='text'>
The first consumer of pattern-matching filenames was the
.gitignore feature. In that context, storing a list of patterns
as a 'struct exclude_list'  makes sense. However, the
sparse-checkout feature then adopted these structures and methods,
but with the opposite meaning: these patterns match the files
that should be included!

It would be clearer to rename this entire library as a "pattern
matching" library, and the callers apply exclusion/inclusion
logic accordingly based on their needs.

This commit renames several methods defined in dir.h to make
more sense with the renamed 'struct exclude_list' to 'struct
pattern_list' and 'struct exclude' to 'struct path_pattern':

 * last_exclude_matching() -&gt; last_matching_pattern()
 * parse_exclude() -&gt; parse_path_pattern()

In addition, the word 'exclude' was replaced with 'pattern'
in the methods below:

 * add_exclude_list()
 * add_excludes_from_file_to_list()
 * add_excludes_from_file()
 * add_excludes_from_blob_to_list()
 * add_exclude()
 * clear_exclude_list()

A few methods with the word "exclude" remain. These will
be handled seperately. In particular, the method
"is_excluded()" is concretely about the .gitignore file
relative to a specific directory. This is the important
boundary between library and consumer: is_excluded() cares
about .gitignore, but is_excluded() calls
last_matching_pattern() to make that decision.

Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>treewide: rename 'struct exclude_list' to 'struct pattern_list'</title>
<updated>2019-09-05T21:05:11Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2019-09-03T18:04:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=caa3d5544474a6ef8e8d7db5c073c1564b76d8bb'/>
<id>urn:sha1:caa3d5544474a6ef8e8d7db5c073c1564b76d8bb</id>
<content type='text'>
The first consumer of pattern-matching filenames was the
.gitignore feature. In that context, storing a list of patterns
as a 'struct exclude_list'  makes sense. However, the
sparse-checkout feature then adopted these structures and methods,
but with the opposite meaning: these patterns match the files
that should be included!

It would be clearer to rename this entire library as a "pattern
matching" library, and the callers apply exclusion/inclusion
logic accordingly based on their needs.

This commit renames 'struct exclude_list' to 'struct pattern_list'
and renames several variables called 'el' to 'pl'.

Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>list-objects-filter: implement composite filters</title>
<updated>2019-06-28T15:41:53Z</updated>
<author>
<name>Matthew DeVore</name>
<email>matvore@google.com</email>
</author>
<published>2019-06-27T22:54:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=e987df5fe62b8b29be4cdcdeb3704681ada2b29e'/>
<id>urn:sha1:e987df5fe62b8b29be4cdcdeb3704681ada2b29e</id>
<content type='text'>
Allow combining filters such that only objects accepted by all filters
are shown. The motivation for this is to allow getting directory
listings without also fetching blobs. This can be done by combining
blob:none with tree:&lt;depth&gt;. There are massive repositories that have
larger-than-expected trees - even if you include only a single commit.

A combined filter supports any number of subfilters, and is written in
the following form:

	combine:&lt;filter 1&gt;+&lt;filter 2&gt;+&lt;filter 3&gt;

Certain non-alphanumeric characters in each filter must be
URL-encoded.

For now, combined filters must be specified in this form. In a
subsequent commit, rev-list will support multiple --filter arguments
which will have the same effect as specifying one filter argument
starting with "combine:". The documentation will be updated in that
commit, as the URL-encoding scheme is in general not meant to be used
directly by the user, and it is better to describe the URL-encoding
feature in terms of the repeated flag.

Helped-by: Emily Shaffer &lt;emilyshaffer@google.com&gt;
Helped-by: Jeff Hostetler &lt;git@jeffhostetler.com&gt;
Helped-by: Johannes Schindelin &lt;Johannes.Schindelin@gmx.de&gt;
Helped-by: Jonathan Tan &lt;jonathantanmy@google.com&gt;
Helped-by: Junio C Hamano &lt;gitster@pobox.com&gt;
Signed-off-by: Matthew DeVore &lt;matvore@google.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>list-objects-filter: put omits set in filter struct</title>
<updated>2019-06-28T15:41:53Z</updated>
<author>
<name>Matthew DeVore</name>
<email>matvore@google.com</email>
</author>
<published>2019-06-27T22:54:06Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=7a7c7f4a6d22477e3548021eb3571384651c00be'/>
<id>urn:sha1:7a7c7f4a6d22477e3548021eb3571384651c00be</id>
<content type='text'>
The oidset *omits pointer must be accessed by the combine filter in a
type-agnostic way once the graph traversal is over. Store that pointer
in the general `filter` struct. This will be used in a follow-up patch
to implement the combine filter.

Signed-off-by: Matthew DeVore &lt;matvore@google.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
