<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/list-objects-filter-options.h, branch v2.37.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.37.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.37.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2022-03-28T16:57:21Z</updated>
<entry>
<title>pack-objects: lazily set up "struct rev_info", don't leak</title>
<updated>2022-03-28T16:57:21Z</updated>
<author>
<name>Ævar Arnfjörð Bjarmason</name>
<email>avarab@gmail.com</email>
</author>
<published>2022-03-28T15:43:18Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=5cb28270a1ff94a0a23e67b479bbbec3bc993518'/>
<id>urn:sha1:5cb28270a1ff94a0a23e67b479bbbec3bc993518</id>
<content type='text'>
In the preceding [1] (pack-objects: move revs out of
get_object_list(), 2022-03-22) the "repo_init_revisions()" was moved
to cmd_pack_objects() so that it unconditionally took place for all
invocations of "git pack-objects".

We'd thus start leaking memory, which is easily reproduced in
e.g. git.git by feeding e83c5163316 (Initial revision of "git", the
information manager from hell, 2005-04-07) to "git pack-objects";

    $ echo e83c5163316f89bfbde7d9ab23ca2e25604af290 | ./git pack-objects initial
    [...]
	==19130==ERROR: LeakSanitizer: detected memory leaks

	Direct leak of 7120 byte(s) in 1 object(s) allocated from:
	    #0 0x455308 in __interceptor_malloc (/home/avar/g/git/git+0x455308)
	    #1 0x75b399 in do_xmalloc /home/avar/g/git/wrapper.c:41:8
	    #2 0x75b356 in xmalloc /home/avar/g/git/wrapper.c:62:9
	    #3 0x5d7609 in prep_parse_options /home/avar/g/git/diff.c:5647:2
	    #4 0x5d415a in repo_diff_setup /home/avar/g/git/diff.c:4621:2
	    #5 0x6dffbb in repo_init_revisions /home/avar/g/git/revision.c:1853:2
	    #6 0x4f599d in cmd_pack_objects /home/avar/g/git/builtin/pack-objects.c:3980:2
	    #7 0x4592ca in run_builtin /home/avar/g/git/git.c:465:11
	    #8 0x457d81 in handle_builtin /home/avar/g/git/git.c:718:3
	    #9 0x458ca5 in run_argv /home/avar/g/git/git.c:785:4
	    #10 0x457b40 in cmd_main /home/avar/g/git/git.c:916:19
	    #11 0x562259 in main /home/avar/g/git/common-main.c:56:11
	    #12 0x7fce792ac7ec in __libc_start_main csu/../csu/libc-start.c:332:16
	    #13 0x4300f9 in _start (/home/avar/g/git/git+0x4300f9)

	SUMMARY: LeakSanitizer: 7120 byte(s) leaked in 1 allocation(s).
	Aborted

Narrowly fixing that commit would have been easy, just add call
repo_init_revisions() right before get_object_list(), which is
effectively what was done before that commit.

But an unstated constraint when setting it up early is that it was
needed for the subsequent [2] (pack-objects: parse --filter directly
into revs.filter, 2022-03-22), i.e. we might have a --filter
command-line option, and need to either have the "struct rev_info"
setup when we encounter that option, or later.

Let's just change the control flow so that we'll instead set up the
"struct rev_info" only when we need it. Doing so leads to a bit more
verbosity, but it's a lot clearer what we're doing and why.

An earlier version of this commit[3] went behind
opt_parse_list_objects_filter()'s back by faking up a "struct option"
before calling it. Let's avoid that and instead create a blessed API
for this pattern.

We could furthermore combine the two get_object_list() invocations
here by having repo_init_revisions() invoked on &amp;pfd.revs, but I think
clearly separating the two makes the flow clearer. Likewise
redundantly but explicitly (i.e. redundant v.s. a "{ 0 }") "0" to
"have_revs" early in cmd_pack_objects().

While we're at it add parentheses around the arguments to the OPT_*
macros in in list-objects-filter-options.h, as we need to change those
lines anyway. It doesn't matter in this case, but is good general
practice.

1. https://lore.kernel.org/git/619b757d98465dbc4995bdc11a5282fbfcbd3daa.1647970119.git.gitgitgadget@gmail.com
2. https://lore.kernel.org/git/97de926904988b89b5663bd4c59c011a1723a8f5.1647970119.git.gitgitgadget@gmail.com
3. https://lore.kernel.org/git/patch-1.1-193534b0f07-20220325T121715Z-avarab@gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason &lt;avarab@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>list-objects-filter: remove CL_ARG__FILTER</title>
<updated>2022-03-23T20:13:17Z</updated>
<author>
<name>Derrick Stolee</name>
<email>derrickstolee@github.com</email>
</author>
<published>2022-03-22T17:28:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=cc910442560e606546a328375c1394a982dfe8a6'/>
<id>urn:sha1:cc910442560e606546a328375c1394a982dfe8a6</id>
<content type='text'>
We have established the command-line interface for the --[no-]filter
options for a while now, so we do not need a helper to make this
editable in the future.

Signed-off-by: Derrick Stolee &lt;derrickstolee@github.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>bundle: parse filter capability</title>
<updated>2022-03-09T18:25:27Z</updated>
<author>
<name>Derrick Stolee</name>
<email>derrickstolee@github.com</email>
</author>
<published>2022-03-09T16:01:39Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=105c6f14ad34b417c1e78bc9a8704dcda7b059f2'/>
<id>urn:sha1:105c6f14ad34b417c1e78bc9a8704dcda7b059f2</id>
<content type='text'>
The v3 bundle format has capabilities, allowing newer versions of Git to
create bundles with newer features. Older versions that do not
understand these new capabilities will fail with a helpful warning.

Create a new capability allowing Git to understand that the contained
pack-file is filtered according to some object filter. Typically, this
filter will be "blob:none" for a blobless partial clone.

This change teaches Git to parse this capability, place its value in the
bundle header, and demonstrate this understanding by adding a message to
'git bundle verify'.

Since we will use gently_parse_list_objects_filter() outside of
list-objects-filter-options.c, make it an external method and move its
API documentation to before its declaration.

Signed-off-by: Derrick Stolee &lt;derrickstolee@github.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>list-objects-filter-options: create copy helper</title>
<updated>2022-03-09T18:25:26Z</updated>
<author>
<name>Derrick Stolee</name>
<email>derrickstolee@github.com</email>
</author>
<published>2022-03-09T16:01:32Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=4a4c3f9b6389168033843814ad45e69bf0b92b1d'/>
<id>urn:sha1:4a4c3f9b6389168033843814ad45e69bf0b92b1d</id>
<content type='text'>
As we add more embedded members with type 'struct
list_objects_filter_options', it will be important to easily perform a
deep copy across multiple such structs. Create
list_objects_filter_copy() to satisfy this need.

This method is recursive to match the recursive nature of the struct.

Signed-off-by: Derrick Stolee &lt;derrickstolee@github.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>list-objects: implement object type filter</title>
<updated>2021-04-19T21:09:11Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2021-04-19T11:46:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=b0c42a53c9d36ea69f4d2650001f05e98eb347cb'/>
<id>urn:sha1:b0c42a53c9d36ea69f4d2650001f05e98eb347cb</id>
<content type='text'>
While it already is possible to filter objects by some criteria in
git-rev-list(1), it is not yet possible to filter out only a specific
type of objects. This makes some filters less useful. The `blob:limit`
filter for example filters blobs such that only those which are smaller
than the given limit are returned. But it is unfit to ask only for these
smallish blobs, given that git-rev-list(1) will continue to print tags,
commits and trees.

Now that we have the infrastructure in place to also filter tags and
commits, we can improve this situation by implementing a new filter
which selects objects based on their type. Above query can thus
trivially be implemented with the following command:

    $ git rev-list --objects --filter=object:type=blob \
        --filter=blob:limit=200

Furthermore, this filter allows to optimize for certain other cases: if
for example only tags or commits have been selected, there is no need to
walk down trees.

The new filter is not yet supported in bitmaps. This is going to be
implemented in a subsequent commit.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>list_objects_filter_options: introduce 'list_object_filter_config_name'</title>
<updated>2020-08-04T01:03:24Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2020-07-31T20:26:26Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=b9ea214795b89e0455bbe92fab5d30d1f3a4cc6d'/>
<id>urn:sha1:b9ea214795b89e0455bbe92fab5d30d1f3a4cc6d</id>
<content type='text'>
In a subsequent commit, we will add configuration options that are
specific to each kind of object filter, in which case it is handy to
have a function that translates between 'enum
list_objects_filter_choice' and an appropriate configuration-friendly
string.

Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Acked-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Use OPT_CALLBACK and OPT_CALLBACK_F</title>
<updated>2020-04-28T17:47:10Z</updated>
<author>
<name>Denton Liu</name>
<email>liu.denton@gmail.com</email>
</author>
<published>2020-04-28T08:36:28Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=203c85339fb51bb8b83aae8f0adde44d6e55e018'/>
<id>urn:sha1:203c85339fb51bb8b83aae8f0adde44d6e55e018</id>
<content type='text'>
In the codebase, there are many options which use OPTION_CALLBACK in a
plain ol' struct definition. However, we have the OPT_CALLBACK and
OPT_CALLBACK_F macros which are meant to abstract these plain struct
definitions away. These macros are useful as they semantically signal to
developers that these are just normal callback option with nothing fancy
happening.

Replace plain struct definitions of OPTION_CALLBACK with OPT_CALLBACK or
OPT_CALLBACK_F where applicable. The heavy lifting was done using the
following (disgusting) shell script:

	#!/bin/sh

	do_replacement () {
		tr '\n' '\r' |
			sed -e 's/{\s*OPTION_CALLBACK,\s*\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\s*0,\(\s*[^[:space:]}]*\)\s*}/OPT_CALLBACK(\1,\2,\3,\4,\5,\6)/g' |
			sed -e 's/{\s*OPTION_CALLBACK,\s*\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\(\s*[^[:space:]}]*\)\s*}/OPT_CALLBACK_F(\1,\2,\3,\4,\5,\6,\7)/g' |
			tr '\r' '\n'
	}

	for f in $(git ls-files \*.c)
	do
		do_replacement &lt;"$f" &gt;"$f.tmp"
		mv "$f.tmp" "$f"
	done

The result was manually inspected and then reformatted to match the
style of the surrounding code. Finally, using
`git grep OPTION_CALLBACK \*.c`, leftover results which were not handled
by the script were manually transformed.

Signed-off-by: Denton Liu &lt;liu.denton@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'jk/partial-clone-sparse-blob'</title>
<updated>2019-10-07T02:32:54Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2019-10-07T02:32:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=ad8f0368b45bf1ab0f1339033d0a62cee94b1ae2'/>
<id>urn:sha1:ad8f0368b45bf1ab0f1339033d0a62cee94b1ae2</id>
<content type='text'>
The name of the blob object that stores the filter specification
for sparse cloning/fetching was interpreted in a wrong place in the
code, causing Git to abort.

* jk/partial-clone-sparse-blob:
  list-objects-filter: use empty string instead of NULL for sparse "base"
  list-objects-filter: give a more specific error sparse parsing error
  list-objects-filter: delay parsing of sparse oid
  t5616: test cloning/fetching with sparse:oid=&lt;oid&gt; filter
</content>
</entry>
<entry>
<title>Merge branch 'md/list-objects-filter-combo'</title>
<updated>2019-09-18T18:50:09Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2019-09-18T18:50:09Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=627b82683447e299fc2e20140318c276efbf7de2'/>
<id>urn:sha1:627b82683447e299fc2e20140318c276efbf7de2</id>
<content type='text'>
The list-objects-filter API (used to create a sparse/lazy clone)
learned to take a combined filter specification.

* md/list-objects-filter-combo:
  list-objects-filter-options: make parser void
  list-objects-filter-options: clean up use of ALLOC_GROW
  list-objects-filter-options: allow mult. --filter
  strbuf: give URL-encoding API a char predicate fn
  list-objects-filter-options: make filter_spec a string_list
  list-objects-filter-options: move error check up
  list-objects-filter: implement composite filters
  list-objects-filter-options: always supply *errbuf
  list-objects-filter: put omits set in filter struct
  list-objects-filter: encapsulate filter components
</content>
</entry>
<entry>
<title>list-objects-filter: delay parsing of sparse oid</title>
<updated>2019-09-16T19:47:37Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2019-09-15T16:12:44Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=4c96a775945d0299e39b982ab9cb32c5132e877d'/>
<id>urn:sha1:4c96a775945d0299e39b982ab9cb32c5132e877d</id>
<content type='text'>
The list-objects-filter code has two steps to its initialization:

  1. parse_list_objects_filter() makes sure the spec is a filter we know
     about and is syntactically correct. This step is done by "rev-list"
     or "upload-pack" that is going to apply a filter, but also by "git
     clone" or "git fetch" before they send the spec across the wire.

  2. list_objects_filter__init() runs the type-specific initialization
     (using function pointers established in step 1). This happens at
     the start of traverse_commit_list_filtered(), when we're about to
     actually use the filter.

It's a good idea to parse as much as we can in step 1, in order to catch
problems early (e.g., a blob size limit that isn't a number). But one
thing we _shouldn't_ do is resolve any oids at that step (e.g., for
sparse-file contents specified by oid). In the case of a fetch, the oid
has to be resolved on the remote side.

The current code does resolve the oid during the parse phase, but
ignores any error (which we must do, because we might just be sending
the spec across the wire). This leads to two bugs:

  - if we're not in a repository (e.g., because it's git-clone parsing
    the spec), then we trigger a BUG() trying to resolve the name

  - if we did hit the error case, we still have to notice that later and
    bail. The code path in rev-list handles this, but the one in
    upload-pack does not, leading to a segfault.

We can fix both by moving the oid resolution into the sparse-oid init
function. At that point we know we have a repository (because we're
about to traverse), and handling the error there fixes the segfault.

As a bonus, we can drop the NULL sparse_oid_value check in rev-list,
since this is now handled in the sparse-oid-filter init function.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Acked-by: Jeff Hostetler &lt;jeffhost@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
