<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/tree-walk.c, branch v2.40.3</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.40.3</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.40.3'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2022-08-10T21:26:25Z</updated>
<entry>
<title>tree-walk: add a mechanism for getting non-canonicalized modes</title>
<updated>2022-08-10T21:26:25Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2022-08-10T21:01:17Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=ec18b10bf20574fc6d60c966412a11c81f9c17e0'/>
<id>urn:sha1:ec18b10bf20574fc6d60c966412a11c81f9c17e0</id>
<content type='text'>
When using init_tree_desc() and tree_entry() to iterate over a tree, we
always canonicalize the modes coming out of the tree. This is a good
thing to prevent bugs or oddities in normal code paths, but it's
counter-productive for tools like fsck that want to see the exact
contents.

We can address this by adding an option to avoid the extra
canonicalization. A few notes on the implementation:

  - I've attached the new option to the tree_desc struct itself. The
    actual code change is in decode_tree_entry(), which is in turn
    called by the public update_tree_entry(), tree_entry(), and
    init_tree_desc() functions, plus their "gently" counterparts.

    By letting it ride along in the struct, we can avoid changing the
    signature of those functions, which are called many times. Plus it's
    conceptually simpler: you really want a particular iteration of a
    tree to be "raw" or not, rather than individual calls.

  - We still have to set the new option somewhere. The struct is
    initialized by init_tree_desc(). I added the new flags field only to
    the "gently" version. That avoids disturbing the much more numerous
    non-gentle callers, and it makes sense that anybody being careful
    about looking at raw modes would also be careful about bogus trees
    (i.e., the caller will be something like fsck in the first place).

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>object-file API: pass an enum to read_object_with_reference()</title>
<updated>2022-02-26T01:16:32Z</updated>
<author>
<name>Ævar Arnfjörð Bjarmason</name>
<email>avarab@gmail.com</email>
</author>
<published>2022-02-04T23:48:34Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=6aea6baeb3ece6c832dbdf1deed09f41aebf85c2'/>
<id>urn:sha1:6aea6baeb3ece6c832dbdf1deed09f41aebf85c2</id>
<content type='text'>
Change the read_object_with_reference() function to take an "enum
object_type". It was not prepared to handle an arbitrary "const
char *type", as it was itself calling type_from_string().

Let's change the only caller that passes in user data to use
type_from_string(), and convert the rest to use e.g. "OBJ_TREE"
instead of "tree_type".

The "cat-file" caller is not on the codepath that
handles"--allow-unknown", so the type_from_string() there is safe. Its
use of type_from_string() doesn't functionally differ from that of the
pre-image.

Signed-off-by: Ævar Arnfjörð Bjarmason &lt;avarab@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Always use oidread to read into struct object_id</title>
<updated>2021-04-27T07:31:38Z</updated>
<author>
<name>brian m. carlson</name>
<email>sandals@crustytoothpaste.net</email>
</author>
<published>2021-04-26T01:02:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=92e2cab96b8b8ea9a076dc279864226b3d0863e9'/>
<id>urn:sha1:92e2cab96b8b8ea9a076dc279864226b3d0863e9</id>
<content type='text'>
In the future, we'll want oidread to automatically set the hash
algorithm member for an object ID we read into it, so ensure we use
oidread instead of hashcpy everywhere we're copying a hash value into a
struct object_id.

Signed-off-by: brian m. carlson &lt;sandals@crustytoothpaste.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>tree-walk: report recursion counts</title>
<updated>2021-01-04T23:23:08Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2021-01-04T03:09:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=da8be8ced672fc00d5dea8a95d04854b2e35d925'/>
<id>urn:sha1:da8be8ced672fc00d5dea8a95d04854b2e35d925</id>
<content type='text'>
The traverse_trees() method recursively walks through trees, but also
prunes the tree-walk based on a callback. Some callers, such as
unpack_trees(), are quite complicated and can have wildly different
performance between two different commands.

Create constants that count these values and then report the results at
the end of a process. These counts are cumulative across multiple "root"
instances of traverse_trees(), but they provide reproducible values for
demonstrating improvements to the pruning algorithm when possible.

This change is modeled after a similar statistics reporting in 42e50e78
(revision.c: add trace2 stats around Bloom filter usage, 2020-04-06).

Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>tree-walk.c: don't match submodule entries for 'submod/anything'</title>
<updated>2020-06-08T19:28:48Z</updated>
<author>
<name>SZEDER Gábor</name>
<email>szeder.dev@gmail.com</email>
</author>
<published>2020-06-05T13:00:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=35a9f1e99c5d31635bb78a4f2d498e72c04fc471'/>
<id>urn:sha1:35a9f1e99c5d31635bb78a4f2d498e72c04fc471</id>
<content type='text'>
Submodules should be handled the same as regular directories with
respect to the presence of a trailing slash, i.e. commands like:

  git diff rev1 rev2 -- $path
  git rev-list HEAD -- $path

should produce the same output whether $path is 'submod' or 'submod/'.
This has been fixed in commit 74b4f7f277 (tree-walk.c: ignore trailing
slash on submodule in tree_entry_interesting(), 2014-01-23).

Unfortunately, that commit had the unintended side effect to handle
'submod/anything' the same as 'submod' and 'submod/' as well, e.g.:

  $ git log --oneline --name-only -- sha1collisiondetection/whatever
  4125f78222 sha1dc: update from upstream
  sha1collisiondetection
  07a20f569b Makefile: fix unaligned loads in sha1dc with UBSan
  sha1collisiondetection
  23e37f8e9d sha1dc: update from upstream
  sha1collisiondetection
  86cfd61e6b sha1dc: optionally use sha1collisiondetection as a submodule
  sha1collisiondetection

Fix this by rejecting submodules as partial pathnames when their
trailing slash is followed by anything.

Signed-off-by: SZEDER Gábor &lt;szeder.dev@gmail.com&gt;
Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>tree-walk.c: break circular dependency with unpack-trees</title>
<updated>2020-02-04T18:32:15Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2020-02-01T11:39:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=5290d4513496d89f84570985a0e02e97dff477ff'/>
<id>urn:sha1:5290d4513496d89f84570985a0e02e97dff477ff</id>
<content type='text'>
The unpack-trees API depends on the tree-walk API. But we've recently
introduced a dependency in tree-walk.c on MAX_UNPACK_TREES, which
doesn't otherwise care about unpack-trees at all.

Let's break that dependency by reversing the constants: we'll introduce
a new MAX_TRAVERSE_TREES which belongs to the tree-walk API. And then we
can define MAX_UNPACK_TREES in terms of that (since unpack-trees cannot
possibly work with more trees than it can traverse at once via
tree-walk).

The value for both will remain at 8. This is somewhat arbitrary and
probably more than is necessary, per ca885a4fe6 (read-tree() and
unpack_trees(): use consistent limit, 2008-03-13), but there's not
really any pressing need to reduce it.

Suggested-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Acked-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>traverse_trees(): use stack array for name entries</title>
<updated>2020-01-30T21:55:30Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2020-01-30T09:53:38Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=8dd40c0472a8f51f4fba4b8fd28dc3e9421cd7d7'/>
<id>urn:sha1:8dd40c0472a8f51f4fba4b8fd28dc3e9421cd7d7</id>
<content type='text'>
We heap-allocate our arrays of name_entry structs, etc, with one entry
per tree we're asked to traverse. The code does a raw multiplication in
the xmalloc() call, which I find when auditing for integer overflows
during allocation.

We could "fix" this by using ALLOC_ARRAY() instead. But as it turns out,
the maximum size of these arrays is limited at compile time:

  - merge_trees() always passes in 3 trees

  - unpack_trees() and its brethren never pass in more than
    MAX_UNPACK_TREES

So we can simplify even further by just using a stack array and bounding
it with MAX_UNPACK_TREES. There should be no concern with overflowing
the stack, since MAX_UNPACK_TREES is only 8 and the structs themselves
are small.

Note that since we're replacing xcalloc(), we have to move one of the
NULL initializations into a loop.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'js/mingw-loosen-overstrict-tree-entry-checks'</title>
<updated>2020-01-06T22:17:50Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2020-01-06T22:17:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=a578ef9e63a2f53ada00beb9d75b23e68061b331'/>
<id>urn:sha1:a578ef9e63a2f53ada00beb9d75b23e68061b331</id>
<content type='text'>
An earlier update to Git for Windows declared that a tree object is
invalid if it has a path component with backslash in it, which was
overly strict, which has been corrected.  The only protection the
Windows users need is to prevent such path (or any path that their
filesystem cannot check out) from entering the index.

* js/mingw-loosen-overstrict-tree-entry-checks:
  mingw: only test index entries for backslashes, not tree entries
</content>
</entry>
<entry>
<title>mingw: only test index entries for backslashes, not tree entries</title>
<updated>2020-01-02T20:56:08Z</updated>
<author>
<name>Johannes Schindelin</name>
<email>johannes.schindelin@gmx.de</email>
</author>
<published>2019-12-31T22:53:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=224c7d70fa14ed44d8e7e3ce1e165e05b7b23725'/>
<id>urn:sha1:224c7d70fa14ed44d8e7e3ce1e165e05b7b23725</id>
<content type='text'>
During a clone of a repository that contained a file with a backslash in
its name in the past, as of v2.24.1(2), Git for Windows prints errors
like this:

	error: filename in tree entry contains backslash: '\'

The idea is to prevent Git from even trying to write files with
backslashes in their file names: while these characters are valid in
file names on other platforms, on Windows it is interpreted as directory
separator (which would obviously lead to ambiguities, e.g. when there is
a file `a\b` and there is also a file `a/b`).

Arguably, this is the wrong layer for that error: As long as the user
never checks out the files whose names contain backslashes, there should
not be any problem in the first place.

So let's loosen the requirements: we now leave tree entries with
backslashes in their file names alone, but we do require any entries
that are added to the Git index to contain no backslashes on Windows.

Note: just as before, the check is guarded by `core.protectNTFS` (to
allow overriding the check by toggling that config setting), and it
is _only_ performed on Windows, as the backslash is not a directory
separator elsewhere, even when writing to NTFS-formatted volumes.

An alternative approach would be to try to prevent creating files with
backslashes in their file names. However, that comes with its own set of
problems. For example, `git config -f C:\ProgramData\Git\config ...` is
a very valid way to specify a custom config location, and we obviously
do _not_ want to prevent that. Therefore, the approach chosen in this
patch would appear to be better.

This addresses https://github.com/git-for-windows/git/issues/2435

Signed-off-by: Johannes Schindelin &lt;johannes.schindelin@gmx.de&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Sync with Git 2.24.1</title>
<updated>2019-12-10T06:17:55Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2019-12-10T06:17:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=7034cd094bda4edbcdff7fad1a28fcaaf9b9a040'/>
<id>urn:sha1:7034cd094bda4edbcdff7fad1a28fcaaf9b9a040</id>
<content type='text'>
</content>
</entry>
</feed>
