git/entry.c, branch v2.40.2

Sync with 2.39.4

2024-04-19T10:38:37Z

* maint-2.39: (38 commits) Git 2.39.4 fsck: warn about symlink pointing inside a gitdir core.hooksPath: add some protection while cloning init.templateDir: consider this config setting protected clone: prevent hooks from running during a clone Add a helper function to compare file contents init: refactor the template directory discovery into its own function find_hook(): refactor the `STRIP_EXTENSION` logic clone: when symbolic links collide with directories, keep the latter entry: report more colliding paths t5510: verify that D/F confusion cannot lead to an RCE submodule: require the submodule path to contain directories only clone_submodule: avoid using `access()` on directories submodules: submodule paths must not contain symlinks clone: prevent clashing git dirs when cloning submodule in parallel t7423: add tests for symlinked submodule directories has_dir_name(): do not get confused by characters < '/' docs: document security issues around untrusted .git dirs upload-pack: disable lazy-fetching by default fetch/clone: detect dubious ownership of local repositories ...

clone: when symbolic links collide with directories, keep the latter

2024-04-17T20:30:08Z

When recursively cloning a repository with submodules, we must ensure that the submodules paths do not suddenly contain symbolic links that would let Git write into unintended locations. We just plugged that vulnerability, but let's add some more defense-in-depth. Since we can only keep one item on disk if multiple index entries' paths collide, we may just as well avoid keeping a symbolic link (because that would allow attack vectors where Git follows those links by mistake). Technically, we handle more situations than cloning submodules into paths that were (partially) replaced by symbolic links. This provides defense-in-depth in case someone finds a case-folding confusion vulnerability in the future that does not even involve submodules. Signed-off-by: Johannes Schindelin

entry: report more colliding paths

2024-04-17T20:30:07Z

In b878579ae7 (clone: report duplicate entries on case-insensitive filesystems, 2018-08-17) code was added to warn about index entries that resolve to the same file system entity (usually the cause is a case-insensitive filesystem). In Git for Windows, where inodes are not trusted (because of a performance trade-off, inodes are equal to 0 by default), that check does not compare inode numbers but the verbatim path. This logic works well when index entries' paths differ only in case. However, for file/directory conflicts only the file's path was reported, leaving the user puzzled with what that path collides. Let's try ot catch colliding paths even if one path is the prefix of the other. We do this also in setups where the file system is case-sensitive because the inode check would not be able to catch those collisions. While not a complete solution (for example, on macOS, Unicode normalization could also lead to file/directory conflicts but be missed by this logic), it is at least another defensive layer on top of what the previous commits added. Signed-off-by: Johannes Schindelin

read-tree: add "--super-prefix" option, eliminate global

2022-12-26T01:21:44Z

The "--super-prefix" option to "git" was initially added in [1] for use with "ls-files"[2], and shortly thereafter "submodule--helper"[3] and "grep"[4]. It wasn't until [5] that "read-tree" made use of it. At the time [5] made sense, but since then we've made "ls-files" recurse in-process in [6], "grep" in [7], and finally "submodule--helper" in the preceding commits. Let's also remove it from "read-tree", which allows us to remove the option to "git" itself. We can do this because the only remaining user of it is the submodule API, which will now invoke "read-tree" with its new "--super-prefix" option. It will only do so when the "submodule_move_head()" function is called. That "submodule_move_head()" function was then only invoked by "read-tree" itself, but now rather than setting an environment variable to pass "--super-prefix" between cmd_read_tree() we: - Set a new "super_prefix" in "struct unpack_trees_options". The "super_prefixed()" function in "unpack-trees.c" added in [5] will now use this, rather than get_super_prefix() looking up the environment variable we set earlier in the same process. - Add the same field to the "struct checkout", which is only needed to ferry the "super_prefix" in the "struct unpack_trees_options" all the way down to the "entry.c" callers of "submodule_move_head()". Those calls which used the super prefix all originated in "cmd_read_tree()". The only other caller is the "unlink_entry()" caller in "builtin/checkout.c", which now passes a "NULL". 1. 74866d75793 (git: make super-prefix option, 2016-10-07) 2. e77aa336f11 (ls-files: optionally recurse into submodules, 2016-10-07) 3. 89c86265576 (submodule helper: support super prefix, 2016-12-08) 4. 0281e487fd9 (grep: optionally recurse into submodules, 2016-12-16) 5. 3d415425c7b (unpack-trees: support super-prefix option, 2017-01-17) 6. 188dce131fa (ls-files: use repository object, 2017-06-22) 7. f9ee2fcdfa0 (grep: recurse in-process using 'struct repository', 2017-08-02) Signed-off-by: Ævar Arnfjörð Bjarmason Signed-off-by: Junio C Hamano

checkout: fix two bugs on the final count of updated entries

2022-07-14T17:19:28Z

At the end of `git checkout `, we get a message informing how many entries were updated in the working tree. However, this number can be inaccurate for two reasons: 1) Delayed entries currently get counted twice. 2) Failed entries are included in the count. The first problem happens because the counter is first incremented before inserting the entry in the delayed checkout queue, and once again when finish_delayed_checkout() calls checkout_entry(). And the second happens because the counter is incremented too early in checkout_entry(), before the entry was in fact checked out. Fix that by moving the count increment further down in the call stack and removing the duplicate increment on delayed entries. Note that we have to keep a per-entry reference for the counter (both on parallel checkout and delayed checkout) because not all entries are always accumulated at the same counter. See checkout_worktree(), at builtin/checkout.c for an example. Signed-off-by: Matheus Tavares Signed-off-by: Junio C Hamano

Merge branch 'mc/clean-smudge-with-llp64'

2021-11-29T23:41:51Z

The clean/smudge conversion code path has been prepared to better work on platforms where ulong is narrower than size_t. * mc/clean-smudge-with-llp64: clean/smudge: allow clean filters to process extremely large files odb: guard against data loss checking out a huge file git-compat-util: introduce more size_t helpers odb: teach read_blob_entry to use size_t t1051: introduce a smudge filter test for extremely large files test-lib: add prerequisite for 64-bit platforms test-tool genzeros: generate large amounts of data more efficiently test-genzeros: allow more than 2G zeros in Windows

odb: teach read_blob_entry to use size_t

2021-11-03T18:22:27Z

There is mixed use of size_t and unsigned long to deal with sizes in the codebase. Recall that Windows defines unsigned long as 32 bits even on 64-bit platforms, meaning that converting size_t to unsigned long narrows the range. This mostly doesn't cause a problem since Git rarely deals with files larger than 2^32 bytes. But adjunct systems such as Git LFS, which use smudge/clean filters to keep huge files out of the repository, may have huge file contents passed through some of the functions in entry.c and convert.c. On Windows, this results in a truncated file being written to the workdir. I traced this to one specific use of unsigned long in write_entry (and a similar instance in write_pc_item_to_fd for parallel checkout). That appeared to be for the call to read_blob_entry, which expects a pointer to unsigned long. By altering the signature of read_blob_entry to expect a size_t, write_entry can be switched to use size_t internally (which all of its callers and most of its callees already used). To avoid touching dozens of additional files, read_blob_entry uses a local unsigned long to call a chain of functions which aren't prepared to accept size_t. Helped-by: Johannes Schindelin Signed-off-by: Matt Cooper Signed-off-by: Johannes Schindelin Signed-off-by: Junio C Hamano

entry: show finer-grained counter in "Filtering content" progress line

2021-09-09T16:58:19Z

The "Filtering content" progress in entry.c:finish_delayed_checkout() is unusual because of how it calculates the progress count and because it shows the progress of a nested loop. It works basically like this: start_delayed_progress(p, nr_of_paths_to_filter) for_each_filter { display_progress(p, nr_of_paths_to_filter - nr_of_paths_still_left_to_filter) for_each_path_handled_by_the_current_filter { checkout_entry() } } stop_progress(p) There are two issues with this approach: - The work done by the last filter (or the only filter if there is only one) is never counted, so if the last filter still has some paths to process, then the counter shown in the "done" progress line will not match the expected total. The partially-RFC series to add a GIT_TEST_CHECK_PROGRESS=1 mode[1] helps spot this issue. Under it the 'missing file in delayed checkout' and 'invalid file in delayed checkout' tests in 't0021-conversion.sh' fail, because both use only one filter. (The test 'delayed checkout in process filter' uses two filters but the first one does all the work, so that test already happens to succeed even with GIT_TEST_CHECK_PROGRESS=1.) - The progress counter is updated only once per filter, not once per processed path, so if a filter has a lot of paths to process, then the counter might stay unchanged for a long while and then make a big jump (though the user still gets a sense of progress, because we call display_throughput() after each processed path to show the amount of processed data). Move the display_progress() call to the inner loop, right next to that checkout_entry() call that does the hard work for each path, and use a dedicated counter variable that is incremented upon processing each path. After this change the 'invalid file in delayed checkout' in 't0021-conversion.sh' would succeed with the GIT_TEST_CHECK_PROGRESS=1 assertion discussed above, but the 'missing file in delayed checkout' test would still fail. It'll fail because its purposefully buggy filter doesn't process any paths, so we won't execute that inner loop at all, see [2] for how to spot that issue without GIT_TEST_CHECK_PROGRESS=1. It's not straightforward to fix it with the current progress.c library (see [3] for an attempt), so let's leave it for now. Let's also initialize the *progress to "NULL" while we're at it. Since 7a132c628e5 (checkout: make delayed checkout respect --quiet and --no-progress, 2021-08-26) we have had progress conditional on "show_progress", usually we use the idiom of a "NULL" initialization of the "*progress", rather than the more verbose ternary added in 7a132c628e5. 1. https://lore.kernel.org/git/20210620200303.2328957-1-szeder.dev@gmail.com/ 2. http://lore.kernel.org/git/20210802214827.GE23408@szeder.dev 3. https://lore.kernel.org/git/20210620200303.2328957-7-szeder.dev@gmail.com/ Signed-off-by: SZEDER Gábor Signed-off-by: Ævar Arnfjörð Bjarmason Signed-off-by: Junio C Hamano

checkout: make delayed checkout respect --quiet and --no-progress

2021-08-27T06:15:33Z

The 'Filtering contents...' progress report from delayed checkout is displayed even when checkout and clone are invoked with --quiet or --no-progress. Furthermore, it is displayed unconditionally, without first checking whether stdout is a tty. Let's fix these issues and also add some regression tests for the two code paths that currently use delayed checkout: unpack_trees.c:check_updates() and builtin/checkout.c:checkout_worktree(). Signed-off-by: Matheus Tavares Signed-off-by: Junio C Hamano

string-list.h users: change to use *_{nodup,dup}()

2021-07-01T19:32:22Z

Change all in-tree users of the string_list_init(LIST, BOOL) API to use string_list_init_{nodup,dup}(LIST) instead. As noted in the preceding commit let's leave the now-unused string_list_init() wrapper in-place for any in-flight users, it can be removed at some later date. Signed-off-by: Ævar Arnfjörð Bjarmason Signed-off-by: Junio C Hamano