<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/parallel-checkout.c, branch v2.40.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.40.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.40.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2022-07-14T17:19:28Z</updated>
<entry>
<title>checkout: fix two bugs on the final count of updated entries</title>
<updated>2022-07-14T17:19:28Z</updated>
<author>
<name>Matheus Tavares</name>
<email>matheus.bernardino@usp.br</email>
</author>
<published>2022-07-14T11:49:12Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=611c7785e8e22637e183333c54ed266e6e83e163'/>
<id>urn:sha1:611c7785e8e22637e183333c54ed266e6e83e163</id>
<content type='text'>
At the end of `git checkout &lt;pathspec&gt;`, we get a message reporting how
many entries were updated in the working tree. However, this number can
be inaccurate for two reasons:

1) Delayed entries currently get counted twice.
2) Failed entries are included in the count.

The first problem happens because the counter is incremented once
before the entry is inserted into the delayed checkout queue, and again
when finish_delayed_checkout() calls checkout_entry(). The second
happens because the counter is incremented too early in
checkout_entry(), before the entry has in fact been checked out. Fix
that by moving the count increment further down the call stack and
removing the duplicate increment for delayed entries. Note that we have
to keep a per-entry reference to the counter (for both parallel
checkout and delayed checkout) because not all entries are accumulated
in the same counter. See checkout_worktree(), in builtin/checkout.c,
for an example.
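
A minimal sketch of that per-entry reference (names hypothetical, not
the actual git structures):

    struct queued_entry {
            const char *path;
            int *nr_checkouts;   /* which counter this entry feeds */
    };

    static void note_written(struct queued_entry *e, int success)
    {
            if (success)
                    (*e-&gt;nr_checkouts)++;   /* bump only after the
                                               file is actually on disk */
    }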

Signed-off-by: Matheus Tavares &lt;matheus.bernardino@usp.br&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>i18n: factorize "invalid value" messages</title>
<updated>2022-02-04T21:58:28Z</updated>
<author>
<name>Jean-Noël Avila</name>
<email>jn.avila@free.fr</email>
</author>
<published>2022-01-31T22:07:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=1a8aea857e4225a9d35a531869fd47777f3063d6'/>
<id>urn:sha1:1a8aea857e4225a9d35a531869fd47777f3063d6</id>
<content type='text'>
Use the same message when an invalid value is passed to a command line
option or a configuration variable.
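
A minimal sketch of the idea (exact message and helper hypothetical):
a single translatable string shared by the option and config code
paths, so translators handle it once:

    #include &lt;stdio.h&gt;

    #define _(s) (s)   /* stand-in for gettext */

    static void report_invalid(const char *what, const char *value)
    {
            fprintf(stderr, _("invalid value '%s' for '%s'\n"),
                    value, what);
    }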

Signed-off-by: Jean-Noël Avila &lt;jn.avila@free.fr&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'mc/clean-smudge-with-llp64'</title>
<updated>2021-11-29T23:41:51Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2021-11-29T23:41:51Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=f9ba6acaa9348ea7b733bf78adc2f084247a912f'/>
<id>urn:sha1:f9ba6acaa9348ea7b733bf78adc2f084247a912f</id>
<content type='text'>
The clean/smudge conversion code path has been prepared to work better
on platforms where unsigned long is narrower than size_t.

* mc/clean-smudge-with-llp64:
  clean/smudge: allow clean filters to process extremely large files
  odb: guard against data loss checking out a huge file
  git-compat-util: introduce more size_t helpers
  odb: teach read_blob_entry to use size_t
  t1051: introduce a smudge filter test for extremely large files
  test-lib: add prerequisite for 64-bit platforms
  test-tool genzeros: generate large amounts of data more efficiently
  test-genzeros: allow more than 2G zeros in Windows
</content>
</entry>
<entry>
<title>odb: teach read_blob_entry to use size_t</title>
<updated>2021-11-03T18:22:27Z</updated>
<author>
<name>Matt Cooper</name>
<email>vtbassmatt@gmail.com</email>
</author>
<published>2021-11-02T15:46:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=e9aa762cc72e6cf8fd76fefe5ca2b5064be1a821'/>
<id>urn:sha1:e9aa762cc72e6cf8fd76fefe5ca2b5064be1a821</id>
<content type='text'>
There is mixed use of size_t and unsigned long to deal with sizes in the
codebase. Recall that Windows defines unsigned long as 32 bits even on
64-bit platforms, meaning that converting size_t to unsigned long narrows
the range. This mostly doesn't cause a problem since Git rarely deals
with files larger than 2^32 bytes.

But adjunct systems such as Git LFS, which use smudge/clean filters to
keep huge files out of the repository, may have huge file contents passed
through some of the functions in entry.c and convert.c. On Windows, this
results in a truncated file being written to the workdir. I traced this to
one specific use of unsigned long in write_entry (and a similar instance
in write_pc_item_to_fd for parallel checkout). That appeared to be for
the call to read_blob_entry, which expects a pointer to unsigned long.

By altering the signature of read_blob_entry to expect a size_t,
write_entry can be switched to use size_t internally (which all of its
callers and most of its callees already used). To avoid touching dozens of
additional files, read_blob_entry uses a local unsigned long to call a
chain of functions which aren't prepared to accept size_t.
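
A standalone sketch of that bridging pattern (function names
hypothetical, signatures simplified):

    #include &lt;stddef.h&gt;

    /* stand-in for a legacy callee that only knows unsigned long */
    static char *legacy_read(unsigned long *size)
    {
            static char blob[] = "hello";

            *size = sizeof(blob) - 1;
            return blob;
    }

    /* public interface takes size_t; a local unsigned long talks to
       the legacy callee, as read_blob_entry now does internally */
    static char *read_entry_sketch(size_t *size)
    {
            unsigned long ul_size;
            char *data = legacy_read(&amp;ul_size);

            *size = ul_size;   /* widen back on the way out */
            return data;
    }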

Helped-by: Johannes Schindelin &lt;johannes.schindelin@gmx.de&gt;
Signed-off-by: Matt Cooper &lt;vtbassmatt@gmail.com&gt;
Signed-off-by: Johannes Schindelin &lt;johannes.schindelin@gmx.de&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>pkt-line.[ch]: remove unused packet_read_line_buf()</title>
<updated>2021-10-15T20:09:40Z</updated>
<author>
<name>Ævar Arnfjörð Bjarmason</name>
<email>avarab@gmail.com</email>
</author>
<published>2021-10-14T20:15:12Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=ec9a37d69b4b327c08668bb897f27f25276a5b1c'/>
<id>urn:sha1:ec9a37d69b4b327c08668bb897f27f25276a5b1c</id>
<content type='text'>
This function was added in 4981fe750b1 (pkt-line: share
buffer/descriptor reading implementation, 2013-02-23), but in
01f9ec64c8a (Use packet_reader instead of packet_read_line,
2018-12-29) the code that was using it was removed.

Since it's being removed, we can in turn remove the "src" and "src_len"
arguments to packet_read(); all the remaining users just passed a
NULL/NULL pair to it.

That function is only a thin wrapper for packet_read_with_status(),
which still needs those arguments, but for the packet_read()
convenience wrapper we can do away with them for now.
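
After the change, the wrapper is approximately this (abridged):

    int packet_read(int fd, char *buffer, unsigned size, int options)
    {
            int pktlen = -1;

            packet_read_with_status(fd, NULL, NULL, buffer, size,
                                    &amp;pktlen, options);
            return pktlen;
    }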

Signed-off-by: Ævar Arnfjörð Bjarmason &lt;avarab@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>parallel-checkout: send the new object_id algo field to the workers</title>
<updated>2021-05-17T20:38:54Z</updated>
<author>
<name>Matheus Tavares</name>
<email>matheus.bernardino@usp.br</email>
</author>
<published>2021-05-17T19:49:03Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=3d20ed27b8bdac5c82f4f78af802f9afa499f651'/>
<id>urn:sha1:3d20ed27b8bdac5c82f4f78af802f9afa499f651</id>
<content type='text'>
An object_id storing a SHA-1 name has some unused bytes at the end of
the hash array. Since these bytes are not used, they are usually not
initialized to any value either. However, at
parallel_checkout.c:send_one_item() the object_id of a cache entry is
copied into a buffer which is later sent to a checkout worker through a
pipe write(). This makes Valgrind complain about passing uninitialized
bytes to a syscall. The worker won't use these uninitialized bytes
either, but the warning could confuse someone trying to debug this
code. So instead of using oidcpy(), send_one_item() uses hashcpy() to
copy only the used/initialized bytes of the object_id and leave the
rest zeroed.

However, since cf0983213c ("hash: add an algo member to struct
object_id", 2021-04-26), using hashcpy() is no longer sufficient here as
it won't copy the new algo field from the object_id. Let's add and use a
new function which meets both our requirements of copying all the
important object_id data while still avoiding the uninitialized bytes,
by padding the end of the hash array in the destination object_id. With
this change, we also no longer need the destination buffer from
send_one_item() to be initialized with zeros, so let's switch from
xcalloc() to xmalloc() to make this clear.
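
A sketch of the new helper, along the lines of what hash.h provides
(details may differ from the committed version):

    static inline void oidcpy_with_padding(struct object_id *dst,
                                           const struct object_id *src)
    {
            size_t hashsz;

            if (!src-&gt;algo)
                    hashsz = the_hash_algo-&gt;rawsz;
            else
                    hashsz = hash_algos[src-&gt;algo].rawsz;

            memcpy(dst-&gt;hash, src-&gt;hash, hashsz);
            memset(dst-&gt;hash + hashsz, 0, GIT_MAX_RAWSZ - hashsz);
            dst-&gt;algo = src-&gt;algo;   /* don't lose the new field */
    }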

Signed-off-by: Matheus Tavares &lt;matheus.bernardino@usp.br&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>ci: run test round with parallel-checkout enabled</title>
<updated>2021-05-05T03:27:17Z</updated>
<author>
<name>Matheus Tavares</name>
<email>matheus.bernardino@usp.br</email>
</author>
<published>2021-05-04T16:27:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=87094fc2daa9613c2fad454dbb068a8f23ce8de8'/>
<id>urn:sha1:87094fc2daa9613c2fad454dbb068a8f23ce8de8</id>
<content type='text'>
We already have tests for the basic parallel-checkout operations. But
this code can also be executed by other commands, such as
git-read-tree and git-sparse-checkout, which are currently not tested
with multiple workers. To promote wider test coverage without
duplicating tests:

1. Add the GIT_TEST_CHECKOUT_WORKERS environment variable, to optionally
   force parallel-checkout execution during the whole test suite (see
   the sketch after this list).

2. Set this variable (with a value of 2) in the second test round of our
   linux-gcc CI job. This round runs `make test` again with some
   optional GIT_TEST_* variables enabled, so there is no additional
   overhead in exercising the parallel-checkout code here.
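
A hypothetical sketch of how such a test knob can be honored (names
and details are illustrative, not the actual implementation):

    #include &lt;stdlib.h&gt;

    static void apply_test_override(int *workers, int *threshold)
    {
            const char *env = getenv("GIT_TEST_CHECKOUT_WORKERS");

            if (env &amp;&amp; *env) {
                    *workers = atoi(env);
                    *threshold = 0;   /* exercise the parallel path
                                         even for tiny checkouts */
            }
    }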

Note that tests checking out fewer than two parallel-eligible entries
will fall back to sequential mode. Nevertheless, this still exercises
the parallel-checkout framework, as the fallback codepath also writes
the queued entries using the parallel-checkout functions (only without
spawning any workers).

Signed-off-by: Matheus Tavares &lt;matheus.bernardino@usp.br&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>parallel-checkout: add tests related to path collisions</title>
<updated>2021-05-05T03:26:36Z</updated>
<author>
<name>Matheus Tavares</name>
<email>matheus.bernardino@usp.br</email>
</author>
<published>2021-05-04T16:27:32Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=6a7bc9d11823239183dd5f6547b13824a6a15fc2'/>
<id>urn:sha1:6a7bc9d11823239183dd5f6547b13824a6a15fc2</id>
<content type='text'>
Add tests to confirm that path collisions are properly detected by
checkout workers, both to avoid race conditions and to report colliding
entries on clone.

Co-authored-by: Jeff Hostetler &lt;jeffhost@microsoft.com&gt;
Signed-off-by: Matheus Tavares &lt;matheus.bernardino@usp.br&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>parallel-checkout: support progress displaying</title>
<updated>2021-04-19T18:57:05Z</updated>
<author>
<name>Matheus Tavares</name>
<email>matheus.bernardino@usp.br</email>
</author>
<published>2021-04-19T00:14:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=1c4d6f46be8dc05c9a49379587ffd454e92b72cf'/>
<id>urn:sha1:1c4d6f46be8dc05c9a49379587ffd454e92b72cf</id>
<content type='text'>
Original-patch-by: Nguyễn Thái Ngọc Duy &lt;pclouds@gmail.com&gt;
Signed-off-by: Nguyễn Thái Ngọc Duy &lt;pclouds@gmail.com&gt;
Signed-off-by: Matheus Tavares &lt;matheus.bernardino@usp.br&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>parallel-checkout: add configuration options</title>
<updated>2021-04-19T18:57:05Z</updated>
<author>
<name>Matheus Tavares</name>
<email>matheus.bernardino@usp.br</email>
</author>
<published>2021-04-19T00:14:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=7531e4b66e8c175707b6915df10666fbb5b7859f'/>
<id>urn:sha1:7531e4b66e8c175707b6915df10666fbb5b7859f</id>
<content type='text'>
Make parallel checkout configurable by introducing two new settings:
checkout.workers and checkout.thresholdForParallelism. The first defines
the number of workers (where one means sequential checkout), and the
second defines the minimum number of entries to attempt parallel
checkout.
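
A hypothetical sketch of the decision the two settings imply (names
illustrative):

    static int use_parallel_checkout(int workers, int threshold,
                                     int entries_to_write)
    {
            return workers &gt; 1 &amp;&amp; entries_to_write &gt;= threshold;
    }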

To decide the default value for checkout.workers, the parallel version
was benchmarked during three operations in the linux repo, with cold
cache: cloning v5.8, checking out v5.8 from v2.6.15 (checkout I) and
checking out v5.8 from v5.7 (checkout II). The four tables below show
the mean run times and standard deviations for 5 runs in: a local file
system on SSD, a local file system on HDD, a Linux NFS server, and
Amazon EFS (all on Linux). Each parallel checkout test was executed with
the number of workers that brings the best overall results in that
environment.

Local SSD:
             Sequential             10 workers            Speedup
Clone        8.805 s ± 0.043 s      3.564 s ± 0.041 s     2.47 ± 0.03
Checkout I   9.678 s ± 0.057 s      4.486 s ± 0.050 s     2.16 ± 0.03
Checkout II  5.034 s ± 0.072 s      3.021 s ± 0.038 s     1.67 ± 0.03

Local HDD:
             Sequential             10 workers            Speedup
Clone        32.288 s ± 0.580 s     30.724 s ± 0.522 s    1.05 ± 0.03
Checkout I   54.172 s ± 7.119 s     54.429 s ± 6.738 s    1.00 ± 0.18
Checkout II  40.465 s ± 2.402 s     38.682 s ± 1.365 s    1.05 ± 0.07

Linux NFS server (v4.1, on EBS, single availability zone):
             Sequential             32 workers            Speedup
Clone        240.368 s ± 6.347 s    57.349 s ± 0.870 s    4.19 ± 0.13
Checkout I   242.862 s ± 2.215 s    58.700 s ± 0.904 s    4.14 ± 0.07
Checkout II  65.751 s ± 1.577 s     23.820 s ± 0.407 s    2.76 ± 0.08

EFS (v4.1, replicated over multiple availability zones):
             Sequential             32 workers            Speedup
Clone        922.321 s ± 2.274 s    210.453 s ± 3.412 s   4.38 ± 0.07
Checkout I   1011.300 s ± 7.346 s   297.828 s ± 0.964 s   3.40 ± 0.03
Checkout II  294.104 s ± 1.836 s    126.017 s ± 1.190 s   2.33 ± 0.03

The above benchmarks show that parallel checkout is most effective on
repositories located on an SSD or over a distributed file system. For
local file systems on spinning disks, and/or older machines, the
parallelism does not always bring good performance. For this reason,
the default value for checkout.workers is one, a.k.a. sequential
checkout.

To decide the default value for checkout.thresholdForParallelism,
another benchmark was executed in the "Local SSD" setup, where parallel
checkout proved to be beneficial. This time, we compared the runtime of
a `git checkout -f`, with and without parallelism, after randomly
removing an increasing number of files from the Linux working tree. The
"sequential fallback" column below corresponds to the executions where
checkout.workers was 10 but checkout.thresholdForParallelism was equal
to the number of to-be-updated files plus one (so that we end up writing
sequentially). Each test case was sampled 15 times, and each sample had
a randomly different set of files removed. Here are the results:

             sequential fallback   10 workers           speedup
10   files    772.3 ms ± 12.6 ms   769.0 ms ± 13.6 ms   1.00 ± 0.02
20   files    780.5 ms ± 15.8 ms   775.2 ms ±  9.2 ms   1.01 ± 0.02
50   files    806.2 ms ± 13.8 ms   767.4 ms ±  8.5 ms   1.05 ± 0.02
100  files    833.7 ms ± 21.4 ms   750.5 ms ± 16.8 ms   1.11 ± 0.04
200  files    897.6 ms ± 30.9 ms   730.5 ms ± 14.7 ms   1.23 ± 0.05
500  files   1035.4 ms ± 48.0 ms   677.1 ms ± 22.3 ms   1.53 ± 0.09
1000 files   1244.6 ms ± 35.6 ms   654.0 ms ± 38.3 ms   1.90 ± 0.12
2000 files   1488.8 ms ± 53.4 ms   658.8 ms ± 23.8 ms   2.26 ± 0.12

From the above numbers, 100 files seems to be a reasonable default value
for the threshold setting.

Note: Up to 1000 files, we observe a drop in the execution time of the
parallel code as the number of files increases. This is rather odd
behavior, but it was observed in multiple repetitions. Above 1000
files, the execution time increases with the number of files, as one
would expect.

About the test environments: Local SSD tests were executed on an
i7-7700HQ (4 cores with hyper-threading) running Manjaro Linux. Local
HDD tests were executed on an Intel(R) Xeon(R) E3-1230 (also 4 cores
with hyper-threading), HDD Seagate Barracuda 7200.14 SATA 3.1, running
Debian. NFS and EFS tests were executed on an Amazon EC2 c5n.xlarge
instance, with 4 vCPUs. The Linux NFS server was running on a m6g.large
instance with 2 vCPUs and a 1 TB EBS GP2 volume. Before each timing,
the linux repository was removed (or checked out back to its previous
state), and `sync &amp;&amp; sysctl vm.drop_caches=3` was executed.

Co-authored-by: Jeff Hostetler &lt;jeffhost@microsoft.com&gt;
Signed-off-by: Matheus Tavares &lt;matheus.bernardino@usp.br&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
