<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/dir.h, branch jch</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=jch</id>
<link rel='self' href='https://git.shady.money/git/atom?h=jch'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2025-09-12T15:59:52Z</updated>
<entry>
<title>dir: add generic "walk all files" helper</title>
<updated>2025-09-12T15:59:52Z</updated>
<author>
<name>Derrick Stolee</name>
<email>stolee@gmail.com</email>
</author>
<published>2025-09-12T10:30:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=1588e836bb956d14e6cb38e35933ed2749c023b4'/>
<id>urn:sha1:1588e836bb956d14e6cb38e35933ed2749c023b4</id>
<content type='text'>
There is sometimes a need to visit every file within a directory,
recursively. The main example is remove_dir_recursively(), though it has
some extra flags that make it want to iterate over paths in a custom
way. There is also the fill_directory() approach but that involves an
index and a pathspec.

This change adds a new for_each_file_in_dir() method that will be
helpful in the next change.

Signed-off-by: Derrick Stolee &lt;stolee@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>dir: move starts_with_dot(_dot)_slash to dir.h</title>
<updated>2025-06-23T23:38:56Z</updated>
<author>
<name>Jacob Keller</name>
<email>jacob.keller@gmail.com</email>
</author>
<published>2025-06-23T23:11:31Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=059268fd056c4063eb913e6ec265bcdf85437b03'/>
<id>urn:sha1:059268fd056c4063eb913e6ec265bcdf85437b03</id>
<content type='text'>
Both submodule--helper.c and submodule-config.c have an implementation
of starts_with_dot_slash and starts_with_dot_dot_slash. The dir.h header
has starts_with_dot(_dot)_slash_native, which sets PATH_MATCH_NATIVE.

Move the helpers to dir.h as static inlines. I thought about renaming
them to postfix with _platform but that felt too long and ugly. On the
other hand it might be slightly confusing with _native.

This simplifies a submodule refactor which wants to use the helpers
earlier in the submodule--helper.c file.

Signed-off-by: Jacob Keller &lt;jacob.keller@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'am/dir-dedup-decl-of-repository'</title>
<updated>2025-03-29T07:39:09Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2025-03-29T07:39:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=f3db666cca0170a43ed602e7130c705882ce7574'/>
<id>urn:sha1:f3db666cca0170a43ed602e7130c705882ce7574</id>
<content type='text'>
Code cleanup.

* am/dir-dedup-decl-of-repository:
  dir.h: remove duplicate forward declaration of struct repository
</content>
</entry>
<entry>
<title>dir.h: remove duplicate forward declaration of struct repository</title>
<updated>2025-03-11T22:13:21Z</updated>
<author>
<name>Abhijeetsingh Meena</name>
<email>abhijeet040403@gmail.com</email>
</author>
<published>2025-03-11T14:59:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=5337daddc78605951af39c7f12a9165da3d75462'/>
<id>urn:sha1:5337daddc78605951af39c7f12a9165da3d75462</id>
<content type='text'>
The `struct repository;` forward declaration appears twice in `dir.h`:
once at line 10 and again at line 46. This duplication is unnecessary
and likely unintentional.

Removing the second declaration has no impact on compilation, as verified
by a clean build.

Signed-off-by: Abhijeetsingh Meena &lt;abhijeet040403@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>backfill: add --sparse option</title>
<updated>2025-02-04T00:12:42Z</updated>
<author>
<name>Derrick Stolee</name>
<email>derrickstolee@github.com</email>
</author>
<published>2025-02-03T17:11:06Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=bff455576750bd013a3c87b15cc7086cb8c1eab0'/>
<id>urn:sha1:bff455576750bd013a3c87b15cc7086cb8c1eab0</id>
<content type='text'>
One way to significantly reduce the cost of a Git clone and later fetches is
to use a blobless partial clone and combine that with a sparse-checkout that
reduces the paths that need to be populated in the working directory. Not
only does this reduce the cost of clones and fetches, the sparse-checkout
reduces the number of objects needed to download from a promisor remote.

However, history investigations can be expensive as computing blob diffs
will trigger promisor remote requests for one object at a time. This can be
avoided by downloading the blobs needed for the given sparse-checkout using
'git backfill' and its new '--sparse' mode, at a time that the user is
willing to pay that extra cost.

Note that this is distinctly different from the '--filter=sparse:&lt;oid&gt;'
option, as this assumes that the partial clone has all reachable trees and
we are using client-side logic to avoid downloading blobs outside of the
sparse-checkout cone. This avoids the server-side cost of walking trees
while also achieving a similar goal. It also downloads in batches based on
similar path names, presenting a resumable download if things are
interrupted.

This augments the path-walk API to have a possibly-NULL 'pl' member that may
point to a 'struct pattern_list'. This could be more general than the
sparse-checkout definition at HEAD, but 'git backfill --sparse' is currently
the only consumer.

Be sure to test this in both cone mode and not cone mode. Cone mode has the
benefit that the path-walk can skip certain paths once they would expand
beyond the sparse-checkout. Non-cone mode can describe the included files
using both positive and negative patterns, which changes the possible return
values of path_matches_pattern_list(). Test both kinds of matches for
increased coverage.

To test this, we can create a blobless sparse clone, expand the
sparse-checkout slightly, and then run 'git backfill --sparse' to see
how much data is downloaded. The general steps are

 1. git clone --filter=blob:none --sparse &lt;url&gt;
 2. git sparse-checkout set &lt;dir1&gt; ... &lt;dirN&gt;
 3. git backfill --sparse

For the Git repository with the 'builtin' directory in the
sparse-checkout, we get these results for various batch sizes:

| Batch Size      | Pack Count | Pack Size | Time  |
|-----------------|------------|-----------|-------|
| (Initial clone) | 3          | 110 MB    |       |
| 10K             | 12         | 192 MB    | 17.2s |
| 15K             | 9          | 192 MB    | 15.5s |
| 20K             | 8          | 192 MB    | 15.5s |
| 25K             | 7          | 192 MB    | 14.7s |

This case matters less because a full clone of the Git repository from
GitHub is currently at 277 MB.

Using a copy of the Linux repository with the 'kernel/' directory in the
sparse-checkout, we get these results:

| Batch Size      | Pack Count | Pack Size | Time |
|-----------------|------------|-----------|------|
| (Initial clone) | 2          | 1,876 MB  |      |
| 10K             | 11         | 2,187 MB  | 46s  |
| 25K             | 7          | 2,188 MB  | 43s  |
| 50K             | 5          | 2,194 MB  | 44s  |
| 100K            | 4          | 2,194 MB  | 48s  |

This case is more meaningful because a full clone of the Linux
repository is currently over 6 GB, so this is a valuable way to download
a fraction of the repository and no longer need network access for all
reachable objects within the sparse-checkout.

Choosing a batch size will depend on a lot of factors, including the
user's network speed or reliability, the repository's file structure,
and how many versions there are of the file within the sparse-checkout
scope. There will not be a one-size-fits-all solution.

Signed-off-by: Derrick Stolee &lt;stolee@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>win32: override `fspathcmp()` with a directory separator-aware version</title>
<updated>2024-07-13T23:23:36Z</updated>
<author>
<name>Johannes Schindelin</name>
<email>johannes.schindelin@gmx.de</email>
</author>
<published>2024-07-13T21:08:20Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=193eda7507d0ccb2fe6fd42b403c56ecc205e546'/>
<id>urn:sha1:193eda7507d0ccb2fe6fd42b403c56ecc205e546</id>
<content type='text'>
On Windows, the backslash is the directory separator, even if the
forward slash can be used, too, at least since Windows NT.

This means that the paths `a/b` and `a\b` are equivalent, and
`fspathcmp()` needs to be made aware of that fact.

Note that we have to override both `fspathcmp()` and `fspathncmp()`, and
the former cannot be a mere pre-processor constant that transforms calls
to `fspathcmp(a, b)` into `fspathncmp(a, b, (size_t)-1)` because the
function `report_collided_checkout()` in `unpack-trees.c` wants to
assign `list.cmp = fspathcmp`.

Also note that `fspatheq()` does _not_ need to be overridden because it
calls `fspathcmp()` internally.

Signed-off-by: Johannes Schindelin &lt;johannes.schindelin@gmx.de&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>hash-ll: merge with "hash.h"</title>
<updated>2024-06-14T17:26:33Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-06-14T06:50:32Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=8a676bdc5c3a9377322834326605b3804c7c54bf'/>
<id>urn:sha1:8a676bdc5c3a9377322834326605b3804c7c54bf</id>
<content type='text'>
The "hash-ll.h" header was introduced via d1cbe1e6d8 (hash-ll.h: split
out of hash.h to remove dependency on repository.h, 2023-04-22) to make
explicit the split between hash-related functions that rely on the
global `the_repository`, and those that don't. This split is no longer
necessary now that we we have removed the reliance on `the_repository`.

Merge "hash-ll.h" back into "hash.h". This causes some code units to not
include "repository.h" anymore, which requires us to add some forward
declarations.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>dir.c: always copy input to add_pattern()</title>
<updated>2024-06-05T16:51:42Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2024-06-04T10:13:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=eed1fbe73b46e5c7f87d724f97fba6d8a4391fc9'/>
<id>urn:sha1:eed1fbe73b46e5c7f87d724f97fba6d8a4391fc9</id>
<content type='text'>
The add_pattern() function has a subtle and undocumented gotcha: the
pattern string you pass in must remain valid as long as the pattern_list
is in use (and nor do we take ownership of it). This is easy to get
wrong, causing either subtle bugs (because you free or reuse the string
buffer) or leaks (because you copy the string, but don't track ownership
separately).

All of this "pattern" code was originally the "exclude" mechanism. So
this _usually_ works OK because you add entries in one of two ways:

  1. From the command-line (e.g., "--exclude"), in which case we're
     pointing to an argv entry which remains valid for the lifetime of
     the program.

  2. From a file (e.g., ".gitignore"), in which case we read the whole
     file into a buffer, attach it to the pattern_list's "filebuf"
     entry, then parse the buffer in-place (adding NULs). The strings
     point into the filebuf, which is cleaned up when the whole
     pattern_list goes away.

But other code, like sparse-checkout, reads individual lines from stdin
and passes them one by one to add_pattern(), leaking each. We could fix
this by refactoring it to take in the whole buffer at once, like (2)
above, and stuff it in "filebuf". But given how subtle the interface is,
let's just fix it to always copy the string.

That seems at first like we'd be wasting extra memory, but we can
mitigate that:

  a. The path_pattern struct already uses a FLEXPTR, since we sometimes
     make a copy (when we see "foo/", we strip off the trailing slash,
     requiring a modifiable copy of the string).

     Since we'll now always embed the string inside the struct, we can
     switch to the regular FLEX_ARRAY pattern, saving us 8 bytes of
     pointer. So patterns with a trailing slash and ones under 8 bytes
     actually get smaller.

  b. Now that we don't need the original string to hang around, we can
     get rid of the "filebuf" mechanism entirely, and just free the file
     contents after parsing. Since files are the sources we'd expect to
     have the largest pattern sets, we should mostly break even on
     stuffing the same data into the individual structs.

This patch just adjusts the add_pattern() interface; it doesn't fix any
leaky callers yet.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Sync with 2.44.1</title>
<updated>2024-04-29T18:42:30Z</updated>
<author>
<name>Johannes Schindelin</name>
<email>johannes.schindelin@gmx.de</email>
</author>
<published>2024-04-24T07:11:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=1c00f92eb5ee4a48ab615eefa41f2dd6024d43bc'/>
<id>urn:sha1:1c00f92eb5ee4a48ab615eefa41f2dd6024d43bc</id>
<content type='text'>
* maint-2.44: (41 commits)
  Git 2.44.1
  Git 2.43.4
  Git 2.42.2
  Git 2.41.1
  Git 2.40.2
  Git 2.39.4
  fsck: warn about symlink pointing inside a gitdir
  core.hooksPath: add some protection while cloning
  init.templateDir: consider this config setting protected
  clone: prevent hooks from running during a clone
  Add a helper function to compare file contents
  init: refactor the template directory discovery into its own function
  find_hook(): refactor the `STRIP_EXTENSION` logic
  clone: when symbolic links collide with directories, keep the latter
  entry: report more colliding paths
  t5510: verify that D/F confusion cannot lead to an RCE
  submodule: require the submodule path to contain directories only
  clone_submodule: avoid using `access()` on directories
  submodules: submodule paths must not contain symlinks
  clone: prevent clashing git dirs when cloning submodule in parallel
  ...
</content>
</entry>
<entry>
<title>Sync with 2.42.2</title>
<updated>2024-04-19T10:38:50Z</updated>
<author>
<name>Johannes Schindelin</name>
<email>johannes.schindelin@gmx.de</email>
</author>
<published>2024-04-10T20:04:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=8e97ec3662a54b07e4c19bb761e95cf87bd54364'/>
<id>urn:sha1:8e97ec3662a54b07e4c19bb761e95cf87bd54364</id>
<content type='text'>
* maint-2.42: (39 commits)
  Git 2.42.2
  Git 2.41.1
  Git 2.40.2
  Git 2.39.4
  fsck: warn about symlink pointing inside a gitdir
  core.hooksPath: add some protection while cloning
  init.templateDir: consider this config setting protected
  clone: prevent hooks from running during a clone
  Add a helper function to compare file contents
  init: refactor the template directory discovery into its own function
  find_hook(): refactor the `STRIP_EXTENSION` logic
  clone: when symbolic links collide with directories, keep the latter
  entry: report more colliding paths
  t5510: verify that D/F confusion cannot lead to an RCE
  submodule: require the submodule path to contain directories only
  clone_submodule: avoid using `access()` on directories
  submodules: submodule paths must not contain symlinks
  clone: prevent clashing git dirs when cloning submodule in parallel
  t7423: add tests for symlinked submodule directories
  has_dir_name(): do not get confused by characters &lt; '/'
  ...
</content>
</entry>
</feed>
