<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/dir.h, branch v2.30.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.30.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.30.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2020-08-19T00:17:31Z</updated>
<entry>
<title>dir: fix problematic API to avoid memory leaks</title>
<updated>2020-08-19T00:17:31Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2020-08-18T22:58:26Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=eceba5321410bbcc9e0b11e6aa479832f574eca8'/>
<id>urn:sha1:eceba5321410bbcc9e0b11e6aa479832f574eca8</id>
<content type='text'>
The dir structure seemed to have a number of leaks and problems around
it.  First I noticed that parent_hashmap and recursive_hashmap were
being leaked (though Peff noticed and submitted fixes before me).  Then
I noticed in the previous commit that clear_directory() was only taking
responsibility for a subset of fields within dir_struct, despite the
fact that entries[] and ignored[] we allocated internally to dir.c.
That, of course, resulted in many callers either leaking or haphazardly
trying to free these arrays and their contents.

Digging further, I found that despite the pretty clear documentation
near the top of dir.h that folks were supposed to call clear_directory()
when the user no longer needed the dir_struct, there were four callers
that didn't bother doing that at all.  However, two of them clearly
thought about leaks since they had an UNLEAK(dir) directive, which to me
suggests that the method to free the data was too unclear.  I suspect
the non-obviousness of the API and its holes led folks to avoid it,
which then snowballed into further problems with the entries[],
ignored[], parent_hashmap, and recursive_hashmap problems.

Rename clear_directory() to dir_clear() to be more in line with other
data structures in git, and introduce a dir_init() to handle the
suggested memsetting of dir_struct to all zeroes.  I hope that a name
like "dir_clear()" is more clear, and that the presence of dir_init()
will provide a hint to those looking at the code that they need to look
for either a dir_clear() or a dir_free() and lead them to find
dir_clear().

Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>dir: make clear_directory() free all relevant memory</title>
<updated>2020-08-19T00:17:29Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2020-08-18T22:58:25Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=dad4f23ce59339bb32ad9e1cc1682c696f7a724f'/>
<id>urn:sha1:dad4f23ce59339bb32ad9e1cc1682c696f7a724f</id>
<content type='text'>
The calling convention for the dir API is supposed to end with a call to
clear_directory() to free up no longer needed memory.  However,
clear_directory() didn't free dir-&gt;entries or dir-&gt;ignored.  I believe
this was an oversight, but a number of callers noticed memory leaks and
started free'ing these.  Unfortunately, they did so somewhat haphazardly
(sometimes freeing the entries in the arrays, and sometimes only
free'ing the arrays themselves).  This suggests the callers weren't
trying to make sure any possible memory used might be free'd, but just
the memory they noticed their usecase definitely had allocated.

Fix this mess by moving all the duplicated free'ing logic into
clear_directory().  End by resetting dir to a pristine state so it could
be reused if desired.

Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'ds/sparse-cone'</title>
<updated>2019-12-25T19:21:58Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2019-12-25T19:21:58Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=bd72a08d6ce451e7d4725d6b3b411d482333e5cb'/>
<id>urn:sha1:bd72a08d6ce451e7d4725d6b3b411d482333e5cb</id>
<content type='text'>
Management of sparsely checked-out working tree has gained a
dedicated "sparse-checkout" command.

* ds/sparse-cone: (21 commits)
  sparse-checkout: improve OS ls compatibility
  sparse-checkout: respect core.ignoreCase in cone mode
  sparse-checkout: check for dirty status
  sparse-checkout: update working directory in-process for 'init'
  sparse-checkout: cone mode should not interact with .gitignore
  sparse-checkout: write using lockfile
  sparse-checkout: use in-process update for disable subcommand
  sparse-checkout: update working directory in-process
  sparse-checkout: sanitize for nested folders
  unpack-trees: add progress to clear_ce_flags()
  unpack-trees: hash less in cone mode
  sparse-checkout: init and set in cone mode
  sparse-checkout: use hashmaps for cone patterns
  sparse-checkout: add 'cone' mode
  trace2: add region in clear_ce_flags
  sparse-checkout: create 'disable' subcommand
  sparse-checkout: add '--stdin' option to set subcommand
  sparse-checkout: 'set' subcommand
  clone: add --sparse mode
  sparse-checkout: create 'init' subcommand
  ...
</content>
</entry>
<entry>
<title>unpack-trees: hash less in cone mode</title>
<updated>2019-11-22T07:11:44Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2019-11-21T22:04:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=eb42feca974a333e58c2ca0f3cfa8bf0dd421402'/>
<id>urn:sha1:eb42feca974a333e58c2ca0f3cfa8bf0dd421402</id>
<content type='text'>
The sparse-checkout feature in "cone mode" can use the fact that
the recursive patterns are "connected" to the root via parent
patterns to decide if a directory is entirely contained in the
sparse-checkout or entirely removed.

In these cases, we can skip hashing the paths within those
directories and simply set the skipworktree bit to the correct
value.

Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>sparse-checkout: init and set in cone mode</title>
<updated>2019-11-22T07:11:44Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2019-11-21T22:04:42Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=af09ce24a9c79f6efc12d1d8f1052e1d1dbe5016'/>
<id>urn:sha1:af09ce24a9c79f6efc12d1d8f1052e1d1dbe5016</id>
<content type='text'>
To make the cone pattern set easy to use, update the behavior of
'git sparse-checkout (init|set)'.

Add '--cone' flag to 'git sparse-checkout init' to set the config
option 'core.sparseCheckoutCone=true'.

When running 'git sparse-checkout set' in cone mode, a user only
needs to supply a list of recursive folder matches. Git will
automatically add the necessary parent matches for the leading
directories.

When testing 'git sparse-checkout set' in cone mode, check the
error stream to ensure we do not see any errors. Specifically,
we want to avoid the warning that the patterns do not match
the cone-mode patterns.

Helped-by: Eric Wong &lt;e@80x24.org&gt;
Helped-by: Johannes Schindelin &lt;Johannes.Schindelin@gmx.de&gt;
Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>sparse-checkout: use hashmaps for cone patterns</title>
<updated>2019-11-22T07:11:44Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2019-11-21T22:04:41Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=96cc8ab5318cd57c8bc203b8f064b35883b2386f'/>
<id>urn:sha1:96cc8ab5318cd57c8bc203b8f064b35883b2386f</id>
<content type='text'>
The parent and recursive patterns allowed by the "cone mode"
option in sparse-checkout are restrictive enough that we
can avoid using the regex parsing. Everything is based on
prefix matches, so we can use hashsets to store the prefixes
from the sparse-checkout file. When checking a path, we can
strip path entries from the path and check the hashset for
an exact match.

As a test, I created a cone-mode sparse-checkout file for the
Linux repository that actually includes every file. This was
constructed by taking every folder in the Linux repo and creating
the pattern pairs here:

	/$folder/
	!/$folder/*/

This resulted in a sparse-checkout file sith 8,296 patterns.
Running 'git read-tree -mu HEAD' on this file had the following
performance:

    core.sparseCheckout=false: 0.21 s (0.00 s)
     core.sparseCheckout=true: 3.75 s (3.50 s)
 core.sparseCheckoutCone=true: 0.23 s (0.01 s)

The times in parentheses above correspond to the time spent
in the first clear_ce_flags() call, according to the trace2
performance traces.

While this example is contrived, it demonstrates how these
patterns can slow the sparse-checkout feature.

Helped-by: Eric Wong &lt;e@80x24.org&gt;
Helped-by: Johannes Schindelin &lt;Johannes.Schindelin@gmx.de&gt;
Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>dir: move doc to dir.h</title>
<updated>2019-11-18T06:21:28Z</updated>
<author>
<name>Heba Waly</name>
<email>heba.waly@gmail.com</email>
</author>
<published>2019-11-17T21:04:41Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=266f03eccd36b84c88206aa6d18b163223294229'/>
<id>urn:sha1:266f03eccd36b84c88206aa6d18b163223294229</id>
<content type='text'>
Move the documentation from Documentation/technical/api-directory-listing.txt
to dir.h as it's easier for the developers to find the usage information
beside the code instead of looking for it in another doc file.

Also documentation/technical/api-directory-listing.txt is removed because
the information it has is now redundant and it'll be hard to keep it up to
date and synchronized with the documentation in the header files.

Signed-off-by: Heba Waly &lt;heba.waly@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'en/clean-nested-with-ignored'</title>
<updated>2019-10-11T05:24:46Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2019-10-11T05:24:45Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=aafb75452b2e9b3f17db3a07e9ed1cf77fdce693'/>
<id>urn:sha1:aafb75452b2e9b3f17db3a07e9ed1cf77fdce693</id>
<content type='text'>
"git clean" fixes.

* en/clean-nested-with-ignored:
  dir: special case check for the possibility that pathspec is NULL
  clean: fix theoretical path corruption
  clean: rewrap overly long line
  clean: avoid removing untracked files in a nested git repository
  clean: disambiguate the definition of -d
  git-clean.txt: do not claim we will delete files with -n/--dry-run
  dir: add commentary explaining match_pathspec_item's return value
  dir: if our pathspec might match files under a dir, recurse into it
  dir: make the DO_MATCH_SUBMODULE code reusable for a non-submodule case
  dir: also check directories for matching pathspecs
  dir: fix off-by-one error in match_pathspec_item
  dir: fix typo in comment
  t7300: add testcases showing failure to clean specified pathspecs
</content>
</entry>
<entry>
<title>clean: avoid removing untracked files in a nested git repository</title>
<updated>2019-09-17T19:20:35Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2019-09-17T16:35:02Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=09487f2cbad36d5da42e204c9e0d201e8607acb8'/>
<id>urn:sha1:09487f2cbad36d5da42e204c9e0d201e8607acb8</id>
<content type='text'>
Users expect files in a nested git repository to be left alone unless
sufficiently forced (with two -f's).  Unfortunately, in certain
circumstances, git would delete both tracked (and possibly dirty) files
and untracked files within a nested repository.  To explain how this
happens, let's contrast a couple cases.  First, take the following
example setup (which assumes we are already within a git repo):

   git init nested
   cd nested
   &gt;tracked
   git add tracked
   git commit -m init
   &gt;untracked
   cd ..

In this setup, everything works as expected; running 'git clean -fd'
will result in fill_directory() returning the following paths:
   nested/
   nested/tracked
   nested/untracked
and then correct_untracked_entries() would notice this can be compressed
to
   nested/
and then since "nested/" is a directory, we would call
remove_dirs("nested/", ...), which would
check is_nonbare_repository_dir() and then decide to skip it.

However, if someone also creates an ignored file:
   &gt;nested/ignored
then running 'git clean -fd' would result in fill_directory() returning
the same paths:
   nested/
   nested/tracked
   nested/untracked
but correct_untracked_entries() will notice that we had ignored entries
under nested/ and thus simplify this list to
   nested/tracked
   nested/untracked
Since these are not directories, we do not call remove_dirs() which was
the only place that had the is_nonbare_repository_dir() safety check --
resulting in us deleting both the untracked file and the tracked (and
possibly dirty) file.

One possible fix for this issue would be walking the parent directories
of each path and checking if they represent nonbare repositories, but
that would be wasteful.  Even if we added caching of some sort, it's
still a waste because we should have been able to check that "nested/"
represented a nonbare repository before even descending into it in the
first place.  Add a DIR_SKIP_NESTED_GIT flag to dir_struct.flags and use
it to prevent fill_directory() and friends from descending into nested
git repos.

With this change, we also modify two regression tests added in commit
91479b9c72f1 ("t7300: add tests to document behavior of clean and nested
git", 2015-06-15).  That commit, nor its series, nor the six previous
iterations of that series on the mailing list discussed why those tests
coded the expectation they did.  In fact, it appears their purpose was
simply to test _existing_ behavior to make sure that the performance
changes didn't change the behavior.  However, these two tests directly
contradicted the manpage's claims that two -f's were required to delete
files/directories under a nested git repository.  While one could argue
that the user gave an explicit path which matched files/directories that
were within a nested repository, there's a slippery slope that becomes
very difficult for users to understand once you go down that route (e.g.
what if they specified "git clean -f -d '*.c'"?)  It would also be hard
to explain what the exact behavior was; avoid such problems by making it
really simple.

Also, clean up some grammar errors describing this functionality in the
git-clean manpage.

Finally, there are still a couple bugs with -ffd not cleaning out enough
(e.g.  missing the nested .git) and with -ffdX possibly cleaning out the
wrong files (paying attention to outer .gitignore instead of inner).
This patch does not address these cases at all (and does not change the
behavior relative to those flags), it only fixes the handling when given
a single -f.  See
https://public-inbox.org/git/20190905212043.GC32087@szeder.dev/ for more
discussion of the -ffd[X?] bugs.

Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>dir: if our pathspec might match files under a dir, recurse into it</title>
<updated>2019-09-17T19:20:35Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2019-09-17T16:34:58Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=89a1f4aaf7650288b976c6022b2c5854950d52c6'/>
<id>urn:sha1:89a1f4aaf7650288b976c6022b2c5854950d52c6</id>
<content type='text'>
For git clean, if a directory is entirely untracked and the user did not
specify -d (corresponding to DIR_SHOW_IGNORED_TOO), then we usually do
not want to remove that directory and thus do not recurse into it.
However, if the user manually specified specific (or even globbed) paths
somewhere under that directory to remove, then we need to recurse into
the directory to make sure we remove the relevant paths under that
directory as the user requested.

Note that this does not mean that the recursed-into directory will be
added to dir-&gt;entries for later removal; as of a few commits earlier in
this series, there is another more strict match check that is run after
returning from a recursed-into directory before deciding to add it to the
list of entries.  Therefore, this will only result in files underneath
the given directory which match one of the pathspecs being added to the
entries list.

Two notes of potential interest to future readers:

  * If we wanted to only recurse into a directory when it is specifically
    matched rather than matched-via-glob (e.g. '*.c'), then we could do
    so via making the final non-zero return in match_pathspec_item be
    MATCHED_RECURSIVELY instead of MATCHED_RECURSIVELY_LEADING_PATHSPEC.
    (Note that the relative order of MATCHED_RECURSIVELY_LEADING_PATHSPEC
    and MATCHED_RECURSIVELY are important for such a change.)  I was
    leaving open that possibility while writing an RFC asking for the
    behavior we want, but even though we don't want it, that knowledge
    might help you understand the code flow better.

  * There is a growing amount of logic in read_directory_recursive() for
    deciding whether to recurse into a subdirectory.  However, there is a
    comment immediately preceding this logic that says to recurse if
    instructed by treat_path().   It may be better for the logic in
    read_directory_recursive() to ultimately be moved to treat_path() (or
    another function it calls, such as treat_directory()), but I have
    left that for someone else to tackle in the future.

Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
