<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/git-filter-branch.sh, branch v2.6.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.6.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.6.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2015-04-29T17:01:04Z</updated>
<entry>
<title>filter-branch: avoid passing commit message through sed</title>
<updated>2015-04-29T17:01:04Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2015-04-29T15:48:58Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=df0620108b9710a06d5a2d9c125d43b97590cce6'/>
<id>urn:sha1:df0620108b9710a06d5a2d9c125d43b97590cce6</id>
<content type='text'>
On some systems (like OS X), if sed encounters input without
a trailing newline, it will silently add one. As a result,
"git filter-branch" on such systems may silently rewrite
commit messages that omit a trailing newline. Even though
this is not something we generate ourselves with "git
commit", it's better for filter-branch to preserve the
original data as closely as possible.

We're using sed here only to strip the header fields from
the commit object. We can accomplish the same thing with a
shell loop. Since shell "read" calls are slow (usually one
syscall per byte), we use "cat" once we've skipped past the
header. Depending on the size of your commit messages, this
is probably faster (you pay the cost to fork, but then read
the data in saner-sized chunks). This idea is shamelessly
stolen from Junio.
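
As a rough sketch of the approach described above (not
necessarily the exact code in the patch), the header can be
skipped with "read" and the rest handed to "cat". The
function name skip_commit_header and the sample input are
made up for illustration.

```shell
# Hypothetical helper: consume header lines up to and including
# the first blank line, then copy the remaining input verbatim.
skip_commit_header () {
	while read -r header_line
	do
		if [ -z "$header_line" ]
		then
			break
		fi
	done
	# cat does not add a trailing newline that was not there
	cat
}

# Made-up commit-like input whose message lacks a final newline
printf 'tree 1234\nauthor A U Thor\n\nsubject\n\nbody' | skip_commit_header
```

Because "read" consumes a pipe one byte at a time, "cat"
picks up exactly where the loop stopped.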

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>filter-branch: eliminate duplicate mapped parents</title>
<updated>2014-07-01T15:30:41Z</updated>
<author>
<name>Charles Bailey</name>
<email>cbailey32@bloomberg.net</email>
</author>
<published>2014-06-30T21:20:27Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=79bc4ef3686bc6795aa79a1d4aa6d3060a2cbd93'/>
<id>urn:sha1:79bc4ef3686bc6795aa79a1d4aa6d3060a2cbd93</id>
<content type='text'>
When multiple parents of a merge commit get mapped to the same
commit, filter-branch used to pass all instances of the parent
commit to the parent and commit filters and to "git commit-tree" or
"git_commit_non_empty_tree".

This can often happen when extracting a small project from a large
repository; merges can join history with no commits on any branch
which affect the paths being retained.  Once the intermediate
commits have been filtered out, all the immediate parents of the
merge commit can end up being mapped to the same commit - either the
original merge-base or an ancestor of it.

"git commit-tree" would display an error but write the commit with
the normalized parents in any case.  "git_commit_non_empty_tree"
would fail to notice that the commit being made was in fact a
non-merge commit and would retain it even if a further pass with
"--prune-empty" would discard the commit as empty.

Ensure that duplicate parents are pruned before the parent filter to
make "--prune-empty" idempotent, removing all empty non-merge
commits in a single pass.
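
A minimal sketch of pruning duplicates from a list of mapped
parents while keeping their order, assuming the parents arrive
as separate arguments. The name dedupe_parents and the sample
ids are hypothetical; the patch itself may do this differently.

```shell
# Hypothetical helper: print each parent id once, in first-seen order.
dedupe_parents () {
	unique=
	for parent in "$@"
	do
		case " $unique " in
		*" $parent "*)
			# already recorded; skip the duplicate
			;;
		*)
			unique="$unique $parent"
			;;
		esac
	done
	# strip the leading space added by the first append
	printf '%s\n' "${unique# }"
}

dedupe_parents aaa bbb aaa
```

With only one distinct parent left, the commit is no longer a
merge and can be treated as an ordinary commit.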

Signed-off-by: Charles Bailey &lt;cbailey32@bloomberg.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'lc/filter-branch-too-many-refs'</title>
<updated>2013-10-17T22:55:12Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2013-10-17T22:55:12Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=f52752d36a98d0e5a70242d615d0f36f84936b45'/>
<id>urn:sha1:f52752d36a98d0e5a70242d615d0f36f84936b45</id>
<content type='text'>
"git filter-branch" in a repository with many refs exceeded the
command-line length limit.

* lc/filter-branch-too-many-refs:
  Allow git-filter-branch to process large repositories with lots of branches.
</content>
</entry>
<entry>
<title>Allow git-filter-branch to process large repositories with lots of branches.</title>
<updated>2013-09-12T18:00:51Z</updated>
<author>
<name>Lee Carver</name>
<email>Lee.Carver@servicenow.com</email>
</author>
<published>2013-09-10T22:55:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=3361a548dbedde96d75bd4134e9ab9e6d82774dd'/>
<id>urn:sha1:3361a548dbedde96d75bd4134e9ab9e6d82774dd</id>
<content type='text'>
A recommended way to move trees between repositories is to use
git-filter-branch to revise the history for a single tree:

However, this can lead to "argument list too long" errors when the
original repository has many retained branches (&gt;6k):

    /usr/local/git/libexec/git-core/git-filter-branch: line 270:
    /usr/local/git/libexec/git-core/git: Argument list too long
    Could not get the commits

Saving the output from rev-parse and feeding it into rev-list from
its standard input avoids this problem, since the rev-parse output
is not processed as a command line argument.
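
The idea can be sketched as follows, using a pipe rather than
a saved file. The wrapper name list_commits_via_stdin is made
up for illustration, and the real script passes more options
to rev-list than shown here.

```shell
# Hypothetical wrapper: resolve the given revision arguments and
# hand the resulting object names to rev-list on standard input,
# so they never appear on rev-list's command line.
list_commits_via_stdin () {
	git rev-parse --revs-only "$@" | git rev-list --stdin
}

# Example use inside a repository:
#   list_commits_via_stdin --all
```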

Signed-off-by: Lee Carver &lt;Lee.Carver@servicenow.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>write_index: optionally allow broken null sha1s</title>
<updated>2013-08-29T03:54:43Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2013-08-27T20:41:12Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=83bd7437ca6acbee0db431fc8ec7cf823d9459ec'/>
<id>urn:sha1:83bd7437ca6acbee0db431fc8ec7cf823d9459ec</id>
<content type='text'>
Commit 4337b58 (do not write null sha1s to on-disk index,
2012-07-28) added a safety check preventing git from writing
null sha1s into the index. The intent was to catch errors in
other parts of the code that might let such an entry slip
into the index (or worse, a tree).

Some existing repositories may have invalid trees that
contain null sha1s already, though.  Until 4337b58, a common
way to clean this up would be to use git-filter-branch's
index-filter to repair such broken entries.  That now fails
when filter-branch tries to write out the index.

Introduce a GIT_ALLOW_NULL_SHA1 environment variable to
relax this check and make it easier to recover from such a
history.

It is tempting to not involve filter-branch in this commit
at all, and instead require the user to manually invoke

	GIT_ALLOW_NULL_SHA1=1 git filter-branch ...

to perform an index-filter on a history with trees with null
sha1s.  That would be slightly safer, but requires some
specialized knowledge from the user.  So let's set the
GIT_ALLOW_NULL_SHA1 variable automatically when checking out
the to-be-filtered trees.  Advice on using filter-branch to
remove such entries already exists in places like
Stack Overflow, and this patch makes it Just Work again on
recent versions of git.

Further commands that touch the index will still notice and
fail, unless they actually remove the broken entries.  A
filter-branch whose filters do not touch the index at all
will not error out (since we complain of the null sha1 only
on writing, not when making a tree out of the index), but
this is acceptable, as we still print a loud warning, so the
problem is unlikely to go unnoticed.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Reviewed-by: Jonathan Nieder &lt;jrnieder@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'jk/filter-branch-come-back-to-original'</title>
<updated>2013-04-07T21:29:34Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2013-04-07T21:29:34Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=9a11f13d9e5cfa1db0a3f42bc9ee886ffa7a5a22'/>
<id>urn:sha1:9a11f13d9e5cfa1db0a3f42bc9ee886ffa7a5a22</id>
<content type='text'>
When used with the "-d temporary-directory" option, "git filter-branch"
failed to come back to the original working tree to perform the
final clean-up procedure.

* jk/filter-branch-come-back-to-original:
  filter-branch: return to original dir after filtering
</content>
</entry>
<entry>
<title>filter-branch: return to original dir after filtering</title>
<updated>2013-04-02T20:34:55Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2013-04-02T14:22:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=97276019bb20829c97528b53dc453a37177c35bb'/>
<id>urn:sha1:97276019bb20829c97528b53dc453a37177c35bb</id>
<content type='text'>
The first thing filter-branch does is to create a temporary
directory, either ".git-rewrite" in the current directory
(which may be the working tree or the repository if bare),
or in a directory specified by "-d". We then chdir to
$tempdir/t as our temporary working directory in which to run
tree filters.

After finishing the filter, we then attempt to go back to
the original directory with "cd ../..". This works in the
.git-rewrite case, but if "-d" is used, we end up in a
random directory. The only thing we do after this chdir is
to run git-read-tree, but that means that:

  1. The working directory is not updated to reflect the
     filtered history.

  2. We dump random files into "$tempdir/.." (e.g., if you
     use "-d /tmp/foo", we dump junk into /tmp).

Fix it by recording the full path to the original directory
and returning there explicitly.
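
A minimal sketch of the fix, assuming a temporary directory
created with mktemp. The variable names are illustrative and
need not match those in the script.

```shell
# Remember the absolute path of the starting directory
orig_dir=$(pwd)

# Stand-in for a directory given with "-d"
tempdir=$(mktemp -d)
mkdir -p "$tempdir/t"
cd "$tempdir/t" || exit 1

# ... tree filters would run here ...

# Return explicitly instead of relying on "cd ../.."
cd "$orig_dir" || exit 1
rm -rf "$tempdir"
pwd
```

A relative "cd ../.." only lands in the right place when the
temporary directory sits directly under the starting one.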

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>filter-branch: use git-sh-setup's ident parsing functions</title>
<updated>2012-10-18T22:43:49Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2012-10-18T10:33:02Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=3c730fab2cae1bb52d34620af170a628b3b8c537'/>
<id>urn:sha1:3c730fab2cae1bb52d34620af170a628b3b8c537</id>
<content type='text'>
This saves us some code, but it also reduces the number of
processes we start for each filtered commit. Since we can
parse both author and committer in the same sed invocation,
we save one process. And since the new interface avoids tr,
we save 4 processes.

It also avoids using "tr", which has had some odd
portability problems reported with Solaris's xpg6
version.

We also tweak one of the tests in t7003 to double-check that
we are properly exporting the variables. Because test-lib.sh
exports GIT_AUTHOR_NAME, it is automatically exported
to subprograms; we override this to make sure that
filter-branch handles it properly itself.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'jc/maint-filter-branch-epoch-date'</title>
<updated>2012-07-22T19:55:05Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2012-07-22T19:54:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=9a0231b395cb9720365b7066312eeaa86e37ed31'/>
<id>urn:sha1:9a0231b395cb9720365b7066312eeaa86e37ed31</id>
<content type='text'>
In the 1.7.9 era, we taught "git rebase" about the raw timestamp format
but we did not teach the same trick to "filter-branch", which rolled
its own similar logic.  Because of this, "filter-branch" failed
to rewrite commits with ancient timestamps.

* jc/maint-filter-branch-epoch-date:
  t7003: add test to filter a branch with a commit at epoch
  date.c: Fix off by one error in object-header date parsing
  filter-branch: do not forget the '@' prefix to force git-timestamp
</content>
</entry>
<entry>
<title>filter-branch: do not forget the '@' prefix to force git-timestamp</title>
<updated>2012-07-10T03:42:54Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2012-07-09T23:53:34Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=cb102b083252b575b18afa8d4b4a4b1bc1ffdf9e'/>
<id>urn:sha1:cb102b083252b575b18afa8d4b4a4b1bc1ffdf9e</id>
<content type='text'>
For some reason, this script reinvents, instead of refactoring the
existing one in git-sh-setup, the logic to grab ident information
from an existing commit; it was missed when the corresponding logic
in git-sh-setup was updated with 2c733fb (parse_date(): '@' prefix
forces git-timestamp, 2012-02-02).

Teach the script that it is OK to have a way ancient timestamp in
the commits that are being filtered.
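
As an illustration only, the prefix is applied to the raw
"seconds timezone" value taken from the commit header. The
raw_date variable and its value are made up for this sketch.

```shell
# Hypothetical raw value as stored in a commit header:
# seconds since the epoch, then the timezone offset.
raw_date='0 +0000'

# The '@' prefix marks the value as a raw git timestamp
GIT_AUTHOR_DATE="@$raw_date"
export GIT_AUTHOR_DATE

printf '%s\n' "$GIT_AUTHOR_DATE"
```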

Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
