git/git-filter-branch.sh, branch v2.6.2

filter-branch: avoid passing commit message through sed

2015-04-29T17:01:04Z

On some systems (like OS X), if sed encounters input without a trailing newline, it will silently add it. As a result, "git filter-branch" on such systems may silently rewrite commit messages that omit a trailing newline. Even though this is not something we generate ourselves with "git commit", it's better for filter-branch to preserve the original data as closely as possible. We're using sed here only to strip the header fields from the commit object. We can accomplish the same thing with a shell loop. Since shell "read" calls are slow (usually one syscall per byte), we use "cat" once we've skipped past the header. Depending on the size of your commit messages, this is probably faster (you pay the cost to fork, but then read the data in saner-sized chunks). This idea is shamelessly stolen from Junio. Signed-off-by: Jeff King Signed-off-by: Junio C Hamano

filter-branch: eliminate duplicate mapped parents

2014-07-01T15:30:41Z

When multiple parents of a merge commit get mapped to the same commit, filter-branch used to pass all instances of the parent commit to the parent and commit filters and to "git commit-tree" or "git_commit_non_empty_tree". This can often happen when extracting a small project from a large repository; merges can join history with no commits on any branch which affect the paths being retained. Once the intermediate commits have been filtered out, all the immediate parents of the merge commit can end up being mapped to the same commit - either the original merge-base or an ancestor of it. "git commit-tree" would display an error but write the commit with the normalized parents in any case. "git_commit_non_empty_tree" would fail to notice that the commit being made was in fact a non-merge commit and would retain it even if a further pass with "--prune-empty" would discard the commit as empty. Ensure that duplicate parents are pruned before the parent filter to make "--prune-empty" idempotent, removing all empty non-merge commits in a singe pass. Signed-off-by: Charles Bailey Signed-off-by: Junio C Hamano

Merge branch 'lc/filter-branch-too-many-refs'

2013-10-17T22:55:12Z

"git filter-branch" in a repository with many refs blew limit of command line length. * lc/filter-branch-too-many-refs: Allow git-filter-branch to process large repositories with lots of branches.

Allow git-filter-branch to process large repositories with lots of branches.

2013-09-12T18:00:51Z

A recommended way to move trees between repositories is to use git-filter-branch to revise the history for a single tree: However, this can lead to "argument list too long" errors when the original repository has many retained branches (>6k) /usr/local/git/libexec/git-core/git-filter-branch: line 270: /usr/local/git/libexec/git-core/git: Argument list too long Could not get the commits Saving the output from rev-parse and feeding it into rev-list from its standard input avoids this problem, since the rev-parse output is not processed as a command line argument. Signed-off-by: Lee Carver Signed-off-by: Junio C Hamano

write_index: optionally allow broken null sha1s

2013-08-29T03:54:43Z

Commit 4337b58 (do not write null sha1s to on-disk index, 2012-07-28) added a safety check preventing git from writing null sha1s into the index. The intent was to catch errors in other parts of the code that might let such an entry slip into the index (or worse, a tree). Some existing repositories may have invalid trees that contain null sha1s already, though. Until 4337b58, a common way to clean this up would be to use git-filter-branch's index-filter to repair such broken entries. That now fails when filter-branch tries to write out the index. Introduce a GIT_ALLOW_NULL_SHA1 environment variable to relax this check and make it easier to recover from such a history. It is tempting to not involve filter-branch in this commit at all, and instead require the user to manually invoke GIT_ALLOW_NULL_SHA1=1 git filter-branch ... to perform an index-filter on a history with trees with null sha1s. That would be slightly safer, but requires some specialized knowledge from the user. So let's set the GIT_ALLOW_NULL_SHA1 variable automatically when checking out the to-be-filtered trees. Advice on using filter-branch to remove such entries already exists on places like stackoverflow, and this patch makes it Just Work again on recent versions of git. Further commands that touch the index will still notice and fail, unless they actually remove the broken entries. A filter-branch whose filters do not touch the index at all will not error out (since we complain of the null sha1 only on writing, not when making a tree out of the index), but this is acceptable, as we still print a loud warning, so the problem is unlikely to go unnoticed. Signed-off-by: Jeff King Reviewed-by: Jonathan Nieder Signed-off-by: Junio C Hamano

Merge branch 'jk/filter-branch-come-back-to-original'

2013-04-07T21:29:34Z

When used with "-d temporary-directory" option, "git filter-branch" failed to come back to the original working tree to perform the final clean-up procedure. * jk/filter-branch-come-back-to-original: filter-branch: return to original dir after filtering

filter-branch: return to original dir after filtering

2013-04-02T20:34:55Z

The first thing filter-branch does is to create a temporary directory, either ".git-rewrite" in the current directory (which may be the working tree or the repository if bare), or in a directory specified by "-d". We then chdir to $tempdir/t as our temporary working directory in which to run tree filters. After finishing the filter, we then attempt to go back to the original directory with "cd ../..". This works in the .git-rewrite case, but if "-d" is used, we end up in a random directory. The only thing we do after this chdir is to run git-read-tree, but that means that: 1. The working directory is not updated to reflect the filtered history. 2. We dump random files into "$tempdir/.." (e.g., if you use "-d /tmp/foo", we dump junk into /tmp). Fix it by recording the full path to the original directory and returning there explicitly. Signed-off-by: Jeff King Signed-off-by: Junio C Hamano

filter-branch: use git-sh-setup's ident parsing functions

2012-10-18T22:43:49Z

This saves us some code, but it also reduces the number of processes we start for each filtered commit. Since we can parse both author and committer in the same sed invocation, we save one process. And since the new interface avoids tr, we save 4 processes. It also avoids using "tr", which has had some odd portability problems reported with from Solaris's xpg6 version. We also tweak one of the tests in t7003 to double-check that we are properly exporting the variables (because test-lib.sh exports GIT_AUTHOR_NAME, it will be automatically exported in subprograms. We override this to make sure that filter-branch handles it properly itself). Signed-off-by: Jeff King Signed-off-by: Junio C Hamano

Merge branch 'jc/maint-filter-branch-epoch-date'

2012-07-22T19:55:05Z

In 1.7.9 era, we taught "git rebase" about the raw timestamp format but we did not teach the same trick to "filter-branch", which rolled a similar logic on its own. Because of this, "filter-branch" failed to rewrite commits with ancient timestamps. * jc/maint-filter-branch-epoch-date: t7003: add test to filter a branch with a commit at epoch date.c: Fix off by one error in object-header date parsing filter-branch: do not forget the '@' prefix to force git-timestamp

filter-branch: do not forget the '@' prefix to force git-timestamp

2012-07-10T03:42:54Z

For some reason, this script reinvents, instead of refactoring the existing one in git-sh-setup, the logic to grab ident information from an existing commit; it was missed when the corresponding logic in git-sh-setup was updated with 2c733fb (parse_date(): '@' prefix forces git-timestamp, 2012-02-02). Teach the script that it is OK to have a way ancient timestamp in the commits that are being filtered. Signed-off-by: Junio C Hamano