You're talking about things I haven't done yet
I've been converting all my Mercurial repositories to Git. One of the motivations for this was hg's poor handling of branches and tags: branches are just another metadata field on a commit, and tags are entries in a text file called .hgtags that is tracked in the repository(!). Git has its flaws as well, but I prefer its view that branches and tags are just refs, which are pointers that exist at the repository level and are pushed or pulled around in much the same way as the objects they refer to.
The excellent hg-to-git.py script in Git's contrib directory created Git tags corresponding to my old Mercurial tags, but of course it didn't modify the actual commits. So I still had a .hgtags file in my Git repository, plus a boilerplate commit of the form "Added tag foo for changeset bar" for every tag I had created (recall that .hgtags is just another tracked file; thankfully, I never had to deal with two branches where I added different tags...). I wanted to remove both. Git has a 'filter-branch' command that can rewrite or expunge commits wholesale; this is of course a horrible idea for already-published code, but there's no harm in using it while initially preparing a repository.
While I appreciate git's object model and speed, I must agree with its detractors that the user interface is terrible. It took some tinkering to get git-filter-branch to do what I wanted, so I'm writing this to save my notes for next time (and in case someone else is searching for how to do this). Here's the command I arrived at:
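git filter-branch \
    --tag-name-filter cat \
    --index-filter 'git update-index --remove .hgtags' \
    --commit-filter '
        # Arguments: TREE [-p PARENT]...; $# = 3 means exactly one parent ($3).
        # Skip the commit when its tree is identical to that parent tree.
        if [ $# = 3 ] && git diff-tree --quiet "$1" "$3"; then
            skip_commit "$@"
        else
            git commit-tree "$@"
        fi' \
    HEAD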
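A minimal sketch for trying the command safely: it builds a throwaway repository that imitates hg-to-git.py output (two real commits, two "Added tag" commits that only touch .hgtags, and Git tags v1.0 and v1.1; all names and contents are made up) and then runs the command. FILTER_BRANCH_SQUELCH_WARNING=1 is set only because recent Git versions otherwise print a warning and pause before filter-branch runs.

```shell
# Skip the warning and pause that recent Git adds to filter-branch.
export FILTER_BRANCH_SQUELCH_WARNING=1

# Made-up converted history:
#   Initial import -> Added tag v1.0 -> Real change -> Added tag v1.1
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo one > README && git add README && git commit -q -m "Initial import"
git tag v1.0
echo "$(git rev-parse HEAD) v1.0" > .hgtags
git add .hgtags && git commit -q -m "Added tag v1.0 for changeset $(git rev-parse --short HEAD)"
echo two >> README && git commit -q -am "Real change"
git tag v1.1
echo "$(git rev-parse HEAD) v1.1" >> .hgtags
git commit -q -am "Added tag v1.1 for changeset $(git rev-parse --short HEAD)"

# The filter-branch command discussed above.
git filter-branch \
  --tag-name-filter cat \
  --index-filter 'git update-index --remove .hgtags' \
  --commit-filter '
    if [ $# = 3 ] && git diff-tree --quiet "$1" "$3"; then
      skip_commit "$@"
    else
      git commit-tree "$@"
    fi' \
  HEAD

# Only the real commits should be left on the branch.
git log --oneline HEAD
```

On that history, the rewritten branch should contain only "Initial import" and "Real change", with v1.0 and v1.1 pointing at them and no commit containing .hgtags.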
The tag name filter is necessary whenever you want tags to be updated to point to the corresponding commits on the new, rewritten branch; cat simply passes each tag name through unchanged. I consider this a UI failure: when a branch is rewritten, its ref is updated and the old value is saved under refs/original/. Tags, on the other hand, stay where they are, without any indication on the new branch that this is where you might want to move that old tag and sign it again or whatever. IMHO they ought to be handled the same way as branches.
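To see the problem concretely, here is a minimal sketch on a throwaway repository (file contents, the tag name v1.1, and commit messages are made up). It rewrites the branch without --tag-name-filter: the branch's old value shows up under refs/original/, but v1.1 keeps pointing at a commit that is no longer on the branch. It assumes a Git recent enough for `git merge-base --is-ancestor`, and sets FILTER_BRANCH_SQUELCH_WARNING=1 only to skip the warning and pause that recent Git adds to filter-branch.

```shell
export FILTER_BRANCH_SQUELCH_WARNING=1

# Throwaway repository; names and contents are made up.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo one > README && git add README && git commit -q -m "Initial import"
echo "$(git rev-parse HEAD) v1.0" > .hgtags
git add .hgtags && git commit -q -m "Added tag v1.0 for changeset $(git rev-parse --short HEAD)"
echo two >> README && git commit -q -am "Real change"
git tag v1.1                 # tags a commit whose tree still contains .hgtags
old_tag=$(git rev-parse v1.1)

# Rewrite the branch WITHOUT a tag name filter.
git filter-branch --index-filter 'git update-index --remove .hgtags' HEAD

# The branch's previous value is backed up under refs/original/ ...
git for-each-ref --format='%(refname)' refs/original/
# ... but the tag still names the old commit, which the new branch no longer contains.
[ "$(git rev-parse v1.1)" = "$old_tag" ] && echo "v1.1 was not moved"
git merge-base --is-ancestor v1.1 HEAD || echo "v1.1 is not on the rewritten branch"
```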
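The index filter is simply an efficient way of removing the unwanted file from every commit: it works on the index alone, without checking out each commit's tree, which is what makes it faster than a tree filter. It and the tag name filter are both covered in the git-filter-branch manual page.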
Writing a commit filter is a little more obscure. After .hgtags is removed from the index, we may end up at one of those useless "Added tag foo" commits and have no changes to record in the commit. By default, of course, filter-branch still records these -- the commit message might be useful, or something. But I want to suppress them.
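A minimal sketch of that default behaviour on a made-up throwaway repository: running only the index filter keeps the "Added tag" commit, which now records no changes at all. As before, FILTER_BRANCH_SQUELCH_WARNING=1 just skips recent Git's warning and pause.

```shell
export FILTER_BRANCH_SQUELCH_WARNING=1

# Throwaway repository; names and contents are made up.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo one > README && git add README && git commit -q -m "Initial import"
echo "$(git rev-parse HEAD) v1.0" > .hgtags
git add .hgtags && git commit -q -m "Added tag v1.0 for changeset $(git rev-parse --short HEAD)"

# Index filter only: no commit filter, so every commit is kept.
git filter-branch --index-filter 'git update-index --remove .hgtags' HEAD

git rev-list --count HEAD    # still 2 commits
# The rewritten "Added tag" commit now has exactly its parent's tree:
git diff-tree --quiet HEAD~1 HEAD && echo "HEAD records no changes"
```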
The commit filter is called with a tree: you're at the point between write-tree and commit-tree (I recommend Git from the Bottom Up if you're confused here). It gets that tree as $1, followed by "-p PARENT" for each parent, exactly like the arguments to commit-tree. So a normal commit with one parent gets 3 arguments. (A single argument means there is no parent, i.e., the root commit; more than 3 means a merge.) This is the only case we want to mess with. If there are no changes between our tree and the parent's tree, then it's one of those no-op commits, and we can skip it. skip_commit, a shell function that filter-branch defines for use in filters, uses some deep magic to hand us the original parent again next time: it prints the parent's ID, which filter-branch records as the rewritten ID of the skipped commit, so that commit's children are attached to the parent instead.
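The argument conventions above can be sketched in plain shell. Both function names are hypothetical, made up for illustration: describe_commit_args classifies an argument list the same way the filter must, and print_parents is a simplified re-creation of what skip_commit does. It relies on the behaviour described in the git-filter-branch manual page, where whatever the commit filter prints is recorded as the new commit ID (or IDs) for the commit being rewritten.

```shell
# Hypothetical helper: classify the arguments a commit filter receives
# (TREE [-p PARENT]...).
describe_commit_args() {
  case $# in
    1) echo "root commit" ;;
    3) echo "single parent $3" ;;
    *) echo "merge of $(( ($# - 1) / 2 )) parents" ;;
  esac
}

# Hypothetical, simplified re-creation of filter-branch's skip_commit:
# drop the tree, then print each parent. The real function passes each
# parent through map, and filter-branch records the printed ID(s) as the
# rewritten ID(s) of the skipped commit.
print_parents() {
  shift                      # drop TREE
  while [ -n "$1" ]; do
    shift                    # drop "-p"
    echo "$1"                # the real skip_commit runs: map "$1"
    shift
  done
}

describe_commit_args TREE                # root commit
describe_commit_args TREE -p P1          # single parent P1
describe_commit_args TREE -p P1 -p P2    # merge of 2 parents
print_parents TREE -p P1 -p P2           # prints P1, then P2
```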
I think diffing the index and the parent would work as well, but this seemed clearer. It still feels like a hack, so I'd love to hear from anyone who can suggest improvements. Since this is a special case, maybe it's better off being implemented in hg-to-git.py itself. There's always more than one way to do it.
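One possible refinement, sketched on a made-up throwaway repository: since Git names every tree by a hash of its contents, identical trees have identical IDs, so the commit filter can compare $1 with the parent's tree ID from `git rev-parse "$3^{tree}"` instead of running diff-tree. This swaps in a tree-ID comparison for the diff-tree check; the rest of the command is unchanged, and FILTER_BRANCH_SQUELCH_WARNING=1 again only skips recent Git's warning and pause.

```shell
export FILTER_BRANCH_SQUELCH_WARNING=1

# Made-up converted history, as in the earlier sketch.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo one > README && git add README && git commit -q -m "Initial import"
git tag v1.0
echo "$(git rev-parse HEAD) v1.0" > .hgtags
git add .hgtags && git commit -q -m "Added tag v1.0 for changeset $(git rev-parse --short HEAD)"
echo two >> README && git commit -q -am "Real change"
git tag v1.1
echo "$(git rev-parse HEAD) v1.1" >> .hgtags
git commit -q -am "Added tag v1.1 for changeset $(git rev-parse --short HEAD)"

# Same cleanup, but the filter compares tree object IDs instead of diffing.
git filter-branch \
  --tag-name-filter cat \
  --index-filter 'git update-index --remove .hgtags' \
  --commit-filter '
    if [ $# -eq 3 ] && [ "$1" = "$(git rev-parse "$3^{tree}")" ]; then
      skip_commit "$@"
    else
      git commit-tree "$@"
    fi' \
  HEAD

git log --oneline HEAD
```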
Update: Teemu Likonen points out that the next version of Git (1.6.2, not yet in unstable) will have a --prune-empty option which makes this particular problem totally trivial. I am starting to get the feeling that the Git developers are all reading our minds... :-)
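For Git 1.6.2 and later, a minimal sketch of the same cleanup with --prune-empty doing the job of the hand-written commit filter, again on a made-up throwaway repository (FILTER_BRANCH_SQUELCH_WARNING=1 only silences the warning and pause that recent Git versions add to filter-branch):

```shell
export FILTER_BRANCH_SQUELCH_WARNING=1

# Made-up converted history, as in the earlier sketches.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo one > README && git add README && git commit -q -m "Initial import"
git tag v1.0
echo "$(git rev-parse HEAD) v1.0" > .hgtags
git add .hgtags && git commit -q -m "Added tag v1.0 for changeset $(git rev-parse --short HEAD)"
echo two >> README && git commit -q -am "Real change"
git tag v1.1
echo "$(git rev-parse HEAD) v1.1" >> .hgtags
git commit -q -am "Added tag v1.1 for changeset $(git rev-parse --short HEAD)"

# --prune-empty drops commits whose rewritten tree equals their parent's tree.
git filter-branch \
  --prune-empty \
  --tag-name-filter cat \
  --index-filter 'git update-index --remove .hgtags' \
  HEAD

git log --oneline HEAD
```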
