There is a common saying in the world of version control systems:

Do not rewrite the history.

To be fair, it is a pretty solid saying, echoed in many threads, for instance FS#45425. You simply have to assume that once you have pushed something to the public, it should stay as it is. But there are times when cleaning up the mess is not just required, but may also pay off in the long term.

Rewriting history will almost certainly not be possible for public projects where others have already added commits on top of yours. But for freshly created repositories somewhere in the forgotten parts of the Internet, the leap of faith might be worth taking. For work that has not been pushed yet, cleaning up is almost always safe and highly encouraged, so knowing efficient tools for the job is essential. Even better is knowing a tool that gets the job done without putting you at risk of unrecoverable damage in the form of mangled history.

Getting started

There is one more saying that is especially relevant in this context, which states:

Always keep multiple backups.

This saying cuts even deeper. Nothing is 100% reliable, so back up your work before continuing. Some software has a proven track record of being battle-tested, which usually means the edge cases have been polished to the point where they are no longer visible, but you can bet that Murphy will always get you. You have been warned. The tool we take a look at is newren/git-filter-repo.

Beware: used incorrectly, the tool can lead to catastrophic results.

The tool is meant to be used only on fresh clones, so that the work is recoverable in case of a disaster. By default it refuses to run on a repository that does not look like a fresh clone, and the --force parameter overrides that check, so avoid --force at all costs to prevent data loss.
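A minimal sketch of that preparation, assuming your project lives in a local directory called project (a hypothetical name; the script creates a throwaway one so it runs as-is). It keeps a bare mirror clone as the backup and makes a separate clone to run the rewrite in. The --no-local flag is there because, for local paths, git clone hardlinks objects by default, and git filter-repo's own error message suggests --no-local when cloning local repositories.

```shell
set -eu
# Throwaway stand-in for your real project (hypothetical name: project).
cd "$(mktemp -d)"
git init -q project
echo 'hello' > project/README
git -C project add README
git -C project -c user.name='Demo' -c user.email='demo@example.com' commit -q -m 'Initial commit'

# 1. Backup: a bare mirror keeps every branch, tag and other ref untouched.
git clone -q --mirror --no-local project project-backup.git

# 2. Working copy for the rewrite: --no-local copies objects through the
#    regular Git transport instead of hardlinking them.
git clone -q --no-local project project-work
```

The filtering itself then happens inside project-work, while project-backup.git still holds the original history if anything goes wrong.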

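If unsure, add --dry-run to the actual command, which saves the original and the filtered history for comparison instead of changing the repository, or run --analyze on its own beforehand to get a read-only report of what the history contains. Now let's look at some of the use cases of the tool.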

Replace sensitive string in all files

The most common use case for rewriting Git history is probably removing sensitive information, such as passwords or access tokens checked in by accident. It is not enough to replace all occurrences in the current version of the files, because the information may still be present in earlier commits. Doing this manually via interactive rebase is time-consuming and error-prone. Instead, this command can be used:

git filter-repo --replace-text <(echo 'my_password==>xxxxxxxx')

The <( ... ) syntax is process substitution, available in Bash and Zsh but not in plain POSIX sh. The --replace-text option expects the path to a file containing as many replacement rules as needed, one per line, and process substitution hands it the output of echo as if it were such a file. This way, you can skip creating a file altogether, which is handy when only a single replacement is needed.
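When more than one string needs replacing, a regular file is easier to maintain. A sketch of such a file with hypothetical secrets; as I read the git-filter-repo documentation, each line is a literal string by default, a regex: (or glob:) prefix switches the matching mode, and a line without ==> gets replaced with the default ***REMOVED***.

```shell
set -eu
# Keep the rules file outside the repository so the secrets are not committed again.
cd "$(mktemp -d)"

# Hypothetical secrets: substitute the strings that actually leaked.
cat > replacements.txt <<'EOF'
my_password==>xxxxxxxx
ghp_hypotheticalToken123
regex:api_key=\w+==>api_key=REDACTED
EOF

# Then, inside a fresh clone (requires git-filter-repo to be installed):
#   git filter-repo --replace-text /path/to/replacements.txt
```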

This is usually where the use of the filter-repo tool ends. The command is also quite hard to remember, due to the shell intricacies involved and the uncommon syntax requiring a long double-arrow symbol ==>, so you will probably end up looking it up every time the need arises. But there is much more one can do, so let's look at some less documented features I found scattered around the Internet.

Remove a single folder, keeping history

Consider a repository with a folder that has to be taken out of it, leaving no trace in history:

git-filter-repo --path path_to_the_folder/ --invert-paths

Now the repository has no trace of the tracked files inside ./path_to_the_folder/. Beware that untracked files are preserved, while tracked files are deleted completely. If all the files in the folder were tracked, the empty folder is deleted as well.
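Before (and after) removing a folder, it helps to see which paths under it exist in any commit, not just in the current checkout. A sketch using plain git in a throwaway repository, with a hypothetical secrets/ folder that was already deleted in a later commit: the folder is gone from the working tree, yet its file is still listed from history, which is exactly what --invert-paths cleans up. After a successful rewrite, the same command should print nothing.

```shell
set -eu
cd "$(mktemp -d)"
git init -q paths-demo && cd paths-demo

# Hypothetical history: secrets/ is committed, then removed in a later commit.
mkdir secrets
echo 'key' > secrets/key.txt
echo 'app' > app.txt
git add .
git -c user.name='Demo' -c user.email='demo@example.com' commit -q -m 'Add app and secrets'
git rm -q -r secrets
git -c user.name='Demo' -c user.email='demo@example.com' commit -q -m 'Remove secrets'

# Every path under secrets/ touched by any commit on any ref.
# --format= drops the commit headers; sed removes the blank separator lines.
git log --all --format= --name-only -- secrets/ | sed '/^$/d' | sort -u
```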

Extract a single folder, keeping history

The opposite is even simpler, with one parameter fewer. To extract the commit history of a single folder, dropping every other file:

git-filter-repo --path path_to_the_folder/

The repository now contains only ./path_to_the_folder/, plus any untracked files that were already in the working tree.

Move everything from sub-folder one level up

This goes very well with the previous command. After extraction, you sometimes need to make the contents of the extracted folder the root of the repository, shifting everything one level up in the path:

git-filter-repo --path-rename path_to_the_folder/:

Note the colon : at the end; leaving the new name after it empty moves the folder's contents to the repository root. The repository no longer contains ./path_to_the_folder/, and you will see the contents of that folder directly instead.

Replace email address in commits

This is a little different from the commands above, but sometimes you have made commits with the wrong email address. This can be fixed by creating a file named .mailmap in the repository you want to rewrite, with the following contents:

<new@email> <current@email>

Note that angle brackets < and > around both email addresses are mandatory, otherwise the following error happens:

Unparseable mailmap file: line #1 is bad: ...

With the properly formatted .mailmap file in place, issue the rewrite command:

git-filter-repo --use-mailmap

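Even though changing the email address in commits seems like an innocent change, it changes the commit SHA hashes too, because the author's and committer's email addresses are part of the data each hash is computed from. And since every commit also records its parent's hash, all later commits get new hashes as well.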
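A quick way to convince yourself, using plain git in a throwaway repository: git commit-tree builds commits from the same (empty) tree, message and fixed timestamps, varying only the hypothetical email address. Different addresses yield different commit IDs, while repeating exactly the same inputs reproduces the same ID.

```shell
set -eu
cd "$(mktemp -d)"
git init -q hash-demo && cd hash-demo
tree=$(git mktree </dev/null)   # the empty tree object

# Build a commit object whose only variable input is the email address ($1).
make_commit() {
  GIT_AUTHOR_NAME='Demo' GIT_AUTHOR_EMAIL="$1" \
  GIT_COMMITTER_NAME='Demo' GIT_COMMITTER_EMAIL="$1" \
  GIT_AUTHOR_DATE='1577836800 +0000' GIT_COMMITTER_DATE='1577836800 +0000' \
  git commit-tree -m 'Same message' "$tree"
}

old=$(make_commit current@example.com)
new=$(make_commit new@example.com)
again=$(make_commit current@example.com)
echo "current: $old"
echo "new:     $new"
echo "again:   $again"
```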

Replace author's name in commits

A variation of the above is to replace the author's name. I have not used this personally, but I can think of a few situations: replacing a nickname with your real name in commits you want to make public, or the opposite, where you made commits under your real identity but want them to show up only under a nickname. All the steps are identical, with a tweak to the .mailmap file:

Name Surname <current@email> <current@email>

And again, run the following:

git-filter-repo --use-mailmap

You can also change both the name and the email in the same step:

Name Surname <new@email> <current@email>

Note that the author's name will only be changed for commits that match current@email, so keep this in mind!
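The mapping can be previewed before anything is rewritten: git filter-repo's --mailmap option is documented as using Git's mailmap format, and git check-mailmap resolves identities through the .mailmap at the top of the working tree. A sketch in a throwaway repository with hypothetical names and addresses; the second lookup illustrates the caveat above, since an identity with a different email passes through unchanged.

```shell
set -eu
cd "$(mktemp -d)"
git init -q mailmap-demo && cd mailmap-demo

# Hypothetical mapping: new name and new email for one old address.
cat > .mailmap <<'EOF'
Name Surname <new@example.com> <current@example.com>
EOF

# Matching email: both name and email are mapped.
git check-mailmap 'Old Nick <current@example.com>'
# Different email: no entry matches, so the identity is printed as-is.
git check-mailmap 'Old Nick <other@example.com>'
```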

Checking the changes

After making your changes, it is always a good idea to check that everything went right. One way of doing so is to inspect all branches in gitk, Git's graphical history browser:

gitk --all

If a GUI is not available, this command can serve as a starting point:

git log --graph --all --format='%h %an <%ae>'

Tweak the above if needed.
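As one possible tweak, here is a sketch of two checks in plain git, run in a throwaway repository with a hypothetical password and hypothetical identities. The first shows why the current files alone prove nothing: git grep on HEAD finds no match, while git log -S still lists the commits that added and removed the string; after a successful --replace-text run it should print nothing. The second lists every distinct author and committer identity, which makes a leftover old email address easy to spot after --use-mailmap.

```shell
set -eu
cd "$(mktemp -d)"
git init -q check-demo && cd check-demo

# Hypothetical history: a password is committed, then "fixed" in a later
# commit, and the two commits use different identities.
echo 'password = my_password' > config.ini
git add config.ini
git -c user.name='Old Nick' -c user.email='current@example.com' commit -q -m 'Add config'
echo 'password = CHANGE_ME' > config.ini
git -c user.name='Name Surname' -c user.email='new@example.com' commit -q -am 'Remove password'

# 1. The latest version looks clean, so this prints nothing ...
git grep -l 'my_password' HEAD || true
# ... but -S lists every commit on any ref that adds or removes the string.
git log --all --oneline -S 'my_password'

# 2. Every distinct author (%an/%ae) and committer (%cn/%ce) identity.
git log --all --format='%an <%ae>%n%cn <%ce>' | sort -u
```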

Conclusion

git-filter-repo is a very versatile tool that can do many things with a single line. It is the alternative that Git's own git filter-branch documentation recommends for rewriting history. Most of the time you will find yourself using it to remove sensitive information such as passwords, but most other clean-up tasks are possible too, once you know the right syntax. Remember to keep backups, do not rewrite public repositories unless absolutely necessary, and keep your repositories clean. Enjoy!