This edition recommends a Git extension tool for scanning, cleaning, and rewriting the commit history of large files in Git repositories.
Introduction
git repo-clean is a Git extension tool developed with Golang that provides the ability to scan, clean, and rewrite commit records for large Git repositories.
The general flow of Git repository data filtering
Git itself provides two commands, git fast-export and git fast-import. The first exports the repository data (.git/objects) as a stream in a specific format; the second reads a stream in that format and builds a complete Git repository from it. Any input that conforms to this format and is fed to git fast-import will create a Git repository.
So the general process of git-repo-clean is as follows:
fast-export
|
| output stream
|
---> parser(blob, commit, reset, tag...)
|
|
|
---> filter(blob size, blob oid)
|
| input stream
|
---> fast-import
Requirements:
- Git >= 2.24.0 (required)
- Golang >= 1.15 (optional)
Installation
1 Install the binary package
Download link: https://gitee.com/oschina/git-repo-clean/releases/
Extract the archive and enter the extracted directory, which contains the following files:
-rwxrwxr-x 1 git git 6.3M Dec 1 17:31 git-repo-clean.exe # Executable (this is the Windows build; other platforms are similar)
-rw-rw-r-- 1 git git 5.1K Dec 1 17:31 README.md # Usage document (this document)
drwxrwxr-x 3 git git 4.0K Dec 1 17:31 docs # Appendix document
-rw-rw-r-- 1 git git 9.6K Dec 1 17:31 LICENSE # License
Double-clicking git-repo-clean.exe does not install it; you need to follow the installation steps below before you can use it.
2 Compile and install from source
This method requires a basic make build environment and a Golang environment on your computer.
$ git clone https://gitee.com/oschina/git-repo-clean
# Go to the source directory and compile
$ cd git-repo-clean
$ make
# The compiled executable is in the bin/ directory
- Linux environment
sudo cp git-repo-clean $(git --exec-path)
- Windows environment
Method 1: Add the directory that contains git-repo-clean.exe to the system PATH. The general procedure is as follows: press the Windows key -> type "path" -> select "Edit the system environment variables" -> click "Environment Variables" -> select Path under "System variables" -> click "New" -> paste the path of the directory that contains the git-repo-clean.exe file you just extracted.
Method 2: Alternatively, copy git-repo-clean.exe to Git's exec directory: cp git-repo-clean.exe $(git --exec-path). (Git may be installed in a directory on drive C, in which case copying requires elevated permissions.)
Method 3: You can also copy git-repo-clean.exe directly into the C:\Windows\system32 directory. (This method is not recommended because it risks damaging system files.)
- macOS environment
Similar to the operation on Linux. Note, however, that macOS may block the executable from running. To authorize it, open System Preferences -> Security & Privacy and click Allow Anyway.
After the installation, run the following command to check whether the installation is successful:
git repo-clean --version
Usage
- Interactive usage
Run git repo-clean with no arguments to enter interactive mode directly. Because no parameters are given, only the default options are used; in this mode they are --scan, --delete, and --verbose. To use other options, such as --branch, start interactive mode as follows:
git repo-clean -i[--interactive]
The -i option enters interactive mode and can be combined with other options, for example git repo-clean -i --branch=topic
- Command line usage
git repo-clean --scan --limit=1G --type=tar.gz --number=1
Run inside the repository, this command scans the files on the current branch for files of type tar.gz that are at least 1G in size.
git repo-clean --scan --limit=1G --type=tar.gz --number=1 --delete
Adding the --delete option batch-deletes the files found by the scan on the current branch and rewrites the related commit history (including HEAD).
To clean the data of other branches or of all branches, use the --branch option. For example, --branch=all performs a full scan and removes the matching data from all branches.
git repo-clean --scan --limit=1G --type=tar.gz --number=1 --delete --branch=all
With --branch=all, the files on all branches are scanned and deleted, and the relevant commit history is rewritten.
