How to Delete Large Files From Git History

If you develop a game and store its code and assets on GitHub without using Git LFS, the repository grows every time you change a file and push it. Git history keeps every version of every tracked file: source code, images, audio, and other binary files.

This can lengthen build times and slow down your workflow, because your CI/CD pipeline spends a lot of time downloading the code and assets from GitHub.

So how do you clean up those large files in Git history?

In this post, we walk through a practical example that solves this problem.

Step 1: Check the Size of Git History

du -hs .git/objects
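du measures everything on disk under .git/objects. As a complementary check, Git can report its own object counts and sizes. The sketch below wraps `git count-objects -vH` in a small helper; the function name `repo_size` is made up for this example and is not part of Git.

```shell
# repo_size [repo]: hypothetical helper name, not a Git command.
# Prints Git's own object statistics for the repository at [repo]
# (defaults to the current directory). -H prints human-readable sizes.
repo_size() {
    git -C "${1:-.}" count-objects -vH
}
```

Running `repo_size .` now and again after Step 6 gives a before-and-after comparison; the `size-pack` line is the one to watch.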

Step 2: Create a Script to Find Large Files

Create the git_find_big.sh script:

vim git_find_big.sh

#!/bin/bash
#set -x

# Shows you the largest objects in your repo's pack file.
# Written for osx.
#
# @see http://stubbisms.wordpress.com/2009/07/10/git-script-to-show-largest-pack-objects-and-trim-your-waist-line/
# @author Antony Stubbs

# set the internal field separator to line break, so that we can iterate easily over the verify-pack output
IFS=$'\n';

# list all objects including their size, sort by size, take top 10
objects=$(git verify-pack -v .git/objects/pack/pack-*.idx | grep -v chain | sort -k3nr | head)

echo "All sizes are in kB's. The pack column is the size of the object, compressed, inside the pack file."

output="size,pack,SHA,location"
for y in $objects
do
	# extract the size and convert it from bytes to kB
	size=$(( $(echo "$y" | cut -f 5 -d ' ') / 1024 ))
	# extract the compressed size and convert it from bytes to kB
	compressedSize=$(( $(echo "$y" | cut -f 6 -d ' ') / 1024 ))
	# extract the SHA
	sha=$(echo "$y" | cut -f 1 -d ' ')
	# find the object's location in the repository tree
	other=$(git rev-list --all --objects | grep "$sha")
	output="${output}\n${size},${compressedSize},${other}"
done

echo -e "$output" | column -t -s ', '

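Step 3: Find the Top 10 Largest Files

Make the script executable, then run it from the root of your repository:

chmod +x git_find_big.sh
./git_find_big.sh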
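The script reads only pack files, so objects that are still loose are not listed. As an alternative sketch, assuming a reasonably recent Git plus standard `awk`, `sort`, and `head`, you can walk every reachable object with `git rev-list` and ask `git cat-file` for its size. The function name `largest_blobs` and its arguments are illustrative only.

```shell
# largest_blobs [count] [repo]: hypothetical helper for this post.
# Lists the largest blobs reachable from any ref, one per line,
# as "<size in KiB><TAB><path>". Sizes are uncompressed.
largest_blobs() {
    local count="${1:-10}" repo="${2:-.}"
    git -C "$repo" rev-list --objects --all |
        git -C "$repo" cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        awk '$1 == "blob" {
                 size = $2
                 sub(/^[^ ]+ [^ ]+ /, "")   # drop type and size, keep the path
                 printf "%d\t%s\n", size / 1024, $0
             }' |
        sort -k1,1nr |
        head -n "$count"
}
```

For example, `largest_blobs 10 .` prints the ten largest files ever committed, including files that were deleted in later commits.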

Step 4: Install git-filter-repo

Run:

pip3 install git-filter-repo

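Step 5: Remove Large Files From the Git History

Take the paths of the files you want to remove from the output of Step 3.

Then run the command below for each path to delete it from the Git history. The --force flag is needed because git filter-repo otherwise refuses to run on anything that is not a fresh clone.

git filter-repo --force --path <large file path> --invert-paths

This command rewrites your Git history and removes the file from every commit. Every rewritten commit gets a new hash, so back up the repository first and let anyone else who works on it know. git filter-repo also removes the origin remote, which is why Step 7 adds it back.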
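When Step 3 reports several large files, they can be removed in a single rewrite by repeating `--path`. The sketch below builds that command from a list of paths. The function name `purge_paths` and the `DRY_RUN` variable are inventions for this example; with `DRY_RUN=1` it only prints the command so you can review it before any history is rewritten. It assumes git-filter-repo from Step 4 is installed.

```shell
# purge_paths <path>...: hypothetical helper, not part of git-filter-repo.
# Removes every listed path from all commits in a single rewrite.
# Set DRY_RUN=1 to print the command instead of running it.
purge_paths() {
    if [ "$#" -eq 0 ]; then
        echo "usage: purge_paths <path>..." >&2
        return 1
    fi
    # Turn "p1 p2" into "--path p1 --path p2" in the positional parameters.
    for p in "$@"; do
        set -- "$@" --path "$p"
        shift
    done
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Printed for review only; paths containing spaces are not re-quoted.
        echo "git filter-repo --force $* --invert-paths"
    else
        git filter-repo --force "$@" --invert-paths
    fi
}
```

A dry run such as `DRY_RUN=1 purge_paths assets/big.psd audio/theme.wav` (both paths are placeholders) shows exactly what would be executed.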


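Step 6: Clean Up Unused Files

Expire the reflog and run garbage collection so that the objects no longer referenced by any commit are deleted from disk: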

git reflog expire --expire=now --all

git gc --prune=now
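Before pushing, it is worth confirming that the file no longer appears in any commit. A minimal check, assuming the same path you removed in Step 5, is to ask `git log` for commits on any ref that touch it; no output means the path is gone. The function name `path_in_history` is illustrative.

```shell
# path_in_history <path> [repo]: hypothetical helper for this post.
# Exits 0 if any commit on any ref still touches <path>, non-zero otherwise.
path_in_history() {
    local target="$1" repo="${2:-.}"
    [ -n "$(git -C "$repo" log --all --oneline -- "$target")" ]
}

# Example usage (the path is a placeholder):
#   if path_in_history "assets/big.psd"; then
#       echo "still in history"
#   else
#       echo "removed"
#   fi
```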

Step 7: Add the Git Remote URL and Push the Result

# Add your GitHub URL
git remote add origin <Your GitHub URL>

# Force-push all rewritten branches and set their upstream
git push -u origin --all --force

# Force-push the rewritten tags
git push origin --tags --force

Conclusion

This post showed how to shrink a repository that has grown too large as your game gained more and more content. Going forward, you should use Git LFS to store large assets.
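As a rough sketch of what moving to Git LFS looks like, the helper below registers file patterns with `git lfs track` and stages the resulting .gitattributes file. It assumes the git-lfs extension is installed; the function name `track_with_lfs` and the example pattern are hypothetical.

```shell
# track_with_lfs <pattern>...: hypothetical helper for this post.
# Registers each pattern with Git LFS in the current repository.
track_with_lfs() {
    if ! git lfs version >/dev/null 2>&1; then
        echo "git-lfs is not installed" >&2
        return 1
    fi
    # Set up the LFS hooks and filters for this repository only.
    git lfs install --local
    for pattern in "$@"; do
        git lfs track "$pattern"
    done
    # The tracking rules live in .gitattributes, so commit it with the repo.
    git add .gitattributes
}
```

For instance, `track_with_lfs "*.psd" "*.wav"` would route new Photoshop and audio files through LFS. Files already committed before tracking was enabled stay in the regular history until they are removed as described above.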

If you don't, remember to check Git history regularly for large files. Keeping your game's repository as small as possible improves work efficiency and helps you deliver high-quality games on time and on budget, because a lean repository is easier to maintain and extend with new features.
