I’ve just converted the Netbeans/main Mercurial repository to Git using fast-export:
    git clone http://repo.or.cz/r/fast-export.git
    mkdir netbeans-git
    cd netbeans-git
    git init
    ~/fast-export/hg-fast-export.sh -r ~/netbeans-hg
The conversion took more than 24 hours. Using Python's cProfile, I found that all the measurable time was spent in Mercurial's patch extraction. I did not investigate further, but here are two hypotheses:
- Mercurial is not designed to work on large source trees.
- fast-export does not use Mercurial in the most efficient way.
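The profiling setup is not shown in the post. A minimal sketch of the usual cProfile workflow: run the script under `python -m cProfile -o FILE`, then reload the saved statistics with `pstats` and sort them by cumulative time. `profile_py` and the `PYTHON` override are hypothetical names, and the sketch assumes a `python3` interpreter (Mercurial 1.7 ran on Python 2, so `PYTHON=python` may be needed there). To profile fast-export you would point it at `hg-fast-export.py`, the Python script that `hg-fast-export.sh` wraps, with the arguments the wrapper passes to it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a Python script under cProfile, save the raw
# profile to OUTFILE, then print the 15 entries with the largest cumulative time.
# Usage: profile_py OUTFILE SCRIPT [ARGS...]
profile_py() {
  local out=$1
  shift
  local py=${PYTHON:-python3}   # assumption: set PYTHON=python for a Python 2 setup

  # cProfile runs SCRIPT with ARGS and writes its statistics to $out.
  "$py" -m cProfile -o "$out" "$@" || return

  # pstats reloads the saved profile and sorts it by cumulative time.
  "$py" -c 'import pstats, sys; pstats.Stats(sys.argv[1]).sort_stats("cumulative").print_stats(15)' "$out"
}
```

For Mercurial's own commands, the global `--profile` option (for example `hg --profile log > /dev/null`) prints a similar report on stderr without any wrapper.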
Having the same huge repository in both Mercurial and Git is a good opportunity to measure how much slower Mercurial is than Git. For the following tests, I’m using Git 1.7.4.1 and Mercurial 1.7.5. I chose two commands:
- status, which shows how each tool scales with a large source tree;
- log, which shows how each tool scales with a large number of commits.
The Netbeans/main repository contains more than 190,000 commits, and the current source tree contains 90,519 files.
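To get figures like these from your own clones, a small helper can count commits and tracked files in either kind of working copy. This is a sketch: `repo_stats` is a hypothetical name, it must be run from the root of the working copy, and it assumes a Git recent enough to support `rev-list --count`. The two counts are not strictly equivalent: `git ls-files` counts index entries, `hg manifest` lists the files of the working directory's parent revision, and the commit counts cover all branches.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print commit and tracked-file counts.
# Run it from the root of a Git or Mercurial working copy.
repo_stats() {
  if [ -d .git ]; then
    # All commits reachable from any ref, and all files in the index.
    echo "commits: $(git rev-list --all --count)"
    echo "files: $(( $(git ls-files | wc -l) ))"
  elif [ -d .hg ]; then
    # Revision numbers start at 0, so tip's number + 1 is the changeset count.
    echo "commits: $(( $(hg log -r tip --template '{rev}') + 1 ))"
    echo "files: $(( $(hg manifest | wc -l) ))"
  else
    echo "repo_stats: not at the root of a Git or Mercurial working copy" >&2
    return 1
  fi
}
```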
Here is the Git test script:
    echo REPO SIZE
    du -hs .git

    echo STATUS TIMING
    # Clear disk cache
    sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
    bash -c "time git status"

    echo LOG TIMING
    # Clear disk cache
    sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
    bash -c "time git log > /dev/null"
And the Mercurial one:
    echo REPO SIZE
    du -hs .hg

    echo STATUS TIMING
    # Clear disk cache
    sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
    bash -c "time hg status"

    echo LOG TIMING
    # Clear disk cache
    sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
    bash -c "time hg log > /dev/null"
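The Git and Mercurial scripts differ only in the tool name and its metadata directory, so they could be folded into one function. This is a hypothetical sketch (`bench_vcs`, `drop_caches` and `SKIP_DROP` are names made up here). It runs `sync` before dropping caches, because `drop_caches` only frees clean pages, and it also sends `status` output to `/dev/null` so terminal rendering is not timed, which differs slightly from the scripts above. Setting `SKIP_DROP=1` gives warm-cache timings without needing root.

```shell
#!/usr/bin/env bash
# Drop the page cache so each timing starts cold (needs root via sudo).
# SKIP_DROP=1 skips this step, giving warm-cache timings instead.
drop_caches() {
  [ -n "${SKIP_DROP:-}" ] && return 0
  sync   # flush dirty pages first; drop_caches only frees clean ones
  sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
}

# Hypothetical helper: bench_vcs git|hg, run from the root of a working copy.
# Timing reports from the `time` keyword go to stderr.
bench_vcs() {
  local tool=$1 meta
  case $tool in
    git) meta=.git ;;
    hg)  meta=.hg ;;
    *)   echo "usage: bench_vcs git|hg" >&2; return 1 ;;
  esac

  echo REPO SIZE
  du -hs "$meta"

  echo STATUS TIMING
  drop_caches
  time "$tool" status > /dev/null

  echo LOG TIMING
  drop_caches
  time "$tool" log > /dev/null
}
```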
And here are the results:
    Measurement      Git          Mercurial
    ---------------  -----------  -----------
    REPO SIZE        613M         2.7G
    STATUS real      0m17.240s    0m44.854s
    STATUS user      0m0.632s     0m2.944s
    STATUS sys       0m1.400s     0m1.880s
    LOG real         0m10.798s    1m1.934s
    LOG user         0m4.236s     0m48.823s
    LOG sys          0m0.384s     0m1.848s
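To put a number on "how much slower", the `time` readings can be converted to seconds and divided. A small awk sketch; `to_seconds` and `ratio` are hypothetical helper names, and they assume the `XmY.YYYs` format bash's `time` printed here. awk runs under the C locale so the decimal separator stays a dot.

```shell
#!/usr/bin/env bash
# Convert a bash `time` reading such as "1m1.934s" to seconds.
to_seconds() {
  LC_ALL=C awk -v t="$1" 'BEGIN {
    split(t, p, "m")      # p[1] = minutes, p[2] = seconds with a trailing "s"
    sub(/s$/, "", p[2])
    printf "%.3f\n", p[1] * 60 + p[2]
  }'
}

# Ratio of two readings (e.g. Mercurial time / Git time), to one decimal.
ratio() {
  LC_ALL=C awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
    'BEGIN { printf "%.1f\n", a / b }'
}

ratio 0m44.854s 0m17.240s   # status, real time → 2.6
ratio 1m1.934s 0m10.798s    # log, real time    → 5.7
ratio 0m48.823s 0m4.236s    # log, user time    → 11.5
```

By this measure, Mercurial's status takes about 2.6 times as long as Git's in wall-clock time and its log about 5.7 times as long, while its repository is about 4.5 times larger (2.7G against 613M).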
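Looking at the real/user/sys values, we can see that Mercurial's status command spends most of its time waiting rather than computing: user + sys is under 5 seconds out of almost 45 seconds of wall-clock time. Mercurial is doing much more disk access than Git here, so the performance problem doesn't come from Python. On the other hand, Mercurial's log time is mostly CPU time (about 49 seconds of user time out of 62 seconds real), which is closer to what one would expect from a Python implementation.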
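The real/user/sys argument can be made explicit: the part of the wall-clock time that is neither user nor sys CPU time was spent waiting, which, right after the disk cache has been dropped, is a rough proxy for disk I/O. A sketch (`cpu_share` is a hypothetical name) that reads `label real user sys` lines in seconds, here transcribed from the results table:

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each "label real user sys" line (in seconds),
# print the CPU share of the wall-clock time and the remaining off-CPU time.
cpu_share() {
  LC_ALL=C awk '{
    cpu = $3 + $4
    printf "%-10s cpu %5.1f%%  off-cpu %6.3fs\n", $1, 100 * cpu / $2, $2 - cpu
  }'
}

cpu_share <<'EOF'
git-status 17.240 0.632 1.400
hg-status 44.854 2.944 1.880
git-log 10.798 4.236 0.384
hg-log 61.934 48.823 1.848
EOF
```

With these figures, status spends under 12% of its wall-clock time on the CPU for both tools, but Mercurial waits about 40 s against about 15 s for Git, whereas Mercurial's log is about 82% CPU time.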
Here are two interesting (if somewhat dated) articles comparing Git and Mercurial: