gnu-arch-users

Re: [Gnu-arch-users] Re: Future of GNU Arch, bazaar and bazaar-ng ... ?


From: Martin Langhoff
Subject: Re: [Gnu-arch-users] Re: Future of GNU Arch, bazaar and bazaar-ng ... ?
Date: Tue, 23 Aug 2005 07:42:14 +1200

On 8/22/05, Bruce Stephens <address@hidden> wrote:
> I've read some comments (well, gossip, really) that git's use of large
> numbers of files was causing operational problems on
> kernel.org---specifically that the daily backup took more than 24
> hours.  (If I understand correctly, it stores each file separately,
> and each change to each file separately.  As a contrast, subversion's
> fsfs seems to use one or two files per revision.)
...
> Is that actually a problem in reality, or is it a mostly bogus attack?

It was an early problem, quickly addressed. 

git has 2 storage modes. Initially, when you commit (or import from
CVS), each version of each file is stored as a separate, complete
file. This is enormously fast, but it leaves a huge number of files on
disk, which kills mirroring processes, as you say. For example, the
Moodle (http://moodle.org) import went from ~200MB in CVS to 600MB in
git. I think it is about 1800 commits on a 35MB tree.
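
To make the "one file per version" point concrete, here is a minimal
sketch in Python of how a loose blob object is named and stored, as far
as I understand git's object format. The function name loose_object is
made up for illustration; it is not part of git.

```python
import hashlib
import zlib
from typing import Tuple


def loose_object(content: bytes) -> Tuple[str, str, bytes]:
    """Return (object id, relative path, on-disk bytes) for a blob.

    Illustrative helper only; the name is hypothetical.
    """
    # git prefixes the content with a small header before hashing.
    store = b"blob " + str(len(content)).encode("ascii") + b"\0" + content
    object_id = hashlib.sha1(store).hexdigest()
    # The first two hex digits name a directory, the rest name the file.
    path = "objects/{}/{}".format(object_id[:2], object_id[2:])
    # Each loose object is deflated on its own, with no delta against
    # earlier versions of the same file.
    return object_id, path, zlib.compress(store)


if __name__ == "__main__":
    oid, path, data = loose_object(b"hello\n")
    print(oid, path, len(data))
```

Every changed file in every commit yields one such file, which is why
the count grows so quickly.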

But git also has the 'pack' format, which is indexed & compressed.
After I do git-repack && git-prune-packed it's down to 75MB in just a
handful of pack files. The packed objects are compressed with zlib
(the same deflate algorithm gzip uses, not bzip2), and the deltas are
reordered if it makes sense. Reading from a pack is arguably slower,
but I haven't been able to perceive the difference myself.
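
If you want to see how a given repository splits between the two
modes, something like the following rough Python sketch would do. It
assumes the usual layout of two-hex-digit directories for loose objects
and a pack/ subdirectory for packs; summarize_object_store is a made-up
name, not a git command.

```python
import os
import re


def summarize_object_store(objects_dir):
    """Count loose objects and pack files under a .git/objects directory.

    Hypothetical helper for illustration; it only walks the directory
    tree and does not parse any git data.
    """
    loose_count = loose_bytes = pack_count = pack_bytes = 0

    for entry in os.listdir(objects_dir):
        subdir = os.path.join(objects_dir, entry)
        # Loose objects live in directories named by two hex digits.
        if re.fullmatch(r"[0-9a-f]{2}", entry) and os.path.isdir(subdir):
            for name in os.listdir(subdir):
                loose_count += 1
                loose_bytes += os.path.getsize(os.path.join(subdir, name))

    pack_dir = os.path.join(objects_dir, "pack")
    if os.path.isdir(pack_dir):
        for name in os.listdir(pack_dir):
            # Only count the packs themselves, not their .idx indexes.
            if name.endswith(".pack"):
                pack_count += 1
                pack_bytes += os.path.getsize(os.path.join(pack_dir, name))

    return {
        "loose_count": loose_count,
        "loose_bytes": loose_bytes,
        "pack_count": pack_count,
        "pack_bytes": pack_bytes,
    }


if __name__ == "__main__":
    print(summarize_object_store(os.path.join(".git", "objects")))
```

Running it before and after a repack should show the loose count drop
and the pack count go up by a handful.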

The concept is that you work and commit for a while, and perhaps once
a month you git-repack: month-old commits are unlikely to be
manipulated directly (undoing commits, etc.) and somewhat less likely
to be queried, so the performance impact of packing them is minimized.

For reference: the Moodle import spans several (4?) years. Just about 6
months of history of this project take about 1.6GB as an arch library.
The 75MB packs contain the whole project history with all the branches
ever in existence.

cheers,


martin



