
[Monotone-devel] monotone vs git storage efficiency


From: Bruce Stephens
Subject: [Monotone-devel] monotone vs git storage efficiency
Date: Wed, 01 Feb 2006 16:36:14 +0000
User-agent: Gnus/5.11 (Gnus v5.11) Emacs/22.0.50 (gnu/linux)

On IRC, njs suggested

<njs`> (though in fact the rumor I hear is that git has the smallest space
       usage of anyone, now, after you pack.  I'm curious what algorithm they
       use.)

So I thought I'd give it a go.

So I used tailor to convert a (linear) Subversion repository of 666
revisions (coincidence; I didn't plan that) of a tree I regularly
convert from our work CVS repository (so it's a decent size, with
largish changesets, because we use scripts to commit to CVS).  Then I
did a monotone pull of the corresponding branch into a new database
from the repository I use (which contains 672 revisions, because I
accidentally committed to the branch a few times---basically the same
content, though).

Anyway, after "git repack; git prune-packed", the .git directory is
20M.  The monotone (0.25) database is 35M.

For something we use raw CVS on, 890 revisions turned into 17M for
git, 31M for monotone.

tgz sizes for the checked out heads of the two are 14.5M and 14.9M
respectively.

That doesn't feel *so* bad, in terms of space.  Certainly not worth
panicking about at present, IMHO, although storing everything
base64-encoded in the database is a bit silly, so it would be good to
get rid of that.

I guess it *does* make git look really good: after subtracting roughly
the size of one checked-out tree from each pack, that's 889 changes in
about 2M, or 665 (larger) changes in 5M.

I can't see exactly how git orders deltas.  There are comments saying
that it prefers storing deltas of bigger files against smaller ones
rather than vice versa, which seems like a good heuristic, but I've no
real idea how you'd use that to produce a set of deltas that would let
you build any version of a file.
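One way to read that heuristic is as a size-ordered sliding window: sort the versions of an object by size, largest first, and delta each one against whichever nearby larger version gives the cheapest delta, or store it in full if nothing helps.  The sketch below is a guess at that idea, not git's actual algorithm; all the names are made up, and difflib opcode sizes are a toy stand-in for a real binary delta cost:

```python
# Rough sketch of a "delta against a larger neighbour" heuristic,
# loosely in the spirit of what a repack might do.  This is a guess,
# not git's actual algorithm; the cost function is a toy.
import difflib

def delta_cost(base: str, target: str) -> int:
    """Proxy for delta size: total chars changed on either side."""
    sm = difflib.SequenceMatcher(None, base, target)
    return sum((i2 - i1) + (j2 - j1)
               for tag, i1, i2, j1, j2 in sm.get_opcodes()
               if tag != "equal")

def choose_bases(objects: dict[str, str], window: int = 10) -> dict:
    """Sort versions by size (largest first); delta each against the
    cheapest of the preceding `window` (hence larger) objects, or
    store it in full (base None) if no delta beats the full text."""
    ordered = sorted(objects, key=lambda n: len(objects[n]), reverse=True)
    bases = {}
    for i, name in enumerate(ordered):
        best, best_cost = None, len(objects[name])  # cost of full text
        for cand in ordered[max(0, i - window):i]:
            cost = delta_cost(objects[cand], objects[name])
            if cost < best_cost:
                best, best_cost = cand, cost
        bases[name] = best
    return bases

versions = {
    "v1": "hello world\n" * 10,
    "v2": "hello world\n" * 10 + "a new paragraph\n",
    "v3": "hello world\n" * 10 + "a new paragraph\nand another\n",
}
print(choose_bases(versions))  # largest version stored full, rest as deltas
```

Because every base is at least as large as its target, the chains naturally run from big objects down to small ones, which matches the comment in the git source.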

"git repack" gets to see all the objects at once, so it has a better
chance of near-optimality, I guess.  Maybe put the objects
(corresponding to the same thing---file, tree, or whatever) in some
kind of tree, such that things higher up are bigger; or, more
obviously: build something like a skiplist, only with the objects
sorted by size rather than by history?

That's tricky to do while you're building the history during normal
commits, but if you're willing to rebuild everything, and can assume
you're able to hold a couple of plaintexts in memory at once, it ought
to be doable?
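As a toy illustration of that rebuild idea (purely hypothetical; difflib opcodes stand in for a real binary delta format, and the names are made up): store the largest version in full, everything else as a delta against some larger version, and rebuild any version by walking the base pointers.

```python
# Toy "rebuild everything" sketch: largest version stored in full,
# others as deltas against a larger version, reconstruction by
# walking base pointers.  Hypothetical, not monotone's or git's code.
import difflib

def make_delta(base: str, target: str):
    sm = difflib.SequenceMatcher(None, base, target)
    # keep just enough to rebuild target from base: copy ranges, plus
    # the literal text for anything inserted or replaced
    return [(tag, i1, i2, target[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes()]

def apply_delta(base: str, delta) -> str:
    return "".join(base[i1:i2] if tag == "equal" else text
                   for tag, i1, i2, text in delta)

def pack(objects: dict, bases: dict) -> dict:
    """bases maps each name to a larger version's name, or None."""
    return {name: ("full", objects[name]) if base is None
            else ("delta", base, make_delta(objects[base], objects[name]))
            for name, base in bases.items()}

def unpack(store: dict, name: str) -> str:
    entry = store[name]
    if entry[0] == "full":
        return entry[1]
    _, base, delta = entry
    return apply_delta(unpack(store, base), delta)

objects = {"v1": "a\n" * 5, "v2": "a\n" * 5 + "b\n", "v3": "a\n" * 5 + "b\nc\n"}
bases = {"v3": None, "v2": "v3", "v1": "v2"}  # size order, not history order
store = pack(objects, bases)
assert all(unpack(store, n) == objects[n] for n in objects)
```

The point is just that nothing about reconstruction cares whether the delta chain follows ancestry; any acyclic base assignment works, so ordering by size instead of history is at least coherent.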

Ah, monotone doesn't know how big files are---just their hashes---so
sorting by size would be too expensive.  It might still be worth doing
as an experiment, though.  Or does netsync require that deltas go in
one particular direction?  I could imagine problems if file deltas
were between fairly unrelated revisions rather than following
ancestry.  I'm not sure how netsync works, though, so quite possibly
any necessary deltas are computed on the fly?



