gnu-arch-users

Re: [Gnu-arch-users] Storage efficiency of revlibs


From: Mikhael Goikhman
Subject: Re: [Gnu-arch-users] Storage efficiency of revlibs
Date: Wed, 7 Dec 2005 22:49:03 +0000
User-agent: Mutt/1.4.2.1i

On 07 Dec 2005 14:05:51 +0100, Ludovic Courtès wrote:
> 
> Conclusion:
> 
>   On projects with a small fraction of modified files per revision, the
>   revlib technique yields a (slightly) better compression ratio than
>   tar+gz of each revlib.

This looks theoretically expected, but I am not sure you did enough
experimentation with your project to draw such a conclusion. Try this: go
to some advanced revision, say patch-300, and post two numbers: "du -s"
of your revlib revision tree, and the size of its tar.gz without the
,,* files. The numbers for the archzoom tree are:

  % revision=archzoom--devel--0--patch-300
  % cd `tla library-find $revision`/..
  % tar cf - --exclude $revision/,,patch-set --exclude $revision/,,index \
    --exclude $revision/,,index-by-name $revision | gzip -9 >$revision.tar.gz
  % du -s --block-size=1 $revision
  % ls -s --block-size=1 $revision.tar.gz
  3403776 archzoom--devel--0--patch-300
  163840 archzoom--devel--0--patch-300.tar.gz

The ratio is about 21. There is a small but increasing gain compared with
earlier revisions (18), in particular because {arch} contains a lot of
small files that compress nicely, probably better than hardlinking would.
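
For anyone repeating the measurement, the ratio is just the "du" byte
count divided by the tar.gz byte count. A minimal sketch, where the
function name "ratio" is hypothetical and the two numbers are the ones
printed above:

```shell
# Hypothetical helper: compression ratio of two byte counts.
# $1 = size of the unpacked tree in bytes (from du -s --block-size=1)
# $2 = size of the tar.gz in bytes
ratio() {
    awk -v tree="$1" -v packed="$2" 'BEGIN { printf "%.1f\n", tree / packed }'
}

# archzoom--devel--0--patch-300, numbers taken from the output above
ratio 3403776 163840
```

With the archzoom numbers this gives 20.8, which rounds to the 21 quoted
above.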

> On a project with a certain number of files, most of which remain
> identical across revisions, revlib can achieve compression not
> achievable otherwise: it can compress /across/ revisions.

Please don't forget that a hardlink costs more than zero, and also that
for every merged external revision there are at least 2 more files, in
{arch} and ,,patch-log/, and possibly new subdirs too (which are not
hardlinkable). So any nice theory should be verified against the real
numbers.
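
One way to get such real numbers is to count how many entries in a
revision tree are actually shared through hardlinks and how many are
not. This is only a sketch: the function name "link_report" is
hypothetical, it relies on the standard find tests -type and -links, and
it assumes file names without embedded newlines.

```shell
# Hypothetical helper: show how much of a tree hardlinking can share.
# $1 = directory to inspect, for example one revlib revision directory.
link_report() {
    # regular files whose inode has more than one name (hardlinked)
    shared=$(find "$1" -type f -links +1 | wc -l)
    # regular files that exist only in this tree
    unshared=$(find "$1" -type f -links 1 | wc -l)
    # directories, which can never be hardlinked
    dirs=$(find "$1" -type d | wc -l)
    # $(( )) strips any padding that some wc implementations print
    echo "shared=$((shared)) unshared=$((unshared)) dirs=$((dirs))"
}
```

Running it as link_report `tla library-find $revision` would show how
many files in that revision are paid for in full despite the revlib.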

> The most efficient solution would consist in augmenting the revlib
> technique by gzipping each file individually.

For me (and for du/rm) it is not the size but the number of inodes that
is more important, so this very CPU-expensive solution would not solve
much.
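
The inode count is easy to measure alongside the size. A rough sketch,
assuming GNU find (for -printf) and a tree that lives on a single
filesystem; the function name "count_inodes" is hypothetical:

```shell
# Hypothetical helper: number of distinct inodes used by a tree.
# Assumes GNU find; %i prints the inode number of each entry.
# Names hardlinked to the same file share one inode and count once.
count_inodes() {
    find "$1" -printf '%i\n' | sort -u | wc -l
}
```

Gzipping each file individually would leave this number unchanged, since
every file still needs its own inode.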

> BTW, as Stefan noted, comparing cachedrevs and revlibs would only make
> sense if cachedrevs could be used as transparently as revlibs.

I argued against an automatic revlib, but not against a user-controllable
revlib. It is a reasonable _optional_ disk-space-for-speed optimization.

Regards,
Mikhael.



