[Monotone-devel] scalability

monotone-devel

From:	graydon hoare
Subject:	[Monotone-devel] scalability
Date:	04 Nov 2003 10:52:36 -0500
User-agent:	Gnus/5.09 (Gnus v5.9.0) Emacs/21.2

hi,

I've been playing with scalability and performance a bit more
recently, and have achieved a couple happy results: first, having
isolated the speed issue in cvs importing, I can pull in the gcc
repository without bringing my machine to its knees:

$ time ../monotone --db=import.db cvs_import ~/src/gcc-cvs/gcc-cvs/gcc/
monotone: [file branches: 65964] [tree branches: 9599] [versions: 306816] symlink-tree,v
monotone: phase 1 (version import) complete
monotone: [branches: 9597] [edges: 115264] [file branches: 65965] [tree branches: 9599] [versions: 306820]
monotone: phase 2 (ancestry reconstruction) complete

real    328m6.030s
user    282m7.100s
sys     4m54.270s

$ ../monotone --db=import.db db info
schema version  : f042f3c4d0a4f98f6658cbaf603d376acf88ff4b
full manifests  : 1
manifest deltas : 98543
full files      : 20385
file deltas     : 191994

$ find /media/src/gcc-cvs/gcc-cvs/gcc -type f | wc -l
  23248

so.. about 5.5 hours to import ~98k tree versions, or a sustained 5 tree
states / second, with ~20000 manifest entries in each tree
state. still feels a bit slow when it's happening, but it can be done
in a workday now. flat profiles show RSA and SHA1 functions holding
most of the top slots now, so there's not much else I can do
speed-wise. perhaps another minor jump if I finish porting the p4
multiply8 implementation from msvc.
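
as a quick check of the throughput arithmetic, here is a small sketch that recomputes the rate from the `time` and `db info` figures above. it assumes each stored manifest (1 full plus 98543 deltas) corresponds to one imported tree version; the variable and function names are illustrative only.

```python
# Figures copied from the `time` and `db info` output above.
real_minutes = 328
real_seconds = 6.030
manifest_deltas = 98543
full_manifests = 1

# Wall-clock time of the import, in seconds.
elapsed_s = real_minutes * 60 + real_seconds

# Assumption: one stored manifest per imported tree version.
tree_versions = manifest_deltas + full_manifests


def throughput(versions, seconds):
    """Average number of tree versions imported per second."""
    return versions / seconds


rate = throughput(tree_versions, elapsed_s)
hours = elapsed_s / 3600

print(f"{hours:.2f} hours, {rate:.2f} tree states/second")
```

under that assumption the run works out to roughly five and a half hours of wall-clock time at about 5 tree states per second.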

so, that's cool, I'm immensely happy. but what's *more* interesting is
that I played with some sqlite pager parameters a bit, and got this
result:

$ ls -lh import.db 
-rw-r--r--    1 graydon  graydon      648M Nov  4 06:54 import.db

$ du -skh /media/src/gcc-cvs/gcc-cvs/gcc
1.2G    /media/src/gcc-cvs/gcc-cvs/gcc

yup, by tweaking the pager parameters the database can be made to
occupy about *half* the space of the corresponding CVS repo (probably
just because the head versions are gzipped). this rule appears to
apply to any moderately large repo; libjava alone has the same
characteristic. it's surprising because the database size used to track
the CVS size almost exactly, so we were being *very* wasteful with our
sqlite pages, probably overflowing a lot of page cells.
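
the "about half" figure can be checked directly from the two sizes quoted above. this is only a rough sketch: the `du` figure is rounded by `-h`, so the ratio is approximate.

```python
# Sizes copied from the `ls -lh` and `du` output above.
db_mib = 648    # import.db
cvs_gib = 1.2   # CVS repository directory (rounded by du -h)

# Convert the CVS size to MiB and take the ratio.
ratio = db_mib / (cvs_gib * 1024)

print(f"database is {ratio:.0%} of the CVS repository size")
```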

unfortunately I can't reuse the schema migration stuff to move from
one page size to another; either you got it or you don't. sqlite
refuses to even open a db with a different page size. given the
enormous space savings I'm tempted to commit this new setting, but
I'll need to implement the ascii "db dump" & "db load" commands, to
handle existing DBs, and document the change.
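
to illustrate the dump-and-reload idea, here is a minimal sketch using Python's standard `sqlite3` module against a modern SQLite 3. it is not monotone's "db dump" / "db load" implementation and not the sqlite version monotone used at the time; the helper names `dump_db` and `load_db` are hypothetical. the point is only that a textual SQL dump is independent of page size, so it can be replayed into a fresh database created with a different one.

```python
import sqlite3


def dump_db(path):
    """Return the whole database as a list of SQL statements (a text dump)."""
    src = sqlite3.connect(path)
    try:
        return list(src.iterdump())
    finally:
        src.close()


def load_db(path, statements, page_size):
    """Create a fresh database at `path` with `page_size`, then replay the dump."""
    dst = sqlite3.connect(path)
    try:
        # The page size has to be chosen before any table is created.
        dst.execute(f"PRAGMA page_size = {page_size}")
        # The dump carries its own BEGIN TRANSACTION / COMMIT statements.
        dst.executescript("\n".join(statements))
        dst.commit()
    finally:
        dst.close()
```

usage would look like `load_db("new.db", dump_db("old.db"), 8192)`, where both file names and the page size are placeholders. holding the whole dump in memory is fine for a sketch; a database of several hundred megabytes would want the statements streamed instead.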

any objection? 

-graydon
