[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Monotone-devel] Re: Improving the performance of annotate
From: |
Graydon Hoare |
Subject: |
[Monotone-devel] Re: Improving the performance of annotate |
Date: |
Tue, 18 Jul 2006 21:48:10 -0700 |
User-agent: |
Thunderbird 1.5.0.4 (Windows/20060516) |
Daniel Carosone wrote:
my suggestion and preference here would be storing the roster details
in open sql, or at least caching the relevant details in sql. This is
similar to the 'per-file DAG' information discussed previously. The
idea of inventing and storing our own additional indexes when we
already have a storage layer with these capabilities just seems
incongruous.
Agreed. I spent some time last time I was traveling making some notes on
this and sketching out a picture, but never posted about it. I think a
sensible shift in the storage layer would involve:
- Storing only head rosters in full, no roster deltas.
- Storing, along with each revision (in fulltext), a symbolic
encoding of the changes-to-nodes-and-markings that the revision
implies. This would be done as a set of split-out SQL rows.
- The xdelta-reconstruction code would need to have its path-finding
code split out from its applicator-feeding code, so that we could
reuse the path-finding stuff to reconstruct arbitrary rosters from
"head roster" + "sequence of symbolic roster actions". But it's
totally doable.
This would attack most of our worst remaining bottlenecks:
- Turning off the manifest hash check during netsync would make
netsync a no-hash, no-xdelta operation; it would still spend a
bit of time applying revs to an in-memory roster and working out
markings, but the internal i/o costs could go away. Manifest checks
could be run offline / periodically for users who don't mind
relaxing a bit.
- Annotate, log, etc. could apply a restriction to head rosters
at first and then walk backwards through database queries
against the affected nodes alone, skipping the parsing
and hashing of intermediate rosters *and* skipping between
revs that actually affect the file(s) in question.
You'd effectively be getting "per file DAGs" out of this, since the
changes-to-markings *are* the DAG edges you care about.
I'm not sure how far off this scheme is from what the existing "per file
DAGs" work does; I've not had time to investigate it. But I think it's
completely compatible with how we do things now, and would not involve
changing network protocol or external formats. So I'm quite happy if
someone wants to spend time playing with it.
You're absolutely right that rosters are a "local cache"; they were
designed that way, and we expressly kept their format private-to-the-db
so that we could change it as efficiency demanded. Efficiency is now
demanding, and changing the format to suit is quite legitimate.
-graydon