Re: [Monotone-devel] Re: [sqlite] disk locality (and delta storage)


From: Nathaniel Smith
Subject: Re: [Monotone-devel] Re: [sqlite] disk locality (and delta storage)
Date: Thu, 16 Feb 2006 20:28:50 -0800
User-agent: Mutt/1.5.11

On Wed, Feb 15, 2006 at 09:12:25PM -0500, address@hidden wrote:
> Nathaniel Smith <address@hidden> wrote:
> > 
> > Right now
> > we do backwards linear delta chaining, so we always have the latest
> > version of everything stored in full, and then it takes O(n) time to
> > fetch a version n steps back in history.  While theoretically
> > problematic, this actually has caused zero complaints; people in
> > practice always check out the latest head... However, it is causing us
> > problems when synchronizing databases over the network, because when
> > synchronizing you want to send _forward_ deltas. 
> 
> Can you store forward deltas for the benefit of netsync, but
> also keep a cache of the N most recently accessed files (which
> will typically be the ends of the chains) or perhaps just the
> latest files in each chain, and thus avoid the O(n) deltas
> that would otherwise be required to checkout the head?

Hrm.  The downsides are probably that 
  -- people access stuff near the head as well (e.g., running "diff"
     on the latest few revisions, or running plain "diff" or "commit"
     in a workspace whose base revision is not the head).
  -- it makes performance unpredictable -- sometimes you go fast,
     sometimes slow, and the user can't tell why.
But, it certainly wouldn't be terribly hard to implement.
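
To make the first downside concrete, here is a rough Python sketch of
the two layouts.  This is not monotone's actual storage code: the names
(make_delta, apply_delta, BackwardChain, ForwardChainWithHeadCache) and
the difflib-based delta format are made up for illustration, and it
assumes a purely linear history.  Each fetch returns the text plus the
number of deltas it had to apply.

```python
from difflib import SequenceMatcher


def make_delta(old, new):
    """Encode `new` relative to `old` as copy/insert operations (toy format)."""
    ops = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))        # reuse a span of the old text
        elif tag in ("replace", "insert"):
            ops.append(("insert", new[j1:j2]))  # literal new text
        # "delete" spans contribute nothing to the new text
    return ops


def apply_delta(old, ops):
    """Rebuild the new text from `old` and a delta produced by make_delta."""
    parts = []
    for op in ops:
        if op[0] == "copy":
            parts.append(old[op[1]:op[2]])
        else:
            parts.append(op[1])
    return "".join(parts)


class BackwardChain:
    """Head stored in full; each older version is a delta from its successor."""

    def __init__(self, versions):
        self.head = versions[-1]
        # deltas[i] rebuilds versions[i] from versions[i + 1]
        self.deltas = [make_delta(versions[i + 1], versions[i])
                       for i in range(len(versions) - 1)]

    def fetch(self, index):
        text, applied = self.head, 0
        for i in range(len(self.deltas) - 1, index - 1, -1):
            text = apply_delta(text, self.deltas[i])
            applied += 1
        return text, applied


class ForwardChainWithHeadCache:
    """Oldest version stored in full, forward deltas, plus a cache of the
    last `cached` versions kept in full (hypothetical caching policy)."""

    def __init__(self, versions, cached=1):
        self.base = versions[0]
        # deltas[i] rebuilds versions[i + 1] from versions[i]
        self.deltas = [make_delta(versions[i], versions[i + 1])
                       for i in range(len(versions) - 1)]
        n = len(versions)
        self.cache = {i: versions[i] for i in range(max(0, n - cached), n)}

    def fetch(self, index):
        if index in self.cache:
            return self.cache[index], 0      # cache hit: no deltas applied
        text, applied = self.base, 0
        for i in range(index):               # cache miss: walk from the root
            text = apply_delta(text, self.deltas[i])
            applied += 1
        return text, applied


# Ten versions of a file that grows by one line per revision.
versions = ["".join(f"line {k}\n" for k in range(v + 1)) for v in range(10)]
backward = BackwardChain(versions)
forward = ForwardChainWithHeadCache(versions, cached=1)

for index in (9, 7):
    print(index, backward.fetch(index)[1], forward.fetch(index)[1])
```

With ten versions and only the head cached, fetching the head applies
no deltas in either layout, but fetching two revisions back applies 2
deltas in the backward chain versus 7 in the forward chain, since a
cache miss has to start from the root.  That cliff between "at the
head" and "just behind it" is the unpredictability I mean.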

Our current plan is to try out a few different things and get some
idea of how the tradeoffs actually work out:
   http://venge.net/monotone/wiki/DeltaStorageStrategies/ShootOut
I guess such caching schemes are something to keep in mind as an
option for the simple forward chaining schemes, when weighing up the
costs and benefits once we have more data.  There is a theory which
suggests that a revlog-based system will just kick everything else's
butt no matter how optimized the alternatives are, but perhaps not,
and single-file is a lot to give up if the win is only minor...

Also, thanks: without writing this reply I probably wouldn't have
thought to put a test on that page that exercises accessing "near
head" revisions :-) (I.e., the 'log --diffs --last=20' I just
added to the list of things to measure.)

-- Nathaniel

-- 
The Universe may  /  Be as large as they say
But it wouldn't be missed  /  If it didn't exist.
  -- Piet Hein



