monotone-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Monotone-devel] Re: netsync question


From: Nathaniel Smith
Subject: Re: [Monotone-devel] Re: netsync question
Date: Sat, 29 Apr 2006 05:32:58 -0700
User-agent: Mutt/1.5.11

On Sat, Apr 29, 2006 at 01:37:04PM +0200, Markus Schiltknecht wrote:
> On Sat, 2006-04-29 at 12:15 +0100, Bruce Stephens wrote:
> > Well, you couldn't use *only* backward deltas.  If you've got
> > A->B->C->D and you want to send that to someone who's got A, it's no
> > good just sending the backward deltas D->C, C->B, B->A.  
> 
> Uhm.. why not? Couldn't this 'someone' just apply the backward deltas
> with a 'patch -R' sort of? I begin to question my understanding of a
> 'delta'...

There are basically two styles of delta compression in common use.

The one everyone knows is the sort that "diff" and "patch" handle,
which in the jargon are known as "insert/delete deltas".  What this
means is, they encode a string as a base, plus a list of things to
remove, and things to add.  You can trivially "flip" a delta like that
around -- just turn all the "deletes" into "adds" and all the "adds"
into "deletes".  Such deltas are very expensive to compute (optimal
algorithms exist, but they are very super-linear), so they're best
used for things where they're really suited -- in particular,
they're the Right Thing to use for displaying diffs, and for
calculating textual merges.

The one that is more useful for us is the sort that "xdelta" produces.
These are known as "copy/insert" deltas.  Basically what they are is a
script written in a very simple language, whose two commands are "copy
this byte range from the base version into the output", and "insert
the following bytes into the output".  These are smaller, because
they don't have to include any data that was left out of the new
version, and because they don't have to preserve order.  For example,
if I had a file that looked like AAAABBBB and I changed it to
BBBBAAAA, then my delta can simply have two tiny "copy" instructions.
With a diff, I'd have to list a bunch of the moved lines as if they
were completely new.  In addition, these kinds of deltas are much
cheaper to compute (xdelta takes linear time and space), and it's very
fast to do things like combine a bunch of them together.

But, since they simply leave out any data that was removed from the
old version, there's no way to take such a delta and "flip it around"
like "patch -R" does.

Basically, the first sort is good for representing "what changed".
The second sort is good when what you basically just have a bunch of
files you want to store, that you happen to know have a lot in common
with each other, and you want to apply really really good lossless
compression.

> > You could send the full text of D, and the backward deltas.  Or, I
> > guess, send the constructed A->D and the backward deltas.
> > 
> > That would have the disadvantage that it would be an all-or-nothing
> > operation: if the connection died, or there was some verification
> > error, then you'd end up with nothing.
> 
> ...except if you send the fulltext of D. Which is acceptable for initial
> pulls. In case of a subsequent pull, I guess 'all-or-nothing' is
> acceptable.

We used to have a special case for the initial pull so it would do
this.  It was sort of brokenly implemented, and a bit useless in the
general case anyway.  (The initial pull problem reappears as the "I was
on vacation for a week and now I need the pull the 500 new revisions
that showed up while I was gone" problem, and such hacks don't help so
much for that.)  So possibly worth considering, but maybe better to
consider more general solutions first.

> Plus it would allow to fetch only the most recent history. Often people
> don't care about all the history and only want the most recent revision.
> Monotone should then be able to gracefully handle missing revisions and
> offer to fetch them when needed.

This kind of "partial pull" functionality would indeed be really nice,
but it's _way_ more complicated than just adding some new delta
transmission code to netsync :-).  Definitely worth doing at some
point, but it's a major new piece of functionality, maybe not
something to chase off after while the stuff we _do_ have working is
too slow.

Hope that helps,
-- Nathaniel

-- 
"Lull'd in the countless chambers of the brain,
Our thoughts are link'd by many a hidden chain:
Awake but one, and lo! what myriads rise!
Each stamps its image as the other flies"
  -- Ann Ward Radcliffe, The Mysteries of Udolpho




reply via email to

[Prev in Thread] Current Thread [Next in Thread]