monotone-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Monotone-devel] merge weirdness...


From: Nathaniel Smith
Subject: Re: [Monotone-devel] merge weirdness...
Date: Wed, 27 Apr 2005 03:54:14 -0700
User-agent: Mutt/1.5.9i

On Tue, Apr 26, 2005 at 11:20:57PM +0200, Richard Levitte - VMS Whacker wrote:
> In message <address@hidden> on Tue, 26 Apr 2005 23:06:02 +0200 (CEST), 
> Richard Levitte - VMS Whacker <address@hidden> said:
> 
> richard> Funny thing is, 'monotone lca' seems to get it right:
> richard> 
> richard>   : ; monotone lca 19342b164e493e3da5de22985e38d9c94a664dcd 
> b34d090b64d6a6c48db0583bc210b448142674d4
> richard>   f188f1efa53fc5e1f7aa1313f5d5af8577a0dac9
> 
> I've now noticed 'monotone lcad', which gives that age old revision as
> an answer:
> 
>   : ; monotone lcad 19342b164e493e3da5de22985e38d9c94a664dcd 
> b34d090b64d6a6c48db0583bc210b448142674d4
>   cdb64688be75821fe5e8187e8ca815cba25d4e85
> 
> Still, I don't currently understand why it's wandering that far back.
> What's the thought behind this behavior?

For merging, monotone does _not_ use the closest common ancestor.  The
reason is that doing so could silently corrupt your code.

The simplest example is the famous "criss-cross merge":
      A
     / \
    B   C
    |\ /|
    | X |
    |/ \|
    D   E
This graph happens when B and C are heads, then two different people
independently merge them.  (There are several real examples in
monotone's history.)  Suppose there is a hunk in B and C which
conflicted.  Suppose further that the person who merged them to create
D thought that B's version of this code was better, and the person who
merged them to create E thought that C's version of the code was
better, so they each just chose their favorite side to "win" that
conflict.

Now we come along to try and re-merge them.  Which ancestor do we pick
to use for the merge?

If we pick B, doing a 3-way merge between D and E using B for an
ancestor, we will calculate that our hunk is unchanged between D and
B, but has changed from B to E.  Therefore, 3-way merge says that the
situation is obvious -- E should win.  If, on the other hand, we pick
C as our common ancestor, we will find that E is identical to C, while
edits have been made to D -- obviously, D should win.  No matter which
LCA we pick, we will silently pick one side to be a winner.  In fact,
this should be a conflict.

To avoid this kind of problem, monotone currently uses the "LCA+DOM",
or "lcad" ancestor selection algorithm -- it chooses the most recent
revision that is a common ancestor of both nodes, and a dominator of
one of them.  In the above case, this would be A, which would be
correct.  This is "correct but conservative", in the sense that it
will never silently ignore a conflict, but it will sometimes give you
conflicts to resolve that you should not need to.

It's been clear for some time that this isn't really sufficient (in
particular, it has really terrible behavior in the common case of
long-lived parallel branches -- it degrades to something almost
CVS-like), but fixing it didn't bubble up to the top of the priority
list, since it seemed to work well enough in practice, for now.  It
appears that "now" has ended; I was wondering a bit the last week
whether the more intensely parallel development on mainline plus a
greater number of feature-development branches was going to make this
problem more obvious.  Between Richard's and Christof's comments, it
seems this is so...

So, let's fix this.

There are some different approaches to doing so:
  -- choosing ancestors per-file: this would dramatically improve many
     cases (though others, on files that are modified often even in
     different branches, like ChangeLog or monotone.texi, would not be
     helped nearly so much).  What I mean here is, suppose that in the
     above example, there was some file that was modified on the A ->
     B edge, but not on the A -> C edge.  Then it _would_ be correct
     to choose B's version of that file as the common ancestor for
     purposes of merging that file, though we still might need to go
     up to A for merging other files.  I.e., for each file, reduce the
     graph to just those nodes in which the given file was modified,
     and then run the standard ancestry selection algorithm on this
     reduced graph.
  -- use a better ancestor selection algorithm:
     -- I'm pretty sure that "if there's a unique LCA, use it;
        otherwise, fall back to LCA+DOM" is a correct algorithm, and
        it should help a lot in the long-lived branches case, where
        propagates are reasonably rare.
     -- It is probably possible to do much better; I have some
        sketches on how one can calculate some kind of optimal safe
        merge ancestor.  They are currently sketches.
  -- implement codeville-merge
  -- implement darcs-merge

Darcs merge is very complicated and not very well understood by anyone
outside the core darcs developers; it isn't clear that even they
understand it as well as they need to yet.  (It seems an open question
whether it can be implemented efficiently and robustly.)  So let's
ignore that one for now.

The easiest thing to implement would be the "unique LCA or LCA+DOM"
ancestor selection.  (Though it is not trivial, because we do not have
a find_unique_common_ancestor_if_exists algorithm, our
alleged LCA algorithm actually finds _any_ least common ancestor; and
sometimes not even that (try "lca e3623ca77d2a8b45817ecaa5b67018c453652830
7d5918d0181f7bb9adba2ee63234146d23bcf83e" in the monotone repo).)
This is only a stopgap measure, however.  Per-file ancestors,
similarly, while very useful, are unlikely to solve the whole problem.  

So, for the long term, that leaves "magic ancestor selection algorithm
possibly hiding in the crannies of Nathaniel's brain" or "codeville
merge".  It isn't _entirely_ clear which of these to prefer.  The
former involves much less of a change to the code, and 3-way merge is
still in some ways better understood than codeville-merge -- for
instance, we know how to present a 3-way merge conflict to user using
a nice 3-way merge tool, so they can see what's going on and resolve
it, while this is less clear with codeville-merge.  codeville-merge
also has some complicated edge conditions, that only Bram and Ross
currently grok (though they seem perfectly willing to talk,
given sufficient time and toits).  The main pre-requisite for
experimenting with codeville-merge is "annotate" functionality; by
coincidence, Emile has just written an unoptimized-but-apparently
working version of "annotate", available on the .annotate branch and
soon in mainline.  So I'm not sure whether it's worth trying to work
out magic ancestor selection at all.

So: short term, if anyone really wants to make this a little less
painful, ULCA-or-LCA+DOM is probably the easy quick way to get some
improvement.  In the long run, let's seriously look at
codeville-merge.  Per-file merge ancestor selection might be useful
even with codeville-merge, just as a way to provide some somewhat
useful temporal context to the user at conflict time.

Thoughts?

-- Nathaniel

-- 
Damn the Solar System.  Bad light; planets too distant; pestered with
comets; feeble contrivance; could make a better one myself.
  -- Lord Jeffrey




reply via email to

[Prev in Thread] Current Thread [Next in Thread]