[Monotone-devel] Blob o' thoughts on converging edges

monotone-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Monotone-devel] Blob o' thoughts on converging edges

From:	Nathaniel Smith
Subject:	[Monotone-devel] Blob o' thoughts on converging edges
Date:	Sun, 25 Jul 2004 02:39:53 -0700

(This message is basically not written to be comprehensible to anyone
but graydon, but hey, maybe not, and public discussion is almost always
a good thing, so I'm cc'ing it to monotone-devel too.)

(Some notation: if P1 and P2 are paths, P1 + P2 is their
concatenation.  If A and B are revisions, A an ancestor of B,
treediff(A, B) is the change in trees accumulated between A and B,
however we end up representing that.  If A, B, ..., C are revisions,
then [A, B, ..., C] is a path from A to C traversing B, ....  If P is
a path, traverse(P) is the change in trees calculated by traversing
the path P.  treediff(A, B) := traverse([A,B]).)

With tree layout change tracking, calculating the changes made between
ancestor and left, ancestor and right, suddenly involves traversing
the graph between them; previously it was dependent only on the actual
revisions chosen, and not the graph itself.  It is highly desirable to
regain this property, i.e., we want the following to hold:

  Proposition *: given two revisions A and B, A an ancestor of B, then
  no matter which path between A and B is traversed to calculate tree
  layout changes, the final representation of those changes should be
  the same.  (Shorter version: treediff(A, B) is well-defined.)

If (*) is possible at all, it will hold iff we are sufficiently clever
at deciding what to put at nodes where edges converge, i.e. merges
(including propagates).  This is the only place where we can make this
work; we cannot detect divergence when it occurs, and at straight-line
changes there's nothing to do.  Therefore (#) is a necessary
condition; the conjecture is that it is a sufficient one:

  Conjecture #: If we start with a DAG in which (*) holds, and merge
  versions B and C to create version D, using ancestor A, then
  choosing treediff(B, D) and treediff(C, D) such that
     traverse([A, B, D]) = traverse([A, C, D])
  will result in a DAG in which (*) holds.

The tricky part here is that when we create a new such node, we create
lots and lots of paths, including ones that terminate before reaching
A -- there can be arbitrary amounts of messiness between A and D.  But
I think with some conditions on how paths compose, e.g. if
  traverse(P1 + P) = traverse(P2 + P) => traverse(P1) = traverse(P2)
holds, then (#) will turn out to be true.  (It may also depend on how
one chooses A, which is an interesting twist, e.g. it may be important
that A dominates at least one of B, C.)  Umm, I guess it also depends
on the tree layout change representation whether satisfying (#) is
even possible.


Here's an interesting example of how properly dealing with converging
edges is important.  A tricky use case Zack gave me is when you have
two branches, and a file has been renamed on one of them, and you want
to keep things that way.  Say you have two different branches, and
they contain files named x and y that were originally the same.

x_branch   y_branch

  A
  |\   this edge says rename(x, y)
  | ----------
  |           \
  B            E
  |            |
  |            |
  C            |
  |\propagate1 |
  | -----------F
  |            |
  |            |
  D            |
   \propagate2 |
    -----------G

If any changes occur to x in between A and C, you want to merge them
into y in F.  Okay, we've recorded the rename, we'll find it when we
traverse [A, E], all is well.  Now we propagate again into G, and
again we want any changes to x between C and D to be merged into G.
But now, naively, there's a problem... the rename happened up on [A,
E], and our merge is only going to trace [C, D], [C, F]; it will know
nothing about the rename.  (And possibly get very confused, since
there's this file y that doesn't exist in the ancestor and was never
added...)  Conclusion: the edge labeled propagate1 must _also_ be
annotated with rename(x, y), as if the rename happened again there.
This causes (*) to be satisfied, and makes this merge work.
Calculating this could be tricky, though; if, say, someone else
renamed x to y on the [B, C] edge, then we would need to _not_
annotate this on the [C, F] edge.


The example that seemed problematic a few days ago with my description
of "copy" is isomorphic to this.  Say on the [A, E] edge you copied
a gcc backend directory to another name, so you could port gcc to a
new chip, and in doing so you inherited a bunch of bugs.  Say also
that we're using the copy-and-edit-original-conflict model.  If
someone edits the original backend on [A, C], then monotone will point
this out when you perform propagate1, and give you the option of
merging those changes into your copied backend.  It then annotates [C,
F] with the same copy operation, and propagate2 will again know about
the copy.  Of course, if you ever propagate back from y branch to x
branch, the copy will exist in both places, and these annotations (and
conflicts) will stop occurring.

This is a principled way of behaving, and there are situations where
it could be incredibly useful and well beyond what other VC tools do;
on the other hand, there are also situations where it could be
incredibly annoying.  So I'm not sure whether I like it or not; maybe
some way to turn it off on a case by case basis, "okay, I'm _really_
diverging these two files now"?  Dunno.

-- Nathaniel

[Prev in Thread]

Current Thread

[Next in Thread]

[Monotone-devel] Blob o' thoughts on converging edges, Nathaniel Smith <=

Prev by Date: [Monotone-devel] netsys and crypto...
Next by Date: Re: [Monotone-devel] bash completion
Previous by thread: [Monotone-devel] netsys and crypto...
Next by thread: Re: [Monotone-devel] bash completion
Index(es):
- Date
- Thread