monotone-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Monotone-devel] Change-Version Duality


From: Tobias Oberstein
Subject: [Monotone-devel] Change-Version Duality
Date: Sun, 18 Jul 2004 21:53:21 +0200
User-agent: Mozilla Thunderbird 0.7.2 (Windows/20040707)

I refer to a recent disussion related to the "duality between
change and version" oriented views on evolving data.

Here's context:

 > in general, the more I've discussed monotone with others in recent
 > days, the more clear it has become that we need a slightly (or
 > massively) enhanced ability to denote, select, search, split, combine
 >  etc. edges ("changes") in addition to versions ("states"). it is
 > possible that this can be done synthetically (as we currently do)
 > from the ancestry graph, but it remains to me an open question.
 >
 > indeed, my feeling increasingly is that the *central* problem in VC
 > is dealing with the change / version duality in a seamless manner. by
 >  seamless I mean both from a UI perspective, and from the perspective
 >  of "equally efficient supporting data structures". all systems I
 > know of focus on one view or the other. can we make a system which
 > does both equally well?
 >
 > I'd welcome further thoughts on the matter.
 >
 > -graydon

I did some mind bending exercises and here's my report. So, if that's
useful for the discussion, I'd be happy to hear your comments.

Tobias

------------------------------------
Lets look at a simplified setting where the repository tracks the
evolution of a single unnamed file.

By doing so, we step aside (for the moment) the issues that are
introduced when a repository namespace is a combination of a
hierachical, non-ordered namespace ("file system/hierarchy")
with attached non-hierarchical, linear ordered namespaces ("files").

------------------------------------
I'll compare two data structures:

   CG = "change graph" and
   VG = "version graph",

which definitions will follow immediately. Please have a look at the
attached PDF .. it's probably much more revealing than the description
that follows.

The first page gives 13 example strings S0 - S12. This is the data
we'd like to track. For the moment, ignore the red and blue colorings
on this and the second page.

The second page compares an example CG and a VG which happen to be
equivalent by T. Imagine you had started with S0, modified it for S1
and checked in. To retrieve version 1 you would start at node "a"
(which is the designated start node) in the version graph on the
right, follow whichever edge has the largest number smaller than 1
 before the ":" and write down the text on the edge.

That is, you'd follow these edges:

0: 1, „Ths is\n“
0: 1, „a little\n“
0: 1, „sentence.\n“

For version 2, you'd follow:

0: 1, „Ths is\n“
0: 1, „a little\n“
1: 2, „too stupid.\n“

[remember to ignore the red and blue at this time]

For version 6, you'd follow:

4: 5, „Headline\n“
0: 1, „Ths is\n“
0: 1, „a little\n“
5: 6, „new line 1\n“
5: 6, „new line 2\n“
5: 6, -
2: 3, „farfetched.\n“
3: 4, „No, just\n“
3: 4, „kdd ..\n“

Hope you see how this is done. The crucial point is, that there is
more .. by traversing the version graph, you can not only retrieve
a version but also _construct_ the equivalent change graph!

That is, the one data structure gives a recipe how to construct
the other. In other words, T is defined by transforming a
node-edge-node in the version graph, e.g.

(c) --> 2: 3, „farfetched.\n“ --> (d)

into an edge

(2) --> c: d, "farfetched.\n" --> (3)

Now, what can you do with the change graph? What if we'd like to know
what changes we did in the first 3 evolution steps of our data?

Asking the change graph, we would travel from root to node 3 and
collect the changes as defined by the edges. Doing the same using
the version graph, we would have to traverse potentially the whole
graph, since changes can be arbitrarily scattered over the document
structure. On the other hand, getting all versions of a single
line (better "pinned location" .. se below) is harder using the VG
since the information can likewise be scattered all over the graph.

This is indeed a critical point: CG exposes the data evolution by
"change", whereas VG exposes the evolution by "version".

I'm pretty unsure if the picture and those few lines can provide
you with enough information to get the idea. Maybe it's faster
to get feedback first ..

So thats the "how", but what about the "why" in the first place?

------------------------------------
What follows are three propositions, which hopefully will give
you enough clues to understand _why_ the data structures are
defined how we did.

Proposition 1 (handwaving/unproven)

CG and VG are equivalent in the sense that
  (a) both don't loose any information when recording the evolution
      of data,
  (b) each in itself doesn't contain redundant information and
  (c) CG can be computed from VG using a fixed procedure T and
      VG can be computed from CG using a fixed procedure T^-1
      (T and T* define a bijection between VGs and GCs: T ° T^1 = 1).

That is, the two data structures neither differ in expressivness
nor in space complexity.

Proposition 2 (handwaving/unproven)

CG and VG do however differ in terms of run-time behaviour
they expose when used with algorithms for computing solutions to
problems like
 - get a working copy of the latest version of the repository
 - get a working copy of a very old version of the repository
 - replay a cherry picked sequence of changes to a working copy
 - undo some changes, when other changes were already applied
   in the meantime
   ..

Proposition 3 (handwaving/unproven)

We can define evolution operators on VG and CG, which when
applied to concrete VGs and CGs let the following digram commute:

Assume CG1 \in CG, VG1 \in VG and T(VG1) = CG1

   CG1 --Da --> CG2
    ^            |
    |           T^-1
    T            |
    |            v
   VG1 --Da*--> VG2
------------------------------------

There are a couple of more advanced things left. E.g. the red/blue
colored stuff. What if you see that you introduced a bug when inserting
new lines in the past (see the typo: „Ths is\n“), but committed
other stuff in the meantime. Now of course can you go an fix it
and commit right away. That would result in the blue stuff in the
figures. But what if somebody now wants all your change up to step
4, including your last fix for the typo, but excluding all the
stuff you did in between? You do this by fixing the past. Thats
the red stuff.

Another point is: probably you'll asked yourself what precisely
is represented by the _nodes_ in VG and CG. The nodes in VG
represent "pinned locations between lines" (short: "pins"),
the nodes in GC represent "states" between changes.
The outgoing edge set of some node in CG represents a "change"
and the edges in VG represent "line versions" (short: "lines").

What we did was "line based pinning". We introduced a new pin
for every new line inserted. One can imagine other pinning
methods:
  - diff hunk pinning
  - block structured pinning
     - auto block pinning
     - user block pinning
  - character pinning

With "diff hunk pinning", we introduce new pins only on the
switching points between hunks idenitfied by the diff program.
Or we can version structured blocks, like {}-blocks in C ...
Or we can have a special Editor that allows us to press
"create versioned block .." to pin a location in our source.
We could collect private data members and accessor member functions
in one versioned block. That way, changing the private member
will result in a conflict when s.b. else changed the accessor,
even if the accessors are on completely different lines ..

puh, anybody still there ?? any comments appreciated!

cheers,
Tobias

Attachment: change-version-duality.pdf
Description: Adobe PDF document


reply via email to

[Prev in Thread] Current Thread [Next in Thread]