From: Daniel Carosone
Subject: Re: [Monotone-devel] Re: monotone disapprove does not give correct branch cert
Date: Thu, 27 Oct 2005 19:13:53 +1000
User-agent: Mutt/1.4.2.1i

On Wed, Oct 26, 2005 at 08:59:27PM +0200, Wim Oudshoorn wrote:
> > The only model I see now in branches is that "branch is a set of
> > revisions sharing a branch certificate". I miss the single origin and
> > the continuity :(
> 
> That is just my mental model and it is working perfectly.

yup.

> Below I will outline my mental model of what monotone does
> or should do.  Keep in mind that this is JUST my mental model
> and monotone might do things differently.  I will still talk
> as if monotone works the way I expect, so take it with
> a grain of salt ;-)

I'll try to add some salt from my mental model, then.  Forgive me for
extensive quoting, I don't want to lose your details as context.

> MENTAL MODEL
> ------------
> 
> (1) The revisions are a simple directed acyclic graph.  Not 
>     necessarily connected.  
> 
>     That is:
> 
>         * You have revisions, which are nodes in a graph.
>           A revision corresponds to a collection of files/directories
>           with a certain content. However the content
>           does NOT uniquely identify a revision.

Yes, because the ancestor id is part of the revision, and provides the
edit-graph structure.

>         * You have arrows between revisions.  
>           Such that there is at most one arrow
>           between two revisions, and you cannot
>           return to your starting point when 
>           you walk the arrows.

Yes.  These arrows construct the ancestry graph, which represents the
editorial history of changes. 

They also happen to correspond to deltas that store the actual edits
made across that edge, but that's not important to this view of the
model (you could store each revision in complete form and derive the
diffs later instead, if you wanted).  Furthermore, there are smaller
per-file deltas and graphs that are usually not seen, and are also not
important here.
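
To make the "content does NOT uniquely identify a revision" point
concrete, here is a minimal Python sketch.  It is a toy model, not
monotone's actual revision format: the revision_id helper and its
hashing scheme are assumptions made purely for illustration.

```python
import hashlib

def revision_id(parent_ids, content):
    """Toy revision id: hash of the sorted parent ids plus the tree content.

    Hypothetical helper; monotone's real revision format differs.
    """
    h = hashlib.sha1()
    for pid in sorted(parent_ids):
        h.update(("parent:" + pid + "\n").encode())
    h.update(("content:" + content).encode())
    return h.hexdigest()

root_a = revision_id([], "version one")
root_b = revision_id([], "a different start")

# Identical content reached from different ancestors: two distinct revisions.
child_a = revision_id([root_a], "shared content")
child_b = revision_id([root_b], "shared content")
print(child_a != child_b)

# Identical content reached from the identical ancestor: the same revision.
print(revision_id([root_a], "shared content") == child_a)
```

Because the ancestor ids feed into the hash, the id pins down a place
in the edit graph as well as a tree of files.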

> (2) Now on these revisions you add some extra data, the 
>     certificates.  These certificates are not fundamental
>     for the working of monotone.

Yes.  A common parlance for such things is decoration, or annotation.
Certificates add descriptions or attest to statements about revisions.

> Now the combination with version control for me is the following:
> 
> (A) The branch label (or certificate) is used to group revisions together,
>     in some conceptual group of versions (revisions).  This group
>     has some identity that stays the same over time.  
>     Namely it is a version of the software product you are developing.
>     
>     Note: This label is JUST a convenience label, to make
>     the monotone user interface easy to use.

Yes.

> (B) The edges/arrows between the revisions I use to indicate that
>     one version/revision of the software supersedes another revision.
> 
>     so  Rev 1 ---> Rev 2
> 
>     means I think Rev 2 is better than Rev 1, that is the only thing
>     it does.

Hm.  I'm not entirely certain that even this much interpretation
should be ascribed to a revision.  I think all this shows is that Rev
2 was an edit made to Rev 1.  Good, bad or other interpretations
should be described by certificates - and all may change over time or
be subjective.

> One important thing I want to stress here is the fact that for
> me 'branch' and 'arrow' (being better) are not related at all, they
> are orthogonal properties.

Certainly.

> So I can easily have:
> 
>    Rev 1     ---->  Rev 2 
>    branch A         branch B
> 
> And I still think Rev 2 is better than Rev 1, although they are 
> in different branches.

Not really about the 'better' part, as above. It's just edit
history. Rev 2 might be very much worse, by many criteria. The only
sense of better that really applies is "has had more work done on it",
even if that work ends up being bad work.

It might be the very first step of a rewrite of a subsystem, where all
you've done is rip out the old (working, but a little grotty)
implementation in preparation for replacing it with further work.

You make statements to this effect by decorating the revisions with
certs, especially branch certs.  Firstly, by giving B a name that
indicates its experimental nature - and secondly by *omitting* the A
cert.  This part you clearly have right, as below.

>  EXAMPLE
>    I decided to have my source files for version 1.2.9 of my project 
>    XYZ grouped together under the branch name "XYZ-1.2.9"
>    and the source files for version 1.3.4 of my project 
>    XYZ grouped together under the branch name "XYZ-1.3.4"

A small nit, related to usage and terminology rather than to concepts.
These kinds of names for project versions are usually releases, and
represented by tags, rather than branches.  A branch is often used to
maintain a stable copy of the code for critical fixes, before and
after the actual release point, and those branches are often named
similarly.

My only point here is that to explain these concepts very clearly you
need to use names and examples that very clearly illustrate the
separation between these things, lest users become confused.  It can
take some doing, see http://www.netbsd.org/Releases/release-map.html
for an example of such an explanation.

>    Then it is perfectly possible I have four versions of 
>    source:
> 
> 
>       Rev 1             ----->     Rev 2
>       XYZ-1.2.9                    XYZ-1.2.9
>                     \
>                      \---->   Rev 3
>       Rev 4                   XYZ-1.3.4
>       XYZ-1.3.4
>     
> 
>   With this I make the following statements:
> 
>   Rev 2 is better than Rev 1
>   Rev 3 is better than Rev 1
>    
>   And I can ask monotone the following questions:
> 
>   - What is the best version of XYZ-1.2.9
>     
>     monotone update/checkout --branch=XYZ-1.2.9
> 
>     and it will answer: Rev 2
> 
>   - What is the best version of XYZ-1.3.4
>     
>     monotone update/checkout --branch=XYZ-1.3.4
> 
>     and it will answer:  Huh, don't know, could be Rev 3 or Rev 4.
>     (actually it will complain about multiple heads)

Spot on.
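
The two questions above come down to finding the heads of a branch.
A rough Python sketch, assuming a head is a revision that carries the
branch cert and has no descendant carrying the same cert (the
descendants and heads helpers are hypothetical, not monotone's code):

```python
def descendants(children, rev):
    """All revisions reachable from `rev` by following the arrows."""
    seen = set()
    stack = [rev]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def heads(children, branch_certs, branch):
    """Revisions in `branch` with no descendant that is also in `branch`."""
    members = {r for r, names in branch_certs.items() if branch in names}
    return sorted(r for r in members
                  if not (descendants(children, r) & members))

# The four-revision example quoted above.
children = {"Rev 1": ["Rev 2", "Rev 3"]}
branch_certs = {
    "Rev 1": {"XYZ-1.2.9"},
    "Rev 2": {"XYZ-1.2.9"},
    "Rev 3": {"XYZ-1.3.4"},
    "Rev 4": {"XYZ-1.3.4"},
}

print(heads(children, branch_certs, "XYZ-1.2.9"))  # one head
print(heads(children, branch_certs, "XYZ-1.3.4"))  # two heads: ambiguous
```

One head means update/checkout has an unambiguous answer; more than
one is the "multiple heads" complaint.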

>   END EXAMPLE
> 
> Now with grouping most people have experience, it is what
> we do all the time:
>    
> * These files are my sources for XYZ
> * These files are my sources for ABC
> * These files are my sources for ABC with experimental feature QQ
> 
> so these are prime candidates for branches.  Just to reiterate, 
> it is not needed to make them branches, however why would you
> want to fight the nice classification system monotone has?

Yes indeed :)

> The more interesting question, where do you want to put arrows?
> As stated above, arrows just tell monotone which version you 
> prefer.   And instead of you telling monotone explicitly which version
> is better, monotone makes an educated guess:

No. Well, yes, but not *quite* like you describe. Read on..

> * If you started working on Rev 1 and after a few
>   changes store the new version in monotone as Rev 2.
> 
>   Monotone knows you started with one and will assume
>   that all your hard work was an improvement on Rev 1,
>   you would not deliberately make it worse would you?
> 
>   so it will happily add an arrow:
> 
>   Rev 1 ---> Rev 2.

As above, this arrow represents purely the edit history.  However,
monotone does assume you prefer the new version, and does make an
educated guess about what kinds of statements you want to add about
this new version. It does this by adding another branch cert the same
as the branch you asked for in the checkout by default, unless you
tell it otherwise.

Let's pretend it didn't. Then commit and approve would be two separate
steps.  If they were, it would be more work and less convenient for
users, but it would make the distinction between ancestry and branch
certification more evident too.
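
A small Python sketch of that thought experiment, with commit and
approve as separate operations and the usual behaviour built on top
of them.  The Repo class and its method names are hypothetical, chosen
only to illustrate the split; they are not monotone's internals.

```python
class Repo:
    """Toy repository: ancestry edges and certs are stored separately."""

    def __init__(self):
        self.parents = {}        # revision name -> list of parent names
        self.branch_certs = {}   # revision name -> set of branch names

    def commit(self, rev, parents):
        """Record edit history only: a node plus edges to its parents."""
        self.parents[rev] = list(parents)
        self.branch_certs.setdefault(rev, set())

    def approve(self, rev, branch):
        """Attach a branch cert: a statement about an existing revision."""
        self.branch_certs[rev].add(branch)

    def commit_on_branch(self, rev, parents, branch):
        """The convenient default: commit and approve in one step."""
        self.commit(rev, parents)
        self.approve(rev, branch)

repo = Repo()
repo.commit_on_branch("Rev 1", [], "A")

# Two-step version: Rev 2 is in the graph but on no branch yet.
repo.commit("Rev 2", ["Rev 1"])
uncertified = repo.branch_certs["Rev 2"] == set()

# Only the separate approve step puts it on branch A.
repo.approve("Rev 2", "A")
print(uncertified, repo.branch_certs["Rev 2"])
```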

It also gets at the heart of what disapprove should mean; more (rather
a long way) below.

> * Suppose you end up in the situation that you have 
>   two revisions Rev 1 and Rev 2.  And like them both, 
>   so you want to combine all the good of Rev 1 with Rev 2.
>   
>   How do you do that?
> 
>   Well that is what monotone merge/propagate is for.
>   merge/propagate does two things:
> 
>   * It will help you combine the sources from Rev 1 and Rev 2
>     to create a new set of sources Rev 3.  So we have
> 
>     Rev 1
>                   Rev 3
>     Rev 2
> 
>   * It will mark Rev 3 as superior to Rev 1 and Rev 2, so it     
>     will add the following arrows:
> 
>     Rev 1 ----->
>                  Rev 3
>     Rev 2 ----->

Provided Rev 1 and Rev 2 have a common ancestor somewhere back up the
edit graph, yes.  It's worth a discussion about the differences
between merge and propagate, with respect to how they relate to branch
certs and which revisions they will pick for 1 and 2, but without this
in the diagram, yes they look the same.

That discussion is a little like the commit discussion about 'educated
guesses' above.  Monotone provides a convenience to developers by
making educated guesses about which divergences in the graph should be
merged quickly, and which should be allowed to diverge further, based
on branch certs.

> NITPICKING RANT
> ---------------
> 
> I see the whole monotone graph as an expression of what 
> I think are good revisions, the revisions I care about.

Same point again about goodness.  You should decide what you care
about based on the certs as selectors for subsets of the graph.

> So for me personally I like maximal freedom of how
> I arrange my monotone graph, and I want the graph
> to be as explicit as possible.  So with propagate
> I prefer always a new node to be added.
> 
> Example:
> 
>         Rev 1          Rev 2   
>         branch A  ---> branch B 
> 
> Now propagate B A
> the current behavior is:
> 
>         Rev 1          Rev 2
>         branch A  ---> branch B
>                        branch A
> 
> Which completely loses the meaning that propagate has for me.

Either because you're not showing enough of the graph or you're
showing a completely degenerate case for propagate!

In the example you showed before the propagate, there are several ways
we could complete the graph:
 
 * It's complete. Rev 2 has only Rev 1 as an ancestor, and is the only
   rev with cert b:B.

 * Branch B is unmerged: Rev 2 adds a new head, and there are other
   revisions elsewhere in the graph with changes against an earlier
   common ancestor.  These would all have to be merged, and then the
   propagation back would bring other changes for a net new revision.

 * Branch A has more revisions, and a head somewhere further down the
   graph than Rev 1, with more changes along those edges.  The diffs
   from Rev 1 to Rev 2 would be applied to that head rev instead,
   creating a new revision with that head and Rev 2 as ancestors.

 * Both of the above 2 completions of the graph are necessary, or
   something even more complex in terms of more intermediate merges has
   to happen.

You do your propagate, and illustrate that it's the first case you had
in mind. This is degenerate; propagate takes the changes from Rev 1 to
Rev 2, and applies them to Rev 1.

Monotone already knows that in terms of checkout contents, this
produces Rev 2, and just adds the cert.
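
As a rough Python sketch of the cases above: when the destination
head is already an ancestor of the source head, there is nothing to
merge and only a cert is added.  The is_ancestor and propagate helpers
are hypothetical and only classify the situation; they do not perform
a real merge.

```python
def is_ancestor(parents, ancestor, rev):
    """True if `ancestor` is reachable from `rev` by following parent edges."""
    stack, seen = [rev], set()
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p == ancestor:
                return True
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return False

def propagate(parents, branch_certs, src_head, dst_head, dst_branch):
    """Classify a propagate of src_head onto dst_head (toy model)."""
    if src_head == dst_head or is_ancestor(parents, src_head, dst_head):
        return "nothing to do"      # destination already has these edits
    if is_ancestor(parents, dst_head, src_head):
        # Degenerate case: the result would have exactly the contents
        # of src_head, so just certify it.
        branch_certs[src_head].add(dst_branch)
        return "added cert"
    return "merge needed"           # true divergence: a new revision

# The degenerate example: Rev 1 (branch A) ---> Rev 2 (branch B).
parents = {"Rev 2": ["Rev 1"]}
certs = {"Rev 1": {"A"}, "Rev 2": {"B"}}
result = propagate(parents, certs, "Rev 2", "Rev 1", "A")
print(result, sorted(certs["Rev 2"]))

# Branch A has moved on to Rev 3: now a real merge is required.
parents2 = {"Rev 2": ["Rev 1"], "Rev 3": ["Rev 1"]}
certs2 = {"Rev 1": {"A"}, "Rev 2": {"B"}, "Rev 3": {"A"}}
result2 = propagate(parents2, certs2, "Rev 2", "Rev 3", "A")
print(result2)
```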
 
>        Rev 1       --------->  Rev 3
>        branch A                branch A
>                  \        /
>                    Rev 2   
>                    branch B
> 
> captures the meaning a lot better.  Because Rev 3 has
> two incoming edges you know it comes from a merge/propagate!

Your point is that the original edits, and the propagate command, even
if they make the same changes to Rev 1, are different occasions on
which that change was made. You can't have two edges directly between
Rev 1 and Rev 2, so you need a new rev (with the same contents) to
represent the different editing commands that occurred.

Another way to look at this: there's no merge happening, because
there's no divergence in the revision graph - only in the metadata
certs.  I'm honestly not sure what to make of this. Does it represent
a flaw vs the way monotone *should* work, to support a clearer mental
model?

Perhaps you only want both developer commands to be represented as
edges in the edit graph, because you see edges as representing
increasing goodness?  Even without the goodness distinction, perhaps
just the work is worth a new revision? Or perhaps just adding the
branch cert is right, because it represents a decision about the new
revision, rather than any further change?

It's a great thought provoker.  I'd like you to consider what it looks
like when changes arrive from another developer, who made exactly the
same edits from Rev 1 (they were clearly a good idea, after all), but
committed on branch A only - or on branch C. 

That seems like the best way to answer the question, to me.  By the
definition of a revision (content+ancestor) the new arrival is also
the same revision, Rev 2, though it carries both developers' log
messages and other certs.  The edge represents the same change, no
matter how many ways and times it was made.  We don't make null merges
as new revisions in this case. Metadata is separate, precisely so this
works. If you want to represent the 'running a propagate command'
event for posterity, use a metadata cert to do so.
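
A toy Python sketch of that argument: two developers who make the
same edit to the same parent produce the same revision id, and their
certs simply accumulate on it.  The hashing scheme, the commit helper
and the signer names are all hypothetical, not monotone's actual
formats.

```python
import hashlib

def revision_id(parent_ids, content):
    """Toy revision id derived from ancestors plus content."""
    h = hashlib.sha1()
    for pid in sorted(parent_ids):
        h.update(("parent:" + pid + "\n").encode())
    h.update(("content:" + content).encode())
    return h.hexdigest()

certs = {}  # revision id -> list of (signer, cert name, value)

def commit(signer, parent_ids, content, branch, log):
    """Compute the revision id, then attach certs to it separately."""
    rid = revision_id(parent_ids, content)
    certs.setdefault(rid, [])
    certs[rid].append((signer, "branch", branch))
    certs[rid].append((signer, "changelog", log))
    return rid

rev1 = commit("alice", [], "original", "A", "initial import")
rev2_alice = commit("alice", [rev1], "edited", "B", "try the fix on B")
rev2_bob = commit("bob", [rev1], "edited", "C", "same fix, made independently")

# Same content and same ancestor: one revision, no null merge needed.
print(rev2_alice == rev2_bob)

# The metadata from both developers lives alongside it.
branches = {value for (_, name, value) in certs[rev2_alice] if name == "branch"}
print(sorted(branches))
```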

> Furthermore, sometimes it would be nice to have an
> empty commit because of the arbitrary meta data you can attach:            
> 
>     Rev 1        
>     branch A      
>     test failed  
> 
> After fixing the test, because it was a bug in the tests I want:
> 
>    Rev 1                     Rev 2
>    branch A        ---->     branch A
>    test failed               test succeeded
> 
> 
> In this case the arrow more or less indicates that:
> Rev 2 is better than Rev 1 because its meta data is more accurate.
> If you make the case that if meta data is an important part of your
> revision you should include it in the revision and not use meta data,
> I would tend to agree.  However, things like test results are only
> known AFTER a revision exists, so they cannot be included in the revision.

Here I disagree at least with the example, for several reasons.

 * test results are metadata, as you say, and can be added any time
   later.  A long time later.  Even for tests that *didn't exist* at
   the time the revision being tested with a result was created.
 * if you change the test, previous results might now be invalid, or
   at least misinterpreted; you have a new test and you should
   probably use a new key for it or find some other way to version the
   tests.

One partial solution to the latter is to store the tests with the
code, and always test with the test revisions that are part of the
tested revision. If you're changing the tests you need to commit a new
revision that represents that change.

That only goes so far. There are external tests. There are new tests
for which you're interested in the results against old revisions,
perhaps to see where a flaw was introduced, or to deprecate the buggy
versions. There are build test failures that are caused by external
factors or bugs in build tools.

There are several discussions getting conflated here:
 * can I make a null commit (a revision with a different id because of
   the different ancestor, but no content delta - no edge?), just
   because it might be useful for various unanticipated purposes.

 * can or how do we do metadata/cert revocation?

 * and, finally, at long last, back to the original subject, does
   'disapprove' mean branch-cert revocation, or does it mean 'revert
   changes'?

Disapprove is presently used for revert changes, a little like
merge/propagate: the user is asking monotone to cleverly select
changes from the revision graph and apply them in specific ways to a
new revision. We've clarified and fixed it for this purpose, including
the metadata it guesses it should add. Either it or approve is clearly
misnamed, since approve adds a branch cert.

I can also see the need for cert revocation, or something like it.[*]
Perhaps for branch certs (though not so much with their common usage
mostly-aligned with the editing graph), more readily for other cert
types: bad log messages, test results, tags, etc.

> Hm, this e-mail is a lot longer than I anticipated, next time I will
> try to be short and clear :-)

Eerr.. yeah.

--
Dan.

[*] I am not going to put the wild-assed idea I just had on this into
words here. I may post it in njs' wild-assed idea thread, later.
Right now, I clearly need to stop thinking.
