monotone-devel
[Monotone-devel] The dark side of content addressing


From: Nathaniel Smith
Subject: [Monotone-devel] The dark side of content addressing
Date: Wed, 1 Mar 2006 02:40:31 -0800
User-agent: Mutt/1.5.11

I just thought of a fun edge condition!

Imagine:
  Alice and Bob, separated by years and continents, both decide to
    start a project under monotone.  Because they like to do one thing
    at a time, they first create an empty directory, initialize it as
    a new monotone project, commit that, and then start adding content
    files.
  Chuck and Daria both hear about a cool new framework, Beryl on
    Bars, and decide to try it out.  It's so very cool that it
    includes a project skeleton generator.  (But not quite cool enough
    for the skeleton generator to insert their project name into the
    generated files.  Or something.  It's an example.)  Being
    methodical types with excellent taste in VCSes, the first thing
    they do with their generated skeleton is check it into monotone.
  Erin notices that every time she starts a new little project, she
    has to load in the same files over and over again.  So she makes a
    little template directory that contains COPYING, a ChangeLog
    stub, etc., and whenever she starts a project she begins by
    copying this stuff into a new directory and committing it.

All of these cases have an interesting thing in common -- several
otherwise independent projects happen to have identical first
checkins.  This has a surprising effect: identical content, plus
identical histories (since they, well, _have_ no history) means that
their initial revision id will be the same.  This means that we
absolutely cannot distinguish this case from one where in fact only
one project is created, which immediately begins diverging in several
directions.
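
To make the collision concrete, here is a minimal sketch.  It assumes
a revision id is simply a SHA-1 over the parent id plus the tree's
paths and content hashes; revision_id() and the file names are
hypothetical and this is not monotone's actual revision format.

```python
import hashlib


def revision_id(parent_id: str, files: dict) -> str:
    """Hypothetical revision id: SHA-1 over the parent id and the tree.

    `files` maps path -> bytes.  This is an illustration only, not
    monotone's real revision encoding.
    """
    h = hashlib.sha1()
    h.update(parent_id.encode() + b"\0")
    for path in sorted(files):
        content_hash = hashlib.sha1(files[path]).hexdigest()
        h.update(path.encode() + b"\0" + content_hash.encode() + b"\n")
    return h.hexdigest()


# Alice and Bob each commit an empty tree as their first revision.
# A root revision has the empty string as its parent id.
alice_root = revision_id("", {})
bob_root = revision_id("", {})

# Same content, same (empty) history: the ids are identical.
print(alice_root == bob_root)  # True
```

Nothing in the inputs distinguishes Alice from Bob, so nothing in the
output can either.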

This means that the "independent" projects have, as far as monotone is
concerned, identical objects -- in the sense of the logical identities
monotone tracks for renames and the like!

Suppose Erin later decides one of her projects should be imported into
another as a subdirectory -- the new merge_into_dir command's use
case.  It won't work!  The sub-project and main project share a root
directory.  Or if Alice and Bob finally overcome their communication
issues and decide to do something similar, the same thing will happen.
They find this unexpected and confusing!

The root issue (so to speak) seems to be this: if two people make
identical changes starting from the same code base, most people's
intuitions are that these are "the same change" -- the interesting
cases are things like two people applying the same patch, or two
people doing the same merge using the same merge algorithm.
Collapsing such changes is not a hugely important feature (since you
have to get lucky for it to happen), but it is a feature nonetheless,
and quite nice in some situations.

However, if two people run 'setup' independently, that's a much more
forceful act -- I think most people's intuitions are that every run of
'setup' should probably create a new, unique history line.  (Which is
one reason why 'setup' is misnamed, but anyway, UI overhaul is for
_next_ month.)

The solution that leaps to mind is to salt graph root revisions --
stick 160-give-or-take bits of entropy in each root revision.  (These
already have special syntax, for that matter -- their parent revision
has the unusual hash, "".)  The invariant that this preserves is:
  -- if people have had no communication, direct or indirect, then
     their node_id's shall be distinct
while people who actually are working against the same code base (in a
stronger sense, now -- that they've actually shared the same code,
not just in content but in having a common origin) may overlap and
even "commit the same change".
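
A rough sketch of the salting idea, under the same assumption as
before that a revision id is a SHA-1 over its inputs.  The helper
names (revision_id, new_root, commit) are made up for illustration,
and the 20-byte salt is just the "160-give-or-take bits" mentioned
above.

```python
import hashlib
import os


def revision_id(parent_id: str, files: dict, salt: bytes = b"") -> str:
    """Hypothetical revision id over parent id, salt, and tree contents."""
    h = hashlib.sha1()
    h.update(parent_id.encode() + b"\0")
    h.update(salt.hex().encode() + b"\0")
    for path in sorted(files):
        content_hash = hashlib.sha1(files[path]).hexdigest()
        h.update(path.encode() + b"\0" + content_hash.encode() + b"\n")
    return h.hexdigest()


def new_root(files: dict) -> str:
    """Root revisions (parent id "") get 160 bits of fresh entropy."""
    return revision_id("", files, salt=os.urandom(20))


def commit(parent_id: str, files: dict) -> str:
    """Non-root revisions are unsalted, so identical changes still collapse."""
    return revision_id(parent_id, files)


skeleton = {"README": b"stub\n"}

# Independent setups no longer collide, even with identical content.
alice_root = new_root(skeleton)
bob_root = new_root(skeleton)
print(alice_root == bob_root)  # False

# Two people applying the same patch on top of a shared root still
# produce "the same change".
patched = {"README": b"stub, fixed\n"}
print(commit(alice_root, patched) == commit(alice_root, patched))  # True
```

Only the root is salted; everything downstream inherits its
uniqueness through the parent id.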

For comparison, consider a modified version of one of the above
examples.  Suppose Erin keeps her template directory again, but now,
like any reasonable person, keeps it in monotone!  Every once in a
while she tweaks it and commits a new version (changing her email in
the stub README file, or whatever).  Now, when she starts a new
project she has two options.  Option 1 is to do as above -- copy the
files into a new directory, and import it from scratch.  This loses
the history of the template files (presumably not too big a loss), and
allows projects to be merged together -- since the presence of our
highest-quality import salts means that they have logically distinct
root dirs, separate README files, etc.  Option 2 is to start new
projects by branching them off of the template branch.  This has the
advantage that if she makes changes to the template, she can directly
propagate those changes through to her projects; propagate always
passes edits between logically identical objects, and branching
preserves logical identity.  The disadvantage is that if she wants to
combine these projects, she can't do that directly -- they're branched
from the same thing, and so propagate will want to pass edits between
the shared pieces, just like before.[1]
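
The difference between the two options can be sketched with a toy
model of logical identity.  It assumes each file or directory is
identified by the (salted) root it was born in plus its path; that
model, and the function names import_from_scratch, branch_from and
can_merge_into_dir, are inventions for this example rather than how
monotone actually represents nodes.

```python
import os


def import_from_scratch(paths):
    """Option 1: a fresh import.  A random value stands in for the
    salted root revision, so every node gets a new identity."""
    root = os.urandom(20).hex()
    return {path: (root, path) for path in paths}


def branch_from(template_nodes):
    """Option 2: branching preserves the template's node identities."""
    return dict(template_nodes)


def can_merge_into_dir(main_nodes, sub_nodes):
    """Putting one tree under another only works in this model if the
    two trees share no logical node (e.g. not the same root dir)."""
    return set(main_nodes.values()).isdisjoint(sub_nodes.values())


paths = ["", "README", "COPYING"]  # "" is the root directory
template = import_from_scratch(paths)

copied_a = import_from_scratch(paths)
copied_b = import_from_scratch(paths)
branched_a = branch_from(template)
branched_b = branch_from(template)

print(can_merge_into_dir(copied_a, copied_b))      # True
print(can_merge_into_dir(branched_a, branched_b))  # False
```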

The nice thing about this is that behavior in both options now makes
sense entirely within the model we present to users :-).

I'm curious what people think of this. It seems like a pretty
fundamental point, but I think it's totally new to the discussion.
OTOH, if we _do_ decide to salt initial revisions, that's another
revision format change, so we might want to sneak it in before 0.26...

-- Nathaniel

[1] This may be a nice use case to chew on for discussions of "copy"
functionality -- I think it hits a few of the situations where
defining copy's proper behavior is quite tricky.  That's for another
time, though...

-- 
The Universe may  /  Be as large as they say
But it wouldn't be missed  /  If it didn't exist.
  -- Piet Hein
