Re: [Monotone-devel] RFC: CVS sync design


From: Nathaniel Smith
Subject: Re: [Monotone-devel] RFC: CVS sync design
Date: Fri, 24 Dec 2004 00:44:12 -0800
User-agent: Mutt/1.5.6+20040907i

On Thu, Dec 23, 2004 at 09:41:36AM +0100, Christof Petig wrote:
> Here is the current design I intend to realize to get CVS sync working:

Cool, I think this would be really useful for a lot of people.

> Do not store state information in both trees 
> so that syncing with several CVS servers is possible.

I don't understand the connection between these statements.  We could
certainly store in Monotone one cert that said "this revision
corresponds to a checkin in CVS repo Foo, whose state was ...", and
another that said "this revision etc. in CVS repo Bar, whose state was
...".  Storing state seems like it could significantly reduce
complexity, be very useful for later spelunking (cf. how subversion's
CVS->SVN code stores things like CVS version numbers as metadata, just
in case they're useful later), and I don't see any immediate
drawbacks...

> Preserve 
> changelog and timestamp of every change.

And author?  Or does CVS not let you set that?  (Monotone will let you
set the author field on commits to arbitrary strings, so _that's_ no
problem.)

> syntax:
> monotone pull [--branch foo] cvs://localhost/usr/local/cvsroot 
> module[:branch]

I would strongly prefer that this functionality not overload the
meanings of push/pull/sync.  Synchronizing with a CVS repository is
a significantly different process than synchronizing with another
Monotone repository.  Maybe cvs_pull/cvs_push or something?

> this most closely resembles current monotone standards and should mean:
> take CVSROOT "localhost:/usr/local/cvsroot" (via CVS_RSH) [which covers 
> 95% of my uses] and import the (branch "branch" of) module "module" into 
> monotone branch "foo". I'm not quite sold to this syntax, it simply 
> seemed most similar to the existing sync/commit syntax. Standard for cvs 
> branch is "HEAD", for a _reasonable_ standard for the monotone branch I 
> have no idea. Perhaps "module:branch" (which is nonstandard for 
> monotone) though localhost.usr.local.cvsroot.module.branch would more 
> adherent to current policies.

Explicit is better than implicit.  I think we should just make the
user specify the desired correspondence between Monotone and CVS
branches; the two systems are different enough that there's really no
good way to guess.  (I'd even be fine with requiring them to type HEAD
when they wanted the head branch, rather than defaulting.)

> pull:
> Issue an rlist -Red on the server and note the current state [current 
> cvs-manifest*], the highest timestamp is the timestamp of the latest 
> commit. Get it's changelog [and a list of branches] via rlog 
> -rlatest_revision file.
> 
> Now look whether a matching revision (timestamp+changelog) is found in 
> the monotone database. [Optionally compare every file of the cvs 
> manifest and the monotone manifest by checksum (I don't know yet how to 
> get a checksum without a patch from cvs (update gives patch and 
> checksum), perhaps I can trick update to tell me the (md5) checksum of 
> an existing RCS revision by asking for updating a head revision)]
> 
> If no matching revision is found go back in history (rlist (RCS revision 
> by date) or rlog (changelog by date/revision)) [building it on the fly].
> >how to _most_efficiently_ get the full set of recent commits for every 
> file?< [Perhaps: take the latest timestamp and ask for the version 
> before (with log), take the second latest timestamp and repeat] Repeat 
> until we find a matching cvs-manifest and monotone revision (see above). 
> Go forward in history and (request the patch(es) for every cvs-manifest 
>  and commit to monotone) until we reach the head.

This seems like the time when keeping some state would be really
handy.  What about having a cert that says "this revision corresponds
to the following files in CVS repository ___:
  file1  1.3
  file2  1.8
..."
(I guess a problem here is what namespace to use for CVS repositories.
I guess I don't have any useful intuition here, since I can't even
think of a situation where one would want to synchronize with two
different CVS repos...)
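As a rough sketch of what such a cert's payload might look like, here
is one way to serialize and read back the file/revision list.  The
"repository" header line and the function names are assumptions made
up for illustration; nothing here is an existing Monotone format.

```python
def format_cvs_cert(repo, files):
    """Build a cert payload: a (hypothetical) repository header line,
    then one 'path  revision' line per file, sorted by path."""
    lines = [f"repository {repo}"]
    lines += [f"{path}  {rev}" for path, rev in sorted(files.items())]
    return "\n".join(lines)


def parse_cvs_cert(text):
    """Inverse of format_cvs_cert: return (repo, {path: revision})."""
    header, *rows = text.splitlines()
    repo = header.split(" ", 1)[1]
    # Split on the last run of whitespace so paths may contain spaces.
    files = dict(row.rsplit(None, 1) for row in rows if row.strip())
    return repo, files
```

Naming the repository inside the payload is one way to sidestep the
namespace question for now: two certs on the same revision can name
two different repositories.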

Then the pull operation becomes:
  1) traverse up from the branch heads until we find such a cert.
     (If we don't find such a cert, then we start from the beginning.)
  2) having found such a cert, we simply request deltas forward from
     each file revision mentioned in the cert up to the revisions at
     the current tip of the CVS branch.
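The two steps above can be sketched roughly as follows.  The revision
graph is modelled as a plain dict from revision id to parent ids, and
certs as a dict from revision id to a {path: cvs_revision} mapping;
both representations, and the function names, are assumptions for
illustration rather than Monotone's real data structures.

```python
from collections import deque


def find_sync_point(heads, parents, cvs_certs):
    """Step 1: walk up from the branch heads, breadth first, and return
    the first revision carrying a CVS cert, or None if there is none
    (in which case the import starts from the beginning)."""
    seen = set(heads)
    queue = deque(heads)
    while queue:
        rev = queue.popleft()
        if rev in cvs_certs:
            return rev
        for parent in parents.get(rev, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return None


def deltas_to_request(cert_files, cvs_tip):
    """Step 2: for each file at the CVS tip, return the
    (known_revision, tip_revision) range to fetch.  known_revision is
    None for files the cert has never seen.  Files removed from CVS
    are not handled in this sketch."""
    wanted = {}
    for path, tip_rev in cvs_tip.items():
        known = cert_files.get(path)
        if known != tip_rev:
            wanted[path] = (known, tip_rev)
    return wanted
```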

> The push command will be an alias to sync because to check into a CVS 
> repository you need to have an up to date copy of it. [As we surely all 
> know ;-)]

I'd rather not have a 'sync', and instead have 'push' fail if commits
have occurred since the last 'pull'.  This
  - matches the normal CVS semantics for update/commit
  - is much less surprising than having 'push' actually do a 'pull'
And 'sync' isn't useful anyway, because when you do a 'pull' and then
immediately do a 'push', at least one of them will always be a no-op.
(If the 'pull' is a no-op, the 'push' will succeed; if the 'pull'
actually pulls a new revision, then there will be nothing for 'push'
to do, because that revision will have no children to be pushed.)

> push/sync:
> Take the latest monotone revision matching the latest cvs-manifest and 
> try to go forward from there (unless we reach a point where two 
> descendants are valid). Commit the resulting manifest (asking the CVS 
> server for patches).
>
> Once we reach a fork point where multiple descendants are valid present 
> the user the descendants in a heads like manner and ask him to specify 
> which fork to take for head (using an option to the next sync command)

To clarify this, I think, and suggest something _slightly_ different,
here's my version of push:
  - find the latest revision that corresponds to a cvs-manifest
  - check to see whether that cvs-manifest is the tip of the branch we
    wish to sync with; if not, error out, telling the user to perform
    a pull and do some merging
  - now pick a child of that revision, commit it to the CVS server,
    and recurse
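A minimal sketch of that push loop, written iteratively rather than
recursively.  The names (push, OutOfDate, pick_child, commit_to_cvs)
and the dict-based graph are hypothetical; the actual CVS commit and
the choice of child are passed in as callbacks so the sketch stays
neutral on both.

```python
class OutOfDate(Exception):
    """CVS has commits that have not been pulled yet."""


def push(sync_rev, synced_manifest, cvs_tip_manifest,
         children, pick_child, commit_to_cvs):
    """Commit descendants of sync_rev to CVS, one revision at a time.

    sync_rev is the latest revision that corresponds to a cvs-manifest,
    synced_manifest is that manifest, and cvs_tip_manifest is what the
    server currently has.  Returns the list of revisions committed."""
    if synced_manifest != cvs_tip_manifest:
        raise OutOfDate("CVS has new commits; pull and merge first")
    pushed = []
    rev = sync_rev
    while children.get(rev):
        rev = pick_child(rev, children[rev])
        if rev is None:  # the strategy declined to go further
            break
        commit_to_cvs(rev)
        pushed.append(rev)
    return pushed
```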

The only tricky part is choosing the children to commit; this is the
old 'pick a distinguished linear subbranch' problem.  Some strategies:
  a) pick randomly
  b) let the user choose the revision to end up with, and pick a
     random path to get there
  c) recurse only so long as there is a linear path to follow, and
     then stop when we reach the first fork
  d) check ahead to see whether there are any forks, and if there are,
     abort early and tell the user to specify explicitly which
     revision they want to push to the server (this is similar to
     monotone's 'update' command).  There must be a unique (linear)
     path from the CVS tip to that revision.
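For (d), the "unique (linear) path" check could look something like
the sketch below.  It assumes the revision graph is an acyclic dict
from revision id to child ids; unique_path is a made-up name.  The
search is naive and may revisit shared subgraphs, so treat it as an
illustration of the condition, not an efficient implementation.

```python
def unique_path(tip, target, children):
    """Return the single path from tip to target as a list of
    revisions, or None if there is no path or more than one."""
    paths = []

    def walk(rev, trail):
        if len(paths) > 1:  # already ambiguous, stop searching
            return
        trail = trail + [rev]
        if rev == target:
            paths.append(trail)
            return
        for child in children.get(rev, ()):
            walk(child, trail)

    walk(tip, [])
    return paths[0] if len(paths) == 1 else None
```

A push under strategy (d) would abort whenever this returns None and
ask the user to name a different revision.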

It seems like some desirable properties are:
  1) the user doesn't have to do n pushes to send n revisions to the
     server.  (So you want to push whole chunks of the graph at once,
     at least sometimes.)
  2) you want to be able to specify which revision ends up as the CVS
     tip
  3) you want to be able to specify exactly which revisions are
     committed (i.e. both which revision ends up as the CVS tip, and
     which path is taken to get there)

I think in practice (b) is best.  The only advantage of (c)/(d) over
(b) is that they force you to specify the exact intermediate revisions
to commit, i.e. they prioritize (3) over (1).  In most cases, though,
most people won't care exactly which revisions are committed, so long
as you end up with a branch tip that has all the changes in it.  I.e.,
(1) is more important than (3).  So (b) is better than (c)/(d).
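To make (b) concrete, here is a sketch of picking a random path from
the CVS tip to a user-chosen revision.  As before, the graph is
assumed to be an acyclic dict from revision id to child ids, and
random_path is a hypothetical name.  At each step it only considers
children from which the target is still reachable, so it never walks
into a dead end.

```python
import random


def random_path(tip, target, children, rng=random):
    """Return a randomly chosen path [tip, ..., target].
    Raises ValueError if target is not a descendant of tip."""
    can_reach = {}

    def reaches(rev):
        # Memoized: is target reachable from rev?
        if rev not in can_reach:
            can_reach[rev] = rev == target or any(
                reaches(child) for child in children.get(rev, ()))
        return can_reach[rev]

    if not reaches(tip):
        raise ValueError("target is not a descendant of the CVS tip")
    path = [tip]
    while path[-1] != target:
        options = [c for c in children.get(path[-1], ()) if reaches(c)]
        path.append(rng.choice(options))
    return path
```

Passing a seeded random.Random instance as rng makes the choice
reproducible, which could be handy for testing.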

This still leaves the question of, if there's more than one head and
the user doesn't specify which one to end up with, do we abort and
force the user to pick one, or do we pick one randomly?

-- Nathaniel

-- 
"But suppose I am not willing to claim that.  For in fact pianos
are heavy, and very few persons can carry a piano all by themselves."



