Re: [Monotone-devel] [PATCH] cvs_import connecting branches


From: Nathaniel Smith
Subject: Re: [Monotone-devel] [PATCH] cvs_import connecting branches
Date: Sat, 18 Feb 2006 16:38:02 -0800
User-agent: Mutt/1.5.11

On Fri, Feb 17, 2006 at 04:30:46PM +0100, Markus Schiltknecht wrote:
> after three days of hacking on rcs_import.cc I came up with a very
> simple solution to the cvs_import branch connecting problem.

Great!  Now if only I was confident in my own understanding of CVS
importing... but that's what's been scaring everyone off from trying
this, so I guess I'd better try myself :-).

> My patch makes the cluster_consumer check every revision against the
> branch starting points. Only if a revision has all the same files with
> exactly the same versions as the branch starting point then this
> revision is considered to be the branchpoint for that branch.

When you say "exactly the same versions", do you mean exactly the same
content (like, SHA1), or do you mean exactly the same RCS revision
number?

The latter seems the right thing to do, because it is possible (even
likely) that the same tree contents will appear at multiple places in
history.  (E.g., whenever a patch is backed out.)
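To make the distinction concrete, here is a minimal sketch of matching on RCS revision numbers rather than on content.  It is not the code from the patch: the names FileVersions, BranchStart and matching_branches are made up for illustration, and revision numbers are kept as plain strings here even though the patch reportedly compares integer maps.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical type: file path -> RCS revision number (e.g. "1.4").
// Assumption for readability; the real importer may use interned integers.
using FileVersions = std::map<std::string, std::string>;

// Hypothetical record of the file versions a branch was created from.
struct BranchStart {
    std::string name;
    FileVersions files;
};

// Return the names of all branches whose starting point has exactly the
// same files at exactly the same RCS revision numbers as `rev`.
// Content hashes are deliberately not consulted, so a backed-out patch
// (same tree contents, different revision numbers) does not match.
std::vector<std::string> matching_branches(
    const FileVersions& rev,
    const std::vector<BranchStart>& branches)
{
    std::vector<std::string> matches;
    for (const auto& b : branches) {
        // std::map equality compares sizes, keys and values pairwise.
        if (b.files == rev)
            matches.push_back(b.name);
    }
    return matches;
}
```

Returning every match instead of the first one keeps the ambiguous case (several branches starting from identical file versions) visible to the caller.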

> If a branch can not be connected to any revision we fall back to
> importing the branch unconnected as before.

I suspect there are places where we can do somewhat better even than
simply leaving things unconnected; there are horrid things like

> Empty branches are not
> imported anymore (since I consider empty branches useless, to tag a
> release use tags ;-).

I might be missing something, but this doesn't seem like quite the
right behavior.  I don't consider empty branches particularly
interesting either, myself, but maybe we have users who do... they
went to the trouble of actually creating them in their CVS history,
after all :-).  Usually the best/safest thing for cvs_import to do is
to try and translate over the CVS history, as exactly as possible.

> To be able to correctly branch from revisions, the mainline needs to be
> imported first. In consume_cluster a revision is checked against all
> branches and the branch is possibly marked to start from that revision.
> Then other branches, which already got marked can be imported. Finally
> branches which did not get a mark get imported without any connection to
> previous releases (fall-back).

How does this deal with branches-off-branches-off-branches?  Don't you
have to sort all of them somehow?
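One way the sorting could look, as a rough sketch only: assume that once a branch has been marked, we know which branch its starting revision lives on, and walk outward from the mainline so that every parent is imported before its children.  The function import_order and the parent_of map are hypothetical and not taken from the patch.

```cpp
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// parent_of maps a branch name to the branch it was marked as starting
// from (an assumed data structure).  Branches without an entry could not
// be connected and are imported last, unconnected, as the fall-back.
std::vector<std::string> import_order(
    const std::string& mainline,
    const std::vector<std::string>& branches,
    const std::map<std::string, std::string>& parent_of)
{
    // Invert the parent map so we can walk from parents to children.
    std::map<std::string, std::vector<std::string>> children;
    for (const auto& [child, parent] : parent_of)
        children[parent].push_back(child);

    std::vector<std::string> order;
    std::set<std::string> seen;
    std::queue<std::string> pending;
    pending.push(mainline);
    seen.insert(mainline);

    // Breadth-first walk: a branch is only queued after its parent has
    // been placed, which handles branches-off-branches-off-branches.
    while (!pending.empty()) {
        std::string cur = pending.front();
        pending.pop();
        order.push_back(cur);
        for (const auto& c : children[cur])
            if (seen.insert(c).second)
                pending.push(c);
    }

    // Anything not reachable from the mainline is imported unconnected.
    for (const auto& b : branches)
        if (seen.insert(b).second)
            order.push_back(b);
    return order;
}
```

In the real importer the parent of a branch is presumably only discovered while its parent is being consumed, so the walk would have to be interleaved with the import itself rather than computed up front.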

> Downsides of this algorithm:
> 
> - more CPU intensive because every file in every revision has to be
> compared against every branch. This sounds harder than it really is
> because we only have to compare integer maps. Further this could be
> optimized by taking the branch start range and revision end time into
> account. (By branch start range I mean the time between the last commit
> and the first commit in the branch).

Speed is always nice, but as long as it runs in some vaguely
reasonable time, CPU is not that important for a conversion tool; if
it takes less than a week to import a large repo, then the tool is
usable :-).

> - if you have multiple branches with the very same revision and which
> are branchpoints for further branches, cvs_import might choose the wrong
> branch. I suppose this will never happen in reality, but...

When dealing with CVS, _every_ bad thing has happened in reality
somewhere :-(.

On the other hand, though, we never guarantee anything better than
"best effort" for the bizarro self-contradictory stuff that CVS can
spit out, and certainly doing _some_ sort of branch reconstruction is
a large improvement over what we're doing now.


Do you know the cvs2svn branch reconstruction algorithm?  I don't
really know how they choose branch points myself, but you should
probably check it out if you're going to try and make this work.
Importing from CVS is a world of tricky corner cases, so learning from
people who have already tripped over a lot of them is very useful.
cvs2svn is definitely the tool with the best algorithms to look at.

Want to send me a pubkey, so you can commit this stuff as you work on
it? :-)  (Possibly to a branch, for now.)

-- Nathaniel

-- 
Eternity is very long, especially towards the end.
  -- Woody Allen

This email may be read aloud.



