monotone-devel

Re: [Monotone-devel] cvs_import rewrite


From: Nathaniel Smith
Subject: Re: [Monotone-devel] cvs_import rewrite
Date: Thu, 15 Dec 2005 15:53:10 -0800
User-agent: Mutt/1.5.9i

On Wed, Dec 14, 2005 at 12:06:37PM +0100, Markus Schiltknecht wrote:
> Hello monotone hackers,
> 
> what's up with the cvs_import rewrite branches? Anybody still working on
> such a thing?

I think they got finished and merged back into trunk.  (Assuming you
mean cvs_import and not cvssync, which is still on a branch.)

> I have been looking at cvsps, which can extract patchsets from CVS
> repositories. Unfortunately, it fails in case of inconsistent CVS
> repositories (e.g. the PostgreSQL CVS repo).

Yeah, the rumors are basically that the only actually correct tools
for cvs importing are:
  -- cvs2svn
  -- monotone's cvs_import (but it doesn't link up branches)
  -- Canonical's internal, secret tool

Though I've also heard Canonical people bad-mouthing cvs2svn for
failing on various tests they came up with; unfortunately, they said
they couldn't make the tests available either, so I don't know whether
cvs2svn has been fixed, or how one would go about finding the
problems.

> Anyway, I thought about reimplementing cvs_import. I would use the
> algorithms of cvs2svn to get a more or less consistent view of the cvs
> repository. Due to the nature of monotone, it should be easy to improve
> the algorithm to be able to handle subsequent imports.

The goal of the last cvs_import rewrite was basically, "use cvs2svn's
algorithm to get things right" :-).  The main limitations I'm aware of
with current cvs_import are:
  -- it doesn't do any branch reconstruction.  This would be _really_
     helpful; at the moment, this makes its branch handling almost
     completely useless, because in monotone going forward you won't
     be able to usefully merge between branches that aren't linked up.
     The difficulty is that this is necessarily a heuristic operation,
     and there are states a CVS repo can be in that are simply not
     possible to meaningfully translate.
  -- no incremental re-importing.
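
For context on what "extracting patchsets" involves: CVS records
revisions per file, so any importer first has to group per-file
revisions back into changesets.  Below is a minimal sketch of one
commonly described heuristic (same author, same log message, close in
time, no file appearing twice).  It is an illustration only, not the
actual code or algorithm of cvs2svn, cvsps, or monotone; the
FileRevision record, the function name, and the 300-second window are
all assumptions made for the example.

```python
from collections import namedtuple

# Hypothetical record for one per-file CVS revision (not a monotone type).
FileRevision = namedtuple("FileRevision", "path revision author log timestamp")


def group_into_changesets(revisions, window=300):
    """Group per-file revisions into changesets.

    A revision joins the most recent changeset with the same author and
    log message if it is within `window` seconds of that changeset's
    latest revision and its file is not already in the changeset.
    Otherwise it starts a new changeset.
    """
    changesets = []
    latest = {}  # (author, log) -> most recent changeset for that key
    for rev in sorted(revisions, key=lambda r: r.timestamp):
        key = (rev.author, rev.log)
        current = latest.get(key)
        if current is not None:
            too_late = rev.timestamp - current[-1].timestamp > window
            seen_file = any(r.path == rev.path for r in current)
            if too_late or seen_file:
                current = None
        if current is None:
            current = []
            changesets.append(current)
            latest[key] = current
        current.append(rev)
    return changesets
```

Note that this only produces a linear list of changesets.  It says
nothing about which branch a changeset belongs to or where a branch
forked, which is exactly the reconstruction step described above as
missing.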

> If I'm heading for such a rewrite, what should I be aware of? Would it
> be wise to store results from different processing passes in the normal
> monotone db? This would help subsequent imports a lot, of course. On the
> other hand you then have 'non-monotone-data' in your database, which you
> probably want to delete some day.

Right now we just do things in memory, and it seems to go okay.  We'd
need to see an argument that saving intermediate results somewhere
persistent would be a significant win, I guess.

> What could be different from cvs2svn?
> E.g. you don't absolutely need to sort by date. Overlapping commits in
> CVS would be better handled as two heads which later get merged again in
> monotone.

Intriguing idea!  How do you work out what the merge looks like?
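
To make the question concrete, here is a small sketch of one possible
reading of the idea: treat two changesets as candidates for parallel
heads when their time spans overlap, and distinguish whether they touch
the same files.  The dictionary layout and the classify function are
hypothetical, invented for this illustration; nothing here is existing
monotone or cvs2svn behaviour.

```python
def classify(a, b):
    """Classify two changesets by time overlap and shared files.

    Each changeset is a dict with 'start' and 'end' timestamps and a
    'files' set (a hypothetical layout for this sketch).
    """
    overlap = a["start"] <= b["end"] and b["start"] <= a["end"]
    if not overlap:
        return "sequential"
    if a["files"] & b["files"]:
        return "overlapping, shared files"
    return "overlapping, disjoint files"
```

When the file sets are disjoint, the merged tree is presumably just
both sets of changes applied together.  The case with shared files is
the one the question above is really about, and this sketch does not
answer it.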

> My goals with this are:
>  * gain speed in subsequent imports

Do you have profiling data showing where the bottlenecks are?

(Right now I'm pretty sure the bottleneck is the changeset sanity
checking, just like for initial pull.  Re-imports don't have to deal
with that; maybe if we added re-import support it would already be
plenty fast.)

>  * (correct) branch support

This would be _really_ good to have, and even an urgent need, as
mentioned above.  It might even be the first thing you should look
at/try, as a way of familiarizing yourself with the issues involved in
CVS importing...

-- Nathaniel

-- 
Eternity is very long, especially towards the end.
  -- Woody Allen



