Re: [Monotone-devel] cvs_import rewrite
From: Nathaniel Smith
Subject: Re: [Monotone-devel] cvs_import rewrite
Date: Thu, 15 Dec 2005 15:53:10 -0800
User-agent: Mutt/1.5.9i
On Wed, Dec 14, 2005 at 12:06:37PM +0100, Markus Schiltknecht wrote:
> Hello monotone hackers,
>
> what's up with the cvs_import rewrite branches? Anybody still working on
> such a thing?
I think they got finished and merged back into trunk. (Assuming you
mean cvs_import and not cvssync, which is still on a branch.)
> I have been looking at cvsps, which can extract patchsets from cvs
> repositories. Unfortunately, it fails in case of inconsistent CVS
> repositories (i.e. the PostgreSQL cvs repo).
Yeah, the rumors are basically that the only actually correct tools
for cvs importing are:
  -- cvs2svn
  -- monotone's cvs_import (but it doesn't link up branches)
  -- Canonical's internal, secret tool
Though I've also heard Canonical people bad-mouthing cvs2svn for
failing on various tests they came up with; unfortunately, they said
they couldn't make the tests available either, so I don't know whether
cvs2svn has been fixed, or how one would go about finding the
problems.
> Anyway, I thought about reimplementing cvs_import. I would use the
> algorithms of cvs2svn to get a more or less consistent view of the cvs
> repository. Due to the nature of monotone, it should be easy to improve
> the algorithm to be able to handle subsequent imports.
The goal of the last cvs_import rewrite was basically, "use cvs2svn's
algorithm to get things right" :-). The main limitations I'm aware of
with current cvs_import are:
  -- it doesn't do any branch reconstruction. This would be _really_
     helpful; at the moment, this makes its branch handling almost
     completely useless, because in monotone going forward you won't
     be able to usefully merge between branches that aren't linked up.
     The difficulty is that this is necessarily a heuristic operation,
     and there are states a CVS repo can be in that are simply not
     possible to meaningfully translate.
  -- no incremental re-importing.
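
For readers unfamiliar with the problem: CVS records history per file, so any importer has to reconstruct multi-file changesets heuristically. The Python sketch below shows one commonly described heuristic, grouping file revisions that share an author and log message and fall within a time window. It is an illustration only, not cvs2svn's or monotone's actual code; the names FileRevision, Changeset, group_revisions and the 300-second window are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class FileRevision:
    path: str
    author: str
    log: str
    timestamp: int  # seconds since the epoch


@dataclass
class Changeset:
    author: str
    log: str
    revisions: list = field(default_factory=list)

    @property
    def end(self):
        # Timestamp of the latest file revision grouped so far.
        return max(r.timestamp for r in self.revisions)


WINDOW = 300  # hypothetical gap (seconds) that closes a changeset


def group_revisions(revisions, window=WINDOW):
    """Group per-file revisions into changesets by (author, log, time)."""
    changesets = []
    open_sets = {}  # (author, log) -> changeset currently being filled
    for rev in sorted(revisions, key=lambda r: r.timestamp):
        key = (rev.author, rev.log)
        cs = open_sets.get(key)
        # A file can appear only once per changeset, so a second
        # revision of the same path forces a new changeset.
        touched = cs is not None and any(
            r.path == rev.path for r in cs.revisions)
        if cs is None or rev.timestamp - cs.end > window or touched:
            cs = Changeset(rev.author, rev.log)
            open_sets[key] = cs
            changesets.append(cs)
        cs.revisions.append(rev)
    return changesets
```

Note that this handles neither branches nor tags, which is exactly where the hard, heuristic part of the problem lives.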
> If I'm heading for such a rewrite, what should I be aware of? Would it
> be wise to store results from different processing passes in the normal
> monotone db? This would help subsequent imports a lot, of course. On the
> other hand you then have 'non-monotone-data' in your database, which you
> probably want to delete some day.
Right now we just do things in memory, and it seems to go okay. We'd
have to look at the argument for how saving stuff somewhere persistent
would be a significant win, I guess.
> What could be different to cvs2svn?
> I.e. you don't absolutely need to sort by date. Overlapping commits in
> CVS would better be handled as two heads which later got merged again in
> monotone.
Intriguing idea! How do you work out what the merge looks like?
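
To make the question concrete, here is a minimal Python sketch of the easy half of that idea: spotting which reconstructed changesets overlap in time, and so would be candidates for becoming two heads rather than being forced into a date-sorted line. The (name, start, end) tuple layout and the function name overlapping_pairs are assumptions for illustration. It says nothing about what the resulting merge should look like, which is the open part.

```python
def overlapping_pairs(changesets):
    """Return name pairs of changesets whose time spans overlap.

    changesets: iterable of (name, start, end) tuples, where start and
    end are the earliest and latest file-revision timestamps.
    """
    ordered = sorted(changesets, key=lambda c: c[1])
    pairs = []
    for i, (name_a, _start_a, end_a) in enumerate(ordered):
        for name_b, start_b, _end_b in ordered[i + 1:]:
            if start_b > end_a:
                # Sorted by start time, so nothing later overlaps a.
                break
            pairs.append((name_a, name_b))
    return pairs
```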
> My goals with this are:
> * gain speed in subsequent imports
Do you have profiling data showing where the bottlenecks are?
(Right now I'm pretty sure the bottleneck is the changeset sanity
checking, just like for initial pull. Re-imports don't have to deal
with that; maybe if we added re-import support it would already be
plenty fast.)
> * (correct) branch support
This would be _really_ good to have, and even an urgent need, as
mentioned above. It might even be the first thing you should look
at/try, as a way of familiarizing yourself with the issues involved in
CVS importing...
-- Nathaniel
--
Eternity is very long, especially towards the end.
-- Woody Allen