info-cvs

Re: renaming under CVS


From: Paul Sander
Subject: Re: renaming under CVS
Date: Tue, 5 Mar 2002 01:36:12 -0800

>--- Forwarded mail from address@hidden

>--- Paul Sander <address@hidden> wrote:
>> >--- Forwarded mail from address@hidden
>> >Each file and directory are mapped to a ,v archive file.  The
>> >contents of the directory archive files are the mappings of its
>> >elements and the types (eg file or directory) of those elements.
>> >The basenames of the archive files will be hex representations of
>> >random 256-bit numbers generated with a "secure" version of the
>> >Mersenne Twister algorithm.
>> 
>> Why not just sequentially number the containers?  Or use a timestamp
>> plus random element to name them?

>Sequential numbers have drawbacks.  The computation
>slows down unless you save state.  If you save state,
>the state can get screwed up.

>I'll consider using a timestamp portion in the archive
>name.

Cool.  I'll do the same.  Just keep in mind that there will be
other new state, and this is the least of our problems.
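
To make the two naming schemes concrete, here is a minimal sketch.  It
is only an illustration: the function names are hypothetical, Python's
"secrets" module stands in for the "secure" Mersenne Twister variant
mentioned above, and the split of 16 timestamp digits plus 48 random
digits is an assumption, not something anyone has agreed on.

```python
import secrets
import time
from typing import Optional


def random_container_name() -> str:
    """Return 64 hex digits encoding a random 256-bit number."""
    return secrets.token_hex(32)


def timestamped_container_name(now: Optional[float] = None) -> str:
    """Return a 64-digit name: 16 hex digits of time, then 48 random ones.

    The layout is an assumption made for this sketch.
    """
    if now is None:
        now = time.time()
    micros = int(now * 1_000_000)  # microseconds since the epoch
    return f"{micros:016x}{secrets.token_hex(24)}"
```

Neither scheme needs saved state, which was the objection to
sequential numbers.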

>> >Locking will occur on a per-repository basis.  Permissions can
>> >still be done on a per-directory basis.
>>
>> Permissions on a directory basis are tough if files are linked to
>> multiple directories that have different permissions.

>I am satisfied with how permissioning is done now (ie
>per-directory).

Okay, but this will become a problem depending on how the containers
are stored.  If the files move but the containers don't, then setting
directory-wide permissions properly will become difficult.

Example:

Begin with contents of directory /a/b, namely, files c and d.  They
have particular permissions.  Now move file c to directory /e/f.
Suppose that directory has a different group ownership.  How are the
permissions of file c affected by the move, and will the result be
what the user expects?

Now create directories /a/g and /e/g, where "g" is really the same
directory, but shared between two projects.  Move file d into directory
g.  It should appear in both projects, but what are its permissions?
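
The example can be modelled in a few lines.  This is a sketch under
assumptions: the Directory class, the group names and the container
ids C1 and D1 are all made up for illustration, and the shared
directory g is modelled as two views onto one set of entries.

```python
from dataclasses import dataclass, field


@dataclass
class Directory:
    group: str                                   # group ownership
    entries: dict = field(default_factory=dict)  # name -> container id


def groups_for(container: str, dirs: dict) -> set:
    """Groups of every directory that maps some name to this container."""
    return {d.group for d in dirs.values()
            if container in d.entries.values()}


dirs = {
    "/a/b": Directory("proj_a", {"c": "C1", "d": "D1"}),
    "/e/f": Directory("proj_e"),
}

# Move file c from /a/b to /e/f.  Only the mapping changes; the
# container C1 stays wherever it is stored.
dirs["/e/f"].entries["c"] = dirs["/a/b"].entries.pop("c")

# Share one directory g between two projects, then move d into it.
shared_g = {"d": dirs["/a/b"].entries.pop("d")}
dirs["/a/g"] = Directory("proj_a", shared_g)
dirs["/e/g"] = Directory("proj_e", shared_g)
```

After the moves, groups_for("D1", dirs) names both groups, so a
per-directory rule alone does not say which permissions file d gets.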

>> >Since the repository structure will no longer be directory-based,
>> >module definitions like "module path/to/module" won't be supported.
>>
>> Correct, module definitions become implicit in the directory mappings.
>> However, there's still a gotcha at the top level.  The things you give
>> as arguments to the "cvs checkout" command need to be treated specially
>> in some way so that they can be located correctly.

>Yes, there'll need to be a repository-level mapping. 
>In a way, the repository is already considered to be a
>module since you can "cvs co .".  Or am I mistaken?

>> Limiting operations to adds and renames (without replacement if the
>> target already exists) is a start when considering this.

>I don't understand, can you elaborate?

My concern here is that what we today think of as modules would never
disappear without substantial effort.  Consequently, "cvs rm" shouldn't
be permitted in the top level.

There are arguments both ways, of course, but enforcing this makes it
a little harder to disconnect large portions of the repository and
inconvenience the user by forcing a journey through history to find
the parts that aren't often used any more.

>> Also, I was considering using a special container name of "0" to locate
>> the top-level definitions.

>"0" (well, more likely 64 0's) sounds good to me. 
>There also needs to be a way to add/remove from this
>top-level list.  What do you think of switches to
>"add" and "rm"?

I figured it would be treated like any other directory, with the
exception that "rm" would be disallowed.  I have no problem with
people adding new projects, but there should be a trigger on "add"
to enforce local policies.
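
As a rough sketch of that behaviour (all names here are hypothetical,
and the trigger is reduced to a callable that returns True or False):

```python
TOP_LEVEL = "0" * 64  # assumed name of the top-level container


class PolicyError(Exception):
    """Raised when a top-level operation is refused."""


class TopLevel:
    """The top-level directory: like any other, except rm is refused."""

    def __init__(self, add_trigger=None):
        self.entries = {}  # project name -> container id
        # The trigger enforces local policy; by default it allows all.
        self.add_trigger = add_trigger or (lambda name: True)

    def add(self, name, container):
        if name in self.entries:
            raise PolicyError(f"{name!r} already exists at the top level")
        if not self.add_trigger(name):
            raise PolicyError(f"local policy rejected project {name!r}")
        self.entries[name] = container

    def rm(self, name):
        raise PolicyError("rm is not permitted at the top level")
```

A site that wanted to forbid scratch projects could pass something
like add_trigger=lambda name: not name.startswith("tmp").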

>Also, I'm not a fan of completely wiping out archive
>files, but I can see a need for it.  This also needs
>more consideration.

I'm also happy to leave them there, though there is an argument to
moving them to a lost+found area if no directory version references
them.
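
Finding the candidates for lost+found is a set difference.  A sketch,
assuming each directory version is available as a plain mapping from
name to container id (the function name is made up):

```python
def unreferenced(all_containers, directory_versions):
    """Return the containers that no version of any directory refers to."""
    referenced = set()
    for mapping in directory_versions:
        referenced.update(mapping.values())
    return set(all_containers) - referenced
```

Note that a container removed from the latest version of a directory
is still referenced by the older versions, so it would stay put.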

>> >I think a transition from an old repository to the above shouldn't
>> >be too bad assuming people don't have complicated module
>> >definitions.  For those with complicated module definitions, a
>> >switch could be provided to use the old style (the default would be
>> >to support backward compatibility).  A tool can also be provided to
>> >convert the old repo into a new repo.
>>
>> I think that mapping the modules database to the new structure is the
>> easier of two problems faced when converting.  The other is mapping
>> the existing directory-based mapping to the new one, considering dead
>> and resurrected revisions and so on.

>Yes, I'll probably attack this in the second phase
>since more will be known to me at that time.

>> >Old clients will still work on new servers but since the mappings
>> >will be done by the server, they'll be slower than new clients.
>> >New clients will store the mappings within the CVS directories.
>> >This implies that the CS protocol will need to be extended in such
>> >a way that a new server will recognize a new client.  If the client
>> >can query the server for its version, new clients can also work
>> >with old servers.
>>
>> I don't really know enough about the protocol to comment on this, but I
>> suspect that the current mapping is somehow implicit in its implementation.
>> I would assume that the client/server protocol would need to be redesigned
>> as well, thus making current clients incompatible with new servers.

>The details of this will surface during development.

>> >The command "cvs mv" will be added.  Upon checkin, a mv command
>> >will checkin a new version of the archive file(s) of the affected
>> >directory(ies).
>>
>> A "cvs ln" is needed as well, to copy CVS meta-data from one project
>> to another for when artifacts become shared.  A variant might also be
>> needed that accepts container names and creates the proper mapping for
>> the sandbox.

>I'll have to think about this one.  I'm inclined to
>say, "one thing at a time".

>I do know that CC uses "ln" to resurrect files.  I
>never really liked this (since it's not so intuitive),
>but this need still exists so I'll try to find some
>other way to address it.

Regardless of whether or not the capability is available on the
command line, the mechanism is needed for the directory merge
algorithm.  Consider also that "cvs ln" need not necessarily
equate to a Unix "ln"; it's merely a means to attach a name to
a container in a new location of a user's sandbox.  At this early
stage, I believe that the implementation will be closer to a text
file edit (that represents the parent directory) followed by a
"cvs update" on the newly attached file.  But that's just a thought,
not a design.
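
To illustrate the "text file edit" half of that thought, here is a
sketch.  The directory file format (one "type container name" line per
entry) and the function name are pure assumptions; nothing like this
exists in CVS today.

```python
def attach(directory_text: str, name: str, kind: str, container: str) -> str:
    """Return new directory text with `name` attached to `container`.

    Each line is assumed to read "<type> <container> <name>".  The name
    comes last so that it may contain spaces.
    """
    lines = directory_text.splitlines()
    for line in lines:
        if line.split(" ", 2)[2] == name:
            raise ValueError(f"{name!r} is already attached")
    lines.append(f"{kind} {container} {name}")
    # Keep entries sorted by name so that diffs between directory
    # versions stay small.
    lines.sort(key=lambda entry: entry.split(" ", 2)[2])
    return "\n".join(lines) + "\n"
```

Attaching the same container id in a second directory's text is what
makes the file shared.  The "cvs update" step that would then fetch
the file into the sandbox is not shown.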

>> I've been considering a few issues with regard to a new implementation.
>> First, it's not necessary to lock the RCS files at all for read-only
>> operations if version numbers are known beforehand, or some other means
>> of identification is available (e.g. branch/timestamp pair).  It might be
>> possible to implement a lock-free mechanism to control access to the
>> repository.

>I'm not sure I understand what you mean by this, but
>it doesn't sound like it belongs on a "cvs mv" patch.

I believe that any implementation of "cvs mv" would require an
overhaul of CVS' locking protocol if it were to perform well.
Once you have directories pulling in containers from many locations
in the repository, or if you pull all of the containers into generic
places, then the existing locking mechanism locks more than you
need.  And depending on the amount of trouble you're willing to
put into locking, you may find that you want locks to cover a
different scope than the existing mechanism provides.  You've
already suggested a repository-wide or module-wide locking system.
I don't believe a repository-wide system is a good idea (locks way
too much at one time), and I don't believe a module-wide locking
system can be implemented, considering the renaming example I gave
above in a different context.

In any case, I believe that adding the ability to rename files
will involve very intrusive changes to the software, even at the
design level.  That means that calling it a "patch" is probably
optimistic.

>> That said, I've come up with a per-file based locking mechanism that might
>> work (but it's inefficient because it's filesystem based).  It involves
>> creating a hard link to an RCS file when we want to commit a change, using
>> RCS on the link to record changes to the container, then renaming the
>> updated RCS file back to the original place as the commit completes.

>I have a couple of problems with this:
>1. Hard links aren't portable.
>2. I've crashed OS's with simultaneous hard links.

Fair enough.  The hard link thing is a workaround for RCS' lack
of a two-phase commit hook.  The hard links can be eliminated by
adding to RCS the ability to leave the ,*, file in place and rename
it back to the *,v file at a later time using the existing RCS
mechanism.  Would that be satisfactory?
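
The shape of that, without hard links, might be something like the
following sketch.  It does not call RCS at all; the function names are
hypothetical, and the new archive contents are simply handed in as
bytes.

```python
import os


def staged_path(archive: str) -> str:
    """Map dir/foo,v to dir/,foo, (the RCS-style temporary name)."""
    head, base = os.path.split(archive)
    if not base.endswith(",v"):
        raise ValueError(f"not an archive file: {archive!r}")
    return os.path.join(head, "," + base[:-2] + ",")


def prepare(archive: str, new_contents: bytes) -> str:
    """Phase 1: write the updated archive beside the original."""
    tmp = staged_path(archive)
    # O_EXCL makes a second writer fail instead of clobbering the file.
    flags = (os.O_WRONLY | os.O_CREAT | os.O_EXCL
             | getattr(os, "O_BINARY", 0))
    fd = os.open(tmp, flags, 0o644)
    with os.fdopen(fd, "wb") as f:
        f.write(new_contents)
        f.flush()
        os.fsync(f.fileno())
    return tmp


def commit(archive: str) -> None:
    """Phase 2: rename the staged file over the original."""
    os.replace(staged_path(archive), archive)


def abort(archive: str) -> None:
    """Undo phase 1 by discarding the staged file."""
    os.remove(staged_path(archive))
```

A commit that touches many files would call prepare() on all of them
first and only then commit() each one.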

>> This is essentially a two-phase commit implementation, which has the
>> potential to make commits truly atomic.  (This actually applies to all
>> changes to RCS files, including tagging!)  What's missing is a transaction
>> log that records all of the affected RCS files and a crash recovery tool
>> that either removes or renames the linked RCS files depending on how far
>> down the commit path someone got at the time of the crash.  But that's
>> easy to implement as well.
>>
>> The down side is that for each file affected, there's a requirement of
>> three times the size of the RCS file per file, plus double the aggregate
>> of all of the RCS files updated.
>>
>> Another annoyance is that things like "cvs log" that operate on sets of
>> revisions may produce unwanted results, particularly if they run concurrently
>> with transactions that later abort.  But I think that this problem can be
>> solved as well.
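
The crash recovery tool described in the quoted text above could look
roughly like this.  Everything specific here is an assumption: the log
is a small JSON file with a "state" field ("prepare" or "commit") and
a list of "archives", and staged files use the ,name, convention.

```python
import json
import os


def staged_path(archive):
    """Map dir/foo,v to dir/,foo, (the staged copy of the archive)."""
    head, base = os.path.split(archive)
    return os.path.join(head, "," + base[:-2] + ",")


def recover(log_path):
    """Finish or undo an interrupted transaction; report what was done."""
    with open(log_path) as f:
        log = json.load(f)
    # Once the log says "commit", the transaction must be completed.
    decided = log.get("state") == "commit"
    for archive in log["archives"]:
        tmp = staged_path(archive)
        if not os.path.exists(tmp):
            continue  # already renamed, or never staged
        if decided:
            os.replace(tmp, archive)  # roll forward
        else:
            os.remove(tmp)            # roll back
    os.remove(log_path)
    return "rolled forward" if decided else "rolled back"
```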

>I'd rather discuss "cvs mv" at this point (at least on
>this thread).

Fair enough.  I just wanted to point out in advance that there will be
some usability issues.

>> I dug up Dick Grune's third release of his original CVS implementation and
>> will try some experiments as time permits.  It's a bit of a shock going back
>> to one's roots like that, and there's not a lot of similarity between it and
>> what we now know as CVS.  But it's a good place to start tinkering.

>Why not use the latest dev snapshot?

There are so many broken features in the existing implementation
that I wanted to start off with something simpler.  It gives the
option of being totally ruthless with the design while testing
various models in a language that lends itself to rapid development
of a decent prototype.  Once the design and user model are hashed
out then it will be worthwhile to see what's salvageable from the
C implementation.  A lot of it will be (and a lot of it not), but for
the sake of this it seems more expedient to ignore it temporarily.

>--- End of forwarded message from address@hidden



