
From: Paul Sander
Subject: Re: renaming under CVS
Date: Thu, 7 Mar 2002 18:35:26 -0800

>--- Forwarded mail from address@hidden

>--- Paul Sander <address@hidden> wrote:
>> >> Permissions on a directory basis are tough if files are linked
>> >> to multiple directories that have different permissions.
>>
>> >I am satisfied with how permissioning is done now (ie
>> >per-directory).
>>
>> Okay, but this will become a problem depending on how the containers
>> are stored.  If the files move but the containers don't, then setting
>> directory-wide permissions properly will become difficult.

>It just occurred to me that if the archive files
>didn't live within the repo directory (as I had
>intended), then the permissions on the archive files
>would be extremely important (and probably impossible
>to manage).  I'll have to rethink this, but as of now,
>it looks like the archive file will need to live
>within the repo directory and that a repo-wide archive
>location mapping will be needed.

>> Example:
>>
>> Begin with contents of directory /a/b, namely, files c and d.  They
>> have particular permissions.  Now move file c to directory /e/f.
>> Suppose that directory has a different group ownership.  How are the
>> permissions of file c affected by the move, and will the result be
>> what the user expects?

>The only archive file permissions that matter are the
>read and execute permissions.  The execute permissions
>are inherited by the archived file.  Those that need
>to work with the archive file (ie checkin, checkout,
>...) will need read permissions.

Keep in mind that "permissions" include not only the file's mode,
but also its user and group ownerships, plus whatever additional
mechanisms (e.g. ACLs) are provided by the operating system.  As
files propagate around the repository, the permissions will have to
behave in ways that the user expects, and that is not necessarily
easy to implement.

>Assuming that a file is moved from one directory to
>another with a different group, and the SGID bit is
>set on the repo directory, the archive file will be
>moved adopting the new group.  An update to the
>repo-wide archive location mapping will also be
>necessary.

How do you expect this to affect retrieval by tag or datestamp?
The person performing that task may not be a member of the new
group.
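
Also keep in mind that whether the archive file really adopts the new
group depends on how the move is implemented.  A plain rename preserves
the old group; only a file newly created inside a setgid directory
picks up that directory's group.  Roughly, with made-up paths:

    $ ls -ld /repo/e/f
    drwxrwsr-x  2 cvs  proj2  512 Mar  7  2002 /repo/e/f
    $ mv /repo/a/b/c,v /repo/e/f/c,v   # rename: c,v keeps its old group
    $ cp /repo/a/b/c,v /repo/e/f/c,v   # new file: c,v gets group proj2

So the server would have to copy (or explicitly chgrp) rather than
rename if the new group is supposed to stick.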

>> Now create directories /a/g and /e/g, where "g" is really the same
>> directory, but shared between two projects.  Move file d into
>> directory g.  It should appear in both projects, but what are its
>> permissions?

>Sharing of directories will no longer be supported
>since "complex" module definitions will no longer be
>supported.

What about this case?

mod-a a
mod-b a/b

Anyway, disallowing this type of sharing, and the sharing in the
example above, is not acceptable to me.  Code reuse by sharing source
code isn't going away any time soon (no matter how much the
buildmeisters want it to), and the version control system can't get
in the way of that.

Also note that retrieval by tag or datestamp is required to retain
the old shape of the tree.  Making that work means that a single
RCS file maps to multiple locations in workspaces anyway, so the
kind of sharing that I demand should come for free.

>> My concern here is that what we today think of as modules would
>> never disappear without substantial effort.  Consequently, "cvs rm"
>> shouldn't be permitted in the top level.

>I still don't understand.  Can you give an example?

My thought had been: if a project reaches the end of its life, should
a user be permitted to delete its definition from the top level?
Re-thinking this, I don't think it matters.  It's still retrievable by
tag or datestamp, and the effect is no different from moving the
project into a new directory and removing it from there.

The question remaining is how to query the system for existing modules,
because the ones contained at the top level of the repository are really
just a subset of the total possible.  There's also the issue of ambiguity
of names, because a project might be removed and a new one created with
the same name.

>> >> Also, I was considering using a special container name of "0" to
>> >> locate the top-level definitions.
>>
>> >"0" (well, more likely 64 0's) sounds good to me.  There also needs
>> >to be a way to add/remove from this top-level list.  What do you
>> >think of switches to "add" and "rm"?
>>
>> I figured it would be treated like any other directory, with the
>> exception that "rm" would be disallowed.  I have no problem with
>> people adding new projects, but there should be a trigger on "add"
>> to enforce local policies.

>I think the trigger architecture needs an overhaul and
>since the existing one can be used to enforce "add"
>policies, I'd rather concentrate on the meaty issues
>at hand.

>> >Also, I'm not a fan of completely wiping out archive files, but I
>> >can see a need for it.  This also needs more consideration.
>>
>> I'm also happy to leave them there, though there is an argument to
>> moving them to a lost+found area if no directory version references
>> them.

>I'm not sure if this is completely true.  I think the
>Attic, especially when coupled with the
>directory-specific mapping, can serve this purpose. 
>Of course, I could be wrong, but I think any issues
>that do arise won't be too severe.

Let's say a user does a "cvs add f; cvs commit f".  Obviously, this
commits the initial contents of f to the repository.  But does it
also commit the addition of f to the parent directory, or is a separate
"cvs commit ." needed?  If the latter, what happens to the RCS file
if the sandbox is released before the directory is committed?

Now suppose that the "cvs add f" is really resurrecting a file, or linking
(sharing) a file from another part of the repository.  This introduces a
condition where the contents of the file are already up to date, but
contents of the directory have been modified.  What should "cvs commit f"
do, and is a separate "cvs commit ." required?
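
To make the question concrete, a hypothetical transcript under the
proposed scheme (names are made up):

    cvs add f            # records the new name in the sandbox's "."
    cvs commit f         # commits f's contents; does it commit "." too?
    cd ..; cvs release -d work   # if not, is f's new RCS file orphaned?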

>> >I do know that CC uses "ln" to resurrect files.  I never really
>> >liked this (since it's not so intuitive), but this need still
>> >exists so I'll try to find some other way to address it.
>>
>> Regardless of whether or not the capability is available on the
>> command line, the mechanism is needed for the directory merge
>> algorithm.  Consider also that "cvs ln" need not necessarily equate
>> to a Unix "ln"; it's merely a means to attach a name to a container
>> in a new location of a user's sandbox.  At this early stage, I
>> believe that the implementation will be closer to a text file edit
>> (that represents the parent directory) followed by a "cvs update"
>> on the newly attached file.  But that's just a thought, not a
>> design.

>I was thinking that the users' sandboxes would have
>the filename mapping within its CVS/Entries files.

>For example, let's say file asdf.cc is mapped to
>01ef,v (I'll use four nibble archive names to save on
>bandwidth).  CVS/Entries will store this information.

>The client will look up the archive name from
>CVS/Entries and will use only that name when
>communicating with the server.

>The server will then look up the location of that
>archive within the repo using the repo-wide archive
>location mapping.  Everything else should work as it
>does now, more or less.

That will probably work, and it would seem to solve the evil
twin problem well enough.
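
For the record, I picture the Entries line just growing an extra
field.  Today a line looks something like the first of these; the
second is a strawman for where the archive name could go:

    /asdf.cc/1.5/Thu Mar  7 10:00:00 2002//
    /asdf.cc/1.5/Thu Mar  7 10:00:00 2002///01ef

The exact syntax doesn't matter much, as long as old clients can be
kept from tripping over it.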

>> I believe that any implementation of "cvs mv" would require an
>> overhaul of CVS' locking protocol if it were to perform well.  Once
>> you have directories pulling in containers from many locations in
>> the repository, or if you pull all of the containers into generic
>> places, then the existing locking mechanism locks more than you
>> need.  And depending on the amount of trouble you're willing to put
>> into locking, you may find that you want locks to cover a different
>> scope than the existing mechanism provides.  You've already
>> suggested a repository-wide or module-wide locking system.  I don't
>> believe a repository-wide system is a good idea (locks way too much
>> at one time), and I don't believe a module-wide locking system can
>> be implemented, considering the renaming example I gave above in a
>> different context.
>>
>> In any case, I believe that adding the ability to rename files will
>> involve very intrusive changes to the software, even at the design
>> level.  That means that calling it a "patch" is probably optimistic.

>Let's work on making this work before we spend energy
>trying to make it work efficiently.

>My guess is that a repo-wide lock isn't that bad and
>might actually speed up performance since the creation
>of locks right now involves the creation of one (for
>write locks) or two (for read locks) file system
>elements per directory being locked.  Having a
>repo-wide lock will be more efficient in cases where
>archives are being updated in more than one directory.
> It'll be as efficient when only one directory is
>being affected.

On a single-user basis, this is true.  But granting exclusive locks
on the entire repository to a single user at any given time will
reduce throughput a lot.  Some kind of concurrent locking is
necessary, but for demonstration purposes (i.e. the first hack) a
repository-level lock should be sufficient.
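
For reference, what today's scheme creates in each repository
directory it locks (names as in the stock implementation):

    #cvs.lock     the master lock directory
    #cvs.rfl.*    read lock markers
    #cvs.wfl.*    write lock markers

so the cost you describe really does scale with the number of
directories the lock has to cover.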

>The repo-wide lock will only be too large a lock
>granularity when many files are being accessed.  This
>occurs mostly during checkout and update.  I don't
>know about you, but the longest checkouts I've had
>weren't so long that it would inconvenience someone
>trying to checkin.  Updates lasted much shorter
>timespans.

I've worked on projects where checkouts took significant amounts
of time (greater than 30 minutes).  If the repository contains
multiple projects of this size that have different schedules, it's
quite likely that developers will try to commit to one project
while another is checking out.  That will introduce unnecessary
delays.

>In any case, I never liked the idea that a checkout
>can give you part of what's currently being checked
>in.  I think this is even more of a bother during
>large checkouts since the likelihood of getting a
>partial checkin is even greater.

Yeah, that's why I propose a system in which checkouts are done
after the versions are identified (either by tag, version number,
or branch/timestamp pair).  That obviates the need for read locks
altogether.
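
Roughly, the sequence I have in mind (a sketch, not a design):

    1. Resolve the tag, version number, or branch/timestamp pair to a
       fixed list of (archive, revision) pairs, taking whatever
       short-lived lock that resolution needs.
    2. Check out exactly those revisions with no locks held; a commit
       that lands in the meantime adds new revisions but can't change
       what an already-identified revision contains.

A long checkout then never blocks a checkin, and it can never see half
of one either.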

>> >> That said, I've come up with a per-file based locking mechanism
>> >> that might work (but it's inefficient because it's filesystem
>> >> based).  It involves creating a hard link to an RCS file when we
>> >> want to commit a change, use RCS on the link to record changes
>> >> to the container, then rename the updated RCS file back to the
>> >> original place as the commit completes.
>>
>> >I have a couple of problems with this:
>> >1. Hard links aren't portable.
>> >2. I've crashed OS's with simultaneous hard links.
>>
>> Fair enough.  The hard link thing is a workaround for RCS' lack of
>> a two-phase commit hook.  They can be eliminated by adding to RCS
>> the ability to leave the ,*, file and renaming it back to the *,v
>> file at a later time using the existing RCS mechanism.  Would that
>> be satisfactory?

>I'm not sure if "mv" is supposed to be atomic, but we
>can discuss this on another thread.

I'm inclined to think that "mv" should be (it just updates a sandbox,
after all), but the commits that follow mean that the entire rename
process is not atomic.
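
For what it's worth, the final rename is the one step we can count on:
renaming the ,*, file over the *,v file is atomic as long as both live
on the same filesystem.  So the two-phase scheme would go roughly like
this, assuming RCS can be taught to stop before the last step:

    Phase 1:  ci writes the new revision into ,c, next to c,v and stops.
    Phase 2:  once the whole commit is known to succeed, rename ,c, over
              c,v (atomic); on failure, remove ,c, and c,v is untouched.

The non-atomic part is everything around it: the directory version and
the other files in the same change.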

>--- End of forwarded message from address@hidden



