Re: CVS diff and unknown files.


From: Paul Sander
Subject: Re: CVS diff and unknown files.
Date: Wed, 2 Feb 2005 01:58:31 -0800


On Feb 1, 2005, at 8:16 AM, address@hidden wrote:

Paul Sander <address@hidden> writes:
On Jan 31, 2005, at 9:30 AM, address@hidden wrote:
That's not to say that we will *always* know at add time that the
commit will fail; failures can occur due to problems in their content
which are clearly not ready to check at add time.

Well, if I understand correctly, you intentionally want to have weaker
checks at add-time than at commit-time. Instead, you can do it in the
commit-time trigger by skipping some of the tests if the file in
question is a new one.

Oh, so what you're saying is that rather than making CVS differentiate
between the two sets of triggers, you want the commitinfo script to do
it instead. I suppose that's doable.

Exactly! I took too much time to explain this. Should be either my or
your fault (or both).

But I'll counter with this: Why not combine commitinfo, loginfo, and
taginfo the same way? My argument is that add-time triggers differ
from commitinfo triggers in the same way that commitinfo triggers
differ from those others: They run at different times for different
purposes.

All the triggers you've mentioned have no common tasks to do. On the
other hand, the add-time checks you have in mind are a subset of the
commit-time checks you have in mind. The subset is easily achieved by
bypassing some of the checks.

Okay, so you understand why it's a bad idea to overload too much functionality. I argue that combining add-time and commit-time triggers is also overloading things too much. An established commit-time feature would be modified so that it runs twice rather than once. If this were indeed done, then thousands of existing CVS admins would be inconvenienced because they would have to rewrite their existing commitinfo triggers to bypass their current operation at add-time, and both the CVS authors and the administrators would have to work through a change of interface in an existing feature.

If you examine the implementation, you'd realize that CVS implements its trigger support pretty much in a generic fashion, basically invoking a function call with different parameters depending on the trigger. So there's really no invention going on when adding a new trigger. And it turns out that overloading an existing one is demonstrably harmful.
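As a rough illustration of what "generic" means here, the following Python sketch models a trigger as one function called with different parameters. It is not taken from the CVS sources: `run_trigger`, `registry`, and `lowercase_name` are hypothetical names, and the lower-case file name rule is only a sample policy.

```python
from typing import Callable, Dict, List, Optional

# A check receives a file name and returns an error message, or None on success.
Check = Callable[[str], Optional[str]]


def lowercase_name(name: str) -> Optional[str]:
    """Sample policy (an assumption): file names must be all lower-case."""
    return None if name == name.lower() else "file name must be lower-case"


def run_trigger(event: str, registry: Dict[str, List[Check]], files: List[str]) -> List[str]:
    """Run every check registered for `event` on each file and collect the errors."""
    errors = []
    for check in registry.get(event, []):
        for name in files:
            message = check(name)
            if message is not None:
                errors.append(f"{event}: {name}: {message}")
    return errors


# Hypothetical registry: the same dispatcher serves any event by changing one key.
registry: Dict[str, List[Check]] = {
    "add": [lowercase_name],
    "commit": [lowercase_name],
}

print(run_trigger("add", registry, ["FOO.C", "foo.h"]))
```

In this model, supporting a new event means adding a key to the registry; the dispatcher itself does not change.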

I would have no objection to add-time-script if there were a separate
server operation called "add-file-to-the-working-copy", or
"check-if-file-name-is-ok-for-repository", but AFAIK there is currently
no such operation. I consider inventing a new server operation just to
implement some policy that you have in mind, and that could be
implemented using existing functionality anyway, to be overkill,
sorry.

If you go back and read Mark's postings, he mentions that my proposal requires no changes to the client/server protocol, so it's really not inventing a new server operation in a very real sense. It changes the behavior of the server, but if at add-time you invoke the option to avoid contacting the server, then why do you care?

You need to get out of the server. The server's not the one we're
trying to make happy here. It's the user that matters, and the tools
must bend to the users, not the other way around.

From an implementation standpoint, there's no difference between the
triggers, other than the conditions in which they fire and the files
in which they're configured. Thus the argument you make in the above
paragraph simply makes no sense.

It's you who needs to get out of the server, not me. You propose server
changes, not me. If the functionality has nothing to do with the server,
then get out of the server indeed.

Yes, you do propose server changes: You propose invoking an existing trigger in a second place. I argue that implementing a second one performs the same function that you propose, but in a cleaner way that preserves existing practice.

If I do in fact understand you to be recommending that commitinfo be
used at add time, then I must disagree. Triggers used at commit time
to check the content of files (like Greg's RCS Id trigger, for
example) are not appropriate to use at add time. Things like checking
that the user has the right to add a file, or checking that the name
of the file complies with policy, are legitimate to check at add time. They can also be checked at commit time, and the current proposal does
that because the add-time triggers are optional.

So the only thing you need to know in the commitinfo is what user is
trying to do and decide what checks are required based on this
information. If commitinfo currently doesn't have all the required
information, maybe it's better to fix it instead of inventing yet
another hack in the form of add-time triggers?

What you're really proposing is an alternative implementation for add-time
triggers. Think about it.

That's how it looks from your point of view. From my point of view I
propose an alternative implementation of the functionality you need without
inventing anything new, be it add-time triggers or something else.

What you're recommending is to overload something in a way that is not appropriate.

What you suggest is yet another reincarnation of commit-time triggers
"run at different time" where "time" is defined in terms of operations
on the working copy. Think about it.

I have thought about it. Apparently so has Greg. Something you should understand is that he and I have some history of adversity in this forum. You may have noticed that he and I both agree that overloading commit-time triggers in the way you describe is a bad idea. That really means something.

If users think the wrappers are in their way, they'll drop down to
CVS to add the file, causing more significant breakage later.

Either trust your users or arrange things so that it won't be possible
for them to invoke cvs commands directly. But even if you choose the
latter way, those that do think the wrappers are in their way will
probably find a way to bypass them anyway.

First, I've already related a horror in which users were trusted and
failed. This is why we need safeguards on the circumstances in which
they can perform certain actions.

They will fail no matter what you do then.

Second, if a wrapper invokes CVS, then there's no way to effectively
hide CVS from the users.

You like to bring wrong statements into discussion very much. If you
don't know a way, it doesn't necessarily mean there is none.

Alright, I know of two ways on Unix to effectively isolate users from parts of the filesystem: Closed directory permissions and chroot jails. Giving users limited access to applications involves hiding the entire application behind these walls and opening holes via setid wrappers or writing a new client/server application to wrap the existing application. These solutions are expensive and easy to mess up. Adding a new trigger to CVS is far simpler, it's generally applicable, shops that don't need it need not use it, and in this case clients can opt out.

This is why policy must be enforced below the level of the CVS command
line.

CVS command line tools are no more than wrappers on top of CVS
client/server protocol. They can't enforce anything that the protocol
itself can't enforce. In turn, the protocol can only enforce what
clients can or can't do *to the server*.

That is all true.

You seem to be thinking that in the client/server model the server is the
boss. The reality is closer to the opposite. In fact it is the client who is
the boss. It commands what the server should do, and then the server does (or
doesn't do) what the boss has told it to do. There is no way for the server
to force the client to do (or not to do) something.

Nope, it's a negotiated effort. The server can refuse to cooperate as long as the client makes unreasonable requests. That's the basis of policy enforcement: In the end, the server really is in control because it can always say "no".

In the case of the add-time trigger, it's really a request for a sanity check in which the server can inform the client that it notices an approaching problem. It's built into the semantics of my proposed solution that the CVS client can either heed the warning (by refusing to record an addition in the Entries file) or ignore the warning (by opting out of making the request). Commit time is when the server decides whether or not to cooperate, based in part on whether or not the client chose to ignore an earlier warning.
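The semantics described in this paragraph can be modeled in a few lines of Python. This is only a sketch of the proposal as worded here, not CVS code: `add_file`, `commit_file`, the `skip_checks` flag, and the in-memory `entries` set (standing in for the Entries file) are all hypothetical, and the lower-case name rule is a sample policy.

```python
from typing import Callable, List, Optional, Set

Check = Callable[[str], Optional[str]]


def lowercase_name(name: str) -> Optional[str]:
    """Sample add-time policy (an assumption): names must be lower-case."""
    return None if name == name.lower() else "file name must be lower-case"


def failures(name: str, checks: List[Check]) -> List[str]:
    """Return the messages of every check that rejects `name`."""
    messages = [check(name) for check in checks]
    return [m for m in messages if m is not None]


def add_file(name: str, entries: Set[str], add_checks: List[Check],
             skip_checks: bool = False) -> List[str]:
    """Record the addition unless an add-time check objects.

    skip_checks=True models the client opting out of contacting the server.
    """
    problems = [] if skip_checks else failures(name, add_checks)
    if not problems:
        entries.add(name)  # heed a clean result by recording the addition
    return problems


def commit_file(name: str, entries: Set[str], add_checks: List[Check],
                commit_only_checks: List[Check]) -> List[str]:
    """At commit time the add-time checks are redone, then the stricter ones run."""
    if name not in entries:
        return ["file has not been added"]
    return failures(name, add_checks + commit_only_checks)
```

A user who opts out at add time still meets the same check at commit time, which is the point at which the server decides whether to cooperate.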

Beyond that, there is absolutely no implication that the server has more control over the user's workspace than what I have described in the past two paragraphs.

It's at that time that they finally learn the value of the process.

If somebody prefers to learn on his own mistakes, -- let him do it
unless it doesn't break others work.

Do you really mean, "let him do it unless it breaks others work"? How
do you prevent them from making mistakes that harm others?

By not allowing him to break the repository, obviously. That's the goal
of server-side policies, -- prevent users from breaking others' work. You
just can't prevent them from making mistakes in their working copy; for
example, one can run 'rm -rf' in his working copy after two days of
active development.

Okay, consider my horror story in which the user essentially did a "cvs rm *" recursively from the top of the project, followed by a commit. Technically, the repository wasn't broken, and the action was (eventually) reversible. On the other hand, hiding the entire source base from the rest of the team wasn't the right thing to do because it had a real and serious negative effect on the whole project. In our case, it took hundreds of man-hours to recover. TWICE!!! It's this kind of stupid, foreseeable, and expensive mistake that simple automated policies are good at avoiding.
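As an illustration of how small such a policy can be, here is a Python sketch of a guard against mass removals. The function name, its inputs, and the 25% threshold are assumptions made for the example; how a real trigger would obtain the counts is not covered here.

```python
def mass_removal_violation(removed_count: int, tracked_count: int,
                           max_fraction: float = 0.25) -> bool:
    """Return True when a commit removes more than `max_fraction` of the tracked files.

    The threshold is an arbitrary illustrative value, not a recommendation.
    """
    if tracked_count == 0:
        return False  # nothing is tracked, so nothing can be mass-removed
    return removed_count / tracked_count > max_fraction
```

A trigger built on a check like this would refuse the commit and ask for an explicit override rather than silently accepting it.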

They won't know which mistakes are harmful until they try them.

Yes, they will learn anyway as soon as they try to commit their changes,
i.e., as soon as they try to break others work.

That's fine, but you've forgotten my argument that there are classes of mistakes that can be caught early, while they're still limited to the user's workspace. And if they are in fact caught at that time, they can save the user many hours of work. This is the kind of thing that I'm advocating so strongly in this thread.

That's exactly the point of enforcing policy: It allows the entire
project to learn from the mistakes of others so that when
inexperienced users repeat past mistakes, everybody is protected.

You can't force anybody to learn unless she wishes to learn. You can't
protect the user from himself. Either the user wishes to know if she adheres
to policies, in which case she will run whatever commands you tell her to
early, or the user doesn't wish to know, in which case you can do very
little about it.

Actually, you can protect the user from himself in a limited way. (It may be that you can't stop someone from shooting himself in the foot, but you can give him bulletproof shoes, for example. You can also give him an empty gun, but that doesn't help anyone if he's the tribe's hunter.) And you can protect the project from people who don't adhere to policy for whatever reasons.

Tell me something. Do you consider policies to be frivolous exercises by power mongers designed to get in your way for arbitrary reasons, or do you consider policies to be well-considered tools designed to aid the completion of the project more efficiently? What is your experience to back up your belief?

BTW, the definition of "client/server" implies the presence of a
network for communication between the two parts,

Wrong. Client and server can work on the same computer without any
network. Moreover, client/server idiom is heavily used inside
"monolithic" programs as well. Basically, client/server only means that
there is some "server" that makes some operations on behalf of
"clients", how clients communicate with the server is a secondary issue.

While your statement is true, the fact is that client and server
programs typically operate on different machines. The fact that they
do is what gives the client/server paradigm its power. Sure, I can
write client/server programs that use loopback interfaces, Unix domain
sockets, or even named pipes, but they offer no benefit because
function calls are more efficient than RPC calls.

Aren't you aware that the client/server paradigm is widely used inside
"monolithic" programs as well? There are always things left to be
learned in this world.

Frankly, I don't care.  And it's not relevant to the topic at hand.

No, it's very relevant to the topic at hand. It's your ignorance of the
true meaning of the client/server model that makes you believe
everything is fine with your proposed design. The client/server model is
mostly about splitting of responsibilities, not about media through
which client and server communicate.

Indeed it is about splitting responsibilities. But in the context of these few paragraphs, the client/server paradigm is but one means to accomplish a goal of dividing and conquering a problem. The same can be done with well-designed APIs in a monolithic application, and in fact there have been times when such APIs were reimplemented, converted from simple function calls to RPC calls. These facts are still irrelevant to the discussion of add-time triggers, because the argument is still relevant to CVS' local mode, which is not a client/server implementation.

Well, you've asked "why would you possibly want to deliberately
proceed toward a dead end" and I tried to provide an example when
the end is not actually dead. The problem is that unless I add the
files, 'cvs diff' doesn't work on them, and if your proposal is
implemented, and I bypass the add-time checks, I have no way to
repeat them later without committing the files.

Okay, so when you say "I need the files to be added to the working
copy anyway" what you really mean is that you need an entry in the CVS
metadata so that you can do other stuff that won't work without it.
Fine, my proposed add-time trigger implementation allows for this.

*How do I repeat skipped checks later with your proposal?* I am already
tired of asking this same question again and again just to get no answer.

Have you not been reading my replies?

Answer 1: If you don't ignore the error condition (i.e. you care about why "cvs add" fails) then correct it and repeat the attempt to add the file.

Answer 2: If you do ignore the error condition (by telling "cvs add" to act locally), then attempt to commit the file. The add-time checks will be redone, followed by a more comprehensive set of checks.

Now I ask you:  Why is this unreasonable?

Well, this time I've found an answer below, the answer is "run 'cvs -n
commit'", but it doesn't fit into your model. That's what I'm
advocating, -- you don't in fact need anything else but "cvs -n commit"
to let user know ASAP about possible problems.

I've stated on at least two occasions related to this thread that after the work is done, it doesn't really matter how you get there. But the add-time checks influence the journey. If you happen to be in a shop where policies are all voluntary to the point that you must ask if you're in compliance, fine. Some of us, and not just myself, think that policies are compulsory, and therefore the tests should run automatically. Therein lies the reason why "cvs -n commit" alone is not sufficient.

Here's another potential benefit: Most projects have some criteria
that submitted patches must have before they will be accepted. These
criteria might include naming conventions (e.g. all files named in
lower-case). If you intend to send a patch to the project, there's a
greater probability that it will be accepted the first time if you
subject yourself to some of their policies.

Then I need full commit-time checks, not your limited add-time checks.
Better to spend your brainpower on improving the former than on
inventing the latter.

Full commit-time checks are certainly needed. But as I keep stating, more limited add-time checks are also strongly desirable. The fact that *you* don't need, want, or understand them does not mean that I should not have them. You can always leave them turned off.

This is a complementary view of my proposal: Have a set of triggers
that runs at add and commit times, and another set that runs only at
commit time. It so happens that the second set is already implemented.

There is actually no sense in making them different if you indeed insist
on informing users about problems ASAP.

Invoking commitinfo and passing an argument to distinguish between add
time and commit time, which seems to be what you have in mind, is an
alternative implementation.

Yes that's what I have in mind.

I think mine is better, and it certainly has fewer backward
compatibility issues.

A backward-compatible wrong design is not necessarily better than a sane
one even if the latter is not backward-compatible, though I think there is
still a way to keep backward compatibility in my case as well.

Most of us in this forum seem to think that the existing design is a sane one. I happen to think that the implementation is poor, but that's a different argument. Either way, neither can be changed without significant inconvenience to the existing community. This is why I don't recommend changing commitinfo in any way.

Ah, so the sole purpose of inventing add-time triggers is in fact an
attempt not to break compatibility with the current commit-time
scripts, right? If so, it's better to state it explicitly to avoid
misunderstandings similar to mine.

I thought that was exposed in an earlier message. Sorry for not making it
clear.

No, the word "compatibility" did not even appear in your earlier
messages.

If I propose a way to implement my variant and make it
backward-compatible, will you agree that mine is better? My
understanding is that no, you won't. So compatibility is not in fact the
main point of our disagreement.

Well, we won't know until you suggest it, will we?  :-)

On the other hand you already know my opinion of overloading functions, which is that doing so is usually bad unless there are *very* compelling reasons to do so. If your recommendation involves overloading commit-time triggers then you may be right. Still, it won't hurt to try.

Seriously, think about what you're saying. You want to do the following,
dramatically oversimplified:

edit foo.h
cvs add foo.h
edit foo.h
cc FOO.C
cvs commit foo.h

It's at this point that you want to fail, after you've done all the
work. In my opinion it's better all around if you fail after two
steps, not five.

You've missed my point. I have nothing against step 2 warning me about the
problems, maybe even by default, as long as I still have the ability to
suppress the warnings. I'm strongly against step 2 refusing to add the file,
as I believe it would hinder rather than help: if step 2 refuses to add the
file, it doesn't prevent me from doing steps 3 and 4. In fact it doesn't even
prevent me from invoking 'cvs commit foo.h', but instead of getting a
comprehensive explanation from the server of why my change isn't
accepted, I'll get the less useful message "run cvs add first".

Are you saying that if step 2 fails, you would proceed with steps 3
and 4 and maybe attempt 5 without correcting the failure condition?

I said what I said: your suggested _failure_ to add the file to the
working copy doesn't prevent me from doing the rest of the steps. It means
that it's not any better than a _warning_. And I even showed why it is
slightly worse.

Well, what do you expect if you ignore the warning in step 2?

With your proposal? I expect your "enforced policy" to somehow prevent me
from continuing my way to the "dead end". You claimed you will be able
to enforce your favorite policy on my working copy. You failed.

Under the condition that you ignore the warning in step 2, I never, ever suggested that I could prevent you from wandering down the dead end. Remember, the whole point behind being able to opt out of performing the add-time check is to empower the user to do things that he may need to redo later.

Sure, you can change your behavior and move the add down to right
before the commit, and that is consistent with the working styles
of many people, but then they get what they ask for.

So even with your proposals implemented you fail to actually impose
the policies you are trying to impose with the proposals, -- too bad.

That's another thing I'm trying to explain, -- you have no way to
actually impose policies on the client side. Thinking otherwise is no
more than self-delusion.

Well, that's what I get for pandering to the crowd that thinks that
they simply must be able to add a file while working offline. If an
add-time connection to the server were a requirement (i.e. there were
no feature to opt-out of add-time triggers) then this wouldn't be a
problem. Don't blame me for trying to meet your requirements.

I don't. I blame you for the failure to meet your own requirements. Even
if you make invocation of triggers by "cvs add" an absolute must, the
users simply won't run "cvs add" for days, until they need to commit
their changes, given your own expectations of your users' behavior.

If they're breaching policy in their file additions, then because the
changes will ultimately be rejected, it's not in their best interest
to delay the discovery of the violations.

If it's not in their interest then warning is more appropriate than
failure to complete operation, I believe. My whole point here is that
failure to add the file doesn't prevent your users from misbehavior any
more than a warning, so why failure?

Because in my experience, users don't heed warnings. Whenever I need to get their attention, I need to hit them over the head with a failure. Case in point: When's the last time you shipped a large project that compiled completely without warnings? How long did it take you to tire of reading them and start to ignore them? With the opt-out method, you're forced to at least review the condition and perform some discrete action to ignore it.

Also, from a standpoint of process automation, I want the failure as early as possible because I can either stop early and let a human correct the problem and re-run a smaller amount of lost work, or because I can make decisions based on the failure and adjust the control path of my process.

and running commit-time triggers at add-time is not appropriate.

Why? Because it doesn't match the policy you have in mind? Then change your commit-time triggers so that they don't do some tests at add-time.

No, it's not because the triggers don't match the policy that I have
in mind. It's because there are certain tests that simply are not
appropriate to perform at add time. Scanning the contents of the new
files, for example, is not appropriate.

No, it is appropriate. You said yourself many times that you need to
check everything ASAP. If the file has contents, why don't you check for
its validity?

No, I never said "you need to check everything ASAP". I said "you need to perform appropriate checks ASAP". There's a big difference.

At add-time, chances are very high that the file's contents simply aren't ready for review. That's why I don't check its validity. To assume otherwise would be to require the user to defer running "cvs add" until right before running "cvs commit". While that is the work habit of many users, it's not the general case, and I won't intrude into their style in quite that way.

On the other hand, checking certain attributes of the file, like its name, is reasonable to do at add-time. This is why the checks performed at add time are necessarily a subset of those performed at commit time.

Like I said in another post, after the commit has completed, the
effects of the two implementations are identical.

What we're arguing about is what's the right way to make the journey
to that point. My way has merit.

Sorry, but the only real merit I guessed is backwards compatibility with
current commit scripts. Anything else?

Backward compatibility is a good one.

My solution could be made backward-compatible as well.

The scripts will be easier to write if the two sets are kept separate
and invoked separately, also.

No, they will be easier to write only when an administrator tries to
implement a policy similar to the ones you have in mind, where the
checks are different. As I believe the checks must be the same, I in
fact like it very much that implementing inherently broken policies is
slightly more difficult, -- it will force admins to think twice before
actually doing that.

Hmmmm... Let's see. Suppose for a moment that we implement add-time checks according to my proposal. We now have a set of checks that run both at add-time and at commit-time. Administrators working in your style have a method to do that, too.

Mark also mentioned some good ones.

Sorry, I've somehow missed that. Care to repeat or give a reference?

Look in the info-cvs archives for the following message IDs: address@hidden and address@hidden

You've admitted that you're not familiar with the CVS design.
Please don't argue issues of complexity in a vacuum.

Yes, I'm not familiar with the CVS design; hopefully, though, it's not
so broken that introducing an additional feature is less complex than
tuning an existing one to meet new requirements.

The two implementations are equally complex.

The resulting application complexity is what actually matters in the long
term, not the complexity of turning the current implementation into the new
one.

Check the code yourself and consider how you might implement both features. I believe both implementations are equally complex to build and that neither implementation significantly increases the overall complexity of the application. However, I believe that your method increases the complexity of the administration tasks slightly, for the reason that conditional actions within scripts invoked by commitinfo will be more difficult to write. My method increases administrative tasks even more slightly because the administrator must decide which one of two files is the right one in which to register the trigger scripts, but the scripts themselves are simpler.
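To make the administrative trade-off concrete, here is a hedged Python sketch of the two styles. Every name in it is hypothetical, the check for the literal text `$Id` stands in for a content check such as the RCS Id trigger mentioned earlier, and real trigger scripts would receive their inputs from CVS rather than as plain arguments.

```python
from typing import List


def name_errors(name: str) -> List[str]:
    """Sample name policy (an assumption): file names must be lower-case."""
    return [] if name == name.lower() else ["file name must be lower-case"]


def content_errors(content: str) -> List[str]:
    """Sample content policy (an assumption): the file must carry an RCS Id keyword."""
    return [] if "$Id" in content else ["missing RCS Id keyword"]


# Style 1: a single overloaded hook that branches on the phase it is invoked for.
def overloaded_hook(phase: str, name: str, content: str) -> List[str]:
    errors = name_errors(name)
    if phase == "commit":  # content is only inspected at commit time
        errors += content_errors(content)
    return errors


# Style 2: two separately registered hooks, neither of which branches on phase.
def add_time_hook(name: str) -> List[str]:
    """Runs at add time and again at commit time."""
    return name_errors(name)


def commit_only_hook(content: str) -> List[str]:
    """Runs at commit time only."""
    return content_errors(content)
```

Both styles reject the same files by the time a commit completes. The difference is where the decision lives: inside the script in the first style, and in the choice of registration file in the second.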

And that's one of the problems with your proposed add-time triggers
solution. The user needs a way to invoke later the checks "cvs add"
didn't do in this case. That's the scenario where the user needs separate
"add file to working copy" and "check file for validity against
repository" actions. Every time I ask you how I repeat the add-time
check, you don't give a satisfactory answer.
[...]
I want to know how I invoke the checks that have been skipped during
off-line work when I go on-line.

cvs [-n] commit

Cool! So that's what "cvs add" in fact didn't do when I worked off-line?
This fits perfectly into my model! That's exactly my point, -- if anything,
"cvs add" should perform the same checks "cvs -n commit" does!

Whoa! You're missing something important. The moment a new file has been successfully registered with "cvs add", conditions change. Repeating the same set of add-time checks alone is no longer appropriate. Assuming you don't abandon your work, the next logical step is to commit the work. Therefore, the work becomes subject to the more stringent commit-time checks. The commit-time checks are a superset of the add-time checks because the user is given the option to successfully register a new file while defeating the add-time checks.

Well, suppose you are the designer and I'm a user then. I, the user, ask
you, the designer, to explain to me why you think I, the user, will
never need a plain and simple "add new file to the working copy" user
operation.

The answer is: You can add anything you want to your workspace, but if
you intend to commit it then you must comply with the design. And I
will tell you at the moment you declare your intent if I think you're
not in compliance.

Well, what is the command to just add the file to my working copy
without intent to commit it? Please don't tell me there is one as then
you agree that "add new file to the working copy" is a useful user
operation. Please don't tell me there is none as I need it indeed.

I have never disputed that "add new file to the working copy" is a
useful operation.

Really?! What then did you mean when you answered YESSSSS here:

/begin quote
me> Ah, now I see. I suggest "add new file to the working copy" to be a
me> useful user operation, and you believe it is not? So the minimum
me> semantics of "cvs add" you agree with is something like "add new file to
me> the working copy but only after you make sure the file path is OK with
me> respect to the repository".

you> YESSSSS!!!!
/end quote

Don't you even care to understand questions before answering, or have you
already changed your opinion?

In my usage model, I am in 100% agreement that the following is a useful operation: "add new file to the working copy but only after the add-time triggers completed successfully"

In the statements above, my "make sure the file path is OK with respect to the repository" is a special case of your "add-time triggers completed successfully".

I have also been convinced that some users, other than myself, might have legitimate reason to turn off add-time triggers at the moment that "cvs add" is invoked. That means I can accept that "add new file to the working copy even under failure conditions detected by add-time triggers" is a valid requirement. I can get over it.

--
Paul Sander | "Lets stick to the new mistakes and get rid of the old
address@hidden | ones" -- William Brown




