Re: Smoke, FUD (was Re: CVS corrupts binary files ...)

From: Paul Sander
Subject: Re: Smoke, FUD (was Re: CVS corrupts binary files ...)
Date: Tue, 29 Jun 2004 18:50:32 -0700

>--- Forwarded mail from address@hidden

>Paul Sander <address@hidden> writes:

>> >--- Forwarded mail from address@hidden
>> Rather than use a hint to expose an
>> implementation detail, I suggest recording a
>> data type instead. Maybe even a MIME type. Then
>> provide a suitable mechanism to map data types
>> to tools that are appropriate to the
>> environment.

>I have no fundamental objection to saving the MIME
>type. I suggest that it may need to be inside of a
>string to pass the syntax of rcsfile(5). I would
>actually suggest that it might be useful to just
>borrow both of the MIME media-type and charset
>concepts. That might allow for a

>  "media-type text/plain;"
>  "charset ks_c_5601-1987;"

>on a given file... the defaults should probably
>be "text/plain" and iso-8859-1 or utf-8

Do you propose that the media-type be valid on its own, for data
types where charsets have no meaning?  Or put another way, is
the charset solely to provide additional processing hints to supplement
the media-type, or is the charset also required?

>> >Given that this would appear to be the desire of
>> >at least a few folks out there who might want to
>> >make CVS do a better job at merging structured
>> >ASCII files such as XML or HTML format. And
>> >further, that you seem to have objections to this
>> >approach. And while I have known you to bring up
>> >points I have overlooked in the past...
>> Not just structured ASCII files as you describe,
>> but any file containing structured data for
>> which a merge tool is available.

>Ahh, but I am not really trying to suggest that
>"binary files" are suitable in the general case
>for CVS control. That is a separate argument.

Fair enough, but the practice is more common than anyone wants to
admit.  The issue must be faced at some point.

>That said, I suppose that a merge utility that
>understands how to merge a file containing lines
>in a non-ISO-LATIN character set might also fall
>into the category of a diff3 replacement and that
>such files might be considered 'binary' by some


>> >This time around I just do not see anything that
>> >would preclude such an approach of using an
>> >external diff3 hint 'replacement' program for
>> >doing a 'cvs update -jtag1 -jtag2' operation.
>> >I will stipulate that such a program will likely
>> >need to live on the server and furthermore that it
>> >would not be interactive. In the absence of
>> >finding such a program, CVS would likely resort to
>> >using diff3 as a fallback, so its arguments would
>> >likely need to match those of the diff3 program
>> >itself... at least to the extent that cvs currently
>> >uses various arguments to diff3.
>> I don't believe that such a program MUST live on
>> the server.

>The changes needed to allow the client-side to do
>a merge are very large. I am not willing to
>stipulate an implementation that would allow CVS
>to deal with an interactive merge operation for a
>random 'cvs update' command. The repository would
>have a lock open for too long in that case.

Yes, to avoid long-lived locks, the necessary files must be
copied to the client before the merge begins.  This would
involve a significant change to the client, but I'm not
convinced that it would be a significant change to the server.
The server already has the ability to send whole revisions
to the client, and it need not be involved with the merge
once it starts.

>> Merge tools, like editors, have a way of
>> becoming religious icons, in situations where
>> users have a choice. Under such circumstances,
>> it becomes important to have client side
>> mappings between data types and merge tools.

>Your arguments almost help to make a case in
>Greg's favor against allowing a diff3 replacement.

Horrors!  I sure hope not!  :-)

>The kind of flexibility you desire is not
>something that I think makes sense to bolt into
>the 'diff3' slot.

Then bolt in a wrapper that reads the user's environment
and invokes a suitable merge tool based on preferences
that are found there.  And provide a default, like diff3,
if such information is missing.
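
A rough sketch of what such a wrapper might look like, in Python.
The MERGE_TOOL and MERGE_TOOL_<TYPE> environment variable names and
the way the data type reaches the wrapper are assumptions made up
for illustration; nothing like them exists in CVS today.

```python
import os
import shlex
import subprocess

DEFAULT_TOOL = "diff3"  # fallback when the environment says nothing


def choose_merge_tool(data_type, environ):
    """Return the argv prefix of the merge tool to use for data_type.

    Looks first for a hypothetical per-type variable (for example
    MERGE_TOOL_TEXT_XML for "text/xml"), then for a hypothetical
    catch-all MERGE_TOOL, and finally falls back to diff3.
    """
    key = "MERGE_TOOL_" + "".join(
        c if c.isalnum() else "_" for c in data_type.upper()
    )
    for name in (key, "MERGE_TOOL"):
        value = environ.get(name, "").strip()
        if value:
            return shlex.split(value)
    return [DEFAULT_TOOL]


def run_merge(data_type, diff3_args, environ=None):
    """Invoke the chosen tool with the arguments diff3 would have received."""
    if environ is None:
        environ = os.environ
    tool = choose_merge_tool(data_type, environ)
    # Arguments are passed through unchanged, so a replacement tool must
    # accept the same argument forms that diff3 does.
    return subprocess.call(tool + list(diff3_args))
```

With nothing set in the environment, choose_merge_tool("text/xml", {})
yields ["diff3"], which is the default behavior described above.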

>What you propose would potentially best be handled
>with an entirely new kind of update paradigm.
>Possibly the use of a CVS/Base/file file and a
>'patch' that would bring CVS/Base/file up to the
>latest version would be 'better' in this case...

Whatever's most efficient to get the other contributor
and common ancestor to the client.  Clean-up needs to
be considered as well.

>> Additionally, I don't believe that merge tools
>> necessarily need to be fully automated.

>Here we do not agree. Without such automation,
>lock contention on directories could get very

Again, running the merge after relevant data have been
copied out and freeing the locks would remove this

Actually, the ancestor and contributor are checked-in
versions, and they're known in advance either by version
number or branch/timestamp.  Correct me if I'm wrong here.
If this really is true, then directory locks aren't even
needed in the repository.

This specific issue has been discussed in this forum
once before, and Greg even liked the idea at the time.

>> After the relevant versions have been downloaded
>> to the client (and the repository locks have
>> been cleared), the merge tools can run
>> interactively. However, I believe that CVS
>> currently intersperses merges with downloads, and
>> that would need to change before interactive
>> merges can be supported.

>The current CVS operations all occur on the server
>side prior to downloading patches to the client.

>What you are suggesting is a fairly major overhaul
>to the cvs client/server protocol and as such
>there is probably a 'better' way to deal with this
>than a 'simple' alternative table of diff3-style
>programs to do alternative merger algorithms.

The server protocol is capable of sending versions of
files to the client, so it should be possible to do
this with existing features.  The client would need to
be beefed up to invoke the merge tool directly, though.

>> Also, CVS currently relies on diff3-style
>> mark-ups to warn the user when merge conflicts
>> remain present at commit time.

>Yes, I should have stated that a failed merger
>will probably still need to leave markers not
>unlike the existing conflict markers of the
>current diff3 program.

Given the following paragraphs, the mark-ups probably
aren't actually needed.

>> Though strictly speaking such warnings are not
>> necessary, they are incredibly useful. And
>> they'll be lost unless merge conflicts are
>> recorded another way.

>Actually, merge conflicts are already recorded in
>CVS/Entries; if the datestamp of the file is not
>touched, it will still show up as a 'conflict'
>on a 'cvs status' command.

This is even better than the suggestion I made in
the next paragraph...

>> One way is to list conflicts in a file stored
>> in the CVS directory. At commit time, skip the
>> scan for diff3 mark-ups and instead read the
>> conflict list and compare mod times of the
>> relevant files. If they have changed, assume the
>> conflicts have been resolved.

>This is sounding more and more ugly.

Skip it if the conflict is already noted in the Entries file.
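
For reference, the mod-time comparison itself is small, wherever the
recorded time ends up being kept. A minimal sketch in Python, assuming
a hypothetical conflict list that maps each file name to the mod time
recorded when the merge left a conflict in it:

```python
import os


def unresolved_conflicts(conflict_list, mtime_of=os.path.getmtime):
    """Return the files whose mod time still equals the recorded one.

    conflict_list maps a file name to the mod time recorded at merge
    time (a made-up format, for illustration only). mtime_of is
    injectable so the check can be exercised without real files.
    """
    unresolved = []
    for name, recorded_mtime in sorted(conflict_list.items()):
        if mtime_of(name) == recorded_mtime:
            # Untouched since the merge: assume the conflict remains.
            unresolved.append(name)
    return unresolved
```

A commit would be refused while this returns a non-empty list, and
allowed once every listed file has been touched.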

>> >Let me state the scope of the thought experiment:
>> >Goal: Provide a means whereby a cvs administrator
>> >may cause a program other than diff3 to be used
>> >when doing merge operations as a part of a
>> >three-way merge of files in a sandbox. This
>> >program might be defined as a keyword used as the
>> >value of a 'diff3hint' followed by an 'id' which
>> >could be looked up in a table that cvs could keep
>> >to determine which executable and any additional
>> >arguments above the diff3 form arguments might be
>> >required.
>> Again, I think that recording a data type is a
>> more straightforward (or at least more easily
>> understood) implementation.

>Sure, that makes sense.

>> >Assertion: The diff3 replacement must handle
>> >all of the args that cvs normally passes to diff3.
>> Yes.
>> >Assertion: The diff3 replacement must not be
>> >interactive in nature for client/server repository
>> >uses.
>> Well, okay for the first implementation.  :-)

>The other requirements you have outlined above
>would take a lot more work and have a high
>potential to get things wrong.

Possibly, but the payoff when it's done right is very high.

>> >Assertion: The diff3 replacement must be able to
>> >run just given the three versions of the file
>> >without any other state.
>> Yes, but it would be nice to be able to pass in
>> the version numbers for column headings or the
>> like, if the tool permits.

>Right. Of course, CVS does pass those arguments to
>diff3, so there are no real problems there. My
>point was that actually passing the MIME type or
>other information into the new program would
>probably NOT be possible.

Isn't the data type implicit in the choice of the merge

Course, the Entries file could be extended to include such
information, but...  naaaahhhh.

>> >Assertion: That cvs continue to write new RCS files
>> >in adherence to the syntax defined in rcsfile(5), but
>> >allowing the introduction of one or more new phrases
>> >and associated id word values as allowed for by the
>> >RCS format syntax.
>> Yes. Should the implementation support changing
>> these values after they've been set initially?

>Possibly. It may also be 'useful' to have each
>version have a MIME-type with the entire file
>having a default MIME-type for a newly added
>version of the same thing as the predecessor
>version. This would allow one branch to have a
>file that is in English and another file that is
>in Chinese.

Well, I still contend that it's a best practice to have a file
contain only one data type for its entire lifetime.  Unfortunately,
CVS doesn't enforce such a policy, so it opens up all kinds of
funny mismatches when people periodically commit all-new content.
This is perhaps as good a solution as any, given conditions.

>> And are they set initially at the time the RCS
>> file is created or at commit time?

>I'd guess it would be handled much as the
>-k<value> set of switches are handled. You can
>use the -k switch on 'cvs import' or 'cvs add'
>or 'cvs admin' or 'cvs checkout' or 'cvs update'
>and have it do something reasonable.

The thing is the -k flags are stored in the admin section,
not with the deltas.  The failing there is that if a data
type changes from text to binary, there's no proper setting
of the -k flag that will work for arbitrary versions.  Storing
it with the deltas opens the opposite problem, which is that
if it needs to change after the fact, it's a pain to change it
on all of the affected versions.
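
To make the middle ground concrete (a file-wide default in the admin
section, with a per-version value that a new version inherits from its
predecessor), here is a minimal sketch of the lookup in Python. The
dictionaries standing in for the RCS file contents are assumptions for
illustration, not a proposed storage format.

```python
def data_type_for(revision, per_delta, predecessor, file_default="text/plain"):
    """Resolve the data type that applies to one revision.

    per_delta   maps a revision to a type recorded with that delta.
    predecessor maps a revision to the revision it was derived from
                (None at the root).
    Walks back through predecessors until a recorded type is found,
    and otherwise returns the file-wide default.
    """
    seen = set()
    while revision is not None and revision not in seen:
        if revision in per_delta:
            return per_delta[revision]
        seen.add(revision)  # guards against a malformed, cyclic history
        revision = predecessor.get(revision)
    return file_default
```

Changing the type "after the fact" then means editing only the delta
where the change happened, at the cost of a walk on every lookup.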

Your call.

>It is less clear how these attributes would need
>to be stored in the CVS/Entries file if they were
>ever used on a 'per checkout' basis.

>> >It would be left to the extension designer to
>> >determine the method whereby such a new RCS
>> >phrase would be written into the CVS repository
>> >versions of the files.
>> It's easier to set it when the file is created.


It is, if it's in the admin section.  :-)

[remainder omitted]

>--- End of forwarded message from address@hidden
