

Re: CVS corrupts binary files ...

From: Mark D. Baushke
Subject: Re: CVS corrupts binary files ...
Date: Tue, 08 Jun 2004 16:21:03 -0700


Hi Folks,

Greg writes:
> CVS is designed _only_ for tracking changes in
> human written text files.

Paul writes:
> Keep in mind also that there's a difference
> between "binary files" and "mergeable files".
> The two concepts are in fact orthogonal; there
> are mergeable binary types (given a suitable
> tool), and there are unmergeable text types. CVS
> is bad at storing unmergeable files, no matter
> whether or not they're binary files. CVS can be
> easily modified to support mergeable binary
> types, as I've demonstrated, without significant
> impact to its design.

In my view, CVS was designed to add a model of
concurrent modification and automatic merges on
top of the previously existing Revision Control
System representation of files. The removal of
exclusive locking for changes is the fundamental
reason that CVS exists.

It may be that the diff3 algorithm is not always
best suited to doing such merges.

For example, a file encoded in UTF16 may prove
difficult to merge even if the text in the file is
only a "simple" Chinese representation. Perhaps
something like the xcin project will eventually
provide a diff3 for use in this case.
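
To make the UTF16 difficulty concrete, here is a
small Python sketch (an illustration only, not
anything taken from CVS). A byte-oriented line tool
splits its input on the byte 0x0A, but in UTF16 that
byte can be half of an ordinary Chinese character:
U+4E0A encodes in UTF-16LE as 0A 4E. The helper name
byte_lines is hypothetical.

```python
def byte_lines(data: bytes) -> list[bytes]:
    """Split raw bytes on LF (0x0A), as a byte-oriented line tool would."""
    return data.split(b"\n")


# U+4E0A and U+4E0B: two Chinese characters and no newline at all.
encoded = "上下".encode("utf-16-le")
print(encoded)  # b'\nN\x0bN'

pieces = byte_lines(encoded)
print(len(pieces))  # 2: a "line break" appears inside the first character
# An odd byte count means a piece is no longer valid UTF-16LE on its own.
print([len(p) % 2 for p in pieces])  # [0, 1]
```

Any merge that works on such "lines" is working on
fragments of characters, which is why a
character-aware diff3 would be needed.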

It may be desirable to mark UTF8 or UTF16 files as
being 'binary' in order to preserve the text more
exactly across operating systems that are not
(yet) friendly to such text.
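
A rough Python sketch of why the 'binary' marking
helps, assuming a client that rewrites every LF byte
as CRLF for text files (as CVS clients on Windows
typically do, while -kb files are left byte for
byte). The helper naive_lf_to_crlf is hypothetical
and stands in for that translation.

```python
def naive_lf_to_crlf(data: bytes) -> bytes:
    """Hypothetical text-mode checkout: rewrite every LF byte as CR LF."""
    return data.replace(b"\n", b"\r\n")


original = "上下".encode("utf-16-le")  # contains an 0x0A byte, but no newline
translated = naive_lf_to_crlf(original)
print(translated)  # b'\r\nN\x0bN'

# A binary (-kb) file would be left untouched, so this round-trips:
print(original.decode("utf-16-le") == "上下")  # True

# The translated bytes have an odd length and no longer decode cleanly.
try:
    translated.decode("utf-16-le")
    print("decoded")
except UnicodeDecodeError:
    print("corrupted")  # this branch runs
```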

For this reason, I take Paul's side on this issue:
whether a file is binary is orthogonal to whether
it can be "merged" using automatic tooling of some
sort.

I also share Greg's bias that saving arbitrary
binary data and/or derived objects is not a core
competence of CVS.

For myself, I have no objection to a few small
icons being checked into a repository that will
also be holding sources that use them (of course,
I would usually favor them being converted into a
text representation such as xbm format or the
like). I have seen where using very large binary
objects can cause problems for both users and
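
As an illustration of the text-representation idea
for small icons, here is a hypothetical Python helper
(to_xbm is not an existing tool) that writes a tiny
monochrome bitmap as XBM text. XBM pads each row to
a whole byte and stores the leftmost pixel in the
least significant bit. Emitting one pixel row per
line is a design choice of this sketch, so that a
line-based diff3 sees an edited row as an edited
line.

```python
def to_xbm(name: str, rows: list[str]) -> str:
    """Encode a monochrome icon ('#' = set, anything else = clear) as XBM.

    Each row is padded to a whole number of bytes, and the leftmost pixel
    of each byte goes in its least significant bit, as XBM expects.
    """
    width = len(rows[0]) if rows else 0
    row_lines = []
    for row in rows:
        if len(row) != width:
            raise ValueError("all rows must have the same width")
        row_bytes = []
        for start in range(0, width, 8):
            byte = 0
            for bit, pixel in enumerate(row[start:start + 8]):
                if pixel == "#":
                    byte |= 1 << bit
            row_bytes.append(f"0x{byte:02x}")
        # One pixel row per output line, so a line-based diff3 sees a
        # changed row as a changed line.
        row_lines.append(", ".join(row_bytes))
    body = ",\n   ".join(row_lines)
    return (
        f"#define {name}_width {width}\n"
        f"#define {name}_height {len(rows)}\n"
        f"static unsigned char {name}_bits[] = {{\n   {body} }};\n"
    )


icon = [
    "#......#",
    ".#....#.",
    "..####..",
]
print(to_xbm("smile", icon))
```

This prints two #define lines followed by the bytes
0x81, 0x42 and 0x3c, one per line.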

I have also seen problems where folks check in
derived objects such as PostScript files, which
are pure text files but normally are not merged
effectively by a diff3 program during a normal
'cvs update' of a file.
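
A toy illustration of why regenerated derived files
merge badly. This is a deliberately simplified,
position-wise three-way merge in Python, not diff3:
real diff3 first aligns lines with a
longest-common-subsequence diff, which this sketch
skips by assuming all three versions have the same
number of lines. The PostScript fragments are
invented. Both branches regenerated the file, so the
tool-written %%CreationDate comment differs on both
sides and conflicts, even though the hand-made
changes merge cleanly.

```python
def merge_positional(base, mine, theirs):
    """Toy three-way merge of equal-length line lists.

    Returns (merged_lines, conflict_indices). Unlike diff3, it does not
    handle inserted or deleted lines.
    """
    if not (len(base) == len(mine) == len(theirs)):
        raise ValueError("this toy merge needs equal-length inputs")
    merged, conflicts = [], []
    for i, (b, m, t) in enumerate(zip(base, mine, theirs)):
        if m == t:      # both sides agree (or neither changed the line)
            merged.append(m)
        elif m == b:    # only 'theirs' changed this line
            merged.append(t)
        elif t == b:    # only 'mine' changed this line
            merged.append(m)
        else:           # both changed it differently: conflict
            conflicts.append(i)
            merged.extend(["<<<<<<< mine", m, "=======", t, ">>>>>>> theirs"])
    return merged, conflicts


base = ["%!PS-Adobe-3.0", "%%CreationDate: 2004-06-07 09:00",
        "72 72 moveto", "(Hello) show"]
mine = ["%!PS-Adobe-3.0", "%%CreationDate: 2004-06-08 10:00",
        "72 144 moveto", "(Hello) show"]
theirs = ["%!PS-Adobe-3.0", "%%CreationDate: 2004-06-08 11:00",
          "72 72 moveto", "(Hi) show"]

merged, conflicts = merge_positional(base, mine, theirs)
print(conflicts)  # [1]: only the generated timestamp line conflicts
```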

I believe that adding flexibility to CVS as to
what program should be used in place of diff3 for
doing a merge operation is desirable.

That said, I do not know the correct approach to
take for allowing the cvs admin or user to do such
a merge with a non-diff3 tool. Some such tools are
(by their nature) interactive, and this does not
seem to be a good fit with the CVS methodology.
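
One possible guard against interactive tools is to
run the configured merge program with no terminal
input and a time limit, so a tool that prompts gets
end-of-file instead of blocking the update. The
Python sketch below assumes a hypothetical
convention in which the tool writes the merged
result to stdout; run_merge_tool is an illustrative
name, not an existing CVS interface.

```python
import subprocess
import sys


def run_merge_tool(argv, timeout=30.0):
    """Run a merge program non-interactively.

    stdin is connected to the null device, so a tool that tries to prompt
    the user sees end-of-file instead of waiting forever; the timeout
    catches tools that hang anyway. Returns (exit_status, stdout_text),
    or None if the tool had to be killed.
    """
    try:
        result = subprocess.run(
            argv,
            stdin=subprocess.DEVNULL,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None
    return result.returncode, result.stdout


# A well-behaved stand-in tool just writes its result.
print(run_merge_tool([sys.executable, "-c", "print('merged text')"]))
# (0, 'merged text\n')

# A tool that prompts gets EOF on stdin and fails instead of blocking.
status, _ = run_merge_tool([sys.executable, "-c", "input('resolve? ')"])
print(status != 0)  # True
```

GNU diff3 itself fits this shape when run as
"diff3 -m MINE OLDER YOURS": it writes the merge to
stdout and exits with 1 when conflicts were found,
so a caller would treat 1 as "merged with conflict
markers" rather than as a failure.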

Some such programs may only be available on client
machines while others would potentially be
available on the server. I typically favor
assuming that such programs are present on the
server and NOT on the client.

The exact semantics and rules under which a
substitution for a different program than diff3
could be used for a merge operation need to be
carefully considered before we jump into a change.

I suspect that we would need to add a filetype
recognizer into cvs as a preliminary step to help
to classify the type of a file that is to be
merged (or added or imported for that matter) in
order to know which of the potentially large
number of three-way merge programs and scripts
should be used or considered for use during a
given cvs merge operation.

I do not consider filetypes derived from the name
of a file to be useful in such deliberations.
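
A content-based recognizer of this kind could start
as simply as checking a few leading bytes ("magic
numbers") and never looking at the file name. The
Python sketch below is hypothetical: the category
names and the MERGE_TOOLS table mapping categories
to merge programs are invented for illustration, and
only a handful of well-known signatures are checked
(PNG, GIF, PostScript, UTF-8/UTF-16 byte-order
marks, plus NUL bytes as a hint of binary data). A
UTF16 file without a byte-order mark would not be
recognized by it.

```python
import codecs

# A few well-known leading-byte signatures ("magic numbers").
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%!PS", "postscript"),
    (b"\xef\xbb\xbf", "text/utf-8"),
    (b"\xff\xfe", "text/utf-16"),
    (b"\xfe\xff", "text/utf-16"),
]

# Hypothetical policy table: merge program per type; None means
# "do not auto-merge, report a conflict instead".
MERGE_TOOLS = {
    "text": "diff3",
    "text/utf-8": "diff3",
    "text/utf-16": None,
    "postscript": None,
    "image/png": None,
    "image/gif": None,
    "binary": None,
}


def classify(head: bytes) -> str:
    """Guess a type from the first bytes of a file, never from its name."""
    for magic, kind in SIGNATURES:
        if head.startswith(magic):
            return kind
    if b"\x00" in head:
        return "binary"
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False tolerates a multi-byte character cut off at the end.
        decoder.decode(head, final=False)
    except UnicodeDecodeError:
        return "binary"
    return "text"


def merge_tool_for(head: bytes):
    """Return the configured merge program for this content, or None."""
    return MERGE_TOOLS.get(classify(head))


print(classify(b"%!PS-Adobe-3.0\n"))                      # postscript
print(classify("上下".encode("utf-16")))                   # text/utf-16
print(merge_tool_for(b"int main(void) { return 0; }\n"))  # diff3
print(merge_tool_for(b"\x00\x01\x02"))                    # None
```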

If anyone has any suggestions or other patches
for this kind of feature, I would be interested
in hearing about them.

        -- Mark


