RE: binary files bad idea? why?

From: Paul Sander
Subject: RE: binary files bad idea? why?
Date: Tue, 6 Jul 2004 18:18:00 -0700

>--- Forwarded mail from Greg Woods:

>[ On Friday, July 2, 2004 at 22:25:27 (-0400), Eric wrote: ]
>> Subject: RE: binary files bad idea? why?
>> At 2:11 PM -0400 7/2/04, Greg A. Woods wrote:
>> >Why is it so damn hard for everyone to keep this simple fact in mind?
>> Because it is entirely possible to use CVS in a manner where this 
>> simply isn't an issue.

>Ah ha!  So, we finally get back around to this issue!  What a long trip
>it has been!  ;-)

>Now if you remember what I stated at the beginning then you'll realize
>just exactly why what you've said above is the wrong answer.

>Don't put binary files into CVS and expect it to work 100%.

>CVS cannot detect and manage changes in any meaningful way in files that
>are not organized as lines of text,

In CVS' current implementation, this is true.  But it is possible to
generalize it to support other data types if it can be taught something
about the structure of the files it manages.  One easy way to do this
is to integrate type-specific diff and merge tools that are applied to
reconstructed versions of sources.
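
To make the idea concrete, here is a minimal sketch of what I mean by
type-specific merge tools.  It is not how CVS works today, and every name
in it (MERGE_TOOLS, merge_versions, and so on) is hypothetical.  A flat
JSON record stands in for a "structured" file type, and the tool is handed
three fully reconstructed versions rather than deltas.

```python
import json

_MISSING = object()  # marks a key that is absent from a version


def merge_records(base, ours, theirs):
    """Three-way merge of flat key/value records.

    Returns (merged, conflicts), where conflicts lists the keys that
    both sides changed in different ways.
    """
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b = base.get(key, _MISSING)
        o = ours.get(key, _MISSING)
        t = theirs.get(key, _MISSING)
        if o == t:        # both sides agree (or neither changed it)
            pick = o
        elif o == b:      # only their side changed it
            pick = t
        elif t == b:      # only our side changed it
            pick = o
        else:             # both changed it differently
            conflicts.append(key)
            pick = o      # keep our side and report the conflict
        if pick is not _MISSING:
            merged[key] = pick
    return merged, conflicts


def merge_json_bytes(base, ours, theirs):
    """Hypothetical merge tool for one structured file type."""
    merged, conflicts = merge_records(
        json.loads(base), json.loads(ours), json.loads(theirs)
    )
    return json.dumps(merged, sort_keys=True).encode(), conflicts


# Hypothetical registry: file suffix -> merge tool for that data type.
MERGE_TOOLS = {".json": merge_json_bytes}


def merge_versions(filename, base, ours, theirs):
    """Dispatch three reconstructed versions to the tool for their type."""
    for suffix, tool in MERGE_TOOLS.items():
        if filename.endswith(suffix):
            return tool(base, ours, theirs)
    raise ValueError("no merge tool registered for " + filename)
```

The point of the registry is that the version control tool only needs to
know which tool goes with which type; the knowledge of the file's structure
lives entirely in the tool.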

>                                    and many of the most common and most
>important delta management operations that CVS does involve three-way
>merges of deltas with the hope and expectation of avoiding conflicts in
>those merges.

But this is false no matter how you look at it.  Remember that a delta
is a specific set of changes that derive one complete version from its
immediate predecessor (or in the case of an RCS trunk, its immediate
successor).  Neither RCS nor CVS uses deltas directly in its user-exposed
diff and merge features.  Instead, they reconstruct entire versions and
apply the diff and merge tools to those.  There's a reason for this, and
the implementation is correct.

(It so happens that RCS and CVS use a particular diff tool to create
deltas, and they obviously know how to accumulate the deltas to reconstruct
specific versions.  But those algorithms are for all intents and purposes
hidden from the user.  At least, they are if the user doesn't review the
contents of the repository directly, which is strongly discouraged anyway.)
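
A toy model of that separation, as a sketch only: the Archive class and
the delta format below are made up for illustration and are not the RCS
file format.  It keeps the newest version whole plus reverse deltas, as an
RCS trunk does, and the user-visible diff runs on reconstructed versions,
never on the stored deltas.

```python
import difflib


def make_delta(src, dst):
    """Edits that turn src into dst, as (start, end, replacement) spans of src."""
    sm = difflib.SequenceMatcher(a=src, b=dst, autojunk=False)
    return [
        (i1, i2, dst[j1:j2])
        for tag, i1, i2, j1, j2 in sm.get_opcodes()
        if tag != "equal"
    ]


def apply_delta(src, delta):
    """Apply a delta produced by make_delta to src."""
    out, pos = [], 0
    for start, end, replacement in delta:
        out.extend(src[pos:start])   # unchanged lines before this edit
        out.extend(replacement)      # lines that replace src[start:end]
        pos = end
    out.extend(src[pos:])            # unchanged tail
    return out


class Archive:
    """Newest version stored whole, older ones as reverse deltas."""

    def __init__(self, first):
        self.head = list(first)
        # reverse_deltas[k] turns version k+1 into version k
        self.reverse_deltas = []

    def commit(self, new):
        new = list(new)
        self.reverse_deltas.append(make_delta(new, self.head))
        self.head = new

    def reconstruct(self, n):
        """Return version n in full (0 is the oldest)."""
        lines = self.head
        for delta in reversed(self.reverse_deltas[n:]):
            lines = apply_delta(lines, delta)
        return list(lines)


def diff_versions(archive, a, b):
    """User-visible diff: compares two reconstructed versions."""
    return list(
        difflib.unified_diff(
            archive.reconstruct(a), archive.reconstruct(b), lineterm=""
        )
    )
```

Note that diff_versions would work unchanged if the storage format were
swapped for something else entirely, which is exactly why the stored deltas
don't matter to the user.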

>               One seldom-changing binary file in a large project
>(e.g. the very few found in all of the NetBSD source tree) isn't an issue
>provided the human management of the project contributors keeps a sharp
>eye out for problems with these files (e.g. through peer pressure in the
>NetBSD group, combined with the fact that most/all of those binary files
>are "owned" by one developer).  However the more binary files your
>project has, the more times they are changed, the more diverse the
>working directory hosts, then the more problems binary files will cause
>if they are committed to a CVS repository.  Putting binary files in CVS
>is a bad idea, always was, and always will be.

>I.e. unless you have extremely pressing reasons for including binary
>files in your repository (e.g. as in NetBSD they are very rare and
>extremely stable and "owned" by one developer and because NetBSD also
>strives to use CVS as a source distribution tool), then it's best to use
>other tools and procedures for managing your binary files outside of CVS.

Again, this is all true if the contents of the binary files are opaque.
The binary data that are part of the NetBSD sources appear to be of this
kind.

But you continue to ignore the situations where the binary files have
structure and therefore can be differenced and merged with appropriate
tools.

>Don't put binary files into CVS and expect it to work 100%.

For unstructured binary files, this is true.

For structured binary files that have effective differencing and merge
tools, fix CVS so that it will work 100%.

>Use the most appropriate tools for the job.

We are, but we want them to be better.

>CVS is not a complete software configuration management system.

Nobody's asking it to be.  It does version control, nothing more and nothing
less.

>> Furthermore, CVS does provide some facilities to use it in a 
>> non-concurrent manner adding further protections.

>No, not really -- some of what's there doesn't work right and the rest
>is a bunch of half-baked add-on hacks that don't meld with the design or
>goals of CVS and which, as proven by this ever repeating cycle of
>discussion, causes more confusion and more headaches to naive users than
>could ever be made back in long-term benefits to anyone.

I assume you're talking about cvswrappers here.  True, it's a partial
solution, but the marshalling capability it supplies for managing aggregate
data types is appropriate.  Unfortunately, no one has thought the problem
through sufficiently to produce a good general solution.

On the other hand, features like the modules database (and especially the
Checkin.prog file capability built into it), the vendor branch, the history
file, and certain other features are much more broken than support for binary
files.

>--- End of forwarded message from address@hidden
