
RE: How well does CVS handle other types of data?


From: Paul Sander
Subject: RE: How well does CVS handle other types of data?
Date: Fri, 20 Jul 2001 15:20:41 -0700

>--- Forwarded mail from address@hidden

>[ On Friday, July 20, 2001 at 08:41:53 (-0400), Ralph Mack wrote: ]
>> Subject: RE: How well does CVS handle other types of data?
>>
>> > Note also that CVS uses RCS files.  RCS uses diff and diff3.  All
>> > these things together imply that CVS only handles text files.  Q.E.D.
>> 
>> Not necessarily. RCS uses diff and diff3 in specific operations. It also
>> does a lot of other things, like keeping track of what content belongs in
>> each revision. To support binary files, it is only necessary to prevent
>> those RCS operations from applying diff and diff3 to such files in ways
>> that are harmful to them.

>No, sorry, you are very wrong.

>The diff program is always used to figure out what the delta between two
>revisions is, and RCS simply logs that delta.

True, but keep in mind that the use of diff in this context is simply for
the storage and retrieval of files.  CVS is good at this, and there's no
debate about changing this aspect of it.
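
For anyone unfamiliar with the format: a delta in an RCS file is
essentially the output of "diff -n".  As a rough sketch (format per
rcsfile(5), example invented), replacing line 3 of a file is recorded as
something like:

    d3 1
    a3 1
    the replacement text

that is, "delete one line at line 3, then append one line there".  Nothing
in this storage format cares whether the payload is something a user could
meaningfully merge.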

Where we get into trouble is when the diff and diff3 algorithms are
applied to data for which they are inappropriate.  This is why the
concurrency features of CVS are unsuccessful for what we call non-mergeable
files.  The truth is not that these files cannot be merged, but that CVS
applies inappropriate merge algorithms to these files.

The argument we've been making is that we must relax an accident of
history, namely that CVS relies exclusively on the diff and diff3 algorithms
for all merges, and introduce additional tools to better handle files that
cannot be merged in a meaningful way using diff and diff3.

One proposal was to overload -kb and swap in a trivial selection algorithm
in its presence.  Another is to store a data type in the RCS file and
somehow register a variety of tools in a way that's sensitive to the stored
data type.  I'm sure there are others.
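
To make the first proposal concrete, here's a minimal sketch in C of a
trivial selection "merge" for -kb files.  All of the names and return
conventions are invented for illustration; this is not CVS code:

    #include <stdio.h>
    #include <string.h>

    /* Byte-wise comparison of two files; returns 1 if identical, 0 if
     * different, -1 on open failure.  (Illustrative helper only.) */
    static int files_equal(const char *a, const char *b)
    {
        FILE *fa = fopen(a, "rb");
        FILE *fb = fopen(b, "rb");
        int result = -1;

        if (fa && fb) {
            char ba[8192], bb[8192];
            size_t na, nb;
            result = 1;
            do {
                na = fread(ba, 1, sizeof ba, fa);
                nb = fread(bb, 1, sizeof bb, fb);
                if (na != nb || memcmp(ba, bb, na) != 0) {
                    result = 0;
                    break;
                }
            } while (na == sizeof ba);
        }
        if (fa) fclose(fa);
        if (fb) fclose(fb);
        return result;
    }

    /* Trivial selection "merge": keep whichever side changed, and
     * declare a conflict when both sides diverged from the ancestor.
     * Returns 0 = keep ours, 1 = take theirs, 2 = conflict, -1 = error. */
    int select_merge(const char *ancestor, const char *ours,
                     const char *theirs)
    {
        int ours_unchanged = files_equal(ancestor, ours);
        int theirs_unchanged = files_equal(ancestor, theirs);

        if (ours_unchanged < 0 || theirs_unchanged < 0)
            return -1;            /* I/O error */
        if (theirs_unchanged)
            return 0;             /* only we changed (or neither) */
        if (ours_unchanged)
            return 1;             /* only they changed */
        return 2;                 /* both changed: a human must choose */
    }

The point is that for opaque data the only honest automatic outcomes are
"take one side" or "punt to a human"; diff3 never has to touch the bytes.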

>All RCS does (with -kb) to support the storage of binary files is to
>suppress keyword expansion and to do I/O in "binary" mode (where that
>really just means adding "b" to the flags argument of fopen()).  The
>rcsmerge program (which uses diff3) also refuses to merge files when
>'-kb' is in effect.  The delta stored in the RCS file is still
>calculated with diff, (well, with "diff --binary" if '-kb' is in
>effect).
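
Right, and it's worth seeing just how little that is.  The whole of RCS's
"binary support" at the I/O layer boils down to something like this
(illustrative C, not the actual RCS source):

    #include <stdio.h>

    /* Under -kb, RCS opens working files in binary mode.  The "b" only
     * matters on DOS-like hosts, where it suppresses CR/LF translation;
     * on POSIX hosts, text and binary mode are identical. */
    FILE *open_working_file(const char *path, int binary)
    {
        return fopen(path, binary ? "rb" : "r");
    }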

>from the co(1) manual:

>       -kb    Generate  a binary image of the old keyword string.
>              This acts like -ko, except it performs all  working
>              file  input  and output in binary mode.  This makes
>              little difference on Posix and Unix hosts,  but  on
>              DOS-like  hosts  one  should use rcs -i -kb to ini-
>              tialize an RCS file intended to be used for  binary
>              files.   Also,  on  all hosts, rcsmerge(1) normally
>              refuses to merge files when -kb is in effect.

>from the ci(1) code:

>#                   if OPEN_O_BINARY
>                        if (Expand == BINARY_EXPAND)
>                            *++diffp = "--binary";
>#                   endif

>Note also that "diff --binary" doesn't do anything magical either.  It
>just tells diff to not bother checking for binary attributes of the
>input files and to do I/O in "binary" mode.  It still does per-line
>comparisons, but since it handles really long lines really well (at
>least until you run out of VM), the result is effectively the same.

>None of the binary handling of diff and diff3 and RCS makes it possible
>to merge files with anywhere near the degree of success required by a
>concurrent versioning tool (or a versioning tool which tries to automate
>as much as possible of branching, which is why even with plain RCS,
>which is not implicitly a concurrent versioning tool, its rcsmerge
>program will flatly refuse to try to merge files you've declared to be
>"binary").

Fine.  Let's implement one of the proposals to replace the diff and diff3
algorithms with something more appropriate for the data type.  Note that
I don't mean to totally rip out the existing code and stick something
unknown in its place.  Instead, the idea is to sense the data type in some
way and use the right tool for the merge.
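
A minimal sketch of the second proposal, a registry keyed on a data type
stored with the file (every type name and tool name here is hypothetical):

    #include <stddef.h>
    #include <string.h>

    struct merge_tool {
        const char *type;     /* data type recorded in the RCS file */
        const char *command;  /* invoked as: command ancestor ours theirs */
    };

    static const struct merge_tool merge_tools[] = {
        { "text",   "merge"       },  /* today's diff3-based default */
        { "xml",    "xmlmerge"    },  /* hypothetical structural merger */
        { "binary", "selectmerge" },  /* trivial selection, as sketched */
    };

    /* Pick the merge command for a data type, falling back to the
     * diff3-based default when the type is unknown. */
    const char *merge_command_for(const char *type)
    {
        size_t i;

        for (i = 0; i < sizeof merge_tools / sizeof merge_tools[0]; i++)
            if (strcmp(type, merge_tools[i].type) == 0)
                return merge_tools[i].command;
        return "merge";
    }

An unknown type falls back to today's diff3-based behavior, so existing
text files would be completely unaffected.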

>Finally where you're most wrong is in ignoring this expectation CVS must
>have of being able to normally perform automatic merges.  If you look
>into the research and practical experience reports for any and all
>concurrent versioning systems you'll learn that the sole reason they
>succeed as well as they do is because they can normally do a fully
>automatic and 100% successful merge of changes to one or more files.
>Even Berliner's CVS-II paper shows this quite clearly.

I think you're inflating the claim a bit here.  They're successful because
the merges, if kept small and frequent, are easy to complete quickly.
Keeping them automatic minimizes the duration of locks in certain
implementations (such as CVS), but there are other ways to accomplish this
by changing the locking rules.  In general, the speed and success of the
merges depend on how they're done:  interactive merges are successful
when small merges are done frequently and the users can resolve conflicts
immediately, while batch merges (like CVS does now) are successful when the
user can keep a log and go after the conflicts in bulk.
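
The batch style works in bulk precisely because CVS leaves diff3-style
conflict markers in the working file, which are trivial to search for
(file name and revision here are invented for illustration):

    <<<<<<< foo.c
    our version of the line
    =======
    their version of the line
    >>>>>>> 1.7

And that trick is exactly what's unavailable for files that diff3 can't
meaningfully process.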

>> > You already have a build system (since of course CVS is not a build
>> > system).  It already deals with software configuration components that
>> > are not source files stored in CVS.
>> 
>> It does? My general expectation is that I can do a single command to
>> checkout the entire base of required files, mergable or not, from an
>> appropriate label.

>Hmm...  well, getting the "required files" out of the version control
>system (presumably at the "required revision level") is only part of the
>whole SCM process, now isn't it!

The key here is to give a single command to produce the whole source
tree.  Users should not be bothered to invoke multiple commands to
produce a single source tree, regardless of what the sources are composed
of.
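
With CVS today that single command is just a tagged checkout, e.g. (module
and tag names invented):

    cvs checkout -r REL_1_2 product

and the requirement stands whether the module contains C sources, images,
or word-processor documents.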

>> I can then perform another single command to execute
>> the makefile or antfile stored under that label to perform operations on
>> files stored under that label to produce a set of artifacts.

>Yes, that's what I said:  Your build system deals with software
>configuration components that are not source files stored in CVS.
>I.e. it calls a compiler (which is a software configuration component)
>to create the product files.  If you're careful you've specified an
>explicit version of the compiler in your makefiles.

Keep in mind that you're using an outdated definition of source files; we're
not in Kansas anymore, and source files are not necessarily ASCII text.

Yes, the build system is expected to produce a product from its sources.
Sources are files that can't be reproduced automatically.  The rest are
built.

Oh, and by the way, identifying the correct tools to be used by the build
can be done in many ways, one of the worst of which is to embed version
numbers in the makefiles.
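
To illustrate the anti-pattern (paths invented):

    # fragile: a tool version hard-wired into every makefile
    CC = /tools/gcc-2.95.2/bin/gcc

    # better: the makefile defers to a controlled tool location
    CC = $(TOOLDIR)/bin/gcc

where TOOLDIR is pinned once by the configuration management process
rather than by each makefile individually.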

>> If I have
>> set the label properly and included all relevant files, this should
>> repeatably and reliably produce artifacts exhibiting the same features
>> and bugs every time I perform the operation against that label.

>Your build is only 100% repeatable if you use the same tools.

I think you two are in violent agreement here.

>> I have
>> even worked in environments where the tools used to perform the build
>> (compilers, linkers, 3rd-party libraries) were stored under revision
>> control and executed from the sandbox.

>Yes, that's more what I'm talking about.  TCCS is an existing (freely
>available) change control system with integrated release management and
>it sort of has an integrated build system too (at least logically).

I think Ralph was saying that in that case he kept his toolkit under version
control and called out the proper version when necessary.  There was no
mention of an integrated environment.

>> Software configuration management is only effective if one system manages
>> the entire codebase.

>Well now, that's where you're wrong, or at least being very misleading.

>A "system" doesn't have to be one integrated tool, such as TCCS.  A
>system can be just a set of rules, guidelines, and procedures!

Yes, but in the end, everything must be tracked and controlled, and the
processes must be repeatable and have reproducible results.  Ideally, it
should be possible to bootstrap the entire process for any product release
in three steps, given only a fresh install of the OS on a clean machine and
a backup of the source code repository (the first step being to restore the
backup, the second to check out the bootstrap script, and the third to
invoke the bootstrap script).
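
Sketched as commands (every path and name here is hypothetical):

    # step 1: restore the repository backup
    tar -xpf /backups/cvsroot.tar -C /var/cvsroot

    # step 2: check out the bootstrap script
    cvs -d /var/cvsroot checkout bootstrap

    # step 3: rebuild the tool chain and the product
    ./bootstrap/bootstrap.sh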

>My "SCM" for some projects includes not only CVS, but GNU Make, GNU
>Automake, GNU Autoconf, and a half decent unix-like system with a good C
>compiler and related tools.  On top of that I've specified certain
>procedures to be used for defining everything from how to create a new
>release, down to how to choose the name for a new release.

>> Therefore, this is the minimum support for binary files
>> that I would want to see. For change management, I can see arguments in
>> either direction, provided that CVS performed the actual versioning and
>> storage.

>CVS can only really be effective at versioning and storing mergable
>content and therefore it cannot ever become an integrated SCM system.

Ralph has been arguing (and stated above) that CVS should perform only the
actual versioning and storage.  He never once hinted that CVS could or
should be a complete SCM solution.  CVS doesn't work well for all types of
source files, so it should be improved.

>--- End of forwarded message from address@hidden



