info-cvs

Re: Xdelta and CVS


From: Greg A. Woods
Subject: Re: Xdelta and CVS
Date: Thu, 19 Apr 2001 19:19:32 -0400 (EDT)

[ On Thursday, April 19, 2001 at 15:47:14 (-0500), David H. Thornley wrote: ]
> Subject: Re: Xdelta and CVS
>
> More accurately, it meets requirements in a rather bad way, using
> a lot of disk space and offering little benefit you wouldn't get
> by gzipping and backing up the data regularly.

Yeah, whatever!  :-)

> Um, what's so sacred about RCS file format?  I realize that file
> formats are to be changed only with caution, but since the entire
> functionality is internalized into CVS (as of 1.10, I believe)
> there is no reason why it cannot be changed for a good purpose.

Those two points are totally orthogonal.

Some people have even argued that the CVS repository format is
irrelevant and all that matters is the network protocol, though they've
missed some of the inherent issues in the protocol's design that
effectively require text-based diffs as the delta storage format.

The RCS format is important because it makes it possible to retrieve the
contents of the repository with third-party tools (e.g. RCS itself).  The
RCS file format is well documented, tools for handling it are widely
implemented, and the canonical definition is publicly and freely available.
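
To illustrate how little a third-party tool needs, here is a rough sketch
of applying one RCS-style delta.  It assumes the "diff -n" edit-script
form that RCS uses for delta text: "dN M" deletes M lines starting at
line N, and "aN M" appends the next M script lines after line N, with
line numbers referring to the base text.  The function name and the
sample data are made up for illustration; this is not code from RCS or
CVS, and it does not parse a ",v" file.

```python
def apply_rcs_delta(base_lines, delta_lines):
    """Apply a `diff -n` style edit script to a list of text lines.

    Hypothetical helper for illustration only: both arguments are lists
    of strings without trailing newlines.
    """
    result = []
    pos = 0  # count of base lines already consumed (copied or deleted)
    i = 0
    while i < len(delta_lines):
        cmd = delta_lines[i]
        i += 1
        kind = cmd[0]
        start, count = (int(x) for x in cmd[1:].split())
        if kind == "d":
            # Copy the untouched lines before the deletion, then skip
            # the `count` deleted lines.
            result.extend(base_lines[pos:start - 1])
            pos = start - 1 + count
        elif kind == "a":
            # Copy everything up to and including base line `start`,
            # then insert the next `count` lines of the script.
            result.extend(base_lines[pos:start])
            pos = max(pos, start)
            result.extend(delta_lines[i:i + count])
            i += count
        else:
            raise ValueError(f"unknown edit command: {cmd!r}")
    # Whatever remains of the base text is unchanged.
    result.extend(base_lines[pos:])
    return result


base = ["one", "two", "three", "four"]
delta = ["d2 1", "a2 1", "TWO", "a4 1", "five"]
new_lines = apply_rcs_delta(base, delta)
```

Because the delta is nothing more than line numbers and lines of text,
anything that can read text can reconstruct a revision.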

The RCS format is also important because it guarantees forward and
backward and sideways compatibility and interoperability with other
releases and variants of CVS.  One can rewrite CVS from scratch and
still *interoperate* with the exact same repository.

Finally the RCS format is important because it means that many RCS users
can migrate to using CVS without losing version history or trying to
figure out how to convert it to some new format.

I.e. I sure don't want to change my CVS repository format, though if I
were to do so it would only be to some other well known text-based delta
storage format (and I only know of one other:  SCCS).

> > The second idea is just plain wrong in claiming that it would not change
> > the CVS repository format since it would, by definition.  RCS uses
> > "diff" and only "diff" for delta storage.  What it really proposes is to
> > change RCS.
>
> No, what it proposes to do is to replace RCS.  I thought that the
> essence of CVS was something other than its file format.

Well, OK, yes, it replaces RCS (the change would be "complete" :-).

The essence of CVS is more than just its file format.  However,
historically CVS was just an RCS wrapper and its file format was defined
by RCS.  Many features of CVS are also just RCS features.

At one point back in the not so distant history of CVS (i.e. prior to
RCS and diff integration) this kind of "replacement" of RCS would have
been relatively easy (not trivial -- I actually investigated the level
of difficulty a few years ago).  One needed only to change the
implementation of the underlying RCS commands it used.  For example, one
could have dropped BitSCCS in and, with relatively few hacks to CVS,
pretty much have built an SCCS-based CVS (BitSCCS has RCS command-line
compatibility).  With only slightly more hacks you could have built a
CVS that used AT&T SCCS (or GNU CSSC, MySC, etc.).  If the hacks were
done carefully you'd even end up with a CVS that could use either
storage system, and maybe even any random tool with similar underlying
capabilities.  Some of the hacks required would have revolved around
branch numbering issues, keyword expansion differences, etc.

Obviously none of that would have made CVS suitable for *binary* files
(i.e. data files with opaque internal structure that cannot be merged
with diff3 and a text editor to resolve marked conflicts), since CVS is
still a "concurrent versioning system".

Note also that changing the delta format involves changing the remote
protocol support (or at least dropping support for sending patches to
the client, which may totally destroy the efficiency and make it
unusable for many people).
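
As a rough illustration of the efficiency at stake, the sketch below
compares the size of a text patch against the size of the whole file for
a one-line change.  It uses Python's difflib to stand in for "diff"; the
file name and sample contents are invented, and this is not how the CVS
protocol itself encodes patches.

```python
import difflib


def make_patch(old_text, new_text, name="file.c"):
    """Return a unified diff that turns old_text into new_text."""
    old = old_text.splitlines(keepends=True)
    new = new_text.splitlines(keepends=True)
    return "".join(difflib.unified_diff(old, new, fromfile=name, tofile=name))


# A 1000-line file in which a single line changes between revisions.
old_text = "".join(f"line {n}\n" for n in range(1, 1001))
new_text = old_text.replace("line 500\n", "line 500 (edited)\n")

patch = make_patch(old_text, new_text)
print(len(patch), "bytes of patch versus", len(new_text), "bytes of file")
```

A client that already holds the old revision needs only the patch; without
a text delta to send, the server falls back to shipping the whole file.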

> I'm not all that familiar with CVS internals (not having had to
> mess around with it like I did Gnats), but it seems to me that
> we're talking about changing the repository format, nothing else.
> If this is a really large project, then CVS is very badly
> designed.

Indeed.  :-)

CVS, having been once just a wrapper around the RCS commands, is
inherently tied tightly to many RCS features and semantics, and now
that it has its own internal implementation of RCS handling
functionality it's even more tightly integrated into the RCS way of
doing things.

> (Now, test and validation would be time-consuming.)

Well, some would, but that's another part of CVS that desperately
needs re-writing anyway....  Many of the current tests though would
still be valid and suitable for regression testing.

> There is the obvious need for both-way conversion programs, but
> after that I think the Xdelta version would see fairly rapid
> acceptance.  (How rapid depends partly on how effective the
> merging was, which is to say whether two changes in a file
> can be merged to produce another useful file.  This would
> obviously depend partly on the file format, and I'm not
> an authority on common binary file formats.)

I think the only way one could achieve rapid acceptance of Xdelta into
CVS would be to first re-write RCS (and rcsdiff and patch!) to use
Xdelta, to make sure there's some clear way of marking merge conflicts
for easy resolution with any random text editor, and then to change CVS
to use the new version of RCS (either by going back to being a wrapper,
or perhaps by linking against the same library of code used to implement
the new RCS commands).

Whether such a project would be something any sane person might dream of
tackling, or not, is another question....  Like I say I did once
investigate the viability of replacing RCS with BitSCCS and decided
against it.

As for Xdelta vs. RCS, well let's face it:  Lines of text are the single
best possible way to delimit records of almost anything language-based,
especially if you merge computer languages and human languages under
the same umbrella and decide to support multiple languages
simultaneously.  Portably and generically handling changes to
"documents" written in multiple arbitrary languages can only be done
(using today's readily available technology) with text.  And speaking of
"readily available", there are probably still more tools available for
handling text than all of the tools for all of the other non-text
formats combined.  Any format for storing deltas to text files, and all
of the tools that can be built upon it, are what make it possible to build
a concurrent versioning system like CVS.  The fact that the original CVS
designer (Dick Grune) chose RCS as the specific delta format to use
(over say SCCS) is an accident of history as much as anything else.  CVS
is, for all intents and purposes, stuck with that choice now.

> Given a copy of CVS, and a copy of XCVS, with the ability to use
> both but not examine repository format or source code, could
> you necessarily tell the difference?  If properly implemented,
> it seems to me that the changes could be mostly invisible.

The difference would be invisible only to the end user (and perhaps, but
not necessarily, to the remote client developer -- see the issue of
sending patches to clients).

The repository manager would clearly see the difference when RCS
commands failed to reveal the structure of the files in the repository.

(Yes I use RCS commands in my repositories very regularly!)

In other words anyone wanting to use Xdelta for the deltas in their
repository should probably just use PRCS.  Even writing a new tool with
CVS-like capabilities to use Xdelta would be a non-trivial undertaking
and would probably even be a lot more work than rewriting CVS and
keeping RCS as the repository file format.

-- 
                                                        Greg A. Woods

+1 416 218-0098      VE3TCP      <address@hidden>     <address@hidden>
Planix, Inc. <address@hidden>;   Secrets of the Weird <address@hidden>


