Re: binary files bad idea? why?

From: Paul Sander
Subject: Re: binary files bad idea? why?
Date: Mon, 19 Jul 2004 10:27:48 -0700

>--- Forwarded mail from address@hidden

>Paul, Could you answer the following restricted question...

>IF we assume that the 'cvs update' of a particular file in a user's
>sandbox needs to do a three-way merge (checked-out version,
>latest-version and locally modified version) AND we assume that there is
>a "hint" for the CVS server to use some program that looks just like
>diff3 as to arguments, but (possibly) interprets (say a canonical HTML
>structure ignoring whitespace) the file differently than the default
>diff3, BUT the "diff3-like-program" for the checked-out version and the
>latest-version specifies DIFFERENT diff3-like programs, THEN Greg's
>objections to adding a "diff3-like-program" per delta runs into a
>problem with not really having a good solution to jump from one diff3
>alternative to another, so the default diff3 would need to be used.
>However, the propagation of "diff3-like-program" adds extra complexity
>to generation of new deltas and would seem to add a lot of overhead for
>a given file. If a file really does change "types" over time, there are
>boundary conditions that are difficult to handle and dealing with extra
>information such as the file "character set" or "mime-type" which may
>lead to the heuristic selection of a diff3-like program is rendered more
>complex. If a simplifying assumption is made to use a diff3-like program
>for all revisions of a file, then putting that hint into the admin
>section for the file (just like keyword expansion) might make more
>sense, but is it a general enough solution to really deal with the
>situation correctly?

Mark, you understand the problem perfectly.  It's not possible to
have a plug-in merge tool unless there's a guarantee that all of the
possible combinations of data fed to it are compatible.  CVS doesn't
make this guarantee as implemented, so we have the following alternatives:

- Make such a guarantee by policy.
- Modify CVS to make such a guarantee.

The first choice is unacceptable because it limits the extent to which
the contents of a file can be rewritten.  The second choice has the
following alternatives:

- Modify CVS so that every revision stored in an RCS file contains the same
  data type.
- Modify CVS so that the data type of each revision is stored with each
  delta, recognizing that every revision might contain data that are
  incompatible with every other revision.
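
To make the second alternative concrete, here is a minimal sketch of the check a merge driver could make if a data type were recorded with each delta. Everything in it is an assumption for illustration: CVS has no such registry, the type names are invented, and "html-diff3" is a made-up program name. The function returns no tool when the three contributing revisions do not share one type, and leaves the policy for that case to the caller.

```python
from typing import Dict, Optional

# Hypothetical registry mapping a per-delta data type to a program that
# accepts diff3-style arguments. Not part of CVS; "html-diff3" is invented.
MERGE_TOOLS: Dict[str, str] = {
    "text": "diff3",
    "html": "html-diff3",
}


def select_merge_tool(
    base_type: str,
    theirs_type: str,
    mine_type: str,
    tools: Dict[str, str] = MERGE_TOOLS,
) -> Optional[str]:
    """Return a merge program only if all three revisions share one type.

    Returns None when the types differ or when no tool is registered for
    the shared type, so the caller has to decide what to do instead.
    """
    types = {base_type, theirs_type, mine_type}
    if len(types) != 1:
        return None  # mixed types: no plug-in is known to be safe
    return tools.get(types.pop())
```

With this sketch, select_merge_tool("html", "html", "html") picks the HTML-aware tool, while a file whose base revision was plain text and whose later revisions are HTML gets no tool at all.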

>What if your diff3-like program needs to be
>interactive, how much impact does that make to locked directories and
>the like? Would it make more sense to just use something like this:

>   cvs -n status foo.ext
>   ...notice that it is in need of being updated
>   cvs up -rBASE -p foo.ext > foo.ext.base
>   mv foo.ext foo.ext.mine
>   cvs up foo.ext
>   mv foo.ext
>   diff3-like-program \
>     -E -am -L foo.ext -L <BASE-REV> -L <HEAD-REV> \
>     -- foo.ext.mine foo.ext.base \

>"by hand" rather than trying to have cvs do it for you behind your back?

>In this world view outlined above, are there other problems or solutions
>that are not being considered?

Performing that procedure by hand is not acceptable from a usability
standpoint.  The user should be able to invoke a single command and have
it do the expected thing regardless of the data.  Treating non-text files
as special cases comes back to bite you when the text files become the
special case.  At that point, the manual procedure becomes the norm, and
the procedure above is too error prone for that.  The next step is to
write a script that performs this procedure, but then what's the
difference between such a script and absorbing the procedure directly
into CVS, except that there's one less thing to remember?

As for directory locks, they are not needed in a context where read-only
operations are done and all of the contributing versions can be identified
uniquely in advance.  They can be identified uniquely by version number or
by branch/timestamp pairs.  So given an RCS branch number (or CVS magic
branch number), you can check out the versions directly from the RCS files
with co, without even bothering with CVS directory level locks.
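
As a rough illustration of identifying a version by a branch/timestamp pair, the sketch below builds an RCS co command line that prints the latest revision on a branch as of a given moment. The function name and the file name in the example are hypothetical; the flags used (-q, -p, -r, -d) are standard co options. It only constructs the argument list and does not run anything.

```python
from datetime import datetime, timezone
from typing import List


def co_command(rcs_file: str, branch: str, when: datetime) -> List[str]:
    """Build a co invocation for the latest revision on `branch` at `when`.

    `when` must be timezone-aware; it is converted to UTC and written in
    a date form that co accepts, with an explicit +00 offset.
    """
    if when.tzinfo is None:
        raise ValueError("when must be timezone-aware")
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S+00")
    # -q: quiet, -p: print to stdout instead of writing a working file,
    # -r: select the branch, -d: latest revision checked in at or before
    # the given date on that branch.
    return ["co", "-q", "-p", f"-r{branch}", f"-d{stamp}", rcs_file]
```

Because the branch and the timestamp are both fixed before anything is read, running the same command twice should select the same revision, which is the property the paragraph above relies on.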

I believe that all of the necessary info is available in the Entries
file.  From it, you can read the BASE version directly.  Lop off the
last component to get the branch number and use the current time (at the
moment the update command commenced) to identify the latest version.
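
The steps in the paragraph above can be sketched as follows. This assumes the usual CVS/Entries line layout of /name/revision/timestamp/options/tagdate; the function names are hypothetical, and the sketch ignores directory entries and the special revisions used for added or removed files.

```python
from typing import Dict


def parse_entry(line: str) -> Dict[str, str]:
    """Split one file line from CVS/Entries into its named fields."""
    parts = line.rstrip("\n").split("/")
    # File entries start with "/", so the first piece is empty.
    # Directory entries start with "D" and are rejected here.
    if len(parts) < 6 or parts[0] != "":
        raise ValueError(f"not a file entry: {line!r}")
    return {
        "name": parts[1],
        "revision": parts[2],   # the BASE version in the sandbox
        "timestamp": parts[3],
        "options": parts[4],
        "tagdate": parts[5],
    }


def branch_of(revision: str) -> str:
    """Lop off the last component of a revision to get its branch number."""
    branch, dot, _ = revision.rpartition(".")
    if not dot:
        raise ValueError(f"no branch component in {revision!r}")
    return branch
```

For a BASE version of 1.2.2.3 this gives branch 1.2.2, and for a trunk version such as 1.5 it gives 1. The branch number would then be paired with the time the update started to pick the latest version.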

The notable corner case occurs after a branch spawns and before the first
version is committed.  But under that condition, you're up to date already.

>I can sympathize with your desire for a non-diff3 program to do
>user-level merges, but I am still looking to find such programs as might
>exist. There are still a lot of unknowns about how such a new feature
>should be implemented and there are not a plethora of programs that can
>do what is needful unless I have missed them.

I agree that there are not a lot of tools out there.  The vendors often
don't recognize the value of 3-way merges.  Add to that the fact that
hierarchical diff algorithms are relatively new and have not yet been
exploited much.  I'm considering opening an archive of such tools, maybe
calling it, and writing development tools that could assist
in building merge tools.  Imagine a yacc-like tool that accepts a
description of a data format and a collection of UI callbacks to produce
a context-sensitive merge tool.  This would be extremely cool, but
unfortunately such technology is years away.

>--- End of forwarded message from address@hidden
