RE: cvs update; merge

From: Paul Sander
Subject: RE: cvs update; merge
Date: Thu, 30 Aug 2001 16:23:33 -0700

>--- Forwarded mail from address@hidden

>> -----Original Message-----
>> From: address@hidden [mailto:address@hidden
>> I haven't been following this thread, but this happened to catch my eye:
>> >--- Forwarded mail from address@hidden
>> >CVS merges using a line-based merge.  This has been found, over years
>> >of experience, to work very well on such things as program source files.
>> >It isn't the fact that there is such a thing as diff3 that makes CVS
>> >work, but the fact that using it gives meaningful, and usually correct,
>> >results.  While it is possible to have a diff/merge (aka diff3) for
>> >other formats, it at best isn't obvious that the result would be worth
>> >having.
>> >--- End of forwarded message from address@hidden
>> As usual, Greg speaks of CVS as if it's not malleable to the will of its
>> users.

>I think this is the first time I've been mistaken for Greg.

Please accept my sincerest apologies.

>> There has been a great deal of discussion in the past about the
>> possibility of replacing the diff3 merge algorithm with something that's
>> useful for merging more than ASCII text files.
>Right.  The usual assumption seems to me to be that we need a binary
>diff/merge capability, then we can change CVS to use that alternatively
>and everything is going to be fine.  One common counterassumption is
>that this is going to be overly difficult and break things.

CVS' internal interface to the 3-way merge algorithm is pretty clean.  It
would be a trivial matter to slip in something that can sense the type of
data stored in the file and apply alternative algorithms using the same
interface.  The next question is how to sense the data type, which can be
done in any of a number of ways, such as matching content against a magic
file, examining file extensions, or setting a newphrase in the RCS file.
Then all we need is a lookup table to map tools to file types, and
simple wrappers to match the actual merge tool to CVS' interface.  This
isn't exactly rocket science.
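
To make the type-sensing step concrete, here is a minimal sketch that
checks the file extension first and falls back to file(1), which consults
a magic database.  The function name, the type names, and the list of
extensions are assumptions made for illustration, and the --mime-type
option is assumed to be available in the installed file(1).

```shell
#!/bin/sh
# Sketch only: the type names and the fallback order are assumptions.

# sense_type FILE
# Print a coarse data type for FILE: first by file extension, then, for
# unknown extensions, by asking file(1) to inspect the content.
sense_type() {
        case "$1" in
        *.gif|*.jpg|*.jpeg)  echo image ;;
        *.doc|*.fm)          echo document ;;
        *.c|*.h|*.txt)       echo text ;;
        *)
                # file(1) consults its magic database; anything it reports
                # as "text/..." is treated as mergeable text, the rest as binary.
                case "$(file -b --mime-type "$1" 2>/dev/null)" in
                text/*) echo text ;;
                *)      echo binary ;;
                esac
                ;;
        esac
}
```

The printed type could then serve as the key into the lookup table that
maps file types to merge tools.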

As for writing the wrappers, there's concern that the maintainers of CVS
will have to provide them.  That's not true, either.  The community will
supply the merge tools that they need for the kinds of data they use.  Some
will add theirs to the contrib directory, which is unsupported.

>What I want to know is if it is going to work in a way that makes it
>useful to do.

Based on what I've described, why do you think this would not be useful?

>To repeat, given two versions of a given original source code file, changed
>in different ways in different places, diff3 will usually produce a merged
>file that, as is, is a valid source file and which will compile to an
>executable that has both behavior changes.

>It is entirely possible that this will fail to work.  Suppose that one
>change used a global variable in a new place and the other change removed
>it from the program.  It is possible to think of text formats in which
>this sort of merge would normally produce an unusable result.

>The fact that diff3 works so often on the sorts of text files people usually
>use with CVS is what makes it possible for CVS to manage concurrent
>development so well.  If diff3 usually didn't produce a usable merged file,
>CVS wouldn't really work.

>So, in order to decide whether adding an extended merge capability is a good
>idea, we have to determine whether having that capability is worthwhile;
>that is, if the merging saves a great deal of time over manual merging for
>types of files people would use often.  Until we have some assurance of
>this being worthwhile, I don't think it's wise to spend a whole lot of
>work making it possible.

You should also consider the types of data that are being stored in CVS,
and the usage patterns of the specific data.  It is possible to produce
a useful merge tool for a great many data types.  Frame Maker and Microsoft
Word are two data formats that come up often in this list.  Diff3 can't
produce meaningful results for either tool, but it's possible to come up
with counterparts that will work for these specific data formats.

Other data formats, such as GIF or JPEG images, in general don't have
meaningful merge algorithms, other than trivial selections.  And then
there are the areas where people just don't expect an automated merge to
work at all and expect to do something else no matter what CVS does.

I think that all of these needs can be met, to the degree that automated
merging as a concept proves useful.  And even in those cases where automated
merging per se does not offer value, CVS' other capabilities can still
be brought to bear.

>> One possibility is to hook a trinary switch in with the -kb keyword
>> expansion mode as a quick implementation to support unmergeable file types.
>> Another is to hook in a selection algorithm that examines the file and
>> invokes the proper merge tool.
>Given a satisfactory diff/merge tool, this would be good.  I don't think
>it's the right place to start, though.

What do you suggest instead?  There is no general-purpose merge tool that
will work, so I don't see any alternative but to write an extensible,
context-sensitive one.

>> But so far no one has had the wherewithal to implement either of these
>> and certain parties have argued vehemently that such features must not be
>> added because they don't happen to need them.

>I'm not going to claim that everybody here is reasonable all the time, but
>this is not a simple issue.  It has the potential to be a black hole for
>time and effort, break RCS compatibility (which is still useful for some
>people) and accomplish nothing useful.

I don't see how any of this can come to be.  You build the hook so that new
merge algorithms can be integrated.  The framework is relatively simple and
easy to maintain.

RCS is capable of storing arbitrary data types in its delta format.  The RCS
file format standard is also extensible and is capable of storing whatever
additional meta-data are needed to facilitate the new capability.  RCS lacks
the tool to create arbitrary newphrases, but that's no barrier for a tool
that already writes RCS files from scratch.

>Without a good deal of work, nothing is going to happen.  I would suggest
>that people interested in this ability demonstrate a tool that does a good
>job of diff3 on popular file types CVS can't merge, and propose a repository
>file format that would retain RCS compatibility as much as possible.  If
>this project is shown to be useful and feasible, I think we could drum up
>some enthusiasm for it.

Okay, I've cobbled up a demonstration of what I've been talking about.
There are two scripts:  The first is the 3-way merge abstraction layer,
the second is an alternative merge algorithm that happens to be a 2-way
selection merge.
Here's the first script.  It takes as arguments the name of the working
file to be merged, the common ancestor, and the other contributor.  It
performs a lookup in a regular expression table and invokes a command
appropriate to some naming convention listed there.


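#!/bin/sh
# Usage:  w1 work ancestor contributor
#
# Each table line holds a regular expression, whitespace, and the merge
# command to run.  The first entry whose expression matches the whole
# name of the working file wins.

cmd=$(awk -v name="$1" '
name ~ ("^" $1 "$") { sub(/^[^ \t]+[ \t]+/, ""); print; exit }
' <<'EOF'
.*\.gif         pick3
.*\.jpg         pick3
.*              diff3 -E
EOF
)

if [ "x$cmd" = "x" ]
then
        echo "Failed to locate the proper merge tool" 1>&2
        exit 2
fi

# $cmd is deliberately unquoted so that "diff3 -E" splits into a
# command and its option.
echo "Invoking $cmd $1 $2 $3" 1>&2
$cmd "$1" "$2" "$3"
exit $?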

A real implementation would not use a hard-coded table.  The table could be
stored in $CVSROOT/CVSROOT, or the correct command might be written as a
newphrase in the RCS file and grepped out, or the table might be indexed by
some string that's stored in the newphrase, or the command might be read from
a magic file, or perhaps a completely different method could be used to
map each file to the correct merge command.  The point here is that the
merge tool is not hard-coded as diff3, but instead is configurable by the
CVS admin.
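
As a rough sketch of the first option, the lookup could read the table
from a file instead of a here-document.  The file name "mergetools" and
the function name lookup_merge_tool are hypothetical; CVS does not define
such an administrative file.  The table format is assumed to be the same
as in the script above, with "#" comment lines allowed.

```shell
#!/bin/sh
# Sketch only: "mergetools" is a hypothetical file name, not a real CVS
# administrative file.  Each line holds a regular expression, whitespace,
# and the merge command to run; lines starting with "#" are comments.

# lookup_merge_tool FILE TABLE
# Print the command from the first TABLE entry whose pattern matches FILE.
lookup_merge_tool() {
        awk -v name="$1" '
        /^[ \t]*#/ || /^[ \t]*$/ { next }
        name ~ ("^" $1 "$") { sub(/^[ \t]*[^ \t]+[ \t]+/, ""); print; exit }
        ' "$2"
}
```

The wrapper would then set cmd from something like
lookup_merge_tool "$1" "$CVSROOT/CVSROOT/mergetools" and proceed as before.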

The wrapper tool above invokes another program called "pick3" if the working
copy of the file matches either of the wildcards *.gif or *.jpg.  That tool
is as follows:


#!/bin/sh
# 2-way selection merge program
# Usage:  pick3 work ancestor contributor

if cmp "$1" "$2" > /dev/null
then
        if cmp "$2" "$3" > /dev/null
        then
                # Files are identical, pick one
                cat "$1"
        else
                # $3 differs from $1 and $2, pick it
                cat "$3"
        fi
else
        if cmp "$2" "$3" > /dev/null
        then
                # $1 differs from $2 and $3, pick it
                cat "$1"
        else
                # Conflict
                echo "Conflict in $1" 1>&2
                cat "$1"
                exit 1
        fi
fi
exit 0

This is a very simple tool that writes out the one contributor that differs
from the common ancestor, or writes out the working file and reports a
conflict.  A real tool should be a little more intelligent about the proper
result, e.g. having the user specify the proper handling of the conflict.
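
One way the user might specify that handling is through an environment
variable.  The sketch below is only an illustration: the variable
PICK3_ON_CONFLICT, its values, and the function name resolve_conflict are
all hypothetical and are not part of CVS.

```shell
#!/bin/sh
# Sketch only: PICK3_ON_CONFLICT is a hypothetical variable, not a CVS setting.

# resolve_conflict WORK CONTRIBUTOR
# Called when both files differ from the common ancestor.  Writes the
# chosen file to stdout.  Returns 0 if the user's policy settled the
# conflict, 1 if it is still a conflict.
resolve_conflict() {
        case "${PICK3_ON_CONFLICT:-fail}" in
        mine)   cat "$1" ;;
        theirs) cat "$2" ;;
        *)      echo "Conflict in $1" 1>&2
                cat "$1"
                return 1 ;;
        esac
}
```

With no policy set, this behaves like the conflict branch of the script
above: it keeps the working file and reports failure.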

Now, all that's needed is to have CVS invoke the above wrapper, replacing
the hard-coded invocation of diff3.

I hope that I've convinced you that this doesn't break backward
compatibility in any way, doesn't require significant changes to the
existing design of CVS (and in fact does not even require extensive
modifications to its existing implementation), and yet it provides a means
of extending CVS in a way that many of its users find useful and perhaps
even necessary.

>--- End of forwarded message from address@hidden
