CVS bug: merge includes same lines twice, doesn't mark conflict

From: burckhardt
Subject: CVS bug: merge includes same lines twice, doesn't mark conflict
Date: Mon, 16 Dec 2002 13:17:00 -0800

Ken Ballou writes:
> I believe I have found a bug in the CVS merge algorithm, as demonstrated
> by the attached sample files.  
>  the resulting file has two copies of the rule to make target
> "cleanest", and there are no merge conflict indicators!

I reproduced your results, but I do not think that it is a bug.  

bug-1.1 says (I added line number labels to the file):

line 1: clean-foo:
line 2:         rule to make clean-foo
line 3: 
line 4: cleaner clobber: clean clean-foo
line 5:         rule to make cleaner clobber
line 6: 
line 7: ####################

It is reasonable for cvs to interpret file bug-1.2 as having added the
following lines, all after line 6:

cleanest: cleaner
        rule to make cleanest

It is also reasonable for cvs to interpret file bug- as having added
the following lines, all above bug-1.1's line 6:

        second rule to make cleaner clobber
        third  rule to make cleaner clobber

cleanest: cleaner
        rule to make cleanest

So one version added text before line 6, and the other version added
text after line 6, but line 6 itself was not changed at all.
Therefore cvs considers the two parallel versions to be changing
different parts of the file.  So there is no conflict, and since they
changed different areas of the file, cvs uses both of those changes.
Therefore, the merged file has the added text from both of the two
parallel versions.  Since both versions added a cleanest target, it
gets added to the merged file once for each version.  I.e., it is
added a total of two times to the merged file.

If both parallel versions had added the cleanest target at the same
place in the file, then the two changes really would overlap.  The two
authors probably did intend to add it at the same place, but diff does
not interpret it that way.  Above, I showed the interpretation that
puts the additions in different places.  However, below I will show
how it is also possible to interpret the change as being added at the
same place.  Someone could interpret that bug- added the
following lines after bug-1.1's line 5:

        second rule to make cleaner clobber
        third  rule to make cleaner clobber

Someone might also interpret that bug- added the following
lines after bug-1.1's line 6:

cleanest: cleaner
        rule to make cleanest

For this second alternative, further suppose that someone interprets
the bug-1.2 change in the same way as the first alternative.  Then cvs
would see that only one person modified the area after line 5, so
there would be no conflict there.  It would also see that both people
modified the area after line 6.  But since they modified it in the
same way, it would not consider it to really be a conflict, and then
it would add it only once.
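
To make the two interpretations concrete, here is a rough Python
sketch.  It is only a toy model, not what cvs actually runs: it
assumes each side did nothing but insert lines, and it describes a
change as a map from "after bug-1.1's line N" to the lines inserted
there.  The helper merge_insertions and the variable names are made up
for illustration, and the tab indentation of the rule lines is my
assumption.

```python
# Toy three-way merge for insert-only changes (hypothetical helper,
# not CVS code).  Key N in a map means "inserted after base line N".

BASE = [
    "clean-foo:",
    "\trule to make clean-foo",
    "",
    "cleaner clobber: clean clean-foo",
    "\trule to make cleaner clobber",
    "",
    "####################",
]

EXTRA_RULES = [
    "\tsecond rule to make cleaner clobber",
    "\tthird  rule to make cleaner clobber",
]
CLEANEST = ["cleanest: cleaner", "\trule to make cleanest", ""]


def merge_insertions(base, ours, theirs):
    """Merge two insert-only changes made against the same base."""
    merged = []
    for pos in range(len(base) + 1):
        a, b = ours.get(pos), theirs.get(pos)
        if a is not None and b is not None and a != b:
            # Both sides inserted different text at the same place.
            merged += ["<<<<<<<"] + a + ["======="] + b + [">>>>>>>"]
        else:
            # Only one side inserted here, or both inserted the same text.
            merged += a or b or []
        if pos < len(base):
            merged.append(base[pos])
    return merged


# bug-1.2: one block after line 6 (same in both interpretations).
bug_1_2 = {6: CLEANEST}

# The other version, first interpretation: one block after line 5.
first = {5: EXTRA_RULES + [""] + CLEANEST[:2]}

# The other version, second interpretation: two blocks.
second = {5: EXTRA_RULES, 6: CLEANEST}

merged_first = merge_insertions(BASE, first, bug_1_2)
merged_second = merge_insertions(BASE, second, bug_1_2)

print(merged_first.count("cleanest: cleaner"))
print(merged_second.count("cleanest: cleaner"))
```

Both maps for the other version rebuild exactly the same file when
applied to bug-1.1 alone, yet in this model the first one merges to
two copies of the cleanest rule and the second one to a single copy,
with no conflict markers either way.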

So both interpretations of the changes are valid interpretations, but
one leads to a desirable merge and the other does not.  I don't think
cvs has a way to know which interpretation is better.  A human who
understands makefiles can obviously see which interpretation is more
desirable, but since cvs does not know the makefile language, it
cannot know this.

The reason cvs prefers the first interpretation is that the two-way
diff command prefers that interpretation, and cvs uses that two-way
diff algorithm as a part of its merge algorithm.  diff prefers the
first interpretation because it is easier for a human to read.  It is
easier for the human because it has fewer blocks of added text.  The
total number of lines added is the same in both interpretations, but
the first one is simpler since it has only one block of added text
instead of two blocks.
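
You can see the same grouping with any two-way diff.  The sketch below
uses Python's difflib, which is a different algorithm from the diff
that cvs uses, so take it only as an illustration.  The contents of
the second file are my reconstruction from the lines listed above, and
inserted_blocks is a made-up helper name.

```python
import difflib

base = [
    "clean-foo:",
    "\trule to make clean-foo",
    "",
    "cleaner clobber: clean clean-foo",
    "\trule to make cleaner clobber",
    "",
    "####################",
]

# Reconstructed other version: the five added lines sit between
# bug-1.1's line 5 and line 6.
other = base[:5] + [
    "\tsecond rule to make cleaner clobber",
    "\tthird  rule to make cleaner clobber",
    "",
    "cleanest: cleaner",
    "\trule to make cleanest",
] + base[5:]


def inserted_blocks(old, new):
    """Return (position, lines) for each pure-insert hunk of the diff."""
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    return [
        (i1, new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag == "insert"
    ]


blocks = inserted_blocks(base, other)
for position, lines in blocks:
    print("after line", position, "added", len(lines), "lines")
```

On this input difflib should report the addition as a single block
after line 5, which is the first interpretation, not as two
separate blocks after lines 5 and 6.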

When the merge algorithm was written, it made use of the diff output.
Since the diff output was already designed for human understanding,
the merge algorithm inherited this type of output.  The merge
algorithm is a computer algorithm so it should not care about how many
blocks there are.  Only humans benefit from that type of
simplification.  So perhaps the merge algorithm does not need this
simplification.  And in your example, the simplification even does
harm.  In the past, some people have tried to make diff stop the
simplification when it is being used by the merge algorithms.  I
cannot remember if that improved things or not.  It might be worth
more investigation.

However, I do not think we can ever get things perfect since there
will be some cases where the people changing the files really did
intend the first interpretation to be correct.  For example, sometimes a
file really should have some lines repeated in it.  If two people added
the same block of lines to slightly different locations, they may really
want the merged file to have the lines in two places.  I.e. if we do
fix your case, we will break other cases.  So it might not even be an
improvement at all.

In fact, an earlier version of cvs, version 1.11, does put conflict
markers in your file.  So the earlier version would have given you
more desirable results for your makefile example, but the cvs
regression test, sanity.sh, has many examples in which the newer
version of cvs does better.  cvs 1.11 gives results for many other
cases that are much worse than the results you got with 1.11.1p1.  So
I definitely do not want to go back to 1.11, but I would be glad if
someone did improve your case while keeping the existing cases in
sanity.sh working.

Instead of trying to improve cvs's merge algorithm, there might be a
way for you to improve the way you are using cvs: In your makefile
example, I think it would have been better if only one of the cvs
users had added the "cleanest" target.  The second cvs user should
not have added it himself.  Instead, he should have waited until the
first user committed, and then updated to get the "cleanest" target.
Perhaps the first user was not ready to commit yet, and the second
user did not want to wait to update, so the second user decided to
just add the code himself.  But I don't think you can have it both
ways.  Something bad is inevitable in this example.  Either the first
user is inconvenienced by having to commit before he is ready or else
everyone else is inconvenienced by getting an undesirable merge
result.
