monotone-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Monotone-devel] Merge3


From: Torsten Rueger
Subject: [Monotone-devel] Merge3
Date: Wed, 21 Apr 2004 13:28:56 +0300
User-agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.7b) Gecko/20040316

Moi,
I've been researching merge algorithms for xml recently and together with a colleague have found an algorithm that merges text with very good results.

I am afraid that I can't publish my code (it's ruby anyway), due to licensing reasons. But I'm happy describing the algorithm and if someone wants to implement it in c++, I can help (offline). I just think monotone is such a fresh wind to the topic of version control, i'd be happy to contribute at least a little to this merging topic.

In merging one can must take one of the two aproaches:
-- add/remove
-- add/copy

Quite seperate from that is the issue of line based versus byte based. The second is obviously more general (works on binary files) but more importantly also more acurate (produces less conflicts for changes on one line that can still be merged)

Add/remove is what unix diff/patch does (on a line basis). Add/copy is what more recent algorithms do, namely xdelta and vcdiff.

Now the problem with add/remove is that almost by definition is does not detect copies or moves. So that when a part of text is moved in one branch and edited in another, a conflict occurs. Often quite unneccessarily.

Xdelta and vcdiff not only work on binary data, but detect copies. They do this for efficiency reasons (the produced delta is smaller) and only for diffing 2 files. Unfortunately, both can only apply diffs to the exact original, so no 3 way merging. So that's what I wrote and decribe here.

I'll make an example (if you're still reading, great, keep going), the numbers are for explaining the algorithm later. The example is short, the strings are actually too short, but it's just an example. The merge is perfectly feasable and unambiguous. You can put each word on a line and run it through diff3 to see it fail.

Original:  <html><body>Stuff</body><head>Merge Example</head></html>
             1     2     3     4      5    6     7      8       9
One:       <html><head>Merge Example</head><body>Stuff</body></html>
             1  <--5     6      7      8  <--2     3    4   <--9
Two:       <html><body>Algorithm</body><head>Example</head></html>
             1     2   <--10    <--4     5 <-- 7      8      9
Merge:     <html><head>Example</head><body>Algorithm</body></html>
             1     5     7       8      2     10       4     9

Matching phase: Find the string in One and Two to map to the original. Add added strings
         One:  1   5-8   2-4  9
         Two   1   5     7,8    2  add:10  4 9
Merging phase: go backwards through original following the change pattern:
      Start with 9 in either
      Go to 4, because of change in One
      Go to 10 because of change in Two
      Go to 2 because of change in Two
      Go to 8 because of change in One
      Go to 7 because no change in original order
      Go to 5 because of change in Two
      Go to 1 because of change in One

Output in reverse order and get Merge!

While going through the original matches of the matching phase, one has to recognise the "changes" inside the matches. It's either that or splitting all matches into the non overlapping pieces that are the numbers. This second option has proven to be more complicated.

Matching is done al la xdelta, by splitting files into chunks, calculating hashes for each chunk. Then looking for equal hashes and expanding the match as far as possible. At the end one can add the strings in the "gaps" that have not been matched.


I hope the example makes it clear that the algorithm is relatively easy to understand (maybe more so than existing ones) and creates good merges or avoids unneccessary conflicts.

So if you got this far, and are intested in implementing go for it. I'd be happy to help.

Torsten






reply via email to

[Prev in Thread] Current Thread [Next in Thread]