gnu-arch-users

[Gnu-arch-users] Re: File-type plug-in architecture for Arch?


From: michael josenhans
Subject: [Gnu-arch-users] Re: File-type plug-in architecture for Arch?
Date: Sun, 21 Dec 2003 16:07:47 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5) Gecko/20031016

Tom Lord wrote:

In general these sorts of things aren't really an arch issue.  The
thread should really be called "file-type plug-in architecture for
_diff_and_patch_?".

Interesting point. Should diff be able to open zip files and call arch recursively?

1) Diff is applied to two OO files, 'test1.sxw' and 'test2.sxw'.
   Diff needs to be aware that it must open these files as ZIP files.
2) Diff requests Arch to diff the contents of the directories.
3) Arch determines which files are new, have been deleted, have been moved, or have been modified.
4) Arch applies diff to all modified files (for XML files, an XML diff might be used), and diff creates single-file patches.
5) Arch collects the single-file patches into patch sets.
6) Arch returns the patch sets to the calling diff.
7) Diff returns the Arch patch set.
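
The steps above can be sketched roughly as follows. This is only an illustration in Python, not anything arch or diff does today: the helper names (`make_zip`, `read_members`, `diff_containers`) and the member names are made up, members are assumed to be UTF-8 text, and move detection (which arch would do via file-ids) is left out.

```python
import difflib
import io
import zipfile


def make_zip(members):
    """Build an in-memory ZIP archive from a {name: text} mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, text in members.items():
            zf.writestr(name, text)
    return buf.getvalue()


def read_members(data):
    """Return {name: text} for every member of a ZIP archive given as bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return {name: zf.read(name).decode("utf-8") for name in zf.namelist()}


def diff_containers(old_data, new_data):
    """Compare two ZIP containers member by member."""
    old, new = read_members(old_data), read_members(new_data)
    patchset = {
        "added": sorted(set(new) - set(old)),
        "deleted": sorted(set(old) - set(new)),
        "modified": {},
    }
    for name in sorted(set(old) & set(new)):
        if old[name] != new[name]:
            # One single-file patch per modified member.
            patchset["modified"][name] = list(difflib.unified_diff(
                old[name].splitlines(), new[name].splitlines(),
                fromfile=name, tofile=name, lineterm=""))
    return patchset


test1 = make_zip({"content.xml": "<doc>\n<p>old</p>\n</doc>\n",
                  "meta.xml": "<meta/>\n"})
test2 = make_zip({"content.xml": "<doc>\n<p>new</p>\n</doc>\n",
                  "styles.xml": "<styles/>\n"})
patchset = diff_containers(test1, test2)
print(patchset["added"], patchset["deleted"], sorted(patchset["modified"]))
```

Here the outer "diff" only has to know that the file is a ZIP container; everything inside is handled by an ordinary tree comparison plus per-file diffs.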


Let me answer this for the xml-diff and xml-patch tools:

The problem breaks down into four parts:

a) Can you do exact diffing and patching?

   Given files A and B can you write:

        % xdiff [options] A B > B.diffs

This is the case for the XML diff tools, e.g. diffxml and patchxml. At least one tool I have seen so far does not support XML namespaces. However, this is just an implementation limitation.

   such that B.diffs is at least smaller than B, ideally a useful
   (browsable) description of the changes, and critically such that:

        % xpatch -o B_2 B.diffs A
        % cmp B B_2 && echo yes
        yes
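
The exactness requirement can be illustrated with a small line-based sketch in Python. It is not the format of any real xdiff/xpatch: `make_patch` and `apply_patch` are hypothetical helpers, and the "patch" is simply the list of regions of A that differ, taken from `difflib.SequenceMatcher`.

```python
import difflib


def make_patch(a_lines, b_lines):
    """Record only the regions of A that must change to obtain B."""
    matcher = difflib.SequenceMatcher(a=a_lines, b=b_lines, autojunk=False)
    return [(i1, i2, b_lines[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


def apply_patch(a_lines, patch):
    """Apply the hunks back to front so earlier indices stay valid."""
    result = list(a_lines)
    for i1, i2, replacement in reversed(patch):
        result[i1:i2] = replacement
    return result


A = ["one\n", "two\n", "three\n", "four\n"]
B = ["one\n", "2\n", "three\n", "four\n", "five\n"]

b_diffs = make_patch(A, B)
B_2 = apply_patch(A, b_diffs)
print("yes" if B_2 == B else "no")
```

The final comparison plays the role of `cmp B B_2`: the patch stores less than all of B, yet applying it to A reproduces B exactly.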

I am not sure that this is always needed.

In XML, the following forms are declared equivalent:

a) <nodename attribute='5656'></nodename>
b) <nodename attribute='5656'/>

Whitespace outside the nodes is irrelevant. Thus, according to the standard, an XML file may look different after being read and saved, even if its content has not changed.
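
As a small illustration of this point, Python's standard library (3.8 or later) can canonicalize both forms before comparing them. The `<root>` wrapper documents are made up for the example, and treating whitespace-only text as insignificant (`strip_text=True`) is an assumption that suits data-oriented XML but not necessarily mixed content.

```python
import xml.etree.ElementTree as ET

a = "<nodename attribute='5656'></nodename>"
b = "<nodename attribute='5656'/>"

# Byte for byte, the two serializations differ ...
same_bytes = (a == b)
# ... but their canonical (C14N) forms are identical.
same_canonical = (ET.canonicalize(a) == ET.canonicalize(b))

# Whitespace between elements can be ignored as well, if the
# application considers it insignificant.
c = "<root>\n  <nodename attribute='5656'/>\n</root>"
d = "<root><nodename attribute='5656'></nodename></root>"
same_ignoring_space = (ET.canonicalize(c, strip_text=True)
                       == ET.canonicalize(d, strip_text=True))

print(same_bytes, same_canonical, same_ignoring_space)
```

An XML diff tool could therefore aim for "canonically equal" output rather than the byte-identical result that `cmp` checks.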

b) Can you do inexact patching?

   Suppose that A is modified to produce A_changed.

   Will:

        % xpatch -o B_changed A_changed B.diffs

   produce useful output?  If the merge can't be fully automated, will
   it at least produce useful output?

   (How are we doing, for example, on merge tools for word-processor
   documents?  Are there some around that reliably produce a valid
   output document using formatting and mark-up to present the merge
   conflicts to users in an easy-to-resolve format?)

   Extra credit if your xpatch can do something reasonable with a
   `--forward' option.

The tool (http://www.cs.wisc.edu/~yuanwang/xdiff.html) claims to achieve this by using hashes on XML nodes.

Alternatively, if we had the file format under our control, we could tag the XML nodes.
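
To give an idea of how hashes on XML nodes can help, here is a rough sketch in Python. It is not the algorithm of the tool linked above, only an illustration: `subtree_hash` and `unchanged_subtrees` are hypothetical helpers, and text following a closing tag is ignored for brevity.

```python
import hashlib
import xml.etree.ElementTree as ET


def subtree_hash(elem, table):
    """Hash an element from its tag, attributes, text and child hashes."""
    h = hashlib.sha256()
    h.update(elem.tag.encode())
    for key, value in sorted(elem.attrib.items()):
        h.update(f"\0{key}={value}".encode())
    h.update(b"\0" + (elem.text or "").strip().encode())
    for child in elem:
        h.update(subtree_hash(child, table).encode())
    digest = h.hexdigest()
    table.setdefault(digest, []).append(elem)
    return digest


def unchanged_subtrees(old_xml, new_xml):
    """Return the tags of subtrees in new_xml that also occur in old_xml."""
    old_table, new_table = {}, {}
    subtree_hash(ET.fromstring(old_xml), old_table)
    subtree_hash(ET.fromstring(new_xml), new_table)
    return sorted(elem.tag for digest, elems in new_table.items()
                  if digest in old_table for elem in elems)


old = "<doc><title>Plan</title><p>first</p><p>second</p></doc>"
new = "<doc><title>Plan</title><p>second</p><p>third</p></doc>"
matches = unchanged_subtrees(old, new)
print(matches)
```

Subtrees with equal hashes can be matched even when they have moved, which gives a patch tool anchors to work from when the target document has drifted.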

c) Can you do diff3-style merging?

   Will:

        % xdiff3 -o merged MINE OLDER HIS

   produce properly merged output, perhaps with useful conflict
   markers?

I have not seen such a tool yet, but it might be possible.

d) How should "file type" be represented?

   Arch _might_ want to help with that -- but I'm not so sure
   it really should.    It might be better to make the "standard"
   for recording file type entirely separate from Arch so that
   xdiff, xpatch, and xdiff3 will work well "stand-alone" and when
   invoked even from outside of arch trees.

   I understand that it's tempting to say "Well, arch already
   maintains a little database of `file properties' so file-type might
   as well go in that database."  Except that that wouldn't be true:
   arch maintains no such database -- only file-ids.  And anyway, that
   would make xdiff and friends and arch mutually dependent tools
   where currently there is just a one-way dependency of arch on diff.

   Ideally, xdiff, xpatch, and xdiff3 will work correctly (like
   diff, patch, and diff3) on regular text files and tla can
   then be configured to use the x* programs rather than ordinary
   GNU diff/patch/diff3.

Maybe we could have a superdiff that picks the diff tool according to the file type. Superpatch would then use the patch tool matching the diff tool used during creation.

However, I am not sure we can set general rules for when to use which diff.
There are several XML diff tools out there. At the moment, each likely has its advantages and disadvantages, and they are currently not as stable as diff3.
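
A minimal sketch of the superdiff/superpatch idea, in Python. Everything here is hypothetical: the registry, the tool names, and the patch layout are assumptions for illustration, and `xml_diff` is only a placeholder where a real XML-aware differ would be plugged in.

```python
import difflib
import os


def text_diff(a, b):
    """Line-based diff for ordinary text files."""
    return list(difflib.unified_diff(a.splitlines(), b.splitlines(),
                                     lineterm=""))


def xml_diff(a, b):
    """Placeholder: a real XML-aware differ would be plugged in here."""
    return text_diff(a, b)


# Hypothetical registry: file suffix -> (tool name, diff function).
DIFFERS = {".xml": ("xml-diff", xml_diff)}
DEFAULT = ("text-diff", text_diff)


def superdiff(path, a, b):
    """Pick a differ by file type and record its name in the patch."""
    suffix = os.path.splitext(path)[1].lower()
    tool, differ = DIFFERS.get(suffix, DEFAULT)
    return {"tool": tool, "path": path, "hunks": differ(a, b)}


def pick_patcher(patch, patchers):
    """superpatch side: choose the patcher matching the recorded tool."""
    return patchers[patch["tool"]]


patch = superdiff("content.xml", "<a/>\n", "<b/>\n")
print(patch["tool"])
```

Recording the tool name in the patch means superpatch never has to guess: it looks up the patcher that corresponds to whatever produced the diff.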

Note that for some file types (e.g., images) fancy support for
"inexact merging" is unlikely anytime soon.  What should you do?

JPEG, at least, is a container with several parts inside, e.g. the creation date and a thumbnail preview. All of those parts are likely small compared to the image itself.

Being able to open the container might allow diff solutions similar to those for OO files.

One idea is that the diffs for such files should include a checksum of
the ORIG file ("A" in the examples above), apply themselves exactly to
copies of that file, and otherwise invoke a configurable sub-program
to just extract a copy of the MOD file ("B", in the examples above)
so the tool would leave behind a conflict consisting of two files:
A.orig and A.mod, leaving it to the user to merge them by hand.  In
the context of arch, that configurable sub-program can be arch itself
(roughly `tla file-find').
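
The checksum idea could look roughly like this in Python. This is a simplified sketch under stated assumptions: the patch layout and the helper names are invented, and the MOD file is embedded in the patch itself instead of being fetched by a configurable sub-program such as arch.

```python
import hashlib
import os
import tempfile


def make_binary_patch(orig_bytes, mod_bytes):
    """A 'diff' for an opaque file: checksum of ORIG plus the whole MOD."""
    return {"orig_sha256": hashlib.sha256(orig_bytes).hexdigest(),
            "mod": mod_bytes}


def apply_binary_patch(path, patch):
    """Apply exactly if the target matches ORIG, else leave a conflict."""
    with open(path, "rb") as f:
        current = f.read()
    if hashlib.sha256(current).hexdigest() == patch["orig_sha256"]:
        with open(path, "wb") as f:
            f.write(patch["mod"])
        return "applied"
    # Target has diverged: keep both versions for a manual merge.
    with open(path + ".orig", "wb") as f:
        f.write(current)
    with open(path + ".mod", "wb") as f:
        f.write(patch["mod"])
    return "conflict"


workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "A")
patch = make_binary_patch(b"image v1", b"image v2")

with open(target, "wb") as f:
    f.write(b"image v1")
first = apply_binary_patch(target, patch)

with open(target, "wb") as f:
    f.write(b"locally edited image")
second = apply_binary_patch(target, patch)

print(first, second)
```

The first call finds an exact copy of ORIG and replaces it; the second finds a diverged file and leaves A.orig and A.mod behind for the user.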

Archives created using xdiff (and containing whatever special file
types you want to handle) will be readable only by other people who
have configured arch to use the corresponding xpatch.    So if there
were very good progress on the x* programs, one way arch could help is
to endorse them -- to say "use xdiff" rather than "use GNU diff".

The important thing is that with the _possible_ exception of
mechanisms for recording "file type" information, tla doesn't need to
be changed at all to handle file types needing a special diff/patch
algorithm.   If you want these kinds of features, you need to hack
diff, patch, and diff3 -- not arch.

This makes sense to me.

Would modifications to arch be needed to enable working with compressed files? Does diff in this case need to call Arch recursively?

Michael





