
[Gnu-arch-users] Re: File-type plug-in architecture for Arch?


From: michael josenhans
Subject: [Gnu-arch-users] Re: File-type plug-in architecture for Arch?
Date: Sun, 21 Dec 2003 13:44:39 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5) Gecko/20031016

Tom Lord wrote:

> As a practical matter, the foundational problem to solve first is the
> "diff/patch" problem.

True. This is likely the best way to go forward.

There is, however, one minor difference: OO files are zipped directories
(with subdirectories) containing XML files.

I think it should be Arch's job to deal with the directory and the files
inside it (adding files, deleting files, subdirectory handling, ...). I
would like the xml-diff/patch tools to focus solely on diffing the
XML files.

If an OO file is seen not as a single file but as a compressed
directory, then changes to the OO file become patchsets against the XML
files in that compressed directory.

E.g. adding a JPEG image to the OO file would be a patchset that adds
the jpg file to the directory and changes 2 XML nodes in 2 different
files.

Changing the style would mean adding or changing 2 XML nodes in the file
'content.xml', and adding a word would mainly be a change to one
particular node in the file 'content.xml'.
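
The "compressed directory" view can be sketched in a few lines of
Python. This is only an illustration, assuming the OO file is an
ordinary zip archive; the function names (member_digests,
compare_archives, make_archive) are made up for this example and are
not part of arch or of any of the tools discussed here. It reports
which files inside the archive were added, deleted or modified, which
is the part I would leave to Arch:

```python
import hashlib
import io
import zipfile


def member_digests(archive):
    """Map each file inside a zip archive to the SHA-1 of its content."""
    with zipfile.ZipFile(archive) as zf:
        return {
            name: hashlib.sha1(zf.read(name)).hexdigest()
            for name in zf.namelist()
            if not name.endswith("/")  # skip pure directory entries
        }


def compare_archives(old, new):
    """Return (added, deleted, modified) member names between two archives."""
    a, b = member_digests(old), member_digests(new)
    added = sorted(set(b) - set(a))
    deleted = sorted(set(a) - set(b))
    modified = sorted(n for n in set(a) & set(b) if a[n] != b[n])
    return added, deleted, modified


def make_archive(files):
    """Build an in-memory zip from a {name: bytes} dict (demo helper only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    buf.seek(0)
    return buf
```

compare_archives() accepts file names as well as file-like objects, so
it could be pointed at two .sxw files directly. Only the members
reported as modified would then need to go through an XML diff tool.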

> Can you make a diff-like and patch-like pair of tools

I have searched the internet and looked at various XML diff and patch
tools. The following tools seem particularly interesting:

XML diff-Tools
--------------

1) http://diffxml.sourceforge.net/
- Java based
- diffxml and patchxml tools; the diffs allow patches to be reversed
- seems to work, however

2) http://apps.gotdotnet.com/xmltools/xmldiff/
- Microsoft tool
- Try the web interface to compare 2 XML files; it gives a good
impression of how XML diff works

3) http://www.cs.wisc.edu/~yuanwang/xdiff.html
- C++ and Java implementation
- Uses hashes for the XML nodes. XML files are compared by
determining the minimum distance between the node hashes

4) http://www.logilab.org/projects/xmldiff/project_view
- Python based tool
- Creates output in XUpdate format for use by XML databases
(www.xmldb.org)


XML-diff approaches
-------------------

In the documentation I found 2 different approaches to comparing
XML files. XML files are seen as trees of XML nodes. (Very much as
arch sees directories as trees of file inodes.)

a) Compare trees by deleting identical parts. The rest is the diff.
b) Apply hashes to the nodes and their content. Determine the minimum
distance between the hashes.
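
To make approach b) a bit more concrete, here is a rough Python sketch
of the hashing half only. It is not the algorithm of any of the tools
listed above, and the names subtree_hash and changed_subtrees are
invented for this example. Each node is hashed from its tag,
attributes, text and the hashes of its children, so two subtrees with
the same hash can be treated as identical and skipped. The
minimum-distance matching of the remaining nodes is not attempted
here, and text following a child element (the "tail") is ignored to
keep the sketch short:

```python
import hashlib
import xml.etree.ElementTree as ET


def subtree_hash(node, table):
    """Hash a node from its tag, attributes, text and child hashes.

    Every computed hash is recorded in `table` (hash -> node).
    """
    h = hashlib.sha1()
    h.update(node.tag.encode("utf-8"))
    for key, value in sorted(node.attrib.items()):
        h.update(f"|{key}={value}".encode("utf-8"))
    h.update(b"|" + (node.text or "").strip().encode("utf-8"))
    for child in node:
        h.update(subtree_hash(child, table).encode("ascii"))
    digest = h.hexdigest()
    table[digest] = node
    return digest


def changed_subtrees(old_xml, new_xml):
    """Return the tags of nodes in new_xml whose hash is absent from old_xml."""
    old_table, new_table = {}, {}
    subtree_hash(ET.fromstring(old_xml), old_table)
    subtree_hash(ET.fromstring(new_xml), new_table)
    return sorted(
        node.tag for digest, node in new_table.items() if digest not in old_table
    )
```

Note that a change in a leaf also changes the hash of every ancestor,
so the changed leaf and all nodes above it are reported. A real tool
would use the unchanged hashes to prune the search and then match the
remaining nodes.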

I also see the following option, which I have not come across so far:

c) Apply tags to the XML nodes. When a node is moved or edited, the tag
is preserved. In this case a tool like OO would need to generate tags
for all nodes and preserve them when the file is changed.
(Very much like Arch's file tags today.)
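
A small Python sketch of how option c) could work, under the
assumption that every node carries a persistent tag in an attribute.
The attribute name "id" and the function names index_by_tag and
classify are hypothetical; OO does not write such tags today. With
stable tags, moved and edited nodes can be told apart without any
distance computation:

```python
import xml.etree.ElementTree as ET


def index_by_tag(xml_text, attr="id"):
    """Map each node's persistent tag to (tag of its parent, its text)."""
    root = ET.fromstring(xml_text)
    index = {}
    for parent in root.iter():
        for child in parent:
            if attr in child.attrib:
                text = (child.text or "").strip()
                index[child.get(attr)] = (parent.get(attr), text)
    return index


def classify(old_xml, new_xml):
    """Sort tagged nodes into added, deleted, moved and edited."""
    old, new = index_by_tag(old_xml), index_by_tag(new_xml)
    result = {
        "added": sorted(set(new) - set(old)),
        "deleted": sorted(set(old) - set(new)),
        "moved": [],
        "edited": [],
    }
    for key in sorted(set(old) & set(new)):
        if old[key][0] != new[key][0]:  # parent changed
            result["moved"].append(key)
        if old[key][1] != new[key][1]:  # text changed
            result["edited"].append(key)
    return result
```

The sketch only notices moves between parents, not reordering within
the same parent, and it does not index the root node itself.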

I did not find any indication that any of the patch tools take advantage
of the DTD, which describes how nodes are allowed to be nested.

XML-patches
-----------

XML patches seem to consist mainly of a description of:
- add, move, delete or modify operations on nodes of the XML tree
  (similar to the inode operations add, move, delete or modify that we
  know from arch)
- identification of the path of the XML node within the XML tree
  (similar to how arch uses the file path to identify the file's
  location within the directory)

Some further observations:
- All patches use a similar format, closely related to the XML DOM
  parser interface or to XPath.
- Automated patching works well against the original files. Thus it
  could e.g. be used for OO change tracking.
- Branching might be difficult, due to the lack of a usable visual
  presentation of the tree differences. Automated branching would likely
  work best if the nodes were tagged.
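
The following Python sketch illustrates the general shape of such a
patch: a list of operations, each addressed by a path into the tree.
The tuple format and the function name apply_patch are invented for
this example and do not correspond to the output format of diffxml,
XUpdate or any other tool mentioned above. The paths use the limited
XPath subset that Python's xml.etree.ElementTree understands:

```python
import xml.etree.ElementTree as ET


def apply_patch(root, operations):
    """Apply add/modify/move/delete operations addressed by path.

    Operation tuples (hypothetical format):
      ("modify", path, new_text)
      ("add", parent_path, tag, text)
      ("move", path, new_parent_path)
      ("delete", path)
    """
    for op in operations:
        kind, path = op[0], op[1]
        target = root.find(path)
        if target is None:
            raise ValueError(f"path not found: {path}")
        # ElementTree keeps no parent pointers, so rebuild the map per operation.
        parents = {child: parent for parent in root.iter() for child in parent}
        if kind == "modify":
            target.text = op[2]
        elif kind == "add":
            ET.SubElement(target, op[2]).text = op[3]
        elif kind == "move":
            destination = root.find(op[2])
            if destination is None:
                raise ValueError(f"path not found: {op[2]}")
            parents[target].remove(target)
            destination.append(target)
        elif kind == "delete":
            parents[target].remove(target)
        else:
            raise ValueError(f"unknown operation: {kind}")
    return root
```

Because positional paths such as "./a/p[2]" shift whenever an earlier
operation adds or removes a sibling, a patch of this kind only applies
cleanly to the original file, which matches the observation above.
The sketch cannot move or delete the root node.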

> (and, ideally, a diff3-like tool) that are specialized for OO files?

I have not found any diff3 tools so far. Without an appropriate visual
three-dimensional representation I am not sure that a generic diff3
would be very useful.


> That interact very well with OO editors and so forth?

A diff3 tool then really belongs to OO and its experts. For now I
would consider it up to them, as they control the tool and the
OO format.

To begin with I would recommend starting with xml-patch and keeping
format-specific branching aside. Once we understand the implications of
xml-patch better and have studied several patch tools, it will become
easier to find a solution.

> Do that part first --- the arch part, after that, is pretty easy.

So far I have managed to do the following:


- Created an OO Writer file 'test1.sxw'
- Changed some text, underlined a word, made a word bold and saved it as
'test2.sxw'

- Opened the files 'test1.sxw' and 'test2.sxw' with the tool 'ark' and
got the following directory contents:

content.xml
META-INF/manifest.xml
meta.xml
office.dtd
settings.xml
styles.xml

- Used tool 1), diffxml, to generate the XML patch and then applied it
with patchxml:

  diffxml test1/content.xml test2/content.xml > test1_2_xmlpatch

  patchxml test1/content.xml test1_2_xmlpatch > content.xml

- Used 'ark' to copy the patched content.xml back into test1.sxw
- Opened test1.sxw with OO and it looked like test2.sxw

I think this demonstrates feasibility for a start.
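
The manual 'ark' step of copying the patched content.xml back into the
archive could also be scripted. A minimal Python sketch, assuming the
.sxw file is a plain zip archive; replace_member is a made-up helper
name. Python's zipfile module cannot overwrite a member in place, so
the sketch writes a new archive and copies all other members across
unchanged:

```python
import io
import zipfile


def replace_member(src, dst, member, data):
    """Copy zip archive `src` to `dst`, substituting `data` for `member`."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for info in zin.infolist():
            if info.filename == member:
                payload = data  # the patched file
            else:
                payload = zin.read(info.filename)  # copied unchanged
            # Passing the ZipInfo keeps each member's name, date and
            # compression method from the source archive.
            zout.writestr(info, payload)
```

With the file names from the steps above this might be called as
replace_member("test1.sxw", "patched.sxw", "content.xml", new_bytes),
where "patched.sxw" and new_bytes (the content of the patched
content.xml) are placeholders.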

Michael





