guile-devel

sxml simple, sxml->xml and namespaces


From: tomas
Subject: sxml simple, sxml->xml and namespaces
Date: Wed, 8 Apr 2015 22:55:27 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

Gentle guile folks,

I'm playing around with (sxml simple) and stumbled upon something
I think might be a bug. Consider the following snippet:

  #!/usr/bin/guile -s
  !#
  (use-modules (sxml simple))
  
  ;; An XML document with two namespaces (one of them the default)
  (define the-svg "<svg xmlns='http://www.w3.org/2000/svg'
       xmlns:xlink='http://www.w3.org/1999/xlink'>
    <rect x='5' y='5' width='20' height='20'
          stroke-width='2' stroke='purple' fill='yellow'
          id='rect1' />
    <rect x='30' y='5' width='20' height='20'
          ry='5' rx='8' stroke-width='2' stroke='purple' fill='blue'
          xlink:href='#rect1' />
  </svg>")
  
  ;; Note how SXML handles QNames (just concatenating NS and
  ;; local-name with a colon):
  (define the-sxml
    (with-input-from-string the-svg xml->sxml))
  (format #t "~A\n" the-sxml)
  
  ;; If we try to serialize this: kaboom!
  (sxml->xml the-sxml)
  
The parsing into SXML goes well, the (format ...) outputs what
I'd expect. But the (sxml->xml ...) dies with:

  ERROR: In procedure scm-error:
  ERROR: Invalid QName: more than one colon http://www.w3.org/2000/svg:svg

I had a look at sxml simple and think the problem is that the
function check-name (which is the one throwing the error) expects
the name to be a QName (i.e. either a Name or a namespace abbreviation
plus a colon plus a Name).

But SXML tacks the whole namespace URI onto each name (i.e. the whole
"http://www.w3.org/1999/xlink", for example -- not the "xlink").

When serializing to XML, we should go the other way: find abbreviations
for the namespaces used, prefix the names with those abbreviations
and issue namespace declarations for those abbreviations (those funny
xmlns:foo attributes).
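
As a rough illustration of the first step on that way back, the sketch
below splits an SXML name at its last colon into a namespace and a local
name. The procedure name split-expanded-name is made up for this example
and is not part of (sxml simple). The sketch assumes that everything
before the last colon is a namespace URI, which does not hold for names
that already carry a short prefix.

```scheme
;; Hypothetical helper, not part of (sxml simple).
;; Split an SXML element or attribute name (a symbol) at its last colon.
;; Returns two values: the namespace string (or #f if the name has no
;; colon) and the local name string.
(define (split-expanded-name sym)
  (let* ((str (symbol->string sym))
         (idx (string-rindex str #\:)))
    (if idx
        (values (substring str 0 idx)
                (substring str (+ idx 1)))
        (values #f str))))

;; Usage: collect both values in a list.
(call-with-values
    (lambda ()
      (split-expanded-name
       (string->symbol "http://www.w3.org/2000/svg:svg")))
  list)
;; => ("http://www.w3.org/2000/svg" "svg")
```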

I've tried my hand at a patch which "works for me". Basically, what it
does is to thread an extra parameter "nsmap", representing a mapping
(namespace -> ns-abbreviation) valid at "this" position and below in
the tree. When new, unseen namespaces come up, new abbreviations are
"invented" (ns-abbrev-new), collected and the corresponding declarations
printed. When recursing to sub-elements, the new mappings are added to
the nsmap passed down.
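
The idea of threading such a mapping can be sketched as follows. This is
not the code from the attached patch: the names lookup-or-invent and
invent-prefix are hypothetical, and the nsmap is assumed to be a plain
association list of (namespace . abbreviation) pairs. The lookup returns
two values, the abbreviation and the possibly extended nsmap, so one
assoc call per name is enough and the caller can tell that a mapping is
new by comparing the returned nsmap with the one it passed in.

```scheme
(use-modules (ice-9 receive))

;; Hypothetical helper: invent an abbreviation such as "ns1", "ns2", ...
;; from the number of mappings already present in NSMAP.
(define (invent-prefix nsmap)
  (string-append "ns" (number->string (+ 1 (length nsmap)))))

;; Hypothetical helper: look up NS in NSMAP, an alist of
;; (namespace . abbreviation) pairs.  Returns two values: the
;; abbreviation to use and the nsmap valid from here on.  When NS was
;; already known, the very same nsmap object is returned.
(define (lookup-or-invent ns nsmap)
  (let ((hit (assoc ns nsmap)))
    (if hit
        (values (cdr hit) nsmap)
        (let ((prefix (invent-prefix nsmap)))
          (values prefix (cons (cons ns prefix) nsmap))))))

;; Usage: thread the nsmap through two lookups.
(receive (svg-prefix nsmap1)
    (lookup-or-invent "http://www.w3.org/2000/svg" '())
  (receive (xlink-prefix nsmap2)
      (lookup-or-invent "http://www.w3.org/1999/xlink" nsmap1)
    (list svg-prefix xlink-prefix (eq? nsmap1 nsmap2))))
;; => ("ns1" "ns2" #f)
```

Because the nsmap is never mutated, each subtree simply receives the
list that was current at its parent, and mappings added for one element
do not leak into its siblings.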

The result after the patch for the above example (a bit embellished)
looks like this:

  <ns1:svg xmlns:ns1="http://www.w3.org/2000/svg">
    <ns1:rect y="5" x="5" width="20" stroke-width="2"
              stroke="purple" id="rect1" height="20" fill="yellow" />
    <ns1:rect ns2:href="#rect1" y="5" x="30" width="20" stroke-width="2"
              stroke="purple" ry="5" rx="8" height="20" fill="blue"
              xmlns:ns2="http://www.w3.org/1999/xlink" />
  </ns1:svg>
  
Pretty clumsy, but basically correct.

The attached patch is against "GNU Guile 2.0.5-deb+1-3". The relevant
code hasn't changed up to the current development version.

I'm not very happy with the patch as-is. Among other things,

 - I had a hard time doing what I wanted in a non-clumsy way.
   Especially, ns-abbr is a strange function and not very clear
   because it tries to do several things at once: replace the
   namespace by its abbreviation, signal a new mapping item
   whenever this abbreviation was new. But how to achieve this
   elegantly without doing several look-ups?

 - The namespace declarations are tacked at the end of the attribute
   list. This is plain opportunism: the tag may carry a namespace,
   and each of the attribute names too. Thus, it's very handy to
   collect all the unseen mappings (new-namespaces in element->xml)
   and output them at the end of the attribute list.

   But in XML it is usual to put the namespace declarations before
   the attributes (the "canonical" XML order even prescribes that).

 - The sxml code is pretty careful not to munge around too much
   with strings, but to output things ASAP to the port. I think
   I could be a bit more careful in that department.

 - In other XML libraries the user gets a choice on preferred
   namespace mappings (e.g. I'd like http://www.w3.org/2000/svg
   to be the default namespace -- or http://www.w3.org/1999/xlink
   to be abbreviated as 'xlink'). This could be achieved by
   passing a function as an optional parameter which gets a try
   at a new namespace before ns-abbr-new gets at it.
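
The last point, a user-supplied function that gets the first try at a
new namespace, might look roughly like the sketch below. All names here
(choose-prefix, fresh-prefix, prefix-in-use?, svg-preferences) are
hypothetical, the nsmap is again assumed to be an alist of
(namespace . abbreviation) pairs, and the sketch leaves out the default
namespace, which would need separate handling because unprefixed
attributes do not pick it up.

```scheme
;; Hypothetical helper: is PREFIX already used as an abbreviation in
;; NSMAP, an alist of (namespace . abbreviation) pairs?
(define (prefix-in-use? prefix nsmap)
  (let loop ((rest nsmap))
    (cond ((null? rest) #f)
          ((string=? (cdar rest) prefix) #t)
          (else (loop (cdr rest))))))

;; Hypothetical helper: first generated abbreviation "ns1", "ns2", ...
;; that is not yet used in NSMAP.
(define (fresh-prefix nsmap)
  (let loop ((n 1))
    (let ((candidate (string-append "ns" (number->string n))))
      (if (prefix-in-use? candidate nsmap)
          (loop (+ n 1))
          candidate))))

;; Choose an abbreviation for the unseen namespace NS.  PREFERRED takes
;; a namespace and returns an abbreviation, or #f to decline; it is
;; asked first.  A preferred abbreviation that is already in use is
;; ignored in favour of a generated one.
(define* (choose-prefix ns nsmap #:optional (preferred (lambda (ns) #f)))
  (let ((wanted (preferred ns)))
    (if (and wanted (not (prefix-in-use? wanted nsmap)))
        wanted
        (fresh-prefix nsmap))))

;; Example preference function for the SVG document above.
(define (svg-preferences ns)
  (cond ((string=? ns "http://www.w3.org/1999/xlink") "xlink")
        ((string=? ns "http://www.w3.org/2000/svg") "svg")
        (else #f)))

(choose-prefix "http://www.w3.org/1999/xlink" '() svg-preferences)
;; => "xlink"
(choose-prefix "http://example.org/other" '() svg-preferences)
;; => "ns1"
```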

I'd be happy to prepare a patch against whatever version makes
sense once we get some consensus on how to do it right.

Thanks & regards
-- tomás

Attachment: simple.diff
Description: Text Data

Attachment: signature.asc
Description: Digital signature

