help-texinfo

[help-texinfo] xml id characters


From: Karl Berry
Subject: [help-texinfo] xml id characters
Date: Sun, 31 Dec 2006 19:19:10 -0600

Hello Per,

Since you've made so many improvements to the makeinfo DocBook output, I
thought I'd ask you about this.  A Texinfo user has 8-bit characters in
his node names.  They are being munged to dashes in the DocBook output.
For example:

    Here is what's in the French XML file:
    <sect1 label="" id="Pr-requis-pour-Debian">
    The accented characters are replaced by "-". It should have been:
    <sect1 label="" id="Prérequis-pour-Debian">

This is happening in the xml_id function in makeinfo/xml.c:

    { /* Check if a character is allowed in ID attributes.  This list differs
         slightly from the XML specs in that it doesn't contain underscores.
         See http://xml.coverpages.org/sgmlsyn/sgmlsyn.htm, ``9.3 Name''  */
      if (!strchr ("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-.",
                   *p))
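
A minimal C sketch of what the quoted check does to a whole node name. The function name ascii_id and its return-a-malloc'd-copy interface are hypothetical, not the real makeinfo xml_id; the sketch only reproduces the whitelist test, replacing every byte outside [A-Za-z0-9.-] with '-'. It assumes the user's node name was "Prérequis pour Debian" in Latin-1, where é is the single byte 0xE9.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Characters the quoted whitelist accepts in an ID: ASCII letters,
   digits, '-' and '.', but no '_'.  */
static const char id_chars[] =
  "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-.";

/* Hypothetical stand-in for the filtering step (not the real makeinfo
   xml_id): return a malloc'd copy of NAME in which every byte outside
   id_chars is replaced by '-'.  It works byte by byte, so it knows
   nothing about Latin-1 or UTF-8.  The caller frees the result.  */
char *
ascii_id (const char *name)
{
  assert (name != NULL);
  size_t len = strlen (name);
  char *id = malloc (len + 1);
  if (id == NULL)
    return NULL;

  for (size_t i = 0; i < len; i++)
    /* name[i] is never NUL inside this loop, so strchr cannot falsely
       match the terminating NUL of id_chars.  */
    id[i] = strchr (id_chars, name[i]) != NULL ? name[i] : '-';

  id[len] = '\0';
  return id;
}
```

Called on the Latin-1 string "Pr\xe9requis pour Debian" it returns "Pr-requis-pour-Debian", matching the report (the spaces become dashes too). Because the test is per byte, the same name encoded in UTF-8 ("Pr\xc3\xa9requis pour Debian") would come out as "Pr--requis-pour-Debian", one dash for each byte of the é.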

In the reference given, I don't see LCNMCHAR being defined, but ok, I
guess I can believe it is just good old ASCII.

However, I had thought that XML, being based on Unicode, allowed more or less
anything in its ids.  E.g.,
http://www.w3.org/TR/2000/WD-xml-2e-20000814#sec-common-syn
http://www.w3.org/TR/2000/WD-xml-2e-20000814#NT-Letter
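
For comparison, a hedged C sketch of a validator for the XML Name production, which is what an ID value must match. It uses the NameStartChar/NameChar ranges from XML 1.0 Fifth Edition (2008), which replaced the Letter/BaseChar tables of the drafts cited above and is somewhat more permissive than they are, so treat it as an approximation. The names is_xml_name and utf8_next are hypothetical, and the input is assumed to be UTF-8.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Decode one UTF-8 sequence at *s and advance *s past it.  Returns the
   code point, or -1 for a malformed, overlong or surrogate sequence.  */
static long
utf8_next (const unsigned char **s)
{
  static const long min_cp[] = { 0, 0x80, 0x800, 0x10000 };
  const unsigned char *p = *s;
  long cp;
  int extra;

  if (p[0] < 0x80)
    { cp = p[0]; extra = 0; }
  else if ((p[0] & 0xE0) == 0xC0)
    { cp = p[0] & 0x1F; extra = 1; }
  else if ((p[0] & 0xF0) == 0xE0)
    { cp = p[0] & 0x0F; extra = 2; }
  else if ((p[0] & 0xF8) == 0xF0)
    { cp = p[0] & 0x07; extra = 3; }
  else
    return -1;

  for (int i = 1; i <= extra; i++)
    {
      /* A continuation byte must be 10xxxxxx; this also stops at NUL.  */
      if ((p[i] & 0xC0) != 0x80)
        return -1;
      cp = (cp << 6) | (p[i] & 0x3F);
    }
  if (cp < min_cp[extra] || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
    return -1;

  *s = p + extra + 1;
  return cp;
}

/* XML 1.0 Fifth Edition, NameStartChar.  */
static const long start_ranges[][2] = {
  {':', ':'}, {'A', 'Z'}, {'_', '_'}, {'a', 'z'},
  {0xC0, 0xD6}, {0xD8, 0xF6}, {0xF8, 0x2FF}, {0x370, 0x37D},
  {0x37F, 0x1FFF}, {0x200C, 0x200D}, {0x2070, 0x218F}, {0x2C00, 0x2FEF},
  {0x3001, 0xD7FF}, {0xF900, 0xFDCF}, {0xFDF0, 0xFFFD}, {0x10000, 0xEFFFF}
};

/* Characters NameChar adds on top of NameStartChar.  */
static const long more_ranges[][2] = {
  {'-', '-'}, {'.', '.'}, {'0', '9'}, {0xB7, 0xB7},
  {0x300, 0x36F}, {0x203F, 0x2040}
};

static bool
in_ranges (long c, const long (*r)[2], size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (c >= r[i][0] && c <= r[i][1])
      return true;
  return false;
}

#define COUNT(a) (sizeof (a) / sizeof ((a)[0]))

static bool
is_name_start (long c)
{
  return in_ranges (c, start_ranges, COUNT (start_ranges));
}

static bool
is_name_char (long c)
{
  return is_name_start (c) || in_ranges (c, more_ranges, COUNT (more_ranges));
}

/* True if UTF8 is well-formed UTF-8 matching the XML Name production:
   one NameStartChar followed by any number of NameChars.  */
bool
is_xml_name (const char *utf8)
{
  const unsigned char *s = (const unsigned char *) utf8;
  assert (utf8 != NULL);
  if (*s == '\0')
    return false;                   /* a Name needs at least one character */
  for (bool first = true; *s != '\0'; first = false)
    {
      long c = utf8_next (&s);
      if (c < 0 || !(first ? is_name_start (c) : is_name_char (c)))
        return false;
    }
  return true;
}
```

Under these rules "Pr\xc3\xa9requis-pour-Debian" is a valid ID (é is U+00E9, inside [#xD8-#xF6]), but a name containing a space or a double quote, or starting with '-', is not, so allowing everything except the double quote would still yield some invalid IDs. The raw Latin-1 byte 0xE9 is also rejected, because on its own it is not valid UTF-8.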

Can you shed any light on this?  Can we just allow anything (except ")
in the DocBook/XML and Texinfo/XML id values?

Thanks,
Karl



