[help-texinfo] xml id characters
From: Karl Berry
Date: Sun, 31 Dec 2006 19:19:10 -0600
Hello Per,

Since you've made so many improvements to the makeinfo DocBook output, I
thought I'd ask you about this. A Texinfo user has 8-bit (accented)
characters in his node names, and they are being munged to dashes in the
DocBook output. For example, here is what's in the French XML file:
<sect1 label="" id="Pr-requis-pour-Debian">
The accented characters are replaced by "-". It should have been:
<sect1 label="" id="Prérequis-pour-Debian">
This is happening in the xml_id function in makeinfo/xml.c:
  { /* Check if a character is allowed in ID attributes. This list differs
       slightly from the XML specs in that it doesn't contain underscores.
       See http://xml.coverpages.org/sgmlsyn/sgmlsyn.htm, ``9.3 Name'' */
    if (!strchr
        ("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-.", *p))
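The excerpt above shows only the membership test. Below is a minimal self-contained sketch of its apparent effect, not the actual makeinfo code. It assumes that every rejected byte (spaces included) is replaced by '-', that the node name was "Prérequis pour Debian", and that it was stored in a single-byte encoding such as Latin-1, which fits the single dash in the output. The name `ascii_id` is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the ASCII-only check in xml_id():
   each byte of NAME that is not in the allowed list is written to
   OUT as '-'.  OUT must have room for strlen (NAME) + 1 bytes.  */
static void ascii_id (const char *name, char *out)
{
  const char *allowed =
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-.";
  const char *p;

  assert (name != NULL && out != NULL);
  for (p = name; *p; p++)
    /* A Latin-1 byte such as 0xE9 (e-acute) is never in ALLOWED.  */
    *out++ = strchr (allowed, *p) ? *p : '-';
  *out = '\0';
}
```

Called on the Latin-1 string "Pr\xe9" "requis pour Debian", this sketch produces Pr-requis-pour-Debian, matching the id shown above.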
The reference given doesn't seem to define LCNMCHAR, but OK, I can
believe it is just good old ASCII.
However, I had thought that XML, being based on Unicode, allowed more or
less anything in its IDs. See, e.g.:
http://www.w3.org/TR/2000/WD-xml-2e-20000814#sec-common-syn
http://www.w3.org/TR/2000/WD-xml-2e-20000814#NT-Letter
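For comparison, here is a rough sketch of a check based on the Letter production linked above, restricted to Latin-1 input. Everything here is an assumption for illustration and is not makeinfo code: the names `latin1_xml_letter` and `latin1_xml_id`, the Latin-1 input and UTF-8 output encodings, and keeping underscore excluded as the current list does. In the Latin-1 range, the XML 1.0 Letter production covers A-Z, a-z, and #xC0-#xD6, #xD8-#xF6, #xF8-#xFF.

```c
#include <assert.h>
#include <string.h>

/* Nonzero if the Latin-1 code point C is an XML 1.0 "Letter":
   ASCII letters plus #xC0-#xD6, #xD8-#xF6 and #xF8-#xFF.
   #xD7 and #xF7 (multiplication and division signs) are excluded.  */
static int latin1_xml_letter (unsigned int c)
{
  return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
         || (c >= 0xC0 && c <= 0xD6)
         || (c >= 0xD8 && c <= 0xF6)
         || (c >= 0xF8 && c <= 0xFF);
}

/* Hypothetical variant of xml_id(): keep XML letters, digits, '-'
   and '.'; replace everything else with '-'.  NAME is assumed to be
   Latin-1; OUT is written as UTF-8 and must have room for
   2 * strlen (NAME) + 1 bytes.  */
static void latin1_xml_id (const char *name, char *out)
{
  const unsigned char *p;

  assert (name != NULL && out != NULL);
  for (p = (const unsigned char *) name; *p; p++)
    {
      unsigned int c = *p;

      if (!(latin1_xml_letter (c) || (c >= '0' && c <= '9')
            || c == '-' || c == '.'))
        *out++ = '-';
      else if (c < 0x80)
        *out++ = (char) c;
      else
        {
          /* Two-byte UTF-8 encoding of a code point in 0x80-0xFF.  */
          *out++ = (char) (0xC0 | (c >> 6));
          *out++ = (char) (0x80 | (c & 0x3F));
        }
    }
  *out = '\0';
}
```

With this sketch, the Latin-1 name "Pr\xe9" "requis pour Debian" becomes the UTF-8 id Prérequis-pour-Debian. It does not check the first character, although XML 1.0 requires a Name to start with a Letter, '_' or ':'.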
Can you shed any light on this? Can we just allow anything except the
double-quote character in the DocBook/XML and Texinfo/XML id values?
Thanks,
Karl