Re: modernizing html output

From: Gavin Smith
Subject: Re: modernizing html output
Date: Wed, 2 Jan 2019 22:00:14 +0000
User-agent: Mutt/1.5.23 (2014-03-12)

On Wed, Jan 02, 2019 at 05:32:15PM +0000, Gavin Smith wrote:
> On Tue, Jan 01, 2019 at 05:46:11PM -0800, Per Bothner wrote:
> > On 1/1/19 1:49 PM, Gavin Smith wrote:
> > >Thanks for working on this.  What else needs to be changed so that the
> > >output is valid for HTML 5?
> > 
> > It's worth clarifying that all of the changes in my patch produce valid 
> > HTML 4:
> > Using 'id' attributes on arbitrary elements is part of HTML 4, and I believe
> > it has been supported by all non-toy browsers for 20 years.
> In that case it seems better to make the changes unconditionally.  I am 
> not keen on the idea of having an option to output to different versions 
> of HTML.

No sooner had I pushed the change from 'name' to 'id' to the repository 
than I came across this message:


> Using id instead of name leads to validation errors with the 4.01
> doctype:
> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
> "http://www.w3.org/TR/html4/loose.dtd">
> That's because in this dtd, id type is ID which is too restrictive,
> while in html5 it is much less restricted.

I had forgotten that that was the reason we never changed it before.

Nevertheless, this issue is never going to go away, so we may as well 
deal with it now.  (If I or somebody else comes back to this in the 
future, they will probably have forgotten the details again.)

First, is the ID type really too restrictive?


"The value that is assigned to an attribute of type ID must be a valid 
XML name."

What is a valid XML name?


> You can use any name you like for your elements as long as they adhere to 
> the following rules:
> Element names can contain any character (including letters and numbers)
> Element names must not contain spaces
> Element names must not begin with a number or punctuation character (for 
> example a comma or semi-colon etc)
> Element names must not start with the letters xml (whether lowercase, 
> uppercase, or mixed case)

I believe there is a problem here, as a Texinfo node name could easily 
begin with "XML".  Otherwise, it seems okay.

Looking at https://www.w3.org/TR/html4/types.html#type-id, it doesn't 
say anything about "XML":

> ID and NAME tokens must begin with a letter ([A-Za-z]) and may be 
> followed by any number of letters, digits ([0-9]), hyphens ("-"), 
> underscores ("_"), colons (":"), and periods (".").
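To make that rule concrete, here is a minimal sketch in Python of a check 
against it.  The function name is hypothetical and this is not code from 
texi2any; it only encodes the sentence quoted above.

```python
import re

# HTML 4 ID/NAME token rule as quoted above: one ASCII letter, followed by
# any number of letters, digits, hyphens, underscores, colons and periods.
HTML4_ID_RE = re.compile(r"[A-Za-z][A-Za-z0-9_:.-]*")


def is_valid_html4_id(token):
    """Return True if token satisfies the HTML 4 ID/NAME token rule."""
    # fullmatch requires the whole string to match, so an empty string or
    # a string containing a space is rejected.
    return HTML4_ID_RE.fullmatch(token) is not None
```

Under this check, a token starting with "XML" is accepted, while one 
starting with a digit, a hyphen or an underscore is rejected.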

In the Texinfo manual, node "HTML Xref Node Name Expansion":

> As mentioned in the previous section, the key part of the HTML cross
> reference algorithm is the conversion of node names in the Texinfo
> source into strings suitable for XHTML identifiers and filenames.  The
> restrictions are similar for each: plain ASCII letters, numbers, and the
> '-' and '_' characters are all that can be used.  (Although HTML anchors
> can contain most characters, XHTML is more restrictive.)

That may not account for all the anchors that texi2any outputs: looking 
at the test suite, there are "id" attributes beginning with "-".  
For example, in tp/t/results/converters_tests/at_commands_in_refs.pl 
there is an id "-_007b-_007d", which is invalid because it doesn't start 
with a letter.

In the Texinfo manual in the same place, it says:

"If the node name does not begin with a letter, the literal string
'g_t' is prefixed to the result."

So maybe this is a bug.  (The node name in question here is a strange case 
because it begins with a space character.)  If so, there may be no 
validation error at all, and texi2any just needs to be fixed not to 
output id names that begin with a hyphen.  Patrice, are you there?  
Do you remember why there were validation errors?
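For reference, the rule quoted from the manual could be sketched in Python 
as follows.  This is only an illustration of that one sentence, assuming 
"letter" means an ASCII letter; the function name is hypothetical and it 
is not the actual texi2any implementation.

```python
import re


def add_g_t_prefix(identifier):
    """Prefix 'g_t' when identifier does not begin with an ASCII letter."""
    # re.match anchors at the start of the string, so this tests only the
    # first character.
    if re.match(r"[A-Za-z]", identifier):
        return identifier
    return "g_t" + identifier


# The id from the test suite would become "g_t-_007b-_007d", which does
# start with a letter.
print(add_g_t_prefix("-_007b-_007d"))
```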

In the case that validation errors are unavoidable with the 4.01 doctype 
declaration, we'd have to change the doctype declaration, but to what?

<!DOCTYPE html>

is the HTML 5 doctype, but we are not going to output standards-conformant
HTML 5.  At the least, <tt>, which is obsolete in HTML 5, is still output 
in some cases.
