bug-texinfo

Re: Ability to disable URL mangling in makeinfo 4.7?


From: Jamie Lokier
Subject: Re: Ability to disable URL mangling in makeinfo 4.7?
Date: Fri, 17 Sep 2004 15:13:10 +0100
User-agent: Mutt/1.4.1i

Gerald Pfeifer wrote:
> >http://www.gnu.org/software/gawk/manual/html_node/Statements_002fLines.html#Statements/Lines
> >instead of
> >http://www.gnu.org/software/gawk/manual/html_node/Statements_002fLines.html#Statements_002fLines
> >
> >Would this help so much?
> 
> I guess that would be somewhat confusing since it looks like a
> subdirectory structure.

Some programs may count the fragment's "/" when resolving
relative URLs such as "..".

I know that is a problem with a few old browsers when "/" appears in the
query part of a URL.  I don't know if it's ever a problem when "/"
appears in a fragment identifier.
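
To make the concern concrete, here is a small Python sketch.  It
compares a standard resolver (urllib.parse.urljoin, which sets the
fragment aside before merging paths) with a deliberately naive one.
The function naive_join is hypothetical and only illustrates the kind
of bug described above; it is not taken from any real browser.

```python
from urllib.parse import urljoin

base = ("http://www.gnu.org/software/gawk/manual/html_node/"
        "Statements_002fLines.html#Statements/Lines")

# A conforming resolver ignores the fragment of the base URL.
resolved = urljoin(base, "../index.html")


def naive_join(base_url, relative):
    """Hypothetical buggy resolver: counts the fragment's "/" as a path separator."""
    parts = base_url.split("/")[:-1]  # drop everything after the last "/"
    for segment in relative.split("/"):
        if segment == "..":
            parts.pop()               # climbs out of the wrong "directory"
        else:
            parts.append(segment)
    return "/".join(parts)


print(resolved)
print(naive_join(base, "../index.html"))
```

The two results differ by one directory level, which is exactly the
confusion a "/" in the fragment could cause in a sloppy implementation.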

However, "/" isn't allowed in some kinds of fragment identifiers for
another reason: XML and HTML "id" syntax is restricted.  From HTML 4.01:

   ID and NAME tokens must begin with a letter ([A-Za-z]) and may be
   followed by any number of letters, digits ([0-9]), hyphens ("-"),
   underscores ("_"), colons (":"), and periods (".").

The "id" attribute, which is one way to name target anchors in HTML,
must be an ID token in correct HTML 4.
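
As a minimal sketch, the quoted rule translates directly into a
regular expression.  The names HTML4_ID_TOKEN and is_html4_id are
mine, chosen for illustration.

```python
import re

# HTML 4.01 ID/NAME token: one letter, then any number of letters,
# digits, hyphens, underscores, colons, and periods.
HTML4_ID_TOKEN = re.compile(r"[A-Za-z][A-Za-z0-9\-_:.]*")


def is_html4_id(value):
    """Return True if value is a valid HTML 4.01 ID token."""
    return HTML4_ID_TOKEN.fullmatch(value) is not None


print(is_html4_id("Statements_002fLines"))  # mangled form
print(is_html4_id("Statements/Lines"))      # "/" is not allowed
```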

The "name" attribute, which is another way to name target anchors in
HTML, has CDATA syntax ("name" is not a NAME token, in case the above
quote confused matters).  That syntax is much more flexible.  The
following ASCII characters are fine in that syntax, as are most
non-ASCII characters.

    [-_:.A-Za-z0-9;/?@&=+$,!~*'() <>#%"{}|\\^\[\]`]

Of those, the following printable ASCII characters must be %-escaped
when they're used in a fragment reference, as must all control
characters and non-ASCII characters:

    [ <>#%"{}|\\^\[\]`]
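
A rough sketch of that escaping step, using Python's
urllib.parse.quote.  The SAFE string lists the punctuation from the
first character class that is not in the second; letters, digits, and
"-_.~" are never escaped by quote.  Encoding non-ASCII characters as
UTF-8 before escaping is quote's default and an assumption on my part,
not something the HTML 4 rules above settle.

```python
from urllib.parse import quote

# Punctuation that may appear unescaped in a fragment reference.
SAFE = ";/?@&=+$,!*'():"


def fragment_for(anchor_name):
    """Return the %-escaped fragment that refers to a "name" anchor."""
    return quote(anchor_name, safe=SAFE)


print("#" + fragment_for("Statements/Lines"))
print("#" + fragment_for("Regexp Operators"))
print("#" + fragment_for("a<b>"))
```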

In other words, __in HTML__ (and _not_ XHTML), you can use any
non-control character in a "name" anchor, including spaces, "-" and
"*".  You are very restricted with "id" anchors.

This difference is mentioned in the HTML 4 spec, as one reason why you
might choose to use "name" anchors instead of "id".

In XHTML 1.0, the "name" attribute must have NmToken syntax.
In XML 1.0, the "id" attribute must have Name syntax.

It is an amusing set of inconsistencies, which means that if you want
to serve your document as XHTML 1.0, then you can only use these
characters in a "name" anchor (XML NmToken syntax):

    [-_:.A-Za-z0-9] plus some non-ASCII characters (CombiningChar | Extender)

If you want to do something with XML "id" attributes, which is where
XHTML is heading, and you still want the page to be valid HTML, then
you're restricted to the intersection of HTML's constraint on "id"
(the ID token in the first quoted paragraph above) and XML's
constraint on "id" (XML Name syntax: as "name" for XHTML above, with
additional constraints on the first character).

That intersection is strictly:

    [A-Za-z] for first char, followed by [-_:.A-Za-z0-9]

I would stick to that for compatibility with everything including
current HTML and future XML/XHTML - but if you don't care about
XML/XHTML, just HTML, then you can use nearly all printing characters
as anchor names.
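
For illustration, here is a sketch of a checker for that intersection,
plus one possible way to squeeze an arbitrary node name into it.  The
escaper to_portable_anchor is hypothetical: it is modelled only on the
"_002f" form visible in the URLs quoted at the top, and the "x-"
prefix is my own invention.  I am not claiming this is what makeinfo
actually does.

```python
import re

# Intersection of the HTML 4 ID token and (ASCII-only) XML Name syntax.
PORTABLE_ANCHOR = re.compile(r"[A-Za-z][-_:.A-Za-z0-9]*")


def is_portable_anchor(name):
    """Return True if name is safe as both an HTML 4 and an XML id."""
    return PORTABLE_ANCHOR.fullmatch(name) is not None


def to_portable_anchor(name):
    """Hypothetical escaper: keep ASCII letters and digits, and turn
    every other character into "_" plus its code point in hex."""
    out = []
    for ch in name:
        if ch.isascii() and ch.isalnum():
            out.append(ch)
        else:
            out.append("_%04x" % ord(ch))
    result = "".join(out)
    if not result[:1].isalpha():
        result = "x-" + result  # assumed prefix so the first char is a letter
    return result


print(to_portable_anchor("Statements/Lines"))
print(is_portable_anchor(to_portable_anchor("2nd Edition")))
```

Escaping "_" itself keeps the mapping unambiguous, at the cost of
uglier anchors for names that already contain underscores.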

Enjoy,
-- Jamie



