Re: texi2html output validity

From: Patrice Dumas
Subject: Re: texi2html output validity
Date: Tue, 23 Dec 2014 17:49:11 +0100
User-agent: Mutt/1.5.20 (2009-12-10)


First of all, it is a bit unclear where this HTML comes from.  In
general, both texi2html and texi2any/makeinfo (especially makeinfo
starting at version 5) render properly nested HTML tags.

On Tue, Dec 23, 2014 at 09:29:07PM +0700, Yuri Khan wrote:
> >
> >         • the <code /> element is /always/ used instead of <tt />;
> Cursory reading of HTML.pm seems to indicate that <tt> is currently***
> used for @key, @t, @verb, and some kinds of tables possibly related to
> @example, @smallexample, @lisp and @smalllisp.

Use of <tt> in @example, @smallexample, @lisp and @smalllisp is for a very
special case, something like a @table nested in those formats.

> *** 5.2.0.dfsg.1-2 as packaged in Ubuntu 14.04
> @key should be rendered as <kbd>, possibly with an additional class.
> Yes, even when inside @kbd — HTML allows and encourages nesting <kbd>.

I am not convinced.  @key is semantically very different from @kbd, which
is the same as <kbd>.  Indeed, <kbd> is not for a keyboard key in HTML,
but for typed keyboard input.

> @t is a non-semantic command in Texinfo and should probably be
> discouraged the same way <tt> has been discouraged in HTML since at
> least 1997. It probably should become a <span class="t"> styled with
> .t { font-family: monospace }.

@t and other non-semantic commands are already discouraged in the manual.
But I see no point in not using <tt> for @t, as long as browsers support
it (which is likely to be until the end of time).  CSS is not supported
by every browser.

> @verb is syntax sugar for escaping characters which have special
> meaning in Texinfo, and has a non-semantic side effect of fixed-width
> rendering. It probably should become a <span class="verb">.

Once again this will only work if there is CSS support.  <tt> should
always work.  That being said, the best could be <tt class="verb">.
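
To make the <tt class="verb"> idea concrete, here is a minimal sketch in
Python.  It is not the HTML.pm code, and the function name format_verb is
made up for the illustration.  The element still renders as fixed-width
without any CSS, while the class gives a stylesheet something to select.

```python
from html import escape

def format_verb(text: str) -> str:
    """Wrap verbatim text in <tt class="verb">, escaping &, < and >.

    Hypothetical helper for illustration only; the real converter is
    written in Perl and is organized differently.
    """
    return '<tt class="verb">' + escape(text, quote=False) + '</tt>'

print(format_verb("a < b && c"))
# prints: <tt class="verb">a &lt; b &amp;&amp; c</tt>
```

A stylesheet can then restyle tt.verb if it wants to, and a browser
without CSS just falls back to the plain <tt> rendering.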

> Code examples are a good match for <code>.
> >         • <img align="ALIGN" border="0" … /> is replaced with
> >           <img style="text-align: ALIGN; border: 0;" … />;
> No. { border: 0 } should just be specified in CSS for all img, while
> alignment should be handled by classes.
> >         • unless there’s a really good reason to nest <p /> inside an
> >           <a />, – do it in reverse: <p ><a …></a>; for one thing, this
> >           makes it possible to simply omit any </p>s on output.
> +1 for nesting <a> within <p>. -1 against omitting closing tags.

If invalid HTML is emitted by makeinfo, it is a bug, so no closing tag
omissions, no invalid nesting.  A <p> in <a> should be pretty rare, I
cannot really imagine Texinfo code that would lead to that.  But if you
have such code, don't hesitate to report it, we'll see what we can do.
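
For tracking down such a case in generated output, something like the
following could help.  It is only a rough sketch using Python's standard
html.parser module; the class and function names are invented here, and
it is not a validator, it only flags a <p> that opens while an <a> is
still open.

```python
from html.parser import HTMLParser

class ParagraphInAnchorFinder(HTMLParser):
    """Record the position of every <p> opened while an <a> is still open."""

    def __init__(self):
        super().__init__()
        self.open_anchors = 0   # number of <a> elements currently open
        self.hits = []          # (line, column) of each offending <p>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.open_anchors += 1
        elif tag == "p" and self.open_anchors > 0:
            self.hits.append(self.getpos())

    def handle_endtag(self, tag):
        if tag == "a" and self.open_anchors > 0:
            self.open_anchors -= 1

def find_p_in_a(html_text):
    """Return the (line, column) positions of <p> tags nested inside <a>."""
    finder = ParagraphInAnchorFinder()
    finder.feed(html_text)
    finder.close()
    return finder.hits
```

Running find_p_in_a() on the contents of an output file gives the
positions to look at; together with the Texinfo source that produced
them, that would make a useful bug report.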

> Note also that <tt> and <a>/<p> nesting order are just the tip of the
> iceberg. The wider problem is that the Texinfo HTML generator
> generally assumes HTML ≈3.2 even though it declares 4.01 Transitional:

No, there is special code for HTML 3.2 compatibility, in
init/html32.pm, but in HTML.pm many 4.01 features are used.

I may have missed something, but in the specification of 4.01
Transitional all the elements you describe as needing to be dropped
are accepted?  I see them all in

Also, I think that, at least in the past, I checked the validity of
texi2html/makeinfo output with some program and the HTML was valid.

We in fact chose that DTD in order to have good rendering in both old
and new browsers, with or without CSS.

> * <a name> should be dropped in favor of placing an id on the parent element;
> * alignment should be handled by classes;
> * <table border=… cellpadding=… cellspacing=…>, <tr valign=…>, <td
> align=…> should be replaced with CSS;
The previous 3 items seem to me to be OK in 4.01 Transitional.

> * tables should be generally avoided unless actually representing tabular 
> data;
Agreed, but sometimes we want to do some non-semantic formatting, for
instance for node lines, or indices.  <table> is very practical in that

> * table cells containing only non-breaking spaces indicate some
> problem that should be solved, not worked around;
> * a non-breaking space immediately adjacent to a normal space is nonsensical;
> * more than one contiguous non-breaking space is a kludge;

Same as above, the use of non-breaking spaces is for cases when we want
non-semantic formatting.

I agree that these are kludges, but the current output renders fine in all
browsers we know about.  If there are proposals for other formatting that
can be used optionally and does not involve tables and non-breaking
spaces, but involves, for instance, CSS, don't hesitate to propose them.
But it will be optional; the default is likely to remain the kludges, as
long as they render nicely.

> * <br> are fit for poetry and postal addresses and almost nothing else;

We use it for @* and @sp as the semantics correspond.  Otherwise it is
rarely used, and always in cases where we do some non-semantic formatting
(author names, index formatting).

> * <font size=…> should be replaced with CSS;
> * OUTPUT_ENCODING_NAME should be deprecated in favor of UTF-8;

Default is utf-8 already, so no need to prevent setting
OUTPUT_ENCODING_NAME.  But otherwise we must stick to what the user
provides as @documentencoding.

> * the encoding declaration <meta> should be the first thing in <head>;

What would be the reason for that?

