Re: simplifying configuration of encoded characters/entities output

From: Gavin Smith
Subject: Re: simplifying configuration of encoded characters/entities output
Date: Wed, 29 Dec 2021 15:50:50 +0000
User-agent: Mutt/1.9.4 (2018-02-28)

On Wed, Dec 29, 2021 at 01:35:05PM +0100, Patrice Dumas wrote:
> Here is my proposal for HTML
> * remove FALLBACK_TO_NUMERIC_ENTITY, always setting it for HTML (and
>   never for TexinfoXML, or always set, not sure about it, and probably
>   does not matter much).
> * if ENABLE_ENCODING is set, try to output unicode points encoded
>   characters for every output, be it accents like @'e, @-commands like
>   @l{} or dashes and quotes.

I'm happy with this.

I couldn't find much information online about whether using the
entities or using raw UTF-8 was better.

I did find this page:

and I do remember seeing that some old browsers gave you the choice of
which encoding to use for a page.  Hence, using entities seems like
a more reliable way of specifying a character, in case the page encoding
is set/detected incorrectly by some old browser.
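
To illustrate the point about a wrongly detected page encoding, here is a
minimal Python sketch (not Texinfo code; the sample word and variable names
are made up for the example).  It assumes a browser that decodes a UTF-8
page as Latin-1, and compares what happens to a raw UTF-8 character and to
an entity.

```python
import html

text = "caf\u00e9"                    # "café", with U+00E9 (e acute)

raw_bytes = text.encode("utf-8")      # page written with the raw character
entity_bytes = b"caf&eacute;"         # page written with a named entity (pure ASCII)

# A browser that wrongly assumes Latin-1 mis-decodes the raw UTF-8 bytes:
# the two bytes 0xC3 0xA9 become two separate characters.
wrong_raw = raw_bytes.decode("latin-1")

# The entity is plain ASCII, so it decodes identically under either
# encoding and still resolves to the intended character.
wrong_entity = html.unescape(entity_bytes.decode("latin-1"))
right_entity = html.unescape(entity_bytes.decode("utf-8"))

print(repr(wrong_raw))
print(wrong_entity == text, right_entity == text)
```

The entity survives the wrong guess because every byte of it is ASCII,
which Latin-1 and UTF-8 decode the same way.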

If a document has a lot of non-ASCII characters (e.g. if it's written
in Chinese), then the behaviour you state with ENABLE_ENCODING would
be better.
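
One possible reason, which is an assumption on my part rather than something
measured: with entities, every non-ASCII character turns into several ASCII
bytes, so the HTML source gets larger and is unreadable as text.  A rough
Python sketch of the size difference, using a two-character sample string
chosen only for illustration:

```python
sample = "\u4e2d\u6587"               # two Chinese characters (U+4E2D, U+6587)

# Raw UTF-8 output: three bytes per character here.
raw = sample.encode("utf-8")

# Numeric-entity output: Python's "xmlcharrefreplace" error handler
# replaces each unencodable character with a decimal &#N; reference.
entities = sample.encode("ascii", errors="xmlcharrefreplace")

print(len(raw), len(entities))
print(entities)
```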

Agreed that the choice for TexinfoXML doesn't matter.

> That would mean 3 possibilities for HTML
> * default, use named entities if possible, fallback to numeric entities
> * --enable-encoding triggers outputting encoded characters
> * with USE_NUMERIC_ENTITY output numeric entities
> Note that in most if not all cases, the actual output would still be
> guarded by the OUTPUT_ENCODING_NAME value, such that the conversions
> with ENABLE_ENCODING set are only done when they are known to be
> possible.
> Opinions, ideas?
> -- 
> Pat
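
The three possibilities could be sketched roughly as follows.  This is a
Python illustration only, not the Perl converter code: the function
render_char and its parameters are hypothetical, the named entities come
from Python's HTML 4 table rather than Texinfo's own, and the precedence of
ENABLE_ENCODING over USE_NUMERIC_ENTITY is an assumption, since the
proposal does not say which wins if both are set.

```python
from html.entities import codepoint2name


def render_char(ch, enable_encoding=False, use_numeric_entity=False,
                output_encoding="utf-8"):
    """Render one non-ASCII character for HTML output (hypothetical helper)."""
    if enable_encoding:
        # Guarded by the output encoding: only emit the raw character
        # when it can actually be represented in that encoding.
        try:
            ch.encode(output_encoding)
            return ch
        except UnicodeEncodeError:
            pass  # not representable, fall through to entities
    if not use_numeric_entity:
        # Default: prefer a named entity when one exists.
        name = codepoint2name.get(ord(ch))
        if name is not None:
            return f"&{name};"
    # Fallback (or explicit request): decimal numeric entity.
    return f"&#{ord(ch)};"


e_acute = "\u00e9"   # what @'e produces
l_stroke = "\u0142"  # what @l{} produces

print(render_char(e_acute))                           # default
print(render_char(l_stroke))                          # default, no HTML 4 name
print(render_char(e_acute, use_numeric_entity=True))  # USE_NUMERIC_ENTITY
print(render_char(e_acute, enable_encoding=True))     # ENABLE_ENCODING
print(render_char(l_stroke, enable_encoding=True,
                  output_encoding="iso-8859-1"))      # guarded, falls back
```

The last call shows the guard: U+0142 has no ISO-8859-1 representation, so
the sketch falls back to an entity even though encoding is enabled.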
