bug-texinfo

Re: url protection


From: Patrice Dumas
Subject: Re: url protection
Date: Thu, 4 Aug 2022 00:15:46 +0200

On Wed, Aug 03, 2022 at 02:36:58PM -0700, Per Bothner wrote:
> On 8/3/22 13:46, Patrice Dumas wrote:
> > This is not what we do in general for html/xhtml.  For epub we always
> > emit utf8, as it is mandated by the standard, but for html/xhtml, we
> > use, in the default case, the input encoding for the output encoding.
> 
> I think that is a mistake.
> It seems clear that in 2022 all publicly-visible html pages (i.e. on a public
> web server) should use utf8.
> It is also clear that a practical html-reading program is able to read
> utf8-encoded html files (assuming a correct charset declaration), regardless
> of the local character encoding, even for local file: urls or an internal
> web-server.
> Ergo, always emitting utf8 (with a charset declaration) is safer and very
> unlikely to lead to problems, while using a native or input-based encoding
> is fragile and dangerous.

I agree that UTF-8 is the way to go for the future, and the default
output encoding could be set to UTF-8 irrespective of input encoding for
HTML, and even more for XML based formats.  I do not have a specific
opinion on that matter, and I defer to Gavin.

Also, my wild guess, although I haven't tested, is that a browser,
without any charset information, for a local file, should use the locale
encoding.

In any case, that does not mean that using another encoding is fragile or
dangerous.  There is a list of supported encodings in the Texinfo
manual
https://www.gnu.org/software/texinfo/manual/texinfo/html_node/_0040documentencoding.html
I think that we support them well, in a robust way, in texi2any.  And if
that is not the case, it should be treated as a bug.  We always emit
charset information, too.
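
To make the charset point concrete, here is a minimal sketch in Python
(not the texi2any Perl code, and not its actual output).  The function
name render_html and the page skeleton are hypothetical; the only point
is that the declared charset and the byte encoding are the same, which
is what lets a reader decode a non-UTF-8 page correctly.

```python
def render_html(body, encoding):
    """Return an HTML page as bytes, declaring the same encoding it is written in.

    Hypothetical helper for illustration only; texi2any's real output differs.
    """
    page = (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="{encoding}"><title>demo</title></head>\n'
        f"<body><p>{body}</p></body></html>\n"
    )
    # Encode with the very encoding named in the charset declaration.
    return page.encode(encoding)


data = render_html("caf\u00e9", "iso-8859-1")
# A reader that honours the declaration recovers the original text.
decoded = data.decode("iso-8859-1")
```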

Also, this is quite off-topic.  We can discuss the default output encoding
for HTML, but it should not be in this thread.

> > The conversion should not have already been done at that point, we still
> > have character strings in internal perl unicode encoding.  But that was
> > not really my question, my question was more on whether we should use the
> > output encoding to encode the string before doing the URI::Escape call, or
> > always use UTF-8, even if the document encoding is not UTF-8.
> 
> The question is irrelevant: we should always emit utf8 in both urls and in
> the body of html/xhtml files.  That should certainly be the default
> (regardless of native or input encoding) - and it is almost certainly a
> waste of time to support anything else.

I think that we should support setting the output encoding explicitly to
a Texinfo supported encoding for a long time, even if UTF-8 becomes the
default output encoding for HTML.  I do not imagine dropping that
feature anytime soon.  This question will therefore be relevant for this
setup for a long time, too.
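
To illustrate the two options in question, here is a small sketch in
Python rather than Perl, using urllib.parse.quote as a stand-in for the
URI::Escape call.  The helper name escape_url_text and the sample string
are assumptions made for the example; this is not how texi2any is
implemented.  It only shows that the same character string gives
different percent-escapes depending on which encoding is applied first.

```python
from urllib.parse import quote


def escape_url_text(text, encoding="utf-8"):
    """Encode text to bytes in the given encoding, then percent-escape it.

    Hypothetical helper for illustration; the default models "always UTF-8".
    """
    return quote(text, safe="/", encoding=encoding)


label = "caf\u00e9"  # a character string, not yet encoded

# Option 1: always UTF-8, whatever the document encoding is.
utf8_form = escape_url_text(label)

# Option 2: use the output encoding, here assumed to be ISO-8859-1.
latin1_form = escape_url_text(label, "iso-8859-1")

print(utf8_form)    # caf%C3%A9
print(latin1_form)  # caf%E9
```

A link escaped one way will not match a target escaped the other way, so
whichever option is chosen has to be applied consistently.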

-- 
Pat


