Re: url protection
From: Patrice Dumas
Subject: Re: url protection
Date: Thu, 4 Aug 2022 00:15:46 +0200
On Wed, Aug 03, 2022 at 02:36:58PM -0700, Per Bothner wrote:
> On 8/3/22 13:46, Patrice Dumas wrote:
> > This is not what we do in general for html/xhtml. For epub we always
> > emit utf8, as it is mandated by the standard, but for html/xhtml, we
> > use, in the default case, the input encoding for the output encoding.
>
> I think that is a mistake.
> It seems clear that in 2022 all publicly-visible html pages (i.e. on a public
> web server) should use utf8.
> It is also clear that a practical html-reading program is able to read
> utf8-encoded html files (assuming a correct charset declaration),
> regardless of the local character encoding, even for local file: urls
> or an internal web-server.
> Ergo, always emitting utf8 (with a charset declaration) is safer and
> very unlikely to lead to problems, while using a native or input-based
> encoding is fragile and dangerous.
I agree that UTF-8 is the way to go for the future, and the default
output encoding could be set to UTF-8 irrespective of the input encoding
for HTML, and even more so for XML-based formats. I do not have a
specific opinion on that matter, and I defer to Gavin on it.
Also, my wild guess, although I haven't tested it, is that a browser
opening a local file without any charset information should use the
locale encoding.
In any case, it does not mean that using another encoding is fragile or
dangerous. There is a list of supported encodings in the Texinfo
manual:
https://www.gnu.org/software/texinfo/manual/texinfo/html_node/_0040documentencoding.html
I think that we support them well, in a robust way, in texi2any. And if
that is not the case, it should be considered a bug. We always emit
charset information, too.
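To illustrate why a non-UTF-8 output encoding is not fragile by itself, here is a minimal Python sketch (not texi2any's Perl code). The function name render_page and the page skeleton are hypothetical. It only shows that the bytes written and the declared charset come from the same setting, so a reader that honors the declaration decodes the text correctly.

```python
def render_page(body: str, output_encoding: str) -> bytes:
    """Serialize a tiny page in output_encoding and declare that charset.

    Hypothetical helper for illustration; the markup is an assumption.
    """
    html = (
        f'<html><head><meta charset="{output_encoding}"></head>'
        f"<body><p>{body}</p></body></html>"
    )
    # The declared charset and the actual byte encoding are the same value.
    return html.encode(output_encoding)


latin1_page = render_page("café", "iso-8859-1")
utf8_page = render_page("café", "utf-8")

# Decoding with the declared charset recovers the original text in both cases.
assert "café" in latin1_page.decode("iso-8859-1")
assert "café" in utf8_page.decode("utf-8")
```

The two byte strings differ, but each one agrees with its own declaration, which is the property that matters.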
Also this is quite off-topic, we can discuss the default output encoding
for HTML, but it should not be in that thread.
> > The conversion should not have already been done at that point, we
> > still have character strings in the internal Perl Unicode encoding.
> > But that was not really my question, my question was more on whether
> > we should use the output encoding to encode the string before doing
> > the URI::Escape call, or always use UTF-8, even if the document
> > encoding is not UTF-8.
>
> The question is irrelevant: we should always emit utf8 in both urls and
> in the body of html/xhtml files. That should certainly be the default
> (regardless of native or input encoding) - and it is almost certainly a
> waste of time to support anything else.
I think that we should support setting the output encoding explicitly to
a Texinfo-supported encoding for a long time, even if UTF-8 becomes the
default output encoding for HTML. I do not imagine dropping that
feature anytime soon. This question will therefore remain relevant for
this setup for a long time, too.
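To make the question concrete, here is a minimal sketch in Python rather than Perl, using the standard urllib.parse.quote in place of URI::Escape. The function name protect_url_text is hypothetical and this is not the texi2any code. It shows only that the percent-escaped result depends on which encoding turns the character string into bytes before escaping.

```python
from urllib.parse import quote


def protect_url_text(text: str, encoding: str = "utf-8") -> str:
    """Encode text to bytes in `encoding`, then percent-escape those bytes.

    Hypothetical helper for illustration only.
    """
    return quote(text, safe="/:", encoding=encoding)


# Always encoding as UTF-8 before escaping:
print(protect_url_text("café.html"))                # caf%C3%A9.html
# Encoding with an ISO-8859-1 output encoding before escaping:
print(protect_url_text("café.html", "iso-8859-1"))  # caf%E9.html
```

The two results name different byte sequences, so a link escaped one way does not match a target escaped the other way. That is why the choice needs to be made consistently.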
--
Pat