bug-texinfo

Re: url protection


From: Eli Zaretskii
Subject: Re: url protection
Date: Thu, 04 Aug 2022 09:03:09 +0300

> Date: Wed, 3 Aug 2022 14:36:58 -0700
> From: Per Bothner <per@bothner.com>
> 
> On 8/3/22 13:46, Patrice Dumas wrote:
> > This is not what we do in general for html/xhtml.  For epub we always
> > emit utf8, as it is mandated by the standard, but for html/xhtml, we
> > use, in the default case, the input encoding for the output encoding.
> 
> I think that is a mistake.
> It seems clear that in 2022 all publicly-visible html pages (i.e. on a public
> web server) should use utf8.
> It is also clear that a practical html-reading program is able to read
> utf8-encoded html files (assuming a correct charset declaration),
> regardless of the local character encoding, even for local file: urls
> or an internal web-server.
> Ergo, always emitting utf8 (with a charset declaration) is safer and
> very unlikely to lead to problems, while using a native or input-based
> encoding is fragile and dangerous.

Isn't the main issue here about encoding _file_names_, and the
encoding of HTML is secondary?  I mean file names we produce from
Texinfo sources, for files that are part of the output from texi2any
processing.

Encoding file names in UTF-8 is not always a good idea.  At least on
MS-Windows, that is currently not supported; the program (in this
case, Perl and its extensions written in C) needs to either (a) convert
UTF-8 to UTF-16, and then call "wide" APIs that accept wchar_t
strings, or (b) convert to the system codepage (which could be lossy).
Otherwise functions that call 'open', 'fopen' and the like will fail
or will produce garbled file names.
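
To make the difference between (a) and (b) concrete, here is a rough
Python sketch (not texi2any code).  The file name and the choice of
cp1252 as "the system codepage" are assumptions made only for the
illustration; the actual codepage depends on the Windows installation.

```python
# Hypothetical file name with three Greek letters (alpha, beta, gamma),
# which have no representation in the cp1252 codepage.
name = "\u03b1\u03b2\u03b3.html"
utf8_bytes = name.encode("utf-8")

# (a) Convert UTF-8 to UTF-16 (little-endian, no BOM), the form that
#     wchar_t-based "wide" APIs on Windows expect.  Nothing is lost.
wide = utf8_bytes.decode("utf-8").encode("utf-16-le")
wide_ok = wide.decode("utf-16-le") == name

# (b) Convert UTF-8 to an assumed system codepage.  Characters outside
#     the codepage are replaced, so the original name is unrecoverable.
narrow = utf8_bytes.decode("utf-8").encode("cp1252", errors="replace")
narrow_ok = narrow.decode("cp1252") == name

print(wide_ok)    # True
print(narrow)     # b'???.html'
print(narrow_ok)  # False
```

With a name that happens to fit in the codepage, (b) would round-trip
too; the loss only shows up for characters the codepage cannot express.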

On other systems, if the locale's codeset is not UTF-8 (which is
indeed rare nowadays, but not non-existent), encoding file names in
UTF-8 will produce files whose names are unreadable by human users in
applications that manipulate file names.
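
A small Python sketch of what the user would see, assuming a locale
whose codeset is Latin-1 (the file name is made up for the example):

```python
# Hypothetical file name containing one non-ASCII character (e-acute).
name = "caf\u00e9.html"

# Bytes stored on disk if the name is always encoded in UTF-8.
on_disk = name.encode("utf-8")

# What a program working in a Latin-1 locale displays for those bytes:
# the two UTF-8 bytes of the e-acute show up as two unrelated characters.
shown = on_disk.decode("latin-1")

print(on_disk)        # b'caf\xc3\xa9.html'
print(shown == name)  # False
```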

So if we agree that encoding of file names we produce should not
always be UTF-8, the next question is: how to encode those names in
the produced Texinfo output when we need to reference such a file.  It
is possible to use an encoding in the produced output that is
different from the actual encoding of file names on disk, but AFAIU
the issue at hand was about the former first.
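
As an illustration of why the two questions are related, here is a
Python sketch.  It assumes, purely for the example, that the reference
is written as a percent-encoded URL, that the file on disk is named in
Latin-1, and that the file system compares names as raw bytes; none of
this describes what texi2any actually does.

```python
from urllib.parse import quote, unquote_to_bytes

# Hypothetical output file name and its (assumed) on-disk encoding.
name = "caf\u00e9.html"
disk_bytes = name.encode("latin-1")

# The same name percent-encoded from two different byte encodings.
ref_utf8 = quote(name, encoding="utf-8")
ref_latin1 = quote(name, encoding="latin-1")

print(ref_utf8)    # caf%C3%A9.html
print(ref_latin1)  # caf%E9.html

# A percent-encoded reference stands for specific bytes; only the one
# built from the on-disk encoding names the bytes actually on disk.
print(unquote_to_bytes(ref_latin1) == disk_bytes)  # True
print(unquote_to_bytes(ref_utf8) == disk_bytes)    # False
```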


