bug-texinfo

Re: Skip filename recoding tests on MS-Windows


From: Eli Zaretskii
Subject: Re: Skip filename recoding tests on MS-Windows
Date: Tue, 25 Oct 2022 22:32:31 +0300

> Date: Tue, 25 Oct 2022 21:18:35 +0200
> From: pertusus@free.fr
> Cc: GavinSmith0123@gmail.com, bug-texinfo@gnu.org
> 
> > It should work if the document is indeed in the expected encoding.
> > But if the file is actually encoded in something other, especially if
> > the encoding is multibyte (like UTF-8), it will not work.
> 
> Indeed, it is not reliable, but what would be the best default?  It
> seems to me that Windows adds additional possibilities for anything to
> fail.  However, on the issue of using the codepage to encode file names
> in texi2any versus using the input file encoding, it does not seem to
> me that Windows is special.  If we use the input file encoding on other
> platforms, assuming that the use case is converting manuals from
> archives where all the files are similarly encoded, consistently with
> manuals, it seems to me that Windows is not very special.  It will
> fail in some cases on Windows, but using the user codepage will decrease
> even more the possibility that the result is correct (files with encoded
> characters in their names are found).  Are you still sure that using
> the user current codepage is the best in this situation?

For the encoding of the document, @documentencoding should work on
Windows as it does elsewhere.  So I'm not sure why we use a different
default.  Is that only for the case where there's no @documentencoding
in the Texinfo source?  If not, when will this default be used?

The only part that I think is different on Windows is the encoding of
file names, because Windows doesn't treat file names as opaque
bytestreams.  But anything that comes from a Texinfo source, even the
name of an included file, should be interpreted according to
@documentencoding.  When accessing included files on Windows, we
should re-encode the file names to the locale's encoding, because
nothing else will work reliably.  Is that what we do?
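
To illustrate the re-encoding step described above, here is a minimal
sketch in Python (texi2any itself is written in Perl, so this is not
its code).  The function name reencode_file_name and its parameters
are hypothetical, chosen only for this example; the sketch assumes the
file name arrives as raw bytes taken from the Texinfo source.

```python
import locale
from typing import Optional


def reencode_file_name(raw_name: bytes,
                       document_encoding: str,
                       target_encoding: Optional[str] = None) -> bytes:
    """Hypothetical helper: convert a file name from the document's
    encoding to the encoding used for file names on this system."""
    if target_encoding is None:
        # Assumption: the locale's preferred encoding stands in for
        # the user's current codepage.
        target_encoding = locale.getpreferredencoding(False)
    # Interpret the bytes according to @documentencoding ...
    text = raw_name.decode(document_encoding)
    # ... then re-encode for file access.  This raises
    # UnicodeEncodeError if a character has no representation in the
    # target encoding, which is the case that cannot work at all.
    return text.encode(target_encoding)
```

For example, a name written in a UTF-8 document can be converted for a
system whose codepage is cp1252 with
reencode_file_name(name_bytes, "utf-8", "cp1252").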

> I can't imagine a situation where the file name would end up being
> encoded in UTF-8 in Windows, even with a codepage in UTF-8

Windows doesn't yet generally allow users to set up the system to use
UTF-8 as the default system codepage.  This is only available on the
latest Windows versions, and only if the user turns on a special "for
developers" feature.  Even then it is not yet 100% reliable.  So,
bottom line, UTF-8 cannot yet be used on Windows as the locale's
codeset.

> Even though we do not need to skip the test on Windows, I think that
> it is better to skip it, as in case of a multibyte codepage, the file
> created, supposed to be encoded in Latin1 will not be as expected,
> and the test will not succeed, but not for the expected reasons and
> does not test what it is supposed to test.

I agree.
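
The failure mode discussed in the quoted paragraph can be sketched in
Python as follows.  The helper decodes_in_codepage and the sample name
are hypothetical, and cp932 is used here only as one example of a
multibyte codepage; the actual test in the Texinfo test suite may
differ.

```python
def decodes_in_codepage(name_bytes: bytes, codepage: str) -> bool:
    """Hypothetical check: are these bytes valid text in the codepage?"""
    try:
        name_bytes.decode(codepage)
    except UnicodeDecodeError:
        return False
    return True


# A file name with a non-ASCII character, encoded in Latin-1.
latin1_name = "os\u00e9.texi".encode("iso-8859-1")

# In a single-byte codepage every byte of the name maps to a character.
single_byte_ok = decodes_in_codepage(latin1_name, "cp1252")

# In a multibyte codepage the byte 0xE9 is read as the start of a
# two-byte character, so the name is no longer what the test expects.
multibyte_ok = decodes_in_codepage(latin1_name, "cp932")
```

Under these assumptions the Latin-1 bytes form a valid name in cp1252
but not in cp932, so a test that creates such a file would fail for
reasons unrelated to what it is meant to check.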


