bug-texinfo
Re: Non-ASCII characters in @include search path


From: pertusus
Subject: Re: Non-ASCII characters in @include search path
Date: Sun, 27 Feb 2022 11:40:51 +0100

On Sun, Feb 27, 2022 at 09:01:51AM +0200, Eli Zaretskii wrote:
> > From: Gavin Smith <gavinsmith0123@gmail.com>
> > Date: Sat, 26 Feb 2022 22:23:04 +0000
> > 
> > Don't use the locale encoding by default for encoding filenames.
> 
> I think this is a mistake, and at least on Windows we must use the
> locale's encoding of file names by default (unless Perl has the
> ability to support the entire Unicode range of characters in file
> names on Windows -- does it?).

The plan is indeed to use the locale on Windows.

> As a data point: Emacs uses the locale's codeset as the default
> file-name encoding for the last 15 years, on all supported systems,
> and we have yet to hear about any significant problems with that.  (On
> MS-Windows, we switched to UTF-8 several years ago, but that required
> to write replacements for every libc API that accepts file names, and
> in that replacement to convert from UTF-8 to UTF-16, then call the
> corresponding "wide" API that can accept wchar_t strings as file
> names.)
> 
> I think @documentencoding is only relevant for file names that come
> from the Texinfo source, and it's only relevant for _decoding_ those
> file names into the internal representation.  When encoding them
> before passing them to file-related APIs, those file names should be
> encoded using the locale's encoding (by default).  IOW,
> @documentencoding just tells us how the file names are encoded in the
> document, not how they are encoded in the filesystem.

I agree with you, but Gavin's scenario (explained in
https://lists.gnu.org/archive/html/bug-texinfo/2022-02/msg00111.html)
is also possible.  People could well unpack tar files, maybe run convmv
on the main file, but leave the include file names as they are.  In
that case, using @documentencoding is the best bet.

In any case, this should be modifiable with a customization variable, so
it can easily be changed in the future if more information becomes
available on the most likely scenarios.  And on Windows, the plan is to
set the customization variable so that the locale is used.
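
As a rough illustration of that default selection (a sketch in Python,
not the actual texi2any code; the function name and the `override`
parameter, which stands in for the customization variable, are
hypothetical):

```python
import locale
import sys


def default_file_name_encoding(document_encoding, platform=sys.platform,
                               override=None):
    """Pick the encoding used to turn file names into bytes.

    Hypothetical sketch: `override` plays the role of the customization
    variable discussed above; it is not a real Texinfo option name.
    """
    if override is not None:
        # An explicit user setting always wins.
        return override
    if platform.startswith("win"):
        # On Windows, fall back to the locale's encoding.
        return locale.getpreferredencoding(False)
    # Elsewhere, assume file names are encoded like the document.
    return document_encoding
```

With this shape, changing the default later only means changing the
last branch, and users can still force either behaviour.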

It would also be possible to use @documentencoding for include files,
but use the locale for file output.  That would be a bit similar to
what we do for Texinfo manuals, for which the output encoding can be
different from the input encoding.
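
To make the idea concrete, here is a minimal sketch in Python (again
not the real implementation; the function name, the `purpose` values
and the `locale_encoding` parameter are made up for the example).  The
file name is assumed to be already decoded into a character string, and
only the encoding step differs between include lookup and output:

```python
import locale


def encode_file_name(name, purpose, document_encoding, locale_encoding=None):
    """Encode a decoded file name to bytes for the file system.

    Hypothetical sketch: include files use the document encoding,
    output files use the locale encoding.
    """
    if locale_encoding is None:
        # Ask the system for the locale's preferred encoding.
        locale_encoding = locale.getpreferredencoding(False)
    if purpose == "include":
        return name.encode(document_encoding)
    if purpose == "output":
        return name.encode(locale_encoding)
    raise ValueError("unknown purpose: " + repr(purpose))
```

For example, with an ISO-8859-1 document in a UTF-8 locale, the same
name "caf\u00e9" would be looked up as bytes `caf\xe9` for an include
file but written as `caf\xc3\xa9` for an output file.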

-- 
Pat


