bug-texinfo

Re: Non-ASCII characters in @include search path


From: Eli Zaretskii
Subject: Re: Non-ASCII characters in @include search path
Date: Wed, 23 Feb 2022 21:52:21 +0200

> From: Gavin Smith <gavinsmith0123@gmail.com>
> Date: Wed, 23 Feb 2022 19:31:52 +0000
> 
> Whatever we do, it should be concordant with TeX's filename handling.
> I imagine that TeX (except possibly on MS-Windows) would just use the
> bytes, so so should we.

AFAIK, TeX uses bytes everywhere.

> In any case, the cases we are dealing with here are very rare, and I just
> don't see that it is at all common for somebody to work in
> a non-UTF-8 locale, have all their file names in that encoding, and
> recode any files they download from the Internet or extract from a tar
> file into that encoding.

If the file names are non-ASCII, the _only_ reasonable way of
downloading them is to recode their names.  Otherwise, you will get
garbled names at best, and at worst (on MS-Windows) can have the file
names rejected by the OS, i.e. you will be unable to unpack the
downloaded archive or to save the fetched file locally.
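
To illustrate what "recoding the names" amounts to, here is a minimal
Python sketch.  It is not what any download or unpacking tool actually
does; the function names recode_name and try_recode are hypothetical,
and KOI8-R is assumed as the locale's encoding purely as an example.

```python
from typing import Optional


def recode_name(raw: bytes, src: str = "utf-8", dst: str = "koi8_r") -> bytes:
    """Re-encode a file name from the archive's encoding to the locale's.

    Raises UnicodeDecodeError if `raw` is not valid in `src`, and
    UnicodeEncodeError if some character has no representation in `dst`.
    """
    return raw.decode(src).encode(dst)


def try_recode(raw: bytes, src: str = "utf-8", dst: str = "koi8_r") -> Optional[bytes]:
    """Like recode_name, but return None when the name cannot be recoded."""
    try:
        return recode_name(raw, src, dst)
    except UnicodeError:
        return None


# A Cyrillic name can be represented in KOI8-R, so recoding succeeds.
cyrillic = "файл.texi".encode("utf-8")
recoded = try_recode(cyrillic)

# A Japanese name has no KOI8-R representation, so recoding fails.
japanese = "日本.texi".encode("utf-8")
not_recodable = try_recode(japanese)
```

When recoding fails, as in the second case, the tool has to pick some
fallback (substitution, escaping, or refusing the file), which is where
garbled or rejected names come from.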

> It seems much more likely to me that somebody would be using a
> non-UTF-8 locale for whatever reason, and would download Texinfo
> files with UTF-8 names without recoding the names, and still
> expect to be able to build them.

This might simply fail on MS-Windows, if the UTF-8 byte sequences
include bytes that don't exist in the locale's encoding (a.k.a. "ANSI
codepage").  It will definitely produce garbled file names, and might
also break makeinfo.

> E.g. - UTF-8 Texinfo file, processed under KOI-8 locale on Windows,
> accessing files whose names are stored as UTF-16 on the Windows filesystem.
> Then the UTF-8 filenames would be encoded to KOI-8, and then some file
> access layer would convert the KOI-8 to UTF-16 and find the filenames.
> Is that how it works or am I way off?

Are you describing what we will do in makeinfo, or are you describing
how the current makeinfo, which doesn't re-encode file names, works?

If the latter, then Windows file-related APIs will assume that the
file names we pass to them (taken from the Texinfo source's @include
or @image directives) are KOI-8 encoded, and will attempt to convert
the UTF-8 byte sequences to UTF-16 as if they were KOI-8 encoded.  The
results will never be pretty, and if some byte doesn't exist in the
KOI-8 encoding, the conversion will yield a question mark '?' or a
space character; in the former case, the API call will likely fail
because '?' is not allowed in Windows file names.
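
The misinterpretation described above can be imitated with a short
Python sketch.  This is only an approximation under stated assumptions:
Python's own codec tables stand in for the conversion the Windows APIs
perform, the helper names misread_as and is_valid_windows_name are
hypothetical, and cp1251 is used as a second example codepage because
Python's KOI8-R table happens to assign a character to every byte, so
it cannot show the "byte doesn't exist" case.

```python
# Characters that Windows does not allow in file names.
WINDOWS_RESERVED = set('<>:"/\\|?*')


def misread_as(raw: bytes, assumed: str) -> str:
    """Decode `raw` as if it were in encoding `assumed`.

    Bytes with no mapping in `assumed` become '?', imitating a lossy
    conversion to the wide-character form of the name.
    """
    text = raw.decode(assumed, errors="replace")
    return text.replace("\ufffd", "?")


def is_valid_windows_name(name: str) -> bool:
    """Return True if `name` contains no character reserved on Windows."""
    return not any(ch in WINDOWS_RESERVED for ch in name)


# The name as written in a UTF-8 Texinfo source.
utf8_name = "Имя.texi".encode("utf-8")

# Every byte maps to something in KOI8-R: the result is garbled but usable.
as_koi8 = misread_as(utf8_name, "koi8_r")

# Byte 0x98 has no mapping in Python's cp1251 table: the result contains '?'.
as_cp1251 = misread_as(utf8_name, "cp1251")
```

In the first case the name is merely wrong, so a lookup would fail to
find the file; in the second, the '?' makes the name one that Windows
would not accept at all.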


