bug-texinfo

Re: Non-ASCII characters in @include search path


From: Gavin Smith
Subject: Re: Non-ASCII characters in @include search path
Date: Sun, 20 Feb 2022 14:44:13 +0000

On Sun, Feb 20, 2022 at 03:06:57PM +0200, Eli Zaretskii wrote:
> > This means that any non-ASCII characters in a filename in a Texinfo source
> > file are sought in the filesystem as the corresponding UTF-8 sequences.
> 
> This will not work on Windows.

I can see that there could be an issue if files are copied onto a
Windows filesystem, or extracted from a tar file.  Either operation
could change the byte sequences that represent the filenames.

Do you know if TeX distributions for Windows do any handling of filename
encodings?  A file could be in UTF-8 but need to refer to file names that
are in UTF-16 on the filesystem.  Would this work?
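
To make the concern concrete, here is a minimal Python sketch showing
that one file name maps to different byte sequences depending on the
encoding.  The file name and the helper encoded_forms are hypothetical
and chosen only for illustration; this is not what texi2any or any TeX
distribution does.

```python
# Hypothetical file name containing non-ASCII characters (e with acute accent).
name = "r\u00e9sum\u00e9.texi"

def encoded_forms(filename):
    """Return the bytes for `filename` under a few common encodings."""
    return {enc: filename.encode(enc)
            for enc in ("utf-8", "latin-1", "utf-16-le")}

forms = encoded_forms(name)
for enc, raw in forms.items():
    # Print each encoding next to its bytes in hexadecimal.
    print(enc, raw.hex(" "))
```

The three byte strings differ, so a lookup that uses the bytes from a
UTF-8 source file verbatim cannot match a name stored in another
encoding unless something converts between the two.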

> > A more thorough fix would obey @documentencoding and convert back to the
> > original encoding, to retrieve the bytes that were present in the source
> > file in case the file was not in UTF-8.  I think it would be the most
> > correct to always use the exact bytes that were in the source file as the
> > name of the file (I assume that is what TeX would do).
> 
> This assumes that the file name is encoded the same as the Texinfo
> source.  But that assumption is only true on the system where the
> Texinfo file was written, and even there it could be false.

I would only favour having special handling for Windows.  On GNU/Linux we
should assume that the byte sequence in the Texinfo file matches the
filename exactly.  This is the easiest behaviour to understand and what
TeX would do.

> The only thorough solution, IMO, is to assume the file names are
> encoded in the filesystem as specified by the locale's codeset.  That,
> too, can be false, but at least in the absolute majority of use cases
> it will be true.  The only better solution is to let the user specify
> the file-name encoding.

The locale codeset could very easily be incorrect.  Suppose somebody sets
a Latin-1 locale.  Should they then be unable to build Texinfo manuals
with non-ASCII UTF-8 filenames?
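
The Latin-1 scenario can be sketched in Python as follows.  The helper
lookup_bytes and the file name are hypothetical, and the sketch assumes
a tool that decodes the name from the source and re-encodes it in the
locale codeset; it is an illustration, not a description of the actual
texi2any code.

```python
def lookup_bytes(source_bytes, document_encoding, filename_encoding):
    """Decode a name from the source, then re-encode it for the lookup."""
    return source_bytes.decode(document_encoding).encode(filename_encoding)

# The file on disk has a UTF-8 encoded name (hypothetical example).
on_disk = "caf\u00e9.texi".encode("utf-8")
# The UTF-8 Texinfo source names the file with the same bytes.
in_source = on_disk

# Strategy 1: re-encode using the locale codeset, here Latin-1.
via_locale = lookup_bytes(in_source, "utf-8", "latin-1")
# Strategy 2: use the bytes from the source file unchanged.
verbatim = in_source

print(via_locale == on_disk)  # False: the Latin-1 bytes do not match
print(verbatim == on_disk)    # True: the verbatim bytes match
```

Under these assumptions the locale-based lookup misses the file, while
using the source bytes unchanged finds it.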



