Re: Non-ASCII characters in @include search path
From: Eli Zaretskii
Subject: Re: Non-ASCII characters in @include search path
Date: Sun, 20 Feb 2022 17:07:32 +0200
> From: Gavin Smith <gavinsmith0123@gmail.com>
> Date: Sun, 20 Feb 2022 14:44:13 +0000
> Cc: pertusus@free.fr, trash.paradise@protonmail.com, bug-texinfo@gnu.org
>
> On Sun, Feb 20, 2022 at 03:06:57PM +0200, Eli Zaretskii wrote:
> > > This means that any non-ASCII characters in a filename in a Texinfo source
> > > file are sought in the filesystem as the corresponding UTF-8 sequences.
> >
> > This will not work on Windows.
>
> I can see that there could be an issue if files are copied onto a
> Windows filesystem, or extracted from a tar file. This
> would lead the byte sequences representing the filenames to change.
It can cause problems on any system. On POSIX hosts most people use
UTF-8, but some still don't.
> Do you know if TeX distributions for Windows do any handling of filename
> encodings?
I have never used a non-ASCII variant of TeX (XeTeX, I guess?) on
Windows, so I don't know, sorry.
> A file could be in UTF-8 but need to refer to file names that
> are in UTF-16 on the filesystem. Would this work?
Not unless we convert the file name to the encoding the filesystem uses.
> > > A more thorough fix would obey @documentencoding and convert back to the
> > > original encoding, to retrieve the bytes that were present in the source
> > > file in case the file was not in UTF-8. I think it would be the most
> > > correct to always use the exact bytes that were in the source file as the
> > > name of the file (I assume that is what TeX would do).
> >
> > This assumes that the file name is encoded the same as the Texinfo
> > source. But that assumption is only true on the system where the
> > Texinfo file was written, and even there it could be false.
>
> I would only favour having special handling for Windows. On GNU/Linux we
> should assume that the byte sequence in the Texinfo file matches the
> filename exactly. This is the easiest behaviour to understand and what
> TeX would do.
But that won't work on systems whose locale's codeset is not UTF-8.
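To illustrate the point, here is a minimal Python sketch (not Texinfo code) showing that the same file name corresponds to different byte sequences under UTF-8 and Latin-1. The file name is a hypothetical example; the takeaway is only that bytes copied verbatim from a UTF-8 source file will not match a name stored in another codeset.

```python
# Hypothetical file name, used only for illustration ("résumé.texi").
name = "r\u00e9sum\u00e9.texi"

# The same characters produce different bytes in different codesets.
utf8_bytes = name.encode("utf-8")
latin1_bytes = name.encode("latin-1")

print(utf8_bytes)    # b'r\xc3\xa9sum\xc3\xa9.texi'
print(latin1_bytes)  # b'r\xe9sum\xe9.texi'

# A byte-for-byte lookup with the UTF-8 form would therefore miss a
# file whose name was stored as Latin-1.
print(utf8_bytes == latin1_bytes)  # False
```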
> > The only thorough solution, IMO, is to assume the file names are
> > encoded in the filesystem as specified by the locale's codeset. That,
> > too, can be false, but at least in the absolute majority of use cases
> > it will be true. The only better solution is to let the user specify
> > the file-name encoding.
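The approach described above (use the locale's codeset, unless the user specifies a file-name encoding) could look roughly like the following Python sketch. The function name and the `file_name_encoding` parameter are hypothetical and are not an existing Texinfo option; `locale.getpreferredencoding` is the standard-library call that reports the locale's preferred encoding.

```python
import locale

def file_name_bytes(name, file_name_encoding=None):
    """Encode `name` for a filesystem lookup.

    `file_name_encoding` is a hypothetical user override. When it is
    None, fall back to the encoding reported for the current locale.
    """
    encoding = file_name_encoding or locale.getpreferredencoding(False)
    return name.encode(encoding)

# With an explicit override the result does not depend on the locale.
print(file_name_bytes("caf\u00e9.texi", "latin-1"))  # b'caf\xe9.texi'
print(file_name_bytes("caf\u00e9.texi", "utf-8"))    # b'caf\xc3\xa9.texi'
```

Without an override, the result depends on the environment in which the program runs, and the encoding step can fail outright if the locale's codeset cannot represent a character in the name.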
>
> The locale codeset could very easily be incorrect. Suppose somebody sets
> a Latin-1 locale, should they then be unable to build Texinfo manuals
> with non-ASCII UTF-8 filenames?
They will see garbled file names.
You can try this yourself. E.g., Emacs lets you control the file-name
encoding, so you could create a file with Latin-1 encoded name on a
system whose locale's codeset is UTF-8. Then ask 'ls' or some GUI
file manager to display the file's name.
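For those who would rather not create such a file, the effect can be approximated in Python. This is only a sketch with a hypothetical file name; what 'ls' or a file manager actually prints depends on the tool, but the mismatch it has to cope with is the same.

```python
# Hypothetical file name ("naïve.texi"), used only for illustration.
name = "na\u00efve.texi"

# Case 1: name stored as Latin-1, displayed by a tool assuming UTF-8.
# The lone 0xEF byte is not valid UTF-8, so it becomes U+FFFD.
latin1_on_disk = name.encode("latin-1")
shown_as_utf8 = latin1_on_disk.decode("utf-8", errors="replace")
print(shown_as_utf8 == "na\ufffdve.texi")  # True

# Case 2: name stored as UTF-8, displayed by a tool assuming Latin-1.
# Each byte of the two-byte sequence is shown as a separate character.
utf8_on_disk = name.encode("utf-8")
shown_as_latin1 = utf8_on_disk.decode("latin-1")
print(shown_as_latin1 == "na\u00c3\u00afve.texi")  # True
```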