Re: Non-ASCII characters in @include search path

bug-texinfo

From:	Eli Zaretskii
Subject:	Re: Non-ASCII characters in @include search path
Date:	Sun, 20 Feb 2022 15:45:37 +0200

> Date: Sun, 20 Feb 2022 14:32:01 +0100
> From: pertusus@free.fr
> Cc: Gavin Smith <gavinsmith0123@gmail.com>, trash.paradise@protonmail.com,
>       bug-texinfo@gnu.org
> 
> On Sun, Feb 20, 2022 at 03:06:57PM +0200, Eli Zaretskii wrote:
> > 
> > The only thorough solution, IMO, is to assume the file names are
> > encoded in the filesystem as specified by the locale's codeset.  That,
> > too, can be false, but at least in the absolute majority of use cases
> > it will be true.  The only better solution is to let the user specify
> > the file-name encoding.
> 
> I do not think that it is a good solution either.  On Linux, unless I
> missed something, the file name encoding should be utf-8 irrespective of
> the locale, or the Texinfo document encoding.

No, that's incorrect.  Linux filesystems don't care about the
file-name encoding, so any byte sequence will do; they just assign
special meanings to 2 bytes: the null byte and the slash.
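
To make the point concrete, here is a minimal sketch of that rule in
Python.  The function name is hypothetical, and the check deliberately
ignores length limits and the special entries "." and "..":

```python
def is_acceptable_linux_name(component: bytes) -> bool:
    """Return True if `component` could name a single directory entry.

    Hypothetical helper: it models only the two reserved byte values
    and ignores length limits and the entries "." and "..".
    """
    # NUL terminates the string passed to the kernel, and '/' separates
    # path components.  Every other byte value is accepted as-is.
    return (
        len(component) > 0
        and b"\x00" not in component
        and b"/" not in component
    )


# The same name in two encodings, plus bytes that are not valid UTF-8,
# all pass; only the slash is rejected.
print(is_acceptable_linux_name("café".encode("utf-8")))
print(is_acceptable_linux_name("café".encode("latin-1")))
print(is_acceptable_linux_name(b"\xff\xfe"))
print(is_acceptable_linux_name(b"a/b"))
```

Nothing in this check looks at an encoding, which is why the bytes
alone cannot tell a program how to display the name.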

However, most users specify file names in the locale's encoding,
because otherwise they might be unable to see them correctly, type
them on the keyboard, etc.  And since most users use UTF-8 as the
locale's codeset, you get the effect that you thought is enforced by
the filesystem; it isn't.
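
As a rough illustration of what "use the locale's codeset" could look
like, here is a Python sketch.  It is not how makeinfo does it; the
function names and the optional `codeset` override are assumptions made
for the example.  The "surrogateescape" error handler keeps bytes that
do not fit the codeset, so the original name can still be recovered:

```python
import locale
from typing import Optional


def decode_file_name(raw: bytes, codeset: Optional[str] = None) -> str:
    """Decode raw file-name bytes, defaulting to the locale's codeset."""
    if codeset is None:
        # Assumption: the locale's codeset is the best available guess.
        codeset = locale.getpreferredencoding(False)
    # Undecodable bytes become lone surrogates instead of raising.
    return raw.decode(codeset, errors="surrogateescape")


def encode_file_name(name: str, codeset: Optional[str] = None) -> bytes:
    """Inverse of decode_file_name: get back the on-disk bytes."""
    if codeset is None:
        codeset = locale.getpreferredencoding(False)
    return name.encode(codeset, errors="surrogateescape")


# A Latin-1 name read under a UTF-8 assumption still round-trips.
raw = "café".encode("latin-1")
assert encode_file_name(decode_file_name(raw, "utf-8"), "utf-8") == raw
```

When the guess is wrong the name displays incorrectly, but the bytes
passed back to the filesystem are unchanged.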

> Maybe on Windows the locale and the file-name encoding match, then
> we should use that.

On Windows, file names on disk are actually encoded in UTF-16
(assuming NTFS filesystem).  However, since makeinfo and Perl are
console programs, their Windows builds can generally only use the
single-byte encoding of the locale (the so-called "system codepage");
Windows file-related API calls, like 'open', 'rename', 'delete', etc.,
convert the locale-encoded file names into UTF-16 internally.
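
The conversion can be modelled in a few lines of Python.  This is only
a sketch of the idea, not the actual Windows code: the function name is
hypothetical, and cp1252 and cp1251 stand in for whatever the system
codepage happens to be:

```python
def codepage_name_to_utf16(raw: bytes, codepage: str = "cp1252") -> bytes:
    """Model the codepage-to-UTF-16 step for a file name.

    Hypothetical helper; `codepage` stands in for the system codepage.
    """
    # Interpret the bytes the program passed according to the codepage.
    text = raw.decode(codepage)
    # Re-encode as little-endian UTF-16 code units.
    return text.encode("utf-16-le")


# The same input byte yields different names under different codepages.
print(codepage_name_to_utf16(b"caf\xe9", "cp1252"))
print(codepage_name_to_utf16(b"\xe4", "cp1252"))
print(codepage_name_to_utf16(b"\xe4", "cp1251"))
```

The last two calls show why the locale matters: byte 0xE4 is one
character under cp1252 and a different one under cp1251, so the name
stored on disk depends on the codepage in effect.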

> In any case, I do not think that letting the user specify the file-name
> encoding would be any good, we should manage to get it right.

The next best solution is IME to use the locale's codeset.
