locale encoding and core functions

octave-maintainers

From:	Markus Mützel
Subject:	locale encoding and core functions
Date:	Sat, 23 Feb 2019 10:12:56 +0100

TL;DR: Is there a way to tell whether an .m file belongs to Octave core or is a 
user function?

Some background:
With the upcoming Octave 5 it will be possible to set the mfile_encoding that 
will be used to read .m files. This is important because Octave has to know 
which encoding is used in the .m file to correctly display non-ASCII characters 
in strings (e.g. in the "workspace" view or in plots). This is done by 
converting from whatever encoding the user set up to UTF-8 and then converting 
to whatever encoding is necessary at any interfaces.
However, there is a problem when we read core .m files which are always encoded 
in UTF-8 (and not in the encoding the user set up). On conversion of these 
files from the locale encoding to UTF-8, non-ASCII characters result in garbled 
text. 
E.g. the German character "ä" encoded in UTF-8 is represented by two bytes: c3 
a4. Assume that a user sets the mfile_encoding to "ISO 8859-1" (Latin1). 
Then these two bytes are interpreted as the two separate characters "Ã¤". This 
means that a string from a core .m file that contained the letter "ä" would 
display as "Ã¤" for that user.
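
To make the byte-level effect concrete, here is a small sketch in Python (not Octave code; it only uses Python's built-in codecs to stand in for the conversion step). It shows the two UTF-8 bytes of "ä" being misread as ISO 8859-1 and then converted to UTF-8 a second time.

```python
# "ä" (U+00E4) encoded as UTF-8 is the two bytes c3 a4.
utf8_bytes = "ä".encode("utf-8")

# Misreading those bytes as ISO 8859-1 maps each byte to its own character.
misread = utf8_bytes.decode("iso-8859-1")

# Converting the misread text to UTF-8 yields four bytes instead of two.
double_encoded = misread.encode("utf-8")

print(utf8_bytes.hex(" "))      # c3 a4
print(misread)                  # Ã¤
print(double_encoded.hex(" "))  # c3 83 c2 a4
```

The last line is the garbled form that ends up in the interpreter: valid UTF-8, but for the wrong characters.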

None of the core .m files contain any non-ASCII characters at the moment. 
However, there are a few help texts in some Octave Forge packages that do. See 
also bug #55195 [1].

The conversion to UTF-8 is done in "file_reader::get_input" in the file 
"input.cc".
If we knew in that function that the file we read from was from the core (or an 
Octave Forge package), we could skip the conversion from the locale encoding to 
mitigate the problem.
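
The idea can be sketched roughly as follows. This is Python rather than the C++ in "input.cc", and everything in it is an assumption for illustration: the helper name `decode_m_file`, the `utf8_roots` parameter, and the example paths are hypothetical and do not correspond to anything in Octave.

```python
from pathlib import PurePosixPath
from typing import Sequence


def decode_m_file(raw: bytes, path: str, mfile_encoding: str,
                  utf8_roots: Sequence[str]) -> str:
    """Hypothetical helper: decode the bytes of an .m file.

    Files located below one of utf8_roots (e.g. the core function
    directories) are assumed to be UTF-8 already, so the user's
    mfile_encoding is ignored for them.
    """
    parents = PurePosixPath(path).parents
    is_bundled = any(PurePosixPath(root) in parents for root in utf8_roots)
    encoding = "utf-8" if is_bundled else mfile_encoding
    return raw.decode(encoding)


# Made-up directory layout, for illustration only.
roots = ["/opt/octave/share/m"]
raw = "ä".encode("utf-8")

core_text = decode_m_file(raw, "/opt/octave/share/m/plot/foo.m",
                          "iso-8859-1", roots)
user_text = decode_m_file(raw, "/home/user/foo.m", "iso-8859-1", roots)

print(core_text)  # ä   (conversion skipped)
print(user_text)  # Ã¤  (bytes interpreted as ISO 8859-1)
```

A path-prefix test is only one possible way to make the distinction; the open question is how such information could reach the reader function at all.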

So back to the initial question: Is there a way to pass this information down 
to that function?

Markus

PS: This problem mostly affects Windows users, for whom the default 
mfile_encoding depends on the Windows locale (see also bug #49685 [2]). But in 
general, any user who prefers to use an encoding other than UTF-8 in their .m 
files would be affected by this bug.

[1]: https://savannah.gnu.org/bugs/index.php?55195
[2]: https://savannah.gnu.org/bugs/index.php?49685
