octave-maintainers

Handle encoding of Octave strings


From: mmuetzel
Subject: Handle encoding of Octave strings
Date: Sun, 15 Apr 2018 04:40:23 -0700 (MST)

Octave Developers,

At the moment Octave strings are parsed as if they were a simple byte
stream. That means a (non-ASCII) character can be represented differently
depending on the encoding of the file the string comes from.
However, the user generally doesn't want (and shouldn't need) to care about
the byte representation of a character. A character should always represent
that character, no matter the encoding of the source file.
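
To make the problem concrete, here is a minimal sketch of how one character
ends up as different bytes depending on the file encoding. It is written in
Python rather than Octave purely for brevity, and the choice of "ä" and of
Latin-1 as the second encoding is only an illustrative assumption:

```python
# Illustration only: the character "ä" (U+00E4) as it would be stored in
# a source file saved as Latin-1 versus one saved as UTF-8.
text = "\u00e4"

latin1_bytes = text.encode("latin-1")  # a single byte: 0xE4
utf8_bytes = text.encode("utf-8")      # two bytes: 0xC3 0xA4

# Code that treats the string as a plain byte stream sees two different
# strings of different lengths, although the user typed the same character.
print(latin1_bytes.hex(), len(latin1_bytes))
print(utf8_bytes.hex(), len(utf8_bytes))
```

Anything that counts or indexes "characters" by bytes will therefore give
different answers for the two files.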

At the moment, we don't know the encoding of an Octave string when we handle
its content. That can lead to problems (e.g. bug #51210, bug #53646, ...).

To make things more consistent, I'd like to propose that the parser (or
lexer?) convert every source string to an encoding that covers all Unicode
characters when parsing m-files. Matlab uses UTF-16 (or more specifically
UCS-2). But since UTF-8 seems to be the predominant encoding on Linux-y
systems, I'd like to propose that we use UTF-8.
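
A rough sketch of that normalization step, again in Python for brevity. The
helper name to_utf8 and the idea that the source encoding is already known
and passed in are assumptions made for illustration; this is not existing
Octave code, and how the encoding of an m-file would be determined is left
open here:

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Hypothetical helper: re-encode bytes read from a source file as UTF-8.

    `source_encoding` is assumed to be known (for example from a user
    setting); detecting it is outside the scope of this sketch.
    """
    # Decode to Unicode code points first, then encode as UTF-8.
    return raw.decode(source_encoding).encode("utf-8")


# The same character from a Latin-1 file and from a UTF-8 file ends up
# with one and the same internal representation.
from_latin1 = to_utf8(b"\xe4", "latin-1")
from_utf8 = to_utf8(b"\xc3\xa4", "utf-8")
print(from_latin1 == from_utf8)
```

With such a step in the parser or lexer, everything downstream could assume
UTF-8 regardless of how the file was saved.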

As a next step, we could convert the strings to whatever encoding is needed
when we pass them on (e.g. to UTF-16 for FreeType or Qt).
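
The conversion at such a boundary could look roughly like the following
sketch (Python for brevity; the function name utf8_to_utf16_units is
hypothetical). It also shows why the UTF-16 versus UCS-2 distinction
matters: a character outside the Basic Multilingual Plane needs two 16-bit
code units in UTF-16 and cannot be represented in UCS-2 at all.

```python
def utf8_to_utf16_units(utf8_bytes: bytes) -> list:
    """Hypothetical helper: convert internal UTF-8 bytes to UTF-16 code units."""
    # Encode as little-endian UTF-16 without a byte order mark.
    utf16 = utf8_bytes.decode("utf-8").encode("utf-16-le")
    # Split the byte string into 16-bit code units.
    return [
        int.from_bytes(utf16[i:i + 2], "little")
        for i in range(0, len(utf16), 2)
    ]


# U+00E4 fits into a single code unit.
bmp_units = utf8_to_utf16_units("\u00e4".encode("utf-8"))
# U+1F600 lies outside the BMP and becomes a surrogate pair.
astral_units = utf8_to_utf16_units("\U0001F600".encode("utf-8"))

print([hex(u) for u in bmp_units])
print([hex(u) for u in astral_units])
```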

Any opinions? Hints where that should go?

Markus





