octave-maintainers


Re: Handle encoding of Octave strings


From: John W. Eaton
Subject: Re: Handle encoding of Octave strings
Date: Sun, 15 Apr 2018 07:38:23 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.4.0

On 04/15/2018 06:40 AM, mmuetzel wrote:
> Octave Developers,
>
> At the moment, Octave strings are parsed as if they were a simple byte
> stream. That means a (non-ASCII) character can be represented differently
> depending on the encoding of the file the string comes from.
> However, the user generally doesn't want (and shouldn't need) to care about
> the byte representation of a character. A character should always represent
> that character, no matter the encoding of the source file.
>
> At the moment, we don't know the encoding of an Octave string when we handle
> its content. That can lead to problems (e.g. bug #51210, bug #53646, ...).
>
> To make things more consistent, I'd like to propose that the parser (or
> lexer?) take care of converting any source string to an encoding that
> covers all Unicode characters when parsing m-files. Matlab uses UTF-16 (or
> more specifically UCS-2). But since UTF-8 seems to be the predominant
> encoding on Linux-y systems, I'd like to propose that we use that.
>
> As a next step, we could take care of converting the strings to whatever
> encoding we need when we pass them on (e.g. to UTF-16 for FreeType or Qt).
>
> Any opinions? Any hints on where that should go?

I agree that we need to do something about this issue.

Should we care about exact compatibility with Matlab?

Is there a way to make this change incrementally?

jwe
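
For readers following the thread, the proposal quoted above can be
illustrated with a minimal sketch. It is written in Python purely for
illustration and is not Octave code: the helper names to_internal_utf8 and
to_utf16_for_toolkit are hypothetical, and the source encoding is assumed
to be known, which the thread identifies as the open problem. It shows the
same character having different bytes in different file encodings, being
normalized to UTF-8 on input, and being converted to UTF-16 on output.

```python
# Illustrative sketch only; these helpers are hypothetical, not Octave's API.

def to_internal_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode bytes read from a source file and re-encode them as UTF-8.

    Assumes the caller already knows the file's encoding.
    """
    return raw.decode(source_encoding).encode("utf-8")


def to_utf16_for_toolkit(internal: bytes) -> bytes:
    """Convert the internal UTF-8 bytes to UTF-16 (little endian, no BOM),
    as might be needed when handing text to a UTF-16 based library."""
    return internal.decode("utf-8").encode("utf-16-le")


# The same character, U+00E4 (a with diaeresis), as stored in two files
# that use different encodings.
char = "\u00e4"
latin1_bytes = char.encode("latin-1")  # one byte:  0xE4
utf8_bytes = char.encode("utf-8")      # two bytes: 0xC3 0xA4

# Treated as a plain byte stream, the two strings differ.
differs_as_bytes = latin1_bytes != utf8_bytes

# After normalizing at parse time, both have the same internal form.
internal_a = to_internal_utf8(latin1_bytes, "latin-1")
internal_b = to_internal_utf8(utf8_bytes, "utf-8")
same_after_normalizing = internal_a == internal_b

# Conversion on the way out, e.g. for a UTF-16 based toolkit.
utf16_bytes = to_utf16_for_toolkit(internal_a)
```

The sketch sidesteps the hard part, namely determining source_encoding for
a given m-file; it only shows why a single internal encoding makes string
comparisons independent of the file a string came from.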


