Handle encoding of Octave strings

octave-maintainers


From:	mmuetzel
Subject:	Handle encoding of Octave strings
Date:	Sun, 15 Apr 2018 04:40:23 -0700 (MST)

Octave Developers,

At the moment, Octave strings are parsed as if they were a simple byte
stream. That means a (non-ASCII) character can be represented by different
bytes depending on the encoding of the file the string comes from.
However, the user generally doesn't want (and shouldn't need) to care about
the byte representation of a character. A character should always represent
that character, no matter the encoding of the source file.
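
To illustrate the problem outside of Octave, here is a minimal Python sketch
showing that the same character yields different bytes under two encodings.
The character "ä" and the choice of Latin-1 versus UTF-8 are only examples
picked for illustration.

```python
# One character, two byte representations.
ch = "ä"  # U+00E4, chosen only as an example

latin1_bytes = ch.encode("latin-1")  # a single byte: 0xE4
utf8_bytes = ch.encode("utf-8")      # two bytes: 0xC3 0xA4

print(latin1_bytes, len(latin1_bytes))
print(utf8_bytes, len(utf8_bytes))
```

Code that treats the string as a plain byte stream sees a length of 1 in one
case and 2 in the other, although the user typed the same character.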

At the moment, we don't know the encoding of an Octave string when we handle
its content. That can lead to problems (e.g. bug #51210, bug #53646, ...).

To make things more consistent, I'd like to propose that the parser (or
lexer?) take care of converting any source string to an encoding that
covers all Unicode characters when parsing m-files. Matlab uses UTF-16 (or
more specifically UCS-2). But since UTF-8 seems to be the predominant
encoding on Linux-y systems, I'd like to propose that we use that.
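
The idea could be sketched roughly as follows, again in Python rather than in
Octave's C++ sources. The function name normalize_to_utf8 and the assumption
that the source file's encoding is already known are hypothetical; this is not
how the Octave parser is implemented.

```python
def normalize_to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode bytes using the (assumed known) file encoding, re-encode as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


# The same line of source text, saved in two different file encodings.
latin1_source = b"x = '\xe4';"
utf8_source = "x = 'ä';".encode("utf-8")

from_latin1 = normalize_to_utf8(latin1_source, "latin-1")
from_utf8 = normalize_to_utf8(utf8_source, "utf-8")

# After normalization both files give the same internal bytes.
print(from_latin1 == from_utf8)
```

With a step like this, everything after the parser would only ever see UTF-8,
regardless of how the m-file was saved.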

As a next step, we could take care of converting the strings to whatever
encoding we need when we pass them on (e.g. to UTF-16 for FreeType or Qt).
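
A minimal sketch of that second step, assuming the internal representation is
UTF-8 and the consumer wants UTF-16. The helper name utf8_to_utf16 and the
choice of little-endian byte order without a BOM are assumptions made for this
example, not requirements of FreeType or Qt.

```python
def utf8_to_utf16(internal: bytes) -> bytes:
    """Convert UTF-8 encoded bytes to UTF-16 (little-endian, no BOM)."""
    return internal.decode("utf-8").encode("utf-16-le")


internal = "ä€".encode("utf-8")  # 2 + 3 bytes in UTF-8
utf16 = utf8_to_utf16(internal)  # 2 + 2 bytes in UTF-16

print(len(internal), len(utf16))
```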

Any opinions? Hints where that should go?

Markus



