From: Markus Mützel
Subject: [Octave-bug-tracker] [bug #52681] Bad reading for UTF-8 characters with fscanf()
Date: Fri, 22 Dec 2017 07:31:41 -0500 (EST)
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:58.0) Gecko/20100101 Firefox/58.0
Follow-up Comment #12, bug #52681 (project octave):
@Dan: Thank you for double-checking for side effects. After re-evaluating, I
still think that the patch makes the correct change:
The double variable "data" is the "fortran_vec" of an Octave "Matrix" type.
As you already wrote, without the patch the values produced by character
conversions fall in the range [-128:127] (plain char is signed on the compiler
in question). Later on, "convert_to_str" is called on that matrix because the
conversion string in the example in comment #0 ("%s") contains no numeric
conversion. The data then pass through the lines of code quoted in comment #1,
where all values outside the range [0:255] are set to 0. This is where the
second bytes of the double-byte UTF-8 characters were lost.
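To make the failure mode concrete, here is a minimal C++ model of it (a sketch, not Octave's code). It assumes a compiler where plain char is signed, modelled explicitly with `signed char`, and `clamp_like_convert_to_str` is a hypothetical stand-in for the range check quoted in comment #1.

```cpp
#include <cassert>
#include <vector>

// Model of the unpatched path: each byte read by a "%s" conversion goes
// through a signed char (assumption: plain char is signed here), so bytes
// >= 0x80 become negative doubles in the [-128:127] range.
double store_byte_unpatched(unsigned char byte) {
  return static_cast<signed char>(byte);
}

// Hypothetical stand-in for the check applied when the double data are
// turned into a char array: anything outside [0:255] is replaced by 0.
std::vector<unsigned char> clamp_like_convert_to_str(const std::vector<double>& data) {
  std::vector<unsigned char> out;
  out.reserve(data.size());
  for (double v : data) {
    out.push_back((v < 0 || v > 255) ? 0 : static_cast<unsigned char>(v));
  }
  return out;
}
```

In this model an ASCII byte such as 'A' survives, while the second byte of "ü" (UTF-8 C3 BC), 0xBC, is stored as -68 and then zeroed.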
With the cast to "unsigned char", the range in "data" is [0:255], matching
what Octave expects for chars.
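The patched path can be modelled the same way. Assuming the byte is cast to `unsigned char` before it is widened to double (both function names below are hypothetical), every stored value lands in [0:255] and passes a [0:255] range check unchanged, whatever the signedness of plain char.

```cpp
#include <cassert>

// Model of the patched path: the byte is interpreted as unsigned char before
// being stored in the double array, so the stored value is always in [0:255]
// regardless of whether plain char is signed on the compiler used.
double store_byte_patched(char c) {
  return static_cast<unsigned char>(c);
}

// True if the value would survive a [0:255] range check such as the one
// described in comment #1.
bool survives_char_range_check(double v) {
  return v >= 0 && v <= 255;
}
```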
This also means that for mixed conversion strings (e.g. "%s %s %f"), where the
output of (f)scanf is a double vector, the range of character values changes
with the patch. E.g., if one byte of a string ("%c", "%s" or "%[]") was read as
-20 before the patch, it will be read as 236 after the patch.
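The shift for a single byte is the usual two's-complement reinterpretation: a negative value v becomes v + 256, so -20 maps to 256 - 20 = 236, while values 0 to 127 are unchanged. A small C++ check of that arithmetic, assuming an 8-bit char (the helper name is hypothetical):

```cpp
#include <cassert>

// For a byte previously read as a signed value v in [-128:127], reinterpreting
// the same bit pattern as unsigned char gives v + 256 when v < 0 and leaves
// v unchanged otherwise.
int patched_value_from_unpatched(int unpatched) {
  return static_cast<unsigned char>(static_cast<signed char>(unpatched));
}
```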
But I think that will be more consistent than the current behavior because
Octave's character type ranges from 0 to 255, too. So a user would probably
expect the results of the scanf family of functions to fall in the same range
(and not to depend on the compiler used).
Hence I also think that the change should apply unconditionally to "%c", "%s"
and "%[]" conversions (as it does in the supplied patch).
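For a mixed conversion string such as "%s %f", the output is a single double vector. The following sketch uses a hypothetical `append_field` helper, not Octave's scanf implementation, to illustrate the rule argued for above: bytes from "%c", "%s" and "%[]" fields always go in as [0:255] values, while numeric fields are stored as parsed.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: append one already-matched field to the double output
// of a mixed conversion. 'conv' is the conversion character: 'c', 's' or '['
// for character conversions; anything else is treated as numeric here.
void append_field(std::vector<double>& out, char conv, const std::string& field) {
  if (conv == 'c' || conv == 's' || conv == '[') {
    // Character conversions: always store bytes as unsigned, i.e. in [0:255].
    for (char ch : field) {
      out.push_back(static_cast<unsigned char>(ch));
    }
  } else {
    // Numeric conversion (e.g. "%f"): store the parsed value.
    out.push_back(std::stod(field));
  }
}
```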
_______________________________________________________
Reply to this item at:
<http://savannah.gnu.org/bugs/?52681>
_______________________________________________
Message sent via/by Savannah
http://savannah.gnu.org/