[Octave-bug-tracker] [bug #52681] Bad reading for UTF-8 characters with fscanf()

From:	Markus Mützel
Subject:	[Octave-bug-tracker] [bug #52681] Bad reading for UTF-8 characters with fscanf()
Date:	Fri, 22 Dec 2017 07:31:41 -0500 (EST)
User-agent:	Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:58.0) Gecko/20100101 Firefox/58.0

Follow-up Comment #12, bug #52681 (project octave):

@Dan: Thank you for double-checking for side effects. After re-evaluating, I
still think that the patch makes the correct change:

The double variable "data" is the "fortran_vec" of an Octave "Matrix" type.
As you already wrote, without the patch the values produced by character
conversions fall in the range [-128:127].  Later on, "convert_to_str" is
called on that matrix because the conversion string ("%s") in the example in
comment #0 contains no numeric conversion. That passes through the lines of
code in comment #1, where all values outside the range [0:255] are set to 0.
This is where the second bytes of the two-byte UTF-8 characters were lost.
With the cast to "unsigned char", the range in "data" is [0:255], matching
what Octave expects for chars.
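
To make the mechanism concrete, here is a minimal C++ sketch, not the actual Octave code. All function names are hypothetical, and the range check only imitates the behavior described for the lines in comment #1. It stores the two UTF-8 bytes of "ü" (0xC3 0xBC) as doubles, once via "signed char" and once via "unsigned char", and then zeroes everything outside [0:255].

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not Octave code): store each byte as a double after
// interpreting it as a signed char, as happens where plain char is signed.
std::vector<double> bytes_via_signed_char (const std::string& s)
{
  std::vector<double> out;
  for (char c : s)
    out.push_back (static_cast<double> (static_cast<signed char> (c)));
  return out;
}

// Hypothetical helper: same, but cast to unsigned char first, which is the
// kind of cast the patch is described as adding.
std::vector<double> bytes_via_unsigned_char (const std::string& s)
{
  std::vector<double> out;
  for (char c : s)
    out.push_back (static_cast<double> (static_cast<unsigned char> (c)));
  return out;
}

// Assumed stand-in for the range check described for comment #1:
// every value outside [0, 255] is replaced by 0.
std::vector<double> zero_out_of_range (std::vector<double> v)
{
  for (double& x : v)
    if (x < 0 || x > 255)
      x = 0;
  return v;
}
```

With the input "\xC3\xBC", the signed variant yields -61 and -68, which the range check turns into zeros, while the unsigned variant yields 195 and 188, which pass through unchanged.
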
This also means that for a mixed conversion string (e.g. "%s %s %f"), where
the output of (f)scanf is a double vector, the range for characters changes
with the patch. E.g., if one byte of a string ("%c", "%s" or "[]") was read as
-20 before the patch, it will be read as 236 after the patch.
But I think that will be more consistent than the current behavior because
Octave's character type ranges from 0 to 255, too. So a user would probably
expect the results of the scanf family of functions to be in the same range
(and not to depend on the compiler used).
Hence I also think that the change should apply unconditionally to "%c", "%s"
and "[]" conversions (as it does with the supplied patch).
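
The -20 versus 236 case and the compiler dependence can be illustrated with a small C++ sketch. The helper names are hypothetical and this is not the patch itself; it only assumes that the byte in question is 0xEC.

```cpp
#include <cassert>
#include <limits>

// Hypothetical helpers for illustration; these names are not Octave's.

// Converting a plain char directly: the result depends on whether the
// compiler treats plain char as signed or unsigned.
double byte_as_double_plain (char c)
{
  return static_cast<double> (c);
}

// Casting to unsigned char first always gives a value in [0, 255].
double byte_as_double_unsigned (char c)
{
  return static_cast<double> (static_cast<unsigned char> (c));
}

// True where plain char is a signed type on the current toolchain.
constexpr bool plain_char_is_signed = std::numeric_limits<char>::is_signed;
```

For the byte 0xEC, the direct conversion gives -20 where plain char is signed and 236 where it is unsigned, while the conversion through "unsigned char" gives 236 in both cases (236 = -20 + 256).
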

    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?52681>
