octave-bug-tracker

[Octave-bug-tracker] [bug #55452] fopen() does not support encoding argument


From: Andrew Janke
Subject: [Octave-bug-tracker] [bug #55452] fopen() does not support encoding argument
Date: Sat, 9 Mar 2019 09:54:10 -0500 (EST)
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36

Follow-up Comment #8, bug #55452 (project octave):

Got a build of the current default branch and ran my tests. Several failures:


>> run_bug_55452_tests
Running fixed-text encoded file test ex-001:
Reference text: Hello, world! (13 chars)
running: ex-001 ISO-8859-1
  decoded: Hello, world! (13 chars)
  ok: ex-001 ISO-8859-1
running: ex-001 ISO-8859-15
  decoded: Hello, world! (13 chars)
  ok: ex-001 ISO-8859-15
running: ex-001 KOI8-R
  decoded: Hello, world! (13 chars)
  ok: ex-001 KOI8-R
running: ex-001 SHIFT_JIS
  decoded: Hello, world! (13 chars)
  ok: ex-001 SHIFT_JIS
running: ex-001 UTF-16
  decoded: ��Hello, world! (28 chars)
  FAIL: ex-001 UTF-16
running: ex-001 UTF-16 no-bom
  decoded: Hello, world! (26 chars)
  FAIL: ex-001 UTF-16 no-bom
Running fixed-text encoded file test ex-002:
Reference text: ありがとう丸 (18 chars)
running: ex-002 SHIFT_JIS
  decoded: ���肪�Ƃ��� (12 chars)
  FAIL: ex-002 SHIFT_JIS
running: ex-002 UTF-16
  decoded: 0B0�0L0h0FN8 (13 chars)
  FAIL: ex-002 UTF-16
Running fixed-text encoded file test ex-003:
Reference text: Kaßner Ökonom Schöps Übermut Müller (40 chars)
running: ex-003 ISO-8859-1
  decoded: Ka�ner �konom Sch�ps �bermut M�ller (35 chars)
  FAIL: ex-003 ISO-8859-1
running: ex-003 UTF-16
  decoded: ��Ka�ner �konom Sch�ps �bermut M�ller (73 chars)
  FAIL: ex-003 UTF-16


Looks like a few things are going on here:

- The byte-order mark (BOM) in UTF-16 files is being passed through into the
decoded string (the leading ��). That probably shouldn't happen.

- UTF-16 encoded text is being turned into too many chars. Looks like a \0
char is getting inserted before each ASCII-like char, i.e. each 16-bit code
unit is coming through as two separate bytes.


>> fh = fopen('encoded-files/ex-001/txt-UTF-16.txt'); line = fgetl (fh); fclose (fh);
>> line
line = ��Hello, world!
>> line == 0
ans =
  0  0  1  0  1  0  1  0  1  0  1  0  1  0  1  0  1  0  1  0  1  0  1  0  1  0  1  0

>>


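- ISO-8859-1 doesn't seem to get converted to UTF-8: ex-003 decodes to 35
chars, which is the raw single-byte length, instead of the 40 chars of the
UTF-8 reference. SHIFT_JIS in ex-002 looks the same (12 chars instead of 18).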
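The Python sketch below reproduces these counts at the byte level. Python is used only because it makes raw bytes easy to inspect. `to_utf8` is a hypothetical helper, not an Octave function. The sketch makes two assumptions:

- The UTF-16 test file is big-endian with a BOM. The `line == 0` pattern (zeros at positions 3, 5, ..., 27) suggests this.
- Octave chars hold UTF-8 bytes, so the expected result of each read is the UTF-8 encoding of the reference text.

```python
import codecs


def to_utf8(raw: bytes, encoding: str) -> bytes:
    """Hypothetical helper: decode file bytes from `encoding` and re-encode them
    as UTF-8, the byte sequence an Octave char array would be expected to hold."""
    return raw.decode(encoding).encode("utf-8")


hello = "Hello, world!"

# Assumption: big-endian UTF-16 with a BOM (bytes FE FF 00 48 00 65 ...).
utf16_with_bom = codecs.BOM_UTF16_BE + hello.encode("utf-16-be")

# Failure mode: every byte becomes its own char, BOM included.
bytewise = utf16_with_bom.decode("latin-1")  # latin-1 maps each byte 1:1 to a char
print(len(bytewise))  # 28
nul_positions = [i + 1 for i, ch in enumerate(bytewise) if ch == "\0"]  # 1-based, as in Octave
print(nul_positions == list(range(3, 28, 2)))  # True

# Expected behaviour: the "utf-16" codec reads the BOM, uses it to pick the
# byte order, and strips it, leaving only the 13 chars of text.
print(to_utf8(utf16_with_bom, "utf-16") == hello.encode("utf-8"))  # True

# Without a BOM, the byte order has to come from the encoding name itself.
utf16_no_bom = hello.encode("utf-16-be")
print(len(utf16_no_bom))  # 26
print(to_utf8(utf16_no_bom, "utf-16-be") == hello.encode("utf-8"))  # True

# ISO-8859-1: 35 single-byte chars should become 40 UTF-8 bytes, because
# each of ß, Ö, ö, Ü, ü takes two bytes in UTF-8.
latin1 = "Kaßner Ökonom Schöps Übermut Müller".encode("iso-8859-1")
print(len(latin1))  # 35
print(len(to_utf8(latin1, "iso-8859-1")))  # 40

# SHIFT_JIS: 12 bytes (two per char) should become 18 UTF-8 bytes (three per char).
sjis = "ありがとう丸".encode("shift_jis")
print(len(sjis))  # 12
print(len(to_utf8(sjis, "shift_jis")))  # 18
```

Under those assumptions, the failing counts (28, 26, 35, 12) are just the raw file byte counts. The reference counts (13, 40, 18) are UTF-8 byte counts. The ex-001 cases that pass would pass even without any conversion, because the bytes of `Hello, world!` are the same in ISO-8859-1, ISO-8859-15, KOI8-R, SHIFT_JIS and UTF-8.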

This raises another question: how can I read in an entire text file without
calling fgetl() on each line in a loop? `fscanf (fid, "%s")`? Would
`fread (fid, '*char')` be expected to work? (What _are_ the semantics for
reading chars with fread() on a stream with a non-native encoding?)
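One possible answer for whole-file reads is sketched below in Python. This is an assumption about reasonable semantics, not a description of what Octave's fread() does. The idea is to read all the bytes at once and decode them in one step. If the file is read in fixed-size pieces instead, decoder state has to carry across pieces.

`read_whole_text`, `read_in_chunks` and `read_in_chunks_naive` are hypothetical helpers. The naive version shows the hazard: a 3-byte read ends in the middle of a UTF-16 code unit, so decoding each piece on its own fails.

```python
import codecs
import os
import tempfile


def read_whole_text(path: str, encoding: str) -> bytes:
    """Hypothetical helper: read the whole file in one call, decode it once,
    and return the text as UTF-8 bytes."""
    with open(path, "rb") as f:
        return f.read().decode(encoding).encode("utf-8")


def read_in_chunks(path: str, encoding: str, chunk_size: int) -> bytes:
    """Hypothetical helper: read fixed-size byte chunks (like a counted read)
    and decode them incrementally, so a char split across chunks is buffered."""
    decoder = codecs.getincrementaldecoder(encoding)()
    pieces = []
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            pieces.append(decoder.decode(chunk))
    pieces.append(decoder.decode(b"", final=True))  # flush; raises if bytes are left over
    return "".join(pieces).encode("utf-8")


def read_in_chunks_naive(path: str, encoding: str, chunk_size: int) -> bytes:
    """Hypothetical helper: decode each chunk independently, with no state
    carried between chunks (this is the approach that breaks)."""
    pieces = []
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            pieces.append(chunk.decode(encoding))
    return "".join(pieces).encode("utf-8")


text = "ありがとう丸\nHello, world!\n"
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(text.encode("utf-16"))  # Python writes a BOM in native byte order
    path = tmp.name

try:
    whole = read_whole_text(path, "utf-16")
    chunked = read_in_chunks(path, "utf-16", chunk_size=3)  # odd size splits code units
    try:
        read_in_chunks_naive(path, "utf-16", chunk_size=3)
        naive_failed = False
    except UnicodeDecodeError:
        naive_failed = True
finally:
    os.remove(path)

print(whole == text.encode("utf-8"))  # True
print(chunked == whole)  # True
print(naive_failed)  # True
```

The same boundary problem would apply to any counted read on a stream in a multibyte encoding (UTF-16, SHIFT_JIS, UTF-8). That is one reason it would help to pin down what `fread (fid, n, '*char')` returns on such a stream.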

    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?55452>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/
