octave-bug-tracker

[Octave-bug-tracker] [bug #55452] fopen() does not support encoding argu


From: Markus Mützel
Subject: [Octave-bug-tracker] [bug #55452] fopen() does not support encoding argument
Date: Sat, 9 Mar 2019 11:10:59 -0500 (EST)
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0

Follow-up Comment #10, bug #55452 (project octave):

Thanks for your testing.
So far I had only checked with fprintf (fid, "%s", string) and
fscanf (fid, "%s"). I haven't looked at "fgetl" yet. It looks like these
functions take different code paths.

If I replace the function "slurp_file_one_line" in your test suite with the
following, the results look a little bit better:

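function out = slurp_file_one_line (file, encoding)
  ## Read the text content of FILE, decoding it from ENCODING.
  try
    [fh, msg] = fopen (file, "r", "native", encoding);
    if (fh < 0)
      error ("Failed opening file for reading: %s: %s", msg, file);
    endif
    # out = fgetl (fh);
    ## Note: "%s" skips whitespace, so the result contains no blanks.
    out = fscanf (fh, "%s");
    fclose (fh);
    out = out(:)';   # force a row vector
  catch err
    err              # display the error struct in the test log
    out = "";
  end_try_catch
endfunction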



On Windows, there seem to be at least two more different bugs:

>> run_bug_55452_tests
Running fixed-text encoded file test ex-001:
Reference text: Hello,world! (12 chars)
running: ex-001 ISO-8859-1
  decoded: Hello,world! (12 chars)
  ok: ex-001 ISO-8859-1
running: ex-001 ISO-8859-15
  decoded: Hello,world! (12 chars)
  ok: ex-001 ISO-8859-15
running: ex-001 KOI8-R
  decoded: Hello,world! (12 chars)
  ok: ex-001 KOI8-R
running: ex-001 SHIFT_JIS
  decoded: Hello,world! (12 chars)
  ok: ex-001 SHIFT_JIS
running: ex-001 UTF-16
  decoded: Hello,world! (12 chars)
  ok: ex-001 UTF-16
running: ex-001 UTF-16 no-bom
err =

  scalar structure containing the fields:

    message = fopen: conversion from codepage 'utf-16' not supported
    identifier =
    stack =

      3x1 struct array containing the fields:

        file
        name
        line
        column
        scope


  decoded:  (0 chars)
  FAIL: ex-001 UTF-16 no-bom
Running fixed-text encoded file test ex-002:
Reference text: あありりががととうう丸丸 (18 chars)
running: ex-002 SHIFT_JIS
  decoded: あありりががととうう丸丸 (18 chars)
  ok: ex-002 SHIFT_JIS
running: ex-002 UTF-16
  decoded: あありりががととうう丸丸 (18 chars)
  ok: ex-002 UTF-16
Running fixed-text encoded file test ex-003:
Reference text: KaßnerÖkonomSchöpsÜbermutMüller (36 chars)
running: ex-003 ISO-8859-1
  decoded: KaßnerÖkonomSchöpsÜbermutMüller (36 chars)
  ok: ex-003 ISO-8859-1
running: ex-003 UTF-16
  decoded: KaßnerÖkonomSchöpsÜbermutMüller (36 chars)
  ok: ex-003 UTF-16


There doesn't seem to be a convenient function to get the number of characters
in a string straight away (or I forgot about it). "numel" returns the number
of bytes in the char array. Maybe "max (unicode_idx (str))" would be more
correct.
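
A minimal sketch of a byte-based alternative, assuming the char array holds
valid UTF-8: count every byte that is not a continuation byte (bit pattern
10xxxxxx). The function name "utf8_char_count" is made up for illustration and
is not part of Octave. For valid input the result should agree with
"max (unicode_idx (str))".

```octave
1;  # script-file marker so the function below can be defined inline

## Hypothetical helper: number of UTF-8 encoded characters in STR.
## Assumes STR contains valid UTF-8; invalid sequences are not detected.
function n = utf8_char_count (str)
  bytes = double (str(:).');
  ## Continuation bytes have the two top bits set to 10 (0x80 .. 0xBF).
  is_continuation = (bitand (bytes, 192) == 128);
  n = sum (! is_continuation);
endfunction

kassner = char ([75 97 195 159 110 101 114]);  # UTF-8 bytes of "Kaßner"
printf ("%d bytes, %d characters\n", numel (kassner), utf8_char_count (kassner));
```

With this, the test suite could report characters instead of the byte count
that "numel" gives.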

Back on topic:
In the f* family of functions, I think that "fwrite" and "fread" should ignore
the encoding and just handle "pure bytes".
"fputs" and "fprintf" (%s format arguments and the format string itself)
should probably convert to the specified encoding.
"fgetl" and "fscanf" (%s format arguments) should convert from the
specified encoding.
I am not sure how to handle "fgets": Should we just read one byte and return
that? Or should we make sure that we read one character (whatever the number
of bytes necessary)?
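
The split proposed above (byte-level "fwrite"/"fread", converting text
functions) can be emulated with functions that already exist. The following is
only a sketch of the intended behaviour, not of the patch: it does the
conversion by hand with "unicode2native" and "native2unicode" and uses a
temporary file and the ISO-8859-1 encoding as arbitrary examples.

```octave
text_utf8 = char ([75 97 195 159 110 101 114]);  # UTF-8 bytes of "Kaßner"
encoding = "ISO-8859-1";                          # example encoding
fname = tempname ();

## Writing: convert explicitly, then let fwrite store the raw bytes.
bytes_out = unicode2native (text_utf8, encoding);
fid = fopen (fname, "wb");
fwrite (fid, bytes_out, "uint8");
fclose (fid);

## Reading: fread returns the raw bytes; decoding is a separate step.
fid = fopen (fname, "rb");
bytes_in = fread (fid, Inf, "uint8=>uint8").';   # row vector of uint8
fclose (fid);
text_back = native2unicode (bytes_in, encoding);

unlink (fname);
printf ("%d bytes on disk, round trip ok: %d\n", ...
        numel (bytes_in), strcmp (text_back, text_utf8));
```

An encoding-aware "fputs" or "fgetl" would then do the same conversion
internally, while "fwrite" and "fread" would keep behaving as in the sketch.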

Please let me know if I'm missing something.

I have never worked with multi-byte encodings like SHIFT-JIS. How do they
encode ASCII characters? I am wondering whether fprintf correctly treats the
format string on the current default branch.
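
One way to probe this question is to compare the encoded bytes of a pure ASCII
format string with its plain byte values. This is a sketch only: the sample
string and the list of encodings are arbitrary choices, and it assumes that
"unicode2native" supports these encoding names on the platform. It
deliberately avoids the backslash and the tilde, which some Shift_JIS mappings
treat differently from ASCII.

```octave
ascii_text = 'value: %s';                      # sample format string (ASCII only)
encodings = {"ISO-8859-1", "SHIFT_JIS", "UTF-8"};
is_ascii_compatible = false (1, numel (encodings));

for i = 1:numel (encodings)
  bytes = unicode2native (ascii_text, encodings{i});
  ## Compatible here means: same bytes as the plain ASCII representation.
  is_ascii_compatible(i) = isequal (bytes, uint8 (ascii_text));
  printf ("%-10s %2d bytes, same as ASCII: %d\n", ...
          encodings{i}, numel (bytes), is_ascii_compatible(i));
endfor
```

If an encoding reports different bytes for such a string, the format string
itself would have to be converted before it is written.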

    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?55452>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/



