From: Nicholas Jankowski
Subject: [Octave-bug-tracker] [bug #57596] Should the "len" argument of "fgetl" and "fgets" mean bytes or characters?
Date: Tue, 18 Feb 2020 17:18:43 -0500 (EST)
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36

Follow-up Comment #3, bug #57596 (project octave):

TL;DR: the LEN argument in Matlab specifies characters, even for multibyte
characters. Octave should probably emulate that for compatibility reasons.


From a compatibility standpoint: the Matlab help gives the signature
fgets(FID, NCHAR) and specifically uses the word "character" to describe that
input parameter. The help also says that characters are read using the
encoding scheme associated with the file, as set by fopen.

Using a UTF-8 test file [1], the first multibyte line is:


You should see the Greek word 'kosme':       "κόσμε"   


Checking in Matlab 2019a:

>> abc=fopen("UTF-8 test file.html",'r','n',"UTF-8");
>> for idx=1:45,disp(fgets(abc)),end

<trimming output to reach multibyte test chars>

>> disp(fgets(abc,47));
You should see the Greek word 'kosme':       "κ
>> disp(fgets(abc,3));
όσμ
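
To make the byte/character distinction concrete, here is a minimal sketch of
how a character count maps onto a byte offset in UTF-8. It is only an
illustration, not how Matlab or Octave implement fgets. It works on raw bytes
only, so it should behave the same in both programs. The byte values assume
each of the five Greek letters is encoded in two bytes, and the variable names
are made up for the example.

```octave
% Illustrative sketch only: not the fgets implementation of either program.
% Assumed UTF-8 bytes for the five Greek letters, two bytes each.
bytes = uint8([206 186 207 140 207 131 206 188 206 181]);

% UTF-8 continuation bytes have the bit pattern 10xxxxxx; every other byte
% starts a new character.
is_lead = bitand(bytes, uint8(192)) ~= 128;

nbytes = numel(bytes);   % length counted in bytes
nchars = sum(is_lead);   % length counted in characters

% Byte offset at which the first n characters end (n is a hypothetical LEN).
n = 3;
starts = find(is_lead);
if n < nchars
  cut = starts(n + 1) - 1;
else
  cut = nbytes;
end

fprintf('%d bytes, %d characters, first %d characters end at byte %d\n', ...
        nbytes, nchars, n, cut);
```

With LEN = 3 interpreted as characters, the read consumes 6 bytes here. With
LEN = 3 interpreted as bytes, it would stop in the middle of the second
letter.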


Without opening the file as UTF-8, reading in that whole line looks like:


You should see the Greek word 'kosme':       "κόσμε"


[1] https://www.w3.org/2001/06/utf-8-wrong/UTF-8-test.html



    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?57596>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/