From: Philip Nienhuis
Subject: [Octave-bug-tracker] [bug #59203] [octave forge] (io) Problem with xlsread importing accent marks
Date: Thu, 1 Oct 2020 14:47:09 -0400 (EDT)
User-agent: Mozilla/5.0 (Windows NT 6.1; rv:52.0) Gecko/20100101 Firefox/52.0

Update of bug #59203 (project octave):

                  Status:                    None => Confirmed              

    _______________________________________________________

Follow-up Comment #8:

David, I can confirm your issue with the file you uploaded.
ISTR similar issues and yes: this heavily smells of bug #49222.

As to your issue, it only occurs with the COM interface; all other
interfaces that can read .xlsx, i.e. OCT (built-in), POI (Apache POI, Java
based) and UNO (LibreOffice, using the Java-based UNO bridge), do the right thing.

Anyway, xlsread using COM does read the strings OK but doesn't convert them
into UTF-8. That is easily seen in the Variable Editor, see your own attached
pic. A related issue is that on Windows, Octave's command window still cannot
show Unicode characters.

As a provisional workaround you can do the following after invoking xlsread:

texto = cellfun (@(x) char (unicode2utf8 (x)), texto, "uni", 0)

(unicode2utf8.m is included in the io package.)
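
To illustrate what "reads the strings OK but doesn't convert them" amounts to, here is a rough sketch that uses only core Octave functions. It assumes the COM interface hands back one char per character holding the raw code point (e.g. 233 for "é"); that is my reading of the symptom, not something checked against the io sources, and the sample word is made up. It only covers code points below 256 and is not how unicode2utf8 itself is implemented.

```octave
% Hypothetical string as it might arrive via COM: one element per character,
% with "é" stored as the single value 233 (its Unicode code point).
raw = char ([99 97 102 233]);          % c a f é

% Octave keeps strings as UTF-8 bytes, where "é" is the two bytes 195 169.
% For code points below 256, treating the values as ISO-8859-1 bytes and
% converting them to UTF-8 gives the string Octave expects.
fixed = native2unicode (uint8 (raw), "ISO-8859-1");

disp (double (raw))     % 99 97 102 233
disp (double (fixed))   % 99 97 102 195 169
```

The differing lengths (4 versus 5 elements) are the reason the unconverted strings look garbled in the Variable Editor.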

I have to think about how to get this fixed; maybe not in time for the next io
release. Reading back bug #49222 from 4 years back, it looks like character
encoding in Octave is still a bit of a minefield.
cc'ing our encoding guru:
@Markus: what do you think, can / should we also unconditionally apply
unicode2utf8 to strings when reading with the COM interface? IIRC in the end
you weren't enthusiastic about this.

BTW: I hit & fixed a separate bug in xlsread: it should recognize options
structs, but I forgot to implement that when I restructured the .ods/.xls for
io-2.6.0.
Along the way I'll think about how xls2oct.m could invoke unicode2utf8 rather
than unicode2native, which is what it now does near the end of xls2oct.m if
the "convert_utf" option is set to true.


    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?59203>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/



