From: Philip Nienhuis
Subject: [Octave-bug-tracker] [bug #49222] octave-io 2.4.3: xls2oct with "OCT" interface lost the ability to read german umlauts or °
Date: Thu, 29 Sep 2016 18:25:37 +0000 (UTC)
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0 SeaMonkey/2.40

Update of bug #49222 (project octave):

                Priority:              5 - Normal => 3 - Low                
                 Release:                   4.0.3 => other                  
        Operating System:               GNU/Linux => Any                    

    _______________________________________________________

Follow-up Comment #1:

Yes, the change was made deliberately, and in fact already for io-2.4.1 (see
the NEWS).

For the OCT interface spreadsheet I/O scripts there is no way to distinguish
double-byte chars from single-byte chars. I found that the OCT interface
returned gibberish for several spreadsheets and produced illegible output when
reading or (re)writing .xlsx spreadsheets produced or processed by LibreOffice
or Excel. Those programs often changed the encoding to Unicode.
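
To illustrate the kind of gibberish meant here, the following minimal Python
sketch (not code from the io package; the sample string is made up) shows what
happens when UTF-8 bytes are read as if every byte were one character:

```python
# Illustration only: UTF-8 text turns into gibberish when each byte is
# treated as a separate character (simulated here by decoding as Latin-1).
text = "Käse 20 °C"                  # made-up sample with an umlaut and a degree sign

raw = text.encode("utf-8")           # 'ä' and '°' each occupy two bytes in UTF-8
garbled = raw.decode("latin-1")      # one byte -> one character
restored = raw.decode("utf-8")       # multi-byte sequences decoded properly

print(len(text), len(raw))           # character count vs. byte count
print(garbled)                       # the umlaut and the degree sign come out as two characters each
print(restored == text)
```

Pure ASCII text is byte-for-byte identical in both readings, which would explain
why a file without umlauts or special symbols shows no problem either way.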

So I guess in a way you were lucky that you didn't hit problems. But I did.

Maybe there is a trick using the encoding declaration in the first line of the
various XML files in the .xlsx archives. All I ever saw there was "UTF-8".
I'd be happy to include a fix but it must be proven thoroughly that it works
reliably in all cases.
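
For anyone who wants to check which encodings their own files declare, here is
a hedged Python sketch. The function name `declared_encodings` is hypothetical
and this is not part of the io package; it only assumes that an .xlsx file is a
zip archive of XML parts, each optionally starting with an XML declaration:

```python
import re
import zipfile

# Matches the encoding pseudo-attribute of an XML declaration, e.g.
# <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
_DECL = re.compile(rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')


def declared_encodings(xlsx):
    """Map each XML part of an .xlsx archive to its declared encoding.

    `xlsx` is a file name or a binary file-like object. Parts without an
    encoding declaration map to None. Parts stored as UTF-16 are not
    recognized by this byte-level pattern and also map to None.
    """
    result = {}
    with zipfile.ZipFile(xlsx) as archive:
        for name in archive.namelist():
            if not name.endswith((".xml", ".rels")):
                continue                      # skip images and other binary parts
            with archive.open(name) as part:
                head = part.read(200)         # the declaration sits at the very start
            match = _DECL.search(head)
            result[name] = match.group(1).decode("ascii") if match else None
    return result
```

Calling `declared_encodings("somefile.xlsx")` returns a dict keyed by part name.
Note that this only reports what a file claims; it does not prove that the
content actually follows the declared encoding.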

The fact that OCT reads XML files using regexp may be interfering here. regexp
is much faster and has less memory overhead than a "real" XML parser but of
course every advantage comes at a price.
There is an xmlread in the io package but it's Java-based and has Java
overhead. If Java is considered a good option after all then I'd suggest
turning to Apache POI anyway, as the POI interface works fine and reliably, has
much less maintenance overhead, and Apache POI itself is actively maintained.
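
Coming back to the regexp-versus-parser trade-off: the Python sketch below
contrasts the two approaches on a tiny, made-up shared-strings fragment. It is
not the io package's code, and `SAMPLE`, `strings_via_regex` and
`strings_via_parser` are hypothetical names. The point it tries to show is that
a regexp approach needs the bytes decoded to characters before matching,
whereas a real parser honours the encoding declaration by itself:

```python
import re
import xml.etree.ElementTree as ET
from html import unescape

# Made-up sample resembling xl/sharedStrings.xml, stored as UTF-8 bytes.
SAMPLE = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
    '<si><t>Käse</t></si>'
    '<si><t xml:space="preserve">20 °C</t></si>'
    '</sst>'
).encode("utf-8")

NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"
_T = re.compile(r"<t(?:\s[^>]*)?>(.*?)</t>", re.DOTALL)


def strings_via_regex(raw, encoding="utf-8"):
    """Regexp approach: decode the bytes first, then match on characters."""
    text = raw.decode(encoding)
    # unescape() resolves entities such as &amp; that a parser would handle itself
    return [unescape(s) for s in _T.findall(text)]


def strings_via_parser(raw):
    """Parser approach: the XML parser reads the declared encoding itself."""
    root = ET.fromstring(raw)
    return ["".join(t.text or "" for t in si.iter(NS + "t"))
            for si in root.iter(NS + "si")]


print(strings_via_regex(SAMPLE) == strings_via_parser(SAMPLE))
```

On this simple sample both routes give the same list. They can differ on
rich-text entries, where one string item holds several `<t>` runs: the regexp
version returns each run separately while the parser version joins them.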

As there are workarounds (Apache POI, UNO) I'll lower the priority.
I'll leave this report open, but if no fix comes up in, say, 2 months I think
I'll close it with "won't fix".

BTW I see that you (Andreas) submitted the report but your name isn't listed
in "Originator name"?

    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?49222>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/



