octave-bug-tracker
[Octave-bug-tracker] [bug #59277] xls2oct and/or openxls behave unexpected


From: Dennis
Subject: [Octave-bug-tracker] [bug #59277] xls2oct and/or openxls behave unexpected
Date: Fri, 16 Oct 2020 10:53:07 -0400 (EDT)
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36

Follow-up Comment #6, bug #59277 (project octave):

As I have been really troubled by speed issues, I dug further into the
problem. I now understand why additional sheets can have an impact on
performance. This is due to the way xlsx files are constructed: all plain
strings of all worksheets are stored in one XML file. When this file gets big,
many cell2mat calls are made.

I have tested a very simple solution. In '__OCT_xlsx2oct__' I have replaced a
single line of code, namely line 146, by the new line 147 (see attached). The
original line uses two cell2mat calls and is part of a for loop. However, as
far as I can tell, the list of strings that is being processed here ALWAYS
contains single cells. That is why we could simply use {1}{1} instead of
cell2mat(cell2mat()). This is MUCH faster.
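
To illustrate the idea outside of the io package, here is a minimal sketch. It
assumes that each entry of the string list is a 1x1 cell wrapping a 1x1 cell
that holds a char array; the names 'strs', 'out_old' and 'out_new' are made up
for this example and are not taken from '__OCT_xlsx2oct__'. It only compares
the two ways of unwrapping such an entry and is not the actual patched line.

```octave
% Hypothetical stand-in for the shared-string list (an assumption):
% every entry is a 1x1 cell holding a 1x1 cell holding a char array.
n = 2000;
strs = cell (1, n);
for i = 1:n
  strs{i} = {{sprintf("string %d", i)}};
end

% Original style: two cell2mat calls per entry inside the loop.
tic;
out_old = cell (1, n);
for i = 1:n
  out_old{i} = cell2mat (cell2mat (strs{i}));
end
t_old = toc;

% Proposed style: direct brace indexing, no function calls per entry.
tic;
out_new = cell (1, n);
for i = 1:n
  out_new{i} = strs{i}{1}{1};
end
t_new = toc;

% Both variants should give identical strings; only the timing differs.
same = isequal (out_old, out_new);
printf ("cell2mat: %.3f s, brace indexing: %.3f s, identical: %d\n", ...
        t_old, t_new, same);
```

The two loops are only interchangeable as long as the single-nested-cell
assumption holds for every entry.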

Using the attached test script and Excel file, with line 146 active and line
147 commented out (i.e. the original code), the result is:

Elapsed time is 0.393326 seconds.
   #         Function Attr     Time (s)   Time (%)        Calls
---------------------------------------------------------------
  10         cell2mat             0.156      41.26          537
   9           regexp             0.088      23.23          285
  24 __OCT_xlsx2oct__             0.050      13.20            3
  14          cellfun             0.017       4.38         2439
   2          xls2oct             0.015       3.92            3
  36       str2double             0.009       2.35           18
  50   parse_sp_range             0.008       2.15            3
  29            fread             0.005       1.34            6
  33              cat             0.004       1.15          285
   4         prefix !             0.002       0.62         2487
  27            fopen             0.002       0.61            6
  19             sort             0.002       0.60          255
  44          col2num             0.002       0.55          742
  15              all             0.002       0.50         1384
   7           ischar             0.002       0.44         1563
  57           strrep             0.001       0.39           18
  16             size             0.001       0.38          546
  40          reshape             0.001       0.29          282
  11           nargin             0.001       0.26          559
  63           strtok             0.001       0.22            3


Commenting out line 146 and uncommenting line 147 in '__OCT_xlsx2oct__' yields:

Elapsed time is 0.174529 seconds.
   #         Function Attr     Time (s)   Time (%)        Calls
---------------------------------------------------------------
   9           regexp             0.079      47.75          285
  24 __OCT_xlsx2oct__             0.028      17.02            3
   2          xls2oct             0.013       7.55            3
  36       str2double             0.008       5.07           18
  50   parse_sp_range             0.006       3.88            3
  10         cell2mat             0.006       3.77           21
  14          cellfun             0.005       3.12          117
  29            fread             0.004       2.23            6
  33              cat             0.003       2.02           21
  27            fopen             0.002       1.24            6
  44          col2num             0.002       1.05          742
   7           ischar             0.001       0.86         1563
  57           strrep             0.001       0.77           18
  63           strtok             0.001       0.51            3
  41            clear             0.000       0.28            3
  51          deblank             0.000       0.26            3
  58            index             0.000       0.24            3
  31           fclose             0.000       0.24            6
   8       chknmrange             0.000       0.23            3
  35          strncmp             0.000       0.18           12


In my real-life script, which uses a bigger Excel sheet, the differences are
much more pronounced. As an example, using the original '__OCT_xlsx2oct__'
code yields:

Elapsed time is 10.1298 seconds.
   #               Function Attr     Time (s)   Time (%)        Calls
---------------------------------------------------------------------
  81               cell2mat             5.105      51.74        20903
 101       __OCT_xlsx2oct__             1.249      12.66           15
  80                 regexp             1.148      11.64        10657
  29                cellfun             0.434       4.39        94317
  73                 system             0.402       4.07            1


While the newly proposed code yields:

Elapsed time is 3.81288 seconds.
   #               Function Attr     Time (s)   Time (%)        Calls
---------------------------------------------------------------------
  80                 regexp             1.252      33.20        10657
 101       __OCT_xlsx2oct__             0.775      20.55           15
  73                 system             0.404      10.71            1
 143 @Recipe/calc_magistral             0.169       4.48            1
  28                   load             0.145       3.84           16


That really makes a difference: the duration went from about 10 s to under
4 s.

@philipnienhuis, can you please check that this single updated line can be
implemented (i.e. the cell with strings indeed always contains a single nested
cell)? If so, could you please release a new version of io as soon as
possible?
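
One way the assumption could be checked on real data is sketched below. The
variable 'entries' is a hypothetical stand-in for the string list (the real
variable name in '__OCT_xlsx2oct__' is not shown in this message), and the
second entry is deliberately malformed to show what a violation would look
like.

```octave
% Hypothetical string list; the second entry deliberately breaks the
% "single nested cell" assumption so that the check has something to find.
entries = {{{"alpha"}}, {{"be", "ta"}}, {{"gamma"}}};

% An entry qualifies for {1}{1} only if it is a 1x1 cell that wraps
% another 1x1 cell.
is_single = cellfun (@(e) iscell (e) && isscalar (e) ...
                          && iscell (e{1}) && isscalar (e{1}), entries);

bad = find (! is_single);
if (isempty (bad))
  disp ("all entries are single nested cells; {1}{1} is safe here");
else
  printf ("%d of %d entries are not single nested cells (first at index %d)\n", ...
          numel (bad), numel (entries), bad(1));
end
```

Running such a check over a few real workbooks would show whether the fast
path can be used unconditionally or needs a fallback to the original
cell2mat(cell2mat()) call.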

NB: this solves the issue with additional sheets having an effect on speed,
but it doesn't solve the issue that xls2oct sometimes uses UNO even though OCT
is specified.



(file #49996, file #49997, file #49998)
    _______________________________________________________

Additional Item Attachment:

File name: __OCT_xlsx2oct__.m             Size:9 KB
    <https://file.savannah.gnu.org/file/__OCT_xlsx2oct__.m?file_id=49996>

File name: BLAAAAA.m                      Size:0 KB
    <https://file.savannah.gnu.org/file/BLAAAAA.m?file_id=49997>

File name: test.xlsx                      Size:65 KB
    <https://file.savannah.gnu.org/file/test.xlsx?file_id=49998>



    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?59277>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/