From: Dennis
Subject: [Octave-bug-tracker] [bug #59277] [octave-forge](io) xls2oct is slow when a spreadsheet contains many text cells
Date: Sun, 25 Oct 2020 07:53:58 -0400 (EDT)
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36

Follow-up Comment #23, bug #59277 (project octave):

Philip, regarding comment #20:
> Would you please look
> 1. if it still works
> 2. if you still have the expected speed gain
> ?

I tested your file and it still works. It also still shows the performance
improvement that I had before.

In addition, I created another test Excel sheet and small script (attached).
With the __OCT_xlsx2oct__.m from v2.6.2 the output is:

Elapsed time is 161.678 seconds.
   #         Function Attr     Time (s)   Time (%)        Calls
---------------------------------------------------------------
  32         cell2mat            83.546      52.37       200060
  23           regexp            34.835      21.84       100060
  13 __OCT_xlsx2oct__            18.686      11.71           10
  24          cellfun             7.233       4.53       910340
  44       str2double             2.667       1.67           30


By far most of the time is spent on cell2mat.

With the update you posted in comment #20, the output is:

Elapsed time is 53.296 seconds.
   #         Function Attr     Time (s)   Time (%)        Calls
---------------------------------------------------------------
  23           regexp            30.263      57.15           70
  33         cell2mat             8.937      16.88        20080
  13 __OCT_xlsx2oct__             3.050       5.76           10
  47       str2double             2.482       4.69           30
  26              cat             2.148       4.06        20090


That is a major improvement. I think we can consider this sufficiently fixed,
as most of the time is now indeed spent on regexp.
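
For reference, profile tables like the two above can be produced with Octave's
built-in profiler. The following is only a minimal sketch, not the attached
Test2.m: the workload is a stand-in (synthetic strings run through regexp) so
that it runs without the io package or the attached spreadsheet, and the names
strs, workload and n_runs are assumptions made for illustration. Replacing the
workload handle with a call that reads the spreadsheet should give a
comparable table.

```octave
% Stand-in workload (assumption): 20000 short XML-like strings parsed with
% regexp, used here instead of reading the attached spreadsheet.
strs = arrayfun (@(k) sprintf ('<v>%d</v>', k), 1:20000, ...
                 "UniformOutput", false);
workload = @() regexp (strs, '<v>(\d+)</v>', 'tokens', 'once');

n_runs = 3;            % number of repetitions (hypothetical value)

profile clear;         % discard data from earlier profiling sessions
profile on;
tic;
for k = 1:n_runs
  tok = workload ();
end
elapsed = toc;         % wall-clock time of the whole loop
profile off;

printf ("Elapsed time is %g seconds.\n", elapsed);
info = profile ("info");
profshow (info, 5);    % flat profile, five most expensive functions
```

The profshow output lists function name, time in seconds, time in percent and
number of calls, which is the layout of the tables quoted above.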

I had another look at which regexp is responsible, and it is NOT L.99. I changed
the test script to loop only once and measured the time spent on all regexps I
could find. Here are the results:

L.99:  Elapsed time is 0.353959 seconds.
L.142: Elapsed time is 0.201676 seconds.
L.145: Elapsed time is 0.665056 seconds.
L.175: N/A
L.197: Elapsed time is 0.694173 seconds.
L.200: Elapsed time is 0.726311 seconds.
L.204: Elapsed time is 0.601505 seconds.
L.205: Elapsed time is 0.548254 seconds.


Clearly L.99 is not the biggest time consumer. Next, I profiled L.197-200 and
L.204-205. In both cases, the profiler shows that almost one second is spent
on regexp. So in these lines, regexp is indeed the function responsible for
the time consumption, not any other function.
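
Per-statement timings like the ones listed above can be collected by wrapping
each call in tic/toc. This is a rough sketch under stated assumptions: the
input string and the two patterns are hypothetical stand-ins, not the actual
expressions from __OCT_xlsx2oct__.m, and the labels are made up for
illustration.

```octave
% Hypothetical input: one long XML-like string with 20000 string cells and
% 20000 numeric cells (not the real worksheet XML).
xml = repmat ('<c r="A1" t="s"><v>12</v></c><c r="B1"><v>3.5</v></c>', ...
              1, 20000);

% Hypothetical patterns, standing in for the regexps at the various lines.
patterns = {'<c r="\w+" t="s"><v>(\d+)</v>', '<c r="\w+"><v>([^<]+)</v>'};
labels   = {"string cells", "numeric cells"};

t = zeros (1, numel (patterns));
for k = 1:numel (patterns)
  id = tic;                                  % start a separate timer
  tok = regexp (xml, patterns{k}, "tokens");
  t(k) = toc (id);                           % elapsed seconds for this call
  printf ("%s: Elapsed time is %g seconds.\n", labels{k}, t(k));
end

[~, worst] = max (t);
printf ("Slowest call: %s\n", labels{worst});
```

Using the timer id returned by tic keeps each measurement independent, so an
enclosing tic/toc pair around the whole script is not disturbed.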

In conclusion, if you want to optimize the speed any further, these seem to be
the lines of code where most effort should go.

Regarding your question about credits, that is fine, thanks :-).

Cheers,
Dennis

(file #50093, file #50094)
    _______________________________________________________

Additional Item Attachment:

File name: Excel2019 stringTest1.xlsx     Size:82 KB
    <https://file.savannah.gnu.org/file/Excel2019 stringTest1.xlsx?file_id=50093>

File name: Test2.m                        Size:0 KB
    <https://file.savannah.gnu.org/file/Test2.m?file_id=50094>



    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?59277>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/



