
[Octave-bug-tracker] [bug #51871] loading '-ascii' format files is slow


From: count
Subject: [Octave-bug-tracker] [bug #51871] loading '-ascii' format files is slow
Date: Wed, 30 Aug 2017 20:02:28 -0400 (EDT)
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0

Follow-up Comment #14, bug #51871 (project octave):


== Response to the NA test (#6 #7 #8 #10) ==

Converting a string to a double *correctly* is not an easy task; see
<http://www.exploringbinary.com/how-strtod-works-and-sometimes-doesnt/> for a
good explanation. You may also glance at the glibc strtod implementation
<https://sourceware.org/git/?p=glibc.git;a=blob;f=stdlib/strtod_l.c;h=9fc9e4c0130f0ae97f29a9b343f18a2599e8ffcf;hb=HEAD>.
Compared with that, the extra work of checking for the special values should
not noticeably affect the speed. What surprised me is the high overhead of
stream-based I/O operations (I optimized my own program and then found that
the bottleneck was now on the Octave side, which led to this bug report).

I made a mistake in octave_read_fp_value(): it compared a std::string to a
char*. I have fixed it in a newer patch based on the patch in #10.

The behaviour of octave_read_fp_value() has also been tuned to match the old
one as closely as possible, especially the failbit and the stream pointer.
Exceptions:


Input           old             this patch
1e1000          1.79769e+308    +inf
-1e1000         1.79769e+308    -inf
0X1.23P+45      0               3.99947e+13


But the sscanf tests will still fail; see the end.
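
The "this patch" column in the table above matches what plain std::strtod
returns for those inputs. A minimal sketch to check that, assuming an IEEE
754 double and a C99-conforming strtod (the helper name parse_with_strtod is
made up for illustration and is not part of the patch):

```cpp
#include <cerrno>
#include <cmath>
#include <cstdlib>

// Hypothetical helper (not from the patch): convert the leading number in
// `text` with std::strtod and report whether the result overflowed.
static double parse_with_strtod (const char *text, bool& overflow)
{
  errno = 0;
  char *end = nullptr;
  double value = std::strtod (text, &end);

  // On overflow strtod returns +/-HUGE_VAL (infinity for IEEE doubles)
  // and sets errno to ERANGE.
  overflow = (errno == ERANGE && std::isinf (value));
  return value;
}
```

Calling it on "1e1000" and "-1e1000" gives +inf and -inf with overflow set,
and "0X1.23P+45" is read as a hexadecimal float, i.e. 0x123 * 2^37, which
prints as 3.99947e+13.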

== Response to #12 ==

The line returned by std::getline() is scanned twice: once for comments and
once to convert the numbers. retval.back() and retval.pop_back() are fast, so
I do not think this is too slow.

Ideally, scanning once is enough: get one char and, if it is not part of a
comment, push_back it to a std::string, then skip the comment and '\n' or
'\r' until a new line is reached. This is how the old code works; the big
drawback is that "is.get()" is incredibly slow compared to reading from an
in-memory buffer (e.g. buf[ptr++]).
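
A minimal sketch of that single-pass scan over an in-memory buffer, under
some assumptions: the whole input is already in a std::string, '%' and '#'
start a comment, and the function name split_data_lines is made up for
illustration (this is not the code in the patch):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not Octave code): split a buffer into data lines in
// one pass, dropping everything from '%' or '#' up to the end of the line.
// Lines with no characters left after that are skipped.
static std::vector<std::string> split_data_lines (const std::string& buf)
{
  std::vector<std::string> lines;
  std::string current;
  bool in_comment = false;

  for (std::size_t ptr = 0; ptr < buf.size (); ptr++)
    {
      char c = buf[ptr];   // indexed read from the buffer, no is.get ()

      if (c == '\n' || c == '\r')
        {
          // End of line: keep what was collected and reset the state.
          if (! current.empty ())
            lines.push_back (current);
          current.clear ();
          in_comment = false;
        }
      else if (in_comment)
        continue;              // still inside a comment, drop the char
      else if (c == '%' || c == '#')
        in_comment = true;     // comment starts here
      else
        current.push_back (c);
    }

  // The last line may not be terminated by a newline.
  if (! current.empty ())
    lines.push_back (current);

  return lines;
}
```

For the input "1 2 % c\r\n# only\n3 4" this yields the two lines "1 2 " and
"3 4". Each character is looked at once, and the only per-character cost is
an index into the buffer.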

Combining get_lines_and_columns () and get_mat_data_input_line () is a
possible optimization, since that reduces the number of "malloc" calls for
std::string.

----
Let's review what can be sped up (measured by tic; b = load('dat.txt'); toc;
the baseline is 3.31289 sec):

With the patch in #10, and without get_lines_and_columns() (nc and nr entered
by hand): 0.981622 sec.

Additionally, without removing any comments in get_mat_data_input_line():
0.735889 sec.

Additionally, folding get_mat_data_input_line() into its caller, which
eliminates the intermediate string stream: 0.641812 sec.
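
To illustrate what avoiding the string stream can look like, here is a
minimal sketch that converts all numbers on one line by walking a char
pointer with std::strtod. The function name parse_line and the accepted
separators (space, tab, comma, semicolon) are assumptions for this example,
not taken from the patch:

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: parse every number on one line with std::strtod,
// advancing a char pointer instead of building a std::istringstream.
// Returns false as soon as something that is not a number is met.
static bool parse_line (const std::string& line, std::vector<double>& out)
{
  const char *p = line.c_str ();

  for (;;)
    {
      // Skip the separators between values.
      while (*p == ' ' || *p == '\t' || *p == ',' || *p == ';')
        p++;

      if (*p == '\0')
        return true;           // reached the end of the line

      char *end = nullptr;
      double value = std::strtod (p, &end);

      if (end == p)
        return false;          // no conversion was possible here

      out.push_back (value);
      p = end;                 // continue right after the parsed number
    }
}
```

Note that std::strtod depends on the current C locale and does not know
Octave's NA value, so real code would need extra handling for both.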

The load-text mode is still faster for the same data set: 0.429227 sec. Note
that load-text also benefits from the new strtod implementation; the speed of
the old one is 0.582489 sec.

Looking into load-text, the essential step is:


// libinterp/corefcn/load-save.cc (do_load) ->
// libinterp/corefcn/ls-oct-text.cc (read_text_data) ->
// libinterp/octave-value/ov-re-mat.cc (octave_matrix::load_ascii)
Matrix tmp (nr, nc);
is >> tmp;


But if I use it in get_mat_data_input_line(), it is somehow slower:
0.828483 sec.

----

The uploaded patch preserves get_lines_and_columns(); the run time is about
1.04 seconds. It also contains a duplicated function for string-to-double
conversion.

The failed tests are:


[val, count, msg, pos] = sscanf ("3I2", "%f");
ASSERT errors for:  assert (pos,2)

  Location  |  Observed  |  Expected  |  Reason
     ()           4            2         Abs err 2 exceeds tol 0

[val, count, msg, pos] = sscanf ("3In2", "%f");
!!!!! test failed
ASSERT errors for:  assert (pos,2)

  Location  |  Observed  |  Expected  |  Reason
     ()           5            2         Abs err 3 exceeds tol 0
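
Both tests expect the scan to stop right after the "3", because "I2" and
"In2" are not complete special values. One way to get that behaviour is to
advance the position only when a special name matches completely. The sketch
below is not the code in the patch; the name match_special and the
case-insensitive matching of Inf, NaN and NA are assumptions for
illustration:

```cpp
#include <cctype>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not the patch's code): try to match one of the
// special names at text[pos]. pos is advanced only on a complete match,
// so a partial match such as "I2" or "In2" consumes nothing.
static bool match_special (const char *text, std::size_t& pos)
{
  // "NaN" is listed before "NA" so that the longer name is tried first.
  static const char *names[] = { "Inf", "NaN", "NA" };

  for (const char *name : names)
    {
      std::size_t len = std::strlen (name);
      std::size_t i = 0;

      // The comparison stops at the terminating '\0' of text, so this
      // never reads past the end of the input.
      while (i < len
             && std::tolower (static_cast<unsigned char> (text[pos + i]))
                == std::tolower (static_cast<unsigned char> (name[i])))
        i++;

      if (i == len)
        {
          pos += len;
          return true;
        }
    }

  return false;
}
```

With text "3I2" and a zero-based pos of 1 (just after the "3"), the call
returns false and leaves pos at 1, which corresponds to the one-based
position 2 that the tests expect.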



(file #41703)
    _______________________________________________________

Additional Item Attachment:

File name: speed-up-load-ascii-v3.patch   Size:10 KB


    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?51871>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/



