[Octave-bug-tracker] [bug #51871] loading '-ascii' format files is slow


From: Dan Sebald
Subject: [Octave-bug-tracker] [bug #51871] loading '-ascii' format files is slow
Date: Sun, 26 Nov 2017 13:39:12 -0500 (EST)

Follow-up Comment #43, bug #51871 (project octave):

OK, another rev attached (speed-up-load-ascii-v8.patch).  I did pursue
altering a stream buffer filter, which I think is very elegant, but I
abandoned that approach and stuck with the stand-alone custom-getline
function.  The custom-getline has an advantage, which I will illustrate
below, and I think it is easier to follow for any programmer who might
otherwise not realize that there is some type of filter in the stream.

The key to speeding up the custom-getline (and the filter-stream-buffer
underflow routine, which is pretty much the same code concept) is that
working with the istream's stream buffer directly is faster than working with
the istream's higher-level methods.  This is explained in the comments of the
example given here:

https://stackoverflow.com/a/6089413
https://stackoverflow.com/questions/6089231/getting-std-ifstream-to-handle-lf-cr-and-crlf
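
To make the idea concrete, here is a minimal sketch of a getline that reads
from the stream buffer directly and accepts LF, CR, or CRLF line endings.  It
follows the general pattern of the linked example, not the code in the
attached patch; the names getline_any_eol and split_lines are made up for
illustration.

```cpp
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>
#include <vector>

// Hypothetical helper (not the function from the patch): read one line
// from IS into STR, treating "\n", "\r", or "\r\n" as the terminator.
std::istream& getline_any_eol (std::istream& is, std::string& str)
{
  str.clear ();

  // The sentry does the istream bookkeeping once per line rather than
  // once per character (second argument true: keep leading whitespace).
  std::istream::sentry guard (is, true);
  if (! guard)
    return is;

  std::streambuf *sb = is.rdbuf ();

  for (;;)
    {
      int c = sb->sbumpc ();
      switch (c)
        {
        case '\n':
          return is;

        case '\r':
          // Swallow the LF of a CRLF pair.
          if (sb->sgetc () == '\n')
            sb->sbumpc ();
          return is;

        case std::streambuf::traits_type::eof ():
          // A last line without a terminator still counts as a line.
          if (str.empty ())
            is.setstate (std::ios::eofbit | std::ios::failbit);
          else
            is.setstate (std::ios::eofbit);
          return is;

        default:
          str += static_cast<char> (c);
        }
    }
}

// Hypothetical convenience wrapper: split TEXT into lines.
std::vector<std::string> split_lines (const std::string& text)
{
  std::istringstream is (text);
  std::vector<std::string> lines;
  std::string line;
  while (getline_any_eol (is, line))
    lines.push_back (line);
  return lines;
}
```

Calling split_lines on a string that mixes all three line endings should
yield the same lines regardless of which terminator each one used.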

In the attached patch I have preprocessor "#if 0" around three different
variations of the custom-getline.  The first applies a trick based on ASCII
table ordering, i.e., most characters of interest are greater than '%', and
'%' > '#' > '\r' > '\n' > EOF.  The other two are patterned on a switch
statement similar to the example in the link above.  Take your pick, and
maybe the variations will give someone else an idea for efficiency
improvements.  [Why count's custom getline based on a stream buffer directly
turned out slower, I don't know.  Post the code if you like.]
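
As a rough illustration of the ASCII-ordering trick (a sketch only, not the
variation in the patch), a single comparison against '%' can dispatch the
common case, since digits, signs, '.', and letters all sort above it.  Only
the few characters at or below '%' need further tests.  The names char_kind
and classify are hypothetical.

```cpp
#include <string>

// The ordering the trick relies on; true for ASCII.
static_assert ('%' > '#' && '#' > '\r' && '\r' > '\n',
               "ASCII ordering assumed");
static_assert ('\n' > std::char_traits<char>::eof (),
               "EOF assumed to sort below all characters");

enum class char_kind { ordinary, comment, eol };

// Hypothetical classifier for a value returned by streambuf::sbumpc ().
inline char_kind classify (int c)
{
  // Fast path: one comparison covers digits, '+', '-', '.', letters.
  if (c > '%')
    return char_kind::ordinary;

  switch (c)
    {
    case '%':
    case '#':
      return char_kind::comment;

    case '\r':
    case '\n':
      return char_kind::eol;

    default:
      // EOF ends the line too; space, tab, etc. are ordinary data.
      return (c == std::char_traits<char>::eof ()
              ? char_kind::eol : char_kind::ordinary);
    }
}
```

Note that space and tab sort below '%', so they take the slower branch in
this sketch.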

So, the advantage of custom-getline?  As you might guess, seeing as I listed
some comment characters above, it is that we can move the handling of comments
into the custom-getline and eliminate the inefficient second pass over the
whole string, i.e.,


      // Remove any comment.
      size_t pos_comment = retval.find_first_of ("#%");
      if (pos_comment != std::string::npos)
        retval.erase (pos_comment);


Hence, I've called the custom-getline


  getline_alleol_sanscomment (std::istream& is, std::string& str)


Rik, you'll have to let me know if # or % can be valid characters within some
string field of the ASCII data.  If so, then the custom comment removal will
need to be made more complex, but I don't think it would hurt performance
any.
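
For illustration, the single-pass idea could look something like the sketch
below.  It is not the getline_alleol_sanscomment from the patch; the names
getline_drop_comment and read_data_lines are hypothetical, and it assumes
that '#' and '%' never appear inside a data field.

```cpp
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>
#include <vector>

// Hypothetical sketch: read one line (LF, CR, or CRLF terminated) and
// drop everything from the first '#' or '%' onward, in a single pass.
std::istream& getline_drop_comment (std::istream& is, std::string& str)
{
  str.clear ();

  std::istream::sentry guard (is, true);
  if (! guard)
    return is;

  std::streambuf *sb = is.rdbuf ();
  bool in_comment = false;   // set once a comment character is seen
  bool consumed = false;     // true once any character has been read

  for (;;)
    {
      int c = sb->sbumpc ();
      switch (c)
        {
        case '\n':
          return is;

        case '\r':
          if (sb->sgetc () == '\n')
            sb->sbumpc ();
          return is;

        case std::streambuf::traits_type::eof ():
          // Fail only if nothing at all was read for this line.
          if (consumed)
            is.setstate (std::ios::eofbit);
          else
            is.setstate (std::ios::eofbit | std::ios::failbit);
          return is;

        case '#':
        case '%':
          in_comment = true;
          break;

        default:
          if (! in_comment)
            str += static_cast<char> (c);
        }
      consumed = true;
    }
}

// Hypothetical convenience wrapper: comment-stripped lines of TEXT.
std::vector<std::string> read_data_lines (const std::string& text)
{
  std::istringstream is (text);
  std::vector<std::string> lines;
  std::string line;
  while (getline_drop_comment (is, line))
    lines.push_back (line);
  return lines;
}
```

The comment text is still read character by character up to the end of the
line, but it is never stored, so no second search through the string is
needed.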

With that, the new version now outperforms the variation that uses the
standard library getline():

current octave: 3.7360, 3.7440
octave + speed-up-load-ascii-v5.patch: 1.3640
octave + speed-up-load-ascii-v6.patch: 2.1980
octave + speed-up-load-ascii-v7.patch: 3.1040 
octave + speed-up-load-ascii-v8.patch: 1.2240

I think there isn't much more inefficiency that can be squeezed out of this
one without going to some lower-level methods.  This ASCII load is just one
of about four methods currently using an istream object as input, so I would
tread lightly with trying to use C-level I/O and alter the input object of all
those other methods.

Oh, I should add that both of the test files pass for version
speed-up-load-ascii-v8.patch:


octave:13> x = load('EOL_tst.txt')
x =

   1   2   3
   4   5   6
   7   8   9
   1   2   3
   4   5   6
   7   8   9

octave:14> x = load('ERANGE_tst.txt')
x =

   1.0000e+128  -1.0000e+128
   1.0000e+256  -1.0000e+256
           Inf          -Inf
           Inf          -Inf


(file #42492)
    _______________________________________________________

Additional Item Attachment:

File name: speed-up-load-ascii-v8.patch   Size:13 KB


    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?51871>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/



