[Octave-bug-tracker] [bug #51871] loading '-ascii' format files is slow


From: Dan Sebald
Subject: [Octave-bug-tracker] [bug #51871] loading '-ascii' format files is slow
Date: Wed, 8 Nov 2017 16:34:54 -0500 (EST)
User-agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:55.0) Gecko/20100101 Firefox/55.0

Follow-up Comment #34, bug #51871 (project octave):

A lot of this is no longer fresh in my mind.  But if you can get the patch
back on track I can look more closely.  I believe you are referring to
speed-up-load-ascii-v4.patch.  Also, looking at your EOL_tst.txt in hexedit, I
see that

0A
0D
0D,0A

are the valid EOLs.  So, looking at the current state of the patch, my
thinking is that rather than

+      std::getline (is, retval);

all that needs to be done is to devise a custom version of that function that
handles all three EOLs; call it getline_alleol(is,retval).

According to the documentation

http://www.cplusplus.com/reference/string/string/getline/

"Each extracted character is appended to the string as if its member
push_back was called."

Is this push_back() routine efficient?  That is, does the string object grow
its buffer in something other than a linear way, say by doubling its size
each time it needs more room?  If so, it seems to me that one could write a
string-based routine utilizing push_back() in a fairly straightforward fashion:

istream& getline_alleol (istream& is, string& str);

That is, extract one character at a time from the istream is, test it against
the three EOL cases above, and if it matches none of them, append it with
str.push_back(c).  Something like the following (there are other variants of
the istream::get() routine):


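#include <cstdio>    // EOF
#include <istream>
#include <string>

std::istream& getline_alleol (std::istream& is, std::string& str) {
    // Like std::getline, start from an empty string.
    str.clear ();
    int c;
    while ((c = is.get ()) != EOF) {
        if (c == 0x0A)
            break;
        else if (c == 0x0D) {
            // Maybe this next check is extraneous if the
            // consequence of an empty line, i.e.,
            // 0x0D<emptyline>0x0A is simply that it is
            // ignored at higher levels, in which case the
            // 0x0A will be processed in the next pass.
            // Otherwise, there is slight ambiguity of
            // accepting any of the three EOL formats.
            if ((c = is.peek ()) != EOF) {
                 if (c == 0x0A)
                     is.ignore ();
            }
            break;
        } else
            str.push_back (static_cast<char> (c));
    }

    // get() sets failbit when it hits EOF.  Like std::getline, report
    // failure only if nothing at all was extracted, so that a last line
    // without a trailing EOL is still returned to the caller.
    if (c == EOF && ! str.empty ())
        is.clear (is.rdstate () & ~std::ios::failbit);

    return is;
}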


    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?51871>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/



