[Octave-bug-tracker] [bug #50619] textscan weird behaviour when reading


From: Dan Sebald
Subject: [Octave-bug-tracker] [bug #50619] textscan weird behaviour when reading a csv
Date: Sat, 25 Mar 2017 06:06:22 -0400 (EDT)
User-agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0

Follow-up Comment #8, bug #50619 (project octave):

I've tracked this down a bit, so I'm just writing some notes here for
reference:

I printed out the "is.tellg()" for:


  void
  textscan::scan_string (delimited_stream& is, const textscan_format_elt& fmt,
                         std::string& val) const
  {
    if (delim_list.is_empty ())
      {
        unsigned int i = 0;
        unsigned int width = fmt.width;

fprintf(stderr, "width=%d\n", width);
        for (i = 0; i < width; i++)
          {
fprintf(stderr,"+%d",i);
            if (i+1 > val.length ())
              val = val + val + ' ';      // grow even if empty
            int ch = is.get ();
            if (is_delim (ch) || ch == std::istream::traits_type::eof ())
              {
fprintf(stderr, "address = %u\n", is.tellg());
                is.putback (ch);
                break;
              }
            else
              val[i] = ch;
          }
        val = val.substr (0, i);          // trim pre-allocation
      }
    else  // Cell array of multi-character delimiters


Here's the result for the test case:


+0+1+2+3+4+5+6+7+8address = 7867337
+0+1+2+3+4+5+6+7+8+9address = 7867347
+0+1+2+3+4+5+6+7+8+9address = 7867357
+0+1+2+3+4+5address = 7867363
+0+1+2+3+4+5address = 7867369
+0+1+2+3+4+5+6+7+8+9+10+11address = 7867381
+0+1+2+3+4+5+6+7+8+9+10+11+12+13address = 7867343


What this tells me is that the pointer advances as expected with each
is.get().  That is, the count of +0, +1, etc. is the number of characters
added to the pointer's previous value to get (hopefully) the next pointer
address.  That holds for every field except the last one, the fourteen
character "heading [deg]".  In that case the pointer makes an odd jump
backward (!) to 7867343, whereas we'd expect 7867381 + 14 = 7867395.
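
To make that arithmetic easy to re-check, here is a small sketch that walks a
trace and reports the first entry whose address is not the previous address
plus the number of is.get() calls.  The names TraceEntry and
first_inconsistent are made up for this note; they are not part of Octave.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical record for one line of the debug trace: how many "+i"
// markers were printed (i.e. how many is.get() calls were made) and the
// address printed afterwards.
struct TraceEntry
{
  long reads;
  long address;
};

// Return the index of the first entry whose address differs from
// (previous address + reads), or -1 if the whole trace is consistent.
int
first_inconsistent (const std::vector<TraceEntry>& trace)
{
  for (std::size_t k = 1; k < trace.size (); k++)
    if (trace[k].address != trace[k-1].address + trace[k].reads)
      return static_cast<int> (k);

  return -1;
}
```

Fed the seven lines above (9, 10, 10, 6, 6, 12 and 14 reads), only the last
entry fails the check.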

This stream:


    delimited_stream is (isp,
                         (delim_table.empty () ? whitespace + "\r\n" : delims),
                         max_lookahead, buf_size);


isn't behaving nicely.  The max_lookahead is 3, and the buf_size is 80.  (I
recall somewhere else there being a buffer size of 4096...but don't take that
as being of some significance, as I don't quite understand the implication of
buf_size.)

I can see what is wrong.  See the delims passed into this delimited stream?
Later, when the character from ch = is.get() is tested with is_delim(ch),
it's those delims (a C++ std::string) that are looked for.  The only
delimiter going into that "is" instantiation is ";".  So this
delimited_stream doesn't recognize the new-line character as a delimiter.
It's just another character, so the delimited stream keeps reading until it
hits another ";" character.  There must be some odd relationship between line
length and buf_size that causes the pointer to advance to some strange place
in the next line for the next textscan().  Note: even though col_headers
looks to be reading the "header [deg]" properly, I think it's not, and somehow
the new-line character-plus (i.e., "\n5.2500000000000") is dropped somewhere
along the way when converted to cell-string.
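
The effect of leaving the new-line out of the delimiter set can be shown
without Octave at all.  The sketch below is only a simplified stand-in for the
scan loop, using a plain std::istringstream rather than delimited_stream;
split_fields and the sample text are invented for illustration.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Simplified stand-in for the scan loop: accumulate characters into a
// field until one of the characters in DELIMS is read.  The delimiter
// itself is consumed and not stored.
std::vector<std::string>
split_fields (const std::string& text, const std::string& delims)
{
  std::istringstream is (text);
  std::vector<std::string> fields;
  std::string field;
  int ch;

  while ((ch = is.get ()) != std::istringstream::traits_type::eof ())
    {
      if (delims.find (static_cast<char> (ch)) != std::string::npos)
        {
          fields.push_back (field);
          field.clear ();
        }
      else
        field += static_cast<char> (ch);
    }

  // Keep whatever was read after the final delimiter.
  if (! field.empty ())
    fields.push_back (field);

  return fields;
}
```

With the made-up input "speed;heading [deg]\n5.25;44" and delims ";", the
second field comes back as "heading [deg]\n5.25", i.e., the new-line and the
start of the next line are swallowed into the header.  With delims ";\n" the
same input splits into four fields, and "heading [deg]" and "5.25" are
separate.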

So, as a little test, let's try putting ";\n" in for the delimiters in the
test code, i.e., textscan(file, formatSpec, 1, 'Delimiter', ";\n"):


+0+1+2+3+4+5+6+7+8address = 7866201
+0+1+2+3+4+5+6+7+8+9address = 7866211
+0+1+2+3+4+5+6+7+8+9address = 7866221
+0+1+2+3+4+5address = 7866227
+0+1+2+3+4+5address = 7866233
+0+1+2+3+4+5+6+7+8+9+10+11address = 7866245
+0+1+2+3+4+5+6+7+8+9+10+11+12+13address = 7866259


OK, now things look proper, i.e., 7866245 + 14 = 7866259.  Unfortunately, the
result still isn't quite correct:


octave:16> logLine
logLine = 
{
  [1,1] = 0
  [1,2] =  44
  [1,3] =  10
  [1,4] = 0
  [1,5] = 0
  [1,6] = 0
  [1,7] =  44.998
}


Better!  But the first entry isn't 5.25.  Again, some strange interaction with
the new-line character and placing it back into the stream, maybe?

That's where I am.  On the trail, I think, but only close so far.

    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?50619>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/