From: Philip Nienhuis
Subject: [Octave-bug-tracker] [bug #34734] problems with latest strread (newlines, spaces and commas)
Date: Sun, 06 Nov 2011 22:02:37 +0000
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.11) Gecko/20100701 SeaMonkey/2.0.6

Update of bug #34734 (project octave):

                  Status:             In Progress => Patch Submitted        

    _______________________________________________________

Follow-up Comment #4:

After having looked at this in more detail, I'm afraid I'll have to give up on
this one.

This is due to the way strread has been made by Eric Chassande and later
Jaroslav & Soren:
the input text string is split up into a cell array along delimiters; later
the data is parsed, assuming that the "data columns" correspond to cells at
regular intervals in the cell array, where the interval (periodicity) equals
the number of data columns.
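
To make the split-then-stride idea concrete, here is a minimal sketch in Python (not Octave, and not the actual strread code). The function name split_and_stride and its arguments are hypothetical, and it models only a single delimiter character; it is meant to show why fixed periodicity depends on every row yielding exactly one cell per column.

```python
def split_and_stride(text, n_columns, delimiter=","):
    """Simplified model: split first, then assign cells to columns by position."""
    # Step 1: cut the whole input into cells at every delimiter.
    cells = text.split(delimiter)
    # Step 2: column i receives cells i, i + n_columns, i + 2*n_columns, ...
    return [cells[i::n_columns] for i in range(n_columns)]


# Well-aligned input: every row contributes exactly one cell per column.
aligned = split_and_stride("1,a,2,b,3,c", 2)
# aligned == [['1', '2', '3'], ['a', 'b', 'c']]

# The input string from the textscan example in this comment: the first
# "numeric" cell still contains text, so the columns are out of phase.
misaligned = split_and_stride("1a 2 3 , 4 5, , 6", 2)
# misaligned == [['1a 2 3 ', ' '], [' 4 5', ' 6']]
```

With the aligned input the stride recovers both columns; with the second input the cell '1a 2 3 ' lands in the numeric column, and no later step can repair that.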

Now, ML seems to have two (or more?) levels of delimiters:
(1) Whitespace (always for numeric fields) - but see (3) below;
(2) Delimiters (which for text fields replace whitespace, but for numeric
fields seem to augment whitespace)
(3) Moreover, it turns out that ML doesn't actually use whitespace or
delimiters for numeric fields; rather, it just quits interpreting a numeric
field if the next character doesn't fit into a number template. It then
assumes the next character is part of the next field (e.g., look at C(2)):

>> C = textscan ('1a 2 3 , 4 5, , 6', '%d%s', 'delimiter', ',', 'whitespace', '');
>> a = C(1); a{:}
ans =
           1
           4
           0
>> b = C(2); b{:}
ans = 
    'a 2 3 '
    ' 5'
    ' 6'
    
(ML's strread behaves identically)

ML's interpretation scheme simply cannot be implemented robustly in the Octave
implementation. I see no way that two types of "field delimiters" can be
invoked, especially if they do not line up vertically; Octave's scheme will
break then because mixed data type columns may get "out-of-phase". The above
example is a nice illustration.

ML's way of interpreting data can only be mimicked by a routine that linearly
ploughs through a text file and, at each end-of-field, determines what to do
based on the individual format conversion specifier in turn.
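
As a rough illustration of such a linear scanner, here is a Python sketch (again not Octave). The name scan_fields and its rules are assumptions: leading blanks are skipped before a number, an empty numeric field becomes 0, and a delimiter directly after a field is consumed. These rules were chosen only because they reproduce the transcript above; they are not a description of how ML actually works internally.

```python
def scan_fields(text, specs, delimiter=","):
    """Walk through text once, consuming one field per format specifier in turn."""
    columns = [[] for _ in specs]
    pos, n = 0, len(text)
    while pos < n:
        row_start = pos
        for col, spec in enumerate(specs):
            if spec == "%d":
                # Assumption: blanks in front of a number are skipped.
                while pos < n and text[pos] == " ":
                    pos += 1
                start = pos
                if pos < n and text[pos] in "+-":
                    pos += 1
                # Stop at the first character that does not fit a number.
                while pos < n and text[pos].isdigit():
                    pos += 1
                digits = text[start:pos]
                # Assumption: an empty numeric field is stored as 0.
                columns[col].append(int(digits) if digits.strip("+-") else 0)
            else:
                # "%s": take everything up to the next delimiter (or the end).
                end = text.find(delimiter, pos)
                end = n if end == -1 else end
                columns[col].append(text[pos:end])
                pos = end
            # A delimiter directly after the field ends it and is consumed.
            if pos < n and text[pos] == delimiter:
                pos += 1
        if pos == row_start:
            break  # nothing was consumed; stop rather than loop forever
    return columns


numbers, strings = scan_fields("1a 2 3 , 4 5, , 6", ["%d", "%s"])
# numbers == [1, 4, 0]
# strings == ['a 2 3 ', ' 5', ' 6']
```

The point of the sketch is that the decision about where a field ends is made per specifier while scanning, which is exactly what a split-first design cannot do.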

I've prepared a changeset based on the rough patch I gave earlier, augmented
by Ben's patches for the texinfo header + some overdue patches I submitted a
while ago in bug #33971.
As to the current strread, I'm afraid that is as far as we can stretch its
performance.

Hopefully jwe's upcoming compiled textscan will be more powerful.

Ben, will you please review this changeset? TIA

(file #24315)
    _______________________________________________________

Additional Item Attachment:

File name: strread.patch                  Size:6 KB


    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?34734>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/



