octave-bug-tracker

From: Tasos Papastylianou
Subject: [Octave-bug-tracker] [bug #60138] Dataframe package incorrectly parses csv entries containing commas inside quotes
Date: Sat, 27 Feb 2021 11:21:04 -0500 (EST)
User-agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:86.0) Gecko/20100101 Firefox/86.0

URL:
  <https://savannah.gnu.org/bugs/?60138>

                 Summary: Dataframe package incorrectly parses csv entries
containing commas inside quotes
                 Project: GNU Octave
            Submitted by: tpapastylianou
            Submitted on: Sat 27 Feb 2021 04:21:02 PM UTC
                Category: Octave Forge Package
                Severity: 3 - Normal
                Priority: 5 - Normal
              Item Group: Incorrect Result
                  Status: None
             Assigned to: None
         Originator Name: Tasos Papastylianou
        Originator Email: 
             Open/Closed: Open
                 Release: other
         Discussion Lock: Any
        Operating System: Any

    _______________________________________________________

Details:

This report relates to this Stack Overflow post:
https://stackoverflow.com/q/66389166/4183191

That question also exposes a different bug, already tracked as #56263: the
second column is wrongly truncated if it contains only strings. Apparently
that one was fixed in the dev repo a few years ago, but the fix has still not
been released. This report is not about that bug.

This report is for another bug triggered by the same CSV data. I reproduce
the offending data below (each record is a single line in the file):


"TIME","GEO","UNIT","S_ADJ","NA_ITEM","Value","Flag and Footnotes"
"1995Q1","Greece","Chain linked volumes, index 2010=100","Seasonally and calendar adjusted data","Gross domestic product at market prices","72.5",""
"1995Q2","Greece","Chain linked volumes, index 2010=100","Seasonally and calendar adjusted data","Gross domestic product at market prices","73.2",""


Suppose one attempts to load this file, e.g. as


D = dataframe( 'data.csv' );


This should produce a UNIT field with the value "Chain linked volumes, index
2010=100". However, dataframe incorrectly treats the comma inside the quotes
as a delimiter, and assigns the second part of the value to the S_ADJ field
instead.
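
To illustrate the difference, here is a minimal sketch using only core Octave
functions. It is not the dataframe package's actual parsing code, which I have
not traced; it only contrasts a plain comma split with a quote-aware one on
the first record above. The quote-aware pattern assumes every field is
double-quoted and contains no escaped quotes.

```octave
% First data record from the report, as a single line of text.
line = ['"1995Q1","Greece","Chain linked volumes, index 2010=100",' ...
        '"Seasonally and calendar adjusted data",' ...
        '"Gross domestic product at market prices","72.5",""'];

% Plain split: every comma is a delimiter, including the one inside quotes.
naive = strsplit (line, ',', 'CollapseDelimiters', false);

% Quote-aware split: match each double-quoted field as one unit, then
% strip the surrounding quotes. Assumes all fields are quoted and that
% no field contains an escaped ("") quote.
matches = regexp (line, '"[^"]*"', 'match');
fields  = cellfun (@(s) s(2:end-1), matches, 'UniformOutput', false);

printf ('plain split:       %d fields\n', numel (naive));
printf ('quote-aware split: %d fields\n', numel (fields));
printf ('UNIT = %s\n', fields{3});
```

The plain split yields eight pieces for a seven-column record, which is the
kind of column shift described above; the quote-aware split keeps the UNIT
value in one piece.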

As a workaround, I note that the csv2cell function from the io package handles
such quoted fields correctly.

Therefore:


dataframe( csv2cell( 'data.csv' ) );


works as expected (barring bug #56263, that is).








