pspp-dev

Re: Suggestions


From: Ben Pfaff
Subject: Re: Suggestions
Date: Thu, 29 Jan 2009 22:09:42 -0800
User-agent: Gnus/5.11 (Gnus v5.11) Emacs/22.2 (gnu/linux)

Rémi Dewitte <address@hidden> writes:

> On Thu, Jan 29, 2009 at 06:40, Ben Pfaff <address@hidden> wrote:
>
>     Rémi Dewitte <address@hidden> writes:
>    
>     > Working with pspp-0.6.1, I am glad it works fine. Nevertheless, I
>     > encountered two minor issues, for which I have patched pspp a bit.
>     >
>     > The first one is the ability to import CSV files with DOS line
>     > endings. I don't know whether it is the right place to trim the '\r'.
>    
>     This is not the right place to do this.
>
> You might give me some clues...

Sure, I was just in a hurry when I wrote that.

Here is my suggested substitute fix.  I have not had a chance to
test that it works.  Will you test it and report your results?

Reviews welcome from everyone else too, of course.

Thanks!

commit f4cc711051121873dd2e11436b10dd829094bdb9
Author: Ben Pfaff <address@hidden>
Date:   Thu Jan 29 22:01:27 2009 -0800

    Accept LF, CR LF, and CR as new-line sequences in data files.
    
    Until now, PSPP has used the host operating system's idea of the
    new-line sequence when reading data files and other text files.
    This means that, when a file with CR LF line ends is read on an OS
    that uses LF as new-line (e.g. an MS-DOS file on Unix), each line
    appears to have a CR at the end.  This commit fixes the
    problem by normalizing the new-line sequence at time of reading.
    
    This commit eliminates a performance optimization from
    ds_read_line(), because the getline() function that it used, like the
    underlying getdelim(), cannot be made to stop reading at one of two
    different delimiters.  If
    this causes a real performance regression, then the getndelim2
    function from gnulib could be used to restore the optimization.
    
    Thanks to Rémi Dewitte <address@hidden> for pointing out the problem
    and providing an initial patch.

commit 70a46fb66ae0de5e312c4fc007bddf65e8ea5ac9
Author: Ben Pfaff <address@hidden>
Date:   Thu Jan 29 22:08:43 2009 -0800

    Accept LF, CR LF, and CR as new-line sequences in data files.
    
    Until now, PSPP has used the host operating system's idea of the
    new-line sequence when reading data files and other text files.
    This means that, when a file with CR LF line ends is read on an OS
    that uses LF as new-line (e.g. an MS-DOS file on Unix), each line
    appears to have a CR at the end.  This commit fixes the
    problem by normalizing the new-line sequence at time of reading.
    
    This commit eliminates a performance optimization from
    ds_read_line(), because the getline() function that it used, like the
    underlying getdelim(), cannot be made to stop reading at one of two
    different delimiters.  If
    this causes a real performance regression, then the getndelim2
    function from gnulib could be used to restore the optimization.
    
    Thanks to Rémi Dewitte <address@hidden> for pointing out the problem
    and providing an initial patch.
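
The performance note concerns the per-character getc() loop that replaces the old getline() fast path. One alternative, which is neither what this patch does nor what gnulib's getndelim2 does, is to read data in blocks (for example with fread()) and normalize new-lines in memory. The sketch below shows only that in-memory step; `normalize_newlines()` is a hypothetical name, and it uses memchr() to copy CR-free runs in one go instead of byte by byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of PSPP): rewrites the LEN bytes at
   BUF in place so that every CR LF pair and every lone CR becomes a
   single '\n'.  Returns the new length, which never exceeds LEN.
   A CR that is the last byte of BUF is treated as a lone CR. */
static size_t
normalize_newlines (char *buf, size_t len)
{
  size_t in = 0, out = 0;

  assert (buf != NULL || len == 0);
  while (in < len)
    {
      /* Copy the run of bytes before the next CR with one memmove().
         OUT never exceeds IN, so the regions may overlap safely. */
      char *cr = memchr (buf + in, '\r', len - in);
      size_t run = cr != NULL ? (size_t) (cr - (buf + in)) : len - in;
      memmove (buf + out, buf + in, run);
      in += run;
      out += run;

      if (in < len)             /* buf[in] is a CR. */
        {
          buf[out++] = '\n';
          in++;
          if (in < len && buf[in] == '\n')
            in++;               /* Drop the LF of a CR LF pair. */
        }
    }
  return out;
}
```

A stream reader built on this would also have to hold back a CR that ends one block until the next block shows whether an LF follows; the sketch leaves that bookkeeping out.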

diff --git a/src/libpspp/str.c b/src/libpspp/str.c
index d082672..f054c9e 100644
*** a/src/libpspp/str.c
--- b/src/libpspp/str.c
***************
*** 1,5 ****
  /* PSPP - a program for statistical analysis.
!    Copyright (C) 1997-9, 2000, 2006 Free Software Foundation, Inc.
  
     This program is free software: you can redistribute it and/or modify
     it under the terms of the GNU General Public License as published by
--- 1,5 ----
  /* PSPP - a program for statistical analysis.
!    Copyright (C) 1997-9, 2000, 2006, 2009 Free Software Foundation, Inc.
  
     This program is free software: you can redistribute it and/or modify
     it under the terms of the GNU General Public License as published by
***************
*** 1190,1237 ****
    return st->ss.string;
  }
  
! /* Appends to ST a newline-terminated line read from STREAM, but
!    no more than MAX_LENGTH characters.
!    Newline is the last character of ST on return, if encountering
!    a newline was the reason for terminating.
!    Returns true if at least one character was read from STREAM
!    and appended to ST, false if no characters at all were read
!    before an I/O error or end of file was encountered (or
!    MAX_LENGTH was 0). */
  bool
  ds_read_line (struct string *st, FILE *stream, size_t max_length)
  {
!   if (!st->ss.length && max_length == SIZE_MAX)
!     {
!       size_t capacity = st->capacity ? st->capacity + 1 : 0;
!       ssize_t n = getline (&st->ss.string, &capacity, stream);
!       if (capacity)
!         st->capacity = capacity - 1;
!       if (n > 0)
!         {
!           st->ss.length = n;
!           return true;
!         }
!       else
!         return false;
!     }
!   else
      {
!       size_t length;
  
!       for (length = 0; length < max_length; length++)
          {
!           int c = getc (stream);
!           if (c == EOF)
!             break;
! 
!           ds_put_char (st, c);
!           if (c == '\n')
!             return true;
          }
! 
!       return length > 0;
      }
  }
  
  /* Removes a comment introduced by `#' from ST,
--- 1190,1231 ----
    return st->ss.string;
  }
  
! /* Reads characters from STREAM and appends them to ST, stopping
!    after MAX_LENGTH characters, after appending a newline, or
!    after an I/O error or end of file was encountered, whichever
!    comes first.  Returns true if at least one character was added
!    to ST, false if no characters were read before an I/O error or
!    end of file (or if MAX_LENGTH was 0).
! 
!    This function accepts LF, CR LF, and CR sequences as new-line,
!    and translates each of them to a single '\n' new-line
!    character in ST. */
  bool
  ds_read_line (struct string *st, FILE *stream, size_t max_length)
  {
!   size_t length;
! 
!   for (length = 0; length < max_length; length++)
      {
!       int c = getc (stream);
!       if (c == EOF)
!         break;
  
!       if (c == '\r')
          {
!           c = getc (stream);
!           if (c != '\n')
!             {
!               ungetc (c, stream);
!               c = '\n';
!             }
          }
!       ds_put_char (st, c);
!       if (c == '\n')
!         return true;
      }
+ 
+   return length > 0;
  }
  
  /* Removes a comment introduced by `#' from ST,
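
To try the patched loop outside PSPP, it can be lifted into a stand-alone function. The sketch below is hypothetical: `read_line_normalized()` and its fixed-size buffer stand in for PSPP's `struct string` and ds_put_char(), which are not reproduced here. The CR handling follows the patch, including the ungetc() of a byte that turns out not to be LF; passing EOF to ungetc() leaves the stream unchanged in ISO C, so a CR at end of file still becomes '\n'.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-alone version of the new ds_read_line() loop.
   Reads from STREAM into BUF, which must have room for MAX_LENGTH
   characters plus a terminating null, translating LF, CR LF, and a
   lone CR into a single '\n'.  Stops after MAX_LENGTH characters,
   after storing a '\n', or at end of file or I/O error.  Returns the
   number of characters stored (not counting the null). */
static size_t
read_line_normalized (FILE *stream, char *buf, size_t max_length)
{
  size_t length;

  assert (stream != NULL && buf != NULL);
  for (length = 0; length < max_length; length++)
    {
      int c = getc (stream);
      if (c == EOF)
        break;

      if (c == '\r')
        {
          /* Look ahead one byte: CR LF collapses to '\n'; otherwise
             push the byte back and treat the lone CR as '\n'. */
          c = getc (stream);
          if (c != '\n')
            {
              ungetc (c, stream);
              c = '\n';
            }
        }

      buf[length] = (char) c;
      if (c == '\n')
        {
          length++;             /* Count the stored new-line. */
          break;
        }
    }

  buf[length] = '\0';
  return length;
}
```

As in the patch, a CR LF pair counts as one character toward MAX_LENGTH, because the loop counter advances once per stored character rather than once per byte read.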

-- 
Ben Pfaff 
http://benpfaff.org



