Re: Parsing CSV files
From: Ken Anderson
Subject: Re: Parsing CSV files
Date: Thu, 02 Oct 2003 09:30:34 -0400

This is an interesting approach; however, it assumes that each field is followed
by a delimiter. In the CSV format that Excel uses, the end of a line also ends
a field. Also, in Excel, a field that contains a delimiter is wrapped in double
quotes, like "this, and that", and a double quote inside such a field is escaped
by doubling it.
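For illustration, here is a minimal sketch of the writing side of that quoting rule. The name csv-quote-field is hypothetical and is not used by the parser below. It also assumes that a field containing a newline or a double quote gets wrapped too, which goes slightly beyond what is stated above.

```scheme
;; Hypothetical helper (not from the original post): return FIELD as it
;; would be written to a CSV file that uses DELIMITER between fields.
(define (csv-quote-field field delimiter)
  ;; Assumption: a delimiter or a newline inside the field forces quoting.
  (define (forces-quoting? c)
    (or (eqv? c delimiter) (eqv? c #\newline)))
  ;; OUT holds the output characters in reverse order.
  (let loop ((chars (string->list field)) (out '()) (quoted? #f))
    (cond ((null? chars)
           (let ((body (list->string (reverse out))))
             (if quoted? (string-append "\"" body "\"") body)))
          ((eqv? (car chars) #\")
           ;; A double quote is escaped by doubling it.
           (loop (cdr chars) (cons #\" (cons #\" out)) #t))
          (else
           (loop (cdr chars)
                 (cons (car chars) out)
                 (or quoted? (forces-quoting? (car chars))))))))
```

A field without any special character comes back unchanged, so (csv-quote-field "plain" #\,) is just "plain".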
Here's an approach I use:
(define (csv-read port delimiter cell-action row-action)
  ;; Read the next character (or the eof-object) from PORT.
  (define (!) (read-char port))
  ;; k1 resumes parsing at the next cell; k2 reports the end of a row first.
  (define k1 (lambda () (state (!))))
  (define k2 (lambda () (row-action k1)))
  ;; B holds the characters of the cell in reverse order.
  (define (give-cell b k) (cell-action (list->string (reverse b)) k))
  ;; At the start of a cell.
  (define (state c)
    (cond ((eqv? c delimiter) (cell-action "" k1))
          ((eqv? c #\") (state-string (!) '()))
          ((eqv? c #\newline) (row-action k1))
          ((eof-object? c) #t)
          (else (state-any c '()))))
  ;; Inside a quoted cell.
  (define (state-string c b)
    (cond ((eqv? c #\") (state-string-quote (!) b))
          ((eof-object? c) (error "End of file inside a quoted cell."))
          (else (state-string (!) (cons c b)))))
  ;; Just after a double quote inside a quoted cell.
  (define (state-string-quote c b)
    (cond ((eqv? c #\") (state-string (!) (cons c b))) ; Escaped double quote.
          ((eqv? c delimiter) (give-cell b k1))
          ((eqv? c #\newline) (give-cell b k2))
          ((eof-object? c) (give-cell b k2))
          (else (error "Single double quote at unexpected place."))))
  ;; Inside an unquoted cell.
  (define (state-any c b)
    (cond ((eqv? c delimiter) (give-cell b k1))
          ((eqv? c #\newline) (give-cell b k2))
          ((eof-object? c) (give-cell b k2))
          (else (state-any (!) (cons c b)))))
  (state (!)))
This uses continuation-passing style to separate the parsing from what
the user does with each cell and row.
(cell-action value k) is called with the value of the next cell and a
continuation, k, which the action calls to resume parsing.
(row-action k) is called at the end of a row, also with a continuation.
The state... procedures form a tail-recursive finite state machine.
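Because the action decides when to call k, a caller can stop parsing early simply by not calling it. A minimal sketch of that idea, assuming a reader with the same calling convention as csv-read is passed in as READER (csv-first-row is a made-up name for this example):

```scheme
;; Sketch: return only the cells of the first row.
;; READER is assumed to be called like csv-read:
;;   (reader port delimiter cell-action row-action)
(define (csv-first-row reader port delimiter)
  (let ((row '()))                      ; cells seen so far, newest first
    (reader port delimiter
            (lambda (value k)
              (set! row (cons value row))
              (k))                      ; keep going within the row
            (lambda (k)
              #t))                      ; k is not called: parsing stops here
    (reverse row)))
```

With the csv-read above this would be called as (csv-first-row csv-read port #\,), and the rest of the port is left unread.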
Here's an example of converting a CSV file to a string of HTML:
(define (csv->html port)
  (let ((result '("<html><table><tr>")))
    (csv-read port #\,
              (lambda (value k)
                (set! result (cons "</td>" (cons value (cons "<td>" result))))
                (k))
              (lambda (k)
                (set! result (cons "</tr><tr>" result))
                (k)))
    ;; Close the row and table that are still open.
    (apply string-append (reverse (cons "</tr></table></html>" result)))))
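In the same style, here is a sketch that collects the cells into a list of rows instead of building HTML. The name csv->rows is hypothetical, and READER is assumed to be any procedure with the calling convention of csv-read, so that the example stands on its own:

```scheme
;; Sketch: return the table as a list of rows, each row a list of strings.
;; READER is assumed to be called like csv-read:
;;   (reader port delimiter cell-action row-action)
(define (csv->rows reader port delimiter)
  (let ((row '())                       ; cells of the current row, newest first
        (rows '()))                     ; finished rows, newest first
    (reader port delimiter
            (lambda (value k)
              (set! row (cons value row))
              (k))
            (lambda (k)
              (set! rows (cons (reverse row) rows))
              (set! row '())
              (k)))
    (reverse rows)))
```

With the csv-read above this would be called as (csv->rows csv-read port #\,).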
k
At 11:23 PM 9/30/2003 +0200, Wolfgang Jaehrling wrote:
>Hi there!
>
>For those of you who want to read some interesting code, here is a
>program to parse a file in CSV (Comma Separated Value) format. I
>think it shows how one should use Scheme, but some might say it goes a
>bit too far... (and I'd like to receive comments on this topic.)
>
>Note `READ-TABLE' can be called with the source port as argument, or
>without an argument to use the current input port.
>
>;; Reading a table from a port where it resides in CSV format.
>;; Copyright (C) 2003 Wolfgang Jährling <address@hidden>
>;;
>;; This program is free software; you can redistribute it and/or modify
>;; it under the terms of the GNU General Public License as published by
>;; the Free Software Foundation; either version 2 of the License, or
>;; (at your option) any later version.
>;;
>;; This program is distributed in the hope that it will be useful,
>;; but WITHOUT ANY WARRANTY; without even the implied warranty of
>;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>;; GNU General Public License for more details.
>
>(define field-delimiter #\,)
>
>;; Return a procedure that calls CONSUMER with three arguments: The
>;; value returned by the PRODUCER applied to the procedure's arguments,
>;; a list that is initially empty, and a thunk to restart this process
>;; with the value given by the PRODUCER added at the beginning of the
>;; list given to the CONSUMER.
>(define (collectrec producer consumer)
>  (lambda args
>    (letrec ((loop (lambda (lst)
>                     (let ((x (apply producer args)))
>                       (consumer x lst (lambda ()
>                                         (loop (cons x lst))))))))
>      (loop '()))))
>
>;; Read and return a field, that ends with the configured delimiter
>;; character, or return false at the end of a line, or the eof-object
>;; at end of file.
>(define read-field
>  (collectrec read-char
>              (lambda (c chars loop)
>                (cond ((eof-object? c) c)
>                      ((char=? c field-delimiter)
>                       (apply string (reverse chars)))
>                      ((char=? c #\newline) #f)
>                      (else (loop))))))
>
>;; Read a line and split it up into a list of fields which gets
>;; returned, or false at the end of the file.
>(define read-row
>  (collectrec read-field
>              (lambda (f fields loop)
>                (cond ((not f) (reverse fields))
>                      ((eof-object? f) #f)
>                      (else (loop))))))
>
>;; Read a table and return it as a list of rows, each row being a list
>;; of fields, which are strings.
>(define read-table
>  (collectrec read-row
>              (lambda (r rows loop)
>                (if (not r)
>                    (reverse rows)
>                    (loop)))))
>
>;;;; End of code. ;;;;
>
>Cheers,
>GNU/Wolfgang
>
>--
>(define eq? (lambda (x y) #t)) ;; How could it be otherwise?