pspp-dev
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 00/18] rewrite PSPP lexer


From: John Darrington
Subject: Re: [PATCH 00/18] rewrite PSPP lexer
Date: Sun, 20 Mar 2011 09:58:03 +0000
User-agent: Mutt/1.5.18 (2008-05-17)

I had a look through all these patches and don't see any major 
problems.


Minor issues:
* I see that a lot of new files have the copyright year as 2010, 2011.
 As I understand it, the years listed are supposed to be those in which
 the changes were released (where "released" means made available to the
 public in any way), not necessarily the years in which the code was
 written.  So if this is the first time you're made these changes public,
 then 2010 is inappropriate.

* In light of recent discussions, the case insensitive hash should perhaps
 use the C locale toupper instead of the locale dependent one.  On the other
 hand the particular use case might require locale dependency. - On the
 third hand all sorts of things might break, if an string is placed into a
 hash table in oen locale, and lookup is performed in another, all hell 
 could break loose.  I think this deserves further consideration.

* In my opinion the test in i18n-test.c is tautological.  It's not testing
 anything except that you didn't make any typos in the recode_string_len
 function.  Maybe a short table of example strings with their expected lengths 
would
 be more useful?

* In the comment within encoding-guesser, there is the word "other;" which
 seems to be erroneous.


J'

On Sat, Mar 19, 2011 at 05:09:46PM -0700, Ben Pfaff wrote:
     These commits, taken together, rip out the PSPP lexer and replace
     it by one that is more flexible, easier to use, and better at handling
     character encodings.
     
     Ben Pfaff (18):
       data-reader: Remove unreachable "return" statements.
       file-name: Do not make output files line-buffered in fn_open().
       sys-file-reader: Refactor to clean up character encoding support.
       output: New function text_item_create_nocopy().
       str: New function ss_realloc().
       str: Rename ss_chomp() to ss_chomp_byte(), ds_chomp() to
         ds_chomp_byte().
       str: New functions for checking for and removing string suffixes.
       hash-functions: New function hash_case_bytes().
       i18n: New function uc_name().
       i18n: New function recode_string_len().
       i18n: New functions for truncating strings in an arbitrary encoding.
       identifier: Rename token_type_to_string() and make a new version.
       i18n: New functions and data structure for obtaining encoding info.
       encoding-guesser: New library to guess the encoding of a text file.
       u8-istream: New library for reading a text file and recoding to
         UTF-8.
       segment: New library for low-level phase of lexical syntax analysis.
       scan: New library for high-level PSPP syntax lexical analysis.
       lexer: Reimplement for better testability and internationalization.
     
      NEWS                                               |   46 +-
      Smake                                              |    8 +-
      doc/dev/concepts.texi                              |   25 +-
      doc/flow-control.texi                              |   33 +-
      doc/invoking.texi                                  |   20 +-
      doc/language.texi                                  |   96 +-
      doc/utilities.texi                                 |   69 +-
      perl-module/PSPP.xs                                |   14 +-
      perl-module/t/Pspp.t                               |    4 +-
      src/data/automake.mk                               |    1 +
      src/data/dictionary.c                              |  174 +-
      src/data/dictionary.h                              |   14 +-
      src/data/file-handle-def.c                         |    8 +-
      src/data/file-name.c                               |   11 +-
      src/data/gnumeric-reader.h                         |    8 +-
      src/data/identifier.c                              |  124 +-
      src/data/identifier.h                              |   12 +-
      src/data/identifier2.c                             |  133 ++
      src/data/mrset.c                                   |   33 +-
      src/data/mrset.h                                   |    7 +-
      src/data/por-file-reader.c                         |   25 +-
      src/data/por-file-writer.c                         |    5 +-
      src/data/procedure.c                               |   19 +
      src/data/procedure.h                               |    4 +-
      src/data/sys-file-reader.c                         | 2084 
+++++++++++---------
      src/data/sys-file-reader.h                         |    2 +-
      src/data/sys-file-writer.c                         |   20 +-
      src/data/variable.c                                |  139 +-
      src/data/variable.h                                |    7 +-
      src/data/vector.c                                  |   15 +-
      src/data/vector.h                                  |    4 +-
      src/language/automake.mk                           |    6 -
      src/language/command.c                             |  158 +-
      src/language/command.def                           |   11 +-
      src/language/control/automake.mk                   |    3 +-
      src/language/control/do-if.c                       |   12 +-
      src/language/control/loop.c                        |    4 +-
      src/language/control/repeat.c                      |  714 +++-----
      src/language/control/temporary.c                   |    6 +-
      src/language/data-io/combine-files.c               |   21 +-
      src/language/data-io/data-list.c                   |    5 +-
      src/language/data-io/data-parser.c                 |    9 +-
      src/language/data-io/data-reader.c                 |   58 +-
      src/language/data-io/file-handle.q                 |   29 +-
      src/language/data-io/get-data.c                    |    8 +-
      src/language/data-io/inpt-pgm.c                    |   30 +-
      src/language/data-io/save-translate.c              |    2 +
      src/language/data-io/trim.c                        |    5 +-
      src/language/dictionary/apply-dictionary.c         |   11 +-
      src/language/dictionary/attributes.c               |   69 +-
      src/language/dictionary/missing-values.c           |   44 +-
      src/language/dictionary/modify-variables.c         |    4 +-
      src/language/dictionary/mrsets.c                   |   27 +-
      src/language/dictionary/numeric.c                  |   12 +-
      src/language/dictionary/rename-variables.c         |    3 +-
      src/language/dictionary/split-file.c               |    4 +-
      src/language/dictionary/sys-file-info.c            |   18 +-
      src/language/dictionary/value-labels.c             |   22 +-
      src/language/dictionary/variable-label.c           |   17 +-
      src/language/dictionary/vector.c                   |   13 +-
      src/language/dictionary/weight.c                   |    4 +-
      src/language/expressions/parse.c                   |   34 +-
      src/language/expressions/private.h                 |    5 +-
      src/language/lexer/automake.mk                     |    8 +
      src/language/lexer/include-path.c                  |   89 +
      .../msg-ui.h => language/lexer/include-path.h}     |   20 +-
      src/language/lexer/lexer.c                         | 2143 
+++++++++++---------
      src/language/lexer/lexer.h                         |  158 ++-
      src/language/lexer/q2c.c                           |    6 +-
      src/language/lexer/scan.c                          |  596 ++++++
      src/language/lexer/scan.h                          |   93 +
      src/language/lexer/segment.c                       | 1631 +++++++++++++++
      src/language/lexer/segment.h                       |  122 ++
      src/language/lexer/token.c                         |  173 ++
      src/language/{prompt.h => lexer/token.h}           |   36 +-
      src/language/lexer/value-parser.c                  |    2 +-
      src/language/lexer/variable-parser.c               |   17 +-
      src/language/lexer/variable-parser.h               |   10 +-
      src/language/prompt.c                              |   75 -
      src/language/stats/aggregate.c                     |   23 +-
      src/language/stats/autorecode.c                    |    3 +-
      src/language/stats/descriptives.c                  |   23 +-
      src/language/stats/flip.c                          |    8 +-
      src/language/stats/frequencies.q                   |   25 +-
      src/language/stats/npar.c                          |   77 +-
      src/language/stats/rank.q                          |   12 +-
      src/language/stats/sort-cases.c                    |    2 +-
      src/language/syntax-file.c                         |  144 --
      src/language/syntax-file.h                         |   25 -
      src/language/syntax-string-source.c                |  151 --
      src/language/syntax-string-source.h                |   33 -
      src/language/tests/format-guesser-test.c           |    2 +-
      src/language/tests/moments-test.c                  |    2 +-
      src/language/tests/paper-size.c                    |    2 +-
      src/language/utilities/cache.c                     |    4 +-
      src/language/utilities/cd.c                        |    8 +-
      src/language/utilities/date.c                      |    4 +-
      src/language/utilities/host.c                      |   14 +-
      src/language/utilities/include.c                   |  198 +-
      src/language/utilities/permissions.c               |   13 +-
      src/language/utilities/set.q                       |   13 +-
      src/language/utilities/title.c                     |   92 +-
      src/language/xforms/compute.c                      |    6 +-
      src/language/xforms/count.c                        |   26 +-
      src/language/xforms/fail.c                         |    8 +-
      src/language/xforms/recode.c                       |   45 +-
      src/language/xforms/sample.c                       |    4 +-
      src/language/xforms/select-if.c                    |    2 +-
      src/libpspp/automake.mk                            |   10 +-
      src/libpspp/encoding-guesser.c                     |  289 +++
      src/libpspp/encoding-guesser.h                     |  126 ++
      src/libpspp/getl.c                                 |  271 ---
      src/libpspp/getl.h                                 |  113 -
      src/libpspp/hash-functions.c                       |   18 +-
      src/libpspp/hash-functions.h                       |    1 +
      src/libpspp/i18n.c                                 |  345 ++++-
      src/libpspp/i18n.h                                 |   87 +-
      src/libpspp/message.c                              |  118 +-
      src/libpspp/message.h                              |   23 +-
      src/libpspp/msg-locator.c                          |   87 -
      src/libpspp/msg-locator.h                          |   34 -
      src/{ui/terminal/msg-ui.c => libpspp/prompt.c}     |   41 +-
      src/{language => libpspp}/prompt.h                 |   17 +-
      src/libpspp/str.c                                  |   54 +-
      src/libpspp/str.h                                  |   11 +-
      src/libpspp/u8-istream.c                           |  475 +++++
      src/libpspp/u8-istream.h                           |   45 +
      src/output/driver.c                                |   56 +-
      src/output/text-item.c                             |   12 +-
      src/output/text-item.h                             |    3 +-
      src/ui/gui/automake.mk                             |    2 -
      src/ui/gui/comments-dialog.c                       |   13 +-
      src/ui/gui/executor.c                              |   19 +-
      src/ui/gui/executor.h                              |    4 +-
      src/ui/gui/main.c                                  |   39 +-
      src/ui/gui/psppire-data-window.c                   |   16 +-
      src/ui/gui/psppire-dict.c                          |    9 +-
      src/ui/gui/psppire-syntax-window.c                 |   16 +-
      src/ui/gui/psppire-syntax-window.h                 |    1 -
      src/ui/gui/psppire-var-store.c                     |    7 +-
      src/ui/gui/psppire.c                               |   34 +-
      src/ui/gui/psppire.h                               |    6 +-
      src/ui/gui/syntax-editor-source.c                  |  130 --
      src/ui/gui/syntax-editor-source.h                  |   34 -
      src/ui/gui/text-data-import-dialog.c               |    4 +-
      src/ui/source-init-opts.c                          |   20 +-
      src/ui/source-init-opts.h                          |    4 +-
      src/ui/terminal/automake.mk                        |   13 +-
      src/ui/terminal/main.c                             |  113 +-
      src/ui/terminal/read-line.h                        |   31 -
      src/ui/terminal/terminal-opts.c                    |   58 +-
      src/ui/terminal/terminal-opts.h                    |    8 +-
      src/ui/terminal/{read-line.c => terminal-reader.c} |  310 ++--
      .../repeat.h => ui/terminal/terminal-reader.h}     |   11 +-
      tests/automake.mk                                  |   43 +
      tests/data/data-in.at                              |   64 +-
      tests/data/sys-file-reader.at                      |  295 +--
      tests/dissect-sysfile.c                            |    8 +-
      tests/language/control/do-repeat.at                |  101 +-
      tests/language/data-io/data-list.at                |    6 +-
      tests/language/data-io/get.at                      |    2 -
      tests/language/data-io/inpt-pgm.at                 |    6 +-
      tests/language/data-io/print.at                    |    2 +-
      tests/language/dictionary/missing-values.at        |    2 +-
      tests/language/dictionary/sys-file-info.at         |    4 +-
      tests/language/expressions/evaluate.at             |    9 +-
      tests/language/expressions/parse.at                |    2 +-
      tests/language/lexer/lexer.at                      |   43 +
      tests/language/lexer/q2c.at                        |    2 +-
      tests/language/lexer/scan-test.c                   |  217 ++
      tests/language/lexer/scan.at                       |  818 ++++++++
      tests/language/lexer/segment-test.c                |  318 +++
      tests/language/lexer/segment.at                    | 1070 ++++++++++
      tests/language/stats/aggregate.at                  |    4 +-
      tests/language/stats/rank.at                       |    6 +-
      tests/language/utilities/insert.at                 |   29 +-
      tests/libpspp/encoding-guesser-test.c              |  102 +
      tests/libpspp/encoding-guesser.at                  |  143 ++
      tests/libpspp/i18n-test.c                          |   68 +-
      tests/libpspp/i18n.at                              |  125 +-
      tests/libpspp/u8-istream-test.c                    |  126 ++
      tests/libpspp/u8-istream.at                        |  142 ++
      182 files changed, 12211 insertions(+), 5364 deletions(-)
      create mode 100644 src/data/identifier2.c
      create mode 100644 src/language/lexer/include-path.c
      rename src/{ui/terminal/msg-ui.h => language/lexer/include-path.h} (66%)
      create mode 100644 src/language/lexer/scan.c
      create mode 100644 src/language/lexer/scan.h
      create mode 100644 src/language/lexer/segment.c
      create mode 100644 src/language/lexer/segment.h
      create mode 100644 src/language/lexer/token.c
      copy src/language/{prompt.h => lexer/token.h} (50%)
      delete mode 100644 src/language/prompt.c
      delete mode 100644 src/language/syntax-file.c
      delete mode 100644 src/language/syntax-file.h
      delete mode 100644 src/language/syntax-string-source.c
      delete mode 100644 src/language/syntax-string-source.h
      create mode 100644 src/libpspp/encoding-guesser.c
      create mode 100644 src/libpspp/encoding-guesser.h
      delete mode 100644 src/libpspp/getl.c
      delete mode 100644 src/libpspp/getl.h
      delete mode 100644 src/libpspp/msg-locator.c
      delete mode 100644 src/libpspp/msg-locator.h
      rename src/{ui/terminal/msg-ui.c => libpspp/prompt.c} (58%)
      rename src/{language => libpspp}/prompt.h (69%)
      create mode 100644 src/libpspp/u8-istream.c
      create mode 100644 src/libpspp/u8-istream.h
      delete mode 100644 src/ui/gui/syntax-editor-source.c
      delete mode 100644 src/ui/gui/syntax-editor-source.h
      delete mode 100644 src/ui/terminal/read-line.h
      rename src/ui/terminal/{read-line.c => terminal-reader.c} (53%)
      rename src/{language/control/repeat.h => ui/terminal/terminal-reader.h} 
(77%)
      create mode 100644 tests/language/lexer/scan-test.c
      create mode 100644 tests/language/lexer/scan.at
      create mode 100644 tests/language/lexer/segment-test.c
      create mode 100644 tests/language/lexer/segment.at
      create mode 100644 tests/libpspp/encoding-guesser-test.c
      create mode 100644 tests/libpspp/encoding-guesser.at
      create mode 100644 tests/libpspp/u8-istream-test.c
      create mode 100644 tests/libpspp/u8-istream.at
     
     -- 
     1.7.2.3
     
     
     _______________________________________________
     pspp-dev mailing list
     address@hidden
     http://lists.gnu.org/mailman/listinfo/pspp-dev

-- 
PGP Public key ID: 1024D/2DE827B3 
fingerprint = 8797 A26D 0854 2EAB 0285  A290 8A67 719C 2DE8 27B3
See http://pgp.mit.edu or any PGP keyserver for public key.

Attachment: signature.asc
Description: Digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]