pspp-dev
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[PATCH 00/18] rewrite PSPP lexer


From: Ben Pfaff
Subject: [PATCH 00/18] rewrite PSPP lexer
Date: Sat, 19 Mar 2011 17:09:46 -0700

These commits, taken together, rip out the PSPP lexer and replace
it by one that is more flexible, easier to use, and better at handling
character encodings.

Ben Pfaff (18):
  data-reader: Remove unreachable "return" statements.
  file-name: Do not make output files line-buffered in fn_open().
  sys-file-reader: Refactor to clean up character encoding support.
  output: New function text_item_create_nocopy().
  str: New function ss_realloc().
  str: Rename ss_chomp() to ss_chomp_byte(), ds_chomp() to
    ds_chomp_byte().
  str: New functions for checking for and removing string suffixes.
  hash-functions: New function hash_case_bytes().
  i18n: New function uc_name().
  i18n: New function recode_string_len().
  i18n: New functions for truncating strings in an arbitrary encoding.
  identifier: Rename token_type_to_string() and make a new version.
  i18n: New functions and data structure for obtaining encoding info.
  encoding-guesser: New library to guess the encoding of a text file.
  u8-istream: New library for reading a text file and recoding to
    UTF-8.
  segment: New library for low-level phase of lexical syntax analysis.
  scan: New library for high-level PSPP syntax lexical analysis.
  lexer: Reimplement for better testability and internationalization.

 NEWS                                               |   46 +-
 Smake                                              |    8 +-
 doc/dev/concepts.texi                              |   25 +-
 doc/flow-control.texi                              |   33 +-
 doc/invoking.texi                                  |   20 +-
 doc/language.texi                                  |   96 +-
 doc/utilities.texi                                 |   69 +-
 perl-module/PSPP.xs                                |   14 +-
 perl-module/t/Pspp.t                               |    4 +-
 src/data/automake.mk                               |    1 +
 src/data/dictionary.c                              |  174 +-
 src/data/dictionary.h                              |   14 +-
 src/data/file-handle-def.c                         |    8 +-
 src/data/file-name.c                               |   11 +-
 src/data/gnumeric-reader.h                         |    8 +-
 src/data/identifier.c                              |  124 +-
 src/data/identifier.h                              |   12 +-
 src/data/identifier2.c                             |  133 ++
 src/data/mrset.c                                   |   33 +-
 src/data/mrset.h                                   |    7 +-
 src/data/por-file-reader.c                         |   25 +-
 src/data/por-file-writer.c                         |    5 +-
 src/data/procedure.c                               |   19 +
 src/data/procedure.h                               |    4 +-
 src/data/sys-file-reader.c                         | 2084 +++++++++++---------
 src/data/sys-file-reader.h                         |    2 +-
 src/data/sys-file-writer.c                         |   20 +-
 src/data/variable.c                                |  139 +-
 src/data/variable.h                                |    7 +-
 src/data/vector.c                                  |   15 +-
 src/data/vector.h                                  |    4 +-
 src/language/automake.mk                           |    6 -
 src/language/command.c                             |  158 +-
 src/language/command.def                           |   11 +-
 src/language/control/automake.mk                   |    3 +-
 src/language/control/do-if.c                       |   12 +-
 src/language/control/loop.c                        |    4 +-
 src/language/control/repeat.c                      |  714 +++-----
 src/language/control/temporary.c                   |    6 +-
 src/language/data-io/combine-files.c               |   21 +-
 src/language/data-io/data-list.c                   |    5 +-
 src/language/data-io/data-parser.c                 |    9 +-
 src/language/data-io/data-reader.c                 |   58 +-
 src/language/data-io/file-handle.q                 |   29 +-
 src/language/data-io/get-data.c                    |    8 +-
 src/language/data-io/inpt-pgm.c                    |   30 +-
 src/language/data-io/save-translate.c              |    2 +
 src/language/data-io/trim.c                        |    5 +-
 src/language/dictionary/apply-dictionary.c         |   11 +-
 src/language/dictionary/attributes.c               |   69 +-
 src/language/dictionary/missing-values.c           |   44 +-
 src/language/dictionary/modify-variables.c         |    4 +-
 src/language/dictionary/mrsets.c                   |   27 +-
 src/language/dictionary/numeric.c                  |   12 +-
 src/language/dictionary/rename-variables.c         |    3 +-
 src/language/dictionary/split-file.c               |    4 +-
 src/language/dictionary/sys-file-info.c            |   18 +-
 src/language/dictionary/value-labels.c             |   22 +-
 src/language/dictionary/variable-label.c           |   17 +-
 src/language/dictionary/vector.c                   |   13 +-
 src/language/dictionary/weight.c                   |    4 +-
 src/language/expressions/parse.c                   |   34 +-
 src/language/expressions/private.h                 |    5 +-
 src/language/lexer/automake.mk                     |    8 +
 src/language/lexer/include-path.c                  |   89 +
 .../msg-ui.h => language/lexer/include-path.h}     |   20 +-
 src/language/lexer/lexer.c                         | 2143 +++++++++++---------
 src/language/lexer/lexer.h                         |  158 ++-
 src/language/lexer/q2c.c                           |    6 +-
 src/language/lexer/scan.c                          |  596 ++++++
 src/language/lexer/scan.h                          |   93 +
 src/language/lexer/segment.c                       | 1631 +++++++++++++++
 src/language/lexer/segment.h                       |  122 ++
 src/language/lexer/token.c                         |  173 ++
 src/language/{prompt.h => lexer/token.h}           |   36 +-
 src/language/lexer/value-parser.c                  |    2 +-
 src/language/lexer/variable-parser.c               |   17 +-
 src/language/lexer/variable-parser.h               |   10 +-
 src/language/prompt.c                              |   75 -
 src/language/stats/aggregate.c                     |   23 +-
 src/language/stats/autorecode.c                    |    3 +-
 src/language/stats/descriptives.c                  |   23 +-
 src/language/stats/flip.c                          |    8 +-
 src/language/stats/frequencies.q                   |   25 +-
 src/language/stats/npar.c                          |   77 +-
 src/language/stats/rank.q                          |   12 +-
 src/language/stats/sort-cases.c                    |    2 +-
 src/language/syntax-file.c                         |  144 --
 src/language/syntax-file.h                         |   25 -
 src/language/syntax-string-source.c                |  151 --
 src/language/syntax-string-source.h                |   33 -
 src/language/tests/format-guesser-test.c           |    2 +-
 src/language/tests/moments-test.c                  |    2 +-
 src/language/tests/paper-size.c                    |    2 +-
 src/language/utilities/cache.c                     |    4 +-
 src/language/utilities/cd.c                        |    8 +-
 src/language/utilities/date.c                      |    4 +-
 src/language/utilities/host.c                      |   14 +-
 src/language/utilities/include.c                   |  198 +-
 src/language/utilities/permissions.c               |   13 +-
 src/language/utilities/set.q                       |   13 +-
 src/language/utilities/title.c                     |   92 +-
 src/language/xforms/compute.c                      |    6 +-
 src/language/xforms/count.c                        |   26 +-
 src/language/xforms/fail.c                         |    8 +-
 src/language/xforms/recode.c                       |   45 +-
 src/language/xforms/sample.c                       |    4 +-
 src/language/xforms/select-if.c                    |    2 +-
 src/libpspp/automake.mk                            |   10 +-
 src/libpspp/encoding-guesser.c                     |  289 +++
 src/libpspp/encoding-guesser.h                     |  126 ++
 src/libpspp/getl.c                                 |  271 ---
 src/libpspp/getl.h                                 |  113 -
 src/libpspp/hash-functions.c                       |   18 +-
 src/libpspp/hash-functions.h                       |    1 +
 src/libpspp/i18n.c                                 |  345 ++++-
 src/libpspp/i18n.h                                 |   87 +-
 src/libpspp/message.c                              |  118 +-
 src/libpspp/message.h                              |   23 +-
 src/libpspp/msg-locator.c                          |   87 -
 src/libpspp/msg-locator.h                          |   34 -
 src/{ui/terminal/msg-ui.c => libpspp/prompt.c}     |   41 +-
 src/{language => libpspp}/prompt.h                 |   17 +-
 src/libpspp/str.c                                  |   54 +-
 src/libpspp/str.h                                  |   11 +-
 src/libpspp/u8-istream.c                           |  475 +++++
 src/libpspp/u8-istream.h                           |   45 +
 src/output/driver.c                                |   56 +-
 src/output/text-item.c                             |   12 +-
 src/output/text-item.h                             |    3 +-
 src/ui/gui/automake.mk                             |    2 -
 src/ui/gui/comments-dialog.c                       |   13 +-
 src/ui/gui/executor.c                              |   19 +-
 src/ui/gui/executor.h                              |    4 +-
 src/ui/gui/main.c                                  |   39 +-
 src/ui/gui/psppire-data-window.c                   |   16 +-
 src/ui/gui/psppire-dict.c                          |    9 +-
 src/ui/gui/psppire-syntax-window.c                 |   16 +-
 src/ui/gui/psppire-syntax-window.h                 |    1 -
 src/ui/gui/psppire-var-store.c                     |    7 +-
 src/ui/gui/psppire.c                               |   34 +-
 src/ui/gui/psppire.h                               |    6 +-
 src/ui/gui/syntax-editor-source.c                  |  130 --
 src/ui/gui/syntax-editor-source.h                  |   34 -
 src/ui/gui/text-data-import-dialog.c               |    4 +-
 src/ui/source-init-opts.c                          |   20 +-
 src/ui/source-init-opts.h                          |    4 +-
 src/ui/terminal/automake.mk                        |   13 +-
 src/ui/terminal/main.c                             |  113 +-
 src/ui/terminal/read-line.h                        |   31 -
 src/ui/terminal/terminal-opts.c                    |   58 +-
 src/ui/terminal/terminal-opts.h                    |    8 +-
 src/ui/terminal/{read-line.c => terminal-reader.c} |  310 ++--
 .../repeat.h => ui/terminal/terminal-reader.h}     |   11 +-
 tests/automake.mk                                  |   43 +
 tests/data/data-in.at                              |   64 +-
 tests/data/sys-file-reader.at                      |  295 +--
 tests/dissect-sysfile.c                            |    8 +-
 tests/language/control/do-repeat.at                |  101 +-
 tests/language/data-io/data-list.at                |    6 +-
 tests/language/data-io/get.at                      |    2 -
 tests/language/data-io/inpt-pgm.at                 |    6 +-
 tests/language/data-io/print.at                    |    2 +-
 tests/language/dictionary/missing-values.at        |    2 +-
 tests/language/dictionary/sys-file-info.at         |    4 +-
 tests/language/expressions/evaluate.at             |    9 +-
 tests/language/expressions/parse.at                |    2 +-
 tests/language/lexer/lexer.at                      |   43 +
 tests/language/lexer/q2c.at                        |    2 +-
 tests/language/lexer/scan-test.c                   |  217 ++
 tests/language/lexer/scan.at                       |  818 ++++++++
 tests/language/lexer/segment-test.c                |  318 +++
 tests/language/lexer/segment.at                    | 1070 ++++++++++
 tests/language/stats/aggregate.at                  |    4 +-
 tests/language/stats/rank.at                       |    6 +-
 tests/language/utilities/insert.at                 |   29 +-
 tests/libpspp/encoding-guesser-test.c              |  102 +
 tests/libpspp/encoding-guesser.at                  |  143 ++
 tests/libpspp/i18n-test.c                          |   68 +-
 tests/libpspp/i18n.at                              |  125 +-
 tests/libpspp/u8-istream-test.c                    |  126 ++
 tests/libpspp/u8-istream.at                        |  142 ++
 182 files changed, 12211 insertions(+), 5364 deletions(-)
 create mode 100644 src/data/identifier2.c
 create mode 100644 src/language/lexer/include-path.c
 rename src/{ui/terminal/msg-ui.h => language/lexer/include-path.h} (66%)
 create mode 100644 src/language/lexer/scan.c
 create mode 100644 src/language/lexer/scan.h
 create mode 100644 src/language/lexer/segment.c
 create mode 100644 src/language/lexer/segment.h
 create mode 100644 src/language/lexer/token.c
 copy src/language/{prompt.h => lexer/token.h} (50%)
 delete mode 100644 src/language/prompt.c
 delete mode 100644 src/language/syntax-file.c
 delete mode 100644 src/language/syntax-file.h
 delete mode 100644 src/language/syntax-string-source.c
 delete mode 100644 src/language/syntax-string-source.h
 create mode 100644 src/libpspp/encoding-guesser.c
 create mode 100644 src/libpspp/encoding-guesser.h
 delete mode 100644 src/libpspp/getl.c
 delete mode 100644 src/libpspp/getl.h
 delete mode 100644 src/libpspp/msg-locator.c
 delete mode 100644 src/libpspp/msg-locator.h
 rename src/{ui/terminal/msg-ui.c => libpspp/prompt.c} (58%)
 rename src/{language => libpspp}/prompt.h (69%)
 create mode 100644 src/libpspp/u8-istream.c
 create mode 100644 src/libpspp/u8-istream.h
 delete mode 100644 src/ui/gui/syntax-editor-source.c
 delete mode 100644 src/ui/gui/syntax-editor-source.h
 delete mode 100644 src/ui/terminal/read-line.h
 rename src/ui/terminal/{read-line.c => terminal-reader.c} (53%)
 rename src/{language/control/repeat.h => ui/terminal/terminal-reader.h} (77%)
 create mode 100644 tests/language/lexer/scan-test.c
 create mode 100644 tests/language/lexer/scan.at
 create mode 100644 tests/language/lexer/segment-test.c
 create mode 100644 tests/language/lexer/segment.at
 create mode 100644 tests/libpspp/encoding-guesser-test.c
 create mode 100644 tests/libpspp/encoding-guesser.at
 create mode 100644 tests/libpspp/u8-istream-test.c
 create mode 100644 tests/libpspp/u8-istream.at

-- 
1.7.2.3




reply via email to

[Prev in Thread] Current Thread [Next in Thread]