[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH 00/18] rewrite PSPP lexer
From: |
John Darrington |
Subject: |
Re: [PATCH 00/18] rewrite PSPP lexer |
Date: |
Tue, 22 Mar 2011 11:00:50 +0000 |
User-agent: |
Mutt/1.5.18 (2008-05-17) |
To build this, I had to add the gnulib module memrchr to Smake.
J'
On Sat, Mar 19, 2011 at 05:09:46PM -0700, Ben Pfaff wrote:
These commits, taken together, rip out the PSPP lexer and replace
it by one that is more flexible, easier to use, and better at handling
character encodings.
Ben Pfaff (18):
data-reader: Remove unreachable "return" statements.
file-name: Do not make output files line-buffered in fn_open().
sys-file-reader: Refactor to clean up character encoding support.
output: New function text_item_create_nocopy().
str: New function ss_realloc().
str: Rename ss_chomp() to ss_chomp_byte(), ds_chomp() to
ds_chomp_byte().
str: New functions for checking for and removing string suffixes.
hash-functions: New function hash_case_bytes().
i18n: New function uc_name().
i18n: New function recode_string_len().
i18n: New functions for truncating strings in an arbitrary encoding.
identifier: Rename token_type_to_string() and make a new version.
i18n: New functions and data structure for obtaining encoding info.
encoding-guesser: New library to guess the encoding of a text file.
u8-istream: New library for reading a text file and recoding to
UTF-8.
segment: New library for low-level phase of lexical syntax analysis.
scan: New library for high-level PSPP syntax lexical analysis.
lexer: Reimplement for better testability and internationalization.
NEWS | 46 +-
Smake | 8 +-
doc/dev/concepts.texi | 25 +-
doc/flow-control.texi | 33 +-
doc/invoking.texi | 20 +-
doc/language.texi | 96 +-
doc/utilities.texi | 69 +-
perl-module/PSPP.xs | 14 +-
perl-module/t/Pspp.t | 4 +-
src/data/automake.mk | 1 +
src/data/dictionary.c | 174 +-
src/data/dictionary.h | 14 +-
src/data/file-handle-def.c | 8 +-
src/data/file-name.c | 11 +-
src/data/gnumeric-reader.h | 8 +-
src/data/identifier.c | 124 +-
src/data/identifier.h | 12 +-
src/data/identifier2.c | 133 ++
src/data/mrset.c | 33 +-
src/data/mrset.h | 7 +-
src/data/por-file-reader.c | 25 +-
src/data/por-file-writer.c | 5 +-
src/data/procedure.c | 19 +
src/data/procedure.h | 4 +-
src/data/sys-file-reader.c | 2084
+++++++++++---------
src/data/sys-file-reader.h | 2 +-
src/data/sys-file-writer.c | 20 +-
src/data/variable.c | 139 +-
src/data/variable.h | 7 +-
src/data/vector.c | 15 +-
src/data/vector.h | 4 +-
src/language/automake.mk | 6 -
src/language/command.c | 158 +-
src/language/command.def | 11 +-
src/language/control/automake.mk | 3 +-
src/language/control/do-if.c | 12 +-
src/language/control/loop.c | 4 +-
src/language/control/repeat.c | 714 +++-----
src/language/control/temporary.c | 6 +-
src/language/data-io/combine-files.c | 21 +-
src/language/data-io/data-list.c | 5 +-
src/language/data-io/data-parser.c | 9 +-
src/language/data-io/data-reader.c | 58 +-
src/language/data-io/file-handle.q | 29 +-
src/language/data-io/get-data.c | 8 +-
src/language/data-io/inpt-pgm.c | 30 +-
src/language/data-io/save-translate.c | 2 +
src/language/data-io/trim.c | 5 +-
src/language/dictionary/apply-dictionary.c | 11 +-
src/language/dictionary/attributes.c | 69 +-
src/language/dictionary/missing-values.c | 44 +-
src/language/dictionary/modify-variables.c | 4 +-
src/language/dictionary/mrsets.c | 27 +-
src/language/dictionary/numeric.c | 12 +-
src/language/dictionary/rename-variables.c | 3 +-
src/language/dictionary/split-file.c | 4 +-
src/language/dictionary/sys-file-info.c | 18 +-
src/language/dictionary/value-labels.c | 22 +-
src/language/dictionary/variable-label.c | 17 +-
src/language/dictionary/vector.c | 13 +-
src/language/dictionary/weight.c | 4 +-
src/language/expressions/parse.c | 34 +-
src/language/expressions/private.h | 5 +-
src/language/lexer/automake.mk | 8 +
src/language/lexer/include-path.c | 89 +
.../msg-ui.h => language/lexer/include-path.h} | 20 +-
src/language/lexer/lexer.c | 2143
+++++++++++---------
src/language/lexer/lexer.h | 158 ++-
src/language/lexer/q2c.c | 6 +-
src/language/lexer/scan.c | 596 ++++++
src/language/lexer/scan.h | 93 +
src/language/lexer/segment.c | 1631 +++++++++++++++
src/language/lexer/segment.h | 122 ++
src/language/lexer/token.c | 173 ++
src/language/{prompt.h => lexer/token.h} | 36 +-
src/language/lexer/value-parser.c | 2 +-
src/language/lexer/variable-parser.c | 17 +-
src/language/lexer/variable-parser.h | 10 +-
src/language/prompt.c | 75 -
src/language/stats/aggregate.c | 23 +-
src/language/stats/autorecode.c | 3 +-
src/language/stats/descriptives.c | 23 +-
src/language/stats/flip.c | 8 +-
src/language/stats/frequencies.q | 25 +-
src/language/stats/npar.c | 77 +-
src/language/stats/rank.q | 12 +-
src/language/stats/sort-cases.c | 2 +-
src/language/syntax-file.c | 144 --
src/language/syntax-file.h | 25 -
src/language/syntax-string-source.c | 151 --
src/language/syntax-string-source.h | 33 -
src/language/tests/format-guesser-test.c | 2 +-
src/language/tests/moments-test.c | 2 +-
src/language/tests/paper-size.c | 2 +-
src/language/utilities/cache.c | 4 +-
src/language/utilities/cd.c | 8 +-
src/language/utilities/date.c | 4 +-
src/language/utilities/host.c | 14 +-
src/language/utilities/include.c | 198 +-
src/language/utilities/permissions.c | 13 +-
src/language/utilities/set.q | 13 +-
src/language/utilities/title.c | 92 +-
src/language/xforms/compute.c | 6 +-
src/language/xforms/count.c | 26 +-
src/language/xforms/fail.c | 8 +-
src/language/xforms/recode.c | 45 +-
src/language/xforms/sample.c | 4 +-
src/language/xforms/select-if.c | 2 +-
src/libpspp/automake.mk | 10 +-
src/libpspp/encoding-guesser.c | 289 +++
src/libpspp/encoding-guesser.h | 126 ++
src/libpspp/getl.c | 271 ---
src/libpspp/getl.h | 113 -
src/libpspp/hash-functions.c | 18 +-
src/libpspp/hash-functions.h | 1 +
src/libpspp/i18n.c | 345 ++++-
src/libpspp/i18n.h | 87 +-
src/libpspp/message.c | 118 +-
src/libpspp/message.h | 23 +-
src/libpspp/msg-locator.c | 87 -
src/libpspp/msg-locator.h | 34 -
src/{ui/terminal/msg-ui.c => libpspp/prompt.c} | 41 +-
src/{language => libpspp}/prompt.h | 17 +-
src/libpspp/str.c | 54 +-
src/libpspp/str.h | 11 +-
src/libpspp/u8-istream.c | 475 +++++
src/libpspp/u8-istream.h | 45 +
src/output/driver.c | 56 +-
src/output/text-item.c | 12 +-
src/output/text-item.h | 3 +-
src/ui/gui/automake.mk | 2 -
src/ui/gui/comments-dialog.c | 13 +-
src/ui/gui/executor.c | 19 +-
src/ui/gui/executor.h | 4 +-
src/ui/gui/main.c | 39 +-
src/ui/gui/psppire-data-window.c | 16 +-
src/ui/gui/psppire-dict.c | 9 +-
src/ui/gui/psppire-syntax-window.c | 16 +-
src/ui/gui/psppire-syntax-window.h | 1 -
src/ui/gui/psppire-var-store.c | 7 +-
src/ui/gui/psppire.c | 34 +-
src/ui/gui/psppire.h | 6 +-
src/ui/gui/syntax-editor-source.c | 130 --
src/ui/gui/syntax-editor-source.h | 34 -
src/ui/gui/text-data-import-dialog.c | 4 +-
src/ui/source-init-opts.c | 20 +-
src/ui/source-init-opts.h | 4 +-
src/ui/terminal/automake.mk | 13 +-
src/ui/terminal/main.c | 113 +-
src/ui/terminal/read-line.h | 31 -
src/ui/terminal/terminal-opts.c | 58 +-
src/ui/terminal/terminal-opts.h | 8 +-
src/ui/terminal/{read-line.c => terminal-reader.c} | 310 ++--
.../repeat.h => ui/terminal/terminal-reader.h} | 11 +-
tests/automake.mk | 43 +
tests/data/data-in.at | 64 +-
tests/data/sys-file-reader.at | 295 +--
tests/dissect-sysfile.c | 8 +-
tests/language/control/do-repeat.at | 101 +-
tests/language/data-io/data-list.at | 6 +-
tests/language/data-io/get.at | 2 -
tests/language/data-io/inpt-pgm.at | 6 +-
tests/language/data-io/print.at | 2 +-
tests/language/dictionary/missing-values.at | 2 +-
tests/language/dictionary/sys-file-info.at | 4 +-
tests/language/expressions/evaluate.at | 9 +-
tests/language/expressions/parse.at | 2 +-
tests/language/lexer/lexer.at | 43 +
tests/language/lexer/q2c.at | 2 +-
tests/language/lexer/scan-test.c | 217 ++
tests/language/lexer/scan.at | 818 ++++++++
tests/language/lexer/segment-test.c | 318 +++
tests/language/lexer/segment.at | 1070 ++++++++++
tests/language/stats/aggregate.at | 4 +-
tests/language/stats/rank.at | 6 +-
tests/language/utilities/insert.at | 29 +-
tests/libpspp/encoding-guesser-test.c | 102 +
tests/libpspp/encoding-guesser.at | 143 ++
tests/libpspp/i18n-test.c | 68 +-
tests/libpspp/i18n.at | 125 +-
tests/libpspp/u8-istream-test.c | 126 ++
tests/libpspp/u8-istream.at | 142 ++
182 files changed, 12211 insertions(+), 5364 deletions(-)
create mode 100644 src/data/identifier2.c
create mode 100644 src/language/lexer/include-path.c
rename src/{ui/terminal/msg-ui.h => language/lexer/include-path.h} (66%)
create mode 100644 src/language/lexer/scan.c
create mode 100644 src/language/lexer/scan.h
create mode 100644 src/language/lexer/segment.c
create mode 100644 src/language/lexer/segment.h
create mode 100644 src/language/lexer/token.c
copy src/language/{prompt.h => lexer/token.h} (50%)
delete mode 100644 src/language/prompt.c
delete mode 100644 src/language/syntax-file.c
delete mode 100644 src/language/syntax-file.h
delete mode 100644 src/language/syntax-string-source.c
delete mode 100644 src/language/syntax-string-source.h
create mode 100644 src/libpspp/encoding-guesser.c
create mode 100644 src/libpspp/encoding-guesser.h
delete mode 100644 src/libpspp/getl.c
delete mode 100644 src/libpspp/getl.h
delete mode 100644 src/libpspp/msg-locator.c
delete mode 100644 src/libpspp/msg-locator.h
rename src/{ui/terminal/msg-ui.c => libpspp/prompt.c} (58%)
rename src/{language => libpspp}/prompt.h (69%)
create mode 100644 src/libpspp/u8-istream.c
create mode 100644 src/libpspp/u8-istream.h
delete mode 100644 src/ui/gui/syntax-editor-source.c
delete mode 100644 src/ui/gui/syntax-editor-source.h
delete mode 100644 src/ui/terminal/read-line.h
rename src/ui/terminal/{read-line.c => terminal-reader.c} (53%)
rename src/{language/control/repeat.h => ui/terminal/terminal-reader.h}
(77%)
create mode 100644 tests/language/lexer/scan-test.c
create mode 100644 tests/language/lexer/scan.at
create mode 100644 tests/language/lexer/segment-test.c
create mode 100644 tests/language/lexer/segment.at
create mode 100644 tests/libpspp/encoding-guesser-test.c
create mode 100644 tests/libpspp/encoding-guesser.at
create mode 100644 tests/libpspp/u8-istream-test.c
create mode 100644 tests/libpspp/u8-istream.at
--
1.7.2.3
_______________________________________________
pspp-dev mailing list
address@hidden
http://lists.gnu.org/mailman/listinfo/pspp-dev
--
PGP Public key ID: 1024D/2DE827B3
fingerprint = 8797 A26D 0854 2EAB 0285 A290 8A67 719C 2DE8 27B3
See http://pgp.mit.edu or any PGP keyserver for public key.
signature.asc
Description: Digital signature
- [PATCH 02/18] file-name: Do not make output files line-buffered in fn_open()., (continued)
- [PATCH 02/18] file-name: Do not make output files line-buffered in fn_open()., Ben Pfaff, 2011/03/19
- [PATCH 12/18] identifier: Rename token_type_to_string() and make a new version., Ben Pfaff, 2011/03/19
- [PATCH 08/18] hash-functions: New function hash_case_bytes()., Ben Pfaff, 2011/03/19
- [PATCH 11/18] i18n: New functions for truncating strings in an arbitrary encoding., Ben Pfaff, 2011/03/19
- [PATCH 17/18] scan: New library for high-level PSPP syntax lexical analysis., Ben Pfaff, 2011/03/19
- [PATCH 15/18] u8-istream: New library for reading a text file and recoding to UTF-8., Ben Pfaff, 2011/03/19
- [PATCH 16/18] segment: New library for low-level phase of lexical syntax analysis., Ben Pfaff, 2011/03/19
- [PATCH 03/18] sys-file-reader: Refactor to clean up character encoding support., Ben Pfaff, 2011/03/19
- Re: [PATCH 00/18] rewrite PSPP lexer, John Darrington, 2011/03/20
- Re: [PATCH 00/18] rewrite PSPP lexer,
John Darrington <=