[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH 00/18] rewrite PSPP lexer
From: |
John Darrington |
Subject: |
Re: [PATCH 00/18] rewrite PSPP lexer |
Date: |
Sun, 20 Mar 2011 09:58:03 +0000 |
User-agent: |
Mutt/1.5.18 (2008-05-17) |
I had a look through all these patches and don't see any major
problems.
Minor issues:
* I see that a lot of new files have the copyright year as 2010, 2011.
As I understand it, the years listed are supposed to be those in which
the changes were released (where "released" means made available to the
public in any way), not necessarily the years in which the code was
written. So if this is the first time you're made these changes public,
then 2010 is inappropriate.
* In light of recent discussions, the case insensitive hash should perhaps
use the C locale toupper instead of the locale dependent one. On the other
hand the particular use case might require locale dependency. - On the
third hand all sorts of things might break, if an string is placed into a
hash table in oen locale, and lookup is performed in another, all hell
could break loose. I think this deserves further consideration.
* In my opinion the test in i18n-test.c is tautological. It's not testing
anything except that you didn't make any typos in the recode_string_len
function. Maybe a short table of example strings with their expected lengths
would
be more useful?
* In the comment within encoding-guesser, there is the word "other;" which
seems to be erroneous.
J'
On Sat, Mar 19, 2011 at 05:09:46PM -0700, Ben Pfaff wrote:
These commits, taken together, rip out the PSPP lexer and replace
it by one that is more flexible, easier to use, and better at handling
character encodings.
Ben Pfaff (18):
data-reader: Remove unreachable "return" statements.
file-name: Do not make output files line-buffered in fn_open().
sys-file-reader: Refactor to clean up character encoding support.
output: New function text_item_create_nocopy().
str: New function ss_realloc().
str: Rename ss_chomp() to ss_chomp_byte(), ds_chomp() to
ds_chomp_byte().
str: New functions for checking for and removing string suffixes.
hash-functions: New function hash_case_bytes().
i18n: New function uc_name().
i18n: New function recode_string_len().
i18n: New functions for truncating strings in an arbitrary encoding.
identifier: Rename token_type_to_string() and make a new version.
i18n: New functions and data structure for obtaining encoding info.
encoding-guesser: New library to guess the encoding of a text file.
u8-istream: New library for reading a text file and recoding to
UTF-8.
segment: New library for low-level phase of lexical syntax analysis.
scan: New library for high-level PSPP syntax lexical analysis.
lexer: Reimplement for better testability and internationalization.
NEWS | 46 +-
Smake | 8 +-
doc/dev/concepts.texi | 25 +-
doc/flow-control.texi | 33 +-
doc/invoking.texi | 20 +-
doc/language.texi | 96 +-
doc/utilities.texi | 69 +-
perl-module/PSPP.xs | 14 +-
perl-module/t/Pspp.t | 4 +-
src/data/automake.mk | 1 +
src/data/dictionary.c | 174 +-
src/data/dictionary.h | 14 +-
src/data/file-handle-def.c | 8 +-
src/data/file-name.c | 11 +-
src/data/gnumeric-reader.h | 8 +-
src/data/identifier.c | 124 +-
src/data/identifier.h | 12 +-
src/data/identifier2.c | 133 ++
src/data/mrset.c | 33 +-
src/data/mrset.h | 7 +-
src/data/por-file-reader.c | 25 +-
src/data/por-file-writer.c | 5 +-
src/data/procedure.c | 19 +
src/data/procedure.h | 4 +-
src/data/sys-file-reader.c | 2084
+++++++++++---------
src/data/sys-file-reader.h | 2 +-
src/data/sys-file-writer.c | 20 +-
src/data/variable.c | 139 +-
src/data/variable.h | 7 +-
src/data/vector.c | 15 +-
src/data/vector.h | 4 +-
src/language/automake.mk | 6 -
src/language/command.c | 158 +-
src/language/command.def | 11 +-
src/language/control/automake.mk | 3 +-
src/language/control/do-if.c | 12 +-
src/language/control/loop.c | 4 +-
src/language/control/repeat.c | 714 +++-----
src/language/control/temporary.c | 6 +-
src/language/data-io/combine-files.c | 21 +-
src/language/data-io/data-list.c | 5 +-
src/language/data-io/data-parser.c | 9 +-
src/language/data-io/data-reader.c | 58 +-
src/language/data-io/file-handle.q | 29 +-
src/language/data-io/get-data.c | 8 +-
src/language/data-io/inpt-pgm.c | 30 +-
src/language/data-io/save-translate.c | 2 +
src/language/data-io/trim.c | 5 +-
src/language/dictionary/apply-dictionary.c | 11 +-
src/language/dictionary/attributes.c | 69 +-
src/language/dictionary/missing-values.c | 44 +-
src/language/dictionary/modify-variables.c | 4 +-
src/language/dictionary/mrsets.c | 27 +-
src/language/dictionary/numeric.c | 12 +-
src/language/dictionary/rename-variables.c | 3 +-
src/language/dictionary/split-file.c | 4 +-
src/language/dictionary/sys-file-info.c | 18 +-
src/language/dictionary/value-labels.c | 22 +-
src/language/dictionary/variable-label.c | 17 +-
src/language/dictionary/vector.c | 13 +-
src/language/dictionary/weight.c | 4 +-
src/language/expressions/parse.c | 34 +-
src/language/expressions/private.h | 5 +-
src/language/lexer/automake.mk | 8 +
src/language/lexer/include-path.c | 89 +
.../msg-ui.h => language/lexer/include-path.h} | 20 +-
src/language/lexer/lexer.c | 2143
+++++++++++---------
src/language/lexer/lexer.h | 158 ++-
src/language/lexer/q2c.c | 6 +-
src/language/lexer/scan.c | 596 ++++++
src/language/lexer/scan.h | 93 +
src/language/lexer/segment.c | 1631 +++++++++++++++
src/language/lexer/segment.h | 122 ++
src/language/lexer/token.c | 173 ++
src/language/{prompt.h => lexer/token.h} | 36 +-
src/language/lexer/value-parser.c | 2 +-
src/language/lexer/variable-parser.c | 17 +-
src/language/lexer/variable-parser.h | 10 +-
src/language/prompt.c | 75 -
src/language/stats/aggregate.c | 23 +-
src/language/stats/autorecode.c | 3 +-
src/language/stats/descriptives.c | 23 +-
src/language/stats/flip.c | 8 +-
src/language/stats/frequencies.q | 25 +-
src/language/stats/npar.c | 77 +-
src/language/stats/rank.q | 12 +-
src/language/stats/sort-cases.c | 2 +-
src/language/syntax-file.c | 144 --
src/language/syntax-file.h | 25 -
src/language/syntax-string-source.c | 151 --
src/language/syntax-string-source.h | 33 -
src/language/tests/format-guesser-test.c | 2 +-
src/language/tests/moments-test.c | 2 +-
src/language/tests/paper-size.c | 2 +-
src/language/utilities/cache.c | 4 +-
src/language/utilities/cd.c | 8 +-
src/language/utilities/date.c | 4 +-
src/language/utilities/host.c | 14 +-
src/language/utilities/include.c | 198 +-
src/language/utilities/permissions.c | 13 +-
src/language/utilities/set.q | 13 +-
src/language/utilities/title.c | 92 +-
src/language/xforms/compute.c | 6 +-
src/language/xforms/count.c | 26 +-
src/language/xforms/fail.c | 8 +-
src/language/xforms/recode.c | 45 +-
src/language/xforms/sample.c | 4 +-
src/language/xforms/select-if.c | 2 +-
src/libpspp/automake.mk | 10 +-
src/libpspp/encoding-guesser.c | 289 +++
src/libpspp/encoding-guesser.h | 126 ++
src/libpspp/getl.c | 271 ---
src/libpspp/getl.h | 113 -
src/libpspp/hash-functions.c | 18 +-
src/libpspp/hash-functions.h | 1 +
src/libpspp/i18n.c | 345 ++++-
src/libpspp/i18n.h | 87 +-
src/libpspp/message.c | 118 +-
src/libpspp/message.h | 23 +-
src/libpspp/msg-locator.c | 87 -
src/libpspp/msg-locator.h | 34 -
src/{ui/terminal/msg-ui.c => libpspp/prompt.c} | 41 +-
src/{language => libpspp}/prompt.h | 17 +-
src/libpspp/str.c | 54 +-
src/libpspp/str.h | 11 +-
src/libpspp/u8-istream.c | 475 +++++
src/libpspp/u8-istream.h | 45 +
src/output/driver.c | 56 +-
src/output/text-item.c | 12 +-
src/output/text-item.h | 3 +-
src/ui/gui/automake.mk | 2 -
src/ui/gui/comments-dialog.c | 13 +-
src/ui/gui/executor.c | 19 +-
src/ui/gui/executor.h | 4 +-
src/ui/gui/main.c | 39 +-
src/ui/gui/psppire-data-window.c | 16 +-
src/ui/gui/psppire-dict.c | 9 +-
src/ui/gui/psppire-syntax-window.c | 16 +-
src/ui/gui/psppire-syntax-window.h | 1 -
src/ui/gui/psppire-var-store.c | 7 +-
src/ui/gui/psppire.c | 34 +-
src/ui/gui/psppire.h | 6 +-
src/ui/gui/syntax-editor-source.c | 130 --
src/ui/gui/syntax-editor-source.h | 34 -
src/ui/gui/text-data-import-dialog.c | 4 +-
src/ui/source-init-opts.c | 20 +-
src/ui/source-init-opts.h | 4 +-
src/ui/terminal/automake.mk | 13 +-
src/ui/terminal/main.c | 113 +-
src/ui/terminal/read-line.h | 31 -
src/ui/terminal/terminal-opts.c | 58 +-
src/ui/terminal/terminal-opts.h | 8 +-
src/ui/terminal/{read-line.c => terminal-reader.c} | 310 ++--
.../repeat.h => ui/terminal/terminal-reader.h} | 11 +-
tests/automake.mk | 43 +
tests/data/data-in.at | 64 +-
tests/data/sys-file-reader.at | 295 +--
tests/dissect-sysfile.c | 8 +-
tests/language/control/do-repeat.at | 101 +-
tests/language/data-io/data-list.at | 6 +-
tests/language/data-io/get.at | 2 -
tests/language/data-io/inpt-pgm.at | 6 +-
tests/language/data-io/print.at | 2 +-
tests/language/dictionary/missing-values.at | 2 +-
tests/language/dictionary/sys-file-info.at | 4 +-
tests/language/expressions/evaluate.at | 9 +-
tests/language/expressions/parse.at | 2 +-
tests/language/lexer/lexer.at | 43 +
tests/language/lexer/q2c.at | 2 +-
tests/language/lexer/scan-test.c | 217 ++
tests/language/lexer/scan.at | 818 ++++++++
tests/language/lexer/segment-test.c | 318 +++
tests/language/lexer/segment.at | 1070 ++++++++++
tests/language/stats/aggregate.at | 4 +-
tests/language/stats/rank.at | 6 +-
tests/language/utilities/insert.at | 29 +-
tests/libpspp/encoding-guesser-test.c | 102 +
tests/libpspp/encoding-guesser.at | 143 ++
tests/libpspp/i18n-test.c | 68 +-
tests/libpspp/i18n.at | 125 +-
tests/libpspp/u8-istream-test.c | 126 ++
tests/libpspp/u8-istream.at | 142 ++
182 files changed, 12211 insertions(+), 5364 deletions(-)
create mode 100644 src/data/identifier2.c
create mode 100644 src/language/lexer/include-path.c
rename src/{ui/terminal/msg-ui.h => language/lexer/include-path.h} (66%)
create mode 100644 src/language/lexer/scan.c
create mode 100644 src/language/lexer/scan.h
create mode 100644 src/language/lexer/segment.c
create mode 100644 src/language/lexer/segment.h
create mode 100644 src/language/lexer/token.c
copy src/language/{prompt.h => lexer/token.h} (50%)
delete mode 100644 src/language/prompt.c
delete mode 100644 src/language/syntax-file.c
delete mode 100644 src/language/syntax-file.h
delete mode 100644 src/language/syntax-string-source.c
delete mode 100644 src/language/syntax-string-source.h
create mode 100644 src/libpspp/encoding-guesser.c
create mode 100644 src/libpspp/encoding-guesser.h
delete mode 100644 src/libpspp/getl.c
delete mode 100644 src/libpspp/getl.h
delete mode 100644 src/libpspp/msg-locator.c
delete mode 100644 src/libpspp/msg-locator.h
rename src/{ui/terminal/msg-ui.c => libpspp/prompt.c} (58%)
rename src/{language => libpspp}/prompt.h (69%)
create mode 100644 src/libpspp/u8-istream.c
create mode 100644 src/libpspp/u8-istream.h
delete mode 100644 src/ui/gui/syntax-editor-source.c
delete mode 100644 src/ui/gui/syntax-editor-source.h
delete mode 100644 src/ui/terminal/read-line.h
rename src/ui/terminal/{read-line.c => terminal-reader.c} (53%)
rename src/{language/control/repeat.h => ui/terminal/terminal-reader.h}
(77%)
create mode 100644 tests/language/lexer/scan-test.c
create mode 100644 tests/language/lexer/scan.at
create mode 100644 tests/language/lexer/segment-test.c
create mode 100644 tests/language/lexer/segment.at
create mode 100644 tests/libpspp/encoding-guesser-test.c
create mode 100644 tests/libpspp/encoding-guesser.at
create mode 100644 tests/libpspp/u8-istream-test.c
create mode 100644 tests/libpspp/u8-istream.at
--
1.7.2.3
_______________________________________________
pspp-dev mailing list
address@hidden
http://lists.gnu.org/mailman/listinfo/pspp-dev
--
PGP Public key ID: 1024D/2DE827B3
fingerprint = 8797 A26D 0854 2EAB 0285 A290 8A67 719C 2DE8 27B3
See http://pgp.mit.edu or any PGP keyserver for public key.
signature.asc
Description: Digital signature
- [PATCH 13/18] i18n: New functions and data structure for obtaining encoding info., (continued)
- [PATCH 13/18] i18n: New functions and data structure for obtaining encoding info., Ben Pfaff, 2011/03/19
- [PATCH 06/18] str: Rename ss_chomp() to ss_chomp_byte(), ds_chomp() to ds_chomp_byte()., Ben Pfaff, 2011/03/19
- [PATCH 02/18] file-name: Do not make output files line-buffered in fn_open()., Ben Pfaff, 2011/03/19
- [PATCH 12/18] identifier: Rename token_type_to_string() and make a new version., Ben Pfaff, 2011/03/19
- [PATCH 08/18] hash-functions: New function hash_case_bytes()., Ben Pfaff, 2011/03/19
- [PATCH 11/18] i18n: New functions for truncating strings in an arbitrary encoding., Ben Pfaff, 2011/03/19
- [PATCH 17/18] scan: New library for high-level PSPP syntax lexical analysis., Ben Pfaff, 2011/03/19
- [PATCH 15/18] u8-istream: New library for reading a text file and recoding to UTF-8., Ben Pfaff, 2011/03/19
- [PATCH 16/18] segment: New library for low-level phase of lexical syntax analysis., Ben Pfaff, 2011/03/19
- [PATCH 03/18] sys-file-reader: Refactor to clean up character encoding support., Ben Pfaff, 2011/03/19
- Re: [PATCH 00/18] rewrite PSPP lexer,
John Darrington <=
- Re: [PATCH 00/18] rewrite PSPP lexer, John Darrington, 2011/03/22