Re: PSPP goals

From: Ben Pfaff
Subject: Re: PSPP goals
Date: Tue, 02 Aug 2005 22:37:35 -0700
User-agent: Gnus/5.1007 (Gnus v5.10.7) Emacs/21.4 (gnu/linux)

John Darrington <address@hidden> writes:

> On Tue, Aug 02, 2005 at 09:58:25AM -0700, Ben Pfaff wrote:
>      Jason Stover and I met over lunch yesterday and talked over some
>      of the goals for PSPP.  
> Sorry I couldn't join you.  It's a bit too far to come during lunchtime :)

Jason and I are normally a few thousand miles apart in any case,
but he happened to be on vacation in the area and stopped by
Stanford over lunch.

>      Over the long term, we also wish to provide support to developers who
>      wish to extend PSPP with new statistical procedures, by supplying the
>      following:
>      * Easy-to-use support for parsing language syntax.  Currently,
>        parsing is done by writing "recursive descent" code by hand,
>        with some support for automated parsing of the most common
>        constructs.  We wish to improve the situation by supplying a
>        more complete and flexible parser generator.
> I'd like to abstract this one level further, so that two other
> birds can be killed with the same stone (or at least have the
> framework in place).  Namely, the "friendly" textual and
> graphical interfaces.
> If we think about it carefully, it should be possible to devise
> some kind of 'pspp language description format' (PLDF).  Then
> we can produce suitable tools which can generate:
>  a) the parser;
>  b) the command-line completion code;
>  c) the structure of the GUI elements;
>  d) ... other things that we haven't thought of yet.

So...  I have spent a long time thinking about this problem.  The
parsing is the easiest part of the whole process.  It can be done
with a simple LR parser generator.  In fact that part is already
done--I've written LR parser generators before and simply adapted
code for the purpose.  Lexical analysis is somewhat
context-sensitive but that's not too much of a challenge either.
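
The message does not say which context sensitivity Ben means. One plausible kind, shown here only as an assumption: in SPSS-style syntax just a small set of words (such as ALL, BY, TO and WITH) is reserved, so a word like MEAN is a variable name inside a variable list but a keyword after STATISTICS=. A minimal sketch of a lexer that lets the parser choose the interpretation; every name here (`lex_next`, `enum lex_mode`, and so on) is hypothetical, and the word rules are simplified to letters and digits.

```c
#include <ctype.h>
#include <stddef.h>

/* Token kinds produced by this toy lexer (hypothetical names). */
enum token_type { T_ID, T_KEYWORD, T_PUNCT, T_EOF };

/* Lexer modes, chosen by the parser before each call. */
enum lex_mode {
  LEX_VARIABLES,  /* reading a variable list: every word is an identifier */
  LEX_KEYWORDS    /* reading keywords: known words become T_KEYWORD */
};

struct token {
  enum token_type type;
  char text[32];  /* longer words are truncated */
};

/* Words that are keywords only in LEX_KEYWORDS mode. */
static const char *const keywords[] = { "MEAN", "STDDEV", "MIN", "MAX", NULL };

/* Returns nonzero if S matches a keyword, ignoring case. */
static int is_keyword (const char *s)
{
  for (size_t i = 0; keywords[i] != NULL; i++)
    {
      const char *k = keywords[i];
      size_t j = 0;
      while (k[j] != '\0' && toupper ((unsigned char) s[j]) == k[j])
        j++;
      if (k[j] == '\0' && s[j] == '\0')
        return 1;
    }
  return 0;
}

/* Reads the next token from *INPUT into TOK and advances *INPUT.
   Simplified: words are runs of letters and digits; any other
   non-space character is a one-character punctuator. */
static void lex_next (const char **input, enum lex_mode mode,
                      struct token *tok)
{
  const char *p = *input;
  size_t n = 0;

  while (isspace ((unsigned char) *p))
    p++;
  if (*p == '\0')
    tok->type = T_EOF;
  else if (isalpha ((unsigned char) *p))
    {
      for (; isalnum ((unsigned char) *p); p++)
        if (n < sizeof tok->text - 1)
          tok->text[n++] = *p;
      tok->text[n] = '\0';
      /* The same word is classified differently depending on MODE. */
      tok->type = (mode == LEX_KEYWORDS && is_keyword (tok->text)
                   ? T_KEYWORD : T_ID);
    }
  else
    {
      tok->text[n++] = *p++;
      tok->type = T_PUNCT;
    }
  tok->text[n] = '\0';
  *input = p;
}
```

In this sketch the parser, which knows whether it is inside a variable list, passes LEX_VARIABLES there and LEX_KEYWORDS after STATISTICS=, so the lexer itself stays context-free.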

The tricky parts are after parsing.  The first tricky part is
constraints.  Many PSPP commands have constraints that cannot be
easily expressed as a part of a context-free grammar.  Some
subcommands may be expressed at most once; others, at least once;
still others, exactly once.  Sometimes the presence of one
subcommand means that another is required, or prohibited.
Sometimes there are ordering constraints.  And so on...  It is
probably not an unmanageable problem, but there are many
possibilities.  It is difficult to draw the line between what
should be expressed grammatically and what should be checked
explicitly in C code later.
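
One simple way to handle the occurrence constraints outside the grammar, sketched here as an assumption rather than as PSPP's approach: the parser only counts how often each subcommand appeared, and a rule table is checked afterward. The rules loosely mirror the `Once VARIABLES` and `Max-once SAVE, SORT` directives in the DESCRIPTIVES grammar; the REQUIRES rule is purely illustrative, and all names are hypothetical.

```c
#include <stdio.h>

/* Subcommands of a hypothetical command. */
enum subcmd { SC_VARIABLES, SC_MISSING, SC_SAVE, SC_SORT, SC_FORMAT, N_SUBCMDS };

static const char *const subcmd_names[N_SUBCMDS] = {
  "VARIABLES", "MISSING", "SAVE", "SORT", "FORMAT"
};

enum rule_kind {
  RULE_MIN,       /* A must appear at least N times */
  RULE_MAX,       /* A may appear at most N times */
  RULE_REQUIRES,  /* if A appears, B must appear too */
  RULE_EXCLUDES   /* A and B may not both appear */
};

struct rule {
  enum rule_kind kind;
  enum subcmd a, b;  /* B is used only by REQUIRES and EXCLUDES */
  int n;             /* bound used only by MIN and MAX */
};

/* "Once VARIABLES" becomes MIN 1 plus MAX 1; "Max-once SAVE, SORT"
   becomes two MAX 1 rules. */
static const struct rule rules[] = {
  { RULE_MIN, SC_VARIABLES, SC_VARIABLES, 1 },
  { RULE_MAX, SC_VARIABLES, SC_VARIABLES, 1 },
  { RULE_MAX, SC_SAVE, SC_SAVE, 1 },
  { RULE_MAX, SC_SORT, SC_SORT, 1 },
  /* Illustrative only: not a documented DESCRIPTIVES rule. */
  { RULE_REQUIRES, SC_SORT, SC_VARIABLES, 0 },
};

/* Checks COUNTS, the number of times each subcommand was seen, against
   RULES.  Prints a message per violation and returns how many there were. */
static int check_constraints (const int counts[N_SUBCMDS])
{
  int errors = 0;
  for (size_t i = 0; i < sizeof rules / sizeof *rules; i++)
    {
      const struct rule *r = &rules[i];
      int bad = 0;
      switch (r->kind)
        {
        case RULE_MIN: bad = counts[r->a] < r->n; break;
        case RULE_MAX: bad = counts[r->a] > r->n; break;
        case RULE_REQUIRES: bad = counts[r->a] > 0 && counts[r->b] == 0; break;
        case RULE_EXCLUDES: bad = counts[r->a] > 0 && counts[r->b] > 0; break;
        }
      if (bad)
        {
          fprintf (stderr, "rule %zu violated (%s)\n", i, subcmd_names[r->a]);
          errors++;
        }
    }
  return errors;
}
```

Keeping the rules in data rather than in hand-written checks means a generator could emit the table directly from directives such as `Max-once`; ordering constraints would need more than counts.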

The second tricky part is providing the results of the parse to
the procedure.  Traditionally, this would be done with a parse
tree, but this doesn't work very well for PSPP syntax: the code
to extract the command's meaning from the parse tree is, in my
experiments, at least as long and annoying to write as a
hand-written parser.  So there has to be some other way to do it.
I've experimented with several ideas and haven't come up with one
that's quite right.  This is actually the real barrier.  I
haven't come up with anything that makes me happy.  If you have
suggestions, please provide them.
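
The question is left open here. Purely as an illustration (not Ben's design, and not necessarily one of the ideas he tried), one alternative to a parse tree is for the parser's actions to store the command's meaning directly in a structure the procedure already understands. The sketch below hand-codes what such generated code might look like for the `MISSING '=' (VARIABLE | LISTWISE) [INCLUDE]` subcommand of the DESCRIPTIVES grammar; the structure, field names and pre-split token array are all hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Result structure handed to the procedure (hypothetical). */
struct descriptives_cmd {
  enum { MISS_VARIABLE, MISS_LISTWISE } missing;
  bool include_user_missing;  /* set by the optional INCLUDE keyword */
};

/* Case-insensitive comparison of two words. */
static bool word_eq (const char *a, const char *b)
{
  while (*a != '\0' && toupper ((unsigned char) *a) == toupper ((unsigned char) *b))
    a++, b++;
  return *a == '\0' && *b == '\0';
}

/* Parses "MISSING = (VARIABLE | LISTWISE) [INCLUDE]" from the N entries
   of TOKENS, storing the result directly in CMD instead of building a
   tree.  Returns true on success. */
static bool parse_missing (const char *const *tokens, size_t n,
                           struct descriptives_cmd *cmd)
{
  size_t i = 0;
  if (i >= n || !word_eq (tokens[i++], "MISSING"))
    return false;
  if (i >= n || !word_eq (tokens[i++], "="))
    return false;
  if (i < n && word_eq (tokens[i], "VARIABLE"))
    cmd->missing = MISS_VARIABLE;
  else if (i < n && word_eq (tokens[i], "LISTWISE"))
    cmd->missing = MISS_LISTWISE;
  else
    return false;
  i++;
  cmd->include_user_missing = false;
  if (i < n && word_eq (tokens[i], "INCLUDE"))
    {
      cmd->include_user_missing = true;
      i++;
    }
  return i == n;  /* reject trailing tokens */
}
```

The cost of this approach is that every command needs its own structure and actions, which is exactly the per-command code a generator would have to produce from the grammar.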

The parser uses an extended BNF syntax:

    CAPS            keywords and fixed identifiers
    [brackets]      optional
    {repeated}      tokens inside repeated one or more times
    [{repeated}]    tokens inside repeated zero or more times
    {a; b}          `b' repeated one or more times, separated by `a'
    'x'             operator terminals (punctuation such as '=' or '/')
    #name           special terminals (e.g. #var, #id)
    a | b           either `a' or `b'
    (a b)           grouping
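
To make the `{a; b}` notation concrete, here is one way a hand-written recursive-descent routine might recognize it; this is a sketch, not code from PSPP's parser generator. It works over a pre-split token array, takes the element parser as a callback, and also supports an optional separator as in `{[',']; ...}`. The names `parse_separated`, `match` and `identifier` are hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct parser {
  const char *const *tokens;  /* pre-split tokens */
  size_t n, pos;
};

/* Element parser: consumes one `b' and returns true, or returns false
   without consuming anything. */
typedef bool element_fn (struct parser *);

/* Consumes the next token if it equals S. */
static bool match (struct parser *p, const char *s)
{
  if (p->pos < p->n && strcmp (p->tokens[p->pos], s) == 0)
    {
      p->pos++;
      return true;
    }
  return false;
}

/* {SEP; ELEMENT}: one or more ELEMENTs separated by SEP.  If SEP_OPTIONAL,
   the separator may be omitted, as in {[',']; ...}.  Returns the number of
   elements parsed, or 0 (consuming nothing) if there is no first element. */
static size_t parse_separated (struct parser *p, const char *sep,
                               bool sep_optional, element_fn *element)
{
  size_t count = 0;
  if (!element (p))
    return 0;
  count++;
  for (;;)
    {
      size_t save = p->pos;
      bool had_sep = match (p, sep);
      if (!had_sep && !sep_optional)
        break;
      if (!element (p))
        {
          p->pos = save;  /* give back a separator not followed by an element */
          break;
        }
      count++;
    }
  return count;
}

/* Example element: a token that starts with a letter. */
static bool identifier (struct parser *p)
{
  if (p->pos < p->n && isalpha ((unsigned char) p->tokens[p->pos][0]))
    {
      p->pos++;
      return true;
    }
  return false;
}
```

Giving back a dangling separator leaves it for the enclosing rule, which matters when the same punctuation (such as '/') also separates subcommands.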

Here's an example grammar for the DESCRIPTIVES command.

    var-list := {#var [TO #var]} | ALL.
    str-var-list := {#str-var [TO #str-var]} | ALL.
    num-var-list := {#num-var [TO #num-var]} | ALL.
    abstract-var-list := {#id [TO #id]} | ALL.

    Namespace descriptives:: {
      command := {'/'; subcommand}.

      subcommand := [VARIABLES '='] {[',']; (#var [TO #var] | ALL) ['(' #id 

      subcommand := MISSING '=' (VARIABLE | LISTWISE) [INCLUDE].

      subcommand := SAVE.

      subcommand := FORMAT '=' {format}.
      format := LABELS | NOLABELS.
      format := NOINDEX | INDEX.
      format := LINE | SERIAL.

      subcommand := STATISTICS '=' [{statistics}].
      statistics := DEFAULT | ALL.
      statistics := MEAN | STDDEV | MIN | MAX.
      statistics := VARIANCE | SUM | RANGE.
      statistics := SKEWNESS | KURTOSIS.

      subcommand := SORT ['='] [sort-stat] ['(' (A | D) ')'].
      sort-stat := MEAN | SEMEAN | STDDEV | VARIANCE | KURTOSIS.
      sort-stat := SKEWNESS | RANGE | MIN | MAX | SUM | NAME.

      Within subcommand: 
            Enumeration \statistic
            Prefix \STAT_

      Within command {
        Once VARIABLES.
        Exclusive LABELS*.
        Exclusive NOINDEX*.
        Exclusive LINE*.
        Max-once SAVE, SORT.
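
The message does not define the `Enumeration \statistic` and `Prefix \STAT_` directives. Our reading, which is only an assumption, is that they ask the generator to turn the alternatives of `statistics` into a C enumeration whose constants carry the `STAT_` prefix. A minimal sketch of what that output, plus a routine that turns a STATISTICS list into a bit set, might look like; `statistic_lookup` and `parse_statistics` are hypothetical names, and expanding DEFAULT and ALL is left to the procedure.

```c
#include <ctype.h>
#include <stddef.h>

/* Possible expansion of "Enumeration \statistic" with "Prefix \STAT_". */
enum statistic {
  STAT_DEFAULT, STAT_ALL,
  STAT_MEAN, STAT_STDDEV, STAT_MIN, STAT_MAX,
  STAT_VARIANCE, STAT_SUM, STAT_RANGE,
  STAT_SKEWNESS, STAT_KURTOSIS,
  STAT_COUNT                      /* number of keywords */
};

/* Keyword spellings, indexed by enum statistic. */
static const char *const statistic_names[STAT_COUNT] = {
  "DEFAULT", "ALL", "MEAN", "STDDEV", "MIN", "MAX",
  "VARIANCE", "SUM", "RANGE", "SKEWNESS", "KURTOSIS"
};

/* Returns the statistic named WORD (case-insensitive), or -1. */
static int statistic_lookup (const char *word)
{
  for (int s = 0; s < STAT_COUNT; s++)
    {
      const char *k = statistic_names[s];
      size_t i = 0;
      while (k[i] != '\0' && toupper ((unsigned char) word[i]) == k[i])
        i++;
      if (k[i] == '\0' && word[i] == '\0')
        return s;
    }
  return -1;
}

/* Parses the N words after "STATISTICS =" into a bit set with bit S set
   for each statistic S named.  Returns -1 if a word is not a statistic. */
static long parse_statistics (const char *const *words, size_t n)
{
  long set = 0;
  for (size_t i = 0; i < n; i++)
    {
      int s = statistic_lookup (words[i]);
      if (s < 0)
        return -1;
      set |= 1L << s;
    }
  return set;
}
```

With this representation a procedure can test `set & (1L << STAT_MEAN)` rather than comparing strings.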

>      * Eventually, a plug-in interface for procedures.  Over the
>        short term, the interface between the PSPP core and
>        statistical procedures is evolving quickly enough that a
>        plug-in model does not make sense.  Over the long term, it
>        may make sense to introduce plug-ins.
> On the subject of plug-ins, I'd like to see plug-ins for importing/exporting
> system and portable files.  I think this is something that would be
> beneficial in the short term.
> A fully fledged GUI is a long way off, but if we have a plugin that can
> read a gnumeric file then that'll open up PSPP to all those
> command-line-phobic users out there.
> Similarly, if we put [dps]fm-{read,write}.c into a separate library,
> then it'll be straightforward to create a gnumeric plugin to import
> pspp files into gnumeric (or any other application).
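
One possible shape for the import plug-ins John describes, sketched purely as an assumption: each plug-in exposes a table of function pointers, and the core picks one by file-name suffix. None of these names correspond to the real `[dps]fm-{read,write}.c` interfaces, and loading plug-ins from shared objects (for example with `dlopen`) is left out so the sketch stays self-contained.

```c
#include <stddef.h>
#include <string.h>

/* Interface a file-format plug-in would export (hypothetical). */
struct format_plugin {
  const char *name;       /* e.g. "gnumeric" */
  const char *extension;  /* file-name suffix it handles */
  /* Opens FILE_NAME for reading; returns a plug-in-specific handle or NULL. */
  void *(*open_reader) (const char *file_name);
  /* Reads the next case into VALUES (N_VALUES of them); returns 0 at end. */
  int (*read_case) (void *handle, double *values, size_t n_values);
  void (*close) (void *handle);
};

#define MAX_PLUGINS 8
static const struct format_plugin *plugins[MAX_PLUGINS];
static size_t n_plugins;

/* Makes plug-in P known to the core.  Returns 0 on success, -1 if full. */
static int register_plugin (const struct format_plugin *p)
{
  if (n_plugins >= MAX_PLUGINS)
    return -1;
  plugins[n_plugins++] = p;
  return 0;
}

/* Returns the plug-in whose extension matches the end of FILE_NAME,
   or NULL if none does. */
static const struct format_plugin *find_plugin (const char *file_name)
{
  size_t len = strlen (file_name);
  for (size_t i = 0; i < n_plugins; i++)
    {
      size_t ext_len = strlen (plugins[i]->extension);
      if (ext_len <= len
          && strcmp (file_name + len - ext_len, plugins[i]->extension) == 0)
        return plugins[i];
    }
  return NULL;
}
```

Because the core sees only the table, the same reader code could in principle be reused from another program, which is the point of moving it into a separate library.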

This would not be too difficult, I think.  It's just a matter of
"To the engineer, the world is a toy box full of sub-optimized and
 feature-poor toys."
--Scott Adams
