Re: PSPP conference call notes.

From: Ben Pfaff
Subject: Re: PSPP conference call notes.
Date: Thu, 01 Jun 2006 12:15:49 -0700
User-agent: Gnus/5.110004 (No Gnus v0.4) Emacs/21.4 (gnu/linux)

John Darrington <address@hidden> writes:

> GUI 
> === 
> So it seems that the next release will feature the GUI.  Thinking
> about it last night, I'm of the opinion that, in its first public
> appearance, it should favour stability rather than features.
> So I propose that it be released as a "System/Portable file editor".
> Getting it to actually run procedures is another ball game, which can
> start after the next release.

This is OK by me.

> On this basis, I think that the end of 2006 is a reasonable release
> date to aim for.

Well, OK.  At first when I saw this I thought that it sounded
pretty far off, but really it's probably a realistic goal.

> Output Subsystem
> ================
> Intermediate format such as HDLF for output would be nice.  It would
> be essential if the GUI is to have interactive "pivot tables".

That's HDF5, for what it's worth.

> Caching/Saving of Models
> ========================
> This would make the syntax a lot more powerful (and faster) if done
> properly.

Jason and I spent quite a long time talking about this after we
hung up the phone.  I'll let him give the details, if he likes.

> RAD framework
> =============
> Some kind of framework to make development easier for statisticians
> who know only a modicum of programming.  As I see it there are three
> aspects to this:
> 1. Something to build the syntax parser.  I suppose q2c was supposed to
>    do that, but we're all aware of its limitations.

q2c was never meant to be general-purpose.  I wrote it as a quick
hack to make some common kinds of things easier to parse.  I
don't think that it's really extensible into something

I have written a parser generator that is general-purpose enough
to parse the SPSS grammar.  The problem is the format of the
output.  A parser generator most directly outputs a parse tree,
and usually that's just fine, because the parse tree is pretty
easy to transform into the format that the compiler (or other
tool) wants.  Also, there's usually a fairly limited number of
nonterminals; for example, a grammar for C or Pascal would
normally have a few hundred at most.

But this isn't the case for parsing SPSS syntax.  The grammar is
irregular, which means that each individual command needs its own
set of grammar rules.  You can easily output a parse tree, but
extracting the information that the command actually wants to see
from the parse tree takes a lot of work: in fact, in my
experiments it took almost as much code as a hand-written parser
(!).  This may mean that I just haven't come up yet with a good
way to represent the parse tree, or a good API to extract
information from it.  At one time I spent a lot of time thinking
about those two problems, but I didn't come up with a good

In case anyone is curious, here's a sample syntax description in
the format that my parser generator would accept.  It's a lot
like a yacc format grammar, with the addition of [] for optional
elements and {a; b} for lists of one or more "a"s that are
separated by "b"s.  There's also some syntax for specifying how
many times various elements may occur, but I never quite figured
out the best way to do that.

    -- -*- pap -*-

    var-list := {#var [TO #var]} | ALL.
    str-var-list := {#str-var [TO #str-var]} | ALL.
    num-var-list := {#num-var [TO #num-var]} | ALL.
    abstract-var-list := {#id [TO #id]} | ALL.

    Namespace descriptives:: {
      command := {'/'; subcommand}.

      subcommand := [VARIABLES '='] {[',']; (#var [TO #var] | ALL) ['(' #id 

      subcommand := MISSING '=' (VARIABLE | LISTWISE) [INCLUDE].

      subcommand := SAVE.

      subcommand := FORMAT '=' {format}.
      format := LABELS | NOLABELS.
      format := NOINDEX | INDEX.
      format := LINE | SERIAL.

      subcommand := STATISTICS '=' [{statistics}].
      statistics := DEFAULT | ALL.
      statistics := MEAN | STDDEV | MIN | MAX.
      statistics := VARIANCE | SUM | RANGE.
      statistics := SKEWNESS | KURTOSIS.

      subcommand := SORT ['='] [sort-stat] ['(' (A | D) ')'].
      sort-stat := MEAN | SEMEAN | STDDEV | VARIANCE | KURTOSIS.
      sort-stat := SKEWNESS | RANGE | MIN | MAX | SUM | NAME.

      Within subcommand: 
            Enumeration \statistic
            Prefix \STAT_

      Within command {
        Once VARIABLES.
        Exclusive LABELS*.
        Exclusive NOINDEX*.
        Exclusive LINE*.
        Max-once SAVE, SORT.
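
To make the two extensions to yacc notation concrete, here is a minimal Python sketch of how `[]` (optional element) and `{a; b}` (one or more items with a separator) could be read as parser combinators. It is not the parser generator described above, whose code is not shown in this message. Every name in it (`tok`, `var`, `seq`, `opt`, `alt`, `sep_list`, `KEYWORDS`) is hypothetical, and input is assumed to be already split into tokens. The `sep_list(item, sep)` argument order follows the prose description; the sample rules appear to write the separator first, as in `{'/'; subcommand}`.

```python
# Hypothetical sketch only: none of these names come from PSPP or from
# the parser generator discussed in this message.
KEYWORDS = {"TO", "ALL"}

def tok(name):
    """Match one literal token, case-insensitively."""
    def parse(tokens, i):
        if i < len(tokens) and tokens[i].upper() == name:
            return [name], i + 1
        return None
    return parse

def var(tokens, i):
    """Stand-in for #var: any identifier that is not a keyword."""
    if (i < len(tokens) and tokens[i].isidentifier()
            and tokens[i].upper() not in KEYWORDS):
        return [("var", tokens[i])], i + 1
    return None

def seq(*parsers):
    """Match each parser in turn; fail if any one of them fails."""
    def parse(tokens, i):
        out = []
        for p in parsers:
            r = p(tokens, i)
            if r is None:
                return None
            nodes, i = r
            out.extend(nodes)
        return out, i
    return parse

def opt(p):
    """[p]: match p if possible, otherwise match nothing."""
    def parse(tokens, i):
        r = p(tokens, i)
        return r if r is not None else ([], i)
    return parse

def alt(*parsers):
    """p | q: return the first alternative that matches."""
    def parse(tokens, i):
        for p in parsers:
            r = p(tokens, i)
            if r is not None:
                return r
        return None
    return parse

def sep_list(item, sep=None):
    """One or more items, optionally separated by sep (dropped from output)."""
    def parse(tokens, i):
        r = item(tokens, i)
        if r is None:
            return None
        out, i = r
        while True:
            j = i
            if sep is not None:
                s = sep(tokens, j)
                if s is None:
                    break
                j = s[1]
            r = item(tokens, j)
            if r is None or r[1] == j:  # stop on failure or no progress
                break
            out = out + r[0]
            i = r[1]
        return out, i
    return parse

# var-list := {#var [TO #var]} | ALL.
var_list = alt(sep_list(seq(var, opt(seq(tok("TO"), var)))), tok("ALL"))

# A cut-down command: FORMAT and SAVE subcommands separated by '/'.
format_kw = alt(*(tok(k) for k in
                  ("LABELS", "NOLABELS", "NOINDEX", "INDEX", "LINE", "SERIAL")))
subcommand = alt(tok("SAVE"), seq(tok("FORMAT"), tok("="), sep_list(format_kw)))
command = sep_list(subcommand, tok("/"))

print(var_list("a TO c d".split(), 0))
print(command("FORMAT = LABELS INDEX / SAVE".split(), 0))
```

Each parser returns a flat list of matched nodes plus the next token position, or `None` on failure. Note that the result is generic: a command would still have to walk it to find, say, which format keywords were given, which is the extraction cost discussed above.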

> 2. Something to build the output tables.  A glade type application
>    springs to mind.  But it would be a lot of work to do.

I've also thought about this a bit.  It honestly hadn't occurred
to me that an interactive application would be a good way to do
it.  My tendency is always to think about inventing languages to
describe things.

> 3. An easy way to access the raw data, without being exposed to the
>    internals of casefiles etc.

Here are some of the things that I took away from the
conversation, in addition to what you say:

- Need casefile bookmarking/cloning.  Should be easy; I'll take
  care of it soon.

  This will enable RANK.

- Should add log file support and support for sending errors to
  the listing file.  Both should be easy now.

- Should add a command to change the width or type of variables,
  to enable the GUI to do these things.

- Need some work to make Mac OS system files work.  John has an
  example system file (can you file a bug report and attach it?).
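
As a rough illustration of why bookmarking/cloning matters for something like RANK, here is a minimal Python sketch. It assumes sorted input and mean ranks for ties, and it is not PSPP's casefile API: `Reader`, `clone`, `peek`, `advance`, and `mean_ranks` are all hypothetical names. One cursor scans ahead to find the end of a run of tied values while a cloned cursor stays at the start of the run, so that the ranks can be assigned once the run length is known.

```python
class Reader:
    """Hypothetical forward-only cursor over a sequence of values."""

    def __init__(self, cases, pos=0):
        self._cases = cases
        self._pos = pos

    def clone(self):
        """Return an independent cursor at the same position (a bookmark)."""
        return Reader(self._cases, self._pos)

    def peek(self):
        """Return the current value, or None at end of data."""
        if self._pos < len(self._cases):
            return self._cases[self._pos]
        return None

    def advance(self):
        self._pos += 1

    @property
    def pos(self):
        return self._pos


def mean_ranks(sorted_values):
    """Assign 1-based ranks, giving tied values the mean of their ranks.

    Assumes sorted_values is already sorted and contains no None.
    """
    ranks = []
    lead = Reader(sorted_values)
    while lead.peek() is not None:
        mark = lead.clone()          # bookmark the start of the tie run
        value = lead.peek()
        while lead.peek() == value:  # scan ahead to the end of the run
            lead.advance()
        # The run covers 1-based ranks mark.pos + 1 .. lead.pos.
        rank = (mark.pos + 1 + lead.pos) / 2
        while mark.pos < lead.pos:   # replay the run from the bookmark
            ranks.append(rank)
            mark.advance()
    return ranks


print(mean_ranks([10, 20, 20, 30]))
```

Without a way to clone or rewind the reader, the tied cases would have to be buffered separately before their rank could be written.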

Ben Pfaff 
email: address@hidden
