pspp-dev

Re: Lexer woes


From: John Darrington
Subject: Re: Lexer woes
Date: Wed, 24 Sep 2008 15:38:54 +0800
User-agent: Mutt/1.5.13 (2006-08-11)

On Tue, Sep 23, 2008 at 09:19:26PM -0700, Ben Pfaff wrote:

     I also think this would be nice to have.  It is a problem that I
     did some work on a few years ago.  It is not too hard to write a
     parser generator that accepts a context-free grammar for PSPP
     syntax and outputs C code to parse it, and in fact I did most of
     the work necessary for that.

Really? As I understand it, the example that I've given means that the
grammar is not context-free.
     
     The tricky part, which was what stymied me at the time, is in
     fact how you pass the resulting parse tree back to the command
     that wants it in a useful form.  If you do it in any of the ways
     that were obvious to me, it takes a lot of code to traverse the
     parse tree, verify its semantics, and translate it into a form
     that is useful for further processing.  In the cases that I
     looked at, it takes about as much code to do this, in fact, as it
     does to write a parser by hand.  And that is not much of a win.
     
     But I have some newer ideas now that might make it much easier.
     If you have time to work on this and you want to hear some of my
     ideas, or to look over the work-in-progress parser generator code
     that I wrote, then please say so.

It's unlikely that I'll have time to work on it, at least in the near
future.  But I would be interested to see how you did it, so I guess
the OR condition is satisfied.
     
     > But back to the current issue: parsing K-W as three tokens, whilst
     > it will work for the purpose of syntax verification, obviously falls
     > down in the bigger picture.  The obvious solution would have been to
     > allow '-' as a valid character in the T_ID token.  However this means that
     > constructs like
     >
     >  COMPUTE X=Y/K-W.
     >
     > suddenly get misinterpreted.  But so far as I can see, there are only
     > a few special places in SPSS syntax where algebraic expressions like
     > that can occur (in an IF, LOOP, COMPUTE, RECODE and a few others). I
     > wonder if it might not be a better solution to throw the lexer into a
     > different mode when an expression is expected.  Obviously there will
     > be complications (like when to switch back to non-expression mode).
     
     I do not think that this is the right solution to this particular
     problem.  Keywords that include '-' are very rare, but the use of
     identifiers in other circumstances is very common.  We would have
     to add special cases for variable names, file handle names,
     vector names, etc. to disallow the use of '-', and we would gain
     very little.
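
For illustration only, the ambiguity discussed above can be reproduced
with a toy tokenizer.  This is a minimal sketch, not PSPP's lexer: the
names tokenize and dash_in_id are hypothetical, and the identifier rule
is deliberately simplified (letters, digits and '_' only).  The
dash_in_id flag stands in for the "allow '-' in T_ID" rule, or
equivalently for a lexer mode in which expressions are not expected.

```python
import re

# Simplified identifier rules (assumption: real PSPP identifiers allow more).
ID_PLAIN = r"[A-Za-z][A-Za-z0-9_]*"
ID_WITH_DASH = r"[A-Za-z][A-Za-z0-9_]*(?:-[A-Za-z0-9_]+)*"


def tokenize(text, dash_in_id=False):
    """Return a list of (kind, value) tokens for one line of syntax.

    dash_in_id=False treats '-' as punctuation (expression mode);
    dash_in_id=True lets '-' continue an identifier (keyword mode).
    """
    ident = ID_WITH_DASH if dash_in_id else ID_PLAIN
    token_re = re.compile(
        r"(?P<ID>" + ident + r")|(?P<NUM>[0-9]+)|(?P<PUNCT>[-+*/=.()])"
    )
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # skip whitespace between tokens
            continue
        m = token_re.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


if __name__ == "__main__":
    line = "COMPUTE X=Y/K-W."
    print(tokenize(line))                   # K, -, W are three tokens
    print(tokenize(line, dash_in_id=True))  # K-W is a single ID token
```

With dash_in_id=False the expression Y/K-W lexes as intended, but the
keyword K-W arrives as three tokens.  With dash_in_id=True the keyword
is a single token, but the same COMPUTE line now contains an identifier
named "K-W" instead of a subtraction, which is the misinterpretation
described above.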
     

OK.  I'll look at some of the options you suggested.  Thanks.

J'

-- 
PGP Public key ID: 1024D/2DE827B3 
fingerprint = 8797 A26D 0854 2EAB 0285  A290 8A67 719C 2DE8 27B3
See http://pgp.mit.edu or any PGP keyserver for public key.



