help-bison
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Lisp parser skeleton for bison proof of concept working


From: Tim Josling
Subject: Lisp parser skeleton for bison proof of concept working
Date: Sat, 05 Jan 2008 21:17:02 +1100

Hi,

I think I mentioned on one of the bison lists a while ago I was thinking
of using bison as a parser generator for a lisp program, which is part
of a COBOL front-end for GCC.

The proof of concept is now working. It is a bit rough around the edges
bit I think it's at a point where it's worth discussing the design and
options.

Contents
1. Have a look!
2. Challenges.
3. Options.
4. Conclusion.

Have a look
-----------
The code is all available for browsing from here:

http://cobolforgcc.cvs.sourceforge.net/cobolforgcc/gcc/gcc/gcb/

The files of interest are

Skeleton for current cvs head (2.3a):
http://cobolforgcc.cvs.sourceforge.net/cobolforgcc/gcc/gcc/gcb/bison-parser-skel-v2.3a.m4?revision=1.1&view=markup

Skeleton for V2.3
http://cobolforgcc.cvs.sourceforge.net/cobolforgcc/gcc/gcc/gcb/bison-parser-skel-v2.3.m4?revision=1.1&view=markup

Parse definition for simplified calc toy application:
http://cobolforgcc.cvs.sourceforge.net/cobolforgcc/gcc/gcc/gcb/calc-lisp-parser.lispy?revision=1.1&view=markup

Input file for testing calc:
http://cobolforgcc.cvs.sourceforge.net/cobolforgcc/gcc/gcc/gcb/calc-test-input.txt?revision=1.1&view=markup

Makefile (you may need to change the location of your bison executable)
http://cobolforgcc.cvs.sourceforge.net/cobolforgcc/gcc/gcc/gcb/Makefile?revision=1.1&view=markup

Command file that pretends to be m4 to fix the file name up and to save
the m4 input file, for debugging purposes. It also allows fine control
over the M4 options.
http://cobolforgcc.cvs.sourceforge.net/cobolforgcc/gcc/gcc/gcb/bison_fake_m4.x?revision=1.1&view=markup

This compiles and builds on Ubuntu linux.You can get all the code using
cvs by following the instructions here
http://sourceforge.net/cvs/?group_id=5709

Note the project documentation is out of date. I will be updating it in
the next few days.

Challenges
-----------

1. The use of m4 and the skeletons makes it viable to use bison in this
way, whereas with Flex, with hand coded output, it was really not
possible. M4 makes things a little fragile and harder to debug but there
is no obvious alternative. 

2. Version 2.3a is quite a lot friendlier to those who want to roll
their own skeleton that 2.3. Even so, back-porting to 2.3 only took a
couple of hours.

3. Given the differences between 2.3 and 2.3a it is not possible to have
one skeleton that supports both, as far as I could tell. This means that
I had to detect the version number, as follows

> export BISON_VERSION=`if ${BISON} --version|grep "2.3a" >/dev/null; \>
> then echo -n "2.3a"; \
> else if ${BISON} --version|grep "2.3" >/dev/null; then echo -n "2.3"; \
> else echo -n "_unknown_bison_version"; false; fi; fi`

Any suggestions how to do this better?

4. The fake m4 program was needed for V2.3 to allow skeletons not in the
official directory, or relative to the official directory. V2.3 prefixes
your skeleton file name with the default skeleton directory. Thus
-S /home/hacker/skel1.m4
=> 
/use/share/bison//home/hacker/skel1.m4
This problem is fixed in 2.3a, which is excellent.

5. C-isms. Bison scans the actions, assuming they are C code or other
dialects of C (Java/C++). This is not lisp compatible. 

I put a temporary fix in my actions where I embed a C comment start and
end in Lisp comments, as follows

{
#|/*|#
lisp code can go here as long as there is no "*/" embedded
#|*/|#
}

I think the longer term fix for this is to have the language table drive
a different versions of the scanners or logic within them.

All three scanners (*.l) are impacted.

6. V2.3 generates actions which are C code and which is hard to convert
to lisp code using M4 only. In the process I lose the ability to
directly specify package names, though I can use macros to get around
this. This problem us fixed in v2.3a. The main problem is where it says 

case 27: 
{
...code...
        break;
}
Getting rid of the colon but being selective is the problem. Is there an
m4 way to say 
s/:$//?

7. (Speculation) The bison table structure is quite complex and is by
now to some extent a legacy of earlier times when memory was in very
short supply. It could probably be simplified, I suspect, at a very
affordable cost in memory. This is just speculation because I don't know
how big a simplistic hash table populated to 75% would be. If I end up
not using bison I will know because this would be the approach I would
use writing my own LR1/LALR parser generator.

8. (This one is purely a lisp issue and bison cannot really help). Some
versions of Lisp lack an efficient case macro (equivalent of C switch)
because other alternatives exist. So I actually set up the addresses of
the actions in an array. This means the actions all have to be
functions. The good thing about this is that you could just put the
address of the function in the action, but on the other hand this
divorces the action code from the location in the grammar. If you want
to include the actions literally in the grammar. This means wrapping
them in a literal function definition which looks like this #'(lambda
(parm) code ...)/ As always with lisp, you can use macros to remove
anything repetitive or ugly.

9. I have not implemented all the options fully eg pure versus not pure.
Effectively all the parser are pure because Lisp closures make this
basically costless.

Next Steps
----------

I see three alternatives for a lisp parser 

1. Code from scratch. Get out the dragon book and code from scratch.
This would be about 1,000 lines of Lisp code including driver based on
my experience writing a lexer.

2. Use the approach above. Depending on feedback I could finish this
off, add documentation and fixing the rough edges and submit to include
as part of bison.

3. Bison tables approach. Write a skeleton that just outputs the tables,
in C format, with a user supplied driver. The driver would have access
to all the tables and could output them in any desired format, such as
Lisp (or other languages too). 

The main issues with this approach are

a) Two levels of code generation are being used. No way to keep the line
numbers. 
b) What to do about actions. One approach would be to have each action
look something like this
{ printf(" ... lisp code ...); }. The lisp code would be embedded in C
strings. Then have the skeleton run through all the actions 
for (i=0; i<n; i++) switch () {}...
I have done this sort of thing before with no real problems.

Conclusion
----------

Probably my inclination is to use the skeleton I have written. If there
is interest I will look at polishing it for inclusion in bison in due
course.

Failing that, option 3 is most appealing because it involves the fewest
changes to bison... just a simple skeleton which would be a variant of
yacc.c. 

Any thoughts or comments are most welcome.

Tim Josling







reply via email to

[Prev in Thread] Current Thread [Next in Thread]