help-bison
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Two-pass parser or AST with Bison?


From: Evan Lavelle
Subject: Re: Two-pass parser or AST with Bison?
Date: Tue, 06 Jan 2009 09:57:34 +0000
User-agent: Thunderbird 2.0.0.19 (Windows/20081209)

I had this problem some years ago, when I was first using Bison. I had a simple language, which I could handle in a single parser pass; this was easy, and I could do everything in the Bison actions.

I then added some functionality, which was primarily forward-referencing of externals (mainly function calls, like you). I fixed this quickly by adding a second Bison pass; the parser re-scanned the original source code (I didn't bother saving and re-scanning flex tokens, as you're suggesting).

(2) (for two-pass:)
    Don't want to surround every single rule with an if like this:
    E = E '+' E {
        if(global.pass==1) {
            // do nothing in first pass
        } else {
            code_append(OP_ADD);
        }
    }

I had a lot of this, but I mainly put it in the C++ code called from the actions, where it was a trivial addition - if you're in the wrong pass, you just return. This keeps the grammar clean. Alternatively, you might also consider writing two grammars - in the first pass, you use a simplified grammar which ignores the contents of functions, and which just extracts function names and parameters.

If you're just forward-referencing function names, then two Bison parses is *much*, *much* easier than creating and processing an AST. In my case, the language got more complicated over time, and I needed a third pass. It was only then that I started using an AST - the parser pass created the AST, and subsequent passes manipulated the AST (there were eventually 6 passes).

Using an AST for the first time is hard work and there's a steep learning curve (I used the C++ AST library from ANTLR to make this easier). You should also remember that writing an AST-based compiler is a completely different programming paradigm from a simple parsing exercise. The former is all about data and manipulating data structures; the latter is just actions and functions.

(1) (for the AST:)
Don't want to spell out every single AST node class explicitly, but rather, if possible, derive these from the grammar?

I don't think that would be desirable (even if it's possible). As a programmer, you'll be constantly modifying, adding, or removing node classes to make life easier. For example, some passes might want a new node class because they've simplified part of the tree and want to replace it with an equivalent node, where that node has no equivalent in the source code. Eventually, your AST will transform to something which looks nothing like your source code, so why should it uses node types which are defined by the source?

Good luck -

Evan




reply via email to

[Prev in Thread] Current Thread [Next in Thread]