help-bison
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Bison & flex on start conditions


From: Steve Murphy
Subject: Re: Bison & flex on start conditions
Date: Mon, 16 Jan 2006 09:46:48 -0700

On Mon, 2006-01-16 at 08:56 +0000, Evan Lavelle wrote:
> Henrik Sorensen wrote:
> 
> > For the pl1 problem, I cheated a bit with flex, and when certain conditions 
> > are met I simply save all the token met, and then change them to what I 
> > need 
> > before returning the tokens to bison. If you are interested in more feel 
> > free 
> > to contact me, or better look at the code (pl1gcc.sourceforge.net).
> 
> This is great if you can do it, but it means that you're doing the 
> parsing in lex ('when certain conditions are met'). If your analysis is 
> so complex that you can't do it in lex, then you really need to do it in 
> bison, where you run into the potential look-ahead problem. What makes 
> the problem even worse is that you're not even guaranteed that there 
> *is* a look-ahead token: sometimes there is, sometimes there isn't.
> 
> Evan

Bingo!

There are times you can set things up in flex to recognize the start
condition. That's trivial! But when you have to store up a lot of
material in flex, and then return tokens in any of N different ways,
then yes, you are implementing the grammar in the flex scanner to some
degree. Really, in many context-dependant grammars, the grammar is the
pudding in which the different contexts are embedded, and the grammar is
best and simplest place to specify which set of tokens it would like to
see parsed. And because flex and bison have no way to communicate on
this issue, we end up going to sometimes ridiculous (and most likely
buggy) lengths to work around the problem.

Let's suppose you have 3 different grammars, where two of them are
embedded in a third. Let's get semi-formal here. Let's say the embedded
grammars are A and B. And C is the grammar that embeds them. Let's
further suppose that the methods of embedding them vary. Let's say that
some cases, C includes A inside parenthesis, and other times, multiple A
grammars are in parens, seperated by semicolons. And further, B grammars
can appear almost anywhere, including inside A grammars, say wrapped in
{}. There are separate yacc/lex grammars for A and B. All you might want
to do is just gather the strings that compose A and B grammars, and call
the subgrammar parsers on the strings. So you want flex to return a
single token containing the subgrammar text. But let's also assume that
A and B are very similar to C (at least, in some places).

No, you can't just merge B and C into A. They are used in other places
independently of C, and it'd make a real mess to explode them into A.
Even if you wanted to explode them, you really can't. A uses operators,
B doesn't, but the same chars can be included in its tokens. Nope, all 3
grammars have entirely different scanner rules for forming tokens.

And you can't scrap the language and start over (although it'd be nice).

It'd be real cool if the grammar and the scanner could work together on
a context shift. The grammar could "retract" the lookaheads, and ask the
scanner to unput their contents, set the new start condition for the
scanner, and then reget the look-aheads, if they are still necessary.

Now, I see where bison couldn't really know exactly what to ask flex to
unput. That would have to be written by the programmer, or passed with
every token it got, and may have to include the linefeeds and spaces
that would otherwise have been skipped, as one context's garbage may be
another context's gold.

Bison, on the other hand, could at least give access the the list of
look-head tokens at any point in the grammar, so some function could be
written and called by the programmer to unput all that back into flex.

And some directive could be made available to bison grammars, to retract
and re-read the current look-ahead tokens, but all the complexities of
exactly how this might land you in a different parser state than the one
the directives are/were located in are beyond my current ken, if it even
matters much. If all goes well, you should end in the state you wanted
to be at any rate, I guess.

Am I dreaming wildly impossible dreams?

murf


-- 
Steve Murphy <address@hidden>
Electronic Tools Company





reply via email to

[Prev in Thread] Current Thread [Next in Thread]