help-bison
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Odd parser behaviour


From: Tim Van Holder
Subject: Re: Odd parser behaviour
Date: Mon, 25 Sep 2006 10:34:49 +0200
User-agent: Thunderbird 1.5.0.7 (Windows/20060909)

Heiko Wundram wrote:
> Am Montag, 25. September 2006 08:49 schrieben Sie:
>> <snip grammar info>
>>
>> Except that it actually takes the entire "opt_C opt_D E" path - if an A
>> is seen, it will reduce opt_C and opt_D then error out on the "E" rule,
>> while it should have reduced up to the "bar" nterm, and only try to
>> error out then (except that foo's action would have YYACCEPTed the
>> grammar before it reaches the point at which an error should be
>> produced).
> 
> How do you expect it to reduce to A before erroring out? There's absolutely 
> no 
> reduction path that justifies
> 
> foo *-> A

I don't expect it to reduce foo, I expect it to reduce bar - it's the
foo rule I expect to error out on an unexpected token (and that error
would be avoided by the YYACCEPT).

> as the only sensible (meaning, justified by a rule) reduction at the point 
> you're referring to is opt_C and opt_D; these are also only reduced because 
> of bison's grammar compression algorithm, because actually the parser should 
> already error out not finding any matching lookahead in state 1.

So without the grammar compression, it would see that the next lookahead
is not valid for any form of xyzzy, and start reducing up to bar before
giving the "unexpected token" error?  Then that's the behaviour I want!

Perhaps this compression should not occur in cases where the lookahead
can come from beyond the end of the %start rule?  It also adversely
affects error reporting - it would say that the grammar expected E when
it did not really expect E any more than C D or B.

I just can't see any case where the current behaviour is beneficial,
which is why I'd want it changed. If there is a good reason for having
the parser behave somewhat counterintuitively in these cases, then fine,
I have no problem rewriting the grammar with some duplicated action code
to avoid the optional leading components (using xyzzy -> CD?E | DE | E,
which is wholly equivalent to C?D?E, things work fine).  But without a
clear benefit, I don't see why I should have to.

[I now see Hans Aberg's reply - so ok, it's an artifact of LALR(1), I
 can live with that - and I hereby second any motion to introduce that
 LR(1) option]

> Bison reduces everything as far as possible before erroring out (that's a 
> general properly of LALR(1)-parsers), but here, there's absolutely no rule 
> for it to do what you want, so the error occurs on the shift on the expected 
> E.
> 
> Maybe, you should change the grammar to  look like the following (which has a 
> path to justify "foo *-> A"):
> 
> bar -> A star_xyzzy
> 
> star_xyzzy ->
>            -> star_xyzzy xyzzy
> 
> That will reduce an erraneous string to A before erroring out.

But that would mean accepting "A(B|C?D?E)*" instead of "A(B|C?D?E)+",
which is not acceptable - "A" and "AF" should both error out, while
"ABCEB" and "ABCDEF" should not.

If seeing a portion of the actual grammar where I encountered the
behaviour would be clearer, let me know and I'll post it.





reply via email to

[Prev in Thread] Current Thread [Next in Thread]