help-bison
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Strange behavior


From: Hans Aberg
Subject: Re: Strange behavior
Date: Wed, 31 Mar 2004 01:10:46 +0200

At 09:41 +0200 2004/03/30, Laurent Deniau wrote:
>> I think your question was answered by Tim Van Holder: Due to limitation in
>> lookahead in the LALR(1) parser generator algorithm, one has to fiddle
>> around with the grammar to find a form that works.
>
>Well, I was maybe not very clear in my question but basically I was
>asking why two equivalent rules are not considered as equivalent by
>bison. In both cases, no more lookahead is needed, but it is true that
>more states are created in the second case. But from my point of view it
>is a completely different point. By the way, this happened only for one
>optional rule over more than 20 used and if I try to reproduce it for a
>smaller grammar, the problem vanishes. And looking at the verbose output
>shows clearly that the conflits are not justified, and the behavior is
>correct since the optional rules are standing alone. Otherwise I would
>have post my question to help-bison, not bug-bison.

The thing is that Bison is using LALR(1), which is a cutdown of LR(1) by
achieved compressing certain states. It may fail when LR(1) succeeds (there
are examples of this in the Bison manual, sec. 5.7, "Mysterious
reduce/reduce conflicts"). Also, when an error occurs, LALR(1) may enter a
few more actions relative LR(1) before the error is detected (but it will
not scan for more tokens). In addition (in EBNF), a* b* and empty | a+ | b+
| a+ b+ are equivalent used in grammars, but not when actions are added (as
in Bison).

If you do not get more clarifications in the Bison lists, you may try your
question in the newsgroup comp.compilers, where there are more people good
at working grammars by hand.

>>>>This is not really a Bison question, but a question of implementing a
>>>>language using the LALR(1) parsing algorithm. As your language is
>>>>well-known, it might be worthwhile asking around in the newsgroup
>>>>comp.compilers, or in some of the C newsgroups to see if somebody already
>>>>has done it. Some C compilers might already have it (GNU C?).
>>>
>>>My question has nothing to do with the C99 grammar. It was just to fix
>>>the background, nothing more.
>>
>>
>> So therefore, if the implemented language is known, it can be worth to look
>> around for tricks that may be required to implement it.
>
>I do not understand your point. What do you mean by the language is
>known? The norm of the C99 provides a "descriptive" grammar of the
>language but it is neither normative (only informative) neither it
>works. You have to modify it deeply to make it working with a LALR(1)
>parser because it has contexts.

Right. For every pair (language, parsing algorithm), one will have to
develop a special techniques. Since C99 has been out for a while, and
LALR(1) is quite common, it seems likely that somebody has made an
implementation. Check if GCC has a .y file (it may not though).

>But again, this has nothing to do with
>my question. My question is about bison.

Bison just applies LALR(1). I do not think it is likely it is buggy there.
But try another LALR(1) parser generator and see if you get the same error
messages.

So, most likely, your question is about coping with LALR(1). The simplest
way is to fiddle around with it to get a working version, and then forget
about it.

>> The strings are only exported to the parser to
>> a table used for error message printouts, which has strings in it anyway.
>
>But you still need to write the rule with LP and RP.

No, use the strings in the rule. This makes the grammar much more readable:

>direct_abstract_declarator
>       : LP abstract_declarator RP

Instead of this, write:

%token LP "("
%token RP ")"
...
%%
...
  direct_abstract_declarator:
      "(" abstract_declarator ")"

Then, except for parser error messages, "(" and ")" are only used
internally in Bison. These strings do not have to be what the lexer scans
for, as it will return RP and LP.

In fact, it would be better if one could separate the grammar strings from
the parser error message strings, but you can't do that yet.

  Hans Aberg






reply via email to

[Prev in Thread] Current Thread [Next in Thread]