bison-patches

Re: multistart: free choice of the start symbol


From: Akim Demaille
Subject: Re: multistart: free choice of the start symbol
Date: Tue, 29 Sep 2020 19:20:04 +0200


> On 27 Sep 2020, at 20:46, Rici Lake <ricilake@gmail.com> wrote:
> 
> Many parser generators do have the option to parse from various roots. One
> interesting case is ANTLR, which provides methods for parsing from *every*
> non-terminal (with names generated from the non-terminal).

Well, that's "cheating" (as you pointed out further on in your message):
ANTLR implements a recursive descent parser, i.e., its very technique
consists of emitting one parsing function per non-terminal.  So actually,
I expect that all the LL generators support the free choice of the start
symbol.
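
To make the "one parsing function per non-terminal" point concrete,
here is a minimal sketch of a hand-written recursive descent parser.
It is not ANTLR output: the toy grammar, the Parser class and the
parse_from helper are all made up for illustration.  Every method is
a usable entry point, so choosing a start symbol costs nothing.

```python
import re


class Parser:
    """Toy recursive descent parser: one method per non-terminal."""

    def __init__(self, text):
        # Tokens are integers and the punctuation + * ( )
        self.tokens = re.findall(r"\d+|[+*()]", text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def eat(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    # expr: term ('+' term)*
    def expr(self):
        node = self.term()
        while self.peek() == "+":
            self.eat("+")
            node = ("+", node, self.term())
        return node

    # term: factor ('*' factor)*
    def term(self):
        node = self.factor()
        while self.peek() == "*":
            self.eat("*")
            node = ("*", node, self.factor())
        return node

    # factor: NUMBER | '(' expr ')'
    def factor(self):
        if self.peek() == "(":
            self.eat("(")
            node = self.expr()
            self.eat(")")
            return node
        tok = self.eat()
        if not tok.isdigit():
            raise SyntaxError(f"expected a number, got {tok!r}")
        return int(tok)


def parse_from(nonterminal, text):
    """Hypothetical helper: parse `text` starting at any non-terminal."""
    parser = Parser(text)
    tree = getattr(parser, nonterminal)()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing input: {parser.peek()!r}")
    return tree
```

For instance, parse_from("term", "2*3") and parse_from("expr", "1+2*3")
both work with the same class, with no extra tables or states.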

Bison generates LR parsers, where that does not apply: the automaton
is built from a single start symbol.

> Although the
> vast majority of these interfaces will never be used, it turns out to be
> extremely convenient for debugging grammars (and for didactic purposes,
> such as drawing small parse trees). In ANTLR, these interfaces have little
> or no cost, since it fundamentally produces a recursive descent parser
> anyway, but it might still be reasonable to allow "%start *" for parser
> debugging.
> 
> Of course, in a C code generator, you most certainly wouldn't want to
> generate dozens (or hundreds) of unused interfaces, so this kind of feature
> would be better implemented by a general call which took a non-terminal
> enumerator as an argument. But that would require that the returned value
> type be the same regardless of non-terminal, which effectively reduces to
> the YYSTYPE union (or whatever it happens to be).
> 
> OK, it's not necessarily a great idea to design a production interface
> around a feature only used for debugging.

Exactly :)  Reading this sentence reminds me of one of my favorite
scenes in Ocean's 1[0-9]: https://www.youtube.com/watch?v=tcRvN2gtPiw

This feature, "%start *", would generate considerably larger automata.

In the case of Bison's own grammar, I get 450 states (that's only x3;
I was expecting more) *and* additional conflicts (because Bison is
still using LALR for its grammar, so you can still have "subautomata"
that share states).
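
As a rough illustration of why extra start symbols enlarge the
automaton while still sharing states, here is a toy LR(0) item-set
construction.  This is only a sketch under loud assumptions: it is
plain LR(0), not Bison's LALR(1) implementation, the grammar is a
textbook expression grammar, and the "$accept_X" rule names are
invented for the example.

```python
from collections import deque

# Toy grammar: non-terminal -> list of right-hand sides (tuples).
GRAMMAR = {
    "expr": [("expr", "+", "term"), ("term",)],
    "term": [("term", "*", "factor"), ("factor",)],
    "factor": [("(", "expr", ")"), ("NUM",)],
}


def closure(items):
    """LR(0) closure of a set of items (lhs, rhs, dot_position)."""
    items = set(items)
    work = list(items)
    while work:
        _, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for prod in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], prod, 0)
                if item not in items:
                    items.add(item)
                    work.append(item)
    return frozenset(items)


def goto(state, symbol):
    """State reached from `state` by shifting over `symbol`."""
    moved = [(lhs, rhs, dot + 1)
             for (lhs, rhs, dot) in state
             if dot < len(rhs) and rhs[dot] == symbol]
    return closure(moved)


def count_states(starts):
    """Count LR(0) states when each symbol in `starts` is a start symbol.

    Each start symbol X gets its own (hypothetical) rule $accept_X -> X.
    """
    initial = [closure([("$accept_" + s, (s,), 0)]) for s in starts]
    seen = set(initial)
    work = deque(initial)
    while work:
        state = work.popleft()
        symbols = {rhs[dot] for (_, rhs, dot) in state if dot < len(rhs)}
        for symbol in symbols:
            nxt = goto(state, symbol)
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return len(seen)


single = count_states(["expr"])
every = count_states(list(GRAMMAR))
print(single, every)
```

Each additional start symbol brings its own initial and accepting
states, but the states reached after shifting into the common part of
the grammar are the same item sets and are reused.  With lookaheads in
the picture (LALR), that sharing is also where extra conflicts can
come from.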

What I did not anticipate, though, is that it crashes when generating
canonical LR on that grammar.  However, I have not yet investigated
the impact of my changes on IELR and canonical LR, so that's a TODO.

Using LR, "%start *" should be safe.  You do have a point here.




