help-bison
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Interactive continuation prompting


From: Christian Schoenebeck
Subject: Re: Interactive continuation prompting
Date: Sun, 29 Jun 2014 13:13:03 +0200
User-agent: KMail

On Saturday 28 June 2014 23:59:44 Ron Burk wrote:
> IMO, if the user has to read the code generated by the parser generator,
> then it has already failed to deliver on its fundamental premise. 

Yep. But currently you have to read the code, at least if you want to use 
Bison generated parsers for purposes that go beyond the scope of initial 
prototypes.

And currently you even have to understand Bison's internals if you want to be 
able to deal with its warnings/errors correctly.

> Tying the
> lexer and parser together eliminates the possibility for one of the few
> opportunities for parallelism in parsing: running the lexer and parser in
> parallel (not that many modern generators make that easy). CPUs
> stopped getting faster some time ago, the only remaining benefit from
> Moore's law is in parallelism. I scratch my head every time I see the
> repeated meme that lexing and parsing were separated due to some
> "historical" artifact. The only way I can see this meme continuing
> to reproduce is a failure to grasp the actual motivations (modularity,
> separation of concerns, efficiency, and ability to prove correctness)
> that drove this division of labor, none of which are "historical"
> in nature.

We were absolutely aware about all the motivations you enumerated above, which 
constitute the reasons of Bison's current design and that it shall continue to 
exist in its current design. And I absolutely agree with that.

However, from the frequency of user inquiries on this and similar subjects, 
you may also notice, that the main practical requirements of users for a 
parser generator are completely different than the main motivations which lead 
to Bison's current design.

If you mention parallelism, then you have very high performance in mind. In 
practice however, even primitive, less efficient parsers are absolutely 
sufficient. Plus I doubt that parallelizing lexer and parser as two separate 
tasks gives you a huge performance boost. And that high performance feature 
and theoretical aspects of proving correctness come with a very high price:

1. Lack of being able to access high level informations, which is a very 
   frequent requirement in practice.
2. Cumbersome definition of grammar and lexer source files, which is one of 
   the main reasons why still most of the developers out there still use hand 
   written parsers.

And in many people's eyes, this price is too high.

About the "historical" notion: this argument is frequently used *not* to argue 
about modularity, but to argue the precise form of Bison's modularity, which 
simply exists in the form it was designed with early yacc and lex, which in 
turn were designed in a typical 70s design concept, which is not seen as being 
appropriate anymore.

About "separation of concerns": that's an implementation decision, not a 
positive use case feature.

> You're always free to return characters as tokens and
> "unite" lexing and parsing in most any parser generator,
> but when I look at the problems this "solves", the solutions
> seem to usually just be pushing context sensitivity from
> one place to another. Maybe you know of some cases
> where lexing with the parser is an unqualified improvement.

There is a flaw in the assumption that you can actually put a type 3 grammar 
"lexer" separately in front of the type 2 grammar "parser" in all cases. In 
practice you often want to use regular expressions which shall only match in a 
certain grammar rule context, which is not possible with Bison. And that's one 
of the situations where working with Bison gets dirty.

> Anyone contemplating eliminating the modularity of lexing
> should carefully study the experience of ANTLR.  Since lexing and
> parsing have fairly different needs, it generates a never-ending source
> of confusion for users (another way for a generator to fail on its
> fundamental premise, IMO).

Like locking a child into a room because it could hurt himself? ;-) Sure a 
tied lexer / parser design would create new problems and would require new 
error sources to be prompted to the user. But that's not an argument to 
discard it from the start.
But even though there would be new sources of problems, I am convinced that 
such a tied design would lower the entry level motivation of users to actually 
use a parser generator instead of still using hand written parsers.

> >           Being able to access
> >           those informations conveniently at runtime, is far more
> >           important
>  >         
>  >         today than saving some kB of application size.
> 
> As always, depends on the user's needs.
> Sometimes people forget that in the modern Intel architecture,
> being able to fit in (tiny) L1 cache can mean a LARGE performance
> difference. Whenever someone says a few KB difference
> doesn't matter, it always makes me wonder if they've benchmarked
> any code in a tight loop that does and doesn't fit in L1.
> Small tables ain't about the memory any more (except, there's
> always a new, smaller, more memory-starved embedded platform
> coming along!), it's about the relatively extraordinary slowness
> of cache misses.

Once again, I absolutely understand your concerns here as well and agree with 
you. But the vast majority of users do not have such a requirement for high 
performance when it comes to parsers. And those few users who do have such a 
high performance requirement, can still use Bison in its current design and 
implementation. It won't vanish.

> Not personally interested in the how-to, because I cringe at
> the thought of tying external code to the internals of a code-generating
> tool. 

You are not able to solve such issues without external code ATM. And there is 
no way to fix this in Bison's implementation without a complete redesign.

> But I would be interested in understanding what you were
> trying to accomplish and exactly what the problems were. Perhaps
> there's an opportunity to extend traditional parser generator notation
> in a helpful way. I Googled a bit based on the terms you were
> using to see if you had already written on this subject, but was
> unable to find anything detailed about the problems you were
> solving.

Not sure what I shall add here of what has not been mentioned yet. The 
requirements are always the same, most prominently i.e.:

1. Checking the state of current input at any time (complete, incomplete, 
   syntax error).
2. In case of syntax errors, identifying the precise parts which are correct 
   and the parts which are incorrect in the input.
3. At any time accessing the informations of possible continuation in the 
   current parse tree, based on the current input. Used i.e. for auto 
   completion features in code editors and shells
4. Showing verbose error messages on syntax errors. ATM Bison can only 
   generate very simple error messages, which mostly are too primitive for end 
   users to understand and to solve their input mistakes.

and when actually working on the parser input files:

5. Efficient visualization of conflicts / ambiguities in the language 
   definition. Which could be realized, if the parser generator would have a 
   comprehensive API.

> > no interest on GNU side to design a modern parser generator.
> 
> GNU bison has enormous room for improvement. OTOH,
> what I identify with the adjective "modern" in your phrase is:
> 
>    * No guarantee the generated parser accepts the specified language!
>    * Space and speed requirements unpredictable
>       (might take minutes to parse a few hundred characters!)
>    * Error messages just as bad as Bison.
>    * Grammar problem diagnoses just as primitive as Bison.

I have not even made concrete suggestions how to design and implement it, and 
you think you can already identify negative consequences? I think you are 
probably too focused at Bison's current design if you really think that your 
negative expectations above would be true.

> I hope Bison continues to improve, but I do not hope it moves
> in all the same directions that "modern" parser generators
> seem to be moving.

And that's also not something I would have suggested. Bison makes sense to 
exist in its current form and continuing dong so. But it also makes sense to 
create another, new parser generator which breaks with Bison's design 
concepts.

Best regards,
Christian Schoenebeck



reply via email to

[Prev in Thread] Current Thread [Next in Thread]