[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Interactive continuation prompting
From: |
Christian Schoenebeck |
Subject: |
Re: Interactive continuation prompting |
Date: |
Sun, 29 Jun 2014 13:13:03 +0200 |
User-agent: |
KMail |
On Saturday 28 June 2014 23:59:44 Ron Burk wrote:
> IMO, if the user has to read the code generated by the parser generator,
> then it has already failed to deliver on its fundamental premise.
Yep. But currently you have to read the code, at least if you want to use
Bison generated parsers for purposes that go beyond the scope of initial
prototypes.
And currently you even have to understand Bison's internals if you want to be
able to deal with its warnings/errors correctly.
> Tying the
> lexer and parser together eliminates the possibility for one of the few
> opportunities for parallelism in parsing: running the lexer and parser in
> parallel (not that many modern generators make that easy). CPUs
> stopped getting faster some time ago, the only remaining benefit from
> Moore's law is in parallelism. I scratch my head every time I see the
> repeated meme that lexing and parsing were separated due to some
> "historical" artifact. The only way I can see this meme continuing
> to reproduce is a failure to grasp the actual motivations (modularity,
> separation of concerns, efficiency, and ability to prove correctness)
> that drove this division of labor, none of which are "historical"
> in nature.
We were absolutely aware about all the motivations you enumerated above, which
constitute the reasons of Bison's current design and that it shall continue to
exist in its current design. And I absolutely agree with that.
However, from the frequency of user inquiries on this and similar subjects,
you may also notice, that the main practical requirements of users for a
parser generator are completely different than the main motivations which lead
to Bison's current design.
If you mention parallelism, then you have very high performance in mind. In
practice however, even primitive, less efficient parsers are absolutely
sufficient. Plus I doubt that parallelizing lexer and parser as two separate
tasks gives you a huge performance boost. And that high performance feature
and theoretical aspects of proving correctness come with a very high price:
1. Lack of being able to access high level informations, which is a very
frequent requirement in practice.
2. Cumbersome definition of grammar and lexer source files, which is one of
the main reasons why still most of the developers out there still use hand
written parsers.
And in many people's eyes, this price is too high.
About the "historical" notion: this argument is frequently used *not* to argue
about modularity, but to argue the precise form of Bison's modularity, which
simply exists in the form it was designed with early yacc and lex, which in
turn were designed in a typical 70s design concept, which is not seen as being
appropriate anymore.
About "separation of concerns": that's an implementation decision, not a
positive use case feature.
> You're always free to return characters as tokens and
> "unite" lexing and parsing in most any parser generator,
> but when I look at the problems this "solves", the solutions
> seem to usually just be pushing context sensitivity from
> one place to another. Maybe you know of some cases
> where lexing with the parser is an unqualified improvement.
There is a flaw in the assumption that you can actually put a type 3 grammar
"lexer" separately in front of the type 2 grammar "parser" in all cases. In
practice you often want to use regular expressions which shall only match in a
certain grammar rule context, which is not possible with Bison. And that's one
of the situations where working with Bison gets dirty.
> Anyone contemplating eliminating the modularity of lexing
> should carefully study the experience of ANTLR. Since lexing and
> parsing have fairly different needs, it generates a never-ending source
> of confusion for users (another way for a generator to fail on its
> fundamental premise, IMO).
Like locking a child into a room because it could hurt himself? ;-) Sure a
tied lexer / parser design would create new problems and would require new
error sources to be prompted to the user. But that's not an argument to
discard it from the start.
But even though there would be new sources of problems, I am convinced that
such a tied design would lower the entry level motivation of users to actually
use a parser generator instead of still using hand written parsers.
> > Being able to access
> > those informations conveniently at runtime, is far more
> > important
> >
> > today than saving some kB of application size.
>
> As always, depends on the user's needs.
> Sometimes people forget that in the modern Intel architecture,
> being able to fit in (tiny) L1 cache can mean a LARGE performance
> difference. Whenever someone says a few KB difference
> doesn't matter, it always makes me wonder if they've benchmarked
> any code in a tight loop that does and doesn't fit in L1.
> Small tables ain't about the memory any more (except, there's
> always a new, smaller, more memory-starved embedded platform
> coming along!), it's about the relatively extraordinary slowness
> of cache misses.
Once again, I absolutely understand your concerns here as well and agree with
you. But the vast majority of users do not have such a requirement for high
performance when it comes to parsers. And those few users who do have such a
high performance requirement, can still use Bison in its current design and
implementation. It won't vanish.
> Not personally interested in the how-to, because I cringe at
> the thought of tying external code to the internals of a code-generating
> tool.
You are not able to solve such issues without external code ATM. And there is
no way to fix this in Bison's implementation without a complete redesign.
> But I would be interested in understanding what you were
> trying to accomplish and exactly what the problems were. Perhaps
> there's an opportunity to extend traditional parser generator notation
> in a helpful way. I Googled a bit based on the terms you were
> using to see if you had already written on this subject, but was
> unable to find anything detailed about the problems you were
> solving.
Not sure what I shall add here of what has not been mentioned yet. The
requirements are always the same, most prominently i.e.:
1. Checking the state of current input at any time (complete, incomplete,
syntax error).
2. In case of syntax errors, identifying the precise parts which are correct
and the parts which are incorrect in the input.
3. At any time accessing the informations of possible continuation in the
current parse tree, based on the current input. Used i.e. for auto
completion features in code editors and shells
4. Showing verbose error messages on syntax errors. ATM Bison can only
generate very simple error messages, which mostly are too primitive for end
users to understand and to solve their input mistakes.
and when actually working on the parser input files:
5. Efficient visualization of conflicts / ambiguities in the language
definition. Which could be realized, if the parser generator would have a
comprehensive API.
> > no interest on GNU side to design a modern parser generator.
>
> GNU bison has enormous room for improvement. OTOH,
> what I identify with the adjective "modern" in your phrase is:
>
> * No guarantee the generated parser accepts the specified language!
> * Space and speed requirements unpredictable
> (might take minutes to parse a few hundred characters!)
> * Error messages just as bad as Bison.
> * Grammar problem diagnoses just as primitive as Bison.
I have not even made concrete suggestions how to design and implement it, and
you think you can already identify negative consequences? I think you are
probably too focused at Bison's current design if you really think that your
negative expectations above would be true.
> I hope Bison continues to improve, but I do not hope it moves
> in all the same directions that "modern" parser generators
> seem to be moving.
And that's also not something I would have suggested. Bison makes sense to
exist in its current form and continuing dong so. But it also makes sense to
create another, new parser generator which breaks with Bison's design
concepts.
Best regards,
Christian Schoenebeck