help-bison

Re: Interactive continuation prompting


From: Ron Burk
Subject: Re: Interactive continuation prompting
Date: Sat, 28 Jun 2014 14:59:44 -0700

You make many excellent points that I want to disagree with :-)

>         1. Strong separation between "lexer" and "parser" has its historical
>            reasons, but it makes things often quite problematic. There are ways
>            to deal with that of course, but it makes the .y/.l files (and C/C++
>            on top of the parser) often hard to read.

IMO, if the user has to read the code generated by the parser generator,
then it has already failed to deliver on its fundamental premise. Tying the
lexer and parser together also forecloses one of the few opportunities
for parallelism in parsing: running the lexer and the parser
concurrently (not that many modern generators make that easy). CPUs
stopped getting faster some time ago; the only remaining benefit of
Moore's law is parallelism. I scratch my head every time I see the
repeated meme that lexing and parsing were separated due to some
"historical" artifact. The only way I can see this meme continuing
to reproduce is a failure to grasp the actual motivations (modularity,
separation of concerns, efficiency, and the ability to prove correctness)
that drove this division of labor, none of which are "historical"
in nature.

You're always free to return characters as tokens and
"unite" lexing and parsing in most any parser generator,
but when I look at the problems this "solves", the solutions
usually just push context sensitivity from one place to
another. Maybe you know of some cases where lexing in the
parser is an unqualified improvement.

Anyone contemplating eliminating the modularity of lexing
should carefully study the experience of ANTLR. Since lexing and
parsing have fairly different needs, unifying them generates a
never-ending source of confusion for users (another way for a generator
to fail on its fundamental premise, IMO). The options seem to be:
unified syntax and semantics, in which case you must make either the
parsing or the lexing specification more cumbersome, since different
default behavior is desirable in each situation; or unified syntax
only (the ANTLR approach), in which case the user has to suffer with
syntax whose semantics vary depending on the context (lexer vs.
parser). Count the number of ANTLR questions generated by confusion
over whether a rule is a "parser rule" or a "lexer rule" and how they
differ (even though they kinda look the same), then multiply by the
hours wasted posing and answering such repetitive confusion.
It's an impressive number.

>           Being able to access
>           those informations conveniently at runtime, is far more important
>           today than saving some kB of application size.

As always, it depends on the user's needs.
Sometimes people forget that on modern Intel architectures,
being able to fit in the (tiny) L1 cache can mean a LARGE performance
difference. Whenever someone says a few KB of difference
doesn't matter, it always makes me wonder if they've benchmarked
any code in a tight loop that does and doesn't fit in L1.
Small tables aren't about the memory any more (except that there's
always a new, smaller, more memory-starved embedded platform
coming along!); they're about the relatively extraordinary slowness
of cache misses.

Obviously, speed doesn't matter for many projects. OTOH,
you can again turn to the ANTLR mailing lists and find
the expected non-zero minority of users who discovered
to their chagrin that speed did eventually matter to them.

> If anyone is interested, I could write an article with demo source code

Not personally interested in the how-to, because I cringe at
the thought of tying external code to the internals of a code-generating
tool. But I would be interested in understanding what you were
trying to accomplish and exactly what the problems were. Perhaps
there's an opportunity to extend traditional parser generator notation
in a helpful way. I Googled a bit based on the terms you were
using to see if you had already written on this subject, but was
unable to find anything detailed about the problems you were
solving.

> no interest on GNU side to design a modern parser generator.

GNU bison has enormous room for improvement. OTOH,
what I identify with the adjective "modern" in your phrase is:

   * No guarantee the generated parser accepts the specified language!
   * Space and speed requirements unpredictable
     (might take minutes to parse a few hundred characters!)
   * Error messages just as bad as Bison's.
   * Grammar problem diagnoses just as primitive as Bison's.

I hope Bison continues to improve, but I do not hope it moves
in all the same directions that "modern" parser generators
seem to be moving.


