[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Parsing a language with optional spaces
From: |
Christian Schoenebeck |
Subject: |
Re: Parsing a language with optional spaces |
Date: |
Tue, 14 Jul 2020 13:31:04 +0200 |
On Montag, 13. Juli 2020 07:56:57 CEST Akim Demaille wrote:
> > Le 12 juil. 2020 à 19:47, Christian Schoenebeck
<schoenebeck@crudebyte.com> a écrit :
> >> Additionally,the programmer must fully understand how
> >> and when the parser handles tokens, otherwise subtle bugsmay be
> >> introduced.
> >
> > I don't see what they exactly had in mind here for that to claim.
>
> Bison is "lazy" wrt fetching tokens, in the sense that if it can decide
> what the next action is without looking at the lookahead, it won't fetch
> one. Besides, when it can, it merges "groups of similar reductions" under
> the label of "default reduction".
>
> So on situations that are similar, the parser may have fetched the lookahead
> before the reduction(s), or after. Hence if you want tight coupling bw the
> scanner and the parser, you need to be aware of this and know when this
> will happen.
>
> That's what lr.default-reduction controls.
> (https://www.gnu.org/software/bison/manual/html_node/Default-Reductions.html
> )
Ok, that's specific to LALR.
For the 'married' parser, in the form imagined by me, there would be no
lookahead token, as a lookahead token implies a context-unaware scanner.
Instead, when the parser would get to a state where it had to decide between
reducing or shifting (depending on a next token), it would not decide and fork
the parser state instead, in order to allow context-specific tokenization.
> > Note that the latter example would of course [require to] work differently
>
> > than the common greedy and context unaware scanner:
> Yes, indeed. That's something which Joel detailed in his PhD. Note
> though that PSLR targets deterministic parsing. Bison would (first)
> target that, not GLR. But contributions are always welcome :)
The longer I think about it, the more I wonder whether it wouldn't make sense
to make a clear and radical cut with the past when developing a 'married'
parser to allow unfolding its full potential.
I mean I look at what you are working on Akim, and AFAICS the majority of time
you have to spend on maintaining (i.e. not breaking) backward compatibility
for ancient use cases. Most of Bison's conceptional design and internals are
still based on requirements and opinions from around 1980 which simply no
longer exist or proofed wrong.
Besides of what we already discussed, I think it would make sense to get rid
of a bunch of other things that Bison still retains:
1. No longer multiple parser types, only one: a somewhat GLR-alike parser like
discussed as real superset of LALR, but without lookahead token (so not
really LALR based under the hood). Theoretically there is no disadvantage
by doing so, because if a developer feeds it with a LALR(1) capable
grammer, it would still end up having the same parsing
complexity=efficiency as an old-school LALR would do, probably just with
slightly higher memory footprint.
Advantages though: much more user friendly and powerful, and a lot less
code to maintain.
2. No longer raw C tables as pregenerated parser and scanner tables, instead
of such C-tables: an intermediate, dynamic in-memory model with high-level
API allowing to access and modify grammar and patterns on-the-fly.
Advantage: it would allow self-extensible languages (e.g. an input that
adds keywords, patterns, drops or adds grammar rules while being parsed).
And implementation specific things would finally clearly be isolated by
introducing such an API -> Easier to maintain.
Disadvantages: Probably none, except of consuming slightly more memory
for
meta information.
3. No longer aggressive compression of parser states (which is somewhat
already implied by [2.]), because such compressions lead to important
information loss when it comes to grammar debugging or auto completion
features and other user relevant purposes. The motivation that lead to such
compressions (very low RAM space) no longer exist in like >99% of use
cases
today.
So these concepts are probably too radical for trying to still fit them into
Bison's current (3.x) design.
> > Akim, question regarding Bison's GLR parser ...
> >
> > https://www.gnu.org/software/bison/manual/html_node/Simple-GLR-Parsers.html
> > :
> >> In general, a GLR parser can take quadratic or cubic worst-case time, and
> >> the current Bison parser even takes exponential time and space for some
> >> grammars. In practice, this rarely happens, and for many grammars it is
> >> possible to prove that it cannot happen.
> >
> > ... why isn't it O(n^3)? Or more specifically, which common GLR fork
> > optimization(s) are yet missing?
>
> This is fuzzy in my memory :( It might refer to what follows.
>
> Parsers such as SGLR have stacks which are graphs, while Bison's is
> always a tree. All GLR parsers manage to share the past of concurrent
> stacks, they don't have a set of separate stacks. Bison's GLR stacks,
> of course, share their common past, it's stored only once. I believe
> SGLR is also able to share the common "future", so their stacks are
> no longer trees. This allows SGLR to keep decent performances when parsing
> ambiguous grammars and you want to fetch the resulting parse forest.
Ok, I read it as Bison's current GLR parser probably not having seen much
attention in the past.
BTW, from my perspective a set of parser stacks (with shared past) is IMO
still a tree, however a tree with 3 dimensions instead of 2. A 'graph' for me
is something much more generic, as a graph allows much more complex structures
(with less restrictions) like traversing to any other state, which is not
possible in a tree.
Best regards,
Christian Schoenebeck
- Re: Parsing a language with optional spaces, (continued)
- Re: Parsing a language with optional spaces, Akim Demaille, 2020/07/09
- Re: Parsing a language with optional spaces, uxio prego, 2020/07/11
- Re: Parsing a language with optional spaces, Christian Schoenebeck, 2020/07/12
- Re: Parsing a language with optional spaces, Akim Demaille, 2020/07/13
- Re: Parsing a language with optional spaces, Akim Demaille, 2020/07/14
- Re: Parsing a language with optional spaces, Christian Schoenebeck, 2020/07/14
- Re: Parsing a language with optional spaces, Akim Demaille, 2020/07/18
- Re: Parsing a language with optional spaces,
Christian Schoenebeck <=
- Re: Parsing a language with optional spaces, Akim Demaille, 2020/07/18
- Re: Parsing a language with optional spaces, Christian Schoenebeck, 2020/07/20
- Re: Parsing a language with optional spaces, Akim Demaille, 2020/07/26
- Re: Parsing a language with optional spaces, John P. Hartmann, 2020/07/26
- Re: Which lexer do people use?, uxio prego, 2020/07/06
- Re: Which lexer do people use?, Akim Demaille, 2020/07/06
Re: Which lexer do people use?, Derek Clegg, 2020/07/04