guile-user
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Questions about PEG parsing


From: Robin Templeton
Subject: Re: Questions about PEG parsing
Date: Mon, 01 Nov 2021 20:44:01 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)

Vivien Kraus via General Guile related discussions <guile-user@gnu.org>
writes:

> There are two ways to make a parser with guile: the undocumented
> LALR(1) parser generator, for which I don’t know how to reuse grammar
> elements, and the PEG parser.

The lalr module isn't documented in the Guile manual itself, but has
some upstream documentation and examples (linked from the info node):

https://github.com/schemeway/lalr-scm/

> The manual shows an example for a PEG parser. According to wikipedia,
> which is cited in the manual itself, PEG can have exponential worst-
> case complexity. However, I can’t figure out an example that would
> produce this behavior. Could you show me one? Does Guile do anything to
> help this, like using memoization as in the packrat parser?

A completely naïve implementation would probably have exponential time
complexity due to the possibility of unlimited backtracking. Guile's PEG
parser has a "cache" submodule, so I assume it implements some form of
memoization based on Bryan Ford's packrat parsing technique. (Though I
haven't actually read the code in question)

> Another thing is that the parser does not seem to accept production
> rules, and instead returns a compressed tree with some nodes lumped
> together or ignored depending on the grammar extension <--, <- or <. It
> seems impossible to ignore part of a rule, that’s why the tutorial uses
> a dedicated NL token:

ISTR finding an apparent bug in either Guile's PEG parser generator, or
in my code attempting to use it; perhaps this is the same or a related
problem. I'll see if I can dig up the grammar I was using.

Regards,
Robin

P.S. I wrote a Scheme packrat parser shortly after reading Ford's
packrat parsing papers, which I'd be happy to share if you'd like to try
an alternative to Guile's native support. It's not technically
linear-time because it uses alists for memoization (which had better
performance characteristics for my grammars and inputs than, e.g., hash
tables), but that could be changed easily if needed. Its grammars look
like this (the syntax is hopefully obvious, except perhaps that ':' is a
sequence with the final argument being a Scheme expression producing the
result):

--8<---------------cut here---------------start------------->8---
(define (digit->number d)
  (case d
    ((#\0) 0) ((#\1) 1) ((#\2) 2) ((#\3) 3) ((#\4) 4)
    ((#\5) 5) ((#\6) 6) ((#\7) 7) ((#\8) 8) ((#\9) 9)))

(define-grammar arith additive
  (rule additive 
    (: (bind left multiplicative)
       (bind right (? (: #\+ (bind foo additive) foo)))
       (+ left (or right 0))))
  (rule multiplicative
    (: (bind left primary)
       (bind right (? (: #\* (bind bar multiplicative) bar)))
       (* left (or right 1))))
  (rule primary
    (/ number parenthesized))
  (rule number
    (: (bind ds (+ digit))
       (let loop ((ds ds) (n 0))
         (if (null? ds)
             n
             (loop (cdr ds) (+ (* n 10) (car ds)))))))
  (rule digit
    (: (bind d (/ #\0 #\1 #\2 #\3 #\4 #\5 #\6 #\7 #\8 #\9))
       (digit->number d)))
  (rule parenthesized
    (: #\(
       (bind value additive)
       #\)
       value)))
--8<---------------cut here---------------end--------------->8---

-- 
Inteligenta persono lernas la lingvon Esperanton rapide kaj facile.
Esperanto estas moderna, kultura lingvo por la mondo. Simpla, fleksebla,
belsona, Esperanto estas la praktika solvo de la problemo de universala
interkompreno. Lernu la interlingvon Esperanton!




reply via email to

[Prev in Thread] Current Thread [Next in Thread]