[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Questions about PEG parsing

From: Vivien Kraus
Subject: Questions about PEG parsing
Date: Sun, 24 Oct 2021 01:25:44 +0200
User-agent: Evolution 3.34.2

Dear guile users,

There are two ways to make a parser with guile: the undocumented
LALR(1) parser generator, for which I don’t know how to reuse grammar
elements, and the PEG parser.

The manual shows an example for a PEG parser. According to wikipedia,
which is cited in the manual itself, PEG can have exponential worst-
case complexity. However, I can’t figure out an example that would
produce this behavior. Could you show me one? Does Guile do anything to
help this, like using memoization as in the packrat parser?

Another thing is that the parser does not seem to accept production
rules, and instead returns a compressed tree with some nodes lumped
together or ignored depending on the grammar extension <--, <- or <. It
seems impossible to ignore part of a rule, that’s why the tutorial uses
a dedicated NL token:

        passwd <- entry* !.
        entry <-- (! NL .)* NL*
        NL < '\n'

So, I have to create a new terminal token for each delimiter I want to
match, but discard. Is it not possible to do something like this
instead, to ignore only a part of the rule?

        passwd <- entry* !.
        entry <-- (! '\n' .)* ( < ('\n'*))

Finally, the result of a match is a compressed tree, where nodes are
rule names and leaves are matched text. This is not very useful, I
would prefer to be able to specify a production rule. It seems that I
can inject a production, for instance to parse a natural number:

(use-modules (ice-9 peg) (ice-9 match))

(define-peg-string-patterns "DIGIT- <- [0-9]")
(define (DIGIT . args)
  (match (apply DIGIT- args)
    (#f #f)
    ((length value)
     `(,length ,(- (char->integer (string-ref value 0))
                   (char->integer #\0))))))

That way, I can reuse the DIGIT function in another parser:

(define-peg-string-patterns "NATURAL- <- DIGIT+")
(define (NATURAL . args)
  (match (apply NATURAL- args)
    (#f #f)
    ((length (digits ...))
     (let produce ((sum 0) (digits digits))
       (match digits
         (() `(,length ,sum))
         ((significant digits ...)
          (produce (+ (* sum 10) significant) digits)))))))

(peg:tree (match-pattern NATURAL "256"))

Are there downsides? Maybe the production should be pure, if the result
gets memoized?

Best regards,


reply via email to

[Prev in Thread] Current Thread [Next in Thread]