[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Patch for lookaround assertion in regexp
From: |
Stefan Monnier |
Subject: |
Re: Patch for lookaround assertion in regexp |
Date: |
Tue, 14 Feb 2012 13:36:32 -0500 |
User-agent: |
Gnus/5.13 (Gnus v5.13) Emacs/24.0.92 (gnu/linux) |
> Implementing a fully general NFA-based regular expression matching
> engine that support submatches is hard. The only two useful
> implementations of which I'm aware are RE2 and Ville Laurikari's TRE,
> both of which are two-clause BSD licensed. Laurikari wrote his thesis
> [2] on the latter. TRE is the better of the two libraries, IMHO,
> because it's single-pass and can work over arbitrary kinds of input
> stream (like characters in a gap buffer). TRE's approximate matching
> support is occasionally useful as well.
I'm familiar with the work, yes. TRE seemed like the best option last
time I looked around.
> That said, I'd actually prefer to head in the other direction and
> allow users to express arbitrarily rich grammars using an extended rx
> syntax.
I think that would be orthogonal: we want regexp support because it's
efficient (yes, our current implementation is super slow in some cases,
but it's also efficient in many important cases).
I also would like a new regexp engine to fix the "backward matching"
problem so that looking-back can work the way most people would expect,
and doesn't need a `greedy' hack. The fact that regexps are symmetric
is a very neat property (operator precedence grammars enjoy the same
property, which is one of the reasons why I chose them as the basis for
SMIE).
> The idea is basically to support parser combinator grammars,
> which can be efficiently matched using a scannerless GRL parser, which
> is O(N^3) time, or a memozing and backtracking "packrat" parser, which
> is O(N) time and O(n) space. The end result would look a bit like Perl
> 6's rules.
While these are algorithmically reasonably efficient, it can be
difficult to make them as efficient as a regexp-only matcher for many
typical regexps. Also it can be difficult to make them work backwards.
IOW I don't think that can replace regexps given the amount of regexps
out there we have to support.
Stefan