[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: (error "Stack overflow in regexp matcher") and (?)wrong display of r
From: |
Mattias Engdegård |
Subject: |
Re: (error "Stack overflow in regexp matcher") and (?)wrong display of regexp in backtrace |
Date: |
Sun, 15 Mar 2020 21:06:46 +0100 |
15 mars 2020 kl. 17.57 skrev Alan Mackenzie <address@hidden>:
> I agree (having tried "\\(?:" in place of "\\("), but why? What is
> causing the recursion here? Each of the two groups need only remember
> the latest string matching it. Surely? I'd like some insight into
> this, so as to avoid it happening again.
Each time a capture group is entered, the old extent of that group is pushed
onto the stack so that it can be reloaded if a match failure occurs and the
engine needs to backtrack. This means that removing the unnecessary group is
not an asymptotic improvement in space here, but reduces the constant factor so
that you can match larger strings. It's still O(N) space, N being the string
size.
You could probably rewrite the regexp to improve the constant further by taking
advantage of the fact that the [^\\\n\r] branch matches much more often than
the other. Something like
(rx (* (seq "\\" anything))
(* (+ (not (any "\n\r\\")))
(seq "\\" anything))
(* (not (any "\n\r\\"))))
which could perhaps be trimmed a bit. In any case, removing unnecessary capture
groups everywhere is a cheap and easy improvement to start with. (Another
benefit of rx, by the way -- there tends to be a lot fewer of those.)