[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [elpa] externals/matlab-mode e3e6952d57 1/2: Update imenu to use mat
From: |
Stefan Monnier |
Subject: |
Re: [elpa] externals/matlab-mode e3e6952d57 1/2: Update imenu to use matlab syntax font-lock |
Date: |
Thu, 29 May 2025 09:39:47 -0400 |
User-agent: |
Gnus/5.13 (Gnus v5.13) |
> The regular expression needed for imenu causes the regex engine to hang.
The regexp-engine hangs when its backtracking-based algorithm hits
a case where there are "too many ways" to fail to match. Sometimes you
can circumvent the problem by tuning the regexp such that it is more
like a DFA or by making it more obvious that some alternatives are
mutually exclusive.
> -(defvar matlab-imenu-generic-expression
> - ;; Using concat to increase indentation and improve readability
> - `(,(list nil (concat
> - "^[[:blank:]]*"
> - "function\\>"
[[:blank:]] and other char-classes are things I recommend to avoid
because they can match some multibyte chars and the regexp-engine tends
to stop trying to to optimize, in such cases. In practice this means
that it will for example fail to notice that `[[:blank:]]` and `f`
cannot match the same character. Here that should be "harmless" (it
will use more stack space but shouldn't cause a "hang").
> - ;; Optional space/tabs, "...", or "... % comment"
> continuation
> - (concat "\\(?:"
> - "[[:blank:]]*"
> - "\\(?:" matlab--ellipsis-to-eol-re "\\)?"
> - "\\)*")
This one looks nasty. I suspect this was the culprit. A string like
" " can match the above regexp in several different ways:
2 times [[:blank:]]
1 time [[:blank:]] + 1 time [[:blank:]]
[...]
0 times [[:blank:]] + 1 time [[:blank:]] + 1 time [[:blank]] + 0 times
[[:blank:]]
[...]
and Emacs's regexp matcher will have to try them all.
I'd rewrite it as
(concat "\\(?:"
"[ \t]*"
"\\(?:" matlab--ellipsis-to-eol-re "\\)*[ \t]*"
"\\)")
which makes it closer to a DFA.
> - "[\\.[:space:]\n\r]*"
That adds to the previous problem since this can also match part
of " ", so there are yet more ways to match the text.
So I'd rewrite it as
"\\(?:[\\.\n\r][\\. \t\n\r]*\\)?"
since the previous part of the regexp has already matched the preceding
[ \t]*.
BTW, why do we allow backslashes and dots here (showing my ignorance of
Matlab syntax)?
Stefan
- Re: [elpa] externals/matlab-mode e3e6952d57 1/2: Update imenu to use matlab syntax font-lock,
Stefan Monnier <=