emacs-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [elpa] externals/matlab-mode e3e6952d57 1/2: Update imenu to use mat


From: Stefan Monnier
Subject: Re: [elpa] externals/matlab-mode e3e6952d57 1/2: Update imenu to use matlab syntax font-lock
Date: Thu, 29 May 2025 09:39:47 -0400
User-agent: Gnus/5.13 (Gnus v5.13)

>     The regular expression needed for imenu causes the regex engine to hang.

The regexp-engine hangs when its backtracking-based algorithm hits
a case where there are "too many ways" to fail to match.  Sometimes you
can circumvent the problem by tuning the regexp such that it is more
like a DFA or by making it more obvious that some alternatives are
mutually exclusive.

> -(defvar matlab-imenu-generic-expression
> -  ;; Using concat to increase indentation and improve readability
> -  `(,(list nil (concat
> -                "^[[:blank:]]*"
> -                "function\\>"

[[:blank:]] and other char-classes are things I recommend to avoid
because they can match some multibyte chars and the regexp-engine tends
to stop trying to to optimize, in such cases.  In practice this means
that it will for example fail to notice that `[[:blank:]]` and `f`
cannot match the same character.  Here that should be "harmless" (it
will use more stack space but shouldn't cause a "hang").

> -                ;; Optional space/tabs, "...", or "... % comment" 
> continuation
> -                (concat "\\(?:"
> -                        "[[:blank:]]*"
> -                        "\\(?:" matlab--ellipsis-to-eol-re "\\)?"
> -                        "\\)*")

This one looks nasty.  I suspect this was the culprit.  A string like
"  " can match the above regexp in several different ways:

    2 times [[:blank:]]
    1 time [[:blank:]] + 1 time [[:blank:]]
    [...]
    0 times [[:blank:]] + 1 time [[:blank:]] + 1 time [[:blank]] + 0 times 
[[:blank:]]
    [...]

and Emacs's regexp matcher will have to try them all.

I'd rewrite it as

    (concat "\\(?:"
            "[ \t]*"
            "\\(?:" matlab--ellipsis-to-eol-re "\\)*[ \t]*"
            "\\)")

which makes it closer to a DFA.

> -                "[\\.[:space:]\n\r]*"

That adds to the previous problem since this can also match part
of "  ", so there are yet more ways to match the text.
So I'd rewrite it as

    "\\(?:[\\.\n\r][\\. \t\n\r]*\\)?"

since the previous part of the regexp has already matched the preceding
[ \t]*.

BTW, why do we allow backslashes and dots here (showing my ignorance of
Matlab syntax)?


        Stefan




reply via email to

[Prev in Thread] Current Thread [Next in Thread]