bug-gnu-emacs
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#53680: Endless loop in peculiar case of string-match and string-matc


From: Mattias Engdegård
Subject: bug#53680: Endless loop in peculiar case of string-match and string-match-p 27.02 and 28.0.50
Date: Tue, 1 Feb 2022 13:56:10 +0100

> (string-match·"[\r\t·]*implements[\r\t·]+\\([\r\t·]*[\\a-zA-Z_0-9_]+,?\\)+[\r\t·]*{$"·"ariable·implements·\\Magento\\Framework\\Event\\OberserverInterface\r{\r····public·function·__construct()\r·")
>  

The diagnostics by Lars and Andreas is correct. Let's look at it more closely, 
first translating the regexp to rx for ease of reasoning, and see if we can 
make it work:

(rx (* (in "\t\r "))
    "implements"
    (+ (in "\t\r "))
    (+ (group
        (* (in "\t\r "))
        (+ (in "0-9A-Za-z" "\\_"))
        (? ",")))
    (* (in "\t\r "))
    "{" eol)

The first line is meaningless since it can match the empty string, but you 
probably want to anchor the start of "implements" so that it doesn't match 
"house_implements". Let's also drop the capture group, and we get:

(rx symbol-start "implements"
    (+ (in "\t\r "))
    (+ (* (in "\t\r "))
       (+ (in "0-9A-Za-z" "\\_"))
       (? ","))
    (* (in "\t\r "))
    "{" eol)

You clearly want to match a non-empty sequence of 'words' separated with 
whitespace and/or commas, but the pattern is ambiguous --  all inter-word 
separators are optional. Let's make them mandatory:

 (rx symbol-start "implements"
     ;; mandatory whitespace
     (+ (in "\t\r "))
     ;; then a word
     (+ (in "0-9A-Za-z" "\\_"))
     ;; then maybe more words, each prefixed by spaces or comma
     (* (+ (in "\t\r ,"))   ; fast and loose
        (+ (in "0-9A-Za-z" "\\_")))
     ;; finally whitespace before the curly bracket
     (* (in "\t\r "))
     "{" eol)

which is reasonably efficient, since all ambiguity is now gone: the regexp can 
(almost) only match in one way.

Note the "fast and loose" pattern where we accept any number of spaces or 
commas. Here it depends on your grammar but if you want exactly one comma 
separating each word, that subexpression would be something like

  (* (in "\t\r "))
  ","
  (* (in "\t\r "))

instead.






reply via email to

[Prev in Thread] Current Thread [Next in Thread]