bug-gnu-emacs
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#53680: Endless loop in peculiar case of string-match and string-matc


From: Christian Johansson
Subject: bug#53680: Endless loop in peculiar case of string-match and string-match-p 27.02 and 28.0.50
Date: Tue, 1 Feb 2022 14:12:19 +0100

Alright, thanks for helping me find the correct regexp, this issue only 
appeared on some peculiar strings

> 1 feb. 2022 kl. 13:56 skrev Mattias Engdegård <mattiase@acm.org>:
> 
> 
>> 
>> (string-match·"[\r\t·]*implements[\r\t·]+\\([\r\t·]*[\\a-zA-Z_0-9_]+,?\\)+[\r\t·]*{$"·"ariable·implements·\\Magento\\Framework\\Event\\OberserverInterface\r{\r····public·function·__construct()\r·")
>>  
> 
> The diagnostics by Lars and Andreas is correct. Let's look at it more 
> closely, first translating the regexp to rx for ease of reasoning, and see if 
> we can make it work:
> 
> (rx (* (in "\t\r "))
>    "implements"
>    (+ (in "\t\r "))
>    (+ (group
>        (* (in "\t\r "))
>        (+ (in "0-9A-Za-z" "\\_"))
>        (? ",")))
>    (* (in "\t\r "))
>    "{" eol)
> 
> The first line is meaningless since it can match the empty string, but you 
> probably want to anchor the start of "implements" so that it doesn't match 
> "house_implements". Let's also drop the capture group, and we get:
> 
> (rx symbol-start "implements"
>    (+ (in "\t\r "))
>    (+ (* (in "\t\r "))
>       (+ (in "0-9A-Za-z" "\\_"))
>       (? ","))
>    (* (in "\t\r "))
>    "{" eol)
> 
> You clearly want to match a non-empty sequence of 'words' separated with 
> whitespace and/or commas, but the pattern is ambiguous --  all inter-word 
> separators are optional. Let's make them mandatory:
> 
> (rx symbol-start "implements"
>     ;; mandatory whitespace
>     (+ (in "\t\r "))
>     ;; then a word
>     (+ (in "0-9A-Za-z" "\\_"))
>     ;; then maybe more words, each prefixed by spaces or comma
>     (* (+ (in "\t\r ,"))   ; fast and loose
>        (+ (in "0-9A-Za-z" "\\_")))
>     ;; finally whitespace before the curly bracket
>     (* (in "\t\r "))
>     "{" eol)
> 
> which is reasonably efficient, since all ambiguity is now gone: the regexp 
> can (almost) only match in one way.
> 
> Note the "fast and loose" pattern where we accept any number of spaces or 
> commas. Here it depends on your grammar but if you want exactly one comma 
> separating each word, that subexpression would be something like
> 
>  (* (in "\t\r "))
>  ","
>  (* (in "\t\r "))
> 
> instead.
> 





reply via email to

[Prev in Thread] Current Thread [Next in Thread]