bug-gnu-emacs

bug#37659: rx additions: anychar, unmatchable, unordered-or


From: Mattias Engdegård
Subject: bug#37659: rx additions: anychar, unmatchable, unordered-or
Date: Tue, 11 Feb 2020 13:57:27 +0100

On 22 Oct 2019, at 19:33, Paul Eggert <eggert@cs.ucla.edu> wrote:

> Moreover, if greed is the longstanding tradition for regexp-opt, shouldn't 
> plain "or" be greedy, to be consistent with other operators?

On second thought, I've come to believe that Paul may have been right after 
all. We might just as well let plain 'or' (alias '|') match as much as possible 
when it is able to do so. In particular, we should guarantee that this will 
happen when all arguments are strings, as used to be the case.

Initially I thought it was a bug that (or "a" "ab") was optimised into "ab?" on 
the grounds that this made the behaviour unpredictable: when matching the 
string "abc", (or "a" "ab") matched "ab", whereas (or "a" "ab" space) would 
match "a". However, the current 'fixed' code isn't necessarily more useful.

Since the change was introduced in Emacs 27, which has not yet been released, I 
suggest the attached patch for emacs-27. It reverts the use of regexp-opt with 
KEEP-ORDER = t, so that all-string 'or' forms match the longest string again. 
What do you think? It would solve the problem without introducing new 
constructs, and without running the risk of introducing subtle errors in 
existing rx expressions.
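
To make the difference concrete, here is a minimal sketch that calls regexp-opt directly, with and without KEEP-ORDER. It assumes Emacs 27 or later (where regexp-opt accepts that argument); the helper my-matched-text and the variable names are hypothetical and exist only for illustration.

```elisp
(require 'regexp-opt)

(defun my-matched-text (regexp string)
  "Return the part of STRING matched by REGEXP, or nil if there is no match.
Hypothetical helper, for illustration only."
  (and (string-match regexp string)
       (match-string 0 string)))

;; KEEP-ORDER nil: regexp-opt may reorder and factor the strings, and the
;; resulting regexp matches the longest string possible.
(defvar my-longest
  (my-matched-text (regexp-opt '("a" "ab")) "abc"))        ; "ab"

;; KEEP-ORDER t: the alternatives are tried in the order given, as if the
;; strings had been joined with \|, so the earlier "a" wins.
(defvar my-ordered
  (my-matched-text (regexp-opt '("a" "ab") nil t) "abc"))  ; "a"
```

With the patch, an all-string 'or' would behave like the first call rather than the second.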

(In fact, if we do not do this in Emacs 27, we'd have to add a NEWS entry to 
warn users about the change.)

A further improvement would be to ensure that nested all-string 'or' forms 
would have the same property, and that expansion of user-defined forms would be 
transparent. In other words, that

 (rx-let ((x (or "abc" "de")))
   (rx (or "a" x (or "ab" "def"))))

would be equivalent to

 (rx (or "abc" "ab" "a" "def" "de"))
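
As a rough way to check this property, the sketch below builds the nested form and a flat all-string 'or' over the same strings and compares what each matches. It assumes Emacs 27 or later for rx-let; my-matched-text, my-nested and my-flat are hypothetical names, and whether the nested form already gives the longest match depends on the rx version in use.

```elisp
(require 'rx)

(defun my-matched-text (regexp string)
  "Return the part of STRING matched by REGEXP, or nil if there is no match.
Hypothetical helper, for illustration only."
  (and (string-match regexp string)
       (match-string 0 string)))

;; The nested form from the message, including a user-defined form `x'.
(defvar my-nested
  (rx-let ((x (or "abc" "de")))
    (rx (or "a" x (or "ab" "def")))))

;; A flat 'or' over the same five strings.
(defvar my-flat
  (rx (or "a" "abc" "de" "ab" "def")))

;; With longest-match semantics the flat form picks "abc" and "def" here;
;; the nested form should agree once nesting and expansion are transparent.
(list (list (my-matched-text my-nested "abcd") (my-matched-text my-flat "abcd"))
      (list (my-matched-text my-nested "defg") (my-matched-text my-flat "defg")))
```

Any pair in the result whose two elements differ would point at a case where nesting or definition expansion changes the match.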

I'll prepare a patch for this QoI improvement, but the attached patch should be 
required no matter what.

Attachment: 0001-rx-Use-longest-match-for-all-string-or-forms-bug-376.patch
Description: Binary data

